I saw on slack recently a question about how the ssis file enumeraror orders (or more specifically doesn't order) files. I have been thinking about ssis quite a lot lately and whil I am in no hurry to start using it day to day it is quite an interesting tool.
So anyway, I saw this question that went like:
"does anyone know in what order files are processed in 'Foreach File Enumerator'?
I used to think it was alphabetically but after some testing this is not always the case?
Second part is there anyway to specify the order by say size or date?"
So how does SSIS order files or doesn't order files?
The answer to this is pretty simple and I thouhgt I knew the answer but wanted to confirm it. In my mind I thought, "how do they get a directory listing?", and my mind responses "probably using the win32 api's find file etc", my mind then wondered somewhere else before writing a quick package that:
- 1. Has a ForEach loop and a breakpoint set at pre-execute
- 2. Has a single task in the ForEach loop and a breakpoint set at pre-execute
- 3. A variable to hold the file name
Pretty simple, the ssis package looked like:
Pretty simple hey :)
I set the file path for the enumerator to c:\ssisSearch and put a load of files and directories in (because the win32 find functions have a buffer and you need to call it multiple times - I wanted to make sure we covered cases where there were multipl find calls). Then I reached for my favorite tool of all procmon.exe (I say favorite, it used to be then I had a job where I used it literally every single days for hours and hated it so stopped using it but now i'm back with it!). In procmon I set a filter on the c:\cssisSearch folder and also DtsDebugHost.exe and ran my package - the files were returned in alphabetical order.
I then went into procmon and to the properties of the "QueryDirectory" operation on that folder and (when the symbols had loaded) I could see that the call ssis was making was from the ForEachFileEnumerator.dll (native not .net so we can't grab reflector) and that calls "FindFirstFileW".
A quick hop skip and jump to msdn and FindFirstFile discusses the ordering of files here:
The FindFirstFile function opens a search handle and returns information about the first file that the file system finds with a name that matches the specified pattern. This may or may not be the first file or directory that appears in a directory-listing application (such as the dir command) when given the same file name string pattern. This is because FindFirstFile does no sorting of the search results. For additional information, see FindNextFile.
FindNextFile has this:
The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system. If the data must be sorted, the application must do the ordering after obtaining all the results.
So basically ntfs is alphabetical, fat date but don't rely on either.
Just a final thought, ssis runs on linux so no idea the order there :)