While working on a recent project for a customer that involved importing both Excel 97-2003 and Excel 2007/2010 files, I was a little surprised to discover that the Foreach Loop File Enumerator will return both *.xls and *.xlsx files even if you only specify to return *.xls files.
I tested this behaviour with similar results with other file formats as well. During my testing, I created three files:
See the screenshot of the three test files created below:
Next, I created an SSIS Package with a Foreach Loop Container with Foreach File Enumerator as the enumerator type and specified *.txt as the file mask. The file names read by the Foreach Loop Container are being assigned to a user variable called varFileName. Then, inside the Foreach Loop Container I added a simple VB Script Task that returns the value of the user variable varFileName inside a message box.
See the Foreach Loop Containter configurations below:
You would expect only file_txt.txt to be returned, but as it turns out all three files were returned as can bee seen on the screen captures for the message boxes below:
It looks like the Foreach File Enumerator appends an asterisk (*) at the end of the extension portion of the file mask you specify. This means, that specifying *.txt or *.xls is the same as specifying *.txt* or *.xls*. As an additional test, I executed the <dir> command in D.O.S. to see the results of both masks. The two commands executed are shown below along with the results:
Interestingly, we get the same result in D.O.S. if we specify *.txt or *.txt*. I assume then, that the Foreach Loop file enumeration behaviour may be bound to the D.O.S. output of the Operating System. The current version of D.O.S. on my Windows 7 machine is 6.1.7600. With the introduction of support for more than three letter extensions, this little issue might have been overlooked in D.O.S. for the <dir> command.
The only reference I found regarding this issue with SSIS and the For Each Loop Container was by Douglas Laudenschlager (Blog), technical writer for Microsoft on the SQL Server Integration Services documentation team. The blog post where he mentions this as a gotcha can be found here: http://dougbert.com/blogs/dougbert/archive/2008/06/16/excel-in-integration-services-part-1-of-3-connections-and-components.aspx
Douglas correctly expresses,
There appears to be no way to specify, “Give me .xls but not .xlsx”.
So, if you require to only limit your control flow execution for files with a specific extension, as in Douglas’ example .xls but not .xlsx, then your alternative will be to assign to a separate variable the extension portion of the file name contained in the varFileName variable of my SSIS package above. Once you capture the file extension in a variable, you can use a precedence constraint to restrict further control fow task execution for files of a specific file extension.
In my case, I declared a variable called varFileExtension and added the following line to the existing Script Task in my example:
I can now use a precedence constraint to check if the value contained in my varFileExtension is equal to “txt” as follows:
By specifying a precedence constraint that evaluates for the exact file extension I’m looking to work with, I can limit the rest of the execution for only these files. Furthermore, if I wanted to have separate control flow logics for files of type “.xls” and “.xlsx” for example, I can split my control flow using two precedence constraints. In such case I could change the code inside my script to include the last 4 characters instead of just three. My two precedence constraint expressions would look something like this then:
For XLS files -> @[User::varFileExtension]==”.xls” (period included since we are assigning the last 4 characters now)
For XLSX files -> @[User::varFileExtension]==”xlsx”
The SSIS package would look something like this now:
Filed under: Business Intelligence, Data Warehouse, ETL, SQL Server, SQLServerPedia Syndication, SSIS | Tagged: #sqlserver, 2008, business intelligence, container, data warehouse, dba, enumerator, ETL, extract, file, files, foreach, foreach loop container, load, loop, mask, SQL, SSIS, wildcard |