SSIS Foreach File Enumerator returns more files than expected by appending a wildcard (*) to the file mask

While working on a recent project for a customer that involved importing both Excel 97-2003 and Excel 2007/2010 files, I was a little surprised to discover that the Foreach File Enumerator returns both *.xls and *.xlsx files even when you specify only *.xls as the file mask.

I tested this behaviour with other file extensions and got similar results. For my testing, I created three files:

  1. File_txt.txt
  2. File_txts.txts
  3. File_txtsy.txtsy

See the screenshot of the three test files created below:

Next, I created an SSIS package with a Foreach Loop Container, using Foreach File Enumerator as the enumerator type and *.txt as the file mask. Each file name read by the Foreach Loop Container is assigned to a user variable called varFileName. Inside the Foreach Loop Container, I added a simple Script Task, written in Visual Basic, that displays the value of varFileName in a message box.

See the Foreach Loop Container configuration below:

See the Visual Basic code inside the Script Task below:

You would expect only File_txt.txt to be returned, but as it turns out all three files were returned, as the screen captures of the message boxes below show:

It looks like the Foreach File Enumerator appends an asterisk (*) to the end of the extension portion of the file mask you specify. This means that specifying *.txt or *.xls is the same as specifying *.txt* or *.xls*. As an additional test, I ran the dir command at the Windows command prompt to see the results of both masks. The two commands and their results are shown below:

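Interestingly, the command prompt returns the same result for *.txt and *.txt*. I assume, then, that the Foreach Loop file enumeration behaviour is tied to the way the operating system matches file masks. The command prompt on my Windows 7 machine reports version 6.1.7600. A likely cause is the 8.3 short file names that Windows generates for compatibility (when short-name generation is enabled on the volume): the short name of File_txts.txts keeps only the first three characters of the extension, .TXT, and wildcard searches are matched against both long and short names. With the introduction of support for extensions longer than three characters, this side effect is easy to overlook. The .NET Framework documentation for Directory.GetFiles describes the same behaviour: when the extension in the search pattern is exactly three characters long, as in *.xls, files whose extensions begin with those characters, such as book.xlsx, are returned as well.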
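The symptom can be modelled with Python's fnmatch module, which, unlike the Windows search, matches a mask against the long file name only. This is a small sketch of what was observed, not of how SSIS or Windows implements the matching: *.txt selects one of the three test files, while *.txt* selects all three, which is what the Foreach File Enumerator returned.

```python
from fnmatch import fnmatchcase

# The three test files from the experiment above.
test_files = ["File_txt.txt", "File_txts.txts", "File_txtsy.txtsy"]


def matches(mask, names):
    """Return the names that match the wildcard mask (case-sensitive, long names only)."""
    return [name for name in names if fnmatchcase(name, mask)]


exact = matches("*.txt", test_files)      # what you would expect from *.txt
observed = matches("*.txt*", test_files)  # what the Foreach File Enumerator returned

print("*.txt  ->", exact)     # ['File_txt.txt']
print("*.txt* ->", observed)  # all three files
```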

The only reference I found regarding this issue with SSIS and the Foreach Loop Container was by Douglas Laudenschlager, a technical writer for Microsoft on the SQL Server Integration Services documentation team. The blog post where he mentions this as a gotcha can be found here: http://dougbert.com/blogs/dougbert/archive/2008/06/16/excel-in-integration-services-part-1-of-3-connections-and-components.aspx

As Douglas correctly puts it:

 There appears to be no way to specify, “Give me .xls but not .xlsx”.

So, if you need to limit your control flow execution to files with a specific extension (in Douglas’ example, .xls but not .xlsx), one alternative is to assign the extension portion of the file name held in the varFileName variable of my SSIS package above to a separate variable. Once you capture the file extension in a variable, you can use a precedence constraint to restrict further control flow execution to files with that specific extension.
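The same idea, enumerate broadly and then keep only the names whose extension matches exactly, can be sketched in Python. This is not SSIS code: the file names are hypothetical, and it uses os.path.splitext, which splits at the last period, instead of taking a fixed number of characters from the end of the name.

```python
import os


def has_exact_extension(file_name, wanted_ext):
    """Return True if file_name ends in exactly wanted_ext (e.g. ".xls"), ignoring case.

    os.path.splitext splits at the last period, so "Sales.xlsx" yields ".xlsx",
    which does not equal ".xls".
    """
    _, ext = os.path.splitext(file_name)
    return ext.lower() == wanted_ext.lower()


# Hypothetical names returned by a *.xls file mask in the Foreach loop.
enumerated = ["Sales.xls", "Sales.xlsx", "Budget.XLS"]

# Only these names would continue past the precedence constraint.
to_process = [name for name in enumerated if has_exact_extension(name, ".xls")]
print(to_process)  # ['Sales.xls', 'Budget.XLS']
```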

In my case, I declared a variable called varFileExtension and added the following line to the existing Script Task in my example:



I can now use a precedence constraint, with its evaluation operation set to Expression, to check whether the value contained in my varFileExtension variable is equal to “txt”, as follows:

By specifying a precedence constraint that evaluates the exact file extension I want to work with, I can limit the rest of the execution to only these files. Furthermore, if I wanted separate control flow logic for files of type “.xls” and “.xlsx”, for example, I could split my control flow using two precedence constraints. In that case I would change the code inside my script to capture the last four characters instead of just three. My two precedence constraint expressions would then look something like this:

  1. For XLS files -> @[User::varFileExtension] == ".xls" (the period is included because we are now assigning the last four characters)
  2. For XLSX files -> @[User::varFileExtension] == "xlsx"
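To see why the period appears in the first expression, here is a minimal Python model of taking the last four characters of a file name and routing it to one of two branches, mirroring the two expressions above. The function names are hypothetical. The sketch lowercases the name first, since a name such as REPORT.XLS would otherwise fail a case-sensitive comparison; that is worth checking in your SSIS expressions as well.

```python
def last_chars(value, n):
    """Return the rightmost n characters of value (like VB's Right function)."""
    # value[-0:] would return the whole string, so n <= 0 is handled explicitly.
    return value[-n:] if n > 0 else ""


def route(file_name):
    """Return which branch a file would take under the two expressions above."""
    ext4 = last_chars(file_name.lower(), 4)
    if ext4 == ".xls":
        return "XLS branch"
    if ext4 == "xlsx":
        return "XLSX branch"
    return "no branch"


for name in ["Report.xls", "Report.xlsx", "Notes.txt"]:
    print(name, "->", last_chars(name, 4), "->", route(name))
```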

The SSIS package would look something like this now:

 

Differences between Merge and Union All transformations in SSIS

What are the differences between Merge and Union All transformations in SSIS?

The first and most obvious difference is that Merge can accept only two datasets, while Union All can accept more than two datasets as input. The second difference is that Merge requires both datasets to be sorted and combines them into a single sorted output, while Union All does not require sorted datasets. Both transformations are considered partially blocking.

As the following image shows, Union All can accept more than two datasets as input, while Merge is limited to two input datasets. Also notice that for the Merge transformation, the datasets must be sorted before they are accepted as valid input.

For simplicity, a Sort transformation was used here. Keep in mind that the Sort transformation is fully blocking: it must read all the rows in the data flow (buffer) before it can sort them, so no data is sent downstream until the Sort transformation has read every row. A better approach is to sort the datasets with a T-SQL ORDER BY clause in the source component's query and then, in the source's Advanced Editor, set the output's IsSorted property to True and the SortKeyPosition of the sort columns, so that the Merge transformation recognizes the input as sorted.
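As a rough analogy only (this is plain Python, not the SSIS data flow engine), Merge behaves like merging two already-sorted streams into one sorted stream, while Union All simply appends the rows of any number of inputs. The inputs below are made up; heapq.merge stands in for Merge and itertools.chain for Union All.

```python
import heapq
from itertools import chain

# Two inputs already sorted on the key (first element), as Merge requires.
input_a = [(1, "a1"), (4, "a4"), (7, "a7")]
input_b = [(2, "b2"), (3, "b3"), (9, "b9")]
# Union All accepts any number of inputs, sorted or not.
input_c = [(8, "c8"), (5, "c5")]

# Merge-like: two sorted inputs -> one output sorted on the key.
merged = list(heapq.merge(input_a, input_b, key=lambda row: row[0]))

# Union All-like: any number of inputs -> rows appended, no sorting applied.
union_all = list(chain(input_a, input_b, input_c))

print(merged)
print(union_all)
```

Note that heapq.merge would accept more than two inputs; the two-input limit belongs to the SSIS Merge transformation, not to the merge technique itself. Likewise, chain keeps the inputs in order, whereas SSIS Union All makes no promise about the order in which rows from different inputs appear.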

SSIS errors: Bulk Load failed. Cannot obtain the required interface (“IID_IColumnsInfo”) from OLE DB provider “BULK” for linked server “(null)”

When working with the Bulk Insert Task in SSIS 2008 you may get the following error:

 The complete error message is:

[Bulk Insert Task] Error: An error occurred with the following error message: “Cannot obtain the required interface (“IID_IColumnsInfo”) from OLE DB provider “BULK” for linked server “(null)”.The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.”

This error description can make you jump through hoops trying to figure out why it is detecting an error with the OLE DB provider, or why SSIS thinks you are trying to execute this operation on a linked server.

The real issue here has nothing to do with the first sentence of the error description. The last sentence tells you what to check:

Verify that the field terminator and row terminator are specified correctly.

You may be experiencing this error for one or more of the following three reasons:

1) You may be specifying the wrong ColumnDelimiter for your source file.
For example, you may be trying to do a Bulk Insert operation from a Comma Separated Value (CSV) file without changing the ColumnDelimiter property to Comma {,}. When you drag the Bulk Insert Task onto the control flow, the ColumnDelimiter property defaults to Tab.
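If you are not sure which column delimiter a source file uses, a quick check outside SSIS can help. This Python sketch uses the standard library's csv.Sniffer to guess the delimiter from the first lines of a file; the sample text and the candidate list are assumptions, and the guess can be wrong for small or irregular samples.

```python
import csv


def guess_column_delimiter(sample_text, candidates=",\t|;"):
    """Guess which candidate character separates columns in a text sample."""
    dialect = csv.Sniffer().sniff(sample_text, delimiters=candidates)
    return dialect.delimiter


# Hypothetical first few lines of a CSV source file.
sample = "Id,Name,City\r\n1,Ann,Austin\r\n2,Bob,Boston\r\n"
print(repr(guess_column_delimiter(sample)))  # ','
```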

2) You may be specifying the wrong RowDelimiter for your source file.
For example, you may be trying to do a Bulk Insert operation from a CSV file whose row delimiter differs from the RowDelimiter property default value of {CR}{LF}. In some cases, you may receive a file that is one very long stream of text, without the Carriage Return (\r) and Line Feed (\n) characters, commonly denoted as {CR}{LF}, between rows; files produced on UNIX or Linux systems often use a Line Feed {LF} alone. In Windows text editors, these hidden {CR}{LF} characters are inserted each time you press the ENTER key, marking the end of one row and the beginning of the next. You can read more about the Carriage Return and Line Feed characters in Pinal Dave’s blog: http://blog.sqlauthority.com/2009/07/01/sql-server-difference-between-line-feed-n-and-carriage-return-r-t-sql-new-line-char/
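Similarly, a few lines of Python can tell you which row terminator a file actually contains before you choose a RowDelimiter value. This is a rough heuristic, assuming you read the first block of the file in binary mode; the labels follow the {CR}{LF} notation used above, and a file with mixed line endings is reported as {CR}{LF} as soon as one such pair is found.

```python
def detect_row_terminator(raw):
    """Classify the row terminator found in a chunk of a file's bytes."""
    if b"\r\n" in raw:
        return "{CR}{LF}"    # Windows-style; matches the RowDelimiter default
    if b"\n" in raw:
        return "{LF}"        # common for files produced on UNIX/Linux systems
    if b"\r" in raw:
        return "{CR}"        # older Mac-style line endings
    return "none found"      # one long stream of text with no row breaks


# Hypothetical samples; for a real file you might use open(path, "rb").read(65536).
print(detect_row_terminator(b"Id,Name\r\n1,Ann\r\n"))  # {CR}{LF}
print(detect_row_terminator(b"Id,Name\n1,Ann\n"))      # {LF}
```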

As seen in the image below, there are two properties, ColumnDelimiter and RowDelimiter, whose values you need to set correctly for your input or source file format:

3) You may be using a format file with an incorrect or invalid format defined.
Format files can be non-XML, commonly with an *.fmt extension, or, for SQL Server 2005 and later only, XML. For more information about format files, read SQL Server Books Online on MSDN: http://msdn.microsoft.com/en-us/library/ms191516.aspx
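For reference, a non-XML format file for a simple delimited file is plain text: a version line, the number of fields, then one line per field giving the host field order, host data type, prefix length, host data length, terminator, server column order, server column name and collation. The Python sketch below builds such a file; the column names, the maximum length of 100, the collation and the version number 10.0 (SQL Server 2008) are assumptions for illustration, so check each value against the Books Online page above before using it.

```python
def non_xml_format_file(columns, field_terminator=",", row_terminator="\\r\\n",
                        version="10.0", max_length=100,
                        collation="SQL_Latin1_General_CP1_CI_AS"):
    """Build the text of a non-XML format file for a delimited character file.

    row_terminator holds the literal characters \\r\\n, which is how the
    terminator is written inside the format file itself.
    """
    lines = [version, str(len(columns))]
    for i, name in enumerate(columns, start=1):
        # The last field ends with the row terminator, the others with the field terminator.
        terminator = row_terminator if i == len(columns) else field_terminator
        lines.append(
            f'{i}\tSQLCHAR\t0\t{max_length}\t"{terminator}"\t{i}\t{name}\t{collation}'
        )
    # End the text with a newline after the last field line.
    return "\n".join(lines) + "\n"


print(non_xml_format_file(["Id", "Name", "City"]))  # hypothetical column names
```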

If you are using a format file, make sure you are pointing to the right format file and that the format defined in it is correct. (Notice that the Format property value changes from Specify to Use File):