.Net Code Camp Orlando 2013

This past Saturday, March 16th, I had the opportunity to speak once again at the .Net Orlando Code Camp. I presented a session in the SQL Server track titled “Change Data Capture, a developer’s best unknown friend”. I had a better-than-expected turnout, with about 20 attendees and very good participation overall.

I was glad to hear from more than one attendee that they were planning to use Change Data Capture in their environment and, best of all, to get rid of triggers!

I had the opportunity to catch up with some of my friends and fellow SQL-ers from PASS. I also met some new folks from the App/Dev community. These types of events are always great for expanding your network, sharing your knowledge and learning something new!

Thanks to the organizers, sponsors, volunteers and attendees for making this event a success and for having me once more.

You can download my PowerPoint presentation and demo files at: http://sdrv.ms/15UUX64

CodePlex: SSIS Community Tasks and Components

I recently bumped into this very exhaustive list of SSIS tasks, components and samples on CodePlex: http://ssisctc.codeplex.com. The list is maintained by a SQL Server MVP and is grouped into the following sections:

  • Tools
  • Connection Managers
  • Log Providers
  • Tasks (for Control Flow)
    • Foreach Enumerators
    • Script Task
    • Script Samples
  • Components (for Data Flow)
    • Script Component Samples
    • Sources
    • Transforms
    • Destinations

I hope this list keeps being updated and more people contribute. Very useful.

Everything you need to know about SSIS in 1 hour!

Thanks to everyone who attended my talk, Todo lo que debes saber sobre SSIS en 1 hora! (Everything you need to know about SSIS in 1 hour!), at the 24HOP LATAM event. I really enjoyed presenting in Spanish. Even though Spanish is my first language, all of my technical presentations in the past have been in English.

If you would like to see the PowerPoint presentation, you can download it by clicking the image below.

You can also download the project I used in my presentation by clicking the image below. Note: the database and project can only be used with SQL Server Denali; they will not work with SQL Server 2008 or 2005.

SSIS Foreach File Enumerator returns more files than expected by appending a wildcard (*) to the file mask

While working on a recent project for a customer that involved importing both Excel 97-2003 and Excel 2007/2010 files, I was a little surprised to discover that the Foreach Loop File Enumerator returns both *.xls and *.xlsx files even if you specify only *.xls as the file mask.

I tested this behaviour with similar results with other file formats as well. During my testing, I created three files:

  1. File_txt.txt
  2. File_txts.txts
  3. File_txtsy.txtsy

See the screenshot of the three test files created below:

Next, I created an SSIS package with a Foreach Loop Container, chose Foreach File Enumerator as the enumerator type, and specified *.txt as the file mask. Each file name read by the Foreach Loop Container is assigned to a user variable called varFileName. Then, inside the Foreach Loop Container, I added a simple VB Script Task that displays the value of varFileName in a message box.

See the Foreach Loop Container configuration below:

See the VB script inside the Script Task below: 

You would expect only File_txt.txt to be returned, but as it turns out all three files were returned, as can be seen in the screen captures of the message boxes below:

It looks like the Foreach File Enumerator appends an asterisk (*) to the end of the extension portion of the file mask you specify. This means that specifying *.txt or *.xls is the same as specifying *.txt* or *.xls*. As an additional test, I executed the dir command at the Windows command prompt to see the results of both masks. The two commands executed are shown below along with the results:

Interestingly, we get the same result at the command prompt whether we specify *.txt or *.txt*. I assume, then, that the Foreach Loop file enumeration behaviour is tied to the way the operating system itself resolves wildcards. The version reported by the command prompt on my Windows 7 machine is 6.1.7600. With the introduction of support for extensions longer than three letters, this little issue might have been overlooked for the dir command.
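To make the difference between the two masks concrete, here is a minimal sketch that applies strict glob matching to the three test file names. It uses Python's fnmatchcase purely as a stand-in for "what you would expect a mask to do"; it is not how SSIS or the dir command is implemented, and the helper name match is my own.

```python
from fnmatch import fnmatchcase

# The three test files from the experiment above.
FILES = ["File_txt.txt", "File_txts.txts", "File_txtsy.txtsy"]


def match(mask, names=FILES):
    """Return the names that match a glob-style mask (strict, case-sensitive)."""
    return [name for name in names if fnmatchcase(name, mask)]


# What you would expect the enumerator to return for *.txt.
strict = match("*.txt")

# What the enumerator appears to return: the same as asking for *.txt*.
loose = match("*.txt*")

print(strict)
print(loose)
```

Under strict matching only File_txt.txt qualifies for *.txt, while *.txt* picks up all three names, which is the result the Foreach File Enumerator produced in my test.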

The only reference I found regarding this issue with SSIS and the Foreach Loop Container was by Douglas Laudenschlager (Blog), technical writer for Microsoft on the SQL Server Integration Services documentation team. The blog post where he mentions this as a gotcha can be found here: http://dougbert.com/blogs/dougbert/archive/2008/06/16/excel-in-integration-services-part-1-of-3-connections-and-components.aspx

As Douglas correctly puts it:

 There appears to be no way to specify, “Give me .xls but not .xlsx”.

So, if you need to limit your control flow execution to files with a specific extension (as in Douglas’ example, .xls but not .xlsx), the alternative is to copy the extension portion of the file name held in the varFileName variable of my SSIS package above into a separate variable. Once you have captured the file extension in a variable, you can use a precedence constraint to restrict further control flow task execution to files with that extension.

In my case, I declared a variable called varFileExtension and added the following line to the existing Script Task in my example:



I can now use a precedence constraint to check if the value contained in my varFileExtension is equal to “txt” as follows:

By specifying a precedence constraint that evaluates the exact file extension I want to work with, I can limit the rest of the execution to only those files. Furthermore, if I wanted separate control flow logic for “.xls” and “.xlsx” files, for example, I could split my control flow using two precedence constraints. In that case I could change the code inside my script to capture the last 4 characters instead of just 3. My two precedence constraint expressions would then look something like this:

  1. For XLS files -> @[User::varFileExtension]==".xls" (period included since we are assigning the last 4 characters now)
  2. For XLSX files -> @[User::varFileExtension]=="xlsx"
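The routing logic behind these two expressions can be sketched outside of SSIS. The following is only an illustration in Python, not the VB code from my Script Task; the function names and the branch labels are made up for the example, and the lower-casing is an extra I added so that "REPORT.XLS" is treated like "report.xls".

```python
import os


def last_chars(file_name, count):
    """Take the right-most `count` characters of the file name."""
    return file_name[-count:]


def route(file_name):
    """Pick a branch the way the two precedence constraints above would."""
    ext = last_chars(file_name, 4).lower()
    if ext == ".xls":   # period included: last 4 characters of "book.xls"
        return "xls branch"
    if ext == "xlsx":   # last 4 characters of "book.xlsx"
        return "xlsx branch"
    return "skipped"


def exact_extension(file_name):
    """Alternative: let the standard library split off the full extension."""
    return os.path.splitext(file_name)[1].lower()


for name in ["Sales.xls", "Sales.xlsx", "notes.txt"]:
    print(name, "->", route(name), "/", exact_extension(name))
```

Comparing the full extension, as exact_extension does, avoids having to remember whether the period is part of the captured value.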

The SSIS package would look something like this now:

 

SSIS errors: Bulk Load failed. Cannot obtain the required interface (“IID_IColumnsInfo”) from OLE DB provider “BULK” for linked server “(null)”

When working with the Bulk Insert Task in SSIS 2008 you may get the following error:

 The complete error message is:

[Bulk Insert Task] Error: An error occurred with the following error message: “Cannot obtain the required interface (“IID_IColumnsInfo”) from OLE DB provider “BULK” for linked server “(null)”.The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.”

This error description can make you jump through hoops trying to figure out why it is reporting a problem with the OLE DB provider, or why SSIS thinks you are trying to execute this operation on a linked server.

The real issue here has nothing to do with the first sentence in the error description. The fourth and last sentence points you to the actual problem:

Verify that the field terminator and row terminator are specified correctly.

You may be experiencing this error due to one or more of the following 3 reasons:

1) You may be specifying the wrong ColumnDelimiter for your source file.
For example, you may be trying to do a Bulk Insert operation from a Comma Separated Value (CSV) file without having changed the ColumnDelimiter property to Comma {,}. When you drag in the Bulk Insert Task, the default value of the ColumnDelimiter property is Tab.

2) You may be specifying the wrong RowDelimiter for your source file.
For example, you may be trying to do a Bulk Insert operation from a Comma Separated Value (CSV) file whose row delimiter is different from the RowDelimiter property default value of {CR}{LF}. In some cases, you may receive a file that is one very long stream of text with no Carriage Return (\r) and Line Feed (\n) characters, commonly denoted as {CR}{LF}, between rows. These hidden {CR}{LF} row delimiter characters are placed in a text file each time you hit the ENTER key on your keyboard, marking the end of one row and the beginning of the next. You can read a little more about the Carriage Return and Line Feed characters in Pinal Dave’s blog: http://blog.sqlauthority.com/2009/07/01/sql-server-difference-between-line-feed-n-and-carriage-return-r-t-sql-new-line-char/

As seen in the image below, there are two properties, ColumnDelimiter and RowDelimiter, for which you need to specify the correct values depending on the format of your input or source file:

3) You may be using a format file with an incorrect or invalid format defined.
Format files can be non-XML, commonly with an *.fmt extension, or, for SQL Server 2005 and later only, XML format files. For more information about format files, read MSDN Books Online: http://msdn.microsoft.com/en-us/library/ms191516.aspx

If you are using a format file, make sure you are pointing to the right file and that the format defined in it is correct. (Notice that the Format property value changes from Specify to Use File.)
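For the first two causes, it can help to look at the raw bytes of the source file before touching the task properties. The sketch below is one possible pre-check written in Python, not something SSIS provides: it reports which row terminator appears in a sample of the file and counts a few candidate field delimiters in the first row. The function names and the list of candidate delimiters are my own assumptions.

```python
def detect_row_terminator(data: bytes) -> str:
    """Name the row terminator found in the sample, or 'none found'."""
    if b"\r\n" in data:
        return "{CR}{LF}"
    if b"\n" in data:
        return "{LF}"
    if b"\r" in data:
        return "{CR}"
    return "none found"


def count_delimiters(first_row: bytes) -> dict:
    """Count candidate field delimiters in the first row of the sample."""
    candidates = {"tab": b"\t", "comma": b",", "semicolon": b";", "pipe": b"|"}
    return {name: first_row.count(char) for name, char in candidates.items()}


def inspect_sample(data: bytes) -> tuple:
    """Return (row terminator, delimiter counts for the first row)."""
    first_row = data.splitlines()[0] if data else b""
    return detect_row_terminator(data), count_delimiters(first_row)


# A small in-memory sample standing in for the first bytes of a source file.
sample = b"id,name,city\r\n1,Ana,Orlando\r\n2,Luis,Miami\r\n"
print(inspect_sample(sample))
```

To check a real file, read the first few kilobytes in binary mode (for example, open(path, "rb").read(4096), where path is your own file) and pass them to inspect_sample. If the terminator comes back as {LF} or "none found", or the comma count is zero while the tab count is not, the delimiter properties on the task probably do not match the file.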

Change Data Capture as a tool for Business Intelligence, Disaster Recovery, Auditing, and more

This is a summary of the Change Data Capture presentation covered in my sessions at SQLSaturday #38 – Jacksonville and SQLSaturday #40 – South Florida.

Presentation Summary
Change Data Capture (CDC) is one of the new features available in SQL Server 2008 Enterprise and Developer Editions. It allows you to easily track and keep a history of all DML (Data Manipulation Language) changes on user tables, such as INSERTs, DELETEs, and UPDATEs, with little or no overhead. Change Data Capture has been used primarily to track changes for incremental loads using SSIS for the ETL process in Data Warehousing and Business Intelligence, because it allows you to identify new data and data that changed or was deleted since the last incremental load.

Before Change Data Capture, in SQL Server 2005 and prior versions, identifying new or changed data required some level of custom programming, triggers or third-party tools. These solutions work well but introduce additional overhead to your production systems. This is the main reason why Change Data Capture was introduced in SQL Server 2008: to eliminate overhead while tracking new and changed data.

But the use of Change Data Capture is not limited to the ETL process. It can be used for many other purposes where overhead, simplicity and cost might impose a restriction. Some of these scenarios include:

1) You are a developer adding or modifying existing code and you need to see the difference in the data output on a particular table.
2) You are a DBA and need to audit several tables in a database and report which data was changed, what the new and previous values were, who made the change and when the change was made.
3) You were tasked to delete several records on a highly transactional production database and unintentionally deleted records that you were not supposed to with no possibility of rolling back the changes (no explicit transaction) and no possibility of restoring a backup.
4) You are a DBA and need to know which tables and columns are being written to the most in your database.
5) You are a DBA and need to know how many new records are being entered into your database in a period of time and the percentage of change of existing records.

In most cases you would want to know what changed and which were the values before the change.

One of the most important benefits of this feature is that it allows you to see the data before and after an UPDATE or DELETE statement, which in turn allows you to quickly query and recover data that was overwritten or deleted without the need for a database restore. It also captures each new row inserted.

CDC accomplishes this task by reading the committed operations from the log file and inserting the changed records in a tracking table that mirrors the source (tracked) table. The records inserted in this tracking table contain both the value before the change and the value after the change along with the metadata associated with the change. The metadata can be queried to identify the type of DML operation as follows:

1 = Delete
2 = Insert
3 = Update (record’s value before update)
4 = Update (record’s value after update)
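As a rough illustration of how these codes can be used once change rows have been read from a tracking table, here is a small Python sketch. It assumes each row is a dictionary keyed by column name, that the operation code is stored in the __$operation column, and that the rows are already ordered so each "before" image of an update is immediately followed by its "after" image. The id and name columns and the sample values are made up for the example.

```python
OPERATIONS = {
    1: "delete",
    2: "insert",
    3: "update (before)",
    4: "update (after)",
}


def describe(operation):
    """Translate an operation code into a readable label."""
    return OPERATIONS.get(operation, "unknown")


def deleted_rows(changes):
    """Rows as they looked when they were deleted (candidates for recovery)."""
    return [row for row in changes if row["__$operation"] == 1]


def update_pairs(changes):
    """Pair each before image (3) with the after image (4) that follows it."""
    pairs = []
    before = None
    for row in changes:
        if row["__$operation"] == 3:
            before = row
        elif row["__$operation"] == 4 and before is not None:
            pairs.append((before, row))
            before = None
    return pairs


# Hypothetical change rows for a two-column table (id, name).
changes = [
    {"__$operation": 2, "id": 1, "name": "Ana"},
    {"__$operation": 3, "id": 1, "name": "Ana"},
    {"__$operation": 4, "id": 1, "name": "Ana Maria"},
    {"__$operation": 1, "id": 2, "name": "Luis"},
]

for row in changes:
    print(describe(row["__$operation"]), row["id"], row["name"])
```

With the sample rows above, deleted_rows(changes) returns the row for id 2 as it looked before the delete, and update_pairs(changes) returns one pair showing the name changing from "Ana" to "Ana Maria".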

More concepts are covered in my PowerPoint presentation, which you can download here or by clicking the slide below.
