This is a summary of the Change Data Capture presentation covered in my sessions at SQL Saturday #38 – Jacksonville and SQLSaturday #40 – South Florida.
Change Data Capture (CDC) is one of the new features available for SQL Server 2008 Enterprise and Developer Editions, which allows you to easily track and keep history of all DML (Data Manipulation Language) changes on user tables such as INSERTs, DELETEs, and UPDATEs with little or no overhead. Change Data Capture has been used primarily to track changes for incremental loads using SSIS for the ETL process in Data Warehousing and Business Intelligence because it allows you to identify new data and data that changed or was deleted since last incremental load.
Before Change Data Capture in SQL Server 2005 and prior versions, identifying new or changed data required some level of custom programming, triggers or third party tools. These solutions work well but introduce additional overhead to your production systems. This is the main reason why Change Data Capture was introduced in SQL Server 2008; to eliminate overhead while tracking new and changed data.
But Change Data Capture use is not limited as a tool for ETL Process. It can be used for many other purposes where overhead, simplicity and cost might impose a restriction. Some of these scenarios include:
1) You are a developer adding or modifying existing code and you need to see the difference in the data output on a particular table.
2) You are a DBA and need to audit several tables in a database and need to report which data was changed, what were the new and previous values, who did the change and when was the change made.
3) You were tasked to delete several records on a highly transactional production database and unintentionally deleted records that you were not supposed to with no possibility of rolling back the changes (no explicit transaction) and no possibility of restoring a backup.
4) You are a DBA and need to know which tables and columns are being written to the most in your database.
5) You are a DBA and need to know how many new records are being entered into your database in a period of time and the percentage of change of existing records.
In most cases you would want to know what changed and which were the values before the change.
One of the most important benefits of this feature is that it allows you to see the data before & after an UPDATE or DELETE statement, which in turn allows you to query and recover data overwritten or deleted quickly without the need of a database restore. It also captures each new row inserted.
CDC accomplishes this task by reading the committed operations from the log file and inserting the changed records in a tracking table that mirrors the source (tracked) table. The records inserted in this tracking table contain both the value before the change and the value after the change along with the metadata associated with the change. The metadata can be queried to identify the type of DML operation as follows:
1 = Delete
2 = Insert
3 = Update (record’s value before update)
4 = Update (record’s value after update)
More concepts are covered in my PowerPoint presentation which you can download here or by clicking slide below.
Filed under: Business Intelligence, Data Warehouse, ETL, SQL Server, SQLServerPedia Syndication, SSIS Tagged: | CDC, Change Data Capture, data warehouse, ETL, extract, History tracking, incremental loads, load, SQL, SSIS, transform