Creating a Top 1, Top 10, Top n Customer or Product List using Named Sets in SSAS Part 1 of 3

Just like stored procedures in SQL Server, Analysis Services provides similar functionality through stored MDX expressions called named sets. SQL Server stored procedures:
– Return a set of data from your database
– Can be reused by being called in scripts
– Are named descriptively for the data they return (ideally)

Similarly, SSAS allows you to create reusable pieces of MDX code that can be called by name or alias in other MDX scripts and that share the characteristics described above. Like stored procedures, these reusable pieces of MDX code may or may not use parameters. In essence, a Named Set is a predefined subset of your cube, or a sub-cube. For simplicity, I will cover only named sets that do not use parameters.

One of the most common queries requested by Sales and Marketing Managers is a list of Top 10 Customers.

This list of customers may be required in ad-hoc queries as well as in weekly, monthly, quarterly or yearly sales reports. This is where Named Sets in SSAS are most beneficial. Without them, you rewrite the same MDX query over and over and have to change it everywhere each time the criteria for Top 10 customers change (believe me, they will). With a named set, you simply call the set, and whatever changes you make to it trickle down to every place that calls it.

The general MDX syntax to define this Named Set is:

CREATE SET CURRENTCUBE.[Top 10 Customers]
AS
TopCount
(
  (Existing [Customer].[Customer].[Customer].Members),
  10,
  [Measures].[Sales Amount]
);

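As you can see, the MDX function that makes this list easy to create is TopCount(). The keyword *Existing* forces the set to be evaluated in the current context, so the list reflects the dimension criteria in effect, for example when a different time period such as Year, Quarter or Month is selected, or when the cube is sliced by a particular product. The Top 10 Customers list is then generated based on the criteria applied. Note that a set declared with plain CREATE SET is static and is evaluated only once; in SSAS 2008 and later, declare it with CREATE DYNAMIC SET if it must be reevaluated in the context of each query.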
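
To make the effect of slicing concrete, here is a minimal Python sketch (not MDX) that models what a top-n list over the existing customers does. The sales rows, the customer names and the `top_count` helper are all made up for illustration; the real evaluation happens inside Analysis Services.

```python
from collections import defaultdict

# Hypothetical fact rows: (customer, year, sales amount).
SALES = [
    ("Ana", 2009, 500.0),
    ("Ana", 2010, 50.0),
    ("Ben", 2009, 100.0),
    ("Ben", 2010, 900.0),
    ("Cy", 2009, 300.0),
    ("Cy", 2010, 200.0),
]


def top_count(rows, n, year=None):
    """Return the n customers with the highest total sales.

    `year` plays the role of the slicer: when given, only rows for
    that year are aggregated, so the ranking is recomputed per slice.
    """
    totals = defaultdict(float)
    for customer, row_year, amount in rows:
        if year is None or row_year == year:
            totals[customer] += amount
    # Rank customers by their total, highest first, and keep the first n.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [customer for customer, _ in ranked[:n]]


print(top_count(SALES, 2))             # ranking over all years
print(top_count(SALES, 2, year=2009))  # ranking when sliced by 2009
print(top_count(SALES, 2, year=2010))  # ranking when sliced by 2010
```

With this sample data the three calls return different lists, which is the behavior you want from a Top 10 Customers set that follows the selected time period.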

Conversely, if you need a list of the 10 least profitable customers or the least profitable products, for example, you can use the BottomCount() MDX function in a similar fashion.

Take special care when using the BottomCount() MDX function with customers or products that have no corresponding sales amount for the given criteria. For example, if 15 customers had no sales last month ([Sales Amount] = 0 or [Sales Amount] is NULL), the BottomCount() customer list will be made up of customers with $0 or NULL sales amounts. These records can be filtered out using the Filter() MDX function for the $0 values and the NonEmpty() MDX function for the NULL values.
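
The pitfall is easy to see in a small Python model (again, not MDX). The data and the `bottom_count` helper are hypothetical, and the sketch assumes a NULL amount ranks like 0; the two flags stand in for the roles NonEmpty() and Filter() play in the real set definition.

```python
# Hypothetical monthly sales per customer; None models a NULL (no sales).
MONTHLY_SALES = {
    "Ana": 120.0,
    "Ben": None,
    "Cy": 0.0,
    "Dee": 40.0,
    "Eli": 75.0,
}


def bottom_count(sales, n, skip_empty=False, skip_zero=False):
    """Return the n customers with the lowest sales amount.

    skip_empty drops NULL (None) amounts, the role NonEmpty() plays;
    skip_zero drops 0 amounts, the role of a Filter() on the measure.
    """
    candidates = []
    for customer, amount in sales.items():
        if amount is None and skip_empty:
            continue
        if amount == 0 and skip_zero:
            continue
        # Assumption: a NULL amount ranks like 0.
        candidates.append((customer, amount or 0.0))
    # Stable ascending sort: ties keep their original order.
    candidates.sort(key=lambda pair: pair[1])
    return [customer for customer, _ in candidates[:n]]


print(bottom_count(MONTHLY_SALES, 2))
print(bottom_count(MONTHLY_SALES, 2, skip_empty=True, skip_zero=True))
```

Without the flags, the list is filled with the customers that had no sales at all; with both flags set, it contains the lowest customers that actually bought something.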

In Part 2 of this post I will be providing some examples of Named Sets using Filter(), NonEmpty() and another useful function called Item(), which allows you to select a particular tuple from a set.

In Part 3, I will be explaining the differences between the TopCount() and Head() MDX functions, and between BottomCount() and Tail(). In some instances one function might be simpler and more efficient than the other.

Change Data Capture as a tool for Business Intelligence, Disaster Recovery, Auditing, and more

This is a summary of the Change Data Capture presentation I gave in my sessions at SQLSaturday #38 – Jacksonville and SQLSaturday #40 – South Florida.

Presentation Summary
Change Data Capture (CDC) is one of the new features in the SQL Server 2008 Enterprise and Developer Editions. It lets you easily track and keep a history of all DML (Data Manipulation Language) changes on user tables, that is INSERTs, DELETEs, and UPDATEs, with little or no overhead. Change Data Capture has been used primarily for incremental loads with SSIS in the ETL process for Data Warehousing and Business Intelligence, because it lets you identify data that is new, changed or deleted since the last incremental load.

Before Change Data Capture, in SQL Server 2005 and prior versions, identifying new or changed data required some level of custom programming, triggers or third-party tools. These solutions work well but introduce additional overhead on your production systems. This is the main reason Change Data Capture was introduced in SQL Server 2008: to track new and changed data without that overhead.

But Change Data Capture is not limited to the ETL process. It can be used for many other purposes where overhead, simplicity and cost might impose a restriction. Some of these scenarios include:

1) You are a developer adding or modifying existing code and you need to see the difference in the data output on a particular table.
2) You are a DBA and need to audit several tables in a database and report which data was changed, what the new and previous values were, who made the change and when it was made.
3) You were tasked with deleting several records on a highly transactional production database and unintentionally deleted records you were not supposed to, with no possibility of rolling back the changes (no explicit transaction) and no possibility of restoring a backup.
4) You are a DBA and need to know which tables and columns are being written to the most in your database.
5) You are a DBA and need to know how many new records are being entered into your database in a period of time and the percentage of change of existing records.

In most cases you would want to know what changed and what the values were before the change.

One of the most important benefits of this feature is that it lets you see the data before and after an UPDATE or DELETE statement, which in turn lets you quickly query and recover overwritten or deleted data without a database restore. It also captures each new row inserted.

CDC accomplishes this task by reading the committed operations from the log file and inserting the changed records in a tracking table that mirrors the source (tracked) table. The records inserted in this tracking table contain both the value before the change and the value after the change along with the metadata associated with the change. The metadata can be queried to identify the type of DML operation as follows:

1 = Delete
2 = Insert
3 = Update (record’s value before update)
4 = Update (record’s value after update)
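
As a rough illustration of how these codes can be consumed, for example in an ETL or audit script, here is a minimal Python sketch. It assumes the change rows have already been read, in commit order, into dictionaries; the `operation` key is a stand-in for the operation metadata described above, and the `id` and `name` columns and the `pair_changes` helper are hypothetical.

```python
def pair_changes(rows):
    """Turn raw change rows into (kind, before, after) tuples.

    Each row is a dict holding an 'operation' code plus the tracked
    column values. An update is expected as a before row (3)
    immediately followed by its after row (4).
    """
    changes = []
    pending_before = None
    for row in rows:
        op = row["operation"]
        values = {k: v for k, v in row.items() if k != "operation"}
        if op == 1:
            # The delete row carries the values the record had before deletion.
            changes.append(("delete", values, None))
        elif op == 2:
            changes.append(("insert", None, values))
        elif op == 3:
            pending_before = values
        elif op == 4:
            changes.append(("update", pending_before, values))
            pending_before = None
        else:
            raise ValueError(f"unknown operation code: {op}")
    return changes


# Hypothetical change rows for one record of a tracked table.
sample_rows = [
    {"operation": 2, "id": 1, "name": "Ana"},
    {"operation": 3, "id": 1, "name": "Ana"},
    {"operation": 4, "id": 1, "name": "Anna"},
    {"operation": 1, "id": 1, "name": "Anna"},
]

for kind, before, after in pair_changes(sample_rows):
    print(kind, before, after)
```

Because delete rows and before-update rows keep the old values, a script like this has what it needs to report previous values or to rebuild a record that was removed by mistake.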

More concepts are covered in my PowerPoint presentation which you can download here or by clicking slide below.