SQL Server 2008 – Using Merge From SSIS

In my last post, I talked about using the new MERGE statement in SQL Server 2008. It allows you to merge the data in a result set with a table, based on a join condition that you specify. This is a common scenario in data warehousing, where you often need to take a working table and compare it to a target table to determine if a row should inserted, updated, or deleted. This is often the case for dimension table. Currently, there is no SSIS component in the June 2007 CTP that leverages the MERGE statement directly. This post will explore one possible pattern for using MERGE from SSIS.

In SSIS 2005, a commonly used pattern for updating dimension tables was to retrieve all of the rows intended for the dimension table in a data flow via a source component. Then the rows would be run through a lookup component to determine if they existed in the target table. Rows to be updated (those found in the lookup) would be run through an OLE DB Command component. Rows to be inserted would be sent to an OLE DB Destination.

image

In cases where performance is a concern, the OLE DB Command could be replaced with another OLE DB Destination that writes all rows to be updated to a temporary table. In the control flow, an Execute SQL task would issue an UPDATE statement, using the temporary table as a source.

image

To implement this using a MERGE statement, we can modify the second pattern. All rows that are read from the source should be written to a temporary table. Then, the Execute SQL task will be used to run a MERGE statement that uses the temporary table as the source for the merge. The matching of the rows will be done through the join condition in the MERGE statement.

image

Why would using the MERGE statement be better than using the lookup pattern described above? For one thing, the data flow is simpler, with no branches in execution. Also, the data flow only has to write to a single table, instead of two tables. I haven’t done any performance testing on the MERGE statement yet, so I can’t say for certain whether it is faster.

I’m still hoping for a destination component that can be used directly in the data flow to perform the MERGE, but the SSIS team has indicated that it probably won’t happen.

4 Comments

  1. dougieday says:

    I had the same problem, but found a nice work around using the new table data types in SQL Server 2008 and the MERGE statement. Check out a later post of mine to see the solution for doing all of this in memory in dataflow

  2. Siva Prasad says:

    beautiful article..very useful and a turn around solution for SCD.. i have a small suggestion here..I dont think we need a Temp table here when we use Merge statement in Execute SQL Task.data will flow directly from OLE DB source to Execute SQL task. correct me if i’m wrong:)

  3. Richard says:

    thanks a million. Perfect working answer to what I was trying to do – I was stuck with a ‘working great in ssms code’ – pasted directly into (just one) sql-task. and not able to populate the merge’s ‘source’ dynamically. lol – thanks again.

Leave a Reply