In my last post, I talked about using the new MERGE statement in SQL Server 2008. It allows you to merge the data in a result set with a table, based on a join condition that you specify. This is a common scenario in data warehousing, where you often need to take a working table and compare it to a target table to determine if a row should inserted, updated, or deleted. This is often the case for dimension table. Currently, there is no SSIS component in the June 2007 CTP that leverages the MERGE statement directly. This post will explore one possible pattern for using MERGE from SSIS.
In SSIS 2005, a commonly used pattern for updating dimension tables was to retrieve all of the rows intended for the dimension table in a data flow via a source component. Then the rows would be run through a lookup component to determine if they existed in the target table. Rows to be updated (those found in the lookup) would be run through an OLE DB Command component. Rows to be inserted would be sent to an OLE DB Destination.
In cases where performance is a concern, the OLE DB Command could be replaced with another OLE DB Destination that writes all rows to be updated to a temporary table. In the control flow, an Execute SQL task would issue an UPDATE statement, using the temporary table as a source.
To implement this using a MERGE statement, we can modify the second pattern. All rows that are read from the source should be written to a temporary table. Then, the Execute SQL task will be used to run a MERGE statement that uses the temporary table as the source for the merge. The matching of the rows will be done through the join condition in the MERGE statement.
Why would using the MERGE statement be better than using the lookup pattern described above? For one thing, the data flow is simpler, with no branches in execution. Also, the data flow only has to write to a single table, instead of two tables. I haven’t done any performance testing on the MERGE statement yet, so I can’t say for certain whether it is faster.
I’m still hoping for a destination component that can be used directly in the data flow to perform the MERGE, but the SSIS team has indicated that it probably won’t happen.