At the Microsoft BI Conference, I did a chalk talk on creating custom components for SSIS. It seemed to go pretty well, as I got a lot of questions and interaction from the audience, which is the point of a chalk talk. As part of it, I demonstrated a component that can be used to perform a MERGE (or any other batch-oriented SQL statement) in the data flow. This is a pretty common request / complaint about SSIS, as is evident from this post on Jamie’s blog.
There’s a pretty common pattern in SSIS for performing updates from the data flow. You can use the OLE DB Command, but it processes data row by row, so performance is not great. To get better performance, you can write the rows to be updated to a working table using an OLE DB Destination in the data flow, and then use an Execute SQL task in the control flow to perform a batch UPDATE against the true destination table. The same concept can be used with the new MERGE statement in SQL Server 2008, as shown in Using MERGE from SSIS. However, since SSIS by default drops connections after each task completes, you can’t easily use a temporary table as the working table, so you have to create and clean up “permanent” tables, either in the control flow each time the package runs, or by just leaving them in the database.
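To make the pattern concrete, here’s a sketch of the SQL an Execute SQL task might run after the data flow has loaded changed rows into a working table. The table and column names (dbo.Customer, dbo.Customer_Working, CustomerID, Email) are illustrative, not from a real schema:

```sql
-- Batch UPDATE: apply all changes from the working table in one statement
UPDATE c
SET    c.Email = w.Email
FROM   dbo.Customer AS c
JOIN   dbo.Customer_Working AS w
       ON c.CustomerID = w.CustomerID;

-- Or, on SQL Server 2008, a single MERGE handles both updates and inserts
MERGE dbo.Customer AS tgt
USING dbo.Customer_Working AS src
    ON tgt.CustomerID = src.CustomerID
WHEN MATCHED THEN
    UPDATE SET tgt.Email = src.Email
WHEN NOT MATCHED THEN
    INSERT (CustomerID, Email)
    VALUES (src.CustomerID, src.Email);
```

Either way, the engine processes the whole working table as one set-based operation instead of issuing a statement per row, which is where the performance win over the OLE DB Command comes from.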
Clearly, there are some drawbacks to this. You probably don’t want permanent tables in your database that exist only to support a back-end process, and it would certainly be simpler not to have to create and delete the table in each package that needs one. To work around this, I created the Batch Destination component.
Basically, the component works by reading the metadata from the input columns in the data flow, and creating a working table in the target database. It then loads all the incoming rows to the working table. After all the incoming rows have been received and saved in the working table, it executes a SQL statement supplied by the developer. This SQL statement could be a MERGE statement, an UPDATE statement, or pretty much any other DML statement you’d like. After executing the SQL statement, it deletes the working table.
The user interface for the component is shown below. You provide an ADO.NET connection manager to a SQL Server database, a working table name (this can be a temp table, just use # in front of the name), and the SQL command to execute. On the Input Columns tab, you select the columns that you want to include in the working table.
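As an example of the kind of SQL command you might enter in the editor, here’s a hypothetical MERGE against a temp working table. The working table name (#BatchWork), target table, and columns are made up for illustration; the component creates #BatchWork from whatever input columns you selected:

```sql
-- #BatchWork is created by the component from the selected input columns
MERGE dbo.Product AS tgt
USING #BatchWork AS src
    ON tgt.ProductID = src.ProductID
WHEN MATCHED THEN
    UPDATE SET tgt.ListPrice = src.ListPrice
WHEN NOT MATCHED THEN
    INSERT (ProductID, ListPrice)
    VALUES (src.ProductID, src.ListPrice);
```

Because the component holds its ADO.NET connection open from table creation through this statement, the temp table is still in scope when the command runs, which is exactly what the default SSIS connection behavior prevents you from doing with an Execute SQL task.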
There are a few limitations in this component. One, it currently only works with SQL Server through an ADO.NET connection manager. Two, it performs very little validation right now, so you can easily enter a SQL command that isn’t actually valid, and you won’t find out until you run the component. Three, it’s written against the SQL Server 2008 API, so it won’t work in SQL Server 2005.
I’m planning on making the source code available on CodePlex, but in the meantime, you can download a compiled version from my SkyDrive. To install it, you need to put the .DLL in the GAC, and copy it to your C:\Program Files (x86)\Microsoft SQL Server\100\DTS\PipelineComponents directory. To add it to the toolbox in BIDS, right-click on the toolbox, select “Choose Items…”, select the SSIS Data Flow Items tab, and select the BatchDestination item in the list.
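If you prefer the command line for the install steps, it might look something like the following from a Visual Studio command prompt. The assembly name BatchDestination.dll is my assumption here; substitute whatever the downloaded DLL is actually called:

```
:: Register the assembly in the GAC (gacutil ships with the .NET SDK)
gacutil /i BatchDestination.dll

:: Copy it where the SSIS designer and runtime look for pipeline components
copy BatchDestination.dll "C:\Program Files (x86)\Microsoft SQL Server\100\DTS\PipelineComponents"
```

After that, the toolbox step in BIDS is the same as described above.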