Yesterday I posted a simple approach for inserting data into tables with a parent child relationship (like Order Header to Order Detail) where you need to know the generated identity key to tie the records together. That approach relied on having a business value that could be used to consistently tie the parent and child data together. I ended that post with a question – what happens if you don’t have that value?
One example of this might be an order feed that comes from external company. Each row contains the order header and order detail information. Each of these rows should be broken up and inserted into two tables, one for the header data, the other for the detail data. Assuming that there is no business value that can tie these together, you need to get the identity key for the parent row as soon as it is inserted.
One approach for this involves retrieving a result set using an Execute SQL task, and shredding it with the For Each Loop. This approach does have some downsides – please review the Pros and Cons section below for more details. Within the For Each loop, two Execute SQL Tasks are used – one to insert the header row and retrieve it’s ID value, the other to insert the child row.
Retrieving the result set in the initial Execute SQL task is straightforward – just retrieve all the order rows. You can use a For Each loop to shred the retrieved result set by selecting the For Each ADO enumerator, and mapping the columns in the result set to variables. The For Each will run once per row in the result set, setting the variable values to match the current row.
The next Execute SQL task is a little more interesting. It inserts the header row and uses the T-SQL OUTPUT clause to return the identity value for that row.
INSERT INTO parent (order_date, customer_id)
OUTPUT INSERTED.order_id
VALUES (GETDATE(), ?)
[/sourcecode]
If you aren’t using a database engine that supports OUTPUT or similar functionality, you can also execute a second SQL statement in the same task to get the identity value using @@IDENTITY (be aware that there are some constraints when using this), SCOPE_IDENTITY(), or the appropriate function for your database.
The Execute SQL task maps the returned identity value to another package variable (order_id in this example).
The next Execute SQL task can then use that variable when inserting the child record into the order details table.
Pros and Cons
This approach has the advantage of guaranteeing that your child records are matched to the correct parent records. This can be a big plus if you don’t have a reliable way to match the records when making two passes.
There are a couple of downsides, though. The biggest one is speed. This approach forces you into RBAR (row by agonizing row) mode, so it’s going to be slow, especially compared to a batch load operation. The second problem is that if you are dealing with a lot of columns, creating all the variables and maintaining the mappings isn’t a fun experience.
Overall, I prefer the approach I mentioned in the first article, and I tend to avoid this one if possible. In some cases, even if you don’t have a reliable key, you may be able to work around it by staging some data to a working table or modifying the data source to provide a reliable key.
The sample package for this post is on my SkyDrive.