Archive for October 2010

Insert Parent and Child Tables with SSIS Part 2

Yesterday I posted a simple approach for inserting data into tables with a parent child relationship (like Order Header to Order Detail) where you need to know the generated identity key to tie the records together. That approach relied on having a business value that could be used to consistently tie the parent and child data together. I ended that post with a question – what happens if you don’t have that value?

One example of this might be an order feed that comes from external company. Each row contains the order header and order detail information. Each of these rows should be broken up and inserted into two tables, one for the header data, the other for the detail data. Assuming that there is no business value that can tie these together, you need to get the identity key for the parent row as soon as it is inserted.

One approach for this involves retrieving a result set using an Execute SQL task, and shredding it with the For Each Loop. This approach does have some downsides – please review the Pros and Cons section below for more details. Within the For Each loop, two Execute SQL Tasks are used – one to insert the header row and retrieve it’s ID value, the other to insert the child row.

image

Retrieving the result set in the initial Execute SQL task is straightforward – just retrieve all the order rows. You can use a For Each loop to shred the retrieved result set by selecting the For Each ADO enumerator, and mapping the columns in the result set to variables. The For Each will run once per row in the result set, setting the variable values to match the current row.

SNAGHTMLf0da024

The next Execute SQL task is a little more interesting. It inserts the header row and uses the T-SQL OUTPUT clause to return the identity value for that row.

INSERT INTO parent (order_date, customer_id)
        OUTPUT INSERTED.order_id
        VALUES (GETDATE(), ?)

If you aren’t using a database engine that supports OUTPUT or similar functionality, you can also execute a second SQL statement in the same task to get the identity value using @@IDENTITY (be aware that there are some constraints when using this), SCOPE_IDENTITY(), or the appropriate function for your database.

The Execute SQL task maps the returned identity value to another package variable (order_id in this example).

SNAGHTMLf117a4c

The next Execute SQL task can then use that variable when inserting the child record into the order details table.

Pros and Cons

This approach has the advantage of guaranteeing that your child records are matched to the correct parent records. This can be a big plus if you don’t have a reliable way to match the records when making two passes.

There are a couple of downsides, though. The biggest one is speed. This approach forces you into RBAR (row by agonizing row) mode, so it’s going to be slow, especially compared to a batch load operation. The second problem is that if you are dealing with a lot of columns, creating all the variables and maintaining the mappings isn’t a fun experience.

Overall, I prefer the approach I mentioned in the first article, and I tend to avoid this one if possible. In some cases, even if you don’t have a reliable key, you may be able to work around it by staging some data to a working table or modifying the data source to provide a reliable key.

The sample package for this post is on my SkyDrive.

Inserting Parent and Child Tables with SSIS

A relatively frequent occurrence in data integration is the need to insert data into a parent table (like Order Header) and insert related records into a child table (Order Details). If the data is already populated with appropriate keys, and you are just copying it, this isn’t too complex – just copy the parent table first, then the child. What if the new tables use identity keys, though? You need to get the new identity key for each header row before you can insert the child row. This post is going to walk through one pattern for doing this, and I’ll show an alternate approach in my next post.

The first approach assumes that you have some common piece of information to link the data. In this case, often the simplest approach is to use two Data Flow tasks, run in sequence. This is my preferred solution to this problem, because it’s fast and it’s usually straightforward to implement.

The first Data Flow loads the parent records, and the second loads the child records. Note that the second Data Flow can’t run until the first succeeds.

image

The first Data Flow is pretty straight forward, and simply retrieves the parent (order header) data and inserts it into the parent table. The most important item here is that the source component retrieves the right data – that is, one row per order header, and that it includes some information that can be used to uniquely identify the order. In the sample package I’ve linked to below, you’ll see that the source of the order records is a single table, where a given row includes both header information and the detail. The source query for the data flow selects and groups on customer ID, as that uniquely identifies the order in this scenario (one order per customer, per day).

The second data flow retrieves the order detail for the same source table. It then uses a Lookup transform to retrieve the correct order ID (the identity key) from the parent table. The Lookup just needs enough data to make a unique match – in this case, that’s the current date and the customer id.

image

That’s really all there is to the simple pattern. You can find a sample package that illustrates this on my SkyDrive. But there can be more complex scenarios where you still need to handle a Parent / Child insert. For example, what if there is no reliable key to tie the order detail rows to the order header? In this case you can’t use the lookup. Stay tuned for the next post, where I’ll discuss a different pattern that can handle this scenario, but involves some tradeoffs.

QLSaturday #49 and #56 Recaps–A Little Different\

SQLSaturday #49

My recap on SQLSaturday #49 is waaay overdue (sorry, Jack and Andy). It was nice to get a chance to present at the home of SQLSaturday, and I have to say, the amount of experience they have in doing SQLSaturday events really shows. The event ran smoothly, and they had plenty of helpful volunteers to help in the appropriate places. I missed the speaker dinner, but heard it was good. There was also some excitement Friday night, when the hotel fire alarm when off at 3:30 AM. Not a pleasant wakeup experience. You could tell the SQLSaturday people who were staying at the hotel though – they were the ones in pajamas with laptop bags over their shoulders.

The event was well attended with 270 attendees and the sessions I attended were pretty full. My presentation was on SQL Azure and loading data to and from the cloud with SSIS and BCP. Fortunately, I was able to get a good internet connection (some other presenters had warned me that reception was a little weak in some rooms), so the demos ran as expected. The after-party was nice as well. I had the chance to catch up with some friends (thanks for the ride, Eric) and meet some new people from the area.

SQLSaturday #56

SQLSaturday #56 – BI Edition was this past weekend. It was another good event, and had about 215 attendees. I got to spent some time with Greg Galloway (one of the other developers on BIDS Helper), Tim Mitchell, and I met several new people as well. The event itself was held at the Microsoft campus in Dallas, and rooms were very nice. We did have a couple of tornado scares, the first of which came up just as my first presentation was starting. It’s a little disconcerting hear an announcement that everyone should proceed to the storm shelters two sentences into your presentation. But as it turned out, the presentation room we were in was a “tornado safe zone” (which I guess means safer that the surrounding rooms, which had lots of glass windows), so the presentation was able to continue. The event had a small number of volunteers, and it was nice to see speakers and attendees pitching in to help wherever needed, whether cleaning up or helping direct people to the appropriate rooms.

A Little Different

Something struck me about both these SQLSaturday events. They were both felt relatively low key and relaxed. I wasn’t involved in the organization for either of these, but I got the impression that the organizers felt that they wanted to provide a good event without going to extremes. They realized that not everything has to be perfect or prearranged to have a good event. At some other events, I got the distinct impression that the organizers sat down and said “How do we make this SQLSaturday the best one ever?”, really focused on setting the bar higher, and probably experienced a lot of stress in the process. Don’t get me wrong – I think improving the SQLSaturday experience is important, and continuing to try new things to improve that experience is good. I’ve really enjoyed those events that changed my concept of what a SQLSaturday could be.

However, some of the groups that have done the “higher-end” SQLSaturday events have had a lot of sponsors (which means a bigger budget) and a lot of volunteers (which means spreading the workload a lot more). For those of you considering organizing a SQLSaturday in your area, I think it’s important to know that you can organize a very successful SQLSaturday on a smaller budget and with fewer volunteers. If you focus on providing what the attendees are there for (focused, relevant content that helps them learn), the rest of it tends to sort itself out. Based on the speakers and attendees that I’ve talked to at various events, there’s not a significant difference in the satisfaction levels between events. Attendees are very happy to get free, quality training. Speakers are happy to have an audience. The other stuff (meals, giveaways, personal assistants for every attendee :) ) is secondary.

So, if you are considering a new SQLSaturday event, don’t worry about how it will compare with the one the next state over, or the one you gave last year. Most people will be very happy however you choose to organize it. And if someone refuses to attend because there won’t be free doughnuts, well, did you really want them there in the first place?

One Week to SQLSaturday #49

One week left till SQLSaturday #49 – there’s still time to register. I’m pretty excited about going, as it will be my first Orlando SQLSaturday. Also, it looks like a great lineup of speakers.

I’ll be presenting on ways to get data into and out of SQL Azure, with a focus on SSIS (not surprising that I’d put that spin on it, is it?) However, it will cover some of the other options too, and you may be surprised by the results. If you’re going to be around, please drop in, either to say hi, or as Eric plans to do, to throw fruit (looking forward to it, Eric). I’m happy either way.

SQLSaturday #48 Recap

I presented at SQLSaturday #48 this weekend. It was a great event, and I thoroughly enjoyed it. K. Brian Kelley and the other organizers put on a great event. This was the first SQLSaturday that I’ve been to in a while where I had the time and flexibility to attend several of the sessions. I particularly enjoyed Andy Warren’s presentation on personal development plans. It’s definitely something I need to spend more time on. Andrew Kelly’s presentation on storage and IO was good as well – I’ve seen it before, but I pick up something new every time.

From an attendee perspective, the event was very good. There were plenty of great sessions to choose from. The location (Midlands College) was also nice, with plenty of room, and a nice setup for both attending sessions and networking. The lunch room was a little crowded, but it wasn’t a problem – several of us just grabbed some space on the floor.

Looking at it from a speaker’s standpoint, it was well organized. The directions to various classrooms were clear, with schedules on the doors. The room proctors were very effective and helpful. I missed the speaker dinner, but I heard it was good. The only negative that I noticed is that some sessions were pretty lightly attended. I think this was mostly a factor of the number of simultaneous sessions spreading the attendees a little thin. It wasn’t a big deal, though – most of the speakers are happy to speak to small or large crowds.

Overall, definitely a good event, and I’ll be looking forward to the next one.