Archive for the ‘SSIS’ Category.

Creating a Basic Package Using Biml

This article walks through creating a simple package using Biml and the Biml Package Generator feature in BIDS Helper. To start, install the latest beta of BIDS Helper from CodePlex. Once that is set up, create a new Integration Services project in BIDS. In the Solution Explorer, right-click the project. There’s a new item in this menu: Add New Biml File.

Clicking Add New Biml File adds a new file named BimlScript.biml to the Miscellaneous folder in the solution. (The name is automatically generated, so it may be BimlScript1.biml, etc.) You can right-click the file and choose Rename to give it a more specific name. For this example, rename the file “BasicPackage.biml”.

Double-clicking the file opens the XML editor inside BIDS. The editor supports IntelliSense for Biml, so typing an opening tag (“<”) gives you a list of valid tags. (If you aren’t seeing the IntelliSense, please check this link for troubleshooting steps.)

For this example, copy and paste the following Biml into the document. Since the code below includes the document root tags (<Biml>), you’ll want to make sure you replace the entire contents of the Biml file.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <Connection Name="AdventureWorks" ConnectionString="Server=.;Initial Catalog=AdventureWorks;Integrated Security=SSPI;Provider=SQLNCLI10"/>
    </Connections>
    <Packages>
        <Package Name="Biml Sample" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
            <Tasks>
                <Dataflow Name="Extract Table List">
                    <Transformations>
                        <OleDbSource Name="Get Table List" ConnectionName="AdventureWorks">
                            <DirectInput>SELECT * FROM sys.tables</DirectInput>
                        </OleDbSource>
                        <Multicast Name="Multicast"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>

The first section (<Connections>) of this Biml defines an OLE DB connection that points to the AdventureWorks database. The next section (inside the <Packages> tag) defines a single package that contains a Dataflow task (the <Dataflow> tag). The Dataflow task contains two components: an OLE DB Source and a Multicast transformation.

The next step is to take this definition of a package, and actually generate the package from it. To do this, right-click on the Biml file, and choose Expand Biml File from the context menu.

A new package will be added to the SSIS Packages folder, named Biml Sample.dtsx. If you review the generated package, you’ll see that it matches up to what was defined in the Biml code.

That’s a quick introduction to the Biml functionality in BIDS Helper. In the next article, we’ll set the stage for some more advanced (read: more interesting) uses of Biml, including some scripting.

Insert Parent and Child Tables with SSIS Part 2

Yesterday I posted a simple approach for inserting data into tables with a parent-child relationship (like Order Header to Order Detail) where you need to know the generated identity key to tie the records together. That approach relied on having a business value that could be used to consistently tie the parent and child data together. I ended that post with a question – what happens if you don’t have that value?

One example of this might be an order feed that comes from an external company. Each row contains both the order header and the order detail information. Each of these rows should be broken up and inserted into two tables, one for the header data, the other for the detail data. Assuming that there is no business value that can tie these together, you need to get the identity key for the parent row as soon as it is inserted.

One approach for this involves retrieving a result set using an Execute SQL task, and shredding it with the For Each Loop. This approach does have some downsides – please review the Pros and Cons section below for more details. Within the For Each Loop, two Execute SQL tasks are used – one to insert the header row and retrieve its ID value, the other to insert the child row.

Retrieving the result set in the initial Execute SQL task is straightforward – just retrieve all the order rows. You can use a For Each Loop to shred the retrieved result set by selecting the Foreach ADO Enumerator, and mapping the columns in the result set to variables. The For Each Loop will run once per row in the result set, setting the variable values to match the current row.
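
As a rough sketch, the query in that first Execute SQL task might look like the following. The table and column names (dbo.order_feed, product_id, quantity) are assumptions for illustration; they are not taken from the sample package.

```sql
-- Hypothetical feed table: one row per order line, carrying both
-- header (customer_id) and detail (product_id, quantity) values.
SELECT customer_id,
       product_id,
       quantity
FROM dbo.order_feed;
```

For the For Each Loop to consume this, the task's ResultSet property would be set to Full result set, with result name 0 mapped to a variable of type Object.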

The next Execute SQL task is a little more interesting. It inserts the header row and uses the T-SQL OUTPUT clause to return the identity value for that row.

INSERT INTO parent (order_date, customer_id)
        OUTPUT INSERTED.order_id
        VALUES (GETDATE(), ?)

If you aren’t using a database engine that supports OUTPUT or similar functionality, you can also execute a second SQL statement in the same task to get the identity value using @@IDENTITY (be aware that it returns the last identity value generated in the session in any scope, so a trigger that inserts into another table can change the result), SCOPE_IDENTITY(), or the appropriate function for your database.
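
Here is a minimal sketch of the SCOPE_IDENTITY() variant for SQL Server. It assumes that order_id is an int identity column and that the task's ResultSet property is set to Single row. The SET NOCOUNT ON line is a common precaution rather than something the sample package is known to use.

```sql
-- Suppress the "rows affected" message from the INSERT so that the
-- SELECT below is the only result the task sees.
SET NOCOUNT ON;

INSERT INTO parent (order_date, customer_id)
VALUES (GETDATE(), ?);

-- SCOPE_IDENTITY() returns numeric(38,0); cast it to match the
-- assumed int type of order_id and of the SSIS variable.
SELECT CAST(SCOPE_IDENTITY() AS int) AS order_id;
```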

The Execute SQL task maps the returned identity value to another package variable (order_id in this example).

The next Execute SQL task can then use that variable when inserting the child record into the order details table.
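
The child insert might look something like this. The table name child and the columns product_id and quantity are hypothetical; only order_id comes from the statement shown earlier.

```sql
-- First parameter: the order_id variable set by the header insert.
-- Remaining parameters: detail values for the current row of the loop.
INSERT INTO child (order_id, product_id, quantity)
VALUES (?, ?, ?);
```

With an OLE DB connection, the parameters are mapped by ordinal, so the parameter names on the Parameter Mapping page would be 0, 1, and 2.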

Pros and Cons

This approach has the advantage of guaranteeing that your child records are matched to the correct parent records. This can be a big plus if you don’t have a reliable way to match the records when making two passes.

There are a couple of downsides, though. The biggest one is speed. This approach forces you into RBAR (row by agonizing row) mode, so it’s going to be slow, especially compared to a batch load operation. The second problem is that if you are dealing with a lot of columns, creating all the variables and maintaining the mappings isn’t a fun experience.

Overall, I prefer the approach I mentioned in the first article, and I tend to avoid this one if possible. In some cases, even if you don’t have a reliable key, you may be able to work around it by staging some data to a working table or modifying the data source to provide a reliable key.

The sample package for this post is on my SkyDrive.

Inserting Parent and Child Tables with SSIS

A relatively frequent occurrence in data integration is the need to insert data into a parent table (like Order Header) and insert related records into a child table (Order Details). If the data is already populated with appropriate keys, and you are just copying it, this isn’t too complex – just copy the parent table first, then the child. What if the new tables use identity keys, though? You need to get the new identity key for each header row before you can insert the child row. This post is going to walk through one pattern for doing this, and I’ll show an alternate approach in my next post.

The first approach assumes that you have some common piece of information to link the data. In this case, often the simplest approach is to use two Data Flow tasks, run in sequence. This is my preferred solution to this problem, because it’s fast and it’s usually straightforward to implement.

The first Data Flow loads the parent records, and the second loads the child records. Note that the second Data Flow can’t run until the first succeeds.

The first Data Flow is pretty straightforward, and simply retrieves the parent (order header) data and inserts it into the parent table. The most important item here is that the source component retrieves the right data – that is, one row per order header, and that it includes some information that can be used to uniquely identify the order. In the sample package I’ve linked to below, you’ll see that the source of the order records is a single table, where a given row includes both header information and the detail. The source query for the data flow selects and groups on customer ID, as that uniquely identifies the order in this scenario (one order per customer, per day).
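
A source query along these lines would produce one header row per customer. This is a sketch, not the query from the sample package: dbo.order_feed is a hypothetical name for the source table, and the date type assumes SQL Server 2008 or later.

```sql
-- One row per customer, which is one order header in this scenario.
-- The order date is simply the load date.
SELECT customer_id,
       CAST(GETDATE() AS date) AS order_date
FROM dbo.order_feed
GROUP BY customer_id;
```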

The second Data Flow retrieves the order detail from the same source table. It then uses a Lookup transform to retrieve the correct order ID (the identity key) from the parent table. The Lookup just needs enough data to make a unique match – in this case, that’s the current date and the customer ID.
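
One way to set up the Lookup's reference query is to restrict it to today's orders, then join on customer_id alone in the Lookup's column mappings. This is a sketch that assumes the parent table has the columns shown in the part 2 post (order_id, order_date, customer_id); the sample package may do the match differently.

```sql
-- Reference set for the Lookup: only orders loaded today, so that
-- customer_id by itself identifies a single parent row.
SELECT order_id,
       customer_id
FROM parent
WHERE order_date >= CAST(GETDATE() AS date)
  AND order_date <  DATEADD(day, 1, CAST(GETDATE() AS date));
```

Filtering in the reference query also keeps the Lookup cache small, since rows from earlier days are never loaded.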

That’s really all there is to the simple pattern. You can find a sample package that illustrates this on my SkyDrive. But there can be more complex scenarios where you still need to handle a Parent / Child insert. For example, what if there is no reliable key to tie the order detail rows to the order header? In this case you can’t use the lookup. Stay tuned for the next post, where I’ll discuss a different pattern that can handle this scenario, but involves some tradeoffs.