Archive for May 2010

Moving SSIS Packages with ADO.NET Destinations Between 2008 R2 and 2008

As noted by Matt Masson from the Integration Services team, not much has changed in SSIS 2008 R2. In fact, R2 is pretty much identical, with the exception of the ADO.NET Destination. So if you are developing packages, you can build them using the 2008 or 2008 R2 version of BIDS, and they can be used in either environment successfully, as long as the package doesn’t have an ADO.NET Destination. If it does, a package developed in the 2008 R2 version of BIDS will give errors when you open on a machine with the 2008 version of SSIS installed. This is because the 2008 version of the ADO.NET Destination doesn’t know what to do with the new property added to the R2 version.

This wouldn’t be a big deal, if you could have side-by-side installs of BIDS 2008 and BIDS 2008 R2, but the install for R2 replaces the 2008 version of BIDS. So, if you need to move packages developed in 2008 R2 to 2008 (a common scenario for me), you have to do a little extra work. There are a few values that need to be removed or changed by directly editing the package XML.

As always, it’s a good idea to make a backup of your package before editing the XML directly.

Open the package in your favorite text or XML editor, and look for the section that contains <components>. Underneath that, you need to locate the <component> tag that relates to your ADO NET Destination (the sample below is easy, since the name is “ADO NET Destination”, but that’s not the typical case (you do give your components meaningful names, right?).

<component id="16" name="ADO NET Destination" componentClassID="{2E42D45B-F83C-400F-8D77-61DDE6A7DF29}" description="Writes to a database using ADO.NET provider." localeId="-1" usesDispositions="true" validateExternalMetadata="True" version="1" pipelineVersion="0" contactInfo="Writes to a database using ADO.NET provider.;Microsoft Corporation; Microsoft SqlServer v10; © 2007 Microsoft Corporation; All Rights Reserved; http://www.microsoft.com/sql/support;1">
  <properties>
    <property id="23" name="TableOrViewName" dataType="System.String" state="default" isArray="false" description="The Destination database table name." typeConverter="" UITypeEditor="Microsoft.DataTransformationServices.Controls.ModalMultilineStringEditor, Microsoft.DataTransformationServices.Controls, Version= 10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91" containsID="false" expressionType="Notify">"sample"</property>
    <property id="24" name="BatchSize" dataType="System.Int32" state="default" isArray="false" description="Specify the number of rows per batch. The default is 0 which uses the same size as SSIS internal buffer. " typeConverter="" UITypeEditor="" containsID="false" expressionType="Notify">0</property>
    <property id="25" name="CommandTimeout" dataType="System.Int32" state="default" isArray="false" description="The number of seconds before a command times out. A value of 0 indicates infinite time-out. " typeConverter="" UITypeEditor="" containsID="false" expressionType="Notify">30</property>
    <property id="26" name="UseBulkInsertWhenPossible" dataType="System.Boolean" state="default" isArray="false" description="Specifies whether to use the SqlBulkCopy interface to improve the performance of bulk insert operations. Only certain providers support this interface." typeConverter="" UITypeEditor="" containsID="false" expressionType="Notify">true</property>
    <property id="27" name="UserComponentTypeName" dataType="System.String" state="default" isArray="false" description="" typeConverter="" UITypeEditor="" containsID="false" expressionType="None">Microsoft.SqlServer.Dts.Pipeline.ADONETDestination, Microsoft.SqlServer.ADONETDest, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91</property>
  </properties>

Once you’ve located the right <component> tag, you need to make two changes:

One, change the version=”1” attribute in the <component> tag to version=”0”.

<component id="16"
           name="ADO NET Destination"
           componentClassID="{2E42D45B-F83C-400F-8D77-61DDE6A7DF29}"
           description="Writes to a database using ADO.NET provider."
           localeId="-1"
           usesDispositions="true"
           validateExternalMetadata="True"
           version="0"
           pipelineVersion="0"
           contactInfo="Writes to a database using ADO.NET provider.;Microsoft Corporation; Microsoft SqlServer v10; © 2007 Microsoft Corporation; All Rights Reserved; http://www.microsoft.com/sql/support;1">

Second, remove the entire <property name=”UseBulkInsertWhenPossible> element. You can comment it out, as shown below, or just delete it.

    <property id="25" name="CommandTimeout" dataType="System.Int32" state="default" isArray="false" description="The number of seconds before a command times out. A value of 0 indicates infinite time-out. " typeConverter="" UITypeEditor="" containsID="false" expressionType="Notify">30</property>
    <!--<property id="26" name="UseBulkInsertWhenPossible" dataType="System.Boolean" state="default" isArray="false" description="Specifies whether to use the SqlBulkCopy interface to improve the performance of bulk insert operations. Only certain providers support this interface." typeConverter="" UITypeEditor="" containsID="false" expressionType="Notify">true</property>-->
    <property id="27" name="UserComponentTypeName" dataType="System.String" state="default" isArray="false" description="" typeConverter="" UITypeEditor="" containsID="false" expressionType="None">Microsoft.SqlServer.Dts.Pipeline.ADONETDestination, Microsoft.SqlServer.ADONETDest, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91</property>

If you have a lot of packages to move back and forth, you probably don’t want to hand edit this each time. I wrote a little application that will process a folder of .dtsx files, and strip the 2008 R2 information out. This is done using an XSLT transform (available in the source code). The application and source are available to download from my SkyDrive.

Speaking at the Carolina Code Camp

I’ll be presenting a couple of sessions at the Carolina Code Camp on Saturday, May 15th. This will be my second time presenting at the Carolina Code Camp, and I’m looking forward to it. Last time, I had great audience participation, and the event was well organized and fun.

I’ll be presenting on “Processing Flat Files with SSIS” at 8:45 AM and “Unit Testing SSIS Packages” at 12:30 PM. Looking forward to both – it should be fun.

If you are in the area, feel free to drop in – there is no charge for the Code Camp.

T-SQL Tuesday #006 – LOBs in the SSIS Dataflow

The topic for T-SQL Tuesday #006 is LOB data, selected by this month’s host, Michael Coles. If you aren’t familiar with T-SQL Tuesdays, Michael has a nice summary in his post.

In database terms, LOBs are Large OBjects, also referred to as BLOBS (Binary Large OBjects). These are data types that can exceed the maximum row size of 8 KB in SQL Server. Most often, they are used for storing large amounts of text or binary data in the database. An example of this would be an application that stores documents in a database table. SSIS has some special features for handling LOBs in the dataflow, and this post will provide an overview of them and provide a few tips on using them wisely, as well as some approaches to keep them from killing your data flow performance.

Data Types for LOBs

LOBs are represented by three types in SSIS: DT_TEXT, DT_NTEXT, and DT_IMAGE. These map to the SQL Server types VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX). Each of these types can hold 2,147,483,647 bytes, or almost 2GB of data. Columns with these data types have some special rules.

First, if you want to use them in a Derived Column expression, you’ll probably need to cast it to another type, like DT_STR or DT_WSTR, as the LOB types aren’t supported by most of the built-in functions. One thing to be careful of with this is that the entire contents of the LOB may not fit in the data type you are casting to, which will cause a truncation error. You can either set the Derived Column transformation to ignore truncation errors, or redirect error rows so you can handle the longer values differently.

If you are accessing LOBs in a Script component, you have to use the AddBlobData(), GetBlobData(), and ResetBlobData() methods on the column to get to the data.

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    System.Text.UnicodeEncoding encoding= new System.Text.UnicodeEncoding();

    Row.LOBNTEXT.AddBlobData(encoding.GetBytes("My Test String"));
    byte[] bytes = Row.LOBNTEXT.GetBlobData(0, (int)Row.LOBNTEXT.Length);
    string myString = encoding.GetString(bytes);
}

Note that you should use the System.Text.AsciiEncoding class in the above code if you are working with non-Unicode data.

Transforms for LOB Data

There are two transformations in SSIS specifically designed for working with LOB data: Import Column and Export Column. The Import Column transformation is used to import the contents of a file from disk into a column in the data flow, as a LOB. The Export Column is used to do perform the reverse operation – that is, it saves the contents of a LOB column to a file on disk. Since you can process one file per row, these transformations can be really useful for moving multiple files in and out of a database.

There’s a few good walkthroughs on using these transformations, so rather than reposting the same information, here’s a few links:

The Import Column Transformation, by James Beresford (aka BI Monkey);

The Export Column Transformation, by James Beresford (aka BI Monkey)

Importing Files Using SSIS, by me

Performance

Because these data types have the potential to hold so much data, SSIS handles them a little differently than the standard data types. They are allocated separately from regular data in the buffers. When there is memory pressure, SSIS spools buffers to disk. The potential size of LOB data makes it very likely to be spooled, which can be a pretty major performance bottleneck. To minimize it, you want to make sure that the LOB data is being spooled to a nice, fast disk that doesn’t have a lot of contention. To control what disk they are spooled to, you can set the BLOBTempStoragePath property on the Data Flow task. Note that this is a different setting than BufferTempStoragePath, which controls where regular data is spooled. Ideally, you want to make sure each of these settings points to a different physical disk.

The other item to consider is whether you actually need LOB data. In some scenarios, the LOB data types are used for convenience or because there is a possibility that a small percentage of the data would exceed the maximum length of a regular data type. If you are processing a lot of data with LOB types, and most of the LOB data is small enough to fit in an regular data type, then you may be better off using a two pass approach. In one pass, select only the data with LOB values small enough to fit in a standard data type, and cast it in the SELECT statement. SSIS will process this set of data using the normal buffer approach, which should be faster. Then, do a second pass where you select only the data that is too large for a regular data type. This will minimize the amount of data that has to be processed using LOB data types. To figure out the length of a LOB column in SQL Server, you can use the DATALENGTH function.

SELECT DATALENGTH(MyLOBColumn)
FROM MyLOBTable

As usual with performance, your mileage may vary on this one. It’s worked well for me in a few scenarios, but there’s some overhead involved in using two passes. So, as with any performance tips, test it in your environment, with a solid representative sample of your data, before implementing it in production.

There it is – a whirlwind tour of LOBs in SSIS. If you have any questions or comments, please post them.