Archive for June 2009

Quick Tip for Specifying Columns in Source Components

It’s generally accepted* that when you are using an OLE DB Source or an ADO.NET Source, you should specify all the columns, rather than selecting the table from the dropdown, or using a SELECT * SQL statement.

If you’re like me, then you probably don’t have the columns in all your tables memorized, and you don’t like typing, so getting a complete list of columns is a bit painful. You either have to use the query designer to create the whole query, or go to SSMS and create the SELECT statement from there. However, there is a simpler way. Just type “SELECT * FROM [your table here]” into the source component’s SQL Command box, and then click the Build Query button. The query designer will automatically expand the * into the specific column names. Just click OK as soon as the designer opens, and you end up with a fully populated column list.

 

*For some of the reasons why this is generally accepted from an SSIS perspective, see these posts from Jamie Thomson: http://blogs.conchango.com/jamiethomson/archive/2006/02/21/SSIS_3A00_-SELECT-_2A002E002E002E00_-or-select-from-a-dropdown-in-an-OLE-DB-Source-component_3F00_.aspx and https://blogs.conchango.com/jamiethomson/archive/2006/01/05/2554.aspx (point #4). If you want reasons from a general SQL standpoint, search for “never use ”SELECT *” in SQL” in your favorite search engine.

Presenting at Midlands SQL PASS Chapter

I’ll be presenting at the Columbia, SC PASS user group on July 7. I’ll be speaking on “Getting Started with Analysis Services 2008”, which is a demo heavy introduction to using SQL Server Analysis Services 2008. If you want to use SSAS, but aren’t sure quite where to start, this session should help.


If you are in the Columbia area on July 7, please drop by. You can find more information here:

http://midlands.sqlpass.org/

Warning about Using String Variables in Derived Column Expressions

I ran across an interesting behavior in SSIS this week, and thought it was worth sharing. Occasionally, I’ll have the need to use the value of a variable (set in the control flow) in the data flow. The typical way to accomplish this is to use a Derived Column transformation to introduce the variable’s value into the data flow. However, you need to be cautious if you are doing this with a String variable.

When you are working with Derived Column expressions, the output type of the Derived Column expression is based on the input values. For example, inputting an expression like 1+2 will result in an output column data type of DT_I4 (an integer). Basically, the editor guesses the data type based on the input values. Generally, this is OK – if you are referencing other data flow columns, or variables that aren’t of type string, the lengths of the values are static. However, when the editor calculates the length of string variables, it uses the current value in the variable. So, if your variable has 3 characters, the output data type only expects three characters. If the value in the variable is static and never changes, this is fine. But if the value is updated regularly (perhaps it’s a variable in a template package) or the value is set through an expression (so that the value changes at runtime), this can cause truncation errors in the package.

I created a sample package that demonstrates this. The package is pretty simple: a single Data Flow task, with an OLE DB Source, a Derived Column transformation, and a Union All (just there to terminate the data flow).

image

There is a single variable named TestVar that contains the value “1234”.

image

In the Derived Column transformation, there is a single new column added. Note that the calculated length for the output column is 4, matching the value “1234” in the variable.

image

If the package is run right now, everything works fine. But that’s not the case if you change the value in the variable to “123456”.

image

Running the package after changing the value results in the following error:

[Derived Column [91]] Error: The "component "Derived Column" (91)" failed because truncation occurred, and the truncation row disposition on "output column "ColumnFromStringVar" (289)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.

This occurs because the metadata in the Derived Column transformation isn’t updated when the variable’s value is changed. So, to avoid seeing this error occur in your packages, you need to explicitly set the output column’s length.

In 2005, you could change the calculated data type by editing the data type, length, precision, and scale fields. In 2008, however, the values are locked. You can change the data type by going into the advanced editor for the Derived Column. However, it’s easier to simply cast the string value, to force the Derived Column editor to treat it as if it has constant length.

image

By using this approach, as long as your string variable’s value is less than 50 characters, the Derived Column will continue to work. It’s best to set the length of the cast to the same value as the destination column’s length.

There’s a Connect submission on improving this behavior, either by updating the pipeline’s metadata as the string variable’s value changes, or by throwing a validation warning or error if the current value of the variable exceeds the output length in the Derived Column transformation. If you agree that this could use some improvement, you can vote here: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=470995

In the meantime, I’d highly recommend performing an explicit cast in any Derived Column expression that uses a string variable, particularly if the value is subject to change.

The sample package is on my SkyDrive, if you’d like to see the error in action.