Posts tagged ‘Script’

Presentations at Atlanta’s BI Edition SQL Saturday

I presented this weekend at SQL Saturday #477 in Atlanta. It was a great event, very well organized. I appreciate all the attendees at my sessions – there were some great questions and comments. I promised that I’d publish my slides and sample code, so here it is.

Getting Started with SSIS Script Tasks and Components

This session was an introduction to the scripting objects in SSIS, and how they can be used to extend the built in functionality. Download the files here.

Testing Data and Data-Centric Applications

This session was on testing data-centric applications, both during development and how you can continue validating your data in production. Download the files here.

Thanks again to eveyone who attended!

Advanced Scripting in SSIS

Last week I presented a session on Advanced Scripting for SSIS for Pragmatic Works. Thanks to everyone who attended, and all the great questions. I’ve had a few requests for the samples I used, so I wanted to make those available. You can download them from my OneDrive.

Naming Columns for the Script Component

Do you use whitespace or special characters in your column names? Most people don’t, because of the additional headaches it creates. You have to delimit the column names, come up with a work-around for tools that don’t support column names with special characters, etc. Underscores, however, are used pretty extensively in place of spaces. If you are using Script Components in SSIS, though, you may encounter an all-new headache with special characters or even underscores in your column names.

When you use a script component in SSIS, it generates some .NET code for you automatically, based on the metadata in the pipeline connected to the script component. However, when this code is generated, SSIS strips out any whitespace or special characters from the names of inputs, outputs, and columns. It only retains the letters and numbers (alphanumeric characters) in these names.

Here’s some examples of column name issues that I’ve run into with scripts (and while these specific items are made up, they represent real-world scenarios I’ve encountered – there’s some really horrible naming approaches out there):

Original Column Name Script Column Name
Account Account
Account# Account
Account Number AccountNumber
Account_Number AccountNumber
TI_TXN_ID TITXNID
TI_TXNID TITXNID

 

As you can see, once the alphanumeric characters have been stripped from these column names, they are no longer unique. That can pose a few problems in your script code. What’s worse, because this code is auto-generated by SSIS, you can’t fix it without changing the column names in the data flow, even though this is really purely a script thing (and not even a .NET limitation – underscores are perfectly valid in .NET naming). What’s even worse than that – you don’t get an error till the binary code is recompiled.

So, if you are working with script components, make sure all your column names are unique even when all non-alphanumeric characters have been stripped from them. The same thing applies to your output names – they must be unique based only on the alphanumeric characters.

Using OLE DB Connections from Script Tasks

I write scripts on a pretty regular basic, and often need to access database connections from them. It’s pretty easy to do this if you are using an ADO.NET connection. However, if you are using OLE DB, you have to go through a couple of additional steps to convert the connection to an ADO.NET version. Matt Masson posted a great code sample for doing this conversion.

I use a slightly altered version of this code pretty regularly. It’s been modified to support both OLE DB and ADO.NET connections, so that I can switch connections without having to change the script code.

To use it, you need to add a reference in your script project to Microsoft.SqlServer.DTSRuntimeWrap. Then, add the following to the usings section at the top of the script:

using System.Data.Common;
using Wrap = Microsoft.SqlServer.Dts.Runtime.Wrapper;

For the code to get the connection, use the following:

ConnectionManager cm = Dts.Connections["MyConnection"];
DbConnection conn = null;
if (cm.CreationName == "OLEDB")
{
	Wrap.IDTSConnectionManagerDatabaseParameters100 cmParams =
	cm.InnerObject as Wrap.IDTSConnectionManagerDatabaseParameters100;
	conn = cmParams.GetConnectionForSchema() as DbConnection;
}
else
{
	conn = cm.AcquireConnection(null) as DbConnection;
}

if (conn.State == ConnectionState.Closed)
{
	conn.Open();
}

// TODO: Add your code here

conn.Close();
Dts.TaskResult = (int)ScriptResults.Success;

You can use the “conn” object to perform actions against the connection. Since it’s using the common DBConnection interface, you can use it against any database connection that you have an ADO.NET provider for (which includes OLE DB providers).

Passing an Object from a Parent Package To a Child

Occasionally, you may run into the need to pass values between packages. In most cases, you can use a Parent Package Variable configuration to pass the value of a variable in the calling (or parent) package to the called (or child) package. However, Parent Package Variable configurations can’t be set up on variables of type Object. Fortunately, you can work around this pretty easily, thanks to the fact that the calling package variables are inherently accessible in the called packages. 

I’ve set up a sample parent and child package to illustrate this. The parent package is straightforward. It uses an Execute SQL task to populate an object variable named TestVar with a recordset object, and then calls the child package.

image image

The child package has a Foreach Loop Container to iterate over the recordset object. It has a Script task that is used to copy the parent package’s variable (TestVar) to a local variable named LocalVar. This is the variable that the Foreach Loop is configured to use. Why copy the value? If you don’t have a local variable to reference in the Foreach Loop, it won’t validate properly.

image image

The script in Copy Variable is pretty simple. It relies on the fact that you can reference parent package variables inherently, as they are included in the collection of variables accessible in the local package. The script just copies the value from one variable to the other, so that the Foreach Loop will have something to do.

public void Main()
{
    Variables vars = null;
    Dts.VariableDispenser.LockForWrite("User::LocalVar");
    Dts.VariableDispenser.LockForRead("User::TestVar");
    Dts.VariableDispenser.GetVariables(ref vars);

    vars["User::LocalVar"].Value = vars["User::TestVar"].Value;
    vars.Unlock();

    Dts.TaskResult = (int)ScriptResults.Success;
}

Please note that for this to work, you cannot have a variable in the child package with the same name as the variable in the parent package. If you do, the local variable will hide the parent variable. Outside of that, this works really well for passing object values between packages. The same technique can also be used in reverse to send values back to the parent package, if you have that need.

The sample has been uploaded to my Skydrive. Let me know if you have any questions.

Code Snippets for SSIS 2008

If you develop in Visual Studio using C# or VB, then you are probably familiar with code snippets. They are little chunks of code that you can insert into your application and save you some typing. For example, if you are writing a C# application, and you type “if” and hit TAB twice, the following will be inserted into your code:

image

You can enter in the value to be evaluated, and then type in the rest of your code. I like code snippets because they save me from typing the same things over and over again. However, in SSIS 2005 scripting, code snippets weren’t available, due to the VSA environment used by the script objects. Now that we have a real scripting environment in SSIS 2008, code snippets can be used, although there are a few things you need to know.

Two of the most common activities I do in SSIS scripts are reading from and writing to variables. So, I put together a few code snippets that do this. Here’s what the snippet for reading a value from a variable looks like:

image

To create a snippet, you define the code snippet in an XML format, and save it to a file with a .snippet extension. I’m not going to post the code for the snippet here, as it is fairly verbose, but you can look at the .snippet file in any text editor. Once you have the file, you need to put it in a specific location for the Visual Studio environment to recognize it. This is where it can get a little confusing. Normally, snippets go into the “C:\Users\<user>\Documents\Visual Studio 2008\Code Snippets\<language>\My Code Snippets” folder. However, this is the location used for the full Visual Studio environment. Since the script editors in SSIS use the Visual Studio Tools for Applications environments, they get their own folder. In fact, there are separate folders for the Script Task and the Script Component. Code snippets for the Script Task go into the “C:\Users\<user>\Documents\Integration Services Script Task\Code Snippets\<language>\My Code Snippets” folder, and snippets for the Script Component go into the “C:\Users\<user>\Documents\Integration Services Script Component\Code Snippets\<language>\My Code Snippets” folder.

This actually works out well, because the code for the snippets is slightly different between the Script Task and Script Component. This is due to the object references being slightly different between the two.

I’ve provided two code snippets for C# for reading from and writing to variables in the Script Task, and the same two in C#, but altered to work in the Script Component. To use them, you can simply copy them to the above folders, and then type “SSISReadOneVariable” or “SSISWriteOneVariable” as a shortcut, followed by two tabs to insert them. You can change the shortcut in the snippet file if you’d like to use something else. It also wouldn’t be difficult to convert them to VB, if that’s your preference. The snippets are located on my SkyDrive. If you have ideas for other useful snippets, post a comment. Enjoy!

Calculating a Running Total Part 2

To finish up the topic that I started in my previous post, I wanted to post a more complex example of running total calculations. This is based off a request on the MSDN forums, for a running total calculation that would “slide” along as additional rows were processed. In the poster’s example, the running total needed to reflect the total for the last 10 weeks.

To show how to accomplish this, I’m going to update the same package that was used in the previous example.

image

The script component has the same RunningTotal column defined on the output:

image

The primary difference is in the script itself. Since I want to keep a list of the previous 10 records, I am using a Queue object, which is part of the System.Collections.Generic namespace. This object lets you easily remove the oldest item from the queue, so it works very well for this example. In the script, if the queue has more than 10 items, the oldest item is removed prior to adding a new one. The running total is produced by iterating through all the items in the collection, and storing the result in the RunningTotal output column.

Public Class ScriptMain
    Inherits UserComponent
    Dim runningTotalCollection As Queue(Of Integer) = New Queue(Of Integer)
 
 
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        If runningTotalCollection.Count >= 10 Then
            runningTotalCollection.Dequeue()
        End If
 
        runningTotalCollection.Enqueue(Row.OrderQty)
 
        Dim x As Integer
        Dim sumX As Integer
        For Each x In runningTotalCollection
            sumX += x
        Next
 
        Row.RunningTotal = sumX
    End Sub
 
End Class

This is another example of the flexibility of the Script component. It also helps highlight that by using the Script component, you can store the values from previous rows, and then reference them later. The example package is posted on my Skydrive.

Calculating a Running Total In SSIS

Running totals are a common calculation for reporting, and are occasionally needed in ETL processes to populate aggregate tables. One way to accomplish this in SSIS, if you are retrieving data from a relational database, is to do the running total in the SELECT statement. There are a number of resources on the Internet that illustrate how to accomplish this. However, this may not work for you if you are retrieving information from a flat file. Also, the logic to implement a running total may not be straightforward in SQL, particularly if you need sliding windows for the totals (I’ll explain this later).

Fortunately, there are other ways to accomplish this in SSIS. A very flexible method is to use a script component, as this will let you customize the logic however you prefer. As an example, I’ll take a look at a couple of examples from AdventureWorks.

First, the more straightforward example. The AdventureWorks database has a Sales.SalesOrderDetail table that lists the itemized purchases for each order. Note – I’m only filtering the results for example purposes. The actual code in the package works with the whole table.

image

If I was to calculate a running total for the OrderQty for each SalesOrderID above, I’d expect to see:

SalesOrderID ProductID OrderQty RunningTotal
43685 765 3 3
43685 763 1 4
43685 754 1 5
43685 725 1 6
43686 758 3 3
43686 762 1 4
43686 770 1 5
43687 768 1 1
43687 765 2 3

Notice that the running total resets each time the SalesOrderID changes. To implement a script component that will calculate the running total, I’ve created a new script component and added an output column to it to hold the running total.

image

Inside the script component, I’ve added a variable to store the value of the control number (SalesOrderID), and another variable to capture the running total. When the control number changes, the running total is reset.

Public Class ScriptMain
    Inherits UserComponent
    Dim runningTotal As Integer
    Dim previousValue As Integer
 
 
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        If Not previousValue = Row.SalesOrderID Then
            runningTotal = 0
            previousValue = Row.SalesOrderID
        End If
 
        runningTotal += Row.OrderQty
 
        Row.RunningTotal = runningTotal
    End Sub
 
End Class

This is an easy method to calculate running totals in the data flow. One thing to note is that the data must be sorted by the control number (SalesOrderID in the example) in order for this to work properly.

This is in an example package on my SkyDrive here. I will follow up this post with another sample showing a more complex "running total" problem and how it can be solved in SSIS as well.

See if a File is Locked

A question that comes up occasionally on the forums is how to determine if a file is being written to by another process. It can cause issues if a file is partially written when a source component starts extracting data from it.

There is a way to check this through a Script Task. The script below attempts to lock the file for writing. If the file is be written to be another process, the attempt to lock it will fail. If the file is available, the attempt will succeed.

Public Sub Main()
    Dim vars As Variables
    Dim file As System.IO.FileStream
 
    Try
        file = System.IO.File.Open(Dts.Connections("FlatFile").ConnectionString, System.IO.FileMode.Open, System.IO.FileAccess.Write)
    Catch ex As Exception
        Dts.VariableDispenser.LockOneForRead("FileLocked", vars)
        vars("FileLocked").Value = True
        vars.Unlock()
    Finally
        If Not file Is Nothing Then
            file.Close()
        End If
    End Try
 
    Dts.TaskResult = Dts.Results.Success
End Sub

The package needs to have a connection manager that points to the file you want to check (named "FlatFile" in the script above), and a variable ("FileLocked") that is set to True if the file is locked. The variable should be set to False by default. The variable can be used in a precedence constraint to control whether other tasks execute.

Renaming the Script Project in SSIS 2008

I found something interesting in SSIS 2008 today, and thought I’d share it. I was working with David Darden on a package that included a script task. In 2008, the Visual Studio Tools for Applications (VSTA) environment is used to edit script tasks. It exposes a few more details about the script task, including the project name. By default, it creates a rather long project name, as you can see in the screen shot below. (Yes, that is C#. It’s my preferred language, so expect to see more of it here now that it’s available in 2008). The namespace is based on the project name, so you get the same long value for it.

image

While poking around in the script task’s expressions editor, we found the ScriptProjectName property. This property does not show up in the Properties window (F4), only in the expressions editor. By setting this, you can update the project name in the script. For example, setting this property to "NewScriptProjectName", as shown here:

image

results in this showing up in the script project:

image 

There are a few caveats to this. One, if you do this after initially editing the script, the project name will be updated, but the namespace will remain the same. Two, you must save the package after setting the expression, in order for the expression to be applied. If you add the expression, then edit the script without saving, it will not use the new project name.

So, if you want to name your script project with a friendly name, do the following:

  1. Add the Script Task.
  2. Add an expression to set the ScriptProjectName property to your desired value.
  3. Save the package.
  4. Now you can use the Edit Script button to edit the script project with the friendly name.

I do want to point out that the name of the project has no real impact on functionality. The script will not run any differently if you do this. It’s purely for aesthetic purposes. But if the long, random looking project names bother you, it’s nice to know you can fix them. There is also a MSDN forum post here that details the steps to edit the namespace after the script has been edited for the first time. It’s a little easier to set it this way, though, so that it is handled up front.