Archive for June 2008

Continuing a Loop After an Error

A common question on the SSIS forums is how to continue a loop after an error occurs. For example, you might have a For Each Loop container that iterates through a set of files in a folder. If one file is invalid, you want to continue processing the remaining files. Fortunately, there is a simple pattern that allows this to be accomplished.

I created a package to illustrate this. Basically, the package implements the scenario above. It consists of a For Each Loop to iterate through all the files in a folder. There is a data flow task that processes each input file, and runs it through a row count (just for illustration purposes). The Script task is there to illustrate an error handling task could be attached to the data flow, but it’s empty in the example package. It would need to be replaced with the appropriate task to handle a failure, such as a File System task to move the file to another folder.

image

The key to handling the errors is the Sequence container, or more specifically, how the Sequence container is set up. The OnError event handler for the Sequence container has been enabled. You can do this by selecting the Sequence container, then clicking the Event Handlers tab. Select the OnError event in the drop down, then click the link in the designer to create an empty event handler. It’s not necessary to add anything to event handler.

image

Once it has been created, go to the Variables window, and select the option to show System variables. Select the Propagate variable, and set the value to FALSE. This prevents any errors occurring on tasks inside the Sequence container from being raised to the For Each loop.

image

This is all that really needs to be done, but if you run the package with only this set, you’ll notice that the Data Flow task and the Sequence container both report an error. To prevent the Sequence Container from reporting an error, set the MaximumErrorCount property of the container to 0.

image

Note that you do not have to use a sequence container to accomplish this. You could make the same settings on individual tasks inside the For Each loop. However, the Sequence container gives you a common place to make the settings, and you can have as many tasks inside it as you’d like.

The example package is posted on my Skydrive. The zip file contains a folder call Import. The package expects this folder to be located in C:\Temp. It contains 4 text files for the package to process. The 2nd file contains an error (one of the rows has fewer columns than it is supposed to), so that you can see the error behavior.

Configuration Changes in SSIS 2008

In a previous post, I detailed some of the challenges that I had encountered when trying to modify the connection string used for SQL Server configurations from the command line. In SSIS 2005, command line options were applied after the configurations were loaded (with the exception of parent package variables). Effectively, this meant that you could not change the location a configuration pointed to from the command line.

This has been changed in SSIS 2008. Configurations are now loaded twice, once when the package is originally loaded, and then again after any command line options are applied. The order of events is:

  1. The package file is loaded.
  2. The configurations specified in the package at design time are applied in the order specified in the package (except for Parent Package Variables).
  3. Command line values are applied.
  4. The configurations specified in the package at design time are reloaded in the order specified in the package (except for Parent Package Variables). The configurations will use any updated values from the command line when they are reloaded, so it is possible to change the connection string for a SQL Server configuration.
  5. Parent Package Variable Configurations are applied.
  6. The package is run.

This is a nice improvement, since you can now set up a package with SQL Server configurations, and point the package to the correct database are runtime by simply using the /CONN switch of DTEXEC. However, it still isn’t perfect . Since the design time configuration is applied twice, a value that you apply using /SET on the command line can be overridden by a preexisting configuration.

If you’d like to see this behavior, I’ve provided a simple package. The package has a single variable (“TestVar”) defined. It has an XML configuration enabled that sets the value of TestVar to “Dev”. There is a single Script Task that fires an information event that contains the value of the variable.

public void Main()
        {
            bool fireAgain = false;
            Variables vars = null;
            Dts.VariableDispenser.LockOneForRead("TestVar", ref vars);
            Dts.Events.FireInformation(0, "", "Value is: " + vars["TestVar"].Value, "", 0, ref fireAgain);
            vars.Unlock();
            Dts.TaskResult = (int)ScriptResults.Success;
        }

If I run this package using this command:

dtexec /file Configuration_Overwrite.dtsx /SET \Package.Variables[User::TestVar].Properties[Value];Prod /REP EWI

I’d expect to see the value “Prod” reported in the output. Instead, “Dev” is displayed. If you disable the package configurations and re-run it using the same command, it does report “Prod”.

Overall, I prefer the new behavior, but it does introduce a new issue to be aware of. In general, you need to be careful to not /SET a value that is also specified in a configuration, as it will be overwritten. I’ve filed a Connect suggestion for the addition of /SetBefore and /SetAfter switches to DTEXEC, so that you can explicitly define when you want the /SET applied. If you think it’s a good idea, please vote for it.

The sample package is on my SkyDrive. You may need to update the location of the XML configuration, but it has no other dependencies.

Thanks to Cho Yeung on the SSIS team for clarifying the load order when the package is executed.

Mariner Wins Microsoft Performance Management Partner of the Year

This is really cool – Mariner (my employer) was awarded the Microsoft Performance Management Partner of the Year award this year, based on a solution that we created for the Charlotte-Mecklenburg Schools (CMS) system. When I was first told about the award, I asked, “It’s for the Southeast region, right?”. The answer: “No, think bigger.” “The US? Very cool!” “No, bigger than that.” Yes, it’s the Worldwide Performance Management Partner of the Year – selected from over 2000 entries.

So what is this award winning solution? It’s a performance management system that allows school systems to track how well they are progressing on their long term goals for improving the learning environments in schools. This was an interesting solution to work on – most of my experience has been in applying business intelligence to more traditional business solutions. While there are some similarities, the core focus is very different. Success for a school system isn’t judged on profits, it’s based on how well the children are educated. Since I have one child already in the CMS system, and another one going in soon, there was much more of a personal impact from seeing what they were doing, and their approach to meeting their goals.

It was also interesting from a technology standpoint. The technology for the solution involves most of Microsoft’s BI stack – SQL Server for the data repository, SSIS for the data integration, SSAS for analysis, SSRS for some reporting, and Sharepoint + PerformancePoint Server for the front-end display of the analytic information. It’s pretty impressive seeing how Microsoft’s BI story has really come together over the last couple of years, and actually getting the chance to put all these pieces in action with each other was great (and challenging in a few spots :) ). The team that did the development work did a great job with it, and it really shows in the end product.

Updated Information on 64-bit Considerations for SSIS

Douglas Laudenschlager has posted some good information about the considerations you need to make if you are running SSIS on a 64-bit box. This is additional information to what is currently in Books Online, and it’s well worth looking at. Considering that a lot of developers are developing on x86 machines, and deploying to x64 servers, it’s even more valuable. I currently do all my development on a x64 machine, but since the development environment (BIDS) is 32-bit only, some of these items still have impact.

64-bit Considerations for SQL Server Integration Services – Douglas Laudenschlager – Dougbert.com

Calculating a Running Total Part 2

To finish up the topic that I started in my previous post, I wanted to post a more complex example of running total calculations. This is based off a request on the MSDN forums, for a running total calculation that would “slide” along as additional rows were processed. In the poster’s example, the running total needed to reflect the total for the last 10 weeks.

To show how to accomplish this, I’m going to update the same package that was used in the previous example.

image

The script component has the same RunningTotal column defined on the output:

image

The primary difference is in the script itself. Since I want to keep a list of the previous 10 records, I am using a Queue object, which is part of the System.Collections.Generic namespace. This object lets you easily remove the oldest item from the queue, so it works very well for this example. In the script, if the queue has more than 10 items, the oldest item is removed prior to adding a new one. The running total is produced by iterating through all the items in the collection, and storing the result in the RunningTotal output column.

Public Class ScriptMain
    Inherits UserComponent
    Dim runningTotalCollection As Queue(Of Integer) = New Queue(Of Integer)
 
 
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        If runningTotalCollection.Count >= 10 Then
            runningTotalCollection.Dequeue()
        End If
 
        runningTotalCollection.Enqueue(Row.OrderQty)
 
        Dim x As Integer
        Dim sumX As Integer
        For Each x In runningTotalCollection
            sumX += x
        Next
 
        Row.RunningTotal = sumX
    End Sub
 
End Class

This is another example of the flexibility of the Script component. It also helps highlight that by using the Script component, you can store the values from previous rows, and then reference them later. The example package is posted on my Skydrive.

Calculating a Running Total In SSIS

Running totals are a common calculation for reporting, and are occasionally needed in ETL processes to populate aggregate tables. One way to accomplish this in SSIS, if you are retrieving data from a relational database, is to do the running total in the SELECT statement. There are a number of resources on the Internet that illustrate how to accomplish this. However, this may not work for you if you are retrieving information from a flat file. Also, the logic to implement a running total may not be straightforward in SQL, particularly if you need sliding windows for the totals (I’ll explain this later).

Fortunately, there are other ways to accomplish this in SSIS. A very flexible method is to use a script component, as this will let you customize the logic however you prefer. As an example, I’ll take a look at a couple of examples from AdventureWorks.

First, the more straightforward example. The AdventureWorks database has a Sales.SalesOrderDetail table that lists the itemized purchases for each order. Note – I’m only filtering the results for example purposes. The actual code in the package works with the whole table.

image

If I was to calculate a running total for the OrderQty for each SalesOrderID above, I’d expect to see:

SalesOrderID ProductID OrderQty RunningTotal
43685 765 3 3
43685 763 1 4
43685 754 1 5
43685 725 1 6
43686 758 3 3
43686 762 1 4
43686 770 1 5
43687 768 1 1
43687 765 2 3

Notice that the running total resets each time the SalesOrderID changes. To implement a script component that will calculate the running total, I’ve created a new script component and added an output column to it to hold the running total.

image

Inside the script component, I’ve added a variable to store the value of the control number (SalesOrderID), and another variable to capture the running total. When the control number changes, the running total is reset.

Public Class ScriptMain
    Inherits UserComponent
    Dim runningTotal As Integer
    Dim previousValue As Integer
 
 
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        If Not previousValue = Row.SalesOrderID Then
            runningTotal = 0
            previousValue = Row.SalesOrderID
        End If
 
        runningTotal += Row.OrderQty
 
        Row.RunningTotal = runningTotal
    End Sub
 
End Class

This is an easy method to calculate running totals in the data flow. One thing to note is that the data must be sorted by the control number (SalesOrderID in the example) in order for this to work properly.

This is in an example package on my SkyDrive here. I will follow up this post with another sample showing a more complex "running total" problem and how it can be solved in SSIS as well.

SSIS Configurations At Design Time

There’s an interesting “behavior” that occurs with configurations at design time. The value of the configuration will be saved in your package as the new stored value. I’d seen this before, but never really thought about it, until a poster on the forums questioned why it was happening. To understand it, you need to know how the package is processed when it is loaded in the design environment.


If you have configurations enabled in the package, then the configuration is applied to the package when it is initially loaded. When a configuration is applied, it updates the target property of the configuration. For example, you might have a package that has the Description property set to “No Configuration”. If you add a configuration that sets the Description property of your package to “Has Configuration”, when the package is loaded, the value specified in the configuration (“Has Configuration”) will overwrite the value currently stored in the Description property (“No Configuration”).


If you then close the package without saving, the change to the property is not saved, and it will reflect the original value (“No Configuration”). If, however, you save the package (and remember, running it in the IDE saves it automatically), the new value (“Has Configuration”) will be saved as the value in the package file.


Normally, this does not cause any issues. If you execute a package in production that has a configuration, it is not saved after execution, so the values of the configurations is not save in the package file. However, it can impact you in the development environment. This is one of the reasons that I recommend setting up configurations even in your development environment. If you do that, then you know that regardless of the value the package was saved with, it will have the value of your local configuration each time it is loaded in the IDE. This is also a good reason never to update a package directly on your production server. When you open it in the IDE on the production server, it will retrieve the production configurations. If you save or run it, it will be saved with the production configuration values. This can cause a number of headaches in troubleshooting problems down the road, so I recommend that you don’t do it.