Archive for October 2007

Scanning a Package For Specific Tasks

I recently had a need to scan through an SSIS package to find all instances of data flow tasks within the package. This is to support a future enhancement in BIDSHelper. However, I ended up making the code a little more general, so it can be used to get a list of all instances of any type of tasks within a package.


Before looking at the code, there is a concept that it is helpful to understand. SSIS packages have a nested structure. Everything is held within a container. Some containers, like the Sequence or ForEach Loop containers, can hold groups of other containers. The package itself is a container. All containers that can hold multiple containers implement the IDTSSequence interface. The TaskHost container, however, holds a single task, and does not implement the IDTSSequence interface. The TaskHost is what we have to find to identify individual instances of tasks.


The function below is designed to be called recursively. It uses the IDTSSequence interface to iterate through the Executables collection, which can hold either TaskHosts or containers that implement IDTSSequence themselves, which is where the recursion comes in. It also takes the type name as a string parameter. The code checks the type name string against the name of the type as returned by the GetType function. It’s simply checking to see if the typeName string exists within the full string returned, so it doesn’t have to be an complete match. This means you don’t have to specify the full namespace, like “Microsoft.SqlServer.Dts.Tasks.ExecuteSQLTask”. Instead, you can just specify “ExecuteSQL”.

        List<DtsContainer> FindExecutablesByType(IDTSSequence sequence, string typeName)
{
string typeNameUpper = typeName.ToUpper();
List<DtsContainer> matchingExecutable = new List<DtsContainer>();
foreach (Executable e in sequence.Executables)
{
if (e.GetType().ToString().ToUpper().Contains(typeNameUpper))
{
matchingExecutable.Add((DtsContainer)e);
}
if (e is TaskHost)
{
TaskHost taskHost = (TaskHost)e;

if ((typeNameUpper.Contains(“DATA FLOW”)
|| typeNameUpper.Contains(“DATAFLOW”)
|| typeNameUpper.Contains(“MAINPIPE”)
|| typeNameUpper.Contains(“PIPELINE”))
&& taskHost.InnerObject is IDTSPipeline90)
{
matchingExecutable.Add((DtsContainer)e);
}
else if (taskHost.InnerObject.GetType().ToString().ToUpper().Contains(typeNameUpper))
{
matchingExecutable.Add((DtsContainer)e);
}
}
if (e is IDTSSequence)
{
matchingExecutable.AddRange(FindExecutablesByType((IDTSSequence)e, typeNameUpper));
}

}
return matchingExecutable;
}


The function returns a list of DtsContainer objects, which is a base class for all control flow objects. The DtsContainer gives access to the name and description for each object. There is some special handling for Data Flow objects. These are actually implemented as COM objects, so a call to GetType on them returns “ComObject” rather than the .NET class name. To work around this, I have hard-coded a few values for the data flow objects, and a specific check for the IDTSPipeline object. There may be a better way to accomplish this, but I haven’t located it yet. Another limitation of this code is that it does not check event handlers. Any tasks inside of event handlers will not be found. This could be added fairly easily, but wasn’t necessary for my current purposes, so it didn’t make the cut.


To call this function and iterate the entire package, you can use this code:

Microsoft.SqlServer.Dts.Runtime.Application ssisApp = new Microsoft.SqlServer.Dts.Runtime.Application();
Package pkg = ssisApp.LoadPackage(“C:\\Temp\\MyPackage.dtsx”, null);
List<DtsContainer> containers = FindExecutablesByType((IDTSSequence)pkg, “ExecuteSQL”);
foreach (DtsContainer exec in containers)
{
//Do Something
}

I’ve put together a small GUI around this, posted on my SkyDrive.



If you are interested in learning more about working with SSIS programmatically, please post in the comments. I’m considering doing a few more posts on this, but only if there is interest.

Address Columns Generically In a Script Component

When writing script components in SSIS, it’s often useful to be able to process the columns without knowing exactly what they are. For example, you may want to loop through all the columns to check for a conditional, like NULL values in the columns. Or you may want to take a set of columns and concatenate them into a single string for output, or output each one as an XML element. In any of these cases, you don’t necessarily care about the individual column details.


One option is to use Reflection. This is fairly easy to set up, but is not the fastest performing option. To use Reflection, add this to the top of the Script class:

Imports System.Reflection

In the ProcessInputRow, you can use the following code:

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim column As IDTSInputColumn90
Dim rowType As Type = Row.GetType()
Dim columnValue As PropertyInfo

For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
columnValue = rowType.GetProperty(column.Name)
If columnValue.GetValue(Row, Nothing).ToString() = “January” Then
columnValue.SetValue(Row, String.Empty, Nothing)
End If
Next
End Sub


This example is checking the value of each column to see if it is “January”, and if it is, writing an empty string back to the column. Be aware that this option is not the best performing option. It’s OK for small data sets, but I would not use it for large ones. You can get better performance from the Reflection option by caching the PropertyInfo objects for each column in an instance variable, but if you are really interested in performance, keep reading.


This option is a bit more complex, but performs better. This is based off some code originally written by MSDN Forum member jaegd (original post here). It’s been simplified a good bit, but that introduces some limitations. As written, this isn’t very type safe, so be sure to add code to check the data type if you are not working with strings.

Public Class ScriptMain
Inherits UserComponent
Private inputBuffer As PipelineBuffer

Public Overrides Sub ProcessInput(ByVal InputID As Integer, ByVal Buffer As Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer)
inputBuffer = Buffer
MyBase.ProcessInput(InputID, Buffer)
End Sub

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim counter As Integer = 0

For counter = 0 To inputBuffer.ColumnCount – 1
If inputBuffer.Item(counter).ToString() = “January” Then
inputBuffer.Item(counter) = String.Empty
End If
Next
End Sub
End Class


In the ProcessInput method, the Buffer object is cached, as we’ll need it to reference the columns by their index. In the ProcessInputRow method, we can use the Item property and the column index to read from or write to the column.


That’s two options for working with columns generically in the Script component. As mentioned above, use the second option for performance.



As usual, I’ve posted a sample with both methods on my SkyDrive.