Posts tagged ‘Examples’

XML Destination Script Component

I often hear comments along the lines of “Why doesn’t SSIS have an XML destination?”. Unfortunately, I don’t know the answer to that, but I can show you how to implement your own XML destination fairly easily. Thanks to the magic of the script component, it doesn’t take much work at all. This was also a good opportunity for me to use a destination script component, as I haven’t had much need for this in my work.

One caveat to this is that I have kept the XML format very simple and very generic. You can customize the script to handle more complex scenarios, but I’d imagine there would be a lot of work in producing a component that can handle all the different ways of formatting data in XML. Maybe that’s why MS didn’t include it out of the box :)

As an example, I have a very simple package with a single data flow. The data flow has a flat file source, and a script destination component.

XML Dest

The flat file source is standard, nothing interesting there. The work is all done in the script. When adding the script component, make sure to specify that it is a destination component.

When configuring the script component destination, I chose all the input columns that I wanted to include in the XML (in this case, all of the columns from my flat file).


There is no need to alter anything on the Inputs and Outputs page. Since this is a destination component, no output is necessary. On the Connection Managers page, I added a reference to a connection manager of type ”File”, and named it Destination.


This is the complete script for the component. I’ll explain it section by section below.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.IO
Imports System.Reflection

Public Class ScriptMain
Inherits UserComponent

Private targetFile As String
Private xmlWriter As StreamWriter
Private rootElement As String = “Root”
Private rowElement As String = “Row”

Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
targetFile = CType(Me.Connections.Destination.AcquireConnection(Nothing), String)
End Sub

Public Overrides Sub PreExecute()
xmlWriter = New StreamWriter(targetFile, False)
End Sub

Public Overrides Sub PostExecute()
xmlWriter.WriteLine(FormatElement(rootElement, True))
End Sub

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim column As IDTSInputColumn90
Dim rowType As Type = Row.GetType()
Dim columnValue As PropertyInfo

With xmlWriter
For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
columnValue = rowType.GetProperty(column.Name)
.Write(FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True))
.WriteLine(FormatElement(rowElement, True))
End With
End Sub

Private Function FormatElement(ByVal elementName As String) As String
Return FormatElement(elementName, False)
End Function

Private Function FormatElement(ByVal elementName As String, ByVal closingTag As Boolean) As String
Dim returnValue As String
If closingTag Then
returnValue = “</”
returnValue = “<”
End If
returnValue += elementName + “>”

Return returnValue
End Function

End Class

I added the System.IO and System.Reflection to the Imports as I am using objects from both namespaces. There are 4 class level variables defined:

    Private targetFile As String
Private xmlWriter As StreamWriter
Private rootElement As String = “Root”
Private rowElement As String = “Row”

xmlWriter is a StreamWriter, the .NET Framework object used to create the XML file. For more information on this, please see the MSDN documentation. rootElement is the value to enclose the entire XML document with, and rowElement defines what to enclose each individual row in. By changing the values in these variables, you affect what the final XML output looks like.

targetFile holds the path and file name for the destination file. It is set in the AcquireConnections method. Interesting note: when you are working with a file connection manager, calling AcquireConnection simply returns the path and file name of the file. Why not just use a variable or set it explicitly? Doing it this way makes the script behave more like a standard SSIS destination, and makes it clearer what you’d need to change to put the file in a new destination. As a plus, if you set an expression or configuration on the ConnectionString property of the file connection manager, the script will use it automatically.

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
targetFile = CType(Me.Connections.Destination.AcquireConnection(Nothing), String)
End Sub

In the PreExecute method, the code creates a new instance of the StreamWriter, and writes the opening tag of the XML file.

    Public Overrides Sub PreExecute()
xmlWriter = New StreamWriter(targetFile, False)
End Sub

The ProcessInputRow method is where most of the work occurs. I’m using the System.Reflection and the ComponentMetaData objects to process the Row object dynamically. Essentially, this code determines what input columns are available at runtime, based on what was selected in the Script component’s property pages. It then writes each of those columns to the file, surrounding it with an XML tag based on the column name.

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim column As IDTSInputColumn90
Dim rowType As Type = Row.GetType()
Dim columnValue As PropertyInfo

With xmlWriter
For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
columnValue = rowType.GetProperty(column.Name)
.Write(FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True))
.WriteLine(FormatElement(rowElement, True))
End With
End Sub

 Finally, in the PostExecute method, the closing tag is written, and the file is closed.

    Public Overrides Sub PostExecute()
xmlWriter.WriteLine(FormatElement(rootElement, True))
End Sub

 With the script component in place, if I read in a text file with these values:


I get an output that looks like this:

<Row><NAME>A</NAME><VALUE>1</VALUE><DATE>1/1/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>2</VALUE><DATE>1/2/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>3</VALUE><DATE>1/3/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>4</VALUE><DATE>1/4/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>5</VALUE><DATE>1/5/2000 12:00:00 AM</DATE></Row>

The nicest thing about this script, in my opinion, is that it does not have to be modified if the input columns change. The use of the System.Reflection classes allows the code to process the Row object at runtime. This does come at the expense of performance, though, so I wouldn’t recommend using this in high volume scenarios without some tweaks.





Dynamically Unpivoting Columns to Rows

There’s been a few questions on the forums recently about taking column values and pivoting them to rows. For example, I might have an input file with 5 columns: a key column and 4 category columns.

Key; C1; C2; C3; C4

and I’d like an output that has 2 columns (the key and the category value):


Since my input has a set number of columns, I can use the Unpivot Transform to get the desired result.

But what if I have a variable number of columns to pivot? In that case the input file might look like this:


The first column still indicates the key, but there are a variable number of categories. In Handling Flat Files with Varying Numbers of Columns, I showed how to handle the varying number of columns by treating each row as a single column. This post is going to expand on that technique and show how to incorporate a pivot into the script.

The connection manager and the flat file source should be set up the same way as in the previous post, so I won’t cover that again. The difference is in the script component. Since I want to output more rows than I’m getting as inputs, I’m setting the script up with an asynchronous output. That is done by setting the SynchronousInputID property to None. Two columns were added to the output, one for the key, and one for the category value.

This is the code inside the script task:

Dim Values() As String
Dim i As Integer

Values = Row.Column0.Split(CChar(“;”))

‘Array is zero based – but zero index is the key value,
‘so start with 1
For i = 1 To Values.GetUpperBound(0)
    Output0Buffer.Key = CInt(Values(0))
    Output0Buffer.Value = CInt(Values(i))

The Split function is used to break the string up into individual columns in an array. The code loops through the array, starting with the second column, since the first one represents the key. For each iteration through the loop, a row is added to the output buffer, and the key and and value are added to the row.

That’s pretty much all there is to it. This technique can be used to handle flat files and columns that contain multiple items (in XML, or as a delimited list).

Handling Varying Columns, Part 2

In Handling Flat Files with Varying Numbers of Columns, I showed an example of parsing a flat file with an inconsistent number of columns. I used a script component, but Jamie commented that the same thing could be accomplished through a Conditional Split and Derived Column transform. So, here’s part 2.

I added a new data flow to the same package. The data flow is a bit more complicated for this.

The Conditional Split determines what type of row I’m dealing with, and passes it to the appropriate output. It does this by checking how many delimiters appear in the row. The FindString function will return a 0 if the string specified is not found, or if the string specified occurs less than the number of occurrences specified.

Now that I know how many columns I need to parse, I’m use a Derived Column transform to split the columns from the main string.

The expression for the first column looks for the first occurrence of the delimiter.

SUBSTRING(Line,1,FINDSTRING(Line,”,”,1) – 1)

For the second column, the expression is a bit more complicated. It has start from the first delimiter, and stop at the second. Since the SubString function needs the length, the expression is calculating the difference between the first and second delimiter. In addition, it is casting the result to an integer.

(DT_I4)(SUBSTRING(Line,FINDSTRING(Line,”,”,1) + 1,FINDSTRING(Line,”,”,2) – FINDSTRING(Line,”,”,1) – 1))

Finally, the third expression finds the second delimiter, and gets the rest of the string. I’m taking a shortcut by using the full value for the length, since if the length argument is exceeds the length of the string, the rest of the string is returned.


Finally, a Union All is used to combine the data back into a single flow.

Technically, this could be accomplished without the Conditional Split. However, the logic required for the Derived Column transform would be much more complex, as each column parsing expression would have to be wrapped in a conditional expression to see if that column actually existed for the row.

In SSIS, there are usually at least two ways to accomplish anything, which is one of the things I like about it. However, there are differing advantages to the two approaches covered here and in the previous post. In general, I favor using the script component for the following reasons:

  • Easier (at least in my opinion) to introduce complex logic for parsing the columns

  • Simpler data flow

However, the Derived Column is easier if you aren’t comfortable with .NET coding, and makes it easier to interpret what is happening in the data flow.

I’ve attached the updated sample package at the end of this post.

Handling Flat Files with Varying Numbers of Columns

5/15 Update – I added Part 2 to show how to do the same thing with a Conditional Split and a Derived Column transform, per Jamie’s feedback (see the comments on this post).
A common question on the forums has been how to handle flat files that have a varying number of columns. For example, one row contains 3 columns, and another row may contain on two columns. The example below shows a sample file that uses a comma to delimit the columns, and a cursor return / line feed to delimit the row.

SSIS does not handle this scenario easily, due to the way it parses flat files. It parses by looking for the next column delimiter. The row delimiter is just the column delimiter for the last defined column. So, on our second line in the sample file, SSIS is looking for a comma instead of a CR/LF. The result of this is that the third row ends up combined with the second row, and we get something that looks like this:

I’m not going to go into a discussion about whether this is good or bad. This article is about how to work around it. If you’d like to see it changed in future versions of SSIS, please go to Connect ( and vote for it to be changed.
Now, onto the workaround. First, I’ve defined a flat file connection manager that treats each row as one column. I’m using the row delimiter (CR/LF) as the column delimiter.

If you are following along, your flat file should preview like this:
Next, in a data flow, I’ve added a flat file source that uses the connection manager. It is connected to a script component that is set as a Transform. The Line column is checked as an input.

In the Inputs and Outputs area, I’ve added three columns, for the three real columns in my flat file, and set the data types appropriately.

Finally, I added the following script to the task:
Public Class ScriptMain
    Inherits UserComponent
    Private columnDelimiter() As Char = CType(“,”, Char())

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As
        Dim rowValues As String()

        rowValues = Row.Line.Split(columnDelimiter)
        If rowValues.GetUpperBound(0) < 2 Then
            ‘Row is not complete – Handle error
            Row.Name_IsNull = True
            Row.Number_IsNull = True
            Row.Date_IsNull = True
            Row.Name = rowValues.GetValue(0).ToString()
            Row.Number = Convert.ToInt32(rowValues.GetValue(1))
            Row.Date = Convert.ToDateTime(rowValues.GetValue(2))
        End If
    End Sub

End Class
The columnDelimiter variable holds the value for the column delimiter – a comma in my case. The Split function parses the value contained in Line (the single column defined in the connection manager) and returns an array containing one element for each column in it. Since I’m expecting 3 columns, I’m performing a check to see if the array contains all three columns (.NET uses 0-based array indexes). If columns are missing, I have an error that needs to be handled. In this example, I am simply setting all my column values to NULL. The error handling could be enhanced by redirecting the rows to an error output, but I wanted to keep things simple. With this method, I could use a conditional split to filter out the rows with NULL.
Finally, if the correct number of columns are present, I’m setting the output columns created earlier with the values from the array. Notice that the Convert is necessary to make sure the value is the correct type.
That’s pretty much it. Depending on your needs, you may need to customize the script a bit to better handle error conditions, or reparsing the columns. I’ve attached the sample package and text file below. The sample is using the Trash Destination from Konesans, which you can download from

Sample files here

As always, feedback is appreciated.

Handling Multiple Errors in SSIS

[edited on 12/14/2007 to correct an error in the text around string handling - the samples were not modified]
One actual failure in SSIS can trigger a whole series of error messages. For example, failure to convert a column value from a string to an integer in a Derived Column transform generates the following messages:
[Data Conversion [70]] Error: Data conversion failed while converting column “Fiscal year” (18) to column “NumericFiscalYear” (83). The conversion returned status value 2 and status text “The value could not be converted because of a potential loss of data.”.
[Data Conversion [70]] Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The “output column “NumericFiscalYear” (83)” failed because error code 0xC020907F occurred, and the error row disposition on “output column “NumericFiscalYear” (83)” specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
[DTS.Pipeline] Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component “Data Conversion” (70) failed with error code 0xC0209029. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
[DTS.Pipeline] Error: SSIS Error Code DTS_E_THREADFAILED. Thread “WorkThread0″ has exited with error code 0xC0209029. There may be error messages posted before this with more information on why the thread has exited.
If you are logging errors to a flat file or an error logging table, then recording each error may be fine. However, if you’re writing the errors to the Windows event log, or sending them via email, you may not want to record multiple messages each time an error occurs. You might want to record only the first message, or you might want to group all the errors into a single log entry or email. Fortunately, the event model in SSIS allows you to easily customize how errors are handled.
I’ve put together a small sample package that shows how you might accomplish this. The package contains a single data flow that loads a text file, attempts to convert a column from string to numeric, and writes it to a Trash destination (see to get this component).
The text file has an invalid value in one of the columns, which will cause the data flow to fail, and generate the four messages listed above. The package is set up to capture all of the error messages generated, store them in a collection, and concatenate them into a single string when the package is finished executing. Once that is done, the resulting string could be emailed or recorded to a log.
As mentioned, the data flow is very straightforward:

I’ve also created two variables at the package level: errorMessages as an Object, and emailText as a String. I’ll explain why later in the post.
The real work occurs in the event handlers. SSIS raises events for all executables(packages and tasks are both executables). The event we’re interested in is the OnError event, which is raised once for each error that occurs.
You get to the event handlers by selecting the Event Handlers tab in the SSIS designer. Once there, the Executable for which you want to capture events needs to be selected.

Since I want to handle errors for anything in the package, I’m setting the executable to CaptureErrors (the name of the package). By default, any event raised by a child executable (that is, an executable that is nested inside another executable) will also be raised in its parent. You can disable that behavior by setting the Propagate system variable, but that’s a topic for another post. I’m also using “OnError” from the list of events and have added a Script Task to the event handler.

The Script Task has two variables passed in: the System::ErrorDescription, which contains the text of the error message, and User:errorMessages, which I’ll use to keep track of all the error messages.

Here’s the script used in the Script Task:
Dim messages As Collections.ArrayList
messages = CType(Dts.Variables(“errorMessages”).Value, Collections.ArrayList)
Catch ex As Exception
messages = New Collections.ArrayList()
End Try

Dts.Variables(“errorMessages”).Value = messages
Dts.TaskResult = Dts.Results.Success
I’m first attempting to retrieve the ArrayList from the errorMessages variable. If the value of the variable can’t be cast to an ArrayList, it indicates that it hasn’t been initialized yet. Once that has been handled, the error description is added to the ArrayList. This handles capturing the list of all error messages.
The next step is to process all the messages in order to email or log them. Since I only want to do this once, I’m using the OnPostExecute event, which fires when the executable is finished running.

There is another Script Task present in this event handler. This one has the User::errorMessages and User:emailText variables passed in.

The script in this task is concatenating a long string based on the error messages captured and returning it in the emailText variable:
Dim errorDesc As String
Dim messages As Collections.ArrayList

messages = CType(Dts.Variables(“errorMessages”).Value, Collections.ArrayList)
Catch ex As Exception
‘If there is an exception – the object was never initialized, so there were no errors
End Try

For Each errorDesc In messages
Dts.Variables(“emailText”).Value = Dts.Variables(“emailText”).Value.ToString + errorDesc + vbCrLf

Dts.TaskResult = Dts.Results.Success
Once that has been done, the resulting string could be emailed or logged as desired. Since SSIS can generate fairly verbose error messages, I chose to store the messages in an ArrayList object. I then use the ArrayList to build the actual string message.
Hopefully, this information is helpful to anyone who wants to customize SSIS event handling. I’ve attached the sample package, and the text file used in the sample. If you have any feedback or suggestions for improvement, please leave them in the comments.

Writing a Resultset to a Flat File

This is another example package. This one retrieves a resultset and stores it in a variable. It then uses a data flow with a custom script source to read each row of the resultset and output it as a delimited string. Why would you want to do this? Well, it enables you to treat a resultset generically from a data flow perspective. There is only a single “column” defined in the metadata. So this works well if you want to export an unknown number of columns from a resultset to a flat file.

First, use an Execute SQL to retrieve and store a resultset in a variable. Then execution should continue to the data flow.

A custom script component acts as the source for the data flow. When adding the script component, make sure to source as the type of the script. Select Inputs and Outputs, and add a single output column to the existing output, and set the type of the column to DT_STR and set the length to 1000 (or something long enough to hold your widest resultset).

Select the Script option next, and add the resultset variable to the ReadOnly Variables property. Click the Design Script button, and add the following to the CreateNewOutputRows function: (Script adapted from Jamie Thomson’s example at

Dim oledbAdapter As New OleDb.OleDbDataAdapter
Dim dataTable As New System.Data.DataTable
Dim dataRow As Data.DataRow
Dim delimitedValues As String
Dim i As Integer

oledbAdapter.Fill(dataTable, Me.Variables.resultset)
For Each dataRow In dataTable.Rows

  delimitedValues = String.Empty

    For i = 0 To dataRow.ItemArray.GetUpperBound(0)
      delimitedValues = delimitedValues + dataRow.Item(i).ToString()
      If i <> dataRow.ItemArray.GetUpperBound(0) Then
        delimitedValues = delimitedValues + “|”
      End If

    OutputBuffer.DelimitedRow = delimitedValues

You’ll also need to add a reference to the System.XML assembly (Select Project..Add Reference from the toolbar). If you want to use a different delimiter, just change the constant “|”.

Now you can connect this to a flat file destination with a single column defined, and output your recordset to that without worrying about the number or type of columns.

Using For Each to Iterate a Resultset

Update: Kirk Haselden has a great example of this that I found right after I posted this. 

There’s been a number of questions about iterating through a collection of items and performing an operation for each item in the SSIS forums recently, so I put together a quick example of doing this.

The example package retrieves a list of parent rows from one table, then uses a Foreach Loop to iterate through each parent row. It uses the ID from the parent row to retrieve the matching child records from a second table.

This example assumes the following tables in your database:

CREATE TABLE [dbo].[ChildTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ParentID] [int] NOT NULL,
[ChildName] [varchar](25) NULL,
( [ID] ASC )

CREATE TABLE [dbo].[ParentTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ParentName] [varchar](25) NOT NULL,
( [ID] ASC )

It also assumes that the ParentID in ChildTable points to the IDs in ParentTable.

The attached package starts with an Execute SQL tasks that retrieves the records from the ParentTable and stores it in a variable named “parentResultset”. The Foreach Loop container is set to use the “parentResultset” variable as the source variable for enumeration. It maps the ID column from the “parentResultset” variable to the “parentID” variable on the Variable Mappings tab.

There is a variable named “select”. This variable uses an expression to build the following select statement: “SELECT ID, ParentID, ChildName FROM ChildTable WHERE ParentID = ” + the “parentID” variable. The “select” variable is then used in another Execute SQL task.

Hopefully this helps illustrate how to use the Foreach Loop.

Sending Header and Detail to different destinations

A question came up today on the SSIS forum about processing flat files that included header and detail information. The poster wanted to send header information to one destination and detail information to another. The header rows have 2 columns, the detail rows have 5.


One way to solve this is a script component, which would have two outputs, and redirect rows based on programatically determining what type of row was being dealt with. The advantages for this approach are:

  • You only read the source once

  • Completely flexible for whatever logic you want to use to identify header and detail rows

  •  It handles files that have repeated header / detail sections

A downside is that you will probably have to treat each row from the source as a single column, and break the columns out yourself. Project Real has a great example of this approach (though they show it in custom components, the same concepts apply), so rather than try to cover it again here, I’ll provide a link to their documentation.

An alternative approach is to read the same flat file twice, once to get header rows, once to get the detail. This works well when you will only have one header, detail combination per input file.

To do this for the file defined above, you need to create two connections to the flat file, one for the header defining two columns, the other for detail with five columns. For the detail connection, set the header rows to skip to 3. The detail source can be handled normally, as no special handling is required.

The header source flow does need a few additions. First, add a script component. In the script component, add a new column to the output (I named mine Rows) to hold the row count. Use the following code to add the row count:

Static RowCount As Integer = 0

RowCount = RowCount + 1

Row.Rows = RowCount

Next, add a conditional split component. Add a case for everything with a Rows value less than 2 (Rows <= 2) to only get the two header values we want. Now this flow can be written to the appropriate destination.

Rather than using a simple row count, you could use the script component to do pattern matching using Like, or other techniques to identify the header rows, and output a simple boolean flag indicating whether the row should be included.

I’ve attached a simple sample package that demonstrates this. It uses the Konesans Trash Destination Adapter (, so you’ll need that to run the sample.