Posts tagged ‘Flat Files’

Processing Flat Files with SSIS at SQLSaturday #61

I’m looking forward to speaking at SQLSaturday #61 (#sqlsat61) in Washington, DC this coming weekend (12/4/2010). As usual for a SQLSaturday, there are some great speakers scheduled to present.

I’ll be presenting on “Processing Flat Files with SSIS”, which goes over some tips and tricks for working with flat files. It’s always a fun session to give, as everyone seems to have encountered some troublesome flat files at some point in their career.

If you are in attendance at the event, please feel free to stop by and say hello.

Slides From “Processing Flat Files with SSIS”

Thanks to the Columbia Enterprise Developer’s Guild for letting me present last night. The audience was great, and I got a lot of good questions. Several people asked if the samples could be made available, and I also had a request to post the slides for some people who weren’t able to make it. So, here they are. I’ve posted them to my SkyDrive here. If you have any questions or comments, please feel free to leave them here.

Creating Multiple Rows in a Text File from a Single Row

A recent post on the SSIS forums was asking about creating multiple rows in a text output file from a single row in the data flow. Given a set of rows like this:

John Smith 1/1/1900 Value A
Jane Smith 12/1/1900 Value B

the poster wanted this output in the text file:

John Smith  
1/1/1900 Value A
Jane Smith  
12/1/1900 Value B

Basically, the poster wanted a line break in the middle of a row of data, while keeping a line break at the end.

There are a couple of ways to accomplish this in SSIS. One way is to use a script task to create the file, which gives you complete control over the format of the file. There are also a couple of ways to do it directly in the data flow. The first is to use a Multicast transform to create two copies of each row, perform some string concatenation, and then combine them using a Union All or a Merge.

image

The Derived Column transforms are used to put the multiple columns into a single column, so that a variable-length record can be written to the flat file. The Sort transforms and the Merge combine the rows into the proper order before sending them to the flat file.
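For example, assuming source columns named Name, EventDate, and Value (illustrative names, not necessarily those in the sample package), the two Derived Column transforms might use expressions along these lines to build the single output column:

Line (first copy of the row):  Name
Line (second copy of the row): (DT_WSTR,20)EventDate + " " + Value

The cast is only needed if EventDate isn't already a string column.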

The other option (and one that probably falls under the category of stupid SSIS tricks) is to hack the flat file connection manager a little bit. You can set the column delimiters so that a carriage return/linefeed is inserted in the middle of the row. However, this isn’t as simple as just choosing {CR}{LF} as the column delimiter. SSIS checks to make sure that none of the column delimiters are the same as the row delimiter. Why it does that check, I don’t know, given the way it parses flat files. Regardless, you have to work around it. So, you can simply select the column where you want to introduce the break, and set its delimiter to {CR}.

image

Then insert a new column immediately following that column, set the output width to 0, and set the column delimiter to {LF}.

image

Now the output will include a carriage return / linefeed between the columns.

The sample package for this is located here. It is SSIS 2008, but the concepts are the same for 2005.

Adding Headers and Footers to Flat Files

A common question with SSIS is how to handle adding headers and footers to a flat file when their format is different from that of the data rows. This gets more difficult when the header or footer needs to contain data based on the actual data rows, such as a total or a record count.


Basically, this can be accomplished by creating the header or footer in a separate task, and then prefixing or appending it to the data. I’ve prepared a sample package that shows a couple of different ways to accomplish this. For generating footers, it shows how to use a second data flow to append an additional footer row onto the data file. There are two methods shown for adding headers. Method 1 uses a Script task to add the header by loading the file into a string, adding the header, and saving the file again. Method 2 creates a separate header file, and uses a file copy operation to concatenate the two files.


image


There are a couple of things to note about the package. Three flat file connection managers are defined in it.


image


Two of the connection managers (Destination and DestinationWithFooter) point to the same file, but have different column definitions. All three have their connection strings set by expressions that point to variables. If you want to change the target files, you need to modify the variables, not the connection managers.

image


The DestinationFile variable holds the path to a working file, which will hold the data before the headers and footers are added. The DestinationHeader variable holds the path to a working file used only for Method 2, which holds the header row before it is concatenated into the final file. The FinalFile variable holds the path to the final output file, which includes the header, data, and footer rows.


The Generate File data flow uses a Data Generator component from Konesans (a free download from www.sqlis.com). It simply generates some data, captures the number of rows using a Row Count transform, and outputs the data to the Destination connection manager.


The Append Footer and Create a Header File data flows are very similar. They both use a Script source to generate a single row containing the footer or header data, respectively. The script is below.



Public Overrides Sub CreateNewOutputRows()
    Dim recordCount As Integer
    Dim vars As IDTSVariables90

    'Get the record count
    Me.VariableDispenser.LockOneForRead("RecordCount", vars)
    recordCount = CType(vars("RecordCount").Value, Integer)
    vars.Unlock()

    'Output one row with the record count
    Output0Buffer.AddRow()
    Output0Buffer.FooterRow = String.Format("Footer Row Count: {0}", recordCount)
    Output0Buffer.SetEndOfRowset()
End Sub

The two differ in the way the destination is handled. In the Append Footer data flow, the Flat File destination is set to append to the file, rather than overwriting it. It uses the DestinationWithFooter connection manager, which points to the same file that the data was sent to in the Generate File data flow. The DestinationWithFooter connection manager defines the file as having a single column. That makes it easier to append the footer row directly to the file. After the Append Footer data flow task has run, there is a single file that has the data rows and a footer row.
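For example, if the Generate File data flow produced 100 rows (a count chosen purely for illustration), the last line of the file would read:

Footer Row Count: 100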


image


In the Create a Header File data flow, the header row is sent to a new file. Then, in the Append Files execute process task, the header and the data file are concatenated together using a copy command. This is Method 2 for adding the header.
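The copy command itself is along these lines (file names here are illustrative; the package builds the real arguments from the variables described above):

cmd.exe /c copy /b Header.txt + Data.txt Final.txt

The /b (binary) switch keeps copy from appending an end-of-file character when it concatenates the files.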


For Method 1, the Add Header script task uses the System.IO.File class and the System.Text.StringBuilder class to create a new file containing the header row and the data rows, which already have the footer appended.



Public Sub Main()
    Dim fileContents As New StringBuilder()
    Dim vars As Variables
    Dim recordCount As Integer
    Dim finalFile As String

    'Get the record count and the final file path
    Dts.VariableDispenser.LockForRead("RecordCount")
    Dts.VariableDispenser.LockForRead("FinalFile")
    Dts.VariableDispenser.GetVariables(vars)
    recordCount = CType(vars("RecordCount").Value, Integer)
    finalFile = CType(vars("FinalFile").Value, String)
    vars.Unlock()

    'Write the header, then append the file contents and write everything back out
    fileContents.AppendLine(String.Format("Header Row Count 1: {0}", recordCount))
    fileContents.Append(File.ReadAllText(Dts.Connections("Destination").ConnectionString))
    File.WriteAllText(finalFile, fileContents.ToString())

    Dts.TaskResult = Dts.Results.Success
End Sub

Hopefully this example is helpful if you are trying to sort through the options for adding headers and footers. The example is available on my SkyDrive here.

Dealing with Unicode and ASCII Text Files

A recent post on the MSDN forums brought up an interesting problem. The poster was attempting to process a folder of files, all of which had the same format, but some were encoded as ASCII (ANSI, DT_STR, or varchar, if you like the SQL Server data types) and some were encoded as Unicode (DT_WSTR, nvarchar).


The initial issue was determining which way the file was encoded. After a little research, I located a post entitled “Detecting File Encodings in .NET” by Heath Stewart. This showed how to check the encoding of a file in C#. I converted the code to VB.NET and set it up to work within a Script Task. These are the results:

Public Sub Main()
    'Code based on a sample originally posted here: http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=469
    Dim file As System.IO.FileStream = Nothing
    Dim isUnicode As Boolean = False

    Try
        file = New System.IO.FileStream(Dts.Connections("TestFileAnsi").AcquireConnection(Nothing).ToString(), _
            System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read)
        If file.CanSeek Then
            '// Get the byte-order mark, if there is one
            Dim bom(4) As Byte
            file.Read(bom, 0, 4)
            If ((bom(0) = &HEF And bom(1) = &HBB And bom(2) = &HBF) Or _
                (bom(0) = &HFF And bom(1) = &HFE) Or _
                (bom(0) = &HFE And bom(1) = &HFF) Or _
                (bom(0) = 0 And bom(1) = 0 And bom(2) = &HFE And bom(3) = &HFF)) Then
                isUnicode = True
            Else
                isUnicode = False
            End If
            '// Now reposition the file cursor back to the start of the file
            file.Seek(0, System.IO.SeekOrigin.Begin)
        Else
            '// The file cannot be randomly accessed, so you need to decide what to set the default to
            '// based on the data provided. If you're expecting data from a lot of older applications,
            '// default your encoding to Encoding.ASCII. If you're expecting data from a lot of newer
            '// applications, default your encoding to Encoding.Unicode. Also, since binary files are
            '// single byte-based, you will want to use Encoding.ASCII, even though you'll probably
            '// never need to use the encoding then, since the Encoding classes are really meant to get
            '// strings from the byte array that is the file.
            isUnicode = False
        End If

        Dts.TaskResult = Dts.Results.Success

    Catch e As Exception
        Dts.TaskResult = Dts.Results.Failure
    Finally
        If Not file Is Nothing Then
            file.Close()
        End If

        Try
            Dim vars As Variables
            Dts.VariableDispenser.LockOneForWrite("isUnicode", vars)
            vars("isUnicode").Value = isUnicode
            vars.Unlock()
        Catch ex As Exception
            Dts.TaskResult = Dts.Results.Failure
        End Try
    End Try
End Sub


Essentially, the script above gets the file path from a flat file connection manager. It then checks the byte order mark (BOM), which is what tells us whether the file is Unicode. If it can’t determine the encoding from the BOM, the script assumes that the file is ASCII. The result of the check is written to a Boolean variable defined in the package called isUnicode.


That part was relatively straightforward. Originally, I thought I could just use this variable to set the Unicode property on the flat file connection manager through an expression and use the same data flow, as long as I typed all the string columns as DT_WSTR (I really should have known better). Oddly enough, when I first tested this, I found that if I set the Unicode property to false on the connection manager but set the column types to DT_WSTR, I could process ASCII files without a problem, yet Unicode files didn’t send any rows through, even though there was no error. If I set the Unicode property to true and then set it to false via an expression, the flat file source threw a validation error, because the error output was using a DT_NTEXT type and it needs to use DT_TEXT with ASCII files.


What I ended up with instead isn’t quite as simple, but it does work consistently. I created two connection managers, one configured for ASCII, and the other for Unicode. Both connection managers have an expression defined that sets their ConnectionString property based on the same variable, so they both point at the same file. Then I created two data flows, one for ASCII, one for Unicode. This eliminates any metadata errors. Finally, I set precedence constraints from the script task to each data flow, and made execution conditional based on the isUnicode variable.


image


The constraint to DF ANSI looks like this:

image


and the constraint to DF Unicode looks like this:

image
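In case the screenshots don’t come through: each constraint is set to evaluate an expression based on the isUnicode variable set by the script task. A sketch, assuming the variable name used above:

@[User::isUnicode] == False    (constraint to DF ANSI)
@[User::isUnicode] == True     (constraint to DF Unicode)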


The sample files for this (including a text file formatted as Unicode and one formatted as ASCII) are on my Live Drive.


To switch the test file from ASCII to Unicode, change the file path in the “testFilePath” variable.

Processing a Flat File with Header and Detail Rows

It’s been tough finding time to write any content lately, but this scenario came up recently both at work and on the forums, so I decided to write it up. It’s another variation on processing flat files (the variations seem to be endless). This scenario deals with an input file that contains two types of records: headers and details. The processing for this is based on another post (Handling Flat Files with a Varying Number of Columns), so I’m only going to cover what is unique in this scenario.


The input file looks like this:


HL111Header1
DLDetail1
DLDetail2
HL222Header2
DLDetail3
DLDetail4

The rows that begin with HL are header lines, and the rows that begin with DL are detail lines. One additional requirement in this example is that the header code (the three digits following the HL in each header line) must be included in each detail row.


The data flow looks like this:


HeaderDetail Data Flow


The Flat File Source is bringing in each row as a single column, delimited by CR/LF. The script component has two outputs defined, both marked as synchronous, and with an exclusion group greater than zero. (For more info on this, please see this post from Jamie Thomson.) One output will include the header rows, and the other will include the detail rows. Note that each output must have unique column names defined.


The script appears below. It simply parses the Line column (which contains the entire row of data) and puts the values into the appropriate columns on each output. Note that the headerCode variable is stored each time a header row is encountered, and then used to populate the code value on the detail output.


Public Class ScriptMain
    Inherits UserComponent

    Private headerCode As Integer

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        If Row.Line.Substring(0, 2) = "HL" Then
            headerCode = CType(Row.Line.Substring(2, 3), Integer)
            Row.DirectRowToHeader()
            Row.HRRecordType = Row.Line.Substring(0, 2)
            Row.HRCode = CType(Row.Line.Substring(2, 3), Integer)
            Row.HRName = Row.Line.Substring(5)
        ElseIf Row.Line.Substring(0, 2) = "DL" Then
            Row.DirectRowToDetail()
            Row.RecordType = Row.Line.Substring(0, 2)
            Row.Code = headerCode
            Row.Name = Row.Line.Substring(2)
        End If
    End Sub

End Class


There it is. Fairly straightforward, but it was quick to put together. The sample files are attached below.

Multi-File Outputs Part 2

In my previous post, I discussed how to use a script destination to output multiple flat files from a single source. In it, I mentioned that I would post an alternate approach, that didn’t require any scripting. So, here it is.


This is using the same example data as the previous post, a source file that looks like:


NAME;VALUE;DATE
A;1;1/1/2000
A;2;1/2/2000
A;3;1/3/2000
A;4;1/4/2000
A;5;1/5/2000
B;1;1/1/2000
B;2;1/2/2000
B;3;1/3/2000
B;4;1/4/2000
B;5;1/5/2000
C;1;1/1/2000
C;2;1/2/2000
C;3;1/3/2000
C;4;1/4/2000
C;5;1/5/2000

The goal is to output one file containing all “A” rows, one with “B” rows, and one with “C” rows.


The control flow consists of a Data Flow task, a ForEach Loop container, and another Data Flow task.


MultiFile 2


The first data flow is used to get the number of “slices” in the original file, and the data to identify each slice. The results are stored in a recordset object that is used by the ForEach loop. In the recordset, I expect to see three records, with the values A, B, and C. To get that from the source, I can either use a conditional split, or an aggregate transform. Important: you do not need both; I am only including both approaches as examples. If you implement this as a real solution, please only use the approach that best suits your needs.


GetSliceDF


The aggregate transform is set to group by and pass through the Name column from the source file.


Aggregate


The conditional split works by looking for rows with a Value of 1; those are the only rows passed to the recordset destination.
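Given the sample file’s column names, the condition on the split output would be along these lines:

VALUE == 1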


CondSplit


The choice between the two really depends on how complex the rules are for determining how many files you need. If the logic is based on a single column, or a set of columns, the aggregate transform works well. For more complex logic, the conditional split is probably more appropriate.


Once the recordset is populated, it is stored in a variable. There are two variables defined: one to hold the results of the conditional split (splitRowList), and one to hold the results of the aggregate transform (aggregatedRowList). Again, you don’t need both. The aggregatedRowList is the one used for the rest of this example.


Variables


The Foreach loop is set to use the aggregatedRowList variable.


ForEach1


The first column of the recordset (the Name column) is mapped to the currentSlice variable. This is what is used in the data flow to determine whether the row should be included in the output. It is also used in an expression for the currentFile variable, which is also used to set the ConnectionString property of the Destination File connection manager:

"C:\\OutputDir\\" + "Output" + @[User::currentSlice] + ".txt"

This updates the destination filename for each iteration of the loop.


ForEach


In the data flow, the source and destination are fairly straightforward.


DataFlow1


In the conditional split, the currentSlice variable is compared to the Name column to determine if this row should be included in the output. More complex logic could be included here fairly easily.
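Using the sample file’s column names, the condition is a sketch like the following (if NAME is a non-Unicode column, a cast such as (DT_WSTR,50)NAME may be needed on the left-hand side):

NAME == @[User::currentSlice]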


CondSplit1


That’s pretty much all there is to it.


Why would you want to use this approach over using a script? For one, this would be easy to adapt to multiple table destinations, instead of flat files. Two, this approach does not care about whether the rows are sorted or not. However, it is reliant on the conditional split, so the logic for which output a row belongs in needs to be something that can be implemented in a conditional split. And it does require multiple passes through the source rows. The script approach seems to be better suited to sequential, single pass file processing.

Multi-File Output Destination Script Component

Yesterday I posted about using a destination script component to write XML files from a data flow. Today, I’m continuing in the same vein with a post about writing multiple destination files from a destination script component. The inspiration for this was a post on the MSDN forums where a developer described a scenario where he had a single file and needed to split it into multiple output files based on some conditional logic. Normally, this could be handled with a conditional split and multiple destinations, but the catch in his scenario was that he didn’t know how many destination files he would need.


Based on that, I worked up a quick script destination to accomplish his goal. Later, I’m also going to post an alternative approach that does not use any script.


Multifile Destination


The data flow is fairly straightforward. The flat file source is retrieving a flat file with the following data:


NAME;VALUE;DATE
A;1;1/1/2000
A;2;1/2/2000
A;3;1/3/2000
A;4;1/4/2000
A;5;1/5/2000
B;1;1/1/2000
B;2;1/2/2000
B;3;1/3/2000
B;4;1/4/2000
B;5;1/5/2000
C;1;1/1/2000
C;2;1/2/2000
C;3;1/3/2000
C;4;1/4/2000
C;5;1/5/2000

The goal is to output three files, one each for the A, B, and C rows. I used a folder connection rather than a specific flat file connection. The folder connection specifies the output directory to place the files in.

DestinationFolder

On the script component (created as a destination), all the available input columns were selected.

Script Inputs

The Destination Folder is specified on the Connection Managers page.

ConnMgr

The script is as follows:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.IO

Public Class ScriptMain
    Inherits UserComponent

    Private Const fileName As String = "Output"
    Private Const fileExt As String = ".txt"
    Private Const delimiter As String = ";"
    Private targetFolder As String
    Private fileWriter As StreamWriter
    Private fileCount As Int32 = 1
    Private previousValue As String = String.Empty

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
        'For a folder connection, AcquireConnection returns the folder path
        targetFolder = CType(Me.Connections.Destination.AcquireConnection(Nothing), String) + "\"
    End Sub

    Public Overrides Sub PostExecute()
        fileWriter.Close()
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        'When the Name value changes, close the current file and start a new one
        If Not Row.NAME = previousValue Then
            If Not fileWriter Is Nothing Then
                fileWriter.Close()
            End If
            fileWriter = New StreamWriter(targetFolder + fileName + fileCount.ToString() + fileExt, False)
            fileCount += 1
            previousValue = Row.NAME
        End If

        With fileWriter
            .Write(Row.NAME + delimiter)
            .Write(Row.VALUE.ToString() + delimiter)
            .WriteLine(Row.DATE.ToString() + delimiter)
        End With
    End Sub

End Class


The most interesting portion of this script is in the ProcessInputRow method. The initial section of the method checks to see if the current row Name value is the same as the previous row. If it is different, it closes the current file, opens a new one, and increments a counter for the file name. Then the current row values are written out to the file.

This script works well for outputting multiple files. In the script, I’m using a simple previous-value comparison to see if I need to start a new file, but the logic could be much more complex if necessary.

An advantage of this approach over the non-script approach is that it only requires one pass through the source file. Another advantage is that the comparison logic can be fairly complex and can easily handle triggering new file creation based on number of rows or a trigger row. The pure SSIS approach can’t handle that as easily. However, this approach does require that your rows are sorted properly before being fed into the destination, whereas the alternate approach can handle the rows in any order.
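As a sketch of that idea (hypothetical, and not part of the sample package), the value-change check in Input0_ProcessInputRow could be swapped for a row counter that starts a new file every 1,000 rows:

Private rowsInFile As Int32 = 0
Private Const maxRowsPerFile As Int32 = 1000 'illustrative threshold

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    'Roll over to a new file when the current one is full (or hasn't been created yet)
    If fileWriter Is Nothing OrElse rowsInFile >= maxRowsPerFile Then
        If Not fileWriter Is Nothing Then
            fileWriter.Close()
        End If
        fileWriter = New StreamWriter(targetFolder + fileName + fileCount.ToString() + fileExt, False)
        fileCount += 1
        rowsInFile = 0
    End If

    With fileWriter
        .Write(Row.NAME + delimiter)
        .Write(Row.VALUE.ToString() + delimiter)
        .WriteLine(Row.DATE.ToString() + delimiter)
    End With
    rowsInFile += 1
End Sub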

XML Destination Script Component

I often hear comments along the lines of “Why doesn’t SSIS have an XML destination?”. Unfortunately, I don’t know the answer to that, but I can show you how to implement your own XML destination fairly easily. Thanks to the magic of the script component, it doesn’t take much work at all. This was also a good opportunity for me to use a destination script component, as I haven’t had much need for this in my work.


One caveat to this is that I have kept the XML format very simple and very generic. You can customize the script to handle more complex scenarios, but I’d imagine there would be a lot of work in producing a component that can handle all the different ways of formatting data in XML. Maybe that’s why MS didn’t include it out of the box :)


As an example, I have a very simple package with a single data flow. The data flow has a flat file source, and a script destination component.


XML Dest


The flat file source is standard, nothing interesting there. The work is all done in the script. When adding the script component, make sure to specify that it is a destination component.


When configuring the script component destination, I chose all the input columns that I wanted to include in the XML (in this case, all of the columns from my flat file).


DestScriptColumns


There is no need to alter anything on the Inputs and Outputs page. Since this is a destination component, no output is necessary. On the Connection Managers page, I added a reference to a connection manager of type "File", and named it Destination.


DestScriptCM


This is the complete script for the component. I’ll explain it section by section below.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.IO
Imports System.Reflection

Public Class ScriptMain
    Inherits UserComponent

    Private targetFile As String
    Private xmlWriter As StreamWriter
    Private rootElement As String = "Root"
    Private rowElement As String = "Row"

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
        targetFile = CType(Me.Connections.Destination.AcquireConnection(Nothing), String)
    End Sub

    Public Overrides Sub PreExecute()
        xmlWriter = New StreamWriter(targetFile, False)
        xmlWriter.WriteLine(FormatElement(rootElement))
    End Sub

    Public Overrides Sub PostExecute()
        xmlWriter.WriteLine(FormatElement(rootElement, True))
        xmlWriter.Close()
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim column As IDTSInputColumn90
        Dim rowType As Type = Row.GetType()
        Dim columnValue As PropertyInfo

        With xmlWriter
            .Write(FormatElement(rowElement))
            For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
                columnValue = rowType.GetProperty(column.Name)
                .Write(FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True))
            Next
            .WriteLine(FormatElement(rowElement, True))
        End With
    End Sub

    Private Function FormatElement(ByVal elementName As String) As String
        Return FormatElement(elementName, False)
    End Function

    Private Function FormatElement(ByVal elementName As String, ByVal closingTag As Boolean) As String
        Dim returnValue As String
        If closingTag Then
            returnValue = "</"
        Else
            returnValue = "<"
        End If
        returnValue += elementName + ">"

        Return returnValue
    End Function

End Class


I added System.IO and System.Reflection to the Imports, as I am using objects from both namespaces. There are four class-level variables defined:

    Private targetFile As String
    Private xmlWriter As StreamWriter
    Private rootElement As String = "Root"
    Private rowElement As String = "Row"

xmlWriter is a StreamWriter, the .NET Framework object used to create the XML file. For more information on this, please see the MSDN documentation. rootElement is the value to enclose the entire XML document with, and rowElement defines what to enclose each individual row in. By changing the values in these variables, you affect what the final XML output looks like.


targetFile holds the path and file name for the destination file. It is set in the AcquireConnections method. Interesting note: when you are working with a file connection manager, calling AcquireConnection simply returns the path and file name of the file. Why not just use a variable or set it explicitly? Doing it this way makes the script behave more like a standard SSIS destination, and makes it clearer what you’d need to change to put the file in a new destination. As a plus, if you set an expression or configuration on the ConnectionString property of the file connection manager, the script will use it automatically.

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
        targetFile = CType(Me.Connections.Destination.AcquireConnection(Nothing), String)
    End Sub

In the PreExecute method, the code creates a new instance of the StreamWriter, and writes the opening tag of the XML file.

    Public Overrides Sub PreExecute()
        xmlWriter = New StreamWriter(targetFile, False)
        xmlWriter.WriteLine(FormatElement(rootElement))
    End Sub

The ProcessInputRow method is where most of the work occurs. I’m using the System.Reflection and the ComponentMetaData objects to process the Row object dynamically. Essentially, this code determines what input columns are available at runtime, based on what was selected in the Script component’s property pages. It then writes each of those columns to the file, surrounding it with an XML tag based on the column name.

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim column As IDTSInputColumn90
        Dim rowType As Type = Row.GetType()
        Dim columnValue As PropertyInfo

        With xmlWriter
            .Write(FormatElement(rowElement))
            For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
                columnValue = rowType.GetProperty(column.Name)
                .Write(FormatElement(column.Name) + columnValue.GetValue(Row, Nothing).ToString() + FormatElement(column.Name, True))
            Next
            .WriteLine(FormatElement(rowElement, True))
        End With
    End Sub


 Finally, in the PostExecute method, the closing tag is written, and the file is closed.

    Public Overrides Sub PostExecute()
        xmlWriter.WriteLine(FormatElement(rootElement, True))
        xmlWriter.Close()
    End Sub

 With the script component in place, if I read in a text file with these values:


NAME;VALUE;DATE
A;1;1/1/2000
A;2;1/2/2000
A;3;1/3/2000
A;4;1/4/2000
A;5;1/5/2000

I get an output that looks like this:

<Root>
<Row><NAME>A</NAME><VALUE>1</VALUE><DATE>1/1/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>2</VALUE><DATE>1/2/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>3</VALUE><DATE>1/3/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>4</VALUE><DATE>1/4/2000 12:00:00 AM</DATE></Row>
<Row><NAME>A</NAME><VALUE>5</VALUE><DATE>1/5/2000 12:00:00 AM</DATE></Row>
</Root>

The nicest thing about this script, in my opinion, is that it does not have to be modified if the input columns change. The use of the System.Reflection classes allows the code to process the Row object at runtime. This does come at the expense of performance, though, so I wouldn’t recommend using this in high volume scenarios without some tweaks.
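One such tweak (an untested sketch, not part of the sample) is to cache the PropertyInfo lookups once in PreExecute, rather than calling GetProperty for every column on every row. It assumes an extra Imports System.Collections.Generic at the top of the script:

Private columnProperties As New Dictionary(Of String, PropertyInfo)

Public Overrides Sub PreExecute()
    xmlWriter = New StreamWriter(targetFile, False)
    xmlWriter.WriteLine(FormatElement(rootElement))

    'Resolve each selected input column's buffer property once, up front
    Dim rowType As Type = GetType(Input0Buffer)
    For Each column As IDTSInputColumn90 In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
        columnProperties.Add(column.Name, rowType.GetProperty(column.Name))
    Next
End Sub

'Input0_ProcessInputRow then uses columnProperties(column.Name)
'in place of rowType.GetProperty(column.Name)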

Dynamically Unpivoting Columns to Rows

There have been a few questions on the forums recently about taking column values and pivoting them to rows. For example, I might have an input file with 5 columns: a key column and 4 category columns.


Key; C1; C2; C3; C4
1;1;2;3;4
2;1;2;3;4
3;1;2;3;4
4;1;2;3;4


and I’d like an output that has 2 columns (the key and the category value):


1;1
1;2
1;3
1;4
2;1
2;2
2;3


Since my input has a set number of columns, I can use the Unpivot Transform to get the desired result.



But what if I have a variable number of columns to pivot? In that case the input file might look like this:


1;1;2;3
2;1;2;3;4;5
3;1
4;1;2;3;4;5;6;7;8
5;1;2;3;4;5;6;7;8;9
6;1;2;3


The first column still indicates the key, but there are a variable number of categories. In Handling Flat Files with Varying Numbers of Columns, I showed how to handle the varying number of columns by treating each row as a single column. This post is going to expand on that technique and show how to incorporate a pivot into the script.


The connection manager and the flat file source should be set up the same way as in the previous post, so I won’t cover that again. The difference is in the script component. Since I want to output more rows than I’m getting as inputs, I’m setting the script up with an asynchronous output. That is done by setting the SynchronousInputID property to None. Two columns were added to the output, one for the key, and one for the category value.



This is the code inside the script component:


Dim Values() As String
Dim i As Integer

Values = Row.Column0.Split(CChar(";"))

'The array is zero-based, but the zero index is the key value,
'so start with 1
For i = 1 To Values.GetUpperBound(0)
    Output0Buffer.AddRow()
    Output0Buffer.Key = CInt(Values(0))
    Output0Buffer.Value = CInt(Values(i))
Next

The Split function is used to break the string up into individual columns in an array. The code loops through the array, starting with the second element, since the first one represents the key. For each iteration through the loop, a row is added to the output buffer, and the key and value are added to the row.


That’s pretty much all there is to it. This technique can be used to handle flat files and columns that contain multiple items (in XML, or as a delimited list).
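For instance, here’s a hypothetical variation for a delimited-list column. It assumes each input row has a Key column and a Tags column holding a comma-separated list, and that the asynchronous output defines Key and Tag columns (all names here are illustrative):

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim items() As String = Row.Tags.Split(CChar(","))

    'One output row per item in the list
    For Each item As String In items
        Output0Buffer.AddRow()
        Output0Buffer.Key = Row.Key
        Output0Buffer.Tag = item.Trim()
    Next
End Sub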