Flat Files – Agile BI http://agilebi.com A community for sharing ideas about business intelligence development using agile methods. Sun, 28 Nov 2010 20:29:33 +0000 en-US hourly 1 https://wordpress.org/?v=6.2.2 Processing Flat Files with SSIS at SQLSaturday #61 http://agilebi.com/blog/2010/11/28/processing-flat-files-with-ssis-at-sqlsaturday-61/ Sun, 28 Nov 2010 20:29:33 +0000 http://6.1424 Continue reading ]]> I’m looking forward to speaking at SQLSaturday #61 (#sqlsat61) in Washington, DC this coming weekend (12/4/2010). As usual for a SQLSaturday, there are some great speakers scheduled to present.

I’ll be presenting on “Processing Flat Files with SSIS”, which goes over some tips and tricks for working with flat files. It’s always a fun session to give, as everyone seems to have encountered some troublesome flat files at some point in their career.

If you are in attendance at the event, please feel free to stop by and say hello.

]]>
Handling Flat Files with Varying Numbers of Columns http://agilebi.com/blog/2007/05/08/handling-flat-files-with-varying-numbers-of-columns/ Tue, 08 May 2007 02:00:00 +0000 http://6.1252 Continue reading ]]> 5/15 Update – I added Part 2 to show how to do the same thing with a Conditional Split and a Derived Column transform, per Jamie’s feedback (see the comments on this post).
A common question on the forums has been how to handle flat files that have a varying number of columns. For example, one row contains 3 columns, and another row may contain on two columns. The example below shows a sample file that uses a comma to delimit the columns, and a cursor return / line feed to delimit the row.
TestValue1,100,12/01/2007
TestValue2,200
TestValue3,300,12/01/2007
TestValue4,400,12/01/2007
TestValue5,500
TestValue6,600,12/01/2007
TestValue7,700,12/01/2007
TestValue8,800
TestValue9,900,12/01/2007
TestValue0,1000,12/01/2007

SSIS does not handle this scenario easily, due to the way it parses flat files. It parses by looking for the next column delimiter. The row delimiter is just the column delimiter for the last defined column. So, on our second line in the sample file, SSIS is looking for a comma instead of a CR/LF. The result of this is that the third row ends up combined with the second row, and we get something that looks like this:

I’m not going to go into a discussion about whether this is good or bad. This article is about how to work around it. If you’d like to see it changed in future versions of SSIS, please go to Connect (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=124478) and vote for it to be changed.
Now, onto the workaround. First, I’ve defined a flat file connection manager that treats each row as one column. I’m using the row delimiter (CR/LF) as the column delimiter.

If you are following along, your flat file should preview like this:
 
Next, in a data flow, I’ve added a flat file source that uses the connection manager. It is connected to a script component that is set as a Transform. The Line column is checked as an input.

In the Inputs and Outputs area, I’ve added three columns, for the three real columns in my flat file, and set the data types appropriately.

Finally, I added the following script to the task:
Public Class ScriptMain
    Inherits UserComponent
    Private columnDelimiter() As Char = CType(“,”, Char())

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As
        Input0Buffer)
        Dim rowValues As String()

        rowValues = Row.Line.Split(columnDelimiter)
        If rowValues.GetUpperBound(0) < 2 Then
            ‘Row is not complete – Handle error
            Row.Name_IsNull = True
            Row.Number_IsNull = True
            Row.Date_IsNull = True
        Else
            Row.Name = rowValues.GetValue(0).ToString()
            Row.Number = Convert.ToInt32(rowValues.GetValue(1))
            Row.Date = Convert.ToDateTime(rowValues.GetValue(2))
        End If
    End Sub

End Class
The columnDelimiter variable holds the value for the column delimiter – a comma in my case. The Split function parses the value contained in Line (the single column defined in the connection manager) and returns an array containing one element for each column in it. Since I’m expecting 3 columns, I’m performing a check to see if the array contains all three columns (.NET uses 0-based array indexes). If columns are missing, I have an error that needs to be handled. In this example, I am simply setting all my column values to NULL. The error handling could be enhanced by redirecting the rows to an error output, but I wanted to keep things simple. With this method, I could use a conditional split to filter out the rows with NULL.
Finally, if the correct number of columns are present, I’m setting the output columns created earlier with the values from the array. Notice that the Convert is necessary to make sure the value is the correct type.
That’s pretty much it. Depending on your needs, you may need to customize the script a bit to better handle error conditions, or reparsing the columns. I’ve attached the sample package and text file below. The sample is using the Trash Destination from Konesans, which you can download from www.SQLIS.com

Sample files here

As always, feedback is appreciated.

]]>