Archive for February 2008

February CTP for SQL Server 2008

The February CTP for SQL Server 2008 has been released, and is available to download at https://connect.microsoft.com/SQLServer/. Microsoft is also sponsoring a bug bash contest, with prizes for the most bugs, most critical bug, etc. The rules are posted here: https://connect.microsoft.com/SQLServer/content/content.aspx?ContentID=7734.

Bundling Errors in the Data Flow

I’ve posted about handling multiple errors in SSIS before, but that post was addressing it in the context of the control flow. What if you want to capture a series of data errors from the data flow? An example of this might be a requirement to capture and email all the rows that fail in a Lookup transform, so that the reference table can be updated with new values.

There are a number of ways to do this. You can send the error output of the Lookup to a Flat File destination, and then attach that to an email message using the Send Mail task. If you want to avoid writing anything to disk, you could send the error rows to a Recordset destination, and then use a Script Task to put them into a string variable to send. Or you could just write them directly to a string, using a Script destination.

I’ve put together a sample package that shows how to send the error rows to either a flat file or a string variable, and then how to email both. The control flow consists of a Data Flow task, and two Send Mail tasks (you won’t need both, they are just there for illustration purposes).

The data flow is fairly basic. The OLE DB Source uses a hard-coded query to return the numbers 1 through 5. The Lookup uses a hard-coded query to return a reference table with 1 and 2, so we expect 3, 4, and 5 to fail. The Row Count transform is used to capture the number of rows in error, so that we know if an email needs to be sent. The Multicast is used so that we can show both methods of handling the error rows. The Write Errors to File destination simply sends the error rows to a flat file.
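
For illustration, hard-coded queries along these lines would produce that setup (the exact statements in the sample may differ):

-- Hypothetical source query: returns the numbers 1 through 5
SELECT 1 AS KeyValue UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5

-- Hypothetical Lookup reference query: contains only 1 and 2,
-- so key values 3, 4, and 5 will fail the lookup
SELECT 1 AS KeyValue UNION ALL SELECT 2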

The Write Errors to String Script destination uses a StringBuilder to capture each error row, then writes the result to the errorRows variable at the end of data flow processing. A StringBuilder is used because it is more efficient than repeated string concatenation; if the package were processing a large number of rows, the error row string could get quite large.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Text
 
Public Class ScriptMain
    Inherits UserComponent
    'Accumulates one message per error row as the data flow runs
    Dim errorMessages As StringBuilder = New StringBuilder()

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        errorMessages.AppendLine(String.Format("Key value {0} not found in the reference table.", Row.KEYVALUE))
    End Sub

    Public Overrides Sub PostExecute()
        Dim vars As IDTSVariables90

        'Write the accumulated messages to the errorRows package variable
        Me.VariableDispenser.LockOneForWrite("errorRows", vars)
        vars("errorRows").Value = errorMessages.ToString()
        vars.Unlock()
    End Sub
 
End Class

Back in the control flow, each of the Send Mail tasks has a precedence constraint that also has an expression set on it. The expression is just checking to see if there were any errors. If not, we don’t need to send the emails.
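
Assuming the Row Count transform in the data flow writes to an integer variable named ErrorCount (a hypothetical name; check the sample for the actual one), the expression on each precedence constraint would look something like this:

@[User::ErrorCount] > 0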

The Send Mail – No Attachment task uses the string variable populated by the Script destination in the data flow as the Message Source for the email.

The Send Mail – Attachment task, on the other hand, uses a file as an attachment to the message. There is something to point out with this method: the package has a string variable defined that holds the path and filename of the error file. This variable is used in an expression on the ConnectionString property of the flat file connection manager, and in an expression on the FileAttachments property of this Send Mail task. This lets us set the path in one place, and avoid changing it in multiple locations if it ever needs to be updated.
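
With the variable named ErrorFile, both property expressions simply evaluate to it:

ConnectionString (flat file connection manager):  @[User::ErrorFile]
FileAttachments (Send Mail task):                 @[User::ErrorFile]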

That’s all there is to it. I’ve uploaded the sample package to my SkyDrive, located here. To run it, you’ll need to update the SMTP server address in the SMTP connection manager, and the ErrorFile variable to point to a valid path on your machine.

Adding Headers and Footers to Flat Files

A common question with SSIS is how to handle adding headers and footers to a flat file when their format is different from that of the data rows. This gets more difficult when the header or footer needs to contain data based on the actual data rows, such as a total or a record count.

Basically, this can be accomplished by creating the header or footer in a separate task, and then prefixing or appending it to the data. I’ve prepared a sample package that shows a couple of different ways to accomplish this. For footers, it shows how to use a second data flow to append a footer row onto the data file. There are two methods shown for adding headers. Method 1 uses a Script task to add the header by loading the file into a string, adding the header, and saving the file again. Method 2 creates a separate header file, and uses a file copy operation to concatenate the two files.

There are a couple of things to note about the package. There are three flat file connection managers defined in it.

Two of the connection managers (Destination and DestinationWithFooter) point to the same file, but have different column definitions. All three have their connection strings set by expressions that point to variables. If you want to change the target files, you need to modify the variables, not the connection managers.

The DestinationFile variable holds the path to a working file, which will hold the data before the headers and footers are added. The DestinationHeader variable holds the path to a working file used only for Method 2, which holds the header row before it is concatenated into the final file. The FinalFile variable holds the path to the final output file, which includes the header, data, and footer rows.

The Generate File data flow uses a Data Generator component from Konesans (a free download from www.sqlis.com). It simply generates some data, captures the number of rows using a Row Count transform, and outputs it to the Destination connection manager.

The Append Footer and Create a Header File data flows are very similar. They both use a Script source to generate a single row containing the footer or header data, respectively. The script is below.

Public Overrides Sub CreateNewOutputRows()
    Dim recordCount As Integer
    Dim vars As IDTSVariables90

    'Get the record count
    Me.VariableDispenser.LockOneForRead("RecordCount", vars)
    recordCount = CType(vars("RecordCount").Value, Integer)
    vars.Unlock()

    'Output one row with record count
    Output0Buffer.AddRow()
    Output0Buffer.FooterRow = String.Format("Footer Row Count: {0}", recordCount)
    Output0Buffer.SetEndOfRowset()
End Sub
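
The Create a Header File script is nearly identical; assuming its output column is named HeaderRow (a hypothetical name, as is the exact header text), only the lines that populate the row change:

    'Output one row with the header text instead of the footer text
    Output0Buffer.AddRow()
    Output0Buffer.HeaderRow = String.Format("Header Row Count: {0}", recordCount)
    Output0Buffer.SetEndOfRowset()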

The two differ in the way the destination is handled. In the Append Footer data flow, the Flat File destination is set to append to the file, rather than overwriting it. It uses the DestinationWithFooter connection manager, which points to the same file that the data was sent to in the Generate File data flow. The DestinationWithFooter connection manager defines the file as having a single column. That makes it easier to append the footer row directly to the file. After the Append Footer data flow task has run, there is a single file that has the data rows and a footer row.

In the Create a Header File data flow, the header row is sent to a new file. Then, in the Append Files Execute Process task, the header file and the data file are concatenated together using a copy command. This is Method 2 for adding the header.
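
For reference, the concatenation step amounts to a shell copy. A hypothetical Execute Process task configuration (in the sample, the actual paths come from the DestinationHeader, DestinationFile, and FinalFile variables) might be:

Executable: cmd.exe
Arguments:  /C copy Header.txt + Data.txt Final.txt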

For Method 1, the Add Header Script task uses the System.IO.File class and the System.Text.StringBuilder class to create a new file containing the header row and the data rows (which already include the footer).

'Requires Imports System.IO and Imports System.Text at the top of the script
Public Sub Main()
    Dim fileContents As New StringBuilder()
    Dim vars As Variables
    Dim recordCount As Integer
    Dim finalFile As String

    'Get the record count and the final file path
    Dts.VariableDispenser.LockForRead("RecordCount")
    Dts.VariableDispenser.LockForRead("FinalFile")
    Dts.VariableDispenser.GetVariables(vars)
    recordCount = CType(vars("RecordCount").Value, Integer)
    finalFile = CType(vars("FinalFile").Value, String)
    vars.Unlock()

    'Write header, then append file contents and write back out.
    fileContents.AppendLine(String.Format("Header Row Count 1: {0}", recordCount))
    fileContents.Append(File.ReadAllText(Dts.Connections("Destination").ConnectionString))
    File.WriteAllText(finalFile, fileContents.ToString())

    Dts.TaskResult = Dts.Results.Success
End Sub

Hopefully this example is helpful if you are trying to sort through the options for adding headers and footers. The example is available on my SkyDrive here.

Importing Files Using SSIS

A topic that has come up a few times recently is the idea of loading a set of files into a database using SSIS. One of the data flow components in SSIS is the Import Column transform, which allows you to load files into a binary column. There is a great video by Brian Knight that shows how to use it, and I recommend viewing it (12/01/2013 update: evidently, the video this post originally linked to is no longer available). If you are looking for a quick overview of the component, and a different approach for loading the list of filenames to import, then read on.

The Import Column transform works in the data flow, and imports one file for each row that passes through it. It expects an input column that contains the name of the file to import. It outputs a column of type DT_TEXT, DT_NTEXT, or DT_IMAGE that contains the file contents.

I’ve included a sample package with this post that uses the Import Column transform. It has a single data flow that uses a script component to get the list of files to import.

The package has two connection managers, one of which points to a SQL Server database where the files will be stored. The other is a File connection manager that points to a folder. This is the folder that we want to import the files from.

The script component was created as a Source. A single output column of type DT_WSTR was added to contain the filenames.

On the Connection Managers page, the File connection manager is specified so that it can be accessed from the script.

The script uses the Directory class from the System.IO namespace. By calling the GetFiles method, the code can iterate through all of the files in the directory, and output one row for each file.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.IO

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub CreateNewOutputRows()
        Dim fileName As String

        'Get the folder path from the File connection manager,
        'then output one row per file in that folder
        For Each fileName In Directory.GetFiles(Me.Connections.ImportFilesDir.AcquireConnection(Nothing).ToString())
            Output0Buffer.AddRow()
            Output0Buffer.Filename = fileName
        Next

        Output0Buffer.SetEndOfRowset()
    End Sub

End Class

The next component is the Import Column transform. Configuring it can be a little difficult; I found the Books Online documentation unclear on what had to be done. On the Input Columns tab, the column that contains the filename (including the path) to import needs to be selected.

On the Input and Output Properties tab, a new column was added to hold the binary contents of the file. When adding this column, make a note of the LineageID value, as it needs to be used in the next step.

After adding the output column, the input column (the one that contains the filename, not the file contents) needs to be selected. The LineageID from the previous step needs to be put into the FileDataColumnID property. This tells the component which column to populate with the file contents.

The OLE DB Destination is fairly straightforward, as it just maps the columns from the data flow to the database.
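
For illustration, a destination table for this package might look something like the following (hypothetical names; the actual sample table may differ):

CREATE TABLE dbo.ImportedFiles (
    FileName NVARCHAR(260) NOT NULL,   -- mapped from the script source's filename column
    FileContents VARBINARY(MAX) NULL   -- mapped from the Import Column output (DT_IMAGE)
)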

Hopefully this helps if you are working with the Import Column transform. The samples are located on my SkyDrive.