Importing Files Using SSIS

A topic that has come up a few times recently is the idea of loading a set of files into a database using SSIS. One of the data flow components in SSIS is the Import Column transform, which allows you to load files into a binary column. There is a great video by Brian Knight that shows how to use it, and I recommend viewing that  (12/01/2013 update: evidently, the video this post originally linked to is no longer available). If you are looking for a quick overview of the component, and a different approach for loading the list of filenames to import, then read on.

 

The Import Column transform works in the data flow, and imports one file for each row that is passed through it. It expects a column that contains the file name to import as an input. It outputs a column of type DT_TEXT, DT_NTEXT, or DT_IMAGE, that contains the file contents.

 

I’ve included a sample package with this post that uses the Import Column transform. It has a single data flow that uses a script component to get the list of files to import.

 

pic1[4]

 

The package has two connection managers, one of which points to a SQL Server database where the files will be stored. The other connection manager is a File connection manager, that is pointed to a folder. This is the folder that we want to import the files from.

 

p1

 

The script component was created as a Source. A single output column of type DT_WSTR was added to contain the filenames.

 

1

 

On the connection managers page, the File connection manager is specified so that it can be accessed from the script.

 

1[4]

 

The script uses the Directory class from the System.IO namespace. By calling the GetFiles method, the code can iterate through all of the files in the directory, and output one row for each file.

 

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.IO

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub CreateNewOutputRows()
        Dim fileName As String

        For Each fileName In Directory.GetFiles(Me.Connections.ImportFilesDir.AcquireConnection(Nothing).ToString())
            Output0Buffer.AddRow()
            Output0Buffer.Filename = fileName
        Next

        Output0Buffer.SetEndOfRowset()
    End Sub

End Class

 

The next component is the Import Columns component. Configuring it can be a little difficult. I found the Books Online documentation a little unclear on what had to be done. On the Input Columns tab, the column that contains the filename (including path) to import needs to be selected.

 

1[1]

 

On the Input and Output Properties tab, a new column was added to hold the binary contents of the file. When adding this column, make a note of the LineageID value, as it needs to be used in the next step.

 

1[1]

 

After adding the output column, the input column (that contains the filename, not the file contents), needs to be selected. The LinageID from the previous step needs to be put into the FileDataColumnID property. This tells the component which column to populate with the file contents.

 

1[3]

 

The OLE DB Destination is fairly straightforward, as it just maps the columns from the data flow to the database.

 

Hopefully this helps if you are working with the Import Column transform. The samples are located on my Skydrive.

13 Comments

  1. Xaq says:

    hi this is some thing i was looking for but my package is slightly different
    it needs file directories(which your package is giving with the file names)
    and i want file names for each one in a seperate column.

  2. Smith says:

    Hi,

    I hope this make sense,I have one question about the Error handling, I want to track all the bad rows which handle through error redirect to file or table but I want to create some functionality which automatically use the same error output file or table in the same package after correcting the errors apart and process it from the already processed data.

    Please help me out.

    Regards,
    Smith

  3. Bill says:

    I’ve used this and it works great. I was wondering if there is a way of also capturing the file datetime stamp to store with the file?

    • jwelch says:

      Glad it works for you. You can capture the Last Modified time for the file by adding a new output column to the Script Source for the timestamp, and retrieve it using this code in the script:

      DateTime lastModifiedDate = File.GetLastWriteTime(filename);

      Then you can just copy the lastModifiedDate value to the new output column.

  4. I think the video link is a bit out of date, it seems to direct to some sort of badly written english spam page.

  5. Phil says:

    This is a great tutorial, if a little light on detail, making it a bit hard going for an SSIS novice like myself. However, I’m getting an error when I try to execute the package, telling me that the column can’t be inserted because the conversion between types DT_WSTR and DT_IMAGE is not supported. I assumed this was because your tutorial is written on the premise that the files being imported are images, whereas mine are Word documents, so I changed the datatype of the Output Column in the Import Column component to DT_NTEXT. However, I still get the exact same error message (referring to DT_IMAGE, even though I’ve changed the datatype). And actually, my theory about why the package failed can’t have been correct anyway because the message is explicitly telling me that the conversion between DT_WSTR and DT_IMAGE isn’t possible, yet that’s the conversion type your tutorial seems to describe?

    Like I said, I’m a total novice at this, so I’m probably just being stupid!

    • jwelch says:

      Word docs would still be stored in a DT_IMAGE column – DT_NTEXT would be for Unicode text information. Work documents are stored in a binary format (zipped XML), which SSIS treats as the DT_IMAGE type.

      It sounds like something is misconfigured in the component – are you sure you used the right lineage IDs? My example isn’t actually converting any types. It tells SSIS that it should use the content of the filename column to look for a file on your disk, and read the binary content of the file into the file column. There is no data conversion happening.

      • Phil says:

        I changed it to DT_IMAGE but it still didn’t work. However, I seem to have it working simply by putting the code that’s run in the For Each… Next loop inside a Try…Catch statement. I haven’t even put anything in the Catch statement, but this alone seems to have allowed the package to execute successfully and, to my surprise after adding the Try-Catch, every file has been imported into the database.

        I’m now trying to figure out how to get the transformation to also include the file name and extension as strings to be inserted into columns associated with file record. Doesn’t seem to be anywhere near as easy as I would have hoped!

        Thank for replying, by the way!

Leave a Reply