Error Redirection with the OLE DB Destination

A question that comes up occasionally is how to handle errors that occur on the OLE DB Destination in the data flow. Errors that occur when using the OLE DB Destination to insert into a table are usually caused by primary key, foreign key, or check constraints violations. When one of these errors occurs, the default behavior of the OLE DB Destination is to fail and not insert any rows, even the good ones. So the question is, "How can you insert the good rows, while redirecting the bad rows somewhere else?"

There is always the option of preprocessing the data prior to sending it to the OLE DB Destination, to ensure that no constraints will be violated. However, this usually means that you are incurring the validation cost twice – once up front, and then again before you insert into the table. You can drop and recreate the constraints on the table, but this incurs some performance penalties itself. As usual with anything dealing with performance, your mileage may vary, so the best way to see which performs best for you is to test it in your environment.

Another approach is to leave the constraints intact on the destination table, and handle the constraint violations through error redirection. You can enable error redirection on the OLE DB Destination, but you have to change the "Maximum insert commit size" property to make it work. By default, this value is set so that the entire set of rows going to the database is committed at once. A failure to insert one rows means that no rows will be inserted.

image

If you set this to a value of one, the OLE DB Destination will only try to commit a single row at time. The problem with this is that single row inserts are painfully slow. So you probably want to set this value considerably higher, between 10,000 and 100,000 rows. You may have to try a few different values to determine what works best in your environment.

Now that you have the OLE DB Destination set up to commit multiple, smaller batches instead of one huge batch, you can enable error redirection. However, it will still redirect the entire batch of records that contain an error, not just the individual rows in error. For example, if you defined a batch size of 10,000 rows, and one row in the 10,000 row batch has an error, they will all be redirected to the error output. So how do you set it up to get all the good rows inserted, and get down to just the rows in error? We really want it to work on single row batches for error handling purposes, but as pointed out earlier, that can be very slow.

There is an approach that allows you to get the best of both worlds – single row error handling with good performance. It involves staging the inserts through multiple OLE DB Destinations, each with smaller batch sizes. The last one in the process should have a commit size of 1, so that you get single rows on error redirection.

To illustrate this technique, I’ve created a sample project. It has a few external dependencies that need to be set up before it will run. You need a database with three tables in it. One is a Numbers table, which are incredibly useful for quickly generating sample data. If you don’t have one, please take a look at this post from Adam Machanic to see why you need one. The second table is used to test inserting data into a table with constraints.

CREATE TABLE [dbo].[SampleTable](
    [SampleTableID] [int] NOT NULL,
    [Name] [varchar](50) NULL,
    [Description] [varchar](250) NULL,
    [CheckValue] [int] NULL,
 CONSTRAINT [PK_SampleTable] PRIMARY KEY CLUSTERED 
([SampleTableID] ASC )) ON [PRIMARY]
GO
 
ALTER TABLE [dbo].[SampleTable]  WITH CHECK ADD  CONSTRAINT [CK_SampleTable] CHECK  (([CheckValue]>=(0) AND [CheckValue]<=(9)))
GO

 

The third table is a duplicate of the second table, but with no constraints. This table is used to catch the error rows.

CREATE TABLE [dbo].[SampleTableError](
    [SampleTableID] [int] NOT NULL,
    [Name] [varchar](50) NULL,
    [Description] [varchar](250) NULL,
    [CheckValue] [int] NULL
) ON [PRIMARY]
GO

The package control flow truncates the destination table (SampleTable), and then runs a data flow.

image

The data flow consists of a source that generates 5000 rows of data, with two of the rows violating constraints on the destination table. The OLE DB Source is connected to a OLE DB Destination that is pointed to the SampleTable, with a Maximum Insert Commit Size set to 1000 rows. The error output is connected to a second OLE DB Destination, that inserts into the same table (SampleTable). The second destination has it’s Maximum Insert Commit Size set to 1. Finally, the error output from that OLE DB Destination is connected to a third one, which inserts the error rows into SampleTableError.

image

The first OLE DB Destination attempts to insert a batch, giving us good performance. If the batch is successful, it moves on to the next batch. This keeps the inserts happening quickly. However, if the batch has one or more error rows, the entire batch is redirected to the second OLE DB Destination. This one inserts rows one at a time, giving us the single row error handling that we wanted. Good rows are inserted, but the error rows are redirected to the third OLE DB Destination to be written to an error table. It doesn’t have to be an OLE DB Destination to handle the error rows, you could use a Flat File Destination or a script component to process them.

You can modify this technique by using additional stages of OLE DB Destinations (for example, to go from 50,000 rows to 10,000 to 1) but I try not to go beyond three levels to keep things understandable. Your needs may vary, depending on the performance and number of rows you are processing.

I’ve uploaded the sample package to my SkyDrive. This one is done with SSIS 2008, but the same technique works in 2005.

16 Comments

  1. http:// says:

    In my experience, this works very well unless you turn on Transactions on the package–i.e. set the Transaction property to Required. In my experience, if you turn on transactions, then you will get an error because of the redirection to the second ole db destination.

    Am I correct? If so, is there a way to get the benefit of your solution and get transaction support as well?

  2. jwelch says:

    Hmmm, I rarely use transactions in SSIS. I prefer to control them myself for a number of reasons (which would probably make for a good post itself).

    I’ll have to do some testing. It doesn’t surprise me that it doesn’t work with transactions, but off the top of my head, I’m not sure if there is a workaround.

  3. Sreehari says:

    Hi,

    One thing i dint understand is,Instead of two OLEDB Destination we can have one OLEDB Destination with the Commit size as 1. This also works fine. Why do we need two Destination pointing to same destination with different commit size. Can you please explain.

    • jwelch says:

      Because a commit size of 1 means you are performing single row inserts – which can be very slow. By stepping from a larger commit size to a smaller one, your get fast inserts for most batches, and only single row inserts for the batch that contains the errors.

  4. thelondon says:

    Always been frustrated with the batch fail problem, and have always used your approach.
    Have a flat file (270K rows) and im inserting into a SQL ole db destination
    Mics=10000 rows/batch = 10000 and redirecting into error table.
    Am finding that only the 5 rows fail and not a 10k batch. (Was expecting whole batch to fail)
    Have MS fixed this in 2008??? Am very puzzled

  5. Oleg says:

    I guess it is good solution. I use it in my packages. But i have a problem with truncation of data. When it happens the data are not redirected to error output ^(

  6. Satish says:

    Thank you so much :)
    It worked for me very well.

    Thank You again,
    Satish

  7. sairam says:

    hi ,

    i am currently using transaction in a seq container.THe DFT load in the sequence container has to load 4 lac records.
    I am using Fast load in the destination and would like to redirect error rows to another table.
    however I am getting an error while doing this.
    Any work arounds possible to implement redirect rows.

  8. sairam says:

    HI
    I tried using the above mentioned method.
    however when the error row is encountered the ELT goes into an infinite loop.
    Any suggestion.

  9. Frank says:

    Great way to deal with it!!

    Thank you!

  10. Naresh says:

    Thanks Johh!, we implemented this approach in one of our packages and it worked great!

Leave a Reply