Posts tagged ‘Error Handling’

Error Redirection with the OLE DB Destination

A question that comes up occasionally is how to handle errors that occur on the OLE DB Destination in the data flow. Errors that occur when using the OLE DB Destination to insert into a table are usually caused by primary key, foreign key, or check constraints violations. When one of these errors occurs, the default behavior of the OLE DB Destination is to fail and not insert any rows, even the good ones. So the question is, "How can you insert the good rows, while redirecting the bad rows somewhere else?"

There is always the option of preprocessing the data prior to sending it to the OLE DB Destination, to ensure that no constraints will be violated. However, this usually means that you are incurring the validation cost twice – once up front, and then again before you insert into the table. You can drop and recreate the constraints on the table, but this incurs some performance penalties itself. As usual with anything dealing with performance, your mileage may vary, so the best way to see which performs best for you is to test it in your environment.

Another approach is to leave the constraints intact on the destination table, and handle the constraint violations through error redirection. You can enable error redirection on the OLE DB Destination, but you have to change the "Maximum insert commit size" property to make it work. By default, this value is set so that the entire set of rows going to the database is committed at once. A failure to insert one rows means that no rows will be inserted.

image

If you set this to a value of one, the OLE DB Destination will only try to commit a single row at time. The problem with this is that single row inserts are painfully slow. So you probably want to set this value considerably higher, between 10,000 and 100,000 rows. You may have to try a few different values to determine what works best in your environment.

Now that you have the OLE DB Destination set up to commit multiple, smaller batches instead of one huge batch, you can enable error redirection. However, it will still redirect the entire batch of records that contain an error, not just the individual rows in error. For example, if you defined a batch size of 10,000 rows, and one row in the 10,000 row batch has an error, they will all be redirected to the error output. So how do you set it up to get all the good rows inserted, and get down to just the rows in error? We really want it to work on single row batches for error handling purposes, but as pointed out earlier, that can be very slow.

There is an approach that allows you to get the best of both worlds – single row error handling with good performance. It involves staging the inserts through multiple OLE DB Destinations, each with smaller batch sizes. The last one in the process should have a commit size of 1, so that you get single rows on error redirection.

To illustrate this technique, I’ve created a sample project. It has a few external dependencies that need to be set up before it will run. You need a database with three tables in it. One is a Numbers table, which are incredibly useful for quickly generating sample data. If you don’t have one, please take a look at this post from Adam Machanic to see why you need one. The second table is used to test inserting data into a table with constraints.

CREATE TABLE [dbo].[SampleTable](
    [SampleTableID] [int] NOT NULL,
    [Name] [varchar](50) NULL,
    [Description] [varchar](250) NULL,
    [CheckValue] [int] NULL,
 CONSTRAINT [PK_SampleTable] PRIMARY KEY CLUSTERED 
([SampleTableID] ASC )) ON [PRIMARY]
GO
 
ALTER TABLE [dbo].[SampleTable]  WITH CHECK ADD  CONSTRAINT [CK_SampleTable] CHECK  (([CheckValue]>=(0) AND [CheckValue]<=(9)))
GO

 

The third table is a duplicate of the second table, but with no constraints. This table is used to catch the error rows.

CREATE TABLE [dbo].[SampleTableError](
    [SampleTableID] [int] NOT NULL,
    [Name] [varchar](50) NULL,
    [Description] [varchar](250) NULL,
    [CheckValue] [int] NULL
) ON [PRIMARY]
GO

The package control flow truncates the destination table (SampleTable), and then runs a data flow.

image

The data flow consists of a source that generates 5000 rows of data, with two of the rows violating constraints on the destination table. The OLE DB Source is connected to a OLE DB Destination that is pointed to the SampleTable, with a Maximum Insert Commit Size set to 1000 rows. The error output is connected to a second OLE DB Destination, that inserts into the same table (SampleTable). The second destination has it’s Maximum Insert Commit Size set to 1. Finally, the error output from that OLE DB Destination is connected to a third one, which inserts the error rows into SampleTableError.

image

The first OLE DB Destination attempts to insert a batch, giving us good performance. If the batch is successful, it moves on to the next batch. This keeps the inserts happening quickly. However, if the batch has one or more error rows, the entire batch is redirected to the second OLE DB Destination. This one inserts rows one at a time, giving us the single row error handling that we wanted. Good rows are inserted, but the error rows are redirected to the third OLE DB Destination to be written to an error table. It doesn’t have to be an OLE DB Destination to handle the error rows, you could use a Flat File Destination or a script component to process them.

You can modify this technique by using additional stages of OLE DB Destinations (for example, to go from 50,000 rows to 10,000 to 1) but I try not to go beyond three levels to keep things understandable. Your needs may vary, depending on the performance and number of rows you are processing.

I’ve uploaded the sample package to my SkyDrive. This one is done with SSIS 2008, but the same technique works in 2005.

Bundling Errors in the Data Flow

I’ve posted about handling multiple errors in SSIS before, but that post was addressing it in the context of the control flow. What if you want to capture a series of data errors from the data flow? An example of this might be a requirement to capture and email all the rows that fail in a Lookup transform, so that the reference table can be updated with new values.

There are a number of ways to do this. You can send the error output of the Lookup to a Flat File destination, and then attach that to an email message using the Send Mail task. If you want to avoid writing anything to disk, you could send the error rows to a Recordset destination, and then use a Script Task to put them into a string variable to send. Or you could just write them directly to a string, using a Script destination.

I’ve put together a sample package that shows how to send the error rows to either a flat file or a string variable, and then how to email both. The control flow consists of a Data Flow task, and two Send Mail tasks (you won’t need both, they are just there for illustration purposes).

image

The data flow is fairly basic. The OLE DB Source uses a hard-coded query to return the numbers 1 through 5. The Lookup uses a hard-coded query to return a reference table with 1 and 2, so we expect 3, 4, and 5 to fail. The Row Count transform is used to capture the number of rows in error, so that we know if an email needs to be sent. The Multicast is used so that we can show both methods of handling the error rows. The Write Errors to File destination simply sends the error rows to a flat file.

image

The Write Errors to String Script Destination uses a StringBuilder class to capture each row, then writes it to the errorRows variable at the end of data flow processing. The StringBuilder is more efficient for handling large strings, which is why it is used here. If the package was processing a large number of rows, the error row string could get quite large.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Text
 
Public Class ScriptMain
    Inherits UserComponent
    Dim errorMessages As StringBuilder = New StringBuilder()
 
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        errorMessages.AppendLine(String.Format("Key value {0} not found in the reference table.", Row.KEYVALUE))
    End Sub
 
    Public Overrides Sub PostExecute()
        Dim vars As IDTSVariables90
 
        Me.VariableDispenser.LockOneForWrite("errorRows", vars)
        vars("errorRows").Value = errorMessages.ToString()
        vars.Unlock()
    End Sub
 
End Class

Back in the control flow, each of the Send Mail tasks has a precedence constraint that also has an expression set on it. The expression is just checking to see if there were any errors. If not, we don’t need to send the emails.

image

The Send Mail – No Attachment task uses the string variable populated by the Script destination in the data flow as the Message Source for the email.

image

The Send Mail – attachment, on the other hand, uses a file as an attachment to the message. There is something to point out with this method. The package has a string variable defined that holds the path and filename of the error file. This variable is used in an expression on the ConnectionString property of the flat file connection manager, and in an expression on the FileAttachments property of this Send Mail task. This lets us set the path in one place, and avoiding changing it in multiple locations if it ever needs to be updated.

image

That’s all there is to it. I’ve uploaded the sample package to my SkyDrive, located here. To run it, you’ll need to update the SMTP server address in the SMTP connection manager, and the ErrorFile variable to point to a valid path on your machine.

Handling Multiple Errors in SSIS Revisited

Previously, I posted about how multiple errors in SSIS can be combined into a single unit for emailing or logging. Recently, a thread on the MSDN forums pointed out that the example wasn’t quite complete. While it does show how to collect all the error messages and get them into a single string, it doesn’t show how to email or log that string only once. So I’m giving it another shot, and show two options for handling this. This post does build on the previous one, so please read it before proceeding.


If you download the sample package in the previous post and examine it, you’ll see that there is a script task in the OnPostExecute event. This script task is where the error messages are assembled, and the code to deliver the bundled error message would be placed. Please note – the script does not actually email or log the message, that was left open for the reader to implement.


If you run the package, you’ll notice that the OnPostExecute event fires twice. It’s firing once for the data flow, and once for the package. Since the OnPostExecute event handler is where the script is located, this causes the message to be delivered twice – not what we want. To work around this, we need to make sure the OnPostExecute only fires once. One way to handle this is to set the Propagate system variable to false. The Propagate variable controls whether the event is raised to the next container. In this case, if Propagate is set to false on the data flow’s OnPostExecute event, the package’s OnPostExecute will only be fired once.


image


However, this approach requires that all tasks in the package need to have the Propagate variable set to false in their OnPostExecute event handlers. To work around this, I recommend incorporating a single Sequence Container, with the Propagate variable set to false on it’s OnPostExecute. Any tasks in the package should be added inside the Sequence Container. If you do this, the Propagate variable only needs to be set once.


image


Another approach is to eliminate the use of the OnPostExecute event altogether. To do this, the Script Task needs to be moved to the control flow, and the OnPostExecute event handler can be deleted altogether. A Sequence Container should still be used to hold all of the tasks that need to be grouped for error handling. The Script Task should be connected to the Sequence Container with a Failure constraint.


image


This approach also ensures that the Script Task will only execute once, if there is a failure in the Sequence Container.


Hopefully, this helps clarify how to leverage collecting the error messages to only send a single error result. I’ve uploaded the samples here, so please feel free to take a look at them and let me know if you have questions.

Handling Multiple Errors in SSIS

[edited on 12/14/2007 to correct an error in the text around string handling - the samples were not modified]
One actual failure in SSIS can trigger a whole series of error messages. For example, failure to convert a column value from a string to an integer in a Derived Column transform generates the following messages:
[Data Conversion [70]] Error: Data conversion failed while converting column “Fiscal year” (18) to column “NumericFiscalYear” (83). The conversion returned status value 2 and status text “The value could not be converted because of a potential loss of data.”.
[Data Conversion [70]] Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The “output column “NumericFiscalYear” (83)” failed because error code 0xC020907F occurred, and the error row disposition on “output column “NumericFiscalYear” (83)” specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
[DTS.Pipeline] Error: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component “Data Conversion” (70) failed with error code 0xC0209029. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
[DTS.Pipeline] Error: SSIS Error Code DTS_E_THREADFAILED. Thread “WorkThread0″ has exited with error code 0xC0209029. There may be error messages posted before this with more information on why the thread has exited.
If you are logging errors to a flat file or an error logging table, then recording each error may be fine. However, if you’re writing the errors to the Windows event log, or sending them via email, you may not want to record multiple messages each time an error occurs. You might want to record only the first message, or you might want to group all the errors into a single log entry or email. Fortunately, the event model in SSIS allows you to easily customize how errors are handled.
I’ve put together a small sample package that shows how you might accomplish this. The package contains a single data flow that loads a text file, attempts to convert a column from string to numeric, and writes it to a Trash destination (see www.SQLIS.com to get this component).
The text file has an invalid value in one of the columns, which will cause the data flow to fail, and generate the four messages listed above. The package is set up to capture all of the error messages generated, store them in a collection, and concatenate them into a single string when the package is finished executing. Once that is done, the resulting string could be emailed or recorded to a log.
As mentioned, the data flow is very straightforward:

I’ve also created two variables at the package level: errorMessages as an Object, and emailText as a String. I’ll explain why later in the post.
The real work occurs in the event handlers. SSIS raises events for all executables(packages and tasks are both executables). The event we’re interested in is the OnError event, which is raised once for each error that occurs.
You get to the event handlers by selecting the Event Handlers tab in the SSIS designer. Once there, the Executable for which you want to capture events needs to be selected.

Since I want to handle errors for anything in the package, I’m setting the executable to CaptureErrors (the name of the package). By default, any event raised by a child executable (that is, an executable that is nested inside another executable) will also be raised in its parent. You can disable that behavior by setting the Propagate system variable, but that’s a topic for another post. I’m also using “OnError” from the list of events and have added a Script Task to the event handler.

The Script Task has two variables passed in: the System::ErrorDescription, which contains the text of the error message, and User:errorMessages, which I’ll use to keep track of all the error messages.

Here’s the script used in the Script Task:
Dim messages As Collections.ArrayList
Try
messages = CType(Dts.Variables(“errorMessages”).Value, Collections.ArrayList)
Catch ex As Exception
messages = New Collections.ArrayList()
End Try

messages.Add(Dts.Variables(“ErrorDescription”).Value.ToString())
Dts.Variables(“errorMessages”).Value = messages
Dts.TaskResult = Dts.Results.Success
I’m first attempting to retrieve the ArrayList from the errorMessages variable. If the value of the variable can’t be cast to an ArrayList, it indicates that it hasn’t been initialized yet. Once that has been handled, the error description is added to the ArrayList. This handles capturing the list of all error messages.
The next step is to process all the messages in order to email or log them. Since I only want to do this once, I’m using the OnPostExecute event, which fires when the executable is finished running.

There is another Script Task present in this event handler. This one has the User::errorMessages and User:emailText variables passed in.

The script in this task is concatenating a long string based on the error messages captured and returning it in the emailText variable:
Dim errorDesc As String
Dim messages As Collections.ArrayList

Try
messages = CType(Dts.Variables(“errorMessages”).Value, Collections.ArrayList)
Catch ex As Exception
‘If there is an exception – the object was never initialized, so there were no errors
Return
End Try

For Each errorDesc In messages
Dts.Variables(“emailText”).Value = Dts.Variables(“emailText”).Value.ToString + errorDesc + vbCrLf
Next

Dts.TaskResult = Dts.Results.Success
Once that has been done, the resulting string could be emailed or logged as desired. Since SSIS can generate fairly verbose error messages, I chose to store the messages in an ArrayList object. I then use the ArrayList to build the actual string message.
Hopefully, this information is helpful to anyone who wants to customize SSIS event handling. I’ve attached the sample package, and the text file used in the sample. If you have any feedback or suggestions for improvement, please leave them in the comments.