Agile BI

The Microsoft BI Conference

Posted on September 18, 2008 by jwelch

I’m doing a couple of chalk talks at the Microsoft BI Conference, occurring in Seattle, WA on October 6-8. The chalk talks are a different format than the typical presentation: smaller audience, no (or very few) slides, and more conversational in format. I’m doing one on unit testing for SSIS, and another on building custom components for SSIS. I really enjoy the chalk talk format, and I’m looking forward to some good conversations. If you are planning on being at the conference, please stop by.

Posted in Uncategorized | Comments Off

SQL Saturday in Greenville, SC

Posted on September 18, 2008 by jwelch

There’s a SQL Saturday coming up in Greenville, SC on October 11th. It’s a free, day long event with a number of great speakers doing sessions. I don’t include myself in the great speaker category, but I was asked to present a couple of sessions on SSIS, including one on data profiling in the ETL process, one of my favorite topics.

Again, it’s free and there’s some great content, so if you are in the area or close by, you should check it out.

Posted in Uncategorized | Comments Off

The SSWUG BI Virtual Conference Is Moving…

Posted on September 18, 2008 by jwelch

but not very far, just to Nov. 5-7. In order to provide more value to attendees, the BI conference is being run at the same time as the SharePoint, SQL, and .NET conferences. So now, instead of just having access to the BI content, you’ll be able to pick and choose session from all four conferences. Personally, I love the idea of 3 days of solid BI content, but if you like a little more variety, this should be great.

Check in at http://www.vconferenceonline.com/Business-Intelligence/ to see the updated information.

Posted in Uncategorized | Comments Off

Presenting at the Triad SQL Server User Group

Posted on September 16, 2008 by jwelch

I’m presenting at the Triad SQL Server User Group tomorrow night on Migrating to Reporting Services 2008. It was a lively group last time I was there, and I’m looking forward to visiting them again. If you are in the area, please stop by.

Here’s the abstract for the presentation:

With the new and enhanced capabilities that Reporting Services 2008 offers, more and more companies are evaluating a migration from other reporting technologies to Reporting Services. This session will cover the commonly needed steps in a migration process from both project planning and technical delivery perspectives. Commonly used patterns in migrations will also be discussed. There will be specific focus on how ad-hoc reporting environments can be set up while leveraging the new features in 2008.

They are currently having some problems with their site (at www.triadsql.com) but if you need directions, you can get them from the Triad Developers Group site here: http://www.triaddev.org/directions/. They meet at the same location.

Posted in Uncategorized | Comments Off

Error Redirection with the OLE DB Destination

Posted on September 5, 2008 by jwelch

A question that comes up occasionally is how to handle errors that occur on the OLE DB Destination in the data flow. Errors that occur when using the OLE DB Destination to insert into a table are usually caused by primary key, foreign key, or check constraints violations. When one of these errors occurs, the default behavior of the OLE DB Destination is to fail and not insert any rows, even the good ones. So the question is, "How can you insert the good rows, while redirecting the bad rows somewhere else?"

There is always the option of preprocessing the data prior to sending it to the OLE DB Destination, to ensure that no constraints will be violated. However, this usually means that you are incurring the validation cost twice – once up front, and then again before you insert into the table. You can drop and recreate the constraints on the table, but this incurs some performance penalties itself. As usual with anything dealing with performance, your mileage may vary, so the best way to see which performs best for you is to test it in your environment.

Another approach is to leave the constraints intact on the destination table, and handle the constraint violations through error redirection. You can enable error redirection on the OLE DB Destination, but you have to change the "Maximum insert commit size" property to make it work. By default, this value is set so that the entire set of rows going to the database is committed at once. A failure to insert one rows means that no rows will be inserted.

If you set this to a value of one, the OLE DB Destination will only try to commit a single row at time. The problem with this is that single row inserts are painfully slow. So you probably want to set this value considerably higher, between 10,000 and 100,000 rows. You may have to try a few different values to determine what works best in your environment.

Now that you have the OLE DB Destination set up to commit multiple, smaller batches instead of one huge batch, you can enable error redirection. However, it will still redirect the entire batch of records that contain an error, not just the individual rows in error. For example, if you defined a batch size of 10,000 rows, and one row in the 10,000 row batch has an error, they will all be redirected to the error output. So how do you set it up to get all the good rows inserted, and get down to just the rows in error? We really want it to work on single row batches for error handling purposes, but as pointed out earlier, that can be very slow.

There is an approach that allows you to get the best of both worlds – single row error handling with good performance. It involves staging the inserts through multiple OLE DB Destinations, each with smaller batch sizes. The last one in the process should have a commit size of 1, so that you get single rows on error redirection.

To illustrate this technique, I’ve created a sample project. It has a few external dependencies that need to be set up before it will run. You need a database with three tables in it. One is a Numbers table, which are incredibly useful for quickly generating sample data. If you don’t have one, please take a look at this post from Adam Machanic to see why you need one. The second table is used to test inserting data into a table with constraints.

CREATE TABLE [dbo].[SampleTable](

    [SampleTableID] [int] NOT NULL,

    [Name] [varchar](50) NULL,

    [Description] [varchar](250) NULL,

    [CheckValue] [int] NULL,

 CONSTRAINT [PK_SampleTable] PRIMARY KEY CLUSTERED

([SampleTableID] ASC )) ON [PRIMARY]

GO

ALTER TABLE [dbo].[SampleTable]  WITH CHECK ADD  CONSTRAINT [CK_SampleTable] CHECK  (([CheckValue]>=(0) AND [CheckValue]<=(9)))

GO

The third table is a duplicate of the second table, but with no constraints. This table is used to catch the error rows.

CREATE TABLE [dbo].[SampleTableError](

    [SampleTableID] [int] NOT NULL,

    [Name] [varchar](50) NULL,

    [Description] [varchar](250) NULL,

    [CheckValue] [int] NULL

) ON [PRIMARY]

GO

The package control flow truncates the destination table (SampleTable), and then runs a data flow.

The data flow consists of a source that generates 5000 rows of data, with two of the rows violating constraints on the destination table. The OLE DB Source is connected to a OLE DB Destination that is pointed to the SampleTable, with a Maximum Insert Commit Size set to 1000 rows. The error output is connected to a second OLE DB Destination, that inserts into the same table (SampleTable). The second destination has it’s Maximum Insert Commit Size set to 1. Finally, the error output from that OLE DB Destination is connected to a third one, which inserts the error rows into SampleTableError.

The first OLE DB Destination attempts to insert a batch, giving us good performance. If the batch is successful, it moves on to the next batch. This keeps the inserts happening quickly. However, if the batch has one or more error rows, the entire batch is redirected to the second OLE DB Destination. This one inserts rows one at a time, giving us the single row error handling that we wanted. Good rows are inserted, but the error rows are redirected to the third OLE DB Destination to be written to an error table. It doesn’t have to be an OLE DB Destination to handle the error rows, you could use a Flat File Destination or a script component to process them.

You can modify this technique by using additional stages of OLE DB Destinations (for example, to go from 50,000 rows to 10,000 to 1) but I try not to go beyond three levels to keep things understandable. Your needs may vary, depending on the performance and number of rows you are processing.

I’ve uploaded the sample package to my SkyDrive. This one is done with SSIS 2008, but the same technique works in 2005.

Posted in Uncategorized | Comments Off

ssisUnit 1.0 Is Available

Posted on August 31, 2008 by jwelch

The 1.0 bits for ssisUnit are now on CodePlex. The Release page for it includes versions for SQL Server 2005 and 2008.

The biggest feature in this release is a GUI for creating the unit tests. The test files are still XML, but the GUI abstracts away a lot of the detail involved in creating a test. A couple of things to be aware of with the GUI, however:

It does not check to see if your test suite has been saved before closing. So remember, "Save Early, Save Often" 🙂
It’s possible (even likely) that you will see some errors while using the GUI. Being the most recently developed part of ssisUnit, it’s had the least testing. So if you see an error, please open an issue on CodePlex.

If you are reading this, and wondering why anyone would want to unit test an SSIS package, please check out the TechEd Online presentation I posted about recently, Testing the SQL Database. You might also consider attending the Unit Testing an SSIS Package session at the SSWUG BI Virtual Conference. I’ll be covering the basics of unit testing SSIS packages, and an overview of using ssisUnit to automate the process.

If you are reading this, and wondering why unit test at all, I highly encourage you to check out Test Driven Development, By Kent Beck. Not a traditional read for database developers, but it will give you a completely different take on the importance and benefits of unit testing.

Posted in Uncategorized | Comments Off

Configuration Approaches In SSIS

Posted on August 31, 2008 by jwelch

A number of the SSIS MVPs were asked to contribute a white paper to MSDN a few months ago, and those articles were published recently. You’ll notice a common thread in most of the topics:

Considerations for High Volume ETL Using SQL Server Integration Services by Pat Martin
Reusing Connections with Data Sources and Configurations by Matthew Roche
Defining a Configuration Approach for Integration Services Packages by John Welch
Best Practices for Integration Services Configurations by Jamie Thomson
Understanding Integration Services Package Configurations by Rafael Salas (coming soon)

They are all great articles, and well worth reading.

Judging by the volume of questions related to configurations on the MSDN Forums, they are one of the more confusing features of SSIS. While there is some overlap in the articles on configurations, there’s valuable information that is unique to each, and will help give you a more rounded view of how to use them. As there are a lot of options and tradeoffs with configurations, it’s no surprise that everyone uses them slightly differently.

I hope the articles are valuable. Please leave comments if you have an approach to configurations that works well for you.

Posted in Uncategorized | Comments Off

Testing the SQL Database at TechED

Posted on August 19, 2008 by jwelch

While at TechEd this summer, I participated in a panel discussion on testing the database and related technologies. The panel also included Gert Drapers and Jamie Laflen of the Visual Studio Team System – Database Edition team, David Reed (who manages the SQL Server community samples on CodePlex), and Adam Machanic as the moderator. The discussion covered some of the pros and cons of testing data, and some of the common issues encountered. It was also a chance to talk about testing related areas (like SSIS packages), of which I am a big fan. If you are interested in seeing it, it was just put online.

If you are interested in unit testing SSIS packages, keep an eye on www.codeplex.com/ssisunit. I should have a new release going up soon, which adds some much needed features, and a much more extensible framework for adding new functionality.

Posted in Uncategorized | Comments Off

Using the XML Task to Run an XPath Query

Posted on July 23, 2008 by jwelch

I came across a question about this the other day, and thought I’d share the answer here, as there doesn’t seem to be much information out there about it. The questioner wanted to know how to read a portion of an XML document into a variable. Fortunately, the XML Task allows you to do this by running an XPath query. To configure the task, follow these steps:

Set the operation type to XPATH.
Specify a source for the XML (file, variable, or directly input).
Set the OperationResult to save the results in a variable.
Set the SecondOperandType to Direct Input.
Specify the XPath query in the SecondOperand.
Set the PutResultInOneNode option to False.
Set the XPathOperation to Node list.

If, after configuring this, the variable is empty, it’s likely that the XPath isn’t correct. This can be caused by a number of things. A good way to test your XPath is this web page: http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm It allows you to upload your XML file, and then run your XPath query against it.

Another common problem occurs when the XML file uses namespaces. If it does, you can try using XPath like this: //*[local-name()=’Setup’ and namespace-uri()=’http://test.org’]

Hopefully this helps, if you are working with XPath in SSIS.

Posted in Uncategorized | Comments Off

Presenting at the SSWUG Virtual BI Conference

Posted on July 20, 2008 by jwelch

I’m going to be presenting a few sessions at the upcoming SSWUG Virtual Business Intelligence Conference. It’s occurring from September 24-26, 2008. It has a very impressive list of speakers (I’m really not sure how I made it on the list – it’s quite an honor). I’ll be presenting on configurations in SSIS, unit testing SSIS (one of my favorite topics), and the new Report Builder functionality in SQL Server 2008.

I’m looking forward to it, but it will be a new experience for me. The sessions will be recorded in advance, and then the speakers will be online during their presentation time slot to answer questions from the audience. It sounds interesting, and since it’s being broadcast online, there’s potential for a much wider range of attendees.

If you’re interested in the conference, please check out the web site.

Posted in Uncategorized | Comments Off

The Microsoft BI Conference

SQL Saturday in Greenville, SC

The SSWUG BI Virtual Conference Is Moving…

Presenting at the Triad SQL Server User Group

Error Redirection with the OLE DB Destination

ssisUnit 1.0 Is Available

Configuration Approaches In SSIS

Testing the SQL Database at TechED

Using the XML Task to Run an XPath Query

Presenting at the SSWUG Virtual BI Conference

Recent Posts

Recent Comments

Archives

Categories

Meta