Posts tagged ‘SQL Server 2008’

What’s Happening on SQL Server When My Package Fails?

We encountered an issue recently in an ETL process during the extract portion. When extracting from one very large table, using a NOLOCK hint, we were encountering errors indicating that the data was changing while we were extracting it. Since no other processes were supposed to be using the table during the same time window as the ETL, this was a source of some confusion. Since there was nothing indicating what, if anything, was changing the data, we needed a quick way to see what was going on in the database at the point the package failed.

Normally, when troubleshooting database activity problems, I’d just run sp_who or sp_who2. However, since the problem was occurring between 2 and 3 am, nobody was terribly excited around getting up to monitor the server. We didn’t really want to leave Profiler running all night either. So, we set up the package to log the current database activity whenever an error occurred. Basically, in the OnError event handler, we just added a data flow. The data flow runs the sp_who stored procedure to get the current activity on the database, and saves it to a text file.

This was pretty simple, and I’ve created a sample package to show it. The main control flow has a script task that generates an error, purely for the purpose of firing the OnError event handler.

image

The event handler does a check to make sure this is the first time it’s fired, since we didn’t want to log the activity multiple times. It then runs a data flow, and once it completes, sets a flag to ensure it doesn’t doesn’t fire again.

image

The data flow runs sp_who. While sp_who2 gives better information, it doesn’t provide metadata in an easily consumable format for SSIS. sp_who, on the other hand, works right off, so we went with it for simplicity. The output is sent to a text file.

image

It was pretty simple to implement, and it served the purpose. By reviewing the information logged, we could tell what other processes were running, and what computer it originated from. If you ever encounter a similar situation, hopefully this can help. The template package is located on my SkyDrive.

SQL Server Launch Event

The launch event in Charlotte went very well – we had over 100 people in attendance, and the sessions were packed. We had a lot of good information on the new features in SQL Server 2008, covering the new T-SQL features, server consolidation, dimensional modeling, Analysis Services, and Reporting Services.

If you have follow up questions from any of the presentations, or are interested in seeing more information on the topics, send me an email and we can see about lining the presenter up for a more in-depth presentation.

SQL Server 2008 Firestarter Event

There is a SQL Server 2008 Firestarter event at the Charlotte Microsoft campus next Thursday, January 15th. If you want to learn more about the new features in 2008, or just get a chance to network, it’s a great opportunity. We’ll have 5 SQL Server MVPs present, so it will be a good chance to ask some questions as well. If you are interested, please register at this link: http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032400468&Culture=en-US

Presentation at the Triad SQL Server User Group

On May 15th I’ll be presenting at the Triad SQL Server User Group, located in High Point, NC. The presentation will be focusing on new features in SQL Server 2008 for business intelligence and data warehousing. If you’re in the area, please stop in and introduce yourself.

February CTP for SQL Server 2008

The February CTP for SQL Server 2008 has been released, and is available to download at https://connect.microsoft.com/SQLServer/. Microsoft is also sponsoring a bug bash contest, with prizes for the most bugs, most critical bug, etc. The rules are posted here: https://connect.microsoft.com/SQLServer/content/content.aspx?ContentID=7734.

The November CTP for SQL Server 2008 Is Out!

The download link is here: http://www.microsoft.com/downloads/details.aspx?FamilyId=3BF4C5CA-B905-4EBC-8901-1D4C1D1DA884&displaylang=en

Evidently there have been some issues getting it on the Connect site, so they are hosting it in Microsoft’s Download Center for now.

SQL Server 2008 CTP 4 is Out

And you can download it from here. It looks like there are a few new improvements in Reporting Services, and some new data types in SSIS (to complement the new data types in the relational engine), but nothing too major. Hopefully, I’ll have time to take it for a spin this weekend.

SQL Server 2008 – Using Merge From SSIS

In my last post, I talked about using the new MERGE statement in SQL Server 2008. It allows you to merge the data in a result set with a table, based on a join condition that you specify. This is a common scenario in data warehousing, where you often need to take a working table and compare it to a target table to determine if a row should inserted, updated, or deleted. This is often the case for dimension table. Currently, there is no SSIS component in the June 2007 CTP that leverages the MERGE statement directly. This post will explore one possible pattern for using MERGE from SSIS.

In SSIS 2005, a commonly used pattern for updating dimension tables was to retrieve all of the rows intended for the dimension table in a data flow via a source component. Then the rows would be run through a lookup component to determine if they existed in the target table. Rows to be updated (those found in the lookup) would be run through an OLE DB Command component. Rows to be inserted would be sent to an OLE DB Destination.

image

In cases where performance is a concern, the OLE DB Command could be replaced with another OLE DB Destination that writes all rows to be updated to a temporary table. In the control flow, an Execute SQL task would issue an UPDATE statement, using the temporary table as a source.

image

To implement this using a MERGE statement, we can modify the second pattern. All rows that are read from the source should be written to a temporary table. Then, the Execute SQL task will be used to run a MERGE statement that uses the temporary table as the source for the merge. The matching of the rows will be done through the join condition in the MERGE statement.

image

Why would using the MERGE statement be better than using the lookup pattern described above? For one thing, the data flow is simpler, with no branches in execution. Also, the data flow only has to write to a single table, instead of two tables. I haven’t done any performance testing on the MERGE statement yet, so I can’t say for certain whether it is faster.

I’m still hoping for a destination component that can be used directly in the data flow to perform the MERGE, but the SSIS team has indicated that it probably won’t happen.

Using the Merge Statement in SQL Server 2008

It’s been a busy couple of weeks, but I’ve finally found some time to play around with the June CTP of SQL Server 2008. One of the items that caught my interest was the MERGE statement. This is a new addition to the T-SQL language that lets you perform INSERTs, UPDATEs, and DELETEs against a table in a single statement. You specify a source table to compare to, and what conditions should result in an UPDATE vs. an INSERT.

This is a fairly common scenario in data warehousing, where you may need to either insert new rows into a fact table, or update existing rows with new information. Here’s a an example scenario to illustrate this. A store sends an hourly feed of transactions that have occurred in the last hour. You need to populate a fact table that should contain a monthly total for each customer, showing how many of each product were purchased in the month and how much was spent. The fact table needs to updated each time the hourly sales transaction comes in, so that it is always current. And you have to take into account that a customer may have already purchased an item in the same month, so you may need to update an existing row in the fact table rather than insert a new one.

Our fact table looks like this:

CREATE TABLE [dbo].[FactProductSalesSnapshot](
    [DimCustomerID] [int] NOT NULL,
    [PurchaseDateID] [int] NOT NULL,
    [DimProductID] [int] NOT NULL,
    [Quantity] [int] NOT NULL,
    [SalesAmount] [money] NOT NULL
)

and the merge table (the source for the MERGE statement) looks like this:

CREATE TABLE [dbo].[SalesTransactionStage](
    [SalesTransactionID] [int] NOT NULL,
    [CustomerID] [int] NOT NULL,
    [DateID] [int] NOT NULL,
    [ProductID] [int] NOT NULL,
    [Quantity] [int] NOT NULL,
    [SalesAmount] [money] NOT NULL,
    [BatchID] [tinyint] NOT NULL
)

The batch ID indicates which batch of hourly sales transactions this row belongs to. To merge this in, you’d execute the following statement:

MERGE FactProductSalesSnapshot AS fact
USING (
  SELECT
      CustomerID
      ,DateID
      ,ProductID
      ,Quantity
      ,SalesAmount
  FROM SalesTransactionStage
 WHERE BatchID = 4
) AS src
ON (    fact.DimProductID = src.ProductID
    AND fact.DimCustomerID = src.CustomerID
    AND fact.PurchaseDateID = src.DateID )
WHEN MATCHED THEN
    UPDATE SET
        fact.Quantity = fact.Quantity + src.Quantity
        ,fact.SalesAmount = fact.SalesAmount + src.SalesAmount
WHEN NOT MATCHED THEN
    INSERT (DimCustomerID, PurchaseDateID, DimProductID, Quantity, SalesAmount)
        VALUES (src.CustomerID, src.DateID, src.ProductID, src.Quantity, src.SalesAmount);
 
 

To break this statement down a little bit, the USING (..) portion defines the source for the Merge to use for comparisons (the SalesTransactionStage table defined previously). The ON clause works like the ON clause in a JOIN statement, and determines how the source is matched against the target. In this case, we’re joining on the Product, Customer, and Date keys.

The WHEN MATCHED clause determines what should happen if we find a match. We are performing an UPDATE, and adding the values from the stage table to existing balance in the fact table. WHEN NOT MATCHED indicates that this particular combination of product, customer, and date does not yet exist, so the new values are inserted.

The WHEN clauses also support conditional logic, so we could add additional logic to catch returns (where the quantity is less than zero) and store that information in a separate column.

Overall, the MERGE statement seems to work very well, and I think it will be very useful if you are building warehouses on SQL Server. I’m planning on a part 2 to this to show how MERGE can be used from Integration Services. Currently, there doesn’t appear to be a direct way to leverage it, outside of an Execute SQL statement, which has some drawbacks. I am curious to see if Microsoft introduces a new destination component for the pipeline that takes advantage of the MERGE functionality.