Dynamically Pivoting Rows to Columns

Previously, I posted about dynamically un-pivoting columns to rows. In this post, we’ll look at the reverse scenario. Suppose that you have a list of sales orders, in this format:
























Sales Order Number Product
SO-001 Hammer
SO-001 Table Saw
SO-001 Cordless Screwdriver
SO-002 Table Saw
SO-002 Wrench
SO-003 Flashlight

If there is a (reasonable) maximum number of products, you can create the maximum number of columns needed, and use the Pivot transform. However, if the number of products per order vary significantly, you might want to use something a little more flexible. This can be accomplished by storing the values in a single column as XML, or as a delimited set of values. 


For this example, we’ll take the input described above and transform it to this:















Sales Order Number Product
SO-001 Hammer|Table Saw|Cordless Screwdriver
SO-002 Table Saw|Wrench
SO-003 Flashlight

The data flow for this is straightforward – a Flat File Source, a Script component, and a Multicast (so that a data viewer can be added to see the results of the script). The Script component is where the bulk of the work is done.


image 


The script transform is configured with a single asynchronous output, by setting Output 0′s SynchronousInputID property to “None”. Two columns are added to Output 0, one for the Sales Order Number, and one for the list of products.


image


The script itself is listed below. Essentially, the script checks each row to see if the sales order number has changed from the previous row. If it is the same, it appends the product to a list. If the sales order number is different, the currently stored values for sales order number and the product list are sent to the asynchronous output.


There is some special handling for the first and last rows. On the first row, we don’t want to output anything, as we need to check the next row first. When the input buffer is finished, we need send the last value for sales order number and product list, and then call the SetEndOfRowSet on the output buffer. To make sure we’ve processed all the rows, we need to check EndOfRowSet on the input buffer. This should be checked in the Input0_ProcessInput routine, as it will never be equal to true in the Input0_ProcessInputRow method.


 



   1: Public Class ScriptMain
   2:     Inherits UserComponent
   3:  
   4:     Private orderNum As String
   5:     Private productList As String
   6:     Private blnFirstRow As Boolean = True
   7:  
   8:     Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
   9:  
  10:         If Row.SalesOrderNumber = orderNum Then
  11:             ‘append this item to the list
  12:             productList = productList + “|” + Row.Product
  13:         Else
  14:             If blnFirstRow Then
  15:                 blnFirstRow = False
  16:             Else
  17:                 ‘output the current values
  18:                 Output0Buffer.AddRow()
  19:                 Output0Buffer.SalesOrderNumber = orderNum
  20:                 Output0Buffer.ProductList = productList
  21:             End If
  22:  
  23:             ‘reset the control variables
  24:             orderNum = Row.SalesOrderNumber
  25:             productList = Row.Product
  26:         End If
  27:     End Sub
  28:  
  29:     Public Overrides Sub Input0_ProcessInput(ByVal Buffer As Input0Buffer)
  30:         MyBase.Input0_ProcessInput(Buffer)
  31:         If Buffer.EndOfRowset Then
  32:             Output0Buffer.AddRow()
  33:             Output0Buffer.SalesOrderNumber = orderNum
  34:             Output0Buffer.ProductList = productList
  35:             Output0Buffer.SetEndOfRowset()
  36:         End If
  37:     End Sub
  38: End Class

This can be useful if you need to pivot a varied number of columns. The same techniques can also be used anywhere you want to process a set of rows and change the output based on previous row values. If you’d like to take a more in-depth look, the sample files are on my SkyDrive:



 


 

6 Comments

  1. mohit.nayyar says:

    This one is really useful in SSIS, and on the similar lines we can use COALESCE function in SQL Server to combine multiple rows into one.

    USE NORTHWIND
    GO

    CREATE FUNCTION ProductList (@CategoryID INT)
    RETURNS VARCHAR(1000)
    AS
    BEGIN
    DECLARE @Products VARCHAR(1000)

    SELECT @Products = COALESCE(@Products + ‘, ‘, ”) + ProductName
    FROM Products
    WHERE CategoryID = @CategoryID
    ORDER BY ProductName ASC

    RETURN @Products
    END
    GO

    SELECT Distinct CategoryID, dbo.ProductList (CategoryID) AS ProductList
    FROM Products
    GO

    - Mohit Nayyar

  2. jwelch says:

    That’s a handy way to use COALESCE – I’ll have to bear it in mind for SQL applications.

  3. http:// says:

    Hi, It is really great what you developed. I’m using this but I can’t make it work beacuse I have many columns and not only two as your example. What I did was changing Output 0′s SynchronousInputID property to “input “Input 0″ (7652)” so as to have after the script all the data that I’m using. The problem Is that when I change the variables inside the script I recive this message error: “Output0Buffer is not declared”.
    Is there speciall adjustment that I have to do when I chane Outout 0 property? Thanks in advance.

  4. Kevin_Hates_SSIS says:

    Thank you sooo much. SSIS can be so frustratin at times, this article was exactly what I was looking for and it did the trick perfectly.

  5. http:// says:

    Thanks for this post. This is the only option as far as I’m aware of in SSIS 2005 for concatenating multiple rows into single string value across groupings of rows. (Assuming pivot is not an option because the number of rows per grouping could be unlimited). Thanks for providing a solution I could not find anywhere else after at least an hour on Google.

  6. Nate says:

    Old post, but still of good value! Since the component is asynchronous, you have to copy your output “group by” columns over from the input. This can be a bit tedious with larger tables. Perhaps a custom component could do this a bit cleaner, but as it stands, this solves the problem for now :-) .

Leave a Reply to Kevin_Hates_SSIS