Dynamically Pivoting Rows to Columns
Previously, I posted about dynamically un-pivoting columns to rows. In this post, we’ll look at the reverse scenario. Suppose that you have a list of sales orders, in this format:
Sales Order Number | Product |
SO-001 | Hammer |
SO-001 | Table Saw |
SO-001 | Cordless Screwdriver |
SO-002 | Table Saw |
SO-002 | Wrench |
SO-003 | Flashlight |
If there is a (reasonable) maximum number of products, you can create the maximum number of columns needed, and use the Pivot transform. However, if the number of products per order vary significantly, you might want to use something a little more flexible. This can be accomplished by storing the values in a single column as XML, or as a delimited set of values.
For this example, we’ll take the input described above and transform it to this:
Sales Order Number | Product |
SO-001 | Hammer|Table Saw|Cordless Screwdriver |
SO-002 | Table Saw|Wrench |
SO-003 | Flashlight |
The data flow for this is straightforward – a Flat File Source, a Script component, and a Multicast (so that a data viewer can be added to see the results of the script). The Script component is where the bulk of the work is done.
The script transform is configured with a single asynchronous output, by setting Output 0′s SynchronousInputID property to “None”. Two columns are added to Output 0, one for the Sales Order Number, and one for the list of products.
The script itself is listed below. Essentially, the script checks each row to see if the sales order number has changed from the previous row. If it is the same, it appends the product to a list. If the sales order number is different, the currently stored values for sales order number and the product list are sent to the asynchronous output.
There is some special handling for the first and last rows. On the first row, we don’t want to output anything, as we need to check the next row first. When the input buffer is finished, we need send the last value for sales order number and product list, and then call the SetEndOfRowSet on the output buffer. To make sure we’ve processed all the rows, we need to check EndOfRowSet on the input buffer. This should be checked in the Input0_ProcessInput routine, as it will never be equal to true in the Input0_ProcessInputRow method.
1: Public Class ScriptMain
2: Inherits UserComponent
3:
4: Private orderNum As String
5: Private productList As String
6: Private blnFirstRow As Boolean = True
7:
8: Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
9:
10: If Row.SalesOrderNumber = orderNum Then
11: ‘append this item to the list
12: productList = productList + “|” + Row.Product
13: Else
14: If blnFirstRow Then
15: blnFirstRow = False
16: Else
17: ‘output the current values
18: Output0Buffer.AddRow()
19: Output0Buffer.SalesOrderNumber = orderNum
20: Output0Buffer.ProductList = productList
21: End If
22:
23: ‘reset the control variables
24: orderNum = Row.SalesOrderNumber
25: productList = Row.Product
26: End If
27: End Sub
28:
29: Public Overrides Sub Input0_ProcessInput(ByVal Buffer As Input0Buffer)
30: MyBase.Input0_ProcessInput(Buffer)
31: If Buffer.EndOfRowset Then
32: Output0Buffer.AddRow()
33: Output0Buffer.SalesOrderNumber = orderNum
34: Output0Buffer.ProductList = productList
35: Output0Buffer.SetEndOfRowset()
36: End If
37: End Sub
38: End Class
This can be useful if you need to pivot a varied number of columns. The same techniques can also be used anywhere you want to process a set of rows and change the output based on previous row values. If you’d like to take a more in-depth look, the sample files are on my SkyDrive:
This one is really useful in SSIS, and on the similar lines we can use COALESCE function in SQL Server to combine multiple rows into one.
USE NORTHWIND
GO
CREATE FUNCTION ProductList (@CategoryID INT)
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @Products VARCHAR(1000)
SELECT @Products = COALESCE(@Products + ‘, ‘, ”) + ProductName
FROM Products
WHERE CategoryID = @CategoryID
ORDER BY ProductName ASC
RETURN @Products
END
GO
SELECT Distinct CategoryID, dbo.ProductList (CategoryID) AS ProductList
FROM Products
GO
- Mohit Nayyar
That’s a handy way to use COALESCE – I’ll have to bear it in mind for SQL applications.
Hi, It is really great what you developed. I’m using this but I can’t make it work beacuse I have many columns and not only two as your example. What I did was changing Output 0′s SynchronousInputID property to “input “Input 0″ (7652)” so as to have after the script all the data that I’m using. The problem Is that when I change the variables inside the script I recive this message error: “Output0Buffer is not declared”.
Is there speciall adjustment that I have to do when I chane Outout 0 property? Thanks in advance.
Thank you sooo much. SSIS can be so frustratin at times, this article was exactly what I was looking for and it did the trick perfectly.
Thanks for this post. This is the only option as far as I’m aware of in SSIS 2005 for concatenating multiple rows into single string value across groupings of rows. (Assuming pivot is not an option because the number of rows per grouping could be unlimited). Thanks for providing a solution I could not find anywhere else after at least an hour on Google.
Old post, but still of good value! Since the component is asynchronous, you have to copy your output “group by” columns over from the input. This can be a bit tedious with larger tables. Perhaps a custom component could do this a bit cleaner, but as it stands, this solves the problem for now
.