Previously, I posted about dynamically un-pivoting columns to rows. In this post, we’ll look at the reverse scenario. Suppose that you have a list of sales orders, in this format:
Sales Order Number | Product |
SO-001 | Hammer |
SO-001 | Table Saw |
SO-001 | Cordless Screwdriver |
SO-002 | Table Saw |
SO-002 | Wrench |
SO-003 | Flashlight |
If there is a (reasonable) maximum number of products, you can create the maximum number of columns needed, and use the Pivot transform. However, if the number of products per order vary significantly, you might want to use something a little more flexible. This can be accomplished by storing the values in a single column as XML, or as a delimited set of values.
For this example, we’ll take the input described above and transform it to this:
Sales Order Number | Product |
SO-001 | Hammer|Table Saw|Cordless Screwdriver |
SO-002 | Table Saw|Wrench |
SO-003 | Flashlight |
The data flow for this is straightforward – a Flat File Source, a Script component, and a Multicast (so that a data viewer can be added to see the results of the script). The Script component is where the bulk of the work is done.
The script transform is configured with a single asynchronous output, by setting Output 0’s SynchronousInputID property to “None”. Two columns are added to Output 0, one for the Sales Order Number, and one for the list of products.
The script itself is listed below. Essentially, the script checks each row to see if the sales order number has changed from the previous row. If it is the same, it appends the product to a list. If the sales order number is different, the currently stored values for sales order number and the product list are sent to the asynchronous output.
There is some special handling for the first and last rows. On the first row, we don’t want to output anything, as we need to check the next row first. When the input buffer is finished, we need send the last value for sales order number and product list, and then call the SetEndOfRowSet on the output buffer. To make sure we’ve processed all the rows, we need to check EndOfRowSet on the input buffer. This should be checked in the Input0_ProcessInput routine, as it will never be equal to true in the Input0_ProcessInputRow method.
1: Public Class ScriptMain
2: Inherits UserComponent
3:
4: Private orderNum As String
5: Private productList As String
6: Private blnFirstRow As Boolean = True
7:
8: Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
9:
10: If Row.SalesOrderNumber = orderNum Then
11: ‘append this item to the list
12: productList = productList + “|” + Row.Product
13: Else
14: If blnFirstRow Then
15: blnFirstRow = False
16: Else
17: ‘output the current values
18: Output0Buffer.AddRow()
19: Output0Buffer.SalesOrderNumber = orderNum
20: Output0Buffer.ProductList = productList
21: End If
22:
23: ‘reset the control variables
24: orderNum = Row.SalesOrderNumber
25: productList = Row.Product
26: End If
27: End Sub
28:
29: Public Overrides Sub Input0_ProcessInput(ByVal Buffer As Input0Buffer)
30: MyBase.Input0_ProcessInput(Buffer)
31: If Buffer.EndOfRowset Then
32: Output0Buffer.AddRow()
33: Output0Buffer.SalesOrderNumber = orderNum
34: Output0Buffer.ProductList = productList
35: Output0Buffer.SetEndOfRowset()
36: End If
37: End Sub
38: End Class
This can be useful if you need to pivot a varied number of columns. The same techniques can also be used anywhere you want to process a set of rows and change the output based on previous row values. If you’d like to take a more in-depth look, the sample files are on my SkyDrive: