Do you use whitespace or special characters in your column names? Most people don’t, because of the additional headaches it creates. You have to delimit the column names, come up with a work-around for tools that don’t support column names with special characters, etc. Underscores, however, are used pretty extensively in place of spaces. If you are using Script Components in SSIS, though, you may encounter an all-new headache with special characters or even underscores in your column names.
When you use a script component in SSIS, it generates some .NET code for you automatically, based on the metadata in the pipeline connected to the script component. However, when this code is generated, SSIS strips out any whitespace or special characters from the names of inputs, outputs, and columns. It only retains the letters and numbers (alphanumeric characters) in these names.
Here’s some examples of column name issues that I’ve run into with scripts (and while these specific items are made up, they represent real-world scenarios I’ve encountered – there’s some really horrible naming approaches out there):
Original Column Name | Script Column Name |
Account | Account |
Account# | Account |
Account Number | AccountNumber |
Account_Number | AccountNumber |
TI_TXN_ID | TITXNID |
TI_TXNID | TITXNID |
As you can see, once the alphanumeric characters have been stripped from these column names, they are no longer unique. That can pose a few problems in your script code. What’s worse, because this code is auto-generated by SSIS, you can’t fix it without changing the column names in the data flow, even though this is really purely a script thing (and not even a .NET limitation – underscores are perfectly valid in .NET naming). What’s even worse than that – you don’t get an error till the binary code is recompiled.
So, if you are working with script components, make sure all your column names are unique even when all non-alphanumeric characters have been stripped from them. The same thing applies to your output names – they must be unique based only on the alphanumeric characters.