At the PASS Summit this week, I heard a few interesting bits about SSIS 2008 that should be in the next CTP.
One, ADO.NET will be fully supported, with an ADO.NET Data Source (renamed from the Data Reader Data Source) and an ADO.NET Destination. Since ADO.NET has an ODBC provider, we should finally be able to use an ODBC database as a destination. Both will also have custom UIs, so no more messing around in the Advanced Editor.
Two, there’s a new data profiling task. It does a really nice job of processing a table and performing all the standard data profiling activities. It’s based on work from Microsoft Research, so it has some pretty cool capabilities, like a pattern recognition function that spits out regular expressions matching the contents of a column.
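I don't know how the Microsoft Research algorithm actually works, so here's a much simpler stand-in, just to illustrate the idea of "profile a column, get back regular expressions": generalize each value into character classes (ASCII digits, ASCII letters, literals), collapse runs into counts, and then tally which patterns cover the column. The function names and the `min_share` threshold are hypothetical, my own invention for this sketch, and not part of the task's API.

```python
import re
import string
from collections import Counter
from typing import List, Sequence, Tuple


def _token(ch: str) -> str:
    """Map one character to a regex token: ASCII digit, ASCII letter, or escaped literal."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    return re.escape(ch)


def value_to_regex(value: str) -> str:
    """Generalize a single value into an anchored regex, collapsing runs into {n}."""
    parts: List[str] = []
    prev, run = None, 0
    for ch in value:
        tok = _token(ch)
        if tok == prev:
            run += 1
            continue
        if prev is not None:
            parts.append(prev if run == 1 else f"{prev}{{{run}}}")
        prev, run = tok, 1
    if prev is not None:
        parts.append(prev if run == 1 else f"{prev}{{{run}}}")
    return "^" + "".join(parts) + "$"


def profile_patterns(values: Sequence[str], min_share: float = 0.1) -> List[Tuple[str, float]]:
    """Return (regex, share of rows) pairs, most common first, at or above min_share."""
    if not values:
        return []
    counts = Counter(value_to_regex(v) for v in values)
    total = len(values)
    return [(p, n / total) for p, n in counts.most_common() if n / total >= min_share]


# Hypothetical sample column: postal codes with a mix of 5-digit and ZIP+4 formats.
zips = ["98052", "10001", "98052-1234", "30301"]
for pattern, share in profile_patterns(zips):
    print(f"{share:.0%}  {pattern}")
```

On that sample, the five-digit pattern covers 75% of the rows and the ZIP+4 pattern covers the rest. The real task is surely smarter about things like variable-length runs and mixed formats; this only shows the shape of the output.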
Three, since scripts will run under VSTA instead of VSA, it will be pretty easy to create a web service transform. Just create a script component, add a reference to the web service, and all the proxy code will be generated for you. (It’s not news that VSTA is replacing VSA, but I hadn’t thought about how that would affect web services until now.)
Four, data flow threading will be much better. Previously, a single execution tree was always single-threaded. That’s why in 2005, if you have a long chain of synchronous transforms, you may get better performance by introducing a Union All transform into the sequence. Because the Union All is asynchronous, it starts a new execution tree, which allows the engine to run multiple threads. In 2008, the data flow engine will be able to introduce new threads itself.
Five, there will be a Watson-style dump tool available. It will be invoked automatically on a crash, or you can invoke it on demand. It will dump the current state of the package out to a text file, including variable values, data flow progress, etc.
And finally, lookups are going to be drastically enhanced. Their reference data can come from ADO.NET or a flat file. We’ll be able to cache the lookups and reuse them across data flows in the same package, and we’ll also be able to persist the cache between runs of the package. And, interestingly enough, the persisted file will look a lot like a RAW file. There should be some interesting possibilities in that :). There will also be a “missed rows” output, so we no longer have to use the error output to capture records that didn’t have a match.
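To make the cache and "missed rows" ideas concrete, here's a rough Python sketch of the behavior described: build the reference cache once, reuse it for two different data flows, persist it to a file, and route unmatched rows to their own output instead of an error output. Everything here (`LookupCache`, `lookup`, the column names, and pickle as the file format) is hypothetical and just illustrates the concept; it is not the SSIS object model, and the file is not a RAW file.

```python
import os
import pickle
import tempfile
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


class LookupCache:
    """In-memory reference cache keyed on one column (hypothetical, not the SSIS API)."""

    def __init__(self, reference_rows: Iterable[Row], key: str):
        self.key = key
        # Read the reference source once; later lookups only hit this dict.
        self.entries: Dict[object, Row] = {row[key]: row for row in reference_rows}

    def save(self, path: str) -> None:
        # Persist the cache so a later package run can skip reloading the source.
        with open(path, "wb") as f:
            pickle.dump((self.key, self.entries), f)

    @classmethod
    def load(cls, path: str) -> "LookupCache":
        # Only load cache files you wrote yourself; pickle is not safe for untrusted input.
        with open(path, "rb") as f:
            key, entries = pickle.load(f)
        cache = cls([], key)
        cache.entries = entries
        return cache


def lookup(rows: Iterable[Row], cache: LookupCache, input_key: str) -> Tuple[List[Row], List[Row]]:
    """Split rows into (matched, missed) outputs instead of failing on no match."""
    matched: List[Row] = []
    missed: List[Row] = []
    for row in rows:
        ref = cache.entries.get(row[input_key])
        if ref is None:
            missed.append(row)  # the "missed rows" output
        else:
            matched.append({**row, **ref})  # input columns plus reference columns
    return matched, missed


# Build the cache once, then reuse it in two separate "data flows".
customers = [{"cust_id": 1, "name": "Contoso"}, {"cust_id": 2, "name": "Fabrikam"}]
cache = LookupCache(customers, key="cust_id")

orders = [{"order_id": 10, "cust_id": 1}, {"order_id": 11, "cust_id": 3}]
matched_a, missed_a = lookup(orders, cache, input_key="cust_id")

invoices = [{"invoice_id": 7, "cust_id": 2}]
matched_b, missed_b = lookup(invoices, cache, input_key="cust_id")

# Persist the cache and reload it, as a later run of the package might.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "customers.cache")
    cache.save(path)
    reloaded = LookupCache.load(path)
```

The point of the design is that the expensive part (reading the reference data) happens once, while each data flow just probes the dictionary, and a non-match is ordinary data flowing down its own path rather than an error.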