Archive for March 2009

Updating Custom Assembly References in SSIS

Just a quick tip related to a recent problem I had with a custom task I developed for SSIS. I had to recompile it recently, and in the process managed to change the public key associated with the assembly. Since SSIS uses strong names (which include the public key) to reference custom tasks, this caused a bit of a problem. None of the packages that used the custom task would execute properly, and I really didn’t want to go through 20+ packages and delete the task, then add it back and try to remember how it was set up in each package. Fortunately, fixing this was much simpler. I opened one of the packages in the XML view, located the original strong name (which looks something like this: “Mariner.Pamlico.ETL.ControlFlowTasks.CustomTask, Mariner.Pamlico.ETL.ControlFlowTasks, Version=1.0.0.0, Culture=neutral, PublicKeyToken=73ea249dd43ab42e”), and did a search and replace on all the packages to update the reference with the new public key. I could then execute all my packages without any problems, and they all retained their original settings.
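If you have more than a handful of packages to fix, it’s worth scripting the search and replace rather than opening each one by hand. Here’s a rough sketch of the idea in Python (the folder path and the new PublicKeyToken value are placeholders, not my real values), assuming the .dtsx files are UTF-8 encoded, which they normally are:

```python
import glob
import shutil

# Hypothetical folder and tokens - substitute your own package folder and the
# old and new PublicKeyToken values from the assembly's strong name.
package_folder = r"C:\Projects\ETL\Packages"
old_ref = b"PublicKeyToken=73ea249dd43ab42e"
new_ref = b"PublicKeyToken=89aabbccddeeff00"   # made-up token for illustration

for path in glob.glob(package_folder + r"\*.dtsx"):
    with open(path, "rb") as f:        # treat the XML as bytes to avoid encoding surprises
        data = f.read()
    if old_ref not in data:
        continue
    shutil.copy2(path, path + ".bak")  # keep a backup before touching the package
    with open(path, "wb") as f:
        f.write(data.replace(old_ref, new_ref))
    print("Updated", path)
```

Any scripting tool you’re comfortable with would do just as well; the point is simply to apply the exact same replacement to every package.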

Like I said, just a quick tip if you ever have to update multiple packages to reference a new version of a custom task or component. A few words of warning – before updating the XML of a package directly, it’s always a good idea to make a backup. Also, this works as long as you haven’t changed the public interface for the task. If you have removed properties, it’s likely to cause new and interesting errors when you attempt to open the package in BIDS.

SSWUG Business Intelligence Virtual Conference and SDS

It’s been a busy first quarter, but I should have a little more time available to blog now. 

I’ve just completed shooting my sessions for the SSWUG Business Intelligence Virtual Conference, which has been one of the things occupying my time recently. I had a few changes in my lineup of presentations. With the recent announcements about changes to SQL Data Services (SDS), there was really no reason to present the sessions on using SSIS and SSRS against SDS. If you haven’t seen the changes, you can review them here, but as a quick overview, Microsoft is eliminating the ACE model and the requirement to access SDS through SOAP or REST. Instead, they will offer traditional TDS access to SDS, meaning that accessing a database in the cloud will be a very similar experience to accessing an on-premise SQL Server. In fact, many tools that required custom extensions to use SDS under the old model, like SSRS and SSIS, should work out of the box with the new one. So there really wasn’t much point in presenting how to do something that won’t be necessary any longer.
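To give a sense of what that means in practice, here’s a minimal sketch of querying a cloud database the same way you’d query an on-premise server, over a standard TDS/ODBC connection (written in Python with pyodbc just for illustration; the server name, database, and credentials are made-up placeholders, not real SDS endpoints):

```python
import pyodbc

# Placeholder server, database, and credentials - for illustration only.
conn = pyodbc.connect(
    "DRIVER={SQL Server};"
    "SERVER=myservice.example.cloud;"
    "DATABASE=SalesDW;"
    "UID=myuser;PWD=mypassword"
)

cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.Customers")  # ordinary T-SQL over TDS
print(cursor.fetchone()[0])
conn.close()
```

Under the old model, that same query would have had to go through the SOAP or REST interface instead of a normal connection string.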

So, I ended up still doing the SDS 101 session, which gives an overview of the changes to the service, and some reasons why you might want to look into it for new applications. I also did two other sessions. “Doing More (ETL) With Less (Effort) by Automating SSIS” focuses on how to build packages programmatically, and covers some of the simpler options for accomplishing this, like EzAPI. “Processing Flat Files with SSIS” runs through many of the common issues developers encounter when using SSIS against flat files, and shows a number of ways to work around those problems. If you have trouble working with flat files, this would be a good session to attend.

You can register for the conference here, and if you use the code SPVJWESP09 when you register, you should get $10 off the registration fee.

Scale Up or Scale Out for SQL Server Data Warehouses

Historically, scale up has been the model for Microsoft data warehouses. Running a large, multi-terabyte data warehouse meant buying a lot of hardware for a single server, and hoping it would be enough once the warehouse was fully loaded and in use. If the hardware wasn’t sized properly, you could be looking at big costs to purchase a new server with more capacity for memory, disk, and CPUs.

Over the past several months, though, there have been a number of announcements in the SQL Server space that change that. We now have the option of scaling our warehouses up or out. Project “Madison”, which is the integration of the massively parallel processing (MPP) technologies from the DATAllegro acquisition, promises to allow SQL Server 2008 to scale out to hundreds of terabytes in the warehouse by distributing processing among multiple commodity servers. Even though it hasn’t been officially released yet, I’ve seen several demos of the functionality, and it looks promising. The advantage of this approach is that as you need additional capacity, you simply add more servers.

On the scale up front, last week Microsoft announced “SQL Server Fast Track Data Warehouse”, which is a set of reference architectures for symmetric multiprocessing (SMP) data warehousing. These are single-server configurations that are optimized for data warehousing workloads, and have been tested and validated. They take much of the guesswork out of sizing your data warehouse server. However, you still have to provide good estimates of query volume and size to use the reference architectures effectively.

So now the question becomes, should you target a scale up or scale out approach for your data warehouse? One of the deciding factors is going to be your data volume. The Fast Track reference architectures are currently targeted towards 4 to 32 terabyte warehouses. Given current hardware restrictions, that’s the practical limit for a single server. However, as the hardware continues to get better, that number is expected to go up. “Madison”, on the other hand, can scale well past 32 terabytes. So if your current data needs are greater than 32 terabytes, I’d be looking closely at “Madison”.

What if your current needs are less than 32 terabytes, but you expect to grow past that point over the next couple of years? Well, fortunately, the Fast Track reference architectures are designed to offer an easy transition to “Madison” when you reach that point. And if you expect your data volumes to stay below the 32 terabyte mark, the Fast Track reference architectures certainly offer a greater degree of confidence that you are getting the appropriate configuration for your warehouse.

It’s always nice to have options, and improving the scaling abilities of SQL Server should certainly help Microsoft in the large data warehouse marketplace. However, the roadmap for how this might apply to the Analysis Services component of SQL Server hasn’t really been directly addressed yet. It would seem logical to offer the same sort of solutions in that space. It will be interesting to see which direction Microsoft takes on that.