Welcome to Agile BI Community Sign in | Join | Help

It’s generally accepted* that when you are using an OLE DB Source or an ADO.NET Source, you should specify all the columns, rather than selecting the table from the dropdown, or using a SELECT * SQL statement.

If you’re like me, then you probably don’t have the columns in all your tables memorized, and you don’t like typing, so getting a complete list of columns is a bit painful. You either have to use the query designer to create the whole query, or go to SSMS and create the SELECT statement from there. However, there is a simpler way. Just type “SELECT * FROM [your table here]” into the source component’s SQL Command box, and then click the Build Query button. The query designer will automatically expand the * into the specific column names. Just click OK as soon as the designer opens, and you end up with a fully populated column list.

 

*For some of the reasons why this is generally accepted from an SSIS perspective, see these posts from Jamie Thomson: http://blogs.conchango.com/jamiethomson/archive/2006/02/21/SSIS_3A00_-SELECT-_2A002E002E002E00_-or-select-from-a-dropdown-in-an-OLE-DB-Source-component_3F00_.aspx and https://blogs.conchango.com/jamiethomson/archive/2006/01/05/2554.aspx (point #4). If you want reasons from a general SQL standpoint, search for “never use ”SELECT *” in SQL” in your favorite search engine.

I’ll be presenting at the Columbia, SC PASS user group on July 7. I’ll be speaking on “Getting Started with Analysis Services 2008”, which is a demo heavy introduction to using SQL Server Analysis Services 2008. If you want to use SSAS, but aren’t sure quite where to start, this session should help.

If you are in the Columbia area on July 7, please drop by. You can find more information here:

http://midlands.sqlpass.org/

I ran across an interesting behavior in SSIS this week, and thought it was worth sharing. Occasionally, I’ll have the need to use the value of a variable (set in the control flow) in the data flow. The typical way to accomplish this is to use a Derived Column transformation to introduce the variable’s value into the data flow. However, you need to be cautious if you are doing this with a String variable.

When you are working with Derived Column expressions, the output type of the Derived Column expression is based on the input values. For example, inputting an expression like 1+2 will result in an output column data type of DT_I4 (an integer). Basically, the editor guesses the data type based on the input values. Generally, this is OK – if you are referencing other data flow columns, or variables that aren’t of type string, the lengths of the values are static. However, when the editor calculates the length of string variables, it uses the current value in the variable. So, if your variable has 3 characters, the output data type only expects three characters. If the value in the variable is static and never changes, this is fine. But if the value is updated regularly (perhaps it’s a variable in a template package) or the value is set through an expression (so that the value changes at runtime), this can cause truncation errors in the package.

I created a sample package that demonstrates this. The package is pretty simple: a single Data Flow task, with an OLE DB Source, a Derived Column transformation, and a Union All (just there to terminate the data flow).

image

There is a single variable named TestVar that contains the value “1234”.

image

In the Derived Column transformation, there is a single new column added. Note that the calculated length for the output column is 4, matching the value “1234” in the variable.

image

If the package is run right now, everything works fine. But that’s not the case if you change the value in the variable to “123456”.

image

Running the package after changing the value results in the following error:

[Derived Column [91]] Error: The "component "Derived Column" (91)" failed because truncation occurred, and the truncation row disposition on "output column "ColumnFromStringVar" (289)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.

This occurs because the metadata in the Derived Column transformation isn’t updated when the variable’s value is changed. So, to avoid seeing this error occur in your packages, you need to explicitly set the output column’s length.

In 2005, you could change the calculated data type by editing the data type, length, precision, and scale fields. In 2008, however, the values are locked. You can change the data type by going into the advanced editor for the Derived Column. However, it’s easier to simply cast the string value, to force the Derived Column editor to treat it as if it has constant length.

image

By using this approach, as long as your string variable’s value is less than 50 characters, the Derived Column will continue to work. It’s best to set the length of the cast to the same value as the destination column’s length.

There’s a Connect submission on improving this behavior, either by updating the pipeline’s metadata as the string variable’s value changes, or by throwing a validation warning or error if the current value of the variable exceeds the output length in the Derived Column transformation. If you agree that this could use some improvement, you can vote here: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=470995

In the meantime, I’d highly recommend performing an explicit cast in any Derived Column expression that uses a string variable, particularly if the value is subject to change.

The sample package is on my SkyDrive, if you’d like to see the error in action.

A recent post on the SSIS forums was asking about creating multiple rows in a text output file from a single row in the data flow. Given a set of rows like this:

John Smith 1/1/1900 Value A
Jane Smith 12/1/1900 Value B

the poster wanted this output in the text file:

John Smith  
1/1/1900 Value A
Jane Smith  
12/1/1900 Value B

Basically, the poster wanted a line break in the middle of a row of data, while keeping a line break at the end.

There are a couple of ways to accomplish this in SSIS. One way is the use of a script task to create the file, which gives you complete control over the format of the file. There’s also a couple of ways to do it directly in SSIS. The first way is to use a Multicast transform to create two copies of each row, perform some string concatenation, and then combine them using a Union All or a Merge.

image

The Derived Column transforms are used to put the multiple columns into a single column, so that a variable length record can be written to the flat file. The Sort transforms and the Merge combines the rows into the proper order, before sending them to a flat file.

The other option (and one that probably falls under the category of stupid SSIS tricks), is to hack the flat file connection manager a little bit. You can set the column delimiters so that a carriage return/linefeed is inserted in the middle of the row. However, this isn’t as simple as just choosing {CR}{LF} as the column delimiter. SSIS checks to make sure that none of the column delimiters are the same as the row delimiter. Why it does that check, I don’t know, given the way it parses flat files. Regardless, you have to work around it. So, you can simply select the column where you want to introduce the break, and set it’s delimiter to {CR}.

image

Then insert a new column immediately following that column, set the output width to 0, and set the column delimiter to {LF}.

image

Now the output will include a carriage return / linefeed between the columns.

The sample package for this is located here. It is SSIS 2008, but the concepts are the same for 2005.

If you’ve done much work with SSIS, you’re probably aware that on opening a package in BIDS, SSIS validates all the objects in the package. This can cause the packages to open very slowly, particularly if it has connections to a database that is currently unavailable.

I recently needed to upgrade several older packages to SSIS 2008. Unfortunately, these packages were referencing a database that I no longer have access to. On top of that, there are a number of data flows in each package, all of which use this non-existent database. Opening a package in BIDS was taking more than 10 minutes, which is about 9 minutes and 55 seconds past the point where my patience runs out. Normally, this wouldn’t be that big of a deal. Once the project was opened, I would just set the Work Offline option (located under the SSIS menu in BIDS), which prevents the validation from running.

image

However, each package was in its own project (for reasons I won’t go into in this post, but primarily failure to plan ahead), so I was looking at a very slow and painful process to upgrade these packages.

Fortunately, there is a way to enable the Work Offline option prior to actually opening the project. Locate the BIDS *.user file associated with the project. For SSIS, this file should be located in the same folder as the project (.dtproj) file, and will have a filename like “<project name>.dtproj.user”. Open this file in Notepad, and you should see something like the following (it’s got a few additional tags in 2008, but the general format is the same):

<?xml version="1.0" encoding="utf-8"?>
<DataTransformationsUserConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Configurations>
    <Configuration>
      <Name>Development</Name>
      <Options>
        <UserIDs />
        <UserPasswords />
        <OfflineMode>false</OfflineMode>
      </Options>
    </Configuration>
  </Configurations>
</DataTransformationsUserConfiguration>

Locate the <OfflineMode> tag (in red above) and change the value from false to true. Now, when the project is opened, it will already be in Offline mode, so you won’t have to suffer through a lengthy validation process.

There is a free community event from SSWUG coming up on April 17th. For people interested in Analysis Services (and who isn’t?), you’ll be able to see a webcast from Donald Farmer on developing high performance cubes. If you haven’t heard Donald speak before, I highly recommend signing up. If you have heard him, you’ve probably already clicked the link to register.

If you are interested in attending the Business Intelligence vConference, but haven’t made up your mind yet, you can now preview some of the sessions. A preview of my session on automating SSIS is available, but please don’t let that scare you off. There is going to be a lot of good content. You can still use the code SPVJWESP09 to get $10 off the registration.

Just a quick tip related to a recent problem I had with a custom task I developed for SSIS. I had to recompile it recently, and in the process, managed to change the public key associated with the assembly. Since SSIS uses strong names (which include the public key) to reference custom tasks, this caused a bit of a problem. None of the packages that used the custom task would execute properly, and I really didn’t want to go through 20+ packages and delete the task, then add it back and try to remember how it was set up in each package. Fortunately, fixing this was much simpler. I opened one of packages in the XML view, located the original strong name (which looks something like this “Mariner.Pamlico.ETL.ControlFlowTasks.CustomTask, Mariner.Pamlico.ETL.ControlFlowTasks, Version=1.0.0.0, Culture=neutral, PublicKeyToken=73ea249dd43ab42e”), and did a search and replace on all the packages to update the reference with the new public key. I could then execute all my packages without any problems, and they all retained their original settings.

Like I said, just a quick tip if you ever have to update multiple packages to reference a new version of a custom task or component. A few words of warning – before updating the XML of a package directly, it’s always a good idea to make a backup. Also, this works as long as you haven’t changed the public interface for the task. If you have removed properties, it’s likely to cause new and interesting errors when you attempt to open the package in BIDS.

It’s been a busy first quarter, but I should have a little more time available to blog now. 

I’ve just completed shooting my sessions for the SSWUG Business Intelligence Virtual Conference, which has been one of the things occupying my time recently. I had a few changes in my lineup of presentations. With the recent announcements about changes to SQL Data Services (SDS), there was really no reason to present the sessions on using SSIS and SSRS against SDS. If you haven’t seen the changes, you can review them here, but as a quick overview, Microsoft is eliminating the ACE model and the requirement to access SDS through SOAP or REST. Instead, they will offer traditional TDS access to SDS, meaning that accessing a database in the cloud will be a very similar experience to accessing an on-premise SQL Server. In fact, many tools, like SSRS and SSIS, that required custom extensions in order to use SDS under the old model, should work out of the box with the new model. So, there really wasn’t much point in presenting how to do something that won’t be necessary any longer.

So, I ended up still doing the SDS 101 session, which gives an overview of the changes to the service, and some reasons why you might want to look into this for new applications. I also did two other sessions. “Doing More (ETL) With Less (Effort) by Automating SSIS” focuses on how to build packages programmatically, and focuses on some of the simpler options for accomplishing this, like EzAPI. “Processing Flat Files with SSIS” runs through many of the common issues developers encounter when using SSIS against flat files, and shows a number of ways to work around those problems. If you have trouble working with flat files, this would be a good session to attend.

You can register for the conference here, and if you use the code SPVJWESP09 when you register, you should get $10 off the registration fee.

Recently, a user on the MSDN forums asked how to check an Agent job from SSIS to see if it’s completed. Once it had completed, additional steps need to be performed, for example, archiving flat files. However, this couldn’t be done till the job completed. Fortunately, this isn’t too difficult to accomplish in SSIS.

I put together the following sample package to illustrate this. Basically, the package starts a job, then goes into a loop to see if the package has completed.

image

The Start Job Execute SQL Task uses the sp_start_job stored procedure to start an Agent job. sp_start_job is asynchronous, that it, it does not wait for the job to complete, but returns right away. This sample is just running a job that pauses execution for several seconds before completing.

image

The For Loop container is configured to look at the @executionStatus variable. The initial expression (InitExpression) sets the value of the variable to 1 at the beginning of the loop. The loop will continue executing until the EvalExpression is false, in this case when @executionStatus is no longer equal to 1.

image

The first step inside the loop is to delay for a second. This allows the job to get started initially, and prevents the package from bombarding the server with requests. The delay is implemented using a WAITFOR statement in the Execute SQL Task.

image

Finally, another Execute SQL Task is used to check the current status of the job. This is done by calling sp_help_job, which returns, among other things, the current_execution_status column. The value in this column indicates what the job is doing currently. 1 indicates that it is currently executing, so when the value is no longer 1, it’s time to exit the loop. The value of the current_execution_status column is assigned to the @executionStatus variable, and the loop repeats as long as the value is 1.

image

This is the Result Set page, where the column is assigned to the variable.

image

As I mentioned, this is fairly straightforward to implement. The one wrinkle I found is that you do need some amount of delay between calling sp_start_job and checking the execution status. If you do it immediately, it’s likely that the job hasn’t started yet, and the loop will exit immediately. On my system, introducing the 1 second delay gave the job time to actually get started. Also, you’ll need to alter the Agent job being called to match your system.

The sample package is on my Skydrive. Let me know if you have questions or feedback.

I’m going to be doing 3 sessions at the upcoming SSWUG BI Virtual Conference. If you are interested in getting an introduction to SQL Data Services (SDS), I’ll be doing an introductory session on it, and then covering how it can be used from SSIS and SSRS. SDS is pretty interesting, but pretty different from standard SQL Server.

There’s a great lineup of speakers and session for the conference – you can check them out here. If you want to sign up, you can use this code: SPVJWESP09 for $10 off the registration fee.

If you develop in Visual Studio using C# or VB, then you are probably familiar with code snippets. They are little chunks of code that you can insert into your application and save you some typing. For example, if you are writing a C# application, and you type “if” and hit TAB twice, the following will be inserted into your code:

image

You can enter in the value to be evaluated, and then type in the rest of your code. I like code snippets because they save me from typing the same things over and over again. However, in SSIS 2005 scripting, code snippets weren’t available, due to the VSA environment used by the script objects. Now that we have a real scripting environment in SSIS 2008, code snippets can be used, although there are a few things you need to know.

Two of the most common activities I do in SSIS scripts are reading from and writing to variables. So, I put together a few code snippets that do this. Here’s what the snippet for reading a value from a variable looks like:

image

To create a snippet, you define the code snippet in an XML format, and save it to a file with a .snippet extension. I’m not going to post the code for the snippet here, as it is fairly verbose, but you can look at the .snippet file in any text editor. Once you have the file, you need to put it in a specific location for the Visual Studio environment to recognize it. This is where it can get a little confusing. Normally, snippets go into the “C:\Users\<user>\Documents\Visual Studio 2008\Code Snippets\<language>\My Code Snippets” folder. However, this is the location used for the full Visual Studio environment. Since the script editors in SSIS use the Visual Studio Tools for Applications environments, they get their own folder. In fact, there are separate folders for the Script Task and the Script Component. Code snippets for the Script Task go into the “C:\Users\<user>\Documents\Integration Services Script Task\Code Snippets\<language>\My Code Snippets” folder, and snippets for the Script Component go into the “C:\Users\<user>\Documents\Integration Services Script Component\Code Snippets\<language>\My Code Snippets” folder.

This actually works out well, because the code for the snippets is slightly different between the Script Task and Script Component. This is due to the object references being slightly different between the two.

I’ve provided two code snippets for C# for reading from and writing to variables in the Script Task, and the same two in C#, but altered to work in the Script Component. To use them, you can simply copy them to the above folders, and then type “SSISReadOneVariable” or “SSISWriteOneVariable” as a shortcut, followed by two tabs to insert them. You can change the shortcut in the snippet file if you’d like to use something else. It also wouldn’t be difficult to convert them to VB, if that’s your preference. The snippets are located on my SkyDrive. If you have ideas for other useful snippets, post a comment. Enjoy!

We encountered an issue recently in an ETL process during the extract portion. When extracting from one very large table, using a NOLOCK hint, we were encountering errors indicating that the data was changing while we were extracting it. Since no other processes were supposed to be using the table during the same time window as the ETL, this was a source of some confusion. Since there was nothing indicating what, if anything, was changing the data, we needed a quick way to see what was going on in the database at the point the package failed.

Normally, when troubleshooting database activity problems, I’d just run sp_who or sp_who2. However, since the problem was occurring between 2 and 3 am, nobody was terribly excited around getting up to monitor the server. We didn’t really want to leave Profiler running all night either. So, we set up the package to log the current database activity whenever an error occurred. Basically, in the OnError event handler, we just added a data flow. The data flow runs the sp_who stored procedure to get the current activity on the database, and saves it to a text file.

This was pretty simple, and I’ve created a sample package to show it. The main control flow has a script task that generates an error, purely for the purpose of firing the OnError event handler.

image

The event handler does a check to make sure this is the first time it’s fired, since we didn’t want to log the activity multiple times. It then runs a data flow, and once it completes, sets a flag to ensure it doesn’t doesn’t fire again.

image

The data flow runs sp_who. While sp_who2 gives better information, it doesn’t provide metadata in an easily consumable format for SSIS. sp_who, on the other hand, works right off, so we went with it for simplicity. The output is sent to a text file.

image

It was pretty simple to implement, and it served the purpose. By reviewing the information logged, we could tell what other processes were running, and what computer it originated from. If you ever encounter a similar situation, hopefully this can help. The template package is located on my SkyDrive.

When I originally started this blog, I wanted to cover agile development in BI/DW projects, as well as technical topics about BI. However, it didn’t work out quite that way. I’ve posted a few thing here and there about agile development on this blog, but for the most part it’s been focused on technical topics (and most of those about SSIS). So, to avoid any confusion on this blog, I’ve started a second one named BIpartisan (no, it’s not about politics :) ). On BIpartisan, I’ll be focusing a bit more on higher level topics around agile development and delivering value with BI, and not so much on technical ones. If you are interested in that, please check it out.

I will continue to post here as well on the more technical topics. I’ve got several posts planned, but it’s been tough finding time to put them together. Hopefully, things will settle down a bit and I’ll have more free time for blogging.

The launch event in Charlotte went very well – we had over 100 people in attendance, and the sessions were packed. We had a lot of good information on the new features in SQL Server 2008, covering the new T-SQL features, server consolidation, dimensional modeling, Analysis Services, and Reporting Services.

If you have follow up questions from any of the presentations, or are interested in seeing more information on the topics, send me an email and we can see about lining the presenter up for a more in-depth presentation.

There is a SQL Server 2008 Firestarter event at the Charlotte Microsoft campus next Thursday, January 15th. If you want to learn more about the new features in 2008, or just get a chance to network, it’s a great opportunity. We’ll have 5 SQL Server MVPs present, so it will be a good chance to ask some questions as well. If you are interested, please register at this link: http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032400468&Culture=en-US

More Posts Next page »