Where’s John These Days?

Apologies for the lack of updates to the blog recently. It’s been a very busy time, but hopefully things will settle down a bit now.

Exciting news today (for me at least)! It was my first day as a Pragmatic Works employee. I’ve joined their product group, and will be helping manage the development of their BI tools. As I’ve commented on this blog before, one of the things I’m really passionate about is enabling BI developers to create solutions faster and more easily, and I’m looking forward to the opportunities that Pragmatic Works presents to continue doing exactly that. I also get to work with a great group of developers and some really sharp BI people, so it promises to be a lot of fun.

My excitement is tempered somewhat by sadness at leaving another great group of developers at Varigence. I enjoyed working with everyone there, and wish them success in their future endeavors.

In other news, I have a number of presentations coming up. I’ll be at SQLBits in London on March the 29th, presenting a precon with Matt Masson on SSIS Performance Design Patterns (space is limited, register now!). I also have a session on SSIS Unit Testing at SQLBits.

On April 14th, I’ll be presenting at SQL Saturday #111 in Atlanta, which is always a great time. I’ll be presenting on Tuning SSAS Processing Performance.

Last, but definitely not least, I was thrilled to find out that I’ll be presenting the Tuning SSAS Processing Performance session at SQL Rally in Dallas on May 10-11 as well. Please vote for one of my other sessions in the community choice options, if you see one that appeals to you. I’m really looking forward to seeing some of my friends from Texas again.


YTD Calculations

Whether you’re creating a report, chart, KPI, or calculated measure, at some point you’ll probably need to add year-to-date calculations. It doesn’t matter if you’re using SQL or MDX, the technique to generate the From and To dates for a YTD calculation remains the same.

The general concept goes like this:

Step 1: Get the current year and prefix it with “01/01/”. This is the From date.
Step 2: Get today’s date. This is the To date.

It seems simple enough, but if your ETL process runs daily and pulls “yesterday’s” data, then you’ll need to account for a 1-day lag on the To date.  For example, if today is June 15th, 2011, then your warehouse will only contain data up to June 14th.  Using the logic described above, the YTD calculation will select data from “01/01/2011” to “06/15/2011”, even though data for June 15th hasn’t yet been loaded into the warehouse.  Although not entirely accurate, it won’t hurt anything; the YTD calculation just won’t return any data for the 15th.

Now let’s say today is December 31st, 2011.  On this day the warehouse will only contain data up to December 30th, and the YTD calculation will select data from “01/01/2011” to “12/31/2011”.  Are you beginning to see the problem?  Here’s one last example to drive the point home: let’s advance one day to January 1st, 2012.  On this day the warehouse will only contain data up to December 31st, 2011, but the YTD calculation will select data from “01/01/2012” to “01/01/2012”.  In this case the YTD calculation will return an empty resultset, and the business users will never get a full 365-day view of the data.

Ok, so we know we have to update the YTD calculation to handle the one-day lag in the ETL.  The logic now looks like this:

Step 1: Get the current year of yesterday and prefix it with “01/01/”. This is the From date.
Step 2: Get yesterday’s date. This is the To date.
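The two steps above are trivial in any language; here’s a sketch of the lag-adjusted logic (shown in Python purely for illustration — the same arithmetic ports directly to SQL or MDX):

```python
from datetime import date, timedelta

def ytd_range(today):
    # The warehouse only holds data through yesterday (1-day ETL lag)
    yesterday = today - timedelta(days=1)
    from_date = date(yesterday.year, 1, 1)  # "01/01/" + yesterday's year
    return from_date, yesterday             # yesterday is the To date

print(ytd_range(date(2012, 1, 1)))  # (datetime.date(2011, 1, 1), datetime.date(2011, 12, 31))
```

Note that on January 1st, 2012 this correctly selects the full prior year instead of an empty range.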

To summarize, here’s what the YTD calculation will select using the original logic:

  • June 15th, 2011: “01/01/2011” through “06/15/2011”
  • December 31st, 2011: “01/01/2011” through “12/31/2011”
  • January 1st, 2012: “01/01/2012” through “01/01/2012”

And here’s what the YTD calculation will select using the new logic:

  • June 15th, 2011: “01/01/2011” through “06/14/2011”
  • December 31st, 2011: “01/01/2011” through “12/30/2011”
  • January 1st, 2012: “01/01/2011” through “12/31/2011”

As you can see, given the 1-day ETL lag, the new YTD calculation will always select the correct range of data, even on the first of the year.

But let’s take it a step further.  I propose working with the “last completed month” or even the “last completed week”.  This way the calculation always covers a stable, fully loaded window: January through the last completed month or week.  Keep in mind that we still have to account for the 1-day ETL lag.

This is what would happen if we decided to operate on the last completed month:

  • June 15th, 2011: “01/01/2011” through “05/31/2011”
  • December 31st, 2011: “01/01/2011” through “11/30/2011”
  • January 1st, 2012: “01/01/2011” through “12/31/2011”

In order to achieve this behavior we must implement the following logic:

Step 1: Take today’s month and year with “01” as the day, then subtract one day. This is the To date (the last day of the previous month).
Step 2: Get the year of the To date and prefix it with “01/01/”. This is the From date.
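Here’s the last-completed-month logic as a sketch, matching the selections listed above (again in Python for illustration only):

```python
from datetime import date, timedelta

def ytd_last_completed_month(today):
    # Last day of the previous month: the first of the current month, minus one day
    to_date = date(today.year, today.month, 1) - timedelta(days=1)
    # January 1st of the To date's year (not today's year, or the range
    # would be empty in early January)
    from_date = date(to_date.year, 1, 1)
    return from_date, to_date

print(ytd_last_completed_month(date(2012, 1, 1)))  # (datetime.date(2011, 1, 1), datetime.date(2011, 12, 31))
```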

The benefit of this approach is that business users have up to a month to analyze the data produced by year-to-date calculations.  However, it’s a double-edged sword, because those same users may also have to wait up to a month for new data to be included.  Admittedly, this approach may be a better choice for “rolling” calculations, but it will depend on what works best for the business.  In some cases using the last completed week may be preferred over the last completed month, since users still have time to analyze the data but don’t have to wait long for new data to be included in the calculation.


How to Export BLOB data to Files in SSIS

Overview

This topic covers how to export data from BLOB fields in SQL Server and save it out to the file system as files. This is accomplished with VB.Net code, but it could just as easily be ported to C#.

DBServer: SQL Server 2008 SP2
Development Environment: Visual Studio 2008

Package Setup

  1. Add an Execute SQL Task that retrieves the BLOB information from the Database
    • ResultSet: Full result set
    • Parameter Mapping: Add any necessary parameters for your stored procedure
    • Result Set
      • Result Name: 0 (zero)
      • Variable Name: Doesn’t matter, so long as the variable is of type System.Object
  2. Connect the Execute SQL Task to a For Each Loop Container (FELC)
    1. Collection: Foreach ADO Enumerator
    2. ADO object source variable: Result Set Variable Name from Step 1
    3. Enumeration mode: Rows in the first table
    4. Variable Mapping: Map out the columns from the Result Set to Package Variables (Index is 0-based)
      1. NOTE: Be sure to set up the BLOB output variable as a System.Object
  3. Add a Script Task to output the BLOB information to the file system

Script Code (VB.Net)


Imports System.IO

Public Sub SaveMemoryStream(ByVal buffer2 As Byte(), ByVal FileName As String)
    ' Write the byte array out to the target file
    Dim outStream As FileStream

    outStream = File.OpenWrite(FileName)
    outStream.Write(buffer2, 0, buffer2.Length)
    outStream.Flush()
    outStream.Close()
End Sub

Public Sub Main()
    Dim Folder_Path As String
    Dim File_Name As String
    Dim s_File_Name As String
    Dim buffer() As Byte
    Dim oStream As System.IO.MemoryStream = Nothing

    ' Append a trailing backslash so the folder and file name concatenate cleanly
    Folder_Path = Dts.Variables("Working_Directory").Value.ToString() + "\"
    File_Name = Dts.Variables("File_Path").Value.ToString()

    s_File_Name = Folder_Path & File_Name

    Try
        ' The BLOB column arrives in the package variable as a byte array
        buffer = CType(Dts.Variables("Blob").Value, Byte())
        oStream = New System.IO.MemoryStream(buffer)

        SaveMemoryStream(oStream.ToArray(), s_File_Name)

        Dts.TaskResult = ScriptResults.Success
    Catch ex As Exception
        Dts.Events.FireError(0, "Create Temp Image", ex.Message, "", 0)
        Dts.TaskResult = ScriptResults.Failure
    Finally
        ' Always close the stream, even on failure
        If oStream IsNot Nothing Then
            oStream.Close()
        End If
    End Try
End Sub

Deploying a custom SSRS Assembly to the GAC

I briefly mentioned a technique I used to deploy custom assemblies for SSRS to the GAC in my PASS presentation on Building a Reporting Services Framework.  I didn’t spend a lot of time on that, but I did get some questions on it, so I thought I’d go into more detail here.  Here’s a pointer to the documentation around custom assembly deployment from MS.  This post shows how to automate the process if you’re using the GAC method.

Here is a code sample showing how I perform the remote deployment.

Local Deployments

Before we actually get around to deploying to the server, we need to be able to test our custom assembly locally.  I use the following Post-Build Event in my custom assembly.  If you aren’t familiar with Build Events, you get to them by right-clicking on the project and choosing ‘Build Events’ on the left-hand side of the project properties.

copy "$(TargetPath)" "%ProgramFiles%\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies"

gacutil.exe -i "$(TargetPath)"

copy /Y "$(TargetPath)" "C:\_PASS_Demo\Deploy\Assemblies"

This build event does three things:

  1. Copies the dll to the PrivateAssemblies directory.  This makes the custom assembly available to BIDS so you can use it in Preview Mode.  You have to restart BIDS after updating the custom assembly for the new version to be loaded.
  2. GAC’s the custom assembly so it can be used by your local Reporting Services instance.  The new version won’t be live until the service is restarted, but I typically don’t include that restart in the build process because it takes a few seconds and I use it less often.
  3. Copies the dll to the Deployment section of my solution.  I typically source control the dll, and use the process described later in this post to deploy the assembly to other environments.

Remote Deployments

Here is the solution that I found for automating deployments to other servers.  You have to jump through a few hoops to be able to do this (at least with Windows Server 2008 R2).  If you search the web you’ll find a few other solutions to this problem, but I didn’t find any that met my specific needs.  The specific scenario I was trying to enable was unattended deployments.  If you log in to the machine, or if you’re able to provide credentials, this is an easier problem: if you collect credentials and use CredSSP, you can use PowerShell to do this remotely, and if you’re willing to send an admin password in clear text, you can use PsExec.  Neither of those solutions met my needs, so I chose the following approach:

  1. Copy the assembly to the target machine.
  2. Install the assembly using System.EnterpriseServices executed via a remote call.
  3. Verify the assembly was actually installed (the installation procedure doesn’t return failures).
  4. Restart the ReportServer service.

You might need to tweak the example scripts a little bit for your environment.  You might also want to dot source the DeploymentLibrary script (i.e., . .\DeploymentLibrary.ps1) in deploy_ssrs_assemblies.ps1.  I have all my libraries load as part of my profile, so the script may assume the library is already loaded.  The installation process just takes the path you deployed the assembly to (such as “\\Server01-dev\d$\Reports\deployment\assemblies\EltUtilities.dll”) and figures out the name of the assembly and the server from it.

The sample I provided is laid out as follows:

  1. Deployment – The root of the deployment structure.
    1. Assemblies – The assemblies we want to deploy.  I also typically include other libraries used for other types of automated deployments here as well.
    2. Dev – This contains a simple driver file that sets all the variables necessary for a deployment to the dev environment.  I usually have localhost, qa, and prod folders as well (identical scripts except for the variables).  The code that sets up the logging should probably be refactored into another method in the PowerShell library, but that’s on the TODO list.
    3. deploy_ssrs_assemblies.ps1 – The script that handles the orchestration of the deployment.  This is common, parameter driven functionality that is called by the drivers for each of the environments.
    4. DeploymentLibrary.ps1 – The methods that install the assembly on the remote server and restart the services.

There are a few moving pieces to this process, but the example should be pretty well documented. 

Conclusion

That’s about it.  If this solution doesn’t meet your specific needs, there are a few other ones out there (such as Remote GAC Manager) that might, or you can just use the standard PSExec approach.  Happy automating!


Demo Materials for PASS Session BIA-304–Building a Reporting Services Framework

I presented my session on ‘Building a Reporting Services Framework’ at the PASS Summit 2011 on Friday.  I had a great time at the summit, both presenting and attending all the great sessions.  A wonderful time was had by all.

Here’s a link to the demo materials I went through.  You probably don’t want to just push any of this out to production as is, but it should be a good start.  I also used the Adventure Works 2008 R2 database, but it isn’t included in this package.


When To Use NVARCHAR

The Short Answer

Only when you need to store multilingual (Unicode) data.

The Nitty Gritty

The fundamental unit of storage in a SQL Server database is an 8 KB page.  To paraphrase the MSDN, SQL Server reads and writes whole data pages and as a result, the maximum amount of data that can be contained within a single row on a page is 8,060 bytes (8 KB). 

The page size in turn limits the maximum size of VARCHAR, a variable-length non-Unicode character datatype, to 8,000 bytes.  Take note!  The operative word here is non-Unicode.  A non-Unicode character is stored using a single-byte code page and requires one byte of storage per character, meaning that a VARCHAR datatype may contain at most 8,000 characters.

In contrast, NVARCHAR is a variable-length Unicode datatype.  Unlike non-Unicode characters, Unicode characters are stored by SQL Server as UCS-2 and require two bytes of storage per character.  Because of the SQL Server page size, NVARCHAR datatypes have the same 8,000-byte limit as their VARCHAR cousins, which means an NVARCHAR datatype may contain, at most, 4,000 characters.  The net result is that NVARCHAR datatypes take up twice as much space as VARCHAR datatypes.  Said another way, NVARCHAR(4000) is the same size as VARCHAR(8000).

But the NVARCHAR datatype shouldn’t be dismissed entirely.  English text fits comfortably in a single-byte code page, making VARCHAR the datatype of choice.  However, other languages, such as Japanese, Hebrew, and Arabic, have characters that exist only in Unicode.  To store data in these languages you must use the NVARCHAR datatype, but that should be the only time.
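The storage arithmetic is easy to verify outside SQL Server.  For example, in Python (used here purely for illustration, with Latin-1 standing in for a single-byte code page and UTF-16LE for two-byte Unicode storage):

```python
text = "Resume"

# One byte per character in a single-byte code page (VARCHAR-style storage)
assert len(text.encode("latin-1")) == 6

# Two bytes per character in a two-byte Unicode encoding (NVARCHAR-style storage)
assert len(text.encode("utf-16-le")) == 12

# Characters outside the single-byte code page cannot be stored at all
try:
    "שלום".encode("latin-1")
except UnicodeEncodeError:
    print("Hebrew needs a Unicode type")
```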

Warehouse Beware

The rule for when to use an NVARCHAR datatype applies to both OLTP and OLAP systems.  If your OLTP system is riddled with unnecessary NVARCHAR datatypes, you may want to consider converting them to VARCHAR in the data warehouse to save space.  It’s tempting just to convert all NVARCHARs to VARCHARs, but you always have to ask yourself: was the column defined as an NVARCHAR for a reason?  This is a question you’ll need to have answered by the business users.  Is your company planning to expand globally?  If so, it might make sense to keep the NVARCHARs, but only for columns where it really matters.


Checking Whether a Character is Numeric

In SSIS, checking whether a value is numeric has always been a bit painful. Unfortunately, the SSIS expression language doesn’t have an IsNumeric function. Jamie Thomson (blog | twitter) has a nice post here that explains one way to do this check in a script component.

However, there is a shortcut you can take if you only need to check a specific character to see if it’s numeric. You can use the following expression:

FINDSTRING("0123456789", [CharacterToCheck], 1) != 0

If this returns True, the character is numeric. False means it’s not numeric.

This works by checking for the existence of the character in the string that contains all the numeric characters. If it finds an occurrence, then the character must be numeric. Obviously, this has some limitations over a real IsNumeric function, but if you just need to check single characters, it’s a lot faster than writing a script.
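The same shortcut is easy to mirror outside SSIS if you want to sanity-check the behavior; here’s the equivalent logic as a quick sketch (in Python, purely for illustration):

```python
def is_numeric_char(ch):
    # Mirrors FINDSTRING("0123456789", [CharacterToCheck], 1) != 0:
    # the character is "numeric" if it occurs in the digit string
    return len(ch) == 1 and ch in "0123456789"

print(is_numeric_char("7"))   # True
print(is_numeric_char("a"))   # False
print(is_numeric_char("42"))  # False - only single characters are supported
```

The last line illustrates the limitation mentioned above: this trick only works for single characters, not full strings.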


Naming Columns for the Script Component

Do you use whitespace or special characters in your column names? Most people don’t, because of the additional headaches it creates. You have to delimit the column names, come up with a work-around for tools that don’t support column names with special characters, etc. Underscores, however, are used pretty extensively in place of spaces. If you are using Script Components in SSIS, though, you may encounter an all-new headache with special characters or even underscores in your column names.

When you use a script component in SSIS, it generates some .NET code for you automatically, based on the metadata in the pipeline connected to the script component. However, when this code is generated, SSIS strips out any whitespace or special characters from the names of inputs, outputs, and columns. It only retains the letters and numbers (alphanumeric characters) in these names.

Here are some examples of column name issues that I’ve run into with scripts (and while these specific items are made up, they represent real-world scenarios I’ve encountered – there are some really horrible naming approaches out there):

Original Column Name    Script Column Name
Account                 Account
Account#                Account
Account Number          AccountNumber
Account_Number          AccountNumber
TI_TXN_ID               TITXNID
TI_TXNID                TITXNID

As you can see, once the alphanumeric characters have been stripped from these column names, they are no longer unique. That can pose a few problems in your script code. What’s worse, because this code is auto-generated by SSIS, you can’t fix it without changing the column names in the data flow, even though this is really purely a script issue (and not even a .NET limitation – underscores are perfectly valid in .NET naming). What’s even worse than that – you don’t get an error until the binary code is recompiled.

So, if you are working with script components, make sure all your column names are unique even when all non-alphanumeric characters have been stripped from them. The same thing applies to your output names – they must be unique based only on the alphanumeric characters.
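If you want to check a set of column names up front, the stripping behavior is easy to simulate. A rough sketch (in Python, purely for illustration; the regex mirrors SSIS keeping only letters and digits):

```python
import re
from collections import Counter

def script_name(column_name):
    # SSIS keeps only the alphanumeric characters when generating script code
    return re.sub(r"[^0-9A-Za-z]", "", column_name)

columns = ["Account", "Account#", "Account Number",
           "Account_Number", "TI_TXN_ID", "TI_TXNID"]

# Any generated name that appears more than once will collide in the script
counts = Counter(script_name(c) for c in columns)
duplicates = [name for name, n in counts.items() if n > 1]
print(duplicates)  # ['Account', 'AccountNumber', 'TITXNID']
```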


Using OLE DB Connections from Script Tasks

I write scripts on a pretty regular basis, and often need to access database connections from them. It’s pretty easy to do this if you are using an ADO.NET connection. However, if you are using OLE DB, you have to go through a couple of additional steps to convert the connection to an ADO.NET version. Matt Masson posted a great code sample for doing this conversion.

I use a slightly altered version of this code pretty regularly. It’s been modified to support both OLE DB and ADO.NET connections, so that I can switch connections without having to change the script code.

To use it, you need to add a reference in your script project to Microsoft.SqlServer.DTSRuntimeWrap. Then, add the following to the usings section at the top of the script:

using System.Data.Common;
using Wrap = Microsoft.SqlServer.Dts.Runtime.Wrapper;

For the code to get the connection, use the following:

ConnectionManager cm = Dts.Connections["MyConnection"];
DbConnection conn = null;
if (cm.CreationName == "OLEDB")
{
	Wrap.IDTSConnectionManagerDatabaseParameters100 cmParams =
	cm.InnerObject as Wrap.IDTSConnectionManagerDatabaseParameters100;
	conn = cmParams.GetConnectionForSchema() as DbConnection;
}
else
{
	conn = cm.AcquireConnection(null) as DbConnection;
}

if (conn.State == ConnectionState.Closed)
{
	conn.Open();
}

// TODO: Add your code here

conn.Close();
Dts.TaskResult = (int)ScriptResults.Success;

You can use the “conn” object to perform actions against the connection. Since it’s typed as the common DbConnection base class, you can use it against any database connection that you have an ADO.NET provider for (which includes OLE DB providers).


SQLSaturday and Touring the South

I’m sitting in the Seattle-Tacoma airport right now, waiting for my redeye flight back to Charlotte after a fun and productive week in Seattle.

When I get home, I’ll be jumping straight into a car with my family and driving for 7 or 8 hours. Why, you ask? To get to Birmingham, Alabama for SQLSaturday #81 on 7/30. I’m giving two sessions, Do More (ETL) with Less (Effort) – Automating SSIS and Handling Advanced Data Warehouse Scenarios in SSIS.

The following weekend, 8/6, I’ll be in Baton Rouge, LA for SQLSaturday #64, delivering the same sessions. If you happen to be attending either one, please look me up.
