Archive for the ‘Uncategorized’ Category.

Demo Materials for PASS Session BIA-302 – Building a Supportable ETL Framework

Last week I presented the session “” at the PASS 2013 Summit.

Here’s a link to the demo that I went through.

BIA-301 – Demo Materials

Demo Materials for PASS Session BIA-304–Building a Reporting Services Framework

I presented my session on ‘Building a Reporting Services Framework’ at the PASS Summit 2011 on Friday.  I had a great time at the summit, both presenting and attending all the great sessions.  A wonderful time was had by all.

Here’s a link to the demo materials I went through.  You probably don’t want to just push any of this out to production as is, but it should be a good start.  I also used the Adventure Works 2008 R2 database, but it isn’t included in this package.

Automatically Using the Latest Version of an SSRS Template

The ability to add Templates to SSRS (such as documented here and here and here) is a neat feature.  Basically, you just create an RDL and put it in the “C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\ProjectItems\ReportProject\” directory… now you can create a new report starting from the template by selecting it from the “New Items” dialog.  This is great when you don’t want to start from scratch on every report.

The only thing I’ve found to be a friction point is that every time someone updates the Template (which you’re storing somewhere in source control) you need to copy it over to that folder.  That’s no big deal, but when you have a lot of developers that all have to do that frequently, it’s a pain.

The easy way to solve this is with symlinks.  You can’t use a shortcut, but you can create a soft symlink to point back at your template in source control.  That way you will always create a report off of the latest version in your repository.  (Yeah, you still have to keep that up to date… no getting around that.)

mklink "C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\ProjectItems\ReportProject\TemplatePortrait.rdl"  "C:\SourceControl\frameworks\SsrsFramework\ReportingTemplates\TemplatePortrait.rdl"

mklink "C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\ProjectItems\ReportProject\TemplateLandscape.rdl" "C:\SourceControl\frameworks\SsrsFramework\ReportingTemplates\TemplateLandscape.rdl"

That’s all there is to it!

Using SSRS queries with parameters with Netezza

Our company has started using SQL Server Reporting Services 2008 R2 to handle most of our reporting against Netezza.  This works pretty well over all using the Netezza 6.0 OLE DB driver.  There are a few things to be aware of though.

I found a number of people were using expressions to construct a SQL statement to send for the data set.  This works well, but it’s a little bit of a pain to both write and maintain.  Luckily, we can use parameters (with just a few things to be aware of).  That means we can write SQL that looks like this:

    AND day_of_week IN (@day_of_week_multi)
    AND date IN (@date)
    AND date_key =  @date_key
    AND good_day = @good_day

You can use parameters in the normal way with Netezza, but there are a few details to be aware of.  You just imbed the parameter in the query like

WHERE cola = @param

and create a corresponding parameter in your data sets.  The OLE DB driver handle surrounding surrounding the parameter with quotes when required (strings and date times), and it automatically escapes any single quotes in the string.

Parameters are ordinal, so you need to create a parameter (in order) for each value you want to send. There parameters in your query will be replaced in order based on the parameters you specified in the dataset.  No big deal, but something to be aware of.  You can either use a either a normal parameter (such as ‘@myparam’) or a question mark (‘?’).

Hears the rub… the OLE DB driver also automatically escapes single quotes so you don’t have to worry about that.  This is great… but there is a little bit of an issue when you want to use multi-value parameters.  Basically, the normal method of using SQL like

WHERE col_a IN (@multi_value_param)

and an expression like this for a parameter

=JOIN(@multi_param, "’,’”)

doesn’t work because the single ticks will always be escaped (so you’ll see a value like “’val1’’,’’val2’’,’’val3’” instead of “’val1’,’val2’,’val3’” in the resulting SQL).  No problem… we just need a different approach.  We’ll use something in the WHERE clause like

STRPOS(@multi_param, '|' || col_a|| '|') > 0

with an expression like

=”|” + JOIN(@multi_param, "|”) + “|”

This will bracket each value in a special “end of column” character such as “|”, and thus look for an exact match.  You may need to use a different character, or handle escaping, depending on your data.  You can always use multiple characters if you’d like (such as “~~~” as a delimiter).  This is a little hacky, but not too bad.  Remember to add the delimiter to the start and end of the string as well so you can match those values as well.

I tested this on a 1.5 billion row table against a similar query that uses an IN clause.  I found the two methods were essentially equal in performance, though that might degrade for an extremely large number of parameter values.  These both provide exact matches against the value, and they provide equivalent results.

But… what about using a multi-select on a date?  Unfortunately, that get’s slightly more complicated, but not too bad.  All you need to do is write a little code such as

    ''' <summary>
    ''' Convert an array of Dates into a string.
    ''' </summary>
    ''' <param name="inputArray">Array of dates to convert.</param>
    ''' <param name="delimiter">Delimiter to place between each item in the array.</param>
    ''' <param name="formatString">Format string for the date.</param>
    ''' <returns>String of formatted datetimes separated by a delimiter.</returns>
    ''' <remarks></remarks>
    Public Function JoinDate(ByVal inputArray() As Object, ByVal delimiter As String, ByVal formatString As String) As String
        Dim output As String = String.Empty

        Dim i As Integer
        For i = 0 To inputArray.Length - 1
            output += (CDate(inputArray(i))).ToString(formatString) + delimiter
        Next i

        ' Trim the trailing delimiter off the string
        If output.Length > 0 Then
            output = output.Substring(0, output.Length - delimiter.Length)
        End If

        Return output

    End Function

And use an expression like

="|" + Code.JoinDate(Parameters!date.Value, "|", "yyyy-MM-dd") + "|"

And you’re good to go!  This will convert a multi-select DateTime variable (which is an array of DateTime objects) into a string containing dates formatted as you request.

So for this example where we are constraining on a multi-value string parameter, a multi-value date parameter, plus a few other standard guys, we end up with

    AND strpos(@day_of_week_multi, '|' ||day_of_week || '|') > 0
    AND strpos(@date, '|' || date || '|') > 0
    AND date_key =  @date_key
    AND good_day = @good_day

Here’s a sample that shows how to use integer, datetime, and string values (with some examples of multi-value parameters thrown in).  Just use the SQL script to build and populate the test table, set the data source appropriately, and play around with it.

Extracting MySql UTF-8 fields with SSIS

Last week I ran into an interesting task with SSIS.  I’m pulling data from a MySql database (5.0.79-enterprise) using the MySql ADO.Net Connector with SSIS and loading the data into a SQL Server 2008 R2 database.  This has worked pretty well, but I ran into a few issues when dealing with UTF-8, so I thought I’d document some of my steps here.

Before we dive into the code, here’s a little information on how UTF-8 is handled in MySql and SQL Server in case you’re not familar with it.  Here’s some more information on Unicode that’s good reading material as well.

MySql doesn’t have the concept of national data types (such as nvarchar or nchar).  This is handled by setting the CHARSET of each table (which can be latin1, UTF-8, etc.).  You can read more about how MySql handles unicode characters in the MySql documentation.  One thing to note is that MySql support UTF-8 and UCS2, but you need to be aware of a few details about the implementation… The UTF-8 implementation does not use a BOM, and the UCS-2 implementation uses big-endian byte order and does not use a BOM.  The ADO.Net Connector doesn’t allow you to set a Code Page when you source the data.

SQL Server stores unicode data in national fields (nchar, nvarchar) using a little-endian UCS-2 encoding.  See for more details on this (it was written for 2005, but is applicable to 2008 R2 as well). UCS-2 is a predecessor of UTF-16. UCS-2 differs from UTF-16 in that UCS-2 is a fixed-length encoding that represents all characters as a 16-bit value (2 bytes), and therefore does not support supplementary characters. UCS-2 is frequently confused with UTF-16, which is used to internally represent text in the Microsoft Windows operating systems (Windows NT, Windows 2000, Windows XP, and Windows CE), but UCS-2 is more limited.  You may note that the UCS-2 implementations between the two systems are different, so you will have to transform the strings when transferring data between the two systems.

Our source system has been around for quite awhile… it started off only supporting latin character sets, but as the company grew we had to handle international characters as well.  Some tables were created using a UTF-8 character set, but some were never converted from latin1… the front end just started inserting UTF-8 strings into the fields.  This means that in certain cases, we have different encodings in the same field which have to be handled.  This doesn’t materially affect the details of how I implemented this solution, but it does mean that some of the built-in conversion function in MySql won’t necessarily behave as expected, and that you sometimes have to handle a field differently based on when it was stored in the database.

Getting Started

So how do you even know you have a problem like this?  The issue is how the data is represented in each system.   I was trying to get a consistent representation of my data across multiple systems in Linux and Windows, and through a variety of client tools.  Particularly if you don’t control the source of your data, you need to determine if it is correct, if there is an encoding issue, or if there is just a display issue.  One thing that is important is to make sure your tools can actually handle displaying these characters… some can’t.  When in doubt, I’d always fall back to something like Notepad++ with A Hex Editor plug-in.  Here’s an example of a word expressed in hex (which is correct in the source system), the word as it is displayed in the source system (where it was incorrectly encoded), and the word as it should be expressed when encoded correctly.


When I was initially looking at the data, I tended to focus on a few rows/columns where I could easily see the data was incorrect.  I found Japanese and German to be the easiest for this… Japanese tends to display as “all or nothing” being correct, where as characters such as umlauts in German will be displayed differently in each encoding, giving you a good clue when things are right and wrong.  I find I used a lot of functions such as “HEX()” in MySql and “CAST(xxx AS varbinary)” in SQL Server to look at the hex representations, and I will often dump query results to a text file and look at it in Notepad++ to verify what I see.

The Approach

I’m pulling data in from the source system directly, not landing it in a file before loading it in.  That approach can be used to avoid some of these issues… depending on how the data is stored in the source system, you could just create a Flat File source and import that data using a Code Page of 65001 to transform the Unicode characters.  In my particular situation, because of how the data was stored, this wasn’t possible… even if I did want to land the data multiple times, which I didn’t.

To start, I created an ADO.Net source to pull data from the source.  All the fields from the source are typed as WSTR… but they still have to be translated from UTF-8 to Windows Unicode.  The most reliable way I found to do this was to create a query like this:

    ,CAST(COALESCE(NULLIF(field1, ''), ' ') AS binary) AS field1_binary
    ,CAST(COALESCE(NULLIF(field2, ''), ' ') AS binary) AS field2_binary

The purpose of this is to convert the string to binary (so SSIS will see it as a BYTESTREAM).  So why the the NULLIF and COALESCE, you’re probably asking?  SSIS doesn’t like null byte streams… they cause an error (more on that later).  Unfortunately, when you try and cast an empty string to a binary, it is transformed into a null.  I haven’t found a way around that, and further haven’t found a way to COALESCE that null back into anything.  It looks like once it becomes a null, it stays a null.  The solution I found was to convert all nulls and empty strings to a single space, then convert that back to a null downstream.  This isn’t optimal, but it works fine in my situation so I’m OK with it.

Once we get each of these strings inside SSIS as byte streams, we need to convert them from UTF-8 byte streams into Unicode byte streams.  This isn’t difficult to do in C#, so we just need to create a Script Transform.  You use the binary fields as inputs, then create WSTR outputs for field1 and field2.  Then we use a method that looks like this:

public static string GetUnicodeFromUtf8ByteStream(byte[] input, bool convertEmptyStringToNull)
    // Create a UTF-8 string from the UTF-8 byte stream
    string inputAsUtf8 = System.Text.Encoding.UTF8.GetString(input, 0, input.Length); 

    // Opportunity to short-circuit; if the string is empty, and
    // the user wants to return nulls for empty strings, go ahead
    // and return a null.
    if (convertEmptyStringToNull &amp;&amp; inputAsUtf8.Trim().Length == 0)
        return null;

    // Convert the  UTF-8 encoded string into a Unicode byte stream
    byte[] convertedToUnicode = System.Text.Encoding.Unicode.GetBytes(inputAsUtf8);
    // Convert the Unicode byte stream into a unicode string
    string output = System.Text.Encoding.Unicode.GetString(convertedToUnicode); 

    // Return the correctly encoded string
    return output;

I also created a separate method

public static string GetUnicodeFromUtf8ByteStream(byte[] input)
    return GetUnicodeFromUtf8ByteStream(input, true);

To provide default behavior on how to handle empty strings.  I used this to work around the issue where empty strings don’t come across from MySql.  If you have to differentiate between nulls and empty strings, you’ll need to come up with a work around.

You could also probably just use the method System.Text.Encoding.Convert(Encoding srcEncoding, Encoding dstEncoding, byte[] bytes), but I wanted more control over the transformation.  I haven’t tested that, but it should work.

I have these methods (along with another few transforms) in a custom assembly, but you can put this directly into the transformation component.  Then, you just need code like this in your ProcessInputRow method:

// Convert fields that are stored in UTF-8 format into Unicode
Row.field1 = Converter.GetUnicodeFromUtf8ByteStream(Row.field1binary);
Row.field2 = Converter.GetUnicodeFromUtf8ByteStream(Row.field2binary);

This converts a field containing UTF-8 data into a proper Unicode string inside of SQL Server.

The Performance

Of course, any time you do something like this there is the question of performance.  I initially ran this test on a set with about 3.5 million records, with 6 fields I was performing the conversion on.  Here are some numbers I came up with running each of these cases a few times.  The hit is a few percent, but it isn’t that huge.  I saw roughly the same performance when scaling up to sets of around 100 million rows or so.

Test Time
Select (no casting, original fields, throwing away the data) 1:25
Select (coalesces, nullifs, casting) 1:26
Select (coalesces, nullifs, casting) + transformation 1:34

Some Things that Didn’t Work

One thing that annoyed me about this solution was the COALESCE and NULLIF handling.  Without this, though, a byte stream column will fail in SSIS.  I did try changing the ErrorRowDisposition from RD_FailComponent to RD_IgnoreFailure.  That allows nulls to come through.  Unfortunately, at least in my sample, I found that doing this more than doubled the time it took to import the data.  And even then, you have to use a Derived Column transform to create a flag column (on whether or not each field is null), then you have to handle nulls vs. non-nulls differently in the script transforms.  It was a nice thought – and could work for some applications – but it wasn’t a good fit for my solution.


The biggest issue I had doing all of this was figuring out what was stored in the source system, and how to transform it.  There were actually a few extra flavors of data in the source system, but the approach above worked for all of them.

Migration to Word Press

We recently migrated this site from Community Server to Word Press. Most of the posts came over pretty well, but some of the code samples and pictures didn’t make it. I’ll try and get these updated and fixed over the next week. Thanks for your patience…

Presenting ‘Enterprise Class Analysis Services Development’ at the Vancouver PASS Chapter

Hey Guys!

I’m going to be giving a presentation on Enterprise Class Analysis Services Development at the Vancouver PASS Chapter Friday, March 12th.

I’ll be talking about some of the topics I’ve blogged about here, including working with multiple developers, using a custom MSBuild task to build SSAS Databases, and reviewing a monitoring/reporting solution for SSAS.  If you don’t happen to be in Vancouver, you can always attend virtually.  :)

Title: Enterprise Class Analysis Services Development

SSAS is one of the most popular tools for OLAP, but many organizations experience challenges when attempting to use their standard development best practices with the tool. This session will cover many topics around enterprise development practices for SSAS, including how to effectively use source control with multiple developers, enable robust automated build/deployment strategies, implement usage monitoring and tracking solutions, and support unit testing for SSAS solutions.

Objective 1: Demonstrate how to use source control with multiple developers.

Objective 2: Show techniques to automate builds and enable robust deployment strategies.

Objective 3: Review strategies for robust monitoring of multiple SSAS deployments for development, administrative, and business purposes.

Using MSBuild with SQL Server Analysis Services Projects

I’ve written several blogs and community samples on working with SSAS Projects directly using AMO (instead of SSAS Databases on an Analysis Services server).  I was travelling this weekend, and got a chance to create a sample MSBuild task that will generate a .ASDatabase file directly from a Visual Studio project, without requiring Visual Studio itself.  This means that multiple developers can work on a project, check-in files via source control, and can schedule an automated build, build on a dedicated “clean” machine (without VS), or any of a number of other scenarios.

I added the custom MSBuild task to the Analysis Services Community Samples project on CodePlex under the SsasHelper sample.

The Build task code itself is trivial.  I won’t go into a lot of detail on that, has creating/debugging custom tasks is well documented (you can start the library with MSBuild.exe as the external program and the project file as the argument).  I used it as a wrapper for previously developed ProjectHelper code that does all the heavy lifting.  All we do is inherit from Microsoft.Build.Utilities.Task and implement the Execute method.  All I do in the task is de-serialize the project (based on the Visual Studio project passed in), validate the project (based on the target version of SSAS), and write out the .ASDatabase file.  This could of course be modified to use another method to deploy the database, but I’ve been using the .ASDatabase method for awhile with no issues.

Here’s the main code for the method:

Database database = ProjectHelper.DeserializeProject(SsasProjectFile);

// ... Verify our project doesn't have any errors ...
ValidationResultCollection results;

bool isValidated = ProjectHelper.ValidateDatabase(database, SsasServerEdition, out results);

// If the database doesn't validate (i.e., a build error)
// log the errors and return failure.
foreach (ValidationResult result in results)

if (!isValidated)
return false;

// Build the .ASDatabase file
ProjectHelper.GenerateASDatabaseFile(database, SsasTargetFile);
catch (Exception ex)
return false;

return true;

So… how do we actually use this?  I included a sample project file in the SsasBuilder project.  The basic idea is you have a project file (a file that ends in “.[optional prefix]proj”, such as “.proj”, “.csproj”, etc.).  You can call this via MSBuild.  Note that the standard SSAS project file DOES NOT work with MSBuild.  The schemas required for that project conflict with the MSBuild schema, so you’ll have to create another project file, or build the build step into somewhere else.  Here’s an example project file:

   1: <Project xmlns="">

   2:     <UsingTask TaskName="SsasBuilder.SsasBuildASDatabaseFileTask"

   3:         AssemblyFile="C:\TFS\SsasHelper\SsasBuilder\bin\debug\SsasBuilder.dll"/>

   4:     <Target Name="BuildASDatabaseFile">

   5:         <SsasBuildASDatabaseFileTask SsasProjectFile = "C:\Test\enterprise_Gold\Adventure Works DW 2008.dwproj"

   6:             SsasTargetFile = "C:\Test\SsasBuildTest\AdventureWorks.ASDtabase"

   7:             SsasServerEdition = "Enterprise" />

   8:     </Target>

   9: </Project>

Here I’m using absolute paths, but you can use either properties or relative paths as required for your particular project.  You just use a UsingTask tag to point to the assembly containing the build task, then use the task in a target.  For this sample I’m I’m taking the SSAS project file and target filename, along with the server edition, as parameters.  If there are no errors in the project file, the .ASDatabase file will be generated in the specified location.

Now, all you have to do is call “MSBuild.exe <Whateveryounamedyourproject>”, and you’ll get a .ASDatabase file out of it…



Validating SSAS Projects Programmatically

Earlier this week I got a feature request from someone someone that was looking in to my SsasHelper sample on the Microsoft SQL Server Community Samples:  Analysis Services site on CodePlex.  She was interested in Building a SQL Server Analysis Services .ASDatabase file from a Visual Studio SSAS Project, but pointed out that I didn’t actually *validate* a project before creating the .ASDatabase file, so if someone had checked in a project with errors, the whole process could blow up (or we might deploy a database with some issues).  I looked into doing this, and it turns out it’s really easy to accomplish.  I updated the code in SsasHelper sample on CodePlex to show how to do this.

The actual code is really simple:

   1: bool doesBuild = false;

   2: results = new ValidationResultCollection();


   4: // We have to provide a ServerEdition for this method to work.  There are 

   5: // overloads that look like the will work without them, but they can't be used

   6: // in this scenario.

   7: // The ServerEdition might need to be changed for your situation.

   8: // This can be modified to return warnings and messages as well.

   9: doesBuild = database.Validate(results, ValidationOptions.None, ServerEdition.Developer);


  11: return doesBuild;

You can use the method I created like so:

   1: Database database;

   2: bool hasErrors = false;


   4: // Load a SSAS database object based on a BIDS project

   5: database = ProjectHelper.DeserializeProject(ssasProjectFile);


   7: // ... Verify our project doesn't have any errors ...

   8: ValidationResultCollection results;

   9: hasErrors = ProjectHelper.ValidateDatabase(database, out results);


  11: foreach (ValidationResult result in results)

  12: {

  13:     Console.WriteLine(string.Format("{0}", result.Description));

  14: }


  16: Console.WriteLine(string.Format("Project is Error Free?  {0}", hasErrors));


  18: Console.WriteLine("Project validated!");

This will take a Database object and validate it to see if there are any errors.  You can modify it to return Warnings and Messages as well by changing the Validation Options.

One thing to note is that the Server Edition is a required parameter.  There are a few overloads of the Validate method that don’t require this parameter, but what they try and do is walk up the object tree to get the Server object (associated with the Database) and retrieve the edition.  Since I’m de-serializing a project into a Database object, this property isn’t available (and can’t be set).  It is important to use this method with the correct Server Edition.  The validation process will throw errors if you use some (but I don’t think *all*) of the features for a different edition.  For example, if your project includes Translations (an Enterprise-only feature), validation will succeed if you use ServerEdition.Developer or ServerEdition.Enterprise, but will fail if you use ServerEdition.Standard.



Filtering Objects in PowerShell based on a List of Accepted Values

I was writing a script the other day where I want to return a collection of Services based on the name.  It took me a few minutes to figure out how to do this, so I thought I’d jot it down.  Nothing revolutionary, but I’ve definitely found this pattern to be handy.

Let’s start with a comma separated list of objects we want to filter by.  In this case I only want to return objects that start with “a”, “c”, “e”, or “g”.  I’ll take that list, and split it into an array.

# List we want to filter against
# Split it into an array
$filterList = ‘a,c,e,g’.Split(",")

Next, I’ll generate a collection of objects to try and filter.  I’ll just use a function to make life easy:

# Function to generate a collection of objects to test on
function Create-TestCollection {
    # Turn this into an object to filter
    $objectList = 'a,b,c,d,e,f,g'.Split(",")

    foreach($object in $objectList)
        $output = new-object PSObject
        $output | add-member noteproperty Name $object

        Write-Output $output

# Populate the variable to test with our collection of objects
$collection = Create-TestCollection

Now $collection contains a list of objects with a single Name property containing “a”, “b”, “c”, “d”, etc.

Now list filter $collection using a Where:

Write-Host "Objects from the Collection where the Name property exists in an Array"
# Filter the collection of objects based on a CSV list
$collection | Where {$_.Name -and $filterList -eq $_.Name }

And it will return only the objects from our initial comma separated list:


This also works with other strings:

# An array we want to filter
$collection2 = 'a,d,g,j,m'.Split(",")

Write-Host "Strings in an Array where that exist in an Array"
# Filter the collection of objects based on a CSV list
$collection2 | Where {$_ -and $filterList -eq $_ }

Strings in an Array where that exist in an Array