Posts tagged ‘SQL Server 2008’

Create XML Fragment in SQL Server

The XML datatype, first introduced in SQL Server 2005, can be very  handy when utilized properly. It is an industry standard, and with it, it is easy to import/export data across different servers, as well as different systems entirely. This tutorial will guide you through creating an XML fragment on the fly in SQL Server. Additional resources, as well as reference materials can be found at the bottom of this post. These instructions are valid for both SQL Server 2005 and SQL Server 2008.

First off, we need some sample data:

DECLARE @Students TABLE (
 FirstName nvarchar(50),
 LastName nvarchar(50),
 DisplayName nvarchar(100)
)
INSERT INTO @Students
SELECT 'Jim' as FirstName, 'Bob' as LastName, 'Jim Bob' as DisplayName
UNION
SELECT 'John', 'Doe', 'John Doe'
UNION
SELECT 'Jane', 'Doe', 'Jane Doe'
UNION
SELECT 'Yuri', 'Tao', 'Yuri Tao'

Now that we have our test data, we can attempt to create an XML string for each distinct row. Ultimately, we want our table to end up like the sample below:

XML

We could try our first option:

select * from @Students FOR XML RAW

However, this option returns an XML fragment for the entire dataset – not what we’re looking for here. We need an individual XML fragment for each row.

After searching around on the net and a few forums, I was finally able to get an answer:

SELECT *
FROM @Students s
CROSS APPLY
(
 SELECT
 (
 SELECT *
 FROM @Students t
 WHERE t.DisplayName = s.DisplayName
 FOR XML RAW
 ) x1
)x

The CROSS APPLY function basically creates a Cartesian product. In this case, it iterates over each row in our table and produces the desired XML result. However, if you were to run this query as-is on our sample data, you would notice that the XML output, while formatted as XML, isn’t of an XML datatype.

To fix this, simply convert the column, like so:

SELECT s.*, CONVERT(XML,x.x1) as ErrorData
FROM @Students s
CROSS APPLY
(
 SELECT
 (
 SELECT *
 FROM @Students t
 WHERE t.DisplayName = s.DisplayName
 FOR XML RAW
 ) x1
)x

That’s all there is to it! We now have an XML fragment for each row of data in our sample. As always, test the example for performance bottlenecks. I’ve utilized this 50,000 records and it returns in about 1 minute. Not too bad, given that the records are very wide (40-50 columns).

Later, I will write a post detailing querying and shredding XML data into a relational dataset – good stuff indeed!!

Additional Resources:

Death by SQL…an Act in Two Parts

CartoonHow common is it to run into performance issues with SQL Server? Daily?  Hourly? Maybe for you, it’s a common existence; and for that, I’m sorry. And how are some ways that you deal with performance degradation in SQL? I’d venture to say that, for most, it would involve juggling indexes, statistics, etc. on the tables in question. But what about going about this all differently?

What if we take a step back and look at the code itself? Maybe the code is the problem, not the server performance. Since running across SQLServerCentral in the early days of my BI experience, there were a few blog posts and articles which have stuck with me throughout. One such article, More RBAR and “Tuning” UPDATEs, has been of great help to me.

This article opened up my eyes to a completely different way of thinking when it comes to performance problems within SQL Server. I highly suggest reading it before continuing with the rest of my post here.

I ran into this “tuning” problem the other day when working with some Fact records that I was trying to tie to a Type 2 Dimension. I have about 37,000 school enrollment records for which I need to find the appropriate Student ID surrogate key among 273,000 different student records. It seemed pretty simple enough:

  • Link the Fact record to the Dimension record using the Student Number
  • Based upon the Fact record’s registration date column, place the record with the correct version of the Student

Act 1

There are two different ways to construct our SQL statement to get this job accomplished: either set-based or row-by-row (see RBAR above) Obviously, one is much more preferred above the other method. Take, for example, the code below (RBAR):

UPDATE [PreStage_FactSchoolEnrollment]
SET [AlternateStudentID] =
(SELECT
    (SELECT TOP 1 DS.[StudentID]
    FROM [DimStudent] DS
    WHERE DS.[EffectiveEndDate] >= [FactSchoolEnrollment].[RegistrationDate]
        AND DS.[StudentSISID] = [FactSchoolEnrollment].[Pupil_Number]
    ORDER BY DS.[EffectiveEndDate],DS.[StudentID])
FROM [FactSchoolEnrollment]
WHERE [PreStage_FactSchoolEnrollment].[ID] = [FactSchoolEnrollment].[ID])

Seems innocent enough, right? However, there is a huge performance issue with this query. Below is a screenshot of one particular piece of the actual execution plan from this query above. Remember our record counts: ~37,000 Fact records and ~273,000 Dimension records.

queryplan_indexspool

That’s right…that number circled above is over 8 BILLION rows that were created in memory!! (8,465,578,262 to be exact). This is the base problem with RBAR queries. In essence, this query, as it is currently structured, queried and stored the ENTIRE dimension (all 273,000 records) for EACH of the incoming Fact records (37,000). That is where the 8.4 Billion records are created. Notice that this update took over 48 minutes run. There isn’t an index in the world that is going to help this type of performance monster.

Act 2

Enter set-based SQL. How about we reconstruct this query as a set-based query instead? Look at the differences in the SQL below:

UPDATE PreStage_FactSchoolEnrollment
SET PreStage_FactSchoolEnrollment.AlternateStudentID = ISNULL(b.StudentID,-1)

FROM (SELECT ROW_NUMBER() OVER (PARTITION BY StudentSISID ORDER
BY EffectiveEndDate ASC) as rownum, Pupil_Number, sub_b.StudentID
FROM FactSchoolEnrollment sub_a
INNER JOIN DimStudent sub_b ON
    sub_a.Pupil_Number = sub_b.StudentSISID
WHERE (sub_b.EffectiveEndDate >= sub_a.RegistrationDate)) b
WHERE PreStage_FactSchoolEnrollment.Pupil_Number = b.Pupil_Number

This end result of this query is EXACTLY the same as the above query; the only difference is that this query took all of 9 seconds to return data. Now that’s a performance gain!

Followup

Now, understandably, it may not be feasible to rewrite your SQL code because of different constraints. But, if you can, at all, (and I’m pleading with you here), PLEASE try to rewrite the code itself. You will be surprised at how much of a difference syntax can make!

How to Restore a Corrupt Database in SQL Server 2008 (Error 1813)

I ran into this issue the other day, and thought that I’d blog about it. I had copied a database from a co-worker (MDF files only, unbeknown to me) that I needed for my work. However, he forgot to include the Log file for the associated database in the backup that I received(!) Consequently, when I went to restore the database onto my local machine, I was greeted with a very ‘friendly’ message from SQL Server saying that my database could not be restored.

If this had been any other situation, I would have just gone over to my coworker and asked him to give me a copy of the Log file as well (after having chastised him for giving me a bum backup). However, when this situation occurred, we were at two different physical locations, and I didn’t have a way to get over to where he was.

After googling around for a little bit, I ran across a TERRIFIC post that saved my bacon.

Basically what the guy did is created a dummy database, took SQL offline and swapped the MDF files out, brought SQL back online (effectively putting the database into SUSPECT mode), then put the database into EMERGENCY mode, and had SQL re-create the LDF file. Genius!

I can attest to the fact that this works on SQL Server 2008. My database was 27GB in size, so it may take a while to create the LDF file. For me it took about 30 minutes to recreate the file.

Hopefully this post will help someone else out with the issue that I was facing!

Inconsistent Background Colors in SSRS 2008

I ran across this issue a while back and figured I’d blog about my findings. After running my issue through Google, it turns out that I wasn’t the only one with this same problem.

In SSRS, you can use the following code expression to set alternating background colors on your rows as part of the Background Color property, making the data easier to read:

=IIF(RowNumber(Nothing) MOD 2,"WhiteSmoke","White"

The only problem with this code is this: If your matrix has grouping on the dataset, then there is the possibility that your background colors can be inconsistent. Look at the image below to see an explanation of what happens when you use the above code snippet as your background color:

image

Notice on this example how the first row is White, then the next 2 are WhiteSmoke, the next White, etc. These rows should be presented as shown below (alternating White to WhiteSmoke):

image

 

To accomplish this, I derived the background in SQL Server using a case statement, and then passed that column value as the background image in SSRS.

CASE
    WHEN DENSE_RANK() OVER (ORDER BY LastName) % 2 = 1 then 'WhiteSmoke'
    ELSE 'White'END as BGColor

I found a great write-up relating to the different ranking functions within SQL Server here; if you’re confused about any of the ranking functions, I would suggest it as a good read. It is written to SQL Server 2005, but is also applicable to SQL Server 2008. The way that I am using the ranking function here, the case statement performs a MOD 2 (% 2) function on the LastName. If the result = 1, then we pick WhiteSmoke as the background color. Else, we choose White.

Now, all that needs to be done is to include the BGColor column as part of your dataset, and then set the Background Color property in the cells that you want to alternate (in my case, I wanted the entire row to alternate colors, so I set BGColor as the Background Color in all of my cells).

Another advantage of this approach is centralized maintenance. Say, for instance, that you have a stored procedure in which you are incorporating this code, and that stored procedure is utilized by 5 different reports. If, for some reason, your user comes back saying that they want the alternating background colors to be different, then you have to physically touch each RDL file to accommodate this change. By determining the background inside SQL Server, you only have to make the change in one place — the stored procedure.

One thing to note: Whatever you use as your order column in your case statement (in my case, I used LastName), you have to ensure that your grouping in SSRS is also set to order by that column. Else, your row colors will once again be out of order.

Hope that this write-up helps someone work around the headache that I spun my wheels working through!

NOTE: If you enable sorting on any of your columnsin SSRS, then this approach will not work. What I have found is that you can use the function that I mentioned above, and all you need to do is to ensure that your data is presented to SSRS sorted by LastName (perform the sort in your dataset query in SSRS, and the also sort your data in the Report).

=IIF(RowNumber(Nothing) MOD 2,"WhiteSmoke","White"