Create XML Fragment in SQL Server

The XML datatype, first introduced in SQL Server 2005, can be very handy when used properly. XML is an industry standard, and it makes importing and exporting data across different servers, or across different systems entirely, much easier. This tutorial will guide you through creating an XML fragment on the fly in SQL Server. These instructions are valid for both SQL Server 2005 and SQL Server 2008.

First off, we need some sample data:

[sourcecode language="sql"]
-- Sample data: a table variable holding a few students
DECLARE @Students TABLE (
    FirstName nvarchar(50),
    LastName nvarchar(50),
    DisplayName nvarchar(100)
)

INSERT INTO @Students
SELECT 'Jim' AS FirstName, 'Bob' AS LastName, 'Jim Bob' AS DisplayName
UNION
SELECT 'John', 'Doe', 'John Doe'
UNION
SELECT 'Jane', 'Doe', 'Jane Doe'
UNION
SELECT 'Yuri', 'Tao', 'Yuri Tao'
[/sourcecode]

Now that we have our test data, we can attempt to create an XML string for each distinct row. Ultimately, we want our table to end up like the sample below:

[Screenshot: the desired result set, with each student row accompanied by a column holding that row's XML fragment]

We could try our first option:

[sourcecode language="sql"]
SELECT * FROM @Students FOR XML RAW
[/sourcecode]

However, this option returns an XML fragment for the entire dataset – not what we’re looking for here. We need an individual XML fragment for each row.
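To make that concrete, the statement above returns a single value along these lines for our four sample rows, with every row folded into one fragment (wrapped here for readability; row order isn't guaranteed):

[sourcecode language="xml"]
<row FirstName="Jane" LastName="Doe" DisplayName="Jane Doe"/>
<row FirstName="Jim" LastName="Bob" DisplayName="Jim Bob"/>
<row FirstName="John" LastName="Doe" DisplayName="John Doe"/>
<row FirstName="Yuri" LastName="Tao" DisplayName="Yuri Tao"/>
[/sourcecode]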

After searching around on the net and a few forums, I was finally able to get an answer:
[sourcecode language="sql"]
SELECT *
FROM @Students s
CROSS APPLY
(
    SELECT
    (
        SELECT *
        FROM @Students t
        WHERE t.DisplayName = s.DisplayName
        FOR XML RAW
    ) x1
) x
[/sourcecode]

Despite the name, CROSS APPLY isn't quite a Cartesian product: the APPLY operator evaluates the right-hand table expression once for each row of the outer table. In this case, it iterates over each row in our table and produces the desired XML result for that row. However, if you run this query as-is on our sample data, you'll notice that the XML output, while formatted as XML, isn't of the XML datatype; the subquery hands it back as character data.

To fix this, simply convert the column, like so:
[sourcecode language="sql"]
SELECT s.*, CONVERT(XML, x.x1) AS StudentXml
FROM @Students s
CROSS APPLY
(
    SELECT
    (
        SELECT *
        FROM @Students t
        WHERE t.DisplayName = s.DisplayName
        FOR XML RAW
    ) x1
) x
[/sourcecode]
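As an aside (a variation that isn't in the original approach, but is standard T-SQL from 2005 onward): the TYPE directive makes FOR XML return a true xml value, which lets you skip both the CROSS APPLY and the explicit CONVERT:

[sourcecode language="sql"]
-- FOR XML ..., TYPE returns the xml datatype directly, so no CONVERT is needed
SELECT s.*,
    (
        SELECT t.*
        FROM @Students t
        WHERE t.DisplayName = s.DisplayName
        FOR XML RAW, TYPE
    ) AS StudentXml
FROM @Students s
[/sourcecode]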

That’s all there is to it! We now have an XML fragment for each row of data in our sample. As always, test the example for performance bottlenecks in your own environment. I’ve used this approach on 50,000 records, and it returned in about a minute; not too bad, given that the records are very wide (40–50 columns).

Later, I will write a post detailing how to query and shred XML data into a relational dataset – good stuff indeed!

Death by SQL…an Act in Two Parts

How common is it to run into performance issues with SQL Server? Daily? Hourly? Maybe for you it's a common occurrence; and for that, I'm sorry. And what are some of the ways you deal with performance degradation in SQL? I'd venture to say that, for most, it involves juggling indexes, statistics, and the like on the tables in question. But what if we went about this entirely differently?

What if we take a step back and look at the code itself? Maybe the code is the problem, not the server's performance. Ever since I ran across SQLServerCentral in the early days of my BI career, a few of its blog posts and articles have stuck with me. One such article, More RBAR and “Tuning” UPDATEs, has been of great help to me.

This article opened up my eyes to a completely different way of thinking when it comes to performance problems within SQL Server. I highly suggest reading it before continuing with the rest of my post here.

I ran into this “tuning” problem the other day when working with some Fact records that I was trying to tie to a Type 2 Dimension. I have about 37,000 school enrollment records for which I need to find the appropriate Student ID surrogate key among 273,000 different student records. It seemed simple enough:

  • Link the Fact record to the Dimension record using the Student Number
  • Based upon the Fact record's registration date, match each record to the correct version of the Student
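For reference, here is a minimal, hypothetical sketch of the tables involved, inferred purely from the columns the queries below touch; the real warehouse tables are of course much wider:

[sourcecode language="sql"]
-- Hypothetical, pared-down shapes inferred from the queries below
CREATE TABLE PreStage_FactSchoolEnrollment (
    ID int PRIMARY KEY,
    Pupil_Number int,          -- student business key
    RegistrationDate datetime,
    AlternateStudentID int     -- Type 2 surrogate key to be filled in
)
-- FactSchoolEnrollment (also referenced below) shares this shape, joined on ID

CREATE TABLE DimStudent (
    StudentID int PRIMARY KEY, -- surrogate key: one per student version
    StudentSISID int,          -- student business key
    EffectiveEndDate datetime  -- end of this version's validity window
)
[/sourcecode]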

Act 1

There are two different ways to construct our SQL statement to accomplish this job: set-based, or row-by-row (RBAR, short for “row-by-agonizing-row”; see the article above). Obviously, one method is much preferred over the other. Take, for example, the code below (RBAR):

[sourcecode language="sql"]
-- RBAR: the correlated subquery runs once per incoming fact row
UPDATE [PreStage_FactSchoolEnrollment]
SET [AlternateStudentID] =
(SELECT
    (SELECT TOP 1 DS.[StudentID]
     FROM [DimStudent] DS
     WHERE DS.[EffectiveEndDate] >= [FactSchoolEnrollment].[RegistrationDate]
         AND DS.[StudentSISID] = [FactSchoolEnrollment].[Pupil_Number]
     ORDER BY DS.[EffectiveEndDate], DS.[StudentID])
 FROM [FactSchoolEnrollment]
 WHERE [PreStage_FactSchoolEnrollment].[ID] = [FactSchoolEnrollment].[ID])
[/sourcecode]

Seems innocent enough, right? However, there is a huge performance issue with this query. Below is a screenshot of one particular piece of the actual execution plan from this query above. Remember our record counts: ~37,000 Fact records and ~273,000 Dimension records.

[Screenshot: a portion of the actual execution plan, showing an Index Spool operator with an actual row count of 8,465,578,262]

That’s right…that number circled above is over 8 BILLION rows created in memory (8,465,578,262, to be exact)! This is the core problem with RBAR queries. In essence, this query, as currently structured, queried and stored the ENTIRE dimension (all ~273,000 records) for EACH of the ~37,000 incoming Fact records; that is where the 8.4 billion rows come from. Notice that this update took over 48 minutes to run. There isn't an index in the world that will help this type of performance monster.
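If you'd like to see this for yourself, one generic way (not specific to this post) to capture the actual plan for a statement is:

[sourcecode language="sql"]
-- Returns the actual execution plan XML alongside each statement's results
SET STATISTICS XML ON;

-- ... run the UPDATE in question here ...

SET STATISTICS XML OFF;
[/sourcecode]

Clicking the returned plan XML in Management Studio opens the graphical plan, where per-operator row counts like the one above are visible.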

Act 2

Enter set-based SQL. How about we reconstruct this query as a set-based query instead? Look at the differences in the SQL below:

[sourcecode language="sql"]
UPDATE PreStage_FactSchoolEnrollment
SET PreStage_FactSchoolEnrollment.AlternateStudentID = ISNULL(b.StudentID, -1)
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY StudentSISID
                                ORDER BY EffectiveEndDate ASC) AS rownum,
             sub_a.Pupil_Number, sub_b.StudentID
      FROM FactSchoolEnrollment sub_a
      INNER JOIN DimStudent sub_b
          ON sub_a.Pupil_Number = sub_b.StudentSISID
      WHERE (sub_b.EffectiveEndDate >= sub_a.RegistrationDate)) b
WHERE PreStage_FactSchoolEnrollment.Pupil_Number = b.Pupil_Number
    AND b.rownum = 1 -- mirrors the TOP 1 above; the original computed rownum but never filtered on it
[/sourcecode]

The end result of this query is EXACTLY the same as the one above; the only difference is that this query took all of 9 seconds to return. Now that's a performance gain!
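If you want to verify timings like these on your own data, SQL Server's built-in statistics output makes a quick harness (a generic technique, not something from this particular exercise):

[sourcecode language="sql"]
-- Prints CPU/elapsed time and logical reads for each statement on the Messages tab
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- ... run the RBAR version, then the set-based version, and compare ...

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
[/sourcecode]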

Followup

Now, understandably, it may not always be feasible to rewrite your SQL code because of various constraints. But if you can at all (and I'm pleading with you here), PLEASE try to rewrite the code itself. You will be surprised at how much of a difference the syntax can make!

How to Restore a Corrupt Database in SQL Server 2008 (Error 1813)

I ran into this issue the other day and thought I'd blog about it. I had copied a database from a co-worker (the MDF file only, unbeknownst to me) that I needed for my work. He had forgotten to include the log file for the database in the backup I received(!) Consequently, when I went to restore the database onto my local machine, I was greeted with a very ‘friendly’ message from SQL Server, error 1813, saying that my database could not be restored.

If this had been any other situation, I would have just gone over to my coworker and asked him to give me a copy of the Log file as well (after having chastised him for giving me a bum backup). However, when this situation occurred, we were at two different physical locations, and I didn’t have a way to get over to where he was.

After googling around for a little bit, I ran across a TERRIFIC post that saved my bacon.

Basically, what he did was create a dummy database with the same name, take SQL Server offline, swap in the real MDF file, and bring SQL Server back online (which leaves the database in SUSPECT mode). He then put the database into EMERGENCY mode and had SQL Server re-create the LDF file. Genius!
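I don't have the exact script from that post in front of me, but the EMERGENCY-mode sequence generally looks something like the sketch below. MyDatabase is a placeholder name, and REPAIR_ALLOW_DATA_LOSS does exactly what its name warns, so reserve it for cases like this where there is no log to recover:

[sourcecode language="sql"]
-- Sketch only: assumes the real MDF has already been swapped in and the
-- database is showing as SUSPECT. 'MyDatabase' is a placeholder name.
ALTER DATABASE MyDatabase SET EMERGENCY;
ALTER DATABASE MyDatabase SET SINGLE_USER;

-- Rebuilds the missing log file; may discard in-flight transactions
DBCC CHECKDB ('MyDatabase', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;

ALTER DATABASE MyDatabase SET MULTI_USER;
[/sourcecode]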

I can attest that this works on SQL Server 2008. My database was 27GB in size, so re-creating the LDF file may take a while; for me, it took about 30 minutes.

Hopefully this post will help someone else out with the issue that I was facing!
