Wednesday, August 6, 2008

The power of ROW_NUMBER()

Ever since I heard Itzak Ben-Gan talk about the awesome capability of the ROW_NUMBER() function in 2005 I have been waiting for an opportunity to solve a TSQL challenge by using it; today I got the opportunity.

I have a table with a million records that contains a record for each account held by a whole range of companies; and there is usually more than one and as many as 50 records per company.

The challenge is to write a TSQL statement that returns a single record per company with a column for each account - essentially pivoting the table on the account numbers and naming the columns [Account 1],...[Account n].

Here is my solution and as you can see I use the ROW_NUMBER() function not only to partition the company data but also to generate a recordset for numbers to dynamically build the TSQL statement. I also made use of two other 2005 features: the nvarchar(max) datatype and the TOP(@variable).

--add the variable to determine how many account columns we need
DECLARE @AccountColumns int
SET @AccountColumns = 50;

--define the case statement template for each column
DECLARE @AccountColumnSQL nvarchar(255)


SET @AccountColumnSQL = N',MAX(CASE WHEN SQ.[Row] = #No# THEN SQ.[Account] END) AS ''Account #No#''';


--use the sys.objects table to generate a recordset of sequential numbers from 1 to n

DECLARE @AccountColumnList nvarchar(max)
SELECT @AccountColumnList = (
SELECT TOP(@AccountColumns) REPLACE(@AccountColumnSQL,'#No#',ROW_NUMBER() OVER(ORDER BY [object_id]))
FROM sys.objects
FOR XML PATH('')
);

--define the query to produce the pivoted values

DECLARE @AccountQuery nvarchar(max)
SET @AccountQuery = N'
SELECT
[Id]
,[Name]
' + @AccountColumnList + '
FROM (
SELECT
[Id]
,[CompanyName]
,[Account]
,ROW_NUMBER() OVER(PARTITION BY [Id] ORDER BY [Id] DESC) AS ''Row''
FROM [dbo].[CompanyAccounts]
) AS SQ
GROUP BY
[Id]
,[Name]';

EXEC(@AccountQuery);



And that's it. The query returned 1 million records with 52 columns in 1min 11sec...nice!

If you are wondering why I didn't use the PIVOT statement rather than the CASE it because I couldn't get it to work and also because BG (that's Ben-Gan) suggested that there was no significant advantage to using it other than neatness.

CLR Bug

Ok, I did a backup of a db, that contained CLR procs, and then restored it with another name; set the database trust:

ALTER DATABASE [MyDb] SET TRUSTWORTHY ON;

...executed a CLR proc and here is what SQL gave me:

Msg 10314, Level 16, State 11, Line 1
An error occurred in the Microsoft .NET Framework while trying to load assembly id 66426. The server may be running out of resources, or the assembly may not be trusted with PERMISSION_SET = EXTERNAL_ACCESS or UNSAFE. Run the query again, or check documentation to see how to solve the assembly trust issues. For more information about this error:
System.IO.FileLoadException: Could not load file or assembly 'ciqapplication, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. Exception from HRESULT: 0x80FC80F1
System.IO.FileLoadException:
at System.Reflection.Assembly.nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, Assembly locationHint, StackCrawlMark& stackMark, Boolean throwOnFileNotFound, Boolean forIntrospection)
at System.Reflection.Assembly.InternalLoad(AssemblyName assemblyRef, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection)
at System.Reflection.Assembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection)
at System.Reflection.Assembly.Load(String assemblyString)


Thankfully it was an easy one to fix (see MSKB: http://support.microsoft.com/kb/918040)

I changed the db owner to sa:

USE [MyDb];
GO
EXEC sp_changedbowner 'sa';

...and it now works. Turns out that there is an issue the db owner not having a specific login on the server, even though my access was granted via a group login.

Sure, you could have figured this out with google but it did seem a nice nugget to blog about.

Friday, August 1, 2008

Db Unit Testing: Intro

One of the most exciting features of DbPro (Visual Studio Team Edition for Database Professionals) is the unit test - see here for a really good intro from Jamie Lafen. The reason I think this is so exciting is because it brings the world of database development into the same arena as the rest of software development on the microsoft platform; for so long the two disciplines have been miles apart with development techniques and procedures being completely different.

But with the introduction of the database project and also with the database unit test we can now start to treat database development the same way we treat software development. Both should be adhering to the same methodology and striving to achieve the same goals of quality, robustness and agility.

Well, these are all just words and what I really want to say is that I want to be able to treat a stored procedure or function with the same respect that I would a method or interface in C#. I want to have a test rig that I can attach to a database nightly build that can give me the confidence that the T-SQL I have added does not invalidate any of the existing schema and that my code still works, end-to-end. And most importantly, I want this to happen automatically each night without me having to sit for ages executing test scripts that only work for me.

So, my adventures into TDDD begins here.

Monday, July 28, 2008

Common Collation Conflict

Whenever you access database schema outside the scope of the current database there is a chance that you will run into a collation conflict error. Most of the instances of this error that I have encountered have been caused by remote databases using the older SQL collation (SQL_Latin1_General_CP1_CI_AS) instead of the windows collation (Latin1_General_CI_AS).

Because these two collations use the Latin1 (ANSI) code page and have the same sort orders (CI_AS), there is no harm in changing from one to the other in a conflict scenario.

Consider the following query:

SELECT cur.[Column1]
FROM [dbo].[Table1] AS cur
INNER JOIN [RemoteServer].[RemoteDb].[dbo].[Table1] AS rem
ON (rem.[Column1] = cur.[Column1]);

This produces the following error because the remote table column has been defined using the SQL collation.

Msg 468, Level 16, State 9, Line 1
Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.


To resolve this problem, we could simply add the COLLATE statement to convert to the local Windows collation.

SELECT cur.[Column1]
FROM [dbo].[Table1] AS cur
INNER JOIN [RemoteServer].[RemoteDb].[dbo].[Table1] AS rem
ON (rem.[Column1] COLLATE Latin1_General_CI_AS = cur.[Column1]);

However, there is another way; a linked server can be configured to ignore remote collations by changing the "Use Remote Collation" Server Option to FALSE. This is a far cleaner, simpler way to handle collation conflicts where linked servers are involved.

Microsoft buys DATAllegro

Firstly, I don’t know much about DatAllegro, other than the fact that their product is an enterprise Data Warehouse solution. A colleague of mine told me about the acquisition (http://www.datallegro.com/pr/7_24_08_microsoft_acquisition.asp) this morning. What I think is significant is that MS is continuing to grow its database platform and that is good news for SQL Professionals. The way I look at it is that if SQL grows, I will grow with it and somewhere down the line my job description and skill-set become far more valuable. That can only be a good thing.

Friday, July 25, 2008

To Blog or not to Blog

I have been convinced by one my colleagues, Matt (http://cubisbi.blogspot.com/) that it would be a good idea to start a blog for a wider audience than our corporate environment; this after posting internally for nearly a year now.

My focus is SQL Server and Visual Studio for Database Pro's.

Hope you get some sort of value from the blog.

Cheers

BG