Wednesday, August 6, 2008

The power of ROW_NUMBER()

Ever since I heard Itzak Ben-Gan talk about the awesome capability of the ROW_NUMBER() function in 2005 I have been waiting for an opportunity to solve a TSQL challenge by using it; today I got the opportunity.

I have a table with a million records that contains a record for each account held by a whole range of companies; and there is usually more than one and as many as 50 records per company.

The challenge is to write a TSQL statement that returns a single record per company with a column for each account - essentially pivoting the table on the account numbers and naming the columns [Account 1],...[Account n].

Here is my solution and as you can see I use the ROW_NUMBER() function not only to partition the company data but also to generate a recordset for numbers to dynamically build the TSQL statement. I also made use of two other 2005 features: the nvarchar(max) datatype and the TOP(@variable).

--add the variable to determine how many account columns we need
DECLARE @AccountColumns int
SET @AccountColumns = 50;

--define the case statement template for each column
DECLARE @AccountColumnSQL nvarchar(255)


SET @AccountColumnSQL = N',MAX(CASE WHEN SQ.[Row] = #No# THEN SQ.[Account] END) AS ''Account #No#''';


--use the sys.objects table to generate a recordset of sequential numbers from 1 to n

DECLARE @AccountColumnList nvarchar(max)
SELECT @AccountColumnList = (
SELECT TOP(@AccountColumns) REPLACE(@AccountColumnSQL,'#No#',ROW_NUMBER() OVER(ORDER BY [object_id]))
FROM sys.objects
FOR XML PATH('')
);

--define the query to produce the pivoted values

DECLARE @AccountQuery nvarchar(max)
SET @AccountQuery = N'
SELECT
[Id]
,[Name]
' + @AccountColumnList + '
FROM (
SELECT
[Id]
,[CompanyName]
,[Account]
,ROW_NUMBER() OVER(PARTITION BY [Id] ORDER BY [Id] DESC) AS ''Row''
FROM [dbo].[CompanyAccounts]
) AS SQ
GROUP BY
[Id]
,[Name]';

EXEC(@AccountQuery);



And that's it. The query returned 1 million records with 52 columns in 1min 11sec...nice!

If you are wondering why I didn't use the PIVOT statement rather than the CASE it because I couldn't get it to work and also because BG (that's Ben-Gan) suggested that there was no significant advantage to using it other than neatness.

1 comment:

Anonymous said...

Ehh, the reason its taken so long to find a suitable challenge for ROW_NUMBER is because its taken so long to get the ball rolling from 2000 to 2005.