Pages

Monday, 9 July 2012

Cursors and While loops reply

Earlier in the year David Curlewis ( t | b ) wrote a blog post titled "cursors and while loops". If you haven't read that article please go read it now as this is basically a reply or follow-up to that post.

In his article, David showed that for the test he was using the cursor was far and away the higher performing query. The WHILE loop took 30+ seconds whereas the cursor query took closer to 0.5 seconds.

I spotted a potential improvement to make the WHILE loop behave more in line with what I expected and I think what David expected.

Lets recap

Since the hardware I am running on is no doubt different to David's and I ran my tests on SQL Server 2012 with cumulative update 2 (11.0.2325) there are going to be some timing differences. I believe I have used the same AdventureWorks version of SQL 2008 (get it from CodePlex) and I have used the scripts David used.

For each test result I have taken the average of 5 tests.

Rather than repeat ALL the cursor tests that David did I've selected the one that performed the fastest: a GLOBAL STATIC CURSOR. As I said: go read his article if haven't yet.

The CURSOR approach

------------------------------------------------
-- CURSOR
------------------------------------------------
--DECLARE cursor_name CURSOR [ LOCAL | GLOBAL ]
--     [ FORWARD_ONLY | SCROLL ]
--     [ STATIC | KEYSET | DYNAMIC | FAST_FORWARD ]
--     [ READ_ONLY | SCROLL_LOCKS | OPTIMISTIC ]
--     [ TYPE_WARNING ]
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO
SET NOCOUNT ON;
GO
DECLARE @SalesOrderID INT, @LineCount INT;
DECLARE SOID CURSOR GLOBAL STATIC TYPE_WARNING
FOR
    SELECT  SalesOrderID
    FROM    Sales.SalesOrderHeader
    WHERE   OnlineOrderFlag = 1;
 
OPEN SOID;
FETCH NEXT FROM SOID INTO @SalesOrderID;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @LineCount = ISNULL(@LineCount,0) + ISNULL((
            SELECT  COUNT(*)
            FROM    Sales.SalesOrderDetail
            WHERE   SalesOrderID = @SalesOrderID
            ), 0);
 
    FETCH NEXT FROM SOID INTO @SalesOrderID;
END
CLOSE SOID;
DEALLOCATE SOID;
IF @LineCount <> 60398
BEGIN
    RAISERROR('Bad Total',16,1);
    PRINT @LineCount;
END
GO



When I ran this the code achieved the following metrics:
CPU (ms) Reads Writes Duration (ms)
699 195433 52 1216

The WHILE loop

------------------------------------------------
-- WHILE LOOP (using counter)
------------------------------------------------
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO
SET NOCOUNT ON;
GO
IF OBJECT_ID('tempdb..#SOID') IS NOT NULL DROP TABLE #SOID;
GO
CREATE TABLE #SOID (ID INT IDENTITY, SalesOrderID INT NOT NULL);
DECLARE @i INT = 1, @ii INT, @SalesOrderID INT, @LineCount INT;
 
INSERT  #SOID (SalesOrderID)
SELECT  SalesOrderID
FROM    Sales.SalesOrderHeader
WHERE   OnlineOrderFlag = 1;

SET @ii = @@ROWCOUNT;
 
WHILE @i <= @ii
BEGIN
    SET @SalesOrderID = (SELECT SalesOrderID FROM #SOID WHERE ID = @i);
 
    SET @LineCount = ISNULL(@LineCount, 0) + ISNULL((
            SELECT  COUNT(*)
            FROM    Sales.SalesOrderDetail
            WHERE   SalesOrderID = @SalesOrderID
            ),0);
 
    SET @i += 1;
END
IF @LineCount <> 60398
BEGIN
    RAISERROR('Bad Total',16,1);
    PRINT @LineCount;
END
GO

This code achieved the following metrics:
CPU (ms) Reads Writes Duration (ms)
39642 1827358 60 41651

We've gone from 0.7secs to a whopping 39.6secs of CPU and 1.2secs duration to 41.6secs. Clearly theres an issue with the WHILE loop.

The keen eyed amongst you would have spotted where all that resource is being chewed up - the reads has jumped from 195,433 to 1,827,358. The WHILE loop is doing nearly 10 times as many reads.

What I realised when I first saw this with David's article was that the ID column lookup on #SQLIO has no index. Therefore with every iteration of the loop we have to scan that temporary table. We're really only after one row and we know that the ID is unique because it is an IDENTITY column. All we need to do is add an index on that column and being unique its quick and easy to make it a primary key...

The WHILE loop with a PRIMARY KEY

------------------------------------------------
-- WHILE LOOP (using counter)
------------------------------------------------
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO
SET NOCOUNT ON;
GO
IF OBJECT_ID('tempdb..#SOID') IS NOT NULL DROP TABLE #SOID;
GO
CREATE TABLE #SOID (ID INT IDENTITY PRIMARY KEY CLUSTERED, SalesOrderID INT NOT NULL);
DECLARE @i INT = 1, @ii INT, @SalesOrderID INT, @LineCount INT;
 
INSERT  #SOID (SalesOrderID)
SELECT  SalesOrderID
FROM    Sales.SalesOrderHeader
WHERE   OnlineOrderFlag = 1;

SET @ii = @@ROWCOUNT;
 
WHILE @i <= @ii
BEGIN
    SET @SalesOrderID = (SELECT SalesOrderID FROM #SOID WHERE ID = @i);
 
    SET @LineCount = ISNULL(@LineCount, 0) + ISNULL((
            SELECT  COUNT(*)
            FROM    Sales.SalesOrderDetail
            WHERE   SalesOrderID = @SalesOrderID
            ),0);
 
    SET @i += 1;
END
IF @LineCount <> 60398
BEGIN
    RAISERROR('Bad Total',16,1);
    PRINT @LineCount;
END
GO

And....voila:
CPU (ms) Reads Writes Duration (ms)
717 140336 31 1215

There are a few milliseconds here and there, but the performance of the while loop and cursor are now in line with one another.

Whilst I was here I thought I would try the same query but using a table variable instead.

The WHILE loop using a TABLE variable

Here's the code that I used for this test:

-------------------------------------------------
-- WHILE LOOP (using counter) with TABLE VARIABLE
-------------------------------------------------
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO
SET NOCOUNT ON;
GO
DECLARE @SQLIO TABLE(ID INT IDENTITY PRIMARY KEY CLUSTERED, SalesOrderID INT NOT NULL)
DECLARE @i INT = 1, @ii INT, @SalesOrderID INT, @LineCount INT;
 
INSERT  @SQLIO (SalesOrderID)
SELECT  SalesOrderID
FROM    Sales.SalesOrderHeader
WHERE   OnlineOrderFlag = 1;

SET @ii = @@ROWCOUNT;
 
WHILE @i <= @ii
BEGIN
    SET @SalesOrderID = (SELECT SalesOrderID FROM @SQLIO WHERE ID = @i);
 
    SET @LineCount = ISNULL(@LineCount, 0) + ISNULL((
            SELECT  COUNT(*)
            FROM    Sales.SalesOrderDetail
            WHERE   SalesOrderID = @SalesOrderID
            ),0);
 
    SET @i += 1;
END
IF @LineCount <> 60398
BEGIN
    RAISERROR('Bad Total',16,1);
    PRINT @LineCount;
END
GO

I fully expected the results for this to be the same as the temporary table.
CPU (ms) Reads Writes Duration (ms)
699 140074 31 1183

And that is basically what we see.

Full results

CPU

Test CPU (ms)
Cursor 669
While Loop (original) 39642
While Loop (temp table) 717
While Loop (temp variable) 661

CPU for all tests:


CPU for tests excluding the original while:



Duration

Test Reads
Cursor 1216
While Loop (original) 41651
While Loop (temp table) 1215
While Loop (temp variable) 1184

Duration for all tests:


Duration for tests excluding the original while:


Reads

Test Reads
Cursor 195433
While Loop (original) 1827358
While Loop (temp table) 140336
While Loop (temp variable) 140074

Reads for all tests:


Reads for tests excluding the original while:


Writes

Test Reads
Cursor 52
While Loop (original) 60
While Loop (temp table) 31
While Loop (temp variable) 31

Writes for all tests:


Conclusion

I personally don't draw much of a conclusion from these tests.

The cursor and the while loop perform at (more or less) the same level. It is worth noting that a STATIC CURSOR effectively builds a temporary table in the background. Books Online about cursors even has the following:

STATIC

Defines a cursor that makes a temporary copy of the data to be used by the cursor. All requests to the cursor are answered from this temporary table in tempdb; therefore, modifications made to base tables are not reflected in the data returned by fetches made to this cursor, and this cursor does not allow modifications.


Its therefore unsurprising that performance of these two approaches is on a par with one another.

The table variable approach seems to offer a slight benefit over the others. However, the figures are very close and this could simply be down to the way the measurements are taken and/or something else running in the background, even if very briefly, at the time of the tests.

Lets be clear about one thing: either of these approaches is a bad solution for the question that it answers. You can write a very simple set based SELECT query in place of these queries.

No comments:

Post a Comment