Pages

Tuesday, 13 December 2011

T-SQL Tuesday #25 - T-SQL Tricks - Selective aggregates


Firstly, thanks to Allen White for hosting T-SQL Tuesday #25.

This month's T-SQL Tuesday is about T-SQL Tips and Tricks.  I mulled over a couple of ideas I had and eventually settled on showing a way to do selective aggregates.  Not because I necessarily think its a trick, though it sort of is, but because I keep seeing this done in other terrible ways (as we'll see).

Selective Aggregates

So what is a selective aggregate?
Well I'm not even sure that's an official name.  I just made it up but it does describe the approach.  Its best described with an example, and I'm going to use our old friend AdventureWorks (get it from CodePlex)

This is a very trivial example, but imagine we wanted to get an aggregate (we're going to use COUNT) of sales by month and year. Well that's quite easy:

select
    year(OrderDate)
    ,month(OrderDate)
    ,count(*)
from
    Sales.SalesOrderHeader
group by
    year(OrderDate)
    ,month(OrderDate)
order by
    year(OrderDate)
    ,month(OrderDate)

Results


Boom! And there you have it.  But what if we wanted to have years across the top...just like your CFO asked so (s)he could compare year on year? 
We could use Pivot for this scenario but thats not what I want to show!  If you look at the results we have years 2005 through 2008.

What I'll often see is something like this:

select
    month(soh.OrderDate) as [Month]
    ,(    select count(*) 
          from Sales.SalesOrderHeader as soh2
          where year(soh2.OrderDate) = 2005
          and month(soh.OrderDate) = month(soh2.OrderDate)
    ) as [2005]
    ,(    select count(*) 
          from Sales.SalesOrderHeader as soh2
          where year(soh2.OrderDate) = 2006
          and month(soh.OrderDate) = month(soh2.OrderDate)
    ) as [2006]
    ,(    select count(*) 
          from Sales.SalesOrderHeader as soh2
          where year(soh2.OrderDate) = 2007
          and month(soh.OrderDate) = month(soh2.OrderDate)
     ) as [2007]
     ,(   select count(*) 
          from Sales.SalesOrderHeader as soh2
          where year(soh2.OrderDate) = 2008
          and month(soh.OrderDate) = month(soh2.OrderDate)
     ) as [2008]
from
    Sales.SalesOrderHeader as soh
group by
    month(soh.OrderDate)
order by
    month(soh.OrderDate)

Results



And it works. But suddenly we've gone from referencing the Sales.SalesOrderHeader table once to 5 times. Thats not good.

How else could we achieve the same thing? The trick is to change your thinking slightly. We are trying to count occurrences where the year is a given value. And a count is the same as sum(1). And thats the trick, we sum the value 1 each time the year is the one we want. Here's how:

select
    month(OrderDate) as [Month]
    ,[2005] = sum(case when Year(OrderDate) = 2005 then 1 else 0 end)
    ,[2006] = sum(case when Year(OrderDate) = 2006 then 1 else 0 end)
    ,[2007] = sum(case when Year(OrderDate) = 2007 then 1 else 0 end)
    ,[2008] = sum(case when Year(OrderDate) = 2008 then 1 else 0 end)
    ,count(*) as TotalInMonth
from
    Sales.SalesOrderHeader
group by
    month(OrderDate)
order by
    month(OrderDate)

Results


Double boom! As you can see, we wrap a CASE statement inside a SUM. The case statement simply checks the year value and either reports 1 (count this one) or 0 (don't count this one). I don't know about you, but that method is much easier to read and to comprehend what is going on. Not only that its considerably faster. We are back to referencing the table just the once.

Query Plan Comparison

Aside from being easier to read, it has a much improved query plan.

Here is the query plan for the basic select:

As you'd probably expect it does a single scan of the table.

Here's the plan for the query with all the sub-selects (brace yourself):

I had to zoom out a bit to fit it on my screen. But just glancing at it...it looks worse right!? That's because it is. Much worse.

And here's the one for the selective aggregate:

That's back to a single table scan. It actually has 1 less operator than the original select by virtual of 1 less group by.

Conclusion

In these examples I was counting the records by doing a SUM(1 or 0). But hopefully you'll have spotted this isn't the only thing you can SUM. For example, we could have grabbed the "TotalDue" value:
select
    month(OrderDate) as [Month]
    ,[2005] = sum(case when Year(OrderDate) = 2005 then TotalDue else 0 end)
    ,[2006] = sum(case when Year(OrderDate) = 2006 then TotalDue else 0 end)
    ,[2007] = sum(case when Year(OrderDate) = 2007 then TotalDue else 0 end)
    ,[2008] = sum(case when Year(OrderDate) = 2008 then TotalDue else 0 end)
from
    Sales.SalesOrderHeader
group by
    month(OrderDate)
order by
    month(OrderDate)

Which would have yielded these results:

There's always many ways to write a query. And I'm sure many of you will already know this approach which is why I'm not 100% convinced its a T-SQL trick. But I meet people far too often that simply didn't know you could do this - and I guess that makes it valid for a T-SQL trick. Try to always think "can I write this a better way" - your DBA will thank you for it.

I'm now off to steal learn from everyone elses tricks. I suggest you do too!

1 comment:

  1. Hey, thank you for this very useful article. I had this question during an interview, and unfortunately went with the not-so-optimal way of doing the transpose :(

    ReplyDelete