Fri 24 Jul 2009
SQL Server – COUNT(expression) bug
Posted by Phil Brammer under SQL Server , SQL Server 2005 , SQL Server 2008[4] Comments
Starting with SQL Server 2005 RTM and through the latest SQL Server 2008 build at the time of this writing, there is a pretty nasty bug with respect to the COUNT(expression) function and its rule that when using an expression, it filters out NULLs in its count results.
Let me explain with a table called, NAME:
First_Name
John
Adam
Kris
NULL
Paula
NULL
George
’select count(*) from NAME’ will return the value of 7.
’select count(First_Name) from NAME’ will return the value of 5 because there are two NULL values that are being excluded by rule.
So far so good. The bug in this case shows up under conditions where you are COUNTing a column from what I’ll call a “grandchild table.” In other words, if I have a table, SYS.OBJECTS, and it is joined to SYS.COLUMNS, which is then joined to another table (ANOTHERTABLE) that may or may not have matching SYS.COLUMNS in it and you are counting a column (eliminating NULLs) within ANOTHERTABLE, you may get erroneous results.
A working example is below, thanks to Ionel’s repro query included in the Connect bug submission:
select o.object_id,
count(x.column_id) should_be_zero_always
from sys.objects o
left join sys.columns c
on o.object_id = c.object_id
left join (
select column_id = 2
where 0 = 1
) x
on c.column_id = x.column_id
group by o.object_id
You can run the above code in any database, and will likely see that 1s show up in the column titled, “should_be_zero_always.”
As you can see, the above query is going to be quite common where you do not want to filter results from the driving table and are forced to use an outer join.
The data behind the scenes looks like this:
o.object_id c.object_id x.column_id cnt_returns_1_y_n
1131151075 NULL NULL y
1147151132 1147151132 1 n
1147151132 1147151132 2 n
1147151132 1147151132 3 n
1147151132 1147151132 4 n
1147151132 1147151132 5 n
1147151132 1147151132 6 n
1163151189 NULL NULL y
1179151246 NULL NULL y
Where we have a NULL in c, x is also forced to be NULL (even though x is hardcoded to return an empty data set in the example above), COUNT(x.column_id) will return a 1 for this scenario, when it should be returning a zero.
This works correctly in SQL Server 2000, I might add.
The bug is currently being tracked internally at Microsoft, and in theory, no future CTPs (which would be for releases beyond 2008) won’t have the issue, but there is no E.T.A. on when we’ll see a fix in 2005 or 2008. Hopefully by the end of the year if we’re lucky and it doesn’t get de-prioritized.
I would encourage you to review the Connect bug submission and at least vote, but by all means, please leave your feedback on the issue while you are there.
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=380304
August 1st, 2009 at 8:29 pm
Actually, from what I’ve been taught in my CURRENT SQL classes (70-441 and 70-442) this is a documented FEATURE, NOT a bug.
It clearly states in the SQL class that I am taking that if WILL NOT get a count of all the records in a table IF YOU DO A COUNT ON A FIELD THAT HAS NULLS BECAUSE A NULL IS JUST THAT – NULL.
As such, the code you listed does exactly what you told it to do – get a count on the field FirstName field WHERE IT HAS A VALUE. NULL doesn’t ‘count’ (forgive the pun) in this case.
For your count to be accurate simply select a field defined as NOT NULL, like the primary key of your data record.
August 1st, 2009 at 8:36 pm
Ed, while you are correct, I think you may be misinterpreting what I’ve written.
The bug as I’ve indicated above, is where COUNT(column) (which only returns a count of NON NULL values) actually returns a value when in reality it should not. If you look at the last resultset I’ve provided, the last column indicates where this bug is present (the cnt_returns_1_y_n column). Clearly you can see that COUNT is returning a 1 where the column it is counting is NULL. THAT is a bug.
Phil
August 3rd, 2009 at 4:11 am
Wow! That’s really one to look out for. I do think that one of your column headings in the sample data is wrong, though. Shouldn’t ‘x.column_id’ read ‘c.column_id’? Had me scratching my head for a few minutes!
Thanks for alerting us to this – could save a lot of confusion!
Olly
August 3rd, 2009 at 5:26 am
Olly,
x.column_id is correct. Because we are left joining to the subquery aliased as ‘x,’ I wanted to show that its value is indeed NULL. If I chose c.column_id instead, we would actually see some values in that column, which is not the intended behavior.
Phil