Saturday, February 25, 2006
BUG: Combination of GROUP BY with HAVING Clause and LEFT OUTER JOIN with Derived Table With LEFT Function Produces Incorrect Result
Applies To : SQL Server 2000 RTM to SP4
This bug is a variant of KB 308458 which is fixed in SP2.
Symptoms
If all of the following conditions are true for a query, the query will return extra rows.
• You include GROUP BY and HAVING clauses.
• You use a LEFT OUTER JOIN.
• You incorporate derived tables.
• You use an aggregate.
• You use the LEFT function.
Cause
The optimizer inappropriately processes a cross join.
Steps to Reproduce
Execute the following query in Query Analyzer
Use Pubs
GO
SELECT T.pub_id, T.type, SUM(T.price) AS BasePrice
FROM Titles T
LEFT JOIN
(Select 1) X (pub_id)
ON X.pub_id = T.pub_id
WHERE LEFT(T.title,3) = 'The'
GROUP BY T.pub_id, T.type
--HAVING SUM(T.price) > 0
You will get the following results
pub_id | type | BasePrice |
1389 | business | 19.9900 |
0877 | mod_cook | 2.9900 |
0877 | UNDECIDED | NULL |
Now try executing the above query by uncommenting the the HAVING caluse to exclude the NULL value.
Use Pubs
GO
SELECT T.pub_id, T.type, SUM(T.price) AS BasePrice
FROM Titles T
LEFT JOIN
(Select 1) X (pub_id)
ON X.pub_id = T.pub_id
WHERE LEFT(T.title,3) = 'The'
GROUP BY T.pub_id, T.type
HAVING SUM(T.price) > 0
you will get the following incorrect result.
pub_id | type | BasePrice |
1389 | business | 19.9900 |
0877 | mod_cook | 19.9900 |
1389 | business | 2.9900 |
0877 | mod_cook | 2.9900 |
1389 | business | NULL |
0877 | mod_cook | NULL |
If you look at the execution plan, you can see that the query performs an incorrect cross join.
|--Nested Loops(Inner Join)
..... |--Compute Scalar(DEFINE:([Expr1005]=If ([Expr1018]=0) then NULL else [Expr1019]))
..... |....|--Stream Aggregate(GROUP BY:([T].[type], [T].[pub_id]) DEFINE:([Expr1018]=COUNT_BIG([T].[price]), [Expr1019]=SUM([T].[price])))
......|........|--Sort(ORDER BY:([T].[type] ASC, [T].[pub_id] ASC))
......|..............|--Clustered Index Scan(OBJECT:([pubs].[dbo].[titles].[UPKCL_titleidind] AS [T]), WHERE:(substring([T].[title], 1, 3)='The'))
......|--Table Spool
...........|--Filter(WHERE:([Expr1002]>0.00))
................|--Compute Scalar(DEFINE:([Expr1002]=If ([Expr1020]=0) then NULL else [Expr1021]))
.....................|--Stream Aggregate(GROUP BY:([T].[type], [T].[pub_id]) DEFINE:([Expr1020]=COUNT_BIG([T].[price]), [Expr1021]=SUM([T].[price])))
..........................|--Sort(ORDER BY:([T].[type] ASC, [T].[pub_id] ASC))
...............................|--Clustered Index Scan(OBJECT:([pubs].[dbo].[titles].[UPKCL_titleidind] AS [T]), WHERE:(substring([T].[title], 1, 3)='The'))
Workaround
You can workaround the problem by just replacing the LEFT function with SUBSTRING function.
SELECT T.pub_id, T.type, SUM(T.price) AS BasePrice
FROM Titles T
LEFT JOIN
(Select 1) X (pub_id)
ON X.pub_id = T.pub_id
WHERE SUBSTRING(T.title,1,3) = 'The'
GROUP BY T.pub_id, T.type
HAVING SUM(T.price) > 0
Note
This bug is fixed in SQL server 2005