Saturday, February 25, 2006
BUG: Combination of GROUP BY with HAVING Clause and LEFT OUTER JOIN with Derived Table With LEFT Function Produces Incorrect Result
Applies To : SQL Server 2000 RTM to SP4
This bug is a variant of KB 308458 which is fixed in SP2.
Symptoms
If all of the following conditions are true for a query, the query will return extra rows.
• You include GROUP BY and HAVING clauses.
• You use a LEFT OUTER JOIN.
• You incorporate derived tables.
• You use an aggregate.
• You use the LEFT function.
Cause
The optimizer inappropriately processes a cross join.
Steps to Reproduce
Execute the following query in Query Analyzer
Use Pubs
GO
SELECT T.pub_id, T.type, SUM(T.price) AS BasePrice
FROM Titles T
LEFT JOIN
(Select 1) X (pub_id)
ON X.pub_id = T.pub_id
WHERE LEFT(T.title,3) = 'The'
GROUP BY T.pub_id, T.type
--HAVING SUM(T.price) > 0
You will get the following results
pub_id | type | BasePrice |
1389 | business | 19.9900 |
0877 | mod_cook | 2.9900 |
0877 | UNDECIDED | NULL |
Now try executing the above query by uncommenting the the HAVING caluse to exclude the NULL value.
Use Pubs
GO
SELECT T.pub_id, T.type, SUM(T.price) AS BasePrice
FROM Titles T
LEFT JOIN
(Select 1) X (pub_id)
ON X.pub_id = T.pub_id
WHERE LEFT(T.title,3) = 'The'
GROUP BY T.pub_id, T.type
HAVING SUM(T.price) > 0
you will get the following incorrect result.
pub_id | type | BasePrice |
1389 | business | 19.9900 |
0877 | mod_cook | 19.9900 |
1389 | business | 2.9900 |
0877 | mod_cook | 2.9900 |
1389 | business | NULL |
0877 | mod_cook | NULL |
If you look at the execution plan, you can see that the query performs an incorrect cross join.
|--Nested Loops(Inner Join)
..... |--Compute Scalar(DEFINE:([Expr1005]=If ([Expr1018]=0) then NULL else [Expr1019]))
..... |....|--Stream Aggregate(GROUP BY:([T].[type], [T].[pub_id]) DEFINE:([Expr1018]=COUNT_BIG([T].[price]), [Expr1019]=SUM([T].[price])))
......|........|--Sort(ORDER BY:([T].[type] ASC, [T].[pub_id] ASC))
......|..............|--Clustered Index Scan(OBJECT:([pubs].[dbo].[titles].[UPKCL_titleidind] AS [T]), WHERE:(substring([T].[title], 1, 3)='The'))
......|--Table Spool
...........|--Filter(WHERE:([Expr1002]>0.00))
................|--Compute Scalar(DEFINE:([Expr1002]=If ([Expr1020]=0) then NULL else [Expr1021]))
.....................|--Stream Aggregate(GROUP BY:([T].[type], [T].[pub_id]) DEFINE:([Expr1020]=COUNT_BIG([T].[price]), [Expr1021]=SUM([T].[price])))
..........................|--Sort(ORDER BY:([T].[type] ASC, [T].[pub_id] ASC))
...............................|--Clustered Index Scan(OBJECT:([pubs].[dbo].[titles].[UPKCL_titleidind] AS [T]), WHERE:(substring([T].[title], 1, 3)='The'))
Workaround
You can workaround the problem by just replacing the LEFT function with SUBSTRING function.
SELECT T.pub_id, T.type, SUM(T.price) AS BasePrice
FROM Titles T
LEFT JOIN
(Select 1) X (pub_id)
ON X.pub_id = T.pub_id
WHERE SUBSTRING(T.title,1,3) = 'The'
GROUP BY T.pub_id, T.type
HAVING SUM(T.price) > 0
Note
This bug is fixed in SQL server 2005
Wednesday, February 22, 2006
Named Constraints on Temporary tables
Last week I was investigating a peculiar issue reported by our testing team. They are receving an error like the following occassionally.
Server: Msg 2714, Level 16, State 4, Line 2
There is already an object named 'PK_Temp' in the database.
It seems that the error occurs only when multiple users tries to access the same page. A concurrency issue.
I looked at the code of the procedure being called from the page. I found that a code segment like the following is causing the problem.
CREATE TABLE #Temp(id int NOT NULL)
ALTER TABLE #Temp ADD CONSTRAINT PK_Temp PRIMARY KEY(id)
But why? SQL Server is supposed to make a temp table name unique by adding a unique integer suffix.
I opened two seperate connections in Query Analyzer and execute the above code. Same Error!
It turns that even though the temprary table names automatically made unique between concurrent sessions, the constraint names still has to be unique across a database.
I ended up adding "Do not use named constraints on temporary tables" to the list of best pracices.
Alternatively you can define constraints on temporary tables either by specifying a nameless constructor when you create table, like,
CREATE TABLE #Temp(id int NOT NULL PRIMARY KEY)
Or by implicitly adding a constraint with the following syntax
ALTER TABLE #Temp ADD PRIMARY KEY(id)
Server: Msg 2714, Level 16, State 4, Line 2
There is already an object named 'PK_Temp' in the database.
It seems that the error occurs only when multiple users tries to access the same page. A concurrency issue.
I looked at the code of the procedure being called from the page. I found that a code segment like the following is causing the problem.
CREATE TABLE #Temp(id int NOT NULL)
ALTER TABLE #Temp ADD CONSTRAINT PK_Temp PRIMARY KEY(id)
But why? SQL Server is supposed to make a temp table name unique by adding a unique integer suffix.
I opened two seperate connections in Query Analyzer and execute the above code. Same Error!
It turns that even though the temprary table names automatically made unique between concurrent sessions, the constraint names still has to be unique across a database.
I ended up adding "Do not use named constraints on temporary tables" to the list of best pracices.
Alternatively you can define constraints on temporary tables either by specifying a nameless constructor when you create table, like,
CREATE TABLE #Temp(id int NOT NULL PRIMARY KEY)
Or by implicitly adding a constraint with the following syntax
ALTER TABLE #Temp ADD PRIMARY KEY(id)