Tuesday, August 3, 2010

Can a LEFT JOIN restrict your resultset?

A colleague came to me the other day with this question: "Isn't it true that a left join shouldn't restrict the number of records returned?"

This was one of those crisis-of-understanding moments: we're all taught that if you query a table and get 100 rows back, adding on a left join may add rows to your resultset, but should never remove any.

His query was something like this:
select od.*, p.ISBN from OrderDetails od left join Products p on od.ProductUid=p.UID where od.OrderDate='3/11/10' and p.Type in ('B','T')
He told me that when he ran this query:
select od.* from OrderDetails od where od.OrderDate='3/11/10'
he got 100 records back. But when we added on the left join, suddenly he only got 87 records. Since when, he wanted to know, could a left join do that? Not all the OrderDetail records had a ProductUid (some were for taxes, miscellaneous charges, etc), which was why he used the left join to begin with. And sure enough, those records were the ones that were missing from the results. Shouldn't the left join have returned those records, with no p.ISBN value?

The problem here is actually our old friend, Mr. NULL. The left join did return the OrderDetail records without a ProductUid. But the where clause filtered them out-- specifically, the phrase "p.Type in ('B','T')". When I told my friend that this was the problem, he protested. "But Product.Type is a required field!" It was defined as NOT NULL in the table.

So, how could p.Type be the problem, and when would it ever be null?

Remember, when a match isn't found in a left join, the fields in the right table (in this case, "p") are all null. So, after evaluating the left join, there still were 100 records. However, now when we come to evaluating "p.Type in ('B','T')", we run into the situation where p.Type is null (because no record was found in p).

Since"null=anything" always returns false (technically "unknown"), and because "in" evaluates to a series of "or"s in the query engine:

p.Type = 'B' or p.Type = 'T'

,we end up filtering out the very records that the left join was supposed to leave in.

There are three quick solutions to this:
select od.*, p.ISBN from OrderDetails od left join Products p on od.ProductUid=p.UID and p.Type in ('B','T') where od.OrderDate='3/11/10'
or
select od.*, p.ISBN from OrderDetails od left join Products p on od.ProductUid=p.UID where od.OrderDate='3/11/10' and (p.Type in ('B','T')or p.Type is null)
or
select od.*, p.ISBN from OrderDetails od left join Products p on od.ProductUid=p.UID where od.OrderDate='3/11/10' and isnull(p.Type,'') in ('B','T')
The first solution is probably preferable-- move the p.Type condition into the join expression. What we're really trying to join to is "products with a type of B or T"-- if none are found, then p.ISBN should be null in our resultset.

It's equivalent to:
select od.*, p.ISBN from OrderDetails od left join (select * from Products where p.Type in ('B','T')) p on od.ProductUid=p.UID where od.OrderDate='3/11/10'
Testing for "p.Type is null" in the where clause should accomplish the same thing, but consider the condition where Product.Type is a nullable field, and there are records where p.Type is null and p.ISBN is not. By testing for "p.Type is null" in the where clause, we'd be including records with no match from the left join, but we'd also be including ISBNs for products with a null Type, which isn't what we want.

This is particularly troublesome when the data you're working with almost always has a match in the right table. In this case, 99% of the OrderDetails records had a ProductUid, so the developer never had to worry about p.Type. When he found a rare data condition, it seemed like the rules of SQL had changed on him.

The bottom line is this: be careful when using where clauses on columns from a right table-- even when those columns are defined as NOT NULL in the database. For that matter, get in the habit of always asking yourself "what if this column were null?" when testing for conditions in a SQL statement.

No comments:

Post a Comment