Strange NOT IN behavior with NULL values in SQL Database queries!
Friday, December 18th, 2009So, one for you DBA buffs out there. BTW, I HATE how SQL relational databases treat NULLs. I’ve never seen a case when it proved useful OR what did what I was expecting it to do. And I’ve never met another DBA or engineer that did…
So, here’s a new exploration of a curious way that NOT IN() treats nulls… In short, if there are ANY nulls in the set of values to be checked on the right, it will ALWAYS return false… And in this case, the olde rule of swapping an IN (subquery) with and EXISTS (subquery) is not true (e.g. a “NOT IN (SELECT b FROM c)” is NOT the same as “NOT EXISTS(select 1 FROM c WHERE a = b)” if c.b has NULL’s in it… )
For all this proof And some more discovery of how these DB engine’s optimize “NOT( a IN (subquery))” to “a NOT IN(subquery)”, see the below SQL code and results…
-- EXAMPLE 1: Simple NULL column behavior...
-- values in of foo.x: [b, null]
--
-- 'a' = x ==> (Empty)
-- 'a' != x ==> (b)
-- NOT ('a' = x) ==> (b)
--
-- 'a' IN ( select x from foo ) ==> (Empty)
-- 'a' NOT IN ( select x from foo ) ==> (Empty)
-- NOT ('a' IN ( select x from foo ) ) ==> (Empty)
-- EXAMPLE 2:
-- Trying to find all students that are not a teacher's aid...
-- You might go about it like this with a simple NOT IN
------------------
SELECT id, name
FROM ( select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
) students
WHERE id NOT IN( SELECT aid_id
FROM ( SELECT 1 as aid_id
UNION ALL SELECT NULL as aid_id
) teachers
);
id | name
----+------
(0 rows)
-- It Should have returned Sam and Tom! What Happened?
-- Lets try the reverse
------------------
SELECT id, name
FROM ( select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
) students
WHERE id IN( SELECT aid_id
FROM ( SELECT 1 as aid_id
UNION ALL SELECT NULL as aid_id
) teachers
);
-- Yeah, OK, we expected that we'd get Bob
id | name
----+------
1 | Bob
(1 row)
-- Messing around though we find these 2 solutions:
-- In the subquery specifically only ask for NOT NULL aid_id's
------------------
SELECT id, name
FROM ( select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
) students
WHERE id NOT IN( SELECT aid_id
FROM ( SELECT 1 as aid_id
UNION ALL SELECT NULL as aid_id
) teachers
WHERE aid_id IS NOT NULL
);
id | name
----+------
2 | Sam
3 | Tom
(2 rows)
-- OR (My favorite...) use EXISTS instead of NOT IN (I tink it optimizes better anyways)
------------------
SELECT id, name
FROM ( select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
) students
WHERE NOT EXISTS( SELECT aid_id
FROM ( SELECT 1 as aid_id
UNION ALL SELECT NULL as aid_id
) teachers
WHERE students.id = teachers.aid_id
);
id | name
----+------
2 | Sam
3 | Tom
(2 rows)
-- But, WAIT! I could just use NOT() and get the inverse set, right?
------------------
SELECT id, name
FROM ( select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
) students
WHERE NOT( id IN( SELECT aid_id
FROM ( SELECT 1 as aid_id
UNION ALL SELECT NULL as aid_id
) teachers
)
);
id | name
----+------
(0 rows)
-- What?! Oh it appears that it's optimizing "NOT( x IN ())" back to "x NOT IN()" at run-time?
-- What if we bury it... Not so easy to optimize, eh?
------------------
SELECT id, name
FROM ( select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
) students
WHERE NOT( 1 = 3
OR ( 3 = 3
AND 5 = 5
AND id IN( SELECT aid_id
FROM ( SELECT 1 as aid_id
UNION ALL SELECT NULL as aid_id
) teachers
)
)
);
id | name
----+------
(0 rows)
-- WHOA! It kept up...
-- Lets try another method of REALLY burying it...
------------------
SELECT id, name
FROM ( select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
) students
WHERE NOT( CASE
WHEN ( id IN( SELECT aid_id
FROM ( SELECT 1 as aid_id
UNION ALL SELECT NULL as aid_id
) teachers
)
) then true
ELSE false
END
);
id | name
----+------
2 | Sam
3 | Tom
(2 rows)
-- Finally! So easy.... BTW all these are 100% consistent on MySQL and PostgreSQL
-- And now we actally have a true inverse set
------------------
SELECT id, name
FROM ( select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
) students
WHERE CASE
WHEN ( id IN( SELECT aid_id
FROM ( SELECT 1 as aid_id
UNION ALL SELECT NULL as aid_id
) teachers
)
) then true
ELSE false
END;
id | name
----+------
1 | Bob
(1 row)


