Links
Joseph
BuchananSpot
Ikarian
Total Office
Doran
Fadi
Apple

Strange NOT IN behavior with NULL values in SQL Database queries!

So, one for you DBA buffs out there. BTW, I HATE how SQL relational databases treat NULLs. I’ve never seen a case when it proved useful OR what did what I was expecting it to do. And I’ve never met another DBA or engineer that did…

So, here’s a new exploration of a curious way that NOT IN() treats nulls… In short, if there are ANY nulls in the set of values to be checked on the right, it will ALWAYS return false… And in this case, the olde rule of swapping an IN (subquery) with and EXISTS (subquery) is not true (e.g. a “NOT IN (SELECT b FROM c)” is NOT the same as “NOT EXISTS(select 1 FROM c WHERE a = b)” if c.b has NULL’s in it… )

For all this proof And some more discovery of how these DB engine’s optimize “NOT( a IN (subquery))” to “a NOT IN(subquery)”, see the below SQL code and results…

-- EXAMPLE 1: Simple NULL column behavior...
-- values in of foo.x: [b, null]
-- 
-- 'a' = x       ==> (Empty)
-- 'a' != x      ==> (b)
-- NOT ('a' = x) ==> (b)
-- 
-- 'a' IN      ( select x from foo )   ==> (Empty)
-- 'a' NOT IN  ( select x from foo )   ==> (Empty)
-- NOT ('a' IN ( select x from foo ) ) ==> (Empty)


-- EXAMPLE 2:
-- Trying to find all students that are not a teacher's aid...

--  You might go about it like this with a simple NOT IN
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE id NOT IN( SELECT aid_id
                    FROM ( SELECT 1    as aid_id
                 UNION ALL SELECT NULL as aid_id
                           ) teachers
                  );

 id | name 
----+------
(0 rows)

-- It Should have returned Sam and Tom!  What Happened?


--  Lets try the reverse
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE id IN( SELECT aid_id
                FROM ( SELECT 1    as aid_id
             UNION ALL SELECT NULL as aid_id
                       ) teachers
              );

-- Yeah, OK, we expected that we'd get Bob

id | name 
----+------
  1 | Bob
(1 row)



-- Messing around though we find these 2 solutions:

--  In the subquery specifically only ask for NOT NULL aid_id's
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE id NOT IN( SELECT aid_id
                    FROM ( SELECT 1    as aid_id
                 UNION ALL SELECT NULL as aid_id
                           ) teachers
                   WHERE aid_id IS NOT NULL
                  );

 id | name 
----+------
  2 | Sam
  3 | Tom
(2 rows)


--  OR (My favorite...) use EXISTS instead of NOT IN (I tink it optimizes better anyways)
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE NOT EXISTS( SELECT aid_id
                     FROM ( SELECT 1    as aid_id
                  UNION ALL SELECT NULL as aid_id
                            ) teachers
                    WHERE students.id = teachers.aid_id
                   );

 id | name 
----+------
  2 | Sam
  3 | Tom
(2 rows)


-- But, WAIT!  I could just use NOT() and get the inverse set, right?
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE NOT( id IN( SELECT aid_id
                     FROM ( SELECT 1    as aid_id
                  UNION ALL SELECT NULL as aid_id
                            ) teachers
                   )
            );

 id | name 
----+------
(0 rows)

-- What?!  Oh it appears that it's optimizing "NOT( x IN ())" back to "x NOT IN()" at run-time?

-- What if we bury it... Not so easy to optimize, eh?
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE NOT( 1 = 3
         OR ( 3 = 3
          AND 5 = 5
          AND id IN( SELECT aid_id
                       FROM ( SELECT 1    as aid_id
                    UNION ALL SELECT NULL as aid_id
                              ) teachers
                     )
              )
            );

 id | name 
----+------
(0 rows)

-- WHOA!  It kept up...  


-- Lets try another method of REALLY burying it...
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE NOT( CASE
            WHEN ( id IN( SELECT aid_id
                            FROM ( SELECT 1    as aid_id
                         UNION ALL SELECT NULL as aid_id
                                   ) teachers
                          )
                   ) then true
            ELSE false
            END
            );

 id | name 
----+------
  2 | Sam
  3 | Tom
(2 rows)

-- Finally!  So easy....  BTW all these are 100% consistent on MySQL and PostgreSQL

-- And now we actally have a true inverse set
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE CASE
       WHEN ( id IN( SELECT aid_id
                       FROM ( SELECT 1    as aid_id
                    UNION ALL SELECT NULL as aid_id
                              ) teachers
                     )
              ) then true
       ELSE false
       END;

 id | name 
----+------
  1 | Bob
(1 row)

2 Responses to “Strange NOT IN behavior with NULL values in SQL Database queries!”

  1. Joseph Buchanan Says:

    Nice. How long did you work on this one? It has been a while since I have done serious SQL queries, but I remember having problems with NULLs and had to come up with some way to get rid of them.

    – Dad

  2. Dave Says:

    This was from a discussion with Jon, my friend here at work. Jon ran across the NOT IN issue and together we delved further in to explore the limits of the workarounds… Yeah, it totally seems like we’ve uncovered a real in-consistency in SQL standard. I know the reason behind the regular NULL behavior, but these don’t seem to jive with those ideas… Anyhoo…

Leave a Reply

-----
Blog | Resume | Contact | The Family | The Volcano