Links
Joseph
BuchananSpot
Ikarian
Total Office
Doran
Fadi
Apple

Archive for December, 2009

Strange NOT IN behavior with NULL values in SQL Database queries!

Friday, December 18th, 2009

So, one for you DBA buffs out there. BTW, I HATE how SQL relational databases treat NULLs. I’ve never seen a case when it proved useful OR what did what I was expecting it to do. And I’ve never met another DBA or engineer that did…

So, here’s a new exploration of a curious way that NOT IN() treats nulls… In short, if there are ANY nulls in the set of values to be checked on the right, it will ALWAYS return false… And in this case, the olde rule of swapping an IN (subquery) with and EXISTS (subquery) is not true (e.g. a “NOT IN (SELECT b FROM c)” is NOT the same as “NOT EXISTS(select 1 FROM c WHERE a = b)” if c.b has NULL’s in it… )

For all this proof And some more discovery of how these DB engine’s optimize “NOT( a IN (subquery))” to “a NOT IN(subquery)”, see the below SQL code and results…

-- EXAMPLE 1: Simple NULL column behavior...
-- values in of foo.x: [b, null]
-- 
-- 'a' = x       ==> (Empty)
-- 'a' != x      ==> (b)
-- NOT ('a' = x) ==> (b)
-- 
-- 'a' IN      ( select x from foo )   ==> (Empty)
-- 'a' NOT IN  ( select x from foo )   ==> (Empty)
-- NOT ('a' IN ( select x from foo ) ) ==> (Empty)


-- EXAMPLE 2:
-- Trying to find all students that are not a teacher's aid...

--  You might go about it like this with a simple NOT IN
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE id NOT IN( SELECT aid_id
                    FROM ( SELECT 1    as aid_id
                 UNION ALL SELECT NULL as aid_id
                           ) teachers
                  );

 id | name 
----+------
(0 rows)

-- It Should have returned Sam and Tom!  What Happened?


--  Lets try the reverse
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE id IN( SELECT aid_id
                FROM ( SELECT 1    as aid_id
             UNION ALL SELECT NULL as aid_id
                       ) teachers
              );

-- Yeah, OK, we expected that we'd get Bob

id | name 
----+------
  1 | Bob
(1 row)



-- Messing around though we find these 2 solutions:

--  In the subquery specifically only ask for NOT NULL aid_id's
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE id NOT IN( SELECT aid_id
                    FROM ( SELECT 1    as aid_id
                 UNION ALL SELECT NULL as aid_id
                           ) teachers
                   WHERE aid_id IS NOT NULL
                  );

 id | name 
----+------
  2 | Sam
  3 | Tom
(2 rows)


--  OR (My favorite...) use EXISTS instead of NOT IN (I tink it optimizes better anyways)
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE NOT EXISTS( SELECT aid_id
                     FROM ( SELECT 1    as aid_id
                  UNION ALL SELECT NULL as aid_id
                            ) teachers
                    WHERE students.id = teachers.aid_id
                   );

 id | name 
----+------
  2 | Sam
  3 | Tom
(2 rows)


-- But, WAIT!  I could just use NOT() and get the inverse set, right?
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE NOT( id IN( SELECT aid_id
                     FROM ( SELECT 1    as aid_id
                  UNION ALL SELECT NULL as aid_id
                            ) teachers
                   )
            );

 id | name 
----+------
(0 rows)

-- What?!  Oh it appears that it's optimizing "NOT( x IN ())" back to "x NOT IN()" at run-time?

-- What if we bury it... Not so easy to optimize, eh?
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE NOT( 1 = 3
         OR ( 3 = 3
          AND 5 = 5
          AND id IN( SELECT aid_id
                       FROM ( SELECT 1    as aid_id
                    UNION ALL SELECT NULL as aid_id
                              ) teachers
                     )
              )
            );

 id | name 
----+------
(0 rows)

-- WHOA!  It kept up...  


-- Lets try another method of REALLY burying it...
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE NOT( CASE
            WHEN ( id IN( SELECT aid_id
                            FROM ( SELECT 1    as aid_id
                         UNION ALL SELECT NULL as aid_id
                                   ) teachers
                          )
                   ) then true
            ELSE false
            END
            );

 id | name 
----+------
  2 | Sam
  3 | Tom
(2 rows)

-- Finally!  So easy....  BTW all these are 100% consistent on MySQL and PostgreSQL

-- And now we actally have a true inverse set
------------------
SELECT id, name
  FROM (  select 1 as id, 'Bob' as name
UNION ALL select 2 as id, 'Sam' as name
UNION ALL select 3 as id, 'Tom' as name
         ) students
 WHERE CASE
       WHEN ( id IN( SELECT aid_id
                       FROM ( SELECT 1    as aid_id
                    UNION ALL SELECT NULL as aid_id
                              ) teachers
                     )
              ) then true
       ELSE false
       END;

 id | name 
----+------
  1 | Bob
(1 row)
-----
Blog | Resume | Contact | The Family | The Volcano