Results 1 to 3 of 3
  1. #1
    Join Date
    Jul 2002
    Posts
    58

    Question Unanswered: Interesting observation on nested queries. Bug?

    Fairly long, but bear with me......

    I was debating the merits of SQL-92 syntax for outer joins with a friend who destests it as 'overly verbose and no clearer'. I was looking for a simple example where SQL-92 was clearly superior, and did better than I could imagine while uncovering a very surprising result, which I present here for discussion.

    The example I came up with uses system tables, but this is not necessary or relevent to the problem, it just makes for easily understandable queries with pre-existing data.

    "Find all the user tables in the PUBS database that DON'T have a column named 'title_id'" (true of 8 of the 12 user tables in PUBS)

    The SQL-92 query is trivially easy
    Code:
    SELECT so.name
       FROM sysobjects AS so
       LEFT OUTER JOIN syscolumns AS sc ON sc.id = so.id AND sc.name = 'title_id'
       WHERE sc.name IS NULL
                  AND so.xtype = 'u'
    This correctly selects syscolumns rows on sc.name = 'title_id' before the join and sc.name is null after the join. And it successfully returns the names of the eight tables without a 'title_id' column.

    The interesting part comes in trying to formulate this query with the old *= syntax. I suggested the only way to do it was with a nested query like this, but I made the fatal mistake of not testing it.

    Code:
    SELECT a.tn, a.cn
       FROM (SELECT tn=so.name, cn=sc.name
                    FROM sysobjects AS so, syscolumns AS sc
                    WHERE so.id *= sc.id
                               AND sc.name = 'title_id'
                               AND so.xtype = 'u') as a
       WHERE a.cn IS NULL
    BUT, very surprisingly, it was pointed out to me that this doesn't work either. If you only run the inner query, you will see it returns 12 rows, 4 with the value 'title_id' in the returned cn column, and 8 with NULLs. But run the entire query, and it returns 12 rows, all of which have NULL in the cn column. (try it and see)

    The execution plan for the overall nested query is nothing more than a clustered index scan of the sysobjects looking for xtype = 'u'

    This strikes me as a bug, but it's hard to say. At the very best, the SQL 2000 query compiler does NOT regard () as enforcing an order-of-precedence on queries the way a 3GL language complier would when compiling a statement like
    var1 = 4 * (3 + var2)
    where it would force the normally lower-precedence addition to be done before the multiplication. Instead, it complier sees both queries as having equal precedence and reduces the the nested query to something like.....

    Code:
    SELECT so.name, sc.name
       FROM sysobjects AS so, syscolumns AS sc
       WHERE so.name *= sc.name
                  AND sc.name = 'title_id'
                  AND sc.name IS NULL
                  AND so.xtype = 'u'
    ...and recognizes the mutually exclusive compares on sc.name will eliminate all rows from syscolumns, so it doesn't even bother to look there. But the *= preserves all the otherwise matching (xtype = 'u') rows from sysobjects. I understand HOW it's getting the results, but this is very counter-intuitive at best.

    So, is this a bug? or working as designed? Should nested queries be evaluate ahead of outer queries (forced precedence) or are they all equal? This is kind of a moot point if you are using equijoins, since they are associative, but for outer or other non-equijoins, this is not the case.

    BTW - both DB2 and SQLBase DO return what I would call the 'intuitively correct' result if you make up similar tables. That is, the nested query with *= returns the same 8 row result set as the SQL-92 LEFT OUTER JOIN syntax.

  2. #2
    Join Date
    Oct 2001
    Location
    England
    Posts
    426
    even better

    SELECT a.tn, a.cn
    FROM ( SELECT tn=so.name, cn=sc.name
    FROM sysobjects AS so, syscolumns AS sc
    WHERE so.id *= sc.id
    AND sc.name = 'title_id'
    AND so.xtype = 'u'
    ) as a
    WHERE a.cn = 'title_id'


    I never use the abbreviated syntax for outer joins due to this sort of thing.
    Try formatting the code as below - it will be clearer

    SELECT so.name
    FROM sysobjects AS so
    LEFT OUTER JOIN syscolumns AS sc
    ON sc.id = so.id
    AND sc.name = 'title_id'
    WHERE sc.name IS NULL
    AND so.xtype = 'u'

  3. #3
    Join Date
    Oct 2001
    Location
    England
    Posts
    426
    This does force the derived table to be created then filtered

    SELECT a.tn, a.cn
    FROM (SELECT top 100 percent tn=so.name, cn=sc.name
    FROM sysobjects AS so, syscolumns AS sc
    WHERE so.id *= sc.id
    AND sc.name = 'title_id'
    AND so.xtype = 'u'
    ) as a
    WHERE a.cn is null

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •