Page 1 of 2 12 LastLast
Results 1 to 15 of 28
  1. #1
    Join Date
    Jun 2005
    Posts
    319

    Unanswered: Is this SQL Test too hard?

    I have gotten some criticism from coworkers regarding this test and just wanted to see what you guys think. I realize the wording could use improvement and any criticism towards making it easier to understand is much appreciated.

    FWIW - I had to solve this problem on the job so I feel it is a real-world test that helps me understand how people think and if they try to find alternate solutions.

    Thanks!

    ~~~~~~~~~~~~~~~~~~~~

    Given a table that has over 100,000 records…

    SUBSIDIARY
    =========
    PARENT_ID INT
    CHILD_ID INT
    ULTIMATE_PARENT_ID INT
    CLEANUP_IND BIT

    …where each PARENT_ID can have multiple CHILD_ID values, but the PARENT_ID should not equal the CHILD_ID. After an initial data load, the ULTIMATE_PARENT_ID and CLEANUP_IND columns contain NULL values (see page 2 for sample data).

    ULTIMATE_PARENT_ID is defined as the topmost parent in the chain for the particular CHILD_ID record, so if the chain was only 2-level’s deep the ULTIMATE_PARENT_ID is the CHILD_ID’s PARENT_ID’s PARENT_ID.

    Please write an answer for all three questions below:

    A) Which of the following queries should you run first?
    B) Write an optimized query to identify the ULTIMATE_PARENT_ID for each CHILD_ID and set its value into the ULTIMATE_PARENT_ID column.
    C) Write a query to identify ALL of the circular references and mark each record that is a circular reference by updating the CLEANUP_IND column to 1.

    ~~~~~~~~~ Page 2 ~~~~~~~~~

    Sample Data, remember though this table has over 100,000 records and the parent-child chain can go n-levels deep – where n is not known.

    Code:
    PARENT_ID	CHILD_ID	ULTIMATE_PARENT_ID	CLEANUP_IND
    1024		512		NULL			NULL
    36		2300		NULL			NULL
    887		541		NULL			NULL
    1022		1024		NULL			NULL
    546		887		NULL			NULL
    512		2305		NULL			NULL
    112		967		NULL			NULL
    697		123		NULL			NULL
    901		452		NULL			NULL
    2300		666		NULL			NULL
    334		445		NULL			NULL
    512		903		NULL			NULL
    884		554		NULL			NULL
    313		313		NULL			NULL
    554		884		NULL			NULL
    112		119		NULL			NULL
    967		555		NULL			NULL
    2305		333		NULL			NULL
    333		36		NULL			NULL
    541		546		NULL			NULL
    1030		1020		NULL			NULL
    112		999		NULL			NULL
    Last edited by Gagnon; 05-17-07 at 10:38.

  2. #2
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    I wouldn't even know where to start
    Oh and your question A appears to have a large part of the question missing...
    George
    Home | Blog

  3. #3
    Join Date
    Jun 2005
    Posts
    319
    A is referring to B & C - which do you try to solve first and how do you solve them? If you have questions please ask and I can answer.

    FWIW - nobody I have interviewed has managed to correctly answer these questions, including an author of a few SQL books.

  4. #4
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Run C then B would be my guess.
    Can one parent have more than one child?
    EDIT: After looking back at the sample data I can see that they can.
    This looks like an aweful lot like tying to work out a department tree diagram (employees and their managers etc) - in which case I *might* be able to do it if I had more time - I may attempt it later
    George
    Home | Blog

  5. #5
    Join Date
    Jun 2005
    Posts
    319
    A hint to question A is to try to solve B & C first.

    Yes a parent can have more then one child, think of a Dun & Bradstreet file that lists companies and subsidiaries where each company can be the parent of multiple subsidiaries.

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    tests like this are dumb

    to find the ultimate parent, question B, in an "optimized" way, well, how often are you gonna want to do that? why does it have to be "optimized" and what does this mean?

    define "circular" reference
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Jun 2005
    Posts
    319
    Quote Originally Posted by r937
    tests like this are dumb

    to find the ultimate parent, question B, in an "optimized" way, well, how often are you gonna want to do that? why does it have to be "optimized" and what does this mean?

    define "circular" reference
    You fail.

    Seriously - this is a real world example. Say you inherit a database that has no referential integrity set up and you need to analyze it and identify the top most nodes - are you going to throw your hands up in the air and say this problem is dumb?

    As to why does it have to be optimized - let's say you manage a nightly job that imports millions of records where you need to ETL the data - optimization here is key.

    http://en.wikipedia.org/wiki/Circular_reference
    Last edited by Gagnon; 05-17-07 at 11:48.

  8. #8
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quote Originally Posted by Gagnon
    Say you inherit a database that has no referential integrity set up
    ...
    are you guy going to throw your hands up in the air and say this problem is dumb?
    Yes - it should have been set up properly in the first place
    George
    Home | Blog

  9. #9
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    your column names are confusing unless I am missing something. which id refers to "self"? ParentID I assume?

    that is, for the first row in your example, is 512 a child of 1024? if so, I would name the columns NodeID, ChildID in that case. ParentID implies that that's the id of my parent, not of myself. know what I mean?

  10. #10
    Join Date
    Jun 2005
    Posts
    319
    That is a good point and I will take it into consideration, my only reservation is it specifically drives people into the direction of the correct solution. So far every single person that has taken this test has tried to work from the bottom-up due in part to the naming of the columns. One of my colleagues asked me if this was a "trick test" and I told him in a way it was since there are multiple ways to solve this problem and you will probably not know right off the bat the correct solution unless you have had to tackle a problem similar to this before. I want candidates to explore multiple solutions to come up with a solution that performs optimally.

  11. #11
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    You have reservations that naming columns clearly drives people in the direction of a correct solution? are you kidding?

    Why not name the columns a,b,c,d then? then it would be even more muddled...

    hopefully you start the interview with some easier questions than this one.

    I can imagine people getting bogged down on this rather quickly. If they aren't able to make much progress, it would be hard to judge how much they really know.

  12. #12
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    I would say this are both moderately difficult SQL problems, in that the require multiple steps rather than a single UPDATE statement, as well as looping and the presumption of an understanding of binary trees.
    An SQL Expert should be able to handle them easily, and anybody who wrote a book on SQL should be able to solve them.
    If given the test myself, my first response would be that the inclusion of ULTIMATE_PARENT_ID in the schema is poor design, so award bonus points to anybody who catches that.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  13. #13
    Join Date
    Jun 2005
    Posts
    319
    Quote Originally Posted by blindman
    I would say this are both moderately difficult SQL problems, in that the require multiple steps rather than a single UPDATE statement, as well as looping and the presumption of an understanding of binary trees.
    An SQL Expert should be able to handle them easily, and anybody who wrote a book on SQL should be able to solve them.
    If given the test myself, my first response would be that the inclusion of ULTIMATE_PARENT_ID in the schema is poor design, so award bonus points to anybody who catches that.
    I agree, I don't give this test over a phone screen or first round, this is for second (last) round candidates.

    Denormalization is not always poor design, it depends on the application - if you have a listing stored procedure where you will always be showing the stock ticker (or other hierarchical attribute) for a particular company even if the company is a subsidiary of a company that has a stock ticker then it would be smart to include it in the child records.

  14. #14
    Join Date
    Jun 2005
    Posts
    319
    Quote Originally Posted by jezemine
    You have reservations that naming columns clearly drives people in the direction of a correct solution? are you kidding?

    Why not name the columns a,b,c,d then? then it would be even more muddled...

    hopefully you start the interview with some easier questions than this one.

    I can imagine people getting bogged down on this rather quickly. If they aren't able to make much progress, it would be hard to judge how much they really know.
    A very good side effect of structuring a test like this is seeing how individuals do under pressure, how they behave and how they react towards others that are trying to help them. It is very difficult to evoke this type of pressure in an interview and receive an accurate indication of the candidate's personality. I always help the candidate come to the correct solution and their willingness or stubbornness to learn is an excellent indication of how they will perform on the job.

    Keep in mind the solution to this problem does not require knowledge of some advanced t-sql expertise or special SQL 2005 syntax or tricks. You merely have to know how to either write a cursor (for the inefficient solution) or a while loop which is pretty basic SQL 101. The difficulty lies in arriving at the solution and coming up with a solid algorithm. I have started to tell candidates just give me the pseudocode or algorithm you would use to arrive at the solution rather then get bogged down with sql syntax.

  15. #15
    Join Date
    Jun 2005
    Posts
    319
    For anyone that likes puzzles like this, I can solve both B & C with 4 update statements (and 1 while loop) - using standard SQL 2000 t-sql (no new SQL 2005 techniques).
    Last edited by Gagnon; 05-17-07 at 16:01.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •