Results 1 to 9 of 9
  1. #1
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1

    Unanswered: please test my script to analyze table keys (was "Submitted for your review...:)

    Here is a script I wrote that analyzes datasets and returns all the minimal composite and unary keys that uniquely identify records. I wrote it because I frequently have to analyze client spreadsheets and non-normalized data tables.

    On my desktop server it took about two minutes to analyze 2000 permutations of a table with 50 columns and 5000 records.

    Please try it out for me and let me know if it chokes on anything, or if you see any ways it could be improved!
    Attached Files Attached Files
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  2. #2
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Dude...still looking at it....

    Only 2 minutes you say.....


    hmmmmmmmmmmmmmm

    EDIT: I ran it against Nothwind And I got Nothing
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  3. #3
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    4 minutes 28 seconds on my desktop

    Candidate Fields Permutations Checked
    ---------------- --------------------
    61 1891

    It only returned my identity column.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  4. #4
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Dooh..

    It's for one table at a time....I thought you were doing an entire database

    OK, I did Products in Northwind and it didn't pick ProductName as a Natural Key...


    Code:
    Candidate Fields Permutations Checked
    10               66
    
    Natural Keys Found
    [UnitPrice], [UnitsInStock]
    [SupplierID], [UnitPrice], [ReorderLevel]
    [SupplierID], [UnitsInStock], [ReorderLevel]
    [UnitPrice], [UnitsOnOrder], [ReorderLevel]
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Yowch!

    Obviously a bug. I will look into it.

    Thanks, Brett!
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  6. #6
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    It may be because there is no unique index on PoductName...but the data is all unique in the sample...

    Go figure M$... I wonder if it was done on purpose to show the "Benefits" of Surrogate keys...

    "An Apple is an Apple until it's renamed"
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    No, its a bug in my script.

    The point of the script is to find natural keys, whether or not they have been defined that way on the table. Otherwise, I'd just query sysindexes.

    There is a flaw in the recursive logic which I need to track down. If I comment out a clause meant to eliminate redundant branches, the script returns the correct results, but if I don't eliminate redundant branches then I am reduced to testing every permutation, which is impractical.

    The solution will probably occur to me in the car on the way home tonight.

    Have a good weekend, all.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  8. #8
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    I tried to challenge your logic by going after statistics info rather than doing "select count(distinct [fieldname])..." But got drawn into trying to come up with an algorythm:

    Once you get all non-text/image columns into a temp table (I also eliminated sql_variant by doing nullif(prec, 0) is not null), instead of doing a cursor I was thinking to create permutations by doing "order by newid()" within the loop with "top @number_of_qualifying_columns"... And of course, as usual, got distracted, never finished, etc.

    Have you thought of that?
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  9. #9
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    I've tried two different methods of searching permutations. The first was "bottom up", starting with single columns and then adding from there, but required an exhaustive search.

    I'm hoping that by using the "top down" approach that I posted I can identify and eliminate searching branches of permutations that are known not to contain natural keys, or that already contain a subset known to be a natural key.

    The algorithm used to create the permutations is not as difficult or imporatant as the algorithm used to eliminate permutations.

    I have an idea in the back of my head (which did occur to me in the car on the way home!), and I'm trying to come up with a way to implement it.

    I think it is an interesting challenge which would also prove useful to solve, so I'm surprised I haven't seen it done before.

    Anybody else here is welcome to take a shot at it! Just write a script that efficiently identifies all the unary and minimal composite keys in a table.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •