Results 1 to 5 of 5
  1. #1
    Join Date
    Feb 2003
    Location
    Paris
    Posts
    38

    Unanswered: To Compare tables and reject 'doubles'

    I have a function which takes two parameters which are names of relations. The goal of the function is to eliminate 'doubles' in the second table.

    How to identify a 'double'? Loop through the first table, fetching the 'match-key' values and then fetching (or selecting) the corresponding values of the second relation. I flag the first tuple in the result of the select, which will therefore not be selected again.

    A 'match-key' is not the primary key. It is composed of one or several attributes that ought to provide a basis of comparison between tuples.

    In this way there ought, at the end to be n tuples in the second relation without a flag. They are, then, rejected.



    The function takes different types of table. Each type of table has different 'match-keys'.


    The question is: how do I implement it? Is there a better way?
    Cordialement

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1

    Re: To Compare tables and reject 'doubles'

    I don't really understand what the first table is for - is it to define the list of columns to compare in the second table, or does it contain values to be matched in the seond table?

    For a table T with 'match-key' columns k1,k2,k3 you can find all the duplicates with this query:

    SELECT * FROM t
    WHERE t.ROWID !=
    ( SELECT MIN(t2.ROWID)
    FROM t t2
    WHERE t2.k1 = t.k1
    AND t2.k2 = t.k2
    AND t2.k3 = t.k3
    );

    (Unless t is small, k1,k2,k3 should be indexed).

  3. #3
    Join Date
    Feb 2003
    Location
    Paris
    Posts
    38
    Thanks, Tony. I'm going to try the query that you sent, but I shall answer your question beforehand.

    I'm doing regression testing. Batch process1 is done in db1 and batch process2 is done in db2. I need to compare the results of the processes, which entails a verification of 20 or so tables.

    This needs to be treated automatically.

    I cannot use the PKs as the basis of comparaison. Therefore, I use other columns that ought to be the same between 2 processes. Any mismatches on these columns are put in a mismatch table and the rest are put in a 'key-match' table.

    The 'key match' table is created for that all attributes of all tuples be compared to the matching tuple in the other realtion.

    So here I have two 'key-match' tables. What I didn't pick-up earlier is that the number of rows is sometimes inequal, that which means that one table has 'doubles' that shouldn't be there (the number of rows must be equivalent). The 'doubles' must be sent to the mismatch relation.

    Thence my problem. My brain is overheating and starting to malfunction.

    Edit:

    Note that there may exist legitmate 'doubles'. If, for example, 'match key' relation of process1 has legitimate 'doubles', then 'match key' of process2 must have them also. It may have an illegitimate 'third' also.
    Last edited by Crassus; 02-18-03 at 05:22.
    Cordialement

  4. #4
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    OK, so you could do something like:

    SELECT k1,k2,k3,COUNT(*) FROM t2
    MINUS
    SELECT k1,k2,k3,COUNT(*) FROM t1

    That will show you key values in t2 that aren't in t1, or are in t1 a different number of times. It will not show you rows that are MISSING from t2, for that you would have to run the opposite query:

    SELECT k1,k2,k3 FROM t1
    MINUS
    SELECT k1,k2,k3 FROM t2

  5. #5
    Join Date
    Feb 2003
    Location
    Paris
    Posts
    38
    Nice one! I was having a go with the 'minus', but missed the idea of the count, which is very important.

    That's a great service you've done me.
    Cordialement

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •