Results 1 to 12 of 12
  1. #1
    Join Date
    Jul 2003
    Posts
    2,296

    Unanswered: compare collections/sets??

    I need to compare a list against a list.
    If all the values match (order does not matter) then I want to know.

    example sets
    PHP Code:
    SETA         SETB        SETC
    xy            cd           jm
    ab            xy           ab
    cd            ab           xy 
    as you see above, SETA = SETB (for my perposes) and I want to know.
    In reality this crap is stored in one table like this:
    PHP Code:
    Col1a    col1       col2  
    123      SETA       ab
    123      SETA       cd
    123      SETA       xy
    123      SETB       cd
    123      SETB       xy
    123      SETB       ab 
    I figured I could gather SETA, then compare that set/collection
    against all others having Col1a = 123.

    My brain is dead and I can't figure another way to do it other than
    concatenating the values thru a loop and then comparing the concatenation like below:
    PHP Code:
    loop
    vString 
    := vString||'|'||vNewValue;
    ...

    /* then essentially I create a check like this */
    if
    ab|cd|xy ab|cd|xy
    then 
    ... 
    anyone have any ideas?
    - The_Duck
    you can lead someone to something but they will never learn anything ...

  2. #2
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    If you are not careful, you'll end up with an N-squared algorithm.
    As the number of items to be processed increases (N),
    the number of comparisons increases by N-Squared;
    because you need to compare a single row in one set against EVERY row in the other list (potentially).

    If the two sets are ordered lists (in the same order), then you can make a SINGLE pass thru each & will know whether or not the two sets match.
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  3. #3
    Join Date
    Jan 2004
    Location
    Croatia, Europe
    Posts
    4,094
    Provided Answers: 4
    How about embedded cursor loops and comparing each and every one from first loop with every one from second loop? You know ... cursor1 fetches 'xy'. Loop all cursor2 and look for 'xy'. If you find it, take another cursor1 value. If you don't find it, they are different.
    Not very efficient, but ... nothing else's on my mind.

    P.S. Uh, right ... anacedent types faster than I do
    Last edited by Littlefoot; 08-18-04 at 18:12.

  4. #4
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    >How about embedded cursor loops and comparing each and every one from first loop with every one from second loop?
    Yes, this will "work", but does NOT scale well; a classic N-squared solution.
    With a million rows in each set, it may take a while to complete and make your CPU glow at the same time.
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  5. #5
    Join Date
    Jan 2004
    Location
    Croatia, Europe
    Posts
    4,094
    Provided Answers: 4
    Yes, I know. Exactly what you've said in your previous post. Bad solution, but I couldn't (can't) figure out anything else. Glowing CPU is a good solution for cold winter days - it can help warm your feet (just like C64's power supply used to ).

    By the way, when I see your name on the Forum, I expect at least one RTFM in your post I'm kind of disappointed when I don't see it

    Did you, The Duck, solve the problem, anyway?

  6. #6
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    How about using the SQL set operators?
    Code:
    select count(*) from
    ( (select col2 from t
       where col1 = 'SETA'
       MINUS
       select col2 from t
       where col1 = 'SETB'
      )
    UNION
      (select col2 from t
       where col1 = 'SETB'
       MINUS
       select col2 from t
       where col1 = 'SETA'
      )
    )
    where rownum = 1;
    If that returns 1 there is a difference, if 0 then they are the same.

  7. #7
    Join Date
    Jul 2003
    Posts
    2,296
    I almost feel honored that I didn't get a 'RTFM' quote!

    Anyways, it is an interesting situation.
    Andrew might have something in his solution.

    Either that or like Anacedent stated, since the fields are numeric I
    could order the lists and compare them.

    I'll try both.
    - The_Duck
    you can lead someone to something but they will never learn anything ...

  8. #8
    Join Date
    Jul 2003
    Posts
    2,296
    Quote Originally Posted by andrewst
    How about using the SQL set operators?

    If that returns 1 there is a difference, if 0 then they are the same.
    wait, why do I need the union?
    - The_Duck
    you can lead someone to something but they will never learn anything ...

  9. #9
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    The UNION is needed to identify differences in both directions;
    objects which exist in set A and not in set B
    AND
    objects which exist in set B and not in set A.
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  10. #10
    Join Date
    Jul 2003
    Posts
    2,296
    as mentioned, this is taking a god-awful long time (i am testing the MINUS/UNION method).
    I even picked the smallest set of groups possible to compare.

    Any other suggestions out there?

    Perhaps I need a temp-table.
    I concatenate all sets together and insert into one column in the table.
    Then I look for any rows that match.

    I can't hink of another less-timely way to do it.
    - The_Duck
    you can lead someone to something but they will never learn anything ...

  11. #11
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    If the two sets are ordered lists (in the same order), then you can make a SINGLE pass thru each & will know whether or not the two sets match.

    If you'll post an email where I contact you directly, I'll try to "talk" you thru it.
    Alternatively I have an AIM alias.

    Simply put, open 2 cursors; one for each ORDERED list.
    Read the 1st record from each list.
    Compare them.
    If they match, delete them.
    If they don't match, then...
    If A < B, read next record from A & go to top.
    If B > A, read next record from B & go to top.
    When no more records in both A & B, then the records which remain are ALL the non-matching records.
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  12. #12
    Join Date
    Nov 2002
    Location
    Desk, slightly south of keyboard
    Posts
    697
    Hi,

    You might want to look at it from a different angle. Here copied and pasted from my SQL Tuning doc...

    For security purposes an investment bank has double entry of all transactions entered throughout the day. Before the end of day routines can be invoked a routine is run which compares each transaction from one set of data with each transaction from the other set and reports on any discrepancies.

    This discrepancy check routine was becoming slower and slower as the volume of data increased.

    Essentially the code to perform the check was something along the line of

    select * from
    (
    select account#, value from TransactionSet1
    minus
    select account#, value from TransactionSet2
    )
    union all
    (
    select account#, value from TransactionSet2
    minus
    select account#, value from TransactionSet1
    )

    Operating on many millions of transactions this single statement incurred a massive cost and ran over a few hours. There was really very little tuning that you could possibly perform on a statement such as this.

    The solution was to perform the checking during each transaction via an intermediate table. Whenever a transaction was entered into either of the TransactionSets, the trigger either inserted or updated a row in the intermediate table. The trigger on TransactionSet1 would add the value to the row and the trigger from TransactionSet2 would subtract the value from the row.

    Both triggers would delete the row if the final result was a value of zero. The intermediate table never exceeded more than a few thousand rows as the two sets of data were being entered concurrently by two different offices.

    The workload to perform the discrepancy check was moved from a single two hour process which checked millions of rows at once, to an extra few milliseconds on each transaction. As the transactions are being entered by human operators, the extra few milliseconds was never noticed.

    At the end of each working day, prior to the closure routines, a simple check on the intermediate table ensured it was empty before proceeding.


    Hth
    Bill
    Please don't email me directly with questions. I've probably just got home from the pub and cannot guarantee the sanity of my answers. In fact, I can't believe I actually made it home.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •