Results 1 to 1 of 1
  1. #1
    Join Date
    Sep 2006
    Location
    Nashville, TN
    Posts
    1

    Suggestions Needed for SubString Search

    Hello all, I'm new to these forums but have been working around DB's for a number of years. I have an interesting problem that I wonder if anyone else has tackled.

    This is just a general question not specific to any particular flavor of DB. If Oracle has a solition I can move it there, if MySQL does, ditto.

    I have two databases, one has 15 million records with alphanumeric strings up to 60 characters in length. Embedded in these strings are alpha numeric circuit id's (some valid, some not) that can be located anywhere in the string. In the second database, I have a list of 600,000 valid active circuit id's in alphanumeric form. My task is to match the two lists.

    Sequentually searching the 15M records for a hit using 'LIKE "%circuit_id%"' can take up to ten minutes per circuit because it must do an unindexed search. Perusing the 15M records using something such as INSTR() takes just as long.

    So what I've done is write a routine that runs through the 15M records, one by one, and carefully trim the string and test it against the indexed list of 600,000 circuits. The routine does as follows, it starts by chopping off the last character until the string is a minimum of 7 characters long. Then it starts over by chopping the next first character off and then taking off the trailing characters until it reaches a minimum of seven characters, each time doing a SQL query for an exact match against the list of 600,000 circuits. This routine can run through 750 circuits a minute.

    Once it finds a match, it drops out of the loop and starts over. In this simulation, it finds a match on line 34.

    I'd love to know if anyone has tackled anything like this and what way you approached the problem. LOOKING FOR SUGGESTIONS!!!!!

    Thanks,
    Richard

    Simulated output......

    XXX7654321YYYYYY = String with embedded circuit id
    XXX = Simulated prefix
    7654321 = Simulated circuit id
    YYYYYY = Simulated suffix

    XXX7654321YYYYYY
    XXX7654321YYYYY
    XXX7654321YYYY
    XXX7654321YYY
    XXX7654321YY
    XXX7654321Y
    XXX7654321
    XXX765432
    XXX76543
    XXX7654
    XX7654321YYYYYY
    XX7654321YYYYY
    XX7654321YYYY
    XX7654321YYY
    XX7654321YY
    XX7654321Y
    XX7654321
    XX765432
    XX76543
    X7654321YYYYYY
    X7654321YYYYY
    X7654321YYYY
    X7654321YYY
    X7654321YY
    X7654321Y
    X7654321
    X765432
    7654321YYYYYY
    7654321YYYYY
    7654321YYYY
    7654321YYY
    7654321YY
    7654321Y
    7654321 < DING! DING! DING!
    654321YYYYYY
    654321YYYYY
    654321YYYY
    654321YYY
    654321YY
    654321Y
    54321YYYYYY
    54321YYYYY
    54321YYYY
    54321YYY
    54321YY
    4321YYYYYY
    4321YYYYY
    4321YYYY
    4321YYY
    321YYYYYY
    321YYYYY
    321YYYY
    21YYYYYY
    21YYYYY
    1YYYYYY
    Last edited by UnixAlles; 09-28-06 at 16:05.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •