Hello all, I'm new to these forums but have been working around DB's for a number of years. I have an interesting problem that I wonder if anyone else has tackled.
This is just a general question not specific to any particular flavor of DB. If Oracle has a solition I can move it there, if MySQL does, ditto.
I have two databases, one has 15 million records with alphanumeric strings up to 60 characters in length. Embedded in these strings are alpha numeric circuit id's (some valid, some not) that can be located anywhere in the string. In the second database, I have a list of 600,000 valid active circuit id's in alphanumeric form. My task is to match the two lists.
Sequentually searching the 15M records for a hit using 'LIKE "%circuit_id%"' can take up to ten minutes per circuit because it must do an unindexed search. Perusing the 15M records using something such as INSTR() takes just as long.
So what I've done is write a routine that runs through the 15M records, one by one, and carefully trim the string and test it against the indexed list of 600,000 circuits. The routine does as follows, it starts by chopping off the last character until the string is a minimum of 7 characters long. Then it starts over by chopping the next first character off and then taking off the trailing characters until it reaches a minimum of seven characters, each time doing a SQL query for an exact match against the list of 600,000 circuits. This routine can run through 750 circuits a minute.
Once it finds a match, it drops out of the loop and starts over. In this simulation, it finds a match on line 34.
I'd love to know if anyone has tackled anything like this and what way you approached the problem. LOOKING FOR SUGGESTIONS!!!!!
XXX7654321YYYYYY = String with embedded circuit id
XXX = Simulated prefix
7654321 = Simulated circuit id
YYYYYY = Simulated suffix