Hi,
I had a table RAW_DATA which had ID (BIGINT), INSTITUTION_CODE, COLLECTION_CODE and CATALOGUE_NUMBER (all VARCHAR(255) ).
My business logic was effectively "Have I seen this INSTITUTION_CODE, COLLECTION_CODE and CATALOGUE_NUMBER before, if so then it is an update, otherwise an insert.
I was running fine at a steady state of 500 inserts/updates per second, up to 20 million records.
Then I noticed that my RAW_DATA index was getting pretty big (4gig) and I only had 1/5th of my expeceted records....
Thus, I decided to pull the codes out to 3 seperate tables and basically make it a star schema design.
So RAW_DATA had ID (BIGINT), INSTITUTION_CODE_ID (BIGINT), COLLECTION_CODE_ID (BIGINT) and CATALOGUE_NUMBER (BIGINT), with 3 supplimentary tables each with a ID and VALUE baically.
SO....
Now when a record comes in:
Have I seen this record before, if yes update, if not:
select the inst code, or create,
select the coll code, or create,
select the cat number, or create,
insert into the RAW_DATA table.
Performance went right down - maybe 200 records per second, but this is still acceptable.
BUT:
It just slowly grinds to a halt... I can only think that the RW lock queue is becoming too big? Currently I can only get about 2million records before it is at 10 a sec or so!!!!
Please please can someone offer some advice / help with this? (you'd be helping out an open source project to share biodiversity information). Am I going about the table desing wrongly? Is there a pattern I should be using? Do I need to keep them together and investigate partitioning?
I am using MyISAM tables.
Many thanks,
Tim
p.s. I'm a java guy so any help greatly appreciated as I am learning DB design!