Hi,

I had a table RAW_DATA which had ID (BIGINT), INSTITUTION_CODE, COLLECTION_CODE and CATALOGUE_NUMBER (all VARCHAR(255) ).

My business logic was effectively "Have I seen this INSTITUTION_CODE, COLLECTION_CODE and CATALOGUE_NUMBER before, if so then it is an update, otherwise an insert.

I was running fine at a steady state of 500 inserts/updates per second, up to 20 million records.

Then I noticed that my RAW_DATA index was getting pretty big (4gig) and I only had 1/5th of my expeceted records....

Thus, I decided to pull the codes out to 3 seperate tables and basically make it a star schema design.

So RAW_DATA had ID (BIGINT), INSTITUTION_CODE_ID (BIGINT), COLLECTION_CODE_ID (BIGINT) and CATALOGUE_NUMBER (BIGINT), with 3 supplimentary tables each with a ID and VALUE baically.

SO....

Now when a record comes in:
Have I seen this record before, if yes update, if not:

select the inst code, or create,
select the coll code, or create,
select the cat number, or create,
insert into the RAW_DATA table.

Performance went right down - maybe 200 records per second, but this is still acceptable.

BUT:

It just slowly grinds to a halt... I can only think that the RW lock queue is becoming too big? Currently I can only get about 2million records before it is at 10 a sec or so!!!!

Please please can someone offer some advice / help with this? (you'd be helping out an open source project to share biodiversity information). Am I going about the table desing wrongly? Is there a pattern I should be using? Do I need to keep them together and investigate partitioning?

I am using MyISAM tables.

Many thanks,

Tim
p.s. I'm a java guy so any help greatly appreciated as I am learning DB design!