Hey there!

I need to create a simple DB scheme with very large tables (talking about ~40 million rows)

I've to handle a large collection of metadata of scientific literature (books, articles and so on => I'm just calling them "records" from now on).

Every record is approx. cited 20 times by other records. What I now want to do is building a MySQL-DB to count these citations. The DB should answer questions like:
"How many times is record XYZ cited by all other records?" or
"How many records does record ABC cite?"

Every record has its unique ID (=> rec_id), consisting of an integer with 11 digits.
I want to create 2 tables:
CREATE TABLE IF NOT EXISTS records (id bigint AUTO_INCREMENT PRIMARY KEY, rec_id bigint KEY)

CREATE TABLE IF NOT EXISTS reference (id bigint NOT NULL, INDEX USING BTREE (id), ref_id bigint NOT NULL, INDEX USING BTREE(ref_id))

I'm also planning to add a column like "ref_count" to table "record" in order to cache the cited by count somehow.

Table "records" is going to have 40 million rows. Since each record has approx. 20 citations, table "reference" is going to have ~800 million rows.

Multiple times a year I'll get a list of say 2 million record-ids and need to know, how often this set of records is cited by all other records.
This means I've to send a huge amount of queries like
SELECT count(id) FROM reference WHERE id=1234
or a single query with a large WHERE condition (... WHERE id=1233 OR id = 5678 OR id=....).

A runtime of several weeks would be okay. But how can I speed this up?
The most import part which should be fast is not the insertion of the records, but the huge amount of count() queries.

Which DB engine is the best for my purpose?
Is my DB scheme appropriate?
Is BTREE the best choice?
I'm using an auto_increment id to insert faster into index, but will this affect the runtime of my count() queries?
Which server parameter (buffer_size and so on) are most important?
Do you have any clues how to optimize it?
How much WHERE conditions per request should I use?
Should I collect say 1000 queries (packaging) before I'll send it to the MySQL server?

...



TIA

apollo1987