If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Optimizing count() on large tables

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-25-11, 11:03
apollo1987 apollo1987 is offline
Registered User
 
Join Date: Oct 2011
Posts: 1
Optimizing count() on large tables

Hey there!

I need to create a simple DB scheme with very large tables (talking about ~40 million rows)

I've to handle a large collection of metadata of scientific literature (books, articles and so on => I'm just calling them "records" from now on).

Every record is approx. cited 20 times by other records. What I now want to do is building a MySQL-DB to count these citations. The DB should answer questions like:
"How many times is record XYZ cited by all other records?" or
"How many records does record ABC cite?"

Every record has its unique ID (=> rec_id), consisting of an integer with 11 digits.
I want to create 2 tables:
CREATE TABLE IF NOT EXISTS records (id bigint AUTO_INCREMENT PRIMARY KEY, rec_id bigint KEY)

CREATE TABLE IF NOT EXISTS reference (id bigint NOT NULL, INDEX USING BTREE (id), ref_id bigint NOT NULL, INDEX USING BTREE(ref_id))

I'm also planning to add a column like "ref_count" to table "record" in order to cache the cited by count somehow.

Table "records" is going to have 40 million rows. Since each record has approx. 20 citations, table "reference" is going to have ~800 million rows.

Multiple times a year I'll get a list of say 2 million record-ids and need to know, how often this set of records is cited by all other records.
This means I've to send a huge amount of queries like
SELECT count(id) FROM reference WHERE id=1234
or a single query with a large WHERE condition (... WHERE id=1233 OR id = 5678 OR id=....).

A runtime of several weeks would be okay. But how can I speed this up?
The most import part which should be fast is not the insertion of the records, but the huge amount of count() queries.

Which DB engine is the best for my purpose?
Is my DB scheme appropriate?
Is BTREE the best choice?
I'm using an auto_increment id to insert faster into index, but will this affect the runtime of my count() queries?
Which server parameter (buffer_size and so on) are most important?
Do you have any clues how to optimize it?
How much WHERE conditions per request should I use?
Should I collect say 1000 queries (packaging) before I'll send it to the MySQL server?

...



TIA

apollo1987
Reply With Quote
Reply

Tags
count, large tables, optimize, performance

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On