01-23-09, 18:14
mauricios.8
Berkeley DB

Dear folks,

I'm dealing with a pattern matching problem, and I have to store 700,000 key/data pairs ("rows") in a db ("table").

The key is a logical number (0 to 699,999), and the data is an array of 65 integers.

I'm new to Berkeley DB, but after reading the manual and successfully testing the C API, I would like to share my results and ask for advice on tuning my config for my specific need.

As the data field has a fixed length (65 * sizeof(int)), I've chosen to use DB_QUEUE. (From what I've read, this is the best choice for my case. Is that correct?)
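To make the record layout concrete, here is a minimal sketch of the fixed-length record and the key mapping. The names pattern_row, record_len and logical_to_recno are made up for illustration. One thing worth noting: Berkeley DB record numbers for the Queue and Recno access methods are 1-based, so a logical key of 0 cannot be used directly and has to be shifted by one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INTS_PER_ROW 65        /* from the post: 65 integers per record */
#define NUM_ROWS     700000u   /* from the post: 700,000 records per db */

/* One fixed-length record, as it would be stored in the data field.
 * (Hypothetical name, used only in this sketch.) */
typedef struct {
    int values[INTS_PER_ROW];
} pattern_row;

/* Fixed record length in bytes, i.e. 65 * sizeof(int).
 * This is the value one would pass to DB->set_re_len. */
static size_t record_len(void)
{
    return sizeof(pattern_row);
}

/* Map a logical key (0 .. NUM_ROWS-1) to a 1-based record number,
 * because record number 0 is not valid in a Queue database. */
static uint32_t logical_to_recno(uint32_t logical_key)
{
    assert(logical_key < NUM_ROWS);
    return logical_key + 1u;
}
```

With this mapping, logical key 0 is stored as record 1 and logical key 699,999 as record 700,000.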

The database will be used for READ ONLY operations and will not be queried concurrently by several "clients". Just one C program is going to read one db at a time.

ITERATION SPEED
---------------------------------
After filling the DB with 700,000 "rows", the total time needed to iterate over all the data using a cursor is ~5 seconds the first time and ~2 seconds the second time the C program is run.

As I'm a beginner with Berkeley DB, I can only guess that this difference comes from caching. (Am I right?)

QUERY SPEED
----------------------------

Querying the DB with DB->get for 140,000 random keys takes around 0.8 seconds.
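For anyone who wants to reproduce this kind of measurement, here is a rough sketch of a timing harness. It is not the code that produced the numbers above. All names (fill_random_keys, time_lookups, count_lookup) are hypothetical, and the lookup is a callback so that a real DB->get wrapper can be plugged in. It measures wall-clock time with timespec_get (C11) rather than clock(), since clock() counts CPU time and would hide time spent waiting on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* xorshift32 step: small portable PRNG, state must be non-zero.
 * Used instead of rand(), whose RAND_MAX may be as small as 32767. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill keys[0..n-1] with pseudo-random logical keys in [0, num_rows). */
static void fill_random_keys(uint32_t *keys, size_t n,
                             uint32_t num_rows, uint32_t seed)
{
    assert(num_rows > 0);
    uint32_t state = seed ? seed : 1u;
    for (size_t i = 0; i < n; i++)
        keys[i] = xorshift32(&state) % num_rows;
}

/* Lookup callback: returns 0 on success, non-zero on failure. */
typedef int (*lookup_fn)(uint32_t logical_key, void *ctx);

/* Stand-in lookup that only counts calls; a real one would wrap DB->get. */
static int count_lookup(uint32_t logical_key, void *ctx)
{
    (void)logical_key;
    (*(size_t *)ctx)++;
    return 0;
}

/* Run n lookups and return the elapsed wall-clock time in seconds. */
static double time_lookups(const uint32_t *keys, size_t n,
                           lookup_fn lookup, void *ctx, size_t *failures)
{
    struct timespec start, end;
    size_t failed = 0;

    timespec_get(&start, TIME_UTC);
    for (size_t i = 0; i < n; i++)
        if (lookup(keys[i], ctx) != 0)
            failed++;
    timespec_get(&end, TIME_UTC);

    if (failures)
        *failures = failed;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

To mimic the test described here, one would fill an array of 140,000 keys with num_rows set to 700,000 and pass a callback that performs the actual get.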

MY SCENARIO
------------------------

I'm going to use this DB for pattern matching, and therefore all queries to the db will behave like random accesses.
Each time I need to do pattern matching, I will have to query the database 140,000 times (randomly).

(The best solution would be to have the whole database loaded in memory, ~183 MB.)
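As a point of comparison, the "everything in memory" idea can be sketched as one flat array indexed by the logical key. Assuming a 4-byte int, the raw data is 700,000 * 65 * 4 = 182,000,000 bytes, which matches the ~183 MB estimate. The names row_table, row_table_init, row_table_row and row_table_free are hypothetical, and loading the rows from the db file is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define INTS_PER_ROW 65   /* from the post: 65 integers per record */

/* All rows in one contiguous, row-major block of ints. */
typedef struct {
    size_t num_rows;
    int   *cells;         /* num_rows * INTS_PER_ROW ints */
} row_table;

/* Allocate a zero-filled table; returns 0 on success, -1 on failure. */
static int row_table_init(row_table *t, size_t num_rows)
{
    t->cells = calloc(num_rows, INTS_PER_ROW * sizeof(int));
    if (t->cells == NULL) {
        t->num_rows = 0;
        return -1;
    }
    t->num_rows = num_rows;
    return 0;
}

/* Random access by logical key (0-based): pointer to that row's 65 ints. */
static int *row_table_row(const row_table *t, size_t logical_key)
{
    assert(logical_key < t->num_rows);
    return t->cells + logical_key * INTS_PER_ROW;
}

/* Release the table's memory. */
static void row_table_free(row_table *t)
{
    free(t->cells);
    t->cells = NULL;
    t->num_rows = 0;
}
```

A lookup is then plain pointer arithmetic, with no page or cache management involved, at the cost of holding the whole table in RAM.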

Berkeley DB seemed very fast for ONE DB with 700,000 "rows" once it has been accessed for the first time (I mean, the time to iterate falls from 5 to 2 seconds the second time I access it).

But the problem is that I need to create several (20+) DBs like that, each with 700,000 rows.

The first 5 DBs have the same speed as the first one, described above.
The problem is that, after the 5th, each new DB takes longer to iterate over its full content: the 6th DB takes 6 seconds, the 7th takes 8 seconds, and the 20th takes almost 12 seconds.
When it comes to querying the DB randomly, the same slowdown happens, but in an exponential way. The query time becomes very slow: more than 60 seconds on the 20th DB.

Would you suggest any tuning of the config (page size, cache size and mmap size for my case), or even another access method or structure for indexing this data?


I do appreciate your help,
Thank you,
Maurice