If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > High Capacity Database (overload?)

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 09-29-10, 06:32
UnoriginalGuy UnoriginalGuy is offline
Registered User
 
Join Date: Sep 2010
Posts: 3
High Capacity Database (overload?)

I've been tasked with designing what is, on the surface, a fairly simple database which essentially stores two things:
- A blob of data (between 10 KB and 100 MB each)
- A set of information about each blob (e.g. Name, Date, Context)

So thus far we have two tables one table of raw data and a second table that references the first and also stores additional information.

Where it gets "complex" is that it needs to store 500,000 to 1,000,000 of these and frankly I am worried that two tables might collapse.

Additionally the total raw data size is over 10 gig and it might grow bigger in the future.

Now I could split this data by context (I have contextual information) but I would only opt to do that if I felt that it would help. I can either split it four different ways or 70,000 different ways (yes, 70K!).

Has anyone ever ran across a situation like this? Does splitting the data across several tables or even databases help at all?
Reply With Quote
  #2 (permalink)  
Old 09-29-10, 06:51
healdem healdem is offline
Jaded Developer
 
Join Date: Nov 2004
Location: out on a limb
Posts: 9,246
do you need the blob of data to be stored inside the db?
in terms of size of data 10 gig isn't excessive by modern standards, there a good few terrabyte db's, it "just" comes down to cost/budget constraints

as to whether splitting the db makes sense that comes down to you and your design. if the blobs of data are not similar or realted or used by the same people then you could make a case to split the db into relevant sections. personally I think you should resist that temptation.
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
Reply With Quote
  #3 (permalink)  
Old 09-29-10, 06:55
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
500,000 of 10kb is orders of magnitude different to 1,000,000 of 100MB. To plan capacity you will need to zero in on to more accurate numbers.

I'm afraid I don't know what a collapsing table is.

What is your RDBMS? Do you have hardware? What is the purpose of the database? Is it pretty static and will only a blob or two at a time be retrieved, or is it a highly transactional system that will have many thousands of blobs affected by each transaction?
Also, are these blobs currently on a filesystem and, if so, what is the driver to move them into the database? (the reason I ask is that moving the blobs in to the database is a design decision that is not explained by your post).
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #4 (permalink)  
Old 09-29-10, 07:41
UnoriginalGuy UnoriginalGuy is offline
Registered User
 
Join Date: Sep 2010
Posts: 3
Quote:
Originally Posted by healdem View Post
do you need the blob of data to be stored inside the db?
Right now each blob is stored as 1(!) file on the filesystem and having that many files is causing masses of issues. Doing anything with them has a huge amount of overhead. Plus backup is almost impossible (it takes three days per cycle).

We're hoping to reduce the overhead by moving away from files entirely and moving into nice neat tables - or at least that is the theory.

Quote:
Originally Posted by healdem View Post
]as to whether splitting the db makes sense that comes down to you and your design. if the blobs of data are not similar or realted or used by the same people then you could make a case to split the db into relevant sections. personally I think you should resist that temptation.
In terms of database design, it doesn't make sense to split them. They're all generic. Splitting them would be artificial and would only be done in order to maintain the stability of the database.
Reply With Quote
  #5 (permalink)  
Old 09-29-10, 07:51
UnoriginalGuy UnoriginalGuy is offline
Registered User
 
Join Date: Sep 2010
Posts: 3
Quote:
Originally Posted by pootle flump View Post
500,000 of 10kb is orders of magnitude different to 1,000,000 of 100MB. To plan capacity you will need to zero in on to more accurate numbers.
You're absolutely right. But unfortunately it really does vary just that much. Right now we're consuming 10 GB, but in the future I'd expect that to raise. It is very hard to pin down just due to the kind of data that comes in.

Quote:
Originally Posted by pootle flump View Post
I'm afraid I don't know what a collapsing table is.
I guess that is worded wrong. What I want to avoid is having the database throw an exception because the table has exceeded its capacity.

Quote:
Originally Posted by pootle flump View Post
What is your RDBMS?
Likely Postgres.

Quote:
Originally Posted by pootle flump View Post
Do you have hardware?
A mid-range dedicated 2008 server.

Quote:
Originally Posted by pootle flump View Post
What is the purpose of the database?
To expose the data (and meta data) to a web-service which in term exposes it internally. The web-service would relay the raw data and meta data depending on the request.

Quote:
Originally Posted by pootle flump View Post
Is it pretty static and will only a blob or two at a time be retrieved, or is it a highly transactional system that will have many thousands of blobs affected by each transaction?
Fairly static. I'd say 90% of them won't change in any one day, most requests will be for only one blob of data, but for multiple piece of meta data.

So for example you might have a request that downloads one hundred meta data rows, but then only downloads one blob of data as a end result.

Quote:
Originally Posted by pootle flump View Post
Also, are these blobs currently on a filesystem and, if so, what is the driver to move them into the database? (the reason I ask is that moving the blobs in to the database is a design decision that is not explained by your post).
We have hundreds of thousands of little files that are almost impossible to backup (due to the overhead created by moving each one file). We also want to expose this data as a web-service which, granted, we can do with the file system, it would be easier to implement if we can write queries against a dataset rather than having to use the file system's structure.
Reply With Quote
  #6 (permalink)  
Old 09-29-10, 08:47
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,407
Quote:
Originally Posted by UnoriginalGuy View Post
Likely Postgres.
I'm pretty confident that Postgres is be able to cope with that size.
Provided you properly tune it and have a decent RAID system (e.g. a RAID 10 with a battery backed controlled)

Quote:
We have hundreds of thousands of little files that are almost impossible to backup (due to the overhead created by moving each one file). We also want to expose this data as a web-service which, granted, we can do with the file system, it would be easier to implement if we can write queries against a dataset rather than having to use the file system's structure.
While I usually prefer to store BLOBs in the database don't forget that backing up the database will be quite time consuming due to the size of it. Doing incremental backups with Postgres will require you to setup WAL archiving (which is probably a good idea anyway...)

The normal pattern to deal with such a large number of files in the file system, is to distribute them over several directories, e.g. by hashing the filename or a different attribute.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On