#1 (11-15-07, 09:50)
intrealm, Registered User
Join Date: Nov 2007
Posts: 2
Database - Large Number Of Files Storage

Hi there,

I am looking for a database that can be used for storing a large number of database files (i.e. in the thousands) and each database file must be able to store up to 200-300 MB of data.
And on top of that it needs to have low or zero administration so that we can deploy these databases on many machines. I think that embedded databases might fall under this category.

We will be connecting and running queries using C/C++ but most databases have interface wrappers for that so that shouldn't be an issue.

SQL Server 2005 is interesting, but the administration might be a bit of an issue. Do Oracle or IBM have any lighter versions?
Open source solutions are also interesting, but support might be a bit of an issue there.

Any ideas or pointers to sites would be greatly appreciated.

Thanks in advance.

Best regards,
Ovidiu Anghelidi

ovidiu@intelligencerealm.com
http://www.intelligencerealm.com/aisystem
Artificial Intelligence System - Reverse Engineering The Brain

#2 (11-15-07, 12:39)
loquin, Super Moderator
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
You're talking about a terabyte or more of storage... And you want the files stored inside the database, not with links inside the database pointing to external files?

With virtually no maintenance or administration? The data must not be very important.



Good luck with your search.
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Midler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert

#3 (11-15-07, 13:35)
intrealm, Registered User
Join Date: Nov 2007
Posts: 2
Hi Loquin,

Quote:
You're talking about a terabyte or more of storage...
It is actually going to be more than that but yes, you are right.

Quote:
And, you want the files stored inside the database, not with links inside the database pointing to external files?
I am sorry. I didn't make myself clear. I do not need to store files inside a database, only data. Because there is going to be a lot of data, we will have to split it and keep it in multiple database files. I know that in Oracle and SQL Server it is possible to increase the number of database files where the data is stored. That's what I meant by database files, and that's what we need.
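
One way to picture "split the data and keep it in multiple database files" is an append-only store that starts a new file once the current one reaches a size cap. The sketch below is only an illustration: it assumes SQLite (through Python's standard sqlite3 module) as a stand-in embedded database, which the thread has not settled on, and RollingStore, the records table, and the 250 MB cap are hypothetical names and values.

```python
import os
import sqlite3

# Assumed per-file cap, picked from the 200-300 MB figure in the first post.
MAX_BYTES = 250 * 1024 * 1024


class RollingStore:
    """Insert-mostly store that rolls over to a new SQLite file when one fills up."""

    def __init__(self, directory, max_bytes=MAX_BYTES):
        self.directory = directory
        self.max_bytes = max_bytes
        self.index = 0
        self.conn = None
        self._open_current()

    def _path(self, index):
        # Hypothetical naming scheme: data_00000.db, data_00001.db, ...
        return os.path.join(self.directory, f"data_{index:05d}.db")

    def _open_current(self):
        if self.conn is not None:
            self.conn.close()
        self.conn = sqlite3.connect(self._path(self.index))
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS records "
            "(id INTEGER PRIMARY KEY, payload BLOB)"
        )
        self.conn.commit()

    def insert(self, payload):
        # Check the size before writing; roll over if the cap has been reached.
        if os.path.getsize(self._path(self.index)) >= self.max_bytes:
            self.index += 1
            self._open_current()
        self.conn.execute("INSERT INTO records (payload) VALUES (?)", (payload,))
        self.conn.commit()

    def close(self):
        self.conn.close()
```

The size check runs before each insert, so a file can exceed the cap by at most one record. Queries that span files would have to visit each file in turn, which is part of the cost of this layout.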

Quote:
With virtually no maintenance or administration? The data must not be very important
Yes, we need zero administration. Most of the data will not change, and we will have mostly inserts. The data is important, but because we will have multiple copies of it on multiple machines (between 2 and possibly 10), the other machines will still be able to access the information even if one goes down. We are using BOINC (boinc.berkeley.edu), a distributed computing framework, and we need something that works with that on Windows machines.

Quote:
Good luck with your search.
Thanks a lot. Not that easy though.

#4 (11-16-07, 15:27)
loquin, Super Moderator
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
I would suggest taking a look at PostgreSQL. Once you get things set up, it can be fairly maintenance-free, but you will occasionally need to perform maintenance on it. Many maintenance tasks can be automated (run table vacuums and statistics collection at off hours, and the same with backups). But I think you're being overly optimistic to think that you'll be able to have a server cluster of that size and not have some maintenance, no matter WHAT database you use.

Pg supports table partitioning (built on table inheritance, with a rule or trigger routing each insert), and the partitions can be located on different tablespaces (folders/drives) so that you can improve performance. A field or fields may be used as the 'divider' for partitioning. For instance, suppose that you were storing information about people, and you had set up partitioning based on the last name. You could have all records whose last name begins with "A" in one tablespace, those beginning with "B" in another, and so on. Your application would insert a record; the database would automatically place it in the appropriate tablespace.
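
The routing idea in the last-name example can be sketched at the application level. To be clear, this is not PostgreSQL's partitioning feature: it is a stand-in that uses one SQLite file per letter through Python's sqlite3 module, and partition_key, PartitionedPeople, and the people table are hypothetical names made up for the illustration.

```python
import os
import sqlite3


def partition_key(last_name):
    """Return the partition label: the first letter A-Z, or 'other'."""
    first = last_name.strip()[:1].upper()
    return first if "A" <= first <= "Z" else "other"


class PartitionedPeople:
    """Routes each record to a per-letter SQLite file (hypothetical layout)."""

    def __init__(self, directory):
        self.directory = directory
        self.connections = {}

    def _connection(self, key):
        # Open (and create, if needed) the file for this partition lazily.
        if key not in self.connections:
            path = os.path.join(self.directory, f"people_{key}.db")
            conn = sqlite3.connect(path)
            conn.execute(
                "CREATE TABLE IF NOT EXISTS people "
                "(last_name TEXT, first_name TEXT)"
            )
            conn.commit()
            self.connections[key] = conn
        return self.connections[key]

    def insert(self, last_name, first_name):
        conn = self._connection(partition_key(last_name))
        conn.execute("INSERT INTO people VALUES (?, ?)", (last_name, first_name))
        conn.commit()

    def find(self, last_name):
        # Only the one partition that can hold this name is searched.
        conn = self._connection(partition_key(last_name))
        rows = conn.execute(
            "SELECT last_name, first_name FROM people WHERE last_name = ?",
            (last_name,),
        )
        return rows.fetchall()

    def close(self):
        for conn in self.connections.values():
            conn.close()
```

The caller never names a file; the key alone decides where a record lands, which is the property described above. In a real database server the same decision is made inside the server rather than in application code.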

Most of the big-name databases (Oracle, DB2, SQL Server) can use similar approaches and support similar capacity, though.

As far as being BOINC-compatible, I have no clue. I don't know what BOINC runs on.

Last edited by loquin; 11-16-07 at 15:32.

#5 (11-16-07, 16:29)
amthomas, Registered User
Join Date: May 2005
Location: San Antonio, Texas
Posts: 134
OK, first: I will say that I am probably discussing stuff I shouldn't, because I think I am confused about this post.

I am going to take a guess and say that when intrealm says "we can deploy these databases on many machines" that they probably mean work units for the host machines to work on. So they want to pass out large work units and then get them back later and store them. I don't think you need a database to handle that but you really would have to look into how boinc works to answer that. If boinc gives you freedom in how you pass information around then you can pass the data to an embedded db or file on the host application and then pass it back to whatever your server db is running. I don't imagine they have to be the same or that you have to 'store' a database. You could pickle/serialize the data and pass it around.
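
To make the "pickle/serialize the data and pass it around" idea concrete, here is a minimal sketch with Python's standard pickle module. The work-unit layout (a dict with unit_id, inputs, and results) and the pack, unpack, and process helpers are assumptions made up for this example; BOINC's own work-unit format is not shown here.

```python
import pickle


def pack(unit):
    """Serialize a work unit to bytes for sending over the wire or to a file."""
    return pickle.dumps(unit, protocol=pickle.HIGHEST_PROTOCOL)


def unpack(blob):
    """Rebuild a work unit from bytes. Only unpickle data you trust."""
    return pickle.loads(blob)


def process(unit):
    """Placeholder computation on the host: square every input value."""
    done = dict(unit)
    done["results"] = [x * x for x in unit["inputs"]]
    return done


if __name__ == "__main__":
    # Server side: build a work unit and serialize it.
    outgoing = pack({"unit_id": 7, "inputs": [1, 2, 3], "results": None})

    # Host side: deserialize, compute, and serialize the finished unit.
    finished = pack(process(unpack(outgoing)))

    # Server side again: read the results back before storing them.
    print(unpack(finished)["results"])
```

One caution on the design: unpickling can execute arbitrary code, so with volunteer machines on the other end a plain data format such as JSON may be a safer choice for results coming back to the server.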

So, the simple answer: find a nice embedded db for whatever app you create (I am not sure how BOINC handles this), and then pick a good database for your server to store the data. When the work is done, pass it from host to server. I of course recommend PostgreSQL, hehe, but I honestly don't know what would be best for you.

For anyone else interested: http://boinc.berkeley.edu/
Some of you may have heard of seti@home. It uses BOINC to put unused computer time to work on distributed processing.
__________________
Vi veri veniversum vivus vici
By the power of truth, I, a living man, have conquered the universe

#6 (11-16-07, 20:21)
loquin, Super Moderator
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
If you are talking about setting up a BOINC server, then take a look at this link. A BOINC server requires MySQL, running on Linux or Unix.

If you are discussing BOINC clients, that's a different story altogether.

You would have to develop a client app and distribute that app to the clients that need it. The BOINC server distributes relatively small work units to the clients, which store and process these chunks of data and then upload the results to the BOINC server. In that case, an embedded db would probably work just fine, because you won't be storing much data on each client.

#7 (11-20-07, 09:43)
shirishtvrs, Registered User
Join Date: Nov 2007
Posts: 2
Hi Ovidiu Anghelidi,

I can see that you are talking about a terabyte or more of data. Working with Ingres myself, I would suggest evaluating Ingres alongside the other large databases such as Oracle and SQL Server.

Normally all of these databases are efficient enough to handle large amounts of data; they differ mainly in how much maintenance they need and how proactive that maintenance has to be.

One interesting thing about Ingres is that its data files are limited only by the size of the disk as seen by the OS. In other words, the data files can use the full disk space available. It also supports any OS and newer concepts such as replication, disk mirroring, and SAN storage.

Its file layout is also flexible enough to store the data files separately for each database created. The concepts behind maintaining and using Ingres are simple and clear, and easy to pick up even for a newcomer to databases. The cost of maintenance is almost negligible.

You install it, and it runs.

General performance and service pack updates need to be applied on time; nothing else should be necessary. In short, no hiccups.

#8 (07-07-09, 12:10)
JF.pt, Registered User
Join Date: Jul 2009
Posts: 4
Hi intrealm,

Ingres/r3 is the best option in many respects.
I'm working with very big databases, and it works fine.