  1. #1
    Join Date
    Nov 2007
    Posts
    2

    Post Unanswered: Database - Large Number Of Files Storage

    Hi there,

    I am looking for a database that can be used for storing a large number of database files (i.e. in the thousands) and each database file must be able to store up to 200-300 MB of data.
    And on top of that it needs to have low or zero administration so that we can deploy these databases on many machines. I think that embedded databases might fall under this category.

    We will be connecting and running queries using C/C++ but most databases have interface wrappers for that so that shouldn't be an issue.

    SQL Server 2005 is interesting, but the administration might be a bit of an issue. Do Oracle or IBM have any lighter versions?
    Open source solutions are also interesting, but support might be a bit of an issue there.

    Any ideas or pointers to relevant sites would be greatly appreciated.

    Thanks in advance.

    Best regards,
    Ovidiu Anghelidi

    ovidiu@intelligencerealm.com
    http://www.intelligencerealm.com/aisystem
    Artificial Intelligence System - Reverse Engineering The Brain

  2. #2
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    You're talking about a terabyte or more of storage... And you want the files stored inside the database, not links inside the database pointing to external files?
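
    For a rough sense of scale, the arithmetic can be sketched as below. The file counts (1,000 to 5,000) are assumptions standing in for "thousands"; only the 200-300 MB per file comes from the original post, and decimal units (1 MB = 10^6 bytes) are assumed.

```python
MB = 10**6   # decimal megabyte (assumption; binary units would differ slightly)
TB = 10**12  # decimal terabyte


def total_bytes(n_files, mb_per_file):
    """Total storage for n_files database files of mb_per_file megabytes each."""
    return n_files * mb_per_file * MB


# Hypothetical bounds: "thousands" of files is read here as 1,000 to 5,000.
low = total_bytes(1000, 200)
high = total_bytes(5000, 300)
print(f"{low / TB:.1f} TB to {high / TB:.1f} TB")
```

    Under those assumed bounds, 4,000 files of 250 MB each is where the total reaches one terabyte.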

    With virtually no maintenance or administration? The data must not be very important.



    Good luck with your search.
    Lou
    To astonish greatly
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  3. #3
    Join Date
    Nov 2007
    Posts
    2
    Hi Loquin,

    You're talking about a terabyte or more of storage...
    It is actually going to be more than that but yes, you are right.

    And, you want the files stored inside the database, not with links inside the database pointing to external files?
    I am sorry, I didn't make myself clear. I do not need to store files inside a database, only data. Because there is going to be a lot of data, we will have to split it across multiple database files. I know that in Oracle and SQL Server it is possible to increase the number of database files in which the data is stored. That's what I meant by database files, and that's what we need.
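
    To make "split the data across multiple database files" concrete, here is a minimal sketch using SQLite from Python's standard library as a stand-in embedded database. SQLite is not named in this thread, and the class name, table layout, and file naming are all hypothetical. It writes to one file until a size cap is reached and then rolls over to the next.

```python
import os
import sqlite3
import tempfile


class ShardedStore:
    """Insert-only store that starts a new SQLite file once the current one reaches a size cap."""

    def __init__(self, directory, max_bytes):
        self.directory = directory
        self.max_bytes = max_bytes  # e.g. 300 * 10**6 for a roughly 300 MB cap
        self.index = 0
        self.conn = self._open(self.index)

    def _path(self, index):
        return os.path.join(self.directory, f"shard_{index:05d}.db")

    def _open(self, index):
        conn = sqlite3.connect(self._path(index))
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records "
            "(id INTEGER PRIMARY KEY, payload BLOB)"
        )
        conn.commit()
        return conn

    def insert(self, payload):
        # Roll over before writing if the current file has reached the cap.
        if os.path.getsize(self._path(self.index)) >= self.max_bytes:
            self.conn.close()
            self.index += 1
            self.conn = self._open(self.index)
        self.conn.execute("INSERT INTO records (payload) VALUES (?)", (payload,))
        self.conn.commit()

    def close(self):
        self.conn.close()


def demo():
    """Write 100 one-KB records with a tiny 32 KB cap; return (file count, row count)."""
    directory = tempfile.mkdtemp()
    store = ShardedStore(directory, max_bytes=32 * 1024)
    for _ in range(100):
        store.insert(b"x" * 1024)
    store.close()
    files = [f for f in os.listdir(directory) if f.endswith(".db")]
    rows = 0
    for name in files:
        conn = sqlite3.connect(os.path.join(directory, name))
        rows += conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
        conn.close()
    return len(files), rows
```

    Committing after every insert keeps the size check simple but is slow for bulk loads; a real loader would batch inserts and check the size less often.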

    With virtually no maintenance or administration? The data must not be very important
    Yes, we need zero administration. Most of the data will not change, and the workload will be mostly inserts. The data is important, but because we will keep multiple copies on multiple machines (between 2 and possibly 10), the others will still be able to access the information even if one machine goes down. We are using BOINC (boinc.berkeley.edu), a distributed computing framework, and we need something that works with it on Windows machines.

    Good luck with your search.
    Thanks a lot. Not that easy though.

  4. #4
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    I would suggest taking a look at PostgreSQL. Once you get things set up, it can be fairly maintenance-free, but you will occasionally need to perform maintenance on it. Many maintenance tasks can be automated. (Run table vacuums and statistics collections at off hours, same with backups.) But I think you're being overly optimistic to think that you'll be able to have a server cluster of that size and not have some maintenance, no matter WHAT database you use.

    Pg supports automatic table partitioning, and the partitions can be located on different tablespaces (folders/drives) so that you can improve performance. A field or fields may be used as the 'divider' for partitioning. For instance, suppose that you were storing information about people, and you had set up partitioning based on the last name. You could have all records whose last name begins with "A" in one tablespace, those beginning with "B" in another, and so on. Your application would insert a record; the database would automatically place it into the appropriate tablespace.
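
    The routing rule in that example can be sketched at application level as follows. This only illustrates the rule itself; with database-side partitioning the server applies the equivalent logic for you. The partition names (`people_a`, `people_other`, and so on) are hypothetical.

```python
def partition_for(last_name):
    """Pick a partition name from the first letter of a last name."""
    first = last_name.strip()[:1].upper()
    if "A" <= first <= "Z":
        return f"people_{first.lower()}"
    # Empty names and names starting with anything other than A-Z share a catch-all.
    return "people_other"


for name in ["Anghelidi", "  baker", "", "Ólafsson"]:
    print(repr(name), "->", partition_for(name))
```

    A catch-all partition matters in practice: without one, any record whose key falls outside the expected ranges has nowhere to go.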

    Most of the big-name databases (Oracle, DB2, SQL Server) can use similar approaches and support similar capacity, though.

    As far as being boinc compatible, I have no clue. I don't know what boinc runs on.
    Last edited by loquin; 11-16-07 at 16:32.


  5. #5
    Join Date
    May 2005
    Location
    San Antonio, Texas
    Posts
    134
    ok, first.. I will say that I am probably discussing stuff I shouldn't because I think I am confused about this post

    I am going to take a guess and say that when intrealm says "we can deploy these databases on many machines" that they probably mean work units for the host machines to work on. So they want to pass out large work units and then get them back later and store them. I don't think you need a database to handle that but you really would have to look into how boinc works to answer that. If boinc gives you freedom in how you pass information around then you can pass the data to an embedded db or file on the host application and then pass it back to whatever your server db is running. I don't imagine they have to be the same or that you have to 'store' a database. You could pickle/serialize the data and pass it around.

    So, the simple answer: find a nice embedded db for whatever app you create (I am not sure how boinc handles this), and then pick a good database for your server to store the data. When the work is done, pass it from host to server. I of course recommend postgresql, hehe, but I honestly don't know what would be best for you.
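
    A minimal sketch of that serialize-and-pass-back idea, under stated assumptions: the function names, the result layout, and the `results` table are all hypothetical, and SQLite stands in for whatever the server database turns out to be. JSON is used instead of pickle here because unpickling data that comes from machines you do not control is unsafe.

```python
import json
import sqlite3


def pack_result(work_unit_id, values):
    """Client side: serialize a finished work unit into a string for upload."""
    return json.dumps({"work_unit_id": work_unit_id, "values": values})


def store_result(conn, blob):
    """Server side: parse an uploaded result and store it keyed by work unit id."""
    result = json.loads(blob)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(work_unit_id INTEGER PRIMARY KEY, payload TEXT)"
    )
    # INSERT OR REPLACE keeps only the latest upload for a given work unit.
    conn.execute(
        "INSERT OR REPLACE INTO results VALUES (?, ?)",
        (result["work_unit_id"], blob),
    )
    conn.commit()
    return result["work_unit_id"]


def load_result(conn, work_unit_id):
    """Return the stored values for a work unit, or None if nothing was uploaded."""
    row = conn.execute(
        "SELECT payload FROM results WHERE work_unit_id = ?", (work_unit_id,)
    ).fetchone()
    return json.loads(row[0])["values"] if row else None
```

    How the serialized result actually travels from client to server is up to the framework and is not shown here.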

    for anyone else interested: http://boinc.berkeley.edu/
    Some of you may have heard of seti@home; it uses boinc to put unused computer time to work on distributed processing.
    Vi veri veniversum vivus vici
    By the power of truth, I, a living man, have conquered the universe

  6. #6
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    If you are talking about setting up a boinc server, then take a look at this link. A boinc server requires MySQL, running on Linux or Unix.

    If you are discussing boinc clients, that's a different story altogether.

    You would have to develop a client app and distribute that app to the clients that need it. The boinc server distributes relatively small work units to the clients, which store and process these chunks of data and then upload the results to the boinc server. In that case, an embedded db would probably work just fine, because you won't be storing much data on each client.


  7. #7
    Join Date
    Nov 2007
    Posts
    2
    Hi Ovidiu Anghelidi

    I can see that you are talking about a terabyte or more of data. Coming from the Ingres side, I would suggest evaluating Ingres alongside the other large databases such as Oracle and SQL Server.

    Most databases can handle large amounts of data efficiently; where they differ is in how much maintenance they need and how proactive that maintenance has to be.

    One interesting thing about Ingres is that the size of its data files is limited only by the size of the disk available to the OS. In other words, the data files can use all of the available disk space. It also supports any OS and newer concepts such as replication, disk mirroring, and SAN storage.

    Its file layout is also flexible: data files are stored separately for each database created. The concepts behind maintaining and using Ingres are simple and clear, and easy to approach and understand even for a newcomer to databases. The cost of maintenance is almost negligible.

    You install it, and it runs.

    General performance and service pack updates need to be applied in a timely manner, and nothing else should be necessary. Mostly no hiccups.

  8. #8
    Join Date
    Jul 2009
    Posts
    4
    Hi intrealm

    Ingres/r3 is the best option in many respects.
    I'm working with very big databases, and it works fine.
