Results 1 to 10 of 10
  1. #1
    Join Date
    Dec 2010
    Posts
    4

    Unanswered: advice in the layout of a fairly large database

    i am looking for someone to give me a little help(advice) with setting up a fairly large database. i would like to have a little back and forth dialog about if i am doing things correctly.

    i'm not talking about little things like how to create a column and stuff. i'm talking about more complicated things like how i should be rolling over archive data to different tables/databases, which indexes to create... stuff that a real/good dba gets paid to know.

    it would be great to do it over a forum, as maybe others can learn from it too.

    anyone interested?

    i am a software programmer, and while i have done plenty of projects that use databases, i haven't done any on the scale of what will eventually (and fairly quickly) hold terabytes of data. i actually already have a system up and running, but it was done in a hurry, and i know it should be tweaked.

    if anyone is interested, lemme know and i'll start posting my questions/descriptions.

    thanks a lot.

    okie

  2. #2
    Join Date
    Nov 2003
    Posts
    2,932
    Provided Answers: 12
    First question: what do you consider "fairly large"?
    What is the estimated size in GB?
    How many rows will your biggest table have?

  3. #3
    Join Date
    Dec 2010
    Posts
    4
    good question.

    from the looks of our current system (which has been running for around 3 weeks now), my biggest table has around 11 million rows in it. i have a few tables that are about half that size, and several that run about 1/10th that size.

    currently, the size is 11GB.

    things will get added over time, but the bulk of everything is already there (maybe 80%-90%).

    i would estimate this database will grow about 150-200 GB in size per year.

    as i am assuming is the case with most databases, the bulk on what will actually be queried will be information gathered in the past 3-6 months.

  4. #4
    Join Date
    Nov 2003
    Posts
    2,932
    Provided Answers: 12
    11 Million rows is more considered "medium sized" in the PostgreSQL world...

    It does sound you want to look into partioning though.
    But without any specific queries/table definitions/index definitions it's very hard to give you an answer.

    You also want to buy a copy of the just recently published book "High Performance PostgreSQL"
    PostgreSQL: News: "PostgreSQL 9.0 High Performance" book now available

  5. #5
    Join Date
    Dec 2010
    Posts
    4
    good deal. we actually have an offshoot of this system that gets 4 times as much traffic. so somewhere along the lines of 60 million rows a month. but it is going to be stripped down very soon.

    almost all of our data is stored in a proprietary binary format. everything where i work is legacy. code was developed 15 years ago to house our data because databases at the time couldn't really deal with binary data quickly, and so many systems have grown to rely on our internal code, it is still our primary storage/data format (it has been refined throughout the years, and works very, very well for what it does). fortunately, they've brought in some young blood (myself), and are trying to make some transitions to modernizing some legacy projects, as well as new ones.

    we will definitely get into partitioning our data (and currently are doing some special things). fortunately, we have lots of money for hardware.

    my questions are more about the database schema itself. mind if i shoot some questions at you?

  6. #6
    Join Date
    Nov 2003
    Posts
    2,932
    Provided Answers: 12
    Quote Originally Posted by okie.floyd View Post
    fortunately, we have lots of money for hardware.



    my questions are more about the database schema itself. mind if i shoot some questions at you?
    No, concrete questions are always better...

    One thing you might also consider is to have a read-only hot-standby (that is available with PG 9.0) that is only used for long-running queries to off-load load from the online-system.

  7. #7
    Join Date
    Dec 2010
    Posts
    4
    dunno quite what you mean by concrete questions, but i will try...

    we are currently using version 8.4. Version 9 has not been cleared yet via our specs with red hat.

    also, will get the books you recommended.

    okay, first set of questions.

    a.) what is your input on foreign keys? currently, we have a few "products" that are so big, they require a parent table and then a few subtables, where lets say parent table A holds the indexes (unique IDs) of subtables B, C, D, E, and F. currently, we have those references to B, C, D, E, and F set up as foreign keys; is there any overhead to that? is it necessary? as the database grows, is it a lot of overhead to keep track of those foreign references?

    b.) i never quite understood this. do i need to put indexes on child-tables (tables created via inheritence)? if B inherits A, and I put an index on A.column_1, is B.column_1 automatically indexed? or do i need another individual index? if so, do i even need to index A.column_1?

    c.) what is the rule with creating archive tables/databases? as I said, our info becomes less "valuable" the older it gets, and will be queried less often. should this info eventually get unloaded to some other tables/databases, so that the info for the tables that gets queried more often can sit on some partitions that will allow them to be more efficient in respect to queries? give me some insight into this concept if you don't mind.

    thanks.

  8. #8
    Join Date
    Nov 2003
    Posts
    2,932
    Provided Answers: 12
    Disclaimer: I have never needed to use table inheritance so far.
    Quote Originally Posted by okie.floyd View Post
    a.) what is your input on foreign keys?
    You can't have a serious database without them. If your data is interesting just one bit (no pun intended) they are not optional

    as the database grows, is it a lot of overhead to keep track of those foreign references?
    No, not really. Especially if you have an index on each foreign key

    b.) i never quite understood this. do i need to put indexes on child-tables (tables created via inheritence)?
    Sorry, no idea (as I said: I never had the need to use partitioning)

    c.) what is the rule with creating archive tables/databases? as I said, our info becomes less "valuable" the older it gets, and will be queried less often. should this info eventually get unloaded to some other tables/databases, so that the info for the tables that gets queried more often can sit on some partitions that will allow them to be more efficient in respect to queries? give me some insight into this concept if you don't mind.
    You could create partitions e.g. by month or year, moving older partitions to different tablespaces to take IO load off the "main" partitions.

    Data that just "sits" on disk and is not queried will not affect your performance (except for backup & restore)

  9. #9
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Quote Originally Posted by shammat View Post
    ...
    Data that just "sits" on disk and is not queried will not affect your performance (except for backup & restore)
    unless the server has to perform sequential scans to support the select queries
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  10. #10
    Join Date
    Nov 2003
    Posts
    2,932
    Provided Answers: 12
    Quote Originally Posted by loquin View Post
    unless the server has to perform sequential scans to support the select queries
    Sorry, I wasn't clear there.
    Of course you are right, but I was referring to "archive" tables or other ways to move data out of the "normal" tables.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •