  1. #1
    Join Date
    Oct 2007
    Posts
    5

    Free time series DBMS

    Hi there,

    I am currently involved in a project dealing with huge amounts of very simply structured data in the form of:

    - a unix time-stamp
    - 2-byte measurement data
    - 2-byte meta information

    What's special here is that there are no delete or update operations involved at all. All the DBMS has to provide is an extremely efficient way to append data, access to the latest data-point, and some statistics like the arithmetic mean, modal values, and minima/maxima.

    So I was wondering if a time series DBMS is actually the way to go, or if we should just go ahead and develop a home-brewed solution.

    I also thought about using a general-purpose DBMS like MySQL or PostgreSQL, but I am concerned about the overhead and worry that it just will not be able to deliver the data fast enough.
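
    To make this concrete, here is a minimal sketch of the kind of table I have in mind, in PostgreSQL syntax (table and column names are just placeholders):

    Code:
    -- Hypothetical schema: one row per measurement.
    CREATE TABLE measurements (
        ts    integer  NOT NULL,  -- unix time-stamp
        value smallint NOT NULL,  -- 2-byte measurement data
        meta  smallint NOT NULL,  -- 2-byte meta information
        PRIMARY KEY (ts, meta)    -- timestamp/meta pairs are unique
    );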

    cheers, andreas

  2. #2
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Can you guarantee that the time stamp will be unique? Or that a combination of timestamp/meta information will be unique?

    Please quantify "Huge Amounts"... What is huge to you, may not be large to someone else...

    Please quantify "fast enough" ... i.e. Insert rates, retrieval rates, minimum response time, etc.
    Lou
    To greatly astonish
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Midler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  3. #3
    Join Date
    Oct 2007
    Posts
    5
    Quote Originally Posted by loquin
    Can you guarantee that the time stamp will be unique? Or that a combination of timestamp/meta information will be unique?
    All combinations of timestamp/meta information will be unique.

    Quote Originally Posted by loquin
    Please quantify "Huge Amounts"... What is huge to you, may not be large to someone else...
    I'd estimate that there will be about one insert every two to three seconds.

    Quote Originally Posted by loquin
    Please quantify "fast enough" ... i.e. Insert rates, retrieval rates, minimum response time, etc.
    Insert rates aren't that important (as long as they don't block selects on statistics); what really counts is access to the last data-point (again depending on timestamp/meta information) and fast access to statistics.
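
    To illustrate, the latest-data-point access would look something like this against the hypothetical measurements (ts, value, meta) table sketched above; in PostgreSQL, an index on (meta, ts) should let the planner answer it with a single backward index scan:

    Code:
    -- Index supporting "latest data-point for a given meta value".
    CREATE INDEX measurements_meta_ts_idx ON measurements (meta, ts);

    -- Latest data-point for meta = 42 (a placeholder value).
    SELECT ts, value
    FROM   measurements
    WHERE  meta = 42
    ORDER  BY ts DESC
    LIMIT  1;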

  4. #4
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Quote Originally Posted by amm
    what really counts is access to the last data-point (again depending on timestamp/meta information) and fast access to statistics.
    define "fast"...
    Lou
    To greatly astonish
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Midler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  5. #5
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Note: The amount of data you describe is on the order of 15 million data points (and approx. 200 MB) per year - one insert every two seconds works out to 86,400 ÷ 2 × 365 ≈ 15.8 million rows a year - not 'huge' by any stretch. The index to support the unique/primary key you describe would almost double this. The database size to support this would not be excessive for any server-based product. Even the Express versions of SQL Server, Oracle, etc., would hold 5 years' worth of data in a single database.

    PostgreSQL or MySQL should work fine for this app, assuming you don't need millisecond response times for reading the data.

    I can't speak for MySQL, but PostgreSQL supports partitioned tables, which break a single logical table up into multiple smaller physical tables. That improves retrieval response times in time-critical applications when a single table grows too large to fit in available server RAM.
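
    As a sketch only (the declarative syntax below is for newer PostgreSQL releases; older versions get the same effect with table inheritance and constraint exclusion; the table follows the hypothetical schema from earlier in the thread):

    Code:
    -- Hypothetical measurements table, range-partitioned by time-stamp.
    CREATE TABLE measurements (
        ts    integer  NOT NULL,
        value smallint NOT NULL,
        meta  smallint NOT NULL,
        PRIMARY KEY (ts, meta)    -- the partition key must be part of the PK
    ) PARTITION BY RANGE (ts);

    -- One partition per year; the bounds are unix time-stamps.
    CREATE TABLE measurements_2007 PARTITION OF measurements
        FOR VALUES FROM (1167609600) TO (1199145600);  -- 2007-01-01 .. 2008-01-01
    CREATE TABLE measurements_2008 PARTITION OF measurements
        FOR VALUES FROM (1199145600) TO (1230768000);  -- 2008-01-01 .. 2009-01-01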

    How many clients will be retrieving data from the server at any one time?


    Note - if you're really concerned about it, download the latest version of PostgreSQL (it's totally free), install it, build a testbed database with the table/PK you describe, and populate it with a year or two of test data. Then run tests to see how well it responds...
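
    A sketch of how the test data could be generated, assuming the hypothetical measurements table from earlier in the thread (two years at one row every two seconds is about 31 million rows, so expect the insert to take a while):

    Code:
    -- Two years of synthetic data: one row every 2 seconds, random values.
    INSERT INTO measurements (ts, value, meta)
    SELECT g,
           (random() * 32767)::smallint,  -- fake 2-byte measurement
           (random() * 10)::smallint      -- fake meta information
    FROM   generate_series(
               extract(epoch FROM timestamp '2006-01-01')::integer,
               extract(epoch FROM timestamp '2008-01-01')::integer,
               2) AS g;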

    I tested PostgreSQL last year with a similar, simple table structure, looking at insert speeds; I was pushing 1,500 inserts a second into the table and retrieving data simultaneously (1-1000 records) with no apparent reduction in the insertion rate.
    Last edited by loquin; 10-09-07 at 16:02.
    Lou
    To greatly astonish
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Midler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  6. #6
    Join Date
    Oct 2007
    Posts
    5
    Quote Originally Posted by loquin
    define "fast"...
    There's no definitive time-out limit, but the values should be delivered within about 0.1 to 0.5 seconds.

  7. #7
    Join Date
    Oct 2007
    Posts
    5
    Quote Originally Posted by loquin
    Note: The amount of data you describe is on the order of 200 MB per year - not 'huge' by any stretch. The index to support the unique/primary key you describe would almost double this. The database size to support this would not be excessive for any server-based product. Even the Express versions of SQL Server, Oracle, etc., would hold 5 years worth of data in a single database.

    PostgreSQL or MySQL should work fine for this app, assuming you don't need millisecond response times for reading the data.
    Thanks a lot for the suggestions. I will give PostgreSQL a try and if that doesn't work out, I'll check out MySQL.

    Quote Originally Posted by loquin
    I can't speak for MySQL, but PostgreSQL supports partitioned tables, which break a single logical table up into multiple smaller physical tables. That improves retrieval response times in time-critical applications when a single table grows too large to fit in available server RAM.

    How many clients will be retrieving data from the server at any one time?
    The absolute worst case would be about 50 clients at the same time.

  8. #8
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by loquin
    What is huge to you may not be large to someone else...
    Always an embarrassing realization.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  9. #9
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Quote Originally Posted by blindman
    Always an embarrassing realization.
    These personal situations sometimes can be...
    Lou
    To greatly astonish
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Midler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  10. #10
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    amm:

    Will the statistical values be for the total number of records, or for a subset of the records? And if a subset, will the subset usually be time-based (records for the last 24 hours, for instance)?
    Lou
    To greatly astonish
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Midler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  11. #11
    Join Date
    Oct 2007
    Posts
    5
    Quote Originally Posted by loquin
    amm:

    Will the statistical values be for the total number of records, or for a subset of the records? And if a subset, will the subset usually be time-based (records for the last 24 hours, for instance)?
    Statistics need to be calculated for:
    • all available data
    • data from the last day
    • the last week
    • the last month
    • the last year


    At the moment we only intend to use some basic descriptive statistics like the arithmetic mean, modal values, minima, maxima, ...

    There may be a need to do regression analysis later on, but currently the main focus lies on descriptive statistics.
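
    For illustration, per window we would run something like this against the hypothetical measurements table from earlier in the thread (mode() WITHIN GROUP needs PostgreSQL 9.4 or later; older versions would need a GROUP BY workaround):

    Code:
    -- Descriptive statistics over the last day; swap the interval for
    -- week/month/year, or drop the WHERE clause to cover all data.
    SELECT avg(value)                           AS arithmetic_mean,
           mode() WITHIN GROUP (ORDER BY value) AS modal_value,
           min(value)                           AS minimum,
           max(value)                           AS maximum
    FROM   measurements
    WHERE  ts >= extract(epoch FROM now() - interval '1 day')::integer;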
