| |
Welcome to the dBforums forums.
You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!
If you have any problems with the registration process or your account login, please contact contact support.
If you prefer not to see double-underlined words and corresponding ads, place your cursor here for ContentLink opt out.
|
 |

10-09-07, 13:40
|
|
Registered User
|
|
Join Date: Oct 2007
Posts: 5
|
|
|
Free time series DBMS
|
Hi there,
I am currently involved in a project dealing with huge amounts of very simply structured data in the form of:
- a unix time-stamp
- 2 byte measurement data
- 2 byte meta information
The thing that's so special here is, that there are no delete or update operations involved at all. The only thing the DBMS has to provide is an extremely efficient way to append data, get access to the latest data-point and to calculate some statistics like arithmetic mean, modal values, minima/maxima.
So I was wondering, if a time series DBMS is actually the way to go, or if we should just go ahead and develop a home-brewn solution.
I also thought about using a general purpose DBMS like Mysql or Postgres, but I am having concerns with overhead and that it just will not be able to provide the data fast enough.
cheers, andreas
|
|

10-09-07, 14:22
|
|
Super Moderator
|
|
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,359
|
|
Can you guarantee that the time stamp will be unique? or that a combination of timestamp/meta information will be unique?
Please quantify "Huge Amounts"... What is huge to you, may not be large to someone else...
Please quantify "fast enough" ... i.e. Insert rates, retrieval rates, minimum response time, etc.
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
|
|

10-09-07, 14:33
|
|
Registered User
|
|
Join Date: Oct 2007
Posts: 5
|
|
|
Quote:
|
Originally Posted by loquin
Can you guarantee that the time stamp will be unique? or that a combination of timestamp/meta information will be unique?
|
all combinations of timestamp/meta information will be unique.
Quote:
|
Originally Posted by loquin
Please quantify "Huge Amounts"... What is huge to you, may not be large to someone else...
|
I'd estimate that there will be about one insert every two to three seconds.
Quote:
|
Originally Posted by loquin
Please quantify "fast enough" ... i.e. Insert rates, retrieval rates, minimum response time, etc.
|
Insert rates aren't that important (as long as they don't block selects on statistics), what really counts is the access to the last data-point (again depending on timestamp/meta information) and fast access to statistics.
|
|

10-09-07, 14:49
|
|
Super Moderator
|
|
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,359
|
|
Quote:
|
Originally Posted by amm
what really counts is the access to the last data-point (again depending on timestamp/meta information) and fast access to statistics.
|
define "fast"...
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
|
|

10-09-07, 14:55
|
|
Super Moderator
|
|
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,359
|
|
Note: The amount of data you describe is on the order of 15 million data points (and apx. 200 MB) per year - not 'huge' by any stretch. The index to support the unique/primary key you describe would almost double this. The database size to support this would not be excessive for any server-based product. Even the Express versions of SQL Server, Oracle, etc., would hold 5 years worth of data in a single database.
PostgreSQL or MySQL should work fine for this app. Assuming you don't need millisecond response times for reading the data.
I can't speak for MySQL, but PostgreSQL supports partitioned tables, which act to break up a single logical table structure into multiple, smaller physical tables, improving retrieval response times when single tables get too large (when they won't fit into available server RAM) in time critical applications.
How many clients will be retrieving data from the server at any one time?
Note - if you're really concerned about it, download the latest version of PostgreSQL (it's totally free,) install it, build a testbed database with the table/PK you describe, and populate it with a year or two of test data. Then, run tests to see how well it responds...
I tested postgresql last year with a similar, simple table structure, looking at insert speeds; I was pushing 1500 inserts a second into the table, and was retrieving data simultaneously (1-1000 records) with no apparent reduction in insertion rate.
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
|
Last edited by loquin : 10-09-07 at 16:02.
|

10-09-07, 15:00
|
|
Registered User
|
|
Join Date: Oct 2007
Posts: 5
|
|
Quote:
|
Originally Posted by loquin
define "fast"...
|
there's no definitive time-out limit, but the values should be delivered in about 0.1 to 0.5 seconds.
|
|

10-09-07, 15:12
|
|
Registered User
|
|
Join Date: Oct 2007
Posts: 5
|
|
Quote:
|
Originally Posted by loquin
Note: The amount of data you describe is on the order of 200 MB per year - not 'huge' by any stretch. The index to support the unique/primary key you describe would almost double this. The database size to support this would not be excessive for any server-based product. Even the Express versions of SQL Server, Oracle, etc., would hold 5 years worth of data in a single database.
PostgreSQL or MySQL should work fine for this app. Assuming you don't need millisecond response times for reading the data.
|
Thanks a lot for the suggestions. I will give PostgreSQL a try and if that doesn't work out, I'll check out MySQL.
Quote:
|
Originally Posted by loquin
I can't speak for MySQL, but PostgreSQL supports partitioned tables, which act to break up a single logical table structure into multiple, smaller physical tables, improving retrieval response times when single tables get too large (when they won't fit into available server RAM) in time critical applications.
How many clients will be retrieving data from the server at any one time?
|
The absolute worst case would be about 50 clients at the same time.
|
|

10-09-07, 15:24
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 9,291
|
|
Quote:
|
Originally Posted by loquin
What is huge to you, may not be large to someone else...
|
Always an embarrassing realization.
|
|

10-09-07, 16:04
|
|
Super Moderator
|
|
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,359
|
|
Quote:
|
Originally Posted by blindman
Always an embarrassing realization.
|
these personal situations sometime can be... 
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
|
|

10-09-07, 18:35
|
|
Super Moderator
|
|
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,359
|
|
amm:
with the statistical values be for the total number of records, or for a subset of the records, and if a subset, will the subset usually be time-based? (records for the last 24 hours, for instance...)
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
|
|

10-11-07, 18:07
|
|
Registered User
|
|
Join Date: Oct 2007
Posts: 5
|
|
Quote:
|
Originally Posted by loquin
amm:
with the statistical values be for the total number of records, or for a subset of the records, and if a subset, will the subset usually be time-based? (records for the last 24 hours, for instance...)
|
statistics need to be calculated for: - all available data
- data from the last day
- the last week
- the last month
- the last year
at the moment we only intend to use some basic descriptive statistics like the arithmetic mean, modal values, minima, maxima, ...
there may be need to do regression analysis later on, but currently the main focus lies on descriptive statistics.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|