If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > Microsoft SQL Server > DB structure for massive data inserts (2,5 to 3,2 mil rows per day)

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-13-10, 16:07
margusj margusj is offline
Registered User
 
Join Date: Mar 2010
Posts: 2
DB structure for massive data inserts (2,5 to 3,2 mil rows per day)

Hi,

I'm taking over one solution witch is not working very well and it haven't maintained systematically.
The plan is to create new table structure/logic and I would appreciate Yours help here.

Input for planning:
data from network, what is need to insert into db, comes like:
"date", "parameter name", "parameter value"
same parameter name, but might be different value, comes approx. after every 3-5 sec. (197 different parameter names)

there's existing application, what inserts this data as one row into db.
that kind of rows comes approx. 2,5 to 3,2 mil. per each day.
Indexes are built on column date and parameter name.

Problem:
Business wants to get every kind of online and calculated data based on this raw data, so we need to insert all sent data.
And if there are more that one week data in one table, queries are getting very slow. (online queries, what should run like once after 30 sec)
also calculations are running from cron, to calculate one hour average, one day average, etc data.

My idea:
create one table, with same indexes, but hold there only last day + current day data.
copy every night now-2 day data to another table. then i can keep so called online table max 6 mil rows and then online queries should be very ok.
second table is for week analyze, also with same indexes, and every night i will copy older than now-7 day data to month table.
month table would be for room saving purpose without indexes and it's actually only archiving.

Questions:
1. how my proposed solution feels?
2. how would be best way to organize nightly coping process?

with thanks.
Margus
Reply With Quote
  #2 (permalink)  
Old 03-13-10, 16:38
dportas dportas is offline
Registered User
 
Join Date: Dec 2007
Location: London, UK
Posts: 732
Are the parameter values all sampled at the same frequency and timestamped with the same time? If so then you could presumably have one row for all 197 attributes - which would mean fewer than 30,000 rows per day.
Reply With Quote
  #3 (permalink)  
Old 03-13-10, 18:12
margusj margusj is offline
Registered User
 
Join Date: Mar 2010
Posts: 2
no, data comes from sensors and sequence or even every batch can consist different order and amount of parameters.
it's like one pipe shooting messages (consist date, param.name and value).
it have to save same way - 3 columns in one row.

of course, question is also, is it point to make one more column - rowid or smt similar ?
is it makes coping easier ?
Reply With Quote
  #4 (permalink)  
Old 03-15-10, 18:17
kaffenils kaffenils is offline
Registered User
 
Join Date: Nov 2005
Posts: 122
You are not looking for a RDBMS system, but a times series database. This is a database system that is specially designed to store time series, and only time series, data. You could take a look at PI System, Historis or iHistorian.
Reply With Quote
  #5 (permalink)  
Old 03-15-10, 20:05
rdjabarov rdjabarov is offline
Registered User
 
Join Date: Jul 2003
Location: San Antonio, TX
Posts: 3,611
This is very similar to what I am collecting for several customers, - perfmon counters. The way I resolved it is by storing the data into data warehouse structure (star schema), where objecttype, objectname, countername, and date value are stored into dimention tables. Then the corresponding ID's for above dimensions are stored along with counter values into a fact table. Of course, along with minimizing the size of each row by introducing the corresponding ID's for dimensions, you also need to consider data partitioning. And in order to introduce an efficient partitioning strategy based on date values, for example, you may want to bring in an additional field from the time dimension. This may be a month, in which case you may want to partition by month with 12 resulting partitions. Or by day of month, with 31 partitions. Or you may even consider week or even a day number (up to 366 partitions). But if you want to introduce a sliding window, you may have to keep the date field within your fact table, and then have a nightly merge/split/switch operation on the production fact table. Along with that, you may want to have an "aligned" archive table with fewer partitions and 1 overlapping with production partition, so that you can really automate "data aging" process.
__________________
"The data in a record depends on the Key to the record, the Whole Key, and
nothing but the Key, so help me Codd."
Reply With Quote
Reply

Tags
mssql 2005, structure

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On