  1. #1
    Join Date: Dec 2009
    Posts: 1

    Unanswered: Redundant data to speed up database access

    Hi All

    First of all, let me declare myself a novice database programmer. I am actually an electronics engineer working on a database-related project.

    I am collecting data from machines in a factory to provide information on manufacturing throughput, downtime, waste, etc. for these machines. The data is stored in a SQL Server Express database, in a table I call EventRecords. My application logs about 50 to 100 event records per day from each of 10 machines, and each event record has about 15 fields of data. So, on a daily basis, I am creating about 500 to 1,000 new rows of machine statistics in my EventRecords table.
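    To illustrate, a simplified version of the table might look like this (the column names are made up for illustration, my real table has about 15 fields, and this assumes SQL Server 2008 or later for the date type):

        -- hypothetical, cut-down schema; the real table has ~15 columns
        CREATE TABLE dbo.EventRecords (
            EventId          int IDENTITY(1,1) NOT NULL PRIMARY KEY,
            MachineId        int      NOT NULL,
            EventDate        date     NOT NULL,  -- day the event was logged
            EventTime        datetime NOT NULL,  -- full timestamp of the event
            DowntimeMinutes  int      NOT NULL,
            RunTimeMinutes   int      NOT NULL,
            WasteCount       int      NOT NULL
            -- ...plus the remaining event fields
        );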

    Now, I am also creating reports based on this data, and these reports show Key Performance Indicators (KPIs) such as downtime, run time and waste for each of the machines. This is all pretty straightforward and fine.

    My question is this: as time progresses, let's say one year from now, my database will have maybe 350,000 EventRecords rows, and each time I run a report to analyse a day's or a week's data, the database will have to scan through all 350,000 records to get at the required dataset.

    So why shouldn't I create another table, let's call it KPI, and at midnight each day run a query on the EventRecords table that extracts the key performance data for each machine and places that 'summary' into the KPI table? That creates just 10 new records per day, each holding the summarised key performance data for one machine for that whole day. Then, when producing reports in the future, I could query the KPI table instead of the EventRecords table and have only 1/100th of the data to filter through.
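    The nightly summary query might look something like this (again with made-up column names, and assuming a KPI table with matching columns):

        -- run once just after midnight: one summary row per machine for yesterday
        INSERT INTO dbo.KPI (MachineId, KpiDate, TotalDowntime, TotalRunTime, TotalWaste)
        SELECT
            MachineId,
            EventDate,
            SUM(DowntimeMinutes),
            SUM(RunTimeMinutes),
            SUM(WasteCount)
        FROM dbo.EventRecords
        WHERE EventDate = DATEADD(day, -1, CAST(GETDATE() AS date))  -- yesterday
        GROUP BY MachineId, EventDate;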

    Now, I gather that this type of data replication is frowned upon, since it goes against the concept of relational databases, but this KPI table, which a small service on the server could build each night, seems a sensible thing for me to do to improve the database's performance in the future.

    I would greatly appreciate your professional views on this.

    Thanks

    Mike

  2. #2
    Join Date: Jun 2003
    Location: Toronto, Canada
    Posts: 5,516
    Provided Answers: 1
    Quote Originally Posted by mike3
    Now, I gather that this type of data replication is frowned upon, since it goes against the concept of relational databases, but this KPI table, which a small service on the server could build each night, seems a sensible thing for me to do to improve the database's performance in the future.
    This is not called "data replication", nor does it go against the concept of relational databases. What you are describing is called a "materialized view" (an "indexed view" in SQL Server terms), and it is a commonly used approach to improving the performance of reporting and decision-support applications.

    You can find a detailed discussion of SQL Server materialized views here: Improving Performance with SQL Server 2008 Indexed Views
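    As a sketch, using column names guessed from your description (note that SUM arguments in an indexed view must be non-nullable), it would look something like this:

        -- the view must be schema-bound and use two-part object names
        CREATE VIEW dbo.MachineDailyKPI
        WITH SCHEMABINDING
        AS
        SELECT
            MachineId,
            EventDate,
            SUM(DowntimeMinutes) AS TotalDowntime,
            SUM(RunTimeMinutes)  AS TotalRunTime,
            SUM(WasteCount)      AS TotalWaste,
            COUNT_BIG(*)         AS EventCount  -- required when the view uses GROUP BY
        FROM dbo.EventRecords
        GROUP BY MachineId, EventDate;
        GO

        -- this index is what materializes the view: the aggregated rows are stored
        -- and maintained automatically as EventRecords changes
        CREATE UNIQUE CLUSTERED INDEX IX_MachineDailyKPI
            ON dbo.MachineDailyKPI (MachineId, EventDate);

    One caveat for your edition: outside Enterprise Edition (so on Express too), the optimizer will not substitute the indexed view into your queries automatically; your reports need to reference it directly with the NOEXPAND hint:

        SELECT MachineId, TotalDowntime, TotalRunTime, TotalWaste
        FROM dbo.MachineDailyKPI WITH (NOEXPAND)
        WHERE EventDate = '20091201';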
    ---
    "It does not work" is not a valid problem statement.

  3. #3
    Join Date: Apr 2002
    Location: Toronto, Canada
    Posts: 20,002
    350,000 rows is a relatively small table -- SQL Server can handle many millions of rows easily

    unless your reports are intended to do calculations on the entire history of all events ever recorded (which would require retrieving all rows in the table), there's a good chance that if the table is properly indexed, you won't have any performance issues with your reporting queries
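    for example (column names guessed from your post), an index keyed on the event date lets a daily or weekly report seek straight to the rows it needs instead of scanning the whole table --

        -- the INCLUDE columns make the index "cover" the report query
        CREATE NONCLUSTERED INDEX IX_EventRecords_EventDate
            ON dbo.EventRecords (EventDate, MachineId)
            INCLUDE (DowntimeMinutes, RunTimeMinutes, WasteCount);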

    do a web search for premature optimization

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
