  1. #1
    Join Date
    Nov 2005
    Posts
    15

    Unanswered: Partitioning vs removal

    Hi all,

    I am currently dealing with the problem of storing a large amount of data in a database. My SQL Server 2005 instance mostly collects GPS data from GPS loggers installed in cars in order to track their paths. Since every logger generates data every second, the database - more specifically, just one table - fills up quickly.

    The issue is that the response time of the database slows down, so at a certain point the incoming GPS data cannot even be stored because of timeouts, and the GPS logger has to retry sending.

    Now, since data older than 3 months is not as interesting as the incoming data, one technique would be to remove the old data from the DB and store it in a file. But I do not like this solution, since in certain cases the user can query data that is a year or so old, and getting it from the file system requires extra effort, not to mention the problems with backup, data integrity, etc.

    Would table partitioning resolve the problem? Or is there an even better technique for handling the storage of large data volumes?

    Thanks in advance for your opinion.

  2. #2
    Join Date
    Oct 2002
    Location
    Leicester - UK
    Posts
    820
    I had a similar problem with a logging table.

    I just stuck a trigger on it:
    Code:
    ALTER TRIGGER [dbo].[TRG_Archive]
        ON [dbo].[TBL_Log]
        AFTER INSERT
    AS
    BEGIN
        -- SET NOCOUNT ON added to prevent extra result sets from
        -- interfering with SELECT statements.
        SET NOCOUNT ON;

        DECLARE @Now datetime
        SET @Now = GETDATE()

        -- Copy rows older than one month into the archive table...
        INSERT INTO TBL_Log_Arc (
            Occured, Username, [Password], [Message], Data, IsError
        )
        SELECT  Occured, Username, [Password], [Message], Data, IsError
        FROM    TBL_Log
        WHERE   Occured < DATEADD(MM, -1, @Now)

        -- ...and then remove them from the live table.
        DELETE FROM TBL_Log
        WHERE   Occured < DATEADD(MM, -1, @Now)

    END
    That way, if you need to check the data in the archive it is still there and easily accessible, but it isn't involved in your active table, so it doesn't matter if it is huge. You might also want to check your indexes to improve performance.
    Definition of a Beginner, Someone who doesn't know the rules.

    Definition of an Expert, Someone who knows when to ignore the rules.

  3. #3
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by wransen
    Would table partitioning resolve the problem? Or is there an even better technique for handling the storage of large data volumes?
    Depends on what's impeding the performance of the database server. If you haven't already then use SQL Profiler, Perfmon and SQL Server's management views to help determine where the problem might lie. Two things to look out for are page splits and file growth. Excessive page splits could be caused by the choice of clustered index key. Auto growth is a very expensive operation and should be avoided at all costs by preallocating space in your data and log files.
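    For example, two quick checks along those lines might look like the sketch below. This is only illustrative - the table name dbo.GpsLog is a placeholder for whatever your logging table is actually called.
    Code:
    -- Cumulative page splits since the server started
    -- (SQLServer:Access Methods counter).
    SELECT cntr_value AS page_splits_since_start
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page Splits/sec';

    -- Fragmentation and size of the indexes on the logging table
    -- (dbo.GpsLog is a placeholder name).
    SELECT i.name AS index_name,
           ps.avg_fragmentation_in_percent,
           ps.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.GpsLog'),
                                        NULL, NULL, 'LIMITED') AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id
     AND i.index_id = ps.index_id;
    High, steadily climbing fragmentation on the clustered index would point at the choice of cluster key mentioned above.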

    I'm sceptical about the other suggestion of using a trigger to archive the data. An insert trigger would only increase the workload dramatically, especially if you have inserts every second.

  4. #4
    Join Date
    Oct 2002
    Location
    Leicester - UK
    Posts
    820
    Quote Originally Posted by dportas
    I'm sceptical about the other suggestion of using a trigger to archive the data. An insert trigger would only increase the workload dramatically, especially if you have inserts every second.
    I fully agree - I wasn't suggesting that as their solution, just saying that's how I handled a similar problem.

    In my case it was logging access attempts to a web service. It works quite well, as it only has to log a couple of thousand entries a day, and with the archiving running constantly it rarely has to move more than a couple of records at a time, which prevents any large-scale locking of the tables. The main problem was data retrieval: because checks rarely went back more than 3 weeks, it made no sense to scan the entire log for the report, hence the archive, which was only polled if the date range of the query was more than a month old.

    It might be that their issue is better resolved by buffering the input to reduce the number of locks that happen on the core table; something along the lines of the sketch below.
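    As a very rough illustration of that buffering idea (all of the object and column names here are made up), the loggers could insert into a small staging table and a SQL Agent job could sweep it into the main table in one set-based batch:
    Code:
    -- Hypothetical sketch: loggers write to dbo.GpsLog_Staging, and a job runs
    -- this every minute or so to move the buffered rows into the main table.
    -- DELETE ... OUTPUT INTO keeps the move in a single statement, so nothing
    -- is lost between a separate copy and delete.
    DELETE FROM dbo.GpsLog_Staging
    OUTPUT DELETED.LoggerID, DELETED.[Timestamp],
           DELETED.Latitude, DELETED.Longitude
    INTO dbo.GpsLog (LoggerID, [Timestamp], Latitude, Longitude);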
    Last edited by m.timoney; 10-07-09 at 08:42.
    Definition of a Beginner, Someone who doesn't know the rules.

    Definition of an Expert, Someone who knows when to ignore the rules.


  6. #6
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    Quote Originally Posted by wransen
    Hi all,

    I am currently dealing with the problem of storing a large amount of data in a database. My SQL Server 2005 instance mostly collects GPS data from GPS loggers installed in cars in order to track their paths. Since every logger generates data every second, the database - more specifically, just one table - fills up quickly.
    It is a scary world.

    Yes, check all of the stuff dportas mentioned. I would also look at what you might have in the DDL surrounding this table. Are there triggers, excessive indexing, constraints, replication, etc. that might be causing contention on insert?

    Are the users reporting off the same table that takes this high volume of inserts? There are various schemes and devices to separate out these two things and keep the data near real time, but I really do not know enough about what you are doing and the exact nature of the problem to say more.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Ernest Hemingway, A Farewell To Arms.

  7. #7
    Join Date
    Nov 2005
    Posts
    15
    Thanks to all for your advice!

    Quote Originally Posted by dportas
    Two things to look out for are page splits and file growth. Excessive page splits could be caused by the choice of clustered index key. Auto growth is a very expensive operation and should be avoided at all costs by preallocating space in your data and log files.
    Well, there is no limit on the table size, since the service can basically run for years, so it is quite difficult to estimate the correct growth, etc.

    Actually, I do not see why the response time of the system should stay constant while a key table keeps growing. The system simply works faster with an empty table and gets slower as the table fills up. It seems like there are quite long periods with fast response, followed by interruptions that occur more frequently as the table grows.

    There is nothing special about the table: just the main index, an indexed logger ID, and the GPS data including a timestamp. No triggers or other special stuff.

    Maybe one thing: after a certain time the table definitely no longer fits into memory.

    So my solution was to move the old data into another table, which was performed daily by SQL Agent, but it did not really help. Getting the data out of the database would definitely do the job, but it is a bit of a stone-age solution.
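    For anyone trying the same thing, a daily move of that kind is usually done in small batches so the live table is never locked for long; a simplified sketch (the names, batch size and cutoff are only illustrative, not my actual job):
    Code:
    -- Hypothetical sketch: move rows older than three months into a second
    -- table in batches of 5000, stopping when nothing older remains.
    DECLARE @Cutoff datetime
    SET @Cutoff = DATEADD(MM, -3, GETDATE())

    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000)
        FROM dbo.GpsLog
        OUTPUT DELETED.LoggerID, DELETED.[Timestamp],
               DELETED.Latitude, DELETED.Longitude
        INTO dbo.GpsLog_Old (LoggerID, [Timestamp], Latitude, Longitude)
        WHERE [Timestamp] < @Cutoff

        IF @@ROWCOUNT = 0 BREAK    -- nothing left older than the cutoff
    END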

  8. #8
    Join Date
    Oct 2002
    Location
    Leicester - UK
    Posts
    820
    Quote Originally Posted by wransen
    So my solution was to move the old data into another table, which was performed daily by SQL Agent, but it did not really help. Getting the data out of the database would definitely do the job, but it is a bit of a stone-age solution.
    If moving it out of the table didn't help, then moving it out of the database probably won't either - unless, as was mentioned above, you have the auto-grow feature on; if so, turn it off.
    Auto-grow is a nice idea, but the implementation doesn't work very well. What you do instead is manually allocate the size, and if the database fills that up, allocate more.

    A simple rule is to measure how much the DB grows in a month (call this X) and then allocate 3X. Then every month you check the size, and if there is less than 2X free, give it another 3X.

    Once you have a reasonable idea of the growth, you can extend the time period.
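    A sketch of what that monthly check might look like (the database name, file name and 2 GB-per-month figure are only examples, not taken from the poster's system):
    Code:
    -- How much space is left in each data file (sizes are in 8 KB pages,
    -- so divide by 128 to get MB).
    SELECT name,
           size / 128                                     AS size_mb,
           FILEPROPERTY(name, 'SpaceUsed') / 128          AS used_mb,
           (size - FILEPROPERTY(name, 'SpaceUsed')) / 128 AS free_mb
    FROM sys.database_files;

    -- Suppose the DB grows about 2 GB a month (X): if less than 4 GB (2X) is
    -- free, grow the file. SIZE is the new total size, so add 3X (6 GB) to the
    -- current size. The database and file names here are made up.
    ALTER DATABASE GpsTracking
    MODIFY FILE (NAME = GpsTracking_Data, SIZE = 30000MB);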
    Last edited by m.timoney; 10-07-09 at 11:27.
    Definition of a Beginner, Someone who doesn't know the rules.

    Definition of an Expert, Someone who knows when to ignore the rules.

  9. #9
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Is the table clustered? What is it indexed on? From your description it sounds very much like an index-related problem - either fragmentation (due to a GUID used as the cluster key, for example) or perhaps there is something else going on that would benefit from a different index.

    The point about file growth is to do as much as possible to minimise the number of times the file has to grow. If the data is growing regularly then it shouldn't be too difficult to estimate the growth. Aim to preallocate enough space for at least your three-month period, for example. Regular growth is a sure performance throttler. I work with databases that haven't auto-grown for three years.

    Re moving data to another table: yes, the same thing can be achieved with partitioning, and you can switch partitions between tables, which is a near-instantaneous metadata operation. Note that partitioning is available in Enterprise Edition only.
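    A much-simplified sketch of what that can look like (all of the names and boundary dates are made up, and a real design also needs the clustered index aligned with the partitioning column and sensible filegroups):
    Code:
    -- Hypothetical example of monthly date partitioning and partition
    -- switching (Enterprise Edition only).
    CREATE PARTITION FUNCTION pfGpsByMonth (datetime)
    AS RANGE RIGHT FOR VALUES ('20090701', '20090801', '20090901');

    CREATE PARTITION SCHEME psGpsByMonth
    AS PARTITION pfGpsByMonth ALL TO ([PRIMARY]);

    -- The live table would be created on the scheme, partitioned by its
    -- timestamp column:
    --   CREATE TABLE dbo.GpsLog (...) ON psGpsByMonth ([Timestamp]);

    -- An old month can then be moved to an empty archive table with the same
    -- structure as a metadata-only operation:
    --   ALTER TABLE dbo.GpsLog SWITCH PARTITION 1 TO dbo.GpsLog_Old;
    Switching a partition out like this replaces the copy-then-delete approach, which is why it is near-instantaneous.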

  10. #10
    Join Date
    Nov 2005
    Posts
    15
    Thank you all for dealing with the issue!

    Quote Originally Posted by dportas
    Is the table clustered? What is it indexed on? From your description it sounds very like an index-related problem - either fragmentation (due to a GUID used as a cluster key for example) or perhaps there's something else going on that would benefit from a different index.
    This is the table:

    Code:
    CREATE TABLE [dbo].[GpsLogArchive](
    	[GpsLogArchiveID] [numeric](20, 0) IDENTITY(1,1) NOT NULL,
    	[LoggerID] [numeric](20, 0) NULL,
    	[Timestamp] [datetime] NULL,
    	[Latitude] [int] NULL,
    	[Longitude] [int] NULL,
    	[hDOP] [int] NULL,
    	[ValidFix] [int] NULL,
    	[Course] [int] NULL,
    	[Speed] [int] NULL,
    	[Satellites] [int] NULL,
    	[TripID] [numeric](20, 0) NULL,
    	[Status] [tinyint] NULL,
    	[StartOfRoute] [bit] NULL,
    	[EndOfRoute] [bit] NULL,
    	[StartOfTrip] [datetime] NULL,
     CONSTRAINT [PK_GpsLogArchive] PRIMARY KEY CLUSTERED 
    (
    	[GpsLogArchiveID] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    
    CREATE NONCLUSTERED INDEX [IX_GpsLogArchive] ON [dbo].[GpsLogArchive] 
    (
    	[LoggerID] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    
    CREATE NONCLUSTERED INDEX [IX_GpsLogArchive_1] ON [dbo].[GpsLogArchive] 
    (
    	[TripID] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    
    CREATE NONCLUSTERED INDEX [IX_GpsLogArchive_2] ON [dbo].[GpsLogArchive] 
    (
    	[Status] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    And here is some data regarding the table itself:

    Code:
    Data space: 2,382.438 MB
    Filegroup: PRIMARY
    Index space: 2,178.563 MB
    Row count: 25911761
    Table is partitioned: False
    Okay, those numeric fields are quite strange - they should actually be bigint, but I inherited the project like this. The described behavior would nevertheless be the same with bigints.

    Regarding the autogrowth feature - can this only be set for the entire DB, not just for one table?

  11. #11
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    It's a pretty short row, and the number of rows isn't that bad; I've seen partitions with many more rows. There are a few more indexes than you probably really want. I would say to look at some of the items mentioned previously to see what exactly is causing your slow response times, as it seems to me this is due to your workload rather than too many rows in your table.

    Dave Nance

  12. #12
    Join Date
    Aug 2004
    Posts
    2
    Have you done any real statistical analysis on the database to see what the performance is like?
