Results 1 to 7 of 7
  1. #1
    Join Date
    Jun 2005
    Posts
    319

    Unanswered: table partitioning primary key question

    Our data comes in via PK (i.e. the inserts will be happening w/ sequential values on the PK column). By day seemed like a logical way to partition the data, but before I perform my test on a large set of data I am wondering if I should partition by the PK column. Below is sample code, for the real data there are 67 days worth of data. I think read queries would benefit by data partitioned by day, it is the insert performance I am worried about and why I am keeping the clustered index on the current PK column.

    table in question has 1 billion rows

    code:
    Code:
    CREATE PARTITION FUNCTION pfDailyArchive (datetime) 
    as RANGE RIGHT for values(
    '2010-06-29',
    '2010-06-30',
    '2010-07-01',
    '2010-07-02',
    '2010-07-03',
    '2010-07-04',
    '2010-07-05'
    )
    GO
    
    CREATE PARTITION SCHEME psDailyArchive as partition pfDailyArchive all 
    to ([primary])
    GO
    
    CREATE TABLE [dbo].[DailyArchive](
    	[DailyArchiveId] [bigint] NOT NULL,
    	[dailyDetailId] [int] NOT NULL,
    	[dateTime] [datetime] NOT NULL
    ) ON psDailyArchive( [dateTime] );
    GO
    	
    CREATE CLUSTERED INDEX [IX_DailyArchive]
    ON [dbo].[DailyArchive]( [DailyArchiveId] ) ON psDailyArchive( [dateTime] );
    GO
    
    ALTER TABLE [dbo].[DailyArchive] WITH NOCHECK ADD
    	CONSTRAINT [PK_DailyArchive] PRIMARY KEY NONCLUSTERED
    	(
    		  [DailyArchiveId]
    		, [dateTime]
    	) ON psDailyArchive( [dateTime] );
    GO
    My primary question: Does the clustered index look ok (on 1 column but using a different column in the partition scheme)

    I can see what my options are for multiple filegroups to improve read performance, I don't think that will improve insert performance, but correct me if I am wrong.

    One last follow up, would a (compressed) indexed view be useful on this table? Perhaps using one of the other columns (I did not include full column list in sample). It probably depends on the reporting requirements.

  2. #2
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    A couple of things: (1) placing a table onto partition scheme is unnecessary because when you create your clustered index it will end up on the same partition and will be fully removed from the original; (2) having a clustered index on a non-partitioning field will be detremental to your queries that will try to use DailyArchiveId field; (3) what is the purpose of this "double-barrelled" nonclustered PK?
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  3. #3
    Join Date
    Jun 2005
    Posts
    319
    > (1) placing a table onto partition scheme is unnecessary because when you create your clustered index it will end up on the same partition and will be fully removed from the original;

    Huh? In order to partition a table it needs to be on the scheme and not an FG (i.e. PRIMARY). Also in order to "switch" in a new partition the table needs to be defined on the scheme, correct?

    > (2) having a clustered index on a non-partitioning field will be detremental to your queries that will try to use DailyArchiveId field;

    This is necessary to speed inserts which will be ordered by DailyArchiveId field. Perhaps we change it and we can live with an insert hit, but hopefully that would be mitigated by being able to switch in a new day's worth of data. I did notice what you are talking about by just doing a SELECT TOP 1 * from... order by DailyArchiveId desc, if I include the date in the query it is instant though.

    > (3) what is the purpose of this "double-barrelled" nonclustered PK?

    The column I am partitioning on needs to be in the PK, I think this is a pretty standard implementation but correct me if I am wrong.

    I did notice the index size is much higher (compared to the non-partitioned version) and it is probably due to this double-barrelled approach you mention.
    Last edited by Gagnon; 12-09-10 at 16:43.

  4. #4
    Join Date
    Jun 2005
    Posts
    319
    Should I consider just making the PK clustered on ([DailyArchiveId], [dateTime]) and drop the current clustered index?

    going to try this next...be back in an hour or so...
    Last edited by Gagnon; 12-09-10 at 16:53.

  5. #5
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Quote Originally Posted by Gagnon View Post
    > (1) placing a table ... etc.

    Huh? In order to partition a table it needs to be on the scheme and not an FG (i.e. PRIMARY). Also in order to "switch" in a new partition the table needs to be defined on the scheme, correct?
    Correct, except when you introduce either clustered index, or clustered PK/unique constraint. In this case, ALL heap pages are copied to a B-Tree structure of specified storage type (FG or partition scheme), and then deallocated. Look at the scenario below:
    Code:
    create partition function pf_myfunc(int) as range left for values (1, 100, 1000)
    go
    create partition scheme ps_myIntScheme as partition pf_myfunc
       to (fg1, fg2, fg3, fg4)
    go
    create table dbo.t1 (f1 int identity(0, 50) not null)
       on ps_myIntScheme(f1)
    go
    alter table dbo.t1 add constraint PK_t1 primary key clustered (f1)
    go
    Where is your table now?
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  6. #6
    Join Date
    Jun 2005
    Posts
    319
    I think what you are getting at is the PK does not have to be on the scheme, but can span all partitions, I guess I will create it that way, but the primary column is still going to be [DailyArchiveId], the secondary: [dateTime] - or are you saying the defeats the purpose of partitioning?

    One of the main reasons I want to partition is to increase the insert performance since a large batch of data is to be inserted daily.

  7. #7
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    If this is your primary reason for partitioning, than DailyArchiveId as a clustered index field will kill your performance. The reason is simple, - not only the data resides in the same storage unit as the clustered index, it is also sorted in the order of the key (asc - default, desc - if you specify so).
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •