Page 1 of 2 12 LastLast
Results 1 to 15 of 28
  1. #1
    Join Date
    Oct 2006
    Location
    CA
    Posts
    210

    Unanswered: Question about Clustered Indexes

    Using SQL Server 2000 ... hopefully not too dumb a question.

    Is there a performance hit using Clustered Index on a table that gets a lot of deletes?

    I'm creating a Transaction Log table that will get about 4,000 inserts per day. The value of some of this historical data is worthless after a while, so I delete it.

    It occurs to me that this may create a lot of fragmentation. If so, is this cleaned up during weekly "Reorganize data and index pages" in the Maintenance Plan? Do I also need to select "Remove unused space from database files"?

    Additional question: I though that care needed to be taken that a clustered key be a value that always increments (datestamp, identity key, etc), yet in this write-up, it shows using randomly generated key values. I'm confused. Wouldn't it have to reorganize everything with greater values to insert the new row into the appropriate spot?
    http://www.sql-server-performance.co...ed_indexes.asp

  2. #2
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    I remember reading that in that article too and it didn't make sense to me either. Using a guid for a pk will certainly cause a lot of fragmentation if your fill factor on the clustered index is high, because page splits will occur often. Note that in 2005, you can use NEWSEQUENTIALID() for a guid pk to avoid this problem. in 2000 you an use Gert Draper's XPGUID.dll for the same feature.

    If you set the fill factor on the index high enough, then page splits should occur only rarely if at all since you are deleting data from the table on a regular basis.

    I wouldn't use the "remove unused space" task because allocating extents is a rather expensive operation for the server - generally what I do is allocate enough space to your db files to fit the largest amount of data you expect the db to have so that you aren't allocating extents at runtime. if you shrink the files, it seems likely that the server will just have to grow them again (assuming you have autogrowth turned on).

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by jezemine
    If you set the fill factor on the index high enough, then page splits should occur only rarely if at all since you are deleting data from the table on a regular basis.
    Shouldn't that be low enough?

    Quote Originally Posted by jezemine
    I wouldn't use the "remove unused space" task
    +1

    Monotonically increasing clustered keys (as you mentioned) are most useful for transactional tables with a high ratio\ number of inserts. With only 4000 a day and an adequte fill factor you should be ok. A reorganise will remove fragmentation. If you are worried, run it nightly (if you have the window) - there's no rule that says you can only reindex on a weekly basis.

    Also, deletions will not cause fragmentation but they will reduce the amount of data per page - depending on the question you ask this may be considered a good or bad thing.

    HTH
    Testimonial:
    pootle flump
    ur codings are working excelent.

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Jezamine - what on earth is your local time?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  5. #5
    Join Date
    Mar 2007
    Location
    Holmestrand, Norway
    Posts
    332
    Vich: Clustered indexes on random values are really a bad idea, the indexes got to be fragmented. The only really good candidates to clustered indexes are (semi-)sequential data. Large amount of data with relatively few updates in combination with a suitable fillfactor may also do, but not random data.

    With the amount of 4000 inserts a day and a limited lifetime of the data I doubt that you actually need a clustered index, so my advice would be to delete the clustered index and create a nonclustered instead.
    Ole Kristian Velstadbråten Bangås - Virinco - MSSQL.no - Facebook - Twitter

  6. #6
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    Quote Originally Posted by pootle flump
    Shouldn't that be low enough?
    um, yes, thanks.

    Quote Originally Posted by pootle flump
    Jezamine - what on earth is your local time?
    i'm in the seattle area, why?

    EDIT: here's some justification for why you shouldn't use that "remove unused space" task, which is essentially the same as having autoshrink on, from someone who actually knows what they are talking about - unlike me!

    http://blogs.msdn.com/sqlserverstora...hrink-off.aspx
    Last edited by jezemine; 05-18-07 at 10:48.

  7. #7
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by jezemine
    i'm in the seattle area, why?
    Because you were active as I got into work 8:00 am bst. Late one?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  8. #8
    Join Date
    Dec 2002
    Posts
    1,245
    Quote Originally Posted by pootle flump
    Because you were active as I got into work 8:00 am bst. Late one?

    bst?? British Summer Time?

    Regards,

    hmscott
    Have you hugged your backup today?

  9. #9
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    yea, i'm up late sometimes I guess. it's an addiction. help me!

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by hmscott
    bst?? British Summer Time?

    Regards,

    hmscott
    Ya - our clocks have gone... um.... forward.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  11. #11
    Join Date
    Dec 2002
    Posts
    1,245
    It's not that. I just saw BST and I kinda focused in on the first two letters.

    Sorry, not much humor there, but it just kinda caught me...

    All right folks, nothing to see here, move along...

    Regards,

    hmscott
    Have you hugged your backup today?

  12. #12
    Join Date
    Oct 2006
    Location
    CA
    Posts
    210
    Quote Originally Posted by pootle flump
    Shouldn't that be low enough?

    +1

    Monotonically increasing clustered keys (as you mentioned) are most useful for transactional tables with a high ratio\ number of inserts.
    I would think that clustered indexes:

    1. Saves on index maintenance time/space.

    2. Decreases seek time. Last week I changed an index to Clustered and an associated update query I was trying to tune up changed from 100 seconds down to 20 seconds (to process about 15,000 rows). The table in question does not get a lot of inserts or deletes.

    I can see how it would improve transactional table inserts for reason #1, but I would think that it would potentially vastly improve rarely manipulated but often linked to master tables (like a SKU list) for reason #2.

    Quote Originally Posted by pootle flump
    With only 4000 a day and an adequte fill factor you should be ok. A reorganise will remove fragmentation. If you are worried, run it nightly (if you have the window) - there's no rule that says you can only reindex on a weekly basis.
    Thanks for that info - so any damage to seek performance gets cleaned up by Maintenance, therefore no big deal. My weekly should be fine - I'll run the delete process once a week and delete maybe 10%, and it'll run just before weekly maintenance (that currently only takes 10 minutes for the entire database).

    Quote Originally Posted by pootle flump
    Also, deletions will not cause fragmentation but they will reduce the amount of data per page - depending on the question you ask this may be considered a good or bad thing.
    HTH
    I'm afraid I don't understand this. This reveals that I lack a fundamental understanding of how Clustered Tables are actually implemented. I thought they were just sequential by the key value and therefore a delete would cause a fragment and of course any block read may contain logically deleted records. A reorganize would mean the segments are rewritten to remove the fragments. Writing a row that doesn't fit neatly at the top of an existing segment means it has to be inserted by rewriting everything on top of the insert position to create a space. Is my "understanding" wrong or too oversimplified to be useful?


    Quote Originally Posted by roac
    Vich: Clustered indexes on random values are really a bad idea, the indexes got to be fragmented. The only really good candidates to clustered indexes are (semi-)sequential data. Large amount of data with relatively few updates in combination with a suitable fillfactor may also do, but not random data.
    Why "relatively few updates"? Does this have to do with how later updating of null columns are implemented in clustered tables. That's a subject that's always been an entire mystery to me (even for non-clustered tables).

    I try to think about what happens internally. I insert a row but leave half the columns as "NULL". Isn't the rule in Relational Databases that null columns don't take space?

    Some time later, I come along and create values for some NULL columns. Now; for a clustered table, where does it put the values?

    Even a clustered row can't be fully pre-extended (given how big a nvarchar(8000) could be). Does such an update later cause reorganization?

    Quote Originally Posted by roac
    With the amount of 4000 inserts a day and a limited lifetime of the data I doubt that you actually need a clustered index, so my advice would be to delete the clustered index and create a nonclustered instead.
    Thanks for that brilliant thought. I just realized that I will NEVER use the unique primary key in this transaction file. Most often, I'll be using it to inquire on all transactions for a particular SKU in reverse chronological sequence back to a particular date.

    I'd still like to understand more about Clustered Indexes, but clearly I shouldn't make this index Clustered. Perhaps I'll benifit by making the "Transaction Date" index clustered since I'll often want to do a reverse-sequential-read by Transaction Date and they're always inserted in sequence - also; once created they are never changed.
    Last edited by vich; 05-18-07 at 21:42.

  13. #13
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by vich
    I would think that clustered indexes:

    1. Saves on index maintenance time/space.
    2. Decreases seek time. Last week I changed an index to Clustered and an associated update query I was trying to tune up changed from 100 seconds down to 20 seconds (to process about 15,000 rows). The table in question does not get a lot of inserts or deletes.
    I can see how it would improve transactional table inserts for reason #1, but I would think that it would potentially vastly improve rarely manipulated but often linked to master tables (like a SKU list) for reason #2.
    My point was about monotonically increasing clustered indexes, not clustered indexes in general.

    Quote Originally Posted by vich
    I'm afraid I don't understand this. This reveals that I lack a fundamental understanding of how Clustered Tables are actually implemented. I thought they were just sequential by the key value and therefore a delete would cause a fragment and of course any block read may contain logically deleted records. A reorganize would mean the segments are rewritten to remove the fragments. Writing a row that doesn't fit neatly at the top of an existing segment means it has to be inserted by rewriting everything on top of the insert position to create a space. Is my "understanding" wrong or too oversimplified to be useful?
    Imagine ten rows with one integer column with values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Clustered index on the integer column.
    Rows 1, 2, 3 are in page one. Rows 4, 5, 6, 7 are in page two. Rows 8, 9, 10 are in page three. I delete rows 4, 5, 9. So now:
    Page one: 1, 2, 3
    Page two: 6, 7
    Page three: 8, 10

    No fragmentation - the data is still in consecutive pages, order by the values in the key. The pages are just a little less full, which is not the same thing. Performance is still affected but this is not fragmentation. If they were full to begin with and I inserted 5.5 (ok, ok - they were decimals ) then we will introduce fragmentation because half the rows in page two would have to be moved to page four. Now the page order would be one, two, four, three.

    Quickly on some of your other points not addressed at me:
    I think roac meant few updates of the clustered index keys.
    A nullable column has an extra bit "flag" to indicate when it is null. So no - nulls take up no extra space but nullable columns do take up more space to accommodate the flag.

    Have you seen this?
    http://www.microsoft.com/technet/pro.../ss2kidbp.mspx
    Testimonial:
    pootle flump
    ur codings are working excelent.

  14. #14
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by vich
    I would think that clustered indexes......
    ...........
    Actually what were addressing there lol? Just realised there was a bit in the quote about fillfactors and shrinks too.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  15. #15
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by jezemine
    EDIT: here's some justification for why you shouldn't use that "remove unused space" task, which is essentially the same as having autoshrink on, from someone who actually knows what they are talking about - unlike me!

    http://blogs.msdn.com/sqlserverstora...hrink-off.aspx
    In case you are interested vich Paul Randall is a bit of a guru's guru - he wrote the dbcc dbreindex and a few other console commands and assisted with the paper on fragmentation I linked (though he isn't credited lol). If I see anything by that guy I make sure I read it twice
    Testimonial:
    pootle flump
    ur codings are working excelent.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •