Results 1 to 11 of 11
  1. #1
    Join Date
    Dec 2002
    Location
    Préverenges, Switzerland
    Posts
    3,740

    Unanswered: choice of clustered index

    i have tables that will be involved in replication - disconnected clients will create new records and these records need to get 'absorbed' by the publisher database.

    the table 'Address' is something like:


    name, nvarchar(200) -- certainly NOT unique
    line1, nvarchar(200) -- generic address stuff
    line2, nvarchar(200) -- more generic address stuff
    zip12, int -- the first two (definitely numeric) digits of the zip/postcode - they identify the geographic area
    zipRest, int -- the rest of the (definitely numeric) zip/postcode
    org, int (FK to table Orgs) -- the organisation that owns this address (and potentially many others, possibly in the same building)


    'Address' is a central entity in my database.
    many other tables are related.

    replication (offline record creation) suggests a GUID PK
    GUID PK seems a bad choice for clustering, so let's make GUID PK non-clustered
    ...and then what do i use for clustering.

    how about:
    zip12, zipRest, name (or name, org; or org, name ...possibly unique but **HUGE** and not guaranteed unique)
    or
    simply abandon hope and use GUID
    or ????????

    zip12 looks like a decent start to clustering: new 'Address' records will be INSERTed fairly evenly spread amongst the zip12 value.

    initially there will be 100,000 'Address' records, 100 unique zip12 values, 1 (!!!!) org value 'UNDEFINED' (and not NULL)
    in time there will be 1,000,000 'Address' records, 100 unique zip12 values, 10,000 org values (500,000 Addresses remaining with org 'UNDEFINED')

    thanks for any thoughts, izy
    currently using SS 2008R2

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Not sure I fully understand your issue, but...

    Replication (as I understand it) will add an invisible GUID column to your tables anyway, if one does not already exist. So adding it yourself should not affect performance.

    Your Primary Key does not need to be your clustered index, and your Clustered Index does not need to be a primary key. So you can use a GUID as a PK and cluster on something else.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Dec 2002
    Location
    Préverenges, Switzerland
    Posts
    3,740
    my issue is choosing the "something else" for my clustering key.

    most of my worlds use surrogate auto-incrementing INTs as both PK and cluster key (let's agree not to proselytize for the moment), but this new (for me) merge-replication world more-or-less imposes GUID PK

    GUID is not a great cluster key.
    for once in my life i don't have a surrogate auto-incrementing PK as a convenient clustering candidate.

    so what to choose as cluster-key?
    where is the balance-point between a long cluster-key and a potentially non-unique (subsequentially system-forced unique) cluster key?
    that cluster-key is going to appear in every non-cluster index so it's size has performance and storage implications.

    to restate my question:
    given a GUID PK
    given that the GUID is a poor cluster-key candidate
    what criteria apply to choosing a cluster key?

    izy
    currently using SS 2008R2

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    It doesn't seem to me that you have a good natural key candidate here, unless you include very field in the record set.

    So, as a clustering key, I would choose a column or columns that would frequently be scanned in search queries. If you expect most of your queries to be "where name = ....", then include name in your clustered index.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Do you have to have a clustered index on this table? Do you do A LOT of INSERTs?
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  6. #6
    Join Date
    Dec 2002
    Location
    Préverenges, Switzerland
    Posts
    3,740
    ...it depends on your definition of A LOT and how good my guessing is.

    after the initial data load with 100,000 'addresses', i expect INSERTs at not worse than 1000/week tailing off to very much lower numbers and eventually (by the time almost every 'address' of interest is in the db) to almost zero INSERTs.

    the million addresses in my original post is more 'i would like it to be possible' rather than 'its going to happen'.

    nothing obliges me to cluster.
    my obligation is a decent sync performance with the remote clients, and (less of an issue) a halfway-decent reporting capablity for non-remote folk.

    izy
    currently using SS 2008R2

  7. #7
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Well, good point. When I said "A LOT", I meant at least 1 record per second or higher rate. With your numbers I just don't see a point in having a clustered index. If one of your queries runs into a scan operation on this table, - you'll be better off with a heap scan, rather than a clustered index scan.
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  8. #8
    Join Date
    Dec 2002
    Location
    Préverenges, Switzerland
    Posts
    3,740
    *VERY* interesting comment!
    thank you.

    does it change much if my 1000 INSERTs/week do NOT translate to a linear 0.00165 INSERTs/second.

    i really don't have control over sync frequency for the 60...70 road-warriors generating the INSERTs. they are required to sync once a week by their management but the eventual real-life sync pattern is unknown. after discussing with a dozen of the road-warriors i anticipate each one will sync +/- twice a week and typically in the same two hour window (just after normal working hours). this concentrates things a touch (getting closer to 0.1 INSERTs/second during the probable INSERT window).

    stay with the heap ??????

    izy
    currently using SS 2008R2

  9. #9
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Fully based on your details, I don't see how clustered index is gonna help or contribute to INSERT performance operation. SELECT performance will have to be determined by more details if you care to provide them.
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  10. #10
    Join Date
    Jun 2005
    Posts
    319
    Have you considered NEWSEQUENTIALID() (Transact-SQL) ?

    GUID's are not always poor clustered key candidates, that thinking disappeared years ago when SQL 2005 came out.

    All Fired Up...: The NEWSEQUENTIALID Function

    Manually adding a uniqueidentifier w/ NEWSEQUENTIALID() default and indexing it will speed up the snapshotting process. When SQL adds the ROWGUID column automatically it is not indexed by default. [7.0, 2000, 2005]

  11. #11
    Join Date
    Dec 2002
    Location
    Préverenges, Switzerland
    Posts
    3,740
    first pass will be NEWSEQUENTIALID() as PK on a heap.
    thanks for the inputs - i'll see how it goes.
    izy
    currently using SS 2008R2

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •