Results 1 to 12 of 12
  1. #1
    Join Date
    Jun 2010
    Posts
    9

    Guids as primary keys

    Setting up a replicated database system where multiple servers write data to certain tables and the changes are replicated to all the other servers in a multiple-master approach.
    The primary keys for most of my tables are already multi part keys along the lines of "Id", "FromDate", "ToDate", "LanguageCode"
    so that every record can be stored multilingually and auditing is kept in the same tables. The Ids are integers but not identity fields.

    The question I have is which of the following suggestions I've found online is the best approach for avoiding clashes between newly generated ids;
    1. Switch the Id to a GUID type
    2. Add a character prefix to the Id field that indicates the originating server. e.g. "A123", "B123", "A451"
    3. Add a numeric suffix to the integer id of equal length for each server, e.g. "1201" is record 12 generated on server 01
    4. Add a second integer id field for the server

    Option 1 is the current favourite but I don't like the potential for wasting space with such a long field and I'm not sure what the effect of using non sequential fields would be on the indexing.
    Option 2 is neater and smaller field-wise but I dont know what the effect would be on the indexes when inserting.
    Option 3 just looks instinctively wrong, even if I know there will never be more than 100 generating servers.
    Option 4 would be a pain to recode for and the primary keys are long enough already.

    Any opinions?

    Anyone know which would give quicker joins, a guid field or a varchar?

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    1) the impact is fragmentation, however if this is SQL Server you can have sequential GUIDS
    2) & 3) Variations on the same theme. Personally, I like neither.
    4) I prefer this. Make it the last column in the PK and it may only make a difference to the subscribing database and not the publishers (other than making the key larger). However, assuming there are less than 256 servers then you can use a 1 byte integer which would be the smallest use of space of all four options. Depends also if there are foreign keys referencing this though.

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    BTW - if this is SQL Server, you know a GUID is added to the row anyway if there is not one as a PK?

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    My preference is to make the PK column a GUID. This means that any machine in a given domain (including App Servers) can create the PK without practical conflicts.

    I object to "smart keys" of any kind, where the meaing of a single attribute (field) has to be determined by the application. Users are chaos incarnate, anything they can do they will do. Applications are wild west and will by-and-large behave but they can and do run amok, but if only the app can determine if it is behaving then I don't don't want to be the sherrif in that particular Dodge City! This pretty much covers options 2 and 3.

    Adding another column to designate "server of origin" will mitigate the problem, but it introduces problems of its own that start with recoding and continue into infinity.

    While Poots has a point that guids do cause fragmentation, I don't see that as a problem. Circa SQL 6.5, that fragmentation was the preferred method for dealing with a problem known as "insert hot spot" that newer developers have never encountered but those of us that survived it remember in great detail. The net effect of fragmentation is trivial in today's databases, and a simple maintenance plan will eradicate even the trivial part of the problem.

    Go for the GUID, it is by far your best bet!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Pat and Blindman are big fans of GUIDs as PKs (and, I assume, therefore clustered indexes) in SQL Server.

    Here's some alternative opinions:
    http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx
    In Recovery... | What 5 things should SQL Server get rid of?
    I confess I am surprised Paul Randal has clustered GUIDs as one of his top 5 worst things in SQL Server, but it is a fact that he does.

    Pat - in SQL 2k+ insert hot spots are no longer an issue and monotonically increasing clustered indexes (such as identities and sequential GUIDs) are the optimal physical design for high speed insert performance. I know you keep referring to the 6.5 days but (with the greatest of respect) those are not really relevant now for this particular issue.

  6. #6
    Join Date
    Jun 2010
    Posts
    9
    Hmmm...
    Thanks for the thoughts guys, it's given me something to chew on. I reckon I'll have to do some research into fragmentation before I decide which is best for my situation.

  7. #7
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Check out Paul's blog - he writes about it a lot.

    Some things to consider:
    The process of fragmentation (page splits) is expensive.
    Leaf level fragmentation has little affect on singleton look ups however it will increase IO for range scans.
    B-Tree level fragmentation will increase IO for singleton looks ups, and also a relatively teeny bit for range scans, and not at all for clustered index scans.
    Fragmentation will also result in your data requiring more disk and a table will take up more RAM when in memory.

    Once fragmented, a table will be able to better handle random inserts than a non-fragmented table whose pages are all full. It is for this reason that you can deliberately set your pages to not be completely full when you remove fragmentation (by specifying FILL FACTOR).

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Also, I forgot to highlight because I think it has been missed - GUIDs only cause fragmentation on insert if you use truly random GUIDs. SQL Server allows you to create sequential random GUIDs. Each GUID is greater than the last so you don't get fragmentation on insert.

    As such, I would always use the sequential GUID.

  9. #9
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by PaulMolloy View Post
    Setting up a replicated database system
    If you are setting up SQL Server Replication, it will add GUIDs to each table on its own, if they do not already exist.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by pootle flump View Post
    BTW - if this is SQL Server, you know a GUID is added to the row anyway if there is not one as a PK?
    Quote Originally Posted by blindman View Post
    If you are setting up SQL Server Replication, it will add GUIDs to each table on its own, if they do not already exist.
    Echoey in here ain't it?

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    I was merely replicating your statement.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    GUIDS are great for PK values, they're distinctly sub-optimal for clustered indexes.

    The PK needs to be unique, and it ought to facilitate both scale up and scale out as your database/application grows. GUID values don't repeat within an Active Directory domain, so unique is practically guaranteed. Any machine in the domain can generate a GUID, so both scale up and scale out work nicely.

    Clustered indexes need to optimize storage and/or retrieval. This is a completely different goal/purpose than the PK. Whatever column(s) define the most common "data trawl" for your database probably are the best choice for the clustered index. If it is possible to make the clustered index sequential (such as putting an increasing datetime as the first column), so much the better.

    Good database maintenance plans can ensure that fragmentation isn't a major problem. Designing your application/database to avoid fragmentation is a great deal like designing your vehicle to minimize the need for oil changes... You can certainly do it, but you'll probably pay a significant penalty for that choice every day.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •