If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Guids as primary keys

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 07-05-10, 08:23
PaulMolloy PaulMolloy is offline
Registered User
 
Join Date: Jun 2010
Posts: 9
Guids as primary keys

Setting up a replicated database system where multiple servers write data to certain tables and the changes are replicated to all the other servers in a multiple-master approach.
The primary keys for most of my tables are already multi part keys along the lines of "Id", "FromDate", "ToDate", "LanguageCode"
so that every record can be stored multilingually and auditing is kept in the same tables. The Ids are integers but not identity fields.

The question I have is which of the following suggestions I've found online is the best approach for avoiding clashes between newly generated ids;
1. Switch the Id to a GUID type
2. Add a character prefix to the Id field that indicates the originating server. e.g. "A123", "B123", "A451"
3. Add a numeric suffix to the integer id of equal length for each server, e.g. "1201" is record 12 generated on server 01
4. Add a second integer id field for the server

Option 1 is the current favourite but I don't like the potential for wasting space with such a long field and I'm not sure what the effect of using non sequential fields would be on the indexing.
Option 2 is neater and smaller field-wise but I dont know what the effect would be on the indexes when inserting.
Option 3 just looks instinctively wrong, even if I know there will never be more than 100 generating servers.
Option 4 would be a pain to recode for and the primary keys are long enough already.

Any opinions?

Anyone know which would give quicker joins, a guid field or a varchar?
Reply With Quote
  #2 (permalink)  
Old 07-05-10, 08:42
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
1) the impact is fragmentation, however if this is SQL Server you can have sequential GUIDS
2) & 3) Variations on the same theme. Personally, I like neither.
4) I prefer this. Make it the last column in the PK and it may only make a difference to the subscribing database and not the publishers (other than making the key larger). However, assuming there are less than 256 servers then you can use a 1 byte integer which would be the smallest use of space of all four options. Depends also if there are foreign keys referencing this though.
Reply With Quote
  #3 (permalink)  
Old 07-05-10, 08:44
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
BTW - if this is SQL Server, you know a GUID is added to the row anyway if there is not one as a PK?
Reply With Quote
  #4 (permalink)  
Old 07-05-10, 22:50
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
My preference is to make the PK column a GUID. This means that any machine in a given domain (including App Servers) can create the PK without practical conflicts.

I object to "smart keys" of any kind, where the meaing of a single attribute (field) has to be determined by the application. Users are chaos incarnate, anything they can do they will do. Applications are wild west and will by-and-large behave but they can and do run amok, but if only the app can determine if it is behaving then I don't don't want to be the sherrif in that particular Dodge City! This pretty much covers options 2 and 3.

Adding another column to designate "server of origin" will mitigate the problem, but it introduces problems of its own that start with recoding and continue into infinity.

While Poots has a point that guids do cause fragmentation, I don't see that as a problem. Circa SQL 6.5, that fragmentation was the preferred method for dealing with a problem known as "insert hot spot" that newer developers have never encountered but those of us that survived it remember in great detail. The net effect of fragmentation is trivial in today's databases, and a simple maintenance plan will eradicate even the trivial part of the problem.

Go for the GUID, it is by far your best bet!

-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
Reply With Quote
  #5 (permalink)  
Old 07-06-10, 03:15
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Pat and Blindman are big fans of GUIDs as PKs (and, I assume, therefore clustered indexes) in SQL Server.

Here's some alternative opinions:
http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx
In Recovery... | What 5 things should SQL Server get rid of?
I confess I am surprised Paul Randal has clustered GUIDs as one of his top 5 worst things in SQL Server, but it is a fact that he does.

Pat - in SQL 2k+ insert hot spots are no longer an issue and monotonically increasing clustered indexes (such as identities and sequential GUIDs) are the optimal physical design for high speed insert performance. I know you keep referring to the 6.5 days but (with the greatest of respect) those are not really relevant now for this particular issue.
Reply With Quote
  #6 (permalink)  
Old 07-06-10, 04:24
PaulMolloy PaulMolloy is offline
Registered User
 
Join Date: Jun 2010
Posts: 9
Hmmm...
Thanks for the thoughts guys, it's given me something to chew on. I reckon I'll have to do some research into fragmentation before I decide which is best for my situation.
Reply With Quote
  #7 (permalink)  
Old 07-06-10, 04:31
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Check out Paul's blog - he writes about it a lot.

Some things to consider:
The process of fragmentation (page splits) is expensive.
Leaf level fragmentation has little affect on singleton look ups however it will increase IO for range scans.
B-Tree level fragmentation will increase IO for singleton looks ups, and also a relatively teeny bit for range scans, and not at all for clustered index scans.
Fragmentation will also result in your data requiring more disk and a table will take up more RAM when in memory.

Once fragmented, a table will be able to better handle random inserts than a non-fragmented table whose pages are all full. It is for this reason that you can deliberately set your pages to not be completely full when you remove fragmentation (by specifying FILL FACTOR).
Reply With Quote
  #8 (permalink)  
Old 07-06-10, 04:42
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Also, I forgot to highlight because I think it has been missed - GUIDs only cause fragmentation on insert if you use truly random GUIDs. SQL Server allows you to create sequential random GUIDs. Each GUID is greater than the last so you don't get fragmentation on insert.

As such, I would always use the sequential GUID.
Reply With Quote
  #9 (permalink)  
Old 07-06-10, 10:40
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Quote:
Originally Posted by PaulMolloy View Post
Setting up a replicated database system
If you are setting up SQL Server Replication, it will add GUIDs to each table on its own, if they do not already exist.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #10 (permalink)  
Old 07-06-10, 10:55
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Quote:
Originally Posted by pootle flump View Post
BTW - if this is SQL Server, you know a GUID is added to the row anyway if there is not one as a PK?
Quote:
Originally Posted by blindman View Post
If you are setting up SQL Server Replication, it will add GUIDs to each table on its own, if they do not already exist.
Echoey in here ain't it?
Reply With Quote
  #11 (permalink)  
Old 07-06-10, 11:55
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
I was merely replicating your statement.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #12 (permalink)  
Old 07-06-10, 13:21
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
GUIDS are great for PK values, they're distinctly sub-optimal for clustered indexes.

The PK needs to be unique, and it ought to facilitate both scale up and scale out as your database/application grows. GUID values don't repeat within an Active Directory domain, so unique is practically guaranteed. Any machine in the domain can generate a GUID, so both scale up and scale out work nicely.

Clustered indexes need to optimize storage and/or retrieval. This is a completely different goal/purpose than the PK. Whatever column(s) define the most common "data trawl" for your database probably are the best choice for the clustered index. If it is possible to make the clustered index sequential (such as putting an increasing datetime as the first column), so much the better.

Good database maintenance plans can ensure that fragmentation isn't a major problem. Designing your application/database to avoid fragmentation is a great deal like designing your vehicle to minimize the need for oil changes... You can certainly do it, but you'll probably pay a significant penalty for that choice every day.

-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On