Results 1 to 15 of 15
  1. #1
    Join Date
    Aug 2004
    Posts
    51

    Unanswered: Replace UniqueIdentifiers with Ints in existing DB

    Hello,

    So I finally gave in to all the good advice about uniqueidentifiers and need to reorganize my database using ints in all the primary and foreign keys.

    Anyone have any good advice about how to go about such a refactoring?

    Right now I'm at a good point: not yet in production, only a few tables with data, none more than 100K rows.

    I'm hoping for suggestions that include links to refactoring scripts.

    I also am using Visual Studio 2008 Database version but I'm concerned it may be a bit clumsy for major refactoring in this (early) SP1 incarnation.

    Thanks,

    Kimball

  2. #2
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Identify your uniqueidentifiers:
    Code:
    SELECT s.name As schema_name
         , o.name As table_name
         , c.name As column_name
    FROM   sys.objects As o
     INNER
      JOIN sys.columns As c
        ON c.object_id = o.object_id
     INNER
      JOIN sys.types As t
        ON t.system_type_id = c.system_type_id
     INNER
      JOIN sys.schemas As s
        ON s.schema_id = o.schema_id
    WHERE  o.is_ms_shipped = 0
    AND    t.name = 'uniqueidentifier'
    I am guessing your uniqueidentifiers are key fields in your db?
    George
    Home | Blog

  3. #3
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    If your application depends on GUIDs, you could get a performance boost by keeping the GUIDs as unique keys, but adding an identity as the primary key.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I'm just curious, but I use GUIDs every chance I get. GUIDs are far better than INTs in every way that I care. Is there some way that INTs are better than GUIDs?

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Pat,
    I take it you use sequential GUIDS then?

    Due to their random nature GUIDs are considered poor candidates for indexes as they cause frequent page splits.
    George
    Home | Blog

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    To be precise, they are poor candidates for clustered indexes (on large tables).
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quite right, once again I fall foul of not including enough detail. I shouldn't post when I'm tired
    George
    Home | Blog

  8. #8
    Join Date
    Aug 2004
    Posts
    51
    Like PatP, I have also favored GUIDs for all their unique capabilities.

    However I am facing a GIS/List Management project that is multi-year and multi-nation and includes hierarchical organization ownership and overlapping territory boundaries.

    I was impressed by the disk impact calculations in this article: that was referenced in this forum on another thread.

    So my current strategy is to move to INTs for fundamental data relationships within the transactional records but to maintain a separate hierarchy based on GUIDS for a separate hierarchy to track ownership. Or maybe not even then.

    In any case, I anticipate a period of uncertainty where I may have to switch various areas of the overall schema between ints and GUIDs until i get the right balance.

    So what I think I'm going to need is a group of scripts based on the example provided by GVee that are tailored with the standard column names I'm using throughout the design.

    Thanks,

    Kimball

  9. #9
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Yes, but if this is a multi-national enterprise application, then there is a good chance the data will be split across multiple sites, in which case you'll have a need for GUIDs.
    GUIDs vs. Identity is not a simple choice. There are trade-offs. You could easily find yourself (or your developers) having to create a lot of client side code and network traffic to handle identities, that would essentially negate any performance benefits on the database side.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  10. #10
    Join Date
    Aug 2004
    Posts
    51
    Hmmm... That brings me back again to those nasty advantages.

    There are a number of business goals that lead directly to GUIDs.

    As I said, in the past I simply gave in to the appeal.

    Since the design of such a system will never be complete and I anticipate refactoring to be an ongoing process, I believe the best step for me at this point is to formulate some rough operational rules and see how far they take me.

    So what do you think about this?

    For example:

    1. GUID columns should exist in all tables which represent client data, but not system data such as lookups or static data such as a GIS table of physical, geographic features.
    2. GUID columns would represent ownership of data but would operate as scopes for rows with INT primary keys. They would represent a logical ownership or scope meta data structure across the entire database.
    3. GUID columns should be sequentially created on the server side but should be replicated into rows created on the client side to maintain ownershp.

    The idea of 'imposing a logical structure' on the database using GUIDs may not be what you are indicating by suggesting GUIDs to represent sites or national partitions?

    When I start thinking in this direction, I wonder why not use the PostgreSql system of using scoped multi-column INT keys representing row id, host id, site id.

    However then I recall that clients generating new rows would like to submit records with GUID keys so that their key does not get hosed at the server during synchronization. (the client is 'occasionally connected')

    That sort of says that Rows exchanged with clients might have to be stored some sort of hybrid 'journal' table that uses GUID keys but those are parsed into normalized tables with INT keys and GUID ownership identifiers.

    ?

    Thanks,

    Kimball

  11. #11
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I still don't buy the "page split" argument. It is purely physical, and actually a "red herring" at that because if you design the schema correctly so that you use the randomness of the GUIDs to minimalize the problem with insert hot spots which are implied by sequential ID values then using GUIDs results in simpler maintenance and a net INCREASE in performance.

    Concventional wisdom is a wonderful thing as long as it is correct. In this case the conventional wisdom applies to poorly designed systems (ones that are predicated on sequential inserts), but falls apart for systems designed to work with truly large amounts of incoming data. If you are willing to trade some disk for a thousand fold increase in INSERT performance, the pseudo-randomness of GUIDs are a blessing.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  12. #12
    Join Date
    Aug 2004
    Posts
    51
    Yikes Pat!

    Are you suggesting that INT keys require a cascading renumbering of primary/foreign key relationships for every insert?

    Or are you just talking about scheduled re-indexing?

  13. #13
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Quote Originally Posted by Pat Phelan
    I still don't buy the "page split" argument. It is purely physical, and actually a "red herring" at that because if you design the schema correctly so that you use the randomness of the GUIDs to minimalize the problem with insert hot spots which are implied by sequential ID values then using GUIDs results in simpler maintenance and a net INCREASE in performance.

    Concventional wisdom is a wonderful thing as long as it is correct. In this case the conventional wisdom applies to poorly designed systems (ones that are predicated on sequential inserts), but falls apart for systems designed to work with truly large amounts of incoming data. If you are willing to trade some disk for a thousand fold increase in INSERT performance, the pseudo-randomness of GUIDs are a blessing.

    -PatP
    This is begging for some demo code.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  14. #14
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Take any handy terrabyte size database, preferably with one table that comprises a significant part of that terrabyte. Telco data or extremely high volume transactional data are the only real good candidates I know of for this kind of demonstration.

    Create a prep table of a few million rows with ordinary (non-sequential) GUIDs.

    Start the timer

    Append the prep table to the main table.

    Stop the timer

    Repeat the process with sequential GUID values in both the main and the prep tables.

    I don't have spare hardware to create and test a demo script for you. Please let me know how the sequential GUIDs perform versus the non-sequential GUIDs in your test. I know of three companies (one was Microsoft) that have run this test, and all three of them found the non-sequential GUID test to run significantly faster for all cases except the initial table load (the first 100 Gb of data).

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  15. #15
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by KimballJohnson
    Are you suggesting that INT keys require a cascading renumbering of primary/foreign key relationships for every insert?
    No, nothing like that.

    If youi consider the problems that occur with sequential INSERT operations into a table, the answer seems obvious... When you do sequential inserts, all insert activity occurs at the end of the table which forces the last active page to become a "hot spot" of contention. This means that if you have 20 processes (spids) trying to do insert operations, all of them have to fight for access to the last page in the table, and all of them have to handle the bookkeeping associated with page rollover when the final page becomes full and a new page becomes active at the end.

    This was a common problem for Sybase and Microsoft in the late 1990s because DB2 and Oracle had already cracked the nut on this problem. Before Micorosft SQL 6.5 and Sybase 10.0, there was no viable solution for sequential insert operations. The Data Modeler or the DBA simply had to find a way around the problem.

    GUIDs made the problem a non-issue, since their random nature ensured that even with 50 or 100 active processes (spids), the "hot spot" was negligable if it occurred at all.

    For at least ninty-nine percent of the systems in place today, hot spots are a thing of the past and about as relevant as a spoke shave or a buggy whip. The problem is that for very large systems with really high insert volumes (even if those are just spurts in the stream of activity), hot spots can still be a point of contention.

    Microsoft did a demo of this problem and a work-around for it at one of their regional "think tank" expos a couple of years ago. I might see if I can convince someone with a real world app to show a technology demo at PASS this year... Very few people hit this problem any more, but there are enough that do to make the demo worthwhile, and it would also help to dampen the myth that GUIDs are bad for performance when they can actually improve performance for large cases.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •