Results 1 to 5 of 5
  1. #1
    Join Date
    Sep 2008
    Posts
    4

    Smile Unanswered: Question about Identity Insert

    Hi all,

    I have the following two tables:

    create table person
    (person_id int primary key identity(1000,2),
    person_last_name varchar(50),
    person_middle_name varchar (1),
    person_first_name varchar(50),
    person_addrline_1 varchar(100),
    person_addrline_2 varchar(100),
    person_addrline_3 varchar(100),
    person_city varchar(40),
    person_state varchar(14),
    person_zip varchar(10),
    person_birthdate date,
    person_gender int references gender(gender_id),
    person_ethnicity int references ethnicity(ethnicity_id),
    person_phone varchar(12)
    );


    create table new_persons
    (person_id int primary key identity(2000,2),
    person_last_name varchar(50),
    person_middle_name varchar (1),
    person_first_name varchar(50),
    person_addrline_1 varchar(100),
    person_addrline_2 varchar(100),
    person_addrline_3 varchar(100),
    person_city varchar(40),
    person_state varchar(14),
    person_zip varchar(10),
    person_birthdate date,
    person_gender int references gender(gender_id),
    person_ethnicity int references ethnicity(ethnicity_id),
    person_phone varchar(12)
    );

    And the following two statements:

    --Statement 1:
    SET IDENTITY_INSERT person ON
    insert into person (person_id, person_last_name, person_middle_name, person_first_name, person_addrline_1,
    person_addrline_2, person_city, person_state, person_zip, person_birthdate, person_phone,
    person_gender, person_ethnicity)
    select person_id, person_last_name, person_middle_name, person_first_name, person_addrline_1,
    person_addrline_2, person_city, person_state, person_zip, person_birthdate, person_phone,
    person_gender, person_ethnicity from new_persons
    where LEFT(person_last_name, 1) BETWEEN 'A' AND 'J';
    SET IDENTITY_INSERT person OFF

    --Statement 2: (same as 1 but omitting person_id)
    insert into person (person_last_name, person_middle_name, person_first_name, person_addrline_1,
    person_addrline_2, person_city, person_state, person_zip, person_birthdate, person_phone,
    person_gender, person_ethnicity)
    select person_last_name, person_middle_name, person_first_name, person_addrline_1,
    person_addrline_2, person_city, person_state, person_zip, person_birthdate, person_phone,
    person_gender, person_ethnicity from new_persons
    where LEFT(person_last_name, 1) BETWEEN 'A' AND 'J';

    When I run both of these two statements, the person_id's for the new rows being inserted into the person table start at 2000. That's what I expected when running statement 1, but I expected when I ran statement 2 that that the person ID's of the newly inserted rows would've been assigned as one up (well, 2 up) from the existing max person ID in the person table (which was 1178). Just curious if anyone has any thoughts on why it acts this way?

    Thanks!
    Tracy

  2. #2
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    I started this answer thinking it was going to be relatively easy. Then I did some experiments with identity columns.

    The rule is intuitive enough, but putting it into words is a little difficult, because of the variations. Think of it as SQL Server trying to avoid collisions with future inserts. Identity columns are usually used for primary key columns, so the need for this avoidance is pretty obvious.

    The next identity value is based on the max or min* of the last value inserted and the current identity value plus the step value.

    For an example
    Code:
    -- Create a table with an identity column
    create table #temp (col1 int identity (10, 3), col2 varchar(10))
    
     -- insert two rows to see it all works as expected
    insert into #temp (col2)
    values ('hello'), ('hi')
    
    select *
    From #temp
    
    -- Now let's do a manual insert with an identity value that is only 1 higher than the current max, instead of the usual step of 3
    set identity_insert #temp on
    insert into #temp (col1, col2)
    values (14, 'good day')
    set identity_insert #temp off
    
    -- And we go back to regular inserts
    insert into #temp (col2)
    values ('bye')
    
    -- What should the highest col1 value be?
    select *
    From #temp
    
    -- and clean up
    drop table #temp
    Try it out with a negative step value (-3 instead of 3), and see if anything changes (hint, it does).

    * Depending on whether the step is positive or negative

  3. #3
    Join Date
    Jan 2013
    Posts
    354
    Provided Answers: 1

    You are doing everything wrong

    I have the following two tables:
    This mess is a pile of fundamental errors. Let's look at this line by line:

    CREATE TABLE Person

    Since a table models a set, you have told us that you have one person as a set. The last man on Earth table. The name of a table is a plural or collective noun by the ISO-11179 rules or common sense. But more than that, a table is not generic; it is a set of one particular kind of entity. What specific role do these people play in your data model?

    (person_id INTEGER PRIMARY KEY IDENTITY(1000,2),

    IDENTITY is non-relational and proprietary. It is a table property (a count of the physical insertion attempts to one disk on one machine). So by definition, let me say that again, by definition, it cannot be a key. You just mimicked a magnetic tape file in SQL!

    person_last_name VARCHAR(50),*

    Do non-persons also have last names? If not, why the table name prefix? In a valid data model, the data element is name for what it is by its nature and never for where or how it is used in one part of the schema.

    Why is it NULL-able? Why is it FIFTY characters long? I hang out with Greeks, Poles and Indians who do not have anything the long! Your table is a garbage dump that will fill with bad data and too many NULLs.

    person_middle_name VARCHAR (1),

    Think about how silly “VARCHAR (1)” is and what it means. Then think about naming traditions that have full names like “Joseph Frank Celko” or “Don Diego de la Vega” (aka Zorro!). You went from too long to too short.

    person_first_name VARCHAR(50),*

    same problems again!

    person_addrline_1 VARCHAR(100),*
    person_addrline_2 VARCHAR(100),
    person_addrline_3 VARCHAR(100),

    The USPS uses VARCHAR(35) for street address lines and allows only 2 of them in a CASS compliant label. I will bet you have no idea what CASS is. Stop and learn it.

    person_city VARCHAR(40),

    Again, see the USPS standards. If you want to be an SQL programmer, learn to research.

    person_state VARCHAR(14),

    Wrong! This is a “state_code CHAR(2) NOT NULL CHECK (state_code IN (..))” column declaration. Why do you want to fill a table with garbage data? The CHECK() will assure only valid data for 54 to-letter codes used with ZIP.


    person_zip VARCHAR(10),

    the ZIP code is fixed length, not varying length. There is some debate about ZIP+4 as two columns versus one column. But you are storing the dash for no good reason. I prefer two columns because the route information (the +4) changes and is the most likely to be screwed up.

    person_birthdate DATE,

    Does this need a constraint? Can you have people born in the future? Over 130 years old?

    person_gender INT REFERENCES GENDER(gender_id),

    Again, you did no research. The correct term is “sex_code” and it is an ISO standard (0 = unknown, 1 = male, 2 = female, 9 = legal person). The idea of a “gender_id” is absurd. The sex code is a value on a categorical scale. What do you know about scales and measurements? An “_id” is an identifier which applies to one and only one entity; not to the value or unit on a scale. Do you also believe in “liter_id” for volume or “meter_id” for distance, etc.?

    Again this is doen in a CHECK() constraint in a proper data model.

    person_ethnicity int references ethnicity(ethnicity_id),

    Once more, no research! But this is worse. The ethnicity codes vary from country to country (the UK has a long list based on Celtic tribes, etc; the US uses mostly skin color and then voting blocks). What system are you using?

    Again this is a CHECK() constraint.

    person_phone VARCHAR(12)

    Varying length phone numbers?? And why did you fail to follow the international standards? Google it! Trust me, it is worth the few bytes for the country codes, etc.

    CREATE TABLE New_Person

    The prefix “new_” is a status or role played by a “person” determined by a rule and not a different kind of entity, as unlike as squids and automobiles?? NO! But if you were still using 1950's punch cards and mag tapes then you would have to physically move unit records from one tape (or deck of cards) to another!

    You have just been told by someone who helped with the ANSI/ISO Standards that you are doing everything wrong. If this was a woodcraft forum, you would be asking about which rocks are best for smashing screws into fine furniture.

  4. #4
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    ..there's lots of good pointers in the above post on standards and conventions...

    But don't get tied up with a US centric approach if the db may or does have to support non US data. If its US only then by all means use US standards. if there are global standards out there use 'em. if your application is never going to see the light of day outside your country, then you are free to make a design choice to use a national standard. But make a concious design choice to use a national over an international standard, but use a standard if one is available

    choosing the correct datatype and length (for alpha/alphanumeric fields required suitable knowledge of the possible extremes of data. Ive seen plenty of surnames (family / given name to use the PC weasel words) that are upto 25..30 characters long. when it comes to company names some organisations are inflicted with verbal diarrhea and can easily be 50+ characters long. if you are designing a table for live use consider another column in your persons table which identifies what the person prefers or is commonly known by (assuming there is such a requirement)

    Addresses. in a US application its fine to use the USPS guidelines, but if your application may have addresses that are not controlled under the USPS regime then trying to coerce those addresses to meet the USPS format is not clever. As a reason why not to... a UK address may have 5 or 6 lines, it may have as few as 2.... trying to coerce data into the USPS CASS system is daft at best and downright stupid.


    Its terminology but postal regimes outside the US case and do have their own coding system, ZIP is fine for the US but not outside. At least you chose [var]char(10), not a numeric

    Phone numbers are like addresses. if your application is going to store non US centric numbers then don't attempt to impose US style validation or formatting. it will fail.

    country codes are not valid as phone numbers, there are a more than just the USA the same "country code" as the US, feristance Canada.
    I'd rather be riding on the Tiger 800 or the Norton

  5. #5
    Join Date
    Nov 2014
    Posts
    5
    Quote Originally Posted by Celko View Post
    (person_id INTEGER PRIMARY KEY IDENTITY(1000,2),

    IDENTITY is non-relational and proprietary. It is a table property (a count of the physical insertion attempts to one disk on one machine). So by definition, let me say that again, by definition, it cannot be a key. You just mimicked a magnetic tape file in SQL!
    Magnetic tape file? This table can only be read sequentially? Why can't an IDENTITY column be used as a key? What do you recommend SQL Server users do, instead?

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •