Results 1 to 12 of 12
  1. #1
    Join Date
    Feb 2010
    Posts
    75

    Unanswered: int vs varchar columns - performance

    We have a very big table (50 million rows) that is becoming slower and slower for select queries. We have done lots of performance analysis and have all of the recommended indixes in the right places (we think).

    But theres something bugging me about the table and I dont know if I have a case or not. The guy who designed the table likes to use char and varchar datatypes where I think int would make much more sense.

    The offending columns (in my opinion) are mostly status flags that can only have a few different values.

    An example would be a column for a person's sex. He would design it like this:

    Column Name: SEX
    Data Type: nvarchar(6)
    Possible values: 'male' , 'female'

    Where I would have preferred this design:

    Column Name: IS_MALE
    Data Type: bit (or maybe int)
    Possible values: 1, 0



    Another example

    He would design:

    Column Name: DIRECTION
    Data Type: nvarchar(5)
    Possible values: 'north', 'south', 'east', 'west'

    Where I would have preferred this design:

    Column Name: DIRECTION_ID
    Data Type: int
    Possible values: 0, 1, 2, 3 (with a lookup table linking 0 to North, 1 to South... etc if necessary)


    There are probably 5 to 10 columns like this in this particular table. His reason for using varchars is that its easier to read, and see what data is in the table without having to join to a bunch of lookup tables.

    My question is, would my design (using ints where possible) be better? And If so, how much better? I know my way would use slightly less disk space for the table, which is good, but mostly I'm interested in actual performance. Would using my way speed up SELECTs in any significant way? Keep in mind some of his nvarchar columns have indexes on them, so I would put the same index on my int column if I were to re-design the table. Are int columns inherently better/faster for indexes and lookups? I would think so but its just a hunch at this point.
    Also, these columns appear in WHERE clauses a lot. Would there be any significant speed up by saying WHERE DIRECTION = 1 instead of WHERE DIRECTION = 'north' ?

    Basically I feel that changing these columns to ints would help our performance issues, but I cant really be sure. I dont want to suggest that his design was bad without a little more confidence that I'm right.

    I will have to do some performance comparisons, but before I do that I wanted to see if anyone here sees anything wrong with my reasoning.

    Thanks a lot!

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by BeerOclock View Post
    The offending columns (in my opinion) are mostly status flags that can only have a few different values.
    so searching on them is not likely to use an index regardless of whether it's a string or integer column

    Quote Originally Posted by BeerOclock View Post
    There are probably 5 to 10 columns like this in this particular table. His reason for using varchars is that its easier to read, and see what data is in the table without having to join to a bunch of lookup tables.
    i agree with him completely

    you aren't going to gain much (if anything) of an improvement in performance, and in fact you will likely make performance worse, since in many existing queries now you would have to join to these extra tables you're thinking about, just to "translate" the integers back into human-recognizable strings
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    As long as the indexes exist, I'm not sure how much speed improvement you'll see. There should definitely be some improvement (a few percentage points) especially if you use the CHAR(1) or the TINYINT type. From the database perspective the two types offer equivalent performance. and the person that designed your schema might find character columns more palatable.

    I would VERY STRONGLY recommend that you use lookup tables and foreign keys instead of constraints. A constraint is part of the schema just like a foreign key, but you can update a lookup table with a simple SQL statement and changing the schema definition can require database downtime. The use of lookup tables also makes it possible to add additional language support to your application if that becomes necessary later.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    pat, your advice to use lookup tables seems to be based only on ensuring data integrity and making it easy to effect maintenance changes

    there is NO GOOD REASONfor using a surrogate key here, when the human-readable values (e.g. 'north', 'south', 'east', 'west') will function as foreign keys to the lookup table for data integrity purposes just as surely, and they will avoid the unnecessary join to the lookup table just for the translation

    in fact, this shouldn't be called a "lookup" table, but rather an "integrity" table
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    May 2011
    Posts
    11
    I belive both of the above guys are correct, i agree with pat on lookup tables for data integrity since its a big deal for a small performance loss, insignificant really.

    now i also agree with r937 on that fields with flag values Yes/No Male/Female
    can stay right there, since its not likely you`re going to have to register any other sex.

    if i were on your shoes i would turn these flag column into char(1) or int(1) and turn the other column in lookup tables as pat mentioned for several reasons

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Quote Originally Posted by BeerOclock View Post
    Column Name: DIRECTION
    Data Type: nvarchar(5)
    Possible values: 'north', 'south', 'east', 'west'
    You are about to be eaten by a gru.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by blindman View Post
    You are about to be eaten by a gru.
    xyzzy.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by ArielNessi View Post
    now i also agree with r937 on that fields with flag values Yes/No Male/Female can stay right there, since its not likely you`re going to have to register any other sex.
    You don't do much work in California, do you???

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  9. #9
    Join Date
    May 2011
    Posts
    11
    Quote Originally Posted by Pat Phelan View Post
    You don't do much work in California, do you???

    -PatP
    Well as for a Char(1) field i've got 27 letters in the keyboard, and there are still special characters and numbers, actually i live in brazil and around here about 4 typos should do the work

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by ArielNessi View Post
    Well as for a Char(1) field i've got 27 letters in the keyboard, and there are still special characters and numbers, actually i live in brazil and around here about 4 typos should do the work
    sheesh

    if you're going to ~encode~ whatever this data item is, you might as well use a TINYINT then, you'll have 127 (or 255) values to choose from

    my idea was not to encode it at all, but use the actual name as the key

    i will never understand why there is such a reluctance in the programming community to refer to a thing by the name of the thing, and why they feel it necessary or even advisable to have some kind of additional code for the thing that needs to be translated into the name of the thing

    when practical, use the actual name of the thing or a reasonable code

    take direction, as in the original post in this thread

    you could store names 'north', 'south', 'east', 'west', but you'd soon realize that you'd need the column to accept names like 'east northeast' and various other values

    it might then be smarter to use an additional code along with (or instead of) the name, so you'd store 'N', 'S', and so on, and even 'ENE' when necessary -- in fact, if you did this, i bet you probably wouldn't want to store the complete names anywhere

    but for pity's sake, don't use an integer here!! storing 17 as a direction ~requires~ a join just to figure out what the direction actually is, whereas with 'ENE' you could, if necessary, translate it into its proper name, but you wouldn't ~necessarily~ have to, if all you wanted to do was understand it

    that's the thing with stupid keys like IDENTITY keys, they have no meaning whatsoever

    that's sometimes a good thing, but not, in my opinion, for codes like these
    Last edited by r937; 05-25-11 at 19:43.
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Beeroclock, are you performing regular reindexing? Updating statistics?
    Have you considered using indexed views for common queries?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  12. #12
    Join Date
    May 2011
    Posts
    11
    Quote Originally Posted by r937 View Post
    sheesh

    if you're going to ~encode~ whatever this data item is, you might as well use a TINYINT then, you'll have 127 (or 255) values to choose from

    my idea was not to encode it at all, but use the actual name as the key

    i will never understand why there is such a reluctance in the programming community to refer to a thing by the name of the thing, and why they feel it necessary or even advisable to have some kind of additional code for the thing that needs to be translated into the name of the thing

    when practical, use the actual name of the thing or a reasonable code

    take direction, as in the original post in this thread

    you could store names 'north', 'south', 'east', 'west', but you'd soon realize that you'd need the column to accept names like 'east northeast' and various other values

    it might then be smarter to use an additional code along with (or instead of) the name, so you'd store 'N', 'S', and so on, and even 'ENE' when necessary -- in fact, if you did this, i bet you probably wouldn't want to store the complete names anywhere

    but for pity's sake, don't use an integer here!! storing 17 as a direction ~requires~ a join just to figure out what the direction actually is, whereas with 'ENE' you could, if necessary, translate it into its proper name, but you wouldn't ~necessarily~ have to, if all you wanted to do was understand it

    that's the thing with stupid keys like IDENTITY keys, they have no meaning whatsoever

    that's sometimes a good thing, but not, in my opinion, for codes like these
    It was a joke '-' 27 letters for different sex's. no need to go macarena
    Quote:
    Originally Posted by ArielNessi
    now i also agree with r937 on that fields with flag values Yes/No Male/Female can stay right there, since its not likely you`re going to have to register any other sex.

    You don't do much work in California, do you???

    -PatP
    It was a answer to this

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •