Page 1 of 2 12 LastLast
Results 1 to 15 of 16
  1. #1
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322

    Unanswered: Case Sensitive vs Insensitive

    After all the pain I've been going through with code pages and collation, I was asked how, when sql server does it's joins and predicate searches, how does it actual (internals now) know the an "A" = "a" in an insensitive search?

    I didn't have the answer.

    Damn, Now I really have to pick up Kelans book.
    Last edited by Brett Kaiser; 07-02-04 at 11:48.
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  2. #2
    Join Date
    Feb 2004
    Posts
    492
    I've seen tricks suchs as uppercasing or lowercasing both prior the compare. I think formatting routines such as these can be very quick because they only have lookup's to do.

  3. #3
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Well, the questions stems from an insensetive server

    You don't need to use CASE functions in this case

    "A" = "a"
    "A" = "A"
    "a" = "A", and
    "a" = "a"

    Internally each has it's on ASCII represenation.

    How does an insesitive ("the big jerk") server resolve this?
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  4. #4
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    Perhaps it "ands" 32 to both characters if either are in the ASCII range 65-90? Probably a number of ways of doing it. Is this one of those questions where an IT Director thinks he has the better of you, because he found a question you can not answer?

  5. #5
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Quote Originally Posted by MCrowley
    Perhaps it "ands" 32 to both characters if either are in the ASCII range 65-90? Probably a number of ways of doing it. Is this one of those questions where an IT Director thinks he has the better of you, because he found a question you can not answer?
    No, they don't know, and because DB2 OS/390 is case sensitive, he didn't understand....and because I've got sql boxes built six ways to sunday, it's been bugging me...and that's how it arose...

    But now it's bugging me as well...
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  6. #6
    Join Date
    Jul 2004
    Posts
    60
    helluva question - 1st i'd smack the guy who asked.

    it's just like asking how does SQL server know 1 != 2?
    not a technically valid question, IMHO, but thats way the cookie crumbles sometimes, i reckon.

    my understanding is that collation is a 3 legged stool
    the sort order + the code page + the character set or "dictionary".

    if the sort order is binary, my personal favorite,
    its pretty simple: the bit pattern mapped to 'a' is not the same as 'A'. what ends up writing to disk is simply as different between 'a' and 'A' as it is between '1' and '2'. it knows 'a' is not 'A' because they are completely differnt patterns stored on the disk.

    i dont think i've ever totally understood how non-binary case-sensitive sort orders know the difference - but i do know it involves 'interrogating' the character set dictionary at a very very low level to get the info it needs - which is why it is a noticably slower way to store/join/compare your data.

    both binary and dictionary sorts 'technically' use bit patterns, but sometimes a particular character set stored in non-binary can have funky characters and stuff that make it not "always" work.

    in either scenario, depending on the level of the person asking the question - i'd think the bit pattern explanation is close enough for gov't work, after the back-hand, of course.

  7. #7
    Join Date
    Feb 2004
    Posts
    492
    I think ascii comparisons are from the old days. If case-insensitive is set, I would think it'll look at the codepage and have the codepage decide which character is its uppercase or lowercase equivalent.

  8. #8
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Quote Originally Posted by oddity
    helluva question - 1st i'd smack the guy who asked.
    Now do you seriously think I'd entertain this question from someone who didn't SIGN my check?

    Anyway, it'a valid question, and from what ya'll (no I'm not from the south) have mentioned, it makes a lot of sense, and was thinking in those terms...

    Still gotta google some details...


    Thanks
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  9. #9
    Join Date
    Jul 2004
    Posts
    60
    LOL - i know what you mean.
    Gotta make that mortgage payment.

  10. #10
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    The problem is that the actual process isn't that simple. There is bitmap that shows which charcters are mapped, and a hash of arrays that allows each unicode character set to "remap" characters as needed. Each array contains the hash signature, a value for comparison, and a value for sorting. Most characters don't have lookup entries in the hash, so they use the actual binary value.

    Aren't you glad that you asked?

    -PatP

  11. #11
    Join Date
    Jul 2004
    Posts
    60
    hey pat -
    does the 'array' you refer to only store the unicode, and special non-unicode characters for languages that use them (tilde, etc)? or all characters?

    sounds as though the premise that the code page has all the info in it and SQL services communicate with its contents for sorting and comparison rules is the basic idea for dictionary sorts - or it uses the binary value directly for binary sorts.

    is that your take, at a high level?

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    The hash is the actual lookup mechanism. The array that is returned from the hash contains values that are used for comparisons and sorting.

    You can think of it something like this: When processing a character, it gets expanded to 16 bits if it wasn't already, then that 16 bit value is used as a lookup into the bitmask to see if this character gets special processing (most don't). If it needs processing, a hash lookup is done which returns the character itself, the value to use for comparison purposes, and a value to use for sorting purposes. If the character didn't need processing, it is copied into all three values in an empty array.

    -PatP

  13. #13
    Join Date
    Jul 2004
    Posts
    60
    thanks!
    'splains a little better why binary sorts are mo'better, IMO.
    they need no translation and/or re-sorting.

    of course, some users expect charater ordering to work like it did for them in 3rd grade, but, eh, NMFP. the database is faster, that IS my problem.

    i'm starting to think i'm an evil dba...


  14. #14
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Once you get past the surface...it's all about the internals and undersatanding how things work...

    Thanks again Pat...

    (He has a signed copy of Kelans book no doubt)
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  15. #15
    Join Date
    Jul 2004
    Posts
    60
    quite true.
    there's so much to learn i'll be busy the rest of my life.

    might be useful for this thread and/or compiling a detailed answer for your bossman:

    master..syscharsets is where SQL server goes to get the binary definitions for character sets and sort orders.

    peace.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •