  1. #1
    Join Date
    Nov 2005
    Posts
    11

    Unanswered: Performance Help: Binary Comparisons vs Full Normalization?

    I have a pretty intensive query that I need performance help on. There are ~1 million de-normalized 'adjustment rows' that I am checking about 20 different conditions against, but each of these conditions has multiple possible entries.

    For example, one condition is 'which counties apply' to each row. Now I could cross-join a table listing every county that applies to every row, which would mean 1 million rows X 3,000 potential counties. And for every one of these 20 conditions, I'd need to be joining a lookup table like that.

    Instead, I was told to do a binary comparison of some sort, but I'm not exactly sure how to do it. This way, I wouldn't need to do any joins; I'd just have a large binary string, with bits representing each county.

    Since for each query I know the exact county being searched, I can see whether each row applies (along with each of the other conditions I must check against the other binary strings).

    I accomplished this using a character string for County:
    AND Substring(County, @CountyIndex, 1) = '1'
    which is painfully slow when running all of these checks.

    My hope is that if the county in the lookup is 872, I can just scan the table, looking at bit #872 of the county field in each record, rather than joining huge tables for every one of these fixed fields I need to test.

    My guess is the fastest way is some sort of binary string comparison, but I can't find any good resources on the subject. PLEASE HELP!

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by cragi
    Instead, I was told to do ... PLEASE HELP!
    that's the person you should be asking
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    You are going to have trouble doing a bitwise (binary???) comparison on more than 64 values, as your bitmask will exceed the maximum size allowed by the bigint datatype. And I would imagine that if you are dealing with counties, and other such data elements, you will need more than 64 possible values.
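    For illustration, here is a minimal sketch of what the bitmask test looks like while you are still within bigint range. The Adjustments table and CountyMask column are just assumed names for the demo:

        -- hypothetical schema: Adjustments(AdjustmentID int, CountyMask bigint)
        -- only viable for bits 0 through 62, since bit 63 is the bigint
        -- sign bit and POWER would overflow there
        DECLARE @CountyIndex int
        SET @CountyIndex = 12   -- the bit position of the county to test

        DECLARE @CountyBit bigint
        SET @CountyBit = POWER(CAST(2 AS bigint), @CountyIndex)

        SELECT AdjustmentID
        FROM Adjustments
        WHERE CountyMask & @CountyBit = @CountyBit   -- bit set = county applies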
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  4. #4
    Join Date
    Nov 2005
    Posts
    11
    I'm realizing this isn't the person to be asking, which is why I'm looking around for better answers. Some DBAs shouldn't be handing out advice even when paid for it, but I'd research it if I had a good direction.

    As for the 64-bit question on the 3,000-bit field: yes, I was concerned about this issue, and I was hoping someone had come across something similar. I've wondered about using a binary string, or whether I'm just best off creating that other table of over 1 billion entries to represent the joins.

  5. #5
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    this won't work unless you have 64 values or fewer. from BOL:

    "In a bitwise operation, only one expression can be of either binary or varbinary data type." from: http://msdn2.microsoft.com/en-us/library/ms174965.aspx

    so you can't put a long binary string on both sides of the operator; one side has to be an integer type, which caps you at 64 bits.
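    here's a quick demo of that restriction (the variable name is just for illustration):

        DECLARE @mask varbinary(8)
        SET @mask = 0x0F

        -- SELECT @mask & @mask   -- fails: both operands are binary types
        SELECT @mask & 2          -- works: one operand is an int type; tests bit 1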

  6. #6
    Join Date
    Nov 2005
    Posts
    11
    any recommendations on what I 'can' do?

  7. #7
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    normalize?

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    normalize!
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Nov 2005
    Posts
    11
    so you don't think any performance gain can be had by testing the bit field, even if it were a 64-bit field instead of 3,000? You're saying it would be faster to normalize it? I would think that if the bits can be tested efficiently, it would be faster than scanning through the additional 20 tables created by normalizing.

  10. #10
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    well, if you index properly, there won't be any scans. should be only seeks, which are fast.

    since you don't have that many values, the pages for the lookup tables are likely to be entirely cached in memory if used often, so the seeks will be quite fast.
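    for example, something like this sketch (all table and column names are placeholders, and it assumes your main table is called Adjustments). note you only store a row for each county that actually applies to an adjustment, so the table is nowhere near 1 million x 3,000 rows:

        -- one row per (adjustment, county) pair that actually applies
        CREATE TABLE AdjustmentCounty (
            AdjustmentID int      NOT NULL,
            CountyID     smallint NOT NULL,
            CONSTRAINT PK_AdjustmentCounty PRIMARY KEY (CountyID, AdjustmentID)
        )

        -- looking up county 872 is now an index seek, not a scan
        SELECT a.*
        FROM Adjustments AS a
        INNER JOIN AdjustmentCounty AS ac
            ON ac.AdjustmentID = a.AdjustmentID
        WHERE ac.CountyID = 872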
