Unanswered: Performance Help: Binary Comparisons vs Full Normalization?
I have a pretty intensive query that I need performance help on. There are ~1 million de-normalized 'adjustment rows' that I am checking about 20 different conditions on, but each of these conditions has multiple possible entries.
For example, one condition is 'what counties apply' to each row. Now I could cross-join a table listing every county that applies to every row, which would mean 1 million rows x 3,000 potential counties. And for every one of these 20 conditions, I'd need to be joining tables for each of these lookups.
Instead, I was told to do a binary comparison of some sort, but I'm not exactly sure of how to do it. This way, I'm not needing to do any joins, but just have a large binary string, with bits representing each county.
Since for each query I know the exact county being searched, I can check whether each row applies (along with each of the other conditions I must test against the other binary strings).
I accomplished this using:
AND Substring(County, @CountyIndex, 1) = '1'
I have a character string for county, which is painfully slow when running all of these checks.
My hope is if the county in the lookup is 872, I can just scan the table, looking at bit #872 for the county field in each record, rather than joining huge tables for every one of these fixed fields I need to test.
My guess is the fastest way is some sort of binary string comparisons, but I can't find any good resources on the subject. PLEASE HELP!
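To make the idea concrete, here is a minimal sketch (in Python, since the technique is language-agnostic) of the bit-index arithmetic the question is describing: pack one flag per county into a binary string, then test "bit #872" with byte/bit math instead of a join. The function names, the 0-based county numbering, and the most-significant-bit-first ordering are all assumptions for illustration, not the poster's actual schema.

```python
# Hypothetical sketch: pack per-row county flags into a bytes object and
# test a single county with index arithmetic instead of a lookup join.
# Bit numbering assumption: county i lives in byte i // 8, counted from
# the most significant bit of that byte.

NUM_COUNTIES = 3000

def make_mask(county_indexes):
    """Build a packed bit string with the given county bits set."""
    buf = bytearray((NUM_COUNTIES + 7) // 8)  # 375 bytes for 3000 flags
    for i in county_indexes:
        buf[i // 8] |= 1 << (7 - i % 8)
    return bytes(buf)

def county_applies(mask, i):
    """True if county bit i is set in the packed mask."""
    return mask[i // 8] & (1 << (7 - i % 8)) != 0

row_mask = make_mask([12, 872, 2999])
print(county_applies(row_mask, 872))  # True  -- one bit test, no join
print(county_applies(row_mask, 871))  # False
```

A 3,000-flag mask packs into only 375 bytes per row, so the scan touches one small column per condition instead of a billion-row join table.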
You are going to have trouble doing a bitwise (binary?) comparison on more than 64 values, as your bitmask will exceed the maximum size allowed by the bigint datatype. And I would imagine that if you are dealing with counties, and other such data elements, you will need more than 64 possible values.
I'm realizing this isn't the person to be asking, which is why I'm looking around for better answers. Some DBAs shouldn't be handing out advice even when paid for it, but I'd research it if I had a good direction.
As for the 64-bit question on the 3,000-bit field: yes, I was concerned about this issue, and I was hoping someone had come across something similar. I've wondered about using a binary string, or whether I'm just best off creating that other table of over 1 billion entries to represent the joins.
So you don't think any performance gain can be had by testing the bit field, even if it were a 64-bit field instead of 3,000 bits? You're saying it would be faster to normalize it? I would think that if the bits can be tested efficiently, it would be faster than scanning through the additional 20 tables normalization would create.
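For what it's worth, the single-scan idea being argued for here can be sketched end-to-end: each row carries one packed mask per condition, the query carries one target bit per condition, and a row matches only if every condition's bit is set. Everything below (the two-condition row, the mask layout) is invented for illustration; Python's arbitrary-width integers stand in for the binary strings.

```python
# Hypothetical sketch of scanning rows with one bit test per condition
# instead of ~20 joins. Each row holds one integer bitmask per condition.

def bit_set(mask, i):
    """True if bit i of the mask is set."""
    return mask >> i & 1 == 1

def row_matches(row_masks, wanted_bits):
    """row_masks: one bitmask per condition for this row.
    wanted_bits: the bit index to test for each condition."""
    return all(bit_set(m, i) for m, i in zip(row_masks, wanted_bits))

# Two example conditions: counties and product types.
row = [1 << 872 | 1 << 12,  # counties 12 and 872 apply to this row
       1 << 5]              # product type 5 applies to this row

print(row_matches(row, [872, 5]))  # True
print(row_matches(row, [871, 5]))  # False
```

Whether this actually beats normalized joins depends on the engine: a join against properly indexed lookup tables is a seek, while the bitmask approach forces a full scan with per-row function work, which is exactly why the SUBSTRING version above is slow.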