If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Bit field normalization vs performance

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 12-21-05, 07:02
stupot-UK stupot-UK is offline
Registered User
 
Join Date: Dec 2005
Location: London
Posts: 2
Bit field normalization vs performance

I have a table of users with their corresponding interests. At the moment I am using bit fields to record their interests but I am aware that this is not normalized and probably not best practice.


user_ID | user_name | likes_fishing | likes_cycling | likes_football
================================================
1 | john | 1 | 1 | 1
2 | jeff | 1 | 0 | 0
3 | dave | 1 | 1 | 0

If I normalize this to a many to many relationship I get


user table
user_ID | user_name
===============
1 | john
2 | jeff
3 | dave

likes table
likes_ID | likes_type
===============
1 | fishing
2 | cycling
3 | football

user-likes table
user_ID | likes_ID
=============
1 | 1
1 | 2
1 | 3
2 | 1
3 | 1
3 | 2

All well and good. Now, how do I query for users that like both fishing AND cycling (john and dave) and is this the best practice for a web app where searching on these fields will be frequent.

Last edited by stupot-UK; 12-21-05 at 08:34.
Reply With Quote
  #2 (permalink)  
Old 12-22-05, 00:54
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
Select username
from usertable a, likestable b, userlikestable c
where a.userid=c.userid
and b.likesid=c.likesid
and (b.likestype = fishing
or b.likestype = cycling)
__________________
visit: relationary
Reply With Quote
  #3 (permalink)  
Old 12-22-05, 05:22
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 5,171
Quote:
Originally Posted by certus
Select username
from usertable a, likestable b, userlikestable c
where a.userid=c.userid
and b.likesid=c.likesid
and (b.likestype = fishing
or b.likestype = cycling)
That finds people who like fishing OR cycling. For fishing AND cycling you could do this:
Code:
Select username
from usertable a, likestable b1, userlikestable c1, likestable b2, userlikestable c2
where a.userid=c1.userid
and    b1.likesid=c1.likesid
and    b1.likestype = 'fishing'
and    a.userid=c2.userid
and    b2.likesid=c2.likesid
and    b2.likestype = 'cycling'
Or:
Code:
Select a.username
from usertable a
where a.userid IN
(select c.userid
 from likestable b, userlikestable c
 where b.likesid=c.likesid
 and    b.likestype = 'fishing'
 INTERSECT
 select c.userid
 from likestable b, userlikestable c
 where b.likesid=c.likesid
 and    b.likestype = 'cycling'
)
__________________
Tony Andrews
http://tinyurl.com/tonyandrews
Reply With Quote
  #4 (permalink)  
Old 12-22-05, 06:24
stupot-UK stupot-UK is offline
Registered User
 
Join Date: Dec 2005
Location: London
Posts: 2
tradeoff

Thank you very much for the reply. I like the first method, it seems cleaner (from web app/dynamic sql generation perspective)

I guess the question is now, how much of a performance hit this will take if more than two criteria are required. The list of 'likes' could potentially contain 50 to 100 items. If I want to offer an interface where all 'likes' are searchable in any combination it will require a lot of joins. The options are either to limit the number of criteria that can be searched at one time or to keep the denormalized structure I am using at the moment and take the hit on storage. I am tempted to take the hit on storage as I am only ever going to have around 30,000 users in the db and storage is cheap whereas a performance drop costs me massively in usability and therefore membership. But this is where my understanding of the internal mechanics of db engines breaks down. Would allowing a select statement with 50 criteria on the denormalized table be just as slow as allowing a 50 criteria select using joins on the normalized tables. I'll do some testing but I'd be interested to hear peoples thoughts on this.
Reply With Quote
  #5 (permalink)  
Old 12-22-05, 14:11
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
I kind of like:
Code:
SELECT u.name
   FROM userTable AS u
   WHERE 2 = (SELECT Count(*)
      FROM LikesTable AS l
      JOIN UserLikesTable AS z
         ON (z.likes_ID = l.likes_ID)
      WHERE l.user_ID = u.user_ID
         AND l.likes_type IN ('fishing', 'cycling'))
You have to match the count with the list, but that doesn't strike me a s big deal, and you can then have a practical infinity of interests.

-PatP
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On