Results 1 to 5 of 5
  1. #1
    Join Date
    Dec 2005
    Location
    London
    Posts
    2

    Bit field normalization vs performance

    I have a table of users with their corresponding interests. At the moment I am using bit fields to record their interests but I am aware that this is not normalized and probably not best practice.


    user_ID | user_name | likes_fishing | likes_cycling | likes_football
    ================================================
    1 | john | 1 | 1 | 1
    2 | jeff | 1 | 0 | 0
    3 | dave | 1 | 1 | 0

    If I normalize this to a many to many relationship I get


    user table
    user_ID | user_name
    ===============
    1 | john
    2 | jeff
    3 | dave

    likes table
    likes_ID | likes_type
    ===============
    1 | fishing
    2 | cycling
    3 | football

    user-likes table
    user_ID | likes_ID
    =============
    1 | 1
    1 | 2
    1 | 3
    2 | 1
    3 | 1
    3 | 2

    All well and good. Now, how do I query for users that like both fishing AND cycling (john and dave) and is this the best practice for a web app where searching on these fields will be frequent.
    Last edited by stupot-UK; 12-21-05 at 09:34.

  2. #2
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Select username
    from usertable a, likestable b, userlikestable c
    where a.userid=c.userid
    and b.likesid=c.likesid
    and (b.likestype = fishing
    or b.likestype = cycling)

  3. #3
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Quote Originally Posted by certus
    Select username
    from usertable a, likestable b, userlikestable c
    where a.userid=c.userid
    and b.likesid=c.likesid
    and (b.likestype = fishing
    or b.likestype = cycling)
    That finds people who like fishing OR cycling. For fishing AND cycling you could do this:
    Code:
    Select username
    from usertable a, likestable b1, userlikestable c1, likestable b2, userlikestable c2
    where a.userid=c1.userid
    and    b1.likesid=c1.likesid
    and    b1.likestype = 'fishing'
    and    a.userid=c2.userid
    and    b2.likesid=c2.likesid
    and    b2.likestype = 'cycling'
    Or:
    Code:
    Select a.username
    from usertable a
    where a.userid IN
    (select c.userid
     from likestable b, userlikestable c
     where b.likesid=c.likesid
     and    b.likestype = 'fishing'
     INTERSECT
     select c.userid
     from likestable b, userlikestable c
     where b.likesid=c.likesid
     and    b.likestype = 'cycling'
    )

  4. #4
    Join Date
    Dec 2005
    Location
    London
    Posts
    2

    tradeoff

    Thank you very much for the reply. I like the first method, it seems cleaner (from web app/dynamic sql generation perspective)

    I guess the question is now, how much of a performance hit this will take if more than two criteria are required. The list of 'likes' could potentially contain 50 to 100 items. If I want to offer an interface where all 'likes' are searchable in any combination it will require a lot of joins. The options are either to limit the number of criteria that can be searched at one time or to keep the denormalized structure I am using at the moment and take the hit on storage. I am tempted to take the hit on storage as I am only ever going to have around 30,000 users in the db and storage is cheap whereas a performance drop costs me massively in usability and therefore membership. But this is where my understanding of the internal mechanics of db engines breaks down. Would allowing a select statement with 50 criteria on the denormalized table be just as slow as allowing a 50 criteria select using joins on the normalized tables. I'll do some testing but I'd be interested to hear peoples thoughts on this.

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    I kind of like:
    Code:
    SELECT u.name
       FROM userTable AS u
       WHERE 2 = (SELECT Count(*)
          FROM LikesTable AS l
          JOIN UserLikesTable AS z
             ON (z.likes_ID = l.likes_ID)
          WHERE l.user_ID = u.user_ID
             AND l.likes_type IN ('fishing', 'cycling'))
    You have to match the count with the list, but that doesn't strike me a s big deal, and you can then have a practical infinity of interests.

    -PatP

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •