Results 1 to 9 of 9
  1. #1
    Join Date
    Sep 2007
    Posts
    2

    Unanswered: Design Question for Large Table

    Hi,

    What's the most efficient way to store the following information:

    * Table contains 1 million listings
    * Each listing can be geo-targeted to any of the 200+ countries
    * Searches return listings based on geo-location

    Storage options:

    Option #1 (normalized)
    * ListingsTable (PK listingID int) [1 million rows]
    * ListingGeoLocations (listingID, geoLocationID) [could be up to 200 million rows]

    Option #2 (denormalized)
    * ListingsTable (PK listingID int, binary(32) with bit-mask consisting of 200 bits one for each location)

    Did anyone have experience with similar structures? Which option is more efficient?

    Thanks,
    Av

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    option 2 sounds inferior to me

    let's say you were looking for listings corresponding to the 37th bit in the bitmask, how would you find them?

    by inspecting the bitmask of all 1 million rows, that's how
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    I'm too tired to answer

    Look up normalization on google
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Option #1.

    As Rudy pointed out, you are going to have problems doing bitwise calculations on bitmasks over 30 bits, after which you exceed the capacity of the BigInt datatype.

    Also, while under scenario #1 your table COULD reach 200 million rows, in reality what is the average number of countries you expect for each geolocation? Probably significantly less than 200.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Sep 2007
    Posts
    2
    I understand that option #1 is normalized and will allow the search for listings in a specific country using query. However, say when a search is performed, it is based on an set of keywords (which are also tied to listings).

    So say you looking for "dentists". You can about 100 listings that are related. Now, the APPLICATION (not SQL) will simply "filter" the 100 listings by doing geo-specific bitwise comparissons and will only display the listings that have the proper country bit set.

    Still Option #1? The search needs to be very fast.

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    You are NOT going to do bitwise operations on a 200 value bitmask.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by av613
    So say you looking for "dentists". You can about 100 listings that are related.
    you can? you can what? how does that work?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Quote Originally Posted by r937
    you can? you can what? how does that work?
    Ask Harry potter
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  9. #9
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Why not have a
    Code:
    Listings table (Pk ListingID)
    Locations Table (PK LocationID)
    and,
    ListingLocations table (PK ListingID, LocationID)
    ListingLocations is an intersection table, used to define a many-to-many relationship. The two required fields, ListingID and LocationID are also foreign keys to the appropriate table.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •