Results 1 to 14 of 14
  1. #1
    Join Date
    Aug 2007
    Posts
    5

    Relevance modelling problem

    I have a problem regarding the modelling of a database for a website that requires relevance in its search..

    This is the scenario:

    say we have a couple of tables consisting of cities in a country:

    table 1: country
    -------------
    #cities
    New York
    L.A
    Chicago
    Boston
    Miami

    These are all in different positions(coodinates) in the country, what I want to do here is determine their position in relationship to eachother.

    I E New york has a distance to miami but another distance to boston and a third to L.A and so on.

    When I search this database using the phrase "New york" I want to be able to know the distances to all other cities in the table aswell.

    I know of 2 ways of doing this:

    1. Setting up a distances and match them in the search via a table as following:

    Distances
    ----------
    #New York #Boston etc.
    boston 25 km Chicago 50 km
    L.A 500 km Miami 300 km
    Chicago 30 km L.A 700 km
    and so on...

    2. Setting up tables for each city as following:

    New York
    ----------
    #city #distances
    Boston 25 km
    L.A 500 km

    and so on...

    These two methods would work with only 5 cities or maybe a few more, but my real problem is that I have 290 "cities" in consideration for the database, how can I manage these and get their relevance to eachother in a easier way than having to make 290 tables with far too many posts in each table.

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Hi

    Just in case this is one of those questions that gets very complicated very quickly:
    When you say "position" you mean only distance between cities? You don't intend to include direction or co-ordinates do you?
    What do you mean by "relevance" in the title? There appears to be no mention of this in the thread.
    What is ultimate goal of this? Are you, for example, trying to determine optimum route finding or something?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    BTW - you defo do not want to do method #2.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Actually - if you require any of the stuff that I suspect you will require then it is not something that I have any experience of. However, I do keep track of these sorts of things as and when they come up. For some reason SQLTeam seem to get a lot of these sorts of problems. Have a look at this page and scroll down to "Distance of Zip/Postcodes" (highlighted in yellow).
    http://www.sqlteam.com/forums/topic....rms=techniques
    Testimonial:
    pootle flump
    ur codings are working excelent.

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    An old DBA named Pythagoras came up with a theorem for dealing with this problem a few years ago.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    How do you want to determine the distances between points of interest? There are multiple ways to do this, and which one you use will significantly affect your choice of how to store the data.

    Options that jump right to mind are:

    1) Computed distance (essentially great circle navigation)
    2) Geocoder assigned distance (CPO to CPO as measure by IPS).
    3) Teamster distance (by vehicle class, which is yet another variable to track)
    4) Freight handler distance (pickup plus center-to-center plus delivery) which is carrier specific.

    Each one of these has some pretty extensive discussion behind it, along with some additional design choices that will need to be made. If you know which one you want to use that will help narrow things down fast. If you don't know which measurement you want to use, then you need to explain your problem in a lot more detail so that we can help you decide.

    -PatP

  7. #7
    Join Date
    Aug 2007
    Posts
    5
    Hi

    First of all thx for the quick response(that link was very helpful).
    The goal of this project is to determine a distance relationship between the cities in the best way possible with as little rows as possible.
    By that I mean whats for example New Yorks distance to all cities around it and so on for each city.
    And if you know a good way where co-ordinates and directions helps to simplify the database please tell, because I cant come up with one.

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    So which method should be used for determining distance? Solution #1 allows the fewest rows, #3 and #4 can use web services, #2 and #3 can allow very similar table structures, but no two are the same, and there are countless variations on each of these ideas depending on what you really want/need to do.

    The short question that you must answer to allow me to help you is: How do you want to measure the distance between points of interest (cities in your example)?

    -PatP

  9. #9
    Join Date
    Aug 2007
    Posts
    5
    We have looked further into option 1, can u please elaborate more on this one please or paste a good link.

    We have something in mind where you use the longitude and latitude of each city. And from these calculate the distance between these points.

  10. #10
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    If you are going to use the computed distance and your goal is to have the minimum number of rows, then all you need to do is have one table for cities and include the latitude and longitude as columns in that table. This is enough information to allow you to calculate the raw distance between the cities.

    This is purely an observation, but this is a pretty "compute intensive" way to do things... It will certainly get the job done, but it probably won't perform very well because it takes a fair bit of processing to determine the distances between cities and most queries will require the distances to be computed more than one time for each city pair.

    Can you describe what business problem you are trying to solve? I'm pretty sure that there has to be a simpler and more efficient way to solve the problem than using "brute force".

    -PatP

  11. #11
    Join Date
    Aug 2007
    Posts
    5
    Yes I realize that it may take some time to execute but I see no other way in doing this..
    The problem is that I am trying to make a website where one registers and then searches a database for work and then gets a list of jobs with a relevance to that users hometown or city.. the relevance is in kilometers from the hometown
    The further away a job, the smaller the relevance.

    I have 290 cities that has to be calculated, but all of them doesn´t have to have a tie to ALL others. A city just needs to have relevance to 30-40 others that are geographically close.
    We could enter all distances in tables but it is too timeconsuming

    So is there another way?

  12. #12
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Well,

    if you first create your city table, with each city's latitude & longitude, then run a query to populate a temporary table with all distances between all cities, you could then order the list by FROM city and distance. The table would be apx. 84000 rows.

    Then, populate a permanent table with the closest 40 for each "From" city. 290 * 40 = 11600 rows. Add an index on the "from" city, and you've got a quick, pre-calculated list.

    If a user needs to search for one of the cities that's NOT in the top 25, require them to provide the FROM and TO city, so only one great-circle calc would be required.

    BTW. Ref http://www.dbforums.com/showthread.php?t=1614040

    BTW. Given the capacity of servers today, you may be ahead with storing the entire data set. You could store the entire 84K rows, and the index, in roughly 5 MB. (assuming that you use an integer city number, rather than the city name in the lookup table.)
    Last edited by loquin; 08-23-07 at 17:44.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  13. #13
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    But Lou - we have a real scalability issue here have we not? Each additional location results in exponential growth. You are right that it looks fine for this limited number of cities but it depends on the future needs....

    I'll admit I am light years behind most others on this problem since I have never had to deal with it but....
    Quote Originally Posted by Xasho
    The problem is that I am trying to make a website where one registers and then searches a database for work and then gets a list of jobs with a relevance to that users hometown or city.. the relevance is in kilometers from the hometown
    The further away a job, the smaller the relevance.
    Could you "box" off your geographic catchment area into, for example, 50KM boxes based on co-ordinates. Record which box each city falls in. You could then use this as a quick filter to reduce the number of cities you need to include in your finer distance calculations. Only include cities that are in the same box or an adjacent box to the city the user chose. Then work out the exact distances for this set of cities.

    Quote Originally Posted by Xasho
    We could enter all distances in tables but it is too timeconsuming
    That isn't your problem. Whether you store the distances or generate them on the fly you will use the same algorithm. You would not do it manually in either event.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  14. #14
    Join Date
    Aug 2007
    Posts
    5
    Thanks guys for all the helpful answers.. But I think we will go with loquin´s suggestion, I seems like that´s the best way of solving the problem

    I can update you guys in a month or so on how it went and problems that occured.

    Thx again

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •