  1. #1
    Join Date
    Jan 2010

    Unanswered: string similarity algorithm.

    I have a database table of product titles. For each product, I want to link 10-20 similar products, ordered by relevance.

    Anybody have any suggestions about how I could go about this (the database design is OK, just the actual algorithm/process to link similar products is what I need)?

    The main concern is quality/relevance and accuracy.

    Any help appreciated, thanks.

  2. #2
    Join Date
    Apr 2002
    Toronto, Canada
    possible approaches:

    - similar titles based on soundex

    - purchase patterns (customers who bought this also bought...)

    - taxonomy (all titles in same category)

    - related by vendor

    - ...
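As a rough illustration of the soundex suggestion above, here is a minimal sketch in Python. It assumes the standard American Soundex rules and compares titles by the overlap of their per-word codes; the helper names (`soundex`, `title_codes`, `similar_by_soundex`) are hypothetical and not from the thread. Note that Soundex ignores digits, so model numbers in titles would not contribute to the match.

```python
# Letter groups used by American Soundex; vowels, y, h and w have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (out + "000")[:4]


def title_codes(title):
    """Set of Soundex codes for the words of a title."""
    return {soundex(w) for w in title.split()} - {""}


def similar_by_soundex(target, titles, limit=20):
    """Rank other titles by Jaccard overlap of Soundex codes with the target."""
    target_codes = title_codes(target)
    scored = []
    for title in titles:
        if title == target:
            continue
        codes = title_codes(title)
        union = target_codes | codes
        score = len(target_codes & codes) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]
```

With this sketch, misspelled variants such as "Cannon Powershot Camara" land on the same codes as "Canon PowerShot Camera", while unrelated titles share none.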

  3. #3
    Join Date
    Jan 2010
    - soundex.
    will give this a try.

    - purchase patterns
    - taxonomy
    do not have this data to use.

    - related by vendor.
    will also give this a try, but I would first have to extract an index of vendors, as the vendor/brand is not currently stored separately from the titles.


  4. #4
    Join Date
    Dec 2009

    another idea

    why not build a family tree of products:

    idea 1:
    - add a product_parent_id as the product it's related to
    - if the product_parent_id is null, no relation is found
    - if the product_parent_id is not null, then it's related to the parent product
    - now you can build the full parent-child relation between products

    idea 2:
    - add a new table: product_relations_type (relation_type_id, relation, desc)
    - add a new table product_2_product_relation (product_id1, product_id2, relation_type_id)
    - this way you will have multiple products related to other products
    using multiple relation types.
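A minimal sketch of how the tables from idea 2 might be created and populated, using Python's built-in `sqlite3`. Several things here are assumptions rather than part of the post: the `product` table layout, the extra `score` column (added so results can be ordered by relevance), the rename of `desc` to `description` (since `DESC` is an SQL keyword), and the word-overlap (Jaccard) measure used as the similarity score.

```python
import sqlite3
from itertools import combinations


def create_schema(conn):
    """Create the tables from idea 2 (plus an assumed product table)."""
    conn.executescript("""
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE product_relations_type (
            relation_type_id INTEGER PRIMARY KEY,
            relation TEXT,
            description TEXT
        );
        CREATE TABLE product_2_product_relation (
            product_id1 INTEGER,
            product_id2 INTEGER,
            relation_type_id INTEGER,
            score REAL  -- assumption: stored so results can be ordered
        );
        INSERT INTO product_relations_type
            VALUES (1, 'similar_title', 'titles share words');
    """)


def jaccard(a, b):
    """Share of words two titles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_relations(conn, min_score=0.3):
    """Compare every pair of titles and store the pairs that score high enough."""
    rows = conn.execute("SELECT product_id, title FROM product").fetchall()
    tokens = {pid: set(title.lower().split()) for pid, title in rows}
    for id1, id2 in combinations(sorted(tokens), 2):
        score = jaccard(tokens[id1], tokens[id2])
        if score >= min_score:
            conn.execute(
                "INSERT INTO product_2_product_relation VALUES (?, ?, ?, ?)",
                (id1, id2, 1, score),
            )
    conn.commit()


def related_products(conn, product_id, limit=20):
    """Return (other_id, score) pairs for a product, best match first."""
    return conn.execute(
        """SELECT CASE WHEN product_id1 = ? THEN product_id2 ELSE product_id1 END,
                  score
           FROM product_2_product_relation
           WHERE product_id1 = ? OR product_id2 = ?
           ORDER BY score DESC LIMIT ?""",
        (product_id, product_id, product_id, limit),
    ).fetchall()
```

The pairwise loop compares every title with every other one, so it grows quadratically with the number of products; for a large catalogue a full-text index would probably be needed to narrow the candidates first.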


  5. #5
    Join Date
    Jan 2010
    thanks for your suggestion, but as I said the database design is not the problem!

    the problem is how do I go about finding and saving the relations between products (actually populating the tables you have mentioned)

  6. #6
    Join Date
    Feb 2004
    In front of the computer
    Provided Answers: 54
    The first thing that I'd do is to come up with definitions for "Quality", "Relevance", and "Accuracy" that I could program.

    In other words, if I can't express what "Quality" means using code I can write along with the data that I have, then the term doesn't mean anything relevant to the problem that I'm trying to solve.

    I'm pretty sure that the database schema is at least part of your problem. If it wasn't the problem, you'd be able to pick the results that you want from an existing table or view.

    We can help you fix this problem, but right now you're kind of like the patient who goes to the doctor, says "I'm sick", then expects the doctor to write a prescription. I can't speak for everyone, but we need to know more about your problem. Once you explain what those three words mean in your context (preferably as code we can examine), then I think we can help you a lot more.

    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  7. #7
    Join Date
    Jan 2010
    "I'm pretty sure that the database schema is at least part of your problem."
    -the database is part of the solution.

    I was probably being too general...

    I got good results using the Sphinx search server (match all mode).
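For readers unfamiliar with the term, "match all" means a document matches only if it contains every word of the query. The sketch below illustrates that rule in plain Python; it is not Sphinx and does not use its API, and the function name `match_all` is hypothetical. The post does not say how the queries were built from the product titles, so that part is left out.

```python
def match_all(query, titles):
    """Return the titles that contain every word of the query (case-insensitive)."""
    wanted = set(query.lower().split())
    if not wanted:
        return []
    return [t for t in titles if wanted <= set(t.lower().split())]
```

For example, the query "cotton shirt" would match "Red Cotton Shirt" but not "Cotton Socks", since the latter lacks the word "shirt".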
