Results 1 to 11 of 11
  1. #1
    Join Date
    Mar 2004
    Location
    Huntington, IN
    Posts
    54

    Question Unanswered: Comparing records to calc similarity

    The database for my on-line store has about 2500 products. I've installed a new 'cross sell' module and want my system to determine the most similar products to 'recommend' to the store customer.

    Each product has 9 fields defining the various specs. I was considering taking part 1, comparing the 9 specs to the other items, and calculating a 'score' for each comparison, then reporting back the top xx scores, which represent the most similar items.

    Every way that I've looked at this would seem to create millions of comparisions and take forever to process.

    Is there a more efficient way to accomplish this?

    Thank you.

  2. #2
    Join Date
    Nov 2007
    Location
    Adelaide, South Australia
    Posts
    4,049
    Have you considered defining a "similar products" list for each product?

    What are your 9 specs?
    Owner and Manager of
    CypherBYTE, Microsoft Access Development Specialists.
    Microsoft Access MCP.
    And all around nice guy!


    "Heck it's something understood by accountants ... so it can't be 'that' difficult..." -- Healdem
    "...teach a man to code and he'll be frustrated for life! " -- georgev

  3. #3
    Join Date
    Mar 2004
    Location
    Huntington, IN
    Posts
    54
    Thanks for your response StarTrekker.

    I'm not sure what you mean by the similar products list. Do you mean some permanent table that stores the xx closest other products? That would at least break up the long processing job into small, stored jobs. I am, however, adding products weekly, which could obsolete the table...

    The nine classifiers are all numeric keys from other tables. ProdCat might be 01 for ball casters and 12 for twin wheel casters. Dim1 might be 32 for 2" height. Finish might be 24 for 'bright chrome'. Other classifiers include bearings, connector types, width, weight cap...

    The fact that two products have finish classifiers that are close numerically means nothing.

    Does that help?

    Thank you for your mental CPU time.

    Matt
    Casters & Wheels at Apollo Caster, Inc.

  4. #4
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    I'd suggest you develop an equivalence table where you define product A as being similar to product Z09
    you could refine the similarity with an additional code eg exact match/rebadged/generic part, similar product or replacement product.

    no doubt you could display products from the same manufacturer already
    eg
    Product A is a single wheel castor from Acme
    Product B is a single wheel castor from Oujamacallit
    Product z09 is a twin wheel castor from Acme
    Product SA is a castor spigot receptacle from Acme

    so A & B may be exact replacements
    if you buy A, B or z09 you may need SA

    its a pig to set up and maintain, but its not artificially limited
    if someone comes to you and wants a Oujamacallit z09 you can immediately offer a Acme A.

    you are not limited to any artificial limitation, there could be say 30 different makes of fitting that do the same job you customer doesn't neccesarily care who makes the part as long as they get it.

    you can reference obsolescent parts, eg product DF7 has been repalced by GHY
    I'd rather be riding on the Tiger 800 or the Norton

  5. #5
    Join Date
    Mar 2004
    Location
    Huntington, IN
    Posts
    54

    equivalence table

    healdem:

    Thank you for your great response. You are describing exactly what I am pursuing.

    I confess that I'm not sure the definition of an 'equivalence table.' I did some searching (yahoo, wiki, dbforums) and found nothing.

    What does the structure of an equivalence Table look like, and how would be created?

    Thank you.

    Matt

  6. #6
    Join Date
    Nov 2007
    Location
    Adelaide, South Australia
    Posts
    4,049
    Same as a normal table, it records what I suggested in having a table that lists the similar products for each product. Like it would have ProductID and SimilarProductID and that's it. Data in it makes it quick and easy to list the similar products.

    Your process that takes ages would only then need to update this data and be updated infrequently rather than every time you need to list those similar products.

    It's nothing magical to google, just a concept on how to deal with your problem.
    Owner and Manager of
    CypherBYTE, Microsoft Access Development Specialists.
    Microsoft Access MCP.
    And all around nice guy!


    "Heck it's something understood by accountants ... so it can't be 'that' difficult..." -- Healdem
    "...teach a man to code and he'll be frustrated for life! " -- georgev

  7. #7
    Join Date
    Nov 2007
    Location
    Adelaide, South Australia
    Posts
    4,049
    I think you'd have to "weight" each spec too. The fact it might be blue is not as significant as the fact that it's a 2" castor... if you know what I mean.
    Owner and Manager of
    CypherBYTE, Microsoft Access Development Specialists.
    Microsoft Access MCP.
    And all around nice guy!


    "Heck it's something understood by accountants ... so it can't be 'that' difficult..." -- Healdem
    "...teach a man to code and he'll be frustrated for life! " -- georgev

  8. #8
    Join Date
    Mar 2004
    Location
    Huntington, IN
    Posts
    54
    You're exactly right, StarTrekker! I am weighting the various specs - I just didn't add that to the thread for simplicity.

    I understand the table concept which keeps the processing down but am taken back to my original question. Is there a simple way of runnign the comparisons and systematically (with weighted comparisons) determining the best matches, for storage in the table?

    Currently, the code takes item 1, compares it to the other 2500 parts, and if the weighted similarity 'score' is xx or hgher, then store it in the table. Then, item 2...

    It can be streamlined a bit by aborting the process as soon as it is determined that the compared item is a lousy match. So...Is there a more efficient method?

    Thanks!
    Matt Wilcox
    Casters & Wheels at www.apollocaster.com
    www.apollocaster.com/store

  9. #9
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    sorry equivalence
    table: linkedproducts
    SourceProd 'fk points to product table
    RealtedProd 'fk points to product table
    weightingfactor

    so to find products that are related to Sourceprod

    select realtedprod from linkedproducts where sourceprod = whatever order by weighting

    if your productid is text/string then encapsualte the parameter as "whatever"
    I'd rather be riding on the Tiger 800 or the Norton

  10. #10
    Join Date
    Nov 2007
    Location
    Adelaide, South Australia
    Posts
    4,049
    I can't think of a more efficient method atm, but I am kinda busy and can't give it much thought...
    Owner and Manager of
    CypherBYTE, Microsoft Access Development Specialists.
    Microsoft Access MCP.
    And all around nice guy!


    "Heck it's something understood by accountants ... so it can't be 'that' difficult..." -- Healdem
    "...teach a man to code and he'll be frustrated for life! " -- georgev

  11. #11
    Join Date
    Mar 2004
    Location
    Huntington, IN
    Posts
    54
    Well, I did it and you expert programmers might shake your head at the code but I'm pretty pleased with it.

    Do while not rs1.eof (all records to find sim products for)
    -Do while not rs2.eof (near copy of database to compare each rs1 item to.
    --For x = 1 to TotalFactorsToConsider
    ---Test each factor for similarity. if match, add the weighted score to the "SimFactor" accum score.
    ----For efficiency, if after x loops, SimFactor has not exceeded xx % of total possible, then abort now and save time.
    --Next X
    --Have now analyzed all factors for this rs2 'compare' item.
    --Adjust SimFactor for this rs2 item for other factors, such as is it in stock?, does it cost more? (for upselling), etc.
    --Store this rs2 item and SimFactor in a temp table.
    -Loop rs2
    -Now have temp table of all rs2 items that were good compares for rs1 item.
    -Make this temp table a new rs3, ORDERED by SimFactor and selecting top 15 scores for output/ upload to website.
    -Write twice: once with rs2 items as cross-sells for rs1 and again for rs1 item as a cross-sell for each rs2.
    -With the second write just discussed, can Redefine rs2 to be all parts > than current rs1 item, that is, all parts except for rs1 already analyzed. Cuts processing in about half.
    Loop rs1
    Show stats, accumulators, time incurred. etc.

    So far, with limited auditing, appears to work effectively.

    I have about 2550 parts, so total loops are 2550 squared, or 6.5 million, divided by 2 since i'm writing twice with each item, or 3.2 million. This times the number of factors that it loops through and considers. That probably averages 4 so doing about 13 million loops. It takes about 3 hours to complete.

    I'll prob add coding so that new items can be processed individually without having to run the whole thing again.

    Maybe this will be hepful to someone else in some way. Or, if anyone sees a more efficient way of doing this, that would be appreciated.

    Thank you.

    Matt
    Matt Wilcox
    Casters & Wheels at www.apollocaster.com
    www.apollocaster.com/store

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •