Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Jan 2002
    Posts
    189

    keyword tagging... how to implement?

    I'm trying to implement keyword-tagging for an image upload facility: users will be able to upload an image and enter a comma-separated list of descriptive keywords. It will be built in PHP, talking to a MySQL database. I'm assuming that the table structure below will pretty much cover it, but I have some questions about how it'll work.

    Say I have:

    tblImages
    id
    filename

    tblKeywords
    id
    keyword

    tblImageKeywords
    imageID
    keywordID

    I guess the main question is how I stop lots of duplicates going into the keywords table. Someone uploads an image, the keywords are dropped into their table, and the ID's are added to tblImageKeywords. Fine. But then someone else adds a new image with some of the same keywords: I don't want to add those keywords again, but I can't do something like INSERT IGNORE because I need to get the ID of the existing keyword so I can create the entry in tblImageKeywords for that keyword against the new image.

    This sounds like a fairly common facility, but none of the examples I've seen really answer my questions. Can anyone point me in the right direction?

  2. #2
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    images (image_id, filename)
    keywords (image_id, keyword)

    No need for the surrogate key for keywords.
    George
    Home | Blog

  3. #3
    Join Date
    Jan 2002
    Posts
    189
    But... doesn't that destroy the many-to-many-ness of it? What if I need one keyword being applied to several images? I'd have to put the same keyword in with lots of image ID's.

  4. #4
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Code:
    SELECT images.*
    FROM   images
     INNER
      JOIN keywords
        ON images.image_id = keywords.image_id
    WHERE  keywords.keyword = 'TEST'
    George
    Home | Blog

  5. #5
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    This post is worth a read, it's pretty much spot on what you're looking at:

    http://www.dbforums.com/database-con...ml#post6383882
    George
    Home | Blog

  6. #6
    Join Date
    Jan 2002
    Posts
    189
    Yeah, I was just reading that, thinking "this is what I'm on about, but it still makes no sense"

    Something in me is having issues with lots of rows in the keywords table all having the same keyword in, albeit with different image ID's. I keep thinking I need one record per keyword, with another table that joins it to images. But I get Rudy's point in that thread about it merely shifting the redundancy around rather than eliminating it. Thanks for the help

  7. #7
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    A table of "just" keywords might be useful if:
    a) you want to restrict the keywords used to a list of pre-defined ones, and/or
    b) you want to present the user with a list of previously used keywords

    It still doesn't require any numeric, surrogate keyage though:

    images (image_id, filename)
    keywords (keyword)
    image_keywords (image_id references images, keyword references keywords)

  8. #8
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    b) keywords (image_id, keyword)
    Code:
    SELECT DISTINCT
           keyword
    FROM   keywords
    and to extend this, how about popularity of keywords while we're at it
    Code:
    SELECT keyword
         , Count(*) As times_used
    FROM   keywords
    GROUP
        BY keyword
    George
    Home | Blog

  9. #9
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I would also have a separate Keywords table holding allowed keywords. If you don't have a separate keywords table then you'll end up with a table full of garbage, garbige, rubbish, rubish, waste and waist. You could expand this a little bit by having a few fields that might allow users to propose new keywords and later have them approved (or not) by the owner ie

    images ( image_id, filename )
    keywords ( keyword, approved )
    image_keywords ( image_id , keyword )
    Last edited by mike_bike_kite; 03-02-09 at 15:07.

  10. #10
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    House
    HOuse
    HOUSE
    house
    ...
    All should be a single keyword.

    I tend to use the intersection table for a many-to-many relationship in this type of situation also.

    Spudhead:
    • Insert the file path into the images table
    • Limit the keywords to a single case.
    • Add a unique key/index to the keywords.keyword field.
    • Create a php procedure to add the keyword and return the ID for it. The procedure first looks for the keyword in the keywords table. If it is found, the sub returns the ID of the word it found. If not, it inserts the new keyword, and returns the key for the new keyword.
    • Finally, in your intersection table, insert the key pair (fileID and keywordID)
    Last edited by loquin; 03-02-09 at 15:47.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  11. #11
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by loquin
    House
    HOuse
    HOUSE
    house
    ...
    All should be a single keyword.

    I tend to use the intersection table for a many-to-many relationship in this type of situation also.

    Spudhead:
    • Insert the file path into the images table
    • Limit the keywords to a single case.
    • Add a unique key/index to the keywords.keyword field.
    • Create a php procedure to add the keyword and return the ID for it. The procedure first looks for the keyword in the keywords table. If it is found, the sub returns the ID of the word it found. If not, it inserts the new keyword, and returns the key for the new keyword.
    • Finally, in your intersection table, insert the key pair (fileID and keywordID)
    Would you store garbage, garbige, rubbish, rubish, waste and waist as separate keywords in the database then even though the users were all just talking rubbish?

    Why do you have the id field on keywords in your method - does it provide any advantages over just using the keyword as the keyword?

    In fact why have the keywords table at all in your example if you're going to store each and every word that gets typed in - why not just have a long varchar field in the image table and store all the keywords for an image in there?

    EDIT: this post looks more aggressive than it's meant to be. I am just curious why you chose those options.
    Last edited by mike_bike_kite; 03-02-09 at 16:28.

  12. #12
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    I generally won't try to correct others spelling or usage of keywords, even though they may be 'rubbesh.' Waist and waste are both potentially valid keywords, as are a good many acronyms. If I feel the need to limit keywords, I'll be more apt to apply a censored words list, rather than force someone to approve every new word before it's used.

    Using an integer type ID field is useful when joining the tables, for performance, especially as the keyword table gets large. In addition, by not including an ID field as an alternate key, you're forced to use the word in the intersection table, which can help to bloat the database. (The word or its key will be duplicated many times. Duplicating the key N+1 times will almost certainly take less storage than duplicating the word itself, N times. A Soundex field might also be useful in the keywords table. (yes, I know that most databases offer a built-in Soundex function, but, string functions can be expensive to run over and over.)
    Last edited by loquin; 03-02-09 at 16:40.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  13. #13
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I guess it all depends on how many key words you're expecting. Obviously if you store any word as a keyword along with any alternate word and all the combined misspellings then you'll end up with a hell of a lot of keywords. Perhaps the id field might then be useful then to make things faster.

    Your method would certainly require no maintenance which is a big advantage. The only issue would be that a user looking for images associated with the keyword rubbish won't find any of the images where the keyword is an alternate word or has simply been misspelt. I'd expect this sort of problem to get bigger as time went by.

  14. #14
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    True. One approach to help with this would be to 'suggest' spelling using the keyword look-up as the user enters their potential keyword.

    Quote Originally Posted by mike_bike_kite
    I guess it all depends on how many key words you're expecting. Obviously if you store any word as a keyword along with any alternate word and all the combined misspellings then you'll end up with a hell of a lot of keywords. Perhaps the id field might then be useful then to make things faster.
    the last time I set this approach up was with a test file word indexer. Windows file search is totally inadequate when it comes to word searches within text (and source code) files. So, I wrote an app to index all the text & source code files on my drives. It took a day to index the first time, but, I could then find the files which contained a given keyword in under a second. I just had to periodically index the files with a timestamp greater than the last index timestamp.
    Last edited by loquin; 03-02-09 at 17:01.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  15. #15
    Join Date
    Jan 2002
    Posts
    189
    I feel like a small child who's just asked a conference of obstetricians where babies come from

    So... umm... to recap.... my initial solution will do, although there's some debate about whether or not it's best to have an integer ID or the actual keyword as the PK in the keywords table, and the deciding factor is how big the list of keywords is likely to get? Is that about right?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •