If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > keyword tagging... how to implement?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-02-09, 11:10
Spudhead Spudhead is offline
Registered User
 
Join Date: Jan 2002
Posts: 189
keyword tagging... how to implement?

I'm trying to implement keyword-tagging for an image upload facility: users will be able to upload an image and enter a comma-separated list of descriptive keywords. It will be built in PHP, talking to a MySQL database. I'm assuming that the table structure below will pretty much cover it, but I have some questions about how it'll work.

Say I have:

tblImages
id
filename

tblKeywords
id
keyword

tblImageKeywords
imageID
keywordID

I guess the main question is how I stop lots of duplicates going into the keywords table. Someone uploads an image, the keywords are dropped into their table, and the ID's are added to tblImageKeywords. Fine. But then someone else adds a new image with some of the same keywords: I don't want to add those keywords again, but I can't do something like INSERT IGNORE because I need to get the ID of the existing keyword so I can create the entry in tblImageKeywords for that keyword against the new image.

This sounds like a fairly common facility, but none of the examples I've seen really answer my questions. Can anyone point me in the right direction?
Reply With Quote
  #2 (permalink)  
Old 03-02-09, 11:17
gvee gvee is offline
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
images (image_id, filename)
keywords (image_id, keyword)

No need for the surrogate key for keywords.
__________________
George
Twitter | Blog
Reply With Quote
  #3 (permalink)  
Old 03-02-09, 11:28
Spudhead Spudhead is offline
Registered User
 
Join Date: Jan 2002
Posts: 189
But... doesn't that destroy the many-to-many-ness of it? What if I need one keyword being applied to several images? I'd have to put the same keyword in with lots of image ID's.
Reply With Quote
  #4 (permalink)  
Old 03-02-09, 11:35
gvee gvee is offline
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
Code:
SELECT images.*
FROM   images
 INNER
  JOIN keywords
    ON images.image_id = keywords.image_id
WHERE  keywords.keyword = 'TEST'
__________________
George
Twitter | Blog
Reply With Quote
  #5 (permalink)  
Old 03-02-09, 11:36
gvee gvee is offline
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
This post is worth a read, it's pretty much spot on what you're looking at:

"id" attribute in db tables...
__________________
George
Twitter | Blog
Reply With Quote
  #6 (permalink)  
Old 03-02-09, 12:09
Spudhead Spudhead is offline
Registered User
 
Join Date: Jan 2002
Posts: 189
Yeah, I was just reading that, thinking "this is what I'm on about, but it still makes no sense"

Something in me is having issues with lots of rows in the keywords table all having the same keyword in, albeit with different image ID's. I keep thinking I need one record per keyword, with another table that joins it to images. But I get Rudy's point in that thread about it merely shifting the redundancy around rather than eliminating it. Thanks for the help
Reply With Quote
  #7 (permalink)  
Old 03-02-09, 12:47
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 5,171
A table of "just" keywords might be useful if:
a) you want to restrict the keywords used to a list of pre-defined ones, and/or
b) you want to present the user with a list of previously used keywords

It still doesn't require any numeric, surrogate keyage though:

images (image_id, filename)
keywords (keyword)
image_keywords (image_id references images, keyword references keywords)
__________________
Tony Andrews
http://tinyurl.com/tonyandrews
Reply With Quote
  #8 (permalink)  
Old 03-02-09, 13:27
gvee gvee is offline
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
b) keywords (image_id, keyword)
Code:
SELECT DISTINCT
       keyword
FROM   keywords
and to extend this, how about popularity of keywords while we're at it
Code:
SELECT keyword
     , Count(*) As times_used
FROM   keywords
GROUP
    BY keyword
__________________
George
Twitter | Blog
Reply With Quote
  #9 (permalink)  
Old 03-02-09, 14:03
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
I would also have a separate Keywords table holding allowed keywords. If you don't have a separate keywords table then you'll end up with a table full of garbage, garbige, rubbish, rubish, waste and waist. You could expand this a little bit by having a few fields that might allow users to propose new keywords and later have them approved (or not) by the owner ie

images ( image_id, filename )
keywords ( keyword, approved )
image_keywords ( image_id , keyword )

Last edited by mike_bike_kite; 03-02-09 at 14:07.
Reply With Quote
  #10 (permalink)  
Old 03-02-09, 14:44
loquin loquin is offline
Super Moderator
 
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
House
HOuse
HOUSE
house
...
All should be a single keyword.

I tend to use the intersection table for a many-to-many relationship in this type of situation also.

Spudhead:
  • Insert the file path into the images table
  • Limit the keywords to a single case.
  • Add a unique key/index to the keywords.keyword field.
  • Create a php procedure to add the keyword and return the ID for it. The procedure first looks for the keyword in the keywords table. If it is found, the sub returns the ID of the word it found. If not, it inserts the new keyword, and returns the key for the new keyword.
  • Finally, in your intersection table, insert the key pair (fileID and keywordID)
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert


Last edited by loquin; 03-02-09 at 14:47.
Reply With Quote
  #11 (permalink)  
Old 03-02-09, 15:01
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by loquin
House
HOuse
HOUSE
house
...
All should be a single keyword.

I tend to use the intersection table for a many-to-many relationship in this type of situation also.

Spudhead:
  • Insert the file path into the images table
  • Limit the keywords to a single case.
  • Add a unique key/index to the keywords.keyword field.
  • Create a php procedure to add the keyword and return the ID for it. The procedure first looks for the keyword in the keywords table. If it is found, the sub returns the ID of the word it found. If not, it inserts the new keyword, and returns the key for the new keyword.
  • Finally, in your intersection table, insert the key pair (fileID and keywordID)
Would you store garbage, garbige, rubbish, rubish, waste and waist as separate keywords in the database then even though the users were all just talking rubbish?

Why do you have the id field on keywords in your method - does it provide any advantages over just using the keyword as the keyword?

In fact why have the keywords table at all in your example if you're going to store each and every word that gets typed in - why not just have a long varchar field in the image table and store all the keywords for an image in there?

EDIT: this post looks more aggressive than it's meant to be. I am just curious why you chose those options.

Last edited by mike_bike_kite; 03-02-09 at 15:28.
Reply With Quote
  #12 (permalink)  
Old 03-02-09, 15:29
loquin loquin is offline
Super Moderator
 
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
I generally won't try to correct others spelling or usage of keywords, even though they may be 'rubbesh.' Waist and waste are both potentially valid keywords, as are a good many acronyms. If I feel the need to limit keywords, I'll be more apt to apply a censored words list, rather than force someone to approve every new word before it's used.

Using an integer type ID field is useful when joining the tables, for performance, especially as the keyword table gets large. In addition, by not including an ID field as an alternate key, you're forced to use the word in the intersection table, which can help to bloat the database. (The word or its key will be duplicated many times. Duplicating the key N+1 times will almost certainly take less storage than duplicating the word itself, N times. A Soundex field might also be useful in the keywords table. (yes, I know that most databases offer a built-in Soundex function, but, string functions can be expensive to run over and over.)
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert


Last edited by loquin; 03-02-09 at 15:40.
Reply With Quote
  #13 (permalink)  
Old 03-02-09, 15:46
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
I guess it all depends on how many key words you're expecting. Obviously if you store any word as a keyword along with any alternate word and all the combined misspellings then you'll end up with a hell of a lot of keywords. Perhaps the id field might then be useful then to make things faster.

Your method would certainly require no maintenance which is a big advantage. The only issue would be that a user looking for images associated with the keyword rubbish won't find any of the images where the keyword is an alternate word or has simply been misspelt. I'd expect this sort of problem to get bigger as time went by.
Reply With Quote
  #14 (permalink)  
Old 03-02-09, 15:53
loquin loquin is offline
Super Moderator
 
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
True. One approach to help with this would be to 'suggest' spelling using the keyword look-up as the user enters their potential keyword.

Quote:
Originally Posted by mike_bike_kite
I guess it all depends on how many key words you're expecting. Obviously if you store any word as a keyword along with any alternate word and all the combined misspellings then you'll end up with a hell of a lot of keywords. Perhaps the id field might then be useful then to make things faster.
the last time I set this approach up was with a test file word indexer. Windows file search is totally inadequate when it comes to word searches within text (and source code) files. So, I wrote an app to index all the text & source code files on my drives. It took a day to index the first time, but, I could then find the files which contained a given keyword in under a second. I just had to periodically index the files with a timestamp greater than the last index timestamp.
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert


Last edited by loquin; 03-02-09 at 16:01.
Reply With Quote
  #15 (permalink)  
Old 03-03-09, 05:25
Spudhead Spudhead is offline
Registered User
 
Join Date: Jan 2002
Posts: 189
I feel like a small child who's just asked a conference of obstetricians where babies come from

So... umm... to recap.... my initial solution will do, although there's some debate about whether or not it's best to have an integer ID or the actual keyword as the PK in the keywords table, and the deciding factor is how big the list of keywords is likely to get? Is that about right?
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On