If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Help with my design

Reply
 
LinkBack Thread Tools Display Modes
  #1 (permalink)  
Old 08-03-10, 12:32
jpsnow jpsnow is offline
Registered User
 
Join Date: Aug 2010
Posts: 5
Help with my design

Hello all. I am new to the forums and fairly new to database design in general.

I am working on a project and having trouble developing a database design that will be efficient to search through.

The overall goal of the project will be to take in many lists of items and then store the likelihood of having different items. Here is an example:

List 1: A, B, C, D
List 2: A, B, G, L
List 3: A, B, G, L
List 4: B, C, D, L

Now I want to determine the likelihood of having item B & A being together...
So it would look like this:

A-B - 3 (3 because in 3/4 items A and B are together)....
A-C - 1 (1 because A and C are both present only in list one)
A-D - 1 (List 1 only)
A-G - 2 (List 2 and 3)
A-L - 2
B-C - 2
B-D - 2.... etc.

So I considered doing this just as I am showing above. Keeping a table of all of these different combinations, but there may be 100,000 different items and lists. If I am solving this right... if there are 100,000 items.. there would be 100,000 x 99,999 combinations in my table. This would take forever to sort.

In the end I want to be able to search for all combinations that contain a particular letter. So if I searched for A, I would get back A-B-3, A-C -1, A-D -1, A-G-2, and A-L -2.

Does anyone have any recommendations to make this more efficient?

Thanks for your help.

Justin
Reply With Quote
  #2 (permalink)  
Old 08-03-10, 12:39
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Hi Justin

I think it would be beneficial for you to outline your real scenario. This is abstracted and looks like one of those instances where crucial information is lost in the abstraction.
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #3 (permalink)  
Old 08-03-10, 17:23
jpsnow jpsnow is offline
Registered User
 
Join Date: Aug 2010
Posts: 5
Ok, I will tell you a more realistic version and hopefully that will help.

Users can make music playlists. Each user will have their own playlist. I want to give users song recommendations based on a database of all of these playlists.

The users will select a song from their playlist and other songs will be recommended to them based upon the database of other users.


So... a user likes the song "Fly"

The database has the following playlists from other users:

Playlist 1: "Fly", "X", "Y", "Z"
Playlist 2: "X", "B", "Q", "L"
Playlist 3: "Fly", "Z", "Y", "Q"
Playlist 4: "Fly", "Z", "Q", "L"

It would recommend the song: Z because 3 other users who like "Fly" also like "Z".

The database I considered looks like this:

Song A Song B Num
"Fly" X 1 - only found in 1st playlist
"Fly" Y 2 - combo is in playlist 1 and 3
"Fly" Q 1 - only in playlist 4
"Fly" Z 3
...
X Y 1
L Y 0 - no combinations




Now as I said before I may have 100,000 or more different playlists and 100,000 or more different songs, so doing it my way I would have 100,000 x 99,999 different entries which would be very inneficient.

Any suggestions
Reply With Quote
  #4 (permalink)  
Old 08-04-10, 04:45
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
So the column headings for this table would be something like:
Code:
PlayListID | Track1 | Track2 | Track3 | Track4
Is that correct?
Also, please could you let me know your RDBMS?
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #5 (permalink)  
Old 08-04-10, 08:54
jpsnow jpsnow is offline
Registered User
 
Join Date: Aug 2010
Posts: 5
No, because my list of tracks will be incredibly large. It would have the following headings. Playlist ID, songs, userid. This would be my playlist table. I would also have a user table. Beyond that, am not sure. But with what I have, I want to take in a single song name and sort through the playlist table to find other songs to recommend.

The program then will search in playlists that contain "this song" and discover that 95/100 contain "that song". This is a high percent, so "that song" is recommended.

I'm not sure what you mean by rdms? I use mySQL and will be communicating with PHP
Reply With Quote
  #6 (permalink)  
Old 08-04-10, 09:01
jpsnow jpsnow is offline
Registered User
 
Join Date: Aug 2010
Posts: 5
I was also considering a combinations table. You will see my second table in my second post. It would have song 1, song 2, num playlists with this combo.

When a new playlist is created, it would add to the table if there is a new song combination (ex song a and song f in the same playlist) or it would simply update the third field if the combination is already in that table. This would work; but with 100,000 unique songs that would make an incredibly large number of items to search through.
Reply With Quote
  #7 (permalink)  
Old 08-04-10, 09:24
Pat Phelan Pat Phelan is online now
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,308
Let's try a very different arrangement and normalize this beastie. I think you'll like it!

One table for songs. It has one row for each song with details specific to that song like name, runtime, and possibly foreign keys to artist and genre.

One table for playlists. It has one row for each playlist with details like name, a foreign key to the playlist owner, and dates for the beginning and end of this playlist's life.

A new table for playlistsongs. It has just the details about a single song on one playlist like the Foreign Key for the playlist and the song.

Once you figure out the value for the question mark in this code, this schema allows you to run something like:
Code:
SELECT Count(*) AS score, songs.name
   FROM playlists AS myPlayLists
   JOIN playlists AS otherPlayLists
      ON (otherPlayLists.songPK = myPlayLists.songPK)
   JOIN songs
      ON (songs.songPK = otherPlayLists.songPK)
   WHERE  myPlayLists.ownerPK = ?
   GROUP BY songs.songPK
   ORDER BY Count(*) DESC, songs.name
This SQL snippet will scan every song on your playlists, match that song on every playlist that contains it, and summarize based on the number of ocurrences of other songs on playlists that contain songs on your playlists. It ought to run fairly quickly as along as your tables are indexed reasonably well.

-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
Reply With Quote
  #8 (permalink)  
Old 08-04-10, 09:27
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
Ok - Pat basically went in exactly the same direction as me.
This is my code - broadly the same in the important areas.
Code:
USE test
GO

IF EXISTS (SELECT NULL FROM sys.tables WHERE object_id = OBJECT_ID('dbo.play_list')) 
BEGIN
    DROP TABLE dbo.play_list
END

CREATE TABLE dbo.play_list
    (
        playlist_name       VARCHAR(50)     COLLATE Latin1_General_CI_AS    NOT NULL
      , track_ordinal       TINYINT                                         NOT NULL 
      , track               VARCHAR(90)     COLLATE Latin1_General_CI_AS    NOT NULL
      , CONSTRAINT pk_play_list PRIMARY KEY CLUSTERED (playlist_name, track_ordinal) WITH (FILLFACTOR = 100)
      , CONSTRAINT ix_play_list_track_ordinal_track_u_nc UNIQUE NONCLUSTERED (playlist_name, track) WITH (FILLFACTOR = 100)
    ) ON [PRIMARY]
GO

INSERT INTO dbo.play_list
    (
        playlist_name
      , track_ordinal
      , track
    )
VALUES  ('Playlist 1', 1, 'Fly')
      , ('Playlist 1', 2, 'X')
      , ('Playlist 1', 3, 'Y')
      , ('Playlist 1', 4, 'Z')
      , ('Playlist 2', 1, 'X')
      , ('Playlist 2', 2, 'B')
      , ('Playlist 2', 3, 'Q')
      , ('Playlist 2', 4, 'L')
      , ('Playlist 3', 1, 'Fly')
      , ('Playlist 3', 2, 'Z')
      , ('Playlist 3', 3, 'Y')
      , ('Playlist 3', 4, 'Q')
      , ('Playlist 4', 1, 'Fly')
      , ('Playlist 4', 2, 'Z')
      , ('Playlist 4', 3, 'Q')
      , ('Playlist 4', 4, 'L')

SELECT  TOP 5
        recomended_track                = recomendations.track
      , recomendataion_weight           = COUNT(*)
FROM    dbo.play_list                   AS base
INNER JOIN 
        dbo.play_list                   AS recomendations
ON  recomendations.playlist_name        = base.playlist_name
WHERE   base.track                      = 'Fly'
    AND recomendations.track           != 'Fly'
    AND recomendations.playlist_name   != 'Playlist 3'/*Assuming the person doing the search compiled this playlist:
                                                        we would not want to return any tracks from it in the results.*/
GROUP BY
        recomendations.track
ORDER BY
        COUNT(*) DESC
It is T-SQL (SQL Server) syntax but the important stuff is compatible with mySQL. The one thing that occurred to me was that the recommendation algorithm might be more sophisticated (for example proportion of playlists that contain the recommended song rather than a straight count).
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #9 (permalink)  
Old 08-04-10, 09:38
Pat Phelan Pat Phelan is online now
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,308
Poots and I are pretty much on the same page here. Get the general idea working, then you'll probably want to get a bit hinkey with your suggestion calcuation. One thing I was going to suggest (in a later post) is that those genre and artist FK values that I mentioned might contribute to that score as well, and another point would be that tracking when songs entered and left the playlist might give you strong indications of musical taste too.

There are lots of things you can explore about musical taste, especially when you have decent samples of a listner population!

-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
Reply With Quote
  #10 (permalink)  
Old 08-04-10, 09:47
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
The single most important change Pat and I made is to the table design - there is one row per song in the playlist instead of one row per playlist. This is what Pat meant by normalising.
First normal form - Wikipedia, the free encyclopedia
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #11 (permalink)  
Old 08-05-10, 17:23
jpsnow jpsnow is offline
Registered User
 
Join Date: Aug 2010
Posts: 5
Thanks a lot guys! I am going to need to look at.it a.little more but I am on vacation now. I'll.let you know how it goes.
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On