| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

08-03-10, 12:32
|
|
Registered User
|
|
Join Date: Aug 2010
Posts: 5
|
|
|
Help with my design
|
|
Hello all. I am new to the forums and fairly new to database design in general.
I am working on a project and having trouble developing a database design that will be efficient to search through.
The overall goal of the project will be to take in many lists of items and then store the likelihood of having different items. Here is an example:
List 1: A, B, C, D
List 2: A, B, G, L
List 3: A, B, G, L
List 4: B, C, D, L
Now I want to determine the likelihood of having item B & A being together...
So it would look like this:
A-B - 3 (3 because in 3/4 items A and B are together)....
A-C - 1 (1 because A and C are both present only in list one)
A-D - 1 (List 1 only)
A-G - 2 (List 2 and 3)
A-L - 2
B-C - 2
B-D - 2.... etc.
So I considered doing this just as I am showing above. Keeping a table of all of these different combinations, but there may be 100,000 different items and lists. If I am solving this right... if there are 100,000 items.. there would be 100,000 x 99,999 combinations in my table. This would take forever to sort.
In the end I want to be able to search for all combinations that contain a particular letter. So if I searched for A, I would get back A-B-3, A-C -1, A-D -1, A-G-2, and A-L -2.
Does anyone have any recommendations to make this more efficient?
Thanks for your help.
Justin
|
|

08-03-10, 12:39
|
|
King of Understatement
|
|
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
|
|
Hi Justin
I think it would be beneficial for you to outline your real scenario. This is abstracted and looks like one of those instances where crucial information is lost in the abstraction.
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
|
|
|

08-03-10, 17:23
|
|
Registered User
|
|
Join Date: Aug 2010
Posts: 5
|
|
|
|
Ok, I will tell you a more realistic version and hopefully that will help.
Users can make music playlists. Each user will have their own playlist. I want to give users song recommendations based on a database of all of these playlists.
The users will select a song from their playlist and other songs will be recommended to them based upon the database of other users.
So... a user likes the song "Fly"
The database has the following playlists from other users:
Playlist 1: "Fly", "X", "Y", "Z"
Playlist 2: "X", "B", "Q", "L"
Playlist 3: "Fly", "Z", "Y", "Q"
Playlist 4: "Fly", "Z", "Q", "L"
It would recommend the song: Z because 3 other users who like "Fly" also like "Z".
The database I considered looks like this:
Song A Song B Num
"Fly" X 1 - only found in 1st playlist
"Fly" Y 2 - combo is in playlist 1 and 3
"Fly" Q 1 - only in playlist 4
"Fly" Z 3
...
X Y 1
L Y 0 - no combinations
Now as I said before I may have 100,000 or more different playlists and 100,000 or more different songs, so doing it my way I would have 100,000 x 99,999 different entries which would be very inneficient.
Any suggestions
|
|

08-04-10, 04:45
|
|
King of Understatement
|
|
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
|
|
So the column headings for this table would be something like:
Code:
PlayListID | Track1 | Track2 | Track3 | Track4
Is that correct?
Also, please could you let me know your RDBMS?
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
|
|
|

08-04-10, 08:54
|
|
Registered User
|
|
Join Date: Aug 2010
Posts: 5
|
|
No, because my list of tracks will be incredibly large. It would have the following headings. Playlist ID, songs, userid. This would be my playlist table. I would also have a user table. Beyond that, am not sure. But with what I have, I want to take in a single song name and sort through the playlist table to find other songs to recommend.
The program then will search in playlists that contain "this song" and discover that 95/100 contain "that song". This is a high percent, so "that song" is recommended.
I'm not sure what you mean by rdms? I use mySQL and will be communicating with PHP
|
|

08-04-10, 09:01
|
|
Registered User
|
|
Join Date: Aug 2010
Posts: 5
|
|
I was also considering a combinations table. You will see my second table in my second post. It would have song 1, song 2, num playlists with this combo.
When a new playlist is created, it would add to the table if there is a new song combination (ex song a and song f in the same playlist) or it would simply update the third field if the combination is already in that table. This would work; but with 100,000 unique songs that would make an incredibly large number of items to search through.
|
|

08-04-10, 09:24
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,308
|
|
Let's try a very different arrangement and normalize this beastie. I think you'll like it!
One table for songs. It has one row for each song with details specific to that song like name, runtime, and possibly foreign keys to artist and genre.
One table for playlists. It has one row for each playlist with details like name, a foreign key to the playlist owner, and dates for the beginning and end of this playlist's life.
A new table for playlistsongs. It has just the details about a single song on one playlist like the Foreign Key for the playlist and the song.
Once you figure out the value for the question mark in this code, this schema allows you to run something like:
Code:
SELECT Count(*) AS score, songs.name
FROM playlists AS myPlayLists
JOIN playlists AS otherPlayLists
ON (otherPlayLists.songPK = myPlayLists.songPK)
JOIN songs
ON (songs.songPK = otherPlayLists.songPK)
WHERE myPlayLists.ownerPK = ?
GROUP BY songs.songPK
ORDER BY Count(*) DESC, songs.name
This SQL snippet will scan every song on your playlists, match that song on every playlist that contains it, and summarize based on the number of ocurrences of other songs on playlists that contain songs on your playlists. It ought to run fairly quickly as along as your tables are indexed reasonably well.
-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
|
|

08-04-10, 09:27
|
|
King of Understatement
|
|
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
|
|
Ok - Pat basically went in exactly the same direction as me.
This is my code - broadly the same in the important areas.
Code:
USE test
GO
IF EXISTS (SELECT NULL FROM sys.tables WHERE object_id = OBJECT_ID('dbo.play_list'))
BEGIN
DROP TABLE dbo.play_list
END
CREATE TABLE dbo.play_list
(
playlist_name VARCHAR(50) COLLATE Latin1_General_CI_AS NOT NULL
, track_ordinal TINYINT NOT NULL
, track VARCHAR(90) COLLATE Latin1_General_CI_AS NOT NULL
, CONSTRAINT pk_play_list PRIMARY KEY CLUSTERED (playlist_name, track_ordinal) WITH (FILLFACTOR = 100)
, CONSTRAINT ix_play_list_track_ordinal_track_u_nc UNIQUE NONCLUSTERED (playlist_name, track) WITH (FILLFACTOR = 100)
) ON [PRIMARY]
GO
INSERT INTO dbo.play_list
(
playlist_name
, track_ordinal
, track
)
VALUES ('Playlist 1', 1, 'Fly')
, ('Playlist 1', 2, 'X')
, ('Playlist 1', 3, 'Y')
, ('Playlist 1', 4, 'Z')
, ('Playlist 2', 1, 'X')
, ('Playlist 2', 2, 'B')
, ('Playlist 2', 3, 'Q')
, ('Playlist 2', 4, 'L')
, ('Playlist 3', 1, 'Fly')
, ('Playlist 3', 2, 'Z')
, ('Playlist 3', 3, 'Y')
, ('Playlist 3', 4, 'Q')
, ('Playlist 4', 1, 'Fly')
, ('Playlist 4', 2, 'Z')
, ('Playlist 4', 3, 'Q')
, ('Playlist 4', 4, 'L')
SELECT TOP 5
recomended_track = recomendations.track
, recomendataion_weight = COUNT(*)
FROM dbo.play_list AS base
INNER JOIN
dbo.play_list AS recomendations
ON recomendations.playlist_name = base.playlist_name
WHERE base.track = 'Fly'
AND recomendations.track != 'Fly'
AND recomendations.playlist_name != 'Playlist 3'/*Assuming the person doing the search compiled this playlist:
we would not want to return any tracks from it in the results.*/
GROUP BY
recomendations.track
ORDER BY
COUNT(*) DESC
It is T-SQL (SQL Server) syntax but the important stuff is compatible with mySQL. The one thing that occurred to me was that the recommendation algorithm might be more sophisticated (for example proportion of playlists that contain the recommended song rather than a straight count).
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
|
|
|

08-04-10, 09:38
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,308
|
|
Poots and I are pretty much on the same page here. Get the general idea working, then you'll probably want to get a bit hinkey with your suggestion calcuation. One thing I was going to suggest (in a later post) is that those genre and artist FK values that I mentioned might contribute to that score as well, and another point would be that tracking when songs entered and left the playlist might give you strong indications of musical taste too.
There are lots of things you can explore about musical taste, especially when you have decent samples of a listner population!
-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
|
|

08-04-10, 09:47
|
|
King of Understatement
|
|
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
|
|
The single most important change Pat and I made is to the table design - there is one row per song in the playlist instead of one row per playlist. This is what Pat meant by normalising.
First normal form - Wikipedia, the free encyclopedia
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
|
|
|

08-05-10, 17:23
|
|
Registered User
|
|
Join Date: Aug 2010
Posts: 5
|
|
Thanks a lot guys! I am going to need to look at.it a.little more but I am on vacation now. I'll.let you know how it goes.
|
|
| Thread Tools |
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|