# Thread: Help with my design

1. Registered User
Join Date
Aug 2010
Posts
5

## Help with my design

Hello all. I am new to the forums and fairly new to database design in general.

I am working on a project and having trouble developing a database design that will be efficient to search through.

The overall goal of the project will be to take in many lists of items and then store the likelihood of having different items. Here is an example:

List 1: A, B, C, D
List 2: A, B, G, L
List 3: A, B, G, L
List 4: B, C, D, L

Now I want to determine the likelihood of having item B & A being together...
So it would look like this:

A-B - 3 (3 because in 3/4 items A and B are together)....
A-C - 1 (1 because A and C are both present only in list one)
A-D - 1 (List 1 only)
A-G - 2 (List 2 and 3)
A-L - 2
B-C - 2
B-D - 2.... etc.

So I considered doing this just as I am showing above. Keeping a table of all of these different combinations, but there may be 100,000 different items and lists. If I am solving this right... if there are 100,000 items.. there would be 100,000 x 99,999 combinations in my table. This would take forever to sort.

In the end I want to be able to search for all combinations that contain a particular letter. So if I searched for A, I would get back A-B-3, A-C -1, A-D -1, A-G-2, and A-L -2.

Does anyone have any recommendations to make this more efficient?

Justin

2. King of Understatement
Join Date
Feb 2004
Location
One Flump in One Place
Posts
14,912
Hi Justin

I think it would be beneficial for you to outline your real scenario. This is abstracted and looks like one of those instances where crucial information is lost in the abstraction.

3. Registered User
Join Date
Aug 2010
Posts
5
Ok, I will tell you a more realistic version and hopefully that will help.

Users can make music playlists. Each user will have their own playlist. I want to give users song recommendations based on a database of all of these playlists.

The users will select a song from their playlist and other songs will be recommended to them based upon the database of other users.

So... a user likes the song "Fly"

The database has the following playlists from other users:

Playlist 1: "Fly", "X", "Y", "Z"
Playlist 2: "X", "B", "Q", "L"
Playlist 3: "Fly", "Z", "Y", "Q"
Playlist 4: "Fly", "Z", "Q", "L"

It would recommend the song: Z because 3 other users who like "Fly" also like "Z".

The database I considered looks like this:

Song A Song B Num
"Fly" X 1 - only found in 1st playlist
"Fly" Y 2 - combo is in playlist 1 and 3
"Fly" Q 1 - only in playlist 4
"Fly" Z 3
...
X Y 1
L Y 0 - no combinations

Now as I said before I may have 100,000 or more different playlists and 100,000 or more different songs, so doing it my way I would have 100,000 x 99,999 different entries which would be very inneficient.

Any suggestions

4. King of Understatement
Join Date
Feb 2004
Location
One Flump in One Place
Posts
14,912
So the column headings for this table would be something like:
Code:
`PlayListID | Track1 | Track2 | Track3 | Track4`
Is that correct?

5. Registered User
Join Date
Aug 2010
Posts
5
No, because my list of tracks will be incredibly large. It would have the following headings. Playlist ID, songs, userid. This would be my playlist table. I would also have a user table. Beyond that, am not sure. But with what I have, I want to take in a single song name and sort through the playlist table to find other songs to recommend.

The program then will search in playlists that contain "this song" and discover that 95/100 contain "that song". This is a high percent, so "that song" is recommended.

I'm not sure what you mean by rdms? I use mySQL and will be communicating with PHP

6. Registered User
Join Date
Aug 2010
Posts
5
I was also considering a combinations table. You will see my second table in my second post. It would have song 1, song 2, num playlists with this combo.

When a new playlist is created, it would add to the table if there is a new song combination (ex song a and song f in the same playlist) or it would simply update the third field if the combination is already in that table. This would work; but with 100,000 unique songs that would make an incredibly large number of items to search through.

7. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
Let's try a very different arrangement and normalize this beastie. I think you'll like it!

One table for songs. It has one row for each song with details specific to that song like name, runtime, and possibly foreign keys to artist and genre.

One table for playlists. It has one row for each playlist with details like name, a foreign key to the playlist owner, and dates for the beginning and end of this playlist's life.

A new table for playlistsongs. It has just the details about a single song on one playlist like the Foreign Key for the playlist and the song.

Once you figure out the value for the question mark in this code, this schema allows you to run something like:
Code:
```SELECT Count(*) AS score, songs.name
FROM playlists AS myPlayLists
JOIN playlists AS otherPlayLists
ON (otherPlayLists.songPK = myPlayLists.songPK)
JOIN songs
ON (songs.songPK = otherPlayLists.songPK)
WHERE  myPlayLists.ownerPK = ?
GROUP BY songs.songPK
ORDER BY Count(*) DESC, songs.name```
This SQL snippet will scan every song on your playlists, match that song on every playlist that contains it, and summarize based on the number of ocurrences of other songs on playlists that contain songs on your playlists. It ought to run fairly quickly as along as your tables are indexed reasonably well.

-PatP

8. King of Understatement
Join Date
Feb 2004
Location
One Flump in One Place
Posts
14,912
Ok - Pat basically went in exactly the same direction as me.
This is my code - broadly the same in the important areas.
Code:
```USE test
GO

IF EXISTS (SELECT NULL FROM sys.tables WHERE object_id = OBJECT_ID('dbo.play_list'))
BEGIN
DROP TABLE dbo.play_list
END

CREATE TABLE dbo.play_list
(
playlist_name       VARCHAR(50)     COLLATE Latin1_General_CI_AS    NOT NULL
, track_ordinal       TINYINT                                         NOT NULL
, track               VARCHAR(90)     COLLATE Latin1_General_CI_AS    NOT NULL
, CONSTRAINT pk_play_list PRIMARY KEY CLUSTERED (playlist_name, track_ordinal) WITH (FILLFACTOR = 100)
, CONSTRAINT ix_play_list_track_ordinal_track_u_nc UNIQUE NONCLUSTERED (playlist_name, track) WITH (FILLFACTOR = 100)
) ON [PRIMARY]
GO

INSERT INTO dbo.play_list
(
playlist_name
, track_ordinal
, track
)
VALUES  ('Playlist 1', 1, 'Fly')
, ('Playlist 1', 2, 'X')
, ('Playlist 1', 3, 'Y')
, ('Playlist 1', 4, 'Z')
, ('Playlist 2', 1, 'X')
, ('Playlist 2', 2, 'B')
, ('Playlist 2', 3, 'Q')
, ('Playlist 2', 4, 'L')
, ('Playlist 3', 1, 'Fly')
, ('Playlist 3', 2, 'Z')
, ('Playlist 3', 3, 'Y')
, ('Playlist 3', 4, 'Q')
, ('Playlist 4', 1, 'Fly')
, ('Playlist 4', 2, 'Z')
, ('Playlist 4', 3, 'Q')
, ('Playlist 4', 4, 'L')

SELECT  TOP 5
recomended_track                = recomendations.track
, recomendataion_weight           = COUNT(*)
FROM    dbo.play_list                   AS base
INNER JOIN
dbo.play_list                   AS recomendations
ON  recomendations.playlist_name        = base.playlist_name
WHERE   base.track                      = 'Fly'
AND recomendations.track           != 'Fly'
AND recomendations.playlist_name   != 'Playlist 3'/*Assuming the person doing the search compiled this playlist:
we would not want to return any tracks from it in the results.*/
GROUP BY
recomendations.track
ORDER BY
COUNT(*) DESC```
It is T-SQL (SQL Server) syntax but the important stuff is compatible with mySQL. The one thing that occurred to me was that the recommendation algorithm might be more sophisticated (for example proportion of playlists that contain the recommended song rather than a straight count).

9. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
Poots and I are pretty much on the same page here. Get the general idea working, then you'll probably want to get a bit hinkey with your suggestion calcuation. One thing I was going to suggest (in a later post) is that those genre and artist FK values that I mentioned might contribute to that score as well, and another point would be that tracking when songs entered and left the playlist might give you strong indications of musical taste too.

There are lots of things you can explore about musical taste, especially when you have decent samples of a listner population!

-PatP

10. King of Understatement
Join Date
Feb 2004
Location
One Flump in One Place
Posts
14,912
The single most important change Pat and I made is to the table design - there is one row per song in the playlist instead of one row per playlist. This is what Pat meant by normalising.
First normal form - Wikipedia, the free encyclopedia

11. Registered User
Join Date
Aug 2010
Posts
5
Thanks a lot guys! I am going to need to look at.it a.little more but I am on vacation now. I'll.let you know how it goes.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•