Results 1 to 11 of 11
  1. #1
    Join Date
    Aug 2010
    Posts
    5

    Help with my design

    Hello all. I am new to the forums and fairly new to database design in general.

    I am working on a project and having trouble developing a database design that will be efficient to search through.

    The overall goal of the project will be to take in many lists of items and then store the likelihood of having different items. Here is an example:

    List 1: A, B, C, D
    List 2: A, B, G, L
    List 3: A, B, G, L
    List 4: B, C, D, L

    Now I want to determine the likelihood of having item B & A being together...
    So it would look like this:

    A-B - 3 (3 because in 3/4 items A and B are together)....
    A-C - 1 (1 because A and C are both present only in list one)
    A-D - 1 (List 1 only)
    A-G - 2 (List 2 and 3)
    A-L - 2
    B-C - 2
    B-D - 2.... etc.

    So I considered doing this just as I am showing above. Keeping a table of all of these different combinations, but there may be 100,000 different items and lists. If I am solving this right... if there are 100,000 items.. there would be 100,000 x 99,999 combinations in my table. This would take forever to sort.

    In the end I want to be able to search for all combinations that contain a particular letter. So if I searched for A, I would get back A-B-3, A-C -1, A-D -1, A-G-2, and A-L -2.

    Does anyone have any recommendations to make this more efficient?

    Thanks for your help.

    Justin

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Hi Justin

    I think it would be beneficial for you to outline your real scenario. This is abstracted and looks like one of those instances where crucial information is lost in the abstraction.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  3. #3
    Join Date
    Aug 2010
    Posts
    5
    Ok, I will tell you a more realistic version and hopefully that will help.

    Users can make music playlists. Each user will have their own playlist. I want to give users song recommendations based on a database of all of these playlists.

    The users will select a song from their playlist and other songs will be recommended to them based upon the database of other users.


    So... a user likes the song "Fly"

    The database has the following playlists from other users:

    Playlist 1: "Fly", "X", "Y", "Z"
    Playlist 2: "X", "B", "Q", "L"
    Playlist 3: "Fly", "Z", "Y", "Q"
    Playlist 4: "Fly", "Z", "Q", "L"

    It would recommend the song: Z because 3 other users who like "Fly" also like "Z".

    The database I considered looks like this:

    Song A Song B Num
    "Fly" X 1 - only found in 1st playlist
    "Fly" Y 2 - combo is in playlist 1 and 3
    "Fly" Q 1 - only in playlist 4
    "Fly" Z 3
    ...
    X Y 1
    L Y 0 - no combinations




    Now as I said before I may have 100,000 or more different playlists and 100,000 or more different songs, so doing it my way I would have 100,000 x 99,999 different entries which would be very inneficient.

    Any suggestions

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    So the column headings for this table would be something like:
    Code:
    PlayListID | Track1 | Track2 | Track3 | Track4
    Is that correct?
    Also, please could you let me know your RDBMS?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  5. #5
    Join Date
    Aug 2010
    Posts
    5
    No, because my list of tracks will be incredibly large. It would have the following headings. Playlist ID, songs, userid. This would be my playlist table. I would also have a user table. Beyond that, am not sure. But with what I have, I want to take in a single song name and sort through the playlist table to find other songs to recommend.

    The program then will search in playlists that contain "this song" and discover that 95/100 contain "that song". This is a high percent, so "that song" is recommended.

    I'm not sure what you mean by rdms? I use mySQL and will be communicating with PHP

  6. #6
    Join Date
    Aug 2010
    Posts
    5
    I was also considering a combinations table. You will see my second table in my second post. It would have song 1, song 2, num playlists with this combo.

    When a new playlist is created, it would add to the table if there is a new song combination (ex song a and song f in the same playlist) or it would simply update the third field if the combination is already in that table. This would work; but with 100,000 unique songs that would make an incredibly large number of items to search through.

  7. #7
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Let's try a very different arrangement and normalize this beastie. I think you'll like it!

    One table for songs. It has one row for each song with details specific to that song like name, runtime, and possibly foreign keys to artist and genre.

    One table for playlists. It has one row for each playlist with details like name, a foreign key to the playlist owner, and dates for the beginning and end of this playlist's life.

    A new table for playlistsongs. It has just the details about a single song on one playlist like the Foreign Key for the playlist and the song.

    Once you figure out the value for the question mark in this code, this schema allows you to run something like:
    Code:
    SELECT Count(*) AS score, songs.name
       FROM playlists AS myPlayLists
       JOIN playlists AS otherPlayLists
          ON (otherPlayLists.songPK = myPlayLists.songPK)
       JOIN songs
          ON (songs.songPK = otherPlayLists.songPK)
       WHERE  myPlayLists.ownerPK = ?
       GROUP BY songs.songPK
       ORDER BY Count(*) DESC, songs.name
    This SQL snippet will scan every song on your playlists, match that song on every playlist that contains it, and summarize based on the number of ocurrences of other songs on playlists that contain songs on your playlists. It ought to run fairly quickly as along as your tables are indexed reasonably well.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Ok - Pat basically went in exactly the same direction as me.
    This is my code - broadly the same in the important areas.
    Code:
    USE test
    GO
    
    IF EXISTS (SELECT NULL FROM sys.tables WHERE object_id = OBJECT_ID('dbo.play_list')) 
    BEGIN
        DROP TABLE dbo.play_list
    END
    
    CREATE TABLE dbo.play_list
        (
            playlist_name       VARCHAR(50)     COLLATE Latin1_General_CI_AS    NOT NULL
          , track_ordinal       TINYINT                                         NOT NULL 
          , track               VARCHAR(90)     COLLATE Latin1_General_CI_AS    NOT NULL
          , CONSTRAINT pk_play_list PRIMARY KEY CLUSTERED (playlist_name, track_ordinal) WITH (FILLFACTOR = 100)
          , CONSTRAINT ix_play_list_track_ordinal_track_u_nc UNIQUE NONCLUSTERED (playlist_name, track) WITH (FILLFACTOR = 100)
        ) ON [PRIMARY]
    GO
    
    INSERT INTO dbo.play_list
        (
            playlist_name
          , track_ordinal
          , track
        )
    VALUES  ('Playlist 1', 1, 'Fly')
          , ('Playlist 1', 2, 'X')
          , ('Playlist 1', 3, 'Y')
          , ('Playlist 1', 4, 'Z')
          , ('Playlist 2', 1, 'X')
          , ('Playlist 2', 2, 'B')
          , ('Playlist 2', 3, 'Q')
          , ('Playlist 2', 4, 'L')
          , ('Playlist 3', 1, 'Fly')
          , ('Playlist 3', 2, 'Z')
          , ('Playlist 3', 3, 'Y')
          , ('Playlist 3', 4, 'Q')
          , ('Playlist 4', 1, 'Fly')
          , ('Playlist 4', 2, 'Z')
          , ('Playlist 4', 3, 'Q')
          , ('Playlist 4', 4, 'L')
    
    SELECT  TOP 5
            recomended_track                = recomendations.track
          , recomendataion_weight           = COUNT(*)
    FROM    dbo.play_list                   AS base
    INNER JOIN 
            dbo.play_list                   AS recomendations
    ON  recomendations.playlist_name        = base.playlist_name
    WHERE   base.track                      = 'Fly'
        AND recomendations.track           != 'Fly'
        AND recomendations.playlist_name   != 'Playlist 3'/*Assuming the person doing the search compiled this playlist:
                                                            we would not want to return any tracks from it in the results.*/
    GROUP BY
            recomendations.track
    ORDER BY
            COUNT(*) DESC
    It is T-SQL (SQL Server) syntax but the important stuff is compatible with mySQL. The one thing that occurred to me was that the recommendation algorithm might be more sophisticated (for example proportion of playlists that contain the recommended song rather than a straight count).
    Testimonial:
    pootle flump
    ur codings are working excelent.

  9. #9
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Poots and I are pretty much on the same page here. Get the general idea working, then you'll probably want to get a bit hinkey with your suggestion calcuation. One thing I was going to suggest (in a later post) is that those genre and artist FK values that I mentioned might contribute to that score as well, and another point would be that tracking when songs entered and left the playlist might give you strong indications of musical taste too.

    There are lots of things you can explore about musical taste, especially when you have decent samples of a listner population!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    The single most important change Pat and I made is to the table design - there is one row per song in the playlist instead of one row per playlist. This is what Pat meant by normalising.
    First normal form - Wikipedia, the free encyclopedia
    Testimonial:
    pootle flump
    ur codings are working excelent.

  11. #11
    Join Date
    Aug 2010
    Posts
    5
    Thanks a lot guys! I am going to need to look at.it a.little more but I am on vacation now. I'll.let you know how it goes.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •