Results 1 to 3 of 3
  1. #1
    Join Date
    Oct 2006

    Lightbulb Unanswered: Friends Network (undirected graph) Database Design

    Hi. I see three possible ways in which a friends network database can be designed. Each way has its own advantages and disadvantages. Each one leads to very different querys and performance. There aren't many places where this is discussed, although I think this is an interesting problem. I would like you to give me your opinion on which one you think would be better.

    Excuse my lax sytax, i just want to highlight the important points.

    The first way

    TABLE user(
    user_id int primary key

    TABLE Friend(
    user_id int
    friend_id (foreign key references user.user_id)

    In this design, the Friend table saves who befriended whom, i.e. if user 1 befriended user 2, in the Friend table, one row would be added: 1 2.

    To insert a friend this would be the query: INSERT INTO friend VALUES ( 1, 2 );

    To get the list of friends of user 1, this query would work:

    (SELECT friend_id FROM friend WHERE user_id = 1)
    (SELECT user_id FROM friend WHERE friend_id = 1)

    The problem with this approach is that it gets difficult to know, for example, who are friends of user 1, since user 1 can be on the first column or in the second column (see the query). Similarly, all CRUD operations become difficult, requiring unions or the use of OR in where clauses, which sometimes mysql doesn't optimize very well.

    The second way

    It is similar to the first way except that for each friend relationship that is created, two rows are added to the friends table. In the preceeding example, the rows 1 2 and 2 1 would be added, reflecting the fact that user 1 is friend of user 2 and, also, user 2 is friend of user 1.

    To insert a friend this would be the query: INSERT INTO friend VALUES ( 1, 2), (2, 1)

    To get the list of friends of user 1, this would work:

    SELECT friend_id FROM friend WHERE user_id = 1

    In this design, it's easy and fast to get the list of friends for any user. However, one can say that there is duplicated data, i'm not sure if this can lead to further problems. This design highlights the nature of friendship, where if user 1 is user's 2 friend, the revense is also true.

    The third way

    The friend table would look like this:

    TABLE friend(
    friendship_id int,
    user_id int (foreign key references user.user_id)

    Here two rows are added for each pair of friends. The friendship itself gets its own id, but it is not a primary key, since it would be repeated twice. For instance, if user 1 and user 2 were friends, this would be represented like this:

    friendship_id user_id
    124 1
    124 2

    And the same for each pair of friends. This may lead to entangled querys, but with good performance.

    To insert a friendship between user 1 and 2 this would be the pair of querys:

    SELECT @max_friendship_id := MAX(friendship_id) FROM friend;
    INSERT INTO friend VALUES( @max_friendship_id + 1, '1' ), (@max_friendship_id + 1, '2')

    To find the friends of user 1, this would do the job:

    SELECT user_id
    FROM friend where
    friendship_id IN (select friendship_id from friend where user_id = '1') AND user_id != '1'

    To get the list of friends of user 1, one would need to do a nested query (bad performance in mysql), or I suppose this could be rewritten as a self-join. I think this design could be good if there is some extra information associated with the friendship itself, since each friendship has its own id. Also, maybe, this is the most 'relational' of the three designs.

    Well, these are the possible designs i see, but I don't get to decide which would be the best. This problem, in fact, isn't unique to social networking , but this question is the same as how to represent an undirected graphs within a relational database, for which I haven't found a clue in database design texts.

    I hope I stated the problem well. Thank you for your contributions and comments!
    Last edited by sideral; 10-10-06 at 15:06.

  2. #2
    Join Date
    Apr 2002
    Toronto, Canada
    excellent synopsis, and very nicely discussed

    my choice would be the second way

    why? because suppose you suddenly decide that it does matter who befriended whom -- what would be the impact on your queries for each of the three ways?

    the first way, for undirected, means complex queries which would have to be un-complexed

    the third way is impossible

    the second way means no change whatsoever | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Mar 2010
    Here is another option for finding friends of a given user...

    Given the table "users" which is the master list of users, and the table "friends" which has two foreign key columns user1_id and user2_id, either of which can be the user in question (id=123):

    SELECT * FROM users JOIN friends ON = IF(friends.user1_id=123, friends.user1_id, friends.user2_id) WHERE (friends.user1_id=123 OR friends.user2_id=123)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts