Hi. I see three possible ways in which a friends network database can be designed. Each way has its own advantages and disadvantages. Each one leads to very different querys and performance. There aren't many places where this is discussed, although I think this is an interesting problem. I would like you to give me your opinion on which one you think would be better.
Excuse my lax sytax, i just want to highlight the important points.
The first way
user_id int primary key
friend_id (foreign key references user.user_id)
In this design, the Friend table saves who befriended whom, i.e. if user 1 befriended user 2, in the Friend table, one row would be added: 1 2.
To insert a friend this would be the query: INSERT INTO friend VALUES ( 1, 2 );
To get the list of friends of user 1, this query would work:
(SELECT friend_id FROM friend WHERE user_id = 1)
(SELECT user_id FROM friend WHERE friend_id = 1)
The problem with this approach is that it gets difficult to know, for example, who are friends of user 1, since user 1 can be on the first column or in the second column (see the query). Similarly, all CRUD operations become difficult, requiring unions or the use of OR in where clauses, which sometimes mysql doesn't optimize very well.
The second way
It is similar to the first way except that for each friend relationship that is created, two rows are added to the friends table. In the preceeding example, the rows 1 2 and 2 1 would be added, reflecting the fact that user 1 is friend of user 2 and, also, user 2 is friend of user 1.
To insert a friend this would be the query: INSERT INTO friend VALUES ( 1, 2), (2, 1)
To get the list of friends of user 1, this would work:
SELECT friend_id FROM friend WHERE user_id = 1
In this design, it's easy and fast to get the list of friends for any user. However, one can say that there is duplicated data, i'm not sure if this can lead to further problems. This design highlights the nature of friendship, where if user 1 is user's 2 friend, the revense is also true.
Here two rows are added for each pair of friends. The friendship itself gets its own id, but it is not a primary key, since it would be repeated twice. For instance, if user 1 and user 2 were friends, this would be represented like this:
And the same for each pair of friends. This may lead to entangled querys, but with good performance.
To insert a friendship between user 1 and 2 this would be the pair of querys:
SELECT @max_friendship_id := MAX(friendship_id) FROM friend;
INSERT INTO friend VALUES( @max_friendship_id + 1, '1' ), (@max_friendship_id + 1, '2')
To find the friends of user 1, this would do the job:
FROM friend where
friendship_id IN (select friendship_id from friend where user_id = '1') AND user_id != '1'
To get the list of friends of user 1, one would need to do a nested query (bad performance in mysql), or I suppose this could be rewritten as a self-join. I think this design could be good if there is some extra information associated with the friendship itself, since each friendship has its own id. Also, maybe, this is the most 'relational' of the three designs.
Well, these are the possible designs i see, but I don't get to decide which would be the best. This problem, in fact, isn't unique to social networking , but this question is the same as how to represent an undirected graphs within a relational database, for which I haven't found a clue in database design texts.
I hope I stated the problem well. Thank you for your contributions and comments!