Results 1 to 5 of 5
  1. #1
    Join Date
    Nov 2004
    Posts
    3

    Arrow Unanswered: optimizing the join of 3 huge InnoDB tables

    Hello, I have the following two tables:

    userrelation: (around 60 GB InnoDB)
    user1id int(4) unsigned
    user2id int(4) unsigned
    This table specifies which pairs of users have some kind of friendship. Such relationship is also assumed to be directed: user1id -> user2id

    usergroup: (around 4 GB InnoDB)
    userid int(4) unsigned
    groupid int(4) unsigned
    This table specifies which group each user belongs to.

    Now I want to know for each pair of groups, how many directed user pairs exist. The very straighforward statement would be:

    Code:
    SELECT t1.groupid, t2.groupid FROM usergroup AS t1, usergroup AS t2, userrelation AS t3 WHERE t1.userid=t3.user1id AND t2.userid=t3.user2id ORDER BY t1.groupid, t2.groupid;
    Anyone knows if this statement could be further optimized? Does anyone have some similar experiences and could give a rough estimation of the execution time? Any guru could explain in some details how MySQL would process such a statement?

  2. #2
    Join Date
    Nov 2004
    Posts
    3

    Question

    if I use the following stmt, is there any difference? Why?

    SELECT t1.groupid, t2.groupid FROM userrelation AS t3
    LEFT JOIN usergroup AS t1 ON t1.userid=t3.user1id
    LEFT JOIN usergroup AS t2 ON t2.userid=t3.user2id
    GROUP BY t1.groupid
    ORDER BY t1.groupid, t2.groupid;

  3. #3
    Join Date
    Mar 2004
    Posts
    480
    1) are your tables indexed?

    2) yes the statements would be different since you are using a group by in the second statement but not the first

    3) if you use group by, you must include all non aggregate fields in your group by or you could end up with unpredictable results.

    4) have you ran your query with EXPLAIN in front of it which will give you the description of how the query works ?

  4. #4
    Join Date
    Nov 2004
    Posts
    3
    1) the 60GB table was indexed on the pair, the 4GB table has no index.
    2) i mean the differences in the join operations, ignoring the GROUP BY part
    3) do you mean I have to "GROUP BY t1.groupid, t2.groupid"? Why the results would be unpredictable if only "GROUP BY t1.groupid"?
    4) i have issued the sql command and will "EXPLAIN" influences the execution of the SELECT? I don't want the operation to be rolled back. (The SELECT is actually the subquery of an INSERT in my case.)

  5. #5
    Join Date
    Jan 2003
    Location
    Paris, France
    Posts
    320
    The query

    SELECT t1.groupid, t2.groupid FROM usergroup AS t1, usergroup AS t2, userrelation AS t3 WHERE t1.userid=t3.user1id AND t2.userid=t3.user2id ORDER BY t1.groupid, t2.groupid;

    And

    SELECT t1.groupid, t2.groupid FROM userrelation AS t3
    LEFT JOIN usergroup AS t1 ON t1.userid=t3.user1id
    LEFT JOIN usergroup AS t2 ON t2.userid=t3.user2id
    GROUP BY t1.groupid
    ORDER BY t1.groupid, t2.groupid;

    Are different

    In the first you use an INNER JOIN (only return row that match the condition)
    In the second you use a LEFT JOIN (return all rows from the 1ST table and all row that match the join condition)

    You do a join with out using index. That mean that MySql will make a FULL scan to perform this join
    You need to create index on usergroup on userid
    Olivier Miossec
    --
    http://www.lasso-developpeur.net/
    --

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •