Results 1 to 6 of 6
  1. #1
    Join Date
    Jun 2010
    Posts
    3

    Unanswered: Remove duplicate rows

    Hey there,

    I have a code to remove the duplicates in one of my table but I cant seem to be able to ORDER the way mysql select the values.

    Here is my initial table: http://img196.imageshack.us/img196/2421/tablesw.jpg

    Code:
    mysql_query("CREATE TABLE new_table as SELECT * FROM test GROUP BY url") 
    mysql_query("DROP TABLE test") 
    mysql_query("RENAME TABLE new_table TO test")

    I want to remove the duplicated URL and make sure that it keeps only the value with the higher PR. I tried using the ORDER BY attribute but it doesnt seem to work. The first 3 values with a PR of 0 are always selected and placed in the new table...

    The good result should be that the 3 values (a,b,c) with a PR of 6 are found in the final table.

    I tried this and it doesnt work:

    Code:
    mysql_query("CREATE TABLE new_table as SELECT * FROM test GROUP BY url ORDER BY pr DESC")

    I am looking for a fast way to achieve this because my table have over 200k entries.

    Thanks!

  2. #2
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    have you tried a where clause? As for order, in a table that is meaningless. Data can be placed anywhere, even when you cluster, due to out of order inserts or updates having moved a row to another page. The only time you have a garanteed order is in your select statement with an order by clause. If you are stating that you only want rows in your table that have a higher PR value, then I suggest you look at using a WHERE clause and a subselect that gets the MAX value of the PR column for each URL.
    Dave

  3. #3
    Join Date
    Jun 2010
    Posts
    3
    Well I cant really use a WHERE clause because I need to select all values and compare them.

    I just dont know how can I setup mysql to only keep the value with the higher pr.

    Let say he found 3 indentical entries. I would like that only the entry with the highest PR is keeped, and the 2 others are removed.

  4. #4
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    in that case use MAX
    Dave

  5. #5
    Join Date
    Jun 2010
    Posts
    3
    Here is the query I am actually using:

    Code:
    mysql_query("CREATE TABLE new_table as SELECT t1.* FROM old_table t1 JOIN (SELECT url,MAX(pr) max_pr FROM old_table GROUP BY url) t2 ON t2.url = t1.url AND t2.max_pr = t1.pr") or die(mysql_error());
    It works for a small table but when I try this query on a much bigger table (200k entries), the query freeze and it stop working..

  6. #6
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    try it without the join. You will have to name all the columns though. As is always noted do not use SELECT *. Try
    Code:
    SELECT COLS,....,MAX(PR) AS PR from old_table group by cols....
    May still be a little slow as you are performing a sort for the grouping of how ever many rows are in your old_table.
    Dave

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •