If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Remove duplicate rows

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 06-11-10, 11:54
Dieuz Dieuz is offline
Registered User
 
Join Date: Jun 2010
Posts: 3
Remove duplicate rows

Hey there,

I have a code to remove the duplicates in one of my table but I cant seem to be able to ORDER the way mysql select the values.

Here is my initial table: http://img196.imageshack.us/img196/2421/tablesw.jpg

Code:
mysql_query("CREATE TABLE new_table as SELECT * FROM test GROUP BY url") 
mysql_query("DROP TABLE test") 
mysql_query("RENAME TABLE new_table TO test")

I want to remove the duplicated URL and make sure that it keeps only the value with the higher PR. I tried using the ORDER BY attribute but it doesnt seem to work. The first 3 values with a PR of 0 are always selected and placed in the new table...

The good result should be that the 3 values (a,b,c) with a PR of 6 are found in the final table.

I tried this and it doesnt work:

Code:
mysql_query("CREATE TABLE new_table as SELECT * FROM test GROUP BY url ORDER BY pr DESC")

I am looking for a fast way to achieve this because my table have over 200k entries.

Thanks!
Reply With Quote
  #2 (permalink)  
Old 06-11-10, 12:59
dav1mo dav1mo is offline
Registered User
 
Join Date: Dec 2007
Location: Richmond, VA
Posts: 782
have you tried a where clause? As for order, in a table that is meaningless. Data can be placed anywhere, even when you cluster, due to out of order inserts or updates having moved a row to another page. The only time you have a garanteed order is in your select statement with an order by clause. If you are stating that you only want rows in your table that have a higher PR value, then I suggest you look at using a WHERE clause and a subselect that gets the MAX value of the PR column for each URL.
Dave
Reply With Quote
  #3 (permalink)  
Old 06-11-10, 13:29
Dieuz Dieuz is offline
Registered User
 
Join Date: Jun 2010
Posts: 3
Well I cant really use a WHERE clause because I need to select all values and compare them.

I just dont know how can I setup mysql to only keep the value with the higher pr.

Let say he found 3 indentical entries. I would like that only the entry with the highest PR is keeped, and the 2 others are removed.
Reply With Quote
  #4 (permalink)  
Old 06-11-10, 13:58
dav1mo dav1mo is offline
Registered User
 
Join Date: Dec 2007
Location: Richmond, VA
Posts: 782
in that case use MAX
Dave
Reply With Quote
  #5 (permalink)  
Old 06-11-10, 16:23
Dieuz Dieuz is offline
Registered User
 
Join Date: Jun 2010
Posts: 3
Here is the query I am actually using:

Code:
mysql_query("CREATE TABLE new_table as SELECT t1.* FROM old_table t1 JOIN (SELECT url,MAX(pr) max_pr FROM old_table GROUP BY url) t2 ON t2.url = t1.url AND t2.max_pr = t1.pr") or die(mysql_error());
It works for a small table but when I try this query on a much bigger table (200k entries), the query freeze and it stop working..
Reply With Quote
  #6 (permalink)  
Old 06-14-10, 08:29
dav1mo dav1mo is offline
Registered User
 
Join Date: Dec 2007
Location: Richmond, VA
Posts: 782
try it without the join. You will have to name all the columns though. As is always noted do not use SELECT *. Try
Code:
SELECT COLS,....,MAX(PR) AS PR from old_table group by cols....
May still be a little slow as you are performing a sort for the grouping of how ever many rows are in your old_table.
Dave
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On