If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > Informix > Removing duplicates from a table

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-28-06, 06:14
exterminator exterminator is offline
Registered User
 
Join Date: Jan 2006
Posts: 38
Removing duplicates from a table

I am stuck with yet another problem.

Here is the scenario. There is table where the unique-ness is identified based on a set of columns. The table does not have a serial column.

There are duplicates in the table too. Now, what I want to do is - Select only those records which do not have more than one occurence in the table for the given set of columns. I just simply want to leave the duplicate rows, all of them.

First thing that would come in one's mind would be SELECT DISTINCT but the problem is I want some more columns values apart from those that form the uniqueness rule. How should I go ahead with the solution to this? Please suggest. Thanks for taking a look. Best regards.
Reply With Quote
  #2 (permalink)  
Old 03-28-06, 08:57
mjldba mjldba is offline
Registered User
 
Join Date: Dec 2003
Location: North America
Posts: 139
You may not have a SERIAL column but don't forget about the unseen-but-always-there ROWID column ..... could be just what you're looking for.

I usually approach the "identify duplicates" problem by comparing a table to itself using table aliases & the ROWID. You have columns that identify the uniqueness of a row (perhaps there's an index based on these values) & other columns that you wish to include in the WHERE criteria so perhaps you can start with / build on this:

Assume that col1, col2, and col3 form the uniqueness

Select unique a.col1, a.col2, a.col3, a.col4, a.col5
from table1 a,
table1 b
where a.col1 = b.col1 and
a.col2 = b.col2 and
a.col3 = b.col3 and
a.col4 = b.col4 and
a.col5 = b.col5 and
a.ROWID <> b.ROWID

Last edited by mjldba; 03-28-06 at 10:15.
Reply With Quote
  #3 (permalink)  
Old 03-29-06, 05:45
exterminator exterminator is offline
Registered User
 
Join Date: Jan 2006
Posts: 38
Thanks mjldba. The rowid approach seems to fit in well but I saw the documentation and it stated that rowid is available only for fragmented tables. I don't really understand the details of a fragmented table and a non-fragmented table but does that affect the performance the table related DML statements and other operations? Which one is better from this perspective?

Also, what would be the approach if rowid is not available.. that is the table is an unfragmented one? Regards.
Reply With Quote
  #4 (permalink)  
Old 03-29-06, 11:03
artemka artemka is offline
Registered User
 
Join Date: May 2004
Location: New York
Posts: 248
The ROWID is always there, the only issue that you might face is that if your table is fragmented across multiple dbspaces then it could have a duplicate rowid

Cheers
Reply With Quote
  #5 (permalink)  
Old 03-29-06, 12:49
mjldba mjldba is offline
Registered User
 
Join Date: Dec 2003
Location: North America
Posts: 139
Agree with artemka's response. Do you know if you have a fragmented (one table distributed to more than 1 dbspace) or non-fragmented (one table in one dbspace) schema for this table?

In non-fragmented tables, the ROWID identifies the location of the row within the dbspace.

In fragmented tables, Informix does not assign a ROWID to each row but you can explicitly create a ROWID column by specifying WITH ROWID when the table is created.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On