Results 1 to 9 of 9
  1. #1
    Join Date
    Aug 2012
    Posts
    3

    Question Unanswered: Select similar data (not duplicate)

    Hi guys,

    I have been searching for an answer to this on google but all am finding is queries to find duplicate data.

    Basically I have a picklist table in a database and I have discovered that there are what looks like duplicate data (because the name is the same) but there is a different number on the end, as you can see from an example below.

    NO, ENTRY
    24, John Doe|26|05768
    24, John Doe|26|5768

    Do you know if there is a sql query that can be ran against this table that will look through the ENTRY column and select fields that are similar and not duplicates (duplicates can't exist due to PK constraints)

    I hope I have explained this enough.

    Thanks in advance

  2. #2
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Too lottle and incomplete examples, I think.

    NO, ENTRY
    24, John Doe|26|05768
    24, John Doe|26|5768
    Do you mean NO = '24, John Doe' and ENTRY = '26'? then what is column name for '05768' and '5768'?
    Or, NO = '24' and ENTRY = 'John Doe|26|05768' and 'John Doe|26|5768'?

    You wrote "because the name is the same", but no column name of "name"
    (Though I could guess 'John Doe' might be name, it was not clearly explained).

    You wrote "different number on the end", but '05768' and '5768' are equal if interpreted as numbers.
    What if they were '05768' and '7586'?
    Same? (because other part '24, John Doe|26' are same)
    or different? (because '05768' and '7586' are different)

  3. #3
    Join Date
    May 2005
    Posts
    119
    You could look at the beginning of each value and compare, say the first 10 characters and look for dups.

    select No, count(left(entry,10)),left(entry,10)from {your data table}
    groupby no,left(entry,10)
    havingcount(left(entry,10))>1

    This would compare and search on the first 10 characters and look for duplicate data. Make sense?

  4. #4
    Join Date
    Aug 2012
    Posts
    3
    Quote Originally Posted by tonkuma View Post
    Too lottle and incomplete examples, I think.


    Do you mean NO = '24, John Doe' and ENTRY = '26'? then what is column name for '05768' and '5768'?
    Or, NO = '24' and ENTRY = 'John Doe|26|05768' and 'John Doe|26|5768'?

    You wrote "because the name is the same", but no column name of "name"
    (Though I could guess 'John Doe' might be name, it was not clearly explained).

    You wrote "different number on the end", but '05768' and '5768' are equal if interpreted as numbers.
    What if they were '05768' and '7586'?
    Same? (because other part '24, John Doe|26' are same)
    or different? (because '05768' and '7586' are different)
    Hi tonkuma, thanks for your reply.
    I didn't think I had explained it very well, sorry. But what I was trying to show you was an example of 2 very similar entries into the table.

    Column1= NO (24) and Column2= ENTRY (name|26|number)

    In that example I was showing you, shows that 2 entries had been made for the same person, but it didn't pick up as a duplicate because them numbers were different.

    So what I am trying to do is run a query that will look through the table, preferably just the ENTRY column and check for similarities.

    Quote Originally Posted by Krista327 View Post
    You could look at the beginning of each value and compare, say the first 10 characters and look for dups.

    select No, count(left(entry,10)),left(entry,10)from {your data table}
    groupby no,left(entry,10)
    havingcount(left(entry,10))>1

    This would compare and search on the first 10 characters and look for duplicate data. Make sense?
    Hi Krista327, Yes this does make sense and I will give it a go thanks

    Now let me be a bit cheeky and ask if I would be able to do a query similar to yours that would search everything that is in front of the first '|' in that column.

    Would that be possible?

    Thanks with the responses so far guys.

  5. #5
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Let me confirm your requirements.

    From these data
    NO, ENTRY
    24, John Doe|26|05768
    24, John Doe|26|5768

    27, John Evil|26|05768
    27, John Evil|26|05768

    31, John Foul|26|5768
    31, John Foul|19|5768

    33, John Geek|23|05768
    34, John Geek|23|5768

    41, John Hell|29|05768
    42, John Hell|29|05768


    Do you want this result?
    NO, ENTRY
    24, John Doe|26|05768
    24, John Doe|26|5768

    31, John Foul|26|5768
    31, John Foul|19|5768

    33, John Geek|23|05768
    34, John Geek|23|5768

    So what I am trying to do is run a query that will look through the table, preferably just the ENTRY column and check for similarities.
    John Hell should be not included in results.
    Because, (though NO was different) ENTRY was duplicated.

  6. #6
    Join Date
    Aug 2012
    Posts
    3
    Quote Originally Posted by tonkuma View Post
    Let me confirm your requirements.

    From these data
    NO, ENTRY
    24, John Doe|26|05768
    24, John Doe|26|5768

    27, John Evil|26|05768
    27, John Evil|26|05768

    31, John Foul|26|5768
    31, John Foul|19|5768

    33, John Geek|23|05768
    34, John Geek|23|5768

    41, John Hell|29|05768
    42, John Hell|29|05768


    Do you want this result?
    NO, ENTRY
    24, John Doe|26|05768
    24, John Doe|26|5768

    31, John Foul|26|5768
    31, John Foul|19|5768

    33, John Geek|23|05768
    34, John Geek|23|5768


    John Hell should be not included in results.
    Because, (though NO was different) ENTRY was duplicated.
    Hi tonkuma,

    You are pretty close to what I want to achieve. Although I would point out the the NO column will always be populated with the number '24' and in the ENTRY column the middle part of the string will always be '|26|'. The only information that will be different is the name before the |26| and the number after it.

    Using the code that Krista327 provide was good but only gave me a count of possible duplicates in the results. If I could get it to show me a results page like yours where it actually shows you the similar/duplicate in a list that would be useful.

    Thanks

  7. #7
    Join Date
    May 2005
    Posts
    119
    Take the code I provided previously and replace the "left(entry,10)" with

    left(Entry,LEN(entry)-CHARINDEX('|',entry))

    This will compare everything to the left of the first bar.
    helpful?

  8. #8
    Join Date
    May 2005
    Posts
    119
    Oh, and this will only show you data for what is in the left of the bar.. you want to see the actual data too? That would throw off your counts. You could embed this into a parent query and use a where statement to show the full field.

    select No, entry from {yourdatatable} where no in (select... the above query)

  9. #9
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    My idea was...
    Code:
    SELECT no
         , entry
     FROM  {your_table} t
     WHERE EXISTS
           (SELECT 0
             FROM  {your_table} s
             WHERE LEFT(s.entry , CHARINDEX('|' , s.entry) )
                 = LEFT(t.entry , CHARINDEX('|' , t.entry) )
               AND s.entry
                <> t.entry
           )
    ;
    Last edited by tonkuma; 08-19-12 at 00:40. Reason: Remove "- 1" from 2nd parameter of LEFT functions.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •