If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > New Members & Introductions > Strange self join query result

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-06-11, 14:15
jetberrocal jetberrocal is offline
Registered User
 
Join Date: Aug 2011
Posts: 4
Red face Strange self join query result

Hello,

I am trying to get all duplicates records from a Tab Delimited TXT file using ODBC with the Microsoft Text Driver and schema.ini.

The problem is getting the extra fields with the duplicated fields.

To obtain the duplicates records I did the following query:

SELECT CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD, COUNT(*) AS CT FROM SSNUNIQUES.TXT WHERE VALIDSCORE >= " + limit.ToString
+ " GROUP BY CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD
HAVING Count(*) > 1

The VALIDSCORE column values range from -1 to 100 integers, I used 0 to get all records, since the records with VALIDSCORE = -1 are NULL in the grouping columns.

This query results in aproximatelly 50,000 records out of 400,000.

To clarify, the CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD columns may have a DIRECCION1STD, DIRECCION2STD with NULL values.

Now I am executing the following query in order to obtain each full duplicate record from the same TXT file:

SELECT T1.RECID, T1.ETC, T1.SS, T2.CT AS DUPCT, T1.NOMBRE,
T1.`PRIMER APELLIDO`, T1.`SEGUNDO APELLIDO`, T1.`DIRECCION 1`,
T1.`DIRECCION 2`, T1.`CIUDAD/PUEBLO`, T1.ZIPCODE,
T1.`FECHA DE SUBSCRIPCION`, T1.REFID, T1.VALIDSCORE,
T1.DIRECCION1STD, T1.DIRECCION2STD, T1.CITYSTD, T1.ZIPCODESTD
FROM SSNUNIQUES.TXT AS T1,
(SELECT CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD,
COUNT(*) AS CT FROM SSNUNIQUES.TXT WHERE VALIDSCORE >= " + limit.ToString + " GROUP BY CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD HAVING Count(*) > 1) AS T2
WHERE T2.DIRECCION1STD = T1.DIRECCION1STD
AND T2.DIRECCION2STD = T1.DIRECCION2STD
AND T2.CITYSTD = T1.CITYSTD
AND T2.ZIPCODESTD = T1.ZIPCODESTD
ORDER BY T1.CITYSTD, T1.ZIPCODESTD, T1.DIRECCION1STD,
T1.DIRECCION2STD, T1.ETC, T1.RECID

This results in aproximatelly 8,000 records out of 400,000 by 50,000.
The COUNT(*) value in the first query in average is 2 so I would expect to get 100,000 records, but instead I get only a subset of the records where the VALIDSCORE equals 0.

I tryed removing the ORDER BY clause so to reduce Microsof Text Driver memory problems, but still get the same record set.

So I have two questions:

1. Is the query well written to obtain what I intend to get, what should it be?
2. Is there any system setting that I could make to the VB.NET connection, command codes or to the computer registry?

Cordially,

Jose

Last edited by jetberrocal; 08-06-11 at 14:42.
Reply With Quote
  #2 (permalink)  
Old 08-08-11, 14:56
jetberrocal jetberrocal is offline
Registered User
 
Join Date: Aug 2011
Posts: 4
Smile Found solution for my problem. Beware nulls.

Found solution for my problem:

The records with NULL values where giving me the problem

I added to the columns with NULL the following:

IIF(T2.DIRECCION1STD IS NULL, '', T2.DIRECCION1STD) = IIF(T1.DIRECCION1STD IS NULL, '', T1.DIRECCION1STD)

Since I am using Microsoft Text Driver I cant not use ISNULL function.

The final query is now:

SELECT T1.RECID, T1.ETC, T1.SS, T1.NOMBRE, T1.`PRIMER APELLIDO`,
T1.`SEGUNDO APELLIDO`, T1.`DIRECCION 1`, T1.`DIRECCION 2`,
T1.`CIUDAD/PUEBLO`, T1.ZIPCODE, T1.`FECHA DE SUBSCRIPCION`,
T1.REFID, T1.VALIDSCORE, T1.DIRECCION1STD, T1.DIRECCION2STD,
T1.CITYSTD, T1.ZIPCODESTD, T2.CT AS DUPCT
FROM SSNREMAINS.TXT AS T1,
(SELECT CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD,
COUNT(*) AS CT
FROM SSNREMAINS.TXT WHERE VALIDSCORE >= " + limit.ToString + ""
GROUP BY CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD
HAVING Count(*) > 1) AS T2
WHERE IIF(T2.DIRECCION1STD IS NULL, '', T2.DIRECCION1STD) =
IIF(T1.DIRECCION1STD IS NULL, '', T1.DIRECCION1STD)
AND IIF(T2.DIRECCION2STD IS NULL, '', T2.DIRECCION2STD) =
IIF(T1.DIRECCION2STD IS NULL, '', T1.DIRECCION2STD)
AND T2.CITYSTD = T1.CITYSTD
AND T2.ZIPCODESTD = T1.ZIPCODESTD
ORDER BY T1.CITYSTD, T1.ZIPCODESTD, T1.DIRECCION1STD,
T1.DIRECCION2STD, T1.ETC, T1.RECID

Jose
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On