Results 1 to 2 of 2
  1. #1
    Join Date
    Aug 2011
    Posts
    4

    Red face Strange self join query result

    Hello,

    I am trying to get all duplicates records from a Tab Delimited TXT file using ODBC with the Microsoft Text Driver and schema.ini.

    The problem is getting the extra fields with the duplicated fields.

    To obtain the duplicates records I did the following query:

    SELECT CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD, COUNT(*) AS CT FROM SSNUNIQUES.TXT WHERE VALIDSCORE >= " + limit.ToString
    + " GROUP BY CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD
    HAVING Count(*) > 1

    The VALIDSCORE column values range from -1 to 100 integers, I used 0 to get all records, since the records with VALIDSCORE = -1 are NULL in the grouping columns.

    This query results in aproximatelly 50,000 records out of 400,000.

    To clarify, the CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD columns may have a DIRECCION1STD, DIRECCION2STD with NULL values.

    Now I am executing the following query in order to obtain each full duplicate record from the same TXT file:

    SELECT T1.RECID, T1.ETC, T1.SS, T2.CT AS DUPCT, T1.NOMBRE,
    T1.`PRIMER APELLIDO`, T1.`SEGUNDO APELLIDO`, T1.`DIRECCION 1`,
    T1.`DIRECCION 2`, T1.`CIUDAD/PUEBLO`, T1.ZIPCODE,
    T1.`FECHA DE SUBSCRIPCION`, T1.REFID, T1.VALIDSCORE,
    T1.DIRECCION1STD, T1.DIRECCION2STD, T1.CITYSTD, T1.ZIPCODESTD
    FROM SSNUNIQUES.TXT AS T1,
    (SELECT CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD,
    COUNT(*) AS CT FROM SSNUNIQUES.TXT WHERE VALIDSCORE >= " + limit.ToString + " GROUP BY CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD HAVING Count(*) > 1) AS T2
    WHERE T2.DIRECCION1STD = T1.DIRECCION1STD
    AND T2.DIRECCION2STD = T1.DIRECCION2STD
    AND T2.CITYSTD = T1.CITYSTD
    AND T2.ZIPCODESTD = T1.ZIPCODESTD
    ORDER BY T1.CITYSTD, T1.ZIPCODESTD, T1.DIRECCION1STD,
    T1.DIRECCION2STD, T1.ETC, T1.RECID

    This results in aproximatelly 8,000 records out of 400,000 by 50,000.
    The COUNT(*) value in the first query in average is 2 so I would expect to get 100,000 records, but instead I get only a subset of the records where the VALIDSCORE equals 0.

    I tryed removing the ORDER BY clause so to reduce Microsof Text Driver memory problems, but still get the same record set.

    So I have two questions:

    1. Is the query well written to obtain what I intend to get, what should it be?
    2. Is there any system setting that I could make to the VB.NET connection, command codes or to the computer registry?

    Cordially,

    Jose
    Last edited by jetberrocal; 08-06-11 at 15:42.

  2. #2
    Join Date
    Aug 2011
    Posts
    4

    Smile Found solution for my problem. Beware nulls.

    Found solution for my problem:

    The records with NULL values where giving me the problem

    I added to the columns with NULL the following:

    IIF(T2.DIRECCION1STD IS NULL, '', T2.DIRECCION1STD) = IIF(T1.DIRECCION1STD IS NULL, '', T1.DIRECCION1STD)

    Since I am using Microsoft Text Driver I cant not use ISNULL function.

    The final query is now:

    SELECT T1.RECID, T1.ETC, T1.SS, T1.NOMBRE, T1.`PRIMER APELLIDO`,
    T1.`SEGUNDO APELLIDO`, T1.`DIRECCION 1`, T1.`DIRECCION 2`,
    T1.`CIUDAD/PUEBLO`, T1.ZIPCODE, T1.`FECHA DE SUBSCRIPCION`,
    T1.REFID, T1.VALIDSCORE, T1.DIRECCION1STD, T1.DIRECCION2STD,
    T1.CITYSTD, T1.ZIPCODESTD, T2.CT AS DUPCT
    FROM SSNREMAINS.TXT AS T1,
    (SELECT CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD,
    COUNT(*) AS CT
    FROM SSNREMAINS.TXT WHERE VALIDSCORE >= " + limit.ToString + ""
    GROUP BY CITYSTD, ZIPCODESTD, DIRECCION1STD, DIRECCION2STD
    HAVING Count(*) > 1) AS T2
    WHERE IIF(T2.DIRECCION1STD IS NULL, '', T2.DIRECCION1STD) =
    IIF(T1.DIRECCION1STD IS NULL, '', T1.DIRECCION1STD)
    AND IIF(T2.DIRECCION2STD IS NULL, '', T2.DIRECCION2STD) =
    IIF(T1.DIRECCION2STD IS NULL, '', T1.DIRECCION2STD)
    AND T2.CITYSTD = T1.CITYSTD
    AND T2.ZIPCODESTD = T1.ZIPCODESTD
    ORDER BY T1.CITYSTD, T1.ZIPCODESTD, T1.DIRECCION1STD,
    T1.DIRECCION2STD, T1.ETC, T1.RECID

    Jose

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •