| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

03-26-11, 06:59
|
|
Registered User
|
|
Join Date: Mar 2011
Posts: 4
|
|
Db2 find duplicate record across multiple tables
|
|
i have 32 tables of each state road say, delhi_road,haryana_road,up_road,mh_road..etc for all states
i have edge_id column in each table which has a unique integer value.
now i want to find duplicate edge_id across these multiple and same table.
on same table i can make edge_id as a primary key ,but how can i check that delhi_road edge_id is not duplicated in haryana_road edge_id or mh_road edge_id or vice versa.
please suggest me a query/strored proc/or any other option to perform this .
Sanjay
|
|

03-26-11, 08:48
|
|
Registered User
|
|
Join Date: May 2009
Posts: 472
|
|
Here is one way that should work. There are probably others.
Code:
WITH DUP_EDGE (EDGE_ID)
AS (
SELECT EDGE_ID FROM TAB01 UNION ALL
SELECT EDGE_ID FROM TAB01 UNION ALL
SELECT EDGE_ID FROM TAB01 UNION ALL
...
SELECT EDGE_ID FROM TAB31 UNION ALL
SELECT EDGE_ID FROM TAB31
GROUP BY EDGE_ID
HAVING COUNT(*) > 1
)
SELECT 'TAB01' AS TAB_NAME
, EDGE_ID
FROM TAB01
INNER JOIN
DUP_EDGE
ON TAB01.EDGE_ID = DUP_EDGE.EDGE_ID
UNION ALL
SELECT 'TAB02' AS TAB_NAME
, EDGE_ID
FROM TAB02
INNER JOIN
DUP_EDGE
ON TAB02.EDGE_ID = DUP_EDGE.EDGE_ID
...
UNION ALL
SELECT 'TAB32' AS TAB_NAME
, EDGE_ID
FROM TAB32
INNER JOIN
DUP_EDGE
ON TAB32.EDGE_ID = DUP_EDGE.EDGE_ID
The Common Table Expression creates a 'virtual' table of distinct EDGE_ID values that are in the table(s) more than once.
This is used to join to each table to get the rows that have any of these duplicate values. To know which table the row comes from a 'label' TAB_NAM is added to the Select.
|
|

03-26-11, 10:52
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,195
|
|
|
|
Here is another example.
But, this may not perform well as Stealth_DBA's example,
because it is necessary to join all 32 tables at once and to specify 32*(32-1)/2 = 496 ORed conditions.
Code:
SELECT
COALESCE(t01.edge_id , t01.edge_id , ... , t31.edge_id) /* t32.edge_id is not necessary */
AS edge_id
, CASE WHEN t01.edge_id IS NOT NULL THEN 'Y' ELSE '.' END
|| CASE WHEN t02.edge_id IS NOT NULL THEN 'Y' ELSE '.' END
...
|| CASE WHEN t32.edge_id IS NOT NULL THEN 'Y' ELSE '.' END
AS duplicates
FROM
delhi_road t01
FULL OUTER JOIN
haryana_road t02
ON t02.edge_id = t01.edge_id
FULL OUTER JOIN
mh_road t03
ON t03.edge_id = COALESCE(t01.edge_id , t02.edge_id)
...
FULL OUTER JOIN
zzzz_road t32
ON t32.edge_id = COALESCE(t01.edge_id , t02.edge_id , ... , t31.edge_id)
WHERE
/*
number of ORed conditiones are
32 * (32 - 1) / 2 = 496
*/
t01.edge_id = t02.edge_id
OR t01.edge_id = t03.edge_id
...
OR t01.edge_id = t32.edge_id
OR t02.edge_id = t03.edge_id
OR t02.edge_id = t04.edge_id
...
OR t02.edge_id = t32.edge_id
OR t03.edge_id = t04.edge_id
OR t03.edge_id = t05.edge_id
...
OR t03.edge_id = t32.edge_id
...
...
OR t30.edge_id = t31.edge_id
OR t30.edge_id = t32.edge_id
OR t31.edge_id = t32.edge_id
/*
ORDER BY
edge_id
*/
;
This is a sample output of 4 tables:
Code:
------------------------------ Commands Entered ------------------------------
WITH
delhi_road (edge_id) AS (
VALUES 0,1,2,3,4,5,6,7 ,21
)
, haryana_road(edge_id) AS (
VALUES 0,1,2,3,8,9,10,11 ,22
)
, mh_road (edge_id) AS (
VALUES 0,1,4,5,8,9,12,13 ,23
)
, east_road (edge_id) AS (
VALUES 0,2,4,6,8,10,12,14 ,24
)
SELECT
COALESCE(t01.edge_id , t01.edge_id , t02.edge_id , t03.edge_id)
AS edge_id
, ' '
|| CASE WHEN t01.edge_id IS NOT NULL THEN 'Y' ELSE '.' END
|| CASE WHEN t02.edge_id IS NOT NULL THEN 'Y' ELSE '.' END
|| CASE WHEN t03.edge_id IS NOT NULL THEN 'Y' ELSE '.' END
|| CASE WHEN t04.edge_id IS NOT NULL THEN 'Y' ELSE '.' END
AS duplicates
, SMALLINT(t01.edge_id) AS id01
, SMALLINT(t02.edge_id) AS id02
, SMALLINT(t03.edge_id) AS id03
, SMALLINT(t04.edge_id) AS id04
FROM
delhi_road t01
FULL OUTER JOIN
haryana_road t02
ON t02.edge_id = t01.edge_id
FULL OUTER JOIN
mh_road t03
ON t03.edge_id = COALESCE(t01.edge_id , t02.edge_id)
FULL OUTER JOIN
east_road t04
ON t04.edge_id = COALESCE(t01.edge_id , t02.edge_id , t03.edge_id)
WHERE
t01.edge_id = t02.edge_id
OR t01.edge_id = t03.edge_id
OR t01.edge_id = t04.edge_id
OR t02.edge_id = t03.edge_id
OR t02.edge_id = t04.edge_id
OR t03.edge_id = t04.edge_id
ORDER BY
edge_id
;
------------------------------------------------------------------------------
EDGE_ID DUPLICATES ID01 ID02 ID03 ID04
----------- ---------- ------ ------ ------ ------
0 YYYY 0 0 0 0
1 YYY. 1 1 1 -
2 YY.Y 2 2 - 2
3 YY.. 3 3 - -
4 Y.YY 4 - 4 4
5 Y.Y. 5 - 5 -
6 Y..Y 6 - - 6
8 .YYY - 8 8 8
9 .YY. - 9 9 -
10 .Y.Y - 10 - 10
12 ..YY - - 12 12
11 record(s) selected.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|