| |
Welcome to the dBforums forums.
You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!
If you have any problems with the registration process or your account login, please contact contact support.
If you prefer not to see double-underlined words and corresponding ads, place your cursor here for ContentLink opt out.
|
 |
|

01-01-08, 16:48
|
|
Registered User
|
|
Join Date: May 2004
Posts: 168
|
|
|
DISTINCT causes aggregate to double in value
|
Happy New Year everyone.
I have a fairly complex query that aggregates several subqueries, views and tables. However when I add the DISTINCT modifer to the outside select statement, all of the aggregated values double in value, i.e. returns a result of 4 instead of 2.
I've looked for ambiguous field names that might impact the group by clause, but nothing makes sense as to why this is happening. I should also point out that no matter whether I use distinct or not, the query and subquery only return one row.
Rather than posting the very long query, below is a query that mimics what I'm doing:
Code:
select DISTINCT id, sum(enrollmentCount)
from (
select id,enrollmentCount
from myView...) //view uses aggregate COUNT(*) to calc enrollmentCount
group by id;
Adding the DISTINCT modifier to the sub query doesn't change the result, only adding it to the outer query.
I would have expected distinct to reduce my aggregate values, not increase them.
Has anyone ever seen this before? Any suggestions or thoughts of what to look for?
Thanks.
|
|

01-01-08, 16:56
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
|
|
i don't think that's possible with what you've shown us
GROUP BY id produces one row per id, so no matter what sum(enrollmentCount) is, there will only be one row per id, so DISTINCT is useless and should not change the results
of course, there are two factors here:
1 what you posted isn't what you've got
2 there might be a bug in the database management system
guess which one i think it is 
|
|

01-01-08, 17:08
|
|
Registered User
|
|
Join Date: May 2004
Posts: 168
|
|
|
I have to agree with you, that it's probably somewhere in my code and that DISTINCT should NOT impact a group by clause nor should it need to be in the statement. Still, it's weird.
Thanks.
|
|

01-01-08, 17:18
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
|
|
can you show a small subset of data that illustrates the problem?
have you tracked down the ids?
|
|

01-01-08, 18:18
|
|
Registered User
|
|
Join Date: May 2004
Posts: 168
|
|
Rudy,
The data comes from too many tables/subqueries to actually be able to clearly display it here, but thank you. However, let me try to at least provide something...
The view is retreiving counts from 9 separate LEFT joins, that are then being passed through DECODE to set their values to zero if null. The view is then dynamically joined again with 3 more LEFT joins and a WHERE clause is added to limit the records. Lastly all of the above is wrapped in an outer query which is where I'm trying to discover the problem.
Yes, I've gone down through the layers of the query/subqueries/views to make sure that the ID field is fully qualified with the table/alias to remove any ambiguity if that is what you mean by 'have you tracked down the ids'. I've also tried to look at the subqueries to see what might return more than one row to cause the problem. Unfortunately, almost all of my table use ID as their pk so this could be being picked up from any number of tables.
Where I suspect that problem to be coming from is the 9 LEFT joins and that the Oracle parser is picking up a NULL from one of these records and causing the problem. This is just a shot in the dark though.
|
|

01-01-08, 18:39
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
|
|
wow, no offence, but that's a mess
i have been in similar situations and it's no fun
try picking a small number of ids and tracking them through the subqueries and views individually, either by searching within reports or using WHERE clauses
if you are familiar with the totals that they should be producing, it should eventually tell you where the doubling is occurring
i often use temp tables going forward from the lowermost detailed data, then do the outer joins to lookup tables, save again, run an accumulation, save again, and so on
|
|

01-01-08, 18:44
|
|
Registered User
|
|
Join Date: May 2004
Posts: 168
|
|
Rudy,
Thanks. I was thinking of creating a tmp table from the view and testing with that, so thanks for the prodding. In this case, I do know the ID number as I chose it for the testing, so now I guess it's just time to spend some more time in the guts of the thing.
Thanks for helping me think this through.
Robert
|
|

01-01-08, 19:22
|
|
Registered User
|
|
Join Date: May 2004
Posts: 168
|
|
Okay, I've been able to narrow this down by stripping out the excess and using a tmp table (tmpDebug) instead of the underlying view.
The problem is in the EXISTS clause. I know that my EXISTS clause contains two matches from the domainMembership table. (The domainmembership table is the middle table in a many to many join between the course and other tables.) The correct value for the sum(enrolledCount) is supposed to be 2. But using the EXISTS clause with two matching records, returns 4. If there are 3 records in the domainMembership table, then the query is returning 6.
But, just because a record is found more than one time in the EXIST subquery clause, why would this impact the exterior aggregates - once for each record contained in the EXISTS subquery? This is acting more like a JOIN than an EXISTS criteria. I even tried adding DISTINCT to the where subquery, and that didn't help.
Code:
select distinct id,
courseTitle,
sum(enrolledCount) as enrolled
from (SELECT id,
courseTitle,
synchenrollmentCount + asynchEnrollmentCount as enrolledCount
FROM tmpDebug fcl
where UPPER(coursetitle) LIKE '%DISOLVE%'
AND status = 'A'
AND (session_status IN ('C') OR session_status is null)) rep
WHERE EXISTS (SELECT entityID
FROM DomainMembership DM, DomainMap MAP
WHERE DM.entityTypeID = 4
AND DM.domainID = MAP.childID
AND (MAP.parentID IN (1))
AND DM.entityID = REP.ID)
group by courseTitle,
id
Thanks.
|
|

01-01-08, 19:56
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
|
|
try this and see what it returns --
Code:
SELECT id
, count(*)
FROM tmpDebug fcl
where UPPER(coursetitle) LIKE '%DISOLVE%'
AND status = 'A'
AND (session_status IN ('C') OR session_status is null)
GROUP
BY id
HAVING count(*) > 1
see if those ids correlate to your dupes
|
|

01-01-08, 20:31
|
|
Registered User
|
|
Join Date: May 2004
Posts: 168
|
|
Rudy,
the result as written is null. Without the having clause, it returns just one record with a count of one for the result.
Thanks.
|
|

01-01-08, 21:09
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
|
|
um, you have only one id in your temp table?
i am confused again
|
|

01-02-08, 00:33
|
|
Registered User
|
|
Join Date: May 2004
Posts: 168
|
|
Rudy,
Sorry for the confusion. I have only one row that matches the criteria of:
Code:
UPPER(coursetitle) LIKE '%DISOLVE%'
AND status = 'A'
AND (session_status IN ('C') OR session_status is null)
However,
Code:
SELECT id
,status,session_status, count(*)
FROM tmpDebug fcl
where UPPER(coursetitle) LIKE '%DISOLVE%'
GROUP
BY id,status,session_status
returns.
RowID ID STATUS SESSION_STATUS COUNT(*)
1 1001 A A 46
2 1001 A C 1
Thanks.
|
|

01-02-08, 08:27
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
|
|
hmmm...
still no idea why it's duplicating the sums in the outer query...
|
|

01-02-08, 11:17
|
|
Registered User
|
|
Join Date: May 2004
Posts: 168
|
|
Okay, I was able to reproduce this in scott/tiger...
Note the difference in the aggregates between these two queries.
Code:
select deptno,sal
from (
select deptno,sum(sal) as sal
from emp emp1
where exists
(select * from emp emp2
where emp1.deptno=emp2.deptno)
group by deptno )
Generates
DEPTNO SAL
1 10 8750
2 20 10875
3 30 9400
but
Code:
select DISTINCT deptno,sal
from (
select deptno,sum(sal) as sal
from emp emp1
where exists
(select * from emp emp2
where emp1.deptno=emp2.deptno)
group by deptno )
DEPTNO SAL
1 10 26250
2 20 54375
3 30 56400
Consider the following deptno counts in emp:
Code:
select deptno,count(*)
from emp
group by deptno
Returns:
DEPTNO COUNT(*)
1 10 3
2 20 5
3 30 6
There is obviously something that Oracle does differently with how it handles WHERE clauses in combination with DISTINCT and aggregates one value per record found in the WHERE clause.
Thanks.
|
|

01-02-08, 16:34
|
|
Lost Boy
|
|
Join Date: Jan 2004
Location: Croatia, Europe
Posts: 2,717
|
|
... and your database version is ...?
Code:
SQL> select banner from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
SQL> -- the first query:
SQL>
SQL> select deptno,sal
2 from (
3 select deptno,sum(sal) as sal
4 from emp emp1
5 where exists
6 (select * from emp emp2
7 where emp1.deptno=emp2.deptno)
8 group by deptno )
9 order by deptno;
DEPTNO SAL
---------- ----------
10 8750
20 10875
30 1250
40 8150
SQL>
Code:
SQL> -- the second query:
SQL>
SQL> select DISTINCT deptno,sal
2 from (
3 select deptno,sum(sal) as sal
4 from emp emp1
5 where exists
6 (select * from emp emp2
7 where emp1.deptno=emp2.deptno)
8 group by deptno )
9 order by deptno;
DEPTNO SAL
---------- ----------
10 8750
20 10875
30 1250
40 8150
SQL>
Unless I did something wrong, it seems to work correctly on 10g.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|