Welcome to the dBforums forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact contact support.

If you prefer not to see double-underlined words and corresponding ads, place your cursor
here for ContentLink opt out.

Go Back  dBforums > Database Server Software > Oracle > DISTINCT causes aggregate to double in value

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-01-08, 16:48
rtnetworks rtnetworks is offline
Registered User
 
Join Date: May 2004
Posts: 168
DISTINCT causes aggregate to double in value

Happy New Year everyone.

I have a fairly complex query that aggregates several subqueries, views and tables. However when I add the DISTINCT modifer to the outside select statement, all of the aggregated values double in value, i.e. returns a result of 4 instead of 2.

I've looked for ambiguous field names that might impact the group by clause, but nothing makes sense as to why this is happening. I should also point out that no matter whether I use distinct or not, the query and subquery only return one row.

Rather than posting the very long query, below is a query that mimics what I'm doing:

Code:
select DISTINCT id, sum(enrollmentCount) from ( select id,enrollmentCount from myView...) //view uses aggregate COUNT(*) to calc enrollmentCount group by id;

Adding the DISTINCT modifier to the sub query doesn't change the result, only adding it to the outer query.

I would have expected distinct to reduce my aggregate values, not increase them.

Has anyone ever seen this before? Any suggestions or thoughts of what to look for?

Thanks.
Reply With Quote
  #2 (permalink)  
Old 01-01-08, 16:56
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
i don't think that's possible with what you've shown us

GROUP BY id produces one row per id, so no matter what sum(enrollmentCount) is, there will only be one row per id, so DISTINCT is useless and should not change the results

of course, there are two factors here:

1 what you posted isn't what you've got
2 there might be a bug in the database management system

guess which one i think it is
__________________
r937.com | rudy.ca

pre-order my book Simply SQL from Amazon
Reply With Quote
  #3 (permalink)  
Old 01-01-08, 17:08
rtnetworks rtnetworks is offline
Registered User
 
Join Date: May 2004
Posts: 168
I have to agree with you, that it's probably somewhere in my code and that DISTINCT should NOT impact a group by clause nor should it need to be in the statement. Still, it's weird.

Thanks.
Reply With Quote
  #4 (permalink)  
Old 01-01-08, 17:18
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
can you show a small subset of data that illustrates the problem?

have you tracked down the ids?
__________________
r937.com | rudy.ca

pre-order my book Simply SQL from Amazon
Reply With Quote
  #5 (permalink)  
Old 01-01-08, 18:18
rtnetworks rtnetworks is offline
Registered User
 
Join Date: May 2004
Posts: 168
Rudy,

The data comes from too many tables/subqueries to actually be able to clearly display it here, but thank you. However, let me try to at least provide something...

The view is retreiving counts from 9 separate LEFT joins, that are then being passed through DECODE to set their values to zero if null. The view is then dynamically joined again with 3 more LEFT joins and a WHERE clause is added to limit the records. Lastly all of the above is wrapped in an outer query which is where I'm trying to discover the problem.

Yes, I've gone down through the layers of the query/subqueries/views to make sure that the ID field is fully qualified with the table/alias to remove any ambiguity if that is what you mean by 'have you tracked down the ids'. I've also tried to look at the subqueries to see what might return more than one row to cause the problem. Unfortunately, almost all of my table use ID as their pk so this could be being picked up from any number of tables.

Where I suspect that problem to be coming from is the 9 LEFT joins and that the Oracle parser is picking up a NULL from one of these records and causing the problem. This is just a shot in the dark though.
Reply With Quote
  #6 (permalink)  
Old 01-01-08, 18:39
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
wow, no offence, but that's a mess

i have been in similar situations and it's no fun

try picking a small number of ids and tracking them through the subqueries and views individually, either by searching within reports or using WHERE clauses

if you are familiar with the totals that they should be producing, it should eventually tell you where the doubling is occurring

i often use temp tables going forward from the lowermost detailed data, then do the outer joins to lookup tables, save again, run an accumulation, save again, and so on
__________________
r937.com | rudy.ca

pre-order my book Simply SQL from Amazon
Reply With Quote
  #7 (permalink)  
Old 01-01-08, 18:44
rtnetworks rtnetworks is offline
Registered User
 
Join Date: May 2004
Posts: 168
Rudy,

Thanks. I was thinking of creating a tmp table from the view and testing with that, so thanks for the prodding. In this case, I do know the ID number as I chose it for the testing, so now I guess it's just time to spend some more time in the guts of the thing.

Thanks for helping me think this through.

Robert
Reply With Quote
  #8 (permalink)  
Old 01-01-08, 19:22
rtnetworks rtnetworks is offline
Registered User
 
Join Date: May 2004
Posts: 168
Okay, I've been able to narrow this down by stripping out the excess and using a tmp table (tmpDebug) instead of the underlying view.

The problem is in the EXISTS clause. I know that my EXISTS clause contains two matches from the domainMembership table. (The domainmembership table is the middle table in a many to many join between the course and other tables.) The correct value for the sum(enrolledCount) is supposed to be 2. But using the EXISTS clause with two matching records, returns 4. If there are 3 records in the domainMembership table, then the query is returning 6.

But, just because a record is found more than one time in the EXIST subquery clause, why would this impact the exterior aggregates - once for each record contained in the EXISTS subquery? This is acting more like a JOIN than an EXISTS criteria. I even tried adding DISTINCT to the where subquery, and that didn't help.

Code:
select distinct id, courseTitle, sum(enrolledCount) as enrolled from (SELECT id, courseTitle, synchenrollmentCount + asynchEnrollmentCount as enrolledCount FROM tmpDebug fcl where UPPER(coursetitle) LIKE '%DISOLVE%' AND status = 'A' AND (session_status IN ('C') OR session_status is null)) rep WHERE EXISTS (SELECT entityID FROM DomainMembership DM, DomainMap MAP WHERE DM.entityTypeID = 4 AND DM.domainID = MAP.childID AND (MAP.parentID IN (1)) AND DM.entityID = REP.ID) group by courseTitle, id

Thanks.
Reply With Quote
  #9 (permalink)  
Old 01-01-08, 19:56
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
try this and see what it returns --
Code:
SELECT id , count(*) FROM tmpDebug fcl where UPPER(coursetitle) LIKE '%DISOLVE%' AND status = 'A' AND (session_status IN ('C') OR session_status is null) GROUP BY id HAVING count(*) > 1
see if those ids correlate to your dupes
__________________
r937.com | rudy.ca

pre-order my book Simply SQL from Amazon
Reply With Quote
  #10 (permalink)  
Old 01-01-08, 20:31
rtnetworks rtnetworks is offline
Registered User
 
Join Date: May 2004
Posts: 168
Rudy,

the result as written is null. Without the having clause, it returns just one record with a count of one for the result.

Thanks.
Reply With Quote
  #11 (permalink)  
Old 01-01-08, 21:09
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
um, you have only one id in your temp table?

i am confused again
__________________
r937.com | rudy.ca

pre-order my book Simply SQL from Amazon
Reply With Quote
  #12 (permalink)  
Old 01-02-08, 00:33
rtnetworks rtnetworks is offline
Registered User
 
Join Date: May 2004
Posts: 168
Rudy,

Sorry for the confusion. I have only one row that matches the criteria of:
Code:
UPPER(coursetitle) LIKE '%DISOLVE%' AND status = 'A' AND (session_status IN ('C') OR session_status is null)

However,
Code:
SELECT id ,status,session_status, count(*) FROM tmpDebug fcl where UPPER(coursetitle) LIKE '%DISOLVE%' GROUP BY id,status,session_status
returns.

RowID ID STATUS SESSION_STATUS COUNT(*)
1 1001 A A 46
2 1001 A C 1

Thanks.
Reply With Quote
  #13 (permalink)  
Old 01-02-08, 08:27
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 13,556
hmmm...

still no idea why it's duplicating the sums in the outer query...
__________________
r937.com | rudy.ca

pre-order my book Simply SQL from Amazon
Reply With Quote
  #14 (permalink)  
Old 01-02-08, 11:17
rtnetworks rtnetworks is offline
Registered User
 
Join Date: May 2004
Posts: 168
Okay, I was able to reproduce this in scott/tiger...

Note the difference in the aggregates between these two queries.

Code:
select deptno,sal from ( select deptno,sum(sal) as sal from emp emp1 where exists (select * from emp emp2 where emp1.deptno=emp2.deptno) group by deptno )

Generates
DEPTNO SAL
1 10 8750
2 20 10875
3 30 9400

but

Code:
select DISTINCT deptno,sal from ( select deptno,sum(sal) as sal from emp emp1 where exists (select * from emp emp2 where emp1.deptno=emp2.deptno) group by deptno )

DEPTNO SAL
1 10 26250
2 20 54375
3 30 56400

Consider the following deptno counts in emp:
Code:
select deptno,count(*) from emp group by deptno
Returns:
DEPTNO COUNT(*)
1 10 3
2 20 5
3 30 6

There is obviously something that Oracle does differently with how it handles WHERE clauses in combination with DISTINCT and aggregates one value per record found in the WHERE clause.



Thanks.
Reply With Quote
  #15 (permalink)  
Old 01-02-08, 16:34
Littlefoot Littlefoot is online now
Lost Boy
 
Join Date: Jan 2004
Location: Croatia, Europe
Posts: 2,717
... and your database version is ...?
Code:
SQL> select banner from v$version; BANNER ---------------------------------------------------------------- Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod PL/SQL Release 10.2.0.1.0 - Production CORE 10.2.0.1.0 Production TNS for 32-bit Windows: Version 10.2.0.1.0 - Production NLSRTL Version 10.2.0.1.0 - Production SQL> -- the first query: SQL> SQL> select deptno,sal 2 from ( 3 select deptno,sum(sal) as sal 4 from emp emp1 5 where exists 6 (select * from emp emp2 7 where emp1.deptno=emp2.deptno) 8 group by deptno ) 9 order by deptno; DEPTNO SAL ---------- ---------- 10 8750 20 10875 30 1250 40 8150 SQL>
Code:
SQL> -- the second query: SQL> SQL> select DISTINCT deptno,sal 2 from ( 3 select deptno,sum(sal) as sal 4 from emp emp1 5 where exists 6 (select * from emp emp2 7 where emp1.deptno=emp2.deptno) 8 group by deptno ) 9 order by deptno; DEPTNO SAL ---------- ---------- 10 8750 20 10875 30 1250 40 8150 SQL>
Unless I did something wrong, it seems to work correctly on 10g.
Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On