# Thread: Counting customers that are in multiple groups

1. Registered User
Join Date
Oct 2010
Location
Atlanta, GA
Posts
213

## Unanswered: Counting customers that are in multiple groups

I have a large table that tells me customers and mailings (~4 million rows). So for each customer, I see what mailing(s) they received. It is at the mailing level so if a customer received 2 mailings, their will be 2 records for that customer number, one for each mailing they received. There are 3 different mailings.

I am trying to calculate the count of customers that received all 3, as well as those that received 1&2 only, 1&3 only, and 2&3 only. For example:

CustNo Mailing
Tim 1
Tim 3
Dan 1
Dan 3
Jane 1
Max 1
Max 2
Max 3
Lauren 2
Lauren 3

Results:

All 3 - 1
1&2 only - 0
1&3 only - 2
2&3 only - 1

I have tried doing some funky table joining and EXISTS, but the query runs and runs because of how much data it has to sift through. I was wondering if there is a trick to this that i am not thinking of.

Here is the code I am using to find Mailing 1&2 only customers, but I am not even sure it is getting me the right numbers. There are 2 different types of customers as well which is why I have custtype in there.

Code:
```select a.custtype,count(distinct a.custno)
from myTable a
inner join (select distinct custno from mytable where [mailing]=2) b
on a.custno=b.custno
where a.[mailing]=1
and not exists(select * from myTable c where a.custno=c.custno and c.[mailing]=3)
group by a.custtype```

2. World Class Flame Warrior
Join Date
Jun 2003
Location
Ohio
Posts
12,595

Code:
```;with TestData as
(select	'Tim' as CustNo, 1 as Mailing
union all select 'Tim' as CustNo, 3 as Mailing
union all select 'Dan' as CustNo, 1 as Mailing
union all select 'Dan' as CustNo, 3 as Mailing
union all select 'Jane' as CustNo, 1 as Mailing
union all select 'Max' as CustNo, 1 as Mailing
union all select 'Max' as CustNo, 2 as Mailing
union all select 'Max' as CustNo, 3 as Mailing
union all select 'Lauren' as CustNo, 2 as Mailing
union all select 'Lauren' as CustNo, 3 as Mailing)
select	*
into	#TestData
from	TestData

;with
BitSums as
(select	'1 only' as Mailing, 2 as BitSum
union select '2 only' as Mailing, 4 as BitSum
union select '3 only' as Mailing, 8 as BitSum
union select '1 and 2' as Mailing, 6 as BitSum
union select '1 and 3' as Mailing, 10 as BitSum
union select '2 and 3' as Mailing, 12 as BitSum
union select 'All 3' as Mailing, 14 as BitSum),
CustBitSums as
(select	CustNo,
sum(POWER(2, Mailing)) as BitSum
from	#TestData
group by CustNo)
select	BitSums.Mailing,
COUNT(CustNo) as CustCount
from	BitSums
left outer join CustBitSums on BitSums.BitSum = CustBitSums.BitSum
group by BitSums.Mailing

drop table #TestData```

3. Registered User
Join Date
Oct 2010
Location
Atlanta, GA
Posts
213
oh wow. i am still trying to wrap my head around this one. looks like the BitSum is sum(2^mailing#). Why/How does this work?

4. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
You are correct at least in a sense... The BitSum value is a computed bitmap.

This isn't a whole lot prettier, but it ought to perform well enough on your SQL 2008 table with a few million rows:
Code:
```CREATE TABLE #clawlan (
CustNo       VARCHAR(9)
,  Mailing      INT
)

INSERT INTO #clawlan (CustNo, Mailing)
VALUES ('Tim', 1) ,  ('Tim', 3),  ('Dan', 1)
,  ('Dan', 3),  ('Jane', 1),  ('Max', 1)
,  ('Max', 2),  ('Max', 3),  ('Lauren', 2)
,  ('Lauren', 3)

SELECT u.UniqueCust, a1.Mailing AS '1', a2.Mailing AS '2', a3.Mailing AS '3'
FROM (SELECT DISTINCT CustNo as UniqueCust
FROM #clawlan) AS u
LEFT OUTER JOIN #clawlan AS a1 ON (1 = a1.Mailing AND a1.CustNo = u.UniqueCust)
LEFT OUTER JOIN #clawlan AS a2 ON (2 = a2.Mailing AND a2.CustNo = u.UniqueCust)
LEFT OUTER JOIN #clawlan AS a3 ON (3 = a3.Mailing AND a3.CustNo = u.UniqueCust)

SELECT CustNo, [1], [2], [3]
FROM
(SELECT CustNo, Mailing
FROM #clawlan) AS SourceTable
PIVOT
(
Count(Mailing)
FOR Mailing IN ([0], [1], [2], [3])
) AS PivotTable;

DROP TABLE #clawlan```
-PatP
Last edited by Pat Phelan; 02-19-13 at 18:29. Reason: Added PIVOT example

5. Registered User
Join Date
Oct 2010
Location
Atlanta, GA
Posts
213
thank you both.

6. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
For what it is worth, blindman's BitSum is predicated on the assumption that there is only 1 row for a given customer/mailing combination... It goes horribly wrong if there are duplicates!

My first example (with the LEFT JOIN) goes wiggy, but in a carefully controlled and predictable manner that works optimally for some kinds of processing.

My second example (with the PIVOT) works more conventionally, in that it always produces one row for any given CustNo value, but sometimes generates a count other than 0 or 1.

-PatP

7. World Class Flame Warrior
Join Date
Jun 2003
Location
Ohio
Posts
12,595
True! I assumed a natural key between CustNo and MailingList.
But, if some customers received a mailling more than once this could be handled by a DISTINCT clause.

8. Registered User
Join Date
Jan 2013
Posts
359
Relational division is one of the eight basic operations in Codd's relational algebra. The idea is that a divisor table is used to partition a dividend table and produce a quotient or results table. The quotient table is made up of those values of one column for which a second column had all of the values in the divisor. There is a really good presentation on four ways to do this at:

http://www.cs.arizona.edu/people/mcc...esentation.pdf

This is easier to explain with an example. We have a table of customer_ids and the mailing_nbrs they can receive (dividend); we have a table of mailing_nbrs in the Mailings (divisor); we want the names of the customer_ids who can receive every mailing_nbr (quotient) in the Mailings. To get this result, we divide the Customer_Mailings table by the mailing_nbrs in the Mailings.

CREATE TABLE Customer_Mailings
(customer_id CHAR(15) NOT NULL,
mailing_nbr CHAR(15) NOT NULL,
PRIMARY KEY (customer_id, mailing_nbr));

CREATE TABLE Mailings
(mailing_nbr CHAR(15) NOT NULL PRIMARY KEY);

The important characteristic of a relational division is that the CROSS JOIN (Cartesian product) of the divisor and the quotient produces a valid subset of rows from the dividend. This is where the name comes from, since the CROSS JOIN acts like a multiplication operator.

Division with a Remainder

There are two kinds of relational division. Division with a remainder allows the dividend table to have more values than the divisor, which was Codd's original definition. For example, if a customer can receive more mailings than just those we have in the Mailings, this is fine with us.

SELECT DISTINCT customer_id -- or COUNT(DISTINCT customer_id)
FROM Customer_Mailings AS CM1
WHERE NOT EXISTS
(SELECT *
FROM Mailings
WHERE NOT EXISTS
(SELECT *
FROM Customer_Mailings AS CM2
WHERE (CM1.customer_id = CM2.customer_id)
AND (CM2.mailing_nbr = Mailings.mailing_nbr)));

This query for relational division was made popular by Chris Date in his textbooks, but it is not the only method nor always the fastest. Another version of the division can be written so as to avoid three levels of nesting. While it is not original with me, I have made it popular in my books.

SELECT CM1.customer_id
FROM Customer_Mailings AS CM1, Mailings AS M1
WHERE CM1.mailing_nbr = M1.mailing_nbr
GROUP BY CM1.customer_id
HAVING COUNT(CM1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Mailings);

There is a serious difference in the two methods. Truncate the Mailings, so that the divisor is empty. Because of the NOT EXISTS() predicates in Date's query, all customer_ids are returned from a division by an empty set. Because of the COUNT() functions in my query, no customer_ids are returned from a division by an empty set.

Exact Division

The second kind of relational division is exact relational division. The dividend table must match exactly to the values of the divisor without any extra values.

SELECT CM1.customer_id
FROM Customer_Mailings AS CM1
LEFT OUTER JOIN
Mailings AS M1
ON CM1.mailing_nbr = M1.mailing_nbr
GROUP BY CM1.customer_id
HAVING COUNT(CM1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Mailings)
AND COUNT(M1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Mailings);

This says that a customer_id must have the same number of certificates as there mailing_nbrs in the Mailings and these all match to a mailing_nbr in the Mailings, not something else. The "something else" is shown by a created NULL from the LEFT OUTER JOIN.

Note on Performance

The Winter 1996 edition of DB2 ON-LINE MAGAZINE (IBM Data magazine) had an article entitled "Powerful SQL:Beyond the Basics" by Sheryl Larsen which gave the results of testing both methods. Her conclusion for DB2 at that time was that the nested EXISTS() version is better when the quotient has less than 25% of the dividend table's rows and the COUNT(*) version is better when the quotient is more than 25% of the dividend table.

9. Registered User
Join Date
Aug 2004
Location
Dallas, Texas
Posts
831
Scary enough, but I'm starting to understand Joe.

10. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
Originally Posted by corncrowe
Scary enough, but I'm starting to understand Joe.
Joe has earned a reputation for knowing a LOT about the mathematical fundamentals of databases (aka relational algebra and its kin).

Like everyone I've met that is wildly gifted, Joe sometimes has problems presenting what he knows to us mere mortals... If you consider the problems that you have dealing with database newbies that see databases and spreadsheets in the same light, then imagine what it would be like if you saw datbase questions from Joe's more advanced perspective then you get an idea of the communication problem that Joe faces.

Nobody is perfect, we all have flaws (except for me of course). Joe is working on improving how he is perceived, and he has made HUGE strides in that area. It is especially fun for me to watch what happens when people like blindman and Joe collide, where each is an ackowledged master of different skills and both are equally stubborn... The collision is always noisy and messy, but it is both great fun to watch and truly educating for those willing to dig into it and smart enough to avoid the pitfalls!

-PatP

11. World Class Flame Warrior
Join Date
Jun 2003
Location
Ohio
Posts
12,595
Uhmm....where in all that do you actually perform the query that was requested by the original poster?

12. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
Originally Posted by blindman
Uhmm....where in all that do you actually perform the query that was requested by the original poster?
If you're asking about Joe's post I think that he was showing HOW to do the task instead of providing a complete block of code ready to execute. This is actually a better way to answer questions from my perspective because it demonstrates without doing, so that the original poster can adapt the answer to their own problem. This neatly sidesteps the issues that we often have with unstated requirements from the original post!

-PatP

13. World Class Flame Warrior
Join Date
Jun 2003
Location
Ohio
Posts
12,595
Pat, his code does not show "how" to do the task.
His code shows how to do a cross-join, which is not sufficient for solving the OP's issue.

I provided a method using bitmasks.
You provided a method using UNIONs.
If Joe would like to provide a solution using CROSS JOIN, then that would appropriate.

Otherwise, Joe is just continuing his pattern of swooping into a thread after it has been answered in order to perform one or more of the following:
1) Tell the poster that he did not supply enough information to answer his question.
2) Berate the poster for not following his dogmatic naming standards and lecture them on forum etiquette.
3) Go off on a tangent in order to demonstrate his knowledge. Forgive me, but I feel this is simply using the forum for pimping.

I've been watching him do this time after time after time.

I'd have welcomed his above post as a stand-alone "article" in the Database Concepts and Design subforum. But in its present context the poster is likely to spend considerable time trying to decipher Joe's industry vernacular under the misunderstanding that it actually applies to his immediate problem.

14. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
Ok, granted that I'm violating my own suggestion from above about not posting a cut-and-paste, but I think that is the quickest way to make this clearer.
Code:
```--  Create tables

CREATE TABLE Customer_Mailings
(  customer_id  CHAR(15)    NOT NULL
,  mailing_nbr  CHAR(15)    NOT NULL,
PRIMARY KEY (customer_id, mailing_nbr)
);

CREATE TABLE Mailings
(  mailing_nbr  CHAR(15)      NOT NULL
PRIMARY KEY
);

--  Populate tables

INSERT INTO Customer_Mailings (customer_id, mailing_nbr)
VALUES ('Tim', '1') ,  ('Tim', '3'),  ('Dan', '1')
,  ('Dan', '3'),  ('Jane', '1'),  ('Max', '1')
,  ('Max', '2'),  ('Max', '3'),  ('Lauren', '2')
,  ('Lauren', '3')

INSERT INTO Mailings (mailing_nbr)
SELECT DISTINCT mailing_nbr
FROM Customer_Mailings

--  Standard relational division (more or less copied from Joe's posts

SELECT DISTINCT customer_id -- or COUNT(DISTINCT customer_id)
FROM Customer_Mailings AS CM1
WHERE NOT EXISTS
(SELECT *
FROM Mailings
WHERE NOT EXISTS
(SELECT *
FROM Customer_Mailings AS CM2
WHERE  (CM2.customer_id = CM1.customer_id)
AND (CM2.mailing_nbr = Mailings.mailing_nbr)
)
);

SELECT CM1.customer_id
FROM Customer_Mailings AS CM1
LEFT OUTER JOIN Mailings AS M1
ON CM1.mailing_nbr = M1.mailing_nbr
GROUP BY CM1.customer_id
HAVING COUNT(CM1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Mailings)
AND COUNT(M1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Mailings);

--  Enhanced to address blindman's concerns

CREATE VIEW Interesting_Mailings
AS SELECT mailing_nbr
FROM Mailings
WHERE  mailing_nbr IN ('1', '3')

SELECT DISTINCT customer_id -- or COUNT(DISTINCT customer_id)
FROM Customer_Mailings AS CM1
WHERE NOT EXISTS
(SELECT *
FROM Interesting_Mailings
WHERE NOT EXISTS
(SELECT *
FROM Customer_Mailings AS CM2
WHERE  (CM2.customer_id = CM1.customer_id)
AND (CM2.mailing_nbr = Interesting_Mailings.mailing_nbr)
)
);

SELECT CM1.customer_id
FROM Customer_Mailings AS CM1
LEFT OUTER JOIN Interesting_Mailings AS M1
ON CM1.mailing_nbr = M1.mailing_nbr
GROUP BY CM1.customer_id
HAVING COUNT(CM1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Interesting_Mailings)
AND COUNT(M1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Interesting_Mailings);```
-PatP
Last edited by Pat Phelan; 02-21-13 at 14:09. Reason: Fixed a cut-and-paste error of my own!

15. Registered User
Join Date
Aug 2004
Location
Dallas, Texas
Posts
831
Yeah, that helps a lot.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•