Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Oct 2010
    Location
    Atlanta, GA
    Posts
    211
    Provided Answers: 1

    Question Unanswered: Counting customers that are in multiple groups

    I have a large table that tells me customers and mailings (~4 million rows). So for each customer, I see what mailing(s) they received. It is at the mailing level so if a customer received 2 mailings, their will be 2 records for that customer number, one for each mailing they received. There are 3 different mailings.

    I am trying to calculate the count of customers that received all 3, as well as those that received 1&2 only, 1&3 only, and 2&3 only. For example:

    CustNo Mailing
    Tim 1
    Tim 3
    Dan 1
    Dan 3
    Jane 1
    Max 1
    Max 2
    Max 3
    Lauren 2
    Lauren 3


    Results:

    All 3 - 1
    1&2 only - 0
    1&3 only - 2
    2&3 only - 1

    I have tried doing some funky table joining and EXISTS, but the query runs and runs because of how much data it has to sift through. I was wondering if there is a trick to this that i am not thinking of.

    Here is the code I am using to find Mailing 1&2 only customers, but I am not even sure it is getting me the right numbers. There are 2 different types of customers as well which is why I have custtype in there.

    Code:
    select a.custtype,count(distinct a.custno) 
    from myTable a
    inner join (select distinct custno from mytable where [mailing]=2) b
    on a.custno=b.custno
    where a.[mailing]=1
      and not exists(select * from myTable c where a.custno=c.custno and c.[mailing]=3)
    group by a.custtype

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Bitmasks:

    Code:
    ;with TestData as
    		(select	'Tim' as CustNo, 1 as Mailing
    		union all select 'Tim' as CustNo, 3 as Mailing
    		union all select 'Dan' as CustNo, 1 as Mailing
    		union all select 'Dan' as CustNo, 3 as Mailing
    		union all select 'Jane' as CustNo, 1 as Mailing
    		union all select 'Max' as CustNo, 1 as Mailing
    		union all select 'Max' as CustNo, 2 as Mailing
    		union all select 'Max' as CustNo, 3 as Mailing
    		union all select 'Lauren' as CustNo, 2 as Mailing
    		union all select 'Lauren' as CustNo, 3 as Mailing)
    		select	*
    		into	#TestData
    		from	TestData
    
    ;with
    BitSums as
    	(select	'1 only' as Mailing, 2 as BitSum
    	union select '2 only' as Mailing, 4 as BitSum
    	union select '3 only' as Mailing, 8 as BitSum
    	union select '1 and 2' as Mailing, 6 as BitSum
    	union select '1 and 3' as Mailing, 10 as BitSum
    	union select '2 and 3' as Mailing, 12 as BitSum
    	union select 'All 3' as Mailing, 14 as BitSum),
    CustBitSums as
    		(select	CustNo,
    				sum(POWER(2, Mailing)) as BitSum
    		from	#TestData
    		group by CustNo)
    select	BitSums.Mailing,
    		COUNT(CustNo) as CustCount
    from	BitSums
    		left outer join CustBitSums on BitSums.BitSum = CustBitSums.BitSum
    group by BitSums.Mailing
    
    drop table #TestData
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Oct 2010
    Location
    Atlanta, GA
    Posts
    211
    Provided Answers: 1
    oh wow. i am still trying to wrap my head around this one. looks like the BitSum is sum(2^mailing#). Why/How does this work?

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    You are correct at least in a sense... The BitSum value is a computed bitmap.

    This isn't a whole lot prettier, but it ought to perform well enough on your SQL 2008 table with a few million rows:
    Code:
    CREATE TABLE #clawlan (
       CustNo       VARCHAR(9)
    ,  Mailing      INT
       )
    
    INSERT INTO #clawlan (CustNo, Mailing)
       VALUES ('Tim', 1) ,  ('Tim', 3),  ('Dan', 1)
    ,  ('Dan', 3),  ('Jane', 1),  ('Max', 1)
    ,  ('Max', 2),  ('Max', 3),  ('Lauren', 2)
    ,  ('Lauren', 3)
    
    SELECT u.UniqueCust, a1.Mailing AS '1', a2.Mailing AS '2', a3.Mailing AS '3'
       FROM (SELECT DISTINCT CustNo as UniqueCust
          FROM #clawlan) AS u
       LEFT OUTER JOIN #clawlan AS a1 ON (1 = a1.Mailing AND a1.CustNo = u.UniqueCust)
       LEFT OUTER JOIN #clawlan AS a2 ON (2 = a2.Mailing AND a2.CustNo = u.UniqueCust)
       LEFT OUTER JOIN #clawlan AS a3 ON (3 = a3.Mailing AND a3.CustNo = u.UniqueCust)
    
    SELECT CustNo, [1], [2], [3]
    FROM
    (SELECT CustNo, Mailing
        FROM #clawlan) AS SourceTable
    PIVOT
    (
    Count(Mailing)
    FOR Mailing IN ([0], [1], [2], [3])
    ) AS PivotTable;
    
    
    DROP TABLE #clawlan
    -PatP
    Last edited by Pat Phelan; 02-19-13 at 18:29. Reason: Added PIVOT example
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Oct 2010
    Location
    Atlanta, GA
    Posts
    211
    Provided Answers: 1
    thank you both.

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    For what it is worth, blindman's BitSum is predicated on the assumption that there is only 1 row for a given customer/mailing combination... It goes horribly wrong if there are duplicates!

    My first example (with the LEFT JOIN) goes wiggy, but in a carefully controlled and predictable manner that works optimally for some kinds of processing.

    My second example (with the PIVOT) works more conventionally, in that it always produces one row for any given CustNo value, but sometimes generates a count other than 0 or 1.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    True! I assumed a natural key between CustNo and MailingList.
    But, if some customers received a mailling more than once this could be handled by a DISTINCT clause.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  8. #8
    Join Date
    Jan 2013
    Posts
    354
    Provided Answers: 1
    Relational division is one of the eight basic operations in Codd's relational algebra. The idea is that a divisor table is used to partition a dividend table and produce a quotient or results table. The quotient table is made up of those values of one column for which a second column had all of the values in the divisor. There is a really good presentation on four ways to do this at:

    http://www.cs.arizona.edu/people/mcc...esentation.pdf

    This is easier to explain with an example. We have a table of customer_ids and the mailing_nbrs they can receive (dividend); we have a table of mailing_nbrs in the Mailings (divisor); we want the names of the customer_ids who can receive every mailing_nbr (quotient) in the Mailings. To get this result, we divide the Customer_Mailings table by the mailing_nbrs in the Mailings.

    CREATE TABLE Customer_Mailings
    (customer_id CHAR(15) NOT NULL,
    mailing_nbr CHAR(15) NOT NULL,
    PRIMARY KEY (customer_id, mailing_nbr));

    Here is your divisor table

    CREATE TABLE Mailings
    (mailing_nbr CHAR(15) NOT NULL PRIMARY KEY);

    The important characteristic of a relational division is that the CROSS JOIN (Cartesian product) of the divisor and the quotient produces a valid subset of rows from the dividend. This is where the name comes from, since the CROSS JOIN acts like a multiplication operator.

    Division with a Remainder

    There are two kinds of relational division. Division with a remainder allows the dividend table to have more values than the divisor, which was Codd's original definition. For example, if a customer can receive more mailings than just those we have in the Mailings, this is fine with us.

    SELECT DISTINCT customer_id -- or COUNT(DISTINCT customer_id)
    FROM Customer_Mailings AS CM1
    WHERE NOT EXISTS
    (SELECT *
    FROM Mailings
    WHERE NOT EXISTS
    (SELECT *
    FROM Customer_Mailings AS CM2
    WHERE (CM1.customer_id = CM2.customer_id)
    AND (CM2.mailing_nbr = Mailings.mailing_nbr)));

    This query for relational division was made popular by Chris Date in his textbooks, but it is not the only method nor always the fastest. Another version of the division can be written so as to avoid three levels of nesting. While it is not original with me, I have made it popular in my books.

    SELECT CM1.customer_id
    FROM Customer_Mailings AS CM1, Mailings AS M1
    WHERE CM1.mailing_nbr = M1.mailing_nbr
    GROUP BY CM1.customer_id
    HAVING COUNT(CM1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Mailings);

    There is a serious difference in the two methods. Truncate the Mailings, so that the divisor is empty. Because of the NOT EXISTS() predicates in Date's query, all customer_ids are returned from a division by an empty set. Because of the COUNT() functions in my query, no customer_ids are returned from a division by an empty set.

    Exact Division

    The second kind of relational division is exact relational division. The dividend table must match exactly to the values of the divisor without any extra values.

    SELECT CM1.customer_id
    FROM Customer_Mailings AS CM1
    LEFT OUTER JOIN
    Mailings AS M1
    ON CM1.mailing_nbr = M1.mailing_nbr
    GROUP BY CM1.customer_id
    HAVING COUNT(CM1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Mailings)
    AND COUNT(M1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Mailings);

    This says that a customer_id must have the same number of certificates as there mailing_nbrs in the Mailings and these all match to a mailing_nbr in the Mailings, not something else. The "something else" is shown by a created NULL from the LEFT OUTER JOIN.

    Note on Performance

    The Winter 1996 edition of DB2 ON-LINE MAGAZINE (IBM Data magazine) had an article entitled "Powerful SQL:Beyond the Basics" by Sheryl Larsen which gave the results of testing both methods. Her conclusion for DB2 at that time was that the nested EXISTS() version is better when the quotient has less than 25% of the dividend table's rows and the COUNT(*) version is better when the quotient is more than 25% of the dividend table.

  9. #9
    Join Date
    Aug 2004
    Location
    Dallas, Texas
    Posts
    831
    Scary enough, but I'm starting to understand Joe.

  10. #10
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by corncrowe View Post
    Scary enough, but I'm starting to understand Joe.
    Joe has earned a reputation for knowing a LOT about the mathematical fundamentals of databases (aka relational algebra and its kin).

    Like everyone I've met that is wildly gifted, Joe sometimes has problems presenting what he knows to us mere mortals... If you consider the problems that you have dealing with database newbies that see databases and spreadsheets in the same light, then imagine what it would be like if you saw datbase questions from Joe's more advanced perspective then you get an idea of the communication problem that Joe faces.

    Nobody is perfect, we all have flaws (except for me of course). Joe is working on improving how he is perceived, and he has made HUGE strides in that area. It is especially fun for me to watch what happens when people like blindman and Joe collide, where each is an ackowledged master of different skills and both are equally stubborn... The collision is always noisy and messy, but it is both great fun to watch and truly educating for those willing to dig into it and smart enough to avoid the pitfalls!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Uhmm....where in all that do you actually perform the query that was requested by the original poster?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by blindman View Post
    Uhmm....where in all that do you actually perform the query that was requested by the original poster?
    If you're asking about Joe's post I think that he was showing HOW to do the task instead of providing a complete block of code ready to execute. This is actually a better way to answer questions from my perspective because it demonstrates without doing, so that the original poster can adapt the answer to their own problem. This neatly sidesteps the issues that we often have with unstated requirements from the original post!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  13. #13
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Pat, his code does not show "how" to do the task.
    His code shows how to do a cross-join, which is not sufficient for solving the OP's issue.

    I provided a method using bitmasks.
    You provided a method using UNIONs.
    If Joe would like to provide a solution using CROSS JOIN, then that would appropriate.

    Otherwise, Joe is just continuing his pattern of swooping into a thread after it has been answered in order to perform one or more of the following:
    1) Tell the poster that he did not supply enough information to answer his question.
    2) Berate the poster for not following his dogmatic naming standards and lecture them on forum etiquette.
    3) Go off on a tangent in order to demonstrate his knowledge. Forgive me, but I feel this is simply using the forum for pimping.

    I've been watching him do this time after time after time.

    I'd have welcomed his above post as a stand-alone "article" in the Database Concepts and Design subforum. But in its present context the poster is likely to spend considerable time trying to decipher Joe's industry vernacular under the misunderstanding that it actually applies to his immediate problem.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  14. #14
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Ok, granted that I'm violating my own suggestion from above about not posting a cut-and-paste, but I think that is the quickest way to make this clearer.
    Code:
    --  Create tables
    
    CREATE TABLE Customer_Mailings 
    (  customer_id  CHAR(15)    NOT NULL
    ,  mailing_nbr  CHAR(15)    NOT NULL, 
       PRIMARY KEY (customer_id, mailing_nbr)
    );
    
    CREATE TABLE Mailings
    (  mailing_nbr  CHAR(15)      NOT NULL
       PRIMARY KEY
    );
    
    --  Populate tables
    
    INSERT INTO Customer_Mailings (customer_id, mailing_nbr)
       VALUES ('Tim', '1') ,  ('Tim', '3'),  ('Dan', '1')
    ,  ('Dan', '3'),  ('Jane', '1'),  ('Max', '1')
    ,  ('Max', '2'),  ('Max', '3'),  ('Lauren', '2')
    ,  ('Lauren', '3')
    
    INSERT INTO Mailings (mailing_nbr)
       SELECT DISTINCT mailing_nbr
          FROM Customer_Mailings
    
    --  Standard relational division (more or less copied from Joe's posts
    
    SELECT DISTINCT customer_id -- or COUNT(DISTINCT customer_id)
       FROM Customer_Mailings AS CM1
       WHERE NOT EXISTS 
          (SELECT *
             FROM Mailings
             WHERE NOT EXISTS 
                (SELECT *
                   FROM Customer_Mailings AS CM2
                   WHERE  (CM2.customer_id = CM1.customer_id)
                      AND (CM2.mailing_nbr = Mailings.mailing_nbr)
    			)
    	   );
    
    SELECT CM1.customer_id 
       FROM Customer_Mailings AS CM1
       LEFT OUTER JOIN Mailings AS M1
          ON CM1.mailing_nbr = M1.mailing_nbr
       GROUP BY CM1.customer_id 
       HAVING COUNT(CM1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Mailings)
          AND COUNT(M1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Mailings);
    
    --  Enhanced to address blindman's concerns
    
    CREATE VIEW Interesting_Mailings
       AS SELECT mailing_nbr
          FROM Mailings
    	  WHERE  mailing_nbr IN ('1', '3')
    
    SELECT DISTINCT customer_id -- or COUNT(DISTINCT customer_id)
       FROM Customer_Mailings AS CM1
       WHERE NOT EXISTS 
          (SELECT *
             FROM Interesting_Mailings
             WHERE NOT EXISTS 
                (SELECT *
                   FROM Customer_Mailings AS CM2
                   WHERE  (CM2.customer_id = CM1.customer_id)
                      AND (CM2.mailing_nbr = Interesting_Mailings.mailing_nbr)
    			)
    	   );
    
    SELECT CM1.customer_id 
       FROM Customer_Mailings AS CM1
       LEFT OUTER JOIN Interesting_Mailings AS M1
          ON CM1.mailing_nbr = M1.mailing_nbr
       GROUP BY CM1.customer_id 
       HAVING COUNT(CM1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Interesting_Mailings)
          AND COUNT(M1.mailing_nbr) = (SELECT COUNT(mailing_nbr) FROM Interesting_Mailings);
    -PatP
    Last edited by Pat Phelan; 02-21-13 at 14:09. Reason: Fixed a cut-and-paste error of my own!
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  15. #15
    Join Date
    Aug 2004
    Location
    Dallas, Texas
    Posts
    831
    Yeah, that helps a lot.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •