Results 1 to 13 of 13
  1. #1
    Join Date
    Oct 2010
    Location
    Atlanta, GA
    Posts
    211
    Provided Answers: 1

    Unanswered: Finding Purchase Freq and Average Time Between Orders

    I have a table of orders by merchandise category. I need to divide customers into their purchasing frequency (1x, 2x, 3x+) by merchandise category and the average time between orders by merchandise category. The average time is average of the differences between order dates.

    I can think about how to do this the long way with several steps, but being that there are many millions of orders, I am trying to make this as efficient as possible.

    SAMPLE DATA:
    Code:
    CREATE TABLE #Orders
    (EnteredDtKey INT,
    custkey INT,
    MerchCatDesc Varchar(50));
    INSERT INTO #Orders (EnteredDtKey, custkey, merchcatdesc) VALUES
    (1544, 21, 'LIGHTING'), 
    (993, 21, 'LIGHTING'), 
    (2866, 21, 'ACCESSORIES'), 
    (6388, 23, 'ACCESSORIES'), 
    (995, 23, 'ACCESSORIES'), 
    (1251, 24, 'ACCESSORIES'), 
    (1293, 24, 'ACCESSORIES'), 
    (996, 24, 'ACCESSORIES'), 
    (1774, 26, 'ACCESSORIES'), 
    (2872, 26, 'LIGHTING'), 
    (2907, 31, 'LIGHTING'), 
    (3342, 31, 'ACCESSORIES'), 
    (2907, 31, 'ACCESSORIES'), 
    (997, 31, 'ACCESSORIES'), 
    (4038, 31, 'ACCESSORIES'), 
    (3942, 32, 'ACCESSORIES'), 
    (4327, 33, 'LIGHTING'), 
    (2553, 33, 'ACCESSORIES'), 
    (1363, 33, 'ACCESSORIES')
    Results for this sample data attached.
    Attached Thumbnails Attached Thumbnails Capture.PNG  

  2. #2
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    I hope you are on SQL 2012
    Code:
    with CTE (custkey, timebetweenorders, rownumber)
    as
    (select custkey, 
    	entereddtkey - lag (entereddtkey, 1) over (partition by custkey order by entereddtkey),
    	row_number() over (partition by custkey order by entereddtkey)
     from #orders
      )
    
    select custkey, avg(timebetweenorders)
    from cte
    where rownumber > 1
    group by custkey
    order by custkey
    On reflection, the entereddtkey is probably not the actual date, is it?

  3. #3
    Join Date
    Oct 2010
    Location
    Atlanta, GA
    Posts
    211
    Provided Answers: 1
    Thanks for the reply MCrowley. The EnteredDtKey is an int that corresponds to a particular date (referenced from a calendar table). I can easily translate from one to the other. They structured the DW like that for easy date calcs. Sadly, we are on 2008 so LAG isn't supported

    For example:
    fiscaldaykey fiscaldt
    1001 11/4/1995 12:00:00 AM
    1002 11/5/1995 12:00:00 AM
    1003 11/6/1995 12:00:00 AM
    1004 11/7/1995 12:00:00 AM
    1005 11/8/1995 12:00:00 AM
    1006 11/9/1995 12:00:00 AM
    1007 11/10/1995 12:00:00 AM
    1008 11/11/1995 12:00:00 AM
    1009 11/12/1995 12:00:00 AM
    1010 11/13/1995 12:00:00 AM
    1011 11/14/1995 12:00:00 AM
    1012 11/15/1995 12:00:00 AM
    1013 11/16/1995 12:00:00 AM

  4. #4
    Join Date
    Jan 2013
    Posts
    354
    Provided Answers: 1
    YOU do not understand how to do a basic data model. First, there is no key in this silly non-table. Categories are not descriptions. Your use of “_key” violates ISO-11179 and common senses. A date is a unit of measurement, and cannot have a key! (what would a “liter_key” or a “mile_key” mean!?). And like all ACCESS programers you have the magical VARCHAR(50) column! ARRGH!

    CREATE TABLE Order_History
    (order_date DATE NOT NULL,
    customer_id CHAR(12) NOT NULL,
    PRIMARY KEY (order_date, cust_id),
    merchandise_category VARCHAR(25) NOT NULL);

    >> I have a table of orders by merchandise category. I need to divide customers into their purchasing frequency (1x, 2x, 3x+) by merchandise category and the average time between orders by merchandise category. The average time is average of the differences between order dates. <<

    Read what you wrote. We need a time frame. If a customer places, say, 12 orders in a year, he averages one order per month. It does not matter if he placed them 11 in in week in January and one in late December. Look at what he did from the start of the year until the date of this order:

    SELECT customer_id, merchandise category,
    DATEDIFF (days, '2014-01-01', order_date) AS day_delta,
    COUNT(*)
    OVER (PARTITION BY customer_id, merchandise category
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT_ROW)
    AS order_cnt
    FROM Order_History;

    Will that will work for 2008 and give you a useful measure?

  5. #5
    Join Date
    Oct 2010
    Location
    Atlanta, GA
    Posts
    211
    Provided Answers: 1
    Quote Originally Posted by Celko View Post
    A date is a unit of measurement, and cannot have a key! (what would a “liter_key” or a “mile_key” mean!?).
    I'll not dignify the rest of your post with a response, but I did want to comment on this, of which you have no understanding. The date key links back to a table that provides a ton of data aside from an actual date, including fiscal day, week, month, quarter, year, calendar day, week, month, quarter, year, week of month, and a bunch of others useful metrics. Whether or not this lines up or not with an ISO standard, I have no idea, but I can tell you that this model is invaluable compared to other places i have worked that did not do this.

    To use your example of "mile_key", that would more accurately be called "MileMarker_key" which would link to a table providing data like location, interstate, lat,long, etc.

  6. #6
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    OK, so we have to resort to self joining with the CTE. Does this get you closer to what you need?
    Code:
    with CTE (custkey, entereddtkey, rownumber, MerchCatDesc)
    as
    (select custkey, 
    	entereddtkey,
    	row_number() over (partition by custkey, MerchCatDesc order by entereddtkey),
    	MerchCatDesc
     from #orders
      )
    
    select a.custkey, a.MerchCatDesc, avg (datediff (dd, b.enteredDtkey, a.enteredDtkey)), count(*) as "Gaps"
    from cte b join
    	cte a on a.rownumber = b.rownumber + 1 and a.custkey = b.custkey and a.MerchCatDesc = b.MerchCatDesc
    group by a.custkey, a.MerchCatDesc
    order by Gaps
    I admit it is not in the format you want it in from your first post, but maybe this can be hammered into that after a few passes.

    EDIT:
    Oh, I should point out that in my copy of the testdata, I changed the the date key to a date column.
    Code:
    CREATE TABLE #Orders
    (EnteredDtKey date,
    custkey INT,
    MerchCatDesc Varchar(50));
    INSERT INTO #Orders (EnteredDtKey, custkey, merchcatdesc) VALUES
    (dateadd (dd, 1544, '1/1/2000'), 21, 'LIGHTING'), 
    (dateadd (dd, 993, '1/1/2000'), 21, 'LIGHTING'), 
    (dateadd (dd, 2866, '1/1/2000'), 21, 'ACCESSORIES'), 
    (dateadd (dd, 6388, '1/1/2000'), 23, 'ACCESSORIES'), 
    (dateadd (dd, 995, '1/1/2000'), 23, 'ACCESSORIES'), 
    (dateadd (dd, 1251, '1/1/2000'), 24, 'ACCESSORIES'), 
    (dateadd (dd, 1293, '1/1/2000'), 24, 'ACCESSORIES'), 
    (dateadd (dd, 996, '1/1/2000'), 24, 'ACCESSORIES'), 
    (dateadd (dd, 1774, '1/1/2000'), 26, 'ACCESSORIES'), 
    (dateadd (dd, 2872, '1/1/2000'), 26, 'LIGHTING'), 
    (dateadd (dd, 2907, '1/1/2000'), 31, 'LIGHTING'), 
    (dateadd (dd, 3342, '1/1/2000'), 31, 'ACCESSORIES'), 
    (dateadd (dd, 2907, '1/1/2000'), 31, 'ACCESSORIES'), 
    (dateadd (dd, 997, '1/1/2000'), 31, 'ACCESSORIES'), 
    (dateadd (dd, 4038, '1/1/2000'), 31, 'ACCESSORIES'), 
    (dateadd (dd, 3942, '1/1/2000'), 32, 'ACCESSORIES'), 
    (dateadd (dd, 4327, '1/1/2000'), 33, 'LIGHTING'), 
    (dateadd (dd, 2553, '1/1/2000'), 33, 'ACCESSORIES'), 
    (dateadd (dd, 1363, '1/1/2000'), 33, 'ACCESSORIES')

  7. #7
    Join Date
    Oct 2010
    Location
    Atlanta, GA
    Posts
    211
    Provided Answers: 1
    Quote Originally Posted by MCrowley View Post
    OK, so we have to resort to self joining with the CTE. Does this get you closer to what you need?
    Code:
    with CTE (custkey, entereddtkey, rownumber, MerchCatDesc)
    as
    (select custkey, 
    	entereddtkey,
    	row_number() over (partition by custkey, MerchCatDesc order by entereddtkey),
    	MerchCatDesc
     from #orders
      )
    
    select a.custkey, a.MerchCatDesc, avg (datediff (dd, b.enteredDtkey, a.enteredDtkey)), count(*) as "Gaps"
    from cte b join
    	cte a on a.rownumber = b.rownumber + 1 and a.custkey = b.custkey and a.MerchCatDesc = b.MerchCatDesc
    group by a.custkey, a.MerchCatDesc
    order by Gaps
    Hm, this interesting, but seems to be leaving stuff out. These are my results. For example, it is not showing the other customers that purchased lighting besides custkey 21.

    custkey MerchCatDesc [No name 1] Gaps
    23 ACCESSORIES 5393 1
    33 ACCESSORIES 1190 1
    21 LIGHTING 551 1
    24 ACCESSORIES 148 2
    31 ACCESSORIES 1013 3

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by clawlan View Post
    Hm, this interesting, but seems to be leaving stuff out. These are my results. For example, it is not showing the other customers that purchased lighting besides custkey 21.
    I would only expect it to include custkeys with more than one order for a product... Otherwise the elapsed time makes no sense.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  9. #9
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    That was my assumption. If the customer makes only one purchase, then there is no average time between purchases. The "Gaps" column represents how many "in between" times there are between purchases. The LAG query will also exclude one-time buyers.

  10. #10
    Join Date
    Oct 2010
    Location
    Atlanta, GA
    Posts
    211
    Provided Answers: 1
    Quote Originally Posted by MCrowley View Post
    That was my assumption. If the customer makes only one purchase, then there is no average time between purchases. The "Gaps" column represents how many "in between" times there are between purchases. The LAG query will also exclude one-time buyers.
    oh, duh. Thanks. I am still working on getting it to look right. I also installed sql 2012 express on my workstation here so i can test out lag/lead. Our DW team keeps saying we are going go upgrade to 2012 at some point...

  11. #11
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by clawlan View Post
    Our DW team keeps saying we are going go upgrade to 2012 at some point...
    Encourage them, strongly!

    The only downside that I can see to an upgrade from SQL 2008 to SQL 2012 to support a data warehouse is the license cost if you don't re-size the hardware appropriately. The benefits to the DW team are huge in terms of features available and performance. The benefits to the end users is a bit harder to quantify, but the ROI in most studies is huge.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  12. #12
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    If possible, go for SQL Developer edition. You will get all of the Enterprise edition features to play with, and it should be relatively cheap to license. The big caution, of course, would be to know what features are Enterprise Edition, and which ones are Standard Edition, in case you are deploying to a Standard Edition server (sadly, the case with my environment).

  13. #13
    Join Date
    Oct 2010
    Location
    Atlanta, GA
    Posts
    211
    Provided Answers: 1
    Ok, based on how I need the data organized, this is the query I'm using:

    Code:
    select custkey, merchcatdesc,
    	entereddtkey - lag (entereddtkey, 1) over (partition by merchcatdesc,custkey order by entereddtkey),
    	row_number() over (partition by custkey order by merchcatdesc)
     from #orders
    With this result set, I can determine the 1x, 2x, and 3x+ buyers, and its separated out my merchCat. Thanks all for your help!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •