Here's what the table looks like:
CREATE TABLE productdiscounts (startdate DATE,enddate DATE,keyval CHAR(3),discount int)
INSERT INTO productdiscounts VALUESThe challenge was, we need to derive a set that has a row for each date range for which a different set of discounts applies. So, for product abc, we have a $1 discount from 1/1/12 through 1/16/12, and then on 1/17/12, an additional discount of $8 takes effect. On 2/1/12, the $1 discount goes out of effect, and then on 2/7/12, a new $4 discount takes effect. And so on.
('1/1/12','1/31/12','abc',1)
,('1/17/12','2/15/12','abc',8)
,('2/7/12','3/7/12','abc',4)
,('1/15/12','5/13/12','def',2)
,('4/17/12','4/28/12','def',16)
,('2/12/12','4/20/12','def',32)
Basically, the set we need to create should look like this:
keyval startdate enddate discountSo how do you take a set of staggering, overlapping date ranges, and come up with one continuous set of ranges, aggregating the discounts as you go?
abc 2012-01-01 2012-01-16 1
abc 2012-01-17 2012-01-31 9
abc 2012-02-01 2012-02-06 8
abc 2012-02-07 2012-02-15 12
abc 2012-02-16 2012-03-07 4
def 2012-01-15 2012-02-11 2
def 2012-02-12 2012-04-16 34
def 2012-04-17 2012-04-20 50
def 2012-04-21 2012-04-28 18
def 2012-04-29 2012-05-13 2
Our old friend, the numbers table
My first instinct was to use a numbers table and compute the aggregates against that set, like this:SELECT pd.keyval,d.dayval,SUM(pd.discount) FROMThe derived table d gives us a set of days from 1/1/12 through sometime in 2017 (2048 days), and by joining to the product discounts, we can add up the discount in effect on each day. The resultant set looks like this (I omitted the records for every single day):
(SELECT DATEADD(d,number,'1/1/12') AS dayval FROM master..spt_values WHERE type='p') d
INNER JOIN dbo.productdiscounts pd ON pd.startdate<=d.dayval AND pd.enddate>=d.dayval
GROUP BY pd.keyval,d.dayval
keyval dayval totaldiscountNow the question becomes, how do we find the starting and ending dates for each unique range? It's not enough to just look for MIN(dayval) and MAX(dayval) for a given aggregate, since the total discount might go from 1 to 9 and then back to 1 again. SQL 2012 has some window functions that would help with this, but our shop is barely on 2008.
abc 2012-01-01 1
...
abc 2012-01-16 1
abc 2012-01-17 9
...
abc 2012-01-31 9
abc 2012-02-01 8
...
abc 2012-02-06 8
abc 2012-02-07 12
...
abc 2012-02-15 12
We were pretty close to (ugh) walking this table with a cursor. But then we did the math. For two years of discounts, we'd have over 700 rows for each product, multiplied by the number of products. Using a cursor on millions or rows wasn't a pleasant prospect.
A fresh approach
Looking at the data, we realized that the number of dates involved was fairly small. Discounts tended to start and end at month end, so in a two year range, we had maybe 40 dates on which a discount started or ended. And so that got me thinking.Our desired date ranges would always start on either the start date of a particular discount, or on the day after the end date of a discount. If a discount wasn't starting or ending, there'd be no change in the total discount that day.
With this in mind, we can create a CTE of "interesting dates":
WITH interestingdates AS (Note that the DISTINCT will eliminate any duplicate dates in the list. Note also that the keyval (i.e. the product id) is also included, since a date is interesting only for the product affected by that discount.
SELECT DISTINCT keyval,startdate FROM dbo.productdiscounts
UNION SELECT DISTINCT keyval,DATEADD(d,1,enddate) FROM dbo.productdiscounts)
Armed with our list of interesting dates by product, we can calculate the end date of each range, which is just the day before the next start date:
,interestingranges as
(SELECT a.keyval,a.startdate,
(SELECT DATEADD(d,-1,MIN(startdate)) FROM interestingdates b WHERE b.keyval=a.keyval AND b.startdate>a.startdate) AS enddate
FROM interestingdates a)
Which leaves us in a pretty good spot. Here's what the interestingranges CTE looks like:
keyval startdate enddateNotice that the last date range for each product is open-ended, since our calculation of the end date is based on the next start date. These ranges might be handy; if not, it's easy to filter them out.
abc 2012-01-01 2012-01-16
abc 2012-01-17 2012-01-31
abc 2012-02-01 2012-02-06
abc 2012-02-07 2012-02-15
abc 2012-02-16 2012-03-07
abc 2012-03-08 NULL
def 2012-01-15 2012-02-11
def 2012-02-12 2012-04-16
def 2012-04-17 2012-04-20
def 2012-04-21 2012-04-28
def 2012-04-29 2012-05-13
def 2012-05-14 NULL
Having identified the interesting ranges, we just need to aggregate the discount amounts:
SELECT ir.keyval, ir.startdate, ir.enddate, SUM(pd.discount) AS discount
FROM interestingranges ir
INNER JOIN dbo.productdiscounts pd ON pd.startdate<= ir.startdate AND pd.enddate>=ir.enddate AND pd.keyval=ir.keyval
GROUP BY ir.keyval, ir.startdate, ir.enddate
Here's our completed aggregate:
keyval startdate enddate discountI purposely chose discount amounts that were powers of 2, so it's easy to check which discounts are in effect on any day against the aggregate. A $50 discount on def from 4/17 through 7/20, for instance has to be $32 + $16 + $2. In reality, it's possible that two (or more) contiguous date ranges to result in the same aggregate-- for example, a $5 discount could expire, and two new discounts for $3 and $2 could take effect on the same day. In our case, the aggregation was a little more involved, in such a way that the unique set of individual discounts still made each range unique, even if the total discount ended up the same.
abc 2012-01-01 2012-01-16 1
abc 2012-01-17 2012-01-31 9
abc 2012-02-01 2012-02-06 8
abc 2012-02-07 2012-02-15 12
abc 2012-02-16 2012-03-07 4
def 2012-01-15 2012-02-11 2
def 2012-02-12 2012-04-16 34
def 2012-04-17 2012-04-20 50
def 2012-04-21 2012-04-28 18
def 2012-04-29 2012-05-13 2
The total script ends up looking like this:
--an interesting day is one where a promotion starts or the day after a promotion ends
;WITH interestingdates AS (
SELECT DISTINCT keyval,startdate FROM dbo.productdiscounts
UNION SELECT DISTINCT keyval,DATEADD(d,1,enddate) FROM dbo.productdiscounts
)
--an interesting range is just each interesting date and the next interesting date, by key
,interestingranges as
(SELECT a.keyval,a.startdate,
(SELECT DATEADD(d,-1,MIN(startdate)) FROM interestingdates b WHERE b.keyval=a.keyval AND b.startdate>a.startdate) AS enddate
FROM interestingdates a
)
SELECT ir.keyval, ir.startdate, ir.enddate, SUM(pd.discount) AS discount
FROM interestingranges ir
INNER JOIN dbo.productdiscounts pd ON pd.startdate<= ir.startdate AND pd.enddate>=ir.enddate AND pd.keyval=ir.keyval
GROUP BY ir.keyval, ir.startdate, ir.enddate
No comments:
Post a Comment