Results 1 to 6 of 6
  1. #1
    Join Date
    Jun 2006
    Posts
    11

    Question Unanswered: Split partially overlapping records by date

    Hi All,

    I need to generate extra records based on overlapping dates using DB2's version of sql.

    I have a set of records with start and end dates, some of wich overlap completely or partially. I have to split these overlapping records so that there remain only records that don't overlap at all or have the same start and end dates. It is possible for many (in practice up to 5) records to overlap each other.

    Since there are so many ways in wich these overaps can take place, I cannot find a clear and sure way to clean this data. Any suggestions?


    TIA

    Peter

  2. #2
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    Moving to the DB2 forum
    I'd rather be riding on the Tiger 800 or the Norton

  3. #3
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    First you need to identify all the dates where overaps may occur - i.e. all the start and end dates. Then you can check each of the records against that list of dates and break it up wherever it meets one.

    I expect there is a clever way to do this in pure SQL, but I would be inclined to use procedural code.

  4. #4
    Join Date
    Jun 2006
    Posts
    11
    Thanks for the suggestion andrewst,

    Quote Originally Posted by andrewst
    First you need to identify all the dates where overaps may occur - i.e. all the start and end dates.
    I have such a set; made a selfjoin on the table checking on overlapping date. So I have a set of start and enddates that overlap others in 1st and 2nd column and the dates that they overlap with in 3rd and 4th column. So the records are actually multiplied by the amount of records they overlap with, excluding the record itself.

    Quote Originally Posted by andrewst
    Then you can check each of the records against that list of dates and break it up wherever it meets one.
    I got lost in this fase. What steps would you take (be it sql or procedural)?

    I can imagine this with 2 overlapping records, but here there can be several. For example, I have record A wich overlaps partially with record B and C. From each combinantion (A&B A&C) I get 2 slices of A, wich in turn overlap. So I have to combine those 4 slices resulting in... and that's just 3 overlaps.. These records A, B and C then will each overlap with other records....

    I can't think of a procedure to get all this done without producing incomprehensible spaghetti or ending up in some endless loop of matching everthing with everything.


    Any ideas on a clever way to do this?

    Peter

  5. #5
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    This query will get you the list of potential overlap dates:

    select start_date from records where ...
    union
    select end_date from records where ...

    Now you can use that to process the records.

    Here is an example (using Oracle so you would need to change some bits):
    Code:
    SQL> create table records
      2  ( set_id int
      3  , start_date date
      4  , end_date date
      5  );
    
    Table created.
    
    SQL> insert into records values (1, date '2006-01-01', date '2006-01-31');
    
    1 row created.
    
    SQL> insert into records values (1, date '2006-01-15', date '2006-01-20');
    
    1 row created.
    
    SQL> insert into records values (1, date '2006-01-17', date '2006-02-28');
    
    1 row created.
    
    SQL> insert into records values (1, date '2006-02-01', date '2006-03-31');
    
    1 row created.
    
    SQL> insert into records values (1, date '2006-01-28', date '2006-03-28');
    
    1 row created.
    
    SQL> commit;
    
    Commit complete.
    This finds all the overlap dates:
    Code:
    SQL> select start_date from records where set_id=1
      2  union
      3  select end_date from records where set_id=1;
    
    START_DATE
    -----------
    01-JAN-2006
    15-JAN-2006
    17-JAN-2006
    20-JAN-2006
    28-JAN-2006
    31-JAN-2006
    01-FEB-2006
    28-FEB-2006
    28-MAR-2006
    31-MAR-2006
    
    10 rows selected.
    This shows all the required split records:
    Code:
    SQL> select distinct greatest(r.start_date,d.overlap_date) new_start
      2  ,      least(r.end_date,d.overlap_date) new_end
      3  from   records r,
      4  ( select start_date as overlap_date
      5    from records where set_id=1
      6    union
      7    select end_date as overlap_date
      8    from records where set_id=1
      9  ) d
     10  where d.overlap_date between r.start_date and r.end_date;
    
    NEW_START   NEW_END
    ----------- -----------
    01-JAN-2006 01-JAN-2006
    15-JAN-2006 15-JAN-2006
    17-JAN-2006 17-JAN-2006
    20-JAN-2006 20-JAN-2006
    28-JAN-2006 28-JAN-2006
    31-JAN-2006 31-JAN-2006
    01-FEB-2006 01-FEB-2006
    28-FEB-2006 28-FEB-2006
    28-MAR-2006 28-MAR-2006
    31-MAR-2006 31-MAR-2006
    
    10 rows selected.
    You could insert those and delete the old rows, or you could use procedural code something like:
    Code:
    for each record where ... do
      for each date in
       ( select start_date as overlap_date from records where ...
         union
         select end_date as overap_date from records where ...
       ) do
        if overlap_date between start_date and end_date then
          update record set end_date = overlap_date;
          insert into record (start_date, end_date, ...) values (overlap_date, end_date);
        end if;
      end do;
    end do;
    That is very much pseudo-code and needs a lot of polish, but hopefully you get the idea?

  6. #6
    Join Date
    Jun 2006
    Posts
    11
    Thanks Tony,

    That was what I needed! I'm on the right track now.

    Peter

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •