EECS 3101                            York University                                  Instructor: Andy Mirzaian

             A Greedy Algorithm -- The Interval Point Cover Problem

Input: A set P = {p_i | i = 1..n} of n points, and a set I = {[s_j, f_j] | j = 1..m} of m intervals, all on the real line (s_j < f_j).
Output: (i) a point p in P that is not covered by any interval in I, or
              (ii) a minimum-cardinality subset C of I whose intervals collectively cover all points in P.

Example:  Points are "o" on the real line. Intervals are alphabetically named. C={c, e, f} is an optimal solution.
                    _____c_______________   _______e_______
      _a____  ___b____   ____d_____                 ________f__________
      ___________o_________o____o________o______o__________o_____________> real line

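An Incorrect Greedy Method: Pick the interval that covers the most uncovered points.
Counter-example: In the figure below, interval a covers the first half of the points, interval
b covers the last half, and interval c covers every point except the first and the last. This rule picks c first
and then still needs a and b for the two remaining end points, using 3 intervals, whereas {a, b} alone covers every point.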

 ____o_______________ooooooooo_______________ooooooo_________________o______> real line

___________a__________________            _________________b_____________
                           ___________________ c________________
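The figure gives no coordinates, so the sketch below assumes hypothetical ones (points 0, 10..18, 30..36 and 50; a = [-1, 18], b = [30, 51], c = [10, 36]) and runs the incorrect "most uncovered points" rule on them. The function name max_coverage_greedy is illustrative only; ties are broken alphabetically so the run is deterministic.

```python
# Hypothetical coordinates for the counter-example figure (the figure itself gives none).
points = [0] + list(range(10, 19)) + list(range(30, 37)) + [50]
intervals = {"a": (-1, 18), "b": (30, 51), "c": (10, 36)}


def covers(iv, p):
    """True if the closed interval iv = (s, f) contains point p."""
    s, f = iv
    return s <= p <= f


def max_coverage_greedy(points, intervals):
    """The INCORRECT greedy: repeatedly pick the interval covering the most uncovered points."""
    uncovered = set(points)
    chosen = []
    while uncovered:
        # max() keeps the first maximum it meets, so ties go to the alphabetically first name.
        name = max(sorted(intervals),
                   key=lambda k: sum(covers(intervals[k], p) for p in uncovered))
        gained = {p for p in uncovered if covers(intervals[name], p)}
        if not gained:
            return None  # some point is not covered by any interval
        chosen.append(name)
        uncovered -= gained
    return chosen


print(max_coverage_greedy(points, intervals))  # ['c', 'a', 'b']
```

The rule takes c (16 points) first and then needs both a and b, while {a, b} already covers all 18 points.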

A Correct Greedy Method:
Among the intervals that cover the leftmost uncovered point, pick the one with largest finishing time.
The algorithm follows. 

Algorithm Opt-Interval-Point-Cover (P, I)
1    C  <-  ∅                                   /* the covering set of intervals, initially empty
2    NC  <-  P                                  /* the set of points not yet covered by C
3    while NC ≠ ∅ do                            /* there is at least one more point not yet covered
4        p  <-  leftmost point in NC            /* the leftmost point not yet covered should be covered first
5        I'  <-  set of intervals in I that cover p
6        if I' = ∅ then return (p is not covered by I)
7        else
8            select I_j = [s_j, f_j] in I' with max f_j   /* greedy choice: I_j covers p and extends farthest to the right
9            C  <-  C ∪ {I_j}                   /* add I_j to the solution set
10           NC  <-  NC - {points covered by I_j}
11       end-if
12   end-while
13   return (C)
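A direct Python transcription of the algorithm above may help when tracing it by hand. It is a sketch under stated assumptions: intervals are closed and given as a dict from (hypothetical) names to (s, f) pairs, and the result is ("uncovered", p) or ("cover", C). It rescans NC and I in every iteration, so it is quadratic in the worst case; the comments give the pseudocode line numbers.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # closed interval [s, f] with s < f


def opt_interval_point_cover(points: List[float], intervals: Dict[str, Interval]):
    """Greedy cover: returns ("uncovered", p) or ("cover", list of chosen names)."""
    C: List[str] = []                                    # line 1
    NC = set(points)                                     # line 2
    while NC:                                            # line 3
        p = min(NC)                                      # line 4: leftmost uncovered point
        I_prime = [name for name, (s, f) in intervals.items() if s <= p <= f]  # line 5
        if not I_prime:                                  # line 6
            return ("uncovered", p)
        j = max(I_prime, key=lambda name: intervals[name][1])  # line 8: max f_j
        C.append(j)                                      # line 9
        s_j, f_j = intervals[j]
        NC = {q for q in NC if not (s_j <= q <= f_j)}    # line 10: drop points covered by I_j
    return ("cover", C)                                  # line 13


# Hypothetical coordinates in the spirit of the first example figure.
points = [1, 4, 5, 8, 10, 13]
intervals = {"a": (0, 2), "b": (2, 5), "c": (1, 6),
             "d": (5, 7), "e": (7, 11), "f": (10, 14)}
print(opt_interval_point_cover(points, intervals))  # ('cover', ['c', 'e', 'f'])
```

Adding an extra point at 20, which no interval reaches, makes the same call return ('uncovered', 20).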

Loop Invariant:
Either (i) there is a point p in P that is not covered by any interval in I, or
(ii) there is at least one optimal solution B of covering intervals that is consistent
with the (accept/reject) decisions made so far, i.e.,
    (a) C is a subset of B, and
    (b) C covers all and only points in P-NC, and
    (c) all points in NC are to the right of every interval in C.

Correctness proof:
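[Pre-Cond & (lines 1,2) ==> LI]: Trivially true: C = ∅ and NC = P, so (a), (b) and (c) hold for any optimal solution B (and if no cover exists, part (i) holds).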
[LI' & (NC not empty) & (lines 4-11) ==> LI'']: By LI', the leftmost point p in NC is not yet covered.
If I' is empty at line 6, then part (i) of the LI becomes true. (We exit from the loop, and this establishes the Post-Cond.)
If I' is not empty, then we greedily select interval I_j = [s_j, f_j] in I' to cover point p. If I_j is in the optimal solution B, then
part (ii)(a) of LI'' is valid. However, if I_j is not in B, then, since B (if it exists) must cover p, B must include another
interval from I'. Let that be interval I_k = [s_k, f_k]. Note that I_k is not in C. Now, consider the solution
B' = (B - {I_k}) ∪ {I_j}. Clearly B' has the same cardinality as B, and every point of P not covered by B - {I_k} is covered by I_j.
For points already covered this follows from LI': they lie in P - NC and are covered by C, which is a subset of B - {I_k}.
For points not yet covered it follows from the greedy property of I_j: such a point covered by I_k lies in [p, f_k],
and [p, f_k] is contained in [s_j, f_j] because s_j ≤ p and f_k ≤ f_j.
Hence, optimality of B implies optimality of B'. Furthermore, B' agrees with C so far, i.e., LI''(ii)(a) is established.
Lines 9 and 10 establish LI''(ii)(b). LI''(ii)(c) follows from LI'(ii)(c) and line 10: a point q remaining in NC satisfies
q > p ≥ s_j and is not covered by I_j, so q > f_j.
[LI & (exit cond) ==> Post-Cond]: If we exit at line 6, then, as noted above, part (i) of the Post-Cond is established.
If we return at line 13 because NC is empty, then LI(ii)(b) implies that C covers P, and LI(ii)(a) implies that |C| ≤ |B|;
hence C must be optimal.
[Progress & Termination]: After each iteration we either cover at least one more point, or exit, having found a point that
can't be covered. So, the loop will iterate at most n times.
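The proof above is the actual argument. As a sanity check only (not a substitute for the proof), the sketch below compares the greedy against exhaustive search on small random integer instances; the helper names greedy_cover, brute_force_min and check, and the instance sizes, are hypothetical choices of this sketch.

```python
import itertools
import random


def covers(iv, p):
    """True if the closed interval iv = (s, f) contains point p."""
    return iv[0] <= p <= iv[1]


def greedy_cover(points, intervals):
    """Leftmost-uncovered-point greedy; returns chosen indices, or None if a point is uncoverable."""
    NC, C = set(points), []
    while NC:
        p = min(NC)
        cand = [j for j, iv in enumerate(intervals) if covers(iv, p)]
        if not cand:
            return None
        j = max(cand, key=lambda k: intervals[k][1])
        C.append(j)
        NC = {q for q in NC if not covers(intervals[j], q)}
    return C


def brute_force_min(points, intervals):
    """Size of a smallest covering subset found by exhaustive search, or None if none exists."""
    for k in range(len(intervals) + 1):
        for combo in itertools.combinations(intervals, k):
            if all(any(covers(iv, p) for iv in combo) for p in points):
                return k
    return None


def check(trials=500, seed=1):
    """Compare greedy against brute force on small random instances; raises on a mismatch."""
    rng = random.Random(seed)
    for _ in range(trials):
        points = [rng.randint(0, 20) for _ in range(rng.randint(1, 6))]
        intervals = []
        for _ in range(rng.randint(0, 6)):
            s = rng.randint(0, 19)
            intervals.append((s, rng.randint(s + 1, 20)))
        g, opt = greedy_cover(points, intervals), brute_force_min(points, intervals)
        assert (g is None) == (opt is None)
        assert g is None or len(g) == opt
    return True


print(check())  # True when no counter-example was found
```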

Efficient Implementation:
Imagine we scan the real line from left to right and process events as we reach them. The events are of two types: point events and interval events.
To process a point means to cover it by an interval. To process an interval (when we reach its starting point) means to activate it.
The revised algorithm below keeps the line numbers of the algorithm above for corresponding steps, so some of its lines appear out of order.
To speed up lines 4, 6, and 8 above, we use the following priority queues:

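Events: this is the priority queue of all unprocessed events. It is a min-heap of points and intervals, where the priority of
a point is its coordinate, and the priority of an interval is its activation time, i.e., its starting time. Ties are broken in favor of
intervals, so that an interval starting exactly at a point is activated before that point is processed.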

ActInt: this is the set of all intervals that have been activated but not yet committed to. That is, we have passed their starting point,
but they have not been placed in the solution set C. It is a max-heap, and the priority of an interval is its finishing time. This lets us
quickly access the set I' (actually a superset of it, which also includes useless intervals we have already scanned past completely) and
execute line 8, our greedy interval selection.
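Python's heapq module provides only a min-heap, so one way to realize the two queues is sketched below. The encoding is an assumption of this sketch: events are tuples (coordinate, kind, payload) with kind 0 for an interval and 1 for a point, so intervals win ties, and ActInt stores -f so the smallest key is the largest finishing time. The helper names build_events, activate and extract_max are hypothetical.

```python
import heapq


def build_events(points, intervals):
    """Events min-heap of (coordinate, kind, payload) tuples.

    kind 0 marks an interval (keyed by its start s), kind 1 a point; on equal
    coordinates the interval therefore comes out first and is activated in time."""
    events = [(s, 0, (s, f)) for (s, f) in intervals] + [(p, 1, (p, p)) for p in points]
    heapq.heapify(events)  # Build-min-Heap in O(n + m)
    return events


def activate(act_int, interval):
    """ActInt is a max-heap on finishing time; heapq is a min-heap, so store -f."""
    s, f = interval
    heapq.heappush(act_int, (-f, s))


def extract_max(act_int):
    """Remove and return the activated interval (s, f) with the largest finishing time."""
    neg_f, s = heapq.heappop(act_int)
    return (s, -neg_f)
```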

Note: To ensure that ActInt is not empty when we call ExtractMax at line 8, we initially create a dummy, useless interval
I0 = (-infinity, -infinity) and add it to the set of input intervals at line 1b. It is the first event to be activated
and is added to ActInt at line 5a. Since its finishing time is the smallest possible, it is extracted only when no other
activated interval remains, and then line 6 exits immediately.

Algorithm Opt-Interval-Point-Cover (P, I)
1     C  <-  ∅                                   /* the covering set of intervals, initially empty
1a    last  <-  -infinity                        /* the rightmost point covered by intervals in C
1b    I0  <-  (-infinity, -infinity);  I  <-  I ∪ {I0}   /* add dummy interval I0 to I
2     Events  <-  Build-min-Heap(P ∪ I)          /* the unprocessed point and interval events
2a    ActInt  <-  ∅                              /* max-heap: the set of activated but unused intervals
3     while Events ≠ ∅ do
4         e  <-  ExtractMin(Events)              /* next event to be processed
5a        if e is an interval then Insert(e, ActInt)   /* activate e by placing it in the max-heap; priority = finishing time
5b        else                                   /* process point event e
5c            if e > last then                   /* otherwise, point e is already covered by C
8                 I_j = [s_j, f_j]  <-  ExtractMax(ActInt)   /* greedy choice: max f_j in ActInt
6                 if e > f_j then return (point e is not covered by I)   /* none of the activated intervals covers point e
7                 else
9                     C  <-  C ∪ {I_j}
10                    last  <-  f_j
11                end-if
11a           end-if
11b       end-if
12    end-while
13    return (C)
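A sketch of the sweep-line version in Python, under these assumptions: intervals are closed (s, f) tuples, the function name is hypothetical, and heapq stands in for both priority queues. Events are keyed (coordinate, kind) with intervals ahead of points on ties, and ActInt stores -f because heapq is a min-heap. Comments give the pseudocode line numbers.

```python
import heapq
import math
from typing import List, Tuple


def opt_interval_point_cover_fast(points: List[float], intervals: List[Tuple[float, float]]):
    """Returns ("uncovered", p) or ("cover", list of chosen (s, f) intervals)."""
    C = []                                             # line 1
    last = -math.inf                                   # line 1a
    dummy = (-math.inf, -math.inf)                     # line 1b: I0 keeps ActInt non-empty
    # kind 0 = interval start, kind 1 = point, so on equal coordinates intervals come first.
    events = ([(s, 0, (s, f)) for (s, f) in list(intervals) + [dummy]]
              + [(p, 1, (p, p)) for p in points])
    heapq.heapify(events)                              # line 2: Build-min-Heap
    act_int = []                                       # line 2a: max-heap on f via -f
    while events:                                      # line 3
        coord, kind, (s, f) = heapq.heappop(events)    # line 4
        if kind == 0:                                  # line 5a: activate the interval
            heapq.heappush(act_int, (-f, s))
        elif coord > last:                             # lines 5b-5c: point not yet covered
            neg_f, s_j = heapq.heappop(act_int)        # line 8: max f_j among activated
            f_j = -neg_f
            if coord > f_j:                            # line 6: no activated interval reaches it
                return ("uncovered", coord)
            C.append((s_j, f_j))                       # line 9
            last = f_j                                 # line 10
    return ("cover", C)                                # line 13


ivs = [(0, 2), (2, 5), (1, 6), (5, 7), (7, 11), (10, 14)]
print(opt_interval_point_cover_fast([1, 4, 5, 8, 10, 13], ivs))
# ('cover', [(1, 6), (7, 11), (10, 14)])
```

On these hypothetical coordinates it makes the same choices as the direct greedy (the intervals playing the roles of c, e and f), but each event costs only a logarithmic number of heap operations.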

Revised Loop Invariant:
Either (i) there is a point p in P that is not covered by any interval in I, or
(ii) there is at least one optimal solution B of covering intervals that is consistent
with the decisions made so far, i.e.,
    (a) C = { I_j in B | f_j ≤ last },
    (b) C covers every point in P with coordinate ≤ last,
    (b') every processed point (i.e., every point in P - Events) has coordinate ≤ last (and hence, by (b), it is covered), and
    (c) ActInt = {activated intervals} - C = (I - Events) - C.

Correctness Proof for the revised algorithm: Left to the reader.

Time Complexity:
Lines 1, 1a, 1b, 2a take O(1) time. Line 2 takes O(n+m) time. The while-loop of line 3 iterates at most n+m times.
Line 4 takes O(log(n+m)) time. Line 5a takes O(log m) time (at most m activated intervals in the priority queue).
Line 8 also takes O(log m) time. So, the loop of lines 3-12 takes at most O((n+m) log(n+m)) time. Line 13 takes
O(m) time. Therefore, the worst-case time complexity is O((n+m) log(n+m)).