You
are to implement the SortMerge class.
You need to implement the constructor,
and two public methods, get_next() and close().
See the JavaDoc for the API.
These are part of Java Minibase's iterator package,
and should be declared as such.
Warning:
Yes, Java Minibase has an iterator package
and implements an Iterator class.
So does java.util!
Clearly, you cannot import both at the same time.
Javac will rightfully complain, if you try.
There is no occasion where you will need java.util.Iterator
for the project.
This is just so that you are aware of the issue.
The SortMerge constructor joins two tables,
call them R and S,
represented by the Iterator object am1 and am2,
respectively, using the merge join algorithm.
Read carefully about the TupleUtils class.
You will need to call the setup_op_tuple method
for "formatting" and handling the joined records.
You will also need to call the compare functions to compare
the join columns of records from R and S.
You will call the Sort constructor
to sort a table (represented by Iterator class).
The main method of SortMerge is get_next().
It returns the next joined tuple.
Note that SortMerge acts like a Java iterator.
It returns type Tuple.
When there are no more tuples from the join output,
it returns null.
Complication:
There may be duplicate values in the input stream of tuples,
both in the inner and the outer streams.
Consider a join of Sailors and Reserves on sid.
Clearly one sailor may have many reservations.
How do you handle this?
You need to collect the group of all tuples
from the inner input stream who match
on the join column value (collection A)
while your algorithm is producing the join matches
with that A record.
Then, if the next A record has the same
join column(s) value(s),
you will use the B group that you just collected
to find the matches with this new A record.
Where do you temporarily store this group
of B records, all of the same join value?
In a heap file on disk?
No. That would be quite expensive I/O-wise if we always did this!
In the buffer pool?
No.
(Although that could be made into a reasonable approach.
However, we would have change much in Minibase to do this.)
Instead,
we allocate a temporary tuple-buffer for the B group.
(Note that "buffer" here has nothing to do with "buffer pool" necessarily!)
Use the functions provided by the existing IoBuf class for this.
IoBuf is well behaved.
It keeps the group in main memory while it has room.
If the group being stored gets too big,
it creates a tempory heap-file on disk to put them.
So IoBuf handles the sticky issue of where the B group
should be temporarily stored for you.
This issue happens quite a bit in database-system implementation.
The IoBuf class is a useful mechanism in Minibase
that provides a solution.
Finally,
you need to implement close(),
which cleans up.
The outside code that has been calling get_next()
of the SortMerge object
is responsible for calling close() once done.
(Look at the tester SM_JoinTest.java
to see the sequence of calls.)
|