xxl.cursors
Class MergeSorter
java.lang.Object
|
+--xxl.cursors.PeekIteratorCursor
|
+--xxl.cursors.Sorter
|
+--xxl.cursors.MergeSorter
- public class MergeSorter
- extends Sorter
External Merge-Sort based on Replacement Selection. External sorting (i.e. the "logical" sort-operator)
can be implemented (with "physical" operators) in two
ways:
- 1.) Create sorted runs and merge them recursively (External Merge-Sort) or
- 2.) Divide the input recursively into partitions until each partition fits
into main memory. Then sort it with a main-memory sorting-algorithm
(External Distribution Sort)
This class implements the first algorithm.
In the open-phase, sorted runs are created with replacement selection. Note, that replacement selection
produces runs that are on average twice as large as the memory
available, or even bigger.
Then, the runs (Queues) are recursively merged until finalFanIn intermediate runs are left.
In the next-phase, the remaining runs are merged on demand, i.e. no final queue containing the
complete sorted output will be created. The next element is determined when the user calls
next().
Example usage (1):
Iterator input = new RandomIntegers(100000,300000); //create 300.000 random Integers in the range [0;100.000[
Sorter sorter = new MergeSorter(input,12, 12*4096,4*4096); //and sort them where
//input is the input to be sorted
//12 is the size of an Object of the input
//12*4096 is the memory available during the open-phase
//4*4096 is the memory available during the next-phase
while(sorter.hasNext()){
System.out.println(sorter.next()); //print them
}
Example usage (2):
Sorter sorter1 = new MergeSorter(input1,12, 12*4096,4*4096); //sort input1
Sorter sorter2 = new MergeSorter(input2,12, 12*4096,4*4096); //sort input2
Cursor join = new Join(sorter1, sorter2); //join the inputs
while(join.hasNext()){
System.out.println(join.next()); //print the result-tuples
}
The constructor of the MergeSorter at least needs to know the object-size (which can be estimated
by summing up the sizes of primitive types, references are assumed to occupy 8 bytes),
the memory available during the open-phase and the memory available during the next-phase.
The latter values can differ, e.g. for a sort-merge join for the inputs R and S it is a sensible
choice to allow
the MergeSorters to use the entire memory in the open-phase. In the next-phase however, two MergeSorters
and the join-operator
have to share the memory and the memory has to be divided, i.e. in the next-phase a MergeSorter
typically has less memory available than in the open-phase (see example 2).
Further options of more advanced constructors determine how the memory assigned to the MergeSorter is used
internally. There are four values:
- firstOutputBufferRatio the ratio of memory available to the outputBuffer during run-creation
(0.0: use only one page for the outputBuffer and what remains for the heap;
1.0: use as much memory as possible for the outputBuffer)
- outputBufferRatio the amount of memory available to the outputBuffer during intermediate merges (not the final merge)
(0.0: use only one page for the outputBuffer, what remains is used for the Merger and the input-buffers,
inputBufferRatio determines how the remaining memory is distributed between them;
1.0: use as much memory as possible for the outputBuffer)
- inputBufferRatio the amount of memory available to the inputBuffers during intermediate merges (not the final merge)
(0.0: use only one page for the inputBuffer, what remains is used for the Merger (maximal FanIn);
1.0: use as much memory as possible for the inputBuffers)
- finalInputBufferRatio the amount of memory available to the input buffers of the final (online) merge
(0.0: use the maximum number of inputs (maximal fanIn), i.e. perform the online merge as early as possible;
1.0: write the entire data into a final queue, the online "merger" just reads the data from
this queue)
If you call a constructor with the verbose-flag set to true
the MergeSorter displays how the memory was distributed internally. In addition,
the number of merges is displayed.
Example Output:
dittrich@hadar:~/newXXL> java -classpath class xxl/cursors/MergeSorter
-----------------------------------------------------------
GLOBAL
blockSize: 4096
objectSize: 12
OPEN phase
memSize: 49152
run creation
firstOutputBufferSize: 4096
heapSize: 3754
intermediate merges
outputBufferSize: 4096
inputBufferSize: 4096
fanIn: 10
NEXT phase
final merge
finalMemSize: 16384
finalInputBufferSize: 4096
finalFanIn: 3
-----------------------------------------------------------
merge: fanIn: 3 queues.size(): 41 --> 39
merge: fanIn: 10 queues.size(): 39 --> 30
merge: fanIn: 10 queues.size(): 30 --> 21
merge: fanIn: 10 queues.size(): 21 --> 12
merge: fanIn: 10 queues.size(): 12 --> 3
final merge: fanIn: 3
Objects: 300000
Most constructors have a parameter named newQueue.
The Function newQueue should return a Queue, which is used by the algorithm
to materialize the internal runs, i.e. this function determines whether the sort-operator
works on queues based on external storage or in main memory (useful for testing and counting).
Example usage (3):
Function newQueue = new Function(){
public Object invoke(Object function1, Object function2){
return new ListQueue(); //return a new ListQueue (resident in main memory)
}
};
The invoke()-method with two Object-parameters has to be overriden. This method is called by the MergeSorter
with two internal Functions whenever a new Queue is needed:
newQueue.invoke(getInputBufferSize(), getOutputBufferSize()); //internal call to newQueue
The two functions return an Integer-Object with the buffer-size for the input- and output-buffer of the
Queue. You should pass these Functions directly to the
Queue (If it can make use of these parameters). If the Queue gets a read-request it will then open an InputStream
and call getInputBufferSize().
This means, that the buffer-size is determined when it is needed and not when the Queue is constructed.
In other words, the computation of the buffer-size is delayed
until the size of the buffer is needed by the Queue to construct the buffer.
Example usage (4):
Function newQueue = new Function(){
File getFile(){
try{
File ret = File.createTempFile("RAF",".queue");
//return a new File-Object (a temp-file created by java.io.File
} catch (IOException e){System.out.println(e);}
}
Function newObject = new Function(){
public Object invoke(){
return new MyTYPE() //return a new Object
}
};
public Object invoke(Object function1, Object function2){
return new RandomAccessFileQueue(
getFile(),
newObject,
(Function) function1,
(Function) function2
);
}
}
- See Also:
xxl.cursors.DistributionSorter,
Sorter
|
Constructor Summary |
MergeSorter(java.util.Iterator input,
java.util.Comparator comparator,
int objectSize,
int memSize,
double firstOutputBufferRatio,
double outputBufferRatio,
double inputBufferRatio,
int finalMemSize,
double finalInputBufferRatio,
Function newQueue,
boolean verbose)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available). |
MergeSorter(java.util.Iterator input,
java.util.Comparator comparator,
int objectSize,
int memSize,
int finalMemSize)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available). |
MergeSorter(java.util.Iterator input,
java.util.Comparator comparator,
int objectSize,
int memSize,
int finalMemSize,
boolean verbose)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is one of the most simple constructors available). |
MergeSorter(java.util.Iterator input,
java.util.Comparator comparator,
int blockSize,
int objectSize,
int memSize,
double firstOutputBufferRatio,
double outputBufferRatio,
double inputBufferRatio,
int finalMemSize,
double finalInputBufferRatio,
Function newQueue,
Function newQueuesQueue,
java.util.Comparator queuesQueueComparator,
boolean verbose)
Creates a new MergeSorter. |
MergeSorter(java.util.Iterator input,
java.util.Comparator comparator,
int objectSize,
int memSize,
int finalMemSize,
Function newQueue,
boolean verbose)
Creates a new MergeSorter (this is one of the most simple constructors available). |
MergeSorter(java.util.Iterator input,
int objectSize,
int memSize,
double firstOutputBufferRatio,
double outputBufferRatio,
double inputBufferRatio,
int finalMemSize,
double finalInputBufferRatio,
Function newQueue,
boolean verbose)
|
MergeSorter(java.util.Iterator input,
int objectSize,
int memSize,
int finalMemSize)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available). |
MergeSorter(java.util.Iterator input,
int objectSize,
int memSize,
int finalMemSize,
boolean verbose)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is one of the most simple constructors available). |
MergeSorter(java.util.Iterator input,
int objectSize,
int memSize,
int finalMemSize,
Function newQueue)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available). |
MergeSorter(java.util.Iterator input,
int objectSize,
int memSize,
int finalMemSize,
Function newQueue,
boolean verbose)
Creates a new MergeSorter (this is one of the most simple constructors available). |
| Methods inherited from class java.lang.Object |
clone,
equals,
finalize,
getClass,
hashCode,
notify,
notifyAll,
toString,
wait,
wait,
wait |
currentFanIn
protected int currentFanIn
fanIn
protected int fanIn
finalFanIn
protected int finalFanIn
heapSize
protected int heapSize
firstOutputBufferSize
protected int firstOutputBufferSize
inputBufferSize
protected int inputBufferSize
outputBufferSize
protected int outputBufferSize
finalInputBufferSize
protected int finalInputBufferSize
runsCreated
protected boolean runsCreated
openPhaseFinished
protected boolean openPhaseFinished
MergeSorter
public MergeSorter(java.util.Iterator input,
java.util.Comparator comparator,
int blockSize,
int objectSize,
int memSize,
double firstOutputBufferRatio,
double outputBufferRatio,
double inputBufferRatio,
int finalMemSize,
double finalInputBufferRatio,
Function newQueue,
Function newQueuesQueue,
java.util.Comparator queuesQueueComparator,
boolean verbose)
- Creates a new MergeSorter.
- Parameters:
input - input to be sortedcomparator - the Comparator used to compare elementsblockSize - the size of a block (page)objectSize - the size of an Object in main memorymemSize - the memory available to the MergeSorter during the open-phase
MergeSorter
public MergeSorter(java.util.Iterator input,
int objectSize,
int memSize,
double firstOutputBufferRatio,
double outputBufferRatio,
double inputBufferRatio,
int finalMemSize,
double finalInputBufferRatio,
Function newQueue,
boolean verbose)
MergeSorter
public MergeSorter(java.util.Iterator input,
int objectSize,
int memSize,
int finalMemSize,
Function newQueue,
boolean verbose)
- Creates a new MergeSorter (this is one of the most simple constructors available).
MergeSorter
public MergeSorter(java.util.Iterator input,
int objectSize,
int memSize,
int finalMemSize,
boolean verbose)
- Creates a new MergeSorter based on xxl.collections.ListQueue (this is one of the most simple constructors available).
MergeSorter
public MergeSorter(java.util.Iterator input,
int objectSize,
int memSize,
int finalMemSize,
Function newQueue)
- Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available).
MergeSorter
public MergeSorter(java.util.Iterator input,
int objectSize,
int memSize,
int finalMemSize)
- Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available).
MergeSorter
public MergeSorter(java.util.Iterator input,
java.util.Comparator comparator,
int objectSize,
int memSize,
double firstOutputBufferRatio,
double outputBufferRatio,
double inputBufferRatio,
int finalMemSize,
double finalInputBufferRatio,
Function newQueue,
boolean verbose)
- Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available).
TODO
MergeSorter
public MergeSorter(java.util.Iterator input,
java.util.Comparator comparator,
int objectSize,
int memSize,
int finalMemSize,
Function newQueue,
boolean verbose)
- Creates a new MergeSorter (this is one of the most simple constructors available).
TODO
MergeSorter
public MergeSorter(java.util.Iterator input,
java.util.Comparator comparator,
int objectSize,
int memSize,
int finalMemSize,
boolean verbose)
- Creates a new MergeSorter based on xxl.collections.ListQueue (this is one of the most simple constructors available).
TODO
MergeSorter
public MergeSorter(java.util.Iterator input,
java.util.Comparator comparator,
int objectSize,
int memSize,
int finalMemSize)
- Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available).
TODO
showInfo
protected void showInfo()
getInputBufferSize
protected Function getInputBufferSize()
getOutputBufferSize
protected Function getOutputBufferSize()
close
public void close()
- Overrides:
- close in class PeekIteratorCursor
main
public static void main(java.lang.String[] args)