xxl.cursors
Class MergeSorter

java.lang.Object
  |
  +--xxl.cursors.PeekIteratorCursor
        |
        +--xxl.cursors.Sorter
              |
              +--xxl.cursors.MergeSorter

public class MergeSorter
extends Sorter

External Merge-Sort based on Replacement Selection. External sorting (i.e. the "logical" sort-operator) can be implemented (with "physical" operators) in two ways:

This class implements the first algorithm. In the open-phase, sorted runs are created with replacement selection. Note, that replacement selection produces runs that are on average twice as large as the memory available, or even bigger. Then, the runs (Queues) are recursively merged until finalFanIn intermediate runs are left. In the next-phase, the remaining runs are merged on demand, i.e. no final queue containing the complete sorted output will be created. The next element is determined when the user calls next().

Example usage (1):


	Iterator input = new RandomIntegers(100000,300000);		//create 300.000 random Integers in the range [0;100.000[
	Sorter sorter = new MergeSorter(input,12, 12*4096,4*4096);	//and sort them where
						//input is the input to be sorted
						//12 is the size of an Object of the input
						//12*4096 is the memory available during the open-phase											
						//4*4096 is the memory available during the next-phase											

	while(sorter.hasNext()){
		System.out.println(sorter.next());			//print them
	}

	
	
Example usage (2):


	Sorter sorter1 = new MergeSorter(input1,12, 12*4096,4*4096);	//sort input1
	Sorter sorter2 = new MergeSorter(input2,12, 12*4096,4*4096);	//sort input2
	Cursor join = new Join(sorter1, sorter2);			//join the inputs

	while(join.hasNext()){
		System.out.println(join.next());			//print the result-tuples
	}
	
	
The constructor of the MergeSorter at least needs to know the object-size (which can be estimated by summing up the sizes of primitive types, references are assumed to occupy 8 bytes), the memory available during the open-phase and the memory available during the next-phase. The latter values can differ, e.g. for a sort-merge join for the inputs R and S it is a sensible choice to allow the MergeSorters to use the entire memory in the open-phase. In the next-phase however, two MergeSorters and the join-operator have to share the memory and the memory has to be divided, i.e. in the next-phase a MergeSorter typically has less memory available than in the open-phase (see example 2).

Further options of more advanced constructors determine how the memory assigned to the MergeSorter is used internally. There are four values:


If you call a constructor with the verbose-flag set to true the MergeSorter displays how the memory was distributed internally. In addition, the number of merges is displayed.

Example Output:
dittrich@hadar:~/newXXL> java -classpath class xxl/cursors/MergeSorter
-----------------------------------------------------------
GLOBAL
		blockSize:              4096
		objectSize:             12

OPEN phase
		memSize:                49152

	run creation
		firstOutputBufferSize:  4096
		heapSize:               3754

	intermediate merges
		outputBufferSize:       4096
		inputBufferSize:        4096
		fanIn:                  10

NEXT phase
	final merge
		finalMemSize:           16384
		finalInputBufferSize:   4096
		finalFanIn:             3
-----------------------------------------------------------

merge: fanIn: 3  queues.size(): 41 --> 39
merge: fanIn: 10  queues.size(): 39 --> 30
merge: fanIn: 10  queues.size(): 30 --> 21
merge: fanIn: 10  queues.size(): 21 --> 12
merge: fanIn: 10  queues.size(): 12 --> 3
final merge: fanIn: 3
Objects: 300000


Most constructors have a parameter named newQueue. The Function newQueue should return a Queue, which is used by the algorithm to materialize the internal runs, i.e. this function determines whether the sort-operator works on queues based on external storage or in main memory (useful for testing and counting).

Example usage (3):
	
	Function newQueue = new Function(){
			public Object invoke(Object function1, Object function2){
				return new ListQueue();		//return a new ListQueue (resident in main memory)
			}
	};

	
The invoke()-method with two Object-parameters has to be overriden. This method is called by the MergeSorter with two internal Functions whenever a new Queue is needed:
	newQueue.invoke(getInputBufferSize(), getOutputBufferSize()); 	//internal call to newQueue

	
The two functions return an Integer-Object with the buffer-size for the input- and output-buffer of the Queue. You should pass these Functions directly to the Queue (If it can make use of these parameters). If the Queue gets a read-request it will then open an InputStream and call getInputBufferSize(). This means, that the buffer-size is determined when it is needed and not when the Queue is constructed. In other words, the computation of the buffer-size is delayed until the size of the buffer is needed by the Queue to construct the buffer.

Example usage (4):
	
	Function newQueue = new Function(){

			File getFile(){
				try{
					File ret = File.createTempFile("RAF",".queue");		
								//return a new File-Object (a temp-file created by java.io.File
				} catch (IOException e){System.out.println(e);}
			}
		
			Function newObject = new Function(){
				public Object invoke(){
					return new MyTYPE()	//return a new Object
				}
			};

			public Object invoke(Object function1, Object function2){
				return new RandomAccessFileQueue(
					getFile(),
					newObject,
					(Function) function1,
					(Function) function2
				);		
			}
		}

	

See Also:
xxl.cursors.DistributionSorter, Sorter

Field Summary
protected  int currentFanIn
           
protected  int fanIn
           
protected  int finalFanIn
           
protected  int finalInputBufferSize
           
protected  int firstOutputBufferSize
           
protected  int heapSize
           
protected  int inputBufferSize
           
protected  boolean openPhaseFinished
           
protected  int outputBufferSize
           
protected  boolean runsCreated
           
 
Fields inherited from class xxl.cursors.Sorter
blockSize, finalMemSize, memSize, objectSize
 
Fields inherited from class xxl.cursors.PeekIteratorCursor
peekIterator
 
Constructor Summary
MergeSorter(java.util.Iterator input, java.util.Comparator comparator, int objectSize, int memSize, double firstOutputBufferRatio, double outputBufferRatio, double inputBufferRatio, int finalMemSize, double finalInputBufferRatio, Function newQueue, boolean verbose)
          Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available).
MergeSorter(java.util.Iterator input, java.util.Comparator comparator, int objectSize, int memSize, int finalMemSize)
          Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available).
MergeSorter(java.util.Iterator input, java.util.Comparator comparator, int objectSize, int memSize, int finalMemSize, boolean verbose)
          Creates a new MergeSorter based on xxl.collections.ListQueue (this is one of the most simple constructors available).
MergeSorter(java.util.Iterator input, java.util.Comparator comparator, int blockSize, int objectSize, int memSize, double firstOutputBufferRatio, double outputBufferRatio, double inputBufferRatio, int finalMemSize, double finalInputBufferRatio, Function newQueue, Function newQueuesQueue, java.util.Comparator queuesQueueComparator, boolean verbose)
          Creates a new MergeSorter.
MergeSorter(java.util.Iterator input, java.util.Comparator comparator, int objectSize, int memSize, int finalMemSize, Function newQueue, boolean verbose)
          Creates a new MergeSorter (this is one of the most simple constructors available).
MergeSorter(java.util.Iterator input, int objectSize, int memSize, double firstOutputBufferRatio, double outputBufferRatio, double inputBufferRatio, int finalMemSize, double finalInputBufferRatio, Function newQueue, boolean verbose)
           
MergeSorter(java.util.Iterator input, int objectSize, int memSize, int finalMemSize)
          Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available).
MergeSorter(java.util.Iterator input, int objectSize, int memSize, int finalMemSize, boolean verbose)
          Creates a new MergeSorter based on xxl.collections.ListQueue (this is one of the most simple constructors available).
MergeSorter(java.util.Iterator input, int objectSize, int memSize, int finalMemSize, Function newQueue)
          Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available).
MergeSorter(java.util.Iterator input, int objectSize, int memSize, int finalMemSize, Function newQueue, boolean verbose)
          Creates a new MergeSorter (this is one of the most simple constructors available).
 
Method Summary
 void close()
           
protected  Function getInputBufferSize()
           
protected  Function getOutputBufferSize()
           
static void main(java.lang.String[] args)
           
protected  void showInfo()
           
 
Methods inherited from class xxl.cursors.PeekIteratorCursor
hasNext, next, peek, remove, reset, supportsPeek, update
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

currentFanIn

protected int currentFanIn

fanIn

protected int fanIn

finalFanIn

protected int finalFanIn

heapSize

protected int heapSize

firstOutputBufferSize

protected int firstOutputBufferSize

inputBufferSize

protected int inputBufferSize

outputBufferSize

protected int outputBufferSize

finalInputBufferSize

protected int finalInputBufferSize

runsCreated

protected boolean runsCreated

openPhaseFinished

protected boolean openPhaseFinished
Constructor Detail

MergeSorter

public MergeSorter(java.util.Iterator input,
                   java.util.Comparator comparator,
                   int blockSize,
                   int objectSize,
                   int memSize,
                   double firstOutputBufferRatio,
                   double outputBufferRatio,
                   double inputBufferRatio,
                   int finalMemSize,
                   double finalInputBufferRatio,
                   Function newQueue,
                   Function newQueuesQueue,
                   java.util.Comparator queuesQueueComparator,
                   boolean verbose)
Creates a new MergeSorter.
Parameters:
input - input to be sorted
comparator - the Comparator used to compare elements
blockSize - the size of a block (page)
objectSize - the size of an Object in main memory
memSize - the memory available to the MergeSorter during the open-phase

MergeSorter

public MergeSorter(java.util.Iterator input,
                   int objectSize,
                   int memSize,
                   double firstOutputBufferRatio,
                   double outputBufferRatio,
                   double inputBufferRatio,
                   int finalMemSize,
                   double finalInputBufferRatio,
                   Function newQueue,
                   boolean verbose)

MergeSorter

public MergeSorter(java.util.Iterator input,
                   int objectSize,
                   int memSize,
                   int finalMemSize,
                   Function newQueue,
                   boolean verbose)
Creates a new MergeSorter (this is one of the most simple constructors available).

MergeSorter

public MergeSorter(java.util.Iterator input,
                   int objectSize,
                   int memSize,
                   int finalMemSize,
                   boolean verbose)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is one of the most simple constructors available).

MergeSorter

public MergeSorter(java.util.Iterator input,
                   int objectSize,
                   int memSize,
                   int finalMemSize,
                   Function newQueue)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available).

MergeSorter

public MergeSorter(java.util.Iterator input,
                   int objectSize,
                   int memSize,
                   int finalMemSize)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available).

MergeSorter

public MergeSorter(java.util.Iterator input,
                   java.util.Comparator comparator,
                   int objectSize,
                   int memSize,
                   double firstOutputBufferRatio,
                   double outputBufferRatio,
                   double inputBufferRatio,
                   int finalMemSize,
                   double finalInputBufferRatio,
                   Function newQueue,
                   boolean verbose)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available). TODO

MergeSorter

public MergeSorter(java.util.Iterator input,
                   java.util.Comparator comparator,
                   int objectSize,
                   int memSize,
                   int finalMemSize,
                   Function newQueue,
                   boolean verbose)
Creates a new MergeSorter (this is one of the most simple constructors available). TODO

MergeSorter

public MergeSorter(java.util.Iterator input,
                   java.util.Comparator comparator,
                   int objectSize,
                   int memSize,
                   int finalMemSize,
                   boolean verbose)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is one of the most simple constructors available). TODO

MergeSorter

public MergeSorter(java.util.Iterator input,
                   java.util.Comparator comparator,
                   int objectSize,
                   int memSize,
                   int finalMemSize)
Creates a new MergeSorter based on xxl.collections.ListQueue (this is the most simple constructor available). TODO
Method Detail

showInfo

protected void showInfo()

getInputBufferSize

protected Function getInputBufferSize()

getOutputBufferSize

protected Function getOutputBufferSize()

close

public void close()
Overrides:
close in class PeekIteratorCursor

main

public static void main(java.lang.String[] args)