java.lang.Object
  extended by it.unipi.di.util.ExternalSort
public class ExternalSort
A multi-way external merge sort that uses a TreeMap (Java's red-black tree) as
its internal sorting data structure. This implementation is 100% pure Java and
scales to gigabytes of data better than the Unix sort
command.
This class has a main method, so it can easily be used from the command line.
Being a standard Java class, it can also be instantiated and configured through
its "set" methods. The sort process is then started by calling the run()
method.
By default this class reads from stdin and writes to stdout. This behavior can
be changed with the setInFile(String)
and setOutFile(String)
methods.
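The sketch below shows both stages of a multi-way external merge sort: sorted runs are built in memory, and then the runs are merged. It is a minimal sketch and not the ExternalSort source. The class and method names are hypothetical. Runs are bounded by a line count rather than by runSize bytes, whole lines are compared in String order, and runs are written to temporary files.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

/** Hypothetical sketch of a two-phase external merge sort (not the ExternalSort source). */
class ExternalMergeSortSketch {

    /** Phase 1: sort one chunk with a TreeMap (line -> occurrences) and write it to a temp file. */
    static Path writeSortedRun(List<String> chunk) throws IOException {
        TreeMap<String, Integer> sorted = new TreeMap<>();
        for (String line : chunk) {
            sorted.merge(line, 1, Integer::sum); // count duplicates, since a TreeMap keeps one entry per key
        }
        Path run = Files.createTempFile("run", ".txt");
        try (BufferedWriter w = Files.newBufferedWriter(run, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Integer> e : sorted.entrySet()) {
                for (int i = 0; i < e.getValue(); i++) {
                    w.write(e.getKey());
                    w.newLine();
                }
            }
        }
        return run;
    }

    /** Sorts all lines of 'in' into 'out', holding at most 'linesPerRun' lines in memory per run. */
    static void sort(BufferedReader in, Writer out, int linesPerRun) throws IOException {
        // Phase 1: split the input into sorted runs on disk.
        List<Path> runs = new ArrayList<>();
        List<String> chunk = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            chunk.add(line);
            if (chunk.size() == linesPerRun) {
                runs.add(writeSortedRun(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) runs.add(writeSortedRun(chunk));

        // Phase 2: k-way merge. The TreeMap maps each run's current head line to the ids
        // of the runs holding it, so the smallest pending line is always the first entry.
        List<BufferedReader> readers = new ArrayList<>();
        TreeMap<String, List<Integer>> heads = new TreeMap<>();
        try {
            for (int id = 0; id < runs.size(); id++) {
                BufferedReader r = Files.newBufferedReader(runs.get(id), StandardCharsets.UTF_8);
                readers.add(r);
                advance(r, id, heads);
            }
            while (!heads.isEmpty()) {
                Map.Entry<String, List<Integer>> smallest = heads.pollFirstEntry();
                for (int id : smallest.getValue()) {
                    out.write(smallest.getKey());
                    out.write('\n');
                    advance(readers.get(id), id, heads); // refill from the run just consumed
                }
            }
        } finally {
            for (BufferedReader r : readers) r.close();
            for (Path p : runs) Files.deleteIfExists(p);
        }
        out.flush();
    }

    /** Reads the next line of run 'id', if any, and registers it in the merge map. */
    private static void advance(BufferedReader r, int id, TreeMap<String, List<Integer>> heads)
            throws IOException {
        String next = r.readLine();
        if (next != null) heads.computeIfAbsent(next, k -> new ArrayList<>()).add(id);
    }
}
```

Phase 1 stores a count per distinct line; without it, duplicate rows would be lost. In phase 2 the TreeMap acts as a priority queue whose entries also remember which runs currently hold each key.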
Nested Class Summary
Modifier and Type | Class | Description
---|---|---
class | ExternalSort.ReverseComparator | A reverse comparator.
protected class | ExternalSort.SortingKey | Used to compare two strings by their sorting columns (aka sorting keys).
protected class | ExternalSort.Tuple | Used to store all the Strings associated with a sorting key, together with the ids of their runs.
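The SortingKey and ReverseComparator classes suggest comparing rows by selected columns and optionally flipping the order. The sketch below shows one way to do that, and it rests on several assumptions. ColumnKey and sortRows are hypothetical names, and columns are 0-based. Numeric mode assumes every sorting column parses as a double. Grouping rows with equal keys in a list is loosely modeled on the Tuple description, not on its actual code.

```java
import java.util.*;
import java.util.regex.Pattern;

/**
 * Hypothetical column-based sorting key (not the ExternalSort.SortingKey source).
 * Assumptions: 0-based column indices; in numeric mode every sorting column parses as a double.
 */
final class ColumnKey implements Comparable<ColumnKey> {
    private final String[] key;     // values of the sorting columns, in the order given
    private final boolean numeric;  // compare as numbers instead of strings

    ColumnKey(String row, char sep, int[] columns, boolean numeric) {
        String[] fields = row.split(Pattern.quote(String.valueOf(sep)), -1); // -1 keeps empty fields
        this.key = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            key[i] = columns[i] < fields.length ? fields[columns[i]] : ""; // missing column -> ""
        }
        this.numeric = numeric;
    }

    @Override
    public int compareTo(ColumnKey other) {
        for (int i = 0; i < key.length; i++) {
            int c = numeric
                    ? Double.compare(Double.parseDouble(key[i]), Double.parseDouble(other.key[i]))
                    : key[i].compareTo(other.key[i]);
            if (c != 0) return c; // the first differing column decides
        }
        return 0;
    }

    /** Sorts rows by their key columns; reverse = true flips the order like a reverse comparator. */
    static List<String> sortRows(List<String> rows, char sep, int[] columns,
                                 boolean numeric, boolean reverse) {
        Comparator<ColumnKey> order = reverse
                ? Comparator.<ColumnKey>reverseOrder()
                : Comparator.<ColumnKey>naturalOrder();
        // A TreeMap treats keys that compare as 0 as the same key, so rows with equal
        // sorting columns are grouped in a list instead of overwriting each other.
        TreeMap<ColumnKey, List<String>> byKey = new TreeMap<>(order);
        for (String row : rows) {
            byKey.computeIfAbsent(new ColumnKey(row, sep, columns, numeric), k -> new ArrayList<>()).add(row);
        }
        List<String> sorted = new ArrayList<>();
        for (List<String> group : byKey.values()) {
            sorted.addAll(group);
        }
        return sorted;
    }
}
```

Numeric mode matters here: as strings, "10" sorts before "2", while numerically it sorts after.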
Field Summary
Modifier and Type | Field
---|---
protected it.unimi.dsi.mg4j.util.MutableString | buff
protected int[] | columns
protected String | currKey
static int | DEFAULT_PAGE_SIZE
static long | DEFAULT_RUN_SIZE
protected boolean | dist
protected long | elapsedSecs
protected boolean | EOF
protected boolean | extract
protected InputStream | in
protected String | infile
protected long | numberOfDumpedRows
protected long | numberOfInputRows
protected boolean | numeric
protected PrintStream | out
protected String | outfile
protected int | pageSize
protected String | prevKey
protected boolean | reverse
protected long | rowsCount
protected long | runSize
protected HashMap<Long,Reader> | runsMap
protected char | sep
protected boolean | uniq
protected boolean | verbose
static String | VERSION
Constructor Summary
Constructor | Description
---|---
ExternalSort() | Create a new ExternalSort.
Method Summary
Modifier and Type | Method | Description
---|---|---
protected File | createSortedRun(List<ExternalSort.SortingKey> chunk) |
protected void | dumpSortedRows() |
protected void | initDataStructure() |
protected void | loadNextPage(long runNumber) |
static void | main(String[] args) |
protected static void | printUsage() |
void | run() | Start the sorting process.
void | setColumns(int[] columns) | Set the columns to sort.
void | setExtract(boolean extract) | If true, output only the sorting column(s), omitting the others.
void | setInFile(String infile) | Set the input file to sort.
void | setKeysDistribution(boolean dist) | Instead of sorting, output the frequency of each sorting key, listed in key order (not ordered by frequency).
void | setNumeric(boolean numeric) | Compare the sorting values (rows or columns) as numerical values.
void | setOutFile(String outfile) | Set the output file.
void | setPageSize(int pageSize) | Set the page size used in the second stage of the algorithm (pagination of the sorted runs).
void | setReverse(boolean reverse) | Set true to sort in reverse order (descending instead of ascending).
void | setRunSize(long runSize) | Set the size of the chunk (run) of text sorted in memory during the first stage of the algorithm.
void | setSeparator(char sep) | Set the character used to split the rows into columns.
void | setUniq(boolean uniq) | Remove duplicates from the result.
void | setVerbose(boolean verbose) | Set true to have log messages on stdout during the sorting process.
protected void | updateProgressInfos(long start) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final long DEFAULT_RUN_SIZE
public static final int DEFAULT_PAGE_SIZE
public static final String VERSION
protected boolean verbose
protected HashMap<Long,Reader> runsMap
protected InputStream in
protected PrintStream out
protected String outfile
protected String infile
protected int[] columns
protected char sep
protected long runSize
protected int pageSize
protected long elapsedSecs
protected long numberOfDumpedRows
protected long numberOfInputRows
protected boolean reverse
protected boolean numeric
protected boolean uniq
protected String currKey
protected boolean dist
protected String prevKey
protected long rowsCount
protected boolean EOF
protected it.unimi.dsi.mg4j.util.MutableString buff
protected boolean extract
Constructor Detail |
---|
public ExternalSort()
Method Detail |
---|
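public void setVerbose(boolean verbose)
Set true to have log messages on stdout during the sorting process.
Parameters:
verbose - true to enable log messages
public void setReverse(boolean reverse)
Set true to sort in reverse order (descending instead of ascending).
Parameters:
reverse - true to sort in descending order
public void setNumeric(boolean numeric)
Compare the sorting values (rows or columns) as numerical values.
Parameters:
numeric - true to compare numerically
public void setKeysDistribution(boolean dist)
Instead of sorting, output the frequency of each sorting key, listed in key order (not ordered by frequency).
Parameters:
dist - true to output the key distribution instead of the sorted rows
public void setUniq(boolean uniq)
Remove duplicates from the result.
Parameters:
uniq - true to remove duplicates
public void setRunSize(long runSize)
Set the size of the chunk (run) of text sorted in memory during the first stage of the algorithm.
Parameters:
runSize - the size of available memory, in bytes
public void setPageSize(int pageSize)
Set the page size used in the second stage of the algorithm (pagination of the sorted runs).
Parameters:
pageSize - the page size, in bytes
public void setColumns(int[] columns)
Set the columns to sort.
Parameters:
columns - the list of columns to sort
See Also:
setUniq(boolean), setSeparator(char)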
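setRunSize(long) gives the memory available for a run in bytes. The sketch below shows one hypothetical way to cut the input into runs under such a budget. RunSplitter is an illustrative name. The per-line size is estimated as the line's UTF-8 length plus one newline byte, which understates the real heap footprint of Java strings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.*;

/** Hypothetical sketch: cut the input into chunks whose estimated size stays within runSize bytes. */
class RunSplitter {
    static List<List<String>> split(BufferedReader in, long runSize) throws IOException {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long bytes = 0;
        String line;
        while ((line = in.readLine()) != null) {
            // Assumption: estimated size = UTF-8 bytes + 1 for the newline (not real heap usage).
            long size = line.getBytes(StandardCharsets.UTF_8).length + 1;
            if (!current.isEmpty() && bytes + size > runSize) {
                chunks.add(current);       // this chunk would be sorted and written as one run
                current = new ArrayList<>();
                bytes = 0;
            }
            current.add(line);             // a single oversized line still forms its own chunk
            bytes += size;
        }
        if (!current.isEmpty()) chunks.add(current);
        return chunks;
    }
}
```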
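public void setSeparator(char sep)
Set the character used to split the rows into columns.
Parameters:
sep - the column separator
public void setExtract(boolean extract)
If true, output only the sorting column(s), omitting the others.
Parameters:
extract - true to output only the sorting column(s)
public void setOutFile(String outfile) throws FileNotFoundException
Set the output file.
Parameters:
outfile - the output file
Throws:
FileNotFoundException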
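With setExtract(true), only the sorting column(s) are written out. Here is a minimal sketch of that projection. It assumes 0-based column indices, emits the columns in the order given to setColumns, and re-joins them with the same separator. ColumnExtractor is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the "extract" option: keep only the sorting columns of a row. */
class ColumnExtractor {
    static String extract(String row, char sep, int[] columns) {
        String[] fields = row.split(Pattern.quote(String.valueOf(sep)), -1); // -1 keeps empty fields
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) sb.append(sep);                        // re-join with the same separator
            int c = columns[i];                               // assumption: 0-based index
            sb.append(c < fields.length ? fields[c] : "");    // missing column -> empty field
        }
        return sb.toString();
    }
}
```

For example, extracting columns {2, 0} from the tab-separated row "id, name, age" yields "age" and "id" joined by a tab.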
public void setInFile(String infile) throws FileNotFoundException
Set the input file to sort.
Parameters:
infile - the input file
Throws:
FileNotFoundException
protected void updateProgressInfos(long start)
public void run() throws IOException
Start the sorting process.
Throws:
IOException
protected File createSortedRun(List<ExternalSort.SortingKey> chunk) throws IOException
Throws:
IOException
protected void initDataStructure()
protected void loadNextPage(long runNumber) throws IOException
Throws:
IOException
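loadNextPage(long) and the runsMap field (a HashMap<Long,Reader>) suggest that, during the merge, each sorted run is read back one page at a time rather than all at once. The following sketch illustrates that idea and is not the class's code. RunPager is hypothetical. It measures a page in characters of whole lines, possibly overshooting by one line, instead of the bytes mentioned for setPageSize.

```java
import java.io.*;
import java.util.*;

/** Hypothetical sketch of page-wise reading of sorted runs during the merge stage. */
class RunPager {
    private final Map<Long, BufferedReader> runs = new HashMap<>(); // run id -> open reader
    private final int pageSize;                                     // page size in chars (assumption)

    RunPager(int pageSize) {
        this.pageSize = pageSize;
    }

    void addRun(long runNumber, Reader reader) {
        runs.put(runNumber, new BufferedReader(reader));
    }

    /** Loads whole lines of run 'runNumber' until about pageSize chars have been read. */
    List<String> loadNextPage(long runNumber) throws IOException {
        List<String> page = new ArrayList<>();
        BufferedReader r = runs.get(runNumber);
        if (r == null) return page;                 // unknown or already exhausted run
        int loaded = 0;
        String line;
        while (loaded < pageSize && (line = r.readLine()) != null) {
            page.add(line);
            loaded += line.length() + 1;            // +1 accounts for the newline
        }
        if (page.isEmpty()) {                       // run exhausted: close and forget it
            r.close();
            runs.remove(runNumber);
        }
        return page;
    }
}
```

Paging keeps memory bounded by roughly (number of runs × page size) during the merge, independent of the total input size.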
protected void dumpSortedRows() throws IOException
Throws:
IOException
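The prevKey, currKey and rowsCount fields hint that dumpSortedRows may implement setUniq and setKeysDistribution in a single pass over the sorted output. The idea is to compare each key with the previous one and count repeats. The sketch below assumes whole-line keys and a "key<TAB>count" output format for the distribution. Both are illustrative choices, not the class's documented behavior, and SortedRowDumper is a hypothetical name.

```java
import java.util.*;

/** Hypothetical sketch of post-processing keys that are already in sorted order. */
class SortedRowDumper {
    /**
     * uniq: drop a key equal to the previous one.
     * dist: emit "key\tcount" once per distinct key instead of the rows themselves.
     */
    static List<String> dump(Iterator<String> sortedKeys, boolean uniq, boolean dist) {
        List<String> out = new ArrayList<>();
        String prevKey = null;
        long count = 0;
        while (sortedKeys.hasNext()) {
            String currKey = sortedKeys.next();
            boolean sameAsPrev = currKey.equals(prevKey);
            if (dist) {
                if (!sameAsPrev && prevKey != null) out.add(prevKey + "\t" + count); // close previous group
                count = sameAsPrev ? count + 1 : 1;
            } else if (!(uniq && sameAsPrev)) {
                out.add(currKey);
            }
            prevKey = currKey;
        }
        if (dist && prevKey != null) out.add(prevKey + "\t" + count); // flush the last group
        return out;
    }
}
```

Because equal keys are adjacent in sorted output, one previous-key variable suffices; no hash set of seen keys is needed.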
protected static void printUsage()
public static void main(String[] args) throws Exception
Throws:
Exception