COLLAPSE 1.2 README
Collapse is a program for collapsing sequences to haplotypes (unique sequences). It can read DNA sequence alignments in NEXUS or PHYLIP "sequential" format. The program indicates which sequences belong to which haplotype and writes an alignment of the haplotypes to a file called inputfilename.haps, in PHYLIP format by default.
Running the program
Executables are provided for Macintosh and Windows.
- On Macintosh, the program can be started by double-clicking the application file Collapse1.2.app. Arguments can be typed on the command line.
- On Windows, the program runs from a terminal window (sometimes called a DOS or console window). Open a terminal, move to the Collapse folder, and type at the prompt Collapse1.2 inputfilename, or, for example, Collapse1.2 inputfilename -g -l3 -p
- On Unix-like systems, move to the program folder and type make. The program should compile without much trouble in any Unix-like environment (Linux, Mac OS X, etc.). The makefile is set up for the gcc compiler; if you have cc instead, replace gcc with cc in the makefile. To run the program, type Collapse1.2 inputfilename, or, for example, Collapse1.2 inputfilename -g -l3 -p
An example file (in Mac format) is provided.
Options
By default, differences resulting from missing data are counted, and gaps are treated as a fifth state. In the presence of missing data, collapsing sequences can be ambiguous. With the default settings, two sequences are considered different haplotypes even if they differ only by missing data. If missing data is ignored, collapsing can depend on the order of the sequences, because each sequence is collapsed with the first redundant sequence that is found. Program messages warn the user when an ambiguous situation might occur.
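The order dependence described above can be illustrated with a minimal Python sketch. This is not the program's source code: the function names, the set of missing-data symbols ("?" and "N"), and the choice of the first sequence of each group as its representative are all assumptions made for illustration.

```python
# Illustrative sketch only; not taken from the Collapse source code.
MISSING = {"?", "N"}  # assumed missing-data symbols

def same_ignoring_missing(a, b):
    """True if a and b agree at every site where neither has missing data."""
    return all(x == y or x in MISSING or y in MISSING for x, y in zip(a, b))

def collapse_first_match(named_seqs):
    """Put each sequence into the first existing group whose representative it matches."""
    groups = []  # each entry: (representative sequence, [member names])
    for name, seq in named_seqs:
        for rep, members in groups:
            if same_ignoring_missing(seq, rep):
                members.append(name)
                break
        else:
            # No existing group matched: this sequence starts a new haplotype.
            groups.append((seq, [name]))
    return [members for _, members in groups]

# Sequence c matches both a and b once its missing site is ignored,
# so the group it joins depends on which of a and b is read first.
order1 = [("a", "ACGT"), ("b", "ACGA"), ("c", "ACG?")]
order2 = [("b", "ACGA"), ("a", "ACGT"), ("c", "ACG?")]
print(collapse_first_match(order1))
print(collapse_first_match(order2))
```

In the first ordering, c is grouped with a; in the second, it is grouped with b. The number of haplotypes is the same here, but their membership is not.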
Program arguments are:
- -g : treat gaps as missing data (default is to treat gaps as a 5th state)
- -p : print data
- -n : export haplotypes in NEXUS format (default is PHYLIP format)
- -v : do not print invariable sites to the haplotype file (default is to print all sites)
- -m : do not count missing data as differences (default is to count them)
- -l : set connection limit (sequences differing at <= l sites will be collapsed) (default is l=0)
- -? : get some usage help
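As a rough sketch of how the -g, -m and -l options might interact when two sequences are compared, the Python below counts differing sites under each setting. It is a hypothetical illustration, not the program's implementation: the function names, the keyword arguments, and the missing-data symbols ("?" and "N") are assumptions.

```python
# Illustrative sketch only; not taken from the Collapse source code.
MISSING = {"?", "N"}  # assumed missing-data symbols
GAP = "-"

def count_differences(a, b, gaps_as_missing=False, count_missing=True):
    """Count sites at which two aligned sequences differ.

    gaps_as_missing=True mimics -g; count_missing=False mimics -m.
    """
    diffs = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        involves_missing = (
            x in MISSING or y in MISSING
            or (gaps_as_missing and GAP in (x, y))
        )
        if involves_missing and not count_missing:
            continue  # difference is due to missing data and is ignored
        diffs += 1
    return diffs

def collapsible(a, b, limit=0, **options):
    """Mimics -l: two sequences collapse if they differ at <= limit sites."""
    return count_differences(a, b, **options) <= limit

print(count_differences("AC-T", "ACGT"))  # gap as a 5th state
print(count_differences("AC-T", "ACGT", gaps_as_missing=True, count_missing=False))
print(collapsible("ACGT", "ACGA", limit=1))
```

Under these assumptions, treating gaps as missing data changes the count only when missing data is also ignored, since missing data is counted as a difference by default.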
Disclaimer
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
David Posada - dposada@uvigo.es
May 2004