**WH** -- A COMPUTER PROGRAM FOR ISOLATION MODEL FITTING--*
DOCUMENTATION
Jody Hey
Department of Genetics
Rutgers University
Nelson Biological Labs
604 Allison Rd.
Piscataway, NJ 08854-8082
732-445-5272
fax 732-445-5870
http://lifesci.rutgers.edu/~heylab
*Some key internal parts of this program derive from the first program for the method that was written by John Wakeley.
This computer program and documentation may be freely copied and used by anyone, provided no fee is charged for it.
_______________________
Contents
_______________________
_______________________
Overview
_______________________
WH is a program that fits a simple speciation model, called the Isolation Model, to multilocus DNA sequence data sets. The isolation model assumes the following:
WH implements the methods described in:
Wakeley, J., and J. Hey, 1997 Estimating ancestral population parameters. Genetics 145: 847-855.
Wang, R. L., J. Wakeley and J. Hey, 1997 Gene flow and natural selection in the origin of Drosophila pseudoobscura and close relatives. Genetics 147: 1091-1106.
_______________________
Downloadable Files
_______________________
_______________________
Input File Format
_______________________
- If all that is to be done is a basic fitting of the isolation model, and no tests are to be done, then each line is required to have 9 items (see DATA LINES below).
- If simulations and statistical tests are to be done, then each line also requires 2 additional items: the population recombination rate estimates for each species.
- If linkage disequilibrium tests are also to be done, then each line requires an additional 6 items, for a total of 17.
DATA LINES FOR EACH LOCUS, ONE LINE PER LOCUS, IN ORDER:
If simulations are to be done, then each line should also have:
If tests of Linkage disequilibrium are to be done then each line should also have:
Note: if there is no LD measurement for a locus, -10 is used.
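The item counts above (9, 11 or 17 per line) lend themselves to a quick check of a data file before running WH. The following is a minimal Python sketch, not part of the WH distribution. It assumes that items on a data line are separated by whitespace, which this documentation does not state, and the function and constant names are hypothetical.

```python
# Sketch only: assumes whitespace-separated items (an assumption, not
# something the WH documentation specifies).

MISSING_LD = -10  # value used when a locus has no LD measurement

# Number of items per data line -> analyses that line can support.
ANALYSES_BY_ITEM_COUNT = {
    9: "model fitting only",
    11: "model fitting and simulations",
    17: "model fitting, simulations and LD tests",
}


def classify_data_line(line):
    """Return the analysis level implied by the number of items on a line."""
    n_items = len(line.split())
    if n_items not in ANALYSES_BY_ITEM_COUNT:
        raise ValueError(
            "expected 9, 11 or 17 items per data line, found %d" % n_items
        )
    return ANALYSES_BY_ITEM_COUNT[n_items]
```

Every locus line in one file would be expected to give the same answer, so running this over all lines and comparing the results is a cheap way to catch a truncated line.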
Note on simulations.
The simulation results are sensitive to the amount of recombination. In the published descriptions of these simulations (Wang, Wakeley and Hey, 1997; Kliman et al., 2000) we used the gamma estimator of recombination (Hey and Wakeley, 1997). This estimator tends to be biased downward, so that the estimates are lower than the expected value of the parameter. The result of having lower recombination is to raise the variance of the observations (of exclusive, shared and fixed variants) and thus to broaden the distribution of test statistics of the fit of the model to the data. In this sense, the tests should be conservative. However, this is not guaranteed, and users may want to examine the quality of the fit between the model and their data by considering a range of recombination rates.
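The rules by which the program assigns population recombination rates, set out in the next paragraph, can be sketched in Python as follows. This is an illustration and not the program's source code: the function name, the argument names, and the use of None to mark a missing estimate are all assumptions.

```python
# Sketch only: names and the None convention are hypothetical.

def assign_recombination_rates(rho1, rho2, theta1, theta2, theta_a):
    """Return (4Nc1, 4Nc2, 4NcA) for one locus.

    rho1, rho2 are the 4Nc estimates for species 1 and 2 at this locus;
    either may be None if no estimate is available (an assumed convention).
    theta1, theta2, theta_a are the theta values for species 1, species 2
    and the ancestral species.
    """
    if rho1 is None and rho2 is None:
        raise ValueError("at least one recombination estimate is required")
    if rho2 is None:
        base = rho1                                   # only species 1 available
    elif rho1 is None:
        base = rho2 * theta1 / theta2                 # rescale species 2 estimate
    else:
        base = (rho1 + rho2 * theta1 / theta2) / 2.0  # average of the two
    # Rates for species 2 and the ancestor are scaled from species 1.
    return base, base * theta2 / theta1, base * theta_a / theta1
```

To explore a range of recombination rates, as suggested above, one could multiply the input estimates by a few factors and rerun the analysis for each.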
Recombination rate estimates are sometimes not available for both species, and they are never available for the common ancestral species. Population recombination rates are assigned as follows:
- The program takes 4Nc1i as input (4Nc for species 1, locus i) and then sets
  4Nc2i = 4Nc1i * theta2/theta1
  4NcAi = 4Nc1i * thetaA/theta1
- Obtaining 4Nc1i depends on whether one has estimates for species 1, for species 2, or for both. If only the species 1 estimate is available, it is used directly as 4Nc1i.
- If only 4Nc2i is available, then 4Nc1i = 4Nc2i * theta1/theta2.
- If both are available, then 4Nc1i = (4Nc1i + 4Nc2i * theta1/theta2)/2, where the values on the right-hand side are the input estimates.
_______________________
Running the Program
_______________________
The program file should reside either in the same folder as the data file or in a folder automatically searched by the operating system. The program can be run using command line parameters, or by simply typing the name of the program ('wh'). If command line parameters are not used, the program asks for the values of runtime parameters.
The user starts the program simply by going to the folder where the data file exists and typing the name of the program ('wh') followed by the enter key. The program asks several questions about the data file and the desired analysis. Nearly all commands and options can also be entered using command line parameters.
The program can be started with or without the use of instructions at the command line.
Without command line instructions - simply type ‘wh’ at the prompt. The program will ask for basic information.
On a PowerPC, clicking on the program icon opens a small window in which command line parameters can be entered. The user can also just hit return at this point and the program will request runtime parameters.
Command Line Parameters: Type and enter
wh -d'datafilename' -r'resultsfilename' -N'numsims' -L'ldtype' -A'ranseed'
Where:
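For scripted runs, the command line shown above can be assembled programmatically. The following is a minimal Python sketch under stated assumptions: that each value is attached directly to its flag with no intervening space, that the quotation marks in the template only mark placeholders, and that all five parameters are supplied. The function name is hypothetical, and the argument values in any call are placeholders.

```python
# Sketch only: assumes values follow their flag with no space and that the
# quotes in the documented template merely mark placeholders.

def build_wh_command(datafile, resultsfile, numsims, ldtype, ranseed,
                     program="wh"):
    """Return the WH command line as an argument list."""
    return [
        program,
        "-d%s" % datafile,     # data file name
        "-r%s" % resultsfile,  # results file name
        "-N%s" % numsims,      # numsims
        "-L%s" % ldtype,       # ldtype
        "-A%s" % ranseed,      # ranseed
    ]
```

The returned list can be passed to Python's `subprocess.run` from the folder that holds the data file, provided the program is in that folder or on the search path.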
_______________________ _______________________
Output is contained in the results file. There are three main sections: INPUT, MODEL FITTING RESULTS, and SIMULATION RESULTS.
INPUT simply lists in tabular form the data in the input file.
MODEL FITTING RESULTS lists the following:
SIMULATION RESULTS lists the following:
_______________________
Program Limitations
_______________________
For simulations, the program can only handle a total sample size for each locus of 32. If the program is compiled under Microsoft Visual C++ (as the distributed Win32 version is), then it can make use of a compiler extension and can handle total per locus sample sizes of 64.
For basic model fitting, without simulations, larger sample sizes can be used.
During simulations, recombination within a locus can occur only between sequence segments. The program has been compiled with 50 segments per sequence, which should be more than sufficient for most data sets. However it is possible that this will not be sufficient for loci with long sequences and high amounts of recombination.
_______________________
Literature Cited
_______________________
Hey, J., and J. Wakeley, 1997 A coalescent estimator of the population recombination rate. Genetics 145: 833-846.
Kliman, R. M., P. Andolfatto, J. A. Coyne, F. Depaulis, M. Kreitman et al., 2000 The population genetics of the origin and divergence of the Drosophila simulans complex species. Genetics 156: 1913-1931.
Wakeley, J., and J. Hey, 1997 Estimating ancestral population parameters. Genetics 145: 847-855.
Wang, R. L., J. Wakeley and J. Hey, 1997 Gene flow and natural selection in the origin of Drosophila pseudoobscura and close relatives. Genetics 147: 1091-1106.
This page was last changed February 04, 2004