Three separate software packages have been created: HAPLOTYPER, EM-DeCODER, and HaplotypeManager, as listed in the Appendix of Niu et al. (2002).
HAPLOTYPER is software for haplotype inference using a Bayesian algorithm. The software is the property of Harvard University and is protected by copyright. A patent application has been filed for this software.
EM-DeCODER is a free software package that uses the EM algorithm for haplotype construction.
HaplotypeManager is a simple user interface software package that provides a graphical summary of haplotype frequency distribution data for multiple populations.
Availability:
If you are working for an academic institution or non-profit organization, you may download the HAPLOTYPER program for free by clicking here.
If you are working for a for-profit institution, please contact Ms. Holly Foskett at
Harvard Office for Technology and Trademark Licensing, phone: (617) 496-0474, email:
holly_foskett@harvard.edu. Be sure to specify whether you are interested in obtaining an
executable file or the source code of HAPLOTYPER.
To request an executable of EM-DeCODER, please go here.
Input file format for HAPLOTYPER: Each line in the input file holds the marker data for one subject; within a line, each single nucleotide polymorphism (SNP) occupies one character position. For each SNP, 0 stands for heterozygote, 1 for homozygous wild type, 2 for homozygous mutant, 3 for both alleles missing, 4 for only the wild type allele known [i.e., (A,*)], and 5 for only the mutant allele known.
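The coding scheme above can be illustrated with a short Python sketch that decodes one subject's line. This is not part of HAPLOTYPER; the function name parse_subject_line is hypothetical, and the sketch assumes that the genotype codes are written one character per SNP with no separators between them.

```python
# Hypothetical helper, not part of HAPLOTYPER.
# Assumption: one digit per SNP, no separators between digits.
GENOTYPE_CODES = {
    "0": "heterozygote",
    "1": "homozygous wild type",
    "2": "homozygous mutant",
    "3": "both alleles missing",
    "4": "only wild type allele known",
    "5": "only mutant allele known",
}


def parse_subject_line(line):
    """Return one genotype description per SNP for a single subject."""
    codes = line.strip()
    unknown = sorted(set(codes) - set(GENOTYPE_CODES))
    if unknown:
        raise ValueError(f"unexpected genotype code(s): {unknown}")
    return [GENOTYPE_CODES[c] for c in codes]


# A made-up subject typed at four SNPs.
for position, genotype in enumerate(parse_subject_line("0123"), start=1):
    print(position, genotype)
```

Rejecting unknown characters early is a design choice of this sketch; it makes a malformed line fail visibly rather than being silently skipped.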
An example is shown here.
Output file format of HAPLOTYPER: The output file consists of two parts. The first part lists the two predicted haplotypes with their respective IDs and the associated posterior probabilities. The second part is a summary of the overall haplotype frequencies estimated from the sample. If the number of SNPs is fewer than 20, the output also includes a haplotype code (shown in parentheses), which is the decimal number obtained by reading the haplotype configuration as a binary sequence (e.g., haplotype 101 is converted to 2^2 + 2^0 = 5).
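The haplotype code conversion can be sketched in a few lines of Python. The function name haplotype_code is hypothetical. The sketch assumes that the leftmost SNP is the most significant bit; the example 101 reads the same in both directions, so it does not settle the bit order used by HAPLOTYPER.

```python
# Hypothetical helper, not part of HAPLOTYPER.
# Assumption: the leftmost SNP is the most significant binary digit.
def haplotype_code(haplotype):
    """Convert a 0/1 haplotype string to its decimal code."""
    if not haplotype or set(haplotype) - {"0", "1"}:
        raise ValueError("haplotype must be a non-empty string of 0s and 1s")
    return int(haplotype, 2)


print(haplotype_code("101"))  # 2^2 + 2^0 = 5
```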
An example is shown here.
To use HaplotypeManager, the Java Runtime Environment (JRE) 1.2 or above, or the Java Development Kit (JDK), is required. To check whether it is installed in a Microsoft Windows environment, open a DOS window and type the command "java -version". If the command results in an error message such as "command cannot be found", or the reported version number is lower than 1.2, you need to install JRE or JDK 1.2 or above, which can be downloaded freely at: http://java.sun.com/j2se/
The haplotype frequencies for multiple populations are tabulated in ASCII format. Each column represents one population, and each row represents one specific haplotype.
An example is shown here. The data are taken from Table 5 of Peterson et al. (1999). The first column lists IDs corresponding to each haplotype (1-12 denote H1-H12). Columns 2-8 denote haplotype frequencies of populations: Africa, Chinese, Japanese, Europe, Mayan, Pima, and Ticuna, respectively.
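A table laid out this way can be read with a short Python sketch. The delimiter is not specified here, so the sketch assumes whitespace-separated fields; the function name parse_frequency_table is hypothetical, and the numbers in the toy table are invented for illustration and are not taken from Peterson et al. (1999).

```python
# Hypothetical reader, not part of HaplotypeManager.
# Assumption: fields are separated by whitespace; first field is the haplotype ID.
def parse_frequency_table(text):
    """Return (haplotype_ids, columns); columns[j] holds population j's frequencies."""
    ids, rows = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ids.append(fields[0])
        rows.append([float(x) for x in fields[1:]])
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have differing numbers of population columns")
    columns = [list(col) for col in zip(*rows)]
    return ids, columns


# Toy table: two haplotypes, two populations (invented values).
toy_table = """\
1 0.50 0.25
2 0.50 0.75
"""
ids, columns = parse_frequency_table(toy_table)
print(ids)
print(columns)
```

Returning the data column by column mirrors the layout described above, where each column is one population.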
The output provides a simple yet versatile graphical user interface (GUI) for displaying the haplotype data in a colored spreadsheet. The features are as follows:
HaplotypeManager provides an option to present the data as either a histogram or a scatter plot.
When the data set is large and densely clustered, HaplotypeManager provides unlimited zoom-in/zoom-out functionality so that individual data points can be inspected. This function is available only for the scatter plot.
HaplotypeManager provides quick tool-tips to show the corresponding haplotype ID and frequency.
HaplotypeManager also provides an "auto-data-check" function to verify that each data point is a valid frequency value (between 0 and 1) and that the values in each column sum to 1.
HaplotypeManager has a "drag-and-drop" feature for the columns in the spreadsheet, which lets the user relocate data from one place to another. It also enables the user to change the value of any cell in the spreadsheet, and the modification is immediately reflected in the plot.
HaplotypeManager calculates the "haplotype heterozygosity" (HET) (Stephens et al., 2001) and the "haplotype information content" [analogous to PIC; each haplotype is treated as an "allele" for a multi-allelic marker (Niu et al., 2001)] for the selected population (column).
HaplotypeManager enables the user to select columns by a simple point-and-click.
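The column check and the two summary statistics mentioned in the feature list can be sketched in Python as follows. The exact formulas used by HaplotypeManager are not given here, so the sketch makes assumptions: heterozygosity is taken as 1 minus the sum of squared frequencies (without any sample-size correction), and the information content uses the standard PIC formula for a multi-allelic marker. The function names and the tolerance are hypothetical.

```python
import math


# Hypothetical helpers, not the HaplotypeManager implementation.
def check_column(freqs, tol=1e-6):
    """True if every value lies in [0, 1] and the column sums to 1 within tol."""
    if any(p < 0.0 or p > 1.0 for p in freqs):
        return False
    return math.isclose(sum(freqs), 1.0, abs_tol=tol)


def heterozygosity(freqs):
    """Assumed definition: 1 - sum(p_i^2), with no sample-size correction."""
    return 1.0 - sum(p * p for p in freqs)


def information_content(freqs):
    """Assumed definition: standard PIC, treating each haplotype as an allele."""
    squares = [p * p for p in freqs]
    cross = sum(
        2.0 * squares[i] * squares[j]
        for i in range(len(squares))
        for j in range(i + 1, len(squares))
    )
    return 1.0 - sum(squares) - cross


column = [0.5, 0.5]  # toy population with two equally frequent haplotypes
print(check_column(column))         # True
print(heterozygosity(column))       # 0.5
print(information_content(column))  # 0.375
```

A small tolerance is used for the sum check because frequencies read from a text file are usually rounded.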
An example of the graphical output is shown here. This GUI was created jointly with Dr. Zhenjun Hu.
The Java software package HaplotypeManager (Updated on 04/10/02) can be downloaded freely from here. To run it, one may directly invoke the following command:
java -classpath hm.zip HaplotypeManager
or one may unzip the downloaded file and invoke the following command:
java HaplotypeManager
T. Niu, Z.S. Qin, X. Xu, and J. Liu (2002) Bayesian Haplotype Inference for Multiple Linked Single Nucleotide Polymorphisms. Am. J. Hum. Genet. (To Appear).
T. Niu, B. Struk, K. Lindpaintner (2001) Statistical considerations for genome-wide scans: design and application of a novel software package POLYMORPHISM. Hum Hered. 52:102-9.
R.J. Peterson, D. Goldman, J.C. Long (1999) Nucleotide sequence diversity in non-coding regions of ALDH2 as revealed by restriction enzyme and SSCP analysis. Hum Genet. 104:177-187.
J.C. Stephens, J.A. Schneider, D.A. Tanguay, et al. (2001) Haplotype variation and linkage disequilibrium in 313 human genes. Science 293:489-93.