ldgroup: linkage disequilibrium grouping of single nucleotide polymorphisms
(SNPs) reflecting haplotype phylogeny for efficient selection of tag SNPs.

### URL, CONTACT

http://www.fumihiko.takeuchi.name/publications.html

Questions, comments or bugs? Please write to the address given on the above
web page, quoting 'ldgroup' in the title.

### REFERENCE

The program ldgroup implements the algorithm explained in the following paper.
Please cite it for reference.

Takeuchi, F. et al. (2005)
Linkage disequilibrium grouping of single nucleotide polymorphisms (SNPs)
reflecting haplotype phylogeny for efficient selection of tag SNPs,
Genetics 170, 291-304.

### PROGRAM FILE

README.txt    this file
ldgroup.R     the program written in R

### R

The ldgroup program is written in the R language, which is available from
http://www.r-project.org/

The program also requires the R library 'igraph', which can be installed from
within R (for example with install.packages("igraph")).
Relevant information is available from the following URLs.
http://www.stats.bris.ac.uk/R/src/contrib/Descriptions/igraph.html
http://cneurocvs.rmki.kfki.hu/igraph/

### INPUT FILE

Unlike the description in the paper, ldgroup takes 'phased' genotypes
(i.e. haplotypes) as input. Thus, you need to phase your genotype data before
running ldgroup. Since phasing programs usually impute missing genotypes, the
input for ldgroup is assumed to contain no missing data.

The output formats of two phasing programs are supported directly.

One is the Beagle software by Browning et al:
http://www.stat.auckland.ac.nz/~browning/beagle/beagle.html
I also prepared a sample input file for ldgroup in this format:
  test.phased

The other is the MACH software by Li et al:
http://www.sph.umich.edu/csg/abecasis/MACH/index.html
I also prepared sample input files for ldgroup in this format:
  test.geno
  test.dat

### OUTPUT FILE; TAG SNP SELECTION

Output file names are
  filenameradix.tagSNPs.txt
where is the r^2 threshold for defining LD groups

There is one row for each SNP.
The SNP ID is in the third column. The SNPs are first classified roughly into
LD groups (first column), and then subdivided into complete-LD subgroups
(second column). SNPs that have the same numbers in both the first and second
columns belong to the same complete-LD subgroup and can be regarded as
equivalent. Selecting one SNP from each set of equivalent SNPs gives a set of
tag SNPs.

### VERSION HISTORY

2007.11.16

### LICENSING

GNU General Public License
(If you are not happy with this, please contact me about other licensing.)
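
### EXAMPLE: PICKING TAG SNPS FROM THE OUTPUT

As an illustration of the tag SNP selection described under OUTPUT FILE, the
following R sketch keeps one SNP from each complete-LD subgroup. It is not part
of ldgroup.R. The data frame, its column names (ldgroup, subgroup, snp) and the
SNP IDs are made up for the example; only the three-column layout (LD group,
complete-LD subgroup, SNP ID) comes from the description above. Taking the
first SNP listed in each subgroup is an arbitrary choice, since the SNPs in a
subgroup are regarded as equivalent.

```r
# Hypothetical rows in the three-column layout of the tagSNPs output:
# column 1 = LD group, column 2 = complete-LD subgroup, column 3 = SNP ID.
# For a real output file, read.table() would be the natural way to load it,
# but the delimiter and the presence of a header line are assumptions that
# should be checked against the file itself.
snps <- data.frame(
  ldgroup  = c(1, 1, 1, 2, 2),
  subgroup = c(1, 1, 2, 1, 1),
  snp      = c("rs0001", "rs0002", "rs0003", "rs0004", "rs0005"),
  stringsAsFactors = FALSE
)

# SNPs sharing both numbers belong to the same complete-LD subgroup.
# duplicated() flags every row whose (ldgroup, subgroup) pair has already
# appeared, so negating it keeps the first SNP of each subgroup.
is_first <- !duplicated(snps[, c("ldgroup", "subgroup")])
tag_snps <- snps$snp[is_first]

print(tag_snps)
```

With the made-up rows above, three subgroups exist (1/1, 1/2 and 2/1), so three
tag SNPs are kept. Any other member of a subgroup could be chosen instead, for
example one that is easier to genotype.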