Analyzing population genomic data
Project Leads: Michael Lynch (PI), Matthew S. Ackerman (Project lead)
Research made possible by: UITS Research Technologies' (RT's) High Performance Systems (HPS), RT's Scientific Application and Performance Tuning (SciApt), National Center for Genome Analysis Support (NCGAS), IU's Karst supercomputer
As the cost of sequencing an organism's genome has declined, it has become possible to sequence the genomes of many individuals within a population, giving birth to the era of population genomics. To use the rich data sets this generates, two kinds of errors must be accounted for: 1) sequencing errors, where the sequencing machine infers an erroneous base, and 2) undersampling, where only one of the two copies, or haplotypes, of an individual's genome has been sequenced. Most techniques address both problems in the same fashion: by sequencing large amounts of DNA from each individual. This ensures that each location in an organism's genome is sequenced many independent times, which makes sequencing errors obvious and makes it nearly certain that both copies of an individual's genome will be sequenced. However, we take an alternative approach. Instead of sequencing enough DNA to make the analysis simple,
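As a rough illustration of the undersampling problem described above, the sketch below computes the chance that both haplotypes of a diploid individual are observed at a site covered by a given number of reads. It is not the project's method: the function name `prob_both_haplotypes` is hypothetical, and the calculation assumes that each read independently samples either haplotype with probability 1/2 and that there are no sequencing errors.

```python
def prob_both_haplotypes(depth: int) -> float:
    """Probability that both haplotypes appear among `depth` reads at a site.

    Assumption (not from the source): each read independently comes from
    either haplotype with probability 1/2, with no sequencing errors.
    """
    if depth < 2:
        # A single read (or none) can never reveal both copies.
        return 0.0
    # All reads come from one particular haplotype with probability 0.5**depth,
    # and there are two haplotypes that could be the only one seen.
    return 1.0 - 2.0 * 0.5 ** depth


# Show how the probability approaches 1 as sequencing depth grows.
for depth in (1, 2, 5, 10, 20):
    print(f"depth {depth:2d}: P(both haplotypes seen) = {prob_both_haplotypes(depth):.6f}")
```

Under these assumptions, a site covered by only two reads reveals both copies half the time, while ten reads do so about 99.8% of the time, which is why deep sequencing makes undersampling easy to ignore.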
About RT Groups: The High Performance Systems (HPS) group implements, operates, and supports some of the fastest supercomputers in the world (IU's Big Red II, the Quarry cluster, Karst, and the large-memory Mason system) in order to advance Indiana University's mission in research, training, and engagement in the state. HPS also supports databases and database engines used by the IU community.
NSF GSS Codes:
Primary Field: Genetics (610) Human-Medical Genetics
Secondary Field: Computer Science (401) Data Modeling