Contributing to the Annotation of the Loblolly Pine Genome
Project Leads: Keithanne Mockaitis and Le-Shin Wu
National Center for Genome Analysis Support (NCGAS), UITS Research Technologies, research made possible by Mason, Data Capacitor
High Performance File Systems, UITS Research Technologies
A critical component of a successful genome sequencing project is discovering the genes contained within the genome. This step, called gene annotation, is particularly difficult. One approach is to sequence the RNA molecules found in the organism, assemble them into transcripts, and map the assembled transcripts back onto the newly assembled genome. With help from NCGAS, this approach was applied to the loblolly pine, which is at the center of a major multi-site sequencing effort. A paper detailing this work, “Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation,” was recently published in Genetics: http://www.genetics.org/content/196/3/891.
The loblolly pine is the most economically important tree in the United States and the source of most of the wood pulp used to produce paper products. Plant breeders will use a complete, annotated genome to develop strains of the tree that are optimized for different growing conditions or resistant to environmental or biological threats such as drought or disease. The project is particularly difficult because the loblolly pine genome is the largest plant genome yet sequenced, seven times larger than the human genome.
NCGAS bioinformatician Le-Shin Wu, working in close partnership with Indiana University faculty member Keithanne Mockaitis, provided bioinformatics assistance in running de novo RNA-sequence assemblies and technical support on the Mason cluster. NCGAS also provided computational resources designed specifically to support this kind of compute job.
The National Center for Genome Analysis Support supports life science research on the national cyberinfrastructure, enabling the US biological research community to analyze, understand, and make use of the vast amount of genomic information now available. NCGAS focuses particularly on transcriptome- and genome-level assembly, phylogenetics, metagenomics/transcriptomics and community genomics.
The High Performance File Systems group provides high-speed, disk-based data storage for IU researchers.
NSF GSS Codes:
Primary Field: Genetics 610 - Genome Sciences/Genomics
Secondary Field: Computer Science 401 - Computer Systems Analysis