Indiana University

Improved Transcriptome Reconstruction with HPC and Trinity

Project Leads: Robert Henschel

Funded in part by: National Institutes of Health (NIH), award 5820281-5500000615

Research made possible by:  Karst supercomputer, Mason supercomputer, Data Capacitor II

Figure 1. Transcript Reconstruction Performance - Trinity assemblies from several versions were analyzed to find the number of single genes, and each was also given an overall evaluation score using the DETONATE assembly evaluation tool, which rates an assembly based on the likelihood of its being perfectly correct. Results for the seven most recent versions of Trinity are shown. The values are log probabilities, which are negative, so the shorter bars (those closest to zero) are the best scores.
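
The rule for reading the figure can be sketched in a few lines of Python. This is only an illustration: the assembly names and score values below are hypothetical placeholders, not results from the analysis, and the sketch assumes every score is negative, as log probabilities of this kind are.

```python
# Hypothetical log-probability scores for three assemblies.
# These values are illustrative placeholders, NOT measured results.
scores = {
    "assembly_a": -3.2e9,
    "assembly_b": -2.9e9,
    "assembly_c": -3.0e9,
}


def best_assembly(log_scores):
    """Return the name with the highest log probability (closest to zero)."""
    return max(log_scores, key=log_scores.get)


def bar_lengths(log_scores):
    """A bar's length in the chart is the magnitude of its (negative) score."""
    return {name: abs(value) for name, value in log_scores.items()}


best = best_assembly(scores)
lengths = bar_lengths(scores)
shortest_bar = min(lengths, key=lengths.get)

# With all-negative scores, the best score and the shortest bar coincide.
print(best, shortest_bar)
```

Because every score is negative, picking the maximum score and picking the shortest bar select the same assembly.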

When the latest release of Trinity (2.0.6) was made available in January, the performance and quality analyses performed at Indiana University were presented on the Trinity home page, where several versions of Trinity are compared for the number of genes found, the time and memory required to build the assembly, and the overall accuracy of the assembly. To ensure high-quality assemblies over a wide range of species, analyses were performed for a vertebrate (M. musculus), an invertebrate (D. melanogaster), and a yeast (S. pombe).

Trinity, a sequence assembly application developed at the Broad Institute and the Hebrew University of Jerusalem, reconstructs a set of RNA transcripts from the reads produced by Illumina and other next-generation sequencing platforms. Indiana University performance tuning specialists have been working with the Broad Institute and the Technische Universität Dresden High Performance Computing center to analyze and improve the performance of Trinity using an array of profilers and methodologies.

By analyzing reads taken from a variety of organisms, developers were able to find opportunities for parallelization and other performance improvements that reduced the time required to assemble 50 million reads from M. musculus (the house mouse) from 19 hours to slightly more than five. Researchers also analyzed the resulting assemblies to confirm that the quality of each assembly remained high.
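
As a rough check on the size of that improvement, the arithmetic can be sketched as below. The exact final runtime is not stated, so the sketch assumes a value of exactly five hours, which makes the result an upper bound on the speedup rather than a measured figure.

```python
def speedup(before_hours, after_hours):
    """Ratio of the old runtime to the new runtime."""
    return before_hours / after_hours


before = 19.0  # hours, as stated in the text
after = 5.0    # hours; ASSUMPTION: the text says "slightly more than five"

upper_bound = speedup(before, after)  # true speedup is a little lower
reduction = 1 - after / before        # fraction of runtime removed, at most

print(f"speedup: a little under {upper_bound:.1f}x")
print(f"runtime reduction: a little under {reduction:.0%}")
```

Under that assumption the speedup is a little under 3.8x, which corresponds to removing a little under three quarters of the original runtime.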

The mission of the Scientific Applications and Performance Tuning (SciAPT) group is to deliver and support software tools that promote effective and efficient use of IU's advanced cyberinfrastructure, which in turn improves research and enables discoveries.

NSF GSS Codes:

Primary Field: Geosciences (610) - Genetics - Genome Sciences/Genomics