BLAST on the Open Science Grid
Project Leads: Robert Quick, Richard LeDuc, and Bill Barnett
High Throughput Computing & National Center for Genome Analysis Support, Science Community Tools Group, UITS Research Technologies
The popular web portal Galaxy, which life scientists use to analyze a wide array of sequence data, has been extended to allow users to submit BLAST jobs that run on the Open Science Grid (OSG).
A seemingly small change to a web page represents a major step forward in relieving a computational bottleneck that biologists and medical researchers encounter. The largest computational challenge facing life scientists is comparing new DNA, RNA, and protein sequences with known sequences to gain insight into the function of the new sequence. The most commonly used tool for this is a program known as BLAST. Genomic researchers often wait three weeks or more for BLAST to analyze a set of new sequences. This is slow, and it also ties up local computing resources. In addition, scientists frequently use the popular Galaxy web portal to run smaller BLAST jobs alongside hundreds of other analytic tools, but are forced to switch to different and more complex tools to run larger jobs.
The OSG provides a place to run very large computational jobs in parallel, but life scientists have found it unapproachable. They can now run BLAST on systems across the nation, because the Galaxy web portal automatically breaks large BLAST jobs apart and submits the pieces to the OSG. This relieves stress on local computational systems and has the added benefit of allowing jobs to complete more quickly.
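The idea of breaking a large BLAST job into independent pieces can be illustrated with a minimal Python sketch. It is not the code used by Galaxy or the OSG, whose implementation is not described here. It assumes the query sequences are in FASTA format and that each piece can be searched on its own. The function names (read_fasta, split_into_chunks, to_fasta) and the chunk size are hypothetical, and the sketch stops short of submitting anything to a grid.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(seq_lines)
            header = line[1:]
            seq_lines = []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


def split_into_chunks(records: Iterable[Record], chunk_size: int) -> List[List[Record]]:
    """Group records into lists of at most chunk_size sequences each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks: List[List[Record]] = []
    current: List[Record] = []
    for record in records:
        current.append(record)
        if len(current) == chunk_size:
            chunks.append(current)
            current = []
    if current:  # keep the final, possibly smaller, chunk
        chunks.append(current)
    return chunks


def to_fasta(chunk: Iterable[Record]) -> str:
    """Serialize one chunk back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in chunk)
```

Each chunk produced this way could be written to its own file and handed to a separate job, with the per-chunk results concatenated afterward. This works because each query sequence in a BLAST search is compared against the database independently of the others.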
PTI staff from Indiana University’s High Throughput Computing group and the National Center for Genome Analysis Support have extended the Galaxy web portal to run large jobs on the NSF-funded Open Science Grid.
The High Throughput Computing group supports the use of high-throughput computing by IU and national research communities. HTC is primarily funded by the National Science Foundation.
The National Center for Genome Analysis Support enables the biological research community of the US to analyze, understand, and make use of the vast amount of genomic information now available. NCGAS focuses particularly on transcriptome- and genome-level assembly, phylogenetics, metagenomics/transcriptomics, and community genomics.
NSF GSS Codes:
Primary Field: Genetics (610) - Genome Sciences/Genomics
Secondary Field: Computer Science (401) - Computer Systems Analysis