Accelerating Mass Spectrometer Data Analysis using IU Supercomputers
Project Leads: Scott McClary
Research made possible by: UITS Research Technologies' High Performance Systems (HPS), Scientific Applications and Performance Tuning (SciAPT), Karst supercomputer, Data Capacitor II
The Indiana University Chemistry department's Martin F. Jarrold (MFJ) Research Group (http://www.indiana.edu/~nano/index.html) studies a specialized mass spectrometry technique called charge detection mass spectrometry (CDMS). The goal of mass spectrometry is to determine the mass of chemical and biological compounds, and with CDMS the group is extending the upper limit of mass detection. The MFJ Research Group uses this technique to study problems such as virus assembly and DNA packaging.
Their mass spectrometer takes roughly 1 hour to analyze a sample and outputs a 5GB dataset. Each 5GB dataset is then processed locally within their lab on Windows servers, which takes 4-6 hours to extract the necessary information that is often buried under data "noise". Since their data analysis takes significantly (4 to 6 times) longer than their data collection, they have a well-defined bottleneck in their workflow.
In hopes of solving the MFJ Research Group's workflow issue, Indiana University's Scientific Applications and Performance Tuning (SciAPT) specialists stepped in to analyze and improve the performance of their data analysis. Using performance analysis tools such as Score-P and Vampir, the SciAPT team was able to modify the Fortran program into a highly scalable and efficient version that utilizes the power of Karst, one of IU's supercomputers. Running in parallel on Karst, the 5GB mass spectrometer data can be processed 11 times faster than on their local Windows servers. This 11X speedup allows a dataset to be processed in just 20-30 minutes.
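The figures above can be checked with a little arithmetic. The following is only a rough sketch in Python, assuming the reported 11X speedup applies uniformly across the 4-6 hour range of local run times; the names used (`karst_minutes`, `LOCAL_HOURS`, and so on) are illustrative and not part of the group's software.

```python
# Figures quoted in the text; the uniform-speedup model is an assumption.
LOCAL_HOURS = (4, 6)      # local processing time per dataset, in hours
SPEEDUP = 11              # reported speedup on Karst
COLLECTION_MINUTES = 60   # roughly 1 hour to collect one dataset


def karst_minutes(local_hours, speedup=SPEEDUP):
    """Estimated processing time in minutes after applying the speedup."""
    return local_hours * 60 / speedup


estimates = [karst_minutes(h) for h in LOCAL_HOURS]
for hours, minutes in zip(LOCAL_HOURS, estimates):
    print(f"{hours} h locally -> about {minutes:.0f} min on Karst")

# Under this model, analysis finishes before the next dataset is collected.
assert all(m < COLLECTION_MINUTES for m in estimates)
```

Under these assumptions the estimate comes out to roughly 22 to 33 minutes per dataset, in line with the rounded 20-30 minutes quoted above and well under the roughly 1 hour needed to collect a sample.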
With the assistance of Karst, the MFJ Research Group can now analyze data faster than they can collect it, quickly reanalyze data with minimal overhead, and expand their research to utilize additional mass spectrometers. Porting the data processing from local servers to Karst has therefore accelerated groundbreaking discoveries within their field of research.
The High Performance Systems (HPS) group implements, operates, and supports some of the fastest supercomputers in the world – IU’s Big Red II, the Quarry cluster, Karst, and the large memory Mason system – in order to advance Indiana University's mission in research, training, and engagement in the state. HPS also supports databases and database engines used by the IU community.
The mission of the Scientific Applications and Performance Tuning (SciAPT) group is to deliver and support software tools that promote effective and efficient use of IU's advanced cyberinfrastructure, which in turn improves research and enables discoveries.
NSF GSS Codes:
Primary Field: Biochemistry (602) - Biochemistry and molecular biology