June 22–26, 2014
Leipzig, Germany

Presentation Details

Name: Next Generation Sequencing: Using High Performance Computing Best Practices to Enable the Genomics Pipeline & Integrate with the Downstream Analytics
Time: Wednesday, June 25, 2014
12:45 pm - 01:00 pm
Room: Hall 2
CCL - Congress Center Leipzig
Breaks: 01:00 pm - 02:15 pm Lunch
Speaker: Janis E. Landry-Lane, IBM
Abstract: High-performance technical computing and storage solutions are required to process the data produced by Next Generation Sequencing, the volume of which is doubling every five months. There are four phases in a sequencing project: (a) experimental design and sample collection, (b) sequencing, (c) data management, and (d) downstream analysis. In the data management phase (c), raw sequencing reads are mapped to a known reference genome or assembled de novo. It takes a highly optimized HPC platform to keep pace with genomic data analysis, and the use of compression imposes additional requirements. The applications and algorithms in phase (c) are typically CPU and I/O intensive. Janis Landry-Lane and Dr. Kathy Tzeng have characterized this workload so that optimal HPC solutions and best practices can be applied to it.
Although the capability to sequence the genome rapidly and cheaply is important, it is not the primary goal of a sequencing project. Only after analyzing the genome data together with corresponding phenotype information, image analysis, and published scientific discoveries can researchers obtain insights. Large computational capacity and sophisticated algorithms are mandatory. Janis will address the work being done by IBM to integrate genomics and translational platforms with the optimal components, helping researchers analyze this wide spectrum of information and speeding the analytics so that the promise of genomics is realized.