ENCODE bioinformatics pipelines go to the cloud with DNAnexus.

Stanford University, the Data Coordination Center (DCC) for the ENCODE Project, has successfully adopted DNAnexus’s cloud-based genomics platform for the project’s next phase.

ENCODE is a public research consortium funded by the National Human Genome Research Institute (NHGRI). Its mission is to build the Encyclopedia of DNA Elements (ENCODE). The first major milestone in its effort to identify all functional elements in the human genome was the publication of the pilot study in Nature in June 2007.

Consortium Members (NHGRI)

Following the early success of the project, NHGRI funded new awards in 2007 to scale up ENCODE into its production phase. The DCC was put in place to track, store and display ENCODE data and release it into public databases. Tasked with centralizing the project’s raw sequencing data with uniform metadata standards and bioinformatics analysis, the DCC has now chosen DNAnexus to help it achieve this.

The team at Stanford was able to put the ENCODE consortium’s initial bioinformatics pipelines onto the cloud, where they are now transforming raw sequencing data into refined analyses. What does it take to make that transformation? Ten million core-hours of compute, which will generate nearly 1 petabyte of raw data over the next 18 months on the DNAnexus platform.

“Many large-scale genomic studies have been limited by the lack of required compute power and collaborative data management infrastructure; this is a real hindrance in realizing the full potential of genomic medicine,” said Richard Daly, CEO of DNAnexus. “The DNAnexus global network provides hundreds of researchers at institutions worldwide secure and immediate access and use of ENCODE’s results. We believe the availability of the consortium’s gold-standard analysis pipelines and ENCODE data on a single integrated platform will accelerate genomic medicine.”

Stanford has open-sourced the ENCODE pipelines on GitHub, and they are also available in a public project on the DNAnexus platform.