Enabling Open Access to Cancer Genomics Data
The NCI Genomic Data Commons (GDC) is making a bold move, introducing Data Analysis, Visualization, and Exploration Tools, an online, open-access cancer research resource they’re calling DAVE.
Since its launch last year, the GDC has collected and harmonised more than 4.5 petabytes of cancer genomics data. This kind of data has been fundamental to recent progress against cancer and holds promise for continued improvement in our ability to diagnose, treat, and care for patients.
Their vision, however, has been much grander than creating a data repository; they have been building a foundation for a knowledge system with the power for in-depth analyses that will extend the reach and utility of the data to a wide community of scientists. In transforming the GDC into this knowledge system, they are hoping to transform the research community.
Making the GDC Accessible and Usable for Many
The growth in cancer genomics has been one of the most exciting scientific and technological developments in cancer research, spurring significant advances in patient care and laying the groundwork for many future advances.
As a data-sharing platform, the GDC helps to standardise data collected from a diversity of patients and tumour types. The end result is a catalogue of harmonised genomic and clinical data, all of which must meet rigorous quality standards and undergo processing using the latest methods. This repository of accurate and robust data will continue to grow as NCI and the research community submit new data to it.
Now, as a data-analysis system, the GDC is taking major steps toward engaging the broader research community and encouraging further collaboration and data sharing. Central to NCI’s decision to make such a large trove of data available to the public was the understanding that NCI and its grantees do not have all the skills, tools, and knowledge necessary to unearth every hidden gem in the data. To make diverse discoveries about an exceedingly complex disease, we need great scientific minds across different disciplines—from laboratory scientists to statisticians to drug developers—to work together.
DAVE helps the GDC fulfil its mission of making research data widely accessible and usable. It brings the information technology infrastructure required for downloading, storing, and analysing big data directly to researchers, so that anyone in the cancer research community can work easily with the data available in the GDC. DAVE also makes it easier for experts in diverse areas of biology and other disciplines to incorporate cancer genomic data into their research.
DAVE: Data Analysis, Visualization, and Exploration
DAVE is a new web interface for exploring and analysing cancer genomic data online and in real time, without the need to download or process the data. Researchers can navigate from project cohorts to individual patients, and on to specific genes and mutations of interest. DAVE includes specialised graphs to help researchers visualise genomic “signatures” of cancer and identify potential drivers of disease. Users can also plot patient survival curves and identify the molecular consequences of a mutation for the resulting protein.
Notably, DAVE provides an unprecedented level of flexibility in exploring the data. Researchers can create custom cohorts for analysis by selecting patients with particular altered genes or other relevant biological and clinical features. And researchers are no longer bound to analysing patients only in the context of their original project cohorts—a powerful innovation given the recent evidence that a tumour’s molecular features are far more accurate and informative for cancer subtyping than tissue of origin or histology.
This level of customisation allows researchers to dive deeply into their area of expertise to answer a host of research questions.