Poll Results: Can we handle big data?
Genomics generates enormous datasets, more so than almost any other branch of science. Are we ready to deal with the onslaught?
Do research facilities have enough expertise to handle large genomic datasets on their own?
Yes 7.7 %
No 76.9 %
“More use of cloud will be needed” 7.7 %
“With the right help” 7.7 %
During my PhD the largest dataset I ever generated was a list of detailed observations of sperm storage characteristics from some 150 stalk-eyed fly females. To me this was a gargantuan heap of data, just the right size for feeding through my simple multivariate statistical models and more than enough for a meaty thesis chapter.
150 data points pale into insignificance compared with the kind of data generated by next-generation sequencing. 150,000 or more would be nearer the mark. At that level, your average t-test is not going to cut it. For that matter, neither is your average statistical computer program. Now we’re very definitely into serious custom programming territory. Which raises the key question: are research facilities ready to handle the sheer quantities of data that their sequencing experiments will create?
According to our poll results, many of you believe that they almost certainly aren’t, at least not yet. The additional comments suggest that the right user-friendly tools could improve institutional expertise. Back in August we covered news from Cincinnati Children’s Hospital Medical Center, where a team had developed an integrated analysis platform designed to guide researchers through the computational analysis needed for NGS datasets. There is now a wealth of companies and services available to support research facilities in handling their data.
The comment about cloud computing is strikingly relevant. In a recent interview with FLG Magazine, Jonathan Bingham of Google Genomics explained the enormous potential impact that cloud computing could have on how we handle genomic data:
“The real power will come when we embrace the fact that cloud computing brings entirely new capabilities. Imagine if you never had to download data again; if you didn’t have to manage a computer cluster, even a virtual one in the cloud; if you could use statistical tools, and you never had to think about running out of computer memory; if you didn’t have to downsample to analyze your complete cohort; if you could have an idea, and test it right away, without having to wait weeks or months to write complicated code and manage servers or even virtual machines.”
I suspect that this kind of powerful, high-speed cloud computing is the major direction for genomic data analysis. Sequencing technology is evolving at a fantastic rate, demanding that the available computing power keep pace, and we shall have to keep paddling desperately to keep up!
This poll is in collaboration with Source BioScience, an international provider of laboratory products and services.