Researchers at the Commonwealth Scientific and Industrial Research Organisation (CSIRO) have developed a cloud-based tool that acts as a rapid “search engine” for the genome to cut the time needed to analyse large data sets. As ever more genomic data is generated, shared and stored for analysis, the race is on to find new tools capable of extracting meaning and insights from vast data sets that are projected to grow at astonishing rates as more people across the globe seek answers in their bio-data.

Denis Bauer, senior research scientist and research team leader in bioinformatics at Australia’s Commonwealth Scientific and Industrial Research Organisation (CSIRO), explained that although YouTube is estimated to hold two exabytes of data and Twitter one exabyte by 2025, there would be 20 exabytes of genomic data by that time.

Speaking at the YOW! 2017 software developers conference in Sydney, Dr. Bauer and fellow speaker Lynn Langit, a big data and cloud architect, described how CSIRO has developed and deployed the VariantSpark tool to search through genomic data, Computer Weekly reports. Essentially a machine learning platform for genomic variants that reduces the risk of false positives, VariantSpark would also be useful for other massive data analysis tasks.

The advent of the internet of things and what Langit described as the “datafication of everything” would require the use of a tool like VariantSpark, which has been deployed on Amazon Web Services and could slash the time to perform exploratory analytic work from hours to minutes.

Bauer and her team will make VariantSpark available as open source code on Amazon Marketplace in 2018. In the longer term, CSIRO plans to develop a range of commercial tools and services around it, she added.

CSIRO is also developing computational tools to screen embryos for particular disease markers.

Other groups are hotly pursuing machine learning approaches, as the prizes for efficient and accurate analysis are so great, not just in terms of human health but commercially. Last week, we reported on WuXi NextCODE’s tools; Hannes Smarason, co-founder and CEO, believes their tool also does for medical researchers and clinicians what Google’s search engine did for internet users nearly two decades ago.

He explained, “For the first time, technology has really come together in a unique way to enable an entirely new industry around low-cost data generation on people’s sequences. Together with the computational power and the emergence of artificial intelligence (AI), to then make sense of and use that information for a variety of purposes is really opening up this market to significant disruption.”


It’s clear that as the data continues to accumulate, this area will see rapid advances and a competitive environment. The winners, whose tools perform best and find favour with users, will emerge from it and define genomic analysis pipelines as genomics hits the mainstream.