Source: NHGRI

A novel artificial intelligence tool that can accurately call variants in sequencing data was released as open source on the Google Cloud Platform yesterday. The tool, called DeepVariant, was developed in a collaboration between the Google Brain team and researchers from fellow Alphabet subsidiary Verily Life Sciences. The launch was announced in a press release cross-posted to the Google Research blog and the Google Open Source blog.

When a genome is sequenced, the data will reveal small nucleotide insertions or deletions, as well as loci where a single nucleotide pair has been changed. These variants can be used to determine gene function and activity, the nature of a patient’s medical condition, or, in some cases, predict a healthy person’s risk of developing a disease. Even with the accuracy of modern sequencers, however, there will also be a handful of sites where the sequence data has been altered because of mistakes made during the sequencing process.

In the past, it has been difficult for machine learning-based tools to differentiate between genuine variants and sequencing errors, particularly in regions of the genome that contain a lot of repetitive sequences. As a result, variant calling tools have always had a certain level of inaccuracy when identifying small variants.

“Processing the [high throughput sequencing] output into a single, accurate and complete genome sequence is a major outstanding challenge,” reads the press release. “The importance of this problem, for biomedical applications in particular, has motivated efforts such as the Genome in a Bottle Consortium (GIAB), which produces high confidence human reference genomes that can be used for validation and benchmarking, as well as the precisionFDA community challenges, which are designed to foster innovation that will improve the quality and accuracy of HTS-based genomic tests.”

The Google Brain team and Verily wanted to develop a machine learning tool for variant calling that was capable of differentiating between accidental sequence changes and genetic mutations. To do so, they used millions of sequences collected by the GIAB project to teach their artificial intelligence, adjusting the parameters of the tool through a series of iterations.
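To make the idea of "adjusting the parameters of the tool through a series of iterations" concrete, here is a minimal sketch in Python. It is not DeepVariant's code or method: it trains a small logistic-regression classifier by gradient descent on synthetic data. The two features (the fraction of reads supporting a change and a scaled mean base quality), the function names and the numeric ranges are all assumptions made up for illustration, and no GIAB data is involved.

```python
import math
import random


def make_toy_sites(n, seed=0):
    """Generate synthetic candidate sites (hypothetical features, not GIAB data).

    Each site is ((alt_fraction, mean_quality), label), where label 1 stands
    for a genuine variant and label 0 for a sequencing error.
    """
    rng = random.Random(seed)
    sites = []
    for _ in range(n):
        label = rng.randint(0, 1)
        if label == 1:
            # Assumed: genuine variants are supported by many reads.
            alt_fraction = rng.uniform(0.35, 1.0)
            mean_quality = rng.uniform(0.6, 1.0)
        else:
            # Assumed: sequencing errors show up in only a few reads.
            alt_fraction = rng.uniform(0.0, 0.25)
            mean_quality = rng.uniform(0.2, 0.8)
        sites.append(((alt_fraction, mean_quality), label))
    return sites


def predict(weights, bias, features):
    """Return the model's estimated probability that a site is a genuine variant."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(sites, iterations=1000, learning_rate=0.5):
    """Fit the weights by repeatedly nudging them against the average error."""
    weights = [0.0, 0.0]
    bias = 0.0
    n = len(sites)
    for _ in range(iterations):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for features, label in sites:
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # One iteration: move each parameter a small step downhill.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(weights, bias, sites):
    """Fraction of sites whose predicted class matches the label."""
    correct = sum(
        (predict(weights, bias, features) >= 0.5) == bool(label)
        for features, label in sites
    )
    return correct / len(sites)


if __name__ == "__main__":
    training_sites = make_toy_sites(400, seed=1)
    held_out_sites = make_toy_sites(200, seed=2)
    weights, bias = train(training_sites)
    print("held-out accuracy:", accuracy(weights, bias, held_out_sites))
```

The held-out set is generated with a different seed so that the score reflects sites the model has not seen, which is the same reason benchmark genomes such as those from GIAB are useful for validation.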

Their work was rewarded last year when DeepVariant won the award for Highest SNP Performance in the 2016 precisionFDA Truth Challenge. Since then, the team believes it has reduced the error rate by a further 50%.

“DeepVariant is the first of what we hope will be many contributions that leverage Google’s computing infrastructure and ML expertise to both better understand the genome and to provide deep learning-based genomics tools to the community,” the press release concludes. “This is all part of a broader goal to apply Google technologies to healthcare and other scientific applications, and to make the results of these efforts broadly accessible.”