We’ve barely begun to see what’s possible: Interview with Jonathan Bingham, Google
This interview with Jonathan Bingham of Google Genomics first appeared in Issue 2 of Front Line Genomics Magazine. Jonathan will be presenting “7 Billion Genomes in 17 Minutes” at the Festival of Genomics California in November this year.
When a company’s name enters the global vernacular, you know they’ve done something special. For many of us, ‘Googling’ is an essential part of navigating our professional and personal lives. It’s not hard to get excited about what the data wizards at Google will do for genomics. Jonathan Bingham, Product Manager at Google Genomics, gives us some background on how Google Genomics came to be, and what they’re hoping to achieve.
The age we live in is increasingly data-centric. Businesses are constantly looking for new ways to create and leverage information, while we are quick to adopt new tools and applications that make our personal lives easier. Central to all of this is Google. Be it Google Analytics, Google Maps, or the Google search engine, the company has changed how we look at and interact with the world around us.
As the field of genomics hits its stride, there is a swelling volume of sequencing data being amassed. So how will Google Genomics help us do things we can’t even imagine right now?
FLG: Genomics is a rapidly growing and very exciting field to be a part of. What initially led you to join this community?
JB: Ever since I was a student, my interests have always been interdisciplinary. My practical side was drawn to computer science. My heart was drawn to applications far away from the typical software companies and corporate IT – including simulation, modelling, social science, the humanities. Bioinformatics was a new and exciting field, and I wanted to see where it would lead.
FLG: What made you decide to make the move from PacBio over to Google?
JB: From my time at PacBio, I knew that the health care system was about to be swept up in an information tsunami from ubiquitous DNA sequencing as well as imaging and sensors. When I learned of the potential to bring Google’s scale and technology to bear on these challenges, of course I wanted to be a part of it.
FLG: Google has developed a reputation for being able to do just about anything with sufficient data. There’s a lot of interest around what Google will be able to do for the field of genomics. Does that add any extra pressure?
JB: From chairman Eric Schmidt, who sits on the board of Mayo Clinic and Broad Institute, to founder Sergey Brin, who was married to the founder of 23andMe, to senior fellow Jeff Dean, who was an intern at the Centers for Disease Control, to many other software engineers with backgrounds or friends who work in the field, there’s an appreciation for the challenges facing the genomics field. There’s also some healthy realism about what is and isn’t possible with data analysis. I’d say there’s more curiosity, and optimism, than pressure.
FLG: How does the amount of data produced by a project such as the Veterans Affairs million genome project compare to the amount of data Google handles on a daily basis?
JB: To put it in perspective, users upload over 300 hours of video per minute to YouTube. At typical smartphone video resolution, that’s like loading and indexing the raw sequence information from 30 whole human genomes every single minute. The Google Search index itself is over 100 petabytes in size. That’s like storing the genetic variant calls from 100 million people, and being able to look up results in a quarter of a second.
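The ratios implied by those figures can be worked out directly. The short Python sketch below is an editorial illustration, not something from the interview: it uses only the numbers quoted in the answer, assumes decimal units (1 petabyte = 10^15 bytes), and the function names are made up for the example.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the answer.
# Assumption: decimal units, i.e. 1 PB = 10**15 bytes and 1 GB = 10**9 bytes.

PETABYTE = 10**15
GIGABYTE = 10**9


def storage_per_person_gb(total_bytes, people):
    """Average storage per person, in gigabytes."""
    return total_bytes / people / GIGABYTE


def video_hours_per_genome(video_hours_per_minute, genomes_per_minute):
    """Hours of uploaded video equated with one genome's raw sequence data."""
    return video_hours_per_minute / genomes_per_minute


# "Over 100 petabytes" compared to variant calls from 100 million people.
per_person_gb = storage_per_person_gb(100 * PETABYTE, 100_000_000)

# "Over 300 hours of video per minute" compared to 30 genomes per minute.
hours_per_genome = video_hours_per_genome(300, 30)

print(f"Implied variant-call storage per person: {per_person_gb:g} GB")
print(f"Implied video equivalent of one raw genome: {hours_per_genome:g} hours")
```

In other words, the comparison treats one person’s variant calls as roughly a gigabyte, and one genome’s raw sequence data as roughly ten hours of smartphone video. Both are order-of-magnitude figures rather than precise measurements.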
FLG: What makes genomics a market of interest for Google? Between Google Genomics, the Baseline project and other moonshots, it seems like you guys are investing quite seriously in healthcare in general?
JB: Given the growing importance of information science within the healthcare system, and the role that Google has played in developing technologies for large-scale analysis, it only makes sense for Google to get involved to help tackle some of the data challenges in genomics. In the broader health space, we can bring technology to bear as somewhat of an R&D lab for our life sciences partners, helping them answer big questions more quickly and efficiently.
FLG: How was Google Genomics started?
JB: Google Genomics grew out of the grassroots efforts of software engineers, in their 20% time, because they were interested in the field. They had the idea that Google Cloud Platform, and a technology called BigQuery, which we use extensively to interactively explore trillions of log entries, might be a natural fit for mining sequence data. Other software engineers had thoughts about how to compress and index DNA sequences. A Google VP with a deep personal interest in cancer genomics saw this activity, and helped catalyze the formation of a full-time team. That’s how it began.
FLG: Cloud technology is quickly becoming an indispensable tool for business and personal life. You’re collaborating with the Institute for Systems Biology on the NCI Cloud Pilot program. What kind of impact do you think cloud technology is going to have on genomics research? What will it enable us to do that we couldn’t do before?
JB: Cloud technology makes a lot of sense in general. Most companies today don’t even think about building their own power generating plants. Why would they build their own data centers? In the first wave, many bioinformaticians think about cloud technology the same way they think about university compute clusters. They imagine doing the same things, but not having to deal with an IT department or manage the hardware themselves. And they’re saving money and moving faster with that approach, so it’s a great beginning.
The real power will come when we embrace the fact that cloud computing brings entirely new capabilities. Imagine if you never had to download data again; if you didn’t have to manage a computer cluster, even a virtual one in the cloud; if you could use statistical tools, and you never had to think about running out of computer memory; if you didn’t have to downsample to analyze your complete cohort; if you could have an idea, and test it right away, without having to wait weeks or months to write complicated code and manage servers or even virtual machines. Imagine if you could do this from anywhere in the world, without needing to be at a well-funded university with a large IT budget, without even needing more than a tablet or phone. We’ll have high school science fair participants in developing countries asking questions that faculty at our top universities have never thought to ask, and making discoveries that matter for basic science and for human health.
FLG: As genomic database tools grow and improve, what do you see as being the greatest benefit for drug developers and what will this mean for patients in the near future?
JB: For drug developers, there’s long been the idea of rational drug design, and of personalized medicine. Having more complete genomic databases will help, especially when combined with other kinds of clinical information. The benefits will come from getting to a scale where machine learning and sophisticated analytical methods become possible. That means there’s going to need to be more collaboration and data sharing.
Google Genomics is designed to facilitate that kind of collaboration by building on the open standards developed by the Global Alliance for Genomics and Health. A scalable, interoperable genomics platform will make possible entirely new applications. Drug developers and patients will benefit from the combined contributions of the community.
FLG: What’s your view on open data sharing, e.g. the Personal Genome Project or opensnp.org?
JB: Google Genomics supports sharing genomic data as widely or as narrowly as the institutional review board and the patient consent forms allow. If a study is limited to one set of researchers for one use only, we support setting restrictive access, and in fact that’s the default. If a study is open to collaborators, that’s supported as well. If it’s open to researchers for other qualified use through an application process, that’s supported. And if the project is truly open to anyone, like Personal Genome Project and opensnp.org, that’s supported too. Our belief is that sharing data more widely will benefit more people. Bartha Knoppers and others at the Global Alliance for Genomics and Health are framing the issues in a way that we support.
FLG: How easy is it to integrate a project into the cloud once it’s already begun?
JB: There are multiple ways to make the transition, and some of them are really simple. To get started with Google Genomics, you can take the variant calls from your DNA sequencing experiments and load them up to the cloud, and begin exploring trends and patterns right away. In other words, you can focus on adding new capabilities, rather than replicating what you’re already doing on local compute resources. You can get started in an afternoon. You can also take your entire IT infrastructure and move it to Google Cloud Platform, using Google Compute Engine virtual machines, Google Cloud Storage for files, Google Cloud SQL as a relational database, and Google Cloud Datastore as a NoSQL alternative. Anything you’re running locally, you can probably get set up in the cloud without too many changes. Longer term, you’ll get the most benefit from thinking about infrastructure differently.
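To give a sense of what “exploring trends and patterns” in variant calls can look like at the smallest scale, here is a minimal editorial sketch in plain Python. It does not use the Google Genomics API or any cloud service: it simply tallies variant records per chromosome from VCF-style text. The sample records and the function name are invented for illustration.

```python
from collections import Counter

# Invented VCF-style records for illustration only (tab-separated columns).
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t10177\t.\tA\tAC\t50\tPASS\t.\n"
    "chr1\t10352\t.\tT\tTA\t45\tPASS\t.\n"
    "chr2\t20001\t.\tG\tA\t60\tPASS\t.\n"
)


def count_variants_by_chromosome(vcf_text):
    """Count variant records per chromosome, skipping header lines."""
    counts = Counter()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines and '#' header lines carry no variant record
        chromosome = line.split("\t", 1)[0]  # first column is CHROM
        counts[chromosome] += 1
    return counts


counts = count_variants_by_chromosome(SAMPLE_VCF)
for chromosome, n in sorted(counts.items()):
    print(chromosome, n)
```

The appeal of a hosted service, as described in the answer, is that the same kind of question can be asked of a whole cohort without first downloading the files or sizing a cluster to hold them.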
FLG: At the moment these tools seem to be largely targeted at bioinformaticians, or at least researchers with a familiarity with analytical data outputs. How user-friendly can you realistically make things without sacrificing depth?
JB: We’re starting with the applications where Google can bring the most unique value. We have a lot of great technologies for working with large data sets, and bioinformaticians are excited to get access to those tools. Institute for Systems Biology is building a more biologist-friendly interface for the Cancer Cloud Pilot. Autism Speaks is building a researcher portal for the MSSNG project. Over time, the trend is toward friendlier interfaces, making advanced tools more widely available.
FLG: What potential risks do you foresee for patient privacy, and how do you think these can be handled?
JB: Google Cloud Platform supports HIPAA covered entities by signing Business Associate Agreements. We encrypt all information where it’s stored and when it’s transferred over the network. There are many security and compliance certifications that provide assurance that information is shared only as planned.
FLG: What have been the highlights for you since Google Genomics first began? Atul Butte has compared the work at Google Genomics to how travel agents felt when they saw Expedia.
JB: So far a few of the highlights have been the positive community response to the announcement that Google Genomics had formed and joined the Global Alliance for Genomics and Health; the ICGC DREAM somatic mutation calling challenge; the NCI Cancer Cloud Pilot with Institute for Systems Biology; the launch of the MSSNG database for autism research; and the first peer-reviewed publication to mention Google Genomics. This is just the beginning. Over the course of 2015 and the coming years, the momentum will increase.
FLG: In 10 years’ time, will the term ‘Googling’ have taken on a life of its own in genomics?
JB: I’ve been struck by how often I hear that clinicians and researchers find out more about a genetic variant by starting with a Google Search. Results today are based on web pages. Google Genomics is building optimized storage, processing, and exploration that’s adapted to the domain. We’ve barely begun to see what’s possible. I think we’ll all be surprised by how genomics as a field looks in 10 years.
FLG: What do you think the next big story in genomics will be?
JB: The small stories are easier to predict – the incremental improvements, as technologies that exist today are applied to genomics. The big news stories that excite me most will be the unexpected breakthroughs and insights. These are exciting times for genomics.
FLG: We hear you’ve got some good restaurant recommendations for people travelling into the Bay Area. What are your top five picks?
JB: When not studying DNA, we should all make a point to eat foods filled with plant and animal DNA. In Berkeley, Gather offers some of the most inventive farm-to-table food. In San Francisco, Lers Ros is my favorite Thai restaurant. Orenchi in Santa Clara is a top spot for ramen. Madras Cafe in Sunnyvale serves excellent South Indian. Koi Palace in Daly City has amazing dim sum. Is that five already?
FLG: Any last questions or comments for readers?
JB: Thanks for reading about Google Genomics. We’re thrilled to collaborate with the global genomics community and make an impact together.