“It’s data on a non-trivial scale”: James Sietstra, Seven Bridges
In the wake of several exciting announcements this week, we chat with James Sietstra, President of Seven Bridges, about cancer research, big bioinformatics, and enabling collaboration.
This is a really exciting week to be a cancer researcher. As of today, one of the world’s largest and most comprehensive genomic datasets – The Cancer Genome Atlas (TCGA) – will be available for researchers worldwide to access, along with a suite of computational resources to analyse it. This is part of the National Cancer Institute’s Cancer Genomics Cloud Pilot programme, and its selected partner for this enormous project is Seven Bridges, a company of biomedical data analysis experts.
Actually, it’s a really exciting week for Seven Bridges too. Alongside their involvement with the Cancer Genomics Cloud, the company has today announced $45 million in Series A Funding, along with the appointment of two new members to their Board of Advisors: Tom Daschle, former U.S. Senate Majority Leader; and Kai-Fu Lee, founding president of Google China.
We took the opportunity to sit down with James Sietstra, President of Seven Bridges, for a chat about these exciting developments for the company, the big challenges facing big bioinformatics, and why collaboration is the future of successful disease research.
FLG: What will this latest round of investment enable Seven Bridges to do in the future? What does this give you the scope to accomplish?
JS: It’s a meaningful amount of capital, and really it’s going to allow us to tackle more big projects. Our progress in developing the platform for precision medicine and drug research comes from growing our team and investing heavily in R&D, and that’s what we will continue to use the capital for.
FLG: Your new advisory board members come from different backgrounds within the world of medicine. What new perspectives and new ideas do you feel they are going to bring to your team?
JS: This is exciting for us because the two new members really reflect who we are as a business. We straddle building a technology company and implementing platforms for governments and for national scale projects, so the new members of the advisory board reflect the identity of Seven Bridges.
Tom Daschle understands how the different agencies and departments in DC work, and has a long history of navigating national health policy in the Senate and in DC more generally, so I suspect we will continue to have assistance from him, and really value his feedback in how we approach those national scale projects in the US in particular.
Kai-Fu Lee is not only a cancer survivor but also has a really rich skillset. He founded Google China and understands how to scale technology companies in Asia-Pacific and in China more specifically. In his work with Google China and Microsoft Research he took ideas from concepts and actually scaled them into products. One of the first things we discussed with him was doing that with our work, actually in the UK. We were funded by Genomics England to do novel research in a new paradigm of data structure for variant calling against the reference, which is the Graph paradigm. That is actually where the name of the company, Seven Bridges, comes from: the classic graph theory problem “The Seven Bridges of Königsberg”. It’s a good opportunity for us to leverage their respective domain expertise.
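For readers unfamiliar with the puzzle behind the company name, here is a minimal Python sketch of Euler's classic argument. It illustrates the textbook problem only and is not a description of Seven Bridges' variant-calling work. The land-mass labels (`north`, `south`, `island`, `east`) and the function names are our own illustrative choices.

```python
from collections import Counter

# The seven bridges of Königsberg as edges of a multigraph.
# Node labels are illustrative: two river banks, the central island
# (Kneiphof) and the land mass to the east.
BRIDGES = [
    ("north", "island"), ("north", "island"),
    ("south", "island"), ("south", "island"),
    ("north", "east"), ("south", "east"),
    ("island", "east"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return dict(count)


def is_connected(edges):
    """Check that every land mass can be reached from every other."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return len(seen) == len(adjacency)


def has_euler_walk(edges):
    """Euler's criterion: a connected multigraph has a walk crossing every
    edge exactly once only if it has zero or two odd-degree nodes."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return is_connected(edges) and odd in (0, 2)


if __name__ == "__main__":
    print(degrees(BRIDGES))
    print(has_euler_walk(BRIDGES))
```

All four land masses touch an odd number of bridges, so the check fails: no walk crosses each bridge exactly once, which is the result Euler proved.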
FLG: What are the major challenges in a project like the Cancer MoonShot? How do you bring together commercial organisations, hospitals, healthcare providers, to make a project like that work?
JS: You build platforms to enable all of those institutions to work together. This is really the long-term focus of Seven Bridges: to enable patients, hospitals, institutions and the government to do collaborative science and to analyse ever-growing datasets to accelerate the speed of novel discovery. The Cancer Genomics Cloud is actually a wonderful example of this. Vice President Joe Biden gave a speech at Duke University on Wednesday in which he said that the science is there and the technology is there; we need to move faster, and moving faster comes down to greater collaboration and greater sharing of information.
The Cancer Genomics Cloud really reflects foresight from the NCI leadership. It was a project started in 2013, so it’s been actively moving forward for the past three years, and what it enables is for any cancer researcher with an internet connection to access one of the best organised and richest cancer datasets in the world. 11,000 patients have contributed data spanning 33 different cancer types and subtypes. In enabling that access, instead of having to spend two months downloading one to two petabytes of data onto a local cluster, analysing it using tools that were implemented by bioinformaticians at that university, and then trying to collaborate on that data with another institution, you can do it all in a reproducible fashion in a single cloud environment.
Today a paediatric brain tissue researcher based at Duke University has 100 samples that she is working on, and if she wants to expand that dataset to the 600 samples that are at Moffitt Cancer Centre, a physical hard drive has to be sent from Duke to Moffitt. This has to go through a legal check for HIPAA compliance, and they have to make sure, for the science to be reproducible, that the algorithms that were deployed are identical between the two institutions. The rate at which we can move forward is slowed because the collaboration opportunities are limited. That is something that the NCI saw and is trying to improve, in part through the Cancer Genomics Cloud. It reflects the theme of the Cancer MoonShot 2020: enabling access, allowing the government to set the ground rules, encouraging patients to share their data, figuring out methods for hospitals to enable that data sharing, and aligning incentives across all of these institutions so that we can continue to improve cancer care.
FLG: You’ve highlighted the sheer scale of this database, more than a petabyte of really diverse information that has been made available through the Cancer Genomics Cloud. That must represent an enormous challenge in storage and data-handling, such as the input/output issues involved in running interpretation analyses on that scale. How have you gone about overcoming those sorts of processing challenges?
JS: It’s a non-trivial scale! As Andrew [Gruen, Director of Marketing] likes to remind folks at Seven Bridges, one petabyte is the equivalent of 2,000 years of music, and it’s difficult to even wrap your head around that much data. And it’s growing: we’re at just over a petabyte, TCGA is growing to two and a half petabytes, and there are plans for it to get to 10+ petabytes. We spend a lot of time figuring out how to optimise cloud computing environments. We’ve been working on this since 2009, and there are a number of challenges associated with enabling collaboration on those datasets. You want to be able to allow researchers to bring their bioinformatics pipeline to the data rather than having these redundant local clusters storing it around the country, all paying the storage costs. The more you can put that data in one accessible environment, the more efficient it is, the more researchers can share the tools that they’ve developed and the more reproducible those analyses can be. So a lot of this comes down to implementation in an environment that can be accessed by thousands of researchers. Through our software development kit and API, and through the Common Workflow Language, a method of describing bioinformatics workflows that we’ve been an early proponent of, we are creating an open environment so that data can be accessed by any researcher with an internet connection. It’s a huge amount of data and a big challenge!
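To put the figures quoted in the interview in perspective, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions the interview does not state: a decimal petabyte (10^15 bytes), a 365.25-day year, and a perfectly steady transfer rate. The function names are our own.

```python
# Back-of-the-envelope checks on the data volumes quoted in the interview.
# Assumptions (not from the interview): decimal petabytes, 365.25-day years,
# and a constant transfer rate with no protocol overhead.
PETABYTE_BYTES = 10**15
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def implied_audio_kbps(total_bytes, years):
    """Audio bitrate (kbit/s) at which `years` of music fills `total_bytes`."""
    return total_bytes * 8 / (years * SECONDS_PER_YEAR) / 1_000


def sustained_gbps(total_bytes, days):
    """Constant network rate (Gbit/s) needed to move `total_bytes` in `days`."""
    return total_bytes * 8 / (days * SECONDS_PER_DAY) / 1_000_000_000


if __name__ == "__main__":
    # "One petabyte is the equivalent of 2,000 years of music."
    print(f"{implied_audio_kbps(PETABYTE_BYTES, 2000):.0f} kbit/s")
    # "Two months downloading one to two petabytes."
    for petabytes in (1, 2):
        rate = sustained_gbps(petabytes * PETABYTE_BYTES, 60)
        print(f"{petabytes} PB in 60 days: {rate:.2f} Gbit/s")
```

Under these assumptions, the music comparison works out to roughly 127 kbit/s, in the range of a typical compressed audio file. The two-month download corresponds to a link held at about 1.5 to 3 Gbit/s around the clock.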
FLG: Thank you very much for your time! Is there anything else that you would like to share with our readers?
JS: We’re really excited to deploy this capital, use the insights that are offered by our new advisory board members, and go live with the Cancer Genomics Cloud Pilot that we’ve been working on for a few years. I think it’s a great example of the types of projects that will enable the Cancer MoonShot, and the timing couldn’t be better – we didn’t plan that President Obama was going to say this in his State of the Union and then ask Congress for a billion dollars, but it’s very exciting. We’re excited about a lot of the growth opportunities, and about really using the momentum, public attention and the capital to tackle more precision medicine projects, and continuing to do product platform development and research and development in order to enable us to tackle more of those projects.