The Man Who Makes Bioinformatics Fun And Accessible For All
We speak with Keith Bradnam about his career to date and his work as one of the premier genomic communicators.
Technology has enabled a lot of what genomics is achieving today. As amazing as it can be, it is still people who are applying it. To get the most out of the tools available, people need to be able to educate themselves and really understand how to apply them.
That makes people like Keith Bradnam invaluable to the genomics community.
He writes textbooks, organises events, expertly maintains his blog, is one of the people to follow on Twitter… You’d be forgiven for thinking that science communication was his full-time job. He’s also an integral part of the UC Davis Genome Center.
In a rare moment of free time, we caught up with Keith to find out how it all began and what he’s working on at the moment…
FLG: You do a tremendous amount for the field of bioinformatics. Outside of your own research projects and mentoring students, you do a fantastic job of raising the profile of bioinformatics and making it much easier to understand. You’re one of the people to follow on Twitter in genomics, you have a fantastic blog (www.acgt.me), you co-wrote ‘Unix and Perl to the Rescue!’, and you put together the Assemblathon. It all started for you back in 1993 when you became one of Europe’s first Bioinformatics MSc students. What made you move from ecology to bioinformatics?
KB: One of the main reasons that I chose ecology was that I was interested in evolution, and ecology deals with this at the ‘big end’ of the scale (e.g. evolution of species and ecosystems). Bioinformatics offers just as many opportunities to study evolution, but now you are looking at the other end of the scale (e.g. evolution of genes and genomes). One thing that bioinformatics lacks, however, is the ability to hike about on windy hillsides and throw quadrats around.
FLG: The Internet has developed considerably since the early ’90s, as has IT in general. We were also a long way off from the kind of sequencing data we have today. What kind of work were you being taught to do back then?
KB: No matter what the year, there are some things that always seem to hold true for bioinformatics research. Some of the things that I had to deal with back in 1993 remain issues today. Namely:
1. The amount of CPU power, memory, and bandwidth that your computer has will always struggle to keep up with the ever-growing size of the sequence files that you want to download and process.
2. You will spend more time than you had planned converting files from one bioinformatics file format into another.
3. At some critical juncture in your research, you will struggle to satisfactorily complete a task because of the incompleteness – or total lack – of documentation for a particular piece of bioinformatics software that you need to use.
But thinking back to my MSc course, it is important to note that the Internet was a very different place back then. The World Wide Web had only just begun to develop and there was very little content on it that was useful to the field of bioinformatics. If you wanted to download interesting data sets, you needed to be conversant with tools such as Gopher, not to mention Veronica and Archie of course! So some of the teaching had to cover how to use these tools (along with telnet and FTP) so you could find some data to work with.
FLG: The course clearly set you up for a very successful career as a bioinformatician. At the time, there wasn’t quite the same demand for that skill set as there is today. Was doing a PhD the only viable route to apply those skills, or was there ever any thought of leaving academia?
KB: In late 1994, when I finished my MSc, there were very few ‘career paths’ in bioinformatics outside of academia. Biotech companies were just in the process of developing bioinformatics research groups, and job opportunities in the private sector were few and far between. In my early attempts to find a job in the private sector I sometimes found myself competing for jobs with other graduates from my MSc (there were only 12 of us as I recall)! So despite being promised that ‘bioinformatics is going to be huge’, I actually found myself – for the first time in my life – unemployed after finishing my MSc. I realised that I wanted to consider an academic path instead, and worked for a small software company in Cambridge for a year or so, during which I took the time to find a place where I could work on something that interested me. Ultimately that led me to the University of Nottingham where I started my PhD in 1996. I seem to recall that even in academia, there didn’t seem to be many groups that specialised in bioinformatics research, so there were not many suitable places to apply to.
FLG: You’ve been developing genome databases ever since you completed the PhD. How did you end up enjoying life in California at UC Davis?
KB: While I was working at the Wellcome Trust Sanger Institute, I shared some office space with Ian Korf who was a visiting post-doc at the time. That was his last position before he moved to Davis to set up his own lab. Within a few months of moving to California, he heard that I might be interested in a career change and encouraged me to apply for a post-doc position in his lab. So I became the first person to join his group back in 2005, and since then I have been failing spectacularly at my ongoing plan to stay for ‘just a year or two’.
FLG: What kind of projects are you working on at the moment?
KB: The main thing that I am working on is a new collaboration with Danielle Lemay who is a faculty member here at the UC Davis Genome Center. Danielle is the guru of all things to do with milk genomics and I’m providing bioinformatics support to help characterize genomic elements that may be involved with mammary gland function in cows. I’m also wrapping up two other collaborative projects that primarily involve genome data from Arabidopsis thaliana. One project is helping to characterize the patterns of ‘genome catastrophe’ in some specific lines of A. thaliana; these plants have an extra copy of one of their chromosomes which becomes highly scrambled (similar things happen in human cancer genomes). The other project involves developing tools that can computationally predict how much certain plant introns might enhance gene expression. There is good experimental evidence that some introns in A. thaliana can increase expression up to 10x (compared to when the intron of interest is removed), but these wet-lab experiments can take up to a year to do properly. In contrast, our software tool does a good job at predicting the boost in expression in just seconds! As well as beginning some preliminary work regarding a possible Assemblathon 3 contest, I manage and update the Genome Center’s website and Twitter account, act as system administrator for our lab, mentor students, and continue to deal with a surprisingly large number of emails about the CEGMA tool that our lab developed a long time ago.
FLG: Your passion for science communication seems to have really flourished while at UC Davis. Is there anything in particular that encouraged you to develop that side of things?
KB: On the one hand, the rise of social media – not to mention the increasing ease-of-use of many blogging platforms – means that it is easier than ever to reach out to others. Twitter has proven to be such an incredibly useful tool for academics, both to disseminate information and also to find out about the latest research in your particular field. In particular, the live tweeting of conference talks (when permitted) really helps ensure that everyone can benefit from the results of publicly funded scientific research being shared in an open manner. I’ve been using Twitter since 2008 and I think this was the spark that started me on the path to writing regularly on my ACGT blog. Sometimes I encounter things in my day-to-day life as a scientist that surprise, delight, or annoy me. These can all be good things to blog about. Although it may seem that I spend a lot of time criticizing poorly chosen scientific acronyms, I really prefer writing pieces that hopefully help others to better understand a subject. There is a lot of ‘accepted bioinformatics wisdom’ that can be hard for newcomers to the field to get their heads around, so occasionally I like to explain concepts in a way that even your grandparents could (hopefully) understand. Analogies can be good for this. The other big motivator for me to develop science communication skills is that I’ve found so many scientific presentations to be confusing, unfocused, and often tedious. Most talks are informative, but when you have talks that are informative without being memorable then it is often a waste of time for all involved. I want all of the students from our lab to give presentations that are informative as well as being memorable, and (hopefully) entertaining.
Too many people seem to give presentations that fail to address the #1 question that all talks should aim to address: why should the audience care about this? This can usually be addressed by making your talk tell a story, i.e. it should have a beginning, middle, and end. The most common problem in talks from early career scientists is ‘too much middle’ (lots of results) without first setting the scene or clarifying what all those results mean.
FLG: As part of your ‘101 Questions’ series on your blog, you’ve interviewed a who’s who of bioinformaticians. Are you finding any trends emerging in what people enjoy or don’t enjoy about current bioinformatics research?
KB: I’ve noticed that bioinformaticians like the diversity of the research experience that can occur. Sometimes this is driven by rapid changes in underlying technologies, such as sequencing, that open new doors to researchers, but sometimes it just comes about from the many collaborations that emerge in the field. If you are skilled at slicing and dicing genome data, you may find yourself sometimes switching projects to work on completely different species, with different biological challenges, but all of which are underpinned by genome data written in the same language.
FLG: We thought you might enjoy the 101 Q experience yourself…
KB: Sorry, I reserve the right to interview myself on my blog at a later date!
FLG: As we mentioned at the start of the interview, you’re also one of the guys behind the Assemblathon. Could you explain what the Assemblathon is for our non-bioinformatician readers?
KB: There is a lot of talk these days of the ‘$1,000 genome sequence’, but this is a little misleading. You may be able to buy $1,000 worth of sequencing data, but the raw output of most of the latest sequencing machines is typically a very large set of incredibly short DNA sequences. They only have utility once they are assembled into much longer sequences. This problem can be likened to trying to solve an extremely large jigsaw puzzle. However, in the field of genome assembly your puzzle box may contain up to 100 copies of most of the pieces, no copies at all of some of the pieces, and quite a few pieces will be from a completely different puzzle. Furthermore, you may be missing the lid to the puzzle box so you might not know what the puzzle is meant to look like (or how big it should be). If it wasn’t obvious by now, genome assembly is not an easy problem to solve! In trying to tackle this issue, scientists have developed many different software tools that try to perform the task of putting a genome together from all of the constituent pieces. However, it is far from clear which tools are the best. This is not a simple problem because you can define ‘best’ in many different ways, and while researchers may hope for a resulting genome assembly that is the best however it is defined, this doesn’t yet seem possible with today’s tools. So the Assemblathon contests were an idea to test a bunch of different genome assemblers to a) see how they differ and b) see whether we can judge any of them as being consistently better than others. The results from the first two Assemblathons suggest that there is a lot of room for improvement in the field of genome assembly. However, the technology is moving so fast that this situation will hopefully improve in the near future.
FLG: Any plans to follow the success of Assemblathon 2 with Assemblathon 3?
KB: Many plans, but nothing formalized just yet. This may change in the coming weeks though…stay tuned!
FLG: We spoke briefly about bioinformatics in the 1990s. The role has changed considerably since then. At what point does that skill set become a core competency for researchers in general?
KB: Bioinformatics has become quite a sprawling field which can involve many different facets of biology, computing, mathematics, and statistics. I like to think of ‘classic bioinformatics’ as the ability to run command-line programs against some biological data and use Unix commands and (relatively) simple scripting languages to slice and dice that data. If you accept my definition of ‘classic bioinformatics’, then I feel that this skill set is becoming more prevalent, but the pace of change is still too slow. I feel that the real change needs to happen before students even start at university. Coding and data processing skills should be a required part of all educational curriculums. My experience from helping high school students in the annual Young Scholars Program that UC Davis organizes tells me that such students are not only capable of learning to code, but they are often excited and enthusiastic to learn such skills.
FLG: Will there ever come a point where you make a conscious decision to move full time into a communications or writing career?
KB: Very possibly! It seems that I spend an increasing amount of my time writing for websites and social media, both personally and professionally. This is an avenue I’m keen to explore (also see answer to last question).
FLG: Do you have plans to expand on the mythology of the Molluskan Zodiac?
KB: The Molluskan Zodiac already plays an important guiding role in the lives of many people. Never before has a horoscope been based on the lives of marine invertebrates, and never before has a system of divination been implemented through use of a Perl script. Aside from one early issue that arose, there have been no recent upheavals in Molluskan Zodiac mythology and there are no plans for any further changes.
FLG: Thank you very much for sharing your thoughts with us. Is there anything else you’d like to say to our readers?
KB: I’m finally planning to move back to the UK (or possibly western Europe) in early 2016 and would love to talk to anyone who knows of opportunities that might be a good fit for me. I’m interested in a position relating to bioinformatics/genomics which involves outreach/communication and/or training/teaching.
This interview was originally published in Front Line Genomics Magazine Issue 3