Eliminating variables is crucial for robust experimental design. But should one size fit all?

With Data as our monthly focus for October, we have been talking to a lot of computer scientists, bioinformaticians in particular. Usually when we research a topic, we find a nice variety of challenges and problems that people are facing. On data analysis, though, there have been two that everyone has mentioned, whether they are in pharma, academia, vendors, start-ups, everywhere:

  • There needs to be a standard file format, and standard processes.
  • We need more bioinformaticians.

I wrote a little more formally on the second of these issues for this month’s focus piece. But here, I wanted to take a closer look at standardisation.

Why don’t we have standards already?

The answer is relatively simple. A lot of people have been working independently, on a small scale, so there has never really been a need to standardise file formats or processes when you’re just working from your own data. As we scale up, and projects start to rely on data taken from the literature or from multiple sources, things get complicated. Of course, the data formatting issue has a fair amount to do with vendor lock-in as well, but that’s a whole other debate for another time.

I had a great conversation with Kumar Sankaran last week about what they’re trying to do over at Leucine Rich Bio. (Full disclosure: this isn’t a sponsored post, and they have no commercial activity with us at all. I just really like what they’re trying to do, and thoroughly enjoyed hearing about it!) Anyone who has had the dubious pleasure of one of my phone calls will know that they can be a bit winding at times, but in this instance we talked a lot about standardisation of processes and experimental design.

This kind of standardisation is very important for them, as it is crucial for any kind of robust analysis. The importance of reducing the number of variables is one of the very first things you learn in science class at around 9 years of age. So why is it a problem now?

I think it’s probably down to two things. As a research scientist, you need to have ingenuity and initiative. These two attributes are responsible for nearly every scientific discovery. The only thing I’d add to that mix is luck. But the unfortunate side effect is that following standard protocols isn’t top of the list. In my lab days, it was all about using what you already had. New equipment, kits and software can all be pretty expensive, so finding ways to incorporate what we already had into new experiments was part of the fun. (I was pretty spectacular with a ‘manual’, retro Zeiss micromanipulator at one time.)

So, in a way, standardising protocols is almost against the spirit of experimental research. Everything is there to be improved upon, and customised! Unfortunately, when you’re pooling data from lots of different studies, or comparing against reference data, it does cause some issues.

Where does the pressure need to come from?

In healthcare, if we go all the way to the end of the chain, the end user of genomic data is the physician. Physicians themselves recognise the need for their own genomic education. If they are going to make health-affecting recommendations, they need to be sure they are basing their decisions on accurate information. That comes down to understanding not only the genomic science, but also the analytical science. So as their understanding grows, they will be in a position to exert some pressure to get protocols standardised. But without that pressure coming from the top, it won’t happen.

In terms of file formats, people have different views. It may be one of those things that just sorts itself out over time. Genomic data science is still relatively young, after all. But one of the big contributing factors towards standardisation is collaboration.

As consortia form, they will begin to standardise amongst themselves. There are also working groups trying to develop solutions here. Or it may be that bringing in some outside help is the way to go. It certainly worked for the banks when they standardised their data formats. Unfortunately, I think bioinformaticians will still be spending much of their time reformatting…
