For the first time, researchers at UC Santa Cruz have used long-read nanopore sequencing to assemble the genomic sequence of a human Y chromosome centromere. The work, published yesterday in Nature Biotechnology, is the first to resolve the highly repetitive sequences that make up a human centromere.

In 2003, when the Human Genome Project officially concluded, it was announced that a complete human genome had been sequenced for the first time. It would have been more accurate, if less dramatic, to say that we had sequenced as much of the human genome as the available technology allowed. Estimates of how much of the genome was missed vary, but the figure is thought to be in the region of 7%.

Almost all of these sequencing gaps fell within two regions of each chromosome: the telomeres, the protective caps at each end of the chromosome, and the centromere, the region where the two arms of the chromosome connect. Both regions were left out of the project because they consist of short sequences repeated many times over, which are almost impossible to resolve using short-read sequencing. For example, if the sequence is AGCCATAGCCATAGCCATAGCCAT (four copies of AGCCAT), there is no way to know where a read of ATAGC belongs, because that sequence appears multiple times within the region.
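To make the placement problem concrete, here is a minimal Python sketch using the toy repeat from the example above. It assumes exact string matching, which is a deliberate simplification: real read aligners tolerate sequencing errors and use indexed search. The function name `find_positions` is hypothetical and chosen only for illustration. A short read matches several positions, while a read long enough to cover the whole repeat array matches only one.

```python
def find_positions(reference: str, read: str) -> list[int]:
    """Return every 0-based start index where `read` matches `reference` exactly.

    Hypothetical helper for illustration only. Overlapping matches are
    included, which matters inside tandem repeats.
    """
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping occurrences are not skipped.
        start = reference.find(read, start + 1)
    return positions


# The toy repeat from the text: four copies of AGCCAT.
repeat = "AGCCAT" * 4

short_read = "ATAGC"   # the short read from the example
long_read = repeat     # a read long enough to span the whole repeat array

print(find_positions(repeat, short_read))  # [4, 10, 16]
print(find_positions(repeat, long_read))   # [0]
```

The short read has three equally good placements, so an assembler cannot decide where it came from; the long read has exactly one, which is the advantage long-read technologies bring to repetitive regions.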

Sequencing technologies are now available that can, for the first time, generate reads long enough to span entire repetitive regions. The new study took advantage of one of these, nanopore sequencing, to resolve the sequence of a human Y chromosome centromere in a way that has not been possible before.

“Prior to our work, no sequence technology, or collection of sequence technologies have been sufficient to ensure proper assembly through these regions,” said Karen Miga, PhD, lead researcher of the paper and Assistant Research Scientist at UC Santa Cruz.

Because these regions could not be sequenced, the link between centromere DNA and centromere function has been poorly understood. We do know, however, that damage to or loss of a centromere can be catastrophic for a cell and its descendants, and can ultimately contribute to diseases such as cancer. With this research, and future work building on it, we may gain a clearer picture of how centromeres function and, potentially, develop more effective ways of preventing or treating centromere loss.

On a more academic level, this research is representative of how far sequencing has come in the last fifteen years. Using Sanger sequencing, the technique behind the Human Genome Project, this type of centromere assembly would have been impossible; now, it may be possible for us to build a truly complete human genome sequence, without gaps.

“We are on a trajectory for a complete genome,” said Dr Miga. “I, for one, look forward to a day where we are finally able to roll up our sleeves and study the function of these mysterious sequences.”
