Princeton University researchers have used AI techniques to uncover mutations in so-called junk DNA that contribute to autism spectrum disorder (ASD). The findings are the first to link mutations in regulatory DNA with a condition like ASD, and may show that the changes affect how genes are expressed in the brain.

The scientists behind the study said in Nature Genetics that the approach could be used in the future to study the role of non-coding mutations in many diseases, including cancer.

Past studies have suggested that only around 30% of autism cases in people with no family history of the disorder are caused by mutations in protein-coding genes. The rest are believed to arise from mutations in junk DNA, but this had not been demonstrated because of the difficulty of searching the whole genome for alterations in regulatory DNA and predicting how they could contribute to autism.

The Princeton team trained a machine learning (ML) model to predict how a non-coding DNA variation could affect gene expression. The model was applied to an autism dataset consisting of the whole genomes of almost 2,000 family “quartets”, each made up of a child with autism, one unaffected sibling, and two unaffected parents with no family history of the disorder.

Using this dataset, the ML program learnt patterns in the genome and taught itself to discern biologically relevant sections of DNA. It could then predict whether alterations in junk DNA would disrupt any of the 2,000 or more protein interactions involved in gene regulation.
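As a rough illustration of what predicting the effect of a non-coding variant can mean in practice, the sketch below scores a reference sequence and a mutated copy with the same scoring function and reports the difference. Everything in it is an assumption made for illustration: the four-base motif weights are invented, and the simple `binding_score` function stands in for a trained model. It is not the Princeton team's model or code.

```python
# Hypothetical log-odds weights for a 4-base protein-binding motif.
# These numbers are invented for illustration, not taken from the study.
MOTIF = [
    {"A": 1.2, "C": -0.8, "G": -0.5, "T": -1.0},
    {"A": -0.9, "C": 1.1, "G": -0.7, "T": -0.6},
    {"A": -1.0, "C": -0.6, "G": 1.3, "T": -0.9},
    {"A": -0.7, "C": -0.9, "G": -0.8, "T": 1.0},
]


def binding_score(seq):
    """Return the best motif match over all windows in seq.

    This toy scorer is a stand-in for a trained sequence model.
    """
    width = len(MOTIF)
    return max(
        sum(MOTIF[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def variant_effect(ref_seq, pos, alt_base):
    """Score change caused by substituting alt_base at 0-based position pos."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return binding_score(alt_seq) - binding_score(ref_seq)


ref = "TTACGTTT"
# Replace the G at position 4, which sits inside the best motif match.
effect = variant_effect(ref, 4, "A")
print(f"predicted effect: {effect:+.2f}")
```

A negative value means the substitution weakens the best predicted binding site. A real model would make the same kind of reference-versus-mutant comparison across thousands of regulatory features rather than a single motif.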

The results indicated that junk DNA mutations affected genes and functions similar to those previously linked with autism through mutations in coding DNA. The authors wrote: “Notably, our study reveals important biological convergences among the genetic dysregulations associated with ASD.

“Our analyses of the disease impact of mutations with effects on DNA and RNA point to similar sets of impacted genes and pathways, indicating that the effects of regulatory mutations are convergent. Furthermore, high-impact noncoding regions that we find in ASD probands affect the same genes previously found to be impacted by LoF [loss of function] coding mutations in ASD…This convergence provides support for a causal contribution of non-coding regulatory mutations to ASD etiology.”