Scientists have developed a new algorithm that can detect gene mutations involved in diseases such as autism and obsessive-compulsive disorder.
With three billion letters in the human genome, it seems hard to believe that adding a DNA base here or removing a DNA base there could have much of an effect on our health, researchers said.
But such insertions and deletions can dramatically alter biological function, leading to diseases from autism to cancer. Still, it has been difficult to detect these mutations.
The letters in the human genome carry instructions to make proteins, via a three-letter code. Each trio spells out a "word"; the words are then strung together into a sentence to build a specific protein.
If a letter is accidentally inserted or deleted from our genome, the three-letter code shifts a notch, causing all of the subsequent words to be misspelled.
These "frameshift" mutations cause the protein sentence to become unintelligible. Loss of a single protein can have devastating effects for cells, leading to dysfunction and sometimes to serious diseases.
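The shift described above can be illustrated with a short Python sketch. The DNA string and the `codons` helper are made up for illustration and are not taken from the study; the point is only that deleting one letter changes every three-letter word after it.

```python
def codons(seq):
    """Split a DNA string into complete three-letter words, dropping leftovers."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

# Hypothetical 15-letter sequence, chosen only to make the shift easy to see.
reference = "ATGGCCTTTAAAGGG"

# Delete a single letter (the C at index 4) to mimic a one-base deletion.
mutated = reference[:4] + reference[5:]

ref_words = codons(reference)
mut_words = codons(mutated)

print(ref_words)  # ['ATG', 'GCC', 'TTT', 'AAA', 'GGG']
print(mut_words)  # ['ATG', 'GCT', 'TTA', 'AAG']
```

Only the first word survives intact; every word after the deletion is read from the wrong starting point.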
DNA insertions and deletions, known as indels, vary in length and sequence. An indel can range in size from one DNA letter to thousands, and indels are often highly repetitive. This variability has made indels challenging to identify, despite major advancements in genome sequencing technology.
A team of scientists at Cold Spring Harbor Laboratory (CSHL), including Assistant Professors Mike Schatz, Gholson Lyon, and Ivan Iossifov, and Professor Michael Wigler, has devised a way to mine existing genomic datasets for indel mutations.
The method, which they call Scalpel, begins by grouping together all of the sequences from a given genomic region. Scalpel, a computer algorithm, then creates a new sequence alignment for that area, much like piecing together parts of a puzzle.
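The two steps described above, grouping sequences by region and then re-aligning them, can be sketched in a few lines of Python. This is a toy illustration and not Scalpel's actual implementation: the greedy overlap merge and the standard-library `difflib` comparison are simple stand-ins, and the function names, window size, and sequences are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def group_by_region(reads, window):
    """Bucket (start, sequence) reads by the fixed-size window their start falls in."""
    regions = defaultdict(list)
    for start, seq in reads:
        regions[start // window].append((start, seq))
    return regions

def merge_reads(reads, min_overlap=4):
    """Greedily stitch position-sorted reads using the longest suffix/prefix overlap."""
    ordered = [seq for _, seq in sorted(reads)]
    contig = ordered[0]
    for seq in ordered[1:]:
        best = 0
        for k in range(min(len(contig), len(seq)), min_overlap - 1, -1):
            if contig.endswith(seq[:k]):
                best = k
                break
        contig += seq[best:]
    return contig

def call_indels(reference, contig):
    """Report insertions and deletions found when comparing contig to reference."""
    matcher = SequenceMatcher(None, reference, contig, autojunk=False)
    return [(tag, i1, reference[i1:i2], contig[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag in ("insert", "delete")]

# Hypothetical reference and reads; the sample is missing "AC" at position 9.
reference = "TGCATCGGAACTTGACGTCA"
reads = [(0, "TGCATCGGAT"), (5, "CGGATTGAC"), (12, "TGACGTCA"), (130, "GGGCCC")]

regions = group_by_region(reads, window=100)
contig = merge_reads(regions[0])
indels = call_indels(reference, contig)

print(contig)  # TGCATCGGATTGACGTCA
print(indels)  # [('delete', 9, 'AC', '')]
```

Restricting the comparison to one region at a time keeps the puzzle small: only the reads that landed in the same window are stitched together and checked against the reference.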
In work published in the journal Nature Methods, the team used Scalpel to search for indels in patient samples.
Lyon analysed a patient with severe Tourette syndrome and obsessive-compulsive disorder, identifying and validating more than a thousand indels to demonstrate the accuracy of the method.
The CSHL team performed a similar analysis to search for indels that are associated with autism.
The researchers discovered a total of 3.3 million indels across 593 families, but most appeared to be relatively harmless. Still, a few dozen mutations stood out as specifically associated with autism.