DNA Sequencing
Genotyping, Sequencing
DNA, the acronym for deoxyribonucleic acid, is the genetic code inherited from your mother and facther that exists in almost every cell in your body. The code is made up of about 3 billion bases or letters, consisting of A, T, G, or C (which stand for adenine, thymine, guanine and cytosine), that spell out the personal biological instructions that tell your body how to function over time. You are born with this code, and it doesn’t change. Genotyping is the process of determining which genetic variants an individual possesses, while sequencing is a method used to determine the exact sequence of a certain length of DNA.
When someone has their “DNA sequenced,” it almost always means one of four things ref.
- Their entire genome of three billion plus base pairs may be sequenced in a process appropriately called whole genome sequencing, or WGS.
- Alternatively, they may have whole exome sequencing, or WES, in which the ~1.5% of their genome that codes for proteins is sequenced and the non-coding regions are not.
- Genotyping is offered by several direct-to-consumer companies, in which single bases are identified in certain locations in the genome where variations often occur.
- Finally, patients with certain diseases may have a specific gene or genes sequenced in a very defined protocol to help with diagnostics and treatment. For example, cancer gene panels that include more than a hundred of genes are currently being evaluated for efficacy in determining targeted therapies.
Genotyping vs. Sequencing
To explain the difference between the technologies that read DNA, think of a book. Imagine the string of letters that make up your genetic code are like words on a page, telling a story, chapter by chapter. Genotyping is like reading a few scattered words on a page. Sequencing reads whole sentences, paragraphs and chapters. To sum it up quickly, genotyping gives you small packets of data to compare while sequencing gives you more data, with more meaning and context, today and down the road.
Genotyping looks for information at specific place in the DNA where we know important data will be. Microarrays (or “arrays”, for short) are just one approach to genotyping but it has paved the way for understanding how common variations in our DNA may be associated with health conditions like diabetes and heart disease. However, while identifying this specific point in the DNA – or “word” on the page – is incredibly important, it also signals to reasearchers that something in that “paragraph” could be even more significant or provide context. This is the equivalent to reading the word “knife” on a page, yet not knowing whether the book is a cookbook or a murder mystery. This is what genotyping is good at: finding what we know today, where we know it will be. That’s a great tactic if you know what you are looking for. But what about what we don’t know? And what about context?
Sequencing looks at all the letters, in the order they are spelled out in your DNA. In some cases, it looks only at a gene, a stretch of DNA that has the instructions for a specific protein. In other cases, the sequencing can look at the entire sequence, all 3 billion or so letters. Through significant advances in technology, we’ve drastically decreased the time and labor costs to do sequencing. The ability to look at the entire sequence of genes faster and cheaper through next generation sequencing (NGS) allow us to see beyond the commonly known variations in your DNA, thus enabling scientists to identify more of the unique variations from person to person. In turn, this leads to deeper discovery about the genetic underpinnings of your health. Without sequencing, we simply miss these nuances to the story. We only know, “there is an apple in this book”. This is what sequencing is good at: uncovering the context, meaning, detail, and granularity.
WGS vs WES
Exome
An exome is the sequence of all the exons in a genome, reflecting the protein-coding portion of a genome. In humans, the exome is about 1.5% of the genome.
Proteins are encoded by genes. And genes consist of two major components, exons and introns. Exons contain the nucleotides that directly encode for proteins, whereas introns are stretches of DNA between the exons and do not encode for proteins. The entire collection of all the exons from all the genes in a genome is called an exome. In the case the human genome, the exome only corresponds to about 1.5% of the genome’s roughly 3 billion nucleotides. Genome scientists have developed laboratory methods that allow them to just sequence a genome’s exome; in other words, just the part of the genome that directly encodes for proteins. So, you will often hear genomicists talk about an exome sequence, which is a very small part of the overall genome (or whole-genome) sequence.
When to Use Whole-Genome or Whole-Exome Sequencing
The complete genomic information within a sample or individual is known as the whole genome. Exons are the genome’s protein-coding regions and are collectively known as the exome. Despite the exome’s relatively small proportion of the whole genome (approximately 2%), exomes encode most known disease-related variants.
Use | When you need to |
---|---|
WGS | Analyze the whole genome, including coding, non-coding, and mitochondrial DNA Discover novel genomic variants (structural, single nucleotide, insertion-deletion, copy number) Identify previously unknown variants for future targeted studies |
WES | Increase throughput capabilities Optimize cost per sample Analyze manageable data sets and maximize data storage |
Note: Both exons and introns are also present in untranslated regions (UTRs) and non-coding RNAs. Misuse of the term exon is problematic, for example, ‘‘whole-exome sequencing’’ technology targets <25% of the human exome, primarily regions that are protein coding. Not all exons are protein coding: Addressing a common misconception