Discovery biology, pre-genomics, followed a linear path: develop a hypothesis, design an experiment and carefully control variables, one at a time, to prove or disprove the hypothesis.
The challenge with this “classical” approach is not only the “one step after another” pace, but the fact that the formation of a hypothesis is limited by our current knowledge and imagination. Not only do we not know what we don’t know, we cannot even put boundaries on how vast that negative knowledge space is—a thimble or an ocean’s worth of possibility?
When it comes to biology, our knowledge remains very limited, even after decades of eye-opening discoveries. Even for the best-researched monogenic diseases, such as cystic fibrosis, there are vast gaps in our knowledge. It was only in 2018 that scientists, using single-cell genomics, discovered the cell type in the human airway that most strongly expresses CFTR, the gene whose mutations cause cystic fibrosis.
While the following quote accurately describes the field of genomics, it entirely misses the dramatic impact that genomics platforms have had on scientific experimentation:
“Genomics: a branch of biotechnology concerned with applying the techniques of genetics and molecular biology to the genetic mapping and DNA sequencing of sets of genes or the complete genomes of selected organisms” (Merriam-Webster)
Over the last 20 years, new experimentation platforms like arrays, next-generation sequencing and single-cell analysis have dramatically increased experimental throughput while decreasing experimental cost, thereby enabling genome-wide analysis within a single experiment.
These exponential improvements (10^6+) have unleashed a multitude of non-hypothesis-driven approaches that have catapulted the genomics field forward; scientists have moved from developing and testing a hypothesis about a single gene (then rinsing and repeating) to studying many genes simultaneously.
While these non-hypothesis-driven, hugely multiplexed experiments have turbo-charged the field of biology, the result has largely been a revolution of observation.
For instance, in genome-wide association studies (GWAS), scientists commonly compare a diseased population with a control population across the genome, seeking variants whose frequencies differ significantly between the two. These studies have yielded putative disease-causing variants and, by inference, putative disease-causing genes.
That is terrific; however, there have been two limiting factors:
- Validating these findings has required low-throughput, high-cost experiments. Additionally, the identified variants often show no connection to our current functional understanding of the disease and fall in “junk” DNA, so called primarily because we do not yet understand its functional importance.
- Perhaps more importantly, studies have been limited to interrogating naturally occurring biology. We are only testing what is available in nature, yet evolution is incomplete. The ability to test non-natural variation would eliminate the “looking for your keys under the lamp-post” conundrum and dramatically expand the exploration space, very likely resulting in many, heretofore unimaginable, biological insights and potential commercial opportunities.
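The case-versus-control comparison behind a GWAS can be sketched in a few lines. This is a minimal illustration, not how any particular study was run: the variant names and allele counts are hypothetical, the test is a plain Pearson chi-square on a 2x2 allele-count table, and the 5e-8 cutoff is the commonly used genome-wide significance convention rather than anything stated in this article.

```python
import math

# Conventional genome-wide significance cutoff (an assumption, not from the article).
GENOME_WIDE_THRESHOLD = 5e-8


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def scan(variants, threshold=GENOME_WIDE_THRESHOLD):
    """Return the names of variants whose allele frequencies differ significantly."""
    return [name for name, counts in variants.items()
            if allelic_chi_square(*counts)[1] < threshold]


# Hypothetical counts: (case_alt, case_ref, ctrl_alt, ctrl_ref)
variants = {
    "variant_A": (350, 650, 200, 800),  # enriched in cases
    "variant_B": (250, 750, 245, 755),  # essentially no difference
}

hits = scan(variants)
print(hits)
```

A hit from a scan like this only flags a statistical association. It says nothing about mechanism, which is why the validation burden described above follows.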
Advances in synthetic biology, such as CRISPR, and the initial merging of genomics with synthetic biology are beginning to allow genomics to explore more than the naturally occurring genetic space, moving the field from an “observation of nature” science to an interventional endeavor.
While next-generation sequencing has revealed millions of naturally occurring variants, the ability to intervene in a cell to create a new variation and then observe, test and measure the results at scale will dramatically expand our biological knowledge.
An early approach is Perturb-seq, which enables researchers to turn off single genes in single cells, thousands at a time, and then observe the results through RNA sequencing.
Perturb-seq provides genomic scale and resolution, allowing, for the first time, both multiplexed intervention in a genome and analysis of non-natural configurations.
However, Perturb-seq addresses only one type of mutational change, that of stopping a single gene’s transcription. Additional tools would allow us to experimentally analyze other types of genetic variation—single nucleotide polymorphisms, multi-base variations, structural changes, promoter swaps—at genomic scale.
Beyond testing individual variants, scientists will want to explore variants in combination. For diseases caused by a single, highly penetrant mutation, we have learned a lot.
In fact, this knowledge has fueled a growing industry of rare-disease drug discovery at pharma and biotech companies. We know a lot about monogenic diseases because they are easy to study with our current tools. However, most diseases plaguing our society, like heart disease and cancer, result from the interactions between multiple genes and their environment. To unravel this complexity, we need genomic-scale tools that create multiple mutations in cell lines and then challenge them with different conditions, all in a highly multiplexed fashion and at reasonable experimental cost.
These capabilities are on the horizon. For instance, a recently launched tool from Inscripta was used to create, in massively parallel fashion, over 200,000 precisely defined variants distributed throughout the E. coli genome.
Two results were remarkable: (1) the discovery of over 100 variants that increase lysine production, most of which were outside of known genes; and (2) the discovery of three variants that each increased production only several-fold individually, but over 14,000-fold when combined. These results further underscore our limited understanding of biology even in one of the most studied pathways in one of the most studied organisms.
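A little arithmetic shows why the combined result is so striking. The sketch below is illustrative only: the article says "several-fold" without giving a number, so a 3-fold effect per variant is assumed here, and the additive and multiplicative baselines are simple textbook models for independent effects rather than the analysis used in the study. The 200,000 and 14,000 figures are the lower bounds quoted above.

```python
import math

# ASSUMPTION: "several-fold" is taken as 3x per variant; the article gives no exact figure.
single_variant_folds = [3.0, 3.0, 3.0]
observed_combined_fold = 14_000.0  # "over 14,000-fold", as quoted in the article

# If the three effects were independent and multiplicative: 3 * 3 * 3
expected_multiplicative = math.prod(single_variant_folds)

# If the three gains simply added on top of the baseline: 1 + 2 + 2 + 2
expected_additive = 1.0 + sum(fold - 1.0 for fold in single_variant_folds)

# How far the observed combination exceeds the multiplicative expectation
synergy_ratio = observed_combined_fold / expected_multiplicative

# Size of the combinatorial search space among the 200,000 variants quoted above
n_variants = 200_000
n_pairs = math.comb(n_variants, 2)
n_triples = math.comb(n_variants, 3)

print(f"multiplicative expectation: {expected_multiplicative:.0f}-fold")
print(f"additive expectation:       {expected_additive:.0f}-fold")
print(f"observed / multiplicative:  {synergy_ratio:.0f}x")
print(f"possible pairs:   {n_pairs:.3e}")
print(f"possible triples: {n_triples:.3e}")
```

Under these assumptions the observed combination exceeds even the multiplicative expectation by a factor of several hundred. The number of possible triples, on the order of 10^15, makes clear why such combinations cannot be found by testing one hypothesis at a time.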
Although recent advances have unveiled tremendous knowledge and revived interest and excitement among scientists and entrepreneurs, the more we learn the more we realize we still have to learn.
Distributed genomic tools that allow for cost-effective, highly multiplexed, recursive genetic editing, combined with current genetic analysis platforms like single-cell and next-generation sequencing, will enable a closed loop of intervention and analysis that allows us to explore the full potential of nature’s variations.
We can hardly imagine, much less prognosticate, how this will impact our understanding of biology, or the new products that will arise to treat disease, produce better crops, and create novel materials.
John Stuelpnagel is a co-founder of both Illumina and Ariosa Diagnostics. Additionally, he currently serves as Chairman of the Board of Directors of Inscripta. Bryan Roberts is a partner at Venrock. Venrock is an investor in Inscripta.
An explosion of knowledge: Multiplexed interventional genomics was originally published on Fortune.