By analyzing bacterial data, researchers have discovered thousands of rare new CRISPR systems with a range of functions that could enable genome editing, diagnostics, and more.
Microbial sequence databases contain information on enzymes and other molecules amenable to biotechnology. But these databases have grown so large in recent years that it has become difficult to efficiently search for enzymes of interest.
A new search algorithm for CRISPR systems
Now, scientists at the McGovern Institute for Brain Research at MIT, the Broad Institute of MIT and Harvard, and the National Center for Biotechnology Information (NCBI) at the National Institutes of Health have developed a new search algorithm and used it to identify 188 kinds of new rare CRISPR systems in bacterial genomes, encompassing thousands of individual systems. The work was published in the journal Science on November 23.
The algorithm, which comes from the lab of pioneering CRISPR researcher Professor Feng Zhang, uses big-data clustering approaches to rapidly search massive amounts of genomic data. The team used the algorithm, called Fast Locality-Sensitive Hashing-based clustering (FLSHclust), to mine three major public databases containing data from a wide range of unusual bacteria, including ones found in coal mines, breweries, Antarctic lakes, and dog saliva. The scientists found a surprising number and diversity of CRISPR systems, including ones that could make edits to DNA in human cells, others that can target RNA, and many with a variety of other functions.
The new systems could potentially be harnessed to edit mammalian cells with fewer off-target effects than current Cas9 systems. They could also one day be used as diagnostics or serve as molecular records of activity inside cells.
Exploring the diversity of CRISPR
Their search highlights CRISPR’s unprecedented level of diversity and flexibility, and the researchers say many more rare systems have yet to be discovered as databases continue to grow.
“Biodiversity is a treasure, and as we continue to sequence more genomes and metagenomic samples, there is a growing need for better tools, like FLSHclust, to search that sequence space to find the molecular gems,” says Zhang, the study’s senior author and the James and Patricia Poitras Professor of Neuroscience at MIT, with joint appointments in the departments of Brain and Cognitive Sciences and Biological Engineering. Zhang is also an investigator at the McGovern Institute for Brain Research at MIT, a core institute member at the Broad, and an investigator at the Howard Hughes Medical Institute. Eugene Koonin, a distinguished investigator at NCBI, is co-senior author of the study.
Looking for CRISPR
CRISPR, which stands for Clustered Regularly Interspaced Short Palindromic Repeats, is a bacterial defense system that has been engineered into many tools for genome editing and diagnostics.
To mine databases of protein and nucleic acid sequences for novel CRISPR systems, the researchers developed an algorithm based on an approach borrowed from the big-data community. This technique, called locality-sensitive hashing, clusters together objects that are similar but not exactly identical. Using this approach allowed the team to probe billions of protein and DNA sequences, from NCBI, its Whole Genome Shotgun database, and the Joint Genome Institute, in weeks, whereas previous search methods would have taken months. They designed the algorithm to look for genes associated with CRISPR.
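For readers curious how locality-sensitive hashing groups "similar but not identical" sequences, here is a minimal, generic sketch using the textbook MinHash-with-banding scheme. It is not FLSHclust's actual implementation, whose internals are not described here. The function names, the k-mer length `k`, and the `bands` and `rows` parameters are illustrative assumptions, and the toy sequences are made up rather than real CRISPR proteins.

```python
import hashlib
from collections import defaultdict


def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers (length-k substrings) of a sequence.
    Sequences shorter than k are treated as a single token."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)} or {seq}


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token; each seed acts as a separate hash function."""
    digest = hashlib.sha1(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens: set[str], num_hashes: int) -> list[int]:
    """MinHash: for each hash function keep the minimum value over all tokens.
    Two sets agree at a position with probability equal to their Jaccard similarity."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def lsh_buckets(signatures: dict[str, list[int]], bands: int, rows: int) -> dict:
    """Split each signature into `bands` slices of `rows` values.
    Sequences whose slices match in any band land in the same bucket."""
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(name)
    return buckets


def cluster(seqs: dict[str, str], k: int = 3, bands: int = 16, rows: int = 4) -> list[list[str]]:
    """Hypothetical helper: group sequences whose LSH buckets overlap, merging with union-find."""
    sigs = {name: minhash_signature(kmers(s, k), bands * rows) for name, s in seqs.items()}
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every pair sharing a bucket is a candidate match; merge their clusters.
    for members in lsh_buckets(sigs, bands, rows).values():
        first, *rest = sorted(members)
        for other in rest:
            parent[find(other)] = find(first)

    groups = defaultdict(list)
    for name in seqs:
        groups[find(name)].append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    toy = {
        "seq_a": "MKRILGLDLGTNSIGWALVEEK",
        "seq_a_variant": "MKRILGLDLGTNSIGWALVEEQ",  # one substitution vs seq_a
        "seq_b": "MSTNPKPQRKTKRNTNRRPQDVKFPGG",  # shares no 3-mers with seq_a
    }
    print(cluster(toy))
```

The key design point is that no pair of sequences is compared directly: near-duplicates collide in at least one band with high probability, so the work grows roughly linearly with the number of sequences rather than quadratically. That property is what makes this family of methods attractive at database scale.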
“This new methodology allows us to analyze data in a time frame short enough that we can actually recover results and generate biological hypotheses,” says Soumya Kannan PhD ’23, co-first author of the study. Kannan was a graduate student in Zhang’s lab when the study began and is now a postdoc and junior fellow at Harvard University. Han Altae-Tran PhD ’23, who was a graduate student in Zhang’s lab during the study and is currently a postdoc at the University of Washington, was the study’s other co-first author.
“It’s a testament to what you can do when you improve on research methods and use as much data as possible,” says Altae-Tran. “It’s really exciting to be able to improve the scale at which we search.”
Discovery of new CRISPR variants
In their analysis, Altae-Tran, Kannan, and their colleagues noticed that the thousands of CRISPR systems they found fell into a few existing and many new categories. They studied several of the new systems in detail in the laboratory.
They found several new variants of known type I CRISPR systems, which use a guide RNA that is 32 base pairs long rather than the 20-nucleotide guide of Cas9. Because of their longer guide RNAs, these type I systems could potentially be used to develop more precise gene-editing technology that is less prone to off-target editing. Zhang’s team showed that two of these systems could make small edits in the DNA of human cells. And because these type I systems are similar in size to CRISPR-Cas9, they could likely be delivered to cells in animals or humans using the same gene-delivery technologies used for CRISPR today.
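A back-of-the-envelope estimate shows why a longer guide leaves more room for specificity. This uses a deliberately idealized model that is our assumption, not the study's: a genome of G bases with uniform, random composition, counting only exact matches on both strands and ignoring PAM requirements and the mismatch tolerance that drives real off-target editing. Under that model, the expected number of chance matches to a guide of length L is roughly 2G/4^L. Taking G ≈ 3.1 × 10^9 for the haploid human genome:

```latex
\mathbb{E}[\text{chance exact matches}] \approx \frac{2G}{4^{L}}, \qquad
L = 20:\ \frac{6.2\times 10^{9}}{1.1\times 10^{12}} \approx 5.6\times 10^{-3}, \qquad
L = 32:\ \frac{6.2\times 10^{9}}{1.8\times 10^{19}} \approx 3.4\times 10^{-10}
```

Real off-target activity comes mostly from imperfect matches, so this only illustrates how much sequence-uniqueness headroom the extra 12 bases provide, not an actual off-target rate.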
One of the type I systems also showed “collateral activity,” a broad degradation of nucleic acids after the CRISPR protein binds its target. Scientists have used similar systems to make infectious disease diagnostics such as SHERLOCK, a tool capable of rapidly sensing a single molecule of DNA or RNA. Zhang’s team thinks the new systems could be adapted for diagnostic technologies as well.
The researchers also uncovered new mechanisms of action for some type IV CRISPR systems, as well as a type VII system that precisely targets RNA and could potentially be used in RNA editing. Other systems could potentially serve as recording tools, providing a molecular record of when a gene was expressed, or as sensors of specific activity in a living cell.
Mining Biochemical Data
The scientists say their method could also help them search for other biochemical systems. “Anyone who wants to work with these large databases to learn how proteins evolve or discover new genes can use this search algorithm,” says Altae-Tran.
The researchers say their findings illustrate how diverse CRISPR systems are, and that most are rare and found only in unusual bacteria. “Some of these microbial systems were found only in water from coal mines,” says Kannan. “If someone hadn’t been interested in that, we may never have seen those systems. Broadening our sampling diversity is critical to continuing to expand the diversity of what we can discover.”
Reference: Han Altae-Tran, Soumya Kannan, Anthony J. Suberski, Keiko S. Mears, F. Esra Demircioglu, Lukas Moeller, Selin Kocalar, Rachel Oshiro, Kira S. Makarova, Rhiannon K. Macrae, Eugene V. Koonin and Feng Zhang, 23 November 2023, Science.
This work was supported by the Howard Hughes Medical Institute; the K. Lisa Yang and Hock E. Tan Molecular Therapeutics Center at MIT; Broad Institute Programmable Therapeutics Gift Donors; The Pershing Square Foundation, William Ackman, and Neri Oxman; James and Patricia Poitras; BT Charitable Trust; Asness Family Foundation; Kenneth C. Griffin; the Phillips family; David Cheng; and Robert Metcalfe.