Despite the ubiquity of genome sequence data, unraveling the contributions of genetic variation to phenotypic diversity remains one of genomics’ greatest challenges. There is a clear need for systematic, perturbation-based approaches that permit the study of phenotypic consequences of genetic variants. This project aims to develop these approaches, and apply them to interrogate the functional impact of genetic variation in different environmental and genetic contexts. The combined insights and tools generated by our work will aid in developing predictive models of the effects of genetic variation within specific environmental and biological contexts, providing guiding principles for understanding the consequences of human genetic variation.
We first worked on the improvement of our CRISPR/Cas9-based variant engineering pipeline (Roy et al. 2018), which allows us to engineer single nucleotide and amino acid variants genome-wide in yeast and quantify fitness by short barcode sequencing. We have been able to substantially improve editing efficiency of the system to allow its application in pooled library screens where a high rate of editing is needed. We also performed proof-of-principle experiments investigating phenotypic consequences of ~20,000 engineered variants in several environmental conditions, and are in the process of validating some of our findings.
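To illustrate how fitness can be read out from barcode sequencing, the following is a minimal sketch and not our actual analysis pipeline. It assumes that each variant strain carries a unique barcode, that read counts are available for an early and a late time point, and that a set of unedited control barcodes defines neutral growth. The function name, the pseudocount and the median-based normalization are illustrative assumptions.

```python
import math
from statistics import median


def barcode_log2_fold_changes(counts_t0, counts_t1, control_barcodes, pseudocount=0.5):
    """Per-barcode log2 change in relative abundance, centred on control barcodes.

    counts_t0, counts_t1: dicts mapping barcode -> read count (early / late sample).
    control_barcodes: barcodes of neutral reference strains (hypothetical input).
    pseudocount: small value added to every count so that dropouts do not give log(0).
    """
    n = len(counts_t0)
    total_t0 = sum(counts_t0.values()) + pseudocount * n
    total_t1 = sum(counts_t1.get(bc, 0) for bc in counts_t0) + pseudocount * n

    raw = {}
    for bc, c0 in counts_t0.items():
        c1 = counts_t1.get(bc, 0)
        freq_t0 = (c0 + pseudocount) / total_t0
        freq_t1 = (c1 + pseudocount) / total_t1
        raw[bc] = math.log2(freq_t1 / freq_t0)

    # Subtract the median control value so that neutral variants score close to 0.
    reference = median(raw[bc] for bc in control_barcodes)
    return {bc: value - reference for bc, value in raw.items()}


# Toy example with made-up counts (not real data).
example_t0 = {"ctrl1": 100, "ctrl2": 100, "varA": 100, "varB": 100}
example_t1 = {"ctrl1": 100, "ctrl2": 100, "varA": 200, "varB": 50}
example_scores = barcode_log2_fold_changes(example_t0, example_t1, ["ctrl1", "ctrl2"])
```

In this toy example, a positive score indicates a barcode that increased in relative abundance compared to the controls, and a negative score indicates one that decreased. A real analysis would additionally need replicates, several time points and a statistical model of count noise.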
To rule out the possibility that off-target edits by the CRISPR/Cas9 system could confound our experiments, we applied deep whole-genome sequencing to hundreds of edited strains. While we found no evidence for off-target effects, we observed that some regions of the genome were prone to structural variant formation. To investigate these unwanted editing outcomes on a genome scale, we developed a high-throughput experimental pipeline that allows us to investigate the outcome of editing via whole-genome sequencing for thousands of strains at very low cost. We plan to apply this protocol to determine on-target editing outcomes for thousands of variants across the genome, define the ‘editable’ part of the yeast genome, and identify factors that allow us to predict whether an edit can be made. These insights will allow us to optimize the design of future variant libraries and, together with the improved efficiency of our variant engineering system, will enable us to generate extremely high-quality variant libraries for investigating the functional impact of genetic variation.
Building on recent developments in the single-cell field, we have developed a targeted, single-cell RNA-sequencing assay for the massively parallel molecular phenotyping of cells carrying defined genetic variants. We were able to obtain rich molecular phenotyping information on the expression of ~200 genes across ~1000 CRISPR-mediated genetic perturbations. Furthermore, we could show that the targeted nature of the readout increases the sensitivity of the assay approximately 10- to 30-fold compared to conventional single-cell RNA-sequencing. Due to its low cost, the assay is suited for the massively parallel molecular phenotyping of thousands of strains or cell lines. Ultimately, a better understanding of these intermediate molecular layers will not only identify pathways through which variants mediate their effects on phenotype but may also inform attempts to dynamically model phenotypic outcomes.
Taken together, our progress substantially advances our technical capabilities to investigate the effects of genetic variants at the genome scale while at the same time expanding our understanding of the genetic architecture of several complex traits in yeast. Applying our novel methods to the full scale of variants proposed in this project will enable us to gain fundamentally new insights into the process of natural evolution and the function of cellular networks. As our methods can be adapted to other organisms or biological questions, they have the potential to significantly impact a wide range of fields.
In the current reporting period we have made major progress in the following areas:
(1) We have substantially improved the efficiency of MAGESTIC (Roy et al. 2018), our high-throughput CRISPR-based variant engineering system, which forms the basis for this project. Specifically, we have modulated the expression of guide RNA and Cas9 to maximize editing efficiency. We have also optimized our donor recruitment method to boost editing survival of strong guides and editing efficiency of weak guides, resulting in an overall improvement of repair efficiency of >10-fold. We are currently preparing a manuscript describing these innovations.
(2) To rule out that off-target edits by the CRISPR/Cas9 system confound our experiments, we performed deep whole-genome sequencing of ~300 edited strains. While we did not see any evidence of off-target edits for guides with no secondary targets in the genome, we noticed that some regions of the genome were prone to structural variant formation when editing was attempted there. To investigate such unwanted on-target editing outcomes on a global scale, and to learn which factors contribute to these outcomes, we developed a streamlined experimental pipeline that allows us to investigate the outcome of editing via whole-genome sequencing at a cost of ~1 Euro per strain. This pipeline is based on two technologies previously developed in my group: REDI (Smith et al. MSB 2017), a protocol to isolate single variants from a complex pool of variant strains, and an inexpensive Tn5 transposase purification strategy and accompanying library preparation protocol (Hennig et al. G3 2018). We have adapted the Tn5 protocol to allow processing of 1000 strains in two days at a cost of 0.34 cents per sample for library preparation. We plan to investigate on-target editing outcomes of ~10,000 strains across the genome to define the ‘editable’ part of the yeast genome, and to identify factors that allow us to predict whether an edit can be made. We anticipate that a manuscript will be ready for submission within the next reporting period.
Using these insights, we will be able to identify and avoid potentially problematic regions of the genome in the future, and optimize our computational pipelines for library design. The combined improvements from (1) and (2) will then enable us to generate extremely high-quality libraries for Objectives 1 and 2.
(3) Using our improved variant engineering pipeline, we are currently in the process of generating variant libraries as described in Objective 1b. Before proceeding to phenotyping of the full libraries, we decided to perform proof-of-principle phenotyping experiments with a subset of 20,000 variants in 10 conditions. Using this small sub-library we have optimized the parameters of our phenotyping pipelines, such as the number of replicates and sampling time-points, the library coverage at each step, the types of controls used, as well as several sample processing steps for making next-generation sequencing libraries for barcode sequencing. We have also used these data to optimize the computational pipelines for inferring phenotypic effects of these edited variants from barcode sequencing. We are currently validating some of our findings by re-engineering variants with interesting phenotypes, verifying the edits using our inexpensive Tn5-based whole-genome sequencing method (described above), and validating their phenotypes. This will ensure the robustness and fidelity of our phenotyping and analysis pipelines before we expand to the full library of ~100,000 variants (as described in Objective 1b).
(4) We have developed a targeted, single-cell RNA-seq assay for the massively parallel molecular phenotyping of cells carrying defined genetic variants (Objective 2c). We were able to obtain rich molecular phenotyping information on the expression of ~200 genes across ~1000 CRISPR-mediated genetic perturbations. Importantly, we could show that the targeted nature of the readout increases the sensitivity of the assay approximately 10- to 30-fold compared to conventional single-cell RNA-sequencing.
• We have optimized the editing and repair efficiency of our platform to engineer variants genome-wide, resulting in a >10-fold improvement over the previous version.
• We have developed a novel experimental pipeline combining three key technologies developed in the group that allows us to inexpensively and routinely sequence-validate variant strains generated in a complex pool.
• We have implemented an assay to query the effects of thousands of CRISPR-mediated genetic perturbations on hundreds of potential target genes in single cells. Our method has a 10- to 30-fold higher sensitivity for detecting transcriptional responses than conventional single-cell methods while significantly lowering sequencing cost. This allows us to characterize molecular responses to genetic perturbations in thousands of single, genetically diverse cells. As our experimental strategy is highly modular, it can easily be adapted to other biological questions.