Metabolic engineering creates improved microbes for industrial biotechnology. Rational design of industrial microbes revolves around modifications of genes with known roles in the production pathway of interest. However, genes that are unrelated to the production pathway are also known to substantially impact productivity. To date there are no methods that allow the prediction of such distal genes on a rational basis. Effects of distal genes are indirect and mediated through regulatory interactions between metabolites and proteins, most of which are currently unknown even in the well-studied microbe Escherichia coli. The lack of knowledge of metabolite-protein interactions thus effectively prohibits systematic exploration of distal regulatory relationships, with the consequence that models used to predict metabolic engineering targets are severely limited and rarely applied in industrial biotechnology.
The resulting gap between strain design and construction is a genuine problem for industrial biotechnology. Tight regulation of metabolism causes unpredictable responses to genetic modifications, which can substantially affect cellular fitness and robustness. Designing the regulation of synthetic pathways on a rational basis will therefore break new ground in metabolic engineering and open up novel applications in industrial biotechnology.
In this ERC-funded project, we proposed to bridge the gap between strain design and construction through a genome-wide effort to map regulatory interactions between metabolites and proteins. The overall objective of this project is twofold. First, we map regulation of the E. coli metabolic network by downregulating single enzymes with CRISPR interference (CRISPRi) and measuring the proteome and metabolome responses. Second, we use the knowledge about metabolic regulation to build superior E. coli strains that cease growth upon induction and focus all metabolic resources on the synthesis of chemicals. A computational work package sits at the interface of these primary objectives. We develop and test computational methods to integrate metabolomics data from thousands of perturbed metabolic states, create models with these data, and eventually use them to design production strains.
The first objective of this project was to create a library of E. coli strains for transcriptional down-regulation (interference) of metabolic genes. We successfully created such a library that includes all 1515 metabolic genes in the latest genome-scale E. coli metabolic model. We achieve tight and inducible interference by expressing an enzymatically dead Cas9 (dCas9) protein from the genome, which is guided to the interference target by co-expressing a single guide RNA from a plasmid.
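To illustrate how a guide RNA directs dCas9 to its target, the following is a minimal sketch of enumerating candidate target sites in a gene sequence. It assumes a Streptococcus pyogenes-type dCas9 that requires an NGG PAM directly downstream of a 20-nucleotide protospacer; the text above does not state which dCas9 variant or design rules were used, and the function name `find_protospacers` is hypothetical. Strand choice, off-target scoring and guide ranking, which a real library design would need, are not covered.

```python
import re


def find_protospacers(seq, guide_len=20):
    """List candidate sites: a guide_len-mer immediately followed by an NGG PAM.

    Assumption (not from the project description): S. pyogenes-type PAM (NGG)
    and a 20-nt protospacer. Only the given strand is scanned.
    Returns a list of (start_index, protospacer, pam) tuples.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Made-up toy sequence, for illustration only.
    toy_gene = "A" * 20 + "TGG" + "CC"
    for start, protospacer, pam in find_protospacers(toy_gene):
        print(start, protospacer, pam)
```

Selecting several such sites per gene would be one way to arrive at multiple sgRNAs per target, as in a pooled library.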
In total, we constructed 7184 sgRNAs, which we cloned in a pooled approach using array-synthesized oligonucleotides. From this pool of 7184 strains, we selected a panel of 30 strains for a pilot study to understand the cellular response during the transition from an unrepressed state into a repressed state for each of the 30 target enzymes. So far, we have identified several regulatory mechanisms that compensate for decreases in enzyme levels. In the case of amino acids, compensation works via the known transcriptional and allosteric feedback mechanisms. In the case of the pentose phosphate pathway, we discovered a new regulatory mechanism that compensates for the decrease of an enzyme by activating alternative pathways that bypass the critical enzyme. Thus, the 30-enzyme pilot study confirmed that downregulation of enzymes is informative about regulatory mechanisms. We have now established an experimental set-up that captures the early and acute response to enzyme-level perturbations, and we are currently using this set-up to measure the metabolome response of all 7184 CRISPRi strains from the metabolism-wide library.
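A metabolome response of this kind is commonly summarized as the log2 fold change of each metabolite in a knockdown strain relative to an unrepressed control. The sketch below shows that calculation under stated assumptions: the function name `metabolome_response`, the intensity values and the threshold of 1.0 (a twofold change) are all hypothetical and are not taken from the project.

```python
import math


def metabolome_response(knockdown, control, threshold=1.0):
    """Compute log2 fold changes and flag metabolites that change strongly.

    knockdown, control: dicts mapping metabolite name -> positive intensity.
    threshold: absolute log2 fold change above which a metabolite is flagged
               (illustrative value, not a project parameter).
    Returns (fold_changes, flagged), where flagged is a sorted list of names.
    """
    fold_changes = {}
    for name, kd_value in knockdown.items():
        if name not in control:
            continue  # skip metabolites that were not measured in the control
        fold_changes[name] = math.log2(kd_value / control[name])
    flagged = sorted(n for n, fc in fold_changes.items() if abs(fc) >= threshold)
    return fold_changes, flagged


if __name__ == "__main__":
    # Made-up intensities for two hypothetical metabolites.
    kd = {"met_a": 400.0, "met_b": 100.0}
    ctrl = {"met_a": 100.0, "met_b": 100.0}
    print(metabolome_response(kd, ctrl))
```

Applied to every strain in a library, the result would be a strains-by-metabolites matrix of fold changes, which is the kind of input that downstream inference methods work with.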
The analysis of the large metabolomics datasets is the second challenge of this project, and we approach this challenge by integrating the data with computational models. Early in the project we realized that conventional models based on ordinary differential equations are not capable of handling such a large amount of information and faithfully inferring regulation. For this reason, we are currently pursuing alternative data-driven approaches that use machine learning methods to infer regulation. For example, graph neural networks could already predict regulatory interactions in a model of primary metabolism, and we are now extending them to integrate measured metabolomics data.
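The core operation in a graph neural network is message passing, in which each node updates its features from those of its neighbors. The sketch below shows a single mean-aggregation step on a small metabolite-enzyme graph. It is an illustration only and not the architecture used in the project: it has no trainable weights or nonlinearity, and the function name `message_passing_step`, the `self_weight` parameter and the toy graph are hypothetical.

```python
def message_passing_step(features, edges, self_weight=0.5):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict mapping node -> list of floats (same length for all nodes).
    edges: list of (node_u, node_v) pairs, e.g. enzyme-metabolite links.
    self_weight: how much of a node's own features is kept (illustrative).
    Returns a new dict with the updated features.
    """
    neighbors = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = {}
    for node, own in features.items():
        nbrs = neighbors[node]
        if nbrs:
            # Average each feature dimension over all neighbors.
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(len(own))]
        else:
            mean = list(own)  # isolated nodes keep their own features
        updated[node] = [self_weight * o + (1 - self_weight) * m
                         for o, m in zip(own, mean)]
    return updated


if __name__ == "__main__":
    # Toy graph: one enzyme connected to two metabolites, one feature per node.
    feats = {"enzyme": [1.0], "met_a": [0.0], "met_b": [2.0]}
    links = [("enzyme", "met_a"), ("enzyme", "met_b")]
    print(message_passing_step(feats, links))
```

In a trained network, the fixed averaging would be replaced by learned transformations, and several such steps would be stacked so that information from more distant nodes, such as distal enzymes, can reach each node.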
In the third part of the project, we sought to arrest growth of E. coli with CRISPR interference of essential genes. It turned out that degradation of dCas9 is a major problem that prevents a stable and long-lasting growth arrest. To overcome this problem, we stabilized the growth arrest by introducing gene deletions and then re-inserting unstable enzyme variants that degrade during the bioprocess. With this approach, we were able to produce about 3 g/L of the α-amino acid L-citrulline in stationary E. coli.
In the first phase of the project, we established a workflow that combines CRISPRi of enzymes with metabolomics. We developed an experimental set-up to measure more than 500 metabolites in thousands of strains within a few days. By the end of the project, we want to measure our newly created metabolism-wide knockdown library (>7000 CRISPRi strains) with this method. Moreover, we will test the potential to automate the sampling procedure by directly injecting CRISPRi cultures into the mass spectrometer, which would also allow us to follow dynamic transitions of the metabolome in real time. For data integration and analysis, we established graph neural networks as an inference method that is fast, scalable and able to handle large data sets. This is a novel application of machine learning to network inference based on metabolomics data. We are confident that we can map the regulation of the entire E. coli metabolic network with such a large-scale metabolomics data set from thousands of perturbed metabolic states.