Why is our genome the way it is? Why, for example, is it so very large? In answering questions like these we hope to understand which parts of our genome are functional and why. Knowing which parts are functional could in turn lead to improved diagnostics and to improved gene-based therapies.
We are particularly interested in a core idea in evolutionary genomics, namely that selection should be less efficient when populations are small. This has been hypothesised to explain why our genome is so large: selection is too weak in large-bodied organisms to prevent the spread, by chance, of insertions that are just a little bit bad for us. We want to see if this idea can be extended: if selection is weak and leads to a bloated genome, might our genome also be prone to errors and, if so, does this mean that selection in us commonly acts on error-mitigation devices?
One result of such selection to mitigate errors could be an increased role for what have been thought to be largely irrelevant parts of our genome. We focus on so-called silent sites, silent because it is thought that mutations at these sites have no impact on us. We have, however, shown that there is selection on such sites and mutations. Why is this? By understanding this, can we make better new genes to help disease-bearing patients and can we improve diagnosis?
The objectives of the project are thus:
- to examine the role of error in evolution - both as a means to cause selection to prevent it and as a means to the evolution of novelty.
- to go from understanding the relationship between error prevention and selection on synonymous sites to improving both diagnostics and our understanding of the aetiology of disease.
- to go from understanding errors and innocuous mutations to improving therapeutics, both by improving new genes and by defining sites in the genome where these new genes are less likely to cause knock-on errors by affecting the expression of neighbours.
This work is of societal relevance not just because it has the potential to impact on medicine directly, but because we are also asking fundamental questions about how evolution works and, philosophically, what it is to be human. Are we a perfect genetic machine or a barely adequate, error-prone product of inefficient selection?
This project aims to better define the roles of genetic errors in order to address fundamental questions in evolution: a) to determine the nature of error-proofing devices (WP1), b) to determine the commonality of such devices, most especially whether they are more common when population size is low (and hence, possibly, error rates are high) (WP1), c) to appraise the role of errors in the generation of novelty (WP4). In addition, we aim to apply this knowledge d) to make better transgenes (WP3), e) to improve diagnostics (WP2) and f) to define safe harbour zones for transgene insertion (WP4).
A focus of our interest has been exonic splice enhancers (ESEs), these being exonic motifs that act to reduce the rate of mis-splicing. Understanding their distribution within and between genomes is central to aims a and b (WP1). Using this information is central to aims d and e (WP3 and WP2 respectively). If selection is acting to preserve ESEs (and intraRNA protein binding sites more generally) we expect signatures of this in SNP profiles and in interspecific conservation profiles (WP1, underpinning WP2). Having highlighted the apparent disconnect between estimates of the impact of synonymous mutations on splice disruption derived from evolutionary and experimental approaches [1], we have subsequently made a large step forward in squaring this circle, having determined for the first time both the commonality and strength of selection on mutations that disrupt error-controlling ESEs [2] (WP1, underpinning WP2). This established common and strong selection, consistent both with the experimental data and with an important role for mis-splicing in human disease (WP2). Consistent with this, we estimated that 25-45% of all diseases are associated with mis-splicing [3] (WP1, underpinning WP2). We have, in addition, applied a simple ESE disruption metric to establish whether disease-associated mutations might act via splicing disruption [4] (WP2) and analysed the extent to which mutations in tumours disrupt splicing [5] (WP1, underpinning WP2). Finally, we have shown that selection is not simply to preserve motifs: there is also selection to avoid inappropriate binding of RNA-binding proteins, indicative of selection to prevent errors [6] (WP1, underpinning WP2).
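To make the idea of a simple ESE disruption metric concrete, the following is a minimal sketch and not the metric used in [4]. It counts how many motifs from a hexamer set overlap a given position before and after a single-base substitution. The motif set `EXAMPLE_ESES`, the function names and any input sequences are hypothetical placeholders; a real analysis would use a curated ESE set.

```python
# Hypothetical ESE hexamers, for illustration only (not a curated set).
EXAMPLE_ESES = {"GAAGAA", "AAGAAG", "TGAAGA"}


def ese_hits(seq, pos, motifs=EXAMPLE_ESES, k=6):
    """Count motif occurrences whose k-mer window overlaps 0-based position pos."""
    hits = 0
    # Windows starting at pos-k+1 .. pos all contain position pos;
    # clip so that every window lies fully inside the sequence.
    for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
        if seq[start:start + k] in motifs:
            hits += 1
    return hits


def ese_disruption(seq, pos, alt, motifs=EXAMPLE_ESES):
    """Return (hits_before, hits_after, net_loss) for substituting alt at pos."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    before = ese_hits(seq, pos, motifs)
    after = ese_hits(mutant, pos, motifs)
    return before, after, before - after
```

A positive net loss flags a substitution that removes more motif matches than it creates. This says nothing by itself about whether splicing is actually disrupted; it is only a sequence-level proxy.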
Beyond employing this information for diagnostics (WP2), we have been active in converting it into better transgenes for gene therapy (aim d, WP3). First, by quantitative analysis of the impact of exonic splice enhancers (ESEs) on rates of synonymous site evolution in intronless genes, we have defined ESEs that will be needed in intronless transgenes [7] for reasons other than splice modification. We have developed a website (Enhance transgenes, not yet public) that enables the user to upload a gene sequence, which is then converted to an optimized transgene. The approach is to mimic human intronless genes in their site-specific GC content and to allow the user to select several options, crucially whether to specifically ablate ESEs. A first approach has been trialed and implemented, giving better results than a commercial alternative [8]. The website development is complete (at least in its first iteration) and full-scale experimental benchmarking is underway.
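As an illustration of what a site-specific GC profile might look like, the sketch below computes GC content at third codon positions (GC3) in consecutive windows of codons along a coding sequence. This is an assumption about one way to summarise such a profile; it is not the algorithm used by the website, which is not described here. The function name `gc3_profile` and the `window` parameter are hypothetical.

```python
def gc3_profile(cds, window=10):
    """GC fraction at third codon positions, in consecutive windows of codons.

    Assumes cds is in frame starting at its first base; any trailing
    partial codon is ignored.
    """
    cds = cds.upper()
    # Third base of every complete codon.
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    profile = []
    for start in range(0, len(thirds), window):
        chunk = thirds[start:start + window]
        profile.append(sum(base in "GC" for base in chunk) / len(chunk))
    return profile
```

Comparing such a profile for a candidate transgene against the average profile of a reference gene set would show where along the sequence synonymous recoding might be considered.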
While application of insights is important, a key novelty of the program was to ask what genomic features might be adaptations to mitigate errors (WP1). We found, for example, that over-use of the nucleotide A at CDS fourth sites is best understood as a trap for error-prone transcription initiation, as it permits immediate ribosomal rescue (NTGA becomes TGA, a stop) [9].
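The fourth-site pattern can be checked on any set of coding sequences with a few lines of code. The following is a minimal sketch, not the analysis in [9]: it tallies the nucleotide immediately after the ATG start codon and tests whether the three bases following the first base (the "TG" of the start codon plus the fourth site) spell the stop codon TGA. Function names are hypothetical.

```python
from collections import Counter


def fourth_site_counts(cds_list):
    """Tally the nucleotide at the fourth CDS site (first base after ATG)."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        # Skip sequences that are too short or lack a canonical start codon.
        if len(cds) >= 4 and cds.startswith("ATG"):
            counts[cds[3]] += 1
    return counts


def fourth_site_gives_stop(cds):
    """True if bases 2-4 of the CDS read TGA, i.e. the fourth site is A after ATG."""
    return cds.upper()[1:4] == "TGA"
```

Of the four possible fourth-site nucleotides, only A yields a stop here (TGA); TGG, TGC and TGT all encode amino acids.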
The greater theoretical novelty is the notion that error control is more important when the effective population size (Ne) is low, as low Ne gives higher error rates. This means, unusually, stronger selection when Ne is low, the opposite of the classical prediction from the nearly neutral model (WP1). We examined the role of errors in gene evolution as a f
This project sits at the interface of fundamental evolutionary genetics and medicine. We have provided the first robust evidence that the correct view of the human genome is that it is bloated owing to weak selection, but in addition that this weak selection has led to more errors and in turn more error mitigation. Thus, in contradiction to classical theory, selection, at least for error mitigation, can be stronger when populations are small. These results have a direct societal impact in reforming the notion of human perfection.
We have demonstrated the existence of a species with error-prone translation owing to the presence of two tRNAs for the same codon. This breaks the last rule of genetic codes: in this species we cannot predict the proteome just from knowing the genome, as translation of one codon is stochastic.
As regards applications to medicine, the first application of our novel protocol to design new genes for gene therapy outperformed the commercially available alternative.
Our research into the evolution of error-prone gene expression has led to us being able to isolate naive human stem cells and provide an improved growth medium for them.