
Report

Teaser, summary, work performed and final results

Periodic Reporting for period 1 - GENENET (Gene networks to investigate lateral gene transfer in parasitic protozoa)

Teaser

Protozoan pathogens cause major diseases affecting humans, livestock and plants in the developing world and they are an emerging problem for the developed world. Despite their importance for human health, these pathogens are still poorly studied with respect to their genome...

Summary

Protozoan pathogens cause major diseases affecting humans, livestock and plants in the developing world, and they are an emerging problem for the developed world. Despite their importance for human health, these pathogens are still poorly studied with respect to their genome evolution and its importance for pathogen biology. The increasing availability of complete genomes provides opportunities to gain a better understanding of genome content, to understand the role of lateral gene transfer in providing new pathogenic abilities, and to identify how pathogens differ from their free-living relatives and from their hosts, potentially identifying more selective therapeutic targets. The project has applied a multidisciplinary approach combining sophisticated Bayesian phylogenetics and network-based methods to identify how vertical inheritance and lateral gene transfer (LGT) have affected the genomes and metabolism of important protozoan pathogens and of eukaryotes generally. Its goals are to deliver detailed insights into how lateral gene flow has affected the genomes of strategically chosen pathogens and free-living microbial eukaryotes, with general implications for understanding how all eukaryotic genomes, including our own, have evolved and continue to evolve. Our overall objectives were:
1. To evaluate and implement the use of protein similarity networks to identify LGT affecting microbial eukaryotes including major pathogens and their free-living relatives, and to compare and benchmark the network results with tree-based inferences and exemplars.
2. To systematically quantify prokaryote-to-eukaryote LGT affecting protozoan pathogens and microbial eukaryotes, and to identify how host lifestyle, geography and ecology might influence gene transfer.
3. To investigate the potential functional impact of LGT in pathogen and microbial eukaryote biology by mapping LGTs onto functional pathways.

Work performed

Global protein similarity networks to identify LGT were constructed for a set of genomes including pathogenic kinetoplastid protozoa (e.g. Trypanosoma and Leishmania), representatives of the five main supergroups of eukaryotes, the major groups of Archaea and a representative sample of Bacteria. To benchmark the sensitivity of our approach for detecting LGT, we also included eukaryotic genomes for which systematic screens for LGT using tree-based methods have already been published. All of the data were downloaded from public databases. To assemble a representative sample of bacterial gene diversity, we used the eggNOG v4 database of orthologous groups and linear programming (www.gnu.org/software/glpk/) to maximize coverage and minimize redundancy of the bacterial protein sequence diversity in our sample. The "Evolutionary gene and genome network" (EGN) software was then used to build a sequence similarity network for all of the proteins in our data set. To remove weak edges based on short proteins or short alignment segments, we used a quality criterion based on Homology-derived Secondary Structure of Proteins (HSSP) scores. The HSSP distance is a measure of sequence similarity that considers both pairwise sequence identity and alignment length: shorter proteins or alignments need a higher percentage identity before homology can be inferred. Requiring HSSP scores more than 5 above the HSSP threshold curve has previously been shown to improve homology predictions in tree-based analyses for detecting LGT. The sequence similarity networks obtained using EGN represent sequences as nodes and pairwise sequence relationships as edges. The networks were screened for potential LGTs using the R igraph package (v1.0.1) to identify eukaryotic nodes whose connections showed taxonomic distributions inconsistent with simple vertical inheritance.
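The edge filter and the simplest screen described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's EGN/igraph pipeline. It assumes that the HSSP threshold curve takes Rost's (1999) revised form, with the margin of 5 acting as the offset above the curve. In place of an igraph graph, it uses hypothetical (query, subject, percent identity, alignment length) hit tuples, such as might be parsed from BLAST tabular output, together with a plain dictionary of neighbour sets. The domain labels "euk", "bac" and "arc" are also illustrative.

```python
import math
from collections import defaultdict


def hssp_threshold(length: int) -> float:
    """Percent-identity threshold of the HSSP curve.

    Assumption: Rost's (1999) revised curve; shorter alignments need a
    higher identity to count as homologous.
    """
    if length <= 11:
        return 100.0
    if length > 450:
        return 19.5
    exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
    return 480.0 * length ** exponent


def hssp_distance(percent_identity: float, length: int) -> float:
    """Height of a hit above (positive) or below (negative) the threshold curve."""
    return percent_identity - hssp_threshold(length)


def build_network(hits, min_hssp=5.0):
    """Undirected similarity network keeping only edges more than `min_hssp` above the curve.

    `hits` are hypothetical (query, subject, percent identity, alignment length) tuples.
    """
    adjacency = defaultdict(set)
    for query, subject, identity, aln_len in hits:
        if query != subject and hssp_distance(identity, aln_len) > min_hssp:
            adjacency[query].add(subject)
            adjacency[subject].add(query)
    return adjacency


def prokaryote_only_eukaryotes(adjacency, domain):
    """Eukaryotic nodes whose every neighbour is prokaryotic: the simplest LGT signal."""
    return sorted(
        node
        for node, neighbours in adjacency.items()
        if domain[node] == "euk"
        and neighbours
        and all(domain[n] in ("bac", "arc") for n in neighbours)
    )


hits = [
    ("euk1", "bac1", 45.0, 200),
    ("euk1", "arc1", 40.0, 300),
    ("euk2", "euk3", 60.0, 250),
    ("euk2", "bac1", 22.0, 150),  # below the curve at this length: dropped
]
domain = {"euk1": "euk", "euk2": "euk", "euk3": "euk", "bac1": "bac", "arc1": "arc"}
print(prokaryote_only_eukaryotes(build_network(hits), domain))  # ['euk1']
```

In this toy input, the hit between `euk2` and `bac1` (22% identity over 150 residues) falls below the curve, which sits at roughly 24% identity for that length. As a result, `euk1` is the only eukaryote linked exclusively to prokaryotes.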
The simplest examples of LGT were eukaryotic nodes connected only to prokaryotic nodes in the network. We also identified eukaryotic nodes that showed significantly fewer connections to other eukaryotes (assessed using randomization) and significant similarity to prokaryotes. To further test whether these outliers were candidate LGTs, we measured geodesic distances (the number of edges in the shortest path between eukaryotic nodes, normalized by the diameter of the network) as a measure of the proximity of eukaryotic nodes to each other, and identified eukaryotic nodes with significantly larger geodesic distances than other eukaryotes. We also used the Jaccard similarity index (JI) to quantify the similarity of node interaction profiles, and identified eukaryotic nodes whose interaction profiles were more similar to those of adjacent prokaryotic nodes than to those of adjacent eukaryotic nodes. Because analyses that combine complementary approaches to infer LGT often improve the quality of predictions, we subjected the LGTs identified by these network-based methods to further analysis by tree-based and non-tree-based methods to identify a “consensus set” of LGTs. One of the most interesting findings of this work was that LGTs affect all eukaryotic genomes, suggesting that prokaryote-to-eukaryote LGT is a pervasive force shaping the genomes of microbial eukaryotes. The transferred genes affect metabolic pathways such as glycolysis and amino acid metabolism, but also include genes of potential adaptive significance for the pathogenic lifestyle. The identity of prokaryotic donor lineages shows a strong correlation with shared habitat. At present the results of these analyses are being prepared for publication. The dataset of LGTs will also be deposited in public databases, providing a resource of potential targets for therapeutic intervention against pathogenic protozoa.
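The three network statistics in this paragraph can be sketched on a dictionary of neighbour sets, as a minimal illustration rather than the project's R/igraph code. Several details here are our own assumptions and not choices stated in the report:

- the randomization null model (a node's neighbours treated as a random sample of all other nodes);
- the use of closed neighbourhoods (each node included in its own interaction profile) for the Jaccard index;
- averaging geodesic distances only over reachable eukaryotes;
- the labels "euk" and "prok".

The cut-off for a "significantly larger" geodesic distance is left out.

```python
import random
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: edges on the shortest path from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def diameter(adjacency):
    """Longest finite shortest path in the network; unreachable pairs are ignored."""
    return max(max(shortest_path_lengths(adjacency, n).values()) for n in adjacency)


def normalized_euk_distance(adjacency, domain, node, diam):
    """Mean geodesic distance from `node` to the other reachable eukaryotes, divided by the diameter."""
    dist = shortest_path_lengths(adjacency, node)
    to_euks = [d for n, d in dist.items() if n != node and domain[n] == "euk"]
    return sum(to_euks) / len(to_euks) / diam if to_euks else None


def euk_neighbour_pvalue(adjacency, domain, node, n_perm=2000, seed=0):
    """Randomization p-value for a node having this few eukaryotic neighbours.

    Assumed null model: the node's neighbours are a random sample of all other nodes.
    """
    rng = random.Random(seed)
    observed = sum(domain[n] == "euk" for n in adjacency[node])
    pool = [domain[n] for n in adjacency if n != node]
    degree = len(adjacency[node])
    as_extreme = sum(
        sum(label == "euk" for label in rng.sample(pool, degree)) <= observed
        for _ in range(n_perm)
    )
    return (1 + as_extreme) / (1 + n_perm)


def jaccard(adjacency, u, v):
    """Jaccard index of interaction profiles (assumed: closed neighbourhoods, node included)."""
    a, b = adjacency[u] | {u}, adjacency[v] | {v}
    return len(a & b) / len(a | b)


def closer_to_prokaryotes(adjacency, domain, node):
    """True if the node's profile is on average more similar to its prokaryotic neighbours."""
    prok = [jaccard(adjacency, node, n) for n in adjacency[node] if domain[n] != "euk"]
    euk = [jaccard(adjacency, node, n) for n in adjacency[node] if domain[n] == "euk"]
    if not prok:
        return False
    return not euk or sum(prok) / len(prok) > sum(euk) / len(euk)


# Toy network: eX sits inside a prokaryotic cluster with one weak eukaryotic link.
edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("eX", "p1"), ("eX", "p2"),
         ("eX", "p3"), ("eX", "e3"), ("e1", "e2"), ("e2", "e3"), ("e1", "e3")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)
domain = {n: "prok" if n.startswith("p") else "euk" for n in adjacency}
diam = diameter(adjacency)
print(closer_to_prokaryotes(adjacency, domain, "eX"))  # True
```

On this toy network, the profile of `eX` is closer to its prokaryotic neighbours (Jaccard 0.8) than to its eukaryotic neighbour (2/7). Its normalized distance to the other eukaryotes (5/9) is also the largest. With only six other nodes, however, its randomization p-value is about 0.2, so a graph this small cannot show significance.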

Final results

This combined approach recovered more than 60% of previously published cases of LGT based on phylogenetic analyses, and also identified a large number (~6000) of new cases of LGT. One of the most interesting findings of this work was that LGTs affect all eukaryotic genomes, suggesting that prokaryote-to-eukaryote LGT, outside of endosymbiosis and involving diverse prokaryotes, is a pervasive force shaping the genomes of microbial eukaryotes. The transferred genes mainly affect metabolic pathways such as glycolysis and amino acid metabolism. They also include genes of potential adaptive significance, for example those encoding the enzymes used by mucosal parasites to degrade host polysaccharides. The identity of prokaryotic donor lineages shows a strong correlation with shared habitat, so relatively closely related eukaryotes that occupy different habitats have acquired LGTs from different types of prokaryotes. At present the results of these analyses are being prepared for publication. The dataset of LGTs will also be deposited in public databases, providing a resource of potential targets for therapeutic intervention against pathogenic protozoa.

Website & more info

More info: http://www.ncl.ac.uk/camb/staff/profile/martinembley.html.