Hyper-specialized bamboo lemurs possess a reduced suite of xenobiotic-metabolizing cytochrome P450 genes
Data files
Dec 20, 2023 version files (9.88 MB total)
bamboo-lemur-p450s.tar.gz (9.88 MB)
README.md (1.33 KB)
Abstract
Subfamilies of cytochrome P450 proteins have been strongly linked to the metabolism of physiologically disruptive compounds such as alkaloids, terpenoids, and other xenobiotics. Consistent with this function, these genes have adaptively evolved in response to environmental pressures exerted on animals, such as herbivores, that consume elevated amounts of toxic xenobiotics or plant secondary metabolites (PSMs). Theory on evolutionary tradeoffs predicts that highly specialized herbivores should exhibit a relatively narrow toolkit of adaptations to accommodate the concomitantly narrow arrays of PSMs in their diets. The bamboo lemurs of Madagascar (genera Prolemur and Hapalemur) represent an interesting test case for this theory because of their dietary hyper-specialization, as these lemurs consume bamboo and grasses at rates otherwise unseen in the order Primates. To test whether the hyper-specialized folivory of these primates is reflected in a similarly specialized and narrow P450 gene suite, we compiled a dataset of confidently assembled CYP1-3 genes for two species of bamboo lemur and 13 additional lemur species. With this dataset, we tested the predictions that bamboo lemurs would exhibit, first, greater rates of gene loss for xenobiotic-metabolizing P450s and, second, relaxed selection on xenobiotic-metabolizing P450 subfamilies relative to lemurs without such dietary hyper-specialization. We found support for the prediction of gene loss in the CYP2B, CYP2C, CYP2D, CYP2J, and CYP3A subfamilies, all of which encode xenobiotic metabolizers. We inferred relaxation of selection for the CYP1A and CYP2D subfamilies. The CYP2F subfamily exhibited a signal of significant intensification of selection in the bamboo-lemur lineage.
The evolution of the P450 genes in bamboo lemurs provides support for the evolutionary tradeoff hypothesis, and we further hypothesize that, rather than adapting to a general array of PSMs, bamboo lemurs have instead adapted to the primary toxin in their diet, the highly potent poison cyanide.
README: Hyper-specialized bamboo lemurs possess a reduced suite of xenobiotic-metabolizing cytochrome P450 genes.
https://doi.org/10.5061/dryad.wm37pvmtq
This dataset contains all the input data needed to reproduce results from a manuscript of the same title. The data herein include alignments (standard format and LASTZ format), as well as output trees and statistics from the programs PhyML and jModelTest.
Description of the data and file structure
The tar.gz archive contains three subdirectories: LASTZ, CDS, and PhyML. The LASTZ folder contains the pairwise alignments used to find genes for nearly all downstream parts of the paper. For each alphanumeric subfamily (representing individual cytochrome P450 gene-subfamilies), there are .gb files for the reference species (in order to provide some information about reference-gene locations) and .bam files for all query species. The CDS subdirectory contains all the codon alignments required to replicate the HyPhy analyses in the previously mentioned manuscript. All trees for those tests should be taken from the PhyML/Output subdirectory. The PhyML folder comprises all of the output from our PhyML runs mentioned in the paper, as well as the input data needed to check those runs (found in the PhyML/jModelTest and PhyML/Alignments subdirectories).
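As a quick orientation aid, the layout described above can be inspected without unpacking the archive. The following is a minimal sketch in Python, assuming the three subdirectories sit at the top level of the archive (a wrapping directory, if present, would appear as its own group). The function names and the example file names in the usage note are hypothetical and are not taken from the dataset.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_top_level(member_names):
    """Group archive member paths by their first directory component."""
    groups = defaultdict(list)
    for name in member_names:
        # Drop any "." components so "./LASTZ/x" and "LASTZ/x" are treated alike.
        parts = [p for p in PurePosixPath(name).parts if p != "."]
        # Entries with a single component (top-level files or bare directories) are skipped.
        if len(parts) >= 2:
            groups[parts[0]].append("/".join(parts[1:]))
    return dict(groups)


def list_archive(path):
    """Return the grouped contents of a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as archive:
        return group_by_top_level(archive.getnames())
```

Calling `list_archive("bamboo-lemur-p450s.tar.gz")` on a downloaded copy should, if the assumption about the layout holds, return a dictionary keyed by LASTZ, CDS, and PhyML.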
Methods
Data gathering
In addition to a novel genome assembly for Hapalemur griseus, we mined data from publicly available genome assemblies for 14 species: Prolemur simus (Hawkins et al., 2018), Lemur catta (Palmada-Flores et al., 2022), Eulemur flavifrons and E. macaco (Meyer et al., 2015), Propithecus coquereli (Lowe & Eddy, 1997; Guevara et al., 2021), Indri indri (accession number: GCA_004363605.1), Daubentonia madagascariensis (accession number: GCA_004027145.1), Mirza coquereli (accession number: GCA_004024645.1), Mirza zaza (Hunnicutt et al., 2020), and Microcebus murinus (Averdam et al., 2011; Lecompte et al., 2016), as well as the following additional species of mouse lemur: Mic. griseorufus, Mic. mittermeieri, Mic. ravelobensis, and Mic. tavaratra (Hunnicutt et al., 2020). These assemblies, along with any associated annotation files, were downloaded locally and formatted into BLAST databases within Geneious Prime, version 2022.1.1.
We located the loci for all annotated CYP1-3 homologs in the L. catta, Prop. coquereli, and Mic. murinus assemblies by using the associated annotation (GFF3) files for each. We defined these loci by the non-P450 genes that bounded them; therefore, those surrounding genes were used initially as queries for local BLAST searches. In this way, each locus was linked to two searches per species. The three reference genomes listed above were used because they are all members of separate strepsirrhine families (Lemuridae, Indriidae, and Cheirogaleidae, respectively), and they were each therefore used as a starting point to extract the desired CYP1-3 genes or loci for confamilial species. Ideally, a pair of BLAST searches would return results that included the same scaffold. By locating both BLAST hits on each of these scaffolds, we were able to extract genomic regions that were hypothetically orthologous to those P450 loci in the L. catta assembly. After locating the scaffolds in each assembly corresponding to each P450 locus, we used LASTZ (Harris, 2007) to interrogate the homology of those scaffolds by aligning them to the confirmed P450 locus from the appropriate confamilial reference genome. Positive results from these alignments were checked using the Mauve genome aligner (Darling et al., 2004) on the same sequences. If output from both of these aligners indicated that the reference had homology with the query scaffold(s), then the annotations from the reference genome were used to extract the corresponding sequence in the other species’ genome. In this way, we mined the genome assemblies listed above for as many complete P450 gene loci as we could confidently locate.
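The paired-search logic described above (two flanking-gene queries per locus, with a locus accepted when both hits fall on the same scaffold) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the searches themselves were run in Geneious Prime, and the function name, the input format (one best hit per scaffold, given as a start and end coordinate), and the scaffold names in the usage note are all hypothetical.

```python
def candidate_loci(upstream_hits, downstream_hits):
    """Find scaffolds hit by both flanking-gene searches and span the region between them.

    Each argument maps a scaffold name to the (start, end) coordinates of the
    best BLAST hit for one flanking gene. Coordinates may be in either
    orientation, so the minimum and maximum of all four values are used.
    """
    loci = {}
    # Only scaffolds present in both result sets can contain the whole locus.
    for scaffold in upstream_hits.keys() & downstream_hits.keys():
        coords = upstream_hits[scaffold] + downstream_hits[scaffold]
        loci[scaffold] = (min(coords), max(coords))
    return loci
```

For example, with upstream hits `{"scaffold_12": (1000, 2500), "scaffold_7": (50, 900)}` and downstream hits `{"scaffold_12": (480000, 478200)}`, only `scaffold_12` is returned, spanning positions 1000 to 480000. A region extracted this way is still only a candidate; in the workflow above its homology was subsequently checked with LASTZ and Mauve.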
Inference of gene birth and death
For this first portion of the study, we used only species for which each CYP1-3 locus could be wholly collected from a single scaffold or reasonably reconstructed if not found on a single scaffold using the process described above. In order to model the events of gene birth and death in this subset of lemur species, our alignment strategy followed a workflow similar to that outlined in previous work with other datasets (Chaney et al., 2018, 2020), but several modifications were made for this project in order to allow for more standardization and automation across subfamilies. First, all of the P450 genes were extracted from each species’ locus according to the annotation file associated with its confamilial reference. Then, all of the genes from a given P450 subfamily were aligned using MAFFT (Katoh & Standley, 2013), and the resulting alignment was stripped of all sites containing any gaps using trimAl (Capella-Gutierrez et al., 2009). After the best-fitting nucleotide substitution model was inferred by jModelTest (Darriba et al., 2012), this stripped alignment was used to infer a phylogenetic tree with PhyML 3.0, and the strength of the resulting tree was tested with 1000 bootstrap replicates (Guindon et al., 2010).
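To make the gap-stripping step concrete, the following Python sketch removes every alignment column that contains a gap in any sequence. It is a stand-in for illustration and not the trimAl implementation; the text above does not state which trimAl option was used, although the program's `-nogaps` mode performs this kind of filtering. The function name and the dictionary input format are assumptions.

```python
def strip_gapped_columns(alignment, gap_chars="-"):
    """Remove every column that contains a gap character in any sequence.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    names = list(alignment)
    # Transpose the alignment so that each tuple is one column.
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if not any(c in gap_chars for c in col)]
    # Transpose back: rebuild each sequence from the retained columns.
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}
```

Stripping all gapped sites is a conservative choice: every remaining column is present in every gene copy, at the cost of shortening the alignment when sequences are incomplete.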
The gene trees constructed with PhyML were then passed to Possvm (Grau-Bové & Sebé-Pedrós, 2021). This program uses the intrinsic information contained in a phylogram to infer speciation and gene-duplication events; it does this using the species-overlap algorithm in the ETE3 toolkit (Huerta-Cepas et al., 2007, 2016). Briefly, this algorithm compares the intersection of species present in both descendants of an internal node of a tree to the union of species present in those descendants; using these values, the algorithm computes a species-overlap score which it then uses to identify each internal node as either a speciation event, having a low overlap score, or a duplication event, having a high overlap score (Huerta-Cepas et al., 2007). Once the identities of each node were estimated in this way, we then manually examined each subtree rooted by a node called as a duplication event to infer whether any gene loss had occurred. This was examined on a case-by-case basis using the reasoning that, after a duplication event, each descendant of that node should recapitulate the organismal phylogeny present at the time of duplication. Therefore, any species missing in one of those subtrees must have lost one of the duplicates born in the earlier duplication event as long as the subtree in question was well-resolved in terms of bootstrap support. In cases where multiple species lineages may be absent, we deferred to the parsimonious hypothesis that a loss event would have occurred prior to the divergence of those lineages, rather than a more complicated hypothesis that the same paralog had been independently lost in both species after their split. We visualized the Possvm output using the program Treerecs (Comte et al., 2020) and then, in some cases, manually modified the depicted gene-evolution scenario in order to accommodate the Possvm results.
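The species-overlap reasoning and the subsequent check for missing species can be sketched in a few lines of Python. This is a simplified illustration and not the Possvm or ETE3 code: the function names are hypothetical, the score is taken as the size of the intersection divided by the size of the union, and the threshold of 0.0 (any shared species marks a duplication) is an assumption chosen for the example. The species codes in the usage note are placeholders.

```python
def species_overlap(left_species, right_species):
    """Overlap score for one internal node: |intersection| / |union| of child species sets."""
    left, right = set(left_species), set(right_species)
    union = left | right
    if not union:
        raise ValueError("both child clades are empty")
    return len(left & right) / len(union)


def classify_node(left_species, right_species, threshold=0.0):
    """Call a node a duplication if its overlap score exceeds the threshold (assumed value)."""
    score = species_overlap(left_species, right_species)
    return "duplication" if score > threshold else "speciation"


def missing_species(subtree_species, expected_species):
    """Species expected below a duplication node but absent from one daughter subtree.

    These are candidates for gene loss, to be weighed against bootstrap support.
    """
    return set(expected_species) - set(subtree_species)
```

For instance, `classify_node({"Lcat", "Hgri", "Psim"}, {"Lcat", "Psim"})` returns `"duplication"` because two of three species are shared, and `missing_species({"Lcat", "Psim"}, {"Lcat", "Hgri", "Psim"})` returns `{"Hgri"}`, flagging that lineage as a candidate loss. As in the manual procedure above, such a call would only be accepted when the subtree is well supported.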
References
Averdam, A., Kuschal, C., Otto, N., Westphal, N., Roos, C., Reinhardt, R., & Walter, L. (2011). Sequence analysis of the grey mouse lemur (Microcebus murinus) MHC class II DQ and DR region. Immunogenetics, 63(2), 85–93. https://doi.org/10.1007/s00251-010-0487-3
Capella-Gutierrez, S., Silla-Martinez, J. M., & Gabaldon, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15), 1972–1973. https://doi.org/10.1093/bioinformatics/btp348
Chaney, M. E., Piontkivska, H., & Tosi, A. J. (2018). Retained duplications and deletions of CYP2C genes among primates. Molecular Phylogenetics and Evolution, 125, 204–212. https://doi.org/10.1016/j.ympev.2018.03.037
Chaney, M. E., Romine, M. G., Piontkivska, H., & Tosi, A. J. (2020). Diversifying selection detected in only a minority of xenobiotic-metabolizing CYP1-3 genes among primate species. Xenobiotica, 50. https://doi.org/10.1080/00498254.2020.1785580
Comte, N., Morel, B., Hasić, D., Guéguen, L., Boussau, B., Daubin, V., Penel, S., Scornavacca, C., Gouy, M., Stamatakis, A., Tannier, E., & Parsons, D. P. (2020). Treerecs: An integrated phylogenetic tool, from sequences to reconciliations. Bioinformatics, 36(18), 4822–4824. https://doi.org/10.1093/bioinformatics/btaa615
Darling, A. C. E., Mau, B., Blattner, F. R., & Perna, N. T. (2004). Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements. Genome Research, 14(7), 1394–1403. https://doi.org/10.1101/gr.2289704
Darriba, D., Taboada, G. L., Doallo, R., & Posada, D. (2012). jModelTest 2: More models, new heuristics and parallel computing. Nature Methods, 9(8), 772.
Grau-Bové, X., & Sebé-Pedrós, A. (2021). Orthology Clusters from Gene Trees with Possvm. Molecular Biology and Evolution, 38(11), 5204–5208. https://doi.org/10.1093/molbev/msab234
Guevara, E. E., Webster, T. H., Lawler, R. R., Bradley, B. J., Greene, L. K., Ranaivonasy, J., Ratsirarson, J., Harris, R. A., Liu, Y., Murali, S., Raveendran, M., Hughes, D. S. T., Muzny, D. M., Yoder, A. D., Worley, K. C., & Rogers, J. (2021). Comparative genomic analysis of sifakas (Propithecus) reveals selection for folivory and high heterozygosity despite endangered status. Science Advances, 7(17), 1–13. https://doi.org/10.1126/sciadv.abd2274
Guindon, S., Dufayard, J.-F., Lefort, V., Anisimova, M., Hordijk, W., & Gascuel, O. (2010). New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Systematic Biology, 59(3), 307–321.
Harris, R. S. (2007). Improved Pairwise Alignment of Genomic DNA. PhD Dissertation. The Pennsylvania State University.
Hawkins, M. T. R., Culligan, R. R., Frasier, C. L., Dikow, R. B., Hagenson, R., Lei, R., & Louis, E. E. (2018). Genome sequence and population declines in the critically endangered greater bamboo lemur (Prolemur simus) and implications for conservation. BMC Genomics, 19(1), 445. https://doi.org/10.1186/s12864-018-4841-4
Huerta-Cepas, J., Dopazo, H., Dopazo, J., & Gabaldón, T. (2007). The human phylome. Genome Biology, 8(6), R109. https://doi.org/10.1186/gb-2007-8-6-r109
Huerta-Cepas, J., Serra, F., & Bork, P. (2016). ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Molecular Biology and Evolution, 33(6), 1635–1638. https://doi.org/10.1093/molbev/msw046
Hunnicutt, K. E., Tiley, G. P., Williams, R. C., Larsen, P. A., Blanco, M. B., Rasoloarison, R. M., Campbell, C. R., Zhu, K., Weisrock, D. W., Matsunami, H., & Yoder, A. D. (2020). Comparative Genomic Analysis of the Pheromone Receptor Class 1 Family (V1R) Reveals Extreme Complexity in Mouse Lemurs (Genus, Microcebus) and a Chromosomal Hotspot across Mammals. Genome Biology and Evolution, 12(1), 3562–3579. https://doi.org/10.1093/gbe/evz200
Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution, 30(4), 772–780. https://doi.org/10.1093/molbev/mst010
Lecompte, E., Crouau-Roy, B., Aujard, F., Holota, H., & Murienne, J. (2016). Complete mitochondrial genome of the gray mouse lemur, Microcebus murinus (Primates, Cheirogaleidae). Mitochondrial DNA Part A, 27(5), 3514–3516. https://doi.org/10.3109/19401736.2015.1074196
Lowe, T. M., & Eddy, S. R. (1997). tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence. Nucleic Acids Research, 25(5), 955–964. https://doi.org/10.1093/nar/25.5.955
Meyer, W. K., Venkat, A., Kermany, A. R., Van De Geijn, B., Zhang, S., & Przeworski, M. (2015). Evolutionary history inferred from the de novo assembly of a nonmodel organism, the blue-eyed black lemur. Molecular Ecology, 24(17), 4392–4405. https://doi.org/10.1111/mec.13327
Palmada-Flores, M., Orkin, J. D., Haase, B., Mountcastle, J., Bertelsen, M. F., Fedrigo, O., Kuderna, L. F. K., Jarvis, E. D., & Marques-Bonet, T. (2022). A high-quality, long-read genome assembly of the endangered ring-tailed lemur (Lemur catta). GigaScience, 11, 1–7. https://doi.org/10.1093/gigascience/giac026