Data from: Evolutionary diversity in tropical tree communities peaks at intermediate precipitation
Neves, Danilo M. et al. (2020), Data from: Evolutionary diversity in tropical tree communities peaks at intermediate precipitation, Dryad, Dataset, https://doi.org/10.5061/dryad.gf1vhhmk0
Global patterns of species and evolutionary diversity in plants are primarily determined by a temperature gradient, but precipitation gradients may be more important within the tropics, where plant species richness is positively associated with the amount of rainfall. The impact of precipitation on the distribution of evolutionary diversity, however, is largely unexplored. Here we detail how evolutionary diversity varies along precipitation gradients by bringing together a comprehensive database on the composition of angiosperm tree communities across lowland tropical South America (2,025 inventories from wet to arid biomes), and a new, large-scale phylogenetic hypothesis for the genera that occur in these ecosystems. We find a marked reduction in the evolutionary diversity of communities at low precipitation. However, unlike species richness, evolutionary diversity does not continually increase with rainfall. Rather, our results show that the greatest evolutionary diversity is found in intermediate precipitation regimes, and that there is a decline in evolutionary diversity above 1,490 mm of mean annual rainfall. If conservation is to prioritise evolutionary diversity, areas of intermediate precipitation that are found in the South American ‘arc of deforestation’, but which have been neglected in the design of protected area networks in the tropics, merit increased conservation attention.
We constructed a genus-level phylogeny comprising 1,100 angiosperm genera found in lowland tropical South America, following protocols developed by Dexter & Chave [1]. We used two chloroplast DNA gene regions: rbcL and matK. These genes were chosen based on their universality, data availability, typical sequence quality, degree of genus-level discrimination and sequencing costs, and because they are recommended for standard DNA barcoding in plants [2-4]. We generated 198 novel rbcL and 264 novel matK sequences from leaf fragments collected during extensive fieldwork across South America (all sequences generated in this study have an associated voucher; see Appendix 3 for a list of accession numbers and voucher information). Further sequences were obtained from GenBank (http://www.ncbi.nlm.nih.gov/), restricted to accessions that had an associated voucher. We had both rbcL and matK sequences for 808 genera (73%), only rbcL for 128 genera (12%) and only matK for 163 genera (15%). Where a genus lacked sequences for one of the two regions, that region was left as missing data. Exploratory sequence alignments and phylogenetic reconstructions enabled us to exclude sequences that were likely to represent taxonomic misidentifications. The details of DNA extraction, PCR, and DNA sequencing protocols can be found in Gonzalez et al. [5]. A list of sampled genera, their respective families and GenBank accession numbers is available in Appendix 3.
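The treatment of genera that have only one of the two regions can be illustrated with a minimal sketch. It assumes each aligned region is held in a plain dictionary keyed by genus and that `?` marks missing data; the function name, the toy sequences and the genus-to-sequence assignments are hypothetical and are not taken from the authors' pipeline.

```python
def concatenate_regions(rbcl: dict[str, str], matk: dict[str, str],
                        missing: str = "?") -> dict[str, str]:
    """Join two aligned regions per genus, padding an absent region."""
    # All sequences within a region are assumed to share one aligned length.
    rbcl_len = len(next(iter(rbcl.values())))
    matk_len = len(next(iter(matk.values())))
    combined = {}
    for genus in sorted(set(rbcl) | set(matk)):
        left = rbcl.get(genus, missing * rbcl_len)    # pad if rbcL is absent
        right = matk.get(genus, missing * matk_len)   # pad if matK is absent
        combined[genus] = left + right
    return combined


# Toy alignments with made-up sequences, for illustration only.
rbcl = {"Inga": "ATGT", "Swartzia": "ATGC"}
matk = {"Inga": "GGA", "Cedrela": "GGT"}
matrix = concatenate_regions(rbcl, matk)
for genus, seq in matrix.items():
    print(genus, seq)
```

In this toy case the genus with only matK receives four `?` characters in the rbcL block, and the genus with only rbcL receives three in the matK block, so every row of the combined matrix has the same length.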
We conducted multiple sequence alignments, separately for each region, using MAFFT v.6.822 [6], followed by manual adjustments in Mesquite (http://mesquiteproject.org). After manual alignment, we reduced remaining alignment issues by removing all sites that had missing data for >99% of genera. All rbcL and matK sequences were then combined to generate a starting maximum likelihood tree using RAxML v.7.2.7 in the CIPRES Science Gateway (https://www.phylo.org). A topological constraint specifying the major relationships among angiosperm orders was imposed based on the Angiosperm Phylogeny Group [7]. The early-branching angiosperm Nymphaea alba L. (Nymphaeaceae) was specified as an outgroup. This initial phylogeny was made ultrametric using the nonparametric rate smoothing method [8] implemented in the ape package [9] in the R statistical environment [10], and then used as a starting tree in a Bayesian Markov Chain Monte Carlo (MCMC) approach to simultaneously estimate tree topology and divergence times of taxa [11]. These analyses were performed using the BEAST software v.1.8.2 on the CIPRES server (https://www.phylo.org). An uncorrelated lognormal relaxed molecular clock was implemented, and the tree prior was a Birth-Death Incomplete Sampling model of speciation [12]. We used 86 previously compiled fossil-based age constraints to calibrate node ages [13,14]. Internal nodes were constrained using a log-normal distribution with a mean value equal to the fossil age, a standard deviation of 2 and a hard constraint for a minimum age equal to 80% of the estimated fossil age. No constraints were placed on the root age of the tree. We optimized operator settings before conducting the final runs by using a preliminary tree in test runs of 10^6 generations.
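The site-filtering step (dropping alignment columns with missing data for more than 99% of genera) can be sketched as follows. This is a minimal illustration only: which characters count as missing (`?`, `-` and `N` here) is an assumption, and the function name and toy alignment are hypothetical.

```python
def drop_sparse_sites(alignment: dict[str, str], max_missing: float = 0.99,
                      missing_chars: str = "?-N") -> dict[str, str]:
    """Remove columns whose fraction of missing entries exceeds max_missing."""
    names = list(alignment)
    n_taxa = len(names)
    n_sites = len(alignment[names[0]])
    keep = []
    for col in range(n_sites):
        # Count taxa with a missing-data symbol at this column (assumed set).
        n_missing = sum(alignment[name][col].upper() in missing_chars
                        for name in names)
        if n_missing / n_taxa <= max_missing:
            keep.append(col)
    return {name: "".join(alignment[name][c] for c in keep) for name in names}


# Toy alignment of three taxa; the third column is missing everywhere.
toy = {"A": "AC-T", "B": "A?-T", "C": "G?-A"}
filtered = drop_sparse_sites(toy)                      # 99% threshold
strict = drop_sparse_sites(toy, max_missing=0.5)       # stricter, for contrast
```

With only three taxa the 99% threshold removes just the all-missing column; the stricter 50% threshold is shown only to make the effect of the parameter visible.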
We carried out three independent MCMC runs for 70.2 × 10^5, 80.3 × 10^5 and 58.6 × 10^5 generations, under the same estimation conditions. We excluded burn-ins of 10^3 and 2 × 10^3 generations for the first two and third runs, respectively. We used LogCombiner to combine the three independent runs before sampling 282 trees evenly spaced across the posterior distribution, which were used to assemble a consensus tree. The consensus tree was assembled following the all-compatible consensus rule; i.e., we chose the topology of a given node based on the relationship found in a plurality of trees from across the posterior distribution. This ensures a fully bifurcating topology that represents the most probable relationships of taxa. Lastly, we used TreeAnnotator (http://beast.bio.ed.ac.uk/treeannotator) to assign branch lengths and divergence times (node heights) as the mean values from across the posterior distribution.
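Drawing a fixed number of trees evenly spaced across a combined posterior sample can be sketched as an index calculation. This is an illustration rather than the LogCombiner procedure itself: the function name is hypothetical, and the pool size of 1,000 trees is a placeholder, since the number of trees in the combined posterior is not stated here.

```python
def evenly_spaced_indices(n_trees: int, n_keep: int) -> list[int]:
    """Pick n_keep indices spread evenly over range(n_trees)."""
    if not 0 < n_keep <= n_trees:
        raise ValueError("n_keep must be between 1 and n_trees")
    step = n_trees / n_keep
    # Split the sample into n_keep equal-width bins and take each midpoint.
    return [int(step * (i + 0.5)) for i in range(n_keep)]


print(evenly_spaced_indices(10, 5))   # [1, 3, 5, 7, 9]

# Placeholder pool of 1,000 post-burn-in trees, thinned to 282.
picked = evenly_spaced_indices(1000, 282)
print(len(picked), picked[0], picked[-1])
```

Taking bin midpoints avoids always including the very first tree after burn-in; taking bin starts instead would be an equally valid convention.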
We also generated a species-level phylogeny, using the genus-level phylogeny as a basis. This consisted of imputing all 8,174 species in the floristics dataset by simulating a random birth-death phylogeny for each genus, using a speciation rate of 1 and an extinction rate of 0.9. We conducted these simulations with the TreeSim package [15] in R [10]. This procedure produced a fully bifurcating phylogeny for each genus, with the number of tips (i.e. species) corresponding to the number of species in the genus in our dataset. Importantly, we retained the stem age of the genus as estimated using our temporally calibrated phylogeny, while the crown age of genera was a product of the simulation (the mean expectation under a coalescent process is a crown age half that of the stem age, but there is variability around this expectation). The result of our approach is that the phylogenetic diversity represented by an individual species is proportional to the stem age of the genus divided by the species richness of the genus.
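The closing proportionality (per-species phylogenetic diversity scales with genus stem age divided by genus species richness) can be made concrete with a small sketch. The genus names, stem ages and richness values below are hypothetical, and the function returns only a relative index, not the branch lengths produced by the birth-death simulation.

```python
def relative_pd_per_species(stem_ages: dict[str, float],
                            richness: dict[str, int]) -> dict[str, float]:
    """Relative per-species PD index: genus stem age / genus species richness."""
    return {genus: stem_ages[genus] / richness[genus] for genus in stem_ages}


# Two hypothetical genera with the same stem age (in Myr) but different richness.
stem_ages = {"GenusA": 40.0, "GenusB": 40.0}
richness = {"GenusA": 1, "GenusB": 80}
shares = relative_pd_per_species(stem_ages, richness)
print(shares)   # {'GenusA': 40.0, 'GenusB': 0.5}
```

Under this index a species in a monotypic genus carries the whole stem age of its genus, whereas each species in a species-rich genus of the same age carries a much smaller share.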
1. Dexter, K. G. & Chave, J. Evolutionary patterns of range size, abundance and species richness in Amazonian trees. PeerJ, 2043v1 (2016).
2. CBOL Plant Working Group. A DNA barcode for land plants. Proc. Natl. Acad. Sci. USA 106, 12794-12797 (2009).
3. Kress, W. J. & Erickson, D. L. DNA barcodes: methods and protocols. Methods Mol. Biol. 858, 3-8 (2012).
4. Kress, W. J., Lopez, I. C. & Erickson, D. L. Generating plant DNA barcodes for trees in long-term forest dynamics plots. Methods Mol. Biol. 858, 441-458 (2012).
5. Gonzalez, M. A. et al. Identification of Amazonian trees with DNA barcodes. PLoS One 4, e7483 (2009).
6. Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059-3066 (2002).
7. The Angiosperm Phylogeny Group. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc. 181, 1-20 (2016).
8. Sanderson, M. J. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Mol. Biol. Evol. 19, 101-109 (2002).
9. Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289-290 (2004).
10. R Core Team. R: a language and environment for statistical computing. Version 3.1.0. R Foundation for Statistical Computing, Vienna (2016). Available at: http://www.R-project.org/
11. Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007).
12. Stadler, T. On incomplete sampling under birth-death models and connections to the sampling-based coalescent. J. Theor. Biol. 261, 58-66 (2009).
13. Baker, T. R. et al. Fast demographic traits promote high diversification rates of Amazonian trees. Ecol. Lett. 17, 527-536 (2014).
14. Magallon, S., Gomez-Acevedo, S., Sanchez-Reyes, L. L. & Hernandez-Hernandez, T. A metacalibrated time-tree documents the early rise of flowering plant phylogenetic diversity. New Phytol. 207, 437-453 (2015).
15. Stadler, T. TreeSim: Simulating Phylogenetic Trees. R package version 2.3 (2017). Available at: https://CRAN.R-project.org/package=TreeSim
Genus_phylogeny.tre - Time-calibrated molecular phylogeny of 1,100 angiosperm genera found in lowland tree communities of tropical South America. Phylogenetic reconstruction based on sequences of rbcL and matK plastid regions from plants collected during fieldwork or available in GenBank. Tree topology and divergence times of taxa were estimated using a Bayesian Markov Chain Monte Carlo approach. Branch lengths were time-scaled using a relaxed molecular clock with fossil-based age constraints implemented on nodes.
Genus_multiphylogeny.tre - Set of 100 time-calibrated molecular phylogenies from a Bayesian Markov Chain Monte Carlo (MCMC) posterior distribution. Each phylogeny comprises 1,100 angiosperm genera found in lowland tree communities of tropical South America. Phylogenetic reconstruction based on sequences of rbcL and matK plastid regions from plants collected during fieldwork or available in GenBank. Tree topology and divergence times of taxa were estimated using a Bayesian MCMC approach. Branch lengths were time-scaled using a relaxed molecular clock with fossil-based age constraints implemented on nodes.
Species_phylogeny.tre - Phylogeny of angiosperm species found in lowland tree communities of tropical South America, generated using the genus-level phylogeny as a basis (see Genus_phylogeny.tre above). This consisted of pruning the genus-level phylogeny to the 852 genera in the community dataset (http://www.neotroptree.info/), and then imputing all 8,174 species in this dataset by simulating a random birth-death phylogeny for each genus, using a speciation rate of 1 and an extinction rate of 0.9.
Appendix 3.xlsx - List of 1,100 angiosperm genera used in the phylogenetic reconstructions, their respective sources of rbcL and matK sequences, GenBank accession numbers, eDNA numbers (i.e., internal codes for sequences generated at the Royal Botanic Garden Edinburgh), collectors, collector numbers and herbaria where vouchers were deposited (when applicable). RBG = Royal Botanic Garden Edinburgh. Asterisk indicates the 852 genera used in the community phylogenetic analyses.
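A quick sanity check on any of the tree files above is to count tip labels and compare the count with the stated number of genera or species. The sketch below assumes the tree is available as a plain Newick string, which is not confirmed here; if the `.tre` files are in NEXUS format, a dedicated reader such as `read.nexus` in the R package ape would be needed instead. The function name and the toy tree are hypothetical, and quoted labels containing punctuation are not handled.

```python
import re


def newick_tip_labels(newick: str) -> list[str]:
    """Extract tip labels from a simple Newick string (no quoted labels)."""
    # Drop bracketed comments or annotations such as [&rate=0.1].
    stripped = re.sub(r"\[[^\]]*\]", "", newick)
    # A tip label directly follows "(" or "," and ends at ":", ",", ")" or ";".
    # Internal node labels follow ")" and are therefore not matched.
    return [m.strip() for m in re.findall(r"[(,]\s*([^(),:;]+)", stripped)]


# Toy tree with made-up branch lengths, for illustration only.
toy_tree = "((Inga:1.0,Swartzia:1.0):2.0,Nymphaea:3.0);"
tips = newick_tip_labels(toy_tree)
print(len(tips), tips)   # 3 ['Inga', 'Swartzia', 'Nymphaea']
```

Applied to the contents of Genus_phylogeny.tre, a count of 1,100 would match the description above, provided the file really is a single Newick tree.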
Natural Environment Research Council/UK, Award: NE/I028122/1
Conselho Nacional de Desenvolvimento Científico e Tecnológico/Brazil, Award: SISBIOTA 563084/2010-3
Conselho Nacional de Desenvolvimento Científico e Tecnológico/Brazil, Award: 236805/2012-6 PDE CsF
Conselho Nacional de Desenvolvimento Científico e Tecnológico/Brazil, Award: 305617/2018-4
National Science Foundation/USA, Award: DEB-1556651
Mohamed bin Zayed Species Conservation Fund, Award: 12053537
Leverhulme Trust/UK, Award: International Academic Fellowship
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior/Brazil, Award: Program "Visiting Senior Professor in the Amazon"