WorldClim, elevation and distribution data for all palms from: The ecology of palm genomes: Repeat-associated genome size expansion is constrained by aridity
Data files
Jun 02, 2022 version; files total 14 MB
Abstract
Genome size varies 2,400-fold across plants, influencing their evolution through changes in cell size and cell division rates, which in turn affect plants’ environmental stress tolerance. Repetitive element expansion explains much genome size diversity, and the processes structuring repeat ‘communities’ are analogous to those structuring ecological communities. However, which environmental stressors influence repeat community dynamics has not yet been examined from an ecological perspective.
We measured genome size and leveraged climatic data for 91% of genera within the ecologically diverse palm family (Arecaceae). We then generated genomic repeat profiles for 141 palm species, and analysed repeats using phylogenetically-informed linear models to explore relationships between repeat dynamics and environmental factors.
We show that palm genome size and repeat ‘community’ composition are best explained by aridity. Specifically, Ty3-gypsy and TIR elements were more abundant in palm species from wetter environments, which generally had larger genomes, suggesting amplification. In contrast, Ty1-copia and LINE elements were more abundant in drier environments.
Our results suggest that water stress inhibits repeat expansion through selection on upper genome size limits. However, elements which may associate with stress-response genes (e.g., Ty1-copia) have amplified in arid-adapted palm species. Overall, we provide novel evidence of climate influencing the assembly of repeat ‘communities’.
Methods
Geographic occurrence data were collated from an existing palm distribution dataset, which contained occurrence records from GBIF (www.gbif.org; dataset https://doi.org/10.15468/dd.at82kf) and from herbarium specimens (collected from K and L). To collect data from GBIF, all palm names published at that time (March 2018, from WCVP (2020)) were searched against the GBIF taxonomic backbone, and occurrences were retrieved for the 7,469 names that matched. Occurrences were then reconciled to the list of palm names accepted at the time (WCVP, 2020) and cleaned using the GBIF coordinate issue flags and the R (R Development Core Team, 2013) package CoordinateCleaner v1.0-7 (Zizka et al., 2019). We first corrected issues such as incorrect coordinate signs, then removed coordinates falling in maritime areas, on city, province or country centroids, or at biodiversity institutions, as well as coordinates with zero values or with an uncertainty > 100 km. Finally, we removed duplicate coordinates, coordinates inconsistent with the country assignment of the record, coordinates falling outside the native distribution range of the species, and records dated before 1945.
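The simpler record-level filters described above can be sketched in a few lines. This is an illustration only, written in Python rather than R, and it is not the CoordinateCleaner implementation. The `Occurrence` fields and the `clean_occurrences` function are hypothetical names. Three choices are assumptions rather than details from the methods: a coordinate counts as "zero" when either latitude or longitude is exactly 0, records with a missing uncertainty or year are kept, and duplicates are judged per species. The spatial tests (maritime areas, centroids, institutions, country mismatch, native range) need reference geometries and are not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence record; field names are illustrative only."""
    species: str
    lat: float
    lon: float
    uncertainty_km: Optional[float]  # None when the record reports no uncertainty
    year: Optional[int]              # None when the collection year is unknown


def clean_occurrences(
    records: Iterable[Occurrence],
    max_uncertainty_km: float = 100.0,
    min_year: int = 1945,
) -> List[Occurrence]:
    """Apply the non-spatial filters: zero coordinates, coordinate
    uncertainty above the threshold, records before min_year, and
    duplicate coordinates within a species."""
    seen = set()
    kept = []
    for rec in records:
        # Assumption: a record is "zero-valued" if either coordinate is exactly 0.
        if rec.lat == 0 or rec.lon == 0:
            continue
        # Assumption: records without a reported uncertainty are retained.
        if rec.uncertainty_km is not None and rec.uncertainty_km > max_uncertainty_km:
            continue
        # Assumption: records without a year are retained.
        if rec.year is not None and rec.year < min_year:
            continue
        # Keep only the first record for each species/coordinate pair.
        key = (rec.species, rec.lat, rec.lon)
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Because the filters are applied before the duplicate check, a rejected record never blocks a later valid record at the same coordinates.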
Based on this refined occurrence dataset, we downloaded environmental data from WorldClim for all 472 species with genome size estimates using the R package raster (Hijmans & van Etten 2012). For each individual in the occurrence dataset, across all palm species, we extracted all nineteen bioclimatic variables from the WorldClim dataset (BIO1 to BIO19), which describe biologically significant measures of temperature and precipitation, as well as elevation data. From these values we calculated a ‘per-species’ mean for each variable by averaging the values across all individuals of a species.
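The per-species averaging step can be illustrated with a minimal sketch, again in Python rather than the R workflow used for the dataset. The function name `per_species_means` and the input layout (one `(species, {variable: value})` pair per occurrence) are hypothetical. Skipping missing values, such as an occurrence that falls on a raster cell with no data, is an assumption; the methods do not say how such cases were handled.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, Mapping, Optional, Tuple


def per_species_means(
    rows: Iterable[Tuple[str, Mapping[str, Optional[float]]]],
) -> Dict[str, Dict[str, float]]:
    """Average each variable (e.g. BIO1..BIO19, elevation) over all
    occurrence records of a species.

    rows holds one (species, {variable: value}) pair per occurrence.
    """
    values = defaultdict(lambda: defaultdict(list))
    for species, variables in rows:
        for name, value in variables.items():
            # Assumption: missing extractions (None) are ignored, not treated as 0.
            if value is not None:
                values[species][name].append(value)
    # A variable with no valid values for a species is omitted for that species.
    return {
        species: {name: fmean(vals) for name, vals in by_var.items()}
        for species, by_var in values.items()
    }
```

Each species contributes one row of means regardless of how many occurrence records it has, so heavily sampled species do not carry more weight in downstream models than sparsely sampled ones.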