Evolution of repetitive genomic content and gene families over geo-climatic gradients in Brassicaceae
Data files
Nov 26, 2025 version files 35.55 KB
- filter_gbif_downloads.R (20.82 KB)
- gbif_download_links.xlsx (12.64 KB)
- README.md (1.90 KB)
- thin_gbif_datasets.R (191 B)
Abstract
Along temperature gradients such as elevation or latitude, species turnover is common and specialists can persist in extreme environments. This is likely paralleled by adaptive and possibly also non-adaptive changes at the molecular level, from genes to the structure of genomes. Here we used comparative genomics in the plant family Brassicaceae to investigate associations of elevation and latitude, partly represented by climate variables, with features of the genome, including genome size, transposable element (TE) content, and gene family expansion and contraction. Together, the geo-climatic variables were good predictors of TE content and genome size, explaining 40-60% or more of the variation among species. The relationship between mean annual temperature and TE content was U-shaped, with species of cooler and hotter climates generally having more TEs. The relationships with elevation and mean annual precipitation (both corrected for temperature) were positive. Patterns were most pronounced for the most abundant TE class, long terminal repeat elements (LTR). Gene family expansions and contractions in species of high elevations highlighted a restructured genomic architecture regarding cell wall modeling, the response to temperature stimulus and processes involved in posttranslational protein modifications. The results suggest that abiotically extreme environments either favor high TE contents or constrain TE silencing at the level of species. Furthermore, establishment in distinct geo-climatic regions seems to be associated with considerable parallel evolution, with overlapping gene families changing copy numbers.
Dataset DOI: 10.5061/dryad.79cnp5j8z
Description of the data and file structure
Because GBIF entries are released under different licenses, I provide here the download links for the datasets used, together with the R scripts used to filter and thin the data.
- gbif_download_links.xlsx: Links used to download the data from GBIF.
- filter_gbif_downloads.R: R script used to filter the files downloaded from GBIF.
- thin_gbif_datasets.R: R script used to thin the data.
Abbreviations:
| Species | Abbreviation |
|---|---|
| A. alpina | aalp |
| A. thaliana | atha |
| A. arenosa | aareno |
| A. halleri | ahal |
| A. lyrata | alyr |
| A. arenicola | aare |
| A. caerulea | acae |
| A. ciliata | acil |
| B. divaricarpa | bdiv |
| B. puberula | bpub |
| B. retrofracta | bret |
| B. stricta | bstricta |
| B. vulgaris | bvul |
| C. himalaica | chim |
| C. hirsuta | chir |
| C. planisiliqua | cpla |
| C. rubella | crub |
| C. resedifolia | cres |
| E. heterophyllum | ehet |
| E. syriacum | esyr |
| E. yunnanense | eyun |
| E. salsugineum | esal |
| K. saxatilis | ksax |
| L. africanum | lafr |
| L. aucheri | lauc |
| M. erraticum | merr |
| N. brachypetala | nbra |
| N. rotundifolia | nrot |
| P. turrita | ptur |
| R. bulbosa | rbul |
| S. altissimum | salt |
| S. irio | siri |
| T. glabra | tgla |
| T. parvula | tpar |
Access information
Other publicly accessible locations of the data:
- GBIF
Data was derived from the following sources:
- GBIF
The median elevation, median latitude, median mean annual temperature (MAT) and mean annual precipitation (MAP) were calculated per species from the GBIF entries for each species, using the WorldClim database (Fick & Hijmans, 2017) (Tab. S7). GBIF entries were first filtered manually for the correct subspecies and for the species’ native distribution range according to Plants of the World Online (2024). Sampling points were then thinned so that all retained entries were at least 5 km apart, using the function “geoThin” from the package “enmSdmX” (Smith, 2022) in R v4.3.0. For A. thaliana and C. hirsuta, a random 10% of all GBIF entries was drawn before thinning, because these species had too many entries for the thinning step to run with reasonable memory use (original and thinned files will be deposited on Dryad after acceptance).
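The subsampling, thinning and per-species summary steps described above can be illustrated with a minimal base-R sketch. It is not the deposited script and does not call “geoThin”; a simple greedy distance filter based on the haversine formula stands in for it, so the retained points may differ from those produced by enmSdmX. The function names (`haversine_km`, `thin_greedy`, `subsample_fraction`), the column names (`lon`, `lat`, `elevation`, `mat`, `map`) and the toy coordinates are all assumptions made for illustration, and taking the median of MAP is likewise an assumption.

```r
# Great-circle distance in km between one point and a vector of points
# (haversine formula, spherical Earth with radius 6371 km).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Greedy thinning (stand-in for enmSdmX::geoThin, not its algorithm):
# a record is kept only if it lies at least min_dist_km away from
# every record that has already been kept.
thin_greedy <- function(occ, min_dist_km = 5) {
  keep <- integer(0)
  for (i in seq_len(nrow(occ))) {
    far_enough <- length(keep) == 0 ||
      all(haversine_km(occ$lon[i], occ$lat[i],
                       occ$lon[keep], occ$lat[keep]) >= min_dist_km)
    if (far_enough) keep <- c(keep, i)
  }
  occ[keep, , drop = FALSE]
}

# Random subsample of a fraction of the rows, for species with very
# many records (10% in the description above). The seed is hypothetical.
subsample_fraction <- function(occ, fraction = 0.1, seed = 1) {
  set.seed(seed)
  n <- max(1, floor(nrow(occ) * fraction))
  occ[sample(nrow(occ), n), , drop = FALSE]
}

# Toy occurrence table (invented values, for illustration only).
occ <- data.frame(
  species   = "atha",
  lon       = c(8.00, 8.01, 8.50, 9.00),
  lat       = c(47.00, 47.00, 47.00, 47.00),
  elevation = c(400, 410, 600, 800),    # m
  mat       = c(9.0, 8.9, 8.0, 7.0),    # degrees C
  map       = c(1000, 1010, 1100, 1200) # mm
)

# Thin to a minimum spacing of 5 km, then summarise per species.
thinned <- thin_greedy(occ, min_dist_km = 5)
summary_row <- data.frame(
  species          = thinned$species[1],
  n_records        = nrow(thinned),
  median_elevation = median(thinned$elevation),
  median_latitude  = median(thinned$lat),
  median_mat       = median(thinned$mat),
  median_map       = median(thinned$map)
)
print(summary_row)
```

In this toy example the second record lies less than 1 km from the first and is dropped, so three records remain. For a large table, `subsample_fraction(occ, 0.1)` would be applied before `thin_greedy()`, mirroring the order described for A. thaliana and C. hirsuta.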
