Chromosomal evolution, environmental heterogeneity, and migration drive spatial patterns of species richness in Calochortus (Liliaceae)
Data files
Version of Nov 22, 2024 (17.40 MB total):
- Calchortus_NuclearLociAlignments.zip (4.09 MB)
- Calochortus_Liliaceae_5PlastidGeneAlignments.zip (29.62 KB)
- Calochortus_OnExUltrametric.tre (2.69 KB)
- CalochortusPlastomeOneEx_varsPlus_forDating.fasta (13.27 MB)
- README.md (5.78 KB)
Abstract
We used nuclear genomic data and statistical models to evaluate the ecological and evolutionary processes shaping spatial variation in species richness in Calochortus (Liliaceae, 74 spp.). Calochortus occupies diverse habitats in the western United States and Mexico and has a center of diversity in the California Floristic Province, a region marked by multiple orogenies, winter rainfall, and highly divergent climates and substrates (including serpentine). We used sequences of 294 low-copy nuclear loci to produce a time-calibrated phylogeny, estimate historical biogeography, and test hypotheses regarding drivers of present-day spatial patterns in species number. Because speciation and species coexistence require reproductive isolation and ecological divergence, we examined the roles of chromosome number, environmental heterogeneity, and migration in shaping local species richness. Six major clades, which inhabit different geographic/climatic areas and often have different base chromosome numbers (n = 6–10), began diverging from each other ~10.3 million years ago. As predicted, local species number increased significantly with local heterogeneity in chromosome number, elevation, soil characteristics, and serpentine presence. Species richness is greatest in the Transverse/Peninsular Ranges, where clades with different chromosome numbers overlap, topographic complexity provides diverse conditions over short distances, and several physiographic provinces meet, allowing immigration by several clades. Recently diverged sister-species pairs generally have peripatric distributions, and maximum geographic overlap between species increases over the first million years since divergence, suggesting that chromosomal evolution, genetic divergence leading to gametic isolation or hybrid inviability/sterility, and/or ecological divergence over small spatial scales may permit species co-occurrence.
README: Chromosomal evolution, environmental heterogeneity, and migration drive spatial patterns of species richness in Calochortus (Liliaceae)
https://doi.org/10.5061/dryad.kwh70rzbw
This repository contains the molecular data used for the phylogenetic analyses of Calochortus (Liliaceae) conducted in Karimi et al. 2022 (https://doi.org/10.1073/pnas.2305). The nuclear data were derived from a custom-bait anchored hybrid enrichment set, which yielded 294 low-copy nuclear loci. Alignments for all Calochortus samples and outgroup species are provided here as standard FASTA alignments. In addition, for the BEAST dated phylogeny, we extracted five plastid genes from our Calochortus reads and aligned them to plastid data originally extracted from https://doi.org/10.1002/ajb2.1178; these alignments are also provided as standard FASTA alignments. Finally, we provide the maximum clade credibility (MCC) tree from the BEAST analyses, which was used in all downstream analyses.
Description of the data and file structure
Alignments of low-copy nuclear loci derived from custom-designed anchored hybrid enrichment baits
| Calchortus_NuclearLociAlignments.zip |
- We performed custom probe design, anchored hybrid enrichment, and sequencing as outlined in the manuscript. For each locus, we aligned sequences in MAFFT v7.023 and then trimmed/masked the alignments with an automated procedure using default parameters (i.e., MINGOODSITES=14, PROPGOOD=0.5). We verified alignment quality by inspecting the resulting alignments in Geneious. Raw reads are available from the Sequence Read Archive; the sequence alignments are provided here.
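The README does not define the two masking parameters, so the sketch below is only an illustration under an assumed reading: a column counts as "conserved" when its most common base occurs in more than PROPGOOD of all sequences, and within each sliding window a sequence is masked with N unless at least MINGOODSITES of its bases match the majority base at conserved columns. The 20-bp window width and the function names `conserved_sites` and `mask_alignment` are hypothetical, and this is not the published procedure's code.

```python
from collections import Counter

MINGOODSITES = 14  # value reported in the README
PROPGOOD = 0.5     # value reported in the README
WINDOW = 20        # hypothetical sliding-window width (assumption, not in the README)


def conserved_sites(seqs, prop_good=PROPGOOD):
    """Map column index -> majority base for columns whose most common base
    (gaps and Ns ignored) occurs in more than prop_good of ALL sequences."""
    n_seqs = len(seqs)
    conserved = {}
    for col in range(len(seqs[0])):
        bases = [s[col] for s in seqs if s[col] not in "-N"]
        if not bases:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / n_seqs > prop_good:
            conserved[col] = base
    return conserved


def mask_alignment(seqs, min_good=MINGOODSITES, window=WINDOW, prop_good=PROPGOOD):
    """Return a copy of an uppercase alignment in which, for each sequence, every
    window with fewer than min_good bases matching the majority base at conserved
    columns is replaced by 'N' (gap characters are left unchanged)."""
    conserved = conserved_sites(seqs, prop_good)
    length = len(seqs[0])
    width = min(window, length)
    masked = []
    for seq in seqs:
        chars = list(seq)
        for start in range(length - width + 1):
            good = sum(1 for col in range(start, start + width)
                       if conserved.get(col) == seq[col])
            if good < min_good:
                for col in range(start, start + width):
                    if chars[col] != "-":
                        chars[col] = "N"
        masked.append("".join(chars))
    return masked


if __name__ == "__main__":
    toy = ["ACGT" * 5] * 3 + ["TTTT" * 5]  # three concordant rows, one divergent row
    for row in mask_alignment(toy):
        print(row)
```

On the toy alignment, the three concordant sequences pass unchanged and the divergent one is fully masked, because only 5 of its 20 bases match the majority base.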
Alignments of 5 plastid genes for initial dating analysis
| Calochortus_Liliaceae_5PlastidGeneAlignments.zip |
- We used GetOrganelle for de novo assembly of a Calochortus venustus plastome from Illumina short reads and Oxford Nanopore long reads, and used this plastome as the reference for subsequent reference-based assemblies. Raw reads were trimmed using Trimmomatic ver. 0.40, with ILLUMINACLIP settings of 2:30:10 (seed mismatches, palindrome clip threshold, and simple clip threshold, respectively) for adaptor removal, and quality-trimmed using a 5-bp sliding window with a minimum average Phred score of 20. Reads were mapped against the reference plastome using the BWA mem algorithm. The resulting BAM files were sorted, PCR duplicates were removed, SNPs were phased, and the files were merged using samtools ver. 1.3. We then extracted five plastid genes (atpB, psaA, psbD, rbcL, rps4) from the assembled plastomes using custom BLAST searches and aligned the resulting gene sequences using MAFFT.
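The quality-trimming rule (5-bp window, minimum average Phred score of 20) can be illustrated with a simplified sketch. It assumes Phred+33 quality encoding and scans from the 5' end, cutting the read at the start of the first window whose mean quality falls below the threshold. Trimmomatic's own SLIDINGWINDOW step differs in details of exactly where the cut is placed, so treat this as an illustration of the idea rather than a reimplementation; the function names are hypothetical.

```python
def phred_scores(qual_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_string]


def sliding_window_trim(seq, qual_string, window=5, min_mean=20):
    """Simplified sliding-window trimming: scan 5' to 3' and cut the read at the
    start of the first window whose mean Phred score is below min_mean.
    Reads shorter than one window are returned unchanged."""
    quals = phred_scores(qual_string)
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return seq[:start], qual_string[:start]
    return seq, qual_string


read = "ACGTACGTAC"
qual = "I" * 6 + "#" * 4  # Phred 40 for six bases, then Phred 2 for four bases
print(sliding_window_trim(read, qual))  # ('ACGT', 'IIII')
```

The window starting at position 4 is the first with a mean below 20 ((40 + 40 + 2 + 2 + 2) / 5 = 17.2), so the read is cut to its first four bases.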
The plastome phylogeny was used as a constraint for input into BEAST. We used BEAST 2.6.6 (97) to calibrate the ASTRAL-III phylogeny against time, employing five plastid genes (atpB, psaA, psbD, rbcL, rps4) extracted from assembled plastomes using custom BLAST (98). Given the lack of fossil calibration points in Liliales, we spliced our Calochortus tree into the across-monocots chronogram (99), pruned to Liliales using ape v5.6 (100). The branching topology of Calochortus and closely related Liliaceae s.s., Tricyrtis, and Streptopus was modified to match that of Lu et al. (96), which is based on complete plastomes. We enforced prior calibrations at crown Liliales (110.53 My) and crown Liliaceae (42.74 My), each with a normal distribution and a standard deviation (SD) of 2 My, based on the 95% CI reported in (99). The birth–death tree prior was applied with an uncorrelated relaxed lognormal clock and a GTR substitution model. Site model parameters included four gamma categories, a gamma shape parameter of 1.0, an estimated proportion of invariant sites, and empirical base frequencies. We conducted two independent runs, each with an MCMC chain length of 100,000,000 generations, sampled every 10,000 generations. A maximum clade credibility tree was inferred after a 50% burn-in using TreeAnnotator in BEAST 2.6.6. We repeated this process using the nuclear phylogeny as a constraint and reran the dating analysis with additional secondary priors assigned to the Calochortus stem and crown from the plastome chronogram (SI Appendix, Fig. S4). Because the resulting CIs for the primary geographic clades overlapped between the nuclear and plastome chronograms, subsequent analyses used only the nuclear tree.
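As a quick check on what these settings imply, the sketch below uses Python's standard-library `statistics.NormalDist` to compute the central 95% interval of each normal calibration prior (mean ± 1.96 SD), plus the number of logged samples kept per run after a 50% burn-in. It is illustrative arithmetic only, not a BEAST configuration; the helper names are hypothetical.

```python
from statistics import NormalDist

# Calibration priors stated in the text: node -> (mean age in My, SD in My)
CALIBRATIONS = {"crown Liliales": (110.53, 2.0), "crown Liliaceae": (42.74, 2.0)}


def central_interval(mean, sd, mass=0.95):
    """Bounds of the central `mass` probability interval of a normal prior."""
    dist = NormalDist(mu=mean, sigma=sd)
    tail = (1 - mass) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


def retained_samples(chain_length, log_every, burnin_frac=0.5):
    """Logged samples kept per run after discarding the burn-in fraction.
    BEAST also logs state 0, so a real log file holds one extra sample."""
    n_samples = chain_length // log_every
    return n_samples - int(n_samples * burnin_frac)


for node, (mean, sd) in CALIBRATIONS.items():
    lo, hi = central_interval(mean, sd)
    print(f"{node}: {lo:.2f} to {hi:.2f} My")
print(retained_samples(100_000_000, 10_000), "samples kept per run")
```

Running it prints 106.61 to 114.45 My for crown Liliales, 38.82 to 46.66 My for crown Liliaceae, and 5000 samples kept per run.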
| CalochortusPlastomeOneEx_varsPlus_forDating.fasta |
Ultrametric nuclear tree (maximum clade credibility (MCC) tree derived from BEAST analyses) used in BAMM, BioGeoBEARS, and Pianka overlap analyses
| Calochortus_OnExUltrametric.tre |
- We used BEAST 2.6.6 to calibrate the ASTRAL-III nuclear phylogeny (inferred from dataset 1) against time, first using plastid data (dataset 2). We enforced prior calibrations at crown Liliales (110.53 My) and crown Liliaceae (42.74 My), each with a normal distribution and a standard deviation of 2 My, based on the 95% confidence intervals of the across-monocots chronogram used for calibration. The birth–death tree prior was applied with an uncorrelated relaxed lognormal clock and a GTR substitution model. Site model parameters included four gamma categories, a gamma shape parameter of 1.0, an estimated proportion of invariant sites, and empirical base frequencies. We conducted two independent runs, each with an MCMC chain length of 100,000,000 generations, sampled every 10,000 generations. A maximum clade credibility tree was inferred after a 50% burn-in using TreeAnnotator in BEAST 2.6.6. We repeated this process using the nuclear phylogeny as a constraint and reran the dating analysis with additional secondary priors assigned to the Calochortus stem and crown from the plastome chronogram.
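Downstream tools such as BAMM expect an ultrametric tree, in which every root-to-tip path has the same length. The dependency-free sketch below checks this property. It assumes Calochortus_OnExUltrametric.tre holds a plain Newick string with branch lengths; it does not handle NEXUS wrappers, quoted labels, or bracketed annotations such as those TreeAnnotator can write, so a full tree library (e.g., Biopython's `Bio.Phylo`) is the safer choice for real use. The function names and the tolerance are illustrative choices.

```python
def parse_newick(text):
    """Parse a plain Newick string into nested (name, branch_length, children) tuples.
    Assumes no quoted labels, comments, or [&...] annotations."""
    text = "".join(text.split()).rstrip(";")  # drop whitespace and the final ';'
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            pos += 1  # consume ')'
        start = pos
        while pos < len(text) and text[pos] not in ":,()":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ':'
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return parse_node()


def tip_depths(node, depth=0.0):
    """Return (tip name, root-to-tip path length) pairs for every tip."""
    name, length, children = node
    depth += length
    if not children:
        return [(name, depth)]
    return [tip for child in children for tip in tip_depths(child, depth)]


def is_ultrametric(newick, rel_tol=1e-4):
    """True if all root-to-tip distances agree within a relative tolerance
    (loose enough to absorb rounding of printed branch lengths)."""
    depths = [d for _, d in tip_depths(parse_newick(newick))]
    return max(depths) - min(depths) <= rel_tol * max(depths)


print(is_ultrametric("((A:1.0,B:1.0):0.5,C:1.5);"))  # True
print(is_ultrametric("((A:1.0,B:2.0):0.5,C:1.5);"))  # False
```

To check the deposited tree, pass the contents of the file (e.g., `open("Calochortus_OnExUltrametric.tre").read()`) to `is_ultrametric`; if parsing fails, the file is likely NEXUS or annotated and a full library should be used instead.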
Sharing/Access information
Data (monocot plastid sequences used for the dating analysis) were derived from the following source: https://doi.org/10.1002/ajb2.1178
Methods
We used custom-designed baits for anchored hybrid enrichment, preparing Illumina sequence libraries for 158 samples (156 Calochortus and 2 outgroups). Our final dataset consisted of 294 low-copy nuclear loci (provided here as alignments). Specific methods for generating these data are as follows:
Sampling. We included one or two samples per species and subspecies; herbarium vouchers for new samples were deposited in the herbaria listed in SI Appendix, Table S9. Total genomic DNA was extracted from silica-dried leaf or floral tissue using DNeasy plant kits (Qiagen, Valencia CA) following the manufacturer's instructions. We included all extant Calochortus species except the extremely rare C. rustvoldii [known from only two sites/pixels in the western Transverse Ranges (76)]; including it would have had little effect on our analyses.
Library Preparation. We prepared Illumina sequence libraries for 158 samples (156 Calochortus and 2 outgroups) following Lemmon et al. (77–79). We used a Covaris ultrasonicator (with reduced time for degraded samples) to fragment DNA to 140–400 bp, performed end repair and A-tailing, ligated common Illumina adapters onto the template DNA ends using a Beckman Coulter FXp liquid-handling robot, and performed indexing PCR.
AHE Probe Design. Following Hamilton et al. (80) and Banker et al. (81), we developed hybrid enrichment probes for Lobelia and Lilium. We mapped sequences from two assembled transcriptomes (Lobelia siphilitica, from E. Carpenter; Lilium superbum, from J. Leebens-Mack and C. dePamphilis) to probe sequences from 27 references of the Angiosperm V1 AHE design (82, 83). Mapped transcriptome sequences were aligned to the Angiosperm V1 reference sequences using MAFFT v7.023b (84). We used Geneious R9 (85) to visually inspect alignments, remove transcriptome sequences that were not clearly homologous, and trim transcriptome sequences to the exons represented by the 27 reference sequences. Probes were tiled uniformly at 2.7× density. The resulting probe set (Angiosperm V2 AHE) contains 29 references (57,471 probes), including the transcriptomes representing Lobelia and Lilium.
To improve phylogenetic resolution within Calochortus, we used data from two additional species and expanded the targets into regions flanking the exons in Angiosperm V2 AHE. We first collected whole-genome sequence data (Illumina paired-end 200-bp protocol) for Calochortus albus (160M reads) and C. flexuosus (536M reads). We then used methods and scripts from Banker et al. (83) to identify loci and design probes. After merging overlapping reads following (86), we mapped the merged reads to probe sequences from Angiosperm V2 AHE, using Liliaceae as a reference. After extending consensus sequences into flanking regions using iterative mapping (80, 81), we aligned the resulting consensus sequences to the reference locus by locus, using Geneious to visually inspect alignments and trim poorly aligned regions from alignment ends. We removed 27 of the 517 alignments to ensure that target loci did not overlap. After masking repetitive regions (80), the final alignments contained 375,527 sites. Tiling 120-bp probes at 4.5× density for both references produced 16,600 probes. We used this probe set (AHE Cal1) to produce an Agilent Technologies Custom SureSelect XT kit for hybrid DNA enrichment.
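Tiling density can be read as the average number of probes covering each target base: with 120-bp probes at 4.5×, a new probe starts roughly every 120 / 4.5 ≈ 26.7 bp. The sketch below places probe start positions along a single target region under that interpretation. The function name, the rounding, and the choice to end the last probe exactly at the region's end are assumptions, and the sketch is not expected to reproduce the 16,600-probe total, which depends on the actual per-reference target lengths.

```python
import math


def tile_probes(region_length, probe_length=120, density=4.5):
    """Return 0-based probe start positions tiled across one target region so that
    interior bases are covered by about `density` probes (hypothetical scheme)."""
    if region_length <= probe_length:
        return [0]  # one probe covers (or overhangs) a short region
    span = region_length - probe_length  # range of possible start positions
    # Nominal step is probe_length / density (~26.7 bp for 120-bp probes at 4.5x).
    n_probes = math.ceil(span * density / probe_length) + 1
    spacing = span / (n_probes - 1)  # spread starts evenly; last probe ends at region end
    return [round(i * spacing) for i in range(n_probes)]


starts = tile_probes(1000)
print(len(starts), starts[:3], starts[-1])
```

For a hypothetical 1,000-bp region this yields 34 probes starting at 0, 27, 53, and so on, with the last probe starting at position 880.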
Library Enrichment and Sequencing. We pooled indexed libraries in groups of 16 before enriching them with the probe kit described above. Before sequencing, we pooled the enriched libraries and assessed their quality by Kapa qPCR (Roche). We sequenced samples on an Illumina NovaSeq 6000 at Florida State using a paired-end 150-bp (PE-150) protocol and 8-bp dual indexing. After filtering poor-quality reads with the Illumina CASAVA v1.8 high-chastity filter and demultiplexing, we obtained an average of 5.4M read pairs per sample (~1.6 Gb).
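The ~1.6 Gb figure follows from the read counts: each read pair contributes two 150-bp reads. A short check, assuming full-length reads before any trimming:

```python
read_pairs = 5.4e6    # average read pairs per sample (from the text)
read_length_bp = 150  # PE-150 protocol: each mate is 150 bp
bases = read_pairs * 2 * read_length_bp  # two mates per pair
print(f"{bases / 1e9:.2f} Gb per sample")  # 1.62 Gb per sample
```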
Nuclear Assemblies and Alignment. We processed reads, corrected sequencing errors, and trimmed adaptors following (80, 81, 86). Assembly used a quasi-de novo approach, mapping reads to the probe-region sequences for both Calochortus species in the Cal1 design. We used the resulting consensus sequences with ≥98× coverage to determine orthology (80), then formed orthologous clusters and removed clusters containing <50% of the individuals, leaving 294 loci for analysis. For each locus, we aligned sequences in MAFFT v7.023 and then trimmed/masked alignments with an automated procedure (80) using default parameters (i.e., MINGOODSITES=14, PROPGOOD=0.5). We verified alignment quality by inspecting the alignments in Geneious. Upon manuscript acceptance, accession codes will be provided for raw reads in the Sequence Read Archive and for sequences and alignments in GenBank.
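The occupancy filter (dropping orthologous clusters recovered in fewer than 50% of individuals) is a simple threshold on per-locus sample counts; with 158 samples, a locus must be present in at least 79 of them. A minimal sketch, assuming clusters are held as a hypothetical mapping from locus ID to the set of sample IDs recovered for that locus (the layout, IDs, and function name are illustrative):

```python
def filter_by_occupancy(clusters, n_individuals, min_fraction=0.5):
    """Keep loci recovered in at least `min_fraction` of all individuals.

    clusters: dict mapping locus ID -> set of sample IDs with a sequence
    (hypothetical layout used only for this illustration).
    """
    min_count = min_fraction * n_individuals
    return {locus: samples for locus, samples in clusters.items()
            if len(samples) >= min_count}


clusters = {
    "L001": {"s1", "s2", "s3"},
    "L002": {"s1"},
    "L003": {"s2", "s4"},
}
kept = filter_by_occupancy(clusters, n_individuals=4)
print(sorted(kept))  # ['L001', 'L003']
```

Loci present in exactly 50% of individuals are kept, matching the rule that only clusters with <50% occupancy were removed.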