Genetic factors predict hybrid formation in the British flora
Data files
May 22, 2023 version files: 53.01 MB
- Combined_phylogeny.txt (72.94 KB)
- ITS_alignment.txt (50.77 MB)
- Plastid_alignment.txt (2.17 MB)
- README.md (1.06 KB)
Abstract
Natural hybridization can have a profound evolutionary impact, with consequences ranging from the extinction of rare taxa to the origin of new species. Natural hybridization is particularly common in plants; however, our understanding of the general factors that promote or prevent hybridization is hampered by the highly variable outcomes in different lineages. Here, we quantify the influence of different predictors on hybrid formation across species from an entire flora. We combine estimates of hybridization with ecological attributes and a new species-level phylogeny for over 1,100 UK flowering plant species. Our results show that genetic factors, particularly parental genetic distance, as well as phylogenetic position and ploidy, are key determinants of hybrid formation, whereas many other factors such as range overlap and genus size explain much less variation in hybrid formation. Overall, intrinsic genetic factors shape the evolutionary and ecological consequences of natural hybridization across species in a flora.
Methods
We estimated phylogenetic relationships from the Barcode UK dataset, which includes a three-locus DNA barcode of rbcL, matK, and ITS2 for native flowering plant species. Because plastid and nuclear ribosomal DNA differ in sequence diversity and alignment success across flowering plants, we used a single alignment of plastid sequences to infer relationships between all taxa, while nuclear ribosomal ITS2 was aligned separately for each genus and used only to infer congeneric relationships. Plastid DNA was aligned for all taxa using Geneious, while ITS2 was aligned by genus, padded with Ns, and gapped using the program catfasta2phyml. Phylogenetic inferences were made using IQ-TREE in an analysis with three partitions, allowing models of molecular evolution to differ between loci, and including a multifurcating constraint tree based on Angiosperm Phylogeny Group IV (APGIV) relationships generated with Phylomatic. Tree support was estimated using 1000 ultrafast bootstrap replicates. The phylogeny was dated using treePL, calibrated with 30 phylogenetically assigned fossils spread across the flowering plant phylogeny.
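The idea of combining one all-taxon plastid alignment with separate per-genus ITS2 alignments can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not reproduce the behaviour of catfasta2phyml: the function name build_supermatrix, the taxon names, and the choice of N as the fill character for taxa absent from a block are all assumptions made for illustration.

```python
def build_supermatrix(blocks, fill="N"):
    """Concatenate alignment blocks into one matrix.

    blocks: list of dicts mapping taxon name -> aligned sequence.
    Sequences within a block must share one length. Taxa missing
    from a block are filled with `fill` (an assumption of this sketch).
    Returns (matrix, partitions), where partitions holds 1-based
    inclusive (start, end) column ranges, one per block.
    """
    taxa = sorted({taxon for block in blocks for taxon in block})
    rows = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for block in blocks:
        lengths = {len(seq) for seq in block.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a block must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(block.get(taxon, fill * width))
        partitions.append((start, start + width - 1))
        start += width
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, partitions


# Hypothetical toy data: one plastid block for all taxa,
# then one ITS2 block per genus.
plastid = {"Carex_a": "ACGT", "Carex_b": "ACGA", "Poa_a": "TCGA"}
its2_carex = {"Carex_a": "GG-C", "Carex_b": "GGTC"}
its2_poa = {"Poa_a": "AAT"}

matrix, partitions = build_supermatrix([plastid, its2_carex, its2_poa])
for taxon, seq in matrix.items():
    print(taxon, seq)
print(partitions)
```

In this layout, ITS2 columns carry signal only among congeners, since every other taxon holds fill characters in that block, which matches the stated intent of using ITS2 solely for congeneric relationships.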
Tree-based genetic distances from the combined sequence alignment of ITS2 and plastid data were inferred using the R function cophenetic.phylo() from the package ape, while separate pairwise distances for ITS2 and plastid DNA were calculated with the R function dist.alignment() from the seqinr package. The resulting distances (either tree-based or pairwise) were expressed as the square root of the pairwise distances. Tree manipulation took place in R; the phylogeny was coerced into a circular dendrogram for visualization, and the circular plot was made with the R package circlize. Other plots were generated with the R packages ggplot2 and lattice. All other data manipulation took place in R version 3.6.1 using base R and the packages data.table and dplyr.
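The two kinds of distance described above can be sketched in Python as follows. This is an illustration only, not the authors' R code: the function names, the toy tree, and the toy sequences are hypothetical, and skipping sites where either sequence has a gap or ambiguous base is an assumption of this sketch rather than a statement about seqinr's defaults. The first function computes a tree-based (patristic) distance as the sum of branch lengths on the path between two tips; the second computes the square root of one minus the proportion of identical sites.

```python
from math import sqrt


def patristic_distance(parent, tip_a, tip_b):
    """Sum of branch lengths on the path between two tips.

    parent maps node -> (parent_node, branch_length_to_parent);
    the root maps to (None, 0.0).
    """
    # Record the distance from tip_a to each of its ancestors.
    to_ancestor = {}
    node, dist = tip_a, 0.0
    while node is not None:
        to_ancestor[node] = dist
        up, length = parent[node]
        dist += length
        node = up
    # Climb from tip_b until reaching a shared ancestor.
    node, dist_b = tip_b, 0.0
    while node not in to_ancestor:
        up, length = parent[node]
        dist_b += length
        node = up
    return to_ancestor[node] + dist_b


def sqrt_identity_distance(seq_a, seq_b):
    """Square root of (1 - proportion of identical sites)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: compare only sites where both bases are A, C, G or T.
    sites = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        return float("nan")
    identity = sum(x == y for x, y in sites) / len(sites)
    return sqrt(1.0 - identity)


# Hypothetical toy tree: ((A:1.0,B:2.0)X:0.5,C:3.0)R;
toy_tree = {
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "X": ("R", 0.5),
    "C": ("R", 3.0),
    "R": (None, 0.0),
}
d_ab = patristic_distance(toy_tree, "A", "B")
d_ac = patristic_distance(toy_tree, "A", "C")
d_seq = sqrt_identity_distance("ACGTACGTAC", "ACGTACGTTT")
print(d_ab, d_ac, round(d_seq, 4))
```

In the toy example the two sequences are identical at 8 of 10 sites, so the pairwise distance is the square root of 0.2.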