Origin of subgenomes in the circumboreal allopolyploid carnivorous plant Drosera anglica (Droseraceae)
Data files
Mar 10, 2025 version files (61.01 MB)
- angPubRxiv.zip, 61.01 MB
- README.md, 3.32 KB
Dec 01, 2025 version files (61.97 MB)
- Drosera_anglica_Dryad_Nov2025.zip, 61.97 MB
- README.md, 4.31 KB
Abstract
Premise of Study
The parentage of one of the most widespread members of the carnivorous sundew genus Drosera, the enigmatic allopolyploid Drosera anglica, remains uncertain despite over 100 years of morphological, cytological, and more recently, molecular study.
Methods
Using transcriptomic and genomic data from 12 species and 20 populations across Drosera sect. Drosera including four D. anglica populations and a disjunct Idaho population of D. intermedia, we carried out reference-based assembly in HybPiper and phased sequences in HybPhaser. We estimated a species tree and quantified gene tree discordance; we calculated heterozygosity statistics and pairwise divergence between individuals. In addition to genome-wide analyses, we extracted and assembled rbcL and ITS reads from genome and transcriptome datasets to compare to previous Sanger sequencing data. We also generated flow cytometry data to verify the ploidy levels of D. anglica and D. rotundifolia populations.
Key Results
Sequences from phased subgenomes of D. anglica were sister to D. rotundifolia and D. linearis with high support. Both ITS and rbcL sequences of D. anglica were most similar to those of D. linearis. Drosera anglica is intermediate between its two parents in leaf shape and microhabitat; however, across D. sect. Drosera, neither leaf shape nor biogeographic distribution was a reliable indicator of phylogenetic relationships. Despite range-wide sampling, we did not find evidence for multiple origins of D. anglica. Our results differed from previous parentage analyses based on chromosome pairing and Sanger sequencing with limited taxon sampling. Additionally, we found that the Idaho population previously identified as D. intermedia is D. anglica.
Conclusions
Drosera anglica arose from allopolyploidy between D. linearis (the chloroplast donor) and D. rotundifolia. Our study demonstrates the importance of taxon sampling and of careful visualization and examination of complex phylogenomic data, and presents an exemplar for detecting and analyzing allopolyploid relationships in plant lineages in general.
This repository contains data and analysis files for Mohn, RA, Yang, Y. Origin of subgenomes in the circumboreal allopolyploid carnivorous plant Drosera anglica (Droseraceae).
Drosera_anglica_Dryad_Nov2025.zip
The full dataset
Dspat_6443_refgenes.fa
The 6443 reference genes from D. spatulata used for input to HybPiper
The genes are labeled with their Drosera spatulata names as published in Palfalvi et al., 2020 Current Biology. https://doi.org/10.1016/j.cub.2020.04.051
They are in a fasta format.
The complete list of Drosera spatulata genes was obtained here: https://www.biozentrum.uni-wuerzburg.de/carnivorom/resources/
finalAlnPreCleaning
The alignments of recovered genes after phasing of the D. anglica subgenomes and before trimming with Phyx
This zipped folder contains a separate file for each gene, named by its D. spatulata gene name. Each gene alignment file is in FASTA format and includes all samples in which that gene was recovered.
finalConcatAlign.fa
The trimmed and concatenated alignment of genes used as input for RAxML. This is in a fasta file format.
The cleaned gene alignments were concatenated by sample to enable input into RAxML to estimate a species tree.
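As an illustration of what concatenating by sample means, here is a minimal Python sketch. It is not the script used to build finalConcatAlign.fa. The function name, the toy gene and sample names, and the choice to pad missing genes with gap characters are all assumptions made for the example.

```python
def concatenate_by_sample(gene_alignments):
    """Join per-gene alignments into one sequence per sample.

    gene_alignments maps gene name -> {sample name -> aligned sequence}.
    Samples missing from a gene are padded with '-' (an assumption here).
    """
    samples = sorted({s for aln in gene_alignments.values() for s in aln})
    parts = {s: [] for s in samples}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        width = lengths.pop()
        for s in samples:
            parts[s].append(aln.get(s, "-" * width))
    return {s: "".join(chunks) for s, chunks in parts.items()}


# Hypothetical toy input: two genes, three samples.
toy = {
    "geneA": {"sample1": "ACGT", "sample2": "ACGA"},
    "geneB": {"sample1": "TT", "sample3": "TC"},
}
supermatrix = concatenate_by_sample(toy)
for name, seq in supermatrix.items():
    print(f">{name}\n{seq}")
```

Genes are joined in sorted name order so that every sample has the same column layout, which a concatenated alignment needs before it can be passed to a tree-building program.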
finalTreesPhased
The one-to-one ortholog trees from phased data used for ASTRAL tree estimation
The tree files estimated in RAxML for each gene, with only one copy of the gene per sample (or per subgenome in the case of D. anglica).
Astral_finalSpTree.tre
The ASTRAL tree from phased data
A coalescent-based species tree estimated from the gene trees in finalTreesPhased using ASTRAL
RAxML_finalConcatTree.tre
The RAxML tree from phased data
A tree estimated from the finalConcatAlign.fa file using RAxML.
DataVis_KsPlot_DistMat
A folder that contains the script, input files, and output files for everything calculated and visualized in R
D_ang_R_Dryad.R
R script that:
- makes the map figure from GBIF data
- makes the figure of % heterozygous loci vs. % allele divergence
- makes the Ks plots
- calculates the pairwise distance matrix
anglica_rot_lin_inttiled.pdf
the output of the map figure script
4_Summary_table_combined.csv
the HybPhaser output, reformatted slightly for making the figure of heterozygous loci and allele divergence
ks_files
This folder contains the ks (ks_yn) files calculated for each sample in the Ks plots
It also contains a text document that matches each file name to its species name
This is all for use with the D_ang_R_Dryad Ks plot visualization section.
KsPlot_Grid.pdf and KsPlot_Grid2.5.pdf
The output of the Ks Plot script with 2 different axes
1000_noamb
The cleaned fasta alignments of all genes used for calculating the pairwise distance between samples.
This is the input for calculating the distance matrix
1000_noamb_matrix
An output of the R script: the pairwise distance matrices for each gene in 1000_noamb
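For readers unfamiliar with per-gene pairwise distance matrices, the following Python sketch computes one from a single toy alignment. The distance measure is an assumption: this README does not state which one D_ang_R_Dryad.R uses, so the example uses a simple p-distance (proportion of differing sites) over positions where both sequences have an unambiguous base. All names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among positions where both are A/C/G/T."""
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    # None signals that no comparable positions were available.
    return differences / compared if compared else None


def pairwise_matrix(alignment):
    """Return {(sample_a, sample_b): distance} for every unordered pair."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment for one gene.
toy_alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TACTA"}
distances = pairwise_matrix(toy_alignment)
for pair, value in distances.items():
    print(pair, value)
```

Skipping gap positions pair by pair keeps samples with missing data comparable, at the cost of each pair being measured over a slightly different set of sites.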
1000_noamb_dist_mat.csv
All the gene pairwise matrices flattened into one table
Each column is a gene
Each row is the distance between two samples
1000_noamb_mean_dist_mat.csv
the mean pairwise distance across all genes
1000_noamb_median_dist_mat.csv
the median pairwise distance across all genes
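The relationship between the flattened table and the mean and median summaries can be sketched in Python as below. The CSV layout is a guess made for illustration: the `pair` column name, the `NA` marker for missing values, and the toy numbers are assumptions, not the actual contents of 1000_noamb_dist_mat.csv.

```python
import csv
import io
from statistics import mean, median

# Hypothetical flattened table: one row per sample pair, one column per gene.
toy_csv = """pair,geneA,geneB,geneC
s1|s2,0.10,0.20,0.60
s1|s3,0.30,NA,0.50
"""


def summarise_rows(handle, id_column="pair", missing=("", "NA")):
    """Return {pair: (mean, median)} across gene columns, skipping missing values."""
    summary = {}
    for row in csv.DictReader(handle):
        pair = row.pop(id_column)
        values = [float(v) for v in row.values() if v not in missing]
        summary[pair] = (mean(values), median(values)) if values else (None, None)
    return summary


summary = summarise_rows(io.StringIO(toy_csv))
for pair, (avg, med) in summary.items():
    print(pair, avg, med)
```

To run this on a real file, replace `io.StringIO(toy_csv)` with an open file handle and adjust `id_column` and `missing` to match the actual header and missing-value marker.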
rbcL_and_ITS
This folder contains the alignments for rbcL and ITS.
Full_rbcL_alignment.txt
The alignment of all rbcL samples longer than 300 bp
Final_rbcL_alignment.txt
The alignment of the final rbcL samples presented after filtering
Full_rRNA_variants.txt
The alignment of the ITS and rRNA for all samples.
rRNA_alignment.vcf
The variants called against MT784099.1 ITS from Drosera rotundifolia
FrozenScripts
Folder containing the two frozen copies of the phylogenomic_dataset_construction scripts used in this analysis, from https://bitbucket.org/yanglab/phylogenomic_dataset_construction/src/master/
v1 was used for the cleaning of RM207A, RM208A, RM209A, RM210A, RM214, RM217, RM218, RM240, RM241, RM242, and for transcriptome assembly for all transcriptomes.
v2 was used for cleaning the remaining samples and for post-assembly processing.
Changes after Mar 10, 2025:
Nov 2025: Added the folder rbcL_and_ITS and frozen copies of scripts used in analyses.
