Evolution of the Sabulina verna group (Caryophyllaceae) in Europe: A deep split, followed by secondary contacts, multiple allopolyploidization and colonization of challenging substrates
Data files
Version of Nov 14, 2023; files total 13.55 MB
Abstract
One of the major goals of contemporary evolutionary biology is to elucidate the relative roles of allopatric differentiation, ecological differentiation, and polyploidy in speciation. In this study, we address the taxonomically intricate Sabulina verna group, which has a disjunct Arctic–alpine postglacial range in Europe and occupies a broad range of ecological niches, including substrates toxic to plants. Using genome-wide ddRAD sequencing combined with morphometric analyses based on extensive sampling of 111 natural populations, we aimed to disentangle internal evolutionary relationships and examine their correspondence with the pronounced edaphic and ploidy diversity within the group. We identified two spatially distinct groups of diploids: a widespread Arctic–alpine group and a spatially restricted yet diverse Balkan group. Most tetraploids exhibited a considerably admixed ancestry derived from both of these groups, suggesting an allopolyploid origin. Within the diploid Arctic–alpine group, we recognized four genetic clusters that are congruent with geography and mostly supported by morphological traits. The tetraploids split into two distinct, geographically vicariant groups, indicating a repeated, polytopic origin. Furthermore, our results revealed at least five parallel colonizations of toxic (serpentine and metalliferous) substrates. Altogether, these findings demonstrate a complex interplay between geography, challenging substrates, and polyploidy in the evolution of the group. Finally, we propose a new taxonomic treatment of this intricate complex.
README: Evolution of the Sabulina verna group (Caryophyllaceae) in Europe: a deep split, followed by secondary contacts, multiple allopolyploidization and colonization of challenging substrates
V. Lipanova, 10.8.2023
In this study, we address the taxonomically intricate Sabulina verna group, which has a disjunct Arctic–alpine postglacial range in Europe and occupies a broad range of ecological niches, including substrates toxic to plants. Using genome-wide ddRAD sequencing combined with morphometric analyses based on extensive sampling of 111 natural populations, we aimed to disentangle internal evolutionary relationships and examine their correspondence with the pronounced edaphic and ploidy diversity within the group.
Description of the data and file structure
VCF file of 111 populations of Sabulina verna:
fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel_var.vcf.gz
Filtration pipeline:
Hard filtering (using the thresholds recommended by GATK, except for mapping quality: MQ > 20 is used here, whereas GATK recommends MQ 40)
bcftools filter -i 'FS<60.0 && SOR<3 && MQ>20.0 && MQRankSum>-12.5 && QD>2.0 && ReadPosRankSum>-8.0' -Oz fylo_var.join.vcf.filtered.snp_sagout.vcf.gz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.vcf.gz
bcftools view -H fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.vcf.gz | wc -l # 135 350 variants
Keep only variants with a mean per-sample read depth (AVG(FMT/DP)) > 6
bcftools view -i 'AVG(FMT/DP)>6' -Oz fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.vcf.gz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.vcf.gz
bcftools view -H fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.vcf.gz | wc -l # 49 728 variants
Remove multiallelic and monomorphic SNPs
bcftools filter -e 'AC==0 || AC==AN' fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.vcf.gz | bcftools view -m2 -M2 -v snps -Oz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.vcf.gz
bcftools view -H fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.vcf.gz | wc -l # 37 821 variants
Remove variants with more than 50% missing genotypes (F_MISSING > 0.5)
bcftools filter -e 'F_MISSING > 0.5' -Oz fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.vcf.gz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.vcf.gz
bcftools view -H fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.vcf.gz | wc -l # 34 613 variants; 14.18% missing data
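bcftools defines F_MISSING as the fraction of samples with a missing genotype at a site. The awk sketch below recomputes that fraction from plain-text VCF records to make the 0.5 cutoff concrete. It rests on two simplifying assumptions of ours, not the authors': GT is the first FORMAT key, and a genotype counts as missing when it starts with "." (e.g. `./.` or `./././.`). The function name `f_missing` is hypothetical.

```shell
#!/bin/sh
# f_missing: read uncompressed VCF text on stdin and, for every variant,
# print CHROM, POS, the fraction of samples with a missing genotype, and
# whether a F_MISSING > 0.5 filter would keep or drop the site.
# ASSUMPTIONS: GT is the first FORMAT key; a GT starting with "." is missing.
f_missing() {
    awk -F'\t' -v max_miss=0.5 '
        /^#/ { next }                      # skip meta-information and header lines
        {
            n = NF - 9                     # sample columns start at field 10
            miss = 0
            for (i = 10; i <= NF; i++) {
                gt = $i
                sub(/:.*/, "", gt)         # keep only the GT part of the sample field
                if (gt ~ /^\./) miss++
            }
            frac = (n > 0) ? miss / n : 0
            status = (frac > max_miss) ? "drop" : "keep"
            printf "%s\t%s\t%.3f\t%s\n", $1, $2, frac, status
        }'
}
```

For example, `bcftools view fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.vcf.gz | f_missing | head` would list the per-site fractions for the input of this step; bcftools itself remains the authoritative implementation.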
Check summary statistics, including per-sample counts (-s -)
bcftools stats -s - fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.vcf.gz > bcftools_stats_fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.txt
Remove individuals with > 50% missing genotypes, individuals with an average DP < 4, and, in populations represented by several individuals, those with the lowest coverage
bcftools view -s ^M_cae_152_07,M_pau_031_0J,M_aut_156_02,M_at1_111_04,M_at1_118_06,M_at1_119_04,M_at1_121_07,M_cae_029_0E,M_cae_029_0B,M_cae_060_07,M_cae_060_09,M_cae_062_03,M_cae_063_10,M_cae_063_13,M_cae_064_08,M_cae_064_13,M_cae_074_03,M_cae_074_09,M_cae_075_07,M_cae_075_09,M_cae_076_12,M_cae_076_04,M_cor_151_03,M_cor_151_06,M_ger_095_01,M_ger_096_02,M_ger_096_06,M_ger_126_01,M_ger_126_05,M_ger_127_01,M_ger_127_08,M_ger_128_01,M_ger_128_02,M_ger_129_06,M_ger_140_0C,M_ger_140_0D,M_sme_153_01,M_sme_153_04,M_sme_154_01,M_sme_154_11,M_sme_155_02,M_tyA_093_07,M_ver_077_06,M_ver_077_12,M_ver_142_08 -Oz fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.vcf.gz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel.vcf.gz
13.26% missing data after removing the individuals
Remove multiallelic and monomorphic SNPs again
bcftools filter -e 'AC==0 || AC==AN' fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel.vcf.gz | bcftools view -m2 -M2 -v snps -Oz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel_var.vcf.gz
bcftools view -H fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel_var.vcf.gz | wc -l # 31 225 variants
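The variant counts recorded after each step can be turned into per-step retention rates. This is a minimal POSIX shell sketch that uses only the counts reported in this README; the helper name `retained` and the step labels are ours.

```shell
#!/bin/sh
# Per-step retention of variants in the filtration pipeline.
# Counts are the `bcftools view -H ... | wc -l` results reported in this README.

# retained BEFORE AFTER: print the percentage of BEFORE that survives a step
retained() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", 100 * after / before }'
}

prev=""
for step in "hard filtering:135350" \
            "mean DP > 6:49728" \
            "biallelic, polymorphic:37821" \
            "F_MISSING <= 0.5:34613" \
            "samples removed, re-filtered:31225"; do
    name=${step%%:*}    # text before the colon
    count=${step##*:}   # number after the colon
    if [ -n "$prev" ]; then
        printf '%-30s %7s  (%s%% of previous step)\n' "$name" "$count" "$(retained "$prev" "$count")"
    else
        printf '%-30s %7s\n' "$name" "$count"
    fi
    prev=$count
done
printf 'overall: %s%% of hard-filtered variants retained\n' "$(retained 135350 31225)"
```

Run as-is, it shows that the mean-depth filter removes the largest share (about 63% of the hard-filtered variants) and that about 23% of them reach the final file. No count was recorded between removing individuals and re-filtering, so the last step's loss most likely reflects sites that became monomorphic once those individuals were dropped.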
Final check
bcftools stats -s - fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel_var.vcf.gz > bcftools_stats_fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel_var.txt
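The individual-level criteria used in this pipeline (more than 50% missing genotypes, average DP < 4) can be checked against the per-sample (PSC) section of a `bcftools stats -s -` report such as the two produced here. The sketch below assumes the PSC column layout of recent bcftools releases (sample in column 3, nRefHom/nNonRefHom/nHets in 4-6, average depth in 10, nMissing in 14); verify this against the `# PSC` header line of your own report. The function name `psc_summary` is hypothetical. The overall percentage it prints (missing genotypes over all sample-by-site cells) is one plausible definition; the README does not say how the 14.18% and 13.26% figures were computed.

```shell
#!/bin/sh
# psc_summary: read a `bcftools stats -s -` report on stdin and print, per sample,
# the missing-genotype fraction, the average depth and FLAG/ok according to the
# README criteria; finish with the overall missing-genotype percentage.
# ASSUMPTION: recent bcftools PSC layout ($3 sample, $4-$6 genotype counts,
# $10 average depth, $14 nMissing); haploid counts ($12, $13) are ignored.
psc_summary() {
    awk -F'\t' -v max_miss=0.5 -v min_dp=4 '
        $1 == "PSC" {
            called = $4 + $5 + $6          # sites with a called genotype
            total  = called + $14          # plus sites with a missing genotype
            miss   = (total > 0) ? $14 / total : 0
            flag   = (miss > max_miss || $10 < min_dp) ? "FLAG" : "ok"
            printf "%s\t%.4f\t%.1f\t%s\n", $3, miss, $10, flag
            sum_miss += $14
            sum_total += total
        }
        END {
            if (sum_total > 0)
                printf "overall_missing_pct\t%.2f\n", 100 * sum_miss / sum_total
        }'
}
```

Usage: `psc_summary < bcftools_stats_fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.txt`. The population-level rule (keeping the best-covered individual per population) needs a sample-to-population mapping and is not covered by this sketch.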
Sharing/Access information
Manuscript submitted to Molecular Phylogenetics and Evolution; links to GenBank for the FASTQ files are provided in the manuscript.