Dryad

Evolution of the Sabulina verna group (Caryophyllaceae) in Europe: A deep split, followed by secondary contacts, multiple allopolyploidization and colonization of challenging substrates

Cite this dataset

Lipánová, Veronika et al. (2023). Evolution of the Sabulina verna group (Caryophyllaceae) in Europe: A deep split, followed by secondary contacts, multiple allopolyploidization and colonization of challenging substrates [Dataset]. Dryad. https://doi.org/10.5061/dryad.0p2ngf26b

Abstract

One of the major goals of contemporary evolutionary biology is to elucidate the relative roles of allopatric and ecological differentiation and polyploidy in speciation. In this study, we address the taxonomically intricate Sabulina verna group, which has a disjunct Arctic–alpine postglacial range in Europe and occupies a broad range of ecological niches, including substrates toxic to plants. Using genome-wide ddRAD sequencing combined with morphometric analyses based on extensive sampling of 111 natural populations, we aimed to disentangle internal evolutionary relationships and examine their correspondence with the pronounced edaphic and ploidy diversity within the group. We identified two spatially distinct groups of diploids: a widespread Arctic–alpine group and a spatially restricted yet diverse Balkan group. Most tetraploids exhibited a considerably admixed ancestry derived from both these groups, suggesting their allopolyploid origin. Four genetic clusters in congruence with geography and mostly supported by morphological traits were recognized in the diploid Arctic–alpine group. Tetraploids are split into two distinct and geographically vicariant groups, indicating their repeated polytopic origin. Furthermore, our results also revealed at least five-fold parallel colonization of toxic substrates (serpentine and metalliferous), altogether demonstrating a complex interaction between geography, challenging substrates, and polyploidy in the evolution of the group. Finally, we propose a new taxonomic treatment of this intricate complex.

README: Evolution of the Sabulina verna group (Caryophyllaceae) in Europe: a deep split, followed by secondary contacts, multiple allopolyploidization and colonization of challenging substrates

V.Lipanova 10.8.2023


In this study, we address the taxonomically intricate Sabulina verna group, which has a disjunct Arctic–alpine postglacial range in Europe and occupies a broad range of ecological niches, including substrates toxic to plants. Using genome-wide ddRAD sequencing combined with morphometric analyses based on extensive sampling of 111 natural populations, we aimed to disentangle internal evolutionary relationships and examine their correspondence with the pronounced edaphic and ploidy diversity within the group.

Description of the data and file structure

VCF file with SNPs from 111 populations of the Sabulina verna group:

fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel_var.vcf.gz

filtration pipeline:

  1. hard filtering (according to thresholds recommended by GATK)
    bcftools filter -i 'FS<60.0 && SOR<3 && MQ>20.0 && MQRankSum>-12.5 && QD>2.0 && ReadPosRankSum>-8.0' -Oz fylo_var.join.vcf.filtered.snp_sagout.vcf.gz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.vcf.gz
    bcftools view -H fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.vcf.gz | wc -l #135 350

  2. keep only variants with mean read depth (DP) across samples > 6
    bcftools view -i 'AVG(FMT/DP)>6' -Oz fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.vcf.gz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.vcf.gz
    bcftools view -H fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.vcf.gz | wc -l #49 728

  3. remove multiallelic SNPs and monomorphic SNPs
    bcftools filter -e 'AC==0 || AC==AN' fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.vcf.gz | bcftools view -m2 -M2 -v snps -Oz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.vcf.gz
    bcftools view -H fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.vcf.gz | wc -l #37 821

  4. remove variants with more than 50% missing genotypes
    bcftools filter -e 'F_MISSING > 0.5' -Oz fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.vcf.gz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.vcf.gz
    bcftools view -H fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.vcf.gz | wc -l #34 613 variants, 14.18% missing data
    check
    bcftools stats -s - fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.vcf.gz > bcftools_stats_fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.txt

  5. remove individuals with more than 50% missing genotypes, individuals with average DP < 4, and surplus individuals from the same population (those with the lowest coverage)
    bcftools view -s ^M_cae_152_07,M_pau_031_0J,M_aut_156_02,M_at1_111_04,M_at1_118_06,M_at1_119_04,M_at1_121_07,M_cae_029_0E,M_cae_029_0B,M_cae_060_07,M_cae_060_09,M_cae_062_03,M_cae_063_10,M_cae_063_13,M_cae_064_08,M_cae_064_13,M_cae_074_03,M_cae_074_09,M_cae_075_07,M_cae_075_09,M_cae_076_12,M_cae_076_04,M_cor_151_03,M_cor_151_06,M_ger_095_01,M_ger_096_02,M_ger_096_06,M_ger_126_01,M_ger_126_05,M_ger_127_01,M_ger_127_08,M_ger_128_01,M_ger_128_02,M_ger_129_06,M_ger_140_0C,M_ger_140_0D,M_sme_153_01,M_sme_153_04,M_sme_154_01,M_sme_154_11,M_sme_155_02,M_tyA_093_07,M_ver_077_06,M_ver_077_12,M_ver_142_08 -Oz fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.vcf.gz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel.vcf.gz
    13.26% missing data

  6. after removing the individuals, again remove multiallelic and monomorphic SNPs
    bcftools filter -e 'AC==0 || AC==AN' fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel.vcf.gz | bcftools view -m2 -M2 -v snps -Oz -o fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel_var.vcf.gz
    bcftools view -H fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel_var.vcf.gz | wc -l #31225
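
Step 5 lists the removed individuals but not how they were identified. As a minimal sketch, assuming genotypes are exported per sample with `bcftools query`, the fraction of missing genotypes per individual could be tallied as below. The function name `missing_per_sample` and the usage line are illustrative assumptions, not the authors' script.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original pipeline): reads
# "sample<TAB>genotype" lines on stdin and prints, for each sample,
# the fraction of genotypes that are fully missing.
missing_per_sample() {
    awk -F '\t' '
        { total[$1]++ }
        # a genotype counts as missing when every allele is "."
        # (covers ".", "./." and tetraploid "./././.")
        $2 ~ "^[.]([/|][.])*$" { miss[$1]++ }
        END {
            for (s in total)
                printf "%s\t%.4f\n", s, miss[s] / total[s]
        }
    ' | sort
}

# Assumed usage on the step 4 output (requires bcftools); keeps samples above 0.5:
# bcftools query -f '[%SAMPLE\t%GT\n]' fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5.vcf.gz | missing_per_sample | awk '$2 > 0.5'
```

Partially missing genotypes (for example `0/.`) are counted as present in this sketch; a stricter definition would need a different pattern.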

final check
bcftools stats -s - fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel_var.vcf.gz > bcftools_stats_fylo_var.join.vcf.filtered.snp_sagout.hardfilteredMQ20.DP6.var.biallelic.permiss0.5_sel_var.txt
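
The variant counts noted in the pipeline comments can be put side by side to see how much each filter removes. The sketch below only re-uses those reported counts, taking the count after hard filtering (step 1) as the baseline; the function name `retained_pct` is illustrative.

```shell
#!/bin/sh
# Illustrative helper: percentage of variants retained relative to a baseline.
retained_pct() {
    # $1 = baseline count, $2 = count after filtering
    awk -v base="$1" -v kept="$2" 'BEGIN { printf "%.2f\n", 100 * kept / base }'
}

# Counts as reported in the pipeline comments (steps 1, 2, 3, 4 and 6).
base=135350
for n in 135350 49728 37821 34613 31225; do
    printf '%s\t%s%%\n' "$n" "$(retained_pct "$base" "$n")"
done
```

With these counts, roughly 23% of the hard-filtered variants remain in the final file, and the depth filter in step 2 accounts for the largest single drop.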

Sharing/Access information

Manuscript submitted to Molecular Phylogenetics and Evolution; links to the FASTQ files in GenBank are available in the manuscript.