Hybridization and polyploidy shaped the evolutionary history of a complex of cryptic species in European woodrushes (Luzula sect. Luzula)
Data files
Oct 02, 2025 version; files total 10.07 GB
01.Diploids.zip: 165.55 MB
02.MixedPloidy.zip: 9.87 GB
03.Tetraploids.zip: 33.01 MB
README.md: 37.57 KB
Abstract
Luzula sect. Luzula (Juncaceae) is a taxonomically intricate group characterized by widespread polyploidy, agmatoploidy, and high morphological similarity. Focusing on the Eastern Alps, a key center of its diversity, we collected 1,002 samples of nine species and applied an integrative framework combining ddRADseq, plastid sequencing, relative genome size estimation, and chromosome counting to disentangle its evolutionary history. We first reconstructed phylogenetic relationships and assessed gene flow among diploids (dataset 01.Diploids), establishing a baseline for investigating the origin of polyploids. By analyzing patterns of genotype frequencies (dataset 03.Tetraploids) and genetic affinities to diploids (dataset 02.MixedPloidy), we inferred the most likely parental species of polyploids and identified key hybridization events shaping the current taxonomic and karyotypic diversity within this group. Our results reveal weak genetic differentiation among some diploid lineages, likely reflecting gene flow and incomplete lineage sorting. We propose a common allopolyploid origin of two tetraploids, which subsequently gave rise to a third tetraploid and a hexaploid species through interploidy hybridization. Although the parental species of some polyploids remain obscure, our genomic data highlight polyploidy and hybridization as major drivers of speciation in this poorly understood lineage. This study underscores the value of integrative approaches in resolving reticulate plant phylogenies and advances our understanding of polyploid speciation.
Data archive for: Hybridization and Polyploidy shaped the Evolutionary History of a Complex of Cryptic Species in European Woodrushes (Luzula sect. Luzula)
Authors: Valentin Heimer, Pau Carnicero, Carolina Carrizo García, Andreas Hilpold, Jasna Dolenc Koce, J. Luis Leal, Mingai Li, Claudio Varotto, Peter Schönswetter, Božo Frajman
Year: 2025
Contact: Valentin Heimer, valentin.heimer@uibk.ac.at
This data archive contains genomic data derived from ddRADseq and Sanger sequencing of plastid regions. For each analysis, the most relevant input and results files that should allow replication of the workflow are included. Scripts used for these analyses are organized in a corresponding structure available on Zenodo (https://doi.org/10.5281/zenodo.15719018). Demultiplexed ddRADseq reads are available from NCBI under BioProjects PRJNA1313421 and PRJNA1225458.
In the present work, we analyze a large-scale (n = 1,002) genomic dataset of European species of Luzula sect. Luzula to disentangle their evolutionary history. We first reconstructed phylogenetic relationships and assessed gene flow among diploids, establishing a baseline for investigating the origin of polyploids. By analyzing patterns of genotype frequencies and genetic affinities to diploids, we then inferred the most likely parental species of polyploids.
Description of the Data and file structure
Summary of data
Data in this repository are structured in three folders (01.Diploids.zip, 02.MixedPloidy.zip, 03.Tetraploids.zip) corresponding to the different datasets analyzed in the manuscript. Each of these folders comprises separate subfolders for every analysis performed, which contain the most relevant input and results files.
Usage notes
The different data types included in this repository are listed below, along with brief descriptions for how to work with them.
Descriptions of the FASTA and VCF file formats can be found on the SAMtools file-format specifications page.
.bed: Plain text files containing genomic coordinates in BED format. Various genomic analysis tools (e.g., BCFtools, VCFtools, SAMtools) can use BED format files to include or exclude specific genomic regions in analyses. BEDtools is a useful program for working with BED format files. Coordinates in BED files are 0-based and half-open.
.csv: A CSV (comma-separated values) file is a plain text file used to store tabular data, such as spreadsheets or databases, with each row representing a record and each value within a row separated by a comma. Can be opened with software such as LibreOffice Calc, OpenOffice Calc, Microsoft Excel or imported into R using the read.csv function.
.fa: DNA sequence data stored in the plain-text FASTA format.
.input: Plain text file containing the input for STRUCTURE analyses. The first column lists the individual ID and subsequent columns represent different genomic positions. Each individual is represented by one line per haplotype and alleles are shown as 0 or 1. Missing data are encoded as -9. These files can be edited in any text editor.
.list: Simple text file used by GATK that defines a list of genomic intervals (e.g., chromosomes or specific regions) that can be used to specify a subset of data for analysis, parallelization, or excluding certain regions. They usually follow the format <chr>:<start>-<stop> and can be viewed and edited in any text editor, such as vim or nano. More detailed information on the format is provided by the Broad Institute.
.log: Computer-generated text files that record all activities, operations, errors, and events that occur within a system or application. In this context, they were produced by the software SNAPP and document properties of the Markov-Chain-Monte-Carlo (MCMC) process, including effective sample size and mixing of chains. They are used to assess convergence of chains and sufficient sampling of the posterior parameter space. Log files produced by SNAPP can be analyzed in Tracer.
.nex: A Nexus file is a modular, extensible data format used primarily in phylogenetics. Nexus files always begin with a fixed header #NEXUS followed by multiple blocks. Each block starts with BEGIN block_name; and ends with END;. Blocks can contain taxa names, genomic sequences, phylogenetic trees, distances, character sets and more. Nexus files can be edited in any text editor (e.g., Linux less command, Nano, Vim, or Text Editor).
.newick: NEWICK is a text-based format for representing phylogenetic trees in computer-readable form using (nested) parentheses and commas. The phylogenetic tree is usually written on a single line as nested parentheses describing the relationships among the taxa in the tree, terminated by a semicolon. NEWICK files can be edited in any text editor and graphically displayed in software like FigTree, TreeViewer, and drawtree.
.phy: PHYLIP is a plain text format for multiple sequence alignments. It contains two sections: a single-line header describing the dimensions of the alignment, followed by the multiple sequence alignment itself. The header contains two integers (n and m) separated by one or more spaces. The first integer (n) specifies the number of sequences (i.e., the number of rows) in the alignment. The second integer (m) specifies the length of the sequences (i.e., the number of columns) in the alignment. PHYLIP files can be viewed and edited in software like AliView and MEGA.
.tbi: Tabix index files are binary files created from a compressed and position-sorted VCF file (.vcf.gz). They provide fast, region-based access to the data within a large VCF file, allowing users to quickly locate specific variants in a genomic region without decompressing the entire file. Tabix index files can be created using software like bcftools, tabix and GATK.
.treefile: The maximum-likelihood tree produced by IQ-TREE in NEWICK format, which can be visualized in any tree viewer that supports this format, such as FigTree or iTOL.
.tsv: Tab-separated values files, a type of plain text file where fields are separated by tab characters. They can be opened with any text editor or read into R using the read.table function with the argument sep="\t".
.txt: Plain text files, readable or editable with any text editor (e.g., Linux less command, Nano, Vim, or Text Editor).
.vcf.gz: Compressed Variant Call Format (VCF) files, generated using bgzip. These contain detailed information about genetic variants, including their position relative to the reference genome, quality metrics, and individual genotype data. They can be examined, modified, and analyzed with tools such as BCFtools. Header lines in these files begin with #.
.xml: XML files define the data, models, and parameters for an analysis performed by the BEAST software (Bayesian Evolutionary Analysis by Sampling Trees). This XML file serves as the input for the BEAST program, which uses Markov chain Monte Carlo (MCMC) methods to perform phylogenetic analyses. Here, they describe the input data and parameters for a SNAPP analysis, implemented in BEAST. They can be opened and edited using any text editor and graphically displayed in BEAUti.
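As a small illustration of the coordinate conventions mentioned for .bed and .list files, the following Python sketch converts a BED record (0-based, half-open) into an interval string of the form <chr>:<start>-<stop>. It assumes that the interval string uses 1-based, inclusive coordinates, as GATK does; the function name and the example record are hypothetical and not part of this archive.

```python
def bed_to_interval(bed_line: str) -> str:
    """Convert one BED record (0-based, half-open) to '<chr>:<start>-<stop>' (1-based, inclusive)."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if end <= start:
        raise ValueError(f"empty or inverted BED interval: {bed_line!r}")
    # Only the start shifts by one: the half-open BED end equals the inclusive 1-based stop.
    return f"{chrom}:{start + 1}-{end}"


# Hypothetical record covering the first 1,000 bases of a sequence named chr1.
print(bed_to_interval("chr1\t0\t1000"))  # chr1:1-1000
```

Additional BED columns (name, score, strand) are ignored by this sketch.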
01.Diploids.zip
01.IQTREE
LUZALP_Core_Diploids_clean_R50_minDP8_R50_mac3.min4.phy.varsites.phy: Alignment used to build a maximum likelihood phylogenetic tree in IQ-TREE2. The file contains genotypes for diploid Luzula accessions produced in STACKS that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 8 and a minimum minor allele count (MAC) of 3 that are present in at least 50% of samples (R50). The vcf file was converted to phylip format using the script vcf2phylip.py.
LUZALP_Core_Diploids_clean_R50_minDP8_R50_mac3.min4.phy.varsites.phy.treefile: Best scoring maximum likelihood tree based on 27,664 SNPs produced in IQ-TREE2 under the TVM+F+ASC+R6 substitution model using 1000 ultrafast bootstrap replicates, ascertainment bias correction, and correction for overestimating node support.
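Before rerunning a tree search, it can be useful to confirm that an alignment matches the dimensions announced in its PHYLIP header. The sketch below is a minimal check, assuming a sequential PHYLIP layout with one sequence per line and the name separated from the sequence by whitespace; whether the alignment in this folder follows exactly that layout is an assumption. The toy alignment and its sample names are hypothetical.

```python
def read_phylip(text: str) -> dict[str, str]:
    """Parse a sequential PHYLIP alignment and verify it against its header."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_seqs, n_sites = (int(value) for value in lines[0].split()[:2])
    alignment = {}
    for line in lines[1:]:
        name, sequence = line.split(maxsplit=1)
        # Remove any whitespace inside or after the sequence.
        alignment[name] = "".join(sequence.split())
    if len(alignment) != n_seqs:
        raise ValueError(f"header announces {n_seqs} sequences, found {len(alignment)}")
    for name, sequence in alignment.items():
        if len(sequence) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, found {len(sequence)}")
    return alignment


# Toy alignment with hypothetical sample names (3 sequences, 5 sites).
toy = "3 5\nsampleA ACGTN\nsampleB ACGTA\nsampleC ACRTA\n"
alignment = read_phylip(toy)
print(len(alignment), len(alignment["sampleA"]))  # 3 5
```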
02.SNAPP
SNAPP input files (.xml), log files (.log) and resulting trees (.trees) for four independent runs. Each run comprised 3,000,000 generations and a tree was saved every 1,000th generation. Input files were generated in BEAUti from a vcf file containing one SNP per RAD locus present in at least 80% of diploid Luzula accessions. The four runs were combined in LogCombiner v.2.7.7, discarding 10% of trees as burnin, resulting in a final species tree file (SNAPP_all_runs_combined_burnin10.tre).
03.STRUCTURE
populations.structure.input: Input for the STRUCTURE analysis, containing one random SNP per RAD locus present in at least 70% of diploid Luzula accessions with a minor allele count of 3, exported in STRUCTURE format from the STACKS catalog using the program populations.
out_STR.zip: Results files of the STRUCTURE analysis based on 2,948 SNPs, which was run with the admixture model for 1,000,000 MCMC generations with 100,000 generations as burnin for K (number of groups) ranging from 1 to 12 with 10 replicates each. The archive can be uploaded to CLUMPAK for averaging across replicates and summarizing results.
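To illustrate the STRUCTURE input layout described in the usage notes (individual ID in the first column, one line per haplotype, -9 for missing data), the following Python sketch computes the fraction of missing allele calls per individual. It assumes exactly that layout; files exported by STACKS may contain additional header lines or a population column, which would need to be skipped first. The toy lines and individual names are hypothetical.

```python
from collections import defaultdict

MISSING = "-9"  # missing-data code used in the STRUCTURE input files


def missing_rate_per_individual(lines):
    """Return {individual ID: fraction of missing allele calls across all haplotype lines}."""
    missing = defaultdict(int)
    total = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        individual, alleles = fields[0], fields[1:]
        total[individual] += len(alleles)
        missing[individual] += sum(allele == MISSING for allele in alleles)
    return {individual: missing[individual] / total[individual] for individual in total}


# Two hypothetical diploid individuals, four loci, one line per haplotype.
toy = [
    "ind1 0 1 -9 0",
    "ind1 0 0 -9 1",
    "ind2 1 1 0 0",
    "ind2 1 0 0 -9",
]
rates = missing_rate_per_individual(toy)
print(rates)  # {'ind1': 0.25, 'ind2': 0.125}
```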
04.Dsuite
Input files for the Dsuite introgression analysis of diploids (Supplementary Table S6):
LUZALP_Core_Diploids_clean_R50_minDP8_R50.vcf.gz: VCF file containing SNPs of diploid Luzula accessions called in STACKS and filtered to retain only sites with a minimum genotype read depth (minDP) of 8 that are present in at least 50% of samples (R50). Main input data for Dsuite.
DIPLOIDS_SNAPP_species_tree_R80.nwk: Species tree of the SNAPP analysis that provides the tree topology for computing f-branch statistics.
species_order.txt: Text file specifying the order of species for plotting of f-branch statistics.
Dsuite_samples_speciesmap.txt: Tab separated text file mapping each accession to a species.
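The R50 criterion used for the VCF file above (sites present in at least 50% of samples) can be illustrated with a short Python sketch that computes the call rate of a single VCF data line. This only illustrates the criterion and is not the filtering command used by the authors; the function names and the toy record are hypothetical.

```python
def call_rate(vcf_line: str) -> float:
    """Fraction of samples with a fully called genotype in one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # column 9 is FORMAT
    samples = fields[9:]
    called = 0
    for sample in samples:
        genotype = sample.split(":")[gt_index]
        alleles = genotype.replace("|", "/").split("/")
        # A genotype counts as called only if none of its alleles is '.'.
        if all(allele != "." for allele in alleles):
            called += 1
    return called / len(samples)


def passes_presence_filter(vcf_line: str, min_fraction: float = 0.5) -> bool:
    """True if the site is called in at least min_fraction of samples (0.5 corresponds to R50)."""
    return call_rate(vcf_line) >= min_fraction


# Hypothetical site with four samples, two of them uncalled.
toy_site = "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t./.:0\t1/1:9\t./.:0"
print(call_rate(toy_site), passes_presence_filter(toy_site))  # 0.5 True
```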
02.MixedPloidy.zip
01.IQTREE
LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_R50_mac3.min4.phy.varsites.phy: Alignment used to build a maximum likelihood phylogenetic tree in IQ-TREE2. The file contains genotypes for all Luzula accessions produced in GATK that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 8 for diploids, 30 for tetraploids and 40 for hexaploids, and a maximum genotype read depth (maxDP) of 200, and a minimum minor allele count (MAC) of 3 that are present in at least 50% of samples (R50). The vcf file was converted to phylip format using the script vcf2phylip.py.
LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_R50_mac3.min4.phy.varsites.phy.treefile: Best scoring maximum likelihood tree based on 9,112 SNPs produced in IQ-TREE2 under the SYM+ASC+R8 substitution model using 1000 ultrafast bootstrap replicates, ascertainment bias correction, and correction for overestimating node support.
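The ploidy-specific depth thresholds described above (minDP of 8, 30 and 40 for diploids, tetraploids and hexaploids, and maxDP of 200) can be summarized in a few lines of Python. This is a sketch of the filtering rule only, assuming that both bounds are inclusive and that genotypes outside the range are set to missing; it is not the authors' filtering pipeline, and the function names are hypothetical.

```python
# Minimum genotype read depth per ploidy level, as stated in this README.
MIN_DP = {2: 8, 4: 30, 6: 40}
MAX_DP = 200  # maximum genotype read depth for all ploidy levels


def keep_genotype(depth: int, ploidy: int) -> bool:
    """True if the read depth lies within the (assumed inclusive) bounds for this ploidy."""
    return MIN_DP[ploidy] <= depth <= MAX_DP


def mask_genotype(genotype: str, depth: int, ploidy: int) -> str:
    """Return the genotype unchanged, or an all-missing genotype if the depth filter fails."""
    if keep_genotype(depth, ploidy):
        return genotype
    return "/".join(["."] * ploidy)


print(mask_genotype("0/1", 12, 2))      # 0/1
print(mask_genotype("0/0/1/1", 12, 4))  # ./././.
```

The same depth of 12 reads is sufficient for a diploid genotype but not for a tetraploid one, which is the point of using separate thresholds.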
02.Dsuite
Input files for the Dsuite introgression analysis of all samples (Supplementary Table S7):
LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_R50.vcf.gz: VCF file containing SNPs of Luzula accessions called in GATK that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 8 for diploids, 30 for tetraploids and 40 for hexaploids, and a maximum genotype read depth (maxDP) of 200, which are present in at least 50% of samples (R50). Main input data for Dsuite.
IQTREE.tree.nwk: Maximum likelihood tree computed in IQ-TREE 2 that provides the tree topology for computing f-branch statistics.
species_order.txt: Text file specifying the order of species for plotting of f-branch statistics.
Dsuite_samples_speciesmap.txt: Tab separated text file mapping each accession to a species.
03.STRUCTURE
STRUCTURE input and results for the analysis of the mixed-ploidy dataset, either containing all samples (All_samples) or a subset of equal sample size per species (Balanced_subset). The respective samples included in the analysis and their species identity are listed in STRUCTURE_SAMPLES_ALL_SPECIES.txt and STRUCTURE_SAMPLES_SUBSET_SPECIES.txt. In both cases, the analysis is based on SNPs of Luzula accessions called in GATK that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 8 for diploids, 30 for tetraploids and 40 for hexaploids, and a maximum genotype read depth (maxDP) of 200, which have a minimum minor allele count (MAC) of 3 and are present in at least 70% of samples (R70). SNPs were then linkage pruned using bcftools (bcftools +prune -m 0.2 -w 1000) and converted to STRUCTURE format using the script vcf_to_structure_hexa.py. Data are stored in the files LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_noOG.vcf.gz for the full dataset and LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_nohybrids_noOG_R70_SUBSET.vcf.gz for the subsampled dataset. The out_STR.zip archives contain the results of the STRUCTURE analysis, which was run with the admixture model for 1,000,000 MCMC generations with 100,000 generations as burnin for K (number of groups) ranging from 1 to 12 with 10 replicates each. The .zip files can be uploaded to CLUMPAK for averaging across replicates and summarizing results.
04.Polyrelatedness
LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_noOG_R50.vcf.gz: VCF file containing SNPs of Luzula accessions called in GATK that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 8 for diploids, 30 for tetraploids and 40 for hexaploids, and a maximum genotype read depth (maxDP) of 200, which are present in at least 50% of samples (R50). Main data input for Polyrelatedness.
PolyRel_all.txt: Plain text file of the results of the polyrelatedness analysis, using the software's method-of-moment estimator of relatedness.
rel_violin.csv: Results of the polyrelatedness analysis with population and species identifiers added. Used for visualization.
05.Treemix
LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_R80.vcf.gz: VCF file containing SNPs of Luzula accessions called in GATK that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 8 for diploids, 30 for tetraploids and 40 for hexaploids, and a maximum genotype read depth (maxDP) of 200, which are present in at least 80% of samples (R80). Main input data for Treemix.
treemix_input.table.gz: Input data in TreeMix format generated from the VCF file above using a python script.
Treemix_samples.txt: List of samples used in the TreeMix analysis.
Treemix_speciesmap.txt: Tab separated text file mapping each accession to a species.
Treemix_consensus_constree.newick: Consensus tree produced by TreeMix in newick format.
MigrationStats.txt: Inferred gene flow events from the TreeMix analysis, showing the species pairs, (adjusted) mean migration weights, maximum p values and the number of independent runs supporting each migration event.
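For readers unfamiliar with the TreeMix input format, the following Python sketch shows how one line of such a table can be derived from genotype calls: a header line lists the populations, and each subsequent line gives, per population, the counts of the two alleles separated by a comma. Counting allele copies directly from the called dosage of polyploid genotypes is an assumption made here for illustration; the conversion script used by the authors is not reproduced. All sample and species names are hypothetical.

```python
def treemix_line(genotypes, species_of, species_order):
    """Build one TreeMix SNP line ('ref,alt' counts per species) from VCF-style GT strings."""
    counts = {species: [0, 0] for species in species_order}
    for sample, genotype in genotypes.items():
        for allele in genotype.replace("|", "/").split("/"):
            if allele in ("0", "1"):  # missing alleles ('.') are ignored
                counts[species_of[sample]][int(allele)] += 1
    return " ".join(f"{counts[s][0]},{counts[s][1]}" for s in species_order)


# Hypothetical example: two diploid samples of speciesA, one tetraploid of speciesB.
species_of = {"d1": "speciesA", "d2": "speciesA", "t1": "speciesB"}
species_order = ["speciesA", "speciesB"]
genotypes = {"d1": "0/1", "d2": "./.", "t1": "0/0/1/1"}

print(" ".join(species_order))                               # header line
print(treemix_line(genotypes, species_of, species_order))    # 1,1 2,2
```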
06.Genomic_polarization
PART_1: Data filtering
00_mergevcf_AllSamples_RAW_NOPL-vcfs.list: List of all samples used in the analysis, containing the file names of individual VCF files with the ploidy column removed for merging.
00_mergevcf_AllSamples_RAW-vcfs.list: List of all samples used in the analysis, containing the file names of individual VCF files that still include the ploidy column.
00_sampleList_diploid.txt: List of diploid samples used in the analysis.
00_sampleList_MERGED.txt: List of VCF files after merging of sample-level VCFs. Contains the raw merged VCF, the merged VCF after filtering for genotype depth, and the VCF after filtering to retain only sites called in at least 25% of samples.
00_sampleList_hexaploid.txt: List of hexaploid samples used in the analysis.
00_sampleList_tetraploid.txt: List of tetraploid samples used in the analysis.
00_sampleList.txt: List of all samples used in the analysis.
Glean_number_variants_03_after_SelectVariants_SINGLE_OUT.txt: Number of SNPs and indels for each sample before filtering.
Glean_number_variants_05_after_fix_invariant_sites_SINGLE_OUT.txt: Number of SNPs and indels for each sample after fixing invariant sites.
Glean_number_variants_08_after_merge_vcfs.txt: Total number of SNPs in the merged VCF containing all samples; before and after filtering to retain only sites that have been called in at least 25% of samples.
Glean_number_variants_12_FilterVariants_HF_SINGLE.txt: Number of SNPs and indels for each sample after filtering for quality and read depth.
Glean_number_variants_15_glean_VCFstats_single-VCF.txt: Total numbers of SNPs and indels for each sample; proportions of hetero- and homozygous sites, proportion of sites that failed quality filtering, average read depth across all sites and across sites that passed quality filtering.
pseudo_annotation.bed: Pseudo annotation file containing genomic regions defined by dividing each of the six chromosomes of the reference genome into 20 regions with an equal number of SNPs.
pseudo_annotation_CLEAN.bed: The same as pseudo_annotation.bed but without column names.
raw_vcf/: Folder containing raw VCF files and their corresponding indices for each sample. VCF files were produced in GATK by performing genotype calling for each individual separately and retaining variant and invariant sites.
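The idea behind pseudo_annotation.bed, dividing each chromosome into 20 regions with an equal number of SNPs, can be sketched in Python as follows. The sketch assumes that each region runs from its first to its last SNP and is written in BED coordinates (0-based, half-open); how region boundaries were actually placed in the archived file is not stated here, so this is an illustration only. The function name, region labels and toy positions are hypothetical.

```python
def equal_snp_regions(chrom, positions, n_regions=20):
    """Split 1-based SNP positions into n_regions BED intervals with near-equal SNP counts."""
    positions = sorted(positions)
    if len(positions) < n_regions:
        raise ValueError("fewer SNPs than requested regions")
    base, extra = divmod(len(positions), n_regions)
    regions, first = [], 0
    for index in range(n_regions):
        size = base + (1 if index < extra else 0)  # spread the remainder over the first regions
        chunk = positions[first:first + size]
        # BED: 0-based start, half-open end, so the end equals the last 1-based SNP position.
        regions.append((chrom, chunk[0] - 1, chunk[-1], f"{chrom}_region{index + 1:02d}"))
        first += size
    return regions


# Toy chromosome with 40 SNPs at positions 10, 20, ..., 400 (two SNPs per region).
regions = equal_snp_regions("chr1", range(10, 410, 10))
print(regions[0])   # ('chr1', 9, 20, 'chr1_region01')
print(regions[-1])  # ('chr1', 389, 400, 'chr1_region20')
```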
PART_2: Generation of multiple sequence alignments and polarization
P01_consensus_seq
00_sampleList.txt: List of samples included in the analysis.
summary_IUPAC_fasta.txt: Summary of IUPAC nucleotide and ambiguity code counts for each sample after producing a consensus fasta file from the VCF file using bcftools consensus.
P03_MS_alignment
Folder containing multiple-sequence alignments (MSAs) prior to polarization for different datasets testing the origin of L. alpina (ALP), L. divulgata (DIV) and tetraploid L. multiflora (MUL_4x). Individual fasta files were produced from VCF files using bcftools consensus, aligned across species and then divided into individual fasta files for each genomic region specified in pseudo_annotation.bed.
P04_polarize
Results of genomic polarization for different datasets testing the origin of L. alpina (ALP_DIPLOID), L. divulgata (DIV_DIPLOID) and tetraploid L. multiflora (MUL_DIPLOID).
polarized_MSAs/: Folder containing polarized MSAs for each dataset for four iterations. The sample used as reference sequence for polarization is indicated in the folder name (e.g., IT1_LCAM_11371_01). Polarized MSAs were produced using the script 01s_polarizeTETRA.py. Polarized sequences are produced by masking variants that are identical between the polyploid and the reference sequence and thus contain only the fraction of the polyploid genome deviating from the reference.
locus_trees/: Folder containing maximum likelihood trees produced in IQ-TREE 2 for each locus from polarized MSAs for each dataset for four iterations. The sample used as reference sequence for polarization is indicated in the folder name (e.g., IT1_LCAM_11371_01). These maximum likelihood trees (.iqtree files) were then used as input for ASTRAL to compute a "locus tree" for each genomic region.
ASTRAL/: Folder containing maximum likelihood trees produced in IQ-TREE 2 for each locus from polarized MSAs for each dataset for four iterations in Newick format. These served as input for computing a "locus tree" for each genomic region in ASTRAL. Node support for these locus trees was computed using bootstrapping or quartet scores (ASTRAL.genes_trimN-ALT.fasta_UFBoot_MFP_ModelFinder-BS20.newick and ASTRAL.genes_trimN-ALT.fasta_UFBoot_MFP_ModelFinder-BS20.quartet.t8.newick, respectively). The sample used as reference sequence for polarization is indicated in the folder name (e.g., IT1_LCAM_11371_01). The frequency of pairing of the polyploid with every other species included in the analysis was then computed across locus trees using the script 07s_Polyploid_pairing_analysis.R and results are stored in sister_ID_analysis_GENE_TREES.txt.
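The polarization step described for polarized_MSAs/ can be pictured with a simplified Python sketch. It follows one literal reading of the description above: at each alignment column, alleles that the polyploid shares with the reference are removed, and positions where nothing remains are masked as N. This is an assumption made for illustration; the script 01s_polarizeTETRA.py may handle homozygous sites, gaps, or multi-allelic positions differently. The function name and the toy sequences are hypothetical.

```python
# IUPAC nucleotide codes and the sets of bases they represent.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"}, "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
}
TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def polarize(polyploid: str, reference: str) -> str:
    """Keep only the polyploid alleles that deviate from the reference; mask everything else as N."""
    if len(polyploid) != len(reference):
        raise ValueError("sequences must be aligned and of equal length")
    polarized = []
    for poly_base, ref_base in zip(polyploid.upper(), reference.upper()):
        poly_alleles = IUPAC.get(poly_base)
        ref_alleles = IUPAC.get(ref_base)
        if poly_alleles is None or ref_alleles is None:
            polarized.append("N")  # gaps, N, or unknown symbols are treated as missing
            continue
        remaining = poly_alleles - ref_alleles
        polarized.append(TO_CODE[frozenset(remaining)] if remaining else "N")
    return "".join(polarized)


# Toy example: R (A/G) and Y (C/T) are heterozygous positions in the polyploid.
print(polarize("ARYCT", "AACGT"))  # NGTCN
```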
07.Plastid_trees
V1_V5_combined_gapcoding.nex: Concatenated alignment of V1 and V5 plastid regions in Nexus format. Indels are coded as morphological data (0 and 1) and missing data are represented by "?".
V1_V5_combined_no_gapcoding.nex: Concatenated alignment of V1 and V5 plastid regions in Nexus format without indel coding.
V1.phy.MrAIC.txt: Result of model testing in MrAIC for determining the best substitution model (F81) for the V1 plastid region.
V5.phy.MrAIC.txt: Result of model testing in MrAIC for determining the best substitution model (JC69) for the V5 plastid region.
IQ-TREE
This folder contains alignments of plastid DNA used for inferring a maximum likelihood phylogenetic tree in IQ-TREE 2.
Core_V1_V5_combined_gc_FINAL_iqtree_indels.nex: Nexus file containing aligned indel sequences from both V1 and V5 plastid regions. Indels are encoded as morphological data (0 and 1) and missing data are represented by "?".
Core_V1_V5_combined_gc_FINAL_iqtree.nex: Nexus file containing aligned DNA sequences from both V1 and V5 plastid regions. Missing data are represented by "?".
Core_V1_V5_combined_gc_FINAL_iqtree_partitions.nex: Partitions file defining the different plastid DNA regions (V1 and V5) and indels for the IQ-TREE 2 analysis.
Core_V1_V5_combined_gc_FINAL_iqtree_partitions.nex.treefile: Best scoring maximum likelihood tree based on the partitioned plastid DNA alignment (V1 and V5, including indels as morphological characters) produced in IQ-TREE2 under the F81 (V1), JC69 (V5) and JC2 (indels) substitution models using 1000 ultrafast bootstrap replicates and correction for overestimating node support.
MrBayes
This folder contains a concatenated alignment of plastid DNA used for inferring a Bayesian phylogenetic tree in MrBayes.
Core_V1_V5_combined_gc_FINAL.nex: Partitioned alignment of V1 and V5 plastid regions used as input for MrBayes. Indels are coded as binary data (0 and 1) and missing data are represented by "?".
Luzula_cp_gap.con.tre: Bayesian consensus phylogenetic tree inferred in MrBayes. Four independent MCMC chains were run for 10,000,000 generations each. Trees were sampled every 1,000th generation using default priors, and a burnin of 1,001 trees was discarded for each run. Substitution models were F81 (V1) and JC69 (V5).
TCS
CORE_V1_V5_FINAL_noOG.nex: Concatenated alignment of V1 and V5 plastid regions in Nexus format without the outgroup. Indels are coded as morphological data (0 and 1) and missing data are represented by "?". Used for computing haplotype networks in TCS.
03.Tetraploids.zip
02.GTFrequencies
This folder contains VCF files and their index files that were used to infer the mode of inheritance (di- vs. polysomic) of tetraploids by plotting genotype frequencies against allele frequencies. The analysis was performed at the population level for alpine tetraploids (ALP_MUL(4x)) and L. divulgata (DIV). In both cases, VCF files contain genotypes for the respective tetraploid species produced in GATK that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 30 and a maximum genotype read depth (maxDP) of 200 that are present in at least 70% of samples (R70).
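As background for the genotype-frequency approach, the Python sketch below compares observed genotype frequencies at one biallelic site with those expected under tetrasomic inheritance, which follow a binomial distribution with four allele copies when mating is random and double reduction is ignored. Disomic inheritance in an allotetraploid is instead expected to produce an excess of intermediate genotypes. The sketch is a generic illustration under these assumptions and not the authors' plotting script; the function names and toy genotypes are hypothetical.

```python
from collections import Counter
from math import comb


def dosage(genotype: str):
    """Number of alternative allele copies in a GT string, or None if any allele is missing."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(allele != "0" for allele in alleles)


def observed_vs_tetrasomic(genotypes):
    """Return the allele frequency p and {dosage: (observed, expected)} for tetraploid genotypes."""
    dosages = [d for d in map(dosage, genotypes) if d is not None]
    n = len(dosages)
    p = sum(dosages) / (4 * n)  # frequency of the alternative allele
    observed = Counter(dosages)
    table = {}
    for k in range(5):
        expected = comb(4, k) * p**k * (1 - p) ** (4 - k)  # binomial expectation
        table[k] = (observed[k] / n, expected)
    return p, table


# Toy population: four called tetraploid genotypes and one missing genotype.
toy = ["0/0/0/0", "0/0/1/1", "0/0/1/1", "1/1/1/1", "./././."]
p, table = observed_vs_tetrasomic(toy)
print(p)         # 0.5
print(table[2])  # (0.5, 0.375)
```

In this toy example the balanced heterozygote (dosage 2) is observed more often (0.5) than expected under tetrasomic inheritance (0.375).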
03.STRUCTURE
LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_nohybrids_noOG_Lalp_mul_R70_MAC3_var_LD.vcf.gz: VCF file containing SNPs of alpine tetraploid Luzula accessions called in GATK that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 30 and a maximum genotype read depth (maxDP) of 200, which have a minimum minor allele count (MAC) of 3 and are present in at least 70% of samples (R70). SNPs were then linkage pruned using bcftools (bcftools +prune -m 0.2 -w 1000).
LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_nohybrids_noOG_Lalp_mul_R70_MAC3_var_LD.structure.input: STRUCTURE input file produced from the VCF file above using the script vcf_to_structure.py.
out_STR.zip: Results of the STRUCTURE analysis, which was run with the admixture model for 1,000,000 MCMC generations with 100,000 generations as burnin for K (number of groups) ranging from 1 to 10 with 10 replicates each. The archive can be uploaded to CLUMPAK for averaging across replicates and summarizing results.
Schema of the data archive
├── 01.Diploids.zip
│   ├── 01.IQTREE
│   │   ├── LUZALP_Core_Diploids_clean_R50_minDP8_R50_mac3.min4.phy.varsites.phy
│   │   └── LUZALP_Core_Diploids_clean_R50_minDP8_R50_mac3.min4.phy.varsites.phy.treefile
│   ├── 02.SNAPP
│   │   ├── SNAPP_all_runs_combined_burnin10.tre
│   │   ├── SNAPP_all_runs_combined.trees
│   │   ├── snapp_run1.log
│   │   ├── SNAPP_run1.trees
│   │   ├── SNAPP_run1.xml
│   │   ├── snapp_run2.log
│   │   ├── SNAPP_run2.trees
│   │   ├── SNAPP_run2.xml
│   │   ├── snapp_run3.log
│   │   ├── SNAPP_run3.trees
│   │   ├── SNAPP_run3.xml
│   │   ├── snapp_run4.log
│   │   ├── SNAPP_run4.trees
│   │   └── SNAPP_run4.xml
│   ├── 03.STRUCTURE
│   │   ├── out_STR.zip
│   │   └── populations.structure.input
│   └── 04.Dsuite
│       ├── DIPLOIDS_SNAPP_species_tree_R80.nwk
│       ├── Dsuite_samples_speciesmap.txt
│       ├── LUZALP_Core_Diploids_clean_R50_minDP8_R50.vcf.gz
│       └── species_order.txt
├── 02.MixedPloidy.zip
│   ├── 01.IQTREE
│   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_R50_mac3.min4.phy.varsites.phy
│   │   └── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_R50_mac3.min4.phy.varsites.phy.treefile
│   ├── 02.Dsuite
│   │   ├── Dsuite_samples_speciesmap.txt
│   │   ├── IQTREE.tree.nwk
│   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_R50.vcf.gz
│   │   └── species_order.txt
│   ├── 03.STRUCTURE
│   │   ├── All_samples
│   │   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_noOG_R70_MAC3_var_LD.structure.input
│   │   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_noOG.vcf.gz
│   │   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_noOG.vcf.gz.tbi
│   │   │   └── out_STR.zip
│   │   └── Balanced_subset
│   │       ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_nohybrids_noOG_R70_SUBSET.vcf.gz
│   │       ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_nohybrids_noOG_R70_SUBSET.vcf.gz.tbi
│   │       ├── out_STR.zip
│   │       ├── STRUCTURE_SAMPLES_ALL_SPECIES.txt
│   │       └── STRUCTURE_SAMPLES_SUBSET_SPECIES.txt
│   ├── 04.Polyrelatedness
│   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_noOG_R50.vcf.gz
│   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_noOG_R50.vcf.gz.tbi
│   │   ├── PolyRel_all.txt
│   │   └── rel_violin.csv
│   ├── 05.Treemix
│   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_R80.vcf.gz
│   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_R80.vcf.gz.tbi
│   │   ├── MigrationStats.txt
│   │   ├── Treemix_consensus_constree.newick
│   │   ├── treemix_input.table.gz
│   │   ├── Treemix_samples.txt
│   │   └── Treemix_speciesmap.txt
│   ├── 06.Genomic_polarization
│   │   ├── PART_1
│   │   │   ├── 00_mergevcf_AllSamples_RAW_NOPL-vcfs.list
│   │   │   ├── 00_mergevcf_AllSamples_RAW-vcfs.list
│   │   │   ├── 00_sampleList_diploid.txt
│   │   │   ├── 00_sampleList_hexaploid.txt
│   │   │   ├── 00_sampleList_MERGED.txt
│   │   │   ├── 00_sampleList_tetraploid.txt
│   │   │   ├── 00_sampleList.txt
│   │   │   ├── Glean_number_variants_03_after_SelectVariants_SINGLE_OUT.txt
│   │   │   ├── Glean_number_variants_05_after_fix_invariant_sites_SINGLE_OUT.txt
│   │   │   ├── Glean_number_variants_08_after_merge_vcfs.txt
│   │   │   ├── Glean_number_variants_12_FilterVariants_HF_SINGLE.txt
│   │   │   ├── Glean_number_variants_15_glean_VCFstats_single-VCF.txt
│   │   │   ├── pseudo_annotation.bed
│   │   │   ├── pseudo_annotation_CLEAN.bed
│   │   │   └── raw_vcf
│   │   │       ├── Lalp_20782_01.vcf.gz
│   │   │       ├── Lalp_20782_01.vcf.gz.tbi
│   │   │       ├── Lalp_22262_05.vcf.gz
│   │   │       ├── Lalp_22262_05.vcf.gz.tbi
│   │   │       ├── Lcam_11371_01.vcf.gz
│   │   │       ├── Lcam_11371_01.vcf.gz.tbi
│   │   │       ├── Ldif_11841_04.vcf.gz
│   │   │       ├── Ldif_11841_04.vcf.gz.tbi
│   │   │       ├── Ldiv_12511_02.vcf.gz
│   │   │       ├── Ldiv_12511_02.vcf.gz.tbi
│   │   │       ├── Lexs_21921_01.vcf.gz
│   │   │       ├── Lexs_21921_01.vcf.gz.tbi
│   │   │       ├── Lmul_21351_01.vcf.gz
│   │   │       ├── Lmul_21351_01.vcf.gz.tbi
│   │   │       ├── Lniv_40881_01.vcf.gz
│   │   │       ├── Lniv_40881_01.vcf.gz.tbi
│   │   │       ├── Lpal_12561_01.vcf.gz
│   │   │       ├── Lpal_12561_01.vcf.gz.tbi
│   │   │       ├── Ltau_32742_01.vcf.gz
│   │   │       └── Ltau_32742_01.vcf.gz.tbi
│   │   └── PART_2
│   │       ├── P01_consensus_seq
│   │       │   ├── 00_sampleList.txt
│   │       │   └── summary_IUPAC_fasta.txt
│   │       ├── P03_MS_alignment
│   │       │   └── unpolarized_MSAs
│   │       │       ├── ALP
│   │       │       │   └── MSAs
│   │       │       ├── DIV
│   │       │       │   └── MSAs
│   │       │       └── MUL_4x
│   │       │           └── MSAs
│   │       └── P04_polarize
│   │           ├── ASTRAL
│   │           │   ├── ALP_DIPLOID
│   │           │   │   ├── IT1_LCAM_11371_01
│   │           │   │   ├── IT2_LEXS_21921_01
│   │           │   │   ├── IT3_LTAU_32742_01
│   │           │   │   └── IT4_LEXS_21921_01
│   │           │   ├── DIV_DIPLOID
│   │           │   │   ├── IT1_LCAM_11371_01
│   │           │   │   ├── IT2_LTAU_32742_01
│   │           │   │   ├── IT3_LPAL_12561_01
│   │           │   │   └── IT4_LTAU_32742_01
│   │           │   └── MUL_DIPLOID
│   │           │       ├── IT1_LCAM_11371_01
│   │           │       ├── IT2_LPAL_12561_01
│   │           │       ├── IT3_LTAU_32742_01
│   │           │       └── IT4_LPAL_12561_01
│   │           ├── locus_trees
│   │           │   ├── 20b_ALT1_RUN_IQ-TREE2_ALP_DIPLOID
│   │           │   │   ├── IT1_LCAM_11371_01
│   │           │   │   ├── IT2_LEXS_21921_01
│   │           │   │   ├── IT3_LTAU_32742_01
│   │           │   │   └── IT4_LEXS_21921_01
│   │           │   ├── 20b_ALT1_RUN_IQ-TREE2_DIV_DIPLOID
│   │           │   │   ├── IT1_LCAM_11371_01
│   │           │   │   ├── IT2_LTAU_32742_01
│   │           │   │   ├── IT3_LPAL_12561_01
│   │           │   │   └── IT4_LTAU_32742_01
│   │           │   └── 20b_ALT1_RUN_IQ-TREE2_MUL_DIPLOID
│   │           │       ├── IT1_LCAM_11371_01
│   │           │       ├── IT2_LPAL_12561_01
│   │           │       ├── IT3_LTAU_32742_01
│   │           │       └── IT4_LPAL_12561_01
│   │           └── polarized_MSAs
│   │               ├── 20a_Polarize_IUPAC_ALT1_ALP_DIPLOID
│   │               │   ├── IT1_LCAM_11371_01
│   │               │   ├── IT2_LEXS_21921_01
│   │               │   ├── IT3_LTAU_32742_01
│   │               │   ├── IT4_LEXS_21921_01
│   │               │   └── LPAL_12561_01
│   │               ├── 20a_Polarize_IUPAC_ALT1_DIV_DIPLOID
│   │               │   ├── IT1_LCAM_11371_01
│   │               │   ├── IT2_LTAU_32742_01
│   │               │   ├── IT3_LPAL_12561_01
│   │               │   └── IT4_LTAU_32742_01
│   │               └── 20a_Polarize_IUPAC_ALT1_MUL_DIPLOID
│   │                   ├── IT1_LCAM_11371_01
│   │                   ├── IT2_LPAL_12561_01
│   │                   ├── IT3_LTAU_32742_01
│   │                   └── IT4_LPAL_12561_01
│   └── 07.Plastid_trees
│       ├── IQ-TREE
│       │   ├── Core_V1_V5_combined_gc_FINAL_iqtree_indels.nex
│       │   ├── Core_V1_V5_combined_gc_FINAL_iqtree.nex
│       │   ├── Core_V1_V5_combined_gc_FINAL_iqtree_partitions.nex
│       │   └── Core_V1_V5_combined_gc_FINAL_iqtree_partitions.nex.treefile
│       ├── MrBayes
│       │   ├── Core_V1_V5_combined_gc_FINAL.nex
│       │   └── Luzula_cp_gap.con.tre
│       ├── TCS
│       │   └── CORE_V1_V5_FINAL_noOG.nex
│       ├── V1.phy.MrAIC.txt
│       ├── V1_V5_combined_gapcoding.nex
│       ├── V1_V5_combined_no_gapcoding.nex
│       └── V5.phy.MrAIC.txt
└── 03.Tetraploids.zip
    ├── 02.GTFrequencies
    │   ├── ALP_MUL(4x)
    │   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_noOG_Lalp_mul_R70.vcf.gz
    │   │   ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_noOG_Lalp_mul_R70.vcf.gz.tbi
    │   │   └── pops
    │   └── DIV
    │       ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_Ldiv_R70.vcf.gz
    │       ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_Ldiv_R70.vcf.gz.tbi
    │       └── pops
    └── 03.STRUCTURE
        ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_nohybrids_noOG_Lalp_mul_R70_MAC3_var_LD.structure.input
        ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_nohybrids_noOG_Lalp_mul_R70_MAC3_var_LD.vcf.gz
        ├── LUZALP_Core_diminDP8_tetraminDP30_hexaminDP40_maxDP200_clean_nohybrids_noOG_Lalp_mul_R70_MAC3_var_LD.vcf.gz.tbi
        └── out_STR.zip
