Variant calling in the Goldilocks Zone: how reference genome choice and read mapping stringency impact heterozygosity estimates and phylogenetic analyses
Data files
Nov 13, 2025 version files 195.42 GB
- alba_bt2_e2e.vcf.gz (18.17 GB)
- alba_bt2_local.vcf.gz (18.88 GB)
- alba_bwa.vcf.gz (13.12 GB)
- bt2_endtoend_script.txt (33.43 KB)
- bt2_local_script.txt (32.77 KB)
- bwa_script.txt (44.21 KB)
- inputFiles.zip (51.74 KB)
- lobata_bt2_e2e.vcf.gz (19.22 GB)
- lobata_bt2_local.vcf.gz (20.09 GB)
- lobata_bwa.vcf.gz (12.13 GB)
- mongolica_bt2_e2e.vcf.gz (17.09 GB)
- mongolica_bt2_local.vcf.gz (18.33 GB)
- mongolica_bwa.vcf.gz (14.66 GB)
- README.md (4.50 KB)
- ReferenceGenomeTest_v1.0.Rmd (14.66 KB)
- rubra_bt2_e2e.vcf.gz (12.79 GB)
- rubra_bt2_local.vcf.gz (15.10 GB)
- rubra_bwa.vcf.gz (15.83 GB)
Abstract
This archive contains a short-read dataset mapped to four Quercus reference genomes (Q. rubra, Q. alba, Q. lobata, and Q. mongolica) with three mapping methods (Bowtie 2 --end-to-end, Bowtie 2 --local, and BWA), together with the scripts and summary statistics used to evaluate mapping efficiency and accuracy, heterozygosity, and the tree topology resulting from each reference and method.
Molecular Ecology Resources.
Data and files
rubra_bt2_e2e.vcf.gz
VCF file of bases called against the Q. rubra reference using Bowtie 2 --end-to-end
rubra_bt2_local.vcf.gz
VCF file of bases called against the Q. rubra reference using Bowtie 2 --local
rubra_bwa.vcf.gz
VCF file of bases called against the Q. rubra reference using BWA
alba_bt2_e2e.vcf.gz
VCF file of bases called against the Q. alba reference using Bowtie 2 --end-to-end
alba_bt2_local.vcf.gz
VCF file of bases called against the Q. alba reference using Bowtie 2 --local
alba_bwa.vcf.gz
VCF file of bases called against the Q. alba reference using BWA
lobata_bt2_e2e.vcf.gz
VCF file of bases called against the Q. lobata reference using Bowtie 2 --end-to-end
lobata_bt2_local.vcf.gz
VCF file of bases called against the Q. lobata reference using Bowtie 2 --local
lobata_bwa.vcf.gz
VCF file of bases called against the Q. lobata reference using BWA
mongolica_bt2_e2e.vcf.gz
VCF file of bases called against the Q. mongolica reference using Bowtie 2 --end-to-end
mongolica_bt2_local.vcf.gz
VCF file of bases called against the Q. mongolica reference using Bowtie 2 --local
mongolica_bwa.vcf.gz
VCF file of bases called against the Q. mongolica reference using BWA
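As a rough illustration of how per-sample heterozygosity and missingness can be tallied from VCF files like those above, the following Python sketch counts genotypes directly from the GT field. It is not the authors' pipeline (the archive's statistics were derived from vcftools output), and the function name `het_and_missing`, the toy records, and the sample names `sampleA` and `sampleB` are hypothetical. It reports simple proportions of called genotypes, which may differ from the quantities vcftools reports.

```python
import io
from collections import defaultdict


def het_and_missing(lines):
    """Tally heterozygous and missing genotypes per sample.

    `lines` is any iterable of VCF text lines, for example a handle
    returned by gzip.open(path, "rt").
    """
    samples = []
    counts = defaultdict(lambda: {"het": 0, "called": 0, "missing": 0})
    for line in lines:
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names start at column 10
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # record carries no genotype field
        gt_index = fmt.index("GT")
        for name, entry in zip(samples, fields[9:]):
            genotype = entry.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            if "." in alleles:
                counts[name]["missing"] += 1
                continue
            counts[name]["called"] += 1
            if len(set(alleles)) > 1:
                counts[name]["het"] += 1
    return {
        name: {
            "het_rate": c["het"] / c["called"] if c["called"] else float("nan"),
            "missing_rate": c["missing"] / (c["called"] + c["missing"]),
        }
        for name, c in counts.items()
    }


# Hypothetical toy records, used only to exercise the function.
TOY_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB
chr1\t10\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0
chr1\t20\t.\tC\tT\t50\tPASS\t.\tGT\t1/1\t./.
chr1\t30\t.\tG\tA\t50\tPASS\t.\tGT:DP\t0/1:12\t0|1:9
"""

result = het_and_missing(io.StringIO(TOY_VCF))
for sample, stats in result.items():
    print(sample, stats)
```

To run the same tally on one of the archived files, pass a text-mode handle instead of the toy string, for example `gzip.open("rubra_bwa.vcf.gz", "rt")`. The function streams line by line, so memory use stays small, but a pure-Python pass over a file of this size will be slow.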
ReferenceGenomeTest_v1.0.Rmd
R Markdown file that analyzes the summary statistics and generates the figures for the manuscript.
It reads the files in the inputFiles folder and exports the figures as PDFs to the outputFiles folder.
inputFiles.zip
Folder containing all input files needed to run ReferenceGenomeTest_v1.0.Rmd:
alb_bt2-l.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus alba reference genome using Bowtie2 --local
alb_bt2_e2e.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus alba reference genome using Bowtie2 --end-to-end
alb_bwa.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus alba reference genome using bwa mem
lob_bt2-l.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus lobata reference genome using Bowtie2 --local
lob_bt2_e2e.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus lobata reference genome using Bowtie2 --end-to-end
lob_bwa.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus lobata reference genome using bwa mem
mong_bt2-l.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus mongolica reference genome using Bowtie2 --local
mong_bt2_e2e.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus mongolica reference genome using Bowtie2 --end-to-end
mong_bwa.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus mongolica reference genome using bwa mem
rub_bt2-l.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus rubra reference genome using Bowtie2 --local
rub_bt2_e2e.tre.treefile
tree from the concatenated matrix of samples mapped to the Quercus rubra reference genome using Bowtie2 --end-to-end
rub_bwa.tre.treefile_fixed
tree from the concatenated matrix of samples mapped to the Quercus rubra reference genome using bwa mem
Stats3.xlsx
An Excel file with statistics and data on each sample and reference genome. The "HetMissing" tab contains heterozygosity and missing-site values derived from vcftools output. The "AssemblyStats" tab contains values output by Bowtie2 or Samtools. The "BranchLengths" tab contains values derived from the IQ-TREE2 topologies. Finally, the "Samples" tab contains information about the specimens that is also available on NCBI.
taxa_trans_table.txt
A file with the names used to rename taxa in the species tree data
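For readers who want to inspect branch lengths in the .treefile outputs without additional tooling, here is a minimal Python sketch that pulls every branch length out of a Newick string and sums them. It is an illustration only, not the procedure used to build the "BranchLengths" tab. The function name `branch_lengths` and the toy tree are hypothetical, and the regular expression assumes unquoted labels with no colons inside them.

```python
import re

# In Newick, a branch length follows a colon, e.g. ":0.0123" or ":1e-05".
BRANCH_LENGTH = re.compile(r":([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def branch_lengths(newick):
    """Return every branch length found in a Newick string as floats."""
    return [float(value) for value in BRANCH_LENGTH.findall(newick)]


# Hypothetical four-taxon tree with support values on internal nodes.
toy_tree = "((A:0.1,B:0.2)95:0.05,(C:0.3,D:0.4)88:0.05);"

lengths = branch_lengths(toy_tree)
total = sum(lengths)
print(len(lengths), "branches; total tree length:", round(total, 6))
```

For a real file, read its contents first, for example `open("alb_bwa.tre.treefile").read()`, and pass the resulting string to `branch_lengths`. A dedicated tree library would be a sturdier choice when labels are quoted or contain unusual characters.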
bt2_endtoend_script.txt
Script for processing reads, from post-chloroplast filtering through Bowtie 2 --end-to-end mapping to calculation of statistics
bt2_local_script.txt
Script for processing reads, from post-chloroplast filtering through Bowtie 2 --local mapping to calculation of statistics
bwa_script.txt
Script for processing reads, from post-chloroplast filtering through BWA mapping to calculation of statistics
