Data and code from: A multifaceted approach reveals complex genomic mediation of white-nose syndrome resistance in the little brown bat (Myotis lucifugus)
Data files
Dec 04, 2025 version files (23.28 GB)
- all_samples.merge.txt -- 693 B
- all_samples.txt -- 2.70 KB
- ANGSD_Fst_10kbp_windows.zip -- 5.65 GB
- comps.tsv -- 22.32 KB
- gatk.snp.qual_hard_filtered_autosomes_thin.vcf.gz -- 146.35 MB
- gatk.snp.qual_hard_filtered_autosomes.vcf.gz -- 17.38 GB
- hicov_coverage.tsv -- 13.96 KB
- lowcov_coverage.tsv -- 88.87 KB
- pop_comps.tsv -- 93 B
- pop_scaff.tsv -- 2.48 KB
- pops.txt -- 41 B
- Population_Map.tsv -- 4.40 KB
- PRE.ind -- 332 B
- README.md -- 6.34 KB
- REHH_Rsb_10_Kbp_windows.zip -- 46.98 MB
- RG_info.tsv -- 6.68 KB
- SNP_filtering.txt -- 3.39 KB
- XP-CLR_10kbp_windows.zip -- 64.52 MB
Abstract
Novel pathogens have become a major challenge faced by wildlife in the Anthropocene. White-nose syndrome (WNS), a fungal pathogen, has decimated bat populations across North America over the last two decades. Demographic and physiological evidence of resistance in one heavily affected species, Myotis lucifugus, has prompted multiple attempts to delineate the genomic underpinnings, but they show little congruence in their findings. This may be due, in part, to the limitations of the genomic resources utilized and/or analytical approaches employed. Here, we performed high-coverage whole-genome resequencing of M. lucifugus sampled prior to (n = 29) and 10 years after the arrival of WNS (n = 30), aligned to a new reference genome to identify signatures of selection associated with pathogen resistance. Using 41.9 million SNPs, we implemented a combination of hard and soft sweep detection analyses, leading to discovery of 405 genes with robust evidence of selection. Of these, 241 (59.5 %) were associated with enriched gene ontology (GO) terms, many of which were tied to neuron development, organization, and function. Further, approximately half (120) of genes associated with enriched GO terms interact with genes identified by previous studies. Our findings suggest WNS resistance is mediated through highly complex, polygenic mechanisms. Further, we demonstrate there are far more connections among WNS selection study results than previously recognized. We believe that the methods employed by our study illustrate a need for a paradigm shift in non-model selection studies and further highlight the value of genomics as a tool for conservation management.
Dataset DOI: 10.5061/dryad.ncjsxkt66
Description of the data and file structure
This dataset contains all analytical code for this manuscript, along with associated data files, two versions of the final VCF, and selection statistic output files. The genome annotation used with this data is available through https://github.com/docmanny/myotis-gene-annotations.
Data Files
VCFs:
- gatk.snp.qual_hard_filtered_autosomes.vcf.gz -- full set of filtered SNPs
- gatk.snp.qual_hard_filtered_autosomes_thin.vcf.gz -- filtered SNPs thinned by 10 Kbp distance
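The README does not say which tool or exact rule produced the thinned VCF. As a minimal sketch of what thinning by a 10 Kbp distance could mean, the Python below keeps a site only if it lies at least 10,000 bp from the previously kept site on the same scaffold. The function name `thin_sites`, the greedy rule, and the scaffold names are assumptions for illustration, not the authors' method.

```python
def thin_sites(sites, min_dist=10_000):
    """Greedy distance thinning (illustrative assumption, not the authors' code).

    `sites` is an iterable of (scaffold, position) tuples. A site is kept
    only if it is at least `min_dist` bp from the last kept site on the
    same scaffold.
    """
    kept = []
    last_kept = {}  # scaffold -> position of the most recently kept site
    for scaffold, pos in sorted(sites):
        prev = last_kept.get(scaffold)
        if prev is None or pos - prev >= min_dist:
            kept.append((scaffold, pos))
            last_kept[scaffold] = pos
    return kept


# Hypothetical scaffold names and positions
example = [
    ("scaf1", 100),
    ("scaf1", 5_000),
    ("scaf1", 10_100),
    ("scaf1", 10_200),
    ("scaf2", 50),
]
thinned = thin_sites(example)
print(thinned)
```

A greedy pass like this depends on the starting site of each scaffold, so a different tool may retain a different but equally valid subset.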
SNP Calling Pipeline Input Files:
- RG_info.tsv -- read group information for variant calling
- all_samples.txt -- sample IDs for all individual FASTQ files
- all_samples.merge.txt -- sample IDs merging all files for each individual
Selection Statistic Output Files:
- ANGSD_Fst_10kbp_windows.zip -- final output files from ANGSD_Fst.md
- XP-CLR_10kbp_windows.zip -- final output files from XP-CLR_outliers.R
- REHH_Rsb_10_Kbp_windows.zip -- final output files from REHH_Rsb_outliers.R
Sample and Population Text Files for Scripts:
- pops.txt -- text file of all population designations
- Population_Map.tsv -- population and sampling site information for each individual
- pop_comps.tsv -- text file of all pairwise comparisons
- PRE.ind -- all pre-WNS individuals
- pop_scaff.tsv -- text file input for phasing genotypes
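The column layout of pops.txt and pop_comps.tsv is not described here. As a rough illustration of how a list of population designations relates to a list of all pairwise comparisons, the sketch below enumerates unordered pairs from a list of labels. The labels `POP_A`, `POP_B`, and `POP_C` and the one-label-per-line layout are hypothetical; the real files may use different names, ordering, or columns.

```python
from itertools import combinations


def read_pops(text):
    """Parse one population label per non-empty line (assumed layout)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def pairwise_comparisons(pops):
    """Return every unordered pair of populations, in input order."""
    return list(combinations(pops, 2))


# Hypothetical population labels, not taken from pops.txt
pops = read_pops("POP_A\nPOP_B\nPOP_C\n")
comps = pairwise_comparisons(pops)
for a, b in comps:
    print(f"{a}\t{b}")
```

With n populations this yields n(n-1)/2 comparisons, which is a quick sanity check against the number of rows in a comparison file.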
Alignment and Variant Statistics:
- hicov_coverage.tsv -- per-scaffold alignment statistics for "high coverage" individuals
- lowcov_coverage.tsv -- per-scaffold alignment statistics for "shallow coverage" individuals
- SNP_filtering.txt -- type and number of variants filtered per scaffold
Scripts & Code
SNP Calling Pipeline:
- call_SNPs_pipeline.zip
  - Components:
    - clean_align_callSNPs.sbatch -- master script
    - HTS_preproc.slurm -- clean FASTQs; pipeline component
    - hashDRAGMAP.slurm -- build reference genome hash table; pipeline component
    - alignDRAGMAP.slurm -- align FASTQs to reference genome; pipeline component
    - samtools_merge.slurm -- merge sample BAMs; pipeline component
    - genome_wins.slurm -- create .bed for genome windows; pipeline component
    - align_stats.slurm -- calculate alignment statistics; pipeline component
    - STRtable.slurm -- create STR table; pipeline component
    - bam_to_gvcf.slurm -- call variants per individual; pipeline component
    - gvcf_to_vcf_scaff.slurm -- joint call variants; pipeline component
    - vcf_scaff_to_snp.vcf.slurm -- combine VCFs and filter; pipeline component
  - Dependencies: HTStream, DRAGMAP v1.2.1, picard, GATK4, samtools, bedtools, and bcftools
  - Inputs: raw sequencing files (NCBI BioProject PRJNA1353610), M. lucifugus reference genome, RG_info.tsv, all_samples.txt, all_samples.merge.txt
  - For more details, see the associated GitHub repository
Selection Statistic Calculation:
- ANGSD_Fst.md -- calculate pairwise population FST in 10 Kbp windows for all population comparisons
  - Dependencies: ANGSD v0.940 and bcftools
  - Inputs: gatk.snp.qual_hard_filtered_autosomes.vcf.gz, pops.txt, Population_Map.tsv, pop_comps.tsv
- XP-CLR.sbatch -- calculate XP-CLR in 10 Kbp windows for all population comparisons
- rehh.md -- polarize VCF alleles, phase genotypes, and calculate Rsb in 10 Kbp windows for all population comparisons
  - Dependencies: java 17, bcftools, vcftools, vcffilterjdk, beagle v5.4, R, rehh, and stringr
  - Inputs: gatk.snp.qual_hard_filtered_autosomes.vcf.gz, PRE.ind, pop_scaff.tsv, pop_comps.tsv
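All three statistics are summarized in 10 Kbp windows. The sketch below shows one way a SNP position can be assigned to a fixed-size window and how SNPs per window can be tallied. It assumes non-overlapping windows and 1-based inclusive coordinates; the window definitions actually used by ANGSD, XP-CLR, and rehh in these scripts may differ, and `window_of` is a hypothetical helper.

```python
from collections import Counter


def window_of(pos, size=10_000):
    """Return the 1-based inclusive (start, end) of the window holding `pos`.

    Assumes non-overlapping windows that begin at position 1.
    """
    index = (pos - 1) // size
    start = index * size + 1
    return (start, start + size - 1)


# Hypothetical SNP positions on a single scaffold
positions = [1, 9_999, 10_000, 10_001, 25_000]
snps_per_window = Counter(window_of(p) for p in positions)
for (start, end), count in sorted(snps_per_window.items()):
    print(start, end, count)
```

Counting SNPs per window is also a useful check before interpreting a windowed statistic, since windows with very few sites tend to give noisy estimates.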
Identify Selection Statistic Outlier Windows:
- ANGSD_Fst_outliers.R -- identify outlier windows from final ANGSD_Fst.md output
- XP-CLR_outliers.R -- identify outlier windows from final XP-CLR.sbatch output
- REHH_Rsb_outliers.R -- identify outlier windows from final rehh.md output
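The cutoff these R scripts use to call a window an outlier is not stated in this README. As a hedged illustration of one common approach, the Python sketch below flags windows whose statistic falls in the top fraction of the empirical distribution. The 1% default, the function name `outlier_windows`, and the example values are placeholders, not the authors' settings or results.

```python
import math


def outlier_windows(stats, top_fraction=0.01):
    """Return window keys ranked in the top `top_fraction` by statistic.

    `stats` maps a window key, e.g. (scaffold, window_start), to a value.
    The threshold is purely empirical; at least one window is returned.
    """
    ranked = sorted(stats.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * top_fraction))
    return [key for key, _ in ranked[:n_keep]]


# Made-up values for ten windows on a hypothetical scaffold
values = [0.01, 0.02, 0.5, 0.03, 0.04, 0.9, 0.02, 0.01, 0.05, 0.03]
stats = {("scaf1", i * 10_000 + 1): v for i, v in enumerate(values)}
outliers = outlier_windows(stats, top_fraction=0.2)
print(outliers)
```

An empirical cutoff always returns some windows, even under neutrality, which is one reason to look for windows supported by more than one statistic.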
