Population genomics of big mice from the Faroe Islands: Hybridization, colonization, and a new challenge to identifying genomic targets of selection
Data files
Jun 23, 2025 version; files total 10.96 GB
- bwa_alignment.sh (786 B)
- README.md (3.02 KB)
- snp_calling.sh (1.92 KB)
- snps_domesticusMajor_norway7.vcf.gz (2.84 GB)
- snps_musculusMajor_norway6.vcf.gz (2.52 GB)
- snps_nolsoy8.vcf.gz (2.21 GB)
- snps_sandoy12.vcf.gz (3.38 GB)
Abstract
Populations that colonize islands provide unique insights into demography, adaptation, and the spread of invasive species. House mice on the Faroe Islands evolved exceptionally large bodies after colonization, generating interest from biologists going back to Darwin. To reconstruct the evolutionary history of these mice, we sequenced genomes of population samples from three Faroe Islands (Nolsoy, Sandoy, Mykines) and Norway as a mainland comparison. Mice from the Faroe Islands are inferred to be hybrids between the subspecies Mus musculus domesticus and M. m. musculus, with ancestry alternating along the genome. The mice are predominantly M. m. domesticus in origin, reflecting repeated backcrossing to this subspecies following initial hybridization. Analyses based on the site frequency spectrum of single nucleotide polymorphisms and the ancestral recombination graph indicate that mice spread to the Faroe Islands on a timescale consistent with dispersal by Norwegian Vikings, with colonization of Sandoy likely preceding colonization of Nolsoy. Substantial reductions in nucleotide diversity and effective population size associated with colonization are inferred, raising the prospect that mice on the Faroe Islands evolved large body size during periods of heightened genetic drift. Genomic scans for positive selection uncover windows with unusual site frequency spectra, but this pattern is mostly generated by clusters of singletons in individual mice. Variants showing evidence of selection based on the ancestral recombination graph in Nolsoy and Sandoy are enriched for genes with neurological functions. Our findings reveal a dynamic evolutionary history for the enigmatic mice from the Faroe Islands and emphasize the challenges that accompany population genomic inferences in island populations.
Dataset DOI: 10.5061/dryad.5qfttdzh4
Description of the data and file structure
There are four SNP VCF files, one for each of the following population groups:
- 12 unrelated wild mice from the Faroe island of Sandoy.
- 8 unrelated wild mice from the Faroe island of Nólsoy.
- 7 unrelated wild mice from Norway with domesticus (Mus musculus domesticus)-major ancestry (more domesticus ancestry than musculus or heterogeneous ancestry).
- 6 unrelated wild mice from Norway with musculus (Mus musculus musculus)-major ancestry (more musculus ancestry than domesticus or heterogeneous ancestry).
All Norway samples we sequenced are hybrids between domesticus and musculus. Tissue and DNA samples were provided by our collaborators. Illumina NovaSeq 150 bp paired-end sequencing was performed by the UW-Madison Genome Sequencing Center.
The files can be viewed and manipulated with BCFtools or VCFtools. Once decompressed, they can also be viewed in a text editor or with the Linux pagers 'more' or 'less'.
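For a quick look without BCFtools, the compressed files can also be streamed with Python's standard library. The following is a minimal sketch rather than part of the deposited scripts; the function names `iter_snps` and `first_snps` are hypothetical, and it assumes the files are readable as gzip (which also holds for bgzip-compressed files).

```python
import gzip
from itertools import islice


def iter_snps(path):
    """Yield (chrom, pos, ref, alt) for each data line of a gzipped VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information lines and the column header
            # The first five tab-separated VCF columns are CHROM, POS, ID, REF, ALT.
            chrom, pos, _id, ref, alt = line.split("\t", 5)[:5]
            yield chrom, int(pos), ref, alt


def first_snps(path, n=5):
    """Return the first n SNP records as a list of tuples."""
    return list(islice(iter_snps(path), n))
```

For example, `first_snps("snps_nolsoy8.vcf.gz", n=3)` would return the first three records of the Nólsoy file without reading the rest of it into memory.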
All tools used are open source software.
The SNP calling steps are described below, and the commands are provided in two scripts (bwa_alignment.sh and snp_calling.sh).
- Reads were mapped to the mm10 reference genome using bwa mem (bwa version 0.7.17).
- PCR and optical duplicates were marked and removed with Picard (version 2.24.0).
- GATK (version 4.2.3.0) was used to call and filter variants for each population separately, and short indels were removed.
- VCF files for all populations were merged together, and missing genotypes were identified for each population.
- Missing genotypes were filled in for each population using the GATK -all-sites option.
- Original genotype calls and missing genotype calls were merged together.
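To check how many genotype calls remain missing in one of the final files, the `./.` entries can be tallied per sample. The sketch below is an illustration only and is not taken from snp_calling.sh; the function name `count_missing_genotypes` is hypothetical, and it assumes GT is the first FORMAT subfield, as the VCF specification requires when GT is present.

```python
import gzip
from collections import Counter


def count_missing_genotypes(path, max_records=None):
    """Count missing genotype calls per sample in a gzipped VCF.

    A call is treated as missing when every allele in its GT field is '.'.
    Returns ({sample_name: n_missing}, n_records_scanned).
    """
    missing = Counter()
    samples = []
    n_records = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                samples = fields[9:]  # sample names follow the 9 fixed columns
                continue
            for name, call in zip(samples, fields[9:]):
                gt = call.split(":", 1)[0]  # GT is the first subfield
                alleles = gt.replace("|", "/").split("/")
                if all(allele == "." for allele in alleles):
                    missing[name] += 1
            n_records += 1
            if max_records is not None and n_records >= max_records:
                break
    return {name: missing[name] for name in samples}, n_records
```

Passing `max_records` limits the scan to the start of the file, which is useful for a spot check given that each VCF is several gigabytes.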
Files and variables
File: snps_domesticusMajor_norway7.vcf.gz
Description: SNPs for seven unrelated wild Norway mice whose genomes have majority domesticus ancestry.
File: snps_musculusMajor_norway6.vcf.gz
Description: SNPs for six unrelated wild Norway mice whose genomes have majority musculus ancestry.
File: snps_nolsoy8.vcf.gz
Description: SNPs for eight unrelated wild mice from the Faroe island of Nólsoy.
File: snps_sandoy12.vcf.gz
Description: SNPs for twelve unrelated wild mice from the Faroe island of Sandoy.
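After downloading, a simple sanity check is to confirm that each file's header lists the number of samples given in the descriptions above. The following sketch assumes the four files sit in a single directory; the names `EXPECTED_SAMPLES`, `sample_names`, and `check_sample_counts` are hypothetical and not part of the dataset.

```python
import gzip
import os

# Sample counts taken from the file descriptions above.
EXPECTED_SAMPLES = {
    "snps_domesticusMajor_norway7.vcf.gz": 7,
    "snps_musculusMajor_norway6.vcf.gz": 6,
    "snps_nolsoy8.vcf.gz": 8,
    "snps_sandoy12.vcf.gz": 12,
}


def sample_names(path):
    """Return the sample names from the #CHROM header line of a gzipped VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns after the 9 fixed VCF fields are sample names.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached data lines without seeing a header
    raise ValueError(f"no #CHROM header line found in {path}")


def check_sample_counts(directory=".", expected=EXPECTED_SAMPLES):
    """Return {filename: (found, expected)} for files whose counts differ."""
    mismatches = {}
    for filename, n_expected in expected.items():
        n_found = len(sample_names(os.path.join(directory, filename)))
        if n_found != n_expected:
            mismatches[filename] = (n_found, n_expected)
    return mismatches
```

An empty dictionary from `check_sample_counts()` means every header matches the stated sample count; only the header is read, so the check is fast even on the full-size files.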
Code/software
Please see the files bwa_alignment.sh and snp_calling.sh. The steps used to call SNPs with GATK follow standard procedure; see the GATK Best Practices: https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows
Access information
Other publicly accessible locations of the data:
- Raw sequencing reads are available from NCBI's Sequence Read Archive under the BioProject accession PRJNA1233310.
