Balancing selection maintains intraspecific diversity in a deep-sea fish
Data files
Nov 07, 2025 version files 556.51 KB
- grenadier.vcf (554.55 KB)
- README.md (1.96 KB)
Abstract
Segregating alleles in natural populations can be driven to fixation or loss by genetic drift or directional selection, or may be maintained in a polymorphic state by balancing selection. Balancing selection in a panmictic population is theoretically well established, but not widely understood at the molecular level. In this study, we focus on the evolutionary processes affecting non-synonymous variants at eight functionally relevant loci (based on candidate SNP genotyping) in a deep-sea fish species (Coryphaenoides rupestris) that lives across habitat zones ranging from ~200 m to ~2000 m depth. At each of these loci, one allele is predominant in the deeper water. Across a shallower depth range, we find that minor allele frequencies show a highly significant increase or decline progressively across five defined age categories. At single depths below a threshold depth, the deep-water allele declines in frequency with age. Together, these data indicate segregation to different depths, either shallow or deep, and balancing selection to retain variants needed for each depth range. This is supported by signals for long-term balancing selection at these loci (based on published genomic data). We discuss alternative interpretations and conclude that balancing selection, maintaining ecotype diversity, is the best supported mechanism.
Dataset DOI: 10.5061/dryad.34tmpg4xx
Description of the data and file structure
A total of 290 Coryphaenoides rupestris specimens were collected from eight depths ranging from 750 m to 1900 m and sequenced on a MiSeq (Illumina Inc.) with a 2 x 75 bp paired-end sequencing protocol. Raw reads were automatically de-multiplexed by the MiSeq (Illumina Inc.) Analysis Software using the individual-specific index barcodes. Paired-end reads were merged using FLASH (min overlap of 4 and max overlap of 50; Magoč and Salzberg 2011). Merged reads were then mapped to the compiled consensus sequences for the target loci using BWA-MEM v0.7.17-r1188 (Li and Durbin 2009). Mapped reads were converted from Sequence Alignment/Map (SAM) files to BAM files with SAMtools v1.13 (Li et al. 2009). Variable sites were identified using FreeBayes v1.3.6 (--haplotype-length 0 -kwVa --no-mnps --no-complex; Garrison and Marth 2012); the positions of all SNPs for each locus were recorded in a VCF file. Raw genotypes were extracted directly from the VCF using vcfR v1.15.0 (Knaus and Grünwald 2017) and custom R scripts.
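The genotype extraction step can be illustrated with a minimal sketch. It is not the authors' vcfR workflow or their custom R scripts; it is a plain-Python stand-in that assumes a standard tab-delimited VCF with a `GT` entry in the FORMAT column. The function name `parse_genotypes`, the toy records, and the sample names `ind1` to `ind3` are hypothetical and do not come from grenadier.vcf.

```python
def parse_genotypes(vcf_text):
    """Return {(chrom, pos): {sample: GT string}} from the text of a VCF file.

    Assumes a standard VCF layout: '##' meta lines, one '#CHROM' header
    line, then tab-delimited records whose FORMAT column contains 'GT'.
    """
    samples = []
    genotypes = {}
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        chrom, pos = fields[0], int(fields[1])
        gt_index = fields[8].split(":").index("GT")
        calls = {}
        for sample, entry in zip(samples, fields[9:]):
            parts = entry.split(":")
            # Trailing fields may be dropped for missing calls.
            calls[sample] = parts[gt_index] if gt_index < len(parts) else "."
        genotypes[(chrom, pos)] = calls
    return genotypes


# Toy input, invented for illustration only (not taken from grenadier.vcf).
toy_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3",
    "locusA\t101\t.\tA\tG\t50\t.\t.\tGT:DP\t0/0:12\t0/1:9\t1/1:15",
    "locusA\t140\t.\tC\tT\t40\t.\t.\tGT:DP\t0/1:8\t.:.\t0/0:11",
])

toy_genotypes = parse_genotypes(toy_vcf)
for site, calls in toy_genotypes.items():
    print(site, calls)
```

To apply the same function to the deposited file, the text could be read with `open("grenadier.vcf").read()`, provided the file follows the layout assumed above.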
Files and variables
File: grenadier.vcf
Description: A Variant Call Format (VCF) file that includes the genotypes of 12 loci across 290 specimens.
| Locus | Gene name |
|---|---|
| Crup_1041_33310 | ROCK1 |
| Crup_1041_367049 | EGFR1 |
| Crup_10763_28387 | b4galt2 |
| Crup_20180_17073 | adgalt2 |
| Crup_3193_255831 | EEFR1 |
| Crup_4264_107730 | ATG9 |
| Crup_5367_301064 | OBSL1 |
| Crup_9875_168788 | CAC1E |
| Crup_1964_323785 | LRRTM4 |
| Crup_3137_404746 | PKMYT1 |
| Crup_875_62292 | Grid1a |
Code/software
The file can be opened with any text editor or with any free software for viewing or handling VCF files.
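As a quick check after download, a short script can report how many samples and variant records the file contains. The sketch below uses only the Python standard library and assumes the usual VCF layout (nine fixed columns before the sample columns); the function name `summarize_vcf` and the toy lines are hypothetical.

```python
from collections import Counter


def summarize_vcf(lines):
    """Return (number of samples, {CHROM: number of records}) for VCF lines."""
    n_samples = 0
    per_chrom = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns after the 9 fixed VCF columns are sample names.
            n_samples = max(0, len(fields) - 9)
            continue
        per_chrom[fields[0]] += 1
    return n_samples, dict(per_chrom)


# Toy lines, invented for illustration only (not taken from grenadier.vcf).
toy_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3\n",
    "locusA\t101\t.\tA\tG\t50\t.\t.\tGT\t0/0\t0/1\t1/1\n",
    "locusA\t140\t.\tC\tT\t40\t.\t.\tGT\t0/1\t.\t0/0\n",
    "locusB\t77\t.\tG\tA\t60\t.\t.\tGT\t0/0\t0/0\t0/1\n",
]

summary = summarize_vcf(toy_lines)
print(summary)
```

For the deposited file, the same function accepts an open file handle, for example `with open("grenadier.vcf") as handle: print(summarize_vcf(handle))`. If the description above holds, the reported sample count should be 290.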
