Data from: The genetics of adaptation in freshwater Eurasian shad (Alosa)
Data files
May 08, 2026 version files (227.07 MB total)
- AllisShad_Aguieira_1_annotation.tar (32.59 MB)
- AllisShad_Aguieira_1.fasta.zip (194.48 MB)
- README.md (2.49 KB)
Abstract
This dataset contains genomic data and associated analyses used in the study "The genetics of adaptation in freshwater Eurasian shad (Alosa)." Included are whole-genome pool-seq datasets, population genomic analyses, genome scan results, and associated metadata used to investigate parallel adaptation to freshwater environments in Eurasian shad lineages.
Studying the genetics of phenotypic convergence can yield important insights into adaptive evolution. Here, we conducted a comparative genomic study of four lineages (species and subspecies) of anadromous shad (Alosa) that have independently evolved life cycles completed entirely in freshwater. Three of these lineages diverged naturally (A. fallax lacustris, A. f. killarnensis, and A. macedonica), and the fourth (A. alosa) was artificially landlocked during the last century. For this analysis, we assembled and annotated a draft A. alosa genome and generated whole-genome sequencing data for 16 anadromous and freshwater populations of shad. We found widespread evidence for parallel genetic changes in freshwater populations within lineages, whereas parallel genetic changes across lineages were comparatively rare.
Dataset DOI: 10.5061/dryad.6djh9w13n
Description of the data and file structure
Files and variables
File: AllisShad_Aguieira_1.fasta.zip
Description:
Compressed FASTA file containing the draft genome assembly of Alosa alosa generated in this study.
File: AllisShad_Aguieira_1_annotation.tar
Description:
Tar archive containing genome annotation files associated with the Alosa alosa draft genome assembly.
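Because the contents of the annotation archive are not itemized here, it can help to list its members before extracting anything. The following is a minimal sketch using only the Python standard library; the helper name `list_archive` is hypothetical, and the file formats inside the archive are not assumed.

```python
import tarfile


def list_archive(path):
    """Return (member_name, size_in_bytes) for every regular file in a tar archive."""
    # Mode "r:*" lets tarfile detect plain, gzip, bz2 or xz archives on its own.
    with tarfile.open(path, mode="r:*") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


# Example usage (assumes the archive has been downloaded to the working directory):
# for name, size in list_archive("AllisShad_Aguieira_1_annotation.tar"):
#     print(f"{size:>12,d}  {name}")
```

Listing the members first shows which annotation formats are present, which in turn determines which genome browser or parser is appropriate.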
Notes
The genome assembly and annotation correspond to the reference genome described in:
Sabatino SJ et al. (2022) The genetics of adaptation in freshwater Eurasian shad (Alosa). Ecology and Evolution.
Raw sequencing reads and additional genomic resources are available through NCBI under the accession numbers reported in the associated publication.
Code/software
The genome assembly file can be viewed using standard text editors or genome analysis software capable of reading FASTA format files. Annotation files can be viewed using common genome browsers and annotation tools supporting standard genomic annotation formats.
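As an illustration of reading the assembly without unpacking it to disk, the sketch below streams a FASTA member out of the zip file and reports per-sequence lengths and N50. It uses only the Python standard library. The function names are hypothetical, the member name inside the zip is not assumed (the first entry with a FASTA-like extension is used), and this is not the authors' workflow.

```python
import io
import zipfile


def read_fasta_lengths(handle):
    """Yield (sequence_id, length) for each record in a text-mode FASTA handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            # Keep only the identifier (text before the first whitespace).
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def n50(lengths):
    """Largest length L such that sequences of length >= L cover half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def fasta_lengths_from_zip(zip_path):
    """Read the first FASTA-like member of a zip archive and return its record lengths."""
    with zipfile.ZipFile(zip_path) as archive:
        members = [
            n for n in archive.namelist()
            if n.lower().endswith((".fasta", ".fa", ".fna"))
            and not n.startswith("__MACOSX/")  # skip macOS resource-fork entries
        ]
        if not members:
            raise ValueError("no FASTA member found in archive")
        with archive.open(members[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="ascii", errors="replace")
            return list(read_fasta_lengths(text))


# Example usage (assumes the zip has been downloaded to the working directory):
# records = fasta_lengths_from_zip("AllisShad_Aguieira_1.fasta.zip")
# print(len(records), "sequences; N50 =", n50(length for _, length in records))
```

Streaming the file line by line keeps memory use low, since only sequence lengths are retained rather than the sequences themselves.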
Analyses associated with the study used open-source bioinformatics software available at the time of publication:
- Genome assembly: ALLPATHS-LG
- Repeat annotation: RepeatMasker and RepeatModeler
- Transcript alignment and assembly: HISAT2 and Cufflinks
- Genome annotation: MAKER2 and GeneMark-ES
- Sequence alignment: BWA-MEM
- Variant analysis and population genomic analyses: SAMtools, FreeBayes, PoPoolation2, and pcadapt
Additional analyses used custom scripts and standard statistical workflows developed during the study.
Source code and complete computational workflows are not included in this repository.
Access information
Other publicly accessible locations of the data:
- Raw sequencing reads and associated genomic resources are available through NCBI/GenBank under the accessions reported in the associated publication.
Data was derived from the following sources:
- Wild-caught anadromous and freshwater populations of Eurasian shad (Alosa) sampled from multiple locations in Europe, as described in the associated publication.
A draft genome assembly for Alosa alosa was generated from Illumina paired-end and mate-pair sequencing libraries and assembled using ALLPATHS-LG. Whole-genome pooled sequencing (pool-seq) was conducted for 16 anadromous and freshwater populations representing four Eurasian shad lineages. Reads were quality-filtered with Trimmomatic and mapped to the A. alosa reference genome using BWA-MEM. Variants were identified using SAMtools, PoPoolation2, and FreeBayes.
Genome scans for loci potentially under selection were based on allele frequency differences (ΔAF) between anadromous and freshwater populations within each lineage, summarized with a sliding-window approach. Additional analyses included population genetic statistics, tests for parallelism within and among lineages, and evolutionary analyses of ATPase-α1 loci. Full methodological details are available in the associated publication.
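To make the ΔAF sliding-window idea concrete, the sketch below computes the absolute allele frequency difference per SNP and averages it in overlapping windows along one scaffold. Everything specific here is an assumption for illustration: the function names, the use of the absolute difference, the per-window mean as the summary statistic, and the window and step sizes, none of which are stated in this record. The associated publication gives the actual procedure.

```python
from bisect import bisect_left


def delta_af(freq_anadromous, freq_freshwater):
    """Absolute allele frequency difference between two pools at one SNP."""
    return abs(freq_freshwater - freq_anadromous)


def sliding_window_mean(positions, values, window, step):
    """Average `values` in windows [start, start + window) advancing by `step`.

    `positions` must be sorted ascending and refer to a single scaffold.
    Returns a list of (start, end, n_snps, mean_value); empty windows are skipped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    if not positions:
        return results
    start, last = positions[0], positions[-1]
    while start <= last:
        end = start + window
        lo = bisect_left(positions, start)  # first SNP at or after window start
        hi = bisect_left(positions, end)    # first SNP at or after window end
        if hi > lo:
            chunk = values[lo:hi]
            results.append((start, end, hi - lo, sum(chunk) / len(chunk)))
        start += step
    return results


# Toy example with made-up frequencies (not data from the study):
# pos = [1, 5, 12, 18]
# daf = [delta_af(a, f) for a, f in zip([0.1, 0.2, 0.5, 0.9], [0.3, 0.6, 0.5, 0.1])]
# for row in sliding_window_mean(pos, daf, window=10, step=5):
#     print(row)
```

Windows with consistently high mean ΔAF within a lineage would be candidates for closer inspection, and comparing such windows across lineages is one way to look for parallel changes.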
