High connectivity at abyssal depths: Genomic and proteomic insights into population structure of the pan-Atlantic deep-sea bivalve Ledella ultima (E. A. Smith, 1885)
Data files
Aug 21, 2025 version files 157.07 MB
2bRAD_pipeline.html (2.17 MB)
mito.csv (1.81 KB)
populations.snps.vcf (153.94 MB)
Proteomics_pipeline.html (952.83 KB)
README.md (4.85 KB)
ref_pop.csv (1.10 KB)
Abstract
The bivalve Ledella ultima is one of the most abundant protobranchs in the abyssal Atlantic, making it a valuable model organism for studying phylogeographic patterns and population connectivity. To examine the population structure of L. ultima across seven Atlantic basins spanning over 10000 km in latitude, we used single-nucleotide polymorphisms (SNPs) from 2b-RAD and proteomic fingerprinting by MALDI-TOF MS. Despite overall low genetic divergence, subtle genetic structure was detected by admixture analyses, supporting two source populations: one in the north and central Atlantic, and a second in the south Atlantic, with moderate admixture in the Brazil and Cape basins. Proteomic fingerprinting revealed two basin-separated groups with patterns distinct from the nuclear data, suggesting environmentally driven shifts in protein expression. Our findings underscore the value of integrating nuclear genomic and proteomic tools to decipher population connectivity at abyssal depths, where minimal genetic differentiation necessitates fine-scale analyses.
Dataset DOI: 10.5061/dryad.t1g1jwtdr
This README accompanies the manuscript:
"High Connectivity at Abyssal Depths: Genomic and Proteomic Insights into Population Structure of the Pan-Atlantic Deep-Sea Bivalve Ledella ultima (E. A. Smith, 1885)"
Description of the 2b-RAD data and file structure
Restriction site–associated DNA genotyping with type IIB restriction endonucleases (2b-RAD; restriction enzyme BcgI) was carried out on 93 samples of Ledella ultima. Raw 50-bp paired-end reads are deposited as fastq.gz files in the NCBI Sequence Read Archive (SRA) under accession numbers SAMN47739826–SAMN47739903 (BioProject PRJNA1245090). Raw reads were demultiplexed by internal barcodes, and PCR duplicates were removed using a custom script (https://github.com/pmartinezarbizu/2bRADpp). The file populations.snps.vcf was generated using Stacks (v 2.68; Rochette et al. 2019), applying a minimum quality score threshold of 30 to retain high-confidence reads. Stacks modules were applied as follows:
ustacks
  -m: tested with 3, 5 and 8; 5 was selected, ensuring sufficient marker density while minimizing potential genotyping errors
  -M: 2
  -N: 4
  Gapped alignment: disabled
cstacks
  catalog of loci built from a map file (see ref_pop.csv)
sstacks, gstacks
  matching of sequence data against the catalog to detect matching loci, and generation of genotype data
populations
  p1_r0.1
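The ustacks settings listed above can be written out as a command line. The following is a minimal Python sketch that only assembles the argument list; it does not run Stacks. The input file name, output directory and sample ID are hypothetical placeholders, and the authors' actual invocation may have differed.

```python
import shlex


def ustacks_command(fastq_path, out_dir, sample_id, min_depth=5):
    """Build the argument list for one ustacks run (paths and ID are placeholders)."""
    return [
        "ustacks",
        "-f", fastq_path,       # input reads for a single sample
        "-o", out_dir,          # output directory
        "-i", str(sample_id),   # unique integer ID for this sample
        "-m", str(min_depth),   # minimum depth of coverage to create a stack
        "-M", "2",              # maximum distance allowed between stacks
        "-N", "4",              # maximum distance to align secondary reads
        "--disable-gapped",     # gapped alignment disabled
    ]


# The three -m values that were tested; 5 was the value finally selected.
commands = {m: ustacks_command("sample_01.fq.gz", "stacks_out", 1, m) for m in (3, 5, 8)}
for m, cmd in commands.items():
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.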
Downstream analyses were performed using individual SNP information (based on the Stacks output populations.snps.vcf) in the custom R script 2bRAD_pipeline.html.
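For readers who want to inspect the VCF outside of R, the sketch below shows one way to read per-individual genotype calls using only the Python standard library. It is an illustration, not the authors' pipeline (which uses vcfR in R). It assumes diploid, biallelic SNP records with a GT entry in the FORMAT column; the function name and the example records are hypothetical.

```python
def parse_vcf_genotypes(lines):
    """Return (sample_names, loci) from an iterable of VCF text lines.

    loci is a list of (locus_id, calls); calls holds one alt-allele count
    per sample (0, 1 or 2) or None where the genotype is missing.
    """
    samples, loci = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        gt_index = fields[8].split(":").index("GT")
        calls = []
        for entry in fields[9:]:
            parts = entry.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                calls.append(None)  # missing genotype call
            else:
                calls.append(sum(1 for a in alleles if a != "0"))
        loci.append((f"{fields[0]}:{fields[1]}", calls))
    return samples, loci


# Hypothetical records in VCF layout, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind_A\tind_B\tind_C",
    "1\t12\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/0:9\t0/1:7\t./.:0",
    "2\t30\t.\tC\tT\t.\tPASS\t.\tGT:DP\t1/1:8\t0/0:6\t0/1:5",
]
samples, loci = parse_vcf_genotypes(example)
print(samples)
print(loci)
```

To apply it to the deposited file, pass an open file handle instead of the example list, e.g. `with open("populations.snps.vcf") as handle: samples, loci = parse_vcf_genotypes(handle)`.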
The following filters were applied:
Individuals with >75% missing genotype calls were excluded
Loci with >20% missing data were excluded
Singletons and non-polymorphic sites were excluded
15 individuals did not meet the filtering thresholds and were excluded from further analysis
After quality filtering and SNP calling, a total of 2048 polymorphic loci were retained across 78 individuals.
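The filtering thresholds above can be illustrated with a small Python sketch. It is not the authors' R code, and several details are assumptions made for the example: individuals are filtered before loci, genotypes are coded as alt-allele counts (0, 1, 2) with None for missing, and a singleton is taken to mean a site whose minor allele is observed in a single copy. The sample names and genotypes are invented.

```python
def missing_fraction(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def filter_genotypes(genotypes, max_ind_missing=0.75, max_locus_missing=0.20):
    """Return (kept_samples, kept_locus_indices).

    genotypes maps sample name -> list of alt-allele counts (None = missing).
    """
    # Step 1: drop individuals with more than 75% missing genotype calls.
    kept_samples = [
        name for name, calls in genotypes.items()
        if missing_fraction(calls) <= max_ind_missing
    ]
    if not kept_samples:
        return [], []

    n_loci = len(next(iter(genotypes.values())))
    kept_loci = []
    for j in range(n_loci):
        column = [genotypes[name][j] for name in kept_samples]
        # Step 2: drop loci with more than 20% missing data.
        if missing_fraction(column) > max_locus_missing:
            continue
        called = [g for g in column if g is not None]
        alt_count = sum(called)
        ref_count = 2 * len(called) - alt_count
        # Step 3: drop non-polymorphic sites (minor count 0) and singletons (1).
        if min(alt_count, ref_count) <= 1:
            continue
        kept_loci.append(j)
    return kept_samples, kept_loci


# Invented toy data: six individuals, five loci.
toy = {
    "ind_A": [0, 1, 0, None, 2],
    "ind_B": [0, 1, 0, 1, 0],
    "ind_C": [0, 0, 1, None, 2],
    "ind_D": [0, 2, 0, 0, 0],
    "ind_E": [None, None, None, None, 0],
    "ind_F": [0, 0, 0, 1, 1],
}
kept_samples, kept_loci = filter_genotypes(toy)
print(kept_samples)  # ind_E is dropped (80% missing)
print(kept_loci)     # [1, 4]
```

In the toy data, locus 0 is monomorphic, locus 2 is a singleton and locus 3 has 40% missing calls among the retained individuals, so only loci 1 and 4 survive.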
The file mito.csv was used to map mitotypes from the mitochondrial DNA haplotype network analyses to the analyzed SNP data.
Remarks:
The basin abbreviation MAE (Mid-Atlantic East) was subsequently changed to CVB (Cape Verde Basin)
The basin abbreviation MAW (Mid-Atlantic West) was subsequently changed to GUB (Guyana Basin)
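Because older labels may still appear in the data files, a small lookup can harmonize them with the abbreviations used in the manuscript. This is a minimal Python sketch; the function name and the example labels are hypothetical, and only the two renames stated above are encoded.

```python
# Old abbreviation -> abbreviation used subsequently.
BASIN_RENAMES = {"MAE": "CVB", "MAW": "GUB"}


def current_basin_code(code):
    """Map an old basin abbreviation to its current one; leave others unchanged."""
    return BASIN_RENAMES.get(code, code)


labels = ["MAE", "MAW", "CVB"]
print([current_basin_code(label) for label in labels])  # ['CVB', 'GUB', 'CVB']
```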
Code/software
Demultiplexed data were processed using Stacks (v 2.68; Rochette et al. 2019). All downstream analyses were performed using individual SNP information (based on the Stacks output populations.snps.vcf) in the custom R script provided (2bRAD_pipeline.html).
Data manipulation utilized the R packages vcfR (Knaus and Grünwald, 2017), adegenet (Jombart, 2008; Jombart and Ahmed, 2011), and SNPRelate (Zheng et al., 2012). Filtered VCF files were converted to genlight and genind objects for subsequent analyses.
Missing genotype data were imputed and reformatted for compatibility with the LEA package (Landscape and Ecological Association studies; Frichot and François, 2015). Population structure was inferred using the Sparse Nonnegative Matrix Factorization (sNMF) algorithm from LEA and additionally assessed using Bayesian clustering from the conStruct package (Bradburd, 2019). To assess the stability of the MCMC chains, the effective sample size (ESS) for each cluster in each chain was computed using the effectiveSize function from the coda package (Plummer et al., 2006). Convergence across chains was evaluated with the Gelman-Rubin diagnostic, for which the Potential Scale Reduction Factor (PSRF) was calculated and visualized.
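As a reminder of what the Gelman-Rubin diagnostic measures, the following Python sketch computes the basic PSRF for a single parameter from several chains of equal length. It is a textbook version written for illustration; the coda implementation used by the authors may apply additional corrections, so values need not match exactly. The chain values are invented.

```python
from math import sqrt
from statistics import mean, variance


def psrf(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of at least two equal-length lists of MCMC draws.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return sqrt(pooled / within)


# Invented draws from two short chains.
chains = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
print(round(psrf(chains), 4))  # 1.0247
```

Values close to 1 indicate that between-chain and within-chain variability agree, which is the usual informal criterion for convergence. The sketch does not handle chains with zero within-chain variance.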
Pairwise Nei’s genetic distances (Nei, 1972) were calculated at the individual level, based on allele frequencies derived from multilocus genotypes within and between basins, using the StAMPP package (Pembleton, Cogan, and Forster, 2013). A Discriminant Analysis of Principal Components (DAPC) was performed using the three main hierarchical clusters (based on Nei distances and Ward.D clustering) as the discriminant factor. The fixation index (FST) was calculated using Nei and Chesser’s (1983) corrected genetic differentiation statistic, which incorporates heterozygosity and adjusts for sample size biases, with the R package FinePop2 (Nakamichi, Kitada, and Kishino, 2020). Nei distances and FST were visualized as heatmaps using the pheatmap package (Kolde, 2019).
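For orientation, Nei's (1972) standard genetic distance is D = -ln(Jxy / sqrt(Jx * Jy)), where Jx, Jy and Jxy are the within- and between-sample gene identities summed over loci. The Python sketch below applies this to biallelic SNPs. It is an illustration, not the StAMPP implementation, and it assumes complete data; at the individual level, the alt-allele frequency of a genotype is taken as its alt-allele count divided by 2. The example frequencies are invented.

```python
from math import inf, log, sqrt


def nei_distance(p_x, p_y):
    """Nei's (1972) standard distance from alt-allele frequencies at biallelic loci."""
    if not p_x or len(p_x) != len(p_y):
        raise ValueError("need two non-empty frequency lists of equal length")
    # Gene identity within each sample, summed over loci.
    j_x = sum(p * p + (1 - p) ** 2 for p in p_x)
    j_y = sum(p * p + (1 - p) ** 2 for p in p_y)
    # Gene identity between the two samples, summed over loci.
    j_xy = sum(a * b + (1 - a) * (1 - b) for a, b in zip(p_x, p_y))
    if j_xy == 0:
        return inf  # no shared alleles at any locus
    return -log(j_xy / sqrt(j_x * j_y))


# Two invented individuals, genotypes coded as alt-allele counts (0, 1, 2).
ind_1 = [2, 0]
ind_2 = [2, 2]
freq_1 = [g / 2 for g in ind_1]
freq_2 = [g / 2 for g in ind_2]
print(round(nei_distance(freq_1, freq_2), 4))  # 0.6931, i.e. ln(2)
```

The two individuals are identical at the first locus and fixed for different alleles at the second, giving a normalized identity of 1/2 and therefore a distance of ln(2).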
Proteomic data
For details on the proteomic data, see the Materials & Methods section of the cited article. The pipeline for processing the proteomic data is provided in the file Proteomics_pipeline.html.
