Integration of genomic and ecologic methods inform management of an undescribed, yet highly exploited, sardine species
Data files
Feb 18, 2024 version files (20.51 MB):
- abiotic_data.zip (2.31 MB)
- Harengulasp_OccurrenceRecords.csv (8.89 KB)
- r_scripts.zip (9.87 KB)
- README.md (9.26 KB)
- Report-DHare21-6652.zip (18.17 MB)
Abstract
Assessing genetic diversity within species is key for conservation strategies in the context of human-induced biotic changes. This is important in marine systems where many species remain undescribed while being overfished, and conflicts between resource-users and conservation agencies are common. Combining niche modelling with population genomics can contribute to resolving those conflicts by identifying management units and understanding how past climatic cycles resulted in current patterns of genetic diversity. We addressed these issues in an undescribed but already overexploited species of sardine of the genus Harengula. We find that the species distribution is determined by salinity and depth, with a continuous distribution along the Brazilian mainland and two disconnected oceanic archipelagos. Genomic data indicate that such biogeographic barriers are associated with two divergent intraspecific lineages. Changes in habitat availability during the last glacial cycle led to different demographic histories among stocks. One coastal population experienced a 3.6-fold expansion, whereas an island-associated population contracted 3-fold, relative to the size of the ancestral population. Our results indicate that the island population should be managed separately from the coastal population, and that a Marine Protected Area covering part of the island population distribution can support the viability of this lineage.
This README file was generated by Jessica FR Coelho on 07-Feb-2024.
--------------------
GENERAL INFORMATION
--------------------
1. Title of Dataset: Integration of genomic and ecologic methods inform management of an undescribed, yet highly exploited, sardine species
DOI: 10.5061/dryad.np5hqbzzg
2. Author information:
Coelho, Jéssica Fernanda Ramos, Universidade Federal do Rio Grande do Norte, jessicovsky@gmail.com
Mendes, Liana, Universidade Federal do Rio Grande do Norte
Di Dario, Fabio, Universidade Federal do Rio de Janeiro
Carvalho, Pedro Hollanda, Universidade Federal do Rio de Janeiro
Dias, Ricardo, Universidade Federal do Rio de Janeiro
Lima, Sergio, Universidade Federal do Rio Grande do Norte
Verba, Julia, Ludwig Maximilian University of Munich
Pereira, Ricardo, Staatliches Museum für Naturkunde Stuttgart
3. Short summary of the study: We used niche modeling and population genomics to investigate how historical climate changes led to the current genetic diversity patterns of the tropical scaled-sardine Harengula sp. We observed that the distribution of this species is shaped by depth and salinity along the Atlantic Southwest and that these barriers are linked to two distinct populations: one spread along the Brazilian mainland coast and one on the oceanic island of Fernando de Noronha.
4. Date of data collection: 2021 (ecological data, from records deposited in online databases between 1946 and 2021) and 2020-2021 (fieldwork collection for genomic data)
5. Geographic location of data collection: Atlantic Southwest, coast of Brazil
6. Funding sources that supported the collection of the data: J.F.R.C. was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. Financial support to F.D.D. was provided by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico – PROTAX 443302/2020). SMQL receives a Conselho Nacional de Desenvolvimento Científico e Tecnologico (CNPq) productivity research grant (№ Proc 312066/2021-0). This study was developed in the context of the "Projeto MULTIPESCA- Ciência para a sustentabilidade da pesca, pescado e pescadores do Rio de Janeiro", which received support from the "Marine and Fisheries Research Project". The Marine and Fisheries Research Project is an offset measure established under a consent decree agreed between the company PRIO and the Federal Public Prosecutors’ Office in Rio de Janeiro. It is implemented by FUNBIO.
---------------------------------------------
Overview of folders/files and their contents
---------------------------------------------
Files List:
(1) Genomic data
* Reduced representation (DArTseq data) of the genome of 91 scaled-sardines of the genus Harengula.
(2) Ecological data
* Georeferenced occurrence records.
(3) R Scripts
* Scripts used to filter and analyze genomic and ecological data.
-----------------
(1) GENOMIC DATA
-----------------
The folder Report-DHare21-6652.zip contains the genomic data generated by Diversity Arrays Technology (DArTseq) for the Brazilian scaled sardines Harengula sp.
SNPs can be coded as: 0=homozygous for the reference allele;
1=heterozygous; and
2=homozygous for the alternate allele.
NA, blank cells or - [hyphen] indicate missing values.
SNPs can also be coded as: A/A, A/C, C/T, G/A and so on, with ‐/‐ indicating missing values.
Details on file columns are also available in Gruber et al. (2022).
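As an illustration of the numeric coding described above, the following minimal sketch maps raw cell values to genotype labels. It is written in Python for convenience, whereas the scripts in this repository are in R. The function name `decode_genotype` and the label strings are hypothetical and are not part of the dataset or of dartR.

```python
from typing import Optional

# Tokens the README lists as missing values: NA, blank cells, or a hyphen.
# "\u2010" is the typographic hyphen, included in case it survives export.
MISSING_TOKENS = {"", "NA", "-", "\u2010"}

# Numeric SNP codes as described in the README (labels are hypothetical).
GENOTYPE_LABELS = {
    "0": "hom_ref",  # homozygous for the reference allele
    "1": "het",      # heterozygous
    "2": "hom_alt",  # homozygous for the alternate allele
}


def decode_genotype(cell: str) -> Optional[str]:
    """Return a label for one genotype cell, or None if the call is missing."""
    value = cell.strip()
    if value in MISSING_TOKENS:
        return None
    if value not in GENOTYPE_LABELS:
        raise ValueError(f"unexpected genotype code: {cell!r}")
    return GENOTYPE_LABELS[value]


if __name__ == "__main__":
    row = ["0", "1", "2", "-", "NA", ""]
    print([decode_genotype(c) for c in row])
```

Raising on unknown codes, rather than silently treating them as missing, makes it easier to notice a file that uses the letter-pair coding (A/A, A/C, ...) instead of the numeric one.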
The files are:
I. Report_DHare21-6652_SilicoDArT_1.csv - 'silicodart' (presence/absence) dataset.
Columns are
CloneID: Identifier of the sequence tag that is unique for each tag
AlleleSequence: Refers to the DNA sequence of the tag found in samples with a genotype score of '1'
TrimmedSequence: Same as the full sequence, but with sequencing adapters trimmed out
CallRate: Ratio of samples where the genotype call is either '1' or '0' as opposed to '‐' [hyphen]
OneRatio: Ratio of samples where the genotype score is '1'
PIC: Polymorphism Information Content
AvgReadDepth: Average read depth, calculated by adding the read counts for all samples then dividing by the number of samples that have non-zero tag read counts
StDevReadDepth: Standard deviation read depth is the standard deviation of the count of tag reads for all samples that have non-zero tag read counts
Qpmr: Mean of normalized non-zero tag read counts divided by the standard deviation of normalized non-zero tag read counts
Reproducibility: Ratio of technical replicate assay pairs with consistent marker scores
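To make the CallRate and OneRatio definitions concrete, here is a small Python sketch (the repository's own scripts are in R) that recomputes both for one marker. Missing calls are represented as `None`. The README does not state the denominator for OneRatio, so computing it over called samples only is an assumption of this sketch; the function names are hypothetical.

```python
from typing import Optional, Sequence


def call_rate(scores: Sequence[Optional[int]]) -> float:
    """Fraction of all samples whose call is 0 or 1 (None marks a missing call)."""
    if not scores:
        raise ValueError("no samples")
    called = sum(1 for s in scores if s is not None)
    return called / len(scores)


def one_ratio(scores: Sequence[Optional[int]]) -> float:
    """Fraction of samples scored 1.

    Assumption: the denominator is the number of called samples,
    not the total number of samples.
    """
    called = [s for s in scores if s is not None]
    if not called:
        raise ValueError("no called samples")
    return sum(1 for s in called if s == 1) / len(called)


if __name__ == "__main__":
    marker = [1, 0, 1, None]  # made-up presence/absence scores for four samples
    print(call_rate(marker), one_ratio(marker))
```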
-> Asterisks and header rows in the dataset define the file format and are needed for software to read the file.
-> Cells from K6 to CW6 hold each individual's collection identifier; cells from K7 to CW7 identify individuals with location IDs, which are:
FNO Fernando de Noronha [island]
CE Ceara
RN Rio Grande do Norte
PB Paraiba
PE Pernambuco
AL Alagoas
BA Bahia
ABR Abrolhos [island]
RJ Rio de Janeiro
SP Sao Paulo
SC Santa Catarina
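The spreadsheet-style cell ranges used in this README can be turned into column indices when reading the files programmatically. The Python sketch below (helper names are hypothetical; the repository's own scripts are in R) converts column letters to 1-based indices and counts the columns in a range. Both the K to CW range given for the SilicoDArT file and the S to DE range given for the SNP file span 91 columns, which matches the 91 individuals in the dataset.

```python
def column_index(letters: str) -> int:
    """Convert spreadsheet column letters (A, ..., Z, AA, ...) to a 1-based index."""
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {ch!r}")
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def columns_in_range(first: str, last: str) -> int:
    """Number of columns from `first` to `last`, both inclusive."""
    return column_index(last) - column_index(first) + 1


if __name__ == "__main__":
    print(columns_in_range("K", "CW"))  # sample columns in the SilicoDArT file
    print(columns_in_range("S", "DE"))  # sample columns in the SNP file
```

Tools that use zero-based positions need `column_index(...) - 1`; for example, column K is position 10.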
II. Report_DHare21-6652_SNP_2.csv - 'SNP' (single nucleotide polymorphism) dataset.
Individuals in this dataset were ordered by sampling latitude. The columns CloneID, AlleleSequence, TrimmedSequence and CallRate are as defined above.
Cells from S7 to DE7 identify individuals with location IDs abbreviated as defined above.
Columns are
AlleleID: Same as CloneID with details on SNP position
SNP: Single Nucleotide polymorphism; refers to the mutation identified in the sequence tag
SnpPosition: Position of the SNP identified in the sequence tag (zero-based: 0 denotes the first base)
OneRatioRef: Ratio of samples where genotype score is zero (0)
OneRatioSnp: Ratio of samples where genotype score is two (2)
FreqHomRef: Frequency of homozygotes for the reference allele
FreqHomSnp: Frequency of homozygotes for the mutation (SNP)
FreqHets: Frequency of heterozygotes (genotype score is one (1))
PICRef: Polymorphism Information Content for the reference allele
PICSnp: Polymorphism Information Content for the mutation (SNP)
AvgPIC: Mean of Polymorphism Information Content of both reference and SNP alleles
AvgCountRef: Sum of the tag read counts for all samples, divided by the number of samples for which the tag read count is not zero for the reference allele
AvgCountSnp: Sum of the tag read counts for all samples, divided by the number of samples for which the tag read count is not zero for the SNP allele
RepAvg: Ratio of technical replicate assay pairs with consistent marker scores
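The three genotype-frequency columns can be cross-checked against the 0/1/2 genotype scores. The Python sketch below is illustrative only (the repository's scripts are in R, and `genotype_frequencies` is a hypothetical name). It assumes that frequencies are taken over samples with a non-missing call, which the README does not state explicitly.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def genotype_frequencies(scores: Sequence[Optional[int]]) -> Dict[str, float]:
    """Genotype frequencies for one SNP from 0/1/2 scores (None = missing).

    Assumption: the denominator is the number of called samples.
    """
    called = [s for s in scores if s is not None]
    if not called:
        raise ValueError("no called genotypes")
    counts = Counter(called)
    n = len(called)
    return {
        "FreqHomRef": counts[0] / n,  # score 0: homozygous reference
        "FreqHets": counts[1] / n,    # score 1: heterozygous
        "FreqHomSnp": counts[2] / n,  # score 2: homozygous for the SNP allele
    }


if __name__ == "__main__":
    snp = [0, 0, 1, 2, None]  # made-up scores for five samples
    print(genotype_frequencies(snp))
```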
III. SampleFile-DHare21-6652-my_metadata.csv - Metadata associated with the SNP dataset; required when loading the above dataset as a genlight object in R.
Columns are
id Unique individual identification
pop Population abbreviation (as defined above, sampling site collection)
lat Latitude of sample collection in decimal degrees
lon Longitude of sample collection in decimal degrees
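Before loading the metadata alongside the SNP file, it can help to confirm that the four columns are present and that every population code is one of the location IDs listed earlier. The following Python sketch shows one way to do this with the standard library (the repository's own scripts are in R). The function name and the example rows are hypothetical, and the coordinates are made up for illustration.

```python
import csv
import io
from typing import List, TextIO

# Location IDs as listed in this README.
KNOWN_POPS = {"FNO", "CE", "RN", "PB", "PE", "AL", "BA", "ABR", "RJ", "SP", "SC"}
REQUIRED_COLUMNS = ["id", "pop", "lat", "lon"]


def check_metadata(handle: TextIO) -> List[str]:
    """Return a list of human-readable problems found in a metadata CSV."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["pop"] not in KNOWN_POPS:
            problems.append(f"line {line_no}: unknown pop {row['pop']!r}")
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
        except ValueError:
            problems.append(f"line {line_no}: non-numeric coordinates")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"line {line_no}: coordinates out of range")
    return problems


if __name__ == "__main__":
    # Made-up rows, not taken from the real file.
    sample = io.StringIO("id,pop,lat,lon\nind01,FNO,-3.85,-32.42\nind02,XX,-5.0,-35.0\n")
    print(check_metadata(sample))
```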
--------------------
(2) ECOLOGICAL DATA
--------------------
Harengulasp_OccurrenceRecords.csv comprises the dataset of occurrence records used in the ecological niche model.
Columns are species name, latitude, and longitude; coordinates are in decimal degrees, as in the metadata file above.
Climate data were downloaded from MARSPEC and can be accessed in R using the script "enm.txt", available in this repository and listed below.
Folder abiotic_data.zip contains the climate data in two folders:
I. current_bioregion: climate data for the present climate, cropped to marine bioregions as in Spalding et al. (2007);
bathy Depth of sea floor (meters, scaling factor: 1x)
slope Bathymetric slope (degrees, scaling factor: 1x)
sss_mean Mean Annual Sea Surface Salinity (practical salinity unit, scaling factor: 100x)
sst_mean Mean Annual Sea Surface Temperature (degrees Celsius, scaling factor: 100x)
II. lgm_bioregion: climate data for the Last Glacial Maximum (LGM) climate, cropped to marine bioregions as in Spalding et al. (2007).
bathy Depth of sea floor (meters, scaling factor: 1x)
slope Bathymetric slope (degrees, scaling factor: 1x)
sss_mean Mean Annual Sea Surface Salinity (practical salinity unit, scaling factor: 100x)
sst_mean Mean Annual Sea Surface Temperature (degrees Celsius, scaling factor: 100x)
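The scaling factors listed for each layer matter when interpreting raster values. The Python sketch below (the repository's own scripts are in R) assumes that a scaling factor of 100x means the stored value equals the physical value multiplied by 100, so that dividing recovers the physical units. That interpretation, the function name, and the example values are assumptions of this sketch.

```python
# Scaling factors as listed in this README for both climate folders.
SCALING_FACTORS = {
    "bathy": 1,       # metres
    "slope": 1,       # degrees
    "sss_mean": 100,  # practical salinity units
    "sst_mean": 100,  # degrees Celsius
}


def to_physical_units(layer: str, stored_value: float) -> float:
    """Convert a stored raster value to physical units.

    Assumption: stored_value = physical_value * scaling factor.
    """
    if layer not in SCALING_FACTORS:
        raise KeyError(f"unknown layer: {layer!r}")
    return stored_value / SCALING_FACTORS[layer]


if __name__ == "__main__":
    # Made-up stored values for a single grid cell.
    print(to_physical_units("sst_mean", 2750))
    print(to_physical_units("sss_mean", 3600))
    print(to_physical_units("bathy", -50))
```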
---------------
(3) R Scripts
---------------
Folder r_scripts.zip:
For genomic analyses, R scripts (.txt format) follow the order:
1. filtering.txt
2. outlier_removal.txt
3. pcoa_colors.txt
4. pop_analyses.txt
Ecological niche models:
1. enm.txt
Extract all files from the .zip folder, then copy and paste the text of each .txt file into R to run the analyses.
------------
REFERENCES
------------
Gruber B, Georges A, Mijangos JL, Pacioni C, Unmack PJ, Berry O, Clark LV, Devloo-Delva F, Archer E. Importing and analysing SNP and Silicodart data generated by genome-wide restriction fragment analysis. Package: dartR 2022 v. 2.7.2. https://green-striped-gecko.github.io/dartR/
Spalding MD, Fox HE, Allen GR, Davidson N, Ferdaña ZA, Finlayson MA, Halpern BS, Jorge MA, Lombana AL, Lourie SA, Martin KD. Marine ecoregions of the world: a bioregionalization of coastal and shelf areas. BioScience. 2007 Jul 1;57(7):573-83.
Coelho, Jéssica Fernanda Ramos et al. (2024). Larval dispersal and climate models provide insights into present and future distribution of a tropical sardine. Marine Biology Research. https://doi.org/10.1080/17451000.2024.2309562
Coelho, Jéssica Fernanda Ramos; Mendes, Liana de Figueiredo; Di Dario, Fabio et al. (2024). Integration of genomic and ecological methods inform management of an undescribed, yet highly exploited, sardine species. Proceedings of the Royal Society B: Biological Sciences. https://doi.org/10.1098/rspb.2023.2746
