
Data from: Restriction site-associated DNA sequencing reveals local adaptation despite high levels of gene flow in Sardinella lemuru (Bleeker, 1853) along the northern coast of Mindanao, Philippines

Cite this dataset

Labrador, Kevin et al. (2022). Data from: Restriction site-associated DNA sequencing reveals local adaptation despite high levels of gene flow in Sardinella lemuru (Bleeker, 1853) along the northern coast of Mindanao, Philippines [Dataset]. Dryad.


Stock identification and delineation are important in the management and conservation of marine resources. These were highlighted as priority research areas for Bali sardinella (Sardinella lemuru) which is among the most commercially important fishery resources in the Philippines. Previous studies have already assessed the stocks of S. lemuru between Northern Mindanao Region (NMR) and Northern Zamboanga Peninsula (NZP), yielding conflicting results. Phenotypic variation suggests distinct stocks between the two regions, while mitochondrial DNA did not detect evidence of genetic differentiation for this high gene flow species. This paper tested the hypothesis of regional structuring using genome-wide single nucleotide polymorphisms (SNPs) acquired through restriction-site associated DNA sequencing (RADseq). We examined patterns of population genomic structure using a full panel of 3,573 loci, which was then partitioned into a neutral panel of 3,348 loci and an outlier panel of 31 loci. Similar inferences were obtained from the full and neutral panels, which were contrary to the inferences from the outlier panel. While the full and neutral panels suggested a panmictic population (global FST ~ 0, p > 0.05), the outlier panel revealed genetic differentiation between the two regions (global FST = 0.161, p = 0.001; FCT = 0.263, p < 0.05). This indicated that while gene flow is apparent, selective forces due to environmental heterogeneity between the two regions play a role in maintaining adaptive variation. Annotation of the outlier loci returned five genes that were mostly involved in organismal development. Meanwhile, three unannotated loci had allele frequencies that correlated with sea surface temperature. Overall, our results provided support for local adaptation despite high levels of gene flow in S. lemuru. Management therefore should not only focus on demographic parameters (e.g., stock size, catch volume), but also consider the preservation of adaptive variation.


Sardinella lemuru samples were collected from embayments found along the northern coast of Mindanao covering two geopolitical regions, Northern Mindanao Region (NMR) and Northern Zamboanga Peninsula (NZP). Dorsal muscle tissue was excised from each sample, preserved in a nucleotide stabilization solution, and then sent to Beijing Genomics Institute (BGI) for DNA extraction, library preparation, and restriction site-associated DNA sequencing (RADseq).

The genomic data were processed using the STACKS bioinformatics pipeline. Optimization was first performed on a subset of samples to determine the optimal genotyping parameters according to several criteria. These parameters were then applied to the full dataset to obtain the single nucleotide polymorphisms (SNPs) for population genomics analyses.

The SNPs were subjected to additional filters (e.g., missing data, Hardy-Weinberg equilibrium, linkage disequilibrium), resulting in the full panel. This panel was then subjected to several outlier screening methods (Arlequin, BayeScan, fstHet, OutFLANK, and pcadapt). Loci identified as outliers by at least one method were removed from the full panel to generate the neutral panel, while loci detected as outliers by at least two methods were used to generate the outlier panel.
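The panel definitions above can be read as a simple vote count per locus. The following Python sketch illustrates that rule only; it is not the authors' R workflow, and the function name `partition_panels`, the locus IDs, and the method calls are hypothetical. Note that a locus flagged by exactly one method lands in neither panel, which is consistent with the neutral and outlier panels together (3,348 + 31) being smaller than the full panel (3,573).

```python
from collections import Counter


def partition_panels(full_panel, outlier_calls):
    """Split loci by how many screening methods flagged them.

    full_panel    : list of locus IDs that passed filtering
    outlier_calls : dict mapping method name -> iterable of flagged locus IDs
    Returns (neutral, outlier) lists, preserving the order of full_panel.
    """
    in_panel = set(full_panel)
    votes = Counter()
    for flagged in outlier_calls.values():
        # Count each method at most once per locus; ignore loci outside the panel.
        votes.update(set(flagged) & in_panel)

    neutral = [locus for locus in full_panel if votes[locus] == 0]  # flagged by no method
    outlier = [locus for locus in full_panel if votes[locus] >= 2]  # flagged by two or more
    return neutral, outlier


# Hypothetical example: five loci screened by the five methods named in the text.
full_panel = ["loc1", "loc2", "loc3", "loc4", "loc5"]
outlier_calls = {
    "Arlequin": ["loc2"],
    "BayeScan": ["loc2", "loc4"],
    "fstHet": [],
    "OutFLANK": ["loc2"],
    "pcadapt": ["loc5"],
}

neutral, outlier = partition_panels(full_panel, outlier_calls)
print(neutral)  # ['loc1', 'loc3']
print(outlier)  # ['loc2']
```

Here `loc4` and `loc5` are each flagged once, so they are excluded from the neutral panel but do not qualify for the outlier panel.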

The three SNP panels (full, neutral, outlier) were then subjected to population genomics analyses: calculation of genetic diversity (e.g., number of alleles, observed and expected heterozygosity), determination of the spatial pattern of genetic structure using ordination methods (e.g., PCA, DAPC), calculation of the magnitude of genetic differentiation using pairwise FST, and testing of hypotheses of genetic structure using AMOVA.
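To make the diversity and differentiation statistics concrete, here is a minimal Python sketch for a single biallelic SNP. It assumes genotypes are coded as the number of copies of one allele (0, 1, or 2) and uses a simple heterozygosity-based FST, (HT - HS) / HT, with equal weighting of the two populations. This is an illustrative choice, not necessarily the estimator used in the study, and all function names and genotype values are hypothetical.

```python
def allele_freq(genotypes):
    """Frequency of the counted allele; genotypes are 0, 1, or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def observed_het(genotypes):
    """Proportion of heterozygous individuals (genotype code 1)."""
    return sum(1 for g in genotypes if g == 1) / len(genotypes)


def expected_het(p):
    """Expected heterozygosity under Hardy-Weinberg for a biallelic locus."""
    return 2 * p * (1 - p)


def simple_fst(pop_a, pop_b):
    """Heterozygosity-based FST = (HT - HS) / HT, populations weighted equally."""
    p_a, p_b = allele_freq(pop_a), allele_freq(pop_b)
    h_s = (expected_het(p_a) + expected_het(p_b)) / 2  # mean within-population
    h_t = expected_het((p_a + p_b) / 2)                # pooled allele frequency
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical genotypes for four individuals per population.
same_a, same_b = [0, 1, 2, 1], [1, 0, 1, 2]    # identical allele frequencies
fixed_a, fixed_b = [2, 2, 2, 2], [0, 0, 0, 0]  # fixed for alternative alleles

print(observed_het(same_a))                # 0.5
print(expected_het(allele_freq(same_a)))   # 0.5
print(simple_fst(same_a, same_b))          # 0.0
print(simple_fst(fixed_a, fixed_b))        # 1.0
```

The two extremes bracket the values reported in the abstract: an FST near 0 for the full and neutral panels, and a clearly positive value for the outlier panel.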

Loci in the outlier panel were also extracted from the STACKS catalog and identified using NCBI-BLAST. Functional annotation of the identified sequences was done using the UniProt database. In addition, the association between the outlier loci and the environment was examined by correlating allele frequencies with remotely sensed oceanographic parameters, namely sea surface temperature (SST), chlorophyll-a (chl-a), and particulate organic carbon (POC).
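The allele frequency and environment association can be sketched as a per-locus correlation across sampling sites. The Python example below computes a Pearson coefficient by hand; the source does not state which correlation method was used, so Pearson is an assumption here, and the site values for SST and allele frequency are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-site values: mean SST (deg C) and the frequency of one
# allele at a single outlier locus, one entry per sampling site.
sst = [28.0, 28.5, 29.0, 29.5]
allele_frequency = [0.10, 0.20, 0.30, 0.40]

r = pearson_r(sst, allele_frequency)
print(round(r, 3))  # close to 1.0 for this perfectly linear toy example
```

The same function could be applied to chl-a or POC values in place of SST. With only a handful of sites, a coefficient like this is descriptive at best and would need a significance test and correction for multiple loci before being interpreted.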

Usage notes

A. DATA_Sample_Metadata.csv - contains the sampling location and collection date

B. DATA_Stacks_Optimization - contains all the files used for visual diagnostics to guide selection of the optimal genotyping parameters in STACKS

C. DATA_SNP_Panels - contains the SNP panels (in GENEPOP format) used in the analyses

D. SCRIPT_PopSNPs - contains all information pertinent to the analyses done
        /1_PopSNPs.Rmd - the R Markdown file
        /2_PopSNPs.html - output of the markdown file in .html format


Philippine Council for Agriculture, Aquatic and Natural Resources Research and Development

The Marine Science Institute (Thesis Writing Grant)