Skip to main content

Introgressive hybridization in the west Pacific pen shells (genus Atrina): Restricted interspecies gene flow within the genome

Cite this dataset

Sekino, Masashi et al. (2023). Introgressive hybridization in the west Pacific pen shells (genus Atrina): Restricted interspecies gene flow within the genome [Dataset]. Dryad.



A compelling interest in marine biology is to elucidate how species boundaries between sympatric free‐spawning marine invertebrates such as bivalve molluscs are maintained in the face of potential hybridization. Hybrid zones provide the natural resources for us to study the underlying genetic mechanisms of reproductive isolation between hybridizing species. Against this backdrop, we examined the occurrence of introgressive hybridization (introgression) between two bivalves distributed in the western Pacific margin, Atrina japonica and Atrina lischkeana, based on single‐nucleotide polymorphisms (SNPs) derived from restriction site‐associated DNA sequencing. Using 1066 ancestry‐informative SNP sites, we also investigated the extent of introgression within the genome to search for SNP sites with reduced interspecies gene flow. A series of our individual‐level clustering analyses including the principal component analysis, Bayesian model‐based clustering, and triangle plotting based on ancestry–heterozygosity relationships for an admixed population sample from the Seto Inland Sea (Japan) consistently suggested the presence of specimens with varying degrees of genomic admixture, thereby implying that the two species are not completely isolated. The Bayesian genomic cline analysis identified 10 SNP sites with reduced introgression, each of which was located within a genic region or an intergenic region physically close to a functional gene. No, or very few, heterozygotes were observed at these sites in the hybrid zone, suggesting that selection acts against heterozygotes. Accordingly, we raised the possibility that the SNP sites are within genomic regions that are incompatible between the two species. Our finding of restricted interspecies gene flow at certain genomic regions gives new insight into the maintenance of species boundaries in hybridizing broadcast‐spawning molluscs.


The SNP data were obtained by RAD sequencing (restriction enzyme: Sbf I). We followed Sekino et al. (2016) for RAD library construction. The library and Phi X control (Illumina; 25% at a molar concentration ratio) were subjected to short-read sequencing with a NextSeq 500 sequencer in combination with NextSeq 500 High Output Kit (Illumina; 75 cycles of single-end sequencing).

We truncated the resulting short reads to 64 bases including partial Sbf I-recognition sequence (six bases) using the subprogram process_radtags of Stacks v1.35 or higher (Catchen et al. 2011). Thus, the subsequent variant calling was based on the remaining 58 bases (64 - 6). With FASTX-Toolkit v0.0.14 (, low quality reads, which had a base-quality score of less than 20 at 5% or more of the bases, were discarded. Reads retained after this filtering were mapped onto a reference genome of Atrina japonica (Sekino et al. 2023; DDBJ/EMBL/GenBank accession numbers, BROG01000001‒BROG01003391; Sequence Read Archive, DRR380487 and DRR380488) based on the MEM algorithm available in BWA v0.7.12 (Li & Durbin 2010) with default parameters. After extracting mapped reads (the subprogram view in SAMtools v0.1.19; Li et al. 2009), we set aside reads with alternative hits (“XA” tag) and chimeric reads (“SA” tag). The resulting data (bam format) were converted to the mpileup format (the subprogram mpileup of SAMtools). Based on the mpileup data, we performed variant calling using the subprogram mpileup2snp of VarScan 2 v2.4.4 (Koboldt et al. 2012). In this computation, the following parameters were set for calling a variant at a position: minimum base-quality score, 30; minimum number of reads that supported a variant, 5; threshold significance value to call a variant (Fisher’s exact test), 0.05. After variant calling and initial filtering, we performed more rigorous SNP filtering as follows (see also Sekino et al. 2023):

1. Biallelic sites were selected (VCFtools v0.1.16; Danecek et al. 2011).

2. For each specimen, sites with the coverage depth of < 30 were rejected (VCFtools).

3. Sites with the coverage depth of ≤ D+3√D, where D is the average depth over all SNP sites and specimens, were allowed (Li 2014).

4. Specimens with more than 10% missing genotypes were omitted.

5. Sites with minor allele frequency of > 0.05 across the population samples were selected (VCFtools).

6. Sites that were available in 90% or more of the specimens across the population samples as well as in each population sample were retained.

7. Sites were thinned so that neighboring sites were at least 1 kb apart in a contig of the reference genome.

8. Sites that were out of Hardy-Weinberg equilibrium (HWE) were removed (an exact test available in VCFtools; critical P of 0.05 without correction of significance level for multiple simultaneous comparisons). The HWE filtering was applied to two population samples of putatively pure species (A. japonica, HKDT; A. lischkeana, GOTO). Sites that failed to meet HWE in either sample or both were omitted.

9.  LD pruning with BCFtools version 1.8 (Li, 2011) was based on the LD statistic of r2 (threshold r2 = 0.1) estimated for pairs of sites with MAF of >0.05 in each of the HKDT and GOTO samples.

10. If a SNP site was in a contig that had no Sbf I-recognition sequence in the abovementioned A. japonica reference genome, the site was excluded.



  • Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH (2011) Stacks: building and genotyping loci de novo from short-read sequences. G3, 1, 171–182.
  • Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group (2011) The variant call format and VCFtools. Bioinformatics, 27, 2156–2158.
  • Koboldt DC, Zhang Q, Larson DE,1 Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK (2012) VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research, 22, 568–576.
  • Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078–2079.
  • Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, 26, 589–595.
  • Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27, 2987–2993.
  • Li H (2014) Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics, 30, 2843–2851.
  • Sekino M, Nakamichi R, Iwasaki Y, Tanabe AS, Fujiwara A, Yasuike M, Shiraishi M, Saitoh K (2016) A new resource of single nucleotide polymorphisms in the Japanese eel Anguilla japonica derived from restriction site-associated DNA. Ichthyological Research, 63, 496–504.
  • Sekino M,  Hashimoto K, Nakamichi R, Yamamoto M, Fujinami Y, Sasaki T (2023) Introgressive hybridization in the west Pacific pen shells (genus Atrina): restricted interspecies gene flow within the genome. Molecular Ecology.


Japan Society for the Promotion of Science, Award: JSPS KAKENHI 16K07859