Data from: Possible involvement of ghost introgressions in the striking diversity of Vomeronasal type 1 receptor genes in East African cichlids
Data files
Version of Jun 10, 2025 (13.31 MB in total)
- cichlid_V1R1_phased.fasta (1.27 MB)
- cichlid_V1R1_unphased.fasta (929.11 KB)
- cichlid_V1R2_phased.fasta (1.27 MB)
- cichlid_V1R2_unphased.fasta (909.55 KB)
- cichlid_V1R3_phased.fasta (1.28 MB)
- cichlid_V1R3_unphased.fasta (934 KB)
- cichlid_V1R4_phased.fasta (1.16 MB)
- cichlid_V1R4_unphased.fasta (916.22 KB)
- cichlid_V1R5_phased.fasta (1.38 MB)
- cichlid_V1R5_unphased.fasta (1.08 MB)
- cichlid_V1R6_phased.fasta (1.29 MB)
- cichlid_V1R6_unphased.fasta (896 KB)
- README.md (1.90 KB)
Abstract
Cichlids that have undergone adaptive radiation are genetically closely related yet exhibit extreme ecological and morphological diversity, making them useful for understanding speciation mechanisms. Vomeronasal type 1 receptors (V1Rs) are highly conserved among teleost fish at the amino acid sequence level and are believed to play a fundamental role in reproduction. We previously reported surprisingly high sequence diversity of V1Rs among certain cichlid species, suggesting a possible role for V1Rs in their speciation. In this study, we investigated the evolutionary diversification of all six V1Rs (V1R1-6) using genome data from 528 cichlid species, encompassing nearly all lineages. In the case of V1R2, two highly divergent alleles (1.17%: variant sites/coding sequence (CDS) length) without recombination were preserved and shared among cichlids found in all of the East African Great Lakes. In the case of V1R6, numerous highly variable alleles that could be derived from multiple recombination events between two highly divergent alleles (1.39%: variant sites/CDS length) were found among the Lake Victoria cichlids. Additionally, we identified highly divergent alleles of V1R1 within the tribe Tropheini, and of both V1R3 and V1R6 within Trematocarini and Ectodini. Because one of the two divergent alleles of each of these V1Rs emerged rapidly during cichlid evolution, these alleles were likely derived from introgression. However, despite extensive investigation, we could not identify the source lineages of these introgressions, implying that they may have become extinct. This study revealed the potential role of introgression in explaining the remarkable diversity of V1Rs in East African cichlids.
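The divergence figures in the abstract (for example 1.17% for V1R2) are the number of variant sites divided by CDS length, expressed as a percentage. A minimal Python sketch of that ratio follows, assuming the alleles are already aligned to the same CDS coordinates. The function name `percent_variant_sites` and the choice to ignore `N` and gap characters are our own assumptions, not the authors' code.

```python
from typing import Sequence


def percent_variant_sites(alleles: Sequence[str], ignore: str = "N-") -> float:
    """Return 100 * (variant sites / CDS length) for aligned CDS alleles.

    Assumption (not from the source): a column counts as a variant site if it
    holds more than one distinct nucleotide after dropping the characters in
    ``ignore`` (ambiguous bases and alignment gaps by default).
    """
    if len(alleles) < 2:
        raise ValueError("need at least two alleles")
    length = len(alleles[0])
    if any(len(s) != length for s in alleles):
        raise ValueError("alleles must be aligned to the same CDS length")
    skip = set(ignore.upper())
    variant_sites = 0
    # Walk the alignment column by column.
    for column in zip(*(s.upper() for s in alleles)):
        bases = {b for b in column if b not in skip}
        if len(bases) > 1:
            variant_sites += 1
    return 100.0 * variant_sites / length


# Toy example: two 6-bp alleles differing at one position -> 1/6 of sites.
print(round(percent_variant_sites(["ATGCTA", "ATGTTA"]), 2))  # 16.67
```

Because the ratio is taken over all CDS positions, the same number of variant sites yields a smaller percentage for a longer gene.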
Dataset DOI: 10.5061/dryad.18931zd8c
Files
*_unphased.fasta
We used short-read data from 907 samples representing 528 species and 28 tribes of cichlids deposited in the NCBI Sequence Read Archive (Figure S1, Table S1). Low-quality reads were removed using fastp v0.23.2 (Chen et al., 2018), and the filtered reads were mapped to the reference genome (Table S2) using bwa-mem2 v2.2.1 (Vasimuddin et al., 2019); the resulting alignments were sorted and indexed using samtools v1.15 (Danecek et al., 2021). The V1R gene regions were then extracted from the sorted BAM files using samtools v1.15 (Danecek et al., 2021). Variants were called from the extracted BAM files using the HaplotypeCaller tool of GATK v4.3.0 (McKenna et al., 2010). The first round of hard filtering was performed using the VariantFiltration tool of GATK v4.3.0 (McKenna et al., 2010), flagging variants that met any of the following criteria: SNPs (QD < 2.0, QUAL < 30.0, SOR > 4.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, or ReadPosRankSum < -8.0) and INDELs (QD < 2.0, QUAL < 30.0, FS > 200.0, SOR > 10.0, or ReadPosRankSum < -20.0). Based on the resulting VCF file, base quality scores in the BAM files were recalibrated with the BaseRecalibrator and ApplyBQSR tools of GATK v4.3.0 (McKenna et al., 2010). A second round of hard filtering was then performed with the same parameters as the first. Finally, the filtered VCF files were used to generate V1R gene sequences in FASTA format using bcftools v1.15 (Danecek et al., 2021).
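The hard-filtering thresholds above can be restated as plain predicates. The sketch below is illustrative only: it is not how GATK VariantFiltration is invoked, and the names `SNP_FILTERS`, `INDEL_FILTERS`, and `failed_filters` are hypothetical. It assumes each variant's annotations are available as a dictionary (QUAL would come from the VCF QUAL column, the others from INFO) and treats a missing annotation as passing, which is our understanding of GATK's default behavior.

```python
from typing import Callable, Dict, List

# Thresholds copied from the text; a variant FAILS a filter when the predicate is True.
SNP_FILTERS: Dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v < 2.0,
    "QUAL": lambda v: v < 30.0,
    "SOR": lambda v: v > 4.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}

INDEL_FILTERS: Dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v < 2.0,
    "QUAL": lambda v: v < 30.0,
    "FS": lambda v: v > 200.0,
    "SOR": lambda v: v > 10.0,
    "ReadPosRankSum": lambda v: v < -20.0,
}


def failed_filters(annotations: Dict[str, float], is_snp: bool) -> List[str]:
    """Return the names of the hard filters a variant fails (empty list = PASS).

    Hypothetical helper: annotations absent from ``annotations`` are skipped,
    i.e. a missing value never causes a variant to fail.
    """
    rules = SNP_FILTERS if is_snp else INDEL_FILTERS
    failed = []
    for name, fails in rules.items():
        value = annotations.get(name)
        if value is not None and fails(value):
            failed.append(name)
    return failed


print(failed_filters({"QD": 1.5, "QUAL": 250.0, "SOR": 1.0, "FS": 3.0, "MQ": 60.0}, is_snp=True))  # ['QD']
```

The separate SNP and INDEL tables reflect the text: the INDEL thresholds for FS and SOR are looser, and MQ and MQRankSum are not applied to INDELs.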
*_phased.fasta
Sequences containing heterozygous variants were phased using HapCUT2 v1.3.3 (Edge et al., 2017).
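A phased genotype such as `0|1` records which allele lies on which chromosome copy, so a heterozygous sample can be written out as two haplotype sequences. Without phasing, one common convention is a single sequence with IUPAC ambiguity codes at heterozygous sites. The source does not state which encoding the `*_unphased.fasta` files use, so the IUPAC output below is an assumption. This toy Python sketch handles diploid, biallelic SNPs only (no indels or missing calls), and `haplotypes` and `unphased_consensus` are hypothetical helpers, not part of HapCUT2 or RegionCall.

```python
from typing import List, Tuple

# IUPAC ambiguity codes for two-base heterozygous sites.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

# Each SNP: (0-based position, ref base, alt base, genotype such as "0|1" or "0/1").
Snp = Tuple[int, str, str, str]


def haplotypes(reference: str, snps: List[Snp]) -> Tuple[str, str]:
    """Build two haplotype sequences from phased ("a|b") SNP genotypes."""
    hap1, hap2 = list(reference), list(reference)
    for pos, ref, alt, gt in snps:
        if reference[pos] != ref:
            raise ValueError(f"ref base mismatch at {pos}")
        if "|" not in gt:
            raise ValueError(f"genotype at {pos} is unphased: {gt}")
        a1, a2 = gt.split("|")
        alleles = {"0": ref, "1": alt}
        hap1[pos] = alleles[a1]  # allele on the first chromosome copy
        hap2[pos] = alleles[a2]  # allele on the second chromosome copy
    return "".join(hap1), "".join(hap2)


def unphased_consensus(reference: str, snps: List[Snp]) -> str:
    """Single sequence with IUPAC codes at heterozygous SNPs (assumed convention)."""
    seq = list(reference)
    for pos, ref, alt, gt in snps:
        if reference[pos] != ref:
            raise ValueError(f"ref base mismatch at {pos}")
        a1, a2 = gt.replace("|", "/").split("/")
        alleles = {"0": ref, "1": alt}
        bases = frozenset({alleles[a1], alleles[a2]})
        # Homozygous sites keep a single base; heterozygous sites get an ambiguity code.
        seq[pos] = next(iter(bases)) if len(bases) == 1 else IUPAC[bases]
    return "".join(seq)


snps = [(1, "C", "T", "0|1"), (5, "C", "G", "1|0")]
print(haplotypes("ACGTACGT", snps))          # ('ACGTAGGT', 'ATGTACGT')
print(unphased_consensus("ACGTACGT", snps))  # AYGTASGT
```

Real HapCUT2 output is organized into phase blocks, and variants in different blocks are not phased relative to one another; this sketch ignores that and assumes every SNP lies in a single block.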
Code/software
Scripts for obtaining V1R gene sequences of cichlids are available on GitHub (https://github.com/taki-sh/RegionCall).
