Accumulation of gene copy number variations during the early phase of free-spawning abalone speciation
Data files
May 10, 2024 version files 134.43 KB
Abstract
The genetic basis of speciation in free-spawning marine invertebrates is poorly understood. Although gene copy number variations (GCNVs), as well as nucleotide variations, may trigger speciation in these organisms, empirical evidence for this hypothesis is limited. In this study, we searched for genomic signatures of GCNVs that may contribute to the speciation of Western Pacific abalone species. Whole-genome sequencing data suggested the existence of a significant number of GCNVs in two closely related abalones in the early phase of speciation, Haliotis discus and H. madaka. In addition, the degree of interspecies genetic differentiation was higher in the genes where GCNVs were estimated than in other genes, suggesting that nucleotide divergence also accumulates in genes with GCNVs. GCNVs in some genes were also detected in other related abalone species, suggesting that these GCNVs are derived from both ancestral and de novo mutations. Our findings suggest that GCNVs have accumulated in the early phase of free-spawning abalone speciation.
README: Title of Dataset:
Perl scripts for detecting 3 allele sequences from BAM files, and their results. These are custom scripts that I wrote, so they may be difficult to understand. If you find anything confusing, please feel free to contact me. Please see the main text for the other analysis methods used in this study. The WGS data used in this study are listed in Table A1 in Appendix A.
Description of the data and file structure
VCF_haplotype_detect_pipeline_american_abalone.pl
This Perl script was used to detect 3 allele sequences from BAM files of North American abalones.
VCF_haplotype_detect_pipeline_Japanese_abalone.pl
This Perl script was used to detect 3 allele sequences from BAM files of Japanese abalones.
haplotype_all_counts_american_species_dryad.xlsx
Numbers of 3 allele sequences detected from BAM files of North American abalones by VCF_haplotype_detect_pipeline_american_abalone.pl. Each column of this table corresponds to an abalone individual, and each row corresponds to a gene ID. For example, the cell at row STRG.10195 and column red1 shows that individual red1 has no 3 allele sequences in gene STRG.10195.
haplotype_all_counts_three_species_dryad.xlsx
Numbers of 3 allele sequences detected from BAM files of Japanese abalones by VCF_haplotype_detect_pipeline_Japanese_abalone.pl. Each column of this table corresponds to an abalone individual, and each row corresponds to a gene ID. For example, the cell at row STRG.10195 and column HD1 shows that individual HD1 has no 3 allele sequences in gene STRG.10195.
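The layout of the two count tables (gene IDs in rows, individuals in columns) can be read programmatically. The following is a minimal Python sketch, not part of the deposited scripts. It assumes a sheet has been exported to CSV with the gene ID in the first column and integer counts in the remaining cells. The function names `load_counts` and `genes_with_three_alleles`, the gene ID `GENE.example`, the individual `sampleB`, and all values other than the STRG.10195 / red1 cell are hypothetical placeholders for illustration.

```python
import csv
import io


def load_counts(csv_text):
    """Parse a CSV export of a count table into {gene_id: {individual: count}}.

    Assumption: the first column holds the gene ID and every other
    cell holds an integer count of 3 allele sequences.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    individuals = header[1:]  # skip the gene ID column
    table = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        gene_id = row[0]
        table[gene_id] = {
            ind: int(val) for ind, val in zip(individuals, row[1:])
        }
    return table


def genes_with_three_alleles(table, individual):
    """Return the gene IDs with at least one 3 allele sequence in `individual`."""
    return [gene for gene, counts in table.items() if counts[individual] > 0]


# Hypothetical values for illustration only. Only the STRG.10195 / red1
# cell (no 3 allele sequences) reflects what this README states.
example_csv = """gene_id,red1,sampleB
STRG.10195,0,2
GENE.example,1,0
"""

table = load_counts(example_csv)
print(table["STRG.10195"]["red1"])             # count for one gene and individual
print(genes_with_three_alleles(table, "red1"))  # genes with a nonzero count
```

The same functions apply to either table, since both share the layout described above. If the exported cells are not plain integers, the `int(val)` conversion would need adjusting.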