Genomic landscape of introgression from the ghost lineage in a gobiid fish uncovers the generality of forces shaping hybrid genomes
Data files
Nov 06, 2023 version files 4.12 GB
- annotation_data.tar.gz (317.53 MB)
- demographic_modeling.tar.gz (22.30 KB)
- genotyping_data.tar.gz (3.80 GB)
- README.md (26.32 KB)
- slidingwindow_results.tar.gz (3.38 MB)
Dec 18, 2023 version files 4.12 GB
- annotation_data.tar.gz (317.53 MB)
- demographic_modeling.tar.gz (22.30 KB)
- genotyping_data.tar.gz (3.80 GB)
- README.md (26.73 KB)
- scripts.tar.gz (108.65 KB)
- slidingwindow_results.tar.gz (3.38 MB)
Abstract
Extinct lineages can leave legacies in the genomes of extant lineages through ancient introgressive hybridization. The patterns of genomic survival of these extinct lineages provide insight into the role of extinct lineages in current biodiversity. However, our understanding of the genomic landscape of introgression from extinct lineages remains limited due to challenges associated with locating the traces of unsampled “ghost” extinct lineages without ancient genomes. Herein, we conducted population genomic analyses on the East China Sea (ECS) lineage of Chaenogobius annularis, which was suspected to have originated from ghost introgression, with the aim of elucidating its genomic origins and characterizing its landscape of introgression. By combining phylogeographic analysis and demographic modeling, we demonstrated that the ECS lineage originated from ancient hybridization with an extinct ghost lineage. Forward simulations based on the estimated demography indicated that the statistic γ of the HyDe analysis can be used to distinguish the differences in local introgression rates in our data. Consistent with introgression between extant organisms, we found reduced introgression from extinct lineage in regions with low-recombination rates and with functional importance, thereby suggesting a role of linked selection that has eliminated the extinct lineage in shaping the hybrid genome. Moreover, we identified enrichment of repetitive elements in regions associated with ghost introgression, which was hitherto little-known but was also observed in the reanalysis of published data on introgression between extant organisms. Overall, our findings underscore the unexpected similarities in the characteristics of introgression landscapes across different taxa, even in cases of ghost introgression.
https://doi.org/10.5061/dryad.7wm37pw09
Brief description of the data and file structure
scripts.tar.gz
Note: "scripts.tar.gz" is also available from Zenodo (https://doi.org/10.5281/zenodo.10048869), which is linked to this Dryad page (see the "Related works" section in the upper right corner). To obtain only the scripts, please use the Zenodo link.
For convenience, "scripts.tar.gz" has also been available directly under Data files on this page since 12/18/2023.
The archive contains the scripts used in this study (bash, Python, R).
The scripts are organized into the following 10 categories, each in its own directory with a hierarchical structure.
1. ddRAD-seq genotyping
2. WGS (whole genome resequencing) genotyping
3. repeats and gene annotation
4. population recombination rate estimation
5. potentially deleterious SNPs
6. population genetic analyses
7. phylogenetic analysis
8. hybrid detection
9. demographic estimation
10. introgression landscape characterization
The detailed hierarchical structure is given below in the section "Detailed description of the file structure".
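To check the directory layout before unpacking, the archive can be inspected with Python's standard `tarfile` module. The sketch below is only an illustration: the function name `entries_at_depth` is hypothetical, and whether the ten category directories sit at the top level of the archive or one level below a wrapper directory is an assumption to verify against the downloaded file.

```python
import tarfile


def entries_at_depth(archive_path, depth=1):
    """Return the sorted, unique path prefixes of a given depth in a .tar.gz archive.

    depth=1 gives the top-level entries; depth=2 gives entries one level below.
    """
    prefixes = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Drop empty components and a leading "." so "./a/b" and "a/b" match.
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if len(parts) >= depth:
                prefixes.add("/".join(parts[:depth]))
    return sorted(prefixes)
```

For example, `entries_at_depth("scripts.tar.gz", 1)` lists the top-level entries; if that returns a single wrapper directory, `depth=2` should show the numbered category directories.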
genotyping_data.tar.gz
The compressed files of directories containing genotyping data generated in this study.
It contains five VCF files from RAD-seq, two VCF files from whole genome resequencing data, and one FASTA file of whole mitogenomic sequences.
The scripts used for genotyping are stored in "/01ddRADseq_genotyping/" or "/02WGS_genotyping/" in the scripts.tar.gz.
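As a quick sanity check on an extracted VCF file, the number of samples and variant records can be counted with the standard library alone. This is a minimal sketch, not part of the deposited scripts: the function name `summarize_vcf` is hypothetical, and the file names inside the archive are not listed here, so the path must be supplied by the user. It relies only on the VCF convention that sample names start at the tenth column of the `#CHROM` header line.

```python
import gzip


def summarize_vcf(path):
    """Return (sample_names, number_of_variant_records) for a VCF or VCF.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    samples = []
    n_sites = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                # Meta-information lines carry no genotype data.
                continue
            if line.startswith("#CHROM"):
                # The first nine columns are fixed; samples follow them.
                samples = line.rstrip("\n").split("\t")[9:]
                continue
            if line.strip():
                n_sites += 1
    return samples, n_sites
```

Comparing the returned sample count with the numbers of individuals reported for each dataset is a simple way to confirm that the expected file was opened.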
annotation_data.tar.gz
The compressed files of directories containing repeat annotation data and gene annotation data in this study.
The scripts used for the annotation are stored in "/03repeats_and_gene_annotation/" in the scripts.tar.gz.
demographic_modeling.tar.gz
The compressed file of a directory containing the results of the demographic modeling (distribution of AIC for each model, and maximum likelihood parameters for the best model) and the input site frequency spectrum.
The scripts used to analyze the demographic modeling are stored in "/09demographic_estimation/02demographic_modeling/" in the scripts.tar.gz.
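For readers comparing models from the AIC values, the standard definition is AIC = 2k - 2 ln L, where k is the number of estimated parameters. The sketch below assumes the likelihood is given in log10 units, as fastsimcoal2 reports it, and converts to natural log first; whether the deposited AIC values were computed in exactly this way is not stated here, and the function names are hypothetical.

```python
import math


def aic_from_log10_likelihood(log10_lhood, n_params):
    """Compute AIC = 2k - 2 ln(L) from a log10 likelihood (assumed input unit)."""
    ln_lhood = log10_lhood * math.log(10)  # convert log10(L) to ln(L)
    return 2 * n_params - 2 * ln_lhood


def delta_aic(aic_by_model):
    """Return each model's AIC difference from the best (lowest-AIC) model."""
    best = min(aic_by_model.values())
    return {model: aic - best for model, aic in aic_by_model.items()}
```

A model with a delta AIC of 0 is the best supported one in the set; the deposited CSV files hold a distribution of AIC values per model, so a comparison like this would be applied across that distribution rather than to a single value.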
slidingwindow_results.tar.gz
The compressed file of a directory containing the results of the sliding window analysis (bed files summarizing the statistic γ in the HyDe analysis and some other features).
Please see "README_description_of_record_XXkb.txt" in the slidingwindow_results.tar.gz for column name descriptions.
The scripts used for this analysis are stored in "/10introgression_landscape_characterization/02sliding_window/" in the scripts.tar.gz.
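The window-level bed files can be summarized with a few lines of Python. The sketch below is an illustration under stated assumptions: it assumes tab-delimited rows whose first three columns are chromosome, start, and end, and it takes the position of the column of interest (for example γ) as a user-supplied index, because the actual column layout is documented only in the README inside the archive. The function name `mean_by_chrom` is hypothetical.

```python
import csv
import math
from collections import defaultdict


def mean_by_chrom(path, value_index):
    """Return {chromosome: mean of the chosen column} for a tab-delimited bed file.

    Rows that are comments, too short, or hold a non-numeric or NaN value
    (such as a header row or "NA") are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#") or len(row) <= value_index:
                continue
            try:
                value = float(row[value_index])
            except ValueError:
                continue
            if math.isnan(value):
                continue
            sums[row[0]] += value
            counts[row[0]] += 1
    return {chrom: sums[chrom] / counts[chrom] for chrom in sums}
```

Passing the zero-based index of the γ column, as identified from "README_description_of_record_XXkb.txt", would give the mean γ per scaffold or chromosome.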
Detailed description of the file structure
"scripts.tar.gz"
- 01ddRADseq_genotyping
"genotyping_data.tar.gz"
Filtered genotyping dataset used in this study.
- 01RADseq
"annotation_data.tar.gz"
- 01repeat_annotation
- agohaze_sspace_x1.fa.masked.gz
"demographic_modeling.tar.gz"
- 01without_recent_size_change
- sorted_AIC_dist_wo_recent.csv
"slidingwindow_results.tar.gz"
- README_description_of_record_XXkb.txt
Sharing/Access information
Sequence data used in this study are available from DDBJ (accession numbers: DRR174909, DRR175781–DRR175796, DRR175830–DRR175860, DRR175876–DRR175955, DRR489922–DRR490073 for ddRAD-seq, DRR489903–DRR489921 for whole genome resequencing, and DRR490074–DRR490084 for RNA-seq).
We performed population genomic analyses based on RAD sequence data (279 individuals from 21 populations and one outgroup individual) and whole genome resequencing data (18 individuals from three populations and one outgroup individual) for Chaenogobius annularis, a coastal goby species inhabiting the Japanese archipelago. Our analyses included repeat annotation, gene annotation, population recombination rate estimation, identification of potentially deleterious mutations, population genetic analyses, tests for hybridization, PSMC analysis, fastsimcoal2-based demographic modeling, SLiM-based forward simulation, sliding window analysis, and permutation-based characterization of introgression landscapes.
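The accession ranges listed above can be expanded into individual run accessions, for instance to build a download list. The following is a minimal sketch with a hypothetical function name, `expand_accessions`; it assumes each range shares one prefix and is inclusive at both ends. The three strings are copied from the accession list in this section.

```python
import re


def expand_accessions(spec):
    """Expand a comma-separated list such as 'DRR1, DRR5–DRR7' into single accessions."""
    accessions = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # A range uses an en dash or a hyphen; a single accession has neither.
        ends = re.split(r"[–-]", item)
        first = re.fullmatch(r"([A-Z]+)(\d+)", ends[0].strip())
        last = re.fullmatch(r"([A-Z]+)(\d+)", ends[-1].strip())
        if first is None or last is None or first.group(1) != last.group(1):
            raise ValueError(f"Cannot parse accession item: {item!r}")
        prefix, width = first.group(1), len(first.group(2))
        for number in range(int(first.group(2)), int(last.group(2)) + 1):
            accessions.append(f"{prefix}{number:0{width}d}")
    return accessions


DDRAD = ("DRR174909, DRR175781–DRR175796, DRR175830–DRR175860, "
         "DRR175876–DRR175955, DRR489922–DRR490073")
WGS = "DRR489903–DRR489921"
RNASEQ = "DRR490074–DRR490084"
```

Expanded this way, the ddRAD-seq list holds 280 accessions and the whole genome resequencing list holds 19. If each individual corresponds to one run, which is an assumption, these counts agree with the 279 plus one outgroup and 18 plus one outgroup individuals described in this section.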
- Kato, Shuya (2023). Genomic landscape of introgression from the ghost lineage in a gobiid fish uncovers the generality of forces shaping hybrid genomes. Zenodo. https://doi.org/10.5281/zenodo.10048869
- Kato, Shuya; Arakaki, Seiji; Nagano, Atsushi J. et al. (2023). Genomic landscape of introgression from the ghost lineage in a gobiid fish uncovers the generality of forces shaping hybrid genomes. Molecular Ecology. https://doi.org/10.1111/mec.17216
