Genomic landscape of introgression from the ghost lineage in a gobiid fish uncovers the generality of forces shaping hybrid genomes
Data files
Nov 06, 2023 version files 4.12 GB
- annotation_data.tar.gz
- demographic_modeling.tar.gz
- genotyping_data.tar.gz
- README.md
- slidingwindow_results.tar.gz
Dec 18, 2023 version files 4.12 GB
- annotation_data.tar.gz
- demographic_modeling.tar.gz
- genotyping_data.tar.gz
- README.md
- scripts.tar.gz
- slidingwindow_results.tar.gz
Abstract
Extinct lineages can leave legacies in the genomes of extant lineages through ancient introgressive hybridization. The patterns of genomic survival of these extinct lineages provide insight into the role of extinct lineages in current biodiversity. However, our understanding of the genomic landscape of introgression from extinct lineages remains limited due to challenges associated with locating the traces of unsampled “ghost” extinct lineages without ancient genomes. Herein, we conducted population genomic analyses on the East China Sea (ECS) lineage of Chaenogobius annularis, which was suspected to have originated from ghost introgression, with the aim of elucidating its genomic origins and characterizing its landscape of introgression. By combining phylogeographic analysis and demographic modeling, we demonstrated that the ECS lineage originated from ancient hybridization with an extinct ghost lineage. Forward simulations based on the estimated demography indicated that the statistic γ of the HyDe analysis can be used to distinguish the differences in local introgression rates in our data. Consistent with introgression between extant organisms, we found reduced introgression from the extinct lineage in regions with low recombination rates and with functional importance, thereby suggesting a role of linked selection that has eliminated the extinct lineage in shaping the hybrid genome. Moreover, we identified enrichment of repetitive elements in regions associated with ghost introgression, which was hitherto little-known but was also observed in the reanalysis of published data on introgression between extant organisms. Overall, our findings underscore the unexpected similarities in the characteristics of introgression landscapes across different taxa, even in cases of ghost introgression.
README: Genomic landscape of introgression from the ghost lineage in a gobiid fish uncovers the generality of forces shaping hybrid genomes
https://doi.org/10.5061/dryad.7wm37pw09
Brief description of the data and file structure
scripts.tar.gz
Note: "scripts.tar.gz" can be obtained from the Zenodo link (https://doi.org/10.5281/zenodo.10048869) tied to this Dryad page (see the "Related works" section in the upper right corner). To obtain only the scripts, please visit this Zenodo link.
For your convenience, "scripts.tar.gz" is now also available directly from the Data files on this page (added on 12/18/2023).
The scripts used in this study (bash, python, R).
The scripts are grouped into the following 10 categories, each of which has its own directory hierarchy.
1. ddRAD-seq genotyping
2. WGS (whole genome resequencing) genotyping
3. repeats and gene annotation
4. population recombination rate estimation
5. potentially deleterious SNPs
6. population genetic analyses
7. phylogenetic analysis
8. hybrid detection
9. demographic estimation
10. introgression landscape characterization
The detailed hierarchical structure is given below in the section "Detailed description of the file structure".
genotyping_data.tar.gz
The compressed files of directories containing genotyping data generated in this study.
Five VCF files from RAD-seq, two VCF files from whole genome resequencing data, and one FASTA file of whole mitogenomic sequences.
The scripts used to generate the genotyping data are stored in "/01ddRADseq_genotyping/" or "/02WGS_genotyping/" in the scripts.tar.gz.
annotation_data.tar.gz
The compressed files of directories containing repeat annotation data and gene annotation data in this study.
The scripts used to generate the annotation data are stored in "/03repeats_and_gene_annotation/" in the scripts.tar.gz.
demographic_modeling.tar.gz
The compressed file of a directory containing the results of the demographic modeling (distribution of AIC for each model, and maximum likelihood parameters for the best model) and the input site frequency spectrum.
The scripts used to analyze the demographic modeling are stored in "/09demographic_estimation/02demographic_modeling/" in the scripts.tar.gz.
slidingwindow_results.tar.gz
The compressed file of a directory containing the results of the sliding window analysis (bed files summarizing the statistic γ in the HyDe analysis and some other features).
Please see "README_description_of_record_XXkb.txt" in the slidingwindow_results.tar.gz for column name descriptions.
The scripts used for this analysis are stored in "/10introgression_landscape_characterization/02sliding_window/" in the scripts.tar.gz.
<br>
Detailed description of the file structure
"scripts.tar.gz"
- 01ddRADseq_genotyping
<br>
Scripts used for genotyping double digest restriction-site associated DNA (ddRAD-seq) data
- 01trimmomatic <br> Read filtering with Trimmomatic
- 02mapping <br> Mapping with bwa-mem
- 03sambamba <br> Filtering of BAM files by sambamba (retaining only uniquely mapped reads)
- 04SNPcall_by_mpileup
<br>
Genotyping with samtools mpileup / bcftools call
- 01pop1_example <br> Example of a script for joint calling when targeting one group
- 05merge_filtering
<br>
Merging and filtering of VCF files to create the final genotype datasets
- 01dataset1_2
<br>
Merging and filtering scripts for datasets 1 and 2
- 01filtering_dataset1 <br> Filtering scripts for dataset 1
- 02filtering_dataset2 <br> Filtering scripts for dataset 2
- 02dataset3
<br>
Merging and filtering scripts for dataset 3
- 01filtering <br> Filtering scripts for dataset 3
- 03dataset4
<br>
Merging and filtering scripts for dataset 4
- 01filtering <br> Filtering scripts for dataset 4
- 04dataset5
<br>
Merging and filtering scripts for dataset 5
- 01filtering <br> Filtering scripts for dataset 5
- 02WGS_genotyping
<br>
Scripts used for genotyping whole genome resequencing (WGS) data
- 01nuclear_SNP
<br>
Genotyping whole nuclear SNPs
- 01mapping <br> Mapping with bwa-mem
- 02sambamba <br> Filtering of BAM files by sambamba (retaining only uniquely mapped reads)
- 03MarkDuplicate <br> Removing PCR duplicates in BAM files with MarkDuplicates (GATK4)
- 04SNPcall_by_GATK
<br>
Genotyping with GATK4
- 01SJ
<br>
Genotyping for the Sea of Japan (SJ) lineage
- 01HaplotypeCaller <br> Script for GATK HaplotypeCaller
- 02GenomicsDBImport_GenotypeGVCFs <br> Script for GATK GenomicsDBImport and GenotypeGVCFs
- 02ECS
<br>
Genotyping for the East China Sea (ECS) lineage
- 01HaplotypeCaller <br> Script for GATK HaplotypeCaller
- 02GenomicsDBImport_GenotypeGVCFs <br> Script for GATK GenomicsDBImport and GenotypeGVCFs
- 03PO
<br>
Genotyping for the Pacific Ocean (PO) lineage
- 01HaplotypeCaller <br> Script for GATK HaplotypeCaller
- 02GenomicsDBImport_GenotypeGVCFs <br> Script for GATK GenomicsDBImport and GenotypeGVCFs
- 04outgroup
<br>
Genotyping for the outgroup individual
- 01HaplotypeCaller <br> Script for GATK HaplotypeCaller
- 02GenomicsDBImport_GenotypeGVCFs <br> Script for GATK GenomicsDBImport and GenotypeGVCFs
- 05merge_filtering
<br>
Merging and filtering of VCF files to create the final genotype datasets
- 01Cannularis_only_dataset <br> Merging and filtering scripts for C. annularis only dataset
- 02with_outgroup_dataset <br> Merging and filtering scripts for with-outgroup dataset
- 05indelrealign <br> Local realignment around indels for BAM files with GATK3
- 06phasing <br> Read-aware phasing with SHAPEIT2
- 02mitogenome_sequence
<br>
Genotyping whole mitogenome sequence
- 01mapping <br> Mapping with bwa-mem
- 02sambamba <br> Filtering of BAM files by sambamba (retaining only uniquely mapped reads)
- 03MarkDuplicates <br> Removing PCR duplicates in BAM files with MarkDuplicates (GATK4)
- 04SNPcall_by_GATK
<br>
Genotyping with GATK4
- 01HaplotypeCaller <br> Script for GATK HaplotypeCaller
- 02GenomicsDBImport_GenotypeGVCFs <br> Script for GATK GenomicsDBImport and GenotypeGVCFs
- 03filtering_merge <br> Script for merging and filtering VCF records
- 04consensus <br> Script for obtaining consensus sequences from filtered VCF files
- 03repeats_and_gene_annotation
<br>
Repeat annotation and gene annotation of the reference sequence
- 01repeat_annotation <br> Repeat annotation with RepeatModeler and RepeatMasker
- 02gene_annotation
<br>
Gene annotation
- 01mapping_merge <br> Script for RNA-seq reads mapping and subsequent merging
- 02braker2
<br>
Gene annotation with braker2 pipeline
- 01braker2_RNAseq <br> Gene prediction using RNA-seq data with braker2
- 02braker2_protein <br> Gene prediction using protein sequences of related species with braker2
- 03TSRBRA <br> Merging two gene prediction files by TSEBRA
- 04filtering <br> Retaining only predicted genes whose expression is confirmed by RNA-seq data
- 05reformatting_by_agat <br> Reformatting of the GFF file with AGAT
- 06blast <br> Homology search against closely related protein sequences using blastp
- 04population_recombination_rate_estimation
<br>
Estimation of genome wide population recombination rate with LDhelmet
- 01calculate_theta <br> Calculating Watterson's theta
- 02vcf2fasta <br> Converting phased VCF to fasta
- 03find_confs <br> 1st step of LDhelmet recombination rate estimation
- 04table_gen <br> 2nd step of LDhelmet recombination rate estimation
- 05pade <br> 3rd step of LDhelmet recombination rate estimation
- 06rjmcmc <br> 4th step of LDhelmet recombination rate estimation
- 07post_to_text <br> Final step of LDhelmet recombination rate estimation
- 08window
<br>
Summarizing the estimated population recombination rates in non-overlapping sliding windows
- 01_10kb <br> window size: 10kb
- 02_30kb <br> window size: 30kb
- 03_50kb <br> window size: 50kb
- 04_100kb <br> window size: 100kb
- 05potentially_deleterious_SNPs
<br>
Estimating potentially deleterious SNPs with PROVEAN
- 01SNPEff <br> Extracting non-synonymous mutations by SNP annotation with SnpEff
- 02provean <br> Assessing the potential deleteriousness of each non-synonymous mutation with PROVEAN
- 06popultion_genetic_analyses
<br>
Population genetic analyses using ddRAD-seq dataset
- 01ADMIXTURE <br> Scripts for clustering analysis with ADMIXTURE
- 02PCA <br> Scripts for principal component analysis with plink
- 03diverisity_indices <br> Scripts for calculating nucleotide diversity and pairwise *FST* between populations
- 07phylogenetic_analysis
<br>
Phylogenetic analysis with RAxML
- 01ddRADseq_RAxML <br> Scripts for phylogenetic analysis for ddRAD-seq dataset
- 02WGS_RAxML <br> Scripts for phylogenetic analysis for WGS nuclear SNPs dataset (with-outgroup dataset)
- 03whole_mitogenome_RAxML <br> Scripts for phylogenetic analysis for whole mitogenome sequences obtained from WGS reads
- 08hybrid_detection
<br>
Allele sharing pattern analysis to detect hybridization between populations / lineages
- 01ddRADseq_analysis
<br>
Analysis for ddRAD-seq dataset
- 01Dsuite <br> D-statistics calculation and *f*-branch analysis with Dsuite
- 02HyDe <br> HyDe analysis
- 02WGS_analysis
<br>
Analysis for WGS dataset
- 01Dsuite <br> D-statistics calculation and *f*-branch analysis with Dsuite
- 02HyDe <br> HyDe analysis
- 09demographic_estimation
<br>
Demographic analyses
- 01PSMC
<br>
PSMC analysis to infer population size history
- 01coverage_calculation <br> Calculating coverage for each BAM file
- 02PSMC <br> Scripts to run PSMC
- 03PSMC_bootstrap <br> Scripts to perform bootstrapping analysis of PSMC
- 02demographic modeling
<br>
Demographic modeling analysis to test the ghost introgression origin of the ECS lineage
- 01easySFS <br> Generating site frequency spectrum (SFS) from VCF with easySFS
- 02fastsimcoal2
<br>
Demographic modeling with fastsimcoal2
- 01Examined_models
<br>
All .est and .tpl files describing the demography and parameter rules of the examined models
All models are classified into three categories based on the different elements incorporated into the model.
Descriptions of the lowest-level directories are omitted because each model directory contains only its .est and .tpl files.
- 01without_recent_size_change <br> Models without recent population size change (16 models)
- 02with_recent_sice_change <br> Models incorporating recent population size change (16 models)
- 03recent_gene_flow_without_recent_size_change <br> Additional models to examine whether recent gene flow can affect the results or not (four models)
- 02scripts_to_run_fsc <br> Scripts to run fastsimcoal2
- 10introgression_landscape_characterization
<br>
Characterizing genomic landscape of introgression from the ghost lineage
- 01validation_by_forward_simulation
<br>
Evaluation of performance in estimating the genomic landscape of introgression from the ghost lineage using forward simulation with SLiM
- 01_750kb_simulation
<br>
Preliminary forward simulations of 750 kb chromosome to determine demographic parameters
- 01slim <br> Scripts to run SLiM. Contains the SLiM file (.slim) with adjusted demographic parameters used in the final
- 02vcftools <br> Filtering of simulated VCFs
- 03insertion_to_mimic_ghost_introgression <br> Script to mimic ghost introgression for simulated VCF, and to compare diversity indices with the observed data
- 02_10kb_simulation
<br>
Forward simulations of 10 kb chromosome to evaluate performance of each statistic
- 01slim <br> Scripts to run SLiM
- 02vcftools <br> Filtering of simulated VCFs
- 03insertion_to_mimic_ghost_introgression <br> Script to mimic ghost introgression for simulated VCF with different admixture rate
- 04sliding_window
<br>
Sliding window analysis on the simulated VCF
- 01geno_generate <br> Script to prepare input file (.geno)
- 02sliding window <br> Scripts to run sliding window analysis
- 03_30kb_simulation
<br>
Forward simulations of 30 kb chromosome to evaluate performance of each statistic
- 01slim <br> Scripts to run SLiM
- 02vcftools <br> Filtering of simulated VCFs
- 03insertion_to_mimic_ghost_introgression <br> Script to mimic ghost introgression for simulated VCF with different admixture rate
- 04sliding_window
<br>
Sliding window analysis on the simulated VCF
- 01geno_generate <br> Script to prepare input file (.geno)
- 02sliding window <br> Scripts to run sliding window analysis
- 04_50kb_simulation
<br>
Forward simulations of 50 kb chromosome to evaluate performance of each statistic
- 01slim <br> Scripts to run SLiM
- 02vcftools <br> Filtering of simulated VCFs
- 03insertion_to_mimic_ghost_introgression <br> Script to mimic ghost introgression for simulated VCF with different admixture rate
- 04sliding_window
<br>
Sliding window analysis on the simulated VCF
- 01geno_generate <br> Script to prepare input file (.geno)
- 02sliding window <br> Scripts to run sliding window analysis
- 05_100kb_simulation
<br>
Forward simulations of 100 kb chromosome to evaluate performance of each statistic
- 01slim <br> Scripts to run SLiM
- 02vcftools <br> Filtering of simulated VCFs
- 03insertion_to_mimic_ghost_introgression <br> Script to mimic ghost introgression for simulated VCF with different admixture rate
- 04sliding_window
<br>
Sliding window analysis on the simulated VCF
- 01geno_generate <br> Script to prepare input file (.geno)
- 02sliding window <br> Scripts to run sliding window analysis
- 02sliding_window
<br>
Sliding window analysis on the WGS SNPs dataset (with-outgroup dataset)
- geno_generate <br> Script to prepare input file (.geno)
- 02sliding_window_10kb_example
<br>
Example scripts to perform sliding window analysis and subsequent characterization of introgression landscape
- 01sliding_window <br> Scripts to run sliding window analysis
- 02characterizing_landscape_by_gamma
<br>
Scripts to characterize introgression landscape by sliding-window γ
- 01summarizing_all_features <br> Summarizing the target features into sliding windows
  1. 01repeat <br> Proportion of repetitive sequence
  2. 02CDS <br> Proportion of coding sequence (CDS)
  3. 03N-mt <br> Proportion of CDS of the representative N-mt genes (OXPHOS and mitoribosomal genes)
  4. 04OXPHOS <br> Proportion of CDS of the oxidative phosphorylation (OXPHOS) genes
  5. 05mitoribo <br> Proportion of CDS of the mitochondrial ribosomal (mitoribosomal) genes
  6. 06deleterious <br> Counts of potential deleterious alleles per lineage
  7. 07CDS_length <br> The length of CDS per window
  8. 08summarizing_all <br> Script to compile all features (including γ and population recombination rates) into bed files
- 02random_permutation <br> Random permutation to test whether the characteristics of each γ category differ from the genomic background
  1. 01other_than_recomb <br> Tests for characteristics other than population recombination rate
     - 01permutation <br> Scripts to perform random permutations
  2. 02recomb <br> Test for population recombination rate
     - 01permutation <br> Scripts to perform random permutations
- 03circle_permutation <br> Circular permutation to test whether the characteristics of each γ category differ from the genomic background
  1. 01other_than_recomb <br> Tests for characteristics other than population recombination rate
     - 01permutation <br> Scripts to perform circular permutations
  2. 02recomb <br> Test for population recombination rate
     - 01permutation <br> Scripts to perform circular permutations
- 04controlled_permutation <br> "Controlled permutation" to test whether the proportion of repetitive sequence in each γ category differs from the genomic background after controlling the proportion of CDS or the population recombination rate
  1. 01control_CDS <br> When controlling CDS
     - 01permutation <br> Scripts to perform controlled permutations
  2. 01control_averec <br> When controlling population recombination rate (averaged for three lineages)
     - 01permutation <br> Scripts to perform controlled permutations
- genomics_general <br> Modified versions of two scripts from genomics_general provided by Dr. Simon Martin. (Please see https://github.com/simonhmartin/genomics_general)
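The circular permutation idea listed under "03circle_permutation" can be illustrated with a minimal sketch. This is not the authors' script: the function name `circular_permutation_test`, the choice of the category mean as the test statistic, and the treatment of all windows as one ordered sequence are assumptions made for illustration. The point it shows is that rotating the category labels along the window order keeps neighbouring windows together, so spatial autocorrelation is preserved in the null distribution.

```python
import random
from statistics import mean


def circular_permutation_test(values, in_category, n_perm=999, seed=0):
    """Empirical two-sided p-value for the mean of `values` in flagged windows.

    values      : one feature value per window, in genomic order (assumed).
    in_category : one boolean per window, True if the window belongs to the
                  category of interest (hypothetical input layout).
    """
    labels = list(in_category)
    n = len(values)
    if n != len(labels) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 windows")
    if not any(labels):
        raise ValueError("no window is flagged as belonging to the category")

    def category_mean(flags):
        return mean(v for v, flag in zip(values, flags) if flag)

    background = mean(values)
    observed = category_mean(labels)
    observed_dev = abs(observed - background)

    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_perm):
        shift = rng.randrange(1, n)                 # non-zero rotation offset
        rotated = labels[shift:] + labels[:shift]   # rotate labels, keep order
        if abs(category_mean(rotated) - background) >= observed_dev:
            as_extreme += 1

    # Add-one correction so the p-value is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)
```

A call such as `circular_permutation_test(repeat_proportion, is_high_gamma)` would return the observed category mean and its empirical p-value; both argument names here are placeholders, not columns guaranteed to exist in the deposited files.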
"genotyping_data.tar.gz"
Filtered genotyping dataset used in this study.
- 01RADseq
<br>
Genotyping datasets for ddRAD-seq data (VCF files with SNPs / SNPs and invariant sites).
Different merging and filtering were applied according to the purpose of the analysis (please see Supplementary Notes and Figure S1 in the paper).
- RAD_dataset1.vcf.gz <br> For population structure analysis (PCA, ADMIXTURE)
- RAD_dataset2.vcf.gz <br> For calculation of diversity indices
- RAD_dataset3.vcf.gz <br> For phylogenetic analysis
- RAD_dataset4.vcf.gz <br> For allele sharing pattern analysis to detect hybridization (Dsuite and HyDe)
- RAD_dataset5.vcf.gz <br> For demographic modelling
- 02WGS
<br>
Genotyping datasets for WGS data.
- 01nuclear_SNP
<br>
Genotyping datasets for whole genome nuclear SNPs (VCF files with SNPs).
The two datasets differ only in the presence or absence of outgroups.
- WGS_C.annularis_only_SNPs.vcf.gz <br> Whole genome nuclear SNPs only for 18 individuals of Chaenogobius annularis (without outgroup)
- WGS_with-outgroup_SNPs.vcf.gz <br> Whole genome nuclear SNPs for 18 individuals of C. annularis and one individual of C. gulosus (with outgroup)
- 02mitogenome_sequence
- whole_mitogenome_C.annularis.fasta <br> Whole mitogenome sequence multi-FASTA file for 18 individuals of C. annularis obtained from WGS data.
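As a quick sanity check after downloading, the sample names and number of records in any of the VCF files listed above can be read with the Python standard library alone. The sketch below is illustrative and not part of the deposited scripts; the function name `vcf_samples_and_site_count` is hypothetical, and it assumes the files follow the standard VCF layout in which sample names start at the tenth column of the `#CHROM` header line.

```python
import gzip


def vcf_samples_and_site_count(path):
    """Return (sample_names, number_of_records) for a gzip-compressed VCF."""
    samples = []
    n_sites = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue                      # meta-information lines
            if line.startswith("#CHROM"):
                # Standard VCF: 9 fixed columns, then one column per sample.
                samples = line.rstrip("\n").split("\t")[9:]
            elif line.strip():
                n_sites += 1                  # one data record per line
    return samples, n_sites
```

If the descriptions above hold, `WGS_with-outgroup_SNPs.vcf.gz` should report 19 sample names (18 C. annularis plus one C. gulosus) and `WGS_C.annularis_only_SNPs.vcf.gz` should report 18.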
"annotation_data.tar.gz"
- 01repeat_annotation
- agohaze_sspace_x1.fa.masked.gz <br> Fasta file of reference genome (accession number: GCA_015082035.1) soft-masked for the repetitive sequence
- agohaze_sspace_x1.fa.out <br> Catalog of repetitive sequences in the reference genome output by RepeatMasker
- 02gene_annotation
- agat_braker_FULL.gff3 <br> Gene annotation file after reformatting by AGAT. It consists only of predicted genes whose expression was confirmed by RNA-seq data ("conservative annotation" in the paper)
"demographic_modeling.tar.gz"
- 01without_recent_size_change
- sorted_AIC_dist_wo_recent.csv <br> The AIC distributions for each model without recent population size change, with the model name in the first column and the AIC in the second column. Input data for Figure 4a in the paper.
- best_parameter_set_in_best_models
- model7a
- model7a.bestlhoods <br> Best parameters in model7a. Each parameter is listed in "09demographic_estimation/02demographic_modeling/02fastsimcoal2/01Examined_models/01without_recent_size_change/model7a" in "scripts.tar.gz" and Figure S13 in the paper.
- model7b
- model7b.bestlhoods <br> Best parameters in model7b. Each parameter is listed in "09demographic_estimation/02demographic_modeling/02fastsimcoal2/01Examined_models/01without_recent_size_change/model7b" in "scripts.tar.gz" and Figure S13 in the paper.
- model7c
- model7c.bestlhoods <br> Best parameters in model7c. Each parameter is listed in "09demographic_estimation/02demographic_modeling/02fastsimcoal2/01Examined_models/01without_recent_size_change/model7c" in "scripts.tar.gz" and Figure S13 in the paper.
- 02with_recent_size_change
- sorted_AIC_dist_with_recent.csv
<br>
The AIC distributions for each model with recent population size change, with the model name in the first column and the AIC in the second column. Input data for Figure 4d in the paper.
- best_parameter_set_in_best_models
- model7a
- model7a.bestlhoods <br> Best parameters in model7a. Each parameter is listed in "09demographic_estimation/02demographic_modeling/02fastsimcoal2/01Examined_models/02with_recent_sice_change/model7a" in "scripts.tar.gz" and Figure S14 in the paper.
- model7b
- model7b.bestlhoods <br> Best parameters in model7b. Each parameter is listed in "09demographic_estimation/02demographic_modeling/02fastsimcoal2/01Examined_models/02with_recent_sice_change/model7b" in "scripts.tar.gz" and Figure S14 in the paper.
- model7c
- model7c.bestlhoods <br> Best parameters in model7c. Each parameter is listed in "09demographic_estimation/02demographic_modeling/02fastsimcoal2/01Examined_models/02with_recent_sice_change/model7c" in "scripts.tar.gz" and Figure S14 in the paper.
- 03recent_gene_flow_without_recent_size_change
- sorted_AIC_rec-ongmig_without_recent_population_size.csv <br> The AIC distributions for additional models incorporating recent gene flow (without recent population size change). The model name in the first column and the AIC in the second column. Input data for Figure S15 in the paper.
- 04observed_MSFS
- MSFS.obs <br> Multidimensional site frequency spectrum (MSFS) used as an input file for the demographic modeling. It was generated from RAD_dataset5.vcf.gz using easySFS
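The AIC tables described above (model name in the first column, AIC in the second) can be summarized per model with a few lines of Python. This is a hedged sketch rather than the authors' plotting code: the function name `summarize_aic` is hypothetical, and it assumes a comma-separated file in which a model may appear on several rows and in which any row whose second field is not numeric (for example a header, if one exists) can be skipped.

```python
import csv
from collections import defaultdict


def summarize_aic(path):
    """Return [(model, best_aic, delta_aic), ...] sorted by best AIC."""
    per_model = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue
            try:
                aic = float(row[1])
            except ValueError:
                continue                  # assumed to be a header row
            per_model[row[0]].append(aic)
    if not per_model:
        raise ValueError("no numeric AIC values found")

    # Lowest AIC observed for each model, then difference to the overall best.
    best = {model: min(aics) for model, aics in per_model.items()}
    overall_best = min(best.values())
    summary = [(model, aic, aic - overall_best) for model, aic in best.items()]
    return sorted(summary, key=lambda item: item[1])
```

Running `summarize_aic("sorted_AIC_dist_wo_recent.csv")` would list the models from lowest to highest best-run AIC, with the difference from the top model in the third field.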
"slidingwindow_results.tar.gz"
- README_description_of_record_XXkb.txt <br> Text file describing each column of record_XXkb.bed.gz
- record_10kb.bed.gz
- record_30kb.bed.gz
- record_50kb.bed.gz
- record_100kb.bed.gz
Sharing/Access information
Sequence data used in this study are available from DDBJ (accession numbers: DRR174909, DRR175781–DRR175796, DRR175830–DRR175860, DRR175876–DRR175955, DRR489922–DRR490073 for ddRAD-seq, DRR489903–DRR489921 for whole genome resequencing, and DRR490074–DRR490084 for RNA-seq).
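For batch downloads, the accession ranges above can be expanded into individual run accessions. The helper below is an illustrative sketch; the function name `expand_accessions` is hypothetical and it assumes each range uses the same prefix on both ends, as is the case for the DRR ranges listed here.

```python
import re

_ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_accessions(spec):
    """Expand 'DRR175781–DRR175796, DRR174909'-style text into a flat list."""
    expanded = []
    for token in spec.split(","):
        # A token is either a single accession or 'first–last' (en dash or hyphen).
        ends = [_ACCESSION.fullmatch(part.strip()) for part in re.split(r"[–-]", token)]
        if len(ends) > 2 or any(match is None for match in ends):
            raise ValueError(f"unrecognized accession token: {token!r}")
        first, last = ends[0], ends[-1]
        if first.group(1) != last.group(1):
            raise ValueError(f"prefix mismatch in range: {token!r}")
        width = len(first.group(2))           # keep zero padding, if any
        for number in range(int(first.group(2)), int(last.group(2)) + 1):
            expanded.append(f"{first.group(1)}{number:0{width}d}")
    return expanded
```

Expanded this way, the ddRAD-seq ranges contain 280 accessions and the whole genome resequencing range contains 19, which is consistent with one run per individual for the 279 individuals plus one outgroup and the 18 individuals plus one outgroup described in Methods.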
Methods
We performed population genomics analysis based on RAD sequence data (279 individuals from 21 populations, and one outgroup individual) and whole genome resequencing data (18 individuals from three populations and one outgroup individual) for Chaenogobius annularis, a coastal goby species inhabiting the Japanese archipelago. Our analyses included repeat annotation, gene annotation, population recombination rate estimation, estimation of potential deleterious mutations, population genetic analysis, test for hybridization, PSMC analysis, fastsimcoal2-based demographic modeling, SLiM-based forward simulation, sliding window analysis, and permutation-based characterization of introgression landscapes.