Data from: Testing for a role of postzygotic incompatibilities in rapidly speciated Lake Victoria cichlids
Cite this dataset
Feller, Anna F.; Peichel, Catherine L.; Seehausen, Ole (2024). Data from: Testing for a role of postzygotic incompatibilities in rapidly speciated Lake Victoria cichlids [Dataset]. Dryad. https://doi.org/10.5061/dryad.ksn02v7bf
Abstract
Intrinsic postzygotic hybrid incompatibilities are usually due to negative epistatic interactions between alleles from different parental genomes. While such incompatibilities are thought to be uncommon in speciation with gene flow, they may be important if such speciation results from a hybrid population. Here we aimed to test this idea in the endemic cichlid fishes of Lake Victoria. Hundreds of species have evolved within the lake in <15k years from hybrid progenitors. While the importance of prezygotic barriers to gene flow is well established in this system, the possible relevance of postzygotic genetic incompatibilities is unknown. We inferred the presence of negative epistatic interactions from systematic patterns of genotype ratio distortions in experimental crosses and wild samples. We then compared the positions of putative incompatibility loci to regions of high genetic differentiation between sympatric sister species as well as between members of clades that may have arisen in the early history of this radiation, and further determined if the loci showed fixed differences between the closest living relatives of the lineages ancestral to the hybrid progenitors. Overall, we find little evidence for a major role of intrinsic postzygotic incompatibilities in the Lake Victoria radiation. However, we find putative incompatibility loci significantly more often coinciding with islands of genetic differentiation between species that separated early in the radiation than between the younger sister species, consistent with the hypothesis that such variants segregated in the hybrid swarm and were sorted between species in the early speciation events.
README: Testing for a role of postzygotic incompatibilities in rapidly speciated Lake Victoria cichlids
https://doi.org/10.5061/dryad.ksn02v7bf
F2 hybrid CROSS information (species used as female x species used as male)
PNPxPPP stands for the cross between Pundamilia sp. "nyererei-like" and Pundamilia sp. "pundamilia-like"
PPMxPRHZ stands for the cross between Pundamilia pundamilia and Pundamilia sp. "red-head"
PNPxNOM stands for the cross between Pundamilia sp. "nyererei-like" and Neochromis omnicaeruleus
Analysis steps with list of input and output files
CROSSES
1. filter cross VCF files using "Crosses_VCF_filtering.txt"
- PNPxNOM_7more.vcf.gz -> originally from Feller et al. 2022, https://doi.org/10.5061/dryad.931zcrjm2
- PNPxPPP_final.vcf.gz -> originally from Feller et al. 2020, https://doi.org/10.5061/dryad.5tb2rbp1n
- PPMxPRHZ.vcf.gz -> originally from Feller et al. 2020, https://doi.org/10.5061/dryad.5tb2rbp1n
- CROSS_f1_f2_homfixF1het_noscaffolds_phased.vcf.gz
- CROSS_Pgenos.vcf.gz
- Crosses_VCF_filtering_individuals_to_exclude_overview.txt
2. get heterozygosity, allele frequencies, and genotype frequencies using "Crosses_Analyses.txt"
- CROSS_f1_f2_homfixF1het_noscaffolds_phased.vcf.gz
- CROSS.het
- CROSS_phased.het
- CROSS_fixed_phased_all.frq
- CROSS_fixed_phased_all.hwe
3. use JoinMap to identify identical/highly similar loci
a. generate JoinMap input files using "Crosses_vcfToJoinMap.R"
- CROSS_f1_f2_homfixF1het_noscaffolds_phased.vcf.gz
- CROSS_JMinput.txt
b. in JoinMap, subset markers as described in "Crosses_VCF_filtering_JoinMap.txt"
- CROSS_JMinput.txt
- CROSS_exclude975.txt
4. exclude identical/highly similar loci identified in JoinMap in "Crosses_getting_thinned_loci.R"
- CROSS_exclude975.txt
- CROSS_fixed_phased_all.frq
- CROSS_fixed_phased_all.hwe
- CROSS_Pgenos.vcf.gz
- CROSS_fixed_phased_all_sub.frq
- CROSS_fixed_phased_all_sub.hwe
- CROSS_Pgenos_sub.txt
5. Heterozygosity analysis using "Crosses_Analyses_Heterozygosity.R"
- CROSS.het
- CROSS_phased.het
6. identify segregation distorted loci using "Crosses_Analyses_SegregationDistortion_CROSS.R"
- CROSS_fixed_phased_all_sub.frq
- CROSS_fixed_phased_all_sub.hwe
- CROSS_Pgenos_sub.txt
- CROSS_ALLdeviations_sub.csv
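The idea behind step 6 can be illustrated with a minimal sketch. It assumes that, at a locus fixed for alternative alleles in the two grandparents, F2 genotypes are expected in a 1:2:1 ratio, and it tests observed counts against that expectation with a chi-square goodness-of-fit test. The function name and the plain count inputs are hypothetical; the actual test and thresholds used in "Crosses_Analyses_SegregationDistortion_CROSS.R" may differ.

```python
import math


def f2_segregation_test(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of F2 genotype counts against 1:2:1.

    Hypothetical helper for illustration; not taken from the dataset scripts.
    Returns (chi-square statistic, p-value).
    """
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("no genotyped individuals at this locus")
    observed = (n_aa, n_ab, n_bb)
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three genotype classes give 2 degrees of freedom, and for 2 degrees
    # of freedom the chi-square survival function is exactly exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Example: an excess of one homozygote class among 100 F2 individuals.
chi2, p = f2_segregation_test(40, 50, 10)
print(chi2, p)
```

In practice a correction for testing many loci would be needed on top of the per-locus p-values; that choice is left open here.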
WHOLE GENOMES (WG)
7. filtering of WG VCF files and LD analysis using "WholeGenomes_filtering_and_Analyses_LD.txt"
a. subset to 107 samples including ancestors and filter
- allGenomes.chr${IDX}.SNPs.minDP6.minGQ20.max0.5N.3masks.vcf.gz -> generated by Meier et al. 2023, Science
- WholeGenomes_LDsamples_107.txt
- forLDscan.chrALL.vcf.gz
b. subset to 94 LV samples and filter some more including intra-LD pruning
- WholeGenomes_LDsamples_94.tx
- forLDscan.chr${IDX}.vcf.gz
- forLDscan.chrALL_pruned_mdp.vcf.gz
c. run LD analysis
- forLDscan.chrALL_pruned_mdp.vcf.gz
- LDscan.ld # not saved (huge file)
- LDscan.ld_onlyinterchrom # not saved (huge file)
- LDscan.ld_onlyinterchrom_sub[05/06]
d. check genotypes of ancestors at high LD sites
- forLDscan.chrALL.vcf.gz
- LDpos_r06.txt # generated in FSTvsLD.R, see below (originally called LDpos.txt)
- LDpos_r05.txt # generated in FSTvsLD.R, see below (originally called LDpos2.txt)
- LDlocSubset107.vcf.gz #(r2>0.6)
- LDlocSubset107_r05.vcf.gz #(r2>0.5)
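As an illustration of the subsetting in step 7c, the sketch below keeps only SNP pairs located on different chromosomes whose r2 exceeds a threshold (0.5 or 0.6 in this dataset). It assumes a whitespace-delimited pairwise LD table with a header containing the columns CHR_A, BP_A, CHR_B, BP_B and R2; that layout and the function name are assumptions, not something stated in this README.

```python
def interchrom_pairs(lines, min_r2):
    """Return inter-chromosomal SNP pairs with r2 above min_r2.

    `lines` is any iterable of text lines, the first being a header.
    Assumed columns (hypothetical layout): CHR_A BP_A CHR_B BP_B R2.
    """
    rows = iter(lines)
    header = next(rows).split()
    col = {name: i for i, name in enumerate(header)}
    kept = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chr_a, chr_b = fields[col["CHR_A"]], fields[col["CHR_B"]]
        if chr_a == chr_b:
            continue  # drop within-chromosome pairs
        r2 = float(fields[col["R2"]])
        if r2 > min_r2:
            kept.append(
                (chr_a, int(fields[col["BP_A"]]),
                 chr_b, int(fields[col["BP_B"]]), r2)
            )
    return kept


example = [
    "CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2",
    "1 100 s1 1 900 s2 0.95",
    "1 100 s1 2 500 s3 0.72",
    "2 500 s3 3 800 s4 0.41",
]
print(interchrom_pairs(example, 0.6))
```

Because the function reads line by line, it can be fed an open file handle, so a very large LD table never has to be held in memory.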
8. filtering of WG VCF files and FST analysis using "WholeGenomes_filtering_and_Analyses_FST.txt"
- allGenomes.chr${IDX}.SNPs.minDP6.minGQ20.max0.5N.3masks.vcf.gz -> generated by Meier et al. 2023, Science
- WholeGenomes_FSTsamples.txt
- forFSTscans.chrALL_mdp.vcf.gz
- FSTscanPairX.windowed.weir.fst
OVERLAPS
9. testing overlaps in segregation distortion vs LD using "DistortionsvsLD.R"
- CROSS_ALLdeviations_sub.csv
- LDscan.ld_onlyinterchrom_sub06
- LDscan.ld_onlyinterchrom_sub05
- Number of overlaps, p-values (not saved as files)
- DistortionvsLD_Overlaps05_not_thinned.csv # for analysis 13.
- DistortionvsLD_Overlaps06_not_thinned.csv # for analysis 13.
10. testing overlaps in segregation distortion vs FST using "FSTvsDistortions.R"
- CROSS_ALLdeviations_sub.csv
- chrInfo.txt -> from Feulner et al. 2018 G3, https://doi.org/10.5061/dryad.59q56g6
- FSTscan_CROSS.windowed.weir.fst
- list of snps (not saved as file, only very few)
- pvalues (not saved as file)
script also checks overlaps in segregation distorted snps among crosses
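The overlap step can be pictured as checking which distorted SNPs fall inside genomic windows flagged as highly differentiated. The sketch below is a minimal version of that lookup. How high-FST windows are chosen, and how significance of the overlap is assessed, is not specified in this README, so both are left to the caller; the function name and the tuple inputs are hypothetical.

```python
def snps_in_windows(snps, windows):
    """Return the SNPs that fall inside any of the given windows.

    snps:    iterable of (chrom, pos)
    windows: iterable of (chrom, start, end), both ends inclusive
    Hypothetical helper for illustration only.
    """
    by_chrom = {}
    for chrom, start, end in windows:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for chrom, pos in snps:
        # A SNP counts as overlapping if any window on its chromosome spans it.
        if any(start <= pos <= end for start, end in by_chrom.get(chrom, [])):
            hits.append((chrom, pos))
    return hits


high_fst_windows = [("chr1", 1, 10000), ("chr2", 20001, 30000)]
distorted_snps = [("chr1", 500), ("chr1", 15000), ("chr2", 25000), ("chr3", 5)]
print(snps_in_windows(distorted_snps, high_fst_windows))
```

The linear scan per SNP is adequate for the small numbers of distorted loci mentioned above; an interval index would only matter for much larger inputs.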
11. testing overlaps FST vs LD using "FSTvsLD.R" (same script with in/output files with 'ns' suffix for non-sister tests)
- chrInfo.txt -> from Feulner et al. 2018 G3, https://doi.org/10.5061/dryad.59q56g6
- LDscan.ld_onlyinterchrom_sub06
- LDscan.ld_onlyinterchrom_sub05
- FSTscanPairX.windowed.weir.fst
- LDpos_r05.txt # used in 7. to subset ancestor genotype files
- LDpos_r06.txt # used in 7. to subset ancestor genotype files
- LD_vs_FST_overlaps_06.txt
- LD_vs_FST_overlaps_05.txt
- FSTvsLD_Overlaps_uniqueSNPs_r06_no_thinning.csv # for analysis 13.
- FSTvsLD_Overlaps_uniqueSNPs_r05_no_thinning.csv # for analysis 13.
script also includes Fisher's method to test whether the combined p-values depart from expectation
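For reference, Fisher's method combines k independent p-values into the statistic X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a generic standard-library implementation with a hypothetical function name; it is not taken from "FSTvsLD.R", and which p-values are combined there is not stated in this README.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined p-value). Generic illustration only.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has the closed form
    #   exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return statistic, math.exp(-half) * series


stat, p = fisher_combined_p([0.05, 0.05])
print(stat, p)
```

The method assumes the individual tests are independent; correlated tests would make the combined p-value too small.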
ANCESTOR GENOTYPES at high LD loci and GENES at overlapping loci
12. check which high LD loci are fixed in extant relatives of hybrid swarm ancestor using "LDloc_genotpyes.R"
- LDlocSubset107.vcf.gz
- LDlocSubset107_r05.vcf.gz
- fixedLDloci_r05_V2.csv # 5 Congo vs 8 Nile
- fixedLDloci_r06_V2.csv # 5 Congo vs 8 Nile
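A minimal sketch of the "fixed difference" check in step 12: a locus counts as fixed between two groups (for example the Congo and Nile samples) if every called genotype in one group is homozygous for one allele and every called genotype in the other group is homozygous for the other allele. Genotypes are encoded here as alternate-allele counts (0, 1, 2) with None for missing calls. The encoding, the handling of missing data and the function name are assumptions; "LDloc_genotpyes.R" may apply different rules.

```python
def is_fixed_difference(group_a, group_b):
    """True if the two groups are fixed for alternative alleles.

    Each group is a list of alternate-allele counts (0, 1, 2) or None
    for a missing genotype. Hypothetical helper for illustration.
    """
    called_a = {g for g in group_a if g is not None}
    called_b = {g for g in group_b if g is not None}
    if not called_a or not called_b:
        return False  # no called genotypes in at least one group
    return (called_a == {0} and called_b == {2}) or (
        called_a == {2} and called_b == {0}
    )


congo = [0, 0, None, 0, 0]            # 5 samples, one missing call
nile = [2, 2, 2, 2, 2, 2, 2, 2]       # 8 samples
print(is_fixed_difference(congo, nile))
```

A stricter variant could require a minimum number of called genotypes per group before a locus is scored.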
13. check which fixed high LD loci overlap with high FST or distorted regions using "LDlocfixed_vs_FSTandDistortions.R" (same script with in/output files with 'ns' suffix for non-sister tests)
- fixedLDloci_r05_V2.csv
- fixedLDloci_r06_V2.csv
- FSTvsLD_Overlaps_uniqueSNPs_r06_no_thinning.csv
- FSTvsLD_Overlaps_uniqueSNPs_r05_no_thinning.csv
- DistortionvsLD_Overlaps06_not_thinned.csv
- DistortionvsLD_Overlaps05_not_thinned.csv
- chrInfo.txt -> from Feulner et al. 2018 G3, https://doi.org/10.5061/dryad.59q56g6
- FSTscanPair5.windowed.weir.fst
- LDandFSTandfixed_r05and06_NEW_non_thinned.csv
14. extract in which genes putative incompatibility loci are using "genes.R" (same script with in/output files with 'ns' suffix for non-sister tests)
- P_nyererei_v2.gff.gz -> from Feulner et al. 2018 G3, https://doi.org/10.5061/dryad.59q56g6
- LDandFSTandfixed_r05and06_NEW_non_thinned.csv
- genes_NEW_non_thinned.csv # added GO terms manually (from UniProt via Ensembl), see Table S8
PLOTTING and OTHER
- FSTscans_similarityTests.R (also tests correlations with recombination landscape)
- Plots.R
- Plots_FST.R
- FSTscanPairX.windowed.weir.fst, P_nyererei_v2.RecRates.txt -> from Feulner et al. 2018 G3, https://doi.org/10.5061/dryad.59q56g6
- FSTscan_correlations.txt
Note: 11., 13., 14. are also run for non-sister pairs in a separate script (suffix _ns in files)
Funding
Swiss National Science Foundation