Dryad

Data from: Testing for a role of postzygotic incompatibilities in rapidly speciated Lake Victoria cichlids

Cite this dataset

Feller, Anna F.; Peichel, Catherine L.; Seehausen, Ole (2024). Data from: Testing for a role of postzygotic incompatibilities in rapidly speciated Lake Victoria cichlids [Dataset]. Dryad. https://doi.org/10.5061/dryad.ksn02v7bf

Abstract

Intrinsic postzygotic hybrid incompatibilities are usually due to negative epistatic interactions between alleles from different parental genomes. While such incompatibilities are thought to be uncommon in speciation with gene flow, they may be important if such speciation results from a hybrid population. Here we aimed to test this idea in the endemic cichlid fishes of Lake Victoria. Hundreds of species have evolved within the lake in <15k years from hybrid progenitors. While the importance of prezygotic barriers to gene flow is well established in this system, the possible relevance of postzygotic genetic incompatibilities is unknown. We inferred the presence of negative epistatic interactions from systematic patterns of genotype ratio distortions in experimental crosses and wild samples. We then compared the positions of putative incompatibility loci to regions of high genetic differentiation between sympatric sister species as well as between members of clades that may have arisen in the early history of this radiation, and further determined whether the loci showed fixed differences between the closest living relatives of the lineages ancestral to the hybrid progenitors. Overall, we find little evidence for a major role of intrinsic postzygotic incompatibilities in the Lake Victoria radiation. However, putative incompatibility loci coincide significantly more often with islands of genetic differentiation between species that separated early in the radiation than with islands between the younger sister species, consistent with the hypothesis that such variants segregated in the hybrid swarm and were sorted between species in the early speciation events.

README: Testing for a role of postzygotic incompatibilities in rapidly speciated Lake Victoria cichlids

https://doi.org/10.5061/dryad.ksn02v7bf

F2 hybrid CROSS information (species used as female x species used as male)

PNPxPPP stands for the cross between Pundamilia sp. "nyererei-like" and Pundamilia sp. "pundamilia-like"
PPMxPRHZ stands for the cross between Pundamilia pundamilia and Pundamilia sp. "red-head"
PNPxNOM stands for the cross between Pundamilia sp. "nyererei-like" and Neochromis omnicaeruleus

Analysis steps with list of input and output files

CROSSES

1. filter cross VCF files using "Crosses_VCF_filtering.txt"
input: the unfiltered cross VCF files
output: the quality-filtered, phased and imputed, and subsetted VCF files (grandparental genotypes in a separate VCF file):
  • CROSS_f1_f2_homfixF1het_noscaffolds_phased.vcf.gz
  • CROSS_Pgenos.vcf.gz
info:
  • Crosses_VCF_filtering_individuals_to_exclude_overview.txt
2. get heterozygosity, allele frequencies, and genotype frequencies using "Crosses_Analyses.txt"
input: output from 1., the filtered VCF files:
  • CROSS_f1_f2_homfixF1het_noscaffolds_phased.vcf.gz
output: heterozygosity (het), allele frequency (frq), genotype frequency (hwe) tables:
  • CROSS.het
  • CROSS_phased.het
  • CROSS_fixed_phased_all.frq
  • CROSS_fixed_phased_all.hwe
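The .het, .frq and .hwe extensions match the tables written by VCFtools (--het, --freq, --hardy); the exact commands are in "Crosses_Analyses.txt". As a hedged illustration of what the genotype-frequency columns summarize, this sketch counts genotype classes at one biallelic site from VCF GT strings and derives the ALT allele frequency (function names are hypothetical):

```python
from collections import Counter


def genotype_counts(gt_calls):
    """Count genotype classes at one biallelic site.

    gt_calls: VCF GT strings such as "0/0", "0|1" or "./." (missing).
    Returns (n_hom_ref, n_het, n_hom_alt); missing calls are skipped.
    """
    counts = Counter()
    for gt in gt_calls:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype
        # ALT dosage: 0, 1 or 2 copies of allele "1" (biallelic sites only)
        counts[sum(int(a) for a in alleles)] += 1
    return counts[0], counts[1], counts[2]


def alt_allele_freq(n_hom_ref, n_het, n_hom_alt):
    """ALT allele frequency from genotype counts (two alleles per diploid)."""
    n = n_hom_ref + n_het + n_hom_alt
    return (n_het + 2 * n_hom_alt) / (2 * n)


obs = genotype_counts(["0/0", "0|1", "1/0", "1/1", "./.", "0/1"])
print("/".join(map(str, obs)))  # 1/3/1, the layout of an OBS(HOM1/HET/HOM2) column
print(alt_allele_freq(*obs))  # 0.5
```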
3. use JoinMap to identify identical/highly similar loci

a. generate JoinMap input files using "Crosses_vcfToJoinMap.R"

input: output from 1., the filtered VCF files:
  • CROSS_f1_f2_homfixF1het_noscaffolds_phased.vcf.gz
output: JoinMap input files:
  • CROSS_JMinput.txt
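JoinMap's F2 population type codes each genotype as a (homozygous for the first parent's allele), b (homozygous for the second parent's allele) or h (heterozygous), with - for missing data. A minimal sketch of that translation, assuming biallelic sites at which the grandparents are homozygous for different alleles so the origin of the ALT allele is known (the function name is hypothetical; this is not the code in "Crosses_vcfToJoinMap.R"):

```python
def to_joinmap_f2(dosage, alt_from_grandparent_a):
    """Translate one F2 genotype into a JoinMap F2 code.

    dosage: number of ALT alleles (0, 1 or 2), or None if missing.
    alt_from_grandparent_a: True if grandparent A is homozygous for ALT.
    Returns 'a' (homozygous for the A allele), 'b' (homozygous for the
    B allele), 'h' (heterozygous) or '-' (missing).
    """
    if dosage is None:
        return "-"
    if dosage == 1:
        return "h"
    is_hom_alt = dosage == 2
    # Homozygous ALT is 'a' when ALT came from A; homozygous REF is then 'b'.
    return "a" if is_hom_alt == alt_from_grandparent_a else "b"


# One marker across five F2 individuals, ALT allele inherited from grandparent A
row = [to_joinmap_f2(d, True) for d in [0, 1, 2, None, 1]]
print(" ".join(row))  # b h a - h
```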

b. in JoinMap, subset markers as described in "Crosses_VCF_filtering_JoinMap.txt"

input: output from 3a, JoinMap input files:
  • CROSS_JMinput.txt
output: list that indicates which loci to exclude:
  • CROSS_exclude975.txt
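JoinMap can report pairs of loci with (near-)identical genotype patterns. The file name CROSS_exclude975.txt suggests a similarity cutoff of 0.975, but that is an assumption. A sketch of the idea, where similarity is the fraction of individuals with identical codes at two loci and loci are kept greedily in input order (JoinMap's own calculation may differ in detail):

```python
def similarity(codes1, codes2):
    """Fraction of individuals with identical genotype codes at two loci,
    ignoring individuals missing ('-') at either locus."""
    pairs = [(x, y) for x, y in zip(codes1, codes2) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def loci_to_exclude(loci, threshold=0.975):
    """Keep loci greedily in input order and return the names of loci whose
    similarity to an already kept locus reaches the (assumed) threshold."""
    kept, excluded = [], []
    for name, codes in loci:
        if any(similarity(codes, kept_codes) >= threshold for _, kept_codes in kept):
            excluded.append(name)
        else:
            kept.append((name, codes))
    return excluded


# Hypothetical markers: m2 duplicates m1, m3 segregates differently
markers = [("m1", "aahbb"), ("m2", "aahbb"), ("m3", "bhhaa")]
print(loci_to_exclude(markers))  # ['m2']
```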
4. exclude the identical/highly similar loci identified in JoinMap using "Crosses_getting_thinned_loci.R"
input: the exclusion list from 3b., the frq and hwe tables from 2., and the grandparental genotype files from 1.:
  • CROSS_exclude975.txt
  • CROSS_fixed_phased_all.frq
  • CROSS_fixed_phased_all.hwe
  • CROSS_Pgenos.vcf.gz
output: subsetted allele frequency (frq), genotype frequency (hwe) tables, and grandparental genotype files:
  • CROSS_fixed_phased_all_sub.frq
  • CROSS_fixed_phased_all_sub.hwe
  • CROSS_Pgenos_sub.txt
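A sketch of the subsetting, assuming the exclusion list holds one chromosome/position pair per line (an assumption; the format of CROSS_exclude975.txt is not described here) and that the tables follow the VCFtools layout, whose first two columns are chromosome and position:

```python
def read_excluded(lines):
    """Parse an exclusion list ASSUMED to hold 'CHROM POS' per line."""
    excluded = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            excluded.add((fields[0], fields[1]))
    return excluded


def subset_table(lines, excluded):
    """Yield the header and every row whose (chromosome, position), the
    first two tab-separated columns, is not in `excluded`."""
    rows = iter(lines)
    yield next(rows)  # header line
    for line in rows:
        chrom, pos = line.split("\t")[:2]
        if (chrom, pos) not in excluded:
            yield line


frq = [
    "CHROM\tPOS\tN_ALLELES\tN_CHR\t{ALLELE:FREQ}",
    "chr1\t100\t2\t40\tA:0.5\tG:0.5",
    "chr1\t200\t2\t40\tC:0.45\tT:0.55",
]
kept = list(subset_table(frq, read_excluded(["chr1 200"])))
print(len(kept))  # 2: header plus chr1:100
```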
5. Heterozygosity analysis using "Crosses_Analyses_Heterozygosity.R"
input: output from 2., the het tables:
  • CROSS.het
  • CROSS_phased.het
output: results tables
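The README does not say which statistics "Crosses_Analyses_Heterozygosity.R" reports. As a hedged example, a VCFtools-style .het table (columns INDV, O(HOM), E(HOM), N_SITES, F) gives each individual's observed heterozygosity, and F is the method-of-moments inbreeding coefficient (O(HOM) - E(HOM)) / (N_SITES - E(HOM)). The individual ID is hypothetical:

```python
def parse_het(lines):
    """Parse a VCFtools-style .het table (INDV, O(HOM), E(HOM), N_SITES, F).

    Returns {individual: (observed heterozygosity, recomputed F)}, where
    observed heterozygosity = (N_SITES - O(HOM)) / N_SITES and
    F = (O(HOM) - E(HOM)) / (N_SITES - E(HOM)).
    """
    rows = iter(lines)
    next(rows)  # skip header
    result = {}
    for line in rows:
        indv, o_hom, e_hom, n_sites, _f = line.split()
        o, e, n = float(o_hom), float(e_hom), float(n_sites)
        result[indv] = ((n - o) / n, (o - e) / (n - e))
    return result


het = parse_het([
    "INDV\tO(HOM)\tE(HOM)\tN_SITES\tF",
    "F2_01\t600\t500.0\t1000\t0.20000",
])
print(het["F2_01"])  # (0.4, 0.2)
```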
6. identify segregation distorted loci using "Crosses_Analyses_SegregationDistortion_CROSS.R"
input: output from 4., the subsetted frq and hwe tables, and the grandparental genotype files:
  • CROSS_fixed_phased_all_sub.frq
  • CROSS_fixed_phased_all_sub.hwe
  • CROSS_Pgenos_sub.txt
output: list of loci with any of the three patterns of segregation distortion per cross:
  • CROSS_ALLdeviations_sub.csv
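The three distortion patterns tested in "Crosses_Analyses_SegregationDistortion_CROSS.R" are not spelled out here. As a generic illustration only, F2 genotype counts can be tested against the Mendelian 1:2:1 ratio and allele counts against 1:1 with chi-square goodness-of-fit tests; the closed-form p-values below hold for 2 and 1 degrees of freedom:

```python
import math


def chisq_genotypes_121(n_hom1, n_het, n_hom2):
    """Chi-square test of F2 genotype counts against 1:2:1 (2 df).
    For 2 df the chi-square tail probability is exactly exp(-x / 2)."""
    n = n_hom1 + n_het + n_hom2
    observed = (n_hom1, n_het, n_hom2)
    expected = (n / 4, n / 2, n / 4)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.exp(-stat / 2)


def chisq_alleles_11(n_hom1, n_het, n_hom2):
    """Chi-square test of allele counts against 1:1 (1 df).
    For 1 df the tail probability is erfc(sqrt(x / 2)). Each individual
    contributes two alleles; this simple test ignores that pairing."""
    a = 2 * n_hom1 + n_het
    b = 2 * n_hom2 + n_het
    e = (a + b) / 2
    stat = ((a - e) ** 2 + (b - e) ** 2) / e
    return stat, math.erfc(math.sqrt(stat / 2))


print(chisq_genotypes_121(40, 40, 20))  # statistic 12.0, p = exp(-6), about 0.0025
print(chisq_alleles_11(40, 40, 20))  # statistic 8.0, p = erfc(2), about 0.0047
```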

WHOLE GENOMES (WG)

7. filtering of WG VCF files and LD analysis using "WholeGenomes_filtering_and_Analyses_LD.txt"

a. subset to 107 samples including ancestors and filter

input: whole genome VCF tables split up by chromosome, and list of samples:
  • allGenomes.chr${IDX}.SNPs.minDP6.minGQ20.max0.5N.3masks.vcf.gz -> generated by Meier et al. 2023, Science
  • WholeGenomes_LDsamples_107.txt
output: filtered whole genome VCF table with genotypes for 107 samples:
  • forLDscan.chrALL.vcf.gz

b. subset to 94 Lake Victoria (LV) samples and filter further, including intra-chromosomal LD pruning

input: filtered whole genome VCF tables split up by chromosome, and list of samples:
  • WholeGenomes_LDsamples_94.tx
  • forLDscan.chr${IDX}.vcf.gz
output: more stringently filtered whole genome VCF table with genotypes for 94 samples:
  • forLDscan.chrALL_pruned_mdp.vcf.gz
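The pruning parameters are in "WholeGenomes_filtering_and_Analyses_LD.txt". To show the idea, here is a greedy within-chromosome pruning sketch: r2 is the squared Pearson correlation of genotype dosages, and a site is dropped if its r2 exceeds max_r2 with an already kept site closer than window_bp. Both values are placeholders, not the settings used:

```python
def r2(x, y):
    """Squared Pearson correlation of two genotype-dosage vectors (0/1/2),
    using only individuals called (not None) at both sites."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a site is monomorphic among the shared individuals
    return sxy * sxy / (sxx * syy)


def prune_chromosome(sites, window_bp=50_000, max_r2=0.5):
    """Greedy LD pruning of one chromosome.

    sites: (position, dosages) tuples sorted by position. A site is kept
    unless r2 > max_r2 with an already kept site fewer than window_bp away.
    Returns the kept positions. Parameter values are placeholders.
    """
    kept = []
    for pos, dos in sites:
        if all(r2(dos, kept_dos) <= max_r2
               for kept_pos, kept_dos in kept if pos - kept_pos < window_bp):
            kept.append((pos, dos))
    return [pos for pos, _ in kept]


sites = [(100, [0, 1, 2, 0, 1]), (200, [0, 1, 2, 0, 1]), (300, [2, 0, 1, 1, 0])]
print(prune_chromosome(sites))  # [100, 300]: 200 duplicates 100
```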

c. run LD analysis

input: output from 7b, the whole genome VCF table with genotypes for 94 samples:
  • forLDscan.chrALL_pruned_mdp.vcf.gz
output: results files from linkage disequilibrium (LD) analysis
  • LDscan.ld # not saved (huge file)
  • LDscan.ld_onlyinterchrom # not saved (huge file)
  • LDscan.ld_onlyinterchrom_sub[05/06]
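Only inter-chromosomal pairs are kept (the onlyinterchrom files), and the _sub05/_sub06 suffixes presumably correspond to r2 thresholds of 0.5 and 0.6, matching step 7d. A brute-force sketch under those assumptions, using the same dosage r2 as above; testing every pair of sites grows quadratically with the number of sites, consistent with the full LDscan.ld output being too large to keep:

```python
from itertools import combinations


def r2(x, y):
    """Squared Pearson correlation of genotype dosages (0/1/2), using only
    individuals called (not None) at both sites."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def interchrom_pairs(sites, min_r2):
    """Return (chrom1, pos1, chrom2, pos2, r2) for every pair of sites on
    different chromosomes with r2 > min_r2. sites: (chrom, pos, dosages)."""
    hits = []
    for (c1, p1, d1), (c2, p2, d2) in combinations(sites, 2):
        if c1 == c2:
            continue  # intra-chromosomal pairs are not reported
        value = r2(d1, d2)
        if value > min_r2:
            hits.append((c1, p1, c2, p2, value))
    return hits


sites = [
    ("chr1", 100, [0, 1, 2, 0, 1]),
    ("chr1", 900, [0, 1, 2, 0, 1]),
    ("chr2", 50, [0, 1, 2, 0, 1]),
    ("chr3", 70, [2, 0, 1, 1, 0]),
]
for hit in interchrom_pairs(sites, min_r2=0.6):
    print(hit[:4])  # chr1:100 and chr1:900, each paired with chr2:50
```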

d. check genotypes of ancestors at high LD sites

input: output from 7a, the whole genome VCF table with genotypes for 107 samples, and the positions of loci in high inter-chromosomal LD:
  • forLDscan.chrALL.vcf.gz
  • LDpos_r06.txt # generated in FSTvsLD.R, see below (originally called LDpos.txt)
  • LDpos_r05.txt # generated in FSTvsLD.R, see below (originally called LDpos2.txt)
output: the whole genome VCF table with genotypes for 107 samples subsetted to high LD loci:
  • LDlocSubset107.vcf.gz #(r2>0.6)
  • LDlocSubset107_r05.vcf.gz #(r2>0.5)
8. filtering of WG VCF files and FST analysis using "WholeGenomes_filtering_and_Analyses_FST.txt"
input: whole genome VCF tables split up by chromosome, and list of samples:
  • allGenomes.chr${IDX}.SNPs.minDP6.minGQ20.max0.5N.3masks.vcf.gz -> generated by Meier et al. 2023, Science
  • WholeGenomes_FSTsamples.txt
output: more stringently filtered whole genome VCF table and results files of FST scans:
  • forFSTscans.chrALL_mdp.vcf.gz
  • FSTscanPairX.windowed.weir.fst
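The .windowed.weir.fst extension matches VCFtools' windowed Weir and Cockerham FST output (columns CHROM, BIN_START, BIN_END, N_VARIANTS, WEIGHTED_FST, MEAN_FST). The outlier criterion used downstream is not stated in this README; this sketch takes a hypothetical top fraction of windows ranked by WEIGHTED_FST and skips non-numeric (nan) values:

```python
import math


def outlier_windows(lines, top_fraction=0.01):
    """Return (chrom, start, end) for the windows in the top `top_fraction`
    of WEIGHTED_FST, highest first. `top_fraction` is a placeholder."""
    rows = iter(lines)
    col = {name: i for i, name in enumerate(next(rows).split())}
    windows = []
    for line in rows:
        f = line.split()
        fst = float(f[col["WEIGHTED_FST"]])
        if math.isnan(fst):
            continue  # windows without a usable estimate
        windows.append((fst, f[col["CHROM"]], int(f[col["BIN_START"]]), int(f[col["BIN_END"]])))
    windows.sort(key=lambda w: w[0], reverse=True)
    n_top = max(1, int(len(windows) * top_fraction))
    return [(c, s, e) for _, c, s, e in windows[:n_top]]


table = [
    "CHROM\tBIN_START\tBIN_END\tN_VARIANTS\tWEIGHTED_FST\tMEAN_FST",
    "chr1\t1\t10000\t50\t0.10\t0.08",
    "chr1\t10001\t20000\t42\t0.80\t0.75",
    "chr2\t1\t10000\t37\t0.30\t0.28",
    "chr2\t10001\t20000\t29\t0.05\t0.04",
]
print(outlier_windows(table, top_fraction=0.5))  # [('chr1', 10001, 20000), ('chr2', 1, 10000)]
```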

OVERLAPS

9. testing for overlaps between segregation distortion and LD using "DistortionsvsLD.R"
input: output from 6. and 7c., lists of loci with segregation distortion or in high inter-chromosomal LD:
  • CROSS_ALLdeviations_sub.csv
  • LDscan.ld_onlyinterchrom_sub06
  • LDscan.ld_onlyinterchrom_sub05
output:
  • Number of overlaps, p-values (not saved as files)
  • DistortionvsLD_Overlaps05_not_thinned.csv # for analysis 13.
  • DistortionvsLD_Overlaps06_not_thinned.csv # for analysis 13.
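The README does not say which test "DistortionsvsLD.R" applies. One standard option for asking whether two locus sets overlap more than expected by chance is a one-sided hypergeometric test, sketched here; note that it treats loci as independent draws, which linked markers are not:

```python
from math import comb


def overlap_pvalue(n_total, n_a, n_b, n_overlap):
    """One-sided hypergeometric test: probability of at least n_overlap
    shared loci when n_a and n_b loci are drawn at random, without
    replacement, from n_total loci."""
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, k) * comb(n_total - n_a, n_b - k)
               for k in range(n_overlap, upper + 1))
    return hits / comb(n_total, n_b)


print(overlap_pvalue(10, 5, 5, 5))  # 1/252
```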
10. testing for overlaps between segregation distortion and FST using "FSTvsDistortions.R"
input: output from 6. and 8., lists of loci with segregation distortion and FST tables
output:
  • list of SNPs (not saved as a file; very few)
  • p-values (not saved as a file)

The script also checks for overlaps of segregation-distorted SNPs among crosses.
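Checking whether distorted SNPs fall inside FST outlier windows is an interval-membership query. A sketch using binary search, assuming inclusive window coordinates that do not overlap within a chromosome (overlapping sliding windows would need a different lookup); the function name is hypothetical:

```python
from bisect import bisect_right


def snps_in_windows(snps, windows):
    """Return the SNPs that fall inside any window.

    snps: (chrom, pos) tuples; windows: (chrom, start, end) tuples with
    inclusive coordinates, assumed not to overlap within a chromosome.
    """
    by_chrom = {}
    for chrom, start, end in windows:
        by_chrom.setdefault(chrom, []).append((start, end))
    starts = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [start for start, _ in intervals]
    hits = []
    for chrom, pos in snps:
        intervals = by_chrom.get(chrom)
        if not intervals:
            continue
        i = bisect_right(starts[chrom], pos) - 1  # last window starting at or before pos
        if i >= 0 and pos <= intervals[i][1]:
            hits.append((chrom, pos))
    return hits


windows = [("chr1", 1, 10000), ("chr1", 20001, 30000), ("chr2", 1, 10000)]
snps = [("chr1", 5000), ("chr1", 15000), ("chr1", 30000), ("chr2", 10001), ("chr3", 5)]
print(snps_in_windows(snps, windows))  # [('chr1', 5000), ('chr1', 30000)]
```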

11. testing for overlaps between FST and LD using "FSTvsLD.R" (same script with in/output files with 'ns' suffix for non-sister tests)
input: output from 7c. and 8., lists of loci in high inter-chromosomal LD and FST tables
output:
  • LDpos_r05.txt # used in 7. to subset ancestor genotype files
  • LDpos_r06.txt # used in 7. to subset ancestor genotype files
  • LD_vs_FST_overlaps_06.txt
  • LD_vs_FST_overlaps_05.txt
  • FSTvsLD_Overlaps_uniqueSNPs_r06_no_thinning.csv # for analysis 13.
  • FSTvsLD_Overlaps_uniqueSNPs_r05_no_thinning.csv # for analysis 13.

The script also applies Fisher's method to test whether the combined p-values depart from expectation.
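Fisher's method combines k independent p-values into X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom under the joint null hypothesis; for even degrees of freedom the tail probability has a closed form. A standard-library sketch (all p-values must be greater than 0; this is not the code in "FSTvsLD.R"):

```python
import math


def fisher_combined(pvalues):
    """Fisher's method. Returns (X, combined p-value), where
    X = -2 * sum(ln p) and, for 2k degrees of freedom,
    P(chi2 > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)**i / i!."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term, total = 1.0, 1.0  # i = 0 term of the series
    for i in range(1, k):
        term *= half / i  # (X/2)**i / i!, built up iteratively
        total += term
    return x, math.exp(-half) * total


print(fisher_combined([0.05, 0.05, 0.2]))
```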

ANCESTOR GENOTYPES at high LD loci and GENES at overlapping loci

12. check which high LD loci are fixed between extant relatives of the hybrid swarm ancestors using "LDloc_genotpyes.R"
input: output from 7d, the genotypes of extant relatives of hybrid swarm ancestors at high LD sites:
  • LDlocSubset107.vcf.gz
  • LDlocSubset107_r05.vcf.gz
output: list of loci in high LD and fixed between ancestors:
  • fixedLDloci_r05_V2.csv # 5 Congo vs 8 Nile
  • fixedLDloci_r06_V2.csv # 5 Congo vs 8 Nile
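A locus counts as fixed between the two groups of extant relatives if all samples in one group are homozygous for one allele and all samples in the other group are homozygous for the alternative allele. A sketch on ALT-allele dosages; how "LDloc_genotpyes.R" treats missing genotypes is not stated, so ignoring them here is an assumption:

```python
def is_fixed_difference(group_a, group_b):
    """True if every called individual in group_a is homozygous for one
    allele and every called individual in group_b for the other.
    Genotypes are ALT-allele dosages 0/1/2; None (missing) is ignored."""
    a = {g for g in group_a if g is not None}
    b = {g for g in group_b if g is not None}
    return (a == {0} and b == {2}) or (a == {2} and b == {0})


congo = [0, 0, 0, None, 0]  # hypothetical genotypes, 5 samples
nile = [2] * 8  # hypothetical genotypes, 8 samples
print(is_fixed_difference(congo, nile))  # True
```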
13. check which fixed high LD loci overlap with high FST or distorted regions using "LDlocfixed_vs_FSTandDistortions.R" (same script with in/output files with 'ns' suffix for non-sister tests)
input: output from 9., 11., 12.:
  • fixedLDloci_r05_V2.csv
  • fixedLDloci_r06_V2.csv
  • FSTvsLD_Overlaps_uniqueSNPs_r06_no_thinning.csv
  • FSTvsLD_Overlaps_uniqueSNPs_r05_no_thinning.csv
  • DistortionvsLD_Overlaps06_not_thinned.csv
  • DistortionvsLD_Overlaps05_not_thinned.csv
  • chrInfo.txt -> from Feulner et al. 2018 G3, https://doi.org/10.5061/dryad.59q56g6
  • FSTscanPair5.windowed.weir.fst
output: list of loci in high LD and in FST outlier window and fixed between ancestors:
  • LDandFSTandfixed_r05and06_NEW_non_thinned.csv
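At its core this step intersects locus lists keyed by chromosome and position. A minimal sketch (reading the CSV inputs and combining the r2 > 0.5 and r2 > 0.6 results are omitted; the locus values are hypothetical):

```python
def intersect_loci(*locus_lists):
    """Return the (chrom, pos) keys present in every input list, sorted."""
    if not locus_lists:
        return []
    common = set(locus_lists[0]).intersection(*locus_lists[1:])
    return sorted(common)


fixed = [("chr1", 5), ("chr2", 9)]
ld_and_fst = [("chr2", 9), ("chr1", 5), ("chr3", 1)]
print(intersect_loci(fixed, ld_and_fst))  # [('chr1', 5), ('chr2', 9)]
```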
14. identify the genes in which the putative incompatibility loci are located using "genes.R" (same script with in/output files with 'ns' suffix for non-sister tests)
input: output from 13., and the P. nyererei gff file
output: list of genes:
  • genes_NEW_non_thinned.csv # added GO terms manually (from UniProt via Ensembl), see Table S8
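A sketch of the gene lookup, assuming a GFF3 file with 1-based inclusive coordinates in which gene features carry an ID attribute (check the feature types and attribute keys in the P. nyererei annotation; gene names here are hypothetical):

```python
def genes_at_loci(gff_lines, loci):
    """Return {(chrom, pos): [gene IDs]} for loci inside 'gene' features
    of a GFF3 file (1-based, inclusive coordinates)."""
    loci = set(loci)
    hits = {locus: [] for locus in loci}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # comments, directives, blank lines
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "gene":
            continue
        chrom, start, end = f[0], int(f[3]), int(f[4])
        attrs = dict(kv.split("=", 1) for kv in f[8].split(";") if "=" in kv)
        gene_id = attrs.get("ID", ".")
        for c, pos in loci:
            if c == chrom and start <= pos <= end:
                hits[(c, pos)].append(gene_id)
    return hits


gff = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=geneA;Name=abc",
    "chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=tx1;Parent=geneA",
    "chr2\tsrc\tgene\t1\t50\t.\t-\t.\tID=geneB",
]
print(genes_at_loci(gff, [("chr1", 250), ("chr1", 600), ("chr2", 50)]))
```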

PLOTTING and OTHER

  • FSTscans_similarityTests.R (also tests correlations with recombination landscape)
  • Plots.R
  • Plots_FST.R
output:
  • FSTscan_correlations.txt
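The README does not state which similarity statistic "FSTscans_similarityTests.R" computes. As one common choice, a Spearman rank correlation between two FST scans, with windows paired by shared coordinates (assumed), can be sketched as follows; ties receive average ranks:

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


scan_1 = [0.10, 0.80, 0.30, 0.05]  # hypothetical WEIGHTED_FST, same windows
scan_2 = [0.20, 0.60, 0.40, 0.10]
print(spearman(scan_1, scan_2))
```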

Note: analyses 11., 13., and 14. were also run for non-sister pairs in a separate script (suffix _ns in file names).

Funding

Swiss National Science Foundation