Supplementary data from: Introgression across narrow contact zones shapes the genomic landscape of phylogenetic variation in an African bird clade
Data files
May 30, 2025 version files 226.42 GB
- 1.1-prepare_input_ASTRAL_TWISST_50_SNPs.R (9.95 KB)
- 1.2-prepare_input_ASTRAL_TWISST_500_SNPs.R (8.12 KB)
- 10.1-plot_results_single_SNPs_clines_South_Africa_50_SNPs.R (4.82 KB)
- 10.2-plot_results_single_SNPs_clines_South_Africa_500_SNPs.R (4.97 KB)
- 11.1-plot_results_single_SNPs_clines_Uganda-Kenya_50_SNPs.R (4.90 KB)
- 11.2-plot_results_single_SNPs_clines_Uganda-Kenya_500_SNPs.R (4.94 KB)
- 12-phylogenetic_signal_around_CYP2J19.R (1.26 KB)
- 2.1-plot_results_topology_weighting_50_SNPs.R (11 KB)
- 2.2-plot_results_topology_weighting_500_SNPs.R (7.17 KB)
- 3.1-plots_topology_weighting_recombination_rate_50_SNPs.R (4.54 KB)
- 3.2-plots_topology_weighting_recombination_rate_500_SNPs.R (3.97 KB)
- 4.1_Topology_weighting_smoothed_curves_50_SNPs.R (5.79 KB)
- 4.2_Topology_weighting_smoothed_curves_500_SNPs.R (5.44 KB)
- 5-correlation_fdM_recombination_rate.R (4.20 KB)
- 6-Infer_cline_Q_values_South_Africa.R (13.20 KB)
- 7-Infer_cline_Q_values_Uganda-Kenya.R (11.29 KB)
- 8-infer_clines_single_SNPs_South_Africa.R (10.89 KB)
- 9-infer_clines_single_SNPs_Uganda-Kenya.R (10.95 KB)
- Appendix1_ddRADseq_samples_metadata.xlsx (61.99 KB)
- ddRADseq_reads.tar (223.93 GB)
- ddRADseq_samples_metadata_South_Africa.csv (33.33 KB)
- ddRADseq_samples_metadata_Uganda-Kenya.csv (7.57 KB)
- fdM_50_SNPs_windows.zip (4.35 MB)
- HZAR_single_SNPs_South_Africa.zip (4.53 MB)
- HZAR_single_SNPs_Uganda-Kenya.zip (45.04 MB)
- Linkage_map_YFT_LDhat_100kb.txt (625.64 KB)
- Neighbour_Joining_trees.zip (2.24 GB)
- Rancilhac_et_al_2025_Pogoniulus_phylogenomics_introgression_supplementary_materials.docx (4.52 MB)
- README.md (8.54 KB)
- TWISST_weights.zip (194.45 MB)
Abstract
Genomic analyses of hybrid zones provide excellent opportunities to investigate the consequences of introgression in nature. In combination with phylogenomic analyses, hybrid zone studies may illuminate the role of ancient and contemporary gene flow in shaping variation of phylogenetic signals across the genome, but this avenue has not yet been explored. We combined phylogenomic and geographic cline analyses in a Pogoniulus tinkerbird clade to determine whether contemporary introgression through hybrid zones contributes to gene-tree heterogeneity across the species' ranges. We found diverse phylogenetic signals across the genome, with the most common topologies supporting monophyly among taxa connected by secondary contact zones. Remarkably, these systematic conflicts were also recovered when selecting only individuals from each taxon’s core range. Using analyses of derived allele sharing and “recombination aware” phylogenomics, we found that introgression shapes gene-tree heterogeneity, and the species tree most likely supports monophyletic red-fronted tinkerbirds, as recovered in previous reconstructions based on mitochondrial DNA. Furthermore, by fitting geographic clines across two secondary contact zones, we found that introgression rates were lower in genomic regions supporting the putative species tree than in those supporting the two taxa in contact as monophyletic. This demonstrates that introgression through narrow contact zones shapes gene-tree heterogeneity even in allopatric populations. Finally, we did not find evidence that mitochondria-interacting nuclear genes acted as barrier loci. Our results show that species can withstand substantial amounts of introgression while maintaining their phenotypic integrity and ecological separation, raising questions regarding the genomic architecture of adaptation and barriers to gene flow.
https://doi.org/10.5061/dryad.0cfxpnwb3
Description of the data and file structure
This repository contains supplementary information and data for the manuscript "Introgression across narrow contact zones shapes the genomic landscape of phylogenetic variation in a tinkerbird clade". Files are organized as follows:
- Rancilhac_et_al_2025_Pogoniulus_phylogenomics_introgression_supplementary_materials.docx: The supplementary tables and figures associated with the manuscript.
- Appendix1_ddRADseq_samples_metadata.xlsx: A spreadsheet giving the geographical origin, distance to contact zone, and Red-fronted Tinkerbird ancestry for each of the samples included in the ddRADseq datasets. ddRADseq_samples_metadata_South_Africa.csv and ddRADseq_samples_metadata_Uganda-Kenya.csv contain the same information in CSV format for easy import into R.
  Columns key: Sample = unique identifier for each sample; Subspecies = taxon the sample belongs to; Country, Locality, Latitude, Longitude = geographic location of the sampling locality; affinis ancestry (Q) = individual Red-fronted Tinkerbird ancestry, inferred with Admixture in two-population analyses of each hybrid zone individually; Distance to contact zone = shortest straight-line distance to the contact zone, in kilometers.
- Linkage_map_YFT_LDhat_100kb.txt: The linkage map of Pogoniulus extoni, inferred with LDhat.
  Columns key: Chromosome, window start, window median position, window end (all three in kbp), number of SNPs, mean population recombination rate (rho; refer to the LDhat manual for an explanation of this metric).
- Neighbour_Joining_trees: A directory including Neighbour-Joining trees calculated in sliding windows using the whole-genome sequencing data. Two sub-directories are included, corresponding to sliding windows of different sizes: 50_SNPs and 500_SNPs. Trees were calculated for each chromosome separately (SUPER_1 to SUPER_44 and SUPER_Z). For each chromosome, the following files are included:
  - SUPER_*_all_50S_maxmiss60_NJ_trees.trees: a file containing the trees calculated for this chromosome, one per line, or NA if the NJ calculation failed.
  - SUPER_*_all_50S_maxmiss60_windows_stats.tsv: a table with metadata for the genomic windows from which the NJ trees were calculated. Each line in this table corresponds to the same line number in the previous file.
    Columns key: CHR = chromosome name, CHR.START = start of the window, CHR.END = end of the window, CHR.SIZE = physical size of the window (all three in bp), NSITES = number of SNPs included in the window, PROP.MISS = proportion of missing genotypes in the window, PROP.PIS = proportion of parsimony-informative sites in the window, TREE = whether a neighbor-joining tree could be calculated in this window, NTIPS = number of tips in the tree.
  - SUPER_*_all_50S_maxmiss60_NJ_ASTRAL.trees: the trees, filtered for use as input to ASTRAL.
  - SUPER_*_all_50S_maxmiss60_NJ_TWISST.trees: the trees, filtered for use as input to TWISST.

  Each of these files is provided for four sample subsets: all samples (SUPER_*all), core-range samples (SUPER_*allopatric), contact-zone samples (SUPER_*sympatric), and control samples (SUPER_*control).
  In addition, the file recombination_50S_windows.txt assigns a recombination rate to each window for which a tree was calculated, based on estimates of the population recombination rate (rho) in 100 kbp windows. Columns key: CHR = chromosome name, WIN.TW = end of the window in the phylogenetic analysis, END.R = end of the 100 kbp window in the recombination rate analysis, rho = population recombination rate.
- TWISST_weights: A directory including topology weights for each chromosome, as output by TWISST, with two sub-directories (50_SNPs and 500_SNPs). Again, four files are provided for each chromosome, corresponding to the four sample subsets. Each line gives the raw weights of the 15 possible topologies (one column per topology). The topologies are listed in Newick format at the start of the file (lines starting with #).
- HZAR_single_SNPs_South_Africa and HZAR_single_SNPs_Uganda-Kenya: Two directories containing the results of single-SNP HZAR cline analyses in the two contact zones. The analysis was run on batches of 50 SNPs, and the results are given in the files "Estimates_hzar_*.txt", which are tables where each line is a SNP and the columns are the estimated cline parameters.
  Columns key: locus = unique identifier for each SNP, created by Stacks during variant calling; width = estimated cline width in km; widthLLlow and widthLLhigh = confidence interval around the width estimate (see the HZAR documentation for more details); center = estimated cline center in km; center_LLlow and center_LLhigh = confidence interval around the center estimate (see the HZAR documentation for more details).
  These directories also contain the HZAR input files (Allele_frequencies_HZAR.csv) generated with Stacks populations, which include three columns for each SNP describing the frequency of the two alleles (SNP-number.A and SNP-number.B) and the number of chromosomes sampled at this SNP (SNP-number.N). Finally, a table gives the coordinates of the SNPs in plink format (SNPs*.plink.map), where the names of the SNPs correspond to those in the HZAR input file.
- fdM_50_SNPs_windows: Estimates of introgression with the fdM statistic in sliding windows of 50 SNPs for two taxon trios. For each trio, one table is given per autosome, with one window per line.
  Columns key: chr = chromosome, windowStart and windowEnd = coordinates of the window (in bp), D = Patterson's D, f_d = fd (Martin et al. 2015 MBE https://doi.org/10.1093/molbev/msu269), f_dM = fd modified (Malinsky et al. 2015 Science https://doi.org/10.1126/science.aac9927), d_f = distance fraction (Pfeifer & Kapan 2019 BMC Bioinformatics https://doi.org/10.1186/s12859-019-2747-z).
- ddRADseq_reads.tar: Demultiplexed, quality-filtered Illumina reads for the two RADseq datasets. For each sample, two FASTQ files are provided, named sample-name.1.fq.gz and sample-name.2.fq.gz, corresponding to the forward and reverse reads.
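The line-by-line correspondence between the *_NJ_trees.trees files and the *_windows_stats.tsv tables in Neighbour_Joining_trees can be sketched as follows. This is a minimal Python illustration, not the authors' R code. The function name `pair_trees_with_windows` and the toy data are hypothetical, and the sketch assumes that the TSV has a header row using the column names given in the columns key.

```python
import csv
import io


def pair_trees_with_windows(trees_lines, stats_lines):
    """Pair each Newick line with the metadata row at the same line number.

    Assumes the stats table is tab-separated with a header row, and that
    windows without a tree are written as 'NA' in the trees file.
    """
    reader = csv.DictReader(stats_lines, delimiter="\t")
    paired = []
    for newick, row in zip(trees_lines, reader):
        newick = newick.strip()
        if newick == "NA":  # NJ calculation failed for this window
            continue
        paired.append({
            "chr": row["CHR"],
            "start": int(row["CHR.START"]),
            "end": int(row["CHR.END"]),
            "newick": newick,
        })
    return paired


# Toy data standing in for one chromosome (all values are made up).
example_trees = io.StringIO("((A,B),(C,D));\nNA\n((A,C),(B,D));\n")
example_stats = io.StringIO(
    "CHR\tCHR.START\tCHR.END\tNSITES\n"
    "SUPER_1\t1\t5000\t50\n"
    "SUPER_1\t5001\t9000\t50\n"
    "SUPER_1\t9001\t15000\t50\n"
)
windows = pair_trees_with_windows(example_trees, example_stats)
```

With real files, the two `io.StringIO` objects would be replaced by open file handles for one chromosome and one sample subset. The header of the actual table should be checked first, since the column names above are taken from the columns key rather than from the files themselves.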
Code/Software
The following R scripts are provided to run the analyses and plot the figures (all run in R v. 4.1.1):
- 1.1-prepare_input_ASTRAL_TWISST_50_SNPs.R: Filter sliding-window trees for downstream analyses. Dependencies: ape. (Also provided for 500-SNP windows.)
- 2.1-plot_results_topology_weighting_50_SNPs.R: Visualize TWISST results. Dependencies: ape, data.table, ggplot2. (Also provided for 500-SNP windows.)
- 3.1-plots_topology_weighting_recombination_rate_50_SNPs.R: Visualize the association between topology weights and recombination rate. Dependencies: ape. (Also provided for 500-SNP windows.)
- 4.1_Topology_weighting_smoothed_curves_50_SNPs.R: Visualize the variation of topology weights along chromosomes. Dependencies: ape, data.table, ggplot2. (Also provided for 500-SNP windows.)
- 5-correlation_fdM_recombination_rate.R: Visualize the correlation between introgression rate estimated with fdM and recombination rate. Dependencies: data.table.
- 6-Infer_cline_Q_values_South_Africa.R: Infer a geographic cline based on individual ancestries in the South African hybrid zone. Dependencies: data.table, gridExtra, hzar.
- 7-Infer_cline_Q_values_Uganda-Kenya.R: Same as the previous script, but for the hybrid zone in Uganda and Kenya.
- 8-infer_clines_single_SNPs_South_Africa.R: Infer geographic clines based on allele frequencies at single SNPs in the South African hybrid zone. Dependencies: data.table, hzar.
- 9-infer_clines_single_SNPs_Uganda-Kenya.R: Same as the previous script, but for the hybrid zone in Uganda and Kenya.
- 10.1-plot_results_single_SNPs_clines_South_Africa_50_SNPs.R: Plot the results of single-SNP geographic clines in the South African hybrid zone. Dependencies: ggplot2, stringr. (Also provided for 500-SNP windows.)
- 11.1-plot_results_single_SNPs_clines_Uganda-Kenya_50_SNPs.R: Same as the previous script, but for the hybrid zone in Uganda and Kenya. (Also provided for 500-SNP windows.)
- 12-phylogenetic_signal_around_CYP2J19.R: Visualize the local phylogenetic signal around the gene CYP2J19. Dependencies: ape, data.table, ggplot2.
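
Because the TWISST_weights files store raw weights, a typical first step before plotting is to convert each window's weights to proportions. The sketch below is a Python illustration of that step rather than the authors' R code. `read_twisst_weights` is a hypothetical helper, and it assumes tab-separated columns, topology lines beginning with `#`, and an optional header row of topology names after them. The toy input uses three topologies instead of the 15 in the real files.

```python
import io


def read_twisst_weights(lines):
    """Return (topologies, proportions) from a TWISST-style weights file."""
    topologies, proportions = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment lines carry the topologies in Newick format.
            topologies.append(line.lstrip("#").strip())
            continue
        fields = line.split("\t")
        try:
            raw = [float(x) for x in fields]
        except ValueError:
            continue  # assumed header row of topology names
        total = sum(raw)
        # Windows with no usable weights (all zero or NaN) are kept as None.
        proportions.append([w / total for w in raw] if total > 0 else None)
    return topologies, proportions


# Toy input (made-up values; the layout is an assumption, see above).
example = io.StringIO(
    "#topo1 (O,(A,(B,C)));\n"
    "#topo2 (O,(B,(A,C)));\n"
    "#topo3 (O,(C,(A,B)));\n"
    "topo1\ttopo2\ttopo3\n"
    "6\t3\t1\n"
    "0\t0\t0\n"
)
topos, props = read_twisst_weights(example)
```

Keeping empty windows as `None` preserves the one-line-per-window alignment with the corresponding trees and window tables.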
