Data for: Plio-Pleistocene climatic fluctuations and divergence with gene flow drive continent-wide diversification in an African bird
Data files
Apr 23, 2025 version files 27.66 GB
- Demography_Analyses.zip (2.18 MB)
- METADATA_for_ddRADS_raw_reads.csv (22.56 KB)
- Phylogeny.zip (17.23 KB)
- Population_Structure.zip (8.79 MB)
- Raw_ddRADs.zip (27.65 GB)
- README.md (5.32 KB)
Abstract
This dataset contains the data used in Ogolowa et al. (2025). The study assessed the genetic structure and demographic history of the Yellow-rumped Tinkerbird (Pogoniulus bilineatus) species complex. It found substantial genetic structuring across the range of the species, with the earliest split coinciding with the arid corridor, which highlights the arid corridor as an important biogeographical feature in avian diversification in sub-Saharan Africa. The data include raw fastq files generated via double-digest restriction-site associated DNA sequencing (ddRADseq) from DNA samples of 183 individuals. The samples were collected across the geographic range of P. bilineatus, with representative samples from all seven currently recognized subspecies. The dataset also contains the metadata for each sample, including the sampling location, collection date, and coordinates.
Dryad DOI: https://doi.org/10.5061/dryad.15dv41p71
Corresponding author: Alexander Kirschel (email: kirschel@ucy.ac.cy)
Electronic Data Files prepared by Bridget Ogolowa
bridget.ogolowa@uniben.edu
This directory contains the data and code required to replicate the analyses in Ogolowa et al. (in review), which assess the genetic structure and demographic history of the Pogoniulus bilineatus species complex. The data cover DNA samples collected across the geographic range of P. bilineatus, with representative samples from all currently recognized subspecies. Genetic data include raw fastq files generated via double-digest restriction-site associated DNA sequencing (ddRADseq) from 183 individuals, together with the metadata used for population genetic and demographic analyses. Location data include coordinates taken with standard GPS devices and a shapefile of the geographic range of P. bilineatus.
Dates of Data Collection
Museum samples: 1990 to 2017
Samples collected from fieldwork: 2015 to 2017
Files and Folders
~/Demography_Analyses/dadi_analyses_summary_output contains the result summary per pair of clades of demographic model analyses using dadi.
~/Demography_Analyses/JSFS_per_lineage_pair contains the joint site frequency spectrum (JSFS) per pair of clades used in the study.
~/Demography_Analyses/moments_analyses_summary_output contains the result summary per pair of clades of demographic model analyses using moments.
~/Phylogeny contains the files used to run phylogenetic analyses in ASTRAL and IQTREE.
~/Phylogeny/With_ddRADs/Individulas_excluded_from_ASTRAL_analyses.txt and ~/Phylogeny/With_ddRADs/Individulas_excluded_from_IQTREE.txt contain the list of individuals excluded from ASTRAL and IQTREE analyses, respectively, due to having more than 20% missing sites. ~/Phylogeny/With_ddRADs/population_map_file.txt is the population map file of each individual. ~/Phylogeny/With_ddRADs/Sample_missingness_for_ASTRAL.imiss and ~/Phylogeny/With_ddRADs/Sample_missingness_for_IQTREE.imiss contain the missing rate per individual after filtering for ASTRAL and IQTREE, respectively.
~/Phylogeny/With_whole_genome_resequencing/Genomic_intervals_as_used_for_HAPLOTYPE_CALLER_in_GATK contains the list of genomic intervals used in the bioinformatic pipeline of whole-genome sequences in GATK.
~/Phylogeny/With_whole_genome_resequencing/WGS_population_map_file.txt contains the population map file per individual with whole genome sequences.
~/Phylogeny/list_of_autosomal_chromosomes.txt contains the list of autosomal chromosomes used for phylogenetic analyses.
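The 20% missingness cutoff mentioned above for the ASTRAL and IQTREE inputs can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes the .imiss files follow the VCFtools `--missing-indv` layout (tab-separated, with `INDV` and `F_MISS` columns), which this README does not state, and the sample IDs and values in the example are hypothetical.

```python
import csv
import io


def individuals_above_missingness(imiss_text, threshold=0.20):
    """Return IDs whose fraction of missing sites exceeds `threshold`.

    Assumes a tab-separated table with INDV and F_MISS columns
    (VCFtools-style .imiss layout; an assumption, not confirmed by the README).
    """
    reader = csv.DictReader(io.StringIO(imiss_text), delimiter="\t")
    return [row["INDV"] for row in reader if float(row["F_MISS"]) > threshold]


# Hypothetical rows for illustration only; not taken from the dataset.
example = (
    "INDV\tN_DATA\tN_GENOTYPES_FILTERED\tN_MISS\tF_MISS\n"
    "sample_A\t1000\t0\t50\t0.05\n"
    "sample_B\t1000\t0\t350\t0.35\n"
    "sample_C\t1000\t0\t200\t0.20\n"
)
excluded = individuals_above_missingness(example)
print(excluded)
```

Because the provided .imiss files are described as post-filtering, running the same check on them is one way to confirm that no retained individual exceeds the cutoff.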
~/Population_Structure/ADMIXTURE_qmatrix_and_geography_at_K8/Admixture_run_replicate_1, 2, and 3 contain the individual Q-matrix (ancestry), the log output, and the cross-validation (CV) errors from ADMIXTURE analyses across three replicates.
~/Population_Structure/ADMIXTURE_qmatrix_and_geography_at_K8/For_CLUMPP contains the input and output files used in CLUMPP.
~/Population_Structure/ADMIXTURE_qmatrix_and_geography_at_K8/For_distruct contains the input and output files used in the Distruct package.
~/Population_Structure/ADMIXTURE_qmatrix_and_geography_at_K8/Admixture_Qmatrix_at_K8.csv contains the Q-matrix of the best K (K=8) from ADMIXTURE analyses.
~/Population_Structure/ADMIXTURE_qmatrix_and_geography_at_K8/For_ADMIXTURE_analyses.bed is the plink file used to run ADMIXTURE analyses
~/Population_Structure/ADMIXTURE_qmatrix_and_geography_at_K8/Plot_of_CV_Error_across_three_replicates.png is the plot of the CV errors across three replicates inferred from ten clusters from ADMIXTURE analyses
~/Population_Structure/ADMIXTURE_qmatrix_and_geography_at_K8/site_coordinates.csv contains the individual coordinates used to plot the Q-matrix across geography.
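When plotting a Q-matrix across geography, a common first step is to summarize each individual by its largest ancestry component. The sketch below shows that step under stated assumptions: the column layout of Admixture_Qmatrix_at_K8.csv is not described here, so the example works on hypothetical in-memory rows of eight ancestry proportions, and the function name is illustrative.

```python
def dominant_cluster(q_row):
    """Return (1-based cluster index, proportion) of the largest ancestry component."""
    best = max(range(len(q_row)), key=lambda k: q_row[k])
    return best + 1, q_row[best]


# Hypothetical Q-matrix rows at K=8; values are illustrative, not from the dataset.
q_matrix = {
    "sample_A": [0.90, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01, 0.01],
    "sample_B": [0.05, 0.05, 0.60, 0.10, 0.05, 0.05, 0.05, 0.05],
}

# Each row of a Q-matrix should sum to 1 (within rounding).
for name, row in q_matrix.items():
    assert abs(sum(row) - 1.0) < 1e-6, name

assignments = {name: dominant_cluster(row) for name, row in q_matrix.items()}
print(assignments)
```

The resulting per-individual assignments could then be joined to the coordinates in site_coordinates.csv by sample name, assuming both files share a sample identifier.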
~/Population_Structure/EEMS_analyses/Input_data_for_EEMS_analyses contains the input data for EEMS analyses. The input data (input_for_EEMS) was converted from the plink file (plink_file_for_EEMS.bed). ~/Population_Structure/EEMS_analyses/Output_data_from_EEMS_analyses_with_three_chains_at_700demes contains output files from EEMS analyses across three independent runs.
~/Population_Structure/FST_and_IBD_analyses/Fst_output_files contains Fst output files including the upper and lower limits of Fst values.
~/Population_Structure/FST_and_IBD_analyses/population_files_for_Fst_diversity_and_IBD_analyses contains the population map file used to run Fst in the study.
~/Population_Structure/FST_and_IBD_analyses/geo_locations.csv contains the individual coordinates used to run IBD analyses.
~/Population_Structure/PCA_analyses contains the output files from PCA analyses (eigenvalues.txt, eigenvectors.txt, pcs.txt, and pve.txt). ./Output_from_PCA_analyses_with_samples_and_pop_info.csv contains the PCs with sample names and population information to plot PCA. Plot_PCs.R is the code used to plot the PCs in the R software package.
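For readers relating eigenvalues.txt to pve.txt, the proportion of variance explained (PVE) per principal component is conventionally each eigenvalue divided by the sum of the eigenvalues. The sketch below illustrates that convention with hypothetical numbers; whether pve.txt was computed in exactly this way, and the layout of the two files, are assumptions.

```python
def percent_variance_explained(eigenvalues):
    """Return the percentage of variance explained by each component.

    Percentages are relative to the eigenvalues supplied, so they describe
    total variance only if all components are included.
    """
    total = sum(eigenvalues)
    return [100.0 * value / total for value in eigenvalues]


# Hypothetical eigenvalues for illustration only; not taken from the dataset.
eigenvalues = [5.0, 3.0, 2.0]
pve = percent_variance_explained(eigenvalues)
print(pve)
```

These percentages are typically what is shown on the axis labels of a PCA plot.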
~/Raw_ddRADs.zip contains the raw fastq files per sample obtained via ddRADseq
~/METADATA_for_ddRADS_raw_reads.csv contains the metadata for each ddRADseq sample.
Genetic material came from muscle tissue of museum samples and from blood samples collected during fieldwork (2015-2017) using mist nets and conspecific playbacks (Kirschel et al., 2009; Nwankwo et al., 2018; Kirschel et al., 2020a). Genomic DNA was extracted from 183 samples using a DNeasy Blood and Tissue kit following the manufacturer's protocol (Qiagen, USA). Genomic libraries were generated for double-digest restriction-site associated DNA sequencing (ddRADseq) following Brelsford et al. (2016). DNA was digested with the SbfI and MseI restriction enzymes, and unique SbfI-barcoded adapters (4-8 bp in size) were ligated to each sample to allow downstream equimolar pooling of all samples. Sequencing was performed with 2x150 nt reads on an Illumina HiSeq X Ten platform by Novogene Inc. (Sacramento, CA, USA).
