Data from: Genomic landscapes of divergence among island bird populations: evidence of parallel adaptation but at different loci?
Data files
Apr 19, 2024 version files 1.18 GB
- a.lines.fasta
- Divergence_Peaks.R
- FST_Correlation_plots.R
- Genome_chromosome_codes.txt
- Model_simulations.R
- README.md
- SCRIPTS.sh
Abstract
When populations colonise new environments they may be exposed to novel selection pressures but also suffer from extensive genetic drift due to founder effects, small population sizes, and limited interpopulation gene flow. Genomic approaches enable us to study how these factors drive divergence, and disentangle neutral effects from differentiation at specific loci due to selection. Here, we investigate patterns of genetic diversity and divergence using whole-genome resequencing (> 22X coverage) in Berthelot’s pipit (Anthus berthelotii), a passerine endemic to the islands of three north Atlantic archipelagos. Strong environmental gradients, including in pathogen pressure, across populations in the species range, make it an excellent system in which to explore traits important in adaptation and/or incipient speciation. Firstly, we quantify how genomic divergence accumulates across the speciation continuum, i.e., among Berthelot’s pipit populations, between subspecies across archipelagos, and between Berthelot’s pipit and its mainland ancestor, the tawny pipit (Anthus campestris). Across these colonisation timeframes (2.1 million – ca. 8,000 years ago), we identify highly differentiated loci within genomic islands of divergence and conclude that the observed distributions align with expectations for non-neutral divergence. Characteristic signatures of selection are identified in loci associated with craniofacial/bone and eye development, metabolism, and immune response between population comparisons. Interestingly, we find limited evidence for repeated divergence of the same loci across the colonisation range but do identify different loci putatively associated with the same biological traits in different populations, likely due to parallel adaptation. Incipient speciation across these island populations, in which founder effects and selective pressures are strong, may therefore be repeatedly associated with morphology, metabolism, and immune defence.
README: Genomic landscapes of divergence among island bird populations: evidence of parallel adaptation but at different loci?
Claudia A. Martin, Eleanor C. Sheppard, Hisham Ali, Juan Carlos Illera, Alexander Suh, Lewis G. Spurgin and David S. Richardson
https://doi.org/10.5061/dryad.1g1jwsv4b
For any further queries, please contact Cmarti3@ed.ac.uk or claudia.martin@uea.ac.uk
Data
Data obtained from published datasets
The original raw reads for the Berthelot's pipit draft genome are not supplied here because they are already available in the dataset published by Armstrong et al. 2018 at https://doi.org/10.5061/dryad.9642b
These files include "Anthus_berthelotii_PS_816_genome.zip".
This zip file contains a BLAST database of the draft Berthelot's pipit genome as described in the Supplementary Methods of Armstrong et al. 2018. This genome was sequenced from sample 816 from Porto Santo. The assembled draft reference genome is also provided here as a.lines.fasta and is the only Berthelot's pipit reference file required for the current publication.
Variant call format (VCF) files were generated, and are described in greater detail, in a previously published study by our group. Details of the raw VCF datasets used here can be found at Martin, Claudia et al. (2023). Runs of homozygosity reveal past bottlenecks and contemporary inbreeding across diverging populations of an island-colonizing bird [Dataset]. Dryad. https://doi.org/10.5061/dryad.ksn02v75k
This includes the following relevant data for this study:
1) All Pipits, Berthelots, and Tawny VCF files
These are the three datasets in variant call format as referred to in the manuscript.
2) Chromosome codes
The Genome_chromosome_codes.txt file (also provided here) contains zebra finch (Taeniopygia guttata) chromosome names and their equivalent numeric codes used in the VCF files. These are the calibrated genome locations for Berthelot's pipit contigs.
3) Creating VCF datasets
The first steps are detailed in example_gVCF.sh, which contains the pipeline used to generate a GATK HaplotypeCaller gVCF file for each individual (for further details see gvcf_pipeline_description.txt).
Second, individual gVCF files are converted to joint-genotyped VCF datasets, and contig locations are mapped to zebra finch chromosomes using SatsumaSynteny. This pipeline is detailed in gvcf_to_vcf.txt.
Third, Satsuma_output_vcf.R is used on the output files from SatsumaSynteny to assign contigs to chromosomes, and determine their order, location and orientation.
Finally, the three VCF datasets detailed in this manuscript are created and filtered.
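The per-individual and joint-genotyping steps above follow the general shape of a GATK gVCF workflow. As orientation only, the sketch below prints the kind of commands involved, assuming GATK4 syntax. The sample IDs, BAM and output names, and the `print_gvcf_workflow` helper are hypothetical; the pipeline actually used is the one in example_gVCF.sh and gvcf_to_vcf.txt.

```shell
#!/bin/sh
# Sketch only: prints GATK4-style commands, it does not run GATK.
# Sample IDs, BAM names and output names are hypothetical placeholders.
set -eu

REF="a.lines.fasta"   # draft reference genome supplied with this dataset

print_gvcf_workflow() {
  combine_args=""
  # Step 1: one gVCF per individual
  for sample in "$@"; do
    echo "gatk HaplotypeCaller -R $REF -I ${sample}.bam -O ${sample}.g.vcf.gz -ERC GVCF"
    combine_args="$combine_args -V ${sample}.g.vcf.gz"
  done
  # Step 2: merge the individual gVCFs, then joint-genotype them
  echo "gatk CombineGVCFs -R $REF$combine_args -O cohort.g.vcf.gz"
  echo "gatk GenotypeGVCFs -R $REF -V cohort.g.vcf.gz -O cohort.vcf.gz"
}

print_gvcf_workflow sample_A sample_B
```

The SatsumaSynteny mapping and the final filtering are not sketched here, since their parameters are specific to the original pipeline files.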
Data included in this dataset
The following additional files, required to run the analyses detailed in this paper, are provided here:
1) SCRIPTS.sh file
Code used to undertake the analyses outlined in this paper.
*NOTE! These scripts MUST be run before the R scripts, as they produce the outputs the R scripts use.
2) R scripts for plotting and further analyses
These .R files contain the R scripts needed to produce the output figures and statistics detailed in the manuscript. The files are separated by the outputs in the different sections of the manuscript and refer specifically to figures used in the manuscript where relevant.
FST_Correlation_plots.R details the code required to produce FST histograms across the population comparisons and correlation plots of these.
Divergence_Peaks.R details the code required to produce Manhattan plots across chromosomes for each of the divergence comparisons, followed by zoomed plots of regions of strong divergence using Tajima's D and Pi across genomic windows. The location of genes within these regions is also plotted.
Finally, Model_simulations.R details the code to run the individual-based modelling and produce the corresponding plot in the paper.
Code/Software
This study used the following packages:
- GATK - for variant filtering.
- Bash - command line running of scripts and file manipulation
- R including packages ggplot2, tidyverse etc. - data presentation and plotting. Glads package - chromosome simulations.
- VCFtools - FST, Pi, Tajima's D, variant statistics.
- Plink 1.9 - PCA, filtering.
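As an illustration of how the VCFtools and Plink statistics listed above are typically requested, the sketch below prints one command per statistic. The VCF name, population files, output prefixes, 50 kb window size, and the `run` and `popgen_stats` helpers are all assumptions made for this example; the settings actually used are those in SCRIPTS.sh.

```shell
#!/bin/sh
# Sketch only. File names and the 50 kb window are assumptions, not the
# values used in SCRIPTS.sh. With DRY_RUN=1 (the default) commands are
# printed rather than executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"
VCF="${VCF:-berthelots.vcf.gz}"   # hypothetical VCF name
WIN="${WIN:-50000}"               # assumed window size in bp

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

popgen_stats() {
  # $1, $2: text files listing one sample ID per line for each population
  pop_a="$1"; pop_b="$2"
  # Windowed Weir and Cockerham FST between the two populations
  run vcftools --gzvcf "$VCF" --weir-fst-pop "$pop_a" --weir-fst-pop "$pop_b" --fst-window-size "$WIN" --out fst
  # Windowed nucleotide diversity (Pi) and Tajima's D within population A
  run vcftools --gzvcf "$VCF" --keep "$pop_a" --window-pi "$WIN" --out pi
  run vcftools --gzvcf "$VCF" --keep "$pop_a" --TajimaD "$WIN" --out tajd
  # PCA; --allow-extra-chr lets Plink accept non-human chromosome names
  run plink --vcf "$VCF" --allow-extra-chr --pca --out pca
}

popgen_stats pop_A.txt pop_B.txt
```

Setting `DRY_RUN=0` executes the commands instead, which requires VCFtools and Plink 1.9 on the `PATH` and real input files.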
Methods
Whole-genome data were generated from six populations of Berthelot's pipit across the three archipelagos of their range in the North Atlantic. Resequencing data were generated by mapping Illumina HiSeq reads to a reference genome and calling variants using GATK HaplotypeCaller.
Genomic landscapes of divergence were assessed using pairwise FST across different population comparisons spanning the geographic range of the species. Peaks of divergence were then characterised using a range of population genomic statistics to elucidate the evolutionary drivers of divergence in these regions.
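One simple way to shortlist candidate peaks from windowed FST output is to keep windows above a chosen cutoff. The sketch below assumes the six-column `.windowed.weir.fst` layout written by VCFtools. The toy values, the 0.5 cutoff, and the `flag_fst_peaks` helper are illustrative assumptions, not the criteria applied in Divergence_Peaks.R.

```shell
#!/bin/sh
# Sketch only: threshold filter on a VCFtools windowed FST table.
set -eu

# flag_fst_peaks FILE CUTOFF
# Prints CHROM, BIN_START, BIN_END and WEIGHTED_FST for windows whose
# weighted FST is numeric and >= CUTOFF (skips the header and nan values).
flag_fst_peaks() {
  awk -v cut="$2" -v OFS='\t' \
    'NR > 1 && $5 ~ /^-?[0-9]/ && ($5 + 0) >= (cut + 0) { print $1, $2, $3, $5 }' "$1"
}

# Toy input with made-up values, for demonstration only
tmp=$(mktemp)
printf 'CHROM\tBIN_START\tBIN_END\tN_VARIANTS\tWEIGHTED_FST\tMEAN_FST\n' > "$tmp"
printf '1\t1\t50000\t120\t0.08\t0.06\n' >> "$tmp"
printf '1\t50001\t100000\t95\t0.62\t0.55\n' >> "$tmp"
printf '2\t1\t50000\t140\t0.11\t0.09\n' >> "$tmp"

flag_fst_peaks "$tmp" 0.5
rm -f "$tmp"
```

For the toy table, only the chromosome 1 window spanning 50001 to 100000 passes the cutoff. A percentile-based cutoff computed from the empirical FST distribution would be a natural alternative to a fixed value.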
Individual-based simulations of genomic divergence, under neutral and selective processes, were then used to further understand the patterns detected in the empirical genomic datasets.