
Runs of homozygosity reveal past bottlenecks and contemporary inbreeding across diverging populations of an island-colonizing bird

Cite this dataset

Martin, Claudia et al. (2023). Runs of homozygosity reveal past bottlenecks and contemporary inbreeding across diverging populations of an island-colonizing bird [Dataset]. Dryad.


Genomes retain evidence of the demographic history and evolutionary forces that have shaped populations. Across island systems, contemporary patterns of genetic diversity reflect complex population demography, including colonisation events, bottlenecks, gene flow and genetic drift. Here, we investigate whether island founder events have prolonged effects on genome-wide diversity and runs of homozygosity (ROH) distributions, using whole-genome resequencing data from six populations across three archipelagos of Berthelot’s pipit (Anthus berthelotii), a passerine that has undergone island speciation relatively recently. Pairwise sequential Markovian coalescent (PSMC) analyses estimated divergence from its sister species approximately two million years ago. Results indicate that all Berthelot’s pipit populations shared ancestry until approximately 50,000 years ago, when the Madeiran archipelago populations were founded, while the Selvagens were colonised within the last 8,000 years. We identify extensive long ROH (>1 Mb) in the genomes of the most recently colonised populations, Madeira and the Selvagens, which have experienced sequential island founder events and population crashes. Population expansion within the last 100 years may have eroded long ROH in the Madeiran archipelago, resulting in a prevalence of short ROH (<1 Mb). Extensive long and short ROH in the Selvagens reflect strong recent inbreeding, small contemporary effective population size and past bottleneck effects, with as much as 37.7% of the autosomes composed of ROH >250 kb in length. These findings highlight the importance of demographic history, as well as selection and genetic drift, in shaping contemporary patterns of genomic diversity across diverging populations.


Whole-genome data from six populations of Berthelot's pipit across the three archipelagos of its range in the North Atlantic. Illumina HiSeq reads were mapped to a reference genome, and variants were called using GATK HaplotypeCaller.

Usage notes

This section describes the datasets and outlines how to use them to recreate and explore the findings of the manuscript.

All pipits, Berthelot's and Tawny VCF files
These are the three datasets in variant call format as referred to in the manuscript.
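As a first exploration of these VCF files, per-sample heterozygosity can be tallied directly from the GT field. The sketch below is a hypothetical helper, not part of the deposited scripts. It assumes standard VCF 4.x layout (sample columns start at column 10 and a GT key is present in FORMAT). Note that the denominator is the number of called genotypes at sites present in the file, so it only approximates per-base heterozygosity if invariant sites were retained.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def per_sample_heterozygosity(lines):
    """Return {sample: (heterozygous_calls, called_genotypes)} from VCF text lines."""
    samples = []
    het = Counter()
    called = Counter()
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs from the header line
            continue
        gt_idx = fields[8].split(":").index("GT")
        for sample, data in zip(samples, fields[9:]):
            parts = data.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                continue  # skip missing genotypes
            called[sample] += 1
            if len(set(alleles)) > 1:
                het[sample] += 1  # two different alleles = heterozygous
    return {s: (het[s], called[s]) for s in samples}


if __name__ == "__main__":
    # The file name below is a placeholder; substitute one of the deposited VCFs.
    import sys
    with open_vcf(sys.argv[1]) as fh:
        for sample, (h, n) in per_sample_heterozygosity(fh).items():
            print(sample, h, n, h / n if n else float("nan"))
```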

Individual-level .fastq files
These files are used to generate the input files required to run the PSMC analyses, one individual at a time. Example files are given for EH_161.
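PSMC does not read a diploid consensus .fastq directly; in the standard PSMC workflow the consensus is first converted to .psmcfa with the fq2psmcfa utility, which collapses the sequence into fixed-size windows. The sketch below is a simplified illustration of that binning idea, not a reimplementation of fq2psmcfa. Its assumptions: 100 bp windows, two-allele IUPAC codes (R, Y, S, W, K, M) mark heterozygous sites, uppercase A/C/G/T are callable homozygous bases, everything else (N, lowercase/filtered bases) is uncallable, and a window needs at least 90% callable bases. Consult the PSMC documentation for the exact quality handling.

```python
HET_CODES = set("RYSWKM")  # two-allele IUPAC ambiguity codes = heterozygous site
HOM_CODES = set("ACGT")    # callable homozygous bases (uppercase only, by assumption)


def bin_consensus(seq, window=100, min_callable=0.9):
    """Collapse a diploid consensus sequence into one symbol per window.

    'K' = window contains at least one heterozygous site,
    'T' = enough callable bases but no heterozygous site,
    'N' = too few callable bases to use the window.
    """
    out = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        n_het = sum(c in HET_CODES for c in chunk)
        n_callable = n_het + sum(c in HOM_CODES for c in chunk)
        if n_callable < min_callable * window:
            out.append("N")
        elif n_het:
            out.append("K")
        else:
            out.append("T")
    return "".join(out)
```

For example, a 400 bp consensus made of a clean homozygous window, a window with one `R`, a window of `N`s and a mostly lowercase window would bin to `TKNN`.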

Chromosome codes 
Genome_chromosome_codes.txt file contains Zebra finch (Taeniopygia guttata) chromosome names and their equivalent numeric codes used in the VCF files. 
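A lookup from numeric code back to Zebra finch chromosome name is handy when labelling plots or summaries. The sketch below assumes, hypothetically, that each line of Genome_chromosome_codes.txt holds a chromosome name followed by its numeric code, separated by whitespace, and that any header line has a non-numeric second column; check the actual file layout before relying on it. Both function names are illustrative only.

```python
def load_chromosome_codes(lines):
    """Map numeric code (as a string) -> Zebra finch chromosome name.

    ASSUMPTION: column 1 = chromosome name, column 2 = numeric code.
    Lines whose second column is not an integer (e.g. a header) are skipped.
    """
    code_to_name = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        name, code = parts[0], parts[1]
        code_to_name[code] = name
    return code_to_name


def rename_chrom(vcf_line, code_to_name):
    """Replace the numeric CHROM field of a VCF data line with its name."""
    if vcf_line.startswith("#"):
        return vcf_line  # leave header lines untouched
    chrom, rest = vcf_line.split("\t", 1)
    return code_to_name.get(chrom, chrom) + "\t" + rest


if __name__ == "__main__":
    with open("Genome_chromosome_codes.txt") as fh:
        codes = load_chromosome_codes(fh)
    print(len(codes), "chromosome codes loaded")
```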

Code used to undertake the analyses outlined in this paper
These scripts must be run before the R script to produce the associated outputs.

R scripts to produce tables, figures and statistics 
ROH_analyses.R contains the R code that produces the figures and statistics reported in the manuscript.
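The headline ROH statistic (for example, up to 37.7% of the autosomes in ROH >250 kb) is a genomic inbreeding coefficient, F_ROH: the summed length of ROH above a minimum size divided by the autosomal length covered. The sketch below is an illustrative Python version of that calculation, not the code in ROH_analyses.R. It assumes ROH have already been called (the ROH caller is not specified in this section) and that lengths are in base pairs. The short/long split at 1 Mb follows the size classes in the abstract; assigning exactly 1 Mb to "long" is an assumption.

```python
def froh(roh_lengths_bp, autosome_length_bp, min_bp=250_000, max_bp=None):
    """Fraction of the autosomes covered by ROH with min_bp <= length < max_bp."""
    total = sum(
        length for length in roh_lengths_bp
        if length >= min_bp and (max_bp is None or length < max_bp)
    )
    return total / autosome_length_bp


def roh_summary(roh_lengths_bp, autosome_length_bp):
    """F_ROH for all ROH >= 250 kb, short ROH (250 kb to <1 Mb) and long ROH (>= 1 Mb)."""
    return {
        "F_ROH_250kb": froh(roh_lengths_bp, autosome_length_bp, 250_000),
        "F_ROH_short": froh(roh_lengths_bp, autosome_length_bp, 250_000, 1_000_000),
        "F_ROH_long": froh(roh_lengths_bp, autosome_length_bp, 1_000_000),
    }
```

Because the short and long classes partition the >250 kb set, F_ROH_short + F_ROH_long equals F_ROH_250kb for each individual, which is a useful sanity check on any summary table.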

Creating VCF datasets

First, a gVCF file is generated for each individual separately using GATK HaplotypeCaller (for details of the pipeline see gvcf pipeline description.txt).
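For orientation, a per-individual gVCF call in GATK4 takes the form `gatk HaplotypeCaller -R <reference> -I <bam> -O <sample>.g.vcf.gz -ERC GVCF`. The sketch below only builds that command for one sample; it assumes GATK4 is on the PATH, and the reference, BAM and output names are placeholders. The authors' exact GATK version and options are those given in gvcf pipeline description.txt, not these.

```python
import shlex
import subprocess


def haplotypecaller_cmd(reference, bam, sample_id, outdir="."):
    """Build a GATK4 HaplotypeCaller command that emits a per-sample gVCF."""
    out = f"{outdir}/{sample_id}.g.vcf.gz"
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,   # reference FASTA (placeholder path)
        "-I", bam,         # this individual's aligned, sorted BAM
        "-O", out,
        "-ERC", "GVCF",    # emit reference confidence blocks (gVCF mode)
    ]


def run(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(haplotypecaller_cmd("reference.fasta", "EH_161.bam", "EH_161"))
```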

Second, the individual gVCF files are jointly genotyped to produce VCF datasets, and contig locations are mapped to Zebra finch chromosomes using SatsumaSynteny. This pipeline is detailed in gvcf to vcf.txt.
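In GATK4, one common route for joint genotyping is to merge per-sample gVCFs with CombineGVCFs and then genotype the merged file with GenotypeGVCFs (GenomicsDBImport is an alternative for the merge step). The sketch below builds that two-step command pair under the assumption that GATK4 is used; file names are placeholders, and whether the authors followed exactly these steps is documented in gvcf to vcf.txt, not here.

```python
def combine_gvcfs_cmd(reference, gvcfs, combined_out):
    """CombineGVCFs: merge several per-sample gVCFs into one multi-sample gVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V/--variant per input gVCF
    return cmd + ["-O", combined_out]


def genotype_gvcfs_cmd(reference, combined_gvcf, vcf_out):
    """GenotypeGVCFs: joint-genotype the merged gVCF into a regular VCF."""
    return ["gatk", "GenotypeGVCFs", "-R", reference,
            "-V", combined_gvcf, "-O", vcf_out]


def joint_calling_cmds(reference, gvcfs, prefix):
    """Return the two commands, in order, for one joint-called dataset."""
    combined = f"{prefix}.g.vcf.gz"
    return [
        combine_gvcfs_cmd(reference, gvcfs, combined),
        genotype_gvcfs_cmd(reference, combined, f"{prefix}.vcf.gz"),
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order; keeping command construction separate from execution makes it easy to print and review the calls first.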

Third, Satsuma_output_vcf.R is applied to the SatsumaSynteny output files to assign contigs to chromosomes and to determine their order, location and orientation.
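The idea behind placing contigs from synteny alignments can be sketched as a majority vote. This is a hypothetical illustration, not the logic of Satsuma_output_vcf.R. It assumes the alignment blocks have already been parsed into tuples of (contig, chromosome, chromosome start, aligned length, strand); the real SatsumaSynteny output columns should be checked before parsing. Each contig goes to the chromosome with the most aligned bases, is oriented by the length-weighted majority strand, and is positioned at the length-weighted mean start of its blocks on that chromosome.

```python
from collections import defaultdict


def place_contigs(blocks):
    """Assign, order and orient contigs from synteny alignment blocks.

    blocks: iterable of (contig, chrom, chrom_start, aligned_bp, strand).
    Returns {chrom: [(contig, strand), ...]} ordered along each chromosome.
    """
    by_contig = defaultdict(list)
    for block in blocks:
        by_contig[block[0]].append(block)

    layout = defaultdict(list)  # chrom -> [(position, contig, strand)]
    for contig, hits in by_contig.items():
        per_chrom = defaultdict(int)
        for _, chrom, _, length, _ in hits:
            per_chrom[chrom] += length
        best = max(per_chrom, key=per_chrom.get)  # most aligned bases wins
        on_best = [h for h in hits if h[1] == best]
        total = per_chrom[best]
        # Length-weighted mean start gives the contig's approximate location.
        position = sum(h[2] * h[3] for h in on_best) / total
        plus_bp = sum(h[3] for h in on_best if h[4] == "+")
        strand = "+" if 2 * plus_bp >= total else "-"
        layout[best].append((position, contig, strand))

    return {chrom: [(c, s) for _, c, s in sorted(items)]
            for chrom, items in layout.items()}
```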

Finally, the three VCF datasets described in the manuscript are created and filtered: Tawny is converted directly from its gVCF to VCF, Berthelot’s is produced by joint calling all 11 Berthelot’s pipit gVCF files, and All pipits is produced by joint calling the individual Berthelot’s pipit gVCFs together with the Tawny sample gVCF.
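The filtering criteria are not listed in this section, so the sketch below uses hypothetical thresholds purely to show the shape of a site-level filter: keep biallelic SNPs, require a minimum QUAL, and cap the fraction of samples with missing genotypes. The values `min_qual=30` and `max_missing=0.2` are placeholders, not the manuscript's settings.

```python
def keep_site(vcf_line, min_qual=30.0, max_missing=0.2):
    """Return True if a VCF data line passes a simple, hypothetical SNP filter."""
    f = vcf_line.rstrip("\n").split("\t")
    ref, alt, qual = f[3], f[4], f[5]
    # Biallelic SNPs only: single-base REF and a single single-base ALT.
    if len(ref) != 1 or len(alt) != 1 or alt in (".", "*"):
        return False
    if qual == "." or float(qual) < min_qual:
        return False
    gt_idx = f[8].split(":").index("GT")
    gts = []
    for sample in f[9:]:
        parts = sample.split(":")
        gts.append(parts[gt_idx] if gt_idx < len(parts) else ".")
    missing = sum("." in gt for gt in gts)
    return missing / len(gts) <= max_missing


def filter_vcf(lines, **kwargs):
    """Yield header lines unchanged and data lines that pass keep_site."""
    for line in lines:
        if line.startswith("#") or keep_site(line, **kwargs):
            yield line
```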


Natural Environment Research Council, Award: NE/L002582/1

Natural Environment Research Council, Award: NE/S007334/1

Norwich Research Park Science Links Seed Fund

Ministerio de Ciencia e Innovación

European Commission, Award: PGC2018-097575-B-I00

Gobierno del Principado de Asturias, Award: AYUD/2021/51261