On the use of genome-wide data to model and date the time of anthropogenic hybridisation: an example from the Scottish wildcat


Howard-McCombe, Jo et al. (2021), On the use of genome-wide data to model and date the time of anthropogenic hybridisation: an example from the Scottish wildcat, Dryad, Dataset,


While hybridisation has long been recognised as an important natural phenomenon in evolution, the conservation of taxa subject to introgressive hybridisation from domesticated forms is a subject of intense debate. Hybridisation of Scottish wildcats and domestic cats is a good example in this regard. We develop a modelling framework to determine the timescale of introgression using approximate Bayesian computation (ABC). Applying the model to ddRAD-seq data from 129 individuals, genotyped at 6,546 loci, we show that a population of wildcats genetically distant from domestic cats is still present in Scotland. These individuals are found almost exclusively within the captive breeding program. Most wild-living cats sampled were introgressed to some extent. The demographic model predicts high levels of gene flow between domestic cats and Scottish wildcats (13% migrants per generation) over a short timeframe: the posterior mean for the onset of hybridisation (T1) was 3.3 generations (~10 years) before present. Though the model had limited power to detect signals of ancient admixture, we find evidence that significant recent hybridisation may have occurred subsequent to the founding of the captive breeding population (T2). The model consistently predicts T1 after T2, estimated here to be 19.3 generations (~60 years) ago, highlighting the importance of this population as a resource for conservation management. Additionally, we evaluate the effectiveness of current methods to classify hybrids. We show that an optimised 35 SNP panel is a better predictor of the ddRAD-based hybrid score than a morphological method.


This study represents a new bioinformatic analysis of the sequence reads produced by Senn et al. (2019) (Dryad, Dataset), incorporating an additional 51 captive and two wild individuals, as well as the original 76 samples. Sequence reads were generated using the Illumina MiSeq platform, as described in Senn et al. (2019). As per Senn et al. (2019), reads were demultiplexed by barcode and quality filtered using the STACKS v2.1 module process_radtags. Demultiplexed reads were trimmed to 135 bp and concatenated into a single read file per individual.
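The trimming and concatenation step can be illustrated with a minimal Python sketch. This is not how process_radtags works internally; the function names are hypothetical, and dropping reads shorter than the target length is an assumption that the description above does not state.

```python
from itertools import chain
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def trim_reads(records: Iterable[FastqRecord], length: int = 135) -> Iterator[FastqRecord]:
    """Truncate reads to `length` bases.

    Assumption: reads shorter than `length` are discarded so that all
    retained reads have the same length.
    """
    for header, seq, qual in records:
        if len(seq) >= length:
            yield header, seq[:length], qual[:length]


def concatenate(*streams: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Chain several record streams for one individual into a single stream."""
    return chain.from_iterable(streams)
```

In use, each demultiplexed file belonging to an individual would be parsed and trimmed, and the resulting streams passed to `concatenate` before writing one output file.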

Sequence reads were aligned using BWA to the Felis catus reference genome v9.0 (GCF_000181335.3). Mapped reads were processed using STACKS (Catchen et al., 2013). In STACKS, a minimum of three reads was required to form a ‘stack’. We allowed multiple SNPs per read; the mean number of SNPs per read across the final dataset was 1.6. Variants were filtered using a minimum minor allele frequency of 0.05 and a maximum proportion of heterozygous individuals of 0.7, treating the three sample sources (domestic, wild-living, and captive) as separate populations.
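The two variant filters can be written out as a short Python sketch. It is an illustration only and does not call STACKS. Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with None for missing calls, and the allele frequency threshold is interpreted as a minor allele frequency. The sketch reports pass or fail per population and leaves open how results are combined across populations, since that is not specified above.

```python
from typing import Dict, Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 alternate alleles; None = missing


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Minor allele frequency among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Proportion of called individuals that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    return sum(1 for g in called if g == 1) / len(called)


def site_filter_report(
    genotypes_by_pop: Dict[str, Sequence[Genotype]],
    min_maf: float = 0.05,
    max_het: float = 0.7,
) -> Dict[str, bool]:
    """Return, for each population, whether the site passes both thresholds."""
    return {
        pop: minor_allele_frequency(g) >= min_maf
        and observed_heterozygosity(g) <= max_het
        for pop, g in genotypes_by_pop.items()
    }
```

For example, a site that is monomorphic in one population fails the frequency threshold there, while a site where every individual in a population is heterozygous fails the heterozygosity threshold, a pattern often associated with collapsed paralogous loci.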

PLINK v1.9 (Chang et al., 2015) and VCFtools v1.15 (Danecek et al., 2011) were used to filter the STACKS output. Specifically, individuals with >30% missing data were removed, and loci were then stringently filtered to remove all sites with any missing data. Closely related individuals were identified using IBD estimates calculated by PLINK, corrected to account for admixture using the method described by Morrison (2013). Corrected IBD estimates were used as input for PRIMUS (Staples et al., 2014), which uses genetic data to reconstruct pedigrees up to third-degree relatives. Individuals were then removed from the dataset to limit relatedness.
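The order of the two missing-data filters matters: individuals are removed first, so that a few poorly genotyped samples do not cause otherwise complete loci to be discarded. A minimal Python sketch of that logic follows. It does not reproduce the PLINK or VCFtools commands used; the function name and the genotype coding (None for a missing call) are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Optional[int]  # None marks a missing call


def filter_missing(
    genotypes: Dict[str, List[Genotype]],
    max_missing: float = 0.30,
) -> Tuple[Dict[str, List[Genotype]], List[int]]:
    """Drop individuals with > max_missing missing calls, then keep only
    loci with no missing data among the remaining individuals.

    Returns the filtered genotypes and the indices of retained loci.
    """
    if not genotypes:
        return {}, []

    n_loci = len(next(iter(genotypes.values())))

    # Step 1: remove individuals with too much missing data.
    kept = {
        name: calls
        for name, calls in genotypes.items()
        if n_loci > 0 and calls.count(None) / n_loci <= max_missing
    }

    # Step 2: retain loci that are fully genotyped in every kept individual.
    complete = [
        j for j in range(n_loci)
        if all(calls[j] is not None for calls in kept.values())
    ]

    filtered = {name: [calls[j] for j in complete] for name, calls in kept.items()}
    return filtered, complete
```

Running the locus filter before the individual filter would generally retain fewer loci, because a single low-quality sample can introduce missing calls at many sites.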

Usage Notes


Table with sample information. Contains all samples collected for this study, some of which are not present in the final dataset due to low genotyping rate and/or relatedness to other individuals.


VCF containing genotype data
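
As a starting point for working with the VCF, the following Python sketch reads genotype calls into alternate-allele counts using only the standard library. It assumes an uncompressed, tab-delimited VCF with a GT entry in the FORMAT column; the function names are hypothetical, and any non-reference allele is counted as alternate, which is a simplification for multi-allelic sites.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, int, List[Optional[int]]]  # (chromosome, position, counts)


def gt_to_alt_count(gt: str) -> Optional[int]:
    """Convert a GT string such as '0/1' or '1|1' to an alternate-allele
    count; return None if any allele is missing ('.')."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")


def read_vcf_genotypes(lines: Iterable[str]) -> Tuple[List[str], List[Row]]:
    """Parse VCF text lines into sample names and per-site genotype counts."""
    samples: List[str] = []
    rows: List[Row] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed columns
            continue
        gt_index = fields[8].split(":").index("GT")
        counts = [gt_to_alt_count(s.split(":")[gt_index]) for s in fields[9:]]
        rows.append((fields[0], int(fields[1]), counts))
    return samples, rows
```

With a file on disk, the lines can be supplied by passing an open file handle, for example `read_vcf_genotypes(open(path))`, where `path` is wherever the VCF has been saved.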