
Thoroughbred horse inbreeding measures and racing phenotypes

Cite this dataset

Hill, Emmeline et al. (2022). Thoroughbred horse inbreeding measures and racing phenotypes [Dataset]. Dryad.


We quantified inbreeding based on runs of homozygosity (ROH) using 297K SNP genotypes from 6,128 horses born in Europe and Australia, of which 13.2% were unraced. 


Race records (up to the end of the 2020 racing season) were retrieved for n = 6,128 horses born in Europe (EUR) or in Australia and New Zealand (ANZ) in or before 2015, which were therefore at least five years old. Among horses that race, the majority have their first start before they are five years old (22), and the median age of retirement from racing has been reported as five years (19). Horses were classified as ‘raced’ (n = 3,038 EUR, n = 2,282 ANZ) if they had at least one start before five years of age, or as ‘unraced’ (n = 606 EUR, n = 202 ANZ) if they had no recorded race start before five years of age. Race records for the major racing regions (Europe, Australia and North America) were used to partition samples into the two cohorts; all of these regions were searched for every horse, including regions other than its birth region. We cannot, however, rule out that horses categorised as ‘unraced’ raced in minor racing regions that were not searched.
Two SNP genotyping platforms were used: n = 4,933 horses were genotyped on the Illumina Equine SNP70 BeadChip (Illumina, San Diego, CA), comprising approximately 70,000 SNPs (SNP70), and n = 4,018 were genotyped on the Axiom Equine Genotyping Array (Axiom MNEC670) (Affymetrix, Santa Clara, CA), comprising approximately 670,000 SNPs (SNP670). All samples had a call rate >95%. Samples genotyped on the SNP70 array were imputed up to 488,576 SNPs with BEAGLE version 5.2 (25), using the samples genotyped on the SNP670 platform as the reference set. After imputation, we discarded SNPs with a Beagle dosage R² <0.8 to remove poorly imputed SNPs. We then retained only autosomal SNPs with call rates >0.99 and minor allele frequency ≥0.01, leaving a final dataset of 296,691 SNPs.
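The post-imputation SNP filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build the dataset: the `SnpStats` record, its field names and the `passes_filters` helper are hypothetical, and the per-SNP statistics are assumed to have been computed beforehand. The thresholds are those given in the text; restricting to chromosomes 1 to 31 reflects the 31 horse autosomes.

```python
from dataclasses import dataclass

# Horse autosomes are chromosomes 1-31; X, Y and unplaced contigs are excluded.
AUTOSOMES = set(range(1, 32))


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary; field names are illustrative only."""
    snp_id: str
    chrom: str        # chromosome label, e.g. "5" or "X"
    dr2: float        # Beagle dosage R-squared after imputation
    call_rate: float  # fraction of samples with a called genotype
    maf: float        # minor allele frequency


def passes_filters(snp, min_dr2=0.8, min_call_rate=0.99, min_maf=0.01):
    """Return True if the SNP survives all four filters from the text."""
    on_autosome = snp.chrom.isdigit() and int(snp.chrom) in AUTOSOMES
    return (
        snp.dr2 >= min_dr2                 # SNPs with DR2 < 0.8 are discarded
        and snp.call_rate > min_call_rate  # call rate must exceed 0.99
        and snp.maf >= min_maf             # MAF of at least 0.01
        and on_autosome
    )


if __name__ == "__main__":
    snps = [
        SnpStats("snp_a", "5", 0.95, 0.995, 0.20),
        SnpStats("snp_b", "X", 0.95, 0.995, 0.20),   # not autosomal
        SnpStats("snp_c", "12", 0.60, 0.999, 0.30),  # poorly imputed
    ]
    kept = [s.snp_id for s in snps if passes_filters(s)]
    print(kept)
```

Note the asymmetry in the comparisons: the text states a strict threshold for call rate (>0.99) and inclusive thresholds for imputation quality and minor allele frequency, and the sketch mirrors that.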
We called runs of homozygosity (ROH) with a minimum length of 300 kb using --homozyg in PLINK v1.90b (26) with the following parameters: --homozyg-window-snp 30 --homozyg-snp 30 --homozyg-kb 300 --homozyg-gap 150 --homozyg-density 100 --homozyg-window-missing 1 --homozyg-window-het 1. We chose 300 kb as the minimum ROH length because we were interested in evaluating the fitness effects of both shorter and longer ROH. Moreover, based on the SNP density and horse genome size, we expect around 40 SNPs in a 300 kb stretch of ROH, which should be sufficient to call short ROH reliably. Inbreeding coefficients (FROH) were calculated by summing the total length of ROH for each individual and dividing by the autosomal genome length of 2,281 Mb (27, 28). Shorter ROH, which trace to most recent common ancestors further back in the pedigree, might differ in their fitness effects from long ROH (15-17). We therefore also calculated an inbreeding coefficient for ROH < 5 Mb (FROH_short) and for ROH ≥ 5 Mb (FROH_long). While the cutoff is semi-arbitrary, ROH of length 5 Mb are expected to have a common ancestor haplotype approximately 10 generations ago, calculated as 100/(2g) cM with g being the number of generations (29), assuming a uniform recombination map such that 1 Mb is equivalent to 1 cM. The effects of FROH_long and FROH_short can therefore be broadly interpreted as effects of more recent versus older inbreeding or, more precisely, of younger versus older haplotypes.
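As a rough illustration of the FROH calculation and the 5 Mb split, the following Python sketch sums ROH lengths per horse and divides by the autosomal genome length. It is a sketch under stated assumptions, not the authors' code: the function names and the `(sample_id, length_kb)` input format are hypothetical. In practice the lengths would typically come from the KB column of the `.hom` file that PLINK writes for --homozyg. The helper for segment age assumes the relation of expected length = 100/(2g) cM with 1 cM per Mb, which reproduces the figure of about 10 generations for a 5 Mb ROH given in the text.

```python
from collections import defaultdict

AUTOSOME_MB = 2281.0     # autosomal genome length used in the text
LONG_CUTOFF_KB = 5000.0  # ROH >= 5 Mb count as "long", shorter ones as "short"


def froh_from_segments(segments):
    """Compute FROH, FROH_short and FROH_long per sample.

    segments: iterable of (sample_id, roh_length_kb) pairs (hypothetical format).
    Samples with no ROH do not appear in the input and must be assigned
    zeros by the caller.
    """
    totals = defaultdict(lambda: {"short": 0.0, "long": 0.0})
    for sample_id, length_kb in segments:
        size_class = "long" if length_kb >= LONG_CUTOFF_KB else "short"
        totals[sample_id][size_class] += length_kb

    genome_kb = AUTOSOME_MB * 1000.0
    return {
        sample_id: {
            "FROH": (t["short"] + t["long"]) / genome_kb,
            "FROH_short": t["short"] / genome_kb,
            "FROH_long": t["long"] / genome_kb,
        }
        for sample_id, t in totals.items()
    }


def expected_roh_length_cm(generations):
    """Expected ROH length in cM for a common ancestor g generations back."""
    return 100.0 / (2.0 * generations)


def generations_for_length_mb(length_mb, cm_per_mb=1.0):
    """Invert the relation above, assuming a uniform recombination map."""
    return 100.0 / (2.0 * length_mb * cm_per_mb)


if __name__ == "__main__":
    toy = [("horse_A", 300.0), ("horse_A", 6000.0), ("horse_B", 1200.0)]
    for sample_id, values in froh_from_segments(toy).items():
        print(sample_id, values)
    print(generations_for_length_mb(5.0))
```

By construction FROH_short and FROH_long sum to FROH for every horse, so the two coefficients partition total ROH-based inbreeding by segment length rather than measuring it twice.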


Science Foundation Ireland, Award: 11/PI/1166