Data from: How accurate is genomic prediction across wild populations?
Data files
This dataset is embargoed and will be released on May 01, 2026. Please contact Kenneth Aase at on.untn@esaa.htennek with any questions.
Lists of files and downloads will become available to the public when released.
Abstract
Evolutionary ecology seeks to understand causes and consequences of evolutionary changes across time and space, and genomic data present novel opportunities to investigate these processes. Genomic prediction - predicting individual genetic values from high-density marker data - has revolutionized breeding programs and medical genetics. In wild populations, however, genomic prediction has been used in comparatively few studies, and largely within populations. Applications that instead operate across populations could answer questions related to spatially varying evolutionary processes, such as local adaptation. A severe challenge for across-population genomic prediction, however, is the decrease in accuracy when training models on data from one population and predicting genetic values in another. Here, we applied genomic prediction across wild house sparrow populations and compared the accuracy to within-population models. We also highlighted limitations of the current theory for genomic prediction accuracy, and sought to provide a mechanistic understanding of the across-population accuracy by relating it to several population-differentiation measures. Predictions across populations were generally less accurate and more variable than within populations, and across-population accuracy covaried with some population-differentiation metrics. Our results underline the necessity of understanding the mechanisms governing genomic prediction accuracy, and of developing methods that exploit genomic data in novel ways.
The genotype data is located in the files "sparrow_snp.fam", "sparrow_snp.bim" and "sparrow_snp.bed".
This dataset combines the 70K and 200K SNP data from both Helgeland and the southern systems (Åfjord, Vega, Vikna, Leka), plus some samples from the Faroe Islands. It contains all successfully genotyped samples (14015).
In the .fam file, the FID column (column 1) corresponds to an anonymized version of the ring number of the bird.
The IID column (column 2) is identical, except in the following cases:
- Birds that were genotyped multiple times have been marked with "_i" for the i'th genotyping (e.g., ANON00008 was genotyped twice, and these two samples are differentiated as ANON00008_1 and ANON00008_2 in the IID column).
- Samples with mismatching genetic and phenotypic sex have had their IIDs flagged with "MISSEX", with degrees 1 to 3 identifying the severity of the mismatch.
- High heterozygosity (> 0.35) was treated as a sign of poor genotyping quality because it was not repeatable when samples were re-genotyped. 73 samples had heterozygosity > 0.35, and their IIDs have "__HIGHHET" appended so that they can easily be excluded if needed.
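The IID conventions above can be checked programmatically before analysis. The following is a minimal sketch in Python, not part of the original data pipeline. It assumes the standard whitespace-delimited PLINK .fam layout (FID in column 1, IID in column 2) and that the flags appear as the literal substrings "MISSEX" and "__HIGHHET" somewhere in the IID. The exact position of the MISSEX flag and its degree is not specified here, so the sketch only tests for its presence. The function names are hypothetical.

```python
import re


def parse_fam_line(line):
    """Return (FID, IID) from one whitespace-delimited .fam line."""
    fields = line.split()
    return fields[0], fields[1]


def classify_iid(fid, iid):
    """Infer the flags encoded in an IID, given the matching FID.

    Assumption: a repeat genotyping is written as FID + "_" + digits,
    and the two flags are literal substrings of the IID.
    """
    repeat = re.match(re.escape(fid) + r"_(\d+)", iid)
    return {
        "repeat_index": int(repeat.group(1)) if repeat else None,
        "missex": "MISSEX" in iid,
        "high_het": "__HIGHHET" in iid,
    }


def keep_sample(fid, iid):
    """True if the sample carries neither the MISSEX nor the HIGHHET flag."""
    flags = classify_iid(fid, iid)
    return not (flags["missex"] or flags["high_het"])
```

For example, classify_iid("ANON00008", "ANON00008_2") reports a repeat index of 2 and no flags. Whether to drop MISSEX samples of every degree, or only the more severe ones, is a choice left to the analyst.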
The adult morphology data file is a semicolon-delimited text file named "sparrow_pheno.csv".
Adult morphology data is included for genotyped individuals that were captured at least once as an adult in any of our study populations during the years 1992-2023.
In total, the phenotype data file contains 13428 records for 5888 adult individuals.
Description of variables:
ringnr: anonymized version of unique individual ID (number on the metal ring)
adult_sex: the sex of the bird (1=male; 2=female), based solely on phenotypic information
year: year of measurement
month: month of measurement
day: day of measurement
locality: the number of the locality (island) where the bird was captured when it was measured
hatch_year: the bird's known or assumed hatch year
max_year: the last year the bird was recorded alive
first_locality: the first locality where the bird was recorded (this is the bird's known or assumed hatching locality)
last_locality: the last locality where the bird was recorded (this is generally the locality where the bird was (potentially) breeding)
body_mass: body mass of the bird at measurement (in g)
thr_bill_depth: the bird's bill depth (=height) at measurement, adjusted to Thor Harald Ringsby-measurement (in mm)
thr_bill_length: the bird's bill length at measurement, adjusted to Thor Harald Ringsby-measurement (in mm)
thr_tarsus: the length of the bird's right tarsus at measurement, adjusted to Thor Harald Ringsby-measurement (in mm)
thr_wing: the length of the bird's right wing at measurement, adjusted to Thor Harald Ringsby-measurement (in mm)
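Because the phenotype file is semicolon-delimited and holds repeated measurements per bird, a small reader may help when checking the record counts given above. The following is a minimal sketch using only the Python standard library. It assumes the file has a header row with the variable names listed above, that missing measurements are empty or written as "NA", and that decimals may use either a period or a comma. None of these details are stated in this description, and the function names are hypothetical.

```python
import csv
from collections import Counter

# Morphological measurements listed in the variable description.
NUMERIC = ("body_mass", "thr_bill_depth", "thr_bill_length",
           "thr_tarsus", "thr_wing")


def read_pheno(lines):
    """Read semicolon-delimited phenotype records from an iterable of lines.

    Assumption: empty strings and "NA" mark missing values, and a decimal
    comma is converted to a decimal point before parsing.
    """
    records = []
    for row in csv.DictReader(lines, delimiter=";"):
        for name in NUMERIC:
            raw = (row.get(name) or "").strip().replace(",", ".")
            row[name] = float(raw) if raw not in ("", "NA") else None
        records.append(row)
    return records


def records_per_bird(records):
    """Count measurement records per anonymized ring number."""
    return Counter(r["ringnr"] for r in records)
```

With the released file, something like `with open("sparrow_pheno.csv", newline="") as f: records = read_pheno(f)` should work under the assumptions above, and `len(records)` and `len(records_per_bird(records))` can then be compared with the stated totals.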
The names and study systems of the most important localities are:
A) The Helgeland metapopulation study system
20: Nesøy (farm island)
22: Myken (non-farm island)
23: Træna (non-farm island)
24: Selvær (non-farm island)
26: Gjerøy (farm island)
27: Hestmannøy (farm island)
28: Indre Kvarøy (farm island)
34: Lovund (non-farm island)
35: Sleneset (non-farm island)
38: Aldra (farm island)
331: Lurøy (farm island)
332: Onøy (farm island)
B) The "southern system" (these localities are islands/archipelagos with farms)
60: Leka
61: Vega
63: Vikna
67: Lauvøya, Selnes and Flenstad
68: Rånes
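The locality codes listed above can be kept as a lookup table when labelling records by island and study system. The following is a minimal Python sketch. The codes, names and farm/non-farm labels are copied from the lists above, with the southern localities labelled "farm" because they are described as islands/archipelagos with farms. The dictionary layout and the function name are illustrative choices, and codes not listed here are returned as unknown.

```python
# code: (name, study system, habitat type)
LOCALITIES = {
    20: ("Nesøy", "Helgeland", "farm"),
    22: ("Myken", "Helgeland", "non-farm"),
    23: ("Træna", "Helgeland", "non-farm"),
    24: ("Selvær", "Helgeland", "non-farm"),
    26: ("Gjerøy", "Helgeland", "farm"),
    27: ("Hestmannøy", "Helgeland", "farm"),
    28: ("Indre Kvarøy", "Helgeland", "farm"),
    34: ("Lovund", "Helgeland", "non-farm"),
    35: ("Sleneset", "Helgeland", "non-farm"),
    38: ("Aldra", "Helgeland", "farm"),
    331: ("Lurøy", "Helgeland", "farm"),
    332: ("Onøy", "Helgeland", "farm"),
    60: ("Leka", "southern", "farm"),
    61: ("Vega", "southern", "farm"),
    63: ("Vikna", "southern", "farm"),
    67: ("Lauvøya, Selnes and Flenstad", "southern", "farm"),
    68: ("Rånes", "southern", "farm"),
}


def describe_locality(code):
    """Return a readable label for a locality code (int or numeric string)."""
    unknown = ("unknown", "unknown", "unknown")
    name, system, habitat = LOCALITIES.get(int(code), unknown)
    return f"{name} ({system}, {habitat})"
```

The same table can be applied to the locality, first_locality and last_locality variables, since all three hold locality numbers.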
For information on names and locations of the other localities please contact Henrik Jensen.
