Data from: Parentage and relatedness reconstruction in Pinus sylvestris using genotyping by sequencing
Data files
Version of Mar 04, 2020 (756.37 MB in total)
Abstract
Estimating kinship is fundamental to studies of evolution, conservation and breeding. Genotyping-by-sequencing (GBS) and other restriction-based genotyping methods have become widely used for these purposes in non-model organisms. However, sequencing errors, sequencing depth and poor reproducibility between library preparations can hinder accurate genetic inference. In this study, we tested different data-filtering parameter sets, different reference populations and eight estimation methods to obtain a robust procedure for relatedness estimation in Scots pine (Pinus sylvestris L.). We used a seed orchard as our study system, because its candidate parents are known and the reconstructed pedigree can be compared with theoretical expectations. We found that, when the proportion of shared SNPs was low, relatedness estimates were lower than expected for all kinship categories. However, estimates reached expected values once loci showing an excess of heterozygotes were removed and genotyping error rates were taken into account. In contrast, estimation based on the genetic variance-covariance matrix (G-matrix) performed poorly for kinship. The reduced relatedness estimates are likely due to false heterozygous calls. We analyzed the mating structure in the seed orchard and identified a selfing rate of 3% (including crosses between clone mates) and an external pollen contamination rate of 33.6%. Little genetic structure was observed among the sampled natural Scots pine populations, and the degree of inbreeding in the orchard seed crop is comparable to that in natural stands. We show that, under our optimized data-processing procedure, relatedness and genetic composition, including the level of pollen contamination in a seed orchard crop, can be estimated consistently by different estimators.
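The heterozygote-excess filter mentioned above can be illustrated with a per-locus inbreeding coefficient, F = 1 - Ho/He, which becomes negative when observed heterozygosity exceeds the Hardy-Weinberg expectation. This is a minimal Python sketch, not the authors' pipeline: it assumes biallelic SNPs coded as alternate-allele dosages (0, 1, 2; None for missing), and the cut-off f_min = -0.1 and the function names are hypothetical. The criteria actually used in the study are described in vcf_filter.txt and the manuscript.

```python
from typing import Dict, List, Optional, Sequence

Dosage = Optional[int]  # copies of the alternate allele (0, 1, 2); None = missing


def inbreeding_coefficient(dosages: Sequence[Dosage]) -> Optional[float]:
    """F = 1 - Ho/He for one biallelic SNP; negative values mean heterozygote excess."""
    calls = [g for g in dosages if g is not None]
    if not calls:
        return None  # no genotype calls at this locus
    n = len(calls)
    p = sum(calls) / (2 * n)  # alternate-allele frequency
    he = 2 * p * (1 - p)  # expected heterozygosity under Hardy-Weinberg
    if he == 0:
        return None  # monomorphic: F is undefined and the locus is uninformative
    ho = sum(1 for g in calls if g == 1) / n  # observed heterozygosity
    return 1 - ho / he


def drop_heterozygote_excess(
    loci: Dict[str, List[Dosage]], f_min: float = -0.1
) -> Dict[str, List[Dosage]]:
    """Keep loci whose F is defined and at least f_min (hypothetical cut-off)."""
    kept = {}
    for name, dosages in loci.items():
        f = inbreeding_coefficient(dosages)
        if f is not None and f >= f_min:
            kept[name] = dosages
    return kept
```

A locus where every sample is called heterozygous gives F = -1 and is dropped, while a locus with genotype proportions close to Hardy-Weinberg expectations is kept.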
Methods
Needles were collected from multiple ramets of each of the 28 genotypes in a seed orchard (Västerhus, Sweden) to establish their genetic identity. Samples from an additional 149 tree genotypes were collected from genetic archives and seed orchards across Sweden. Seeds from bulk collections of open-pollinated cones produced in 2014 in the Västerhus orchard, in an additional orchard used for diversity comparison and in two unmanaged stands were germinated in a greenhouse. Needles were harvested in early 2017 from 300 seedlings from each orchard and from 50 seedlings from each unmanaged stand. We also included haploid material from two Västerhus genotypes (eight megagametophytes each), representing selfed material, and 29 samples from two congeneric species, Pinus tabuliformis and Pinus yunnanensis. In total, 922 samples were collected and genotyped.
DNA was extracted and GBS libraries were prepared. Libraries were sequenced on Illumina HiSeq 2500 and Illumina HiSeq X instruments. Samples were demultiplexed with the process_radtags program of Stacks and then mapped to the Pinus taeda genome v1.01 with the Burrows-Wheeler Aligner (BWA). The mapped samples were then combined for joint variant calling with SAMtools mpileup.
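The workflow above (demultiplex, map, call jointly) can be outlined as a dry-run command list. This Python sketch only builds shell command strings and executes nothing. The file names, the barcode file, the <outdir>/<sample>.fq.gz naming of demultiplexed reads and the restriction enzyme are hypothetical placeholders, and the use of bwa mem is an assumption because the mapping mode is not stated here. In recent SAMtools/BCFtools releases the VCF/BCF-producing mpileup step is provided by bcftools mpileup, so the sketch uses bcftools mpileup | bcftools call instead of the exact SAMtools command used in the study.

```python
import shlex
from typing import List


def gbs_commands(samples: List[str], reference: str, raw_dir: str,
                 barcodes: str, enzyme: str, outdir: str = "demux") -> List[str]:
    """Build (but do not run) commands for demultiplexing, mapping and joint calling."""
    q = shlex.quote
    ref = q(reference)
    cmds = [
        # Demultiplex with Stacks: -r rescues barcodes/cut sites, -c removes reads
        # with uncalled bases, -q discards low-quality reads.
        f"process_radtags -p {q(raw_dir)} -b {q(barcodes)} -o {q(outdir)} "
        f"-e {q(enzyme)} -r -c -q",
        # Index the reference once for BWA, and create the .fai used by mpileup.
        f"bwa index {ref}",
        f"samtools faidx {ref}",
    ]
    bams = []
    for s in samples:
        bam = q(f"{s}.bam")
        bams.append(bam)
        # Map one sample (assumed read-file name) and coordinate-sort the alignments.
        cmds.append(f"bwa mem {ref} {q(f'{outdir}/{s}.fq.gz')} | samtools sort -o {bam} -")
        cmds.append(f"samtools index {bam}")
    # Joint calling over all samples: -m multiallelic caller,
    # -v variant sites only, -Oz bgzip-compressed VCF output.
    cmds.append(f"bcftools mpileup -f {ref} {' '.join(bams)} "
                f"| bcftools call -mv -Oz -o calls.vcf.gz")
    return cmds
```

For example, gbs_commands(["s1", "s2"], "ref.fa", "raw", "barcodes.txt", "pstI") returns eight commands ending with a single joint-calling pipeline over s1.bam and s2.bam; "pstI" here is only a placeholder enzyme name.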
Usage notes
The dataset contains several files:
Vasthus_001_m06.vcf.gz: VCF file that has been lightly pre-filtered to reduce its size (see vcf_filter.txt)
Vasterhus.txt: Names of the samples in the study
refkeep.txt: Names of the samples used as the allele-frequency reference
Parental_ID.txt: The registered names of the parental trees in the study
vcf_filter.txt: Description of how to filter the VCF file as in the manuscript
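As a starting point for working with these files, the Python sketch below streams the VCF and estimates alternate-allele frequencies from the reference samples listed in refkeep.txt. It is a minimal illustration, not a replacement for the filtering described in vcf_filter.txt. It assumes that the sample-list files contain one sample name per line, that these names match the VCF header, and that sites are biallelic with diploid calls and GT as the first FORMAT field (the VCF specification requires GT to come first when present). All function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, List, Optional, Set


def read_names(path: str) -> Set[str]:
    """Read a sample list, assumed to hold one name per line."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def gt_to_dosage(sample_field: str) -> Optional[int]:
    """Convert e.g. '0/1:12' to an alternate-allele count; None if missing."""
    gt = sample_field.split(":")[0].replace("|", "/")
    alleles = gt.split("/")
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")  # assumes biallelic sites


def reference_allele_frequencies(lines: Iterable[str], reference: Set[str]) -> Dict[str, float]:
    """Alternate-allele frequency per site (CHROM:POS), using reference samples only."""
    freqs: Dict[str, float] = {}
    keep: List[int] = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9; keep only the reference samples.
            keep = [i for i, name in enumerate(fields) if i >= 9 and name in reference]
            continue
        calls = [d for d in (gt_to_dosage(fields[i]) for i in keep) if d is not None]
        if calls:
            freqs[f"{fields[0]}:{fields[1]}"] = sum(calls) / (2 * len(calls))
    return freqs


def load_reference_frequencies(vcf_path: str, refkeep_path: str) -> Dict[str, float]:
    """Read a gzip-compressed VCF line by line and a sample list from disk."""
    with gzip.open(vcf_path, "rt") as vcf:
        return reference_allele_frequencies(vcf, read_names(refkeep_path))
```

Calling load_reference_frequencies("Vasthus_001_m06.vcf.gz", "refkeep.txt") would read the compressed file line by line rather than loading it into memory.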
Rfiles
Dataset-2.RData: The dataset obtained after applying the VCF filtering and running the R scripts
Rcode_related.R: R script for relatedness estimation using the R package 'related'
view_relationships.R: Uses the results of Rcode_related.R to visualize pairwise relatedness and to reproduce some of the figures from the manuscript
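The 'related' package provides several relatedness estimators, among them the Queller and Goodnight (1989) moment estimator. As an illustration of how such an estimator works, and not as a re-implementation of the package, the Python sketch below computes it for two diploid individuals over biallelic SNPs coded as alternate-allele dosages (0, 1, 2; None for missing), given reference allele frequencies p, for example estimated from the samples in refkeep.txt. For a biallelic locus, summing over the two allele copies of individual x reduces the numerator to (g_y - 2p)(g_x - 1) and the denominator to (g_x - 2p)(g_x - 1), so loci where x is heterozygous contribute nothing. Averaging the two directional estimates r_xy and r_yx into a symmetric value is a common convention assumed here; the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Dosage = Optional[int]  # copies of the alternate allele (0, 1, 2); None = missing


def _directional(gx: Sequence[Dosage], gy: Sequence[Dosage],
                 p: Sequence[float]) -> Tuple[float, float]:
    """Sum the numerator and denominator of r_xy over loci typed in both individuals."""
    num = den = 0.0
    for a, b, q in zip(gx, gy, p):
        if a is None or b is None:
            continue  # skip loci missing in either individual
        # Summed over x's two allele copies, P_y(allele) - p(allele)
        # simplifies to (g_y - 2p)(g_x - 1) for a biallelic locus.
        num += (b - 2 * q) * (a - 1)
        den += (a - 2 * q) * (a - 1)
    return num, den


def queller_goodnight(gx: Sequence[Dosage], gy: Sequence[Dosage],
                      p: Sequence[float]) -> Optional[float]:
    """Symmetric estimate (r_xy + r_yx) / 2; None if either denominator is zero."""
    nxy, dxy = _directional(gx, gy, p)
    nyx, dyx = _directional(gy, gx, p)
    if dxy == 0 or dyx == 0:
        return None
    return 0.5 * (nxy / dxy + nyx / dyx)
```

Identical genotypes give an estimate of 1, and because numerator and denominator are summed over loci before dividing, a few loci with rare alleles do not dominate the way they would if per-locus ratios were averaged.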