Data from: Applications of random forest feature selection for fine-scale genetic population assignment

Sylvester, Emma V.A.; Bentzen, Paul; Bradbury, Ian R.; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G.

Published Jul 27, 2017 on Dryad. https://doi.org/10.5061/dryad.93h33

Data files

Jul 27, 2017 version files: 152.74 MB

AtlanticSalmon_93KGenepop.txt (152.74 MB)

format_dataframe_for_RF.r (2.90 KB)
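
The name AtlanticSalmon_93KGenepop.txt suggests that the genotype file follows the Genepop text format: a title line, the locus names, then one "Pop" line per population followed by one line per individual. The sketch below shows how such a file could be read into one row per individual, which is the general shape a random forest classifier expects. It is only an illustration and does not reproduce format_dataframe_for_RF.r, whose contents are not described on this page. The function names parse_genepop and to_rows, the example individuals, and the "allele/allele" genotype coding are all assumptions made for the example.

```python
def parse_genepop(text):
    """Parse Genepop-formatted text into (loci, samples).

    samples is a list of (population_index, individual_id, genotypes),
    where genotypes holds one (allele1, allele2) pair of strings per locus.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    body = lines[1:]  # the first non-empty line is a free-text title
    loci, samples, pop_index = [], [], -1
    for ln in body:
        if ln.lower() == "pop":
            pop_index += 1  # each "Pop" line starts a new population
        elif pop_index < 0:
            # Locus names: one per line, or comma-separated on one line.
            loci.extend(name.strip() for name in ln.split(",") if name.strip())
        else:
            # Individual line: "<id> , <genotype> <genotype> ..."
            ind_id, _, genos = ln.rpartition(",")
            pairs = []
            for g in genos.split():
                half = len(g) // 2  # alleles use 2 or 3 digits each
                pairs.append((g[:half], g[half:]))
            samples.append((pop_index, ind_id.strip(), pairs))
    return loci, samples


def to_rows(loci, samples):
    """Build one dict per individual; missing genotypes become None."""
    rows = []
    for pop, ind, pairs in samples:
        row = {"pop": pop, "id": ind}
        for locus, (a, b) in zip(loci, pairs):
            # An allele written entirely as zeros marks missing data.
            missing = set(a) == {"0"} or set(b) == {"0"}
            row[locus] = None if missing else "/".join(sorted((a, b)))
        rows.append(row)
    return rows


# Hypothetical three-fish example, not taken from the dataset.
example = """Example title
SNP_1, SNP_2
Pop
fishA , 0101 0102
fishB , 0102 0000
Pop
fishC , 0202 0201
"""
loci, samples = parse_genepop(example)
rows = to_rows(loci, samples)
```

For a file of this size (152.74 MB), a line-by-line reader would be preferable to loading the whole text at once; the version above favours brevity.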

Abstract

Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest, and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNPs) for fine-scale population assignment. We applied these methods to an unpublished SNP dataset for Atlantic salmon (Salmo salar) and a published SNP dataset for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self-assignment accuracy of at least 90% using each method to create panels of 50-700 markers. Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than FST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each dataset, respectively, a level of accuracy never reached for these species using FST-selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic datasets to improve assignment for management and conservation of exploited populations.
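
As a rough illustration of the FST-ranking baseline mentioned in the abstract, the sketch below scores each biallelic SNP by how strongly its allele frequency differs among populations and keeps the top-ranked markers as a panel. The abstract does not state which FST estimator was used, so Nei's G_ST, computed from unweighted population frequencies, is assumed here as a simple stand-in. The function names nei_fst and top_panel and the example frequencies are hypothetical. The random forest-based selection methods are not shown.

```python
def nei_fst(freqs):
    """Nei's G_ST for one biallelic locus.

    freqs holds the frequency of one allele in each population.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # locus is fixed everywhere: no differentiation signal
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


def top_panel(freqs_by_locus, size):
    """Return the `size` locus names with the highest G_ST."""
    ranked = sorted(
        freqs_by_locus,
        key=lambda locus: nei_fst(freqs_by_locus[locus]),
        reverse=True,
    )
    return ranked[:size]


# Hypothetical allele frequencies in two populations.
freqs_by_locus = {
    "SNP_1": [0.5, 0.5],  # identical frequencies: uninformative
    "SNP_2": [0.1, 0.9],  # strongly differentiated
    "SNP_3": [0.4, 0.6],  # weakly differentiated
}
panel = top_panel(freqs_by_locus, 2)
```

Ranking each locus independently in this way ignores interactions and redundancy among markers, which is one reason a multivariate method such as random forest can select a different panel.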
