In an era of ever-increasing amounts of whole genome sequence data for individuals and populations, the utility of traditional single nucleotide polymorphism (SNP) array-based genome scans is uncertain. We previously performed a SNP array-based genome scan to identify candidate genes under selection in six distinct gray wolf (Canis lupus) ecotypes. Using this information, we designed a targeted capture array for 1040 genes, including all exons and flanking regions, as well as 5000 1 kb non-genic neutral regions, and resequenced these regions in 107 wolves. Selection tests revealed striking patterns of variation within candidate genes relative to non-candidate regions and identified potentially functional variants related to local adaptation. We found 27% and 47% of candidate genes from the previous SNP array study had functional changes that were outliers in SweeD and Bayenv analyses, respectively. This result verifies the use of genome-wide SNP surveys to tag genes that contain functional variants between populations. We highlight non-synonymous variants in APOB, LIPG, and USH2A that occur in functional domains of these proteins, and that demonstrate high correlation with precipitation seasonality and vegetation. We find Arctic and High Arctic wolf ecotypes have higher numbers of genes under selection, which highlights their conservation value and heightened threat due to climate change. This study demonstrates that combining genome-wide genotyping arrays with large-scale resequencing and environmental data provides a powerful approach to discern candidate functional variants in natural populations.
AllSamples_n107_EnvData_wLatLong
These data are the latitude and longitude coordinates for the 107 wolves used in the selection tests in this study. The 12 environmental variables for each coordinate were downloaded within ArcGIS from various WORLDCLIM databases (http://www.worldclim.org/). Please see the descriptions on the website for what each variable measures. From the website: "Please note that the temperature data are in °C * 10. This means that a value of 231 represents 23.1 °C. This does lead to some confusion, but it allows for much reduced file sizes which is important as for many downloading large files remains difficult. The unit used for the precipitation data is mm (millimeter)."
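The temperature scaling quoted above can be undone with a one-line conversion. The following is a minimal sketch in Python; the function name and the record keys are hypothetical and are not taken from the data file. It applies only to variables stored as temperatures in °C * 10. Other variables may use different units or scalings, so check each variable's description on the WORLDCLIM website before converting.

```python
def worldclim_temp_to_celsius(raw_value):
    """Convert a temperature stored as degrees C * 10 into degrees C."""
    return raw_value / 10.0


# Hypothetical record; the key names are illustrative only.
record = {"mean_temp_x10": 231, "annual_precip_mm": 450}

converted = {
    "mean_temp_c": worldclim_temp_to_celsius(record["mean_temp_x10"]),
    # Precipitation is already in millimeters, so it is copied unchanged.
    "annual_precip_mm": record["annual_precip_mm"],
}

print(converted["mean_temp_c"])  # 23.1
```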
Variant file in VCF format
Genic variants for 107 individuals in VCF v4.1 format. For details on genotype calling and filtering, see Methods and Supplemental Information sections.
Filtered_variableSites_fixedSamples_9July2014_minDP10noMissing_ecotypesOnly_n107_GenicRegions_95CallRate.recode.vcf
Bayenv results file for genic SNPs
This is the output averaged from 10 independent runs of Bayenv for genic SNPs in 107 wolves and 6 ecotypes. Column headers are provided in the file.
bayenv1_output_n107_k6_Average_10Runs_forR.txt
Bayenv output file for neutral SNPs
This is the output averaged across 10 runs of Bayenv using neutral SNPs, in 107 wolves and 6 ecotypes. The results in this file were used as a null for determining empirical significance for the genic SNPs. See paper text for details.
bayenv1_output_n107_k6_Average_10Runs_NEUTRAL_forR.txt
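One generic way to use neutral SNP results as a null is to take an upper quantile of the neutral statistic as a cutoff and to compute an empirical p-value for each genic SNP. The sketch below illustrates that idea only; the function names, the quantile, and the toy values are assumptions, and the exact procedure used in the study is described in the paper text.

```python
import math


def empirical_threshold(neutral_values, quantile=0.99):
    """Return the neutral value at the given quantile (nearest-rank method)."""
    ordered = sorted(neutral_values)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def empirical_p_value(value, neutral_values):
    """Fraction of neutral values at least as large as `value`.

    One is added to the numerator and denominator so the p-value is never 0.
    """
    n_at_least = sum(1 for v in neutral_values if v >= value)
    return (n_at_least + 1) / (len(neutral_values) + 1)


# Toy neutral distribution; real values would come from the neutral results file.
neutral = list(range(1, 101))
cutoff = empirical_threshold(neutral, quantile=0.5)
p_value = empirical_p_value(101, neutral)
```

With this approach, a genic SNP is treated as an outlier when its statistic exceeds the cutoff derived from the neutral SNPs.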
BayeScan results file for genic SNPs
This is the output file from BayeScan run with genic SNPs for 107 wolves and 6 ecotypes.
bayescan_n107_Ecotypes_GenicRegions_95CallRate_wEcotypes_output_prior_odds_1000_fst.txt
BayeScan results file for neutral SNPs
This is the output file from BayeScan for neutral SNPs in 107 wolves and 6 ecotypes. The results in this file were used as the null background for assigning significance to genic SNPs. See main text for details.
bayescan_n107_NeutralRegions_95CallRate_LDpruned_output_prior_odds_10000_fst.txt
Phenotype data for 47 individuals (BLACK)
Phenotype data for 47 individuals, coded so that black individuals are 2 and non-black individuals are 1. This input file was used to run the EMMAX genotype-phenotype analysis.
GenicRegions_95CallRate_wKlocus_BLACK_forEmma_phenotype.txt
Phenotype data for 47 individuals (WHITE)
Phenotype data for 47 individuals, coded so that white individuals are 2 and non-white individuals are 1. This input file was used to run the EMMAX genotype-phenotype analysis.
GenicRegions_95CallRate_wKlocus_WHITE_forEmma_phenotype.txt
Kinship matrix for EMMAX
Kinship matrix generated from LD-pruned neutral SNPs for genotype-phenotype associations in EMMAX.
NeutralSNPs_LDpruned_for_EmmaX.aBN.kinf
EMMAX results file (BLACK)
Results file generated from EMMAX for genotype-phenotype analysis for black coat color.
results_BLACK.txt
EMMAX results file (WHITE)
Results file generated from EMMAX for genotype-phenotype analysis for white coat color.
results_WHITE.txt
SweeD_Report.Pop_1_WestForest_Genic_10000_chrAll
SweeD output file for West Forest wolves at genic sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_1_WestForest_Neutral_10000_chrAll
SweeD output file for West Forest wolves at neutral sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_2_BorealForest_Genic_10000_chrAll
SweeD output file for Boreal Forest wolves at genic sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_2_BorealForest_Neutral_10000_chrAll
SweeD output file for Boreal Forest wolves at neutral sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_3_Arctic_Genic_10000_chrAll
SweeD output file for Arctic wolves at genic sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_3_Arctic_Neutral_10000_chrAll
SweeD output file for Arctic wolves at neutral sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_4_HighArctic_Genic_10000_chrAll
SweeD output file for High Arctic wolves at genic sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_4_HighArctic_Neutral_10000_chrAll
SweeD output file for High Arctic wolves at neutral sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_5_BritishColumbia_Genic_10000_chrAll
SweeD output file for British Columbia wolves at genic sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_5_BritishColumbia_Neutral_10000_chrAll
SweeD output file for British Columbia wolves at neutral sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_6_AtlanticForest_Genic_10000_chrAll
SweeD output file for Atlantic Forest wolves at genic sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
SweeD_Report.Pop_6_AtlanticForest_Neutral_10000_chrAll
SweeD output file for Atlantic Forest wolves at neutral sites. Columns are chromosome, position, likelihood, and alpha. See SweeD details at http://sco.h-its.org/exelixis/web/software/sweed/
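The SweeD report files listed above share the same four columns, so a single reader can handle all of them. The sketch below assumes the columns are whitespace-delimited and in the stated order (chromosome, position, likelihood, alpha), and that any header or comment lines can be skipped because they do not parse as numbers. These layout details are assumptions and should be checked against the files themselves; the names `SweepPoint`, `parse_report`, and `top_point` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class SweepPoint(NamedTuple):
    chromosome: str
    position: float
    likelihood: float
    alpha: float


def parse_report(lines: Iterable[str]) -> List[SweepPoint]:
    """Parse four-column report lines, skipping anything that does not fit."""
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # blank lines or lines with an unexpected layout
        try:
            points.append(
                SweepPoint(fields[0], float(fields[1]),
                           float(fields[2]), float(fields[3]))
            )
        except ValueError:
            continue  # header or other non-numeric line
    return points


def top_point(points: List[SweepPoint]) -> SweepPoint:
    """Return the grid point with the highest likelihood."""
    return max(points, key=lambda p: p.likelihood)


# Toy lines standing in for the contents of one report file.
example_lines = [
    "chromosome position likelihood alpha",
    "chr1 1000.0 0.52 0.0031",
    "chr2 2500.0 4.87 0.0012",
]
example_points = parse_report(example_lines)
best = top_point(example_points)
```

To read a real file, pass an open file handle to `parse_report` in place of `example_lines`.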
Latitude and Longitude Coordinates for 117 Wolves
These are the latitude and longitude coordinates for all 117 wolves initially sampled in this study. Ten individuals were excluded from all selection tests for the reason indicated in the "Ecotype or Status" column, i.e., they were identified as admixed in Structure or were related to another individual and were therefore dropped.
AllSamples_n117_wLatLong.xlsx
Output from Variant Effect Predictor
This file is the output of Ensembl's Variant Effect Predictor run on genic variants observed in 107 wolves at a minimum call rate of 95%. See main text for more details.
variant_effect_output.txt
Capture array target regions
BED file containing annotated regions targeted for the capture array. Coordinates are provided in CanFam3.1.
NAwolf_capture_array_regions_Feb2012_canfam3.1.bed
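Because BED coordinates use a 0-based start and an exclusive end, the length of each target region is simply end minus start. The sketch below sums region lengths from BED-formatted lines; the function name and the toy regions are hypothetical, and the annotation column in the real file may be laid out differently.

```python
def bed_interval_lengths(lines):
    """Return (chromosome, length) for each BED record in `lines`."""
    lengths = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional BED header or comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED intervals are half-open, so no +1 is needed.
        lengths.append((chrom, end - start))
    return lengths


# Toy records standing in for lines of the capture array BED file.
example_bed = [
    "track name=example",
    "chr1\t1000\t2000\tneutral_region_1",
    "chr2\t500\t650\texon_1",
]
region_lengths = bed_interval_lengths(example_bed)
total_bp = sum(length for _, length in region_lengths)
print(total_bp)  # 1150
```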
NAwolf_baits_canfam3.1
Coordinates for 66937 baits covering regions targeted by capture array, in CanFam3.1.
Information for BAM files on NCBI SRA
BAM files for 117 wolves, separated into genic and neutral regions. See paper text for details on alignment and mapping pipeline.