This AMKE_MigrGenomics_readme.txt file was generated on 2022-04-25 by Christen Bossu


GENERAL INFORMATION

1. Title of Dataset: American kestrel migratory genomics

2. Author Information
   A. Principal Investigator Contact Information
      Name: Christen Bossu
      Institution: Colorado State University
      Address: Biology Department, Colorado State University, Fort Collins, CO 80521
      Email: cbossu@rams.colostate.edu

   B. Associate or Co-investigator Contact Information
      Name: Kristen Ruegg
      Institution: Colorado State University
      Address: Biology Department, Colorado State University, Fort Collins, CO 80521
      Email: Kristen.Ruegg@colostate.edu

3. Date of data collection (single date, range, approximate date): 1998-05-09 to 2018-07-30

4. Geographic location of data collection: Breeding birds were collected across the breeding range of the American kestrel. Migrating birds were collected at a migration station in Boise, Idaho (43.60583, -116.0597).

5. Information about funding sources that supported the collection of the data: This work was made possible by a California Energy Commission grant to K. Ruegg and T. Smith (EPC-15-043), a National Geographic grant to K. Ruegg (WW-202R-17), a National Science Foundation grant to K. Ruegg (NSF-1942313), the Boise State Raptor Research Center, and a Strategic Environmental Research and Development Program grant to J.A. Heath (RC-2702).


SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: none

2. Links to publications that cite or use the data:
   Bossu CM, Heath JA, Kaltenecker GS, Helm B, Ruegg KC. 2022. Clock-linked genes underlie seasonal migratory timing in a diurnal raptor. Proc. R. Soc. B 289: 20212507. https://doi.org/10.1098/rspb.2021.2507

3. Links to other publicly accessible locations of the data: https://github.com/cbossu/AMKE_MigrGenomics

4. Links/relationships to ancillary data sets:
   Ruegg KC et al. 2021. The American kestrel (Falco sparverius) genoscape: implications for monitoring, management, and subspecies boundaries. Ornithology 138: ukaa051. https://doi.org/10.1093/auk/ukaa051
   This is the RAD-seq data set used to identify candidate loci associated with migratory behavior in American kestrels.

5. Was data derived from another source? No
   A. If yes, list source(s):

6. Recommended citation for this dataset:
   Bossu CM, Heath JA, Kaltenecker GS, Helm B, Ruegg KC. 2022. American kestrel genotype data and custom scripts. https://github.com/cbossu/AMKE_MigrGenomics
   Bossu, Christen et al. (2022), Clock-linked genes underlie seasonal migratory timing in a diurnal raptor, Dryad, Dataset, https://doi.org/10.5068/D1B69N


DATA & FILE OVERVIEW

1. File List:
   AMKE.meta_CandGenotyping.Breeding_MigrIDonly.csv
      Raw genotypes from the candidate gene SNPtype assays, coded as 0 = homozygous for the reference allele, 1 = heterozygous, and 2 = homozygous for the alternative allele. The file also includes all other information recorded for each individual: date of collection (Month, Day and Year), whether it was a nestling or an adult, whether it was a breeding or a migrating bird, morphometrics, and sampling location. This dataset was used to test the multigene and single-gene associations between the candidate genes and migratory timing, and their link with latitude of collection.

   AMKE.IDall.final_panel4-5.186SNP.rubias_fix.rm_miss.txt
      The rubias input file for the American kestrels sampled at the Boise, Idaho migration station. The input file for the reference breeding individuals was used previously in Ruegg KC et al. 2021, and can be found here.

   AMKE.IDrm_miss.rep_indiv_est.meta.txt
      Results of the rubias analysis, in which each bird collected at the Boise, Idaho migration station was assigned to one of the distinct genetic clusters of American kestrels (AK, East, West, TX or FL).

2. Relationship between files, if important: AMKE.IDrm_miss.rep_indiv_est.meta.txt contains the results of the rubias analysis that used AMKE.IDall.final_panel4-5.186SNP.rubias_fix.rm_miss.txt as its input file.

3. Additional related data collected that was not included in the current data package: The input file for the reference breeding individuals used in the rubias analysis was used previously in Ruegg KC et al. 2021.

4. Are there multiple versions of the dataset? No


METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data:
   To identify candidate loci associated with migratory behavior, we used an FST-based outlier analysis with a relaxed detection threshold. We wrote custom R scripts to identify loci whose FST estimates between the resident and the migratory/partially migratory populations (Florida and Texas versus all other populations) fell above a relaxed 90th-percentile FST outlier threshold.
   We designed Fluidigm SNPtype assays and used them to screen additional breeding and migrating American kestrels that were independent of the RAD-seq analyses. Specifically, we used the R package snps2assays (Anderson 2015) to evaluate whether assays could be designed for the candidate loci. We considered an assay designable if GC content was less than 0.65, there were no insertions or deletions (indels) within 30 bp of the target variant, and there were no additional variants within 20 bp of the target variant. We filtered out assays with primers that mapped to multiple locations in the genome (bwa mem; Li and Durbin 2009), resulting in assays for nine loci in nine candidate genes. We used the resulting Fluidigm assays to genotype these nine candidate migration genes in 738 breeding American kestrels from 83 sites and in 165 migrating American kestrels from a single migration station in Boise, Idaho, sampled as a three-month time series spanning autumn migration in two years (2016 and 2017).
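The three assay-design criteria listed above can be sketched as a simple filter. This is a hypothetical Python re-expression for illustration, not the snps2assays API: the Candidate record, its fields, and the choice to treat the 30 bp and 20 bp windows as inclusive are assumptions, and the primer-mapping step (bwa mem) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical record for one candidate SNP and its surroundings."""
    name: str
    flank_seq: str                                   # sequence flanking the target SNP
    target_pos: int                                  # position of the target SNP
    other_snps: list = field(default_factory=list)   # positions of other nearby SNPs
    indels: list = field(default_factory=list)       # positions of nearby indels


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def is_designable(c: Candidate, max_gc: float = 0.65,
                  indel_window: int = 30, snp_window: int = 20) -> bool:
    """GC < 0.65, no indel within 30 bp, and no other variant within 20 bp
    of the target (windows assumed inclusive)."""
    if gc_content(c.flank_seq) >= max_gc:
        return False
    if any(abs(p - c.target_pos) <= indel_window for p in c.indels):
        return False
    if any(p != c.target_pos and abs(p - c.target_pos) <= snp_window
           for p in c.other_snps):
        return False
    return True


# Made-up candidates, one per failure mode plus one that passes.
candidates = [
    Candidate("passes", "ATATGCGCAT", 1000, other_snps=[1025], indels=[1050]),
    Candidate("high_gc", "GGGGCCCCAT", 1000),
    Candidate("indel_nearby", "ATATGCGCAT", 1000, indels=[1010]),
    Candidate("snp_nearby", "ATATGCGCAT", 1000, other_snps=[985]),
]
print([c.name for c in candidates if is_designable(c)])  # ['passes']
```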
   For the migrating American kestrels, day of capture, sex and band (a.k.a. ring) number were recorded.

2. Methods for processing the data: To screen the nine designable assays for our candidate loci, we performed genotyping on the Fluidigm™ 96.96 IFC controller. We used the Juno GT Preamp Master Mix (Fluidigm, Item #100-8363) for the preamplification of the SNPs and the Juno GT Preamp Master Mix for the final amplification. For each run, we screened 94 individuals, including two non-template controls. We imaged the results on an EP1 Array Reader and called alleles using Fluidigm's automated Genotyping Analysis Software (Fluidigm Inc.) with a confidence threshold of 90%. In addition, we visually inspected all SNP calls and removed any call that did not fall clearly into one of the three clusters (heterozygote or either homozygote). Because DNA quality can affect call accuracy, we applied a stringent quality filter and dropped variants with more than 10% missing calls. The resulting genotypes for breeding and migrating American kestrels are in the AMKE.meta_CandGenotyping.Breeding_MigrIDonly.csv file.

3. Instrument- or software-specific information needed to interpret the data:
   We exported genotypes from Fluidigm's Genotyping Analysis Software (see above) and merged the raw genotypes in R to create the final dataset. We then used multi-gene and single-gene frameworks to determine whether migratory timing was significantly associated with allele frequency shifts in the nine candidate migration genes. To determine how the nine candidate genes covary, we conducted an ordinal principal component analysis (PCA) using the R package Gifi (Mair and De Leeuw 2019).
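A minimal sketch of the 10% missing-call filter and of reading the 0/1/2 genotype codes. It assumes the genotypes for each locus are held as a Python list in which a missing call (NA in the CSV) is represented as None; the locus names and values are hypothetical. Because each code counts the alternative alleles carried by a diploid bird, the alternative-allele frequency at a locus is the sum of the codes divided by twice the number of called birds.

```python
def missing_rate(calls):
    """Proportion of missing calls (None) at one locus."""
    return sum(g is None for g in calls) / len(calls)


def filter_loci(genotypes, max_missing=0.10):
    """Keep loci whose missing-call proportion does not exceed max_missing."""
    return {locus: calls for locus, calls in genotypes.items()
            if missing_rate(calls) <= max_missing}


def alt_allele_freq(calls):
    """Alternative-allele frequency from 0/1/2 codes (number of alternative
    alleles per diploid bird), ignoring missing calls."""
    called = [g for g in calls if g is not None]
    if not called:
        return float("nan")
    return sum(called) / (2 * len(called))


# Hypothetical genotypes for 10 birds at two loci; None marks a missing call.
genotypes = {
    "locus_A": [0, 1, 2, 1, 0, 0, 1, 2, None, 1],     # 10% missing: kept
    "locus_B": [0, None, 2, None, 1, 1, 0, 0, 2, 1],  # 20% missing: dropped
}
kept = filter_loci(genotypes)
print(sorted(kept))                                    # ['locus_A']
print(round(alt_allele_freq(kept["locus_A"]), 3))      # 0.444
```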
   We used linear regression to evaluate whether migration timing (the day of year on which a fall migrant was captured) was associated with genetic variation as measured by PC1 and PC2, and included sex as a covariate to account for the potential influence of differential migration between the sexes on migratory timing. To investigate single-gene effects, we used the lm function in the R package stats v 3.6.2 (R Core Team 2019) to fit linear regression models of the allele frequency of each of the top four candidate genes (those that loaded strongly on PC1: top1, peak1, phlpp1 and cpne4) against migration timing, defined as the midpoint day of each week of the autumn migration period. Because allele frequencies declined nonlinearly over time, we also fit a polynomial (curved) linear regression model and used a likelihood ratio test in the R package lmtest v 0.9-37 (Zeileis and Hothorn 2002) to test whether it fit better than the straight-line model. To test whether seasonal allele frequency trends result from different populations passing through the migration station at different times or from distinct migratory chronotypes, we examined the association of latitude with PC1 and with allele frequency at our four top-ranked loci in kestrels breeding across the West. Further, we genotyped 151 of the 165 migrating birds from Boise, Idaho (all samples for which we had high-quality DNA remaining) with the population-specific SNPtype assays used in Ruegg et al. (2021), and assigned these birds to their breeding population of origin using rubias (Anderson and Moran 2018).

4. Standards and calibration information, if appropriate: n/a

5. Environmental/experimental conditions: n/a

6. Describe any quality-assurance procedures performed on the data:

7. People involved with sample collection, processing, analysis and/or submission:
   C.M.B.: data curation, formal analysis, methodology, visualization, submission
   J.A.H.: formal analysis
   B.H.: visualization
   K.C.R.: investigation, methodology
   G.S.K.: American kestrel sample collection at the Boise, Idaho migration station
   Teia Schweizer: DNA extraction and SNP genotyping


DATA-SPECIFIC INFORMATION FOR: AMKE.meta_CandGenotyping.Breeding_MigrIDonly.csv

1. Number of variables: 123

2. Number of cases/rows: 903

3. Variable List: Metadata for each individual (see the File List above) and the candidate gene genotypes.

4. Missing data codes: NA

5. Specialized formats or other abbreviations used: Candidate gene genotypes are coded as 0 (homozygous reference allele), 1 (heterozygous) and 2 (homozygous alternative allele).


DATA-SPECIFIC INFORMATION FOR: AMKE.IDall.final_panel4-5.186SNP.rubias_fix.rm_miss.txt

1. Number of variables: 376

2. Number of cases/rows: 153

3. Variable List: Sample type, repunit, collection locality, individual, and genotypes from the population-specific SNPtype assays for migrating birds sampled in Idaho.

4. Missing data codes: NA

5. Specialized formats or other abbreviations used: Alleles are coded as 1 = A, 2 = C, 3 = G, 4 = T.


DATA-SPECIFIC INFORMATION FOR: AMKE.IDrm_miss.rep_indiv_est.meta.txt

1. Number of variables: 16

2. Number of cases/rows: 153

3. Variable List: Collection locality (mixture_collection), individual (indiv), probability of assignment to each of the five genetic clusters of American kestrel (AK, East, FL, TX and West), number of missing genotypes in the analysis (na_count), and collection metadata including Day, Month, Year, State, nearestCity, countryCode, latitude and longitude.

4. Missing data codes: n/a

5. Specialized formats or other abbreviations used: n/a
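The comparison between the straight-line and polynomial single-gene models described under Methodological Information can be sketched with ordinary least squares and a likelihood ratio test. This is a pure-Python illustration under stated assumptions, not the authors' R code: it fits allele frequency against day of year with and without a squared term, computes the LR statistic from the maximized Gaussian log-likelihoods (the quantity lmtest's lrtest compares for lm fits), and takes the p-value from a chi-square distribution with one degree of freedom. The helper names are hypothetical, and the day and frequency values are synthetic numbers invented for the example, not data from this study.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination
    with partial pivoting (a is a list of rows)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]       # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_rss(rows, y):
    """Residual sum of squares of an OLS fit of y on design rows (normal equations)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def gaussian_loglik(rss, n):
    """Maximized Gaussian log-likelihood of an OLS fit (ML variance rss / n)."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def lr_test_linear_vs_quadratic(day, freq):
    """LR test of freq ~ day versus freq ~ day + day^2."""
    n = len(day)
    mean = sum(day) / n
    span = max(day) - min(day)
    t = [(d - mean) / span for d in day]         # centre and scale for stability
    lin = [[1.0, ti] for ti in t]
    quad = [[1.0, ti, ti * ti] for ti in t]
    ll_lin = gaussian_loglik(ols_rss(lin, freq), n)
    ll_quad = gaussian_loglik(ols_rss(quad, freq), n)
    stat = max(2.0 * (ll_quad - ll_lin), 0.0)    # one extra parameter in the quadratic model
    p_value = math.erfc(math.sqrt(stat / 2.0))   # upper tail of chi-square with 1 df
    return stat, p_value


# Synthetic weekly midpoints (day of year) and allele frequencies, made up for illustration.
days = [230.0 + 7 * k for k in range(11)]
freqs = [0.8 - 0.00005 * (d - 230.0) ** 2 + 0.005 * (-1) ** k
         for k, d in enumerate(days)]
stat, p = lr_test_linear_vs_quadratic(days, freqs)
print(f"LR statistic = {stat:.1f}, p = {p:.2g}")
```

Adding sex as a covariate, as in the PC1/PC2 model, would amount to appending a 0/1 indicator column to each design row; the likelihood ratio logic is unchanged.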