Data from: Environment but not geography explains genetic variation in the invasive and largely panmictic European starling in North America
Data files
Mar 07, 2022 version files 51.57 MB
- EUSTallSNPr8.vcf (51.56 MB)
- Indices.zip (6.68 KB)
Abstract
Breast muscle tissue was sampled using biopsy punches (Integra Miltex) and frozen in 95% ethanol. Samples were shipped on dry ice, and DNA was extracted using a Qiagen DNeasy kit following the manufacturer's protocol (Qiagen). The DNA concentration of each sample was quantified using a Qubit 2.0 fluorometer (Thermo Fisher Scientific). Following the protocol of Peterson et al. (2012), we generated a reduced-representation genomic data set of double-digested, restriction-site associated DNA (RAD) markers as described in Thrasher et al. (2017), using the restriction enzymes SbfI and MspI and adaptors P1 and P2. We sequenced 100-bp, single-end reads of the 160 best-quality libraries on an Illumina HiSeq 2500. We trimmed and filtered reads for quality using the fastx-toolkit (http://hannonlab.cshl.edu/fastx_toolkit). We then used the process_radtags program in stacks version 1.19 (Catchen et al., 2013) to demultiplex the remaining sequences. In subsequent filtering steps, we retained reads only if all of the following conditions were met: the read passed the Illumina chastity filter, contained an intact SbfI RAD site, contained one of the unique barcodes, and did not contain Illumina indexing adaptors.
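The read-retention conditions listed above can be illustrated with a minimal Python sketch. It is not the process_radtags implementation: it assumes exact barcode matches (no mismatch rescue), assumes the barcode sits at the start of the read and is followed directly by the SbfI cut-site remnant TGCAGG, and uses made-up barcodes and a placeholder adaptor fragment. The function name `keep_read` is hypothetical.

```python
# Remnant of the SbfI site (CCTGCA^GG) expected right after the barcode.
SBFI_REMNANT = "TGCAGG"


def keep_read(seq, barcodes, adaptor, passed_chastity=True):
    """Return the matching barcode if the read meets every condition, else None.

    Assumptions (not from the source): exact barcode matching, barcode at the
    start of the read, and a caller-supplied adaptor sequence.
    """
    if not passed_chastity:
        return None  # failed the Illumina chastity filter
    if adaptor and adaptor in seq:
        return None  # adaptor contamination
    for barcode in barcodes:
        if seq.startswith(barcode + SBFI_REMNANT):
            return barcode  # known barcode followed by an intact RAD site
    return None


# Hypothetical barcodes and adaptor fragment, for illustration only.
barcodes = ["ACGTA", "TTGCA"]
adaptor = "AGATCGGAAGAGC"

reads = [
    "ACGTATGCAGGAACCTTGG",  # kept: barcode + intact site
    "ACGTATGCAGCAACCTTGG",  # dropped: damaged RAD site
    "GGGGGTGCAGGAACCTTGG",  # dropped: unknown barcode
]
kept = [r for r in reads if keep_read(r, barcodes, adaptor) is not None]
```

In this toy input only the first read is retained; the other two fail the RAD-site and barcode checks, respectively.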
We mapped individual reads to a Sturnus vulgaris reference genome (Hofmeister, Rollins et al., in prep) with bowtie2 version 2.2.8 (Langmead & Salzberg, 2012), using the “very sensitive local” set of alignment presets, and then assembled the sequences into “stacks” using the ref-map option in stacks. Compared to a reference-free approach, the bioinformatics pipeline used for the reference-based assembly has the advantage of using fewer similarity thresholds to build loci. For a single-nucleotide polymorphism (SNP) to be called, we required that it be present in a minimum of 80% of the individuals (-r 0.8), with a minimum stack depth of 10 reads at a locus within an individual (-m 10). We removed two individuals, one with >50% missing data and one with >50% relatedness (measured using the unadjusted AJK statistic and calculated within vcftools), leaving 158 individuals in the study. A total of 15,038 SNPs were identified. We used the vcftools --hwe option to remove any SNPs out of Hardy–Weinberg equilibrium (HWE) (i.e., SNPs for which an exact test comparing expected and observed heterozygosity, at polymorphic sites only, gave a p-value less than .001). About 6% of sequenced variants (904 variants) were out of HWE across all sampling sites; given that (i) we are particularly interested in SNPs that may be specific to certain populations, and (ii) filtering for HWE did not change the results described in sections (1) and (2) below, we retain all 15,038 SNPs for the VCF file included in this upload.
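The missing-data thresholds and the HWE proportion described above can be checked with a short Python sketch. It is an illustration only: the 80% call rate is applied across all individuals at once, which may differ from how stacks applies -r; the genotype matrix is a toy example rather than data from EUSTallSNPr8.vcf; and the helper names are hypothetical.

```python
def is_missing(genotype_field):
    """A VCF sample field is missing when its GT subfield contains '.'."""
    return "." in genotype_field.split(":")[0]


def site_call_rate(genotypes):
    """Fraction of individuals with a called genotype at one site."""
    called = sum(not is_missing(g) for g in genotypes)
    return called / len(genotypes)


def individual_missingness(rows):
    """Per-individual fraction of missing genotypes; rows are sites."""
    n_sites = len(rows)
    n_individuals = len(rows[0])
    return [
        sum(is_missing(row[i]) for row in rows) / n_sites
        for i in range(n_individuals)
    ]


# Toy matrix: 4 sites (rows) x 5 individuals (columns).
rows = [
    ["0/0", "0/1", "1/1", "0/0", "./."],
    ["0/1", "0/1", "./.", "0/0", "./."],
    ["0/0", "0/0", "0/0", "0/1", "./."],
    ["1/1", "0/1", "0/1", "0/0", "0/0"],
]

# Keep sites genotyped in at least 80% of individuals.
kept_sites = [row for row in rows if site_call_rate(row) >= 0.8]

# Flag individuals with more than 50% missing data.
missing = individual_missingness(rows)
flagged = [i for i, m in enumerate(missing) if m > 0.5]

# Proportion of SNPs out of HWE reported in the text: 904 of 15,038.
hwe_fraction = 904 / 15038
```

Here the second site falls below the 80% threshold and the last individual exceeds 50% missing data. The ratio 904 / 15,038 is roughly 0.06, which matches the "about 6%" stated in the text.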
- Hofmeister, Natalie R.; Werner, Scott J.; Lovette, Irby J. (2021). Environmental correlates of genetic variation in the invasive European starling in North America. Molecular Ecology. https://doi.org/10.1111/mec.15806
