
Data from: Environment but not geography explains genetic variation in the invasive and largely panmictic European starling in North America

Cite this dataset

Hofmeister, Natalie R; Werner, Scott J; Lovette, Irby J (2022). Data from: Environment but not geography explains genetic variation in the invasive and largely panmictic European starling in North America [Dataset]. Dryad.


Populations of invasive species that colonize and spread in novel environments may differentiate both through demographic processes and local selection throughout the genome. European starlings (Sturnus vulgaris) were introduced to New York in 1890 and subsequently spread throughout North America, becoming one of the most widespread and numerous bird species on the continent. Genome-wide comparisons across starling individuals and populations can identify demographic and/or selective factors that facilitated this rapid and successful expansion. We investigated patterns of genomic diversity and differentiation using reduced-representation genome sequencing (ddRADseq) of 17 starling populations. Consistent with this species’ high dispersal rates and rapid expansion history, we found low genome-wide differentiation and few FST outliers even at a continental scale. Despite starting from a founding population of approximately 180 individuals, North American starlings do not seem to have undergone a detectable genetic bottleneck: they have maintained an extremely large effective population size since introduction. We find more than 200 variants that correlate with temperature and/or precipitation. Genotype-environment associations (but not outlier scans) identify these SNPs against a background of negligible genome- and range-wide divergence. Such variants fall in the coding regions of genes associated with metabolism, stress, and neurological function. This evidence for incipient local adaptation in North American starlings suggests that it can evolve rapidly even in wide-ranging and evolutionarily young populations. This survey of genomic signatures of expansion in North American starlings is the most comprehensive to date and complements ongoing studies of world-wide local adaptation in these highly dispersive and invasive birds.


Breast muscle tissue was sampled using biopsy punches (Integra Miltex) and frozen in 95% ethanol. Samples were shipped on dry ice, and DNA was extracted using a Qiagen DNeasy kit following the manufacturer's protocol (Qiagen). The DNA concentration of each sample was quantified using a Qubit 2.0 fluorometer (Thermo Fisher Scientific). Following the protocol of Peterson et al. (2012), we generated a reduced-representation genomic data set of double-digested, restriction-site associated DNA (RAD) markers as described in Thrasher et al. (2017), using the restriction enzymes SbfI and MspI and adaptors P1 and P2. We sequenced 100-bp, single-end reads of the 160 best-quality libraries on an Illumina HiSeq 2500. We trimmed and filtered for quality using the fastx-toolkit (http:// We then used the process_radtags command in stacks version 1.19 (Catchen et al., 2013) to demultiplex the remaining sequences. In subsequent filtering steps, we retained a read only if it met all of the following conditions: it passed the Illumina chastity filter, contained an intact SbfI RAD site, contained one of the unique barcodes, and did not contain Illumina indexing adaptors.
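The read-retention conditions above can be sketched in Python. This is a simplified illustration, not the process_radtags implementation: the function name keep_read, the example barcode, and the adaptor string are hypothetical, and the sketch assumes that a read with an intact SbfI site continues with TGCAGG immediately after its inline barcode, which should be checked against the actual library design.

```python
# Assumed remnant of the SbfI recognition site at the start of a read,
# directly after the inline barcode (an assumption, verify for your library).
SBFI_REMNANT = "TGCAGG"


def keep_read(seq, barcodes, passed_chastity, adaptor_seqs=()):
    """Return (barcode, read_without_barcode) if the read is retained, else None."""
    # Condition 1: the read must pass the Illumina chastity filter.
    if not passed_chastity:
        return None
    # Condition 4: the read must not contain any indexing-adaptor sequence.
    if any(adaptor in seq for adaptor in adaptor_seqs):
        return None
    # Conditions 2 and 3: a known barcode followed by an intact SbfI site.
    for barcode in barcodes:
        rest = seq[len(barcode):]
        if seq.startswith(barcode) and rest.startswith(SBFI_REMNANT):
            return barcode, rest
    return None


if __name__ == "__main__":
    example_barcodes = {"ACGT"}  # hypothetical barcode
    print(keep_read("ACGTTGCAGGAAATTT", example_barcodes, True))
```

A read is assigned to a sample by its barcode only when every condition holds; anything else is dropped rather than rescued, which is a stricter rule than tools that tolerate barcode mismatches.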

Individual reads were mapped to a Sturnus vulgaris reference genome (Hofmeister, Rollins et al., in prep) using bowtie2 version 2.2.8 (Langmead & Salzberg, 2012) with the “very sensitive local” set of alignment presets, and the mapped sequences were then assembled into “stacks” using the ref-map option in stacks. Compared to a reference-free approach, the bioinformatics pipeline used for the reference-based assembly has the advantage of using fewer similarity thresholds to build loci. We required that a single-nucleotide polymorphism (SNP) be present in a minimum of 80% of the individuals (-r 0.8) with a minimum stack depth of 10 reads at a locus within an individual (-m 10) for it to be called. We removed two individuals, one with >50% missing data and one with >50% relatedness (measured using the unadjusted AJK statistic and calculated within vcftools), leaving 158 individuals in the study. A total of 15,038 SNPs were identified. We used the vcftools --hwe option to identify SNPs out of Hardy–Weinberg equilibrium (HWE), i.e., polymorphic sites at which an exact test comparing expected and observed heterozygosity gave a p-value less than .001. About 6% of sequenced variants (904 variants) were out of HWE across all sampling sites. Given that (i) we are particularly interested in SNPs that may be specific to certain populations, and (ii) filtering for HWE did not change the results described in sections (1) and (2) below, we retained all 15,038 SNPs in the VCF file included in this upload.
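The two SNP-level checks described above (the depth and call-rate thresholds, and the HWE exact test) can be illustrated with a minimal Python sketch. It is not the stacks or vcftools code: the function names are hypothetical, the call-rate check treats all individuals as a single group, and the HWE p-value is computed by enumerating every possible heterozygote count for the observed allele counts and summing the probabilities that are no larger than that of the observed count.

```python
import math


def passes_call_filters(depths, min_depth=10, min_fraction=0.8):
    """True if enough individuals have at least min_depth reads at this SNP."""
    called = sum(1 for depth in depths if depth >= min_depth)
    return called / len(depths) >= min_fraction


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact-test p-value for Hardy-Weinberg equilibrium at a biallelic SNP."""
    n = n_aa + n_ab + n_bb        # genotyped individuals
    n_a = 2 * n_aa + n_ab         # copies of allele A
    n_b = 2 * n - n_a             # copies of allele B

    def log_fact(k):
        return math.lgamma(k + 1)

    def log_prob(het):
        # Probability of `het` heterozygotes given n, n_a and n_b under HWE.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * math.log(2) + log_fact(n)
                - log_fact(hom_a) - log_fact(het) - log_fact(hom_b)
                + log_fact(n_a) + log_fact(n_b) - log_fact(2 * n))

    # Heterozygote counts must share the parity of the allele counts.
    het_counts = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {het: math.exp(log_prob(het)) for het in het_counts}
    total = sum(probs.values())   # renormalise against rounding error
    p_obs = probs[n_ab]
    # Small tolerance so ties with the observed probability are included.
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)) / total


if __name__ == "__main__":
    print(passes_call_filters([12] * 9 + [3]))  # 9 of 10 individuals called
    print(hwe_exact_p(50, 0, 50) < 0.001)       # strong heterozygote deficit
```

Under this sketch, a SNP would be flagged as out of HWE when hwe_exact_p returns a value below .001, matching the threshold stated above; the flagged SNPs were nonetheless retained in the uploaded VCF file.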

Usage notes


North America