Genomic data reveals mixed support for the current subspecies designations in Eastern Screech-Owls (Megascops asio)
Data files
Jan 14, 2026 version files 58.41 MB
- demult.sh (3.27 KB)
- filter.sh (2.19 KB)
- popmap_management (1.89 KB)
- popmap2 (2.11 KB)
- populations.snps.222align.vcf (58.39 MB)
- README.md (4.12 KB)
- refmapSCRIPT (9.46 KB)
Abstract
The Eastern Screech-Owl is a non-migratory resident forest owl that occurs across a broad distribution throughout much of eastern and central North America, with five subspecies generally recognized across the range. Using reduced representation sequencing, we found that genetic differentiation among populations was generally low, with only two clear genetic clusters supporting that M. a. mccallii is distinct. Beyond these two main groups, we found subtle genetic clustering roughly corresponding to geographic region and the currently recognized subspecies, suggesting mixed support for current subspecies designations. To address these questions, we analyzed 228 individuals representing all five recognized subspecies at 8,220 SNPs using standard population genetic approaches. In addition to the subtle population structuring noted above, the genetically identified subpopulations also varied in estimated effective population sizes and metrics of genetic diversity. We also detected a weak, but significant, signal of Isolation by Distance across the range, suggesting clinal patterns of genetic variation across this broadly distributed species. The genetic population structure we uncovered broadens our understanding of subspecies-level taxonomic classification in this species and may be useful for effective conservation management if it becomes necessary.
Dataset DOI: 10.5061/dryad.g4f4qrg33
Description of the data and file structure
We isolated genomic DNA using the DNeasy Extraction Kit (Qiagen, Valencia, California, USA) following the manufacturer’s protocols, and quantified DNA concentrations using a Qubit 2.0 fluorometer (Life Technologies, New York, USA). We generated double-digested restriction-site associated DNA markers (ddRADtags) following the protocol of Thrasher et al. (2018) with some modifications. For DNA digestion, we used the enzymes SbfI and MspI (New England Biolabs, Massachusetts, USA) and ligated the ends of the digested genomic DNA to P1 and P2 adapters (Peterson et al., 2012) using T4 DNA ligase (New England Biolabs). Indexing groups were pooled in equimolar concentrations and sequenced on a shared Illumina NovaSeq X lane (2 × 150 bp paired-end reads) by the Biotechnology Resource Center (BRC) Genomics Facility (RRID:SCR_021727) at the Cornell Institute of Biotechnology (http://www.biotech.cornell.edu/brc/genomics-facility).
Variant Calling & Filtering
We filtered raw 150 bp sequences for quality using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit). We removed sequences if a single base had a Phred quality score less than 10 or if more than 95% of bases had a Phred quality score less than 20. We used the process_radtags program in STACKS 2.67 (Catchen et al. 2013) to demultiplex the remaining sequences, retaining reads only if they passed the Illumina chastity filter, contained an intact SbfI RAD site, and did not contain Illumina sequencing adapters.
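The two read-level quality criteria can be sketched in a few lines of Python. This is only an illustration of the logic as worded above, not the contents of filter.sh: the function names are hypothetical, the thresholds are exposed as parameters whose defaults copy the values in the text, Phred+33 quality encoding is assumed, and a read is assumed to be discarded when either criterion is met.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string (assumed Phred+33 encoding) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_quality_filter(phred_scores, min_any=10, low_q=20, max_low_fraction=0.95):
    """Return True if a read survives both removal criteria.

    A read is removed if any single base scores below `min_any`, or if the
    fraction of bases scoring below `low_q` exceeds `max_low_fraction`.
    Defaults mirror the thresholds stated in the text.
    """
    if not phred_scores:
        return False  # assumption: empty reads are discarded
    if min(phred_scores) < min_any:
        return False
    low_fraction = sum(q < low_q for q in phred_scores) / len(phred_scores)
    return low_fraction <= max_low_fraction


# Hypothetical three-base read: scores 40, 20 and 10
scores = decode_phred33("I5+")
kept = passes_quality_filter(scores)
```

Keeping the thresholds as parameters makes it easy to compare the stated values against the settings actually used in filter.sh.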
Reference Genome Alignment
Using Bowtie2 (Langmead & Salzberg 2012), we aligned the filtered and demultiplexed sequences to a scaffold-level M. asio reference genome (Robinson et al. 2025), which we had first indexed with Bowtie2-build. We then ran the ref_map.pl program from STACKS 2 version 2.67 (http://catchenlab.life.illinois.edu/stacks/), which runs both the GSTACKS and POPULATIONS programs. We used POPULATIONS to filter genotypes and export the data. We required that a locus be present in a minimum of 80% of the individuals for it to be called and applied a minimum minor allele frequency filter of 0.05. Using the population.structure output from POPULATIONS, we calculated the percentage of missing data for each sample and removed samples for which missing data exceeded 50%. We then removed samples for which the adjusted mean GSTACKS coverage was less than 5x. These filtering steps followed the standard STACKS 2 analysis protocol (Rivera-Colón & Catchen 2022).
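The locus-level and sample-level thresholds described above can be illustrated with a small Python sketch. It is not the STACKS implementation: it assumes biallelic SNPs coded as alternate-allele counts (0, 1, 2, or None for a missing call), and all function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency for one biallelic SNP.

    `genotypes` holds alternate-allele counts (0, 1, 2) per individual,
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def keep_locus(genotypes, min_presence=0.80, min_maf=0.05):
    """Keep a SNP genotyped in at least `min_presence` of individuals with MAF >= `min_maf`."""
    presence = sum(g is not None for g in genotypes) / len(genotypes)
    return presence >= min_presence and minor_allele_frequency(genotypes) >= min_maf


def samples_to_drop(matrix, max_missing=0.50):
    """Return indices of samples whose share of missing calls exceeds `max_missing`.

    `matrix` is a list of loci, each a list with one genotype per sample.
    """
    n_loci = len(matrix)
    n_samples = len(matrix[0]) if matrix else 0
    drop = []
    for i in range(n_samples):
        n_missing = sum(locus[i] is None for locus in matrix)
        if n_missing / n_loci > max_missing:
            drop.append(i)
    return drop
```

For example, a SNP called in 8 of 10 individuals with a single heterozygote has a minor allele frequency of 1/16 and passes both locus filters, whereas a SNP called in only 7 of 10 individuals fails the presence threshold.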
Files and variables
File: demult.sh
Description: Script for demultiplexing pooled ddRAD libraries
File: filter.sh
Description: Script for filtering raw sequence data (trimming, QC)
File: popmap_management
Description: popmap file used in STACKS to generate the population-level VCF (individuals assigned to populations based on Admixture results; see publication for details)
File: popmap2
Description: popmap file used in STACKS to generate the range-wide VCF (all individuals treated as a single population; see publication for details)
File: refmapSCRIPT
Description: Script used in STACKS to map reads to an existing reference genome
File: populations.snps.222align.vcf
Description: Filtered VCF from STACKS. Individuals were filtered, and their reads were aligned to an existing screech-owl reference genome. This VCF can be used with either popmap file above to generate a range-wide or population-level data set.
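A STACKS population map is a plain tab-separated file with one sample per line: the sample name, then its population label. The sketch below, with a hypothetical function name and made-up sample and population labels, shows one way to group samples by population before subsetting the VCF. Skipping blank lines and lines starting with "#" is an assumption about how the provided popmap files are laid out.

```python
import io
from collections import defaultdict


def read_popmap(handle):
    """Group sample names by population from a tab-separated popmap."""
    groups = defaultdict(list)
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and comment lines carry no samples
        sample, population = line.split("\t")[:2]
        groups[population].append(sample)
    return dict(groups)


# Hypothetical popmap content; real files would be opened with open("popmap2")
example = io.StringIO("owl_01\tnorth\nowl_02\tnorth\nowl_03\tsouth\n")
groups = read_popmap(example)
```

With popmap2, this would be expected to return a single group containing every individual, while popmap_management would return one group per management population.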
Code/software
For all of the data posted here, raw sequences were processed, filtered, and aligned in STACKS. Variants were also called using the STACKS pipeline.
Using the provided VCF, traditional downstream population genetic analyses can be conducted.
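As a starting point for such analyses, the following sketch computes the proportion of missing genotypes per sample directly from a VCF using only the Python standard library. It assumes a standard tab-delimited VCF in which GT is the first FORMAT field (as the VCF specification requires when GT is present); the function name and the example records are hypothetical.

```python
import io


def per_sample_missingness(handle):
    """Return {sample: fraction of sites with a missing genotype} for a VCF."""
    samples, missing, n_sites = [], [], 0
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed columns
            missing = [0] * len(samples)
            continue
        if not samples or len(fields) < 10:
            continue  # skip blank or malformed lines
        n_sites += 1
        for i, call in enumerate(fields[9:]):
            genotype = call.split(":")[0]  # GT is the first FORMAT field
            if "." in genotype:
                missing[i] += 1
    if n_sites == 0:
        return {}
    return {s: m / n_sites for s, m in zip(samples, missing)}


# Hypothetical two-site, two-sample VCF; the real file would be opened with
# open("populations.snps.222align.vcf")
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\towl_01\towl_02\n"
    "scaf1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\n"
    "scaf1\t20\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t1/1\n"
)
rates = per_sample_missingness(example_vcf)
```

The same loop structure can be extended to tally allele counts per population once samples are grouped with one of the popmap files.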
