Data from: Genetic-environment associations explain genetic differentiation and variation between western and eastern North Pacific Rhinoceros Auklet (Cerorhinca monocerata) breeding colonies

Graham, Brendan¹; Hipfner, Mark²; Wellband, Kyle³; Motohiro, Ito⁴; Burg, Theresa⁵

Published Jul 14, 2025 on Dryad. https://doi.org/10.5061/dryad.ffbg79d39

Data files

Jul 14, 2025 version files 102.09 MB

auklets.vcf (102.07 MB)
README.md (1.67 KB)
RHAUenvVar.csv (18.48 KB)

Abstract

Animals are strongly connected to the environments they live in and may become adapted to local environments. Examining genetic-environment associations of key indicator species, such as seabirds, provides greater insight into the forces that drive evolution in marine systems. Here we examined a RADseq dataset of 19,213 SNPs for 99 Rhinoceros Auklets (Cerorhinca monocerata) from five western Pacific and ten eastern Pacific breeding colonies. We used partial redundancy analyses to identify candidate adaptive loci and to quantify the effects of environmental variation on population genetic structure. We identified 262 candidate adaptive loci, which accounted for 3.0% of the observed genetic variation among western Pacific and eastern Pacific breeding colonies. Genetic variation was more strongly associated with pH and maximum current velocity than with maximum sea surface temperature. Genetic-environment associations explain genetic differences between western and eastern Pacific populations; however, genetic variation within the western and eastern Pacific Ocean populations appears to follow a pattern of isolation by distance. This study is the first to quantify the relationship between environmental and genetic variation for this widely distributed marine species and provides greater insight into the evolutionary forces that act on marine species.
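
The variance partitioning behind a partial redundancy analysis can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded as numeric allele counts (rows = individuals, columns = SNPs), and `env` (environmental predictors such as pH or current velocity) and `cond` (the covariates partialled out; geographic coordinates are a common choice, but the source does not say which were used) are hypothetical inputs. The function reports the share of total genotypic variance explained by the conditioning covariates and the share explained by the environment once those covariates are removed. Identifying candidate loci from constrained-axis loadings is not shown.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # row-major: rows = individuals, columns = variables


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so the regressions need no intercept term."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is non-singular (no collinear predictors).
    """
    n = len(a)
    aug = [a[i][:] + b[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def fitted(x: Matrix, y: Matrix) -> Matrix:
    """Least-squares fitted values of y on x (both column-centred)."""
    xt = transpose(x)
    beta = solve(matmul(xt, x), matmul(xt, y))
    return matmul(x, beta)


def residuals(x: Matrix, y: Matrix) -> Matrix:
    return [[yv - fv for yv, fv in zip(yr, fr)] for yr, fr in zip(y, fitted(x, y))]


def sum_sq(m: Matrix) -> float:
    return sum(v * v for row in m for v in row)


def partial_rda_fractions(y: Matrix, env: Matrix, cond: Matrix) -> Tuple[float, float]:
    """Return (conditional, constrained) fractions of the total variance in y.

    conditional: variance explained by the covariates in `cond`.
    constrained: variance explained by `env` after `cond` has been partialled out.
    """
    y, env, cond = center_columns(y), center_columns(env), center_columns(cond)
    total = sum_sq(y)
    conditional = sum_sq(fitted(cond, y))
    y_res = residuals(cond, y)
    env_res = residuals(cond, env)
    constrained = sum_sq(fitted(env_res, y_res))
    return conditional / total, constrained / total


if __name__ == "__main__":
    # Toy data: 5 individuals x 2 loci, one environmental and one conditioning variable.
    y = [[8, 1], [7, 2], [6, 3], [11, 4], [16, 5]]
    env = [[2], [1], [0], [1], [2]]
    cond = [[1], [2], [3], [4], [5]]
    print(partial_rda_fractions(y, env, cond))  # approximately (0.665, 0.335)
```

In practice an established implementation, such as `rda()` with a `Condition()` term in the R package vegan, would normally be used; the sketch only makes the conditional/constrained split explicit.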

Methods

DNA was extracted from blood samples using a salting-out protocol (Miller, Dykes & Polesky, 1988) for samples from the eastern Pacific, or a Qiagen DNeasy kit for samples from the western Pacific. Genomic DNA was used to construct nextRAD genotyping-by-sequencing libraries (SNPsaurus, LLC) using the SbfI enzyme as described by Baird et al. (2008). Genomic DNA was first fragmented with Nextera reagent (Illumina, Inc.), which also ligates short adapter sequences to the ends of the fragments. The Nextera reaction was scaled for fragmenting 15 ng of genomic DNA, but 20 ng was used as input to compensate for degraded DNA in the samples and to increase fragment sizes. Fragmented DNA was then amplified for 27 cycles at an annealing temperature of 74 °C, with one of the primers matching the adapter and extending ten nucleotides into the genomic DNA with the selective sequence GTGTAGAGCC. Only fragments starting with that sequence can be hybridized by the selective sequence of the primer and efficiently amplified. This protocol resulted in a final library fragment size of 450 bp (Etter et al. 2011). The nextRAD libraries were sequenced on an Illumina NovaSeq 6000 on one lane of single-end 150 bp reads. All genomic library preparation and sequencing were completed at the University of Oregon.
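
The selective amplification step can be illustrated with a short sketch: a fragment is kept only if its genomic portion begins with the 10-nt selective sequence GTGTAGAGCC given above. The helper names are hypothetical, only the adapter-adjacent end of each fragment is checked, and the expected-frequency calculation assumes independent, uniformly distributed bases, which real genomes do not have. It only conveys why a 10-nt selective sequence reduces genome complexity so sharply (roughly one match per million random start positions).

```python
from typing import List

# Selective sequence quoted in the methods text.
SELECTIVE_SEQ = "GTGTAGAGCC"


def selectively_amplified(fragments: List[str], selective: str = SELECTIVE_SEQ) -> List[str]:
    """Keep fragments whose genomic start matches the selective primer sequence.

    Simplification: only the adapter-adjacent (5') end of each fragment is checked.
    """
    return [frag for frag in fragments if frag.upper().startswith(selective)]


def expected_match_fraction(selective: str = SELECTIVE_SEQ) -> float:
    """Chance that a random start position matches, assuming independent, uniform bases."""
    return 0.25 ** len(selective)


if __name__ == "__main__":
    toy = ["GTGTAGAGCCTTAGC", "ACGTGTAGAGCCTTA", "gtgtagagccaaa"]
    print(selectively_amplified(toy))     # first and third fragments only
    print(1 / expected_match_fraction())  # 1048576.0
```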

Sequences were demultiplexed and then trimmed to 122 bp by SNPsaurus using the SNPsaurus pipeline with the bbduk package (BBMap tools, http://sourceforge.net/projects/bbmap/). Next, we assembled reference loci by collecting 10 million high-quality reads evenly from all samples (~70,000 reads per individual) and excluding loci with fewer than seven or more than 700 reads. This range of seven to 700 reads is a standardized threshold calculated by SNPsaurus to retain as many loci as possible without letting low-quality reads compromise data quality. The overall mean depth of the reference genome was 65×. Loci that met these criteria were then aligned to the assembled reference genome using a custom script from SNPsaurus (SNPsaurus, LLC). For the de novo alignment, we used bbmap (BBMap tools) with an identity threshold of 0.95 to map 152,204,819 of the original 289,864,865 single-end reads to the de novo reference genome. Genotypes were called with the callvariants tool (BBMap tools) using the following settings: multisample=t rarity=0.05 minallelefraction=0.05 usebias=f ow=t nopassdot=f minedistmax=5 minedist=5 minavgmapq=15 minreadmapq=15 minstrandratio=0.0 strandedcov=t. The genotype data were converted to a VCF file and filtered to remove loci with a minor allele frequency below 3% or a Q-score below 20, and to remove all individuals with more than 60% missing data (an additional 13 individuals did not meet this criterion and were omitted from all analyses). Missing data were, on average, much lower than this threshold (mean = 5.8%; median = 3.3%), although we included three individuals with 40% missing data because they grouped with other individuals from the same population. To ensure that relatedness did not skew our results, we calculated relatedness among individuals in Genodive 3.04 (Meirmans, 2020).
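
The locus- and individual-level filters described above can be sketched in pure Python. This is a minimal illustration, not the SNPsaurus or BBMap code, and it rests on assumptions: each locus is a hypothetical record with a site quality `qual` (treating the Q-score as the VCF QUAL value) and `genotypes` coded as alternate-allele counts (0, 1, 2, or None when missing); the "minimum frequency" filter is read as minor allele frequency; and loci are filtered before individuals, so individual missingness is computed over the retained loci. The thresholds (seven to 700 reads, 3%, Q20, 60%) are those given in the text, and the mapped-read share is computed from the read counts above.

```python
from typing import Any, Dict, List, Optional, Tuple

Genotype = Optional[int]  # alternate-allele count (0, 1, 2); None = missing call
Locus = Dict[str, Any]    # hypothetical record: {"qual": float, "genotypes": [Genotype, ...]}

# Thresholds quoted in the text.
MIN_READS, MAX_READS = 7, 700
MIN_MAF = 0.03
MIN_QUAL = 20.0
MAX_IND_MISSING = 0.60

# Share of single-end reads mapped to the de novo reference (about 52.5%).
MAPPED_FRACTION = 152_204_819 / 289_864_865


def keep_reference_locus(read_count: int) -> bool:
    """Reference-assembly depth filter: drop loci with fewer than 7 or more than 700 reads."""
    return MIN_READS <= read_count <= MAX_READS


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per individual
    return min(alt_freq, 1.0 - alt_freq)


def filter_loci(loci: List[Locus]) -> List[Locus]:
    """Keep loci with MAF >= 3% and quality >= 20."""
    return [
        locus for locus in loci
        if locus["qual"] >= MIN_QUAL
        and minor_allele_frequency(locus["genotypes"]) >= MIN_MAF
    ]


def filter_individuals(loci: List[Locus], samples: List[str]) -> Tuple[List[Locus], List[str]]:
    """Drop individuals missing more than 60% of genotypes across the given loci."""
    n_loci = len(loci)
    keep = [
        j for j in range(len(samples))
        if n_loci and sum(locus["genotypes"][j] is None for locus in loci) / n_loci <= MAX_IND_MISSING
    ]
    trimmed = [{**locus, "genotypes": [locus["genotypes"][j] for j in keep]} for locus in loci]
    return trimmed, [samples[j] for j in keep]


if __name__ == "__main__":
    samples = ["a", "b", "c"]
    loci = [
        {"qual": 30.0, "genotypes": [0, 1, None]},
        {"qual": 10.0, "genotypes": [0, 1, 2]},     # fails the quality filter
        {"qual": 40.0, "genotypes": [0, 0, None]},  # monomorphic: MAF 0
        {"qual": 25.0, "genotypes": [2, 1, None]},
    ]
    kept, kept_samples = filter_individuals(filter_loci(loci), samples)
    print(kept_samples)  # ['a', 'b'] ("c" is missing at both retained loci)
```

Filtering loci first and individuals second follows the order in which the text lists the filters; reversing the order can change which individuals cross the 60% threshold.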
Relatedness among individuals from the same population was <0.08, with the exception of one pair from Teuri with a relatedness of 0.26, suggesting that one set of full siblings was present in our data. Given the low overall level of relatedness, we retained all samples in our analyses. All 19,213 SNPs that passed filtering were retained for our examination of genetic-environment associations.
