README_dryad.txt file was generated on 2022-08-12 by Matthew Webster

GENERAL INFORMATION

1. Title of Dataset: Data from: A genomic and morphometric analysis of alpine bumblebees: Ongoing reductions in tongue length but no clear genetic component

2. Author Information

Corresponding Investigator
Name: Matthew Webster
Institution: Uppsala University
Email: matthew.webster@imbim.uu.se

Co-investigators:
Matthew J. Christmas (1)
Julia C. Jones (1,2)
Anna Olsson (1)
Ola Wallerman (1)
Ignas Bunikis (3)
Marcin Kierczak (4)
Kaitlyn M. Whitley (5,6)
Isabel Sullivan (5,7)
Jennifer C. Geib (5)
Nicole E. Miller-Struttmann (8)

1 Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
2 School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
3 Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
4 Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
5 Department of Biology, Appalachian State University, Boone, North Carolina, USA
6 U.S. Department of Agriculture, Agriculture Research Service, Charleston, South Carolina, USA
7 Marine Estuarine Environmental Sciences, University of Maryland, College Park, Maryland, USA
8 Biological Sciences Department, Webster University, St. Louis, Missouri, USA

3. Date of data collection: 2017

4. Geographic location of data collection: Rocky Mountains, Colorado, USA

5. Funding sources that supported the collection of the data: Swedish Research Council Formas

6. Recommended citation for this dataset: DOI: 10.1111/mec.16291

DATA & FILE OVERVIEW

1. Description of dataset

Over the last six decades, populations of the bumblebees Bombus sylvicola and Bombus balteatus in Colorado have experienced decreases in tongue length, a trait important for plant-pollinator mutualisms.
It has been hypothesized that this observation reflects selection resulting from shifts in floral composition under climate change. Here we used morphometrics and population genomics to determine whether morphological change is ongoing, investigate the genetic basis of morphological variation, and analyse population structure in these populations. We analysed whole-genome sequencing data and morphometric measurements of 580 samples of both species from seven high-altitude localities. Out of 281 samples originally identified as B. sylvicola, 67 formed a separate genetic cluster comprising a newly-discovered cryptic species ("incognitus"). However, an absence of genetic structure within species suggests that gene flow is common between mountains. We did not discover any genetic associations with tongue length, but a SNP related to production of a proteolytic digestive enzyme was implicated in body size variation. We identified evidence of covariance between kinship and both tongue length and body size, which is suggestive of a genetic component of these traits, although it is possible that shared environmental effects between colonies are responsible. Our results provide evidence for ongoing modification of a morphological trait important for pollination and indicate that this trait probably has a complex genetic and environmental basis.

This archive contains genetic variation data derived from genome sequencing of 580 bumblebee samples collected from high-elevation locations in Colorado. The species are Bombus sylvicola (n=214), Bombus balteatus (n=299) and "incognitus" (n=67).

2. File List:

File 1 Name: B_sylvicola.vcf.gz
File 2 Name: b_balteatus.vcf.gz
File 3 Name: incognitus.vcf.gz

File 1 Description: VCF (variant call format) file for the B. sylvicola samples
File 2 Description: VCF (variant call format) file for the B. balteatus samples
File 3 Description: VCF (variant call format) file for the "incognitus" samples

METHODOLOGICAL INFORMATION

We extracted DNA from the thoraces of bumblebees collected from across the seven sampling sites using the Qiagen Blood and Tissue kit. We prepared dual-indexed libraries using the Nextera Flex kit and performed sequencing on an Illumina HiSeq X to produce 2 × 150 bp reads, using an average of 36 samples per lane.

The Bombus sylvicola and "incognitus" samples were mapped to the Bombus sylvicola reference assembly (GCA_019677175.1) and the Bombus balteatus samples were mapped to the Bombus balteatus assembly (GCA_019201815.1). We mapped reads to the two reference genome assemblies using the mem algorithm in BWA. We performed sorting and indexing of the resultant BAM files using samtools and marked duplicate reads using Picard.

We used the Genome Analysis Toolkit (GATK) to call variants. We first ran HaplotypeCaller using default parameters on the BAM file of each sample to generate a gVCF file for each sample. We then used GenomicsDBImport and GenotypeGVCFs with default parameters to call variants for all samples mapping to each reference assembly separately. We applied a set of hard filters using the VariantFiltration tool to filter for reliable SNPs using the following thresholds: QD <2, FS >60, MQ <40, MQRankSum