# Title of Dataset: Data for "Limited genetic differentiation of Mycetomoellerius mikromelanos in Parc National Soberanía, Panama: Implications for queen dispersal" Cardenas et al.

---

## ABSTRACT

The coevolutionary relationship between fungus-growing ants (Formicidae: Attini: Attina) and their symbionts has been well studied in the Panamanian rain forests. To further understand the ecological context of these evolutionary relationships, we have examined the population genetic structure of the fungus-growing ant species *Mycetomoellerius mikromelanos* Cardenas, Schultz, Adams 2021 in the Panama Canal Zone. We specifically investigated the presence of population structure, the significance of geographic features (i.e., creeks) limiting gene flow, and relatedness between ant colonies. To accomplish this, we genotyped 85 ant colonies from nine creeks across an approximately 30 km transect in Parque National Soberanía, Panama using double digest restriction-site associated DNA sequencing. We did not find distinct population structure using two genetic clustering methods; however, we did detect an effect of isolation by distance. Furthermore, related colonies were frequently detected on the same creek or neighboring creeks, and some at further geographic distances. Collectively, these findings demonstrate that new colonies tend to establish on natal creeks and occasionally on distant creeks following long-distance dispersal events. We discuss how population genetic patterns reveal the natural history of *M. mikromelanos* in Parque National Soberanía and how these results fit into the context of fungus-growing ant mutualisms.

## Data description

DNA was extracted from a single worker from each colony of *Mycetomoellerius mikromelanos* found in Parque National Soberanía. Following the ddRADseq protocol of Peterson et al. (2012), digestion was performed with the restriction enzymes Msp1 and Spfl-HF (New England Biolabs, Inc., Ipswich, Massachusetts, USA).
Genomic fragments of 250-600 bp in length were selected and sequenced as 150 bp paired-end reads on an Illumina HiSeq 4000 at Nationwide Children's Hospital in Columbus, Ohio, USA. SNP data were quality-checked and trimmed (FastQC v0.11.7; Andrews 2018; trimmomatic v0.38; Bolger et al. 2014), and SNP discovery was performed with ipyrad (v0.9.31; Eaton & Overcast, 2020).

In the sequence files (barcodes, etc.), the locality labelled "bird", "bird plot", "BrdPL", or any variation of these is called "limbo plot" in the main article and supporting information.

### **./barcodes**

Barcode information for each index

> lib1i01_barcodes.txt
> lib1i02_barcodes.txt
> lib1i03_barcodes.txt
> lib1i04_barcodes.txt
> lib1i05_barcodes.txt
> lib1i06_barcodes.txt
> lib1i07_barcodes.txt
> lib1i12_barcodes.txt
> lib2i03_barcodes.txt
> lib2i06_barcodes.txt
> lib2i12_barcodes.txt

### **DNA_catalog_ddRADseq.xlsx**

Indicates all sequencing information, indexes, names, localities, species, etc. Contains a README tab with details for each subsequent tab.

* "Creek count" tab. Provides details of sampling sites from the 2017 & 2018 field seasons
* "coll_org" Column info
    * Box, storage box number
    * Position, number in storage box
    * Collection code, attine specialist collection code format
    * Contents, caste type
    * Genus, taxonomic level
    * species, taxonomic level
    * Lat, latitude coordinates
    * long, longitude coordinates
    * elevation, elevation in meters
    * locality, collection locality
* "for library prep" Column info
    * xtrctn Code, extraction code
    * Plate, plate one of two
    * Plate Position, position on library preparation plate
    * adapter 1, adapter info
    * adapter 2, adapter info
    * total ng, total DNA extracted in nanograms
    * assay ng/ul, Qubit calculation of DNA concentration
    * volume necessary, volume estimated for DNA concentration
    * ul to vacufuge, volume used to concentrate DNA
    * Coll. Code, attine specialist collection code
    * Location, location name
* "DNA_catalog" DNA extraction information with *Trachymyrmex zeteki* samples present
    * Coll. Code, attine specialist collection code
    * Location, locality name
    * \# of spm, number of specimens used in extraction
    * xtrctn Code, DNA extraction code
    * Eluted in, elution solution (e.g., H2O or EB Buffer)
    * Elution Volume ul, total elution volume
    * total ng, estimated total DNA in nanograms
    * assay ng/ul, Qubit assay in nanograms per microlitre
    * volume necessary, volume necessary for ca. 100 ng of DNA
    * total volume present, total volume
    * extraction started, extraction date
    * extraction finished, extraction date
    * Comment, who extracted and when
* "DNA_Catalog (2)" DNA extraction information with just *Mycetomoellerius mikromelanos* samples; same column information as "DNA_catalog"
* Locality-specific tabs "Bird Plots", "Frijolito", "La Seda", and "Plantation Road" with collection information. Each contains the following columns:
    * Coll. Code, attine specialist collection code
    * Location, locality name
    * xtrctn Code, DNA extraction code
    * lat, latitude coordinates
    * lon, longitude coordinates
    * elev, elevation in meters
* "plate-template" Copy and paste of columns without headers for plate organization
    * "A" plate number
    * "B" plate position
    * "C" index information

### **Sequence data**

Each archive contains the fastq.gz files of sequence reads for one index group

> C_CarstensB_cc1_i12_V1C_1.tar.gz
> C_CarstensB_cc1_i1_V1C_1.tar.gz
> C_CarstensB_cc1_i2_V1C_1.tar.gz
> C_CarstensB_cc1_i3_V1C_1.tar.gz
> C_CarstensB_cc1_i4_V1C_1.tar.gz
> C_CarstensB_cc1_i5_V1C_1.tar.gz
> C_CarstensB_cc1_i6_V1C_1-1.tar.gz
> C_CarstensB_cc1_i7_V1C_1.tar.gz
> C_CarstensB_cc2_i12_V1C_1.tar.gz
> C_CarstensB_cc2_i3_V1C_1.tar.gz
> C_CarstensB_cc2_i6_V1C_1.tar.gz

### **"Supporting_data.zip"**

Supporting data contains FastQC stats, the ipyrad API notebooks and output data, R analysis scripts, and COLONY analysis directories.
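
To check which directories an archive such as `Supporting_data.zip` holds before unpacking it, a short Python sketch like the one below may help. It is only an illustration: the function name `top_level_entries` is hypothetical, and whether the directories sit at the archive root or inside a wrapper folder is an assumption to verify against the downloaded file.

```python
import zipfile
from collections import Counter


def top_level_entries(zip_path):
    """Count the files found under each top-level entry of a zip archive.

    Hypothetical helper for illustration; it is not part of the dataset.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                # Skip explicit directory entries; only files are counted.
                continue
            # Drop empty and "." components so "./colony/x" maps to "colony".
            parts = [p for p in name.split("/") if p not in ("", ".")]
            if parts:
                counts[parts[0]] += 1
    return dict(counts)


if __name__ == "__main__":
    import sys

    # Usage: python list_zip.py Supporting_data.zip
    if len(sys.argv) > 1:
        for entry, n_files in sorted(top_level_entries(sys.argv[1]).items()):
            print(f"{entry}\t{n_files} file(s)")
```

If the archive has a single wrapper folder, the output will show only that folder, and the directories described in the following sections will be one level below it.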

### *./"r_analysis data"*

This directory contains the conStruct analysis, radiator filtering data, the 2020_9_25.R population analyses, the VCF file used in radiator (setmarch_vr1.vcf), and the strata.csv used in R analyses.

* "strata.csv" Strata used in R analyses. Column variables:
    * set11, dataset with filtered samples (set 1 of 1)
    * loc, locality ID
    * creek, creek ID
    * xrn_code, extraction code
    * lat, latitude coordinates
    * long, longitude coordinates
    * elevation, in meters
    * watershed_eph, Watershed N or ephemeral creek (creek goes dry during the dry season)
    * watershed_2, creek type, Watershed N or ephemeral creek N in watershed
    * NvsS, creek north or south of the Chagres River (N or S)
* 2020_09_25.R Contains the code used for radiator filtering and subsequent analysis of the filtered and unfiltered datasets. Analyses include: Hardy-Weinberg filtering parameters, summary statistics, fixation indices, AMOVA, Mantel test of IBD, creek centroid calculations, data conversions, data formatting for conStruct, data formatting for COLONY, and plot construction of relatedness dyads from COLONY output.

#### *./construct*

This directory contains all conStruct input and output files. For each analysis, a spatial (`*sp_`) and a non-spatial (`*nsp_`) analysis were performed for ancestral population values (K) 1:4. This produces multiple output types: `k1sp_*`, `k2nsp_*`... or `k3nsp_*`, `k4nsp_*`. Furthermore, each analysis was run as an MCMC with 5 chains. Because of this, many output PDF file names end in `*.chain_1.pdf`, `*.chain_2.pdf`, etc.

* `admixprop*.pdf` admixture proportions for each K value. "admixprops.pdf" is the same as "admixprop2.pdf", just in a different orientation and color scheme.
* allelefreqs_canalzone.Rdata R object of allele frequencies generated in the 2020_09_25.R analysis
* conStruct_pop.R R script for running conStruct analyses
* cross-validation.pdf Cross-validation results comparing spatial and non-spatial analyses
* cross_val.xval.data.partitions.Robj R object of cross-validation proportions calculated for each K value
* cross_val.xval.results.Robj R object of cross-validation results used to construct cross-validation.pdf
* cross_val_nsp_xval_results.txt non-spatial cross-validation results
* cross_val_sp_xval_results.txt spatial cross-validation results
* dist_bycreek.Rdata R object of pairwise creek distances
* `*_conStruct.results.Robj` R object of conStruct results for each spatial and non-spatial analysis
* `*_data.block.Robj` R object of conStruct data for each spatial and non-spatial analysis
* `*_layer.cov.curves.chain_*.pdf` covariance curve estimates
* `*_model.fit.CIs.chain_*.pdf` model fit confidence intervals
* `*_trace.plots.chain_*.pdf` MCMC chain trace plots
* `*_pie.map.chain_*.pdf` admixture proportions in pie-plot format
* `*_structure.plot.chain_*.pdf` structure plot of each spatial and non-spatial analysis
* `*_model.fit.Robj` R object of model fit data used by conStruct

#### *./"radiatior filtered data"*

radiator formatting and filtering outputs.

##### *./"radiatior filtered data/filter_rad_20200411@0018"*

Descriptions of the output files and of the steps used in radiator for this filtering step are found at [thierrygosselin.github.io](https://thierrygosselin.github.io/radiator/reference/filter_rad.html)

##### *./"radiatior filtered data/read_vcf_20200411@0017"*

Output of the "read_vcf" radiator function, used to create tidy_VCF files for R analysis.
Details found at [thierrygosselin.github.io](https://thierrygosselin.github.io/radiator/reference/read_vcf.html)

### *./ipyrad_output*

Contains all SNP data generated by ipyrad; output formats are described at [ipyrad.readthedocs.io](https://ipyrad.readthedocs.io/en/master/output_formats.html?highlight=output#output-formats)

### *./ipyrad_nb*

Contains ipynb, HTML, and PDF files for each of the following ipyrad steps: demultiplexing, `march 2020_analysis.*` SNP discovery, PCA on raw data, and structure, `sept2020_filteredanalysis.*` PCA analysis on filtered data, `sept2020_PCAanalysis.*`

### *./FASTQ*

FastQC statistics for all index groups, in HTML format

### *./colony*

COLONY output files

* coloines_prob-greaterthan-50.txt colonies with relatedness probabilities greater than 50%
* colonydata COLONY input file without file extension
* sibship12iv2020.FullSibDyad all colony relatedness probabilities
* coloines_prob-lessthan-50.txt colonies with relatedness probabilities less than 50%
* sibship.dat COLONY input file with file extension
* strata.filtered_forcolony.tsv
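
The split of the full-sib dyad probabilities into the "greater than 50%" and "less than 50%" text files can be approximated with the minimal Python sketch below. It is not the authors' script, and it rests on assumptions that should be checked against the actual file: that `sibship12iv2020.FullSibDyad` is comma-separated, has one header row, lists the two individual IDs in the first two columns, and gives the probability as a fraction between 0 and 1 in the third. The function name `split_dyads` is hypothetical.

```python
import csv


def split_dyads(path, threshold=0.5):
    """Split dyads into (above, not_above) lists by full-sib probability.

    ASSUMPTION: comma-separated file with one header row and columns
    ID1, ID2, probability (0 to 1). Verify against the real file first.
    """
    above, not_above = [], []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the assumed header row
        for row in reader:
            if len(row) < 3:
                continue  # ignore blank or malformed lines
            id1, id2 = row[0].strip(), row[1].strip()
            prob = float(row[2])
            target = above if prob > threshold else not_above
            target.append((id1, id2, prob))
    return above, not_above


if __name__ == "__main__":
    import sys

    # Usage: python split_dyads.py sibship12iv2020.FullSibDyad
    if len(sys.argv) > 1:
        high, low = split_dyads(sys.argv[1])
        print(f"{len(high)} dyads above 0.5, {len(low)} at or below 0.5")
```

Dyads with a probability of exactly 0.5 fall in the second list here; the file descriptions above do not say how the original split treated that case.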