Metadata for files deposited at Dryad for: Clark LV and Jasieniuk M "Spontaneous hybrids between native and exotic Rubus in the Western United States produce offspring both by apomixis and by sexual recombination." Heredity.

Data collected by Lindsay Clark and Stella Hartono. Contact Lindsay V. Clark at lvclark@illinois.edu.

"species_chloroplast_latlong.csv" contains collection data and chloroplast genotypes for each sample in the hybridization study.
**The "Well" column indicates the plate number and well where the sample was located on the DNA dilution plates.
**The "Site" column contains a three-letter code for the collection site, which corresponds to a collection site description in the Supplementary Material of the paper.
**The "Individual" column contains the site code plus a number, used to uniquely identify each individual.
**The "Initial Species Determination" column lists the species as it was determined either during collection or after preliminary analysis.
**The "Species based on genetic data" column lists the final species determination. Identification of R. vestitus and R. anglocandicans individuals is described in a manuscript that is in revision for the journal Biological Invasions.
**The "ndh Allele" column gives the size of the PCR fragment produced by amplifying the ndhF chloroplast region (using the primers and PCR conditions described in the paper), while the "Peak Height" column gives the height of the peak detected on an ABI3100 capillary sequencer for this allele, and "Run" lists the identity of the run as produced by the data collection software.
**The "trnK allele" column gives the result of a CAPS marker in the trnK region of the chloroplast genome: the "cut" allele has a HindIII site while the "uncut" allele does not.
**"pch for R" contains the number of the plotting symbol used to represent each individual in Principal Coordinates Analysis in R, based on the ndhF allele.
**"color for R" contains the colors initially used to represent each individual in Principal Coordinates Analysis in R, based on the trnK allele and whether or not the individual is a member of the R. fruticosus agg. All "magenta" individuals were later changed to "red" so that the color would represent only the genetic results.
**"LatOrder" contains a number for each population, in order of increasing latitude.
**"Latitude" and "Longitude" contain the decimal coordinates where the sample was collected, obtained using a Garmin GPSMAP 60CS in the WGS84 datum.
**"Collection date" contains the date that leaf tissue was collected in the field, in YYYYMMDD format.
**"Comment" contains any additional information, such as whether a pressed voucher was collected, whether the sample was excluded from analysis, or information about morphology or location.

Eight files contain microsatellite genotype data from the hybridization study. These are "CBA6_genotypes.txt", "CBA14_genotypes.txt", "CBA15_genotypes.txt", "CBA23_genotypes.txt", "CBA28_genotypes.txt", "RUB26_genotypes.txt", "RUB126_genotypes.txt", and "RUB262_genotypes.txt". These are tab-delimited text files in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row.

The "Sample Name" column gives names of samples that directly correspond to those used in "species_chloroplast_latlong.csv". The "Well" column indicates the location of the sample on the DNA dilution plates. The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer alleles were detected than the number of columns.
A value of -9 for Allele 1 indicates missing data. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read these files directly; it uses the "Sample Name", "Marker", and "Allele" columns and ignores all other columns.

Nine files contain pairwise genetic distances between the 579 samples that were analyzed. These are "mean_distances.csv", "CBA6_distances.csv", "CBA14_distances.csv", "CBA15_distances.csv", "CBA23_distances.csv", "CBA28_distances.csv", "RUB26_distances.csv", "RUB126_distances.csv" and "RUB262_distances.csv". These distances were calculated using the meandistance.matrix function in the R package "polysat" version 0.1, using the Bruvo.distance measure. Each file contains a square matrix with row names and column names (sample names), as produced by the write.csv function in R version 2.11 under default parameters. For pairs of individuals that both had nine or more alleles at locus RhCBA28, distances were calculated manually in Microsoft Excel and then inserted into the CBA28 matrix. The mean distance matrix was then recalculated to include these additional values.

Nine files contain names of samples used in nine different analysis groups. "fruticosus.txt", "ursinus.txt", "pensilvanicus.txt", "leucodermis.txt", "spectabilis.txt", and "parviflorus.txt" contain sample names corresponding to those species. "cultivated.txt" contains the names of all samples that were known to be cultivated types at the time of sampling. This list is further divided into the groups in "uncutcultivated.txt" and "cutcultivated.txt", which are cultivated varieties that have the "uncut" trnK allele and "cut" trnK allele, respectively.
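
For users working outside R, the distance matrices described above can be read with a few lines of Python. The following is only a minimal sketch, not part of the archive. It assumes the default write.csv layout (quoted sample names, an empty first header cell, "NA" for any missing value). The function names and the three-sample demo matrix are hypothetical. Because the layout of the sample name files is not described here, the sketch takes the group as a plain list of names.

```python
import csv
import io


def read_distance_matrix(handle):
    """Parse a square matrix saved by R's write.csv with default settings.

    The first header cell belongs to the row-name column, so the sample
    names start at the second cell of the header.
    """
    reader = csv.reader(handle)
    col_names = next(reader)[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        row_name, cells = row[0], row[1:]
        matrix[row_name] = {
            col: None if cell in ("NA", "") else float(cell)
            for col, cell in zip(col_names, cells)
        }
    return matrix


def subset_matrix(matrix, names):
    """Keep only the rows and columns whose sample names are in `names`."""
    keep = [name for name in names if name in matrix]
    return {a: {b: matrix[a][b] for b in keep} for a in keep}


# Hypothetical three-sample matrix; the real files hold 579 samples.
demo = io.StringIO(
    '"","S1","S2","S3"\n'
    '"S1",0,0.25,0.5\n'
    '"S2",0.25,0,NA\n'
    '"S3",0.5,NA,0\n'
)
distances = read_distance_matrix(demo)
group = subset_matrix(distances, ["S1", "S3"])
print(group["S1"]["S3"])  # 0.5
```

For a real file, replace the demo handle with open("mean_distances.csv", newline="") and pass the sample names from one of the analysis group files.
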
"hybridization_analysis.R" is a text file containing R code for replicating the analysis to detect hybridization based on inter-individual distances. It was created in Emacs Speaks Statistics but can be opened in any text editor. When executed in R, the working directory will need to contain the files in this data archive. "uncut_Structure.txt" is a tab-delimited file containing genotypes for all individuals in the hybridization study with the "uncut" trnK chloroplast allele, formatted for the software Structure 2.3.3. The file contains a row of marker names and a row of recessive allele symbols (all of them -9, the missing data symbol, to indicate allele copy number ambiguity), followed by the rows containing the genotypes. The ploidy of the file is 8, so each individual is represented by eight rows. The first column contains labels (sample names for each individual) and the second column contains numbers indicating population identity, corresponding to "LatOrder" in "species_chloroplast_latlong.csv". This file was generated using the R package "polysat" version 0.1. "apomixis_samples_110218.csv" This file contains four columns of information regarding the parent plants, seeds, and seedlings that were genotyped for the study to determine the reproductive mode of the hybrids. The "Sample Name" column lists the names of the samples as they were used in analysis. The names of the parents correspond to the wells that they were in on the DNA dilution plates from the hybridization study. The formatting of the names of seed and seedling samples vary only because different researchers were working on the project over a long period. The "Parent clone" indicates the species or hybrid type from which seeds were collected, and each unique entry in this column corresponds to one clone. The "Type" column indicates whether the sample was the original parent, a seed embryo, or a seedling. 
The "DNA extraction" column indicates whether the DNA was extracted with a routine CTAB protocol or with a Zymo ZR Plant/Seed DNA Micro kit.

"apomixis_genotypes_110218.txt" is a tab-delimited text file in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row. The "Sample Name" column gives names of samples that directly correspond to those used in "apomixis_samples_110218.csv". The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The eight "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer than eight alleles were detected. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read this file directly; it uses the "Sample Name", "Marker", and "Allele" columns and ignores all other columns.

"apomixis_analysis_111211.R" is a text file of R code for the analysis of reproductive mode. It was created in Emacs Speaks Statistics but can be opened in any text editor. When the script is run in R, the working directory must contain the files in this data archive.

Several .zip compressed archives, beginning with "rubus_photos", contain photographs of most individuals from the study. Photographs are organized into folders by collection site. Sample IDs are contained within each image.
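
The GeneMapper-format genotype tables described above can also be parsed without R. The sketch below is illustrative only and is not how "polysat" is implemented. It uses the same three pieces of information ("Sample Name", "Marker", and the "Allele" columns) and ignores everything else. It assumes that every allele column header begins with "Allele", and it applies the -9 missing data code, which this document states only for the eight hybridization genotype files. The function name and the demo rows are hypothetical.

```python
import csv
import io

MISSING = "-9"  # value of the first allele cell that marks a missing genotype


def read_genotypes(handle):
    """Return {sample: {marker: list of allele names, or None if missing}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Assumption: allele column headers all begin with "Allele".
    allele_cols = [c for c in reader.fieldnames if c.startswith("Allele")]
    genotypes = {}
    for row in reader:
        cells = [(row[c] or "").strip() for c in allele_cols]
        if cells and cells[0] == MISSING:
            alleles = None  # whole genotype is missing
        else:
            alleles = [a for a in cells if a]  # blank cells: fewer alleles
        genotypes.setdefault(row["Sample Name"], {})[row["Marker"]] = alleles
    return genotypes


# Hypothetical rows with three allele columns; real files have more columns.
demo = io.StringIO(
    "Sample Name\tMarker\tAllele 1\tAllele 2\tAllele 3\n"
    "S1\tM1\t120\t124\t\n"
    "S2\tM1\t-9\t\t\n"
)
genotypes = read_genotypes(demo)
print(genotypes["S1"]["M1"])  # ['120', '124']
print(genotypes["S2"]["M1"])  # None
```

Allele names are kept as strings so that nothing is assumed about how alleles were named in GeneMapper.
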