Facultative asexual reproduction is a trait commonly found in invasive species. With a combination of sexual and asexual reproductive modes, such species may adapt to new environments via sexual recombination during range expansion, while at the same time having the benefits of asexuality such as the maintenance of fitness effects that depend upon heterozygosity. In the Western United States, native species of Rubus (Rosaceae) reproduce sexually whereas exotic naturalized Rubus species reproduce by pseudogamous apomixis. We hypothesized that new asexual lineages of Rubus could arise from hybridization in this range. To detect hybridization between native and exotic Rubus, we genotyped 579 individuals collected across California, Oregon, and Washington with eight nuclear microsatellites and two chloroplast markers. Principal Coordinate Analysis and Bayesian clustering revealed a limited amount of hybridization of the native R. ursinus with the exotic R. armeniacus and R. pensilvanicus as well as cultivated varieties. Genetic distances between these hybrids and their offspring indicated that both R. ursinus x armeniacus and R. ursinus x pensilvanicus produced a mix of apomictic and sexual seeds, with sexual seeds being more viable. Although neither of these hybrid types is currently considered invasive, they model the early stages of evolution of new invasive lineages, given the potential for fixed heterosis and the generation of novel genotypes. The hybrids also retain the ability to increase their fitness via sexual recombination and natural selection. Mixed reproductive systems such as those described here may be an important step in the evolution of asexual invasive species.
README
Description of files in the package.
apomixis_analysis_111211
"apomixis_analysis_111211.R" is a text file of R code for the analysis of reproductive mode. It was created in Emacs Speaks Statistics but can be opened in any text editor. To run it in R, set the working directory to a folder that contains the files in this data archive.
apomixis_genotypes_110218
This is a tab-delimited text file in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row. The "Sample Name" column gives names of samples that directly correspond to those used in "apomixis_samples_110218.csv". The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The eight "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer than eight alleles were detected. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read this file directly, and will use the "Sample Name", "Marker", and "Allele" columns, and will ignore all other columns.
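The long format described above (one row per sample*locus genotype) can be collapsed into per-sample genotypes with a few lines of code. The Python sketch below is only an illustration and is not part of the archive, whose own scripts are in R. It assumes the allele columns are headed "Allele 1", "Allele 2", and so on, and it treats -9 in "Allele 1" as missing data, as described for the per-locus genotype files in this README. The function name read_genemapper and the demo rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_genemapper(handle):
    """Return {sample: {marker: [allele names]}} from a GeneMapper-style export."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Only the "Allele ..." columns are used; Height, Size, Panel, Dye, AE are ignored.
    allele_cols = [c for c in reader.fieldnames if c.startswith("Allele")]
    genotypes = defaultdict(dict)
    for row in reader:
        # Blank cells mean fewer alleles were detected than there are columns.
        alleles = [row[c].strip() for c in allele_cols if row[c] and row[c].strip()]
        # Assumption: -9 in the first allele column marks a missing genotype.
        if alleles and alleles[0] == "-9":
            alleles = []
        genotypes[row["Sample Name"]][row["Marker"]] = alleles
    return dict(genotypes)


# Invented demo rows (not taken from the archive).
demo = (
    "Sample Name\tMarker\tPanel\tDye\tAllele 1\tAllele 2\tAllele 3\tHeight 1\tSize 1\tAE\n"
    "A01\tCBA6\tP1\tB\t150\t154\t\t1200\t150.1\t\n"
    "A01\tRUB26\tP1\tG\t-9\t\t\t\t\t\n"
)
genotypes = read_genemapper(io.StringIO(demo))
print(genotypes["A01"]["CBA6"])
```

Allele names are kept as strings so that no assumption is made about how they were labeled in GeneMapper.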
apomixis_samples_110218
This file contains four columns of information regarding the parent plants, seeds, and seedlings that were genotyped for the study to determine the reproductive mode of the hybrids. The "Sample Name" column lists the names of the samples as they were used in analysis. The names of the parents correspond to the wells that they were in on the DNA dilution plates from the hybridization study. The formatting of the names of seed and seedling samples varies only because different researchers were working on the project over a long period. The "Parent clone" column indicates the species or hybrid type from which seeds were collected, and each unique entry in this column corresponds to one clone. The "Type" column indicates whether the sample was the original parent, a seed embryo, or a seedling. The "DNA extraction" column indicates whether the DNA was extracted with a routine CTAB protocol or with a Zymo ZR Plant/Seed DNA Micro kit.
CBA6_distances
Pairwise genetic distances at locus CBA6 between the 579 samples that were analyzed. These distances were calculated using the meandistance.matrix function in the R package "polysat" version 0.1, using the Bruvo.distance measure. The file contains a square matrix with row names and column names (sample names), as produced by the write.csv function in R version 2.11 under default parameters.
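For readers working outside R, a square matrix written by write.csv under default parameters can be read as ordinary comma-separated text: the first header cell is an empty string standing in for the row-name column, and missing values appear as NA. The Python sketch below shows one way to load such a file. The function name read_distance_matrix and the demo values are hypothetical.

```python
import csv
import io


def read_distance_matrix(handle):
    """Return {row_name: {col_name: distance or None}} from a write.csv-style matrix."""
    reader = csv.reader(handle)
    header = next(reader)
    col_names = header[1:]  # header[0] is the empty cell above the row names
    matrix = {}
    for row in reader:
        row_name, values = row[0], row[1:]
        matrix[row_name] = {
            col: (None if value == "NA" else float(value))
            for col, value in zip(col_names, values)
        }
    return matrix


# Invented demo matrix (not taken from the archive).
demo = '"","A01","A02"\n"A01",0,0.25\n"A02",0.25,NA\n'
matrix = read_distance_matrix(io.StringIO(demo))
print(matrix["A01"]["A02"])
```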
CBA6_genotypes
Microsatellite genotypes at locus CBA6. This is a tab-delimited text file in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row. The "Sample Name" column gives names of samples that directly correspond to those used in "species_chloroplast_latlong.csv". The "Well" column indicates the location of the sample on the DNA dilution plates. The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer alleles were detected than the number of columns. A value of -9 for Allele 1 indicates missing data. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read this file directly, and will use the "Sample Name", "Marker", and "Allele" columns, and will ignore all other columns.
CBA14_distances
Pairwise genetic distances at locus CBA14 between the 579 individuals analyzed. These distances were calculated using the meandistance.matrix function in the R package "polysat" version 0.1, using the Bruvo.distance measure. Each file contains a square matrix with row names and column names (sample names), as produced by the write.csv function in R version 2.11 under default parameters.
CBA14_genotypes
Microsatellite genotypes at locus CBA14. This is a tab-delimited text file in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row. The "Sample Name" column gives names of samples that directly correspond to those used in "species_chloroplast_latlong.csv". The "Well" column indicates the location of the sample on the DNA dilution plates. The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer alleles were detected than the number of columns. A value of -9 for Allele 1 indicates missing data. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read this file directly, and will use the "Sample Name", "Marker", and "Allele" columns, and will ignore all other columns.
CBA15_distances
Pairwise genetic distances at locus CBA15 for the 579 samples analyzed. These distances were calculated using the meandistance.matrix function in the R package "polysat" version 0.1, using the Bruvo.distance measure. Each file contains a square matrix with row names and column names (sample names), as produced by the write.csv function in R version 2.11 under default parameters.
CBA15_genotypes
Microsatellite genotypes at locus CBA15. This is a tab-delimited text file in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row. The "Sample Name" column gives names of samples that directly correspond to those used in "species_chloroplast_latlong.csv". The "Well" column indicates the location of the sample on the DNA dilution plates. The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer alleles were detected than the number of columns. A value of -9 for Allele 1 indicates missing data. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read this file directly, and will use the "Sample Name", "Marker", and "Allele" columns, and will ignore all other columns.
CBA23_distances
Pairwise genetic distances at locus CBA23 between the 579 individuals that were analyzed. These distances were calculated using the meandistance.matrix function in the R package "polysat" version 0.1, using the Bruvo.distance measure. The file contains a square matrix with row names and column names (sample names), as produced by the write.csv function in R version 2.11 under default parameters.
CBA23_genotypes
Microsatellite genotypes at locus CBA23. This is a tab-delimited text file in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row. The "Sample Name" column gives names of samples that directly correspond to those used in "species_chloroplast_latlong.csv". The "Well" column indicates the location of the sample on the DNA dilution plates. The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer alleles were detected than the number of columns. A value of -9 for Allele 1 indicates missing data. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read this file directly, and will use the "Sample Name", "Marker", and "Allele" columns, and will ignore all other columns.
CBA28_distances
Pairwise genetic distances at locus CBA28 for the 579 samples that were analyzed. These distances were calculated using the meandistance.matrix function in the R package "polysat" version 0.1, using the Bruvo.distance measure. Each file contains a square matrix with row names and column names (sample names), as produced by the write.csv function in R version 2.11 under default parameters. For pairs of individuals that both had nine or more alleles at locus RhCBA28, distances were calculated manually in Microsoft Excel and then inserted into the CBA28 matrix.
CBA28_genotypes
Microsatellite genotypes at locus CBA28. This is a tab-delimited text file in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row. The "Sample Name" column gives names of samples that directly correspond to those used in "species_chloroplast_latlong.csv". The "Well" column indicates the location of the sample on the DNA dilution plates. The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer alleles were detected than the number of columns. A value of -9 for Allele 1 indicates missing data. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read this file directly, and will use the "Sample Name", "Marker", and "Allele" columns, and will ignore all other columns.
cultivated
Samples in the "cultivated" analysis group.
cutcultivated
Individuals in the "cultivated" group with the "cut" trnK allele.
fruticosus
Samples in the "fruticosus" analysis group.
hybridization_analysis
"hybridization_analysis.R" is a text file containing R code for replicating the analysis to detect hybridization based on inter-individual distances. It was created in Emacs Speaks Statistics but can be opened in any text editor. To run it in R, set the working directory to a folder that contains the files in this data archive.
leucodermis
Samples in the "leucodermis" analysis group.
mean_distances
Mean pairwise genetic distances between the 579 samples analyzed. These distances were calculated using the meandistance.matrix function in the R package "polysat" version 0.1, using the Bruvo.distance measure. Each file contains a square matrix with row names and column names (sample names), as produced by the write.csv function in R version 2.11 under default parameters.
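As a rough illustration of how a mean matrix relates to the per-locus matrices in this archive, the Python sketch below averages one pairwise distance across loci. It assumes an unweighted mean in which loci with a missing comparison are skipped; that rule is an assumption made for this example and should be checked against the polysat documentation. The function name mean_distance and the demo values are hypothetical.

```python
def mean_distance(per_locus, a, b):
    """Average the a-b distance over loci, skipping loci where it is missing (None)."""
    values = [m[a][b] for m in per_locus.values() if m[a][b] is not None]
    # Return None when no locus has a usable comparison for this pair.
    return sum(values) / len(values) if values else None


# Invented per-locus distances for one pair of samples (not taken from the archive).
per_locus = {
    "CBA6": {"A01": {"A02": 0.25}},
    "CBA14": {"A01": {"A02": 0.75}},
    "RUB26": {"A01": {"A02": None}},  # missing comparison at this locus
}
pair_mean = mean_distance(per_locus, "A01", "A02")
print(pair_mean)
```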
parviflorus
Samples in the "parviflorus" analysis group.
pensilvanicus
Samples in the "pensilvanicus" analysis group.
RUB26_distances
Pairwise genetic distances at locus RUB26 between the 579 samples that were analyzed. These distances were calculated using the meandistance.matrix function in the R package "polysat" version 0.1, using the Bruvo.distance measure. Each file contains a square matrix with row names and column names (sample names), as produced by the write.csv function in R version 2.11 under default parameters.
RUB26_genotypes
Microsatellite genotypes at locus RUB26. This is a tab-delimited text file in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row. The "Sample Name" column gives names of samples that directly correspond to those used in "species_chloroplast_latlong.csv". The "Well" column indicates the location of the sample on the DNA dilution plates. The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer alleles were detected than the number of columns. A value of -9 for Allele 1 indicates missing data. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read this file directly, and will use the "Sample Name", "Marker", and "Allele" columns, and will ignore all other columns.
RUB126_distances
Pairwise genetic distances at locus RUB126 between the 579 samples analyzed. These distances were calculated using the meandistance.matrix function in the R package "polysat" version 0.1, using the Bruvo.distance measure. Each file contains a square matrix with row names and column names (sample names), as produced by the write.csv function in R version 2.11 under default parameters.
RUB126_genotypes
Microsatellite genotypes at locus RUB126. This is a tab-delimited text file in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row. The "Sample Name" column gives names of samples that directly correspond to those used in "species_chloroplast_latlong.csv". The "Well" column indicates the location of the sample on the DNA dilution plates. The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer alleles were detected than the number of columns. A value of -9 for Allele 1 indicates missing data. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read this file directly, and will use the "Sample Name", "Marker", and "Allele" columns, and will ignore all other columns.
RUB262_distances
Pairwise genetic distances at locus RUB262 between the 579 samples analyzed. These distances were calculated using the meandistance.matrix function in the R package "polysat" version 0.1, using the Bruvo.distance measure. Each file contains a square matrix with row names and column names (sample names), as produced by the write.csv function in R version 2.11 under default parameters.
RUB262_genotypes
Microsatellite genotypes at locus RUB262. This is a tab-delimited text file in the format produced by Applied Biosystems GeneMapper software version 3.7. Each sample*locus genotype is represented in one row. The "Sample Name" column gives names of samples that directly correspond to those used in "species_chloroplast_latlong.csv". The "Well" column indicates the location of the sample on the DNA dilution plates. The "Marker" column indicates the microsatellite marker. The "Panel" column indicates groups of markers that were run simultaneously on the ABI3100 capillary sequencer. The "Dye" column indicates the fluorescent tag used to detect the PCR fragments: Y for NED, G for HEX, and B for FAM. The "Allele" columns contain the names of all alleles detected for each genotype. Cells are left blank if fewer alleles were detected than the number of columns. A value of -9 for Allele 1 indicates missing data. The "Height" columns indicate the height of the peak representing each allele, and the "Size" columns indicate the exact size of the DNA fragment as calculated by GeneMapper. A value of "TRUE" in the "AE" column indicates that the genotype was manually edited in GeneMapper, although all genotypes were visually inspected. The R package "polysat" can read this file directly, and will use the "Sample Name", "Marker", and "Allele" columns, and will ignore all other columns.
species_chloroplast_latlong
"species_chloroplast_latlong.csv" contains collection data and chloroplast genotypes for each sample in the hybridization study.
**The "Well" column indicates the plate number and well where the sample was located on the DNA dilution plates.
**The "Site" column contains a three-letter code for the collection site, which corresponds to a collection site description that can be found in the Supplementary Material of the paper.
**The "Individual" column contains the site code plus number used to uniquely identify each individual.
**The "Initial Species Determination" column lists the species as it was determined either during collection or after preliminary analysis.
**The "Species based on genetic data" column lists the final species determination. Identification of R. vestitus and R. anglocandicans individuals is described in a manuscript that is in revision for the journal Biological Invasions.
**The "ndh Allele" column gives the size of the PCR fragment produced by amplifying the ndhF chloroplast region (primers and PCR conditions are described in the paper), while the "Peak Height" column gives the height of the peak detected on an ABI3100 capillary sequencer for this allele, and "Run" lists the identity of the run as produced by the data collection software.
**The "trnK allele" column gives the result of a CAPS marker in the trnK region of the chloroplast genome: the "cut" allele has a HindIII site while the "uncut" allele does not.
**"pch for R" contains the number of the symbol used to represent each individual in Principal Coordinate Analysis in R, based on the ndhF allele.
**"color for R" contains the colors initially used to represent each individual in Principal Coordinates Analysis in R, based on the trnK allele and whether the individual is a member of the R. fruticosus agg. or not. All "magenta" individuals were later changed to "red" so that the color would only represent the genetic results.
**"LatOrder" contains a number for each population, in order of increasing latitude.
**"Latitude" and "Longitude" contain decimal coordinates where the sample was collected, obtained using a Garmin GPSMAP 60CS in the WGS84 datum.
**"Collection date" contains the date that leaf tissue was collected in the field, in YYYYMMDD format.
**"Comment" contains any additional information, such as whether a pressed voucher was collected, whether the sample was excluded from analysis, or information about morphology or location.
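The coordinate and date columns described above can be converted to typed values before mapping or sorting. The Python sketch below assumes the CSV header uses exactly the column names quoted in this README ("Individual", "Latitude", "Longitude", "Collection date"); the function name parse_collection and the demo row are hypothetical.

```python
import csv
import io
from datetime import datetime


def parse_collection(row):
    """Convert one CSV row (a dict) into typed collection metadata."""
    return {
        "individual": row["Individual"],
        # Decimal degrees, WGS84 datum according to the README.
        "latitude": float(row["Latitude"]),
        "longitude": float(row["Longitude"]),
        # Dates are stored as YYYYMMDD.
        "date": datetime.strptime(row["Collection date"], "%Y%m%d").date(),
    }


# Invented demo row (not taken from the archive).
demo = (
    "Individual,Latitude,Longitude,Collection date\n"
    "XYZ01,45.5,-122.5,20070615\n"
)
record = parse_collection(next(csv.DictReader(io.StringIO(demo))))
print(record["date"].isoformat())
```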
spectabilis
Samples in the "spectabilis" analysis group.
uncutcultivated
Samples in the "cultivated" analysis group with the "uncut" trnK allele (maternally derived from R. ursinus).
uncut_Structure
"uncut_Structure.txt" is a tab-delimited file containing genotypes for all individuals in the hybridization study with the "uncut" trnK chloroplast allele, formatted for the software Structure 2.3.3. The file contains a row of marker names and a row of recessive allele symbols (all of them -9, the missing data symbol, to indicate allele copy number ambiguity), followed by the rows containing the genotypes. The ploidy of the file is 8, so each individual is represented by eight rows. The first column contains labels (sample names for each individual) and the second column contains numbers indicating population identity, corresponding to "LatOrder" in "species_chloroplast_latlong.csv". This file was generated using the R package "polysat" version 0.1.
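The layout described above (a marker-name row, a recessive-allele row, then a fixed number of rows per individual) can be regrouped into one record per individual. The Python sketch below is an illustration only: it assumes -9 cells are padding or missing data and drops them, and it keeps allele names as strings. The function name read_structure and the demo data are hypothetical, and the demo uses a ploidy of 2 for brevity where the archive file uses 8.

```python
import io


def read_structure(handle, ploidy=8):
    """Return (markers, {label: {"pop": int, "alleles": {marker: set}}})."""
    rows = [line.rstrip("\r\n").split("\t") for line in handle if line.strip()]
    markers = [name for name in rows[0] if name]  # row 1: marker names
    # rows[1] is the recessive-allele row and is skipped.
    individuals = {}
    row_counts = {}
    for fields in rows[2:]:
        label, pop = fields[0], int(fields[1])
        record = individuals.setdefault(
            label, {"pop": pop, "alleles": {m: set() for m in markers}}
        )
        row_counts[label] = row_counts.get(label, 0) + 1
        for marker, allele in zip(markers, fields[2:]):
            if allele != "-9":  # assumption: -9 is padding or missing data
                record["alleles"][marker].add(allele)
    # Every individual should occupy exactly `ploidy` rows.
    bad = [label for label, n in row_counts.items() if n != ploidy]
    if bad:
        raise ValueError("unexpected row count for: " + ", ".join(bad))
    return markers, individuals


# Invented demo file with two markers and two individuals.
demo = (
    "CBA6\tRUB26\n"
    "-9\t-9\n"
    "A01\t1\t150\t200\n"
    "A01\t1\t154\t-9\n"
    "B02\t2\t150\t202\n"
    "B02\t2\t-9\t-9\n"
)
markers, individuals = read_structure(io.StringIO(demo), ploidy=2)
print(sorted(individuals["A01"]["alleles"]["CBA6"]))
```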
ursinus
Samples in the "ursinus" analysis group.
rubus_photos1
Photographs of plants sampled at sites ARC, ARP, BBG, and BWP. Sample IDs can be seen within the images. Image format is .jpg.
rubus_photos2
Photographs of plants at sites CMB, CSR, CSP, and CFX. Sample IDs are shown within the images. Images are in .jpg format.
rubus_photos3
Photographs of plants at sites CMW, CTF, MGH, DVS, DNV, and DMU. Sample IDs are found in the images. Images are in .jpg format.
rubus_photos4
Photographs of plants at sites FCR, GTN, KLS, KVY, KRC, KNL, MCD, and PSG. Sample IDs can be seen within the images. Images are in .jpg format.
rubus_photos5
Photographs of plants at sites PLV, RSB, SDW, SFD, SRP, TCM, WIN, and WSH. Sample IDs can be seen in the images. Images are in .jpg format.