Data archive from:

Genetic variation and risks of introgression in the wild Coffea arabica gene pool in southwestern Ethiopian montane rainforests
Aerts, Raf, Berecha, Gezahegn, Gijbels, Pieter, Hundera, Kitessa, Van Glabeke, Sabine, Vandepitte, Katrien, Muys, Bart, Roldán-Ruiz, Isabel, Honnay, Olivier
Evolutionary Applications

Final data archive compiled 2012.06.21 by Raf Aerts (ResearcherID: A-7602-2008 URL: http://www.researcherid.com/rid/A-7602-2008)


FILES:

Aerts-Coffea-SSRdata.xls: Microsoft Office Excel worksheet [Compatibility Mode] containing metadata and genotypes
CoffeaSSRs.txt: tab-delimited ASCII file containing genotypes in GeneMapper format compatible with POLYSAT in R
bunagen.r: R script to load the arabica coffee microsatellite dataset into an R/POLYSAT workspace
ATbuna24.at: ASCII file containing genotypes in ATetra format
bunagenPA.txt: comma-delimited ASCII file containing genotypes in binary (dominant) format for further use in e.g. GenAlEx
bunastruct.txt: tab-delimited ASCII file in STRUCTURE format
bunagenplots.kmz: Google Earth points of sample plots
bunagenloci.txt: INTROGRESS locus file
bunagenpp1.txt: INTROGRESS parent population file (wild genotypes)
bunagenpp2.txt: INTROGRESS parent population file (cultivars)
bunagenadmix.txt and bunagenadmixfc.txt: INTROGRESS potentially admixed samples for SFC and FC populations
bunaintrogress.r: R script calculating hybrid index using INTROGRESS


DETAILED DESCRIPTION FOR Aerts-Coffea-SSRdata.xls:

Worksheets:

Population metadata
Marker metadata
Genotypes: genotypes in dominant marker format (fn. 1)
Genemapper format for R polysat: genotypes in GeneMapper format compatible with POLYSAT in R
Binary format: genotypes in presence/absence format
Genalex IDs: transformation of unique IDs to Bx format (alphabetical) for GenAlEx
Genalex format: genotypes in presence/absence format with parameters for GenAlEx
Structure format: genotypes in tetraploid format for STRUCTURE

Worksheet contents:

Population metadata:
  Coffee production system: forest coffee or semi-forest coffee
  Region: name of region where population is located
  Population: population name
  Population code: population code used in sample IDs
  Number of samples: number of individuals sampled in population
  Latitude °N: latitude of population location in decimal degrees on WGS84 datum
  Longitude °E: longitude of population location in decimal degrees on WGS84 datum

Marker metadata:
  Multiplex panel: ID of the multiplex panel in which the marker was used
  Locus: Coffea microsatellite DNA locus
  Dye: dye used for marker (DS-33 Applied Biosystems Standard Dye Set for Genotyping Applications)
  GenBank Accession URL: link to microsatellite metadata record in GenBank

Genotypes:
  Population: population code used in sample IDs
  Individual ID: unique individual ID
  Alleles (161 columns)

Genemapper format for R polysat:
  Sample Name: unique individual ID
  Marker: marker ID (fn. 2)
  Allele 1: fragment length in nucleotides; -9 = missing value
  Allele 2: fragment length in nucleotides; -9 = missing value
  Allele 3: fragment length in nucleotides; -9 = missing value
  Allele 4: fragment length in nucleotides; -9 = missing value

Binary format:
  unique individual ID
  presence (1), absence (0) or missing data (-9) for all alleles (markerID.frag length)

GenalexIDs:
  unique individual ID transformed to GenAlEx ID

Genalex:
  Genalex format

Structure format:
  STRUCTURE format

[1] allele name =(right(GenBanklocus; 3))+"_"+(dye)+"_"+(fragment length in nucleotides)
[2] marker ="loc"+(right(Genbanklocus;3))
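

EXAMPLE (illustrative sketch, not part of the original archive):

The naming rules in footnotes [1] and [2], and the relation between the GeneMapper-style worksheet (four allele columns, -9 = missing) and the binary worksheet (one column per markerID.fragment length), can be sketched in Python as below. This is only a minimal sketch under stated assumptions: the accession "XX000123" and the dye label "VIC" are hypothetical placeholders, the function names are invented for illustration, and the rule that a sample with no called alleles at a marker gets -9 in every column of that marker is an assumption, not something the archive documents. It is not the procedure the authors used to build the worksheets.

```python
from collections import defaultdict

MISSING = -9  # missing-value code used in the archive


def marker_name(genbank_locus):
    """Footnote [2]: "loc" + last three characters of the GenBank locus."""
    return "loc" + genbank_locus[-3:]


def allele_name(genbank_locus, dye, fragment_length):
    """Footnote [1]: last three characters + "_" + dye + "_" + fragment length."""
    return f"{genbank_locus[-3:]}_{dye}_{fragment_length}"


def to_binary(rows):
    """Convert GeneMapper-style rows to a presence/absence table.

    rows: iterable of (sample_name, marker, [allele1, ..., allele4]),
          with fragment lengths as integers and -9 for missing values.
    Returns {sample_name: {"marker.length": 1, 0 or -9}}.
    """
    rows = list(rows)

    # Collect every fragment length observed at each marker.
    observed = defaultdict(set)
    for _sample, marker, alleles in rows:
        observed[marker].update(a for a in alleles if a != MISSING)

    table = defaultdict(dict)
    for sample, marker, alleles in rows:
        called = {a for a in alleles if a != MISSING}
        for length in sorted(observed[marker]):
            column = f"{marker}.{length}"
            if not called:
                # Assumption: no called alleles means the whole marker is missing.
                table[sample][column] = MISSING
            else:
                table[sample][column] = int(length in called)
    return dict(table)


if __name__ == "__main__":
    # Hypothetical accession, dye and sample names, for illustration only.
    print(marker_name("XX000123"))              # loc123
    print(allele_name("XX000123", "VIC", 152))  # 123_VIC_152
    demo = [
        ("S01", "loc123", [150, 152, MISSING, MISSING]),
        ("S02", "loc123", [MISSING, MISSING, MISSING, MISSING]),
    ]
    print(to_binary(demo))
```

Unused allele slots are padded with -9 in the GeneMapper-style layout, so the sketch treats -9 as "no further allele" unless all four slots are -9.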