Set the working directory.
Import the full data set of nine populations:
library("poppr")  # provides read.genalex(), mlg.filter(), bruvo.dist(), etc.
Pinf9 <- read.genalex("Pinf9 new data input file for Poppr.csv", ploidy = 3, geo = FALSE, region = FALSE)
Set the population from the Pop stratum for the whole data set and list the population names:
setPop(Pinf9) <- ~Pop
popNames(Pinf9)
## [1] "Chiapas" "Jalisco" "Michoacan" "Puebla" "Tlaxcala" "Toluca"
## [7] "USA" "Veracruz" "Zacatecas"
Check the basic information about the new data set:
Pinf9
##
## This is a genclone object
## -------------------------
## Genotype information:
##
## 154 original multilocus genotypes
## 197 triploid individuals
## 12 codominant loci
##
## Population information:
##
## 1 stratum - Pop
## 9 populations defined - Chiapas Jalisco Michoacan ... USA Veracruz
## Zacatecas
Defining multilocus genotypes by genetic distance lets us assign genotypes with missing data or genotyping errors to their parent clusters. One way to choose a threshold is to look for a gap in the distance distribution that separates clonal groups. We can find it by examining the distribution of all possible thresholds with the function "cutoff_predictor" (see the R package poppr reference manual). Here, we use Bruvo's distance to calculate the cutoff:
Pinf9reps <- fix_replen(Pinf9, c(D13 = 2, G11 = 2, Pi04 = 2, Pi4B = 2, Pi63 = 3, Pi70 = 3,
                                 SSR11 = 2, SSR2 = 2, SSR3 = 2, SSR4 = 2, SSR6 = 2, SSR8 = 2))
thresholds <- mlg.filter(Pinf9, algorithm = "farthest_neighbor", distance = bruvo.dist,
                         stats = "THRESHOLDS", replen = Pinf9reps, threshold = 1)
We use these thresholds to find an appropriate cutoff and then apply it to the data set:
pcut <- cutoff_predictor(thresholds)
pcut
## [1] 0.07565534
mlg.filter(Pinf9, distance = bruvo.dist, replen = Pinf9reps) <- pcut
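As a rough illustration of the gap-finding idea described above, the following base-R sketch picks a cutoff from a vector of merge thresholds. It assumes the cutoff is the midpoint of the largest gap among the smallest half of the sorted thresholds. The function name largest_gap_cutoff, its fraction argument, and the toy thresholds are hypothetical, and the sketch is not presented as poppr's own implementation of cutoff_predictor.

```r
# Hypothetical helper (an assumption, not poppr's code): return the midpoint
# of the largest gap among the smallest `fraction` of the sorted thresholds.
largest_gap_cutoff <- function(thresholds, fraction = 0.5) {
  stopifnot(length(thresholds) >= 2)
  sorted <- sort(thresholds)
  # Keep at least two values so that a gap can be computed.
  n_keep <- min(length(sorted), max(2, round(length(sorted) * fraction)))
  kept <- sorted[seq_len(n_keep)]
  gaps <- diff(kept)            # distance between consecutive thresholds
  i <- which.max(gaps)          # position of the widest gap
  mean(kept[c(i, i + 1)])       # midpoint of that gap
}

# Toy thresholds, made up for illustration only.
toy_thresholds <- c(0.01, 0.02, 0.03, 0.04, 0.12, 0.13, 0.20, 0.35, 0.50, 0.80)
toy_cutoff <- largest_gap_cutoff(toy_thresholds)
toy_cutoff
```

For these toy values the widest gap in the lower half lies between 0.04 and 0.12, so the sketch returns their midpoint, 0.08.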
View the basic information about this lineage-corrected data set:
Pinf9
##
## This is a genclone object
## -------------------------
## Genotype information:
##
## 133 contracted multilocus genotypes
## (0.076) [t], (bruvo.dist) [d], (farthest) [a]
## 197 triploid individuals
## 12 codominant loci
##
## Population information:
##
## 1 stratum - Pop
## 9 populations defined - Chiapas Jalisco Michoacan ... USA Veracruz
## Zacatecas
List the individuals associated with each multilocus genotype:
Pinf9LC <- mlg.id(Pinf9)
Pinf9LC
## $`1`
## [1] "Zac-6"
##
## $`2`
## [1] "Tlax724"
##
## $`3`
## [1] "Mich7005"
##
## $`5`
## [1] "Pic_97744" "Pic_97748" "Pic_97749" "Pic_97750" "Pic_97751"
##
## $`6`
## [1] "Tlax732"
##
## $`7`
## [1] "Tlax740"
##
## $`8`
## [1] "Mich7034"
##
## $`9`
## [1] "Mich7056"
##
## $`10`
## [1] "Mich7061"
##
## $`11`
## [1] "Mich7008"
##
## $`12`
## [1] "Jag6"
##
## $`14`
## [1] "Pic_97146"
##
## $`15`
## [1] "Pic_97130"
##
## $`16`
## [1] "Tlax723"
##
## $`17`
## [1] "Tlax729"
##
## $`18`
## [1] "Tlax749"
##
## $`19`
## [1] "Pic_97318"
##
## $`20`
## [1] "Tlax737"
##
## $`21`
## [1] "Tlax762"
##
## $`22`
## [1] "Tlax753"
##
## $`23`
## [1] "Tlax710"
##
## $`24`
## [1] "Pic_97335"
##
## $`25`
## [1] "Pic_97008"
##
## $`28`
## [1] "Tlax703" "Tlax714"
##
## $`29`
## [1] "Tlax702" "Tlax706" "Tlax707" "Tlax726"
##
## $`31`
## [1] "Tlax701" "Tlax704" "Tlax709" "Tlax712"
##
## $`32`
## [1] "T50"
##
## $`33`
## [1] "Pic_97106"
##
## $`34`
## [1] "Mich7014"
##
## $`35`
## [1] "Mich7095"
##
## $`36`
## [1] "Mich7043"
##
## $`37`
## [1] "PA-5"
##
## $`38`
## [1] "Mich7047"
##
## $`39`
## [1] "Pic_97438"
##
## $`40`
## [1] "Mich7078"
##
## $`41`
## [1] "Mich7077" "Mich7084"
##
## $`42`
## [1] "Mich7019"
##
## $`43`
## [1] "G-14"
##
## $`45`
## [1] "J107" "J207" "J407" "J507"
##
## $`46`
## [1] "NC07-19_014_US-21" "NC07-19_040_US-21"
##
## $`47`
## [1] "Mich7041"
##
## $`48`
## [1] "Mich7059"
##
## $`50`
## [1] "US110028_US-11" "US110160_US-11"
##
## $`51`
## [1] "Mich7012"
##
## $`52`
## [1] "Mich7068" "Mich7071" "Mich7075" "Mich7082"
##
## $`53`
## [1] "Mich7055" "Zac-20"
##
## $`54`
## [1] "Mich7045"
##
## $`56`
## [1] "Mich7002" "Mich7020" "Mich7022" "Mich7023" "Mich7024" "Mich7027"
## [7] "Mich7033"
##
## $`57`
## [1] "US940494_US-12"
##
## $`58`
## [1] "Pic_97716"
##
## $`59`
## [1] "Mich7079" "Mich7094"
##
## $`60`
## [1] "Pic_97153"
##
## $`61`
## [1] "Tlax725"
##
## $`62`
## [1] "Mich7039"
##
## $`63`
## [1] "Tlax718"
##
## $`64`
## [1] "PUE-41"
##
## $`65`
## [1] "Pic_97136"
##
## $`66`
## [1] "Tlax731"
##
## $`67`
## [1] "J12"
##
## $`69`
## [1] "US110041_US-23" "US110047_US-23" "US110059_US-23"
##
## $`72`
## [1] "Tlax738"
##
## $`73`
## [1] "Tlax705"
##
## $`74`
## [1] "Pic_97785" "Pic_97791" "Pic_97793"
##
## $`75`
## [1] "Pic_97310"
##
## $`76`
## [1] "Tlax713"
##
## $`77`
## [1] "Pic_97340"
##
## $`78`
## [1] "Pic_97111"
##
## $`79`
## [1] "Pic_97727"
##
## $`80`
## [1] "Tlax761"
##
## $`81`
## [1] "Pic_97432"
##
## $`82`
## [1] "US110063_US-8"
##
## $`84`
## [1] "US100048_US-8"
##
## $`85`
## [1] "Pi-001-01_US-14"
##
## $`86`
## [1] "US467_US-8" "CA-10-1_US-8"
##
## $`87`
## [1] "US110021_US-24" "US110022_US-24" "US110153_US-24"
##
## $`88`
## [1] "Mich7007"
##
## $`89`
## [1] "Mich7072"
##
## $`90`
## [1] "Mich7057" "Mich7058"
##
## $`92`
## [1] "Mich7088"
##
## $`93`
## [1] "Mich7018"
##
## $`94`
## [1] "Mich7028"
##
## $`95`
## [1] "J11"
##
## $`96`
## [1] "Mich7006"
##
## $`97`
## [1] "Mich7030"
##
## $`98`
## [1] "Mich7029"
##
## $`99`
## [1] "Mich7089"
##
## $`100`
## [1] "10H" "11F" "11H" "12F" "12H" "13F" "13H" "14H" "1T" "2H" "3H"
## [12] "4H" "5H" "9H"
##
## $`103`
## [1] "IMK-213_US-20" "NC04-6_US-20"
##
## $`104`
## [1] "Mich7062" "Mich7063"
##
## $`105`
## [1] "J604"
##
## $`107`
## [1] "J204" "J707" "J807"
##
## $`108`
## [1] "Mich7035"
##
## $`109`
## [1] "Mich7081"
##
## $`110`
## [1] "PA-37"
##
## $`111`
## [1] "Pic_97187"
##
## $`112`
## [1] "Pic_97423"
##
## $`113`
## [1] "Tlax748"
##
## $`114`
## [1] "Tlax745"
##
## $`115`
## [1] "PUE-10"
##
## $`116`
## [1] "Mich7060"
##
## $`117`
## [1] "PUE-38"
##
## $`118`
## [1] "CH-17" "CH-25"
##
## $`119`
## [1] "Tlax739"
##
## $`120`
## [1] "PUE-16"
##
## $`121`
## [1] "Pic_97159"
##
## $`122`
## [1] "Tlax728"
##
## $`123`
## [1] "Pic_97149"
##
## $`124`
## [1] "Tlax722"
##
## $`125`
## [1] "PUE-3"
##
## $`126`
## [1] "CH-10"
##
## $`128`
## [1] "J104" "J404" "US970001_US-17"
##
## $`129`
## [1] "Tlax751"
##
## $`130`
## [1] "T81"
##
## $`131`
## [1] "Tlax747"
##
## $`132`
## [1] "Tlax744" "Tlax746" "Tlax760"
##
## $`135`
## [1] "Tlax759"
##
## $`136`
## [1] "Tlax715"
##
## $`137`
## [1] "Tlax708"
##
## $`138`
## [1] "Pic_97392"
##
## $`139`
## [1] "Tlax735"
##
## $`140`
## [1] "Tlax733" "Tlax736"
##
## $`141`
## [1] "Tlax741" "Tlax742"
##
## $`142`
## [1] "Tlax717" "Tlax719"
##
## $`143`
## [1] "Pic_97066"
##
## $`144`
## [1] "Tlax730"
##
## $`145`
## [1] "Pic_97724"
##
## $`146`
## [1] "Tlax743"
##
## $`147`
## [1] "Pic_97442"
##
## $`148`
## [1] "US110052_US-22" "US110070_US-22"
##
## $`150`
## [1] "Mich7038"
##
## $`151`
## [1] "Tlax716"
##
## $`153`
## [1] "Tlax755" "Tlax756" "Tlax758"
##
## $`154`
## [1] "Mich7054"
Note that because the lineage-correction algorithm is based on genetic distance, we ran it on the full data set of 197 P. infestans isolates and then manually picked one isolate from each collapsed lineage. The resulting lineage-corrected data set includes 133 isolates. We used it as the input file for the STRUCTURE analysis to infer population structure.
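The step of keeping one isolate per collapsed lineage can be sketched in base R on a list shaped like the mlg.id() output. The isolates here were picked manually; taking the first isolate of each lineage, as below, is only an assumption made for illustration, and toy_mlgs is a small hypothetical excerpt rather than the full result.

```r
# Toy stand-in for the list returned by mlg.id(): names are MLG labels,
# values are the isolates assigned to that MLG.
toy_mlgs <- list(
  `1`  = "Zac-6",
  `5`  = c("Pic_97744", "Pic_97748", "Pic_97749", "Pic_97750", "Pic_97751"),
  `28` = c("Tlax703", "Tlax714")
)

# Number of isolates collapsed into each lineage.
lineage_sizes <- lengths(toy_mlgs)

# One representative per lineage (assumption: simply the first isolate listed).
representatives <- vapply(toy_mlgs, function(ids) ids[1], character(1))

lineage_sizes
representatives
```

Applied to the full list, the same pattern would give one name per multilocus genotype, which can then be compared against a manual selection.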
In addition, note that the other lineage-corrected data sets used in this paper are subsets of the 133-isolate lineage-corrected data set generated above, which we refer to below as the full lineage-corrected data set. Specifically:
1) We selected a total of 109 isolates collected from Michoacán, Tlaxcala and Toluca from the full lineage-corrected data set to produce a new data set and used it as the input file to (a) run AMOVA, (b) test for linkage disequilibrium and Hardy-Weinberg equilibrium, (c) infer gene flow (using the software Migrate-n), and (d) calculate mean allelic richness (using the software ADZE).
2) We selected a total of 96 diploid isolates collected from Michoacán, Tlaxcala and Toluca from the full lineage-corrected data set to produce a diploid data set and used it to (a) test for linkage disequilibrium and Hardy-Weinberg equilibrium and (b) obtain the global inbreeding coefficient (Fis) for these three diploid populations in Mexico.
3) We selected a total of 25 Toluca isolates from the full lineage-corrected data set to produce a new data set for analysis of linkage disequilibrium and Hardy-Weinberg equilibrium.
4) We selected a total of 22 diploid Toluca isolates from the full lineage-corrected data set to produce a new data set for analysis of linkage disequilibrium and Hardy-Weinberg equilibrium.
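The subsetting logic described above can be sketched in base R on a small table of isolates. Everything in toy_isolates is hypothetical: the isolate names and ploidy values are made up, and the source does not state how ploidy was recorded, so the ploidy column is an assumption. Only the population labels follow popNames(Pinf9).

```r
# Hypothetical isolate table, for illustration only.
toy_isolates <- data.frame(
  isolate = c("iso1", "iso2", "iso3", "iso4", "iso5", "iso6"),
  pop     = c("Michoacan", "Tlaxcala", "Toluca", "Zacatecas", "Puebla", "Michoacan"),
  ploidy  = c(2, 3, 2, 2, 2, 3),
  stringsAsFactors = FALSE
)

central_pops <- c("Michoacan", "Tlaxcala", "Toluca")

# All isolates from the three populations (analogous to subset 1).
central <- toy_isolates[toy_isolates$pop %in% central_pops, ]

# Diploid isolates only from those populations (analogous to subset 2).
central_diploid <- central[central$ploidy == 2, ]

# Toluca isolates, all and diploid only (analogous to subsets 3 and 4).
toluca <- central[central$pop == "Toluca", ]
toluca_diploid <- toluca[toluca$ploidy == 2, ]

nrow(central)
nrow(central_diploid)
```

Each subset is derived from the previous, larger one, which mirrors how the data sets above are all nested within the full lineage-corrected data set.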
Session information
sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.6 (El Capitan)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.14 poppr_2.2.1 pegas_0.9 ape_3.5
## [5] adegenet_2.0.1 ade4_1.7-4
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.7 spdep_0.6-8 formatR_1.4 plyr_1.8.4
## [5] LearnBayes_2.15 tools_3.3.1 boot_1.3-18 digest_0.6.10
## [9] evaluate_0.10 tibble_1.2 nlme_3.1-128 gtable_0.2.0
## [13] lattice_0.20-34 mgcv_1.8-15 fastmatch_1.0-4 Matrix_1.2-7.1
## [17] igraph_1.0.1 shiny_0.14.2 DBI_0.5-1 yaml_2.1.13
## [21] parallel_3.3.1 coda_0.18-1 cluster_2.0.5 dplyr_0.5.0
## [25] stringr_1.1.0 gtools_3.5.0 rprojroot_1.1 grid_3.3.1
## [29] R6_2.2.0 phangorn_2.0.4 rmarkdown_1.2 sp_1.2-3
## [33] gdata_2.17.0 ggplot2_2.1.0 reshape2_1.4.2 seqinr_3.3-3
## [37] deldir_0.1-12 magrittr_1.5 nnls_1.4 gmodels_2.16.2
## [41] splines_3.3.1 backports_1.0.4 scales_0.4.0 htmltools_0.3.5
## [45] MASS_7.3-45 assertthat_0.1 permute_0.9-4 mime_0.5
## [49] colorspace_1.2-7 xtable_1.8-2 httpuv_1.3.3 quadprog_1.5-5
## [53] stringi_1.1.2 munsell_0.4.3 vegan_2.4-1