Background: The majority of the subspecies of Daucus carota have not yet been clearly discriminated by various molecular or morphological methods, and hence their phylogeny and classification remain unresolved. Recent studies using 94 nuclear orthologs and morphological characters, and studies employing other molecular approaches, were unable to clearly distinguish many of the subspecies. Fertile intercrosses among traditionally recognized subspecies are well documented. Here we explore the utility of single nucleotide polymorphisms (SNPs) generated by genotyping-by-sequencing (GBS) as an effective molecular method to discriminate the subspecies of the D. carota complex.
Results: We used GBS to obtain SNPs covering all nine Daucus carota chromosomes from 162 accessions of Daucus and two related genera. To study Daucus phylogeny, we scored a total of 10,814 or 38,920 SNPs with a maximum of 10% or 30% missing data, respectively. To investigate the subspecies of D. carota, we employed two data sets including 150 accessions: (i) a missing-data rate of 10%, with a total of 18,565 SNPs, and (ii) a missing-data rate of 30%, totaling 43,713 SNPs. Consistent with prior results, the topology of both data sets separated species with 2n = 18 chromosomes from all other species. Our results place all cultivated carrots (D. carota subsp. sativus) in a single clade. The wild members of D. carota from central Asia were on a clade with eastern members of subsp. sativus. The other subspecies of D. carota were in four clades associated with geographic groups: (1) the Balkan Peninsula and the Middle East, (2) North America and Europe, (3) North Africa exclusive of Morocco, and (4) the Iberian Peninsula and Morocco. Daucus carota subsp. maximus was discriminated, but neither it nor subsp. gummifer (defined in a broad sense) is monophyletic.
Conclusions: Our study suggests that (1) the morphotypes identified as D. carota subsp. gummifer (as currently broadly circumscribed), all confined to areas near the Atlantic Ocean and the western Mediterranean Sea, have separate origins from sympatric members of other subspecies of D. carota, (2) D. carota subsp. maximus, on two clades with some accessions of subsp. carota, can be distinguished from each other, but only with poor morphological support, (3) D. carota subsp. capillifolius, well distinguished morphologically, is an apospecies relative to North African populations of D. carota subsp. carota, (4) the eastern cultivated carrots have origins closer to wild carrots from central Asia than to western cultivated carrots, and (5) large SNP data sets are suitable for species-level phylogenetic studies in Daucus.
Additional file 1: Table S1. The 162 accessions of Daucus and two accessions
of related genera characterized in this study, with their improvement status,
locality information, and new identification.
Additional file 2: Table S2. Summary of processing GBS data of Daucus.
Additional file 3: Figure S1. Phylogenomics of Daucus from a maximum
likelihood analysis using 164 accessions and 10,814 SNPs (10% missing imputed
genotypes) obtained by GBS. Numbers above branches represent bootstrap values,
with only values higher than 70% shown. Names given to clades refer to the
geographic origin and improvement status of the accessions of the D. carota
complex. Clades A and B correspond to the two main groups of the Daucus
phylogeny.
Additional file 4: Figure S2. Phylogenomics of Daucus from a maximum
likelihood analysis using 164 accessions and 38,920 SNPs (30% missing imputed
genotypes) obtained by GBS. Numbers above branches represent bootstrap values,
with only values higher than 70% shown. Names given to clades refer to the
geographic origin and improvement status of the accessions of the D. carota
complex. Clades A and B correspond to the two main groups of the Daucus
phylogeny.
Additional file 5: Figure S3. Maximum likelihood reconstruction and structure
of the genetic diversity of 144 accessions of the Daucus carota complex and
outgroup using 43,713 SNPs (30% missing imputed genotypes) obtained by GBS.
Each accession is represented by a horizontal bar, and each color corresponds
to a population (nine in total). Numbers above branches represent bootstrap
values, with only values higher than 70% shown. Names given to clades refer
to the geographic origin and improvement status of the accessions of the D.
carota complex. The outgroup taxon is D. syrticus.
Additional file 6: Figure S4. Bayesian phylogenetic tree of 144 accessions of
the Daucus carota complex and outgroup using 18,565 SNPs (10% missing imputed
genotypes) obtained by GBS. Numbers above the branches represent posterior
probabilities, with only values higher than 0.7 shown. Names given to clades
refer to the geographic origin and improvement status of the accessions of
the D. carota complex. The outgroup taxon is D. syrticus.
Additional file 7: Figure S5. Relationships among 144 accessions of the Daucus
carota complex and outgroup from an exhaustive quartet sampling inference
using 18,565 SNPs (10% missing imputed genotypes) obtained by GBS. Numbers
above the branches represent bootstrap values, with only values higher than
70% shown. Names given to clades refer to the geographic origin and
improvement status of the accessions of the D. carota complex. ME & E refers
to Middle East & Europe. Accessions designated by double stars are misplaced
relative to the maximum likelihood topology of the Daucus carota complex using
the same number of SNPs. The outgroup taxon is D. syrticus.
Additional file 8: Figure S6. Relationships among 144 accessions of the Daucus
carota complex and outgroup from an exhaustive quartet sampling inference
using 43,713 SNPs (30% missing imputed genotypes) obtained by GBS. Numbers
above the branches represent bootstrap values, with only values higher than
70% shown. Names given to clades refer to the geographic origin and
improvement status of the accessions of the D. carota complex. ME & E refers
to Middle East & Europe. Accessions designated by double stars are misplaced
relative to the maximum likelihood topology of the Daucus carota complex using
the same number of SNPs. The outgroup taxon is D. syrticus.
Additional file 9: Figure S7. Species tree of the Daucus carota complex based
on a coalescent model using an exhaustive quartet sampling inference and
18,565 SNPs (10% missing imputed genotypes) obtained by GBS. Numbers above
the branches represent bootstrap values. The outgroup taxon is D. syrticus.
Additional file 10: Figure S8. Number of populations. A. Plot of Delta K (ΔK).
B. Plot of the log likelihood; internal plot corresponds to the log
likelihood (thousands) for K ranging from 1 to 9. All values were obtained
from STRUCTURE HARVESTER analysis. Fourteen populations were considered in a
data set of 18,565 SNPs (10% missing imputed genotypes) and 150 samples.
Additional file 11: Figure S9. Number of populations. A. Plot of Delta K (ΔK).
B. Plot of the log likelihood; internal plot corresponds to the log
likelihood (thousands) for K ranging from 1 to 9. All values were obtained
from STRUCTURE HARVESTER analysis. Fourteen populations were considered in a
data set of 43,713 SNPs (30% missing imputed genotypes) and 150 samples.
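The Delta K statistic plotted in Figures S8 and S9 is commonly computed as the absolute second difference of the mean log likelihood across successive values of K, divided by the standard deviation of the log likelihood among replicate runs at K. The following is a minimal sketch of that calculation under those assumptions; the function name delta_k and the toy log likelihoods are hypothetical and are not taken from this study or from the STRUCTURE HARVESTER source code.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(log_likelihoods: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute Delta K for every K that has both K-1 and K+1 available.

    log_likelihoods maps K to the log likelihoods of its replicate runs.
    Each K needs at least two replicates, otherwise stdev() raises an error.
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the smallest and largest K
        spread = stdev(log_likelihoods[k])
        if spread == 0:
            continue  # avoid division by zero when replicates are identical
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_difference / spread
    return result


# Hypothetical replicate log likelihoods for K = 1..4 (not values from the study).
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -792.0, -788.0],
    4: [-785.0, -786.0, -784.0],
}

scores = delta_k(toy_runs)
best_k = max(scores, key=scores.get)  # K with the largest Delta K
```

In this toy input the log likelihood rises sharply from K = 1 to K = 2 and then levels off, so the largest Delta K falls at K = 2. Delta K cannot be evaluated at the first or last K tested, which is why the sketch skips them.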
Additional file 12: Figure S10. Box plot analyses of the 23 morphological
characters examined for members of the Daucus carota complex (subsp. sativus
not included) in this study. Each box plot displays the median, 25th and 75th
percentiles, range, and outliers of the individual plant values.
SNP data set_Daucus_0.1 missing data_Arbizu et al
Data set containing 164 accessions with 10%
missing imputed genotypes.
SNP data set_Daucus_0.3 missing data_Arbizu et al
Data set containing 164 accessions with 30%
missing imputed genotypes.
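The two data sets differ in the share of missing genotype calls tolerated per SNP (10% or 30%). As a minimal sketch of that kind of threshold filter, assume a genotype matrix with one row per SNP, one column per accession, and None marking a missing call. The function name, the matrix layout, and the toy values are hypothetical and do not reproduce the pipeline used in the study.

```python
from typing import List, Optional, Sequence

# 0, 1 or 2 copies of the alternate allele; None marks a missing call.
Genotype = Optional[int]


def filter_snps_by_missing_rate(
    genotypes: Sequence[Sequence[Genotype]], max_missing: float
) -> List[int]:
    """Return indices of SNP rows whose fraction of missing calls is <= max_missing."""
    kept = []
    for index, calls in enumerate(genotypes):
        if not calls:
            continue  # skip empty rows rather than divide by zero
        missing_rate = sum(call is None for call in calls) / len(calls)
        if missing_rate <= max_missing:
            kept.append(index)
    return kept


# Toy matrix: 3 SNPs scored in 10 accessions (hypothetical values).
toy = [
    [0, 1, 2, 0, 0, 1, 1, 2, 0, None],            # 10% missing
    [0, None, None, 0, 1, None, 1, 2, 0, 1],      # 30% missing
    [None, None, None, None, 1, 0, 0, 2, 1, 1],   # 40% missing
]

strict = filter_snps_by_missing_rate(toy, 0.10)   # keeps only the first SNP
relaxed = filter_snps_by_missing_rate(toy, 0.30)  # keeps the first two SNPs
```

Raising the threshold retains more SNPs at the cost of more missing calls per site, which matches the direction of the counts reported for the 10% and 30% data sets.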