Revisiting the origin of octoploid strawberry
Liston, Aaron et al. (2019), Revisiting the origin of octoploid strawberry, Dryad, Dataset, https://doi.org/10.5061/dryad.ncjsxksqj
The cultivated strawberry (Fragaria ×ananassa) is an octoploid, and the identity of its four subgenomes has long been a mystery. In their recent strawberry genome publication, Edger et al. present a novel hypothesis: each subgenome originated from a different extant diploid progenitor, and the hexaploid species Fragaria moschata was a direct ancestor. We reanalyzed the four octoploid subgenomes in a phylogenomic context, and our results support only two extant diploid progenitors; we also found no support for F. moschata as a direct ancestor. We identified assumptions in the Edger et al. tree-searching algorithm that prevent it from accepting extinct or unsampled progenitors, and we argue that this is a critical weakness of their approach.
Phylogenomic Analysis of the Octoploid Subgenomes
Illumina sequencing libraries were prepared from leaf tissue of six diploid Fragaria species, and 9–32X genomic coverage was obtained (Supplementary Table 1). Fragaria vesca subsp. bracteata, F. iinumae and F. nipponica were sequenced at the Oregon State University Center for Genome Research and Biocomputing Central Services Lab as 100 bp single-end reads on a HiSeq 3000. Fragaria mandshurica, F. nilgerrensis and F. nipponica were sequenced at Berry Genomics (Hangzhou, China) as 150 bp paired-end reads on a HiSeq 2500. We removed adapters and low-quality portions of reads using Trimmomatic (v 0.35)1 with the settings LEADING:20, TRAILING:20, SLIDINGWINDOW:5:20 and MINLEN:50. The genome assemblies of octoploid Fragaria ×ananassa ‘Camarosa’, diploid F. bucharica (published as F. nubicola)2, and the diploid outgroup Potentilla micrantha3 were downloaded from the Genome Database for Rosaceae (https://www.rosaceae.org/). The ‘Camarosa’ assembly was subdivided according to the Edger et al.4 subgenome assignment into four sets of seven chromosomes. These assemblies were then converted into random 100 bp reads at 20X genomic coverage using BBTools randomreads.sh (https://jgi.doe.gov/data-and-tools/bbtools/). All of the above sets of reads were aligned with BWA (v 0.7.12)5 to the Fragaria vesca v 4.1 genome assembly6 after repetitive regions of that assembly had been masked. Masking was performed with BBTools bbmask.sh (https://jgi.doe.gov/data-and-tools/bbtools/), targeting sequences that align to the F. vesca v 4.1 transposable element library downloaded from the Genome Database for Rosaceae (https://www.rosaceae.org/). In total, 4.5% of the genome was masked. This is lower than the 31% transposable element (TE) content reported in Supplementary Table 4 of the F. vesca genome publication6, but similar to the value inferred from Figure 1 of the same publication.
We suggest this discrepancy is due to the inclusion of partial and degenerate TEs in the calculation used in the table, while the figure more accurately represents the fraction of non-unique sequence in the F. vesca reference genome.
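The conversion of the haploid assemblies into random 100 bp reads at 20X coverage can be illustrated with a short Python sketch. It shows only the coverage arithmetic (number of reads N = C × G / L for coverage C, sequence length G and read length L) and uniform sampling of read start positions. It assumes single-strand, error-free reads and is not how randomreads.sh is implemented. The function name simulate_reads is hypothetical.

```python
import math
import random
from typing import List


def simulate_reads(sequence: str, coverage: float = 20.0,
                   read_len: int = 100, seed: int = 1) -> List[str]:
    """Sample fixed-length reads uniformly at random from one sequence.

    Number of reads = ceil(coverage * len(sequence) / read_len).
    Simplified sketch: forward strand only, no sequencing errors.
    """
    if len(sequence) < read_len:
        return []  # sequence too short to yield a full-length read
    n_reads = math.ceil(coverage * len(sequence) / read_len)
    rng = random.Random(seed)  # seeded for reproducibility
    max_start = len(sequence) - read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, max_start)  # both ends inclusive
        reads.append(sequence[start:start + read_len])
    return reads
```

For a 1,000 bp sequence at 20X, the sketch draws 20 × 1,000 / 100 = 200 reads of 100 bp each.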
The twelve resulting alignments were converted to a variant call format (vcf) file using the SAMtools (v 1.9)7 mpileup and call options with default settings. The vcf file was converted into a multisample variant format (mvf) file using MVFtools (v 0.5.1.4)8. Sites with mapping quality below 20 and coverage below 3 were excluded. All heterozygous sites were converted to N, to account for the fact that the octoploid subgenome and outgroup sequences were derived from haploid genome assemblies. MVFtools was used to automate maximum-likelihood (ML) estimates of phylogeny using RAxML (v 8.2.12)9 with the GTR+Γ model of sequence evolution and 100 bootstrap replicates. A taxon was excluded from the analysis of a 100 kb window if it had <10% of aligned sites. Analyses were conducted for the seven base chromosomes (Fig. 1b, Supplementary Fig. 1) and for 2191 non-overlapping 100 kb windows across the seven chromosomes (Figs. 1c, 2, Supplementary Table 4). The phylogenetic position of each subgenome relative to the diploid species was recorded (see Fig. 2 for details) and summarized for each base chromosome (Supplementary Table 5). Homeologous exchange was inferred when the ‘Camarosa vesca’ subgenome shared a most recent common ancestor (MRCA) with F. iinumae, or when any of the other three subgenomes shared an MRCA with the F. vesca clade (F. vesca, F. mandshurica, F. bucharica).
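The per-window filtering described above can be sketched in Python. The sketch makes the following assumptions:

* Each taxon's calls for a window are stored as an aligned string, with heterozygous calls written as IUPAC ambiguity codes.
* The 10% rule is read as the fraction of sites carrying an A, C, G or T call.
* Whether a shorter final window at a chromosome end was analyzed is also an assumption.

The function names are hypothetical, and this is not MVFtools code.

```python
from typing import Dict, Iterator, Tuple

# IUPAC codes for two or more bases; treated here as heterozygous calls.
AMBIGUOUS = set("RYSWKMBDHV")


def window_bounds(chrom_length: int, size: int = 100_000) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) of non-overlapping windows; the last may be shorter."""
    for start in range(0, chrom_length, size):
        yield start, min(start + size, chrom_length)


def mask_heterozygous(seq: str) -> str:
    """Replace ambiguity-coded (heterozygous) sites with N."""
    return "".join("N" if base in AMBIGUOUS else base for base in seq.upper())


def filter_window(window: Dict[str, str], min_fraction: float = 0.10) -> Dict[str, str]:
    """Mask heterozygous sites, then drop taxa with too few called sites.

    `window` maps taxon name -> aligned sequence for one window.
    A site counts as called if it is A, C, G or T.
    """
    kept = {}
    for taxon, seq in window.items():
        masked = mask_heterozygous(seq)
        called = sum(base in "ACGT" for base in masked)
        if masked and called / len(masked) >= min_fraction:
            kept[taxon] = masked
    return kept
```

Under these assumptions, a taxon with calls at only 5 of 100,000 sites in a window is dropped from that window's tree, while the remaining taxa are passed to tree inference.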
Fragaria moschata Linkage Mapping and Phylogenetic Inference
We used an F1 cross for linkage mapping of the hexaploid F. moschata. Our F1 mapping population was derived from two parental plants collected in Slovenia (46.6827°N, 16.2951°E). Following our previous protocols10, seeds of the experimental cross (N = 192) were planted in a custom soil mixture (Fafard 4 and sand at 2:1, top-dressed with Sunshine Redi-earth Plug & Seedling; Sun Gro Horticulture, Agawam, MA, USA). They were grown in a growth chamber at the University of Pittsburgh for 11 weeks under 16 °C/21 °C night/day temperatures and a 14-h photoperiod. We selected a random subset (N = 46) of the F1 progeny for targeted sequence capture.
Targeted sequence capture was performed using previously developed Fragaria baits (v 2.0)10,11. These 20,000 capture baits, each 100 bp long, are distributed approximately at random across the seven base chromosomes (1–7). DNA was isolated from silica-dried leaf tissue of the 46 progeny and the two parents at Ag-Biotech (Monterey, CA). We constructed individually indexed genomic libraries using the NEBNext Ultra DNA Library Prep Kit (New England BioLabs, Ipswich, MA, USA). The libraries were then target enriched10 and sequenced as 150 bp paired-end reads on one-third of a HiSeq 3000 lane at the Oregon State University Center for Genome Research and Biocomputing Central Services Lab.
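The bait design implies a target territory of 20,000 × 100 bp = 2 Mb. The back-of-the-envelope Python sketch below relates that territory to the expected depth per library when one-third of a lane is shared by 48 libraries (46 progeny and two parents).

Lane yield and on-target fraction are not reported here, so both are hypothetical inputs. The calculation also ignores flanking sequence captured beyond the baits and PCR duplicates. The function name is hypothetical.

```python
def expected_target_depth(lane_pairs: float, lane_fraction: float, n_libraries: int,
                          read_len: int, on_target: float,
                          n_baits: int = 20_000, bait_len: int = 100) -> float:
    """Approximate mean depth over the bait territory for one library.

    All inputs except the bait count and bait length are hypothetical.
    """
    target_bp = n_baits * bait_len  # 20,000 x 100 bp = 2,000,000 bp
    pairs_per_library = lane_pairs * lane_fraction / n_libraries
    bases_on_target = pairs_per_library * 2 * read_len * on_target  # two reads per pair
    return bases_on_target / target_bp


if __name__ == "__main__":
    # Hypothetical inputs: 288 million read pairs per lane, 50% of bases on target.
    depth = expected_target_depth(lane_pairs=288e6, lane_fraction=1 / 3,
                                  n_libraries=48, read_len=150, on_target=0.5)
    print(f"{depth:.0f}X")
```

With these hypothetical inputs, each library receives 2 million read pairs, which works out to roughly 150X over the 2 Mb target. Real values depend on the actual run.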
The hexaploid linkage mapping involved four steps: quality filtering of paired-end capture reads, mapping of reads to the above-mentioned diploid F. vesca v 4.1 genome assembly, genotype calling in polyploids using POLiMAPS (v 1.1)12, and linkage mapping using OneMap13. First, we removed adapters and low-quality portions of paired-end reads using Trimmomatic (v 0.35)1 as described above, and merged the paired-end reads using PEAR (v 0.9.6)14 with a minimum overlap of 20 bp. Second, we mapped both the merged and unmerged paired-end reads to the F. vesca v 4.1 reference using BWA (v 0.7.12)5. Sorted BAM files were generated with SAMtools (v 1.9)7 and used to create the mpileup file. Third, from the mpileup file, we conducted polyploid genotype calling with POLiMAPS (v 1.1)12. Heterozygous and homozygous loci were identified with the default parameters, except that a depth of ≥32X per progeny was required for the hexaploid. Lastly, to construct maternal and paternal linkage groups (LGs), we assigned SNPs to the most likely LGs using a logarithm of odds (LOD) threshold of 5 in OneMap13 under R (v 3.3.3)15. LGs with at least 20 SNPs were used for subsequent analyses.
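The LOD-based grouping step can be illustrated with a simplified two-point model in Python. The sketch makes these assumptions:

* Each SNP is scored 0/1 in the progeny and segregates 1:1 from a single parent (a pseudo-testcross), and either linkage phase may apply.
* For R recombinants among N scored progeny, r = R/N and LOD = N·log10(2) + R·log10(r) + (N - R)·log10(1 - r).
* Markers whose LOD reaches 5 are joined, and only groups of at least 20 SNPs are kept.

This is not OneMap's algorithm. OneMap models the full range of outcross segregation types and also applies a maximum recombination fraction when grouping. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence


def two_point_lod(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """LOD for linkage versus free recombination (r = 0.5) between two 0/1 markers.

    Progeny with a missing score (None) in either marker are skipped.
    Either phase is allowed, so recombinants = min(mismatches, matches).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    mismatches = sum(x != y for x, y in pairs)
    rec = min(mismatches, n - mismatches)
    r = rec / n
    # log10 L(r) - log10 L(0.5); terms with a zero count contribute nothing
    lod = n * math.log10(2)
    if rec:
        lod += rec * math.log10(r)
    if n - rec:
        lod += (n - rec) * math.log10(1 - r)
    return lod


def linkage_groups(markers: Dict[str, Sequence[Optional[int]]],
                   lod_threshold: float = 5.0, min_size: int = 20) -> List[List[str]]:
    """Join markers linked at LOD >= threshold (union-find); keep large groups."""
    names = list(markers)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for m1, m2 in combinations(names, 2):
        if two_point_lod(markers[m1], markers[m2]) >= lod_threshold:
            parent[find(m1)] = find(m2)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return [g for g in groups.values() if len(g) >= min_size]
```

Consider 46 progeny, as in this study. Two markers with identical scores give LOD = 46·log10(2) ≈ 13.8, well above the threshold of 5. Two markers that disagree in exactly half the progeny give LOD = 0.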
To infer the phylogenetic placement of F. moschata in relation to the octoploid strawberry and other diploid Fragaria, we first extracted the quality-filtered reads that contained the above LG SNPs. We mapped these reads to the F. vesca v 4.1 reference to generate consensus LG sequences using POLiMAPS (v 1.1). We next used POLiMAPS (v 1.1) and the variant call format file described in the Phylogenomic Analysis of the Octoploid Subgenomes section above to generate multiple sequence alignments. These alignments included the F. moschata LG sequences, the four octoploid reference subgenomes, the diploid Fragaria genomes and the outgroup Potentilla. Phylogenetic inference was conducted for each of chromosomes 1–7, and separately for maternal and paternal LGs, using the ML method with the GTR+Γ model and 100 bootstrap replicates in RAxML (v 8.0.26)9.
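RAxML reads alignments in PHYLIP format, which is also the format of the alignment files in this dataset. The minimal Python sketch below writes a sequential, relaxed PHYLIP file (taxon name and sequence separated by a space) from a name-to-sequence mapping. The function name is hypothetical, and taxon names must not contain whitespace.

```python
from typing import Dict


def to_relaxed_phylip(alignment: Dict[str, str]) -> str:
    """Return a sequential relaxed-PHYLIP string for an aligned name -> sequence map."""
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not all the same length")
    n_sites = lengths.pop()
    lines = [f"{len(alignment)} {n_sites}"]  # header: number of taxa, number of sites
    for name, seq in alignment.items():
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid taxon name: {name!r}")
        lines.append(f"{name} {seq}")
    return "\n".join(lines) + "\n"
```

One possible RAxML v8 invocation is raxmlHPC -f a -m GTRGAMMA -p 12345 -x 12345 -# 100 -s alignment.phy -n run1. It combines 100 rapid bootstrap replicates with an ML search under GTR+Γ. The exact commands used for these analyses are not given here, and the file names and seeds are placeholders.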
Alignment files (phylip format) and resulting phylogenetic trees (nexus format).
A. Analysis of Fragaria moschata subgenomes:
B. Analysis of Fragaria x ananassa 'Camarosa' subgenomes. The nexus trees, with branch lengths and bootstrap values, are in column 4 of the tree-results.txt files; the header (first line) of each file names the other fields.
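A short Python sketch for pulling the trees out of a tree-results.txt file follows. It takes field names from the header line and returns column 4 (the tree) for each row. It assumes the file is tab-delimited; that delimiter, the function name and the "tree_string" key are assumptions, so check them against the actual files.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_tree_results(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, keyed by the header fields.

    The tree string from column 4 (index 3) is also stored under "tree_string".
    """
    # QUOTE_NONE keeps any quote characters inside tree strings untouched
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or truncated lines
        record = dict(zip(header, row))
        record["tree_string"] = row[3]
        yield record


if __name__ == "__main__":
    # Hypothetical example; real column names come from each file's header.
    example = "contig\tstart\tlength\ttree\nFvb1\t0\t100000\t(a:0.1,b:0.2);\n"
    for rec in read_tree_results(io.StringIO(example)):
        print(rec["contig"], rec["tree_string"])
```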
Key to codes used in octoploid subgenome alignment files:
National Science Foundation, Award: DEB 1241217
National Science Foundation, Award: DEB 1241006