Data from: Multiple genotypes of Phelipanche ramosa indicate repeated introductions to the Americas: Sequence alignments and phylogenetic trees
Data files
Dec 10, 2024 version files (2.53 MB)
- cpDNA_raxml.tre (5.17 KB)
- cpDNA.fasta (2.06 MB)
- ITS_raxml.tre (4.74 KB)
- ITS.fasta (69.66 KB)
- nrDNA_raxml.tre (1.14 KB)
- nrDNA.fasta (319.62 KB)
- README.md (5.64 KB)
- Specimen_data.csv (5.90 KB)
- Specimen_data.xlsx (15.77 KB)
- trnLF_raxml.tre (2.80 KB)
- trnLF.fasta (42.27 KB)
Dec 26, 2024 version files (2.53 MB)
- cpDNA_raxml.tre (5.17 KB)
- cpDNA.fasta (2.06 MB)
- ITS_raxml.tre (4.74 KB)
- ITS.fasta (69.66 KB)
- nrDNA_raxml.tre (1.14 KB)
- nrDNA.fasta (319.62 KB)
- README.md (6.93 KB)
- Specimen_data.csv (5.90 KB)
- Specimen_data.xlsx (15.77 KB)
- trnLF_raxml.tre (2.80 KB)
- trnLF.fasta (42.27 KB)
Abstract
Premise: Phelipanche ramosa is an economically damaging parasitic plant that has been reported in North America since the late 1800s. While this species comprises a variety of genetically distinct host races in its native range, the genetic composition of adventive populations in the New World remains unexplored. On the basis of morphological and ecological variation, some have suggested that the closely related P. nana may also be present.
Methods: Genome skimming was used to assess the relationships of 30 populations of Phelipanche spanning the geographic and host ranges in North and South America, plus one P. nana reference population from Lebanon.
Results: Phylogenetic analysis indicated four distinct genetic groups, though plastome and nrDNA data supported conflicting signals of relationships among them. First, specimens from Chilean tomato fields were nearly indistinguishable genetically from the reference P. nana. Second, a pair of samples from Virginia showed nrDNA similar to that of the first group, but divergent plastomes. The remaining 24 samples sorted into two groups, one that parasitizes cultivated plants, especially tomato, and another that parasitizes roadside weeds in different parts of the United States.
Conclusions: The geographic and ecological cohesiveness of four distinct genetic groups supports a hypothesis of multiple introductions to the Americas, presumably from Eurasia, followed by little to no subsequent gene flow among them. However, such groups do not align with existing morphological or ecological species concepts for P. ramosa and P. nana. In practice, the threat that Phelipanche populations pose to agricultural settings should be assessed regionally, given the phylogeographic and ecological heterogeneity.
README: Data from: Multiple genotypes of Phelipanche ramosa indicate repeated introductions to the Americas
https://doi.org/10.5061/dryad.cvdncjtdg
Description of the data and file structure
Thirty-one specimens of Phelipanche were collected across the known range of the species. For 26 of these, dried flowers or an inflorescence tip from single individuals were ground, and DNA was extracted using a DNeasy Plant Pro kit. DNA extractions of the remaining five, all from Texas, were shared by Chris Randle (Sam Houston State University, Texas, USA). Genomic DNA was enzymatically fragmented and sequenced using 2 × 150 bp paired-end Illumina sequencing.
From each set of reads, the nuclear ribosomal repeat and plastome were assembled de novo using GetOrganelle v.1.7.4.1. When multiple contigs were returned instead of one complete sequence, these were joined by comparison with congeneric reference sequences. Multiple sequence alignments were generated using MAFFT v.7.505, and phylogenetic analyses of the respective nrDNA and plastome alignments were performed using RAxML-NG v.1.2.2.
ITS and trnLF alignments were generated by extracting the respective regions from the nrDNA and plastome alignments and adding existing sequences from three sources: the curated alignments of Piwowarczyk et al. (2021), 22 vouchered samples from cultivated tomato fields in California (GenBank # OR690545–OR690569; voucher specimens at California Department of Food and Agriculture herbarium), and 6 specimens from tobacco fields in Bulgaria (GenBank # MK024283–MK024290). Phylogenetic trees were generated as above.
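The region-extraction step described above can be pictured as slicing a block of columns out of a larger alignment. The following is only a minimal sketch of that idea, not the procedure actually used for this dataset: the function names `parse_fasta` and `extract_region`, the demo sequences, and the column coordinates are all hypothetical, and the real ITS and trnLF boundaries are not stated in this README.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def extract_region(alignment: Dict[str, str], start: int, end: int) -> Dict[str, str]:
    """Keep alignment columns [start, end) and drop samples with no data there."""
    sliced = {name: seq[start:end] for name, seq in alignment.items()}
    # A sample whose slice is only gaps or missing-data symbols is omitted.
    return {name: seq for name, seq in sliced.items() if seq.strip("-Nn?")}


# Placeholder alignment; not real data from nrDNA.fasta or cpDNA.fasta.
demo = ">sampleA\nACGT--ACGT\n>sampleB\n----TTAC--\n"
aln = parse_fasta(demo)
region = extract_region(aln, 0, 4)  # hypothetical coordinates
print(region)
```

Because the slice is taken on alignment columns, every retained sequence stays aligned with the others, so newly added sequences would still need to be realigned against the extracted block.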
Files and variables
File: nrDNA_raxml.tre
Description: Maximum likelihood phylogeny (Newick format) of 30 samples of Phelipanche populations in the Americas and one reference P. nana from Lebanon with bootstrap support (%) indicated at nodes. Inferred using the nuclear ribosomal tandem repeat.
File: nrDNA.fasta
Description: Multiple sequence alignment of the nuclear ribosomal tandem repeat of 30 samples of Phelipanche populations in the Americas and one reference P. nana from Lebanon.
File: ITS_raxml.tre
Description: Maximum likelihood phylogeny (Newick format) of 111 samples of Phelipanche spp. with bootstrap support (%) indicated at nodes. Inferred using the internal transcribed spacer (ITS) locus.
File: ITS.fasta
Description: Multiple sequence alignment of the internal transcribed spacer (ITS) locus from 111 samples of Phelipanche spp.
File: trnLF_raxml.tre
Description: Maximum likelihood phylogeny (Newick format) of 56 samples of Phelipanche spp. with bootstrap support (%) indicated at nodes. Inferred using the plastid trnL–trnF spacer.
File: trnLF.fasta
Description: Multiple sequence alignment of the plastid trnL–trnF spacer from 56 samples of Phelipanche spp.
File: cpDNA_raxml.tre
Description: Maximum likelihood phylogeny (Newick format) of 33 samples of Phelipanche spp. with bootstrap support (%) indicated at nodes. Inferred using the plastid genome.
File: cpDNA.fasta
Description: Multiple sequence alignment of the nearly complete plastid genome from 33 samples of Phelipanche spp.: thirty from the Americas plus reference samples of European P. nana, P. ramosa, and P. aegyptiaca.
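Since each tree above is paired with the alignment it was inferred from, one quick sanity check is to confirm that the tip labels in a .tre file match the sequence names in the corresponding .fasta file. The sketch below uses regular expressions rather than a full Newick parser, so it assumes unquoted tip labels and numeric support values written directly after each closing parenthesis; the function names and the demo tree are illustrative, not taken from the dataset.

```python
import re
from typing import List, Set


def newick_tips(newick: str) -> Set[str]:
    """Collect tip labels: names that directly follow '(' or ','."""
    found = re.findall(r"[(,]\s*([^():,;\s][^():,;]*)", newick)
    return {label.strip() for label in found}


def newick_supports(newick: str) -> List[float]:
    """Collect node support values: numbers that directly follow ')'."""
    return [float(x) for x in re.findall(r"\)\s*(\d+(?:\.\d+)?)", newick)]


def fasta_names(fasta_text: str) -> Set[str]:
    """Collect sequence names from FASTA header lines."""
    return {line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")}


# Placeholder tree and alignment; not real data from this dataset.
demo_tree = "((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.3)100:0.02);"
demo_fasta = ">A\nAC\n>B\nAC\n>C\nAC\n>E\nAC\n"

tips = newick_tips(demo_tree)
names = fasta_names(demo_fasta)
supports = newick_supports(demo_tree)

print("Tips missing from alignment:", sorted(tips - names))
print("Sequences missing from tree:", sorted(names - tips))
print("Bootstrap supports:", supports)
```

For real use, the file contents would be read with something like `open("nrDNA_raxml.tre").read()`. A dedicated phylogenetics library would be a more robust choice if the labels turn out to contain quotes or special characters.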
File: Specimen_data.xlsx
Description: Specimen data, including voucher information, for the 31 specimens of Phelipanche spp. newly sequenced for this study (NCBI BioProject PRJNA1176015).
Variables
- Tip Name – Sample name that appears in sequence alignments and phylogenetic trees.
- Library ID – The internal Schneider lab code for Illumina library preps. Corresponds to the sample names used on the NCBI Sequence Read Archive for this project (PRJNA1176015).
- cpDNA – NCBI GenBank accession number for the plastome sequence.
- nrDNA – NCBI GenBank accession number for the nuclear ribosomal sequence.
- collected_by – Individual(s) who collected the specimen, and collection number if assigned.
- collection_date – Date of specimen collection in day-month-year format.
- geo_loc_name – Geographic locality where specimen was collected.
- host – The host or putative hosts of the specimen or population that the specimen represents.
- specimen_voucher – The herbarium in which the specimen is currently stored. Provided in the format "institution-code:collection-code:accession or barcode number". Also see Table 2 of Schneider 2025.
- latitude – The latitude where the specimen was collected.
- longitude – The longitude where the specimen was collected.
- BioSample – The NCBI BioSample record assigned to this specimen.
- Illumina_read_archive – The NCBI sequence read archive experiment record, from which the raw sequence reads can be accessed and downloaded.
- Total Illumina Reads – The total number of paired 150 bp reads generated for the sample.
File: Specimen_data.csv
Description: Same as above, in csv format.
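As an illustration of how the variables listed above might be read programmatically, the sketch below loads the table with Python's standard `csv` module, converts the coordinates to numbers, and splits `specimen_voucher` into its three colon-separated parts. It assumes the CSV header row uses exactly the variable names given in this README, which has not been verified here; the helper names and the demo rows are placeholders, not real records.

```python
import csv
import io
from typing import Dict, List, NamedTuple, Optional


class Voucher(NamedTuple):
    institution: str
    collection: str
    accession: str


def parse_voucher(value: str) -> Optional[Voucher]:
    """Split 'institution-code:collection-code:accession' into its parts."""
    parts = [p.strip() for p in value.split(":", 2)]
    if len(parts) != 3:
        return None  # value does not follow the documented three-part format
    return Voucher(*parts)


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert a coordinate to float, returning None for blank or invalid cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_specimens(handle) -> List[Dict[str, object]]:
    """Read specimen rows and add parsed voucher and numeric coordinates."""
    rows = []
    for raw in csv.DictReader(handle):
        row: Dict[str, object] = dict(raw)
        row["voucher"] = parse_voucher(raw.get("specimen_voucher") or "")
        row["latitude"] = to_float(raw.get("latitude"))
        row["longitude"] = to_float(raw.get("longitude"))
        rows.append(row)
    return rows


# Placeholder rows for demonstration only; not real specimen data.
demo_csv = io.StringIO(
    "Tip Name,specimen_voucher,latitude,longitude\n"
    "demo_sample,XXX:Herbarium:000001,12.5,-70.25\n"
    "demo_sample2,unknown,,\n"
)
specimens = load_specimens(demo_csv)
for s in specimens:
    print(s["Tip Name"], s["voucher"], s["latitude"], s["longitude"])
```

With the real file, the handle would come from `open("Specimen_data.csv", newline="", encoding="utf-8-sig")`; the `utf-8-sig` codec is a precaution in case the export carries a byte-order mark.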
File: rev_Fig1_map.pdf
Description: Figure 1: Georeferenced records of Phelipanche spp. in the United States, Cuba, and Chile (blue circles and triangles) with newly sequenced samples indicated by red diamonds. Data from GBIF (2024) with four iNaturalist and herbarium records from Chile manually added. Conflicting species determinations in the southern and eastern United States reflect controversy in identification rather than the presence of two co-occurring species.
File: rev_Fig2_Phylogeny.pdf
Description: Figure 2: Nuclear ribosomal repeat (nrDNA) and plastome phylogenies of 31 samples of Phelipanche ramosa from the United States and Chile and a reference sample of P. nana from Lebanon. Tip labels indicate collector and locality, with voucher information available in Table 2. Bootstrap support of selected nodes shown. Inset: Phylograms comparing samples shown in the main nrDNA and plastome panels (black tip labels) with a curated selection of previously published Phelipanche spp. (gray labels showing GenBank accession and taxon). Rooting follows Piwowarczyk et al. (2021). AL, Alabama; CA, California; LA, Louisiana; TX, Texas; VA, Virginia.
Access information
A preprint of the accepted article is available on Zenodo (doi: 10.5281/zenodo.14556553).
Other publicly accessible locations of the data:
- Raw Illumina reads are archived in the NCBI Sequence Read Archive (BioProject PRJNA1176015)
- Assembled nrDNA and plastome sequences are available on GenBank (PQ611155–PQ611185 and PQ479106–PQ479136)
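To retrieve the assembled sequences individually, the two accession ranges above can be expanded into explicit lists. The helper below is a small sketch that assumes every accession in a range shares the same letter prefix and consecutive numbering; `expand_accessions` is an illustrative name, not part of any NCBI tool.

```python
import re
from typing import List


def expand_accessions(first: str, last: str) -> List[str]:
    """Expand an inclusive accession range such as PQ611155 to PQ611185."""
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share the same letter prefix")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # preserve any zero padding
    start, stop = int(m_first.group(2)), int(m_last.group(2))
    if stop < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


range_a = expand_accessions("PQ611155", "PQ611185")
range_b = expand_accessions("PQ479106", "PQ479136")
print(len(range_a), range_a[0], range_a[-1])
print(len(range_b), range_b[0], range_b[-1])
```

Each range expands to 31 accessions, which matches the 31 newly sequenced specimens. The mapping of individual accessions to samples is given by the cpDNA and nrDNA columns of the specimen table.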
Methods
Thirty-one specimens of Phelipanche were collected across the known range of the species. For 26 of these, dried flowers or an inflorescence tip from single individuals were ground, and DNA was extracted using a DNeasy Plant Pro kit. DNA extractions of the remaining five, all from Texas, were shared by Chris Randle (Sam Houston State University, Texas, USA). Genomic DNA was enzymatically fragmented and sequenced using 2 × 150 bp paired-end Illumina sequencing.
From each set of reads, the nuclear ribosomal repeat and plastome were assembled de novo using GetOrganelle v.1.7.4.1. When multiple contigs were returned instead of one complete sequence, these were joined by comparison with congeneric reference sequences. Multiple sequence alignments were generated using MAFFT v.7.505, and phylogenetic analyses of the respective nrDNA and plastome alignments were performed using RAxML-NG v.1.2.2.
ITS and trnLF alignments were generated by extracting the respective regions from the nrDNA and plastome alignments and adding existing sequences from three sources: the curated alignments of Piwowarczyk et al. (2021), 22 vouchered samples from cultivated tomato fields in California (GenBank # OR690545–OR690569; voucher specimens at CDA), and 6 specimens from tobacco fields in Bulgaria (GenBank # MK024283–MK024290). Phylogenetic trees were generated as above.