Data from: Chloroplast genomes of six Colocasia species (Araceae) including taro
Data files
Jul 22, 2024 version files 4.47 MB
- README.md
- Set1_CDS_x19samples_6spp.phy.fasta
- Set2_Complete_x18samples_5spp.phy.fasta
- Set3_Intergenic_x16samples_4spp.phy.fasta
Abstract
Chloroplast genome diversity in taro (Colocasia esculenta) and one other Colocasia species (C. formosana) was previously analysed using concatenated sequences produced by sequencing six polymorphic loci. Here we present complete chloroplast genome sequences for 19 samples (17 new) from six species of Colocasia (Araceae): C. esculenta, C. lihengiae, C. formosana, C. spongifolia, C. oresbia and C. fallax. Three different alignments of these sequences were prepared for a report entitled "Chloroplast capture and range extension after hybridization in taro (Colocasia esculenta)" (Matthews et al. 2024, in press). The alignments show that C. fallax is a distant outgroup and C. oresbia a near outgroup for C. esculenta (including wild and cultivated forms) and a group of closely related wild species (C. lihengiae, C. formosana, C. spongifolia).
The alignments may be useful for future studies involving (1) further populations of each species in the present study, and (2) Colocasia species not represented here.
This work was mainly carried out as part of "Mapping Genetic Diversity in Taro to Test Domestication Theories", an international research project supported by the Japan Society for the Promotion of Science (JSPS Kakenhi No. 17H04614, awarded to P. J. Matthews, with funding for the period 1st April 2017 to 31st March 2021). Other funding sources for related work carried out before, during and after this period are also noted here.
README: Chloroplast genomes from six Colocasia species (Araceae) including taro
https://doi.org/10.5061/dryad.3n5tb2rqk
Complete chloroplast genome sequences (161,252 bp to 162,644 bp) were obtained for 19 samples (17 new) from six Colocasia species: the wild species C. lihengiae (CLI), C. formosana (CFO), C. spongifolia (CSF), C. oresbia (COR) and C. fallax (CFA), and wild and cultivated forms of C. esculenta (CES, taro). Three sets of alignments prepared from these sequences are reported here. Each set includes two edited sequences based on previously reported, complete chloroplast sequences from New Zealand (GenBank numbers JN105689, JN105690; Ahmed et al. 2012). The main article supported by the present alignments provides taxonomic information, sample history, photographs of sampled plants, populations and habitats, and interpretation: Matthews et al. (2024, in press) "Chloroplast capture and range extension after hybridization in taro (Colocasia esculenta)". Ecology and Evolution (further details pending).
Description of the data and file structure
Three sets of data are provided here. For each set, the Inverted Repeat IRa was removed before analysis, as explained in the main text. Gaps in the alignments were removed during each analysis reported in the main paper.
Set1_CDS_x19samples_6spp.phy.fasta
Protein-coding sequences (CDS) from all 19 samples (six species), concatenated into a single sequence for each sample.
Set2_Complete_x18samples_5spp.phy.fasta
Complete genome sequences from five species and 18 samples (not including far-outgroup C. fallax).
Set3_Intergenic_x16samples_4spp.phy.fasta
Intergenic sequences from four species and 16 samples (not including C. fallax and near-outgroup C. oresbia).
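The following is a minimal Python sketch for checking a downloaded alignment against the sample counts given above. It assumes the files are plain FASTA text (as the `.fasta` extension suggests); the function names `parse_fasta` and `summarize` are illustrative and not part of the dataset.

```python
def parse_fasta(lines):
    """Map each FASTA header (without '>') to its joined sequence.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    parts = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            parts[header].append(line)
    return {name: "".join(chunks) for name, chunks in parts.items()}


def summarize(records):
    """Report the number of sequences and whether all share one length."""
    lengths = sorted({len(seq) for seq in records.values()})
    return {
        "n_sequences": len(records),
        "lengths": lengths,
        "is_aligned": len(lengths) == 1,
    }
```

For example, `summarize(parse_fasta(open("Set1_CDS_x19samples_6spp.phy.fasta")))` would be expected to report 19 sequences for Set 1, 18 for Set 2 and 16 for Set 3, each with a single shared length if the file is a true alignment.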
Sample labels used in the FASTA files are explained in the next paragraph and listed in the table below, which is based on the full sample list provided in the main article. Complete and partial chloroplast sequences for samples CESNZ03 (var. GP, line 1) and CESNZ02 (var. RR, line 6) were previously reported by Ahmed et al. (2012, 2020, references below).
The sample labels use an alphanumeric identity code representing (1) species name initials, (2) country name, and then a number for each species-country, e.g. CESNZ03 = "Colocasia esculenta (CES), New Zealand (NZ), and third sample reported (03)". Note that the samples CESNZ01 and CESNZ02 exist but were not used to generate the present data set; they were reported in Ahmed et al. (2020). AG refers to the laboratory of Alphagenomics Co. Ltd, Pakistan, where all sequences were assembled from the original Illumina reads.
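The label scheme described above can be decoded with a short Python sketch. The species codes are those given in this README; the `Cx` prefix is taken from the table entry CxVN01. The regular expression, the function name `parse_label`, and the assumption that every label ends in a two-digit number are illustrative only, and the sequence headers in the files may carry additional text.

```python
import re

# Species code (three capitals, or "Cx" as in CxVN01), a two-letter
# country code, then a two-digit sample number.
LABEL_PATTERN = re.compile(
    r"^(?P<species>[A-Z]{3}|Cx)(?P<country>[A-Z]{2})(?P<number>\d{2})$"
)

# Species codes as listed in this README.
SPECIES_CODES = {
    "CES": "C. esculenta",
    "CLI": "C. lihengiae",
    "CFO": "C. formosana",
    "CSF": "C. spongifolia",
    "COR": "C. oresbia",
    "CFA": "C. fallax",
    "Cx": "C. lihengiae x (possible hybrid)",
}


def parse_label(label):
    """Split a sample label into species code, country code and number."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognised sample label: {label!r}")
    code = match.group("species")
    return {
        "species_code": code,
        "species": SPECIES_CODES.get(code, "unknown"),
        "country": match.group("country"),
        "number": int(match.group("number")),
    }
```

With this sketch, `parse_label("CESNZ03")` yields species code `CES`, country `NZ` and number `3`.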
References
Ahmed, I., P. J. Biggs, P. J. Matthews, L. J. Collins, M. D. Hendy and P. J. Lockhart (2012). Mutational dynamics of aroid chloroplast genomes. Genome Biology and Evolution 4 (12): 1316–1323.
Ahmed, I., P. J. Lockhart, E. M. G. Agoo, K. W. Naing, D. V. Nguyen, D. K. Medhi and P. J. Matthews (2020). Evolutionary origins of taro (Colocasia esculenta) in Southeast Asia. Ecology and Evolution 10: 13530–13543.
Table: Full sample list (19 samples, six Colocasia species). PJM = P. J. Matthews (collection notes)
Line no. | Sample label, GenBank registration number, and assembled genome size | Species name (Colocasia spp.) | AG lab label | NME museum label (list = sample ID sent to laboratory for sequencing) | Collection notes |
---|---|---|---|---|---|
1 | CESNZ03, JN105689, 162,424 bp (Ahmed et al. 2012, 2020) | C. esculenta, var. GP | na | na | Campus garden (ex horto), University of Auckland. Coll. I. Ahmed, 25th June 2008 (IA004, Massey University herbarium MU004). |
2 | CESBD01, PP811809, 162,424 bp | C. esculenta | WP396 | WP396, list BD02 | Wild, commensal, on upper bank of Old Brahmaputra river, Mymensingh, Bangladesh. Coll. PJM & M. A. Hossain, 10th Feb. 2019. |
3 | CESBD02, PP817734, 162,463 bp | C. esculenta | WP475 | WP475, list BD20 | Wild, commensal, Sylhet, Bangladesh. Coll. PJM & M. A. Hossain, 13th Feb. 2019. |
4 | CESBD03, PP817735, 162,386 bp | C. esculenta | WP501 | WP501, list BD26 | Wild, commensal, form A, “shak kasu/kochu” (leaf taro), Sylhet, Bangladesh. Coll. PJM & M. A. Hossain, 14th Feb. 2019. |
5 | CESJP30, PP817736, 162,376 bp | C. esculenta | CV_CP | cv ta-imo | Cultivar ex pond field, vic. Kamkatetsu-Keraji, Kikai island, Amami, Japan. Coll. PJM & E. Takei, 14th March 2017. |
6 | CESNZ02, JN105690, 162,546 bp (Ahmed et al. 2012, 2020) | C. esculenta var. RR | na | na | Campus garden (ex horto), University of Auckland, Auckland. Coll. I. Ahmed, 25th June 2008 (IA003, Massey University herbarium MU003) |
7 | CESPK08, PP817737, 162,478 bp | C. esculenta | TLP_CP | na | Cultivated, ex market, Islamabad, Pakistan. Coll. I. Ahmed (2020). |
8 | CESVN15, PP817740, 162,641 bp | C. esculenta | WP734 | WP734, list VN09 | Wild, commensal, in tall grass on canal bank; on settled, rural delta island between main Mekong river course and Song Hau, Mekong, Vietnam. Coll. PJM & Nguyen V. D., 21st Sept. 2017. |
9 | CESTH06, PP817738, 162,644 bp | C. esculenta | WP509 | WP509 | Wild, Khlong Prapa, Bangkok, Thailand. Coll. PJM & D. Sookchaloem 18th Feb. 2019. |
10 | CESTH07, PP817739, 162,644 bp | C. esculenta | WP516 | WP516 | Wild, Ko Kret Island, Chao Praya, Bangkok, Thailand. Coll. PJM & D. Sookchaloem 18th Feb. 2019 |
11 | CESAU24, PP475493, 162,584 bp | C. esculenta | L1 | Extract tube L1.3 (= field sub-location number) | Wild, Hopevale, Queensland, Australia. Coll. PJM & K. Thiele, 26th Sept., 1987. |
12 | CLIVN04, PP817743, 162,557 bp | C. lihengiae | 93_CP | WP 933B, list VN93 | Wild, commensal; Xuan Son Guest House, Phu Tho prov. , Vietnam. Coll. PJM & Nguyen V. D., 4th Oct. 2017. |
13 | CLIVN05, PP817744, 162,640 bp | C. lihengiae | 107_CP | WP 944A, list VN107 | Wild, Tukuk Commune, Dang Son district, Vietnam. Coll. PJM & Nguyen V. D., 4th Oct. 2017. |
14 | CxVN01, PP817748, 162,641 bp | C. lihengiae x | 114_CP | WP964, list VN114 | Wild, roadside, Yen Bai/Phu Tho border. Coll. PJM & Nguyen V. D., 5th Oct., 2017. Noted in field as possible hybrid between C. lihengiae and C. menglaensis. |
15 | CSFVN01, PP817747, 162,388 bp | C. spongifolia Matthews, Nguyen, Fang & Long (Matthews et al. 2022) | CSP_CP | WP251 | Sample from seedling grown ex situ; seed ex wild plant, Bach Ma National Park, Vietnam. Coll. PJM and Nguyen V. D., 22nd Sept. 2018. |
16 | CFOTW03, PP817742, 162,158 bp | C. formosana Hayata (Hayata 1919, Hsu et al. 2000) | CFO_CP | WP108 | Seedling grown ex situ; seed ex wild plant, Wutai district, Pingtung County, Taiwan. Coll. PJM and K.-C. Tsai, 1st Sept. 2014 |
17 | CORMY01, PP817745, 161,973 bp | C. oresbia A. Hay (1996) | JJ04 | JJ04 | Wild, Jalan Tambunan-Kota Kinabalu, Tambunan, Sabah, Malaysia. Coll. J. Joling 22nd Dec. 2020. |
18 | CORMY02, PP817746, 161,947 bp | C. oresbia A. Hay (1996) | JJ07 | JJ07 | Wild, Mahua waterfall near entrance, Tambunan, Sabah, Malaysia, Coll. J. Joling 22nd Dec. 2020. |
19 | CFABD01, PP817741, 161,252 bp | C. fallax Schott (Deva and Naithani 1985, Ara 2000) | WP466 | WP466A, list BD13 | Wild along stream bank below waterfall, Madhabkunda, Bangladesh. Coll. PJM & M. A. Hossain, 13th Feb. 2019. |
Methods
Complete chloroplast genome sequences were obtained from 17 field samples representing six species of Colocasia (Araceae). The sequences were assembled and then aligned together with two previously published sequences from taro (C. esculenta). The 17 new samples were sequenced using an Illumina PE 150 run (Genwiz Life Sciences, China). Bioinformatic analyses, including sequencing-data quality checks, genome assembly, annotations, circularization and data curation, were done as reported previously. The previously published taro sequences were from New Zealand (var. GP, triploid, JN105689; and var. RR, triploid, JN105690).
The sequence sets (Set1 to Set3) were aligned using MAFFT in Geneious, and one copy of the inverted repeat (IRa) and all gaps (indels) were removed. Three different alignments were made, from (1) 19 samples, six species; (2) 18 samples, five species; and (3) 16 samples, four species. The alignments were used to prepare the Maximum Likelihood (ML) trees shown in the full article. Individual complete genome sequences have been deposited in GenBank.
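The gap-removal step can be illustrated with a small Python sketch. This is not the Geneious procedure used by the authors. It assumes that "all gaps removed" means dropping every alignment column in which at least one sequence has a gap character (`-`), and the function name `strip_gap_columns` is hypothetical.

```python
def strip_gap_columns(records, gap_chars="-"):
    """Drop every column where any sequence has a gap character.

    `records` maps sample labels to aligned sequences of equal length.
    Assumption: complete deletion of gapped columns, which may differ
    from the exact behaviour of the software used in the study.
    """
    names = list(records)
    seqs = [records[name] for name in names]
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("sequences must all have the same aligned length")
    # Indices of columns with no gap character in any sequence.
    keep = [
        i
        for i, column in enumerate(zip(*seqs))
        if not any(char in gap_chars for char in column)
    ]
    return {name: "".join(records[name][i] for i in keep) for name in names}
```

Applied to a toy alignment such as `{"a": "AC-GT", "b": "ACGGT", "c": "A-GGT"}`, the second and third columns are dropped and every sequence is reduced to the three gap-free columns.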