Phylogenomics of Bivalvia using ultraconserved elements (UCEs) reveal new topologies for Pteriomorphia and Imparidentia
Data files
Oct 03, 2024 version files 150.81 MB
- 02_phylogenetic_trees_Feb.zip (7.23 MB)
- 03_alignments.zip (117.88 MB)
- 04_mitochondrial_data.zip (17.01 MB)
- Baits_Bivalve_v1.fasta (1.36 MB)
- Baits_Bivalve_v2.fasta (1.36 MB)
- Bivalve_Pseudoreference.fasta (5.93 MB)
- Bivalvia_vall_25p_p.json (29.24 KB)
- leaf_name_update_2_tvBOT.xlsx (13.12 KB)
- README.md (5.53 KB)
Abstract
Despite significant advances in phylogenetics over the past decades, the deep relationships within Bivalvia (phylum Mollusca) remain inconclusive. Previous efforts based on morphology or several genes have failed to resolve many key nodes in the phylogeny of Bivalvia. Advances have been made recently using transcriptome data, but the phylogenetic relationships within Bivalvia historically lacked consensus, especially within Pteriomorphia and Imparidentia. Here, we inferred the relationships of key lineages within Bivalvia using matrices generated from specifically designed ultraconserved elements (UCEs) with 16 available genomic resources and 85 newly sequenced specimens from 55 families. Our new probes (Bivalve UCE 2k v.1) for target sequencing captured an average of 849 UCEs with a mean length of 1085 bp in in vitro experiments. Our results introduced novel schemes from six major clades (Protobranchina, Pteriomorphia, Palaeoheterodonta, Archiheterodonta, Anomalodesmata and Imparidentia), though some inner nodes were poorly resolved, such as paraphyletic Heterodonta in some topologies potentially due to insufficient taxon sampling. The resolution increased when analyzing specific matrices for Pteriomorphia and Imparidentia. We recovered three Pteriomorphia topologies different from previously published trees, with the strongest support for ((Ostreida + (Arcida + Mytilida)) + (Pectinida + (Limida + Pectinida))). Limida were nested within Pectinida, warranting further studies. For Imparidentia, our results strongly supported the new hypothesis of (Galeommatida + (Adapedonta + Cardiida)), while the possible non-monophyly of Lucinida was inferred but poorly supported. Overall, our results provide important insights into the phylogeny of Bivalvia and show that target enrichment sequencing of UCEs can be broadly applied to study both deep and shallow phylogenetic relationships.
https://doi.org/10.5061/dryad.m63xsj48g
This dataset includes the probes designed for Bivalvia (Bivalve UCE 2k v.1), data matrices for topology reconstruction, phylogenetic trees generated for this study, and scripts for manipulating data. These data contribute to the article.
Description of the data and file structure
This dataset includes:
1. Probe FASTA files for Bivalve UCE 2k v.1: Baits_Bivalve_v1.fasta and Baits_Bivalve_v2.fasta. These bait sequences can be used to extract target loci from assemblies using Phyluce.
#Baits_Bivalve_v1.fasta: 10000 baits designed based on molluscan genome data.
#Baits_Bivalve_v2.fasta: 10000 baits designed based on bivalve transcript data.
2. Topologies: 02_phylogenetic_trees_Feb.zip. This archive contains all phylogenetic trees generated for this study. The tree files can be visualized on the tvBOT website (https://www.chiplot.online/tvbot.html) together with the annotation files mentioned below. The archive contains four folders and two annotation files.
#For every dataset level, we ran analyses labeled “combination/combine”, “v1”, and “v2”, which use loci captured by all probes (v1+v2), by v1 probes only, and by v2 probes only, respectively. We also generated matrices at five taxon data occupancy levels for every dataset level: 25%, 35%, 50%, 60%, and 75%. For each occupancy matrix, we performed the phylogenetic analysis both without partitions (“01one”) and with partitions by locus (“02locus”).
#subfolder 01_in_vitro_experiments: the subfolders (combination, v1, v2) contain phylogenetic trees based only on the 85 target-captured specimens.
#subfolder 02Bivalvia: trees for Matrix-1 related matrices. The subfolders contain phylogenetic tree files inferred with the Maximum Likelihood (ML) method. At the “Bivalvia” level, we also tested a third partitioning approach, SWSC, and performed phylogenetic analyses accordingly (“03swsc”). In addition to the four subfolders, there is one summary tree (“astral4t.tre”) and one Bayesian Inference (BI) tree (“phylobayes_matrix1p_bpcomp.con.tre”).
#subfolder 03_Pteriomorphia: phylogenetic tree files for Matrix-2 related matrices based on the ML method.
#subfolder 04_Imparidentia: phylogenetic tree files for Matrix-3 related matrices based on the ML method.
3. Alignments: 03_alignments.zip. This archive contains all data matrices captured using Baits_Bivalve_v1.fasta, Baits_Bivalve_v2.fasta, or their combination, at five taxon data occupancy levels (25%, 35%, 50%, 60%, 75%).
#subfolder 01_01_in_vitro_experiments: matrices based on the 85 target-captured specimens.
#subfolder 02_Bivalvia: Matrix-1 related matrices based on 109 samples.
##subfolder mcmc_v1_25p: v1_25p_2.phy is the alignment for divergence time analysis.
##subfolder phylobayes: v1_25p_kpi_smg.phy is the alignment for BI analysis.
##subfolder SWSC_dataset_original: the data matrices with SWSC partitions.
##other subfolders: “combine_1985”, “v1_1367”, and “v2_618” hold the loci captured by the combination, v1, and v2 probes, respectively.
#subfolder 03_Pteriomorphia: Matrix-2 related matrices.
#subfolder 04_Imparidentia: Matrix-3 related matrices.
4. Mitochondrial data: 04_mitochondrial_data.zip. This zipped document includes available mitochondrial genome data (complete or incomplete) used in this study.
#subfolder 01_annotation files: contains annotation files for mitochondrial genomes (complete or incomplete).
#subfolder 02_alignment: contains the alignments used to reconstruct phylogenetic relationships based on 13 protein-coding genes (PCGs) from mitochondrial genomes.
#subfolder 03_trees: contains phylogenetic trees based on the concatenated 13 PCGs.
#subfolder 04_reference_Data: published bivalve mitochondrial genomes (duplicate species removed) used for phylogenetic analysis, with their taxonomy information.
5. Tree annotation file: leaf_name_update_2_tvBOT.xlsx and Bivalvia_vall_25p_p.json.
6. Consensus sequences for genetic distance analysis: Bivalve_Pseudoreference.fasta
7. Scripts: 05_scripts.zip. This archive contains the scripts used in this study to calculate genetic distances and to run the sensitivity test of alignments.
8. Supplementary files: 2-Appendix_1_Aug.docx, 3-Appendix_2_Aug.docx, 4-Appendix_3-Aug.xlsx. These three files contain the Supplementary Note, Figures, and Tables cited in the main text.
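Item 1 states that each bait file holds 10000 baits. A downloaded copy can be checked against that figure with a few lines of Python. The following is a minimal sketch, not part of the deposited scripts; the function names `read_fasta` and `summarize_baits` are hypothetical, and the code only assumes that the files are plain FASTA text.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def summarize_baits(lines: Iterable[str]) -> Dict[str, object]:
    """Count the baits and tally how many baits have each length."""
    lengths = Counter(len(seq) for _, seq in read_fasta(lines))
    return {"n_baits": sum(lengths.values()), "length_counts": dict(lengths)}


# Toy input standing in for a real bait file (hypothetical sequences).
demo = [">bait1\n", "ACGT\n", "AC\n", ">bait2\n", "GGGG\n"]
print(summarize_baits(demo))
```

For a real file, pass an open file handle instead of the toy list, for example the handle returned by `open("Baits_Bivalve_v1.fasta")`, and compare `n_baits` with the count given in item 1.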
@How to visualize topologies online?
- Use tvBOT (https://www.chiplot.online/tvbot.html) to visualize topologies online.
- Method 1 (tree plus annotation sheet): import a tree file together with the annotation sheet (leaf_name_update_2_tvBOT.xlsx). A blank cell in leaf_name_update_2_tvBOT.xlsx means that no italic style is needed for the specimen name (“New_name”); please leave it blank.
- Method 2 (template): upload Bivalvia_vall_25p_p.json directly to tvBOT as a template (name and group annotations are embedded), then import different tree files to check their topologies.
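Before importing a tree, it can be useful to list its leaf names and confirm that each one has an entry in the annotation sheet. The sketch below is an illustration only. It assumes the tree files are Newick text with unquoted leaf labels, which this README does not state, and the helpers `leaf_names` and `unannotated_leaves` are hypothetical. The annotated names are passed in as a plain set, because the column layout of the sheet is not described here.

```python
import re
from typing import Iterable, List

# In Newick, a leaf label directly follows "(" or "," and ends at ":", ",", ")" or ";".
# Labels that follow ")" are internal node labels or support values and are skipped.
_LEAF = re.compile(r"[(,]\s*([^\s(),:;][^(),:;]*)")


def leaf_names(newick: str) -> List[str]:
    """Return leaf labels in order of appearance (unquoted labels only)."""
    return [name.strip() for name in _LEAF.findall(newick)]


def unannotated_leaves(newick: str, annotated: Iterable[str]) -> List[str]:
    """Return leaves that are missing from the set of annotated names."""
    known = set(annotated)
    return [name for name in leaf_names(newick) if name not in known]


# Toy tree with hypothetical leaf names and one support value (95).
tree = "((Sp_a:0.1,Sp_b:0.2)95:0.3,Sp_c:0.4);"
print(leaf_names(tree))
print(unannotated_leaves(tree, {"Sp_a", "Sp_c"}))
```

The regular expression is a deliberate simplification: quoted labels containing commas or parentheses would need a full Newick parser.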
We designed a new set of probes, Bivalve UCE 2k v.1, targeting ~2000 loci for bivalve species. Using this probe set, we recovered UCEs from the target-captured Illumina sequencing data of 85 specimens. The data matrices of this study were generated with Phyluce v.1.7.1 according to the probes used (v1, v2, or all probes) and the taxon data occupancy rate (25%, 35%, 50%, 60%, 75%). These data matrices were then used for phylogenetic analyses. The methods are detailed in the main text and supplementary documents.
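The occupancy thresholds can be illustrated with a small sketch. It assumes that taxon data occupancy means the fraction of taxa in which a locus was recovered, and that a locus is kept when that fraction reaches the threshold. This is a conceptual illustration with hypothetical names (`filter_by_occupancy`, the toy loci and taxa), not the Phyluce implementation, which may round the minimum taxon count differently.

```python
from typing import Dict, List, Set

# The five occupancy levels used for the matrices in this dataset.
OCCUPANCY_LEVELS = (0.25, 0.35, 0.50, 0.60, 0.75)


def filter_by_occupancy(
    locus_taxa: Dict[str, Set[str]], n_taxa: int, min_fraction: float
) -> List[str]:
    """Keep loci recovered in at least `min_fraction` of the `n_taxa` taxa."""
    if n_taxa <= 0:
        raise ValueError("n_taxa must be positive")
    return sorted(
        locus
        for locus, taxa in locus_taxa.items()
        if len(taxa) / n_taxa >= min_fraction
    )


# Toy example: three hypothetical loci scored across four hypothetical taxa.
toy = {
    "uce-1": {"t1", "t2", "t3", "t4"},
    "uce-2": {"t1", "t2"},
    "uce-3": {"t4"},
}
for level in OCCUPANCY_LEVELS:
    kept = filter_by_occupancy(toy, n_taxa=4, min_fraction=level)
    print(f"{level:.0%}: {kept}")
```

Under this definition, higher thresholds give smaller matrices with less missing data.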