Data from: Phylogenomics of the Neogastropoda: the backbone hidden in the bush
Data files
Mar 19, 2024 version files 1.85 GB
Backbone_phylogeny_of_the_Neogastropoda_2023_Supp_acknowledgments.docx (21.75 KB)
Backbone_phylogeny_of_the_Neogastropoda_2023_Supp_material.docx (36.41 KB)
NEOGASTROPODA_Table_S1.xlsx (33.18 KB)
NEOGASTROPODA_Table_S2.xlsx (10.05 KB)
NEOGASTROPODA_Table_S3.xlsx (9.93 KB)
README.md (9.44 KB)
SData1_Orthogroups.GeneCount.tsv (1.92 MB)
SData10_Custom_python_scripts.zip (16.89 KB)
SData11_all_assemblies.tar.gz (1.80 GB)
SData12_unfiltered_orthogroup_sequences.tar.gz (22.34 MB)
SData2_Orthogroup_alignments.zip (3.38 MB)
SData3_Orthogroup_trees.zip (4.05 MB)
SData4_Concatenated_matrices.zip (12.36 MB)
SData5_GeneSortR_properties_sorted_dataset.csv (194.18 KB)
SData6_PB_matrixNEO95.zip (1.61 MB)
SData7_PB_matrixNEO95_500.zip (1.97 MB)
SData8_PCA.zip (4.50 KB)
SData9_dGLS.zip (2.72 MB)
Supplementary_figures.zip (571.48 KB)
Abstract
The molluscan order Neogastropoda encompasses over 15,000 almost exclusively marine species that play important roles in benthic communities and in the economies of coastal countries. Neogastropoda underwent intensive cladogenesis in the early stages of their diversification, generating a ‘bush’ at the base of their evolutionary tree that has been hard to resolve even with high-throughput molecular data. In the present study we analyze a comprehensive exon capture dataset of 1,817 loci (79.6% data occupancy), comprising 112 taxa of 48 (out of 60) recent Neogastropoda families, with a variety of phylogenetic inference methods to resolve their relationships. Our results show consistent topologies and high support in all analyses at the (super)family level, supporting monophyly of Muricoidea, Mitroidea, Conoidea, and, with some reservations, Olivoidea and Buccinoidea. Volutoidea and Turbinelloidea as currently circumscribed are clearly paraphyletic. Although our analyses consistently resolve most backbone nodes, three prove problematic. First, the uncertain placement of Cancellariidae, as a sister group of either a Ficoidea-Tonnoidea clade or of the rest of Neogastropoda, leaves monophyly of Neogastropoda unresolved. Second, relationships are contradictory at the base of the major grouping, the ‘core Neogastropoda’. Third, coalescence-based analyses reject monophyly of the Buccinoidea in relation to Vasidae. We analysed the phylogenetic signal of loci in relation to potential biases and propose the most probable resolutions for the latter two recalcitrant nodes. The uncertain placement of Cancellariidae may be explained by orthology violations due to differential paralog loss shortly after the whole genome duplication, and should be resolved with a curated set of longer loci.
README: Phylogenomics of the Neogastropoda: the backbone hidden in the bush
https://doi.org/10.5061/dryad.8931zcrx5
The present repository contains the phylogenetic matrices, gene alignments and trees, as well as the original scripts and raw output files from the software tools used for crucial steps of the data analysis.
Description of the data and file structure
# Data from: Phylogenomics of Neogastropoda: the backbone hidden in the bush
Contact Corresponding Author Information
Name: Alexander Fedosov
Institution: Swedish Museum of Natural History
Address: Frescativägen 40, 114 18 Stockholm, Sweden
Email: fedosovalexander@gmail.com
## Data files overview
filename file type content
SData1_Orthogroups.GeneCount.tsv tab separated text file Orthofinder2 output file, comprising counts of sequences per specimen per orthogroup.
SData2_Orthogroup_alignments.zip zip archive 1817 Orthogroup alignments in fasta format. Header formatting as follows '>sample ID|target exon ID|sample assembly contig ID'
SData3_Orthogroup_trees.zip zip archive 1817 RAxML Orthogroup trees in newick format. Taxa names correspond to the fasta headers in SData2.
SData4_Concatenated_matrices.zip zip archive Concatenated amino acid matrices
├── NEO_NL70p_aa_123taxa.fasta Matrix NEO70 alignment
├── NEO_NL70p_aa_123taxa_partitions Matrix NEO70 unmerged partition breakdown
├── NEO_NL95p_aa_123taxa.fasta Matrix NEO95 alignment
├── NEO_NL95p_aa_123taxa_partitions Matrix NEO95 unmerged partition breakdown
├── NEO_NL95p_aa_GSR500_123taxa.fasta Matrix NEO95_500 alignment
└── NEO_NL95p_aa_GSR500_123taxa_partitions Matrix NEO95_500 unmerged partition breakdown
SData5_GeneSortR_properties_sorted_dataset.csv comma separated text file GenesortR output file, comprising loci statistics inferred by GenesortR.
SData6_PB_matrixNEO95.zip zip archive
├── CH23x5000bpcomp.bpdiff output of phylobayes bpcomp metrics of topological convergence between chains
├── CH23x5000bpcomp.bplist output of phylobayes bpcomp list of bipartitions
├── CH23x5000bpcomp.tre output of phylobayes bpcomp consensus tree
├── CH23x10000bpcomp.bpdiff output of phylobayes bpcomp metrics of topological convergence between chains
├── CH23x10000bpcomp.bplist output of phylobayes bpcomp list of bipartitions
├── CH23x10000bpcomp.tre output of phylobayes bpcomp consensus tree
├── NEO_NL95p_aa_123_catgtr.chain2.trace phylobayes trace file chain parameters
└── NEO_NL95p_aa_123_catgtr.chain3.trace phylobayes trace file chain parameters
SData7_PB_matrixNEO95_500.zip zip archive
├── CH24x2500bpcomp.bpdiff output of phylobayes bpcomp metrics of topological convergence between chains
├── CH24x2500bpcomp.bplist output of phylobayes bpcomp list of bipartitions
├── CH24x2500bpcomp.tre output of phylobayes bpcomp consensus tree
├── CH24x6000bpcomp.bpdiff output of phylobayes bpcomp metrics of topological convergence between chains
├── CH24x6000bpcomp.bplist output of phylobayes bpcomp list of bipartitions
├── CH24x6000bpcomp.tre output of phylobayes bpcomp consensus tree
├── NEO_NL95p_GSR500_aa_123_catgtr.chain2.trace phylobayes trace file chain parameters
└── NEO_NL95p_GSR500_aa_123_catgtr.chain4.trace phylobayes trace file chain parameters
SData8_PCA.zip zip archive
├── 2BI_trees_PCA_matrix.txt Clades presence-absence matrix for PCA
└── clades Taxa content of the 208 unique clades retrieved from the Trees 1-14.
SData9_dGLS.zip zip archive
├── constrained_topologies (dir) IQTree output tree files (newick format) constrained for a desired topology for dGLS tests
│ ├── constrained_A_NEO_NL95p_aa_123taxa_partitions.treefile NEO95 IQTree-part output tree file (newick format) constrained for sister relationship of Ficoidea-Tonnoidea and Cancellariidae
│ ├── constrained_A1_NEO_NL95p_aa_123taxa.fasta.treefile NEO95 IQTree-PMM output tree file (newick format) constrained for monophyletic Neogastropoda
│ ├── constrained_C1_NEO_NL95p_aa_123taxa.fasta.treefile NEO95 IQTree-PMM output tree file (newick format) constrained for paraphyletic Buccinoidea
│ ├── constrained_C1_NEO_NL95p_aa_123taxa_partitions.treefile NEO95 IQTree-part output tree file (newick format) constrained for paraphyletic Buccinoidea
│ ├── constrained_ColumbariidaeFirst_NEO_NL95p_aa_123taxa.fasta.treefile NEO95 IQTree-PMM output tree file (newick format) constrained for Columbariidae being first offshoot of core Neogastropoda
│ └── constrained_CoMuFirst_NEO_NL95p_aa_123taxa_partitions.treefile NEO95 IQTree-part output tree file (newick format) constrained for Columbariidae+Muricidae being first offshoot of core Neogastropoda
│
├── AU_output (dir) AU test on the significance of preference for one or another alternative topology
│ ├── NEO95_AA1_GAMMALG4X_AU at the base of Neogastropoda under GAMMALG4X model for NEO95 matrix
│ ├── NEO95_CC1_GAMMALG4X_AU at the base of Buccinoidea under GAMMALG4X model for NEO95 matrix
│ ├── NEO95_part-A1A_AU at the base of Neogastropoda under best models for merged partitions for NEO95 matrix
│ ├── NEO95_part-CC1_AU at the base of Buccinoidea under best models for merged partitions for NEO95 matrix
│ ├── NEO95_PGLG4X_CM-Cfirst_AU at the base of core Neogastropoda under GAMMALG4X model for NEO95 matrix
│ └── NEO95part_Cfirst-CM_AU at the base of core Neogastropoda under best models for merged partitions for NEO95 matrix
│
├── T-test (dir) results of t-tests on loci statistics (saturation, missing data, evolutionary rate, compositional heterogeneity, alignment length, average bootstrap, RF distance to species tree) for three groups of loci: --10% dGLS (strong support for the alternative topology), rest 80% (weak support for either of the two topologies), ++10% dGLS (strong support for the main topology)
│ ├── AA1_GAMMALG4X_t-test_10-80-10_dGLS.txt For loci showing varying signal for alternative topologies at the base of Neogastropoda under GAMMALG4X model for NEO95 matrix
│ ├── CC1_GAMMALG4X_t-test_10-80-10_dGLS.txt For loci showing varying signal for alternative topologies at the base of Buccinoidea under GAMMALG4X model for NEO95 matrix
│ ├── CM-Cfirst_GAMMALG4X_t-test_10-80-10_dGLS.txt For loci showing varying signal for alternative topologies at the base of core Neogastropoda under GAMMALG4X model for NEO95 matrix
│ ├── partA1A_t-test_10-80-10_dGLS.txt For loci showing varying signal for alternative topologies at the base of Neogastropoda under best models for merged partitions for NEO95 matrix
│ ├── partCC1_t-test_10-80-10_dGLS.txt For loci showing varying signal for alternative topologies at the base of Buccinoidea under best models for merged partitions for NEO95 matrix
│ └── partCfirst-CM__t-test_10-80-10_dGLS.txt For loci showing varying signal for alternative topologies at the base of core Neogastropoda under best models for merged partitions for NEO95 matrix
│
├── NEO95_AA1_GAMMALG4X.sitelh Site likelihood score output of RAxML for alternative topologies at the base of Neogastropoda under GAMMALG4X model for NEO95 matrix
├── NEO95_CC1_GAMMALG4X.sitelh Site likelihood score output of RAxML for alternative topologies at the base of Buccinoidea under GAMMALG4X model for NEO95 matrix
├── NEO95_CM-Cfirst_GAMMALG4X.sitelh Site likelihood score output of RAxML for alternative topologies at the base of core Neogastropoda under GAMMALG4X model for NEO95 matrix
├── NEO95_part-A1A.sitelh Site likelihood score output of RAxML for alternative topologies at the base of Neogastropoda under best models for merged partitions for NEO95 matrix
├── NEO95_part-CC1.sitelh Site likelihood score output of RAxML for alternative topologies at the base of Buccinoidea under best models for merged partitions for NEO95 matrix
└── NEO95_part-Cfirst-CM.sitelh Site likelihood score output of RAxML for alternative topologies at the base of core Neogastropoda under best models for merged partitions for NEO95 matrix
SData11_all_assemblies.tar.gz gz archive 154 original assembly files (Trinity assemblies for transcriptomic data sets, and assemblies clustered by CD-Hit for exon capture data sets).
SData12_unfiltered_orthogroup_sequences.tar.gz gz archive 3000 unfiltered, but codon-aligned fasta files with exon sequences per orthogroup as inferred by Orthofinder2.
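The fasta header convention given for the orthogroup alignments ('>sample ID|target exon ID|sample assembly contig ID') can be split into its fields programmatically. The following is a minimal sketch in Python, not one of the authors' scripts. It assumes that every header carries exactly three fields and that none of them contains a `|` character. The names `HeaderFields` and `parse_header`, and the example header, are hypothetical and serve only as illustration.

```python
from typing import NamedTuple


class HeaderFields(NamedTuple):
    """The three '|'-separated fields of an alignment header (hypothetical type)."""
    sample_id: str
    exon_id: str
    contig_id: str


def parse_header(line: str) -> HeaderFields:
    """Split a '>sample|exon|contig' fasta header into its three fields."""
    if not line.startswith(">"):
        raise ValueError(f"not a fasta header: {line!r}")
    parts = line[1:].strip().split("|")
    if len(parts) != 3:
        # Assumption: headers always have exactly three fields.
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}")
    return HeaderFields(*parts)


# Made-up header, for illustration only (not taken from the data set).
fields = parse_header(">SAMPLE01|exon_0001|contig_42")
print(fields.sample_id, fields.exon_id, fields.contig_id)
```

Because the taxa names in the orthogroup trees correspond to these headers, the same function could in principle be applied to tip labels after prepending `>`.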
Sharing/Access information
The original sequencing data (raw reads) can be accessed in GenBank under BioProject PRJNA885117.
Code/Software
SData10_Custom_python_scripts.zip
SData10_Scripts
Script_S10-1_separate_Loci.py
Script_S10-2_slice-translate_ginsi_remove_identical_poolback_to_taxa_corr.py
Script_S10-3_Remove_CC_run_PPP
Script_S10-4_Remove_LB.py
Script_S10-5_testPairwiseDistancesDistribution.py
Script_S10-6_trees2PCAmatrix.py
Script_S10-7_check_sort_summ_dSLS_boxplots.py
Script_S10-8_FTonnoCancellariidaeNEO_brlens.py
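The 10-80-10 grouping of loci by dGLS described for the T-test directory of the dGLS archive can be illustrated with a short sketch. This is not the authors' Script_S10-7, and it does not parse the `.sitelh` or partition files. It assumes the per-site log-likelihoods for the two topologies are already loaded as lists, and that locus boundaries are given as 0-based, end-exclusive ranges. The function names and the `tail` parameter are hypothetical. The sign convention (main minus alternative) follows the description above, where positive values indicate support for the main topology.

```python
def gene_lnl(site_lnl, partitions):
    """Sum per-site log-likelihoods within each locus.

    partitions maps locus name -> (start, end), 0-based and end-exclusive.
    """
    return {name: sum(site_lnl[start:end])
            for name, (start, end) in partitions.items()}


def dgls_groups(site_lnl_main, site_lnl_alt, partitions, tail=0.10):
    """Compute dGLS per locus and split loci into three groups.

    dGLS = lnL(main topology) - lnL(alternative topology) for each locus.
    Returns (dgls, groups), where groups holds the lowest `tail` fraction
    ("--10%"), the highest `tail` fraction ("++10%"), and the rest.
    """
    main = gene_lnl(site_lnl_main, partitions)
    alt = gene_lnl(site_lnl_alt, partitions)
    dgls = {name: main[name] - alt[name] for name in partitions}

    ranked = sorted(dgls, key=dgls.get)  # most negative dGLS first
    k = int(len(ranked) * tail)          # number of loci in each tail
    groups = {
        "--10%": ranked[:k],
        "rest 80%": ranked[k:len(ranked) - k],
        "++10%": ranked[len(ranked) - k:],
    }
    return dgls, groups
```

The resulting groups could then be compared for any per-locus statistic, such as those reported in the GeneSortR output table.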