
Investigating sources of conflict in deep phylogenomics of vetigastropod snails

Cite this dataset

Cunha, Tauana; Reimer, James; Giribet, Gonzalo (2021). Investigating sources of conflict in deep phylogenomics of vetigastropod snails [Dataset]. Dryad.


Phylogenetic analyses may suffer from multiple sources of error leading to conflict between genes and methods of inference. The evolutionary history of the mollusc clade Vetigastropoda makes it susceptible to these conflicts, and its higher-level phylogeny remains largely unresolved. Originating over 350 million years ago, vetigastropods were the dominant marine snails in the Paleozoic. Multiple extinction events and new radiations have resulted in both very long and very short branches and a large extant diversity of over 4000 species. This makes the group an ideal setting for a hard phylogenetic question in which sources of conflict can be explored. We present 41 new transcriptomes across the diversity of vetigastropods (62 terminals total), and provide the first genomic-scale phylogeny for the group. We find that deep divergences differ from previous studies in which long branch attraction was likely pervasive. Robust results leading to changes in taxonomy include the paraphyly of the order Lepetellida and the family Tegulidae. Tectinae subfam. nov. is designated for the clade comprising Tectus, Cittarium and Rochia. For two early divergences, topologies disagreed between concatenated analyses using site-heterogeneous models vs. concatenated partitioned analyses and summary coalescent methods. We investigated rate and composition heterogeneity among genes, as well as missing data by locus and by taxon, none of which had an impact on the inferred topologies. We also found no evidence for ancient introgression throughout the phylogeny. We further tested whether uninformative genes and over-partitioning were responsible for this discordance by evaluating the phylogenetic signal of individual genes using likelihood mapping, and by analyzing the most informative genes with a full multispecies coalescent model. We find that most genes are not informative at the two conflicting nodes, but neither this nor gene-wise partitioning is the cause of the discordant results. New method implementations that simultaneously integrate amino acid profile mixture models and the multispecies coalescent might be necessary to resolve these and other recalcitrant nodes in the Tree of Life.

Usage notes


This dataset holds alignments, input files, tree files, supplementary figures, tables and R code. Below is a brief description of each item. See the Materials and Methods of the main paper for details.

Supplementary_Table_S1.csv - Specimen information with vouchers and assembly statistics
Supplementary_Table_S2.csv - Assembly statistics from TransRate
Supplementary_Figures.pdf - Supplementary figures
Supplementary_Code_S1.html - R code to reproduce tree figures
Supplementary_Code_S2.html - R code to reproduce figures related to likelihood-mapping analyses
Supplementary_Code_S3.html - R code to reproduce figures related to concordance factors and partitioned coalescence support
Supplementary_Code_S4.html - R code related to the introgression analysis

assemblies_trinity_vetigastropoda.tar.gz - Trinity assemblies
assemblies_peptide_vetigastropoda.tar.gz - Peptide assemblies (input for orthology)

Individual gene files:
alignments.tar.gz - Alignments of all 1027 genes
genetrees.tar.gz - Gene trees of all 1027 genes
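
The archives above are gzip-compressed tarballs, so their contents can be listed without unpacking everything. The following is a minimal sketch using Python's standard `tarfile` module. The function name `list_archive_members` is hypothetical, and because the internal layout of the archives (file names, extensions, subfolders) is not described here, the code makes no assumption about it.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the sorted names of regular files stored in a .tar.gz archive."""
    # mode="r:gz" opens the archive for reading with gzip decompression
    with tarfile.open(archive_path, mode="r:gz") as archive:
        # Skip directory entries; keep only regular files
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Calling `list_archive_members("alignments.tar.gz")` from the download folder would return the stored file names. Whether each of the 1027 genes sits in its own file is an assumption to check against the listing.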

Trees - folder with tree files from all analyses; it has its own README

Input matrices for phylogenetic inference:
31taxa_concatenated.nex - Matrix 1
44taxa_concatenated.nex - Matrix 2
EvoRates_concatenated.nex - Matrix 3
p4Homogeneous_concatenated.nex - Matrix 4
70resolved.xml - Matrix 5
75resolved.xml - Matrix 6
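
As a quick sanity check after downloading, the number of taxa and alignment columns of a concatenated matrix can be read from its header. The sketch below assumes the `.nex` files carry a standard NEXUS `DIMENSIONS` statement (`ntax=... nchar=...`), which has not been verified for these particular files. The function name `nexus_dimensions` is hypothetical, and the two `.xml` matrices are not covered.

```python
import re


def nexus_dimensions(nexus_text):
    """Extract (ntax, nchar) from the DIMENSIONS statement of NEXUS text."""
    # NEXUS keywords are case-insensitive; the statement ends at the first ';'
    match = re.search(r"dimensions\s+([^;]*);", nexus_text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no DIMENSIONS statement found")
    pairs = re.findall(
        r"(ntax|nchar)\s*=\s*(\d+)", match.group(1), flags=re.IGNORECASE
    )
    fields = {key.lower(): int(value) for key, value in pairs}
    # Either value is None if the statement omits it
    return fields.get("ntax"), fields.get("nchar")
```

For example, `nexus_dimensions(open("31taxa_concatenated.nex").read())` would return the two values for Matrix 1, provided the file follows the assumed layout.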

Gene content of each matrix (lists of named orthogroups):
Content_OGslice_31taxa - Matrix 1
Content_OGslice_44taxa - Matrix 2
Content_OGslice_EvoRates - Matrix 3
Content_OGslice_p4Homogeneous - Matrix 4
Content_OGslice_Resolved70 - Matrix 5
Content_OGslice_Resolved75 - Matrix 6

Files related to the likelihood-mapping analyses (lmap):
(two independent tests, one for the recalcitrant node regarding the position of Haliotidae, another for the recalcitrant node regarding the position of Fissurellidae)
clusters-Haliotidae.nex - clusters of taxa, input for the test
clusters-Fissurellidae.nex - clusters of taxa, input for the test
Content_LMAP-Haliotidae - Set of genes included in test
Content_LMAP-Fissurellidae - Set of genes included in test
lmap_summary_Haliotidae - Summary of results over all genes
lmap_summary_Fissurellidae - Summary of results over all genes

Output files from Concordance Factors and Partitioned Coalescence Support (pcs_cf):
(these are input for Supplementary Code S3)
- output based on topology 1 (IQTREE-cat, Matrix 1)
- output based on topology 2 (IQTREE-part, Matrix 1)
PCS_astral_31taxa - output based on the Astral tree of Matrix 1

Information of gene properties:
(used in Supplementary Code S2)
avg_trimal_scores - Evolutionary rates
occ.matrix.tsv - Occupancy of Matrix 1
p4_outcome_ordered.txt - Outcome of amino acid composition homogeneity test
size_alignments.csv - Alignment length
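
The occupancy table can be summarised per taxon as the fraction of genes present. The sketch below is illustrative only: the layout of `occ.matrix.tsv` is not described here, so the code ASSUMES a tab-separated table with a header row, one row per taxon, the taxon name in the first column, and `1` (present) or `0` (absent) in each gene column. The function name `taxon_occupancy` is hypothetical.

```python
import csv
import io


def taxon_occupancy(tsv_text):
    """Return {taxon: fraction of genes present} from a presence/absence table.

    ASSUMED layout (not confirmed by the dataset description): a header row,
    then one row per taxon with the taxon name in the first column and
    1 (present) or 0 (absent) in every following gene column.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    occupancy = {}
    for row in rows[1:]:  # skip the header row
        if not row:
            continue  # ignore blank lines
        taxon, cells = row[0], row[1:]
        present = sum(1 for cell in cells if cell.strip() == "1")
        occupancy[taxon] = present / len(cells) if cells else 0.0
    return occupancy
```

If the real file is transposed (genes as rows) or uses other presence codes, the parsing needs to be adapted accordingly.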

Files related to the introgression analysis:
(used in Supplementary Code S4)
(output files from Concordance Factors obtained after 2000 resamplings of gene trees)
iqtree-cat_31taxa/ - outputs based on topology 1 (IQTREE-cat, Matrix 1)
iqtree-part_31taxa/ - outputs based on topology 2 (IQTREE-part, Matrix 1)


Funding

National Science Foundation, Award: 1701648

Schlumberger (Netherlands), Award: Faculty for the Future Fellowship

Society of Systematic Biologists, Award: Graduate Student Research Award

Harvard University, Award: Putnam Expedition Grant