Data from: When discrete characters are wanting: Continuous character integration under the phylospecies concept informs the revision of the Australian land snail Thersites (Eupulmonata, Camaenidae)
Data files
Dec 08, 2025 version files 25.15 GB
- character.zip (20.54 MB)
- GIS.zip (96.05 KB)
- molecular.zip (45.02 MB)
- morphology.zip (25.09 GB)
- README.md (10.55 KB)
Abstract
Species that are predominantly characterized by continuous instead of discrete morphological characters pose a challenge to species delimitation. Under the phylospecies concept, species are delimited by apomorphies, which are difficult to establish when characters are not discrete. In the present study, we address these challenges in the Australian land snail genus Thersites (family Camaenidae), where accepted species exhibit a scarcity of discrete distinguishing characters and differ in a limited range of continuous characters according to prior taxonomic studies. We integrate analyses of genome-scale molecular data with evaluations of several continuous, qualitative, and discrete morphological characters derived from landmarks to delineate species through detecting apomorphies. We found that the dimensions of the shell and genitals overlapped considerably among the currently accepted species. These overlaps may indicate morphospace saturation, which may affect species delineation through the gen-morph species concept. Statistical methods, such as Dunn’s test, failed to consistently delineate monophyletic taxa. Additionally, we could not derive apomorphies from the discretized landmarks due to the prevalence of outlier specimens. However, applying the Kruskal-Wallis test at certain nodes of the tree revealed significant differences in certain continuous characters. We propose that these inferred differences represent apomorphies. Ultimately, our results suggest that of the four species currently accepted, both T. mitchellae and T. novaehollandiae should be synonymized, as T. novaehollandiae is paraphyletic with respect to T. mitchellae. In addition, T. darlingtoni nests within the T. richmondiana lineage, and both taxa are also considered synonyms. Distinct character states of the umbilicus support the existence of two independent lineages: one with an open umbilicus (T. sp1 + T. sp2) and one with a closed umbilicus (T. novaehollandiae + T. richmondiana).
The Kruskal-Wallis tests support the recognition of four distinct species, T. novaehollandiae and T. richmondiana, plus two new species.
Dataset DOI: 10.5061/dryad.2rbnzs81c
Description of the data and file structure
Folder structure of the Thersites project.
The Thersites project is organized into four main modules: morphology, molecular, character, and GIS. Each module folder was compressed into a single zip file. Every zip file includes its own README with further details, including the software used for that part of the dataset.
.
├── character
│   ├── apomorphy
│   ├── apomorphy_cont
│   ├── apomorphy_lm
│   ├── contmap
│   ├── kmeans
│   └── kruskal
├── GIS
├── molecular
│   ├── BI
│   ├── dartR
│   ├── phylonet
│   ├── ML
│   ├── model
│   └── MP
│       ├── EIW
│       ├── EW
│       └── IW
└── morphology
    ├── cont_phylogeny
    ├── genital
    │   ├── cont_phylogeny
    │   ├── flagellum
    │   ├── lm_phylogeny
    │   └── measurement
    ├── lm_phylogeny
    └── shell
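After extracting the four archives, the layout can be checked against the tree above. The following is a minimal sketch, assuming each zip file extracts to a folder of the same name under a common root directory (that extraction layout is an assumption, not something the README states). The function name `missing_folders` and the `EXPECTED` table are illustrative and are not part of the dataset.

```python
from pathlib import Path

# Expected subfolders per module, transcribed from the folder tree in this README.
EXPECTED = {
    "character": ["apomorphy", "apomorphy_cont", "apomorphy_lm",
                  "contmap", "kmeans", "kruskal"],
    "GIS": [],
    "molecular": ["BI", "dartR", "phylonet", "ML", "model",
                  "MP/EIW", "MP/EW", "MP/IW"],
    "morphology": ["cont_phylogeny", "genital/cont_phylogeny",
                   "genital/flagellum", "genital/lm_phylogeny",
                   "genital/measurement", "lm_phylogeny", "shell"],
}


def missing_folders(root):
    """Return the expected folders (relative to root) that do not exist."""
    root = Path(root)
    missing = []
    for module, subfolders in EXPECTED.items():
        # The empty string checks the module folder itself.
        for rel in [""] + subfolders:
            if not (root / module / rel).is_dir():
                missing.append((Path(module) / rel).as_posix())
    return missing
```

Calling `missing_folders("path/to/extracted")` returns an empty list when every folder from the tree is present (the path shown is a placeholder).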
Module descriptions
morphology.zip
The morphology module contains data related to the shapes and structures of snail shells and genitalia. Within this module, the shell folder holds data on shell morphology, while the genital folder is dedicated to genital morphology and includes the subfolders flagellum, which contains data on the flagellum, and measurement, which stores measurement data. lm_phylogeny contains phylogenetic data based on landmark analysis, and cont_phylogeny contains phylogenetic data based on continuous character analysis.
Contains morphological measurements.
- shell/ – Shell morphology analysis. Contains raw images (`.tif`/`.jpg`/`.png`) and processed landmark data (`.tps`). Analyses performed using tpsDig, tpsUtil, and MorphoJ.
- genital/ – Genital morphology. Subfolders:
  - flagellum/ – Flagellum shape data processed using FIJI/ImageJ and Python scripts (`resample_point.py`, `draw_lines.py`) for landmark-based analysis.
  - measurement/ – Raw measurements obtained using FIJI/ImageJ, scripts for plotting distributions, violin plots, and significance heatmaps (`gen_distribution.py`, `gen_violin_plot.py`, etc.).
- lm_phylogeny/ – Phylogenetic trees based on landmark analysis using TNT. Parallelized with `pvm`.
- cont_phylogeny/ – Phylogenetic trees based on continuous morphological characters using TNT.
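The flagellum entry above mentions resampling digitized points for landmark-based analysis. One common way to do this is to place a fixed number of points at equal arc-length intervals along the digitized curve. The sketch below illustrates that general technique only; it is not the dataset's `resample_point.py`, and the function name `resample_polyline` and the choice of the point count `n` are assumptions.

```python
import math


def resample_polyline(points, n):
    """Return n points spaced evenly by arc length along a 2D polyline.

    points: list of (x, y) tuples in digitizing order.
    n: number of output points (first and last input points are kept).
    """
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two input points and n >= 2")

    # Cumulative arc length at each input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0:
        raise ValueError("polyline has zero length")

    out = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target distance.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        # Linear interpolation within the segment.
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```

Equal arc-length spacing makes the resampled points comparable between specimens that were digitized with different numbers of clicks.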
molecular.zip
The molecular module focuses on molecular data optimization and phylogenetic analyses. The dartR folder holds SNP call data generated using dartR. The model folder contains model selection files. It includes several subfolders that organize different analytical methods: the BI folder contains data from Bayesian Inference phylogenetics; the ML folder contains Maximum Likelihood phylogenetics data; and the MP folder contains Maximum Parsimony phylogenetics data. The MP folder is further divided into three subdirectories, EIW (Extended Implied Weighting), EW (Equal Weighting), and IW (Implied Weighting), each corresponding to a different weighting strategy.
Contains molecular datasets and phylogenetic analyses.
- dartR/ – SNP calls processed with dartR. Input CSV obtained from DArT company.
- model/ – Model selection files. Uses `modeltest-ng`.
- BI/ – Bayesian Inference phylogenetics (MrBayes).
- ML/ – Maximum Likelihood phylogenetics (RAxML-NG).
- MP/ – Maximum Parsimony phylogenetics (TNT), subfolders:
  - EIW/ – Extended Implied Weighting, subfolder `K` contains trees obtained from different K values.
  - EW/ – Equal Weighting
  - IW/ – Implied Weighting
- phylonet/ – Results of PhyloNet.
character.zip
The character module includes files related to character optimization. This module is subdivided into four groups of folders, each serving a specific analytical purpose: the contmap folder contains files for continuous character mapping; the kmeans folder holds data for k-means clustering analysis; the apomorphy* folders store apomorphy analysis results generated by TNT and WinClada; and the kruskal folder includes results from the Kruskal-Wallis test based on phylogenetic lineages.
Contains results for character optimization and clustering.
- apomorphy/ – Parsimony-based character optimization using TNT in the `wincladtree` folder and WinClada in the `winclada` folder.
- apomorphy_cont/ – Continuous character optimization (TNT scripts).
- apomorphy_lm/ – Landmark-based character optimization aligned to molecular phylogeny using TNT. Subfolders `rftra_align`, `tree_align`, and `unalign` contain results under different alignment methods.
- contmap/ – Continuous character mapping onto phylogenies using R.
- kmeans/ – K-means clustering for landmark data. Scripts include `nts2csv.py`, `lm_tps.py`, `lm_nts.py`, `barchart.py`.
- kruskal/ – Kruskal-Wallis test for phylogenetic lineages.
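For readers unfamiliar with the Kruskal-Wallis test used in the `kruskal` folder, the sketch below computes the tie-corrected H statistic and its chi-square p-value for two or more groups of measurements, using only the Python standard library. It illustrates the test itself under simple assumptions and is not the dataset's own script: the function names are hypothetical, and how specimens are assigned to groups at each node of the tree is left to the caller.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(*groups):
    """Return (H, df, p) for the Kruskal-Wallis test with tie correction."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0            # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)
```

In practice a library routine such as `scipy.stats.kruskal` performs the same test; the hand-written version is shown only to make the calculation explicit.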
GIS.zip
Lastly, the GIS module is dedicated to geographic information system (GIS) data, which records species occurrences and distribution information.
File types & software
| File type | Description | Recommended software |
|---|---|---|
| .R | R scripts | R 4.x |
| .py | Python scripts | Python 3.10+ |
| .sh | Shell scripts | Unix shell (bash) |
| .run | TNT run scripts | TNT |
| .csv | Tabular data for morphometrics, SNP calls, or analysis output | Any spreadsheet software (Excel), text editor (vim) |
| .txt | Plain text files | Any text editor (vim, Notepad) |
| .svg | Scalable vector graphics | Web browser, Inkscape |
| .emf | Enhanced metafile format | Inkscape, Adobe Illustrator |
| .ijm | FIJI/ImageJ macros | FIJI/ImageJ |
| .tps | Landmark and semi-landmark data | tpsDig, tpsUtil, MorphoJ |
| .morphoj | MorphoJ project files | MorphoJ |
| .tnt | TNT phylogenetic scripts and input files | TNT |
| .fasta | DNA sequences | Any sequence viewer or analysis software |
| .ss | WinClada files | WinClada |
| .nwk | Phylogenetic tree | FigTree, iTOL, R (ape) |
| .tre | Phylogenetic tree | FigTree, iTOL, R (ape) |
| .ctf | Phylogenetic tree for TNT | TNT |
| .Rdata | R data | R |
| .qgz | QGIS project file | QGIS |
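The `.tps` files can also be read without the tps suite. The sketch below parses the common 2D TPS layout, in which each specimen starts with an `LM=<count>` line followed by that many coordinate rows, optional `POINTS=<count>` blocks for curves, and `KEY=VALUE` metadata such as `IMAGE`, `ID`, or `SCALE`. It assumes the dataset's files follow this layout, which has not been verified against them; the function names are illustrative.

```python
def _read_xy(line):
    """Parse one coordinate row into an (x, y) tuple of floats."""
    x, y = line.split()[:2]
    return float(x), float(y)


def parse_tps(text):
    """Parse 2D TPS text into a list of specimen dictionaries.

    Each dictionary has 'landmarks' (list of (x, y)), 'curves'
    (list of point lists from POINTS= blocks), and 'meta' (other keys).
    """
    specimens = []
    lines = iter(ln.strip() for ln in text.splitlines() if ln.strip())
    current = None
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            continue  # coordinate row outside a recognized block
        key = key.strip().upper()
        if key == "LM":
            # A new specimen begins; read its landmark rows.
            current = {"landmarks": [], "curves": [], "meta": {}}
            specimens.append(current)
            current["landmarks"] = [
                _read_xy(next(lines)) for _ in range(int(value))
            ]
        elif current is None:
            continue  # metadata before the first specimen is ignored
        elif key == "POINTS":
            current["curves"].append(
                [_read_xy(next(lines)) for _ in range(int(value))]
            )
        else:
            current["meta"][key] = value.strip()
    return specimens
```

A typical call would be `parse_tps(open("shell.tps").read())`, where the file name is a placeholder; coordinates are returned unscaled, so any `SCALE` value in `meta` still has to be applied by the caller.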
Abbreviations used in csv
| Abbreviation | Meaning |
|---|---|
| id | individual |
| seq | sequence or specimen ID |
| pop | population |
| lat | latitude |
| lon | longitude |
| ph | phallus |
| ep1 | epiphallus segment 1 |
| ep2 | epiphallus segment 2 |
| fl | flagellum |
| lm | 7 landmarks |
| slm | 2 semi-landmark groups |
| slm1 | semi-landmark group 1 |
| slm2 | semi-landmark group 2 |
| fl50 | 50% resampled flagellum points |
| max | maximum number |
| min | minimum number |
| mean | mean value |
| std | standard deviation |
| NS | not significant |
| S | significant |
| dar | darlingtoni |
| mit | mitchellae |
| nov | novaehollandiae |
| ric | richmondiana |
| sp1 | species 1 = kaputarensis |
| sp2 | species 2 = coolahensis |
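As an illustration of how these abbreviations might be used, the sketch below computes the `min`, `max`, `mean`, and `std` of one measurement column per group from CSV text. The column names passed in (for example a grouping column holding codes such as `nov` or `ric`, and a value column such as `fl`) are assumptions about the files; check the header of each CSV before use. The function name `summarize` is hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize(csv_text, group_col, value_col):
    """Return {group: {n, min, max, mean, std}} for one numeric column.

    Rows with an empty value in value_col are skipped.
    std is the sample standard deviation (NaN when n == 1).
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(value_col) or "").strip()
        if raw:
            groups[row[group_col]].append(float(raw))

    summary = {}
    for name, values in groups.items():
        summary[name] = {
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values) if len(values) > 1 else float("nan"),
        }
    return summary
```

For a file on disk, pass its contents, for example `summarize(open("measurements.csv").read(), "species", "fl")`, where both the file name and the column names are placeholders.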
Usage examples
Morphology
# Shell processing
for py in draw_shell_width_height.py gen_distribution.py gen_violin_plot.py; do
    python "$py"
done
# Flagellum resampling
python resample_point.py input.csv
Molecular
# Convert SNP CSV to fasta
nohup Rscript Thersites.R &
# Maximum Likelihood tree
raxml-ng --all --msa gl_output_rmq.fasta --model HKY+I+G16 --bs-trees 1000 --threads 40
# Bayesian Inference
nohup mpirun -np 4 mb -i mimi.nex &
Character optimization
# Run TNT for parsimony apomorphy
bash character/apomorphy/run.sh
# Run continuous character mapping
bash character/contmap/run.sh
# K-means clustering
bash character/kmeans/run.sh
# Kruskal-Wallis phylogeny
bash character/kruskal/run.sh
