BIMAGES: Bivalve images for morphological analysis and genetic estimation study

Hofmann, Martin1; Kiel, Steffen2; Kösters, Lara3; Wäldchen, Jana3; Mäder, Patrick1

Research facility: Technische Universität Ilmenau

Published Jul 25, 2024 on Dryad. https://doi.org/10.5061/dryad.k6djh9wd0

Data files

Jul 25, 2024 version files (35.21 MB total)

File	Size
edge_lengths.csv	44.67 KB
meta.pkl	17.66 MB
meta.tsv	17.49 MB
README.md	10.88 KB

Abstract

Reconstructing the tree of life and understanding the relationships of taxa are core questions in evolutionary and systematic biology. The main advances in this field in the last decades were derived from molecular phylogenetics; however, for most species, molecular data are not available. Here, we explore the applicability of two deep learning methods – supervised classification approaches and unsupervised similarity learning – to infer organism relationships from specimen images. As a basis, we assembled an image dataset covering 4144 bivalve species belonging to 74 families across all orders and subclasses of the extant Bivalvia, with molecular phylogenetic data being available for all families and a complete taxonomic hierarchy for all species. The suitability of this dataset for deep learning experiments was evidenced by an ablation study resulting in almost 80% accuracy for identifications on the species level. Three sets of experiments were performed using our dataset. First, we included taxonomic hierarchy and genetic distances in a supervised learning approach to obtain predictions on several taxonomic levels simultaneously. Here, we stimulated the model to consider features shared between closely related taxa to be more critical for their classification than features shared with distantly related taxa, imprinting phylogenetic and taxonomic affinities into the architecture and training procedure. Second, we used transfer learning and similarity learning approaches for zero-shot experiments to identify the higher-level taxonomic affinities of test species that the models had not been trained on. The models assigned the unknown species to their respective genera with approximately 48% and 67% accuracies. Lastly, we used unsupervised similarity learning to infer the relatedness of the images without prior knowledge of their taxonomic or phylogenetic affinities. 
The results indicated a reasonable similarity between visual appearance and genetic relationships at the higher taxonomic levels. The correlation was 0.6 for the most species-rich subclass, the Imparidentia, and ranged from 0.5 to 0.7 for the orders with the most images. Overall, the correlation between visual similarity and genetic distances at the family level was 0.78. However, fine-grained reconstructions based on the observed correlation, such as sister-taxa relationships, require further work. Overall, our results broaden the applicability of automated taxon identification systems and provide a new avenue for estimating phylogenetic relationships from specimen images.

Inferring Taxonomic Affinities and Genetic Distances Using Morphological Features Extracted from Specimen Images: a Case Study with a Bivalve dataset

This preprocessed fine-grained labeled dataset contains 71,888 images of 4,144 species in 884 genera, 74 families, 26 orders, and six subclasses; the phylogenetic study by Bieler et al. (2014) covers all 74 families.

Description of the data and file structure

Metadata and labels are located inside the meta.tsv file. The code is located in the respective code folder.
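
As a quick way to inspect the labels, the sketch below reads a tab-separated file such as meta.tsv with Python's standard csv module and tallies the values in one column. The column names of meta.tsv are not listed in this description, so the "genus" column in the usage comment is a hypothetical placeholder; print the header row first to see what is actually available.

```python
import csv
from collections import Counter


def read_header(path):
    """Return the column names from the first row of a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter="\t"))


def count_values(path, column):
    """Count how often each value occurs in `column` of a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
        return Counter(row[column] for row in reader)


# Example (the column name "genus" is an assumption, not confirmed by this README):
# print(read_header("meta.tsv"))
# print(count_values("meta.tsv", "genus").most_common(5))
```

The same information is provided as a pickled pandas object in meta.pkl for users who prefer to work with pandas directly.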

Sharing/Access information

All images can be accessed through the h5 database files.

Data was derived from the following sources:

Source	Website	N of images
Data aggregators
GBIF	https://www.gbif.org/	4780
iDigBio	https://www.idigbio.org/	3894
World Register of Marine Species	http://www.marinespecies.org/	3073

Museums / universities / educational
Bailey-Matthews National Shell Museum	https://catalog.shellmuseum.org/shells/southwest-florida-shells	293
Biological Library (BioLib)	https://www.biolib.cz/	509
BOLD systems	https://v3.boldsystems.org/index.php/TaxBrowser_Home	353
Burke Museum	https://www.burkemuseum.org/	66
Florida Museum of Natural History	https://www.floridamuseum.ufl.edu/	797
Museum of Paleontology, University of California	https://ucmp.berkeley.edu/	274
National Museum Wales	https://naturalhistory.museumwales.ac.uk/	1478
Natural History Museum London	http://data.nhm.ac.uk/	412
Natural History Museum Rotterdam	http://datasets.nlbif.nl/nmr/	585
Naturalis	http://bioportal.naturalis.nl/	3667
Neogene Atlas of Ancient Life	https://neogeneatlas.net/	126
Paleontological Research Institution	https://www.priweb.org/	1713
Taiwan Malacofauna Database	https://shell.sinica.edu.tw/	162
University of Göttingen	http://www.animalbase.uni-goettingen.de/	318

Private / shell dealers
Allspira	https://allspira.com/gallery/marine/bivalvia/	2365
Aphotomarine	https://www.aphotomarine.com/mollusc_shells_bivalves_marine.html	99
Argonauti	http://www.argonauti.org/Conchiglie/bivalvia.html	292
Bishogai Shells Database	http://bigai.world.coocan.jp/	356
Conchiglie del Mediterraneo	http://www.conchigliedelmediterraneo.it/shell.php?classe=Bivalvia	902
Conchology	https://www.conchology.be/	36134
Conquiliologistas do Brasil	http://www.conchasbrasil.org.br	487
De Donder shells	https://www.dedondershells.com/	1140
FisherCollection	https://fisherscollection.com/shop/	1157
General Shell Portal	http://www.idscaro.net/sci/index.htm	2064
New Zealand Mollusca	http://www.mollusca.co.nz/	268
Pacific Northwest Shell Club	http://www.bily.com/pnwsc/web-content/Bivalve-Identification.html	440
Shellpassion	http://www.shellspassion.com/	963
South East Queensland Shells	https://www.seqshells.com/seqbivalves.php	419
Tweed-Byron Coast, Australia	http://www.roboastra.com/	23

Code/Software

Dataset File Description

Please refer to the readme.txt files within the zip archives for detailed instructions.

images.zip: contains the images.
h5.zip: contains all experimental splits in h5 format.
meta.pkl: Pickled pandas object containing the metadata.
meta.tsv: Tab-separated file containing all metadata.
edge_lengths.csv: Comma-separated file containing the distances published by Bieler et al. (2014).
code_task_1.zip:
- Multi-level taxonomic identification task.
- Includes code for training and evaluating sequential and parallel multi-head networks, with and without genetic distances target.
code_task_2.zip:
- Zero-Shot categorization task.
- Evaluates network performance in categorizing an unseen taxon within its higher taxonomic group (e.g., an unseen genus in the correct family).
- Note: This file is large due to the extensive metadata split files.
genus_splits.zip:
- Additional metadata split files needed to train the Zero-Shot categorization task (task 2).
- Extract the archive to ./code_task_2/genus_splits
code_task_3.zip:
- Similarity learning experiment code.
- Focuses on learning visual similarity and comparing it to the genetic distances in a regression analysis.
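
The extraction step for genus_splits.zip can be sketched with Python's standard zipfile module. This is a minimal example, assuming the archive has been downloaded to the current directory and that code_task_2.zip has already been unpacked to ./code_task_2; the helper name extract_archive is hypothetical and not part of the published code.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract every member of a zip archive into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the target folder if it is missing
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Example, assuming genus_splits.zip sits in the current directory:
# extract_archive("genus_splits.zip", "./code_task_2/genus_splits")
```

Whether the archive already contains a top-level folder is not stated here, so it is worth checking the returned member names to make sure the split files end up directly under ./code_task_2/genus_splits.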
