Taxonomy is the very first step of most biodiversity studies, but how confident can we be in the taxonomic-systematic exercise? One may hypothesise that the more material is available, the better the taxonomic delineation, because the description of morphological variability becomes more accurate. Just as rarefaction curves assess how well taxonomic diversity is known as a function of sampling effort, we aim to test the impact of sampling effort on species delineation by subsampling a given assemblage. To do so, we use an abundant and morphologically diverse conodont fossil record. Within the assemblage, we first recognize four well-established morphospecies, but about 80% of the specimens share diagnostic characters of these morphospecies. We quantify these diagnostic characters on the sample using geometric morphometrics, and assess the number of morphometric groups, i.e. morphospecies, using ordination and cluster analyses. Then we gradually subsample the assemblage in two ways (randomly and by mimicking taxonomist work) and repeat the ‘ordination + clustering’ protocol to appraise how the number of clusters changes with sampling effort. We observe that the number of delineated morphospecies decreases as the number of specimens increases, whatever the subsampling method, resulting mostly in fewer morphospecies than expected. This rather counter-intuitive influence of sampling effort on species delineation highlights the complexity of taxonomic work. It indicates that new morphotaxa should not be erected based on small samples, and encourages researchers to extensively illustrate, measure, and quantitatively compare their material to better constrain the morphological variability of a clade, and so to better characterize and delineate morphospecies.

Please refer to the Material and Methods part of the publication Guenser et al. "When less is more and more is less: the impact of sampling effort on species delineation".

R code (Guenser et al. When less is more and more is less_R code.txt)

Copy and paste the content of the file into R.

This file contains the code used for:

the Multi-Factorial Analysis
the Broken-Stick Model
the estimation of measurement error (supplementary figure FIG-S6 of the publication)
the clustering
the sub-sampling protocol (random and systematist-like)
the evolution of the number of clusters in relation to sampling effort
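
For readers unfamiliar with the broken-stick model listed above, the following is a minimal Python sketch of the general technique. It is not a transcription of the authors' R code, and the function names are hypothetical. It assumes the eigenvalues are sorted in decreasing order and keeps the leading components whose observed share of variance exceeds the broken-stick expectation.

```python
def broken_stick(p):
    """Expected share of variance of each of p components under the
    broken-stick model: b_k = (1/p) * sum_{i=k}^{p} 1/i, for k = 1..p."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def n_components_to_keep(eigenvalues):
    """Count the leading components whose observed share of variance is
    larger than the broken-stick expectation.

    Assumption: eigenvalues are positive and sorted in decreasing order.
    Counting stops at the first component that does not exceed its expectation.
    """
    total = sum(eigenvalues)
    observed = [value / total for value in eigenvalues]
    expected = broken_stick(len(eigenvalues))
    kept = 0
    for obs, exp in zip(observed, expected):
        if obs <= exp:
            break
        kept += 1
    return kept
```

Calling n_components_to_keep on the eigenvalues returned by an ordination gives the number of axes that would be retained under this rule. Other retention rules exist, so the result should be compared with the choice made in the publication.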

Dataset (data_for_MFA.csv)

These are the variables used for the Multi-Factorial Analysis. Please refer to the Material and Methods part of the publication Guenser et al. "When less is more and more is less: the impact of sampling effort on species delineation" for details on the acquisition of these data.

Column 1 is the taxonomy of the specimens.

Column 2 is the Height/Length ratio.

Column 3 is the Mean/Standard deviation ratio from the angle measurement.

Column 4 is the Skewness/Kurtosis ratio from the angle measurement.

Columns 5 to 7 are the discrete characters.

Columns 8 to 67 are the aligned coordinates after Generalised Procrustes superimposition for 2D Geometric Morphometrics.

Columns 68 to 142 are the aligned coordinates after Generalised Procrustes superimposition for 3D Geometric Morphometrics (see Berio and Bayles 2020 for 3D processes).
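
The column layout described above can be turned into named blocks when the file is read outside R. The following Python sketch is only an illustration: the names LAYOUT, split_row and read_specimens are hypothetical, and it assumes (without having checked the file) that data_for_MFA.csv is comma-delimited, has one header row, and has no extra row-name column. Values are kept as strings because the encoding of the discrete characters is not specified here.

```python
import csv

# 1-based column ranges from the description above, as 0-based Python slices.
LAYOUT = {
    "ratios": slice(1, 4),       # columns 2 to 4: the three ratios
    "discrete": slice(4, 7),     # columns 5 to 7: discrete characters
    "coords_2d": slice(7, 67),   # columns 8 to 67: aligned 2D coordinates
    "coords_3d": slice(67, 142), # columns 68 to 142: aligned 3D coordinates
}


def split_row(row):
    """Split one 142-value row into named blocks; column 1 is the taxonomy."""
    if len(row) != 142:
        raise ValueError(f"expected 142 columns, got {len(row)}")
    record = {"taxonomy": row[0]}
    for name, columns in LAYOUT.items():
        record[name] = list(row[columns])
    return record


def read_specimens(path="data_for_MFA.csv", has_header=True):
    """Read the file and return one dictionary per specimen.

    Assumption: comma-delimited file with an optional single header row.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader)  # skip the header row
        return [split_row(row) for row in reader]
```

The order of the x, y and z values inside the coordinate blocks is not stated on this page, so the sketch leaves each block flat rather than reshaping it into landmarks.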

Data from: When less is more and more is less: the impact of sampling effort on species delineation