Neo-taphonomic analysis of the Misiam leopard lair

Cite this dataset

Domínguez-Rodrigo, Manuel (2022). Neo-taphonomic analysis of the Misiam leopard lair [Dataset]. Dryad.


The dataset presented here contains the %MAU data for the selected hyena-made and leopard-made faunal assemblages with which the Misiam assemblage is compared. Misiam is a recently discovered modern faunal accumulation at Olduvai Gorge (Tanzania), interpreted as a palimpsest resulting from the action of leopards (the main transporting agents) and hyenas (secondary scavengers). It is the first reported open-air leopard-made faunal accumulation. Defining the anatomical and taphonomic characteristics of such an assemblage is important for the interpretation of prehistoric faunal assemblages created by carnivores. It is also relevant for modern ecological studies. In this particular case, the bulk of the assemblage is composed of wildebeests. Wildebeests are not usually targeted by leopards; however, their seasonal abundance during the wildebeest migration on the plains adjacent to Olduvai Gorge prompts this exceptional, highly specialized behavior in usually eclectic leopards. In the present work, a thorough taphonomic analysis is carried out and the main taxonomic, anatomical and taphonomic characteristics of this felid-hyenid modified assemblage are described. The analytical approach adopted uses the data presented here.


The Misiam data were collected in the field. The bone assemblage lay on the surface of a densely vegetated ravine. Bones were collected from the surface, and in one particular area an excavation was made to retrieve sub-surface bones. To compare skeletal profiles in felid and hyenid assemblages, we used some of the most representative assemblages in the literature. For spotted hyena dens, we used data from the Koobi Fora Hyena Den 1 (KFHD1), the Amboseli den, the Maasai Mara den, and the Syokimau den, all of them in Kenya, and the Eyasi (Kisima Ngeda) Hyena Den 2 (KND2) in Tanzania. We also chose these assemblages because they are either dominated by size 3 carcasses or these make up a significant part of the assemblage.


When comparing long bone shaft breakage patterns, we also used additional hyena-made assemblages: Dumali, Heraide, Yangula Ari, Oboley (spotted hyenas), Datagabou (striped hyena, Djibouti), and Uniab (brown hyena, Namibia). These assemblages are almost completely dominated by very small fauna (Capra hircus), and several of them have significantly smaller sample sizes than the hyena dens mentioned above.


The leopard lairs used for comparison are Portsmut and Hakos River (Namibia), and WU/BA-001 (South Africa). Portsmut and Hakos River show a low density of remains, which were probably also modified by porcupines or other agents. The remains belonging to larger animals show an interesting contrast with those documented in hyena dens: the presence of axial and compact bones is high. These latter bones are also well represented in smaller carcasses. This characteristic is more marked in WU/BA-001, the least altered leopard lair documented to date. This lair was monitored for 7 years.


All the comparative assemblages were transformed into %MAU to account for differential inter-assemblage quantitative representation. First, they were analyzed using Generalized Low Rank Models (GLRM) as an exploratory method. Then, we used Uniform Manifold Approximation and Projection (UMAP) to classify leopard and hyena bone assemblages, especially according to each feature. Lastly, we used a cluster analysis with a variance-dependent phylogenetic tree to show the actual distances among all the assemblages compared.
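As a rough illustration of the %MAU transformation, the sketch below follows the standard zooarchaeological convention: MAU is the minimum number of elements (MNE) divided by the number of times that element occurs in one complete skeleton, and %MAU rescales each MAU by the largest MAU in the assemblage. It is written in Python for brevity, not in the R used for the original analysis, and the MNE counts are hypothetical values invented for the example, not taken from the dataset.

```python
# Illustrative sketch only: the MNE counts below are hypothetical, not Misiam data.

# Number of times each element occurs in one complete bovid skeleton.
ELEMENTS_PER_SKELETON = {
    "humerus": 2,
    "femur": 2,
    "tibia": 2,
    "scapula": 2,
    "cranium": 1,
}


def percent_mau(mne, per_skeleton):
    """Convert MNE counts into %MAU values scaled to the best-represented element."""
    # MAU: MNE divided by the element's frequency in a single skeleton.
    mau = {element: mne[element] / per_skeleton[element] for element in mne}
    top = max(mau.values())
    if top == 0:
        raise ValueError("assemblage contains no elements")
    # %MAU: each MAU expressed as a percentage of the largest MAU.
    return {element: 100.0 * value / top for element, value in mau.items()}


hypothetical_mne = {"humerus": 8, "femur": 6, "tibia": 10, "scapula": 4, "cranium": 3}
result = percent_mau(hypothetical_mne, ELEMENTS_PER_SKELETON)
for element, value in result.items():
    print(f"{element}: {value:.1f}")
```

Because every assemblage is rescaled to its own best-represented element, assemblages of very different sizes end up on the same 0 to 100 scale.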


GLRM are a series of methods for dimensionality reduction that use several loss function types and can implement regularization functions. Principal component analysis (PCA) is based on orthogonal projections of linear relationships; where relationships are non-linear, PCA underperforms compared to other, more flexible methods. GLRM decomposes a table into two distinct matrices, X and Y. X contains the same number of rows as the original table, but all variables are condensed into k factors. Y has k rows and the same number of columns as features (i.e., variables) in the original table. Each row of Y is an archetypal feature derived from the columns (i.e., variables) of the original table. Each row of X corresponds to a row of the original table projected into this reduced-dimension feature space. Data are compressed by the low-rank representation derived from the reduction to k features. An advantage of GLRM over PCA is that it can handle mixed datasets containing numeric, categorical and Boolean data. GLRM admits several types of loss functions: Huber, Poisson, quadratic, periodic or hinge. It also allows the use of regularization functions, including Lasso, Ridge, OneSparse, Simplex, UnitOneSparse, and quadratic. Loss functions are used to select the optimal archetypal values. Regularization is used to limit the archetypal values in X and Y, which affects how the model handles negative data, multicollinearity and overfitting. In the present analysis, GLRM was performed with the “h2o” R library.
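To make the X and Y decomposition concrete, the following is a minimal Python sketch of a low-rank model with a quadratic loss and a ridge (quadratic) penalty, fitted by plain alternating gradient steps. It is not the solver used by the “h2o” library and not the authors' code; the function names, the step size, the penalty weight gamma and the small input table are all assumptions chosen for illustration.

```python
import random


def predict(X, Y, i, j):
    """Entry (i, j) of the low-rank reconstruction X times Y."""
    return sum(X[i][f] * Y[f][j] for f in range(len(Y)))


def objective(A, X, Y, gamma):
    """Quadratic loss plus ridge penalty on both factor matrices."""
    fit = sum((A[i][j] - predict(X, Y, i, j)) ** 2
              for i in range(len(A)) for j in range(len(A[0])))
    penalty = sum(v * v for row in X for v in row) + sum(v * v for row in Y for v in row)
    return fit + gamma * penalty


def fit_glrm(A, k, gamma=0.01, step=0.05, n_iter=1000, seed=0):
    """Factor an m x n table A into X (m x k) and Y (k x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    X = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]
    Y = [[rng.gauss(0, 0.1) for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # Gradient step on X with Y held fixed.
        R = [[predict(X, Y, i, j) - A[i][j] for j in range(n)] for i in range(m)]
        X = [[X[i][f] - step * 2 * (sum(R[i][j] * Y[f][j] for j in range(n)) + gamma * X[i][f])
              for f in range(k)] for i in range(m)]
        # Gradient step on Y with the updated X held fixed.
        R = [[predict(X, Y, i, j) - A[i][j] for j in range(n)] for i in range(m)]
        Y = [[Y[f][j] - step * 2 * (sum(R[i][j] * X[i][f] for i in range(m)) + gamma * Y[f][j])
              for j in range(n)] for f in range(k)]
    return X, Y


# Hypothetical table: 4 assemblages (rows) by 3 features (columns), scaled to 0-1.
A = [
    [1.0, 0.8, 0.1],
    [0.9, 0.7, 0.2],
    [0.2, 0.1, 1.0],
    [0.1, 0.2, 0.9],
]
X0, Y0 = fit_glrm(A, k=2, n_iter=0)   # random starting point, no updates
X, Y = fit_glrm(A, k=2)               # same starting point, fitted
start, end = objective(A, X0, Y0, 0.01), objective(A, X, Y, 0.01)
print(f"objective before: {start:.3f}, after: {end:.3f}")
```

X has one row per assemblage and k columns, and Y has k rows and one column per feature, matching the shapes described above. Swapping the squared error for another loss, or the ridge term for another regularizer, changes only the gradient expressions.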


UMAP is a non-linear dimension-reduction method based on finding inter-case distances in a low-dimensional feature space. The key advantage of UMAP over other non-linear dimension-reduction methods, like t-distributed stochastic neighbor embedding (t-SNE), is that distances are generated along a “manifold”. A manifold is an n-dimensional geometric shape constituted of the path(s) among the points. Every point is referenced according to a small two-dimensional neighborhood around it. The UMAP algorithm searches for a multi-dimensional space delimited by the location of points. UMAP uses a nearest-neighbor approach, eventually connecting all the points along its search regions. This forces a uniform distribution of points. The distances of points along this manifold are then derived through Euclidean distances. Several optimization methods can be used to reproduce inter-point distances. For the latter process, the UMAP approach that we used is based on a cross-entropy loss function. For the UMAP analysis, we used the “umap” R library. We also used a search grid combining ranges of values for number of neighbors, minimal distance between neighbors, distance metric, and number of epochs (i.e., iterations of the optimization process).
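The search grid described above can be sketched as the Cartesian product of the candidate values for each setting. The Python snippet below only enumerates the combinations; it does not run UMAP. The parameter names mirror common UMAP settings, and the candidate values are hypothetical, since the ranges actually searched are not stated here.

```python
from itertools import product

# Hypothetical candidate values; the ranges actually searched are not given in the text.
grid = {
    "n_neighbors": [3, 5, 10],
    "min_dist": [0.01, 0.1, 0.5],
    "metric": ["euclidean", "manhattan", "cosine"],
    "n_epochs": [200, 500],
}


def expand_grid(grid):
    """Return one dict per combination of parameter values."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


configs = expand_grid(grid)
print(len(configs), "configurations")
print(configs[0])
```

Each resulting dictionary would then be passed to a UMAP implementation, and the embeddings compared on whatever criterion the analyst chooses.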


Finally, a hierarchical cluster analysis, using a Euclidean distance matrix on the %MAU dataset, was carried out. The method used was “average” linkage, which defines the distance between two clusters as the average distance between their points. The combination of the three methods was used to study agent-specific variability in inter-assemblage element representation.
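A minimal Python sketch of agglomerative clustering with average linkage on Euclidean distances is given below. It is a naive illustration of the technique, not the R routine used in the analysis, and the four assemblage names and their %MAU vectors are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows):
    """Cluster named vectors; return the merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in rows]
    merges = []

    def dist(c1, c2):
        # Average linkage: mean of all pairwise distances between the two clusters.
        total = sum(euclidean(rows[a], rows[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters at each step.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical %MAU vectors (three elements each), invented for illustration.
assemblages = {
    "lair_A": [100, 80, 60],
    "lair_B": [100, 75, 65],
    "den_A": [100, 20, 10],
    "den_B": [90, 25, 5],
}
merges = average_linkage(assemblages)
for left, right, d in merges:
    print(left, "+", right, f"at distance {d:.2f}")
```

The order of merges and the distances at which they occur are what a dendrogram displays; assemblages that merge early have the most similar element representation.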


Usage notes

Data are in txt and xls formats. The analysis code was written in R.