Discovery of the closest free-living relative of the domesticated “magic mushroom” Psilocybe cubensis in Africa

Bradshaw, Alexander; Sharp, Cathy; Van Der Merwe, Breyten; Tremble, Keaton; Dentinger, Bryn

Published Jan 03, 2026 on Dryad. https://doi.org/10.5061/dryad.5x69p8df2

Data files

Jan 03, 2026 version files (468.78 MB total)

DNA_barcoding_phylogenetic_analysis.zip: 325.39 MB
ENM_and_SDM.zip: 19.52 MB
ITS_variation_and_splitstree.zip: 1.95 MB
Psilocybe_SDM-code.zip: 10.42 KB
RBB_analysis.zip: 120.57 MB
README.md: 12.06 KB
Supplementary_data.zip: 1.32 MB

Abstract

The "magic mushroom" genus Psilocybe is globally distributed and has a hotspot of diversity in the temperate regions of the Americas, particularly in Mesoamerica. However, many undersampled regions of the world are known to have endemic species but lack historical sampling. Here, we describe a new species of Psilocybe from Zimbabwe, Psilocybe ochraceocentrata sp. nov., using morphological features and multiple DNA barcode regions extracted from genomic data from type specimens across the Cubensae complex. We show that Psilocybe ochraceocentrata sp. nov. is the sister clade to Psilocybe cubensis, suggesting that the Cubensae complex is more diverse than previously thought and further expanding the known hidden diversity of Psilocybe in Africa. The geographical origin of Psilocybe cubensis is currently unknown and heavily debated. We therefore perform molecular dating and ecological niche and species distribution modeling of Psilocybe ochraceocentrata and Psilocybe cubensis to refine their possible geographic origin.

Data for

Discovery of the closest free-living relative of the domesticated “magic mushroom” Psilocybe cubensis in Africa

DOI: https://doi.org/10.5061/dryad.5x69p8df2

1. Overview

This Dryad repository contains molecular, phylogenetic, genomic, and ecological niche modeling data supporting the discovery and evolutionary placement of the closest known free-living relative of Psilocybe cubensis from sub-Saharan Africa.

All files are organized by analysis type and are documented below such that they can be interpreted and reused without access to the associated manuscript.

The dataset includes:

DNA barcode sequence alignments and phylogenetic inference outputs
Molecular clock (BEAST) input files and MCMC results
Analyses of the psilocybin biosynthetic gene cluster using reciprocal-best BLAST (RBB)
ITS sequence variation and phylogenetic network analyses
Ecological niche modeling (ENM) and species distribution modeling (SDM) visualization outputs

Intermediate files are retained to ensure analytical transparency and reproducibility.

2. External Data Accessions

DNA barcode sequences: Deposited in NCBI GenBank
Raw genomic sequencing data: Deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1159811
Type-derived molecular data: Submitted to NCBI for RefSeq designation and curation

3. Repository Structure

This repository contains the following compressed directories:

DNA_barcoding_phylogenetic_analysis.zip
ENM_and_SDM.zip
ITS_variation_and_splitstree.zip
RBB_analysis.zip
Supplementary_data.zip
Psilocybe_SDM-code.zip

Each directory is described below.

4. File and Directory Descriptions

4.1 DNA_barcoding_phylogenetic_analysis.zip

This directory contains multilocus DNA barcode alignments and phylogenetic inference outputs for species delimitation and evolutionary analyses.

Subdirectories and contents

BEAST/

Multilocus sequence alignments (FASTA / NEXUS)
.xml — BEAST configuration files specifying models, priors, and MCMC settings
.log — parameter log files recording posterior estimates across MCMC runs
.state — BEAST state files allowing MCMC continuation or inspection
.trees — posterior tree distributions sampled during MCMC
Final summarized time-calibrated tree (PDF)
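
The .log files listed above are tab-delimited tables with comment lines that begin with "#". As a minimal sketch of how a logged parameter could be summarized after discarding burn-in, the Python below parses such a table from text. The column names ("Sample", "TreeHeight"), the 10% burn-in fraction, and all values are illustrative assumptions, not taken from the deposited runs; check the header of the actual .log file before use.

```python
import csv
import io
import statistics

def summarize_beast_log(handle, column, burnin_fraction=0.1):
    """Return (mean, n_samples) for one logged parameter after discarding burn-in."""
    # Skip blank lines and '#' comment lines, then read the tab-delimited table.
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    values = [float(row[column]) for row in reader]
    start = int(len(values) * burnin_fraction)  # number of initial samples to drop
    kept = values[start:]
    return statistics.mean(kept), len(kept)

# Made-up stand-in for a .log file; column names and values are hypothetical.
EXAMPLE_LOG = "# illustrative BEAST-style log\nSample\tTreeHeight\n" + "\n".join(
    f"{i * 1000}\t{h}"
    for i, h in enumerate([10.0, 4.0, 4.0, 6.0, 6.0, 4.0, 6.0, 4.0, 6.0, 5.0])
)

mean_height, n_kept = summarize_beast_log(io.StringIO(EXAMPLE_LOG), "TreeHeight")
print(mean_height, n_kept)
```

For a real file, pass an open file handle instead of the in-memory example. Dedicated tools such as Tracer remain the more complete way to assess convergence.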

EF1a/, ITS/, RPB1/, RPB2/

Each locus-specific directory contains the output and intermediate files produced for each phylogenetic analysis, including all logs and consensus trees:

Sequence alignments (.fasta, .nex)
.iqtree — IQ-TREE output files including model selection and likelihood statistics
.contree — consensus trees inferred from bootstrap or likelihood analyses
.treefile / .tre — inferred phylogenetic trees

These files collectively allow re-analysis, model comparison, or extraction of trees and alignments for independent studies. Alignments can be used directly with alignment software such as MAFFT and phylogenetic analysis software such as IQ-TREE, and tree files (.treefile) can be visualized directly in tree viewers such as FigTree.
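
Before passing a .fasta file from these directories to downstream software, it can help to confirm that it is a true alignment, meaning every sequence has the same length. The following is a minimal standard-library sketch; the sequence names and bases in the example are invented for illustration.

```python
import io

def read_fasta(handle):
    """Parse FASTA text into {record_id: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # record ID is the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}

def is_aligned(records):
    """True when all sequences share one length (gaps included)."""
    return len({len(seq) for seq in records.values()}) <= 1

# Hypothetical two-sequence alignment; the first record is wrapped over two lines.
EXAMPLE_FASTA = ">specimen_A ITS\nACGT-ACGT\nACGT\n>specimen_B ITS\nACGTTACGTACG-\n"

records = read_fasta(io.StringIO(EXAMPLE_FASTA))
print(sorted(records), is_aligned(records))
```

Unaligned files (for example, raw barcode sequences) would return False here and need to be aligned first.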

4.2 ENM_and_SDM.zip

This directory contains final ecological niche modeling (ENM) and species distribution modeling (SDM) visualization outputs.

Temporal structure

Subdirectories correspond to specific climatic intervals, including:

Anthropocene (1979–2013)
Meghalayan, Northgrippian, Greenlandian
Younger Dryas, Bølling–Allerød, Heinrich Stadial 1
Last Glacial Maximum, Last Interglacial
Marine Isotope Stage 19
Pliocene

File types

.pdf only — finalized suitability maps and visualization plots

All ENM/SDM results are provided as static, publication-ready figures.

4.3 ITS_variation_and_splitstree.zip

This directory documents analyses of ITS sequence variation and phylogenetic network structure. Alignments can be used directly with alignment software such as MAFFT and phylogenetic analysis software such as IQ-TREE, and tree files (.treefile) can be visualized directly in tree viewers such as FigTree.

ITS_variation/

Raw ITS sequences (.fasta)
ITSx-partitioned ITS regions (.fasta)
Multiple sequence alignments (.mafft.fasta)
Concatenated alignments and partition files
Phylogenetic tree files (.treefile, .contree, .iqtree)

These files support evaluation of intra- and interspecific ITS variation.

SplitsTree/

Trimmed ITS FASTA and alignment files (.mafft.fasta) used as input to SplitsTree
Network analysis output and intermediate files generated using the SplitsTree App

These data allow reconstruction or reinterpretation of the phylogenetic networks produced with the SplitsTree App.

4.4 RBB_analysis.zip

This directory contains analyses of the psilocybin biosynthetic gene cluster.

Exonerate/

Genome assemblies (.fasta)
FASTA files for PsiD, PsiH, PsiK, PsiM, and PsiR extracted using Exonerate
Extraction used the format flag:
--ryo "%tcs\n"
Query sequences derived from NCBI assembly GCA_017499595.2

RBB/

Augustus-predicted gene models (.gff)
Reciprocal-best BLAST result files
FASTA files for PsiD, PsiH, PsiK, PsiM, PsiP, PsiR, PsiT1, and PsiT2 for each specimen.

These files support inference of orthology and biosynthetic cluster structure.
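
For readers unfamiliar with the reciprocal-best BLAST criterion, the sketch below shows the general idea. A query gene and a predicted gene model are paired only when each is the other's highest-scoring hit. It assumes BLAST tabular output in the default 12-column "outfmt 6" layout, with the bit score in the last column. The gene-model IDs (g1, g2, g3) and all scores are invented, and this is not necessarily the exact procedure used to generate the deposited result files.

```python
import csv
import io

def best_hits(handle):
    """Map each query ID to its highest-bit-score subject ID (BLAST outfmt 6)."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward, reverse):
    """Keep (query, subject) pairs whose best hits point at each other."""
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)

def _row(query, subject, bitscore):
    # One 12-column line; everything except the IDs and bit score is filler.
    return "\t".join([query, subject, "90.0", "100", "5", "0",
                      "1", "100", "1", "100", "1e-50", str(bitscore)])

# Hypothetical searches: reference genes vs. gene models, and the reverse.
FORWARD = "\n".join([_row("PsiD", "g1", 500), _row("PsiD", "g2", 200), _row("PsiK", "g3", 300)])
REVERSE = "\n".join([_row("g1", "PsiD", 500), _row("g3", "PsiM", 310), _row("g3", "PsiK", 290)])

pairs = reciprocal_best_hits(best_hits(io.StringIO(FORWARD)), best_hits(io.StringIO(REVERSE)))
print(pairs)
```

In this toy input, PsiK's best hit is g3, but g3's best hit is PsiM, so that pair is rejected; only PsiD and g1 are reciprocal.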

4.5 Supplementary_data.zip

This directory contains tabular datasets used throughout the study.

Missing data and empty cells

Empty cells represent missing or inapplicable data, not zero values
Missing values arise from unrecorded data from voucher specimens, or non-applicable variables
Users performing automated analyses should treat empty cells as missing values (for example, NA in R).
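
The same convention can be applied outside R. As a minimal sketch, assuming a worksheet has first been exported to CSV (the deposited tables are .xlsx), the Python below maps empty cells to None so they are never mistaken for zero. The column names and values in the example are placeholders.

```python
import csv
import io

def read_table(handle):
    """Read a CSV export, representing empty or whitespace-only cells as None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            key: (value.strip() if isinstance(value, str) and value.strip() else None)
            for key, value in row.items()
        })
    return rows

# Hypothetical export with one missing latitude.
EXAMPLE_CSV = "species,latitude,longitude\nPsilocybe cubensis,,-80.1\n"

rows = read_table(io.StringIO(EXAMPLE_CSV))
print(rows[0])
```

Numeric conversion is left to the caller so that a missing value stays None and is not coerced to 0.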

Subdirectories

Psilocybe_ochraceocentrata_tables.xlsx

An .xlsx workbook containing multiple worksheets
Each worksheet represents a distinct dataset (Voucher Table, SplitsTree Barcodes, Microscopic features, Genome assembly stats)

Mycoportal_2024-09-26/

Raw fungal occurrence records downloaded from MycoPortal in .xlsx
All output files associated with the MycoPortal download, including multimedia links to occurrences where applicable (multimedia.csv), the full occurrence data (occurences.csv), identification information where applicable (identifications.csv), metadata associated with the specific MycoPortal query (meta.xml), and the Ecological Metadata Language (EML) file associated with the occurrence data where applicable (eml.xml)
Fields include species name, locality, and geographic coordinates. Empty cells represent missing or inapplicable data, not zero values.

P_cubensis_occurrences_for_ENM_and_SDM.xlsx

Filtered occurrence dataset used as ENM/SDM input
Empty cells represent missing or inapplicable data, not zero values
Coordinates in decimal degrees (WGS84)
Columns include:
- species
- latitude
- longitude
- source

4.6 Psilocybe_SDM-code.zip

Contains scripts used to perform ENM and SDM analyses. All analyses were performed in R using the methodology below:

To reconstruct the natural history and potential patterns of introduction and range limits, we used occurrence data and the 19 commonly used bioclimatic variables to build ecological niche models (ENMs) to estimate climatic suitability for P. cubensis. ENMs predict the suitability of local climate as a continuous variable from 0 (unsuitable) to 1 (predicted perfect suitability). ENMs were then used to construct species distribution models (SDMs) as a binary estimate of presence or absence. Geo-coordinates of P. cubensis occurrences used for modeling were pulled from MycoPortal (accessed in September of 2025), which includes data from Mushroom Observer (https://mushroomobserver.org/) and iNaturalist (https://www.inaturalist.org) but may not be fully populated from each repository. Samples were given unique identifiers in the dataset, which included their MycoPortal identifier and holding institution for vouchered specimens, or were noted as observations from iNaturalist or Mushroom Observer (MUOB). Data points were filtered to remove those without geo-coordinates, specimens from Africa, non-wild collections, and entries with specific mentions of samples being cultivated, confiscated by police, or labeled as known cultivated strains of P. cubensis, reducing the overall dataset from 1,168 to 1,013 points (Supplementary Data). Locations of all collections and observations were plotted (Figure 1) with ggmap v. 4.0.0.
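
The filtering criteria described above can be expressed as a simple record-level predicate. The sketch below is illustrative only. The field names (latitude, longitude, continent, notes) and the keyword list are assumptions made for the example, not the columns or rules used in the authors' R scripts, and the sample records are invented.

```python
# Hypothetical keyword stems that flag non-wild material in free-text notes.
EXCLUDE_TERMS = ("cultivat", "confiscat", "police")

def keep_record(record):
    """Return True if an occurrence record passes all exclusion criteria."""
    # 1. Drop records without geo-coordinates.
    if record.get("latitude") is None or record.get("longitude") is None:
        return False
    # 2. Drop specimens from Africa.
    if record.get("continent") == "Africa":
        return False
    # 3. Drop records whose notes mention cultivation or confiscation.
    notes = (record.get("notes") or "").lower()
    return not any(term in notes for term in EXCLUDE_TERMS)

# Invented records, one per exclusion rule plus one that passes.
EXAMPLE_RECORDS = [
    {"id": "obs1", "latitude": 19.4, "longitude": -99.1, "continent": "North America", "notes": "on dung in pasture"},
    {"id": "obs2", "latitude": None, "longitude": None, "continent": "Asia", "notes": ""},
    {"id": "obs3", "latitude": -17.8, "longitude": 31.0, "continent": "Africa", "notes": ""},
    {"id": "obs4", "latitude": 29.7, "longitude": -95.4, "continent": "North America", "notes": "Cultivated strain"},
]

kept = [record["id"] for record in EXAMPLE_RECORDS if keep_record(record)]
print(kept)
```

Keyword matching on free text is crude; in practice, flagged records would likely need manual review before removal.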

Prior to constructing ENMs, we reduced the occurrence dataset to a single observation per 10-arc-minute grid cell using the function gridSample of the dismo R package v1.3-16. Thinning was conducted to reduce spatial autocorrelation of observations and model over-fitting in highly populated regions. ENMs were constructed using the filtered dataset of geo-coordinates, with the 19 bioclimatic variables at the coarsest resolution available, 10 arc-minutes (~20 km), as environmental predictors. We chose this coarse resolution to account for potential inaccuracies in GPS collection information. Not all collections possess metadata on geo-coordinate accuracy, limiting our ability to prune data points based on this metric. Thus, we chose an environmental dataset resolution of ~20 km, which allows each data point to exhibit substantial variability. ENM modeling was performed using the sdm R package v1.2-46 [49], with which we tested six of the most common modeling algorithms ("bioclim", "domain.dismo", "glm", "gam", "rf", and "svm"), using 1000 random points as “absence” points for validation; this indicated that a Random Forest (RF) algorithm was the best-performing model. The effect of variable and coarse-level geo-coordinates on the robustness of our modeling was investigated using the Pearson correlation between SDMs made with a random subset of the sampling and with the total dataset (Supplementary Figure 3). To convert the continuous random-forest ENM into a binary SDM, we adopted the True Skill Statistic (TSS) optimization approach to set a threshold of climatic suitability above which we considered P. cubensis present at a location and below which we considered it absent.
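
Two steps in the paragraph above lend themselves to a short illustration: thinning occurrences to one per 10-arc-minute cell, and choosing a suitability threshold that maximizes TSS (sensitivity + specificity - 1). The Python below is a simplified sketch, not a port of the authors' R code. It keeps the first record encountered in each cell, which may differ from how dismo's gridSample selects records, and all coordinates and suitability scores are invented.

```python
import math

def thin_to_grid(points, cell_arcmin=10):
    """Keep the first (lat, lon) point seen in each grid cell of the given size."""
    cells_per_degree = 60 / cell_arcmin
    seen = {}
    for lat, lon in points:
        cell = (math.floor(lat * cells_per_degree), math.floor(lon * cells_per_degree))
        seen.setdefault(cell, (lat, lon))  # later points in the same cell are ignored
    return list(seen.values())

def tss(presence_scores, background_scores, threshold):
    """True Skill Statistic when scores >= threshold are classed as 'present'."""
    sensitivity = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    specificity = sum(s < threshold for s in background_scores) / len(background_scores)
    return sensitivity + specificity - 1

def best_threshold(presence_scores, background_scores):
    """Return the observed score that maximizes TSS (first one wins on ties)."""
    candidates = sorted(set(presence_scores) | set(background_scores))
    return max(candidates, key=lambda t: tss(presence_scores, background_scores, t))

# Hypothetical inputs: the first two points fall in the same 10-arc-minute cell.
POINTS = [(19.40, -99.10), (19.45, -99.05), (19.60, -99.10)]
PRESENCE = [0.9, 0.8, 0.7, 0.6]    # model scores at occurrence points
BACKGROUND = [0.1, 0.2, 0.3, 0.5]  # model scores at random background points

print(thin_to_grid(POINTS))
print(best_threshold(PRESENCE, BACKGROUND))
```

Cells of a fixed angular size shrink in ground area toward the poles, so the ~20 km figure is an approximation near the equator.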

To estimate ranges through time and potential routes of introduction, we also constructed ENMs and SDMs for P. cubensis using paleo-climatic datasets for the 19 bioclimatic variables. Paleo-climatic datasets at 10 arc-minute resolution were accessed from paleoclim.org for 11 geological timespans: the Anthropocene (1979–2013), Meghalayan (4.2–0.3 ka), Northgrippian (8.326–4.2 ka), Greenlandian (11.7–8.326 ka), Younger Dryas Stadial (12.9–11.7 ka), Bølling–Allerød (14.7–12.9 ka), Heinrich Stadial 1 (17–14.7 ka), Last Glacial Maximum (~21 ka), Pleistocene Last Interglacial (LIG; ~130 ka) [56], Pleistocene MIS19 (~787 ka) [57], and Pliocene (~3.3 Ma).

Scripts are derived from:
https://github.com/KeatPorcini/Psilocybe_SDM

5. Definitions and Abbreviations

ITS — Internal Transcribed Spacer
ENM — Ecological niche modeling
SDM — Species distribution modeling
RBB — Reciprocal-best BLAST
LGM — Last Glacial Maximum
MIS — Marine Isotope Stage

6. Data Sources

Public data were obtained from:

NCBI GenBank
NCBI Sequence Read Archive (SRA)
MycoPortal
Mushroom Observer
iNaturalist
PaleoClim
BioClim2

7. Notes on Reuse

Both final results and intermediate analysis files are provided to maximize transparency and reproducibility. Users may reuse alignments, trees, modeling outputs, or tabular data independently of the original study.