Discovery of the closest free-living relative of the domesticated “magic mushroom” Psilocybe cubensis in Africa
Data files
Version of Jan 03, 2026; total size 468.78 MB
- DNA_barcoding_phylogenetic_analysis.zip (325.39 MB)
- ENM_and_SDM.zip (19.52 MB)
- ITS_variation_and_splitstree.zip (1.95 MB)
- Psilocybe_SDM-code.zip (10.42 KB)
- RBB_analysis.zip (120.57 MB)
- README.md (12.06 KB)
- Supplementary_data.zip (1.32 MB)
Abstract
The "magic mushroom" genus Psilocybe is globally distributed and has a hotspot of diversity in the temperate regions of the Americas, particularly in Mesoamerica. However, many undersampled regions of the world are known to harbor endemic species but lack historical sampling. Here, we describe a new species of Psilocybe from Zimbabwe, Psilocybe ochraceocentrata sp. nov., using morphological features and multiple DNA barcode regions extracted from genomic data from type specimens across the Cubensae complex. We show that Psilocybe ochraceocentrata sp. nov. is the sister clade to Psilocybe cubensis, suggesting that the Cubensae complex is more diverse than previously thought and further expanding the known hidden diversity of Psilocybe from Africa. The geographical origin of Psilocybe cubensis is currently unknown and heavily debated. Here, we perform molecular dating and ecological niche and species distribution modeling of Psilocybe ochraceocentrata and Psilocybe cubensis to refine their possible geographic origins.
Data for: Discovery of the closest free-living relative of the domesticated “magic mushroom” Psilocybe cubensis in Africa
DOI: https://doi.org/10.5061/dryad.5x69p8df2
1. Overview
This Dryad repository contains molecular, phylogenetic, genomic, and ecological niche modeling data supporting the discovery and evolutionary placement of the closest known free-living relative of Psilocybe cubensis from sub-Saharan Africa.
All files are organized by analysis type and are documented below such that they can be interpreted and reused without access to the associated manuscript.
The dataset includes:
- DNA barcode sequence alignments and phylogenetic inference outputs
- Molecular clock (BEAST) input files and MCMC results
- Analyses of the psilocybin biosynthetic gene cluster using reciprocal-best BLAST (RBB)
- ITS sequence variation and phylogenetic network analyses
- Ecological niche modeling (ENM) and species distribution modeling (SDM) visualization outputs
Intermediate files are retained to ensure analytical transparency and reproducibility.
2. External Data Accessions
- DNA barcode sequences: deposited in NCBI GenBank.
- Raw genomic sequencing data: deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1159811.
- Type-derived molecular data: submitted to NCBI for RefSeq designation and curation.
3. Repository Structure
This repository contains the following compressed directories:
- DNA_barcoding_phylogenetic_analysis.zip
- ENM_and_SDM.zip
- ITS_variation_and_splitstree.zip
- RBB_analysis.zip
- Supplementary_data.zip
- Psilocybe_SDM-code.zip
Each directory is described below.
4. File and Directory Descriptions
4.1 DNA_barcoding_phylogenetic_analysis.zip
This directory contains multilocus DNA barcode alignments and phylogenetic inference outputs for species delimitation and evolutionary analyses.
Subdirectories and contents
BEAST/
- Multilocus sequence alignments (FASTA / NEXUS)
- .xml: BEAST configuration files specifying models, priors, and MCMC settings
- .log: parameter log files recording posterior estimates across MCMC runs
- .state: BEAST state files allowing MCMC continuation or inspection
- .trees: posterior tree distributions sampled during MCMC
- Final summarized time-calibrated tree (PDF)
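The .log files are tab-delimited MCMC traces that are normally inspected in a tool such as Tracer. As a minimal sketch, assuming the usual BEAST trace layout ("#" comment lines, one header row whose first column is the state counter, then one numeric row per sampled state), the posterior mean of each parameter after a burn-in can be computed with the Python standard library. The 10% burn-in is a hypothetical default, not the setting used in the study.

```python
import csv
import io
import statistics


def summarize_beast_log(text, burnin_fraction=0.1):
    """Return {parameter: posterior mean} after discarding an initial burn-in.

    Assumes a tab-delimited BEAST trace: '#' comment lines, one header row,
    then one numeric row per sampled MCMC state. The 10% burn-in default is
    illustrative only, not the setting used in the study.
    """
    # Drop comments and blank lines; strip trailing tabs that some loggers emit.
    lines = [ln.rstrip("\t") for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter="\t")
    header = next(reader)
    rows = [[float(v) for v in row] for row in reader]
    kept = rows[int(len(rows) * burnin_fraction):]  # discard burn-in samples
    # Column 0 is the MCMC state counter, so it is not summarised.
    return {name: statistics.mean(row[i] for row in kept)
            for i, name in enumerate(header) if i > 0}
```

For a file on disk, pass its contents, e.g. `summarize_beast_log(open(path).read())`. Convergence diagnostics such as effective sample size are deliberately left out of this sketch.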
EF1a/, ITS/, RPB1/, RPB2/
Each locus-specific directory contains the output and intermediate files produced by each phylogenetic analysis, including all log files and consensus trees:
- Sequence alignments (.fasta, .nex)
- .iqtree: IQ-TREE output files including model selection and likelihood statistics
- .contree: consensus trees inferred from bootstrap or likelihood analyses
- .treefile / .tre: inferred phylogenetic trees
These files collectively allow re-analysis, model comparison, or extraction of trees and alignments for independent studies. Alignments can be used directly with alignment software such as MAFFT or phylogenetic software such as IQ-TREE, and tree files (.treefile) can be visualized directly in tree viewers such as FigTree.
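Before reusing an alignment, it can help to confirm that every sequence has the same length. The following is a minimal Python sketch that handles FASTA only (not NEXUS); the function names are hypothetical and not part of this repository.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered {id: sequence} dict (id = first word of header)."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # store the previous record
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def check_alignment(records):
    """Return (n_taxa, n_sites); raise ValueError if the records are not aligned."""
    if not records:
        raise ValueError("no sequences found")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return len(records), lengths.pop()
```

A ValueError here usually means the file holds unaligned sequences rather than an alignment.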
4.2 ENM_and_SDM.zip
This directory contains final ecological niche modeling (ENM) and species distribution modeling (SDM) visualization outputs.
Temporal structure
Subdirectories correspond to specific climatic intervals, including:
- Anthropocene (1979–2013)
- Meghalayan, Northgrippian, Greenlandian
- Younger Dryas, Bølling–Allerød, Heinrich Stadial 1
- Last Glacial Maximum, Last Interglacial
- Marine Isotope Stage 19
- Pliocene
File types
- .pdf only: finalized suitability maps and visualization plots
All ENM/SDM results are provided as static, publication-ready figures.
4.3 ITS_variation_and_splitstree.zip
This directory documents analyses of ITS sequence variation and phylogenetic network structure. Alignments can be used directly with alignment software such as MAFFT or phylogenetic software such as IQ-TREE, and tree files (.treefile) can be visualized directly in tree viewers such as FigTree.
ITS_variation/
- Raw ITS sequences (.fasta)
- ITSx-partitioned ITS regions (.fasta)
- Multiple sequence alignments (.mafft.fasta)
- Concatenated alignments and partition files
- Phylogenetic tree files (.treefile, .contree, .iqtree)
These files support evaluation of intra- and interspecific ITS variation.
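The concatenated alignments and partition files can, in principle, be regenerated from per-region alignments. This minimal Python sketch assumes each ITSx region has already been aligned (e.g. with MAFFT) and loaded as a {taxon: sequence} dict. It pads taxa missing from a region with gaps and emits a NEXUS "sets" block of the kind IQ-TREE accepts as a partition file. The region names are hypothetical, and the partition files in this repository may use a different layout.

```python
def concatenate(partitions):
    """Build a supermatrix and a NEXUS charset block from per-region alignments.

    partitions: list of (region_name, {taxon: aligned_sequence}) pairs, in the
    order the regions should appear. Taxa absent from a region get gaps.
    Returns (supermatrix as {taxon: sequence}, NEXUS 'sets' block as text).
    """
    taxa = []
    for _, aln in partitions:
        for taxon in aln:
            if taxon not in taxa:
                taxa.append(taxon)  # keep first-seen taxon order
    matrix = {taxon: [] for taxon in taxa}
    charsets, start = [], 1
    for name, aln in partitions:
        width = len(next(iter(aln.values())))  # all rows in one alignment share a width
        for taxon in taxa:
            matrix[taxon].append(aln.get(taxon, "-" * width))
        end = start + width - 1
        charsets.append(f"    charset {name} = {start}-{end};")  # 1-based, inclusive
        start = end + 1
    nexus = "#nexus\nbegin sets;\n" + "\n".join(charsets) + "\nend;\n"
    return {taxon: "".join(parts) for taxon, parts in matrix.items()}, nexus
```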
SplitsTree/
- Trimmed ITS FASTA and alignment files (.mafft.fasta) used as input to SplitsTree
- Network analysis output and intermediate files generated using SplitsTree App
These data allow reconstruction or reinterpretation of the phylogenetic networks inferred with SplitsTree App.
4.4 RBB_analysis.zip
This directory contains analyses of the psilocybin biosynthetic gene cluster.
Exonerate/
- Genome assemblies (.fasta)
- FASTA files for PsiD, PsiH, PsiK, PsiM, and PsiR extracted using Exonerate
- Extraction used the format flag: --ryo "%tcs\n"
- Query sequences derived from NCBI assembly GCA_017499595.2
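The extraction step can be reproduced by running Exonerate with the documented --ryo "%tcs\n" flag, which prints the target coding sequence for each hit. This Python sketch only builds the argument list. The alignment model (protein2genome), the --showalignment/--showvulgar switches, and the file names are assumptions; the right model depends on whether the queries are protein or nucleotide sequences.

```python
import shlex


def exonerate_command(query_fasta, target_fasta, model="protein2genome"):
    """Build an Exonerate argument list that reports only the target coding sequence.

    --ryo "%tcs\\n" is the format flag documented above. The alignment model and
    the --showalignment/--showvulgar switches are assumptions for illustration.
    """
    return [
        "exonerate",
        "--model", model,          # assumed; depends on the query sequence type
        "--query", query_fasta,    # e.g. one Psi gene from GCA_017499595.2 (hypothetical file)
        "--target", target_fasta,  # a specimen genome assembly (.fasta)
        "--ryo", r"%tcs\n",        # print the target coding sequence, then a newline
        "--showalignment", "no",   # suppress the default alignment text
        "--showvulgar", "no",      # suppress the default vulgar line
    ]


def as_shell(cmd):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

The list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. The raw output may still contain Exonerate's own header and completion lines, which would need to be filtered before writing FASTA.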
RBB/
- Augustus-predicted gene models (.gff)
- Reciprocal-best BLAST result files
- FASTA files for PsiD, PsiH, PsiK, PsiM, PsiP, PsiR, PsiT1, and PsiT2 for each specimen
These files support inference of orthology and biosynthetic cluster structure.
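Reciprocal-best BLAST treats two genes as putative orthologs when each is the other's top hit. A minimal Python sketch, assuming both searches were saved in BLAST+ tabular format (-outfmt 6, bitscore in column 12) and ranking hits by bitscore, is shown below. The exact BLAST programs, output format, and cutoffs used for the files in RBB/ are not documented here, and ties are broken by file order.

```python
def best_hits(tab_text):
    """Map each query to its highest-bitscore subject in BLAST -outfmt 6 output."""
    best = {}
    for line in tab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)  # first hit wins on ties
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a_gene, b_gene) pairs that are each other's best hit."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```

In this setting, `a_vs_b` could be reference Psi proteins searched against a specimen's Augustus-predicted proteins, and `b_vs_a` the reverse search.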
4.5 Supplementary_data.zip
This directory contains tabular datasets used throughout the study.
Missing data and empty cells
- Empty cells represent missing or inapplicable data, not zero values
- Missing values arise from data not recorded for voucher specimens or from variables that do not apply
- Users performing automated analyses should treat empty cells as missing values (NA in R), not as zeros
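Outside R, the same rule can be applied when reading the CSV exports. This standard-library Python sketch maps blank cells to None instead of zero; the function name is hypothetical. pandas' read_csv and read_excel already treat empty cells as NaN by default.

```python
import csv
import io


def read_table(text):
    """Read CSV text into dicts, turning empty cells into None (missing), never 0."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # DictReader yields None for fields absent from a short row; treat those
        # the same as blank cells. Surplus fields (key None) are ignored.
        rows.append({key: (value if value is not None and value.strip() else None)
                     for key, value in row.items() if key is not None})
    return rows
```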
Files and subdirectories
Psilocybe_ochraceocentrata_tables.xlsx
- .xlsx workbook containing multiple worksheets
- Each worksheet represents a distinct dataset (Voucher Table, SplitsTree Barcodes, Microscopic features, Genome assembly stats)
Mycoportal_2024-09-26/
- Raw fungal occurrence records downloaded from MycoPortal in .xlsx
- All output files associated with the MycoPortal data download, including multimedia links to occurrences where applicable (multimedia.csv), total occurrence data (occurences.csv), identification information where applicable (identifications.csv), metadata for the specific MycoPortal query (meta.xml), and the Ecological Metadata Language (EML) record associated with the occurrence data where applicable (eml.xml)
- Fields include species name, locality, and geographic coordinates
- Empty cells represent missing or inapplicable data, not zero values
P_cubensis_occurrences_for_ENM_and_SDM.xlsx
- Filtered occurrence dataset used as ENM/SDM input
- Empty cells represent missing or inapplicable data, not zero values
- Coordinates in decimal degrees (WGS84)
- Columns include: species, latitude, longitude, source
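A quick sanity check for reuse is to confirm that every row carries usable decimal-degree coordinates. This Python sketch assumes rows are dicts keyed by the column names listed above (for example as produced by csv.DictReader); the helper names are hypothetical.

```python
def valid_wgs84(lat, lon):
    """True if lat/lon parse as numbers within WGS84 decimal-degree bounds."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False  # empty, None, or non-numeric cell
    # NaN and infinities fail these comparisons, so they are rejected too.
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def clean_occurrences(rows):
    """Keep rows whose 'latitude' and 'longitude' columns hold valid coordinates."""
    return [r for r in rows if valid_wgs84(r.get("latitude"), r.get("longitude"))]
```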
4.6 Psilocybe_SDM-code.zip
This directory contains the scripts used to perform the ENM and SDM analyses. All analyses were performed in R using the methodology below:
To reconstruct the natural history, potential patterns of introduction, and range limits of P. cubensis, we used occurrence data and the 19 commonly used bioclimatic variables to build ecological niche models (ENMs) estimating climatic suitability for P. cubensis. ENMs predict the suitability of local climate as a continuous variable from 0 (unsuitable) to 1 (predicted perfect suitability). ENMs were then used to construct species distribution models (SDMs) as binary estimates of presence or absence. Geo-coordinates of P. cubensis occurrences used for modeling were pulled from MycoPortal entries, accessed in September 2025, which include data from Mushroom Observer (https://mushroomobserver.org/) and iNaturalist (https://www.inaturalist.org), although MycoPortal may not include every record from each repository. Samples were given unique identifiers in the dataset, which included their MycoPortal identifier and holding institution for vouchered specimens, or a note that they were observations from iNaturalist or Mushroom Observer (MUOB). Data points were filtered to remove those without geo-coordinates, specimens from Africa, non-wild collections, and entries explicitly describing samples as cultivated, confiscated by police, or known cultivated strains of P. cubensis, reducing the dataset from 1,168 to 1,013 points (Supplementary Data). Locations of all collections and observations were plotted (Figure 1) with ggmap v. 4.0.0.
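The exclusion rules described above could be scripted roughly as follows. This Python sketch is illustrative only: the Darwin Core-style field names (decimalLatitude, decimalLongitude, continent, occurrenceRemarks, habitat, substrate) and the keyword list are assumptions, and the text does not say whether the original filtering was scripted or done by manual review.

```python
# Hypothetical keyword list derived from the exclusion criteria described above.
EXCLUDE_TERMS = ("cultivated", "confiscated", "police", "strain")

# Hypothetical free-text fields to search; adjust to the real export columns.
TEXT_FIELDS = ("occurrenceRemarks", "habitat", "substrate")


def keep_record(rec):
    """Return True if an occurrence record (a dict) passes the exclusion rules."""
    lat, lon = rec.get("decimalLatitude"), rec.get("decimalLongitude")
    if lat in (None, "") or lon in (None, ""):
        return False  # no geo-coordinates
    if (rec.get("continent") or "").strip().lower() == "africa":
        return False  # specimens from Africa were excluded
    notes = " ".join(str(rec.get(f) or "") for f in TEXT_FIELDS).lower()
    # Drop entries whose free text mentions cultivation, confiscation, or strains.
    return not any(term in notes for term in EXCLUDE_TERMS)
```

Keyword matching like this is coarse; flagged records would still be worth checking by eye before removal.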
Prior to constructing ENMs, we reduced the occurrence dataset to a single observation per 10-arc-minute grid cell using the gridSample function of the dismo R package v1.3-16. Thinning was conducted to reduce spatial autocorrelation among observations and model over-fitting in densely sampled regions. ENMs were constructed using the filtered geo-coordinates and the 19 bioclimatic variables as environmental predictors at the coarsest resolution available, 10 arc-minutes (~20 km). We chose this coarse resolution to account for potential inaccuracies in GPS collection information: not all collections have metadata on geo-coordinate accuracy, which limited our ability to prune data points on this basis. A ~20 km grid therefore accommodates substantial positional uncertainty in individual data points. ENM modeling was performed using the sdm R package v1.2-46 [49], with which we tested six of the most common modeling algorithms ("bioclim", "domain.dismo", "glm", "gam", "rf", and "svm"), using 1000 random points as “absence” points for validation; this comparison indicated that a Random Forest (RF) algorithm was the best-performing model. The effect of variable, coarse-level geo-coordinate precision on the robustness of our models was investigated using Pearson correlations between SDMs built from a random subset of the data and from the total dataset (Supplementary Figure 3). To convert the continuous random-forest ENM into a binary SDM, we adopted the True Skill Statistic (TSS) optimization approach to set a climatic suitability threshold above which a location was considered to have P. cubensis present, and below which it was considered absent.
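The two numerical ideas in this paragraph, one point per grid cell and a TSS-optimal threshold, can be illustrated outside R. This Python sketch is not dismo::gridSample or the sdm package. It assumes a global longitude/latitude grid anchored at (-180, -90), which may not match the raster gridSample used. It keeps one randomly chosen point per cell, then scans the observed suitability values as candidate thresholds, maximizing TSS = sensitivity + specificity - 1.

```python
import math
import random

RES_DEG = 10 / 60  # 10 arc-minutes expressed in decimal degrees


def grid_thin(points, res=RES_DEG, seed=1):
    """Keep one randomly chosen (lon, lat) point per res x res grid cell."""
    rng = random.Random(seed)  # seeded so the thinning is reproducible
    cells = {}
    for lon, lat in points:
        key = (math.floor((lon + 180) / res), math.floor((lat + 90) / res))
        cells.setdefault(key, []).append((lon, lat))
    return [rng.choice(members) for _, members in sorted(cells.items())]


def best_tss_threshold(presence_scores, absence_scores):
    """Return (threshold, TSS) maximising sensitivity + specificity - 1.

    A site is called 'present' when its predicted suitability >= threshold.
    """
    best = (None, -1.0)
    for t in sorted(set(presence_scores) | set(absence_scores)):
        sens = sum(s >= t for s in presence_scores) / len(presence_scores)
        spec = sum(s < t for s in absence_scores) / len(absence_scores)
        tss = sens + spec - 1
        if tss > best[1]:  # keep the lowest threshold among ties
            best = (t, tss)
    return best
```

Here the presence scores would be model predictions at occurrence points and the absence scores predictions at the random "absence" points.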
To estimate ranges through time and potential routes of introduction, we also constructed ENMs and SDMs for P. cubensis using paleoclimatic datasets for the 19 bioclimatic variables. Paleoclimatic datasets at 10 arc-minute resolution were accessed from paleoclim.org for 11 geological timespans: the Anthropocene (1979–2013), Meghalayan (4.2–0.3 ka), Northgrippian (8.326–4.2 ka), Greenlandian (11.7–8.326 ka), Younger Dryas Stadial (12.9–11.7 ka), Bølling–Allerød (14.7–12.9 ka), Heinrich Stadial 1 (17–14.7 ka), Last Glacial Maximum (~21 ka), Pleistocene Last Interglacial (LIG; ~130 ka) [56], Pleistocene MIS19 (~787 ka) [57], and Pliocene (~3.3 Ma).
Scripts are derived from:
https://github.com/KeatPorcini/Psilocybe_SDM
5. Definitions and Abbreviations
- ITS — Internal Transcribed Spacer
- ENM — Ecological niche modeling
- SDM — Species distribution modeling
- RBB — Reciprocal-best BLAST
- LGM — Last Glacial Maximum
- MIS — Marine Isotope Stage
6. Data Sources
Public data were obtained from:
- NCBI GenBank
- NCBI Sequence Read Archive (SRA)
- MycoPortal
- Mushroom Observer
- iNaturalist
- PaleoClim
- BioClim2
7. Notes on Reuse
Both final results and intermediate analysis files are provided to maximize transparency and reproducibility. Users may reuse alignments, trees, modeling outputs, or tabular data independently of the original study.
