Modelling the potential distribution of African Wormwood (Artemisia afra) using machine learning algorithm-based approach (MaxEnt) in Sekhukhune District, South Africa
Data files
Jun 23, 2025 version files (10.15 KB)
- Occurrence_data.zip (1.19 KB)
- README.md (8.96 KB)
Abstract
Artemisia afra Jacq. ex Willd., commonly known as African wormwood, is a native medicinal plant that has been unsustainably harvested, primarily for its leaves, because of its medicinal properties. This unsustainable harvesting underscores the urgent need for conservation and management practices. This study therefore used the MaxEnt model to predict the potential distribution of A. afra. Location: Sekhukhune District Municipality, South Africa. We used 105 sampled records and 27 environmental variables to model the potential spatial distribution of A. afra using the MaxEnt modelling approach. The predictions were made using current climatic and topographic conditions. A significant portion of the area, 54.46%, is highly suitable for A. afra, with varying degrees of suitability. Precipitation contributed 33.6% to the suitability predictions, followed by NDVI, soil, and distance from rivers with 27.1%, 8.1%, and 5.7%, respectively. Artemisia afra is predicted to persist in mountainous areas and along riverbanks.
README
Title: Modelling the Potential Distribution of African Wormwood (Artemisia afra) Using a Machine Learning Algorithm-Based Approach (MaxEnt) in Sekhukhune District, South Africa
DOI: https://doi.org/10.5061/dryad.dz08kps71
Description of the Data and File Structure
This dataset contains georeferenced field occurrence records and raster-based environmental predictors used to model the potential distribution of Artemisia afra using the MaxEnt algorithm. Field data were collected in the Greater Sekhukhune District Municipality, Limpopo, South Africa. Environmental variables were sourced from publicly available datasets and processed to meet the requirements of MaxEnt.
File: Occurrence_data.zip
Contains presence-only occurrence records for Artemisia afra used in model development and validation.
- Artemisia afra train.csv
  - 70% of field-verified occurrence points (training set)
  - Columns:
    - Latitude: Decimal degrees
    - Longitude: Decimal degrees
    - Species: Always listed as Artemisia afra
- Artemisia afra test.csv
  - 30% of field-verified occurrence points (testing set)
  - Same column structure as the training set
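
MaxEnt's samples file conventionally lists species first, then longitude, then latitude, whereas the column list above gives Latitude, Longitude, Species. The following is a minimal sketch of reordering the records, assuming the CSV header uses exactly the names `Latitude`, `Longitude`, and `Species` (taken from the column list; the actual header spelling in the files has not been verified). The function name `to_maxent_rows` and the sample coordinates are hypothetical.

```python
import csv
import io


def to_maxent_rows(csv_text):
    """Return records as (species, longitude, latitude) tuples.

    Assumes header names 'Latitude', 'Longitude', 'Species' (an assumption
    based on the README column list, not checked against the real files).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        lat = float(record["Latitude"])
        lon = float(record["Longitude"])
        # Basic sanity check that the values are plausible decimal degrees.
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"coordinate out of range: lat={lat}, lon={lon}")
        rows.append((record["Species"].strip(), lon, lat))
    return rows


# Made-up coordinates for illustration only; not taken from the dataset.
sample = (
    "Latitude,Longitude,Species\n"
    "-24.80,29.90,Artemisia afra\n"
    "-24.65,30.10,Artemisia afra\n"
)
rows = to_maxent_rows(sample)
```

For the real files, the text of `Artemisia afra train.csv` or `Artemisia afra test.csv` would be passed in place of `sample`.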
File: ENVIRONMENTAL_LAYERS.zip
Contains raster files of environmental predictors used in the MaxEnt modelling.
- Format: .asc (ESRI ASCII Raster)
- Projection: WGS 1984 UTM Zone 35S
- Resolution: 30 arc seconds (climatic layers), 30 meters (topographic/land layers)
| Variable Name | Units | Resolution | Source |
|---|---|---|---|
| Bio1: Annual Mean Temperature | °C | 30 arc seconds | WorldClim (https://www.worldclim.org/) |
| Bio2: Mean Diurnal Range | °C | 30 arc seconds | WorldClim |
| Bio3: Isothermality (BIO2/BIO7 × 100) | % | 30 arc seconds | WorldClim |
| Bio4: Temperature Seasonality (standard deviation × 100) | °C × 100 | 30 arc seconds | WorldClim |
| Bio5: Max Temperature of Warmest Month | °C | 30 arc seconds | WorldClim |
| Bio6: Min Temperature of Coldest Month | °C | 30 arc seconds | WorldClim |
| Bio7: Temperature Annual Range | °C | 30 arc seconds | WorldClim |
| Bio8: Mean Temperature of Wettest Qtr | °C | 30 arc seconds | WorldClim |
| Bio9: Mean Temperature of Driest Qtr | °C | 30 arc seconds | WorldClim |
| Bio10: Mean Temp of Warmest Qtr | °C | 30 arc seconds | WorldClim |
| Bio11: Mean Temp of Coldest Qtr | °C | 30 arc seconds | WorldClim |
| Bio12: Annual Precipitation | mm | 30 arc seconds | WorldClim |
| Bio13: Precipitation of Wettest Month | mm | 30 arc seconds | WorldClim |
| Bio14: Precipitation of Driest Month | mm | 30 arc seconds | WorldClim |
| Bio15: Precipitation Seasonality (CV) | % | 30 arc seconds | WorldClim |
| Bio16: Precipitation of Wettest Qtr | mm | 30 arc seconds | WorldClim |
| Bio17: Precipitation of Driest Qtr | mm | 30 arc seconds | WorldClim |
| Bio18: Precipitation of Warmest Qtr | mm | 30 arc seconds | WorldClim |
| Bio19: Precipitation of Coldest Qtr | mm | 30 arc seconds | WorldClim |
| Elevation | meters | 30 m | SRTM DEM (https://code.earthengine.google.com/) |
| Slope | degrees | 30 m | Derived from DEM |
| Aspect | degrees | 30 m | Derived from DEM |
| Topographic Wetness Index | index | 30 m | Derived from DEM |
| Soil | - | 1:1 million | SOTER Database (https://www.isric.org/explore/soter) |
| Land Cover | - | 30 m | https://egis.environment.gov.za/gis_data_downloads |
| NDVI (Normalized Difference Vegetation Index) | - | 30 m | Sentinel-2 MSI (https://code.earthengine.google.com/) |
| Distance from Rivers | meters | 1:50,000 | https://www.dws.gov.za/iwqs/gis_data/river/ |
Additional Files and Subfolders:
- .prj and .xml files: Associated with selected raster layers for projection and metadata
- Selected variables: Contains the final 13 predictor layers used in model construction
- maxent.cache: Model-generated internal cache files created by MaxEnt during runs
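
MaxEnt expects every environmental layer to share the same extent and cell size, so it can help to compare the headers of the .asc files before a run. The sketch below parses the standard ESRI ASCII grid header keys (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value) and compares two grids. The function names and the header values in the example are hypothetical and are not taken from the dataset.

```python
HEADER_KEYS = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")


def read_asc_header(lines):
    """Parse the leading 'name value' lines of an ESRI ASCII grid into a dict."""
    header = {}
    for line in lines:
        parts = line.split()
        # Header lines are 'name value'; stop at the first line of cell values.
        if len(parts) != 2 or not parts[0][0].isalpha():
            break
        header[parts[0].lower()] = float(parts[1])
    return header


def same_grid(a, b, keys=HEADER_KEYS):
    """True if two parsed headers agree on dimensions, origin, and cell size."""
    return all(a.get(k) == b.get(k) for k in keys)


# Illustrative headers with made-up values (not from the dataset).
layer_a = ["ncols 4", "nrows 3", "xllcorner 700000", "yllcorner 7200000",
           "cellsize 30", "NODATA_value -9999", "1 2 3 4"]
layer_b = ["ncols 4", "nrows 3", "xllcorner 700000", "yllcorner 7200000",
           "cellsize 900", "NODATA_value -9999", "5 6 7 8"]

header_a = read_asc_header(layer_a)
header_b = read_asc_header(layer_b)
aligned = same_grid(header_a, header_b)  # cell sizes differ, so not aligned
```

With real files, the lines would come from opening each .asc layer and reading its first few lines.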
Code/Software
1. MaxEnt (Maximum Entropy Modelling Software)
   - Version: 3.4.1
   - URL: https://biodiversityinformatics.amnh.org/open_source/maxent/
   - Usage: Input: training/testing CSVs + environmental layers in .asc format. Output: suitability maps, AUC, response curves, jackknife plots.
2. ArcGIS 10.8 (Esri)
   - Used for raster pre-processing (projection, resampling, masking, format conversion to .asc)
3. Microsoft Excel / LibreOffice Calc
   - Used for CSV inspection and editing
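
MaxEnt can also be run from the command line rather than through its graphical interface. The sketch below only assembles such a command as an argument list; it does not reproduce the authors' procedure, which this README does not describe. The flag names (`samplesfile`, `testsamplesfile`, `environmentallayers`, `outputdirectory`, `togglelayertype`, `responsecurves`, `jackknife`, `autorun`) are given as recalled from MaxEnt's batch-mode help and should be checked against the Help of the installed version. All paths and layer names are hypothetical.

```python
def maxent_command(jar, train_csv, test_csv, layers_dir, out_dir, categorical=()):
    """Build a MaxEnt batch-mode command as a list of arguments.

    Flag names are assumptions to verify against the MaxEnt Help;
    every path and layer name passed in here is hypothetical.
    """
    cmd = [
        "java", "-jar", jar,
        f"samplesfile={train_csv}",
        f"testsamplesfile={test_csv}",
        f"environmentallayers={layers_dir}",
        f"outputdirectory={out_dir}",
        "responsecurves",
        "jackknife",
    ]
    # Categorical layers (for example soil or land cover) are toggled by name.
    cmd += [f"togglelayertype={name}" for name in categorical]
    cmd.append("autorun")
    return cmd


cmd = maxent_command(
    jar="maxent.jar",
    train_csv="Artemisia afra train.csv",
    test_csv="Artemisia afra test.csv",
    layers_dir="layers",            # hypothetical folder of .asc files
    out_dir="outputs",              # hypothetical output folder
    categorical=("soil", "landcover"),  # hypothetical layer names
)
```

Passing the list to `subprocess.run(cmd)` avoids quoting problems with the spaces in the CSV file names.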
Access Information
Other publicly accessible locations of the data:
The dataset is exclusively hosted on Dryad: https://doi.org/10.5061/dryad.dz08kps71
Data was derived from the following sources:
| Source Name | Description | License |
|---|---|---|
| WorldClim v2.1 | Bioclimatic variables | Open for research/non-commercial use |
| SRTM (CGIAR-CSI) | Elevation, slope, aspect | Public domain |
| MODIS / Sentinel-2 | NDVI | NASA/Public use |
| SANLC (DFFE) | Land cover | Open access South Africa |
| SoilGrids / SOTER | Soil classification | ODbL license |
| HydroSHEDS / DWS | River data and proximity | Free for scientific use |
This README ensures complete transparency and usability for future users and reviewers.
A handheld GPS was used for data collection. The collected location data were cleaned and prepared using Excel.
