Biocultural vulnerability of traditional crops in the Indian Trans Himalaya
Data files (version of Jul 11, 2025; 140.19 KB total)
- 2023_crop_data.csv (29.06 KB)
- biocultural_code_for_manuscript.R (77.71 KB)
- Pea_n18_NC_066579.covar (3.16 KB)
- Pea_n18_NC_066580.covar (3.17 KB)
- Pea_n18_NC_066581.covar (3.17 KB)
- Pea_n18_NC_066582.covar (3.15 KB)
- Pea_n18_NC_066583.covar (3.17 KB)
- Pea_n18_NC_066584.covar (3.17 KB)
- Pea_n18_NC_066585.covar (3.17 KB)
- peapca_supplement_genome.csv (2.36 KB)
- README.md (8.89 KB)
Abstract
Traditional agricultural landscapes are vital reservoirs of biocultural heritage and agrobiodiversity, yet traditional farming systems and their unique crop landraces face increasing marginalization and genetic erosion. Using the northwest Himalaya as a case study, we examine the ecological resilience and genetic diversity of an understudied traditional crop, black pea (scientific name unclear), alongside barley (Hordeum vulgare), and compare them with the introduced cash crop green pea (Pisum sativum L.). Participatory field experiments with local farmers showed that the traditional crops outperform the introduced variety in survival and reproductive traits across sites. We generate the first whole-genome sequencing data for black pea. Clustering and nutritional analyses highlight black pea's genetic richness and dietary potential. Our findings underscore the importance of integrating Traditional Ecological Knowledge with ecological science to sustain agrobiodiversity, enhance climate resilience, and promote sustainable food systems. We provide insights for global agri-food innovation and socio-ecological stability in fragile mountain ecosystems.
Overview of data
This dataset supports our research on the ecological and genomic variation of two traditional crop species, black pea and barley, cultivated across the Indian Trans-Himalaya. The field study ran from March to September 2023. It evaluated the growth performance of three crop species: green pea, black pea, and barley. The experiment was carried out at three sites at different elevations: low (~4000 m), mid (~4200 m), and high (~4500 m). It was laid out in a randomized block design with a factorial arrangement of three factors: crop species, water treatment, and elevation. Each of the 27 treatment combinations (3 × 3 × 3) was replicated three times, giving a total of 81 plots. Seeds were sown uniformly across all plots, and spacing was determined in consultation with farmers. The sowing density for green pea was lower than for black pea because black pea seeds are relatively small. The farmers also advised on sowing depth and bed preparation to ensure optimal growth conditions. The IRB eProtocol number is 71749.
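The plot count follows from the full factorial layout. The short Python sketch below enumerates every combination as a quick arithmetic check; Python is one of the tools this README suggests for the data files. The factor labels are illustrative assumptions, not the exact codes used in 2023_crop_data.csv.

```python
from itertools import product

# Illustrative factor levels; these labels are assumptions, not the exact codes in 2023_crop_data.csv
crops = ["green pea", "black pea", "barley"]
water_treatments = ["S1", "S2", "S3"]
elevations_m = [4000, 4200, 4500]  # approximate low, mid, and high sites
replicates = [1, 2, 3]

# Every crop x water x elevation combination, each replicated three times
plots = list(product(elevations_m, crops, water_treatments, replicates))
n_combinations = len(crops) * len(water_treatments) * len(elevations_m)

print(n_combinations, len(plots))  # 27 81
```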
We set up plots (1.5 m × 1 m) for green pea, black pea, and barley along a climate gradient in three villages (at elevations of ~4000 m, ~4200 m, and ~4500 m), with three water treatments. We monitored photosynthetic, non-reproductive, and reproductive traits, such as germination rate, flowering time, height at maturity, leaf coloration (an indicator of disease infestation), and flower and pod production. Growth parameters (plant height, leaf area, and biomass) were recorded every 7–10 days. Yield components were assessed at harvest. We used regression analysis to estimate the effects of crop type, water treatment, elevation, and their interactions. Post hoc tests identified differences between treatment means at a significance level of p < 0.05. All statistical analyses were carried out in R.
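The Python sketch below illustrates the kind of model described above. It is an ordinary least-squares fit with treatment-coded crop and elevation factors and their interaction. It is not the authors' R code, and it has three limitations:
- The data are synthetic.
- Water treatment is left out.
- It only estimates coefficients, without significance tests or post hoc comparisons.

In R, a design like this would typically be fitted with lm().

```python
import numpy as np


def design_matrix(crop, elevation, crop_levels, elev_levels):
    """Treatment-coded design matrix with a crop x elevation interaction.

    The first level of each factor is the reference category."""
    crop = np.asarray(crop)
    elevation = np.asarray(elevation)
    crop_dummies = {c: (crop == c).astype(float) for c in crop_levels[1:]}
    elev_dummies = {e: (elevation == e).astype(float) for e in elev_levels[1:]}

    columns, names = [np.ones(len(crop))], ["(Intercept)"]
    for c, d in crop_dummies.items():
        columns.append(d)
        names.append(f"crop[{c}]")
    for e, d in elev_dummies.items():
        columns.append(d)
        names.append(f"elev[{e}]")
    for c, dc in crop_dummies.items():
        for e, de in elev_dummies.items():
            columns.append(dc * de)
            names.append(f"crop[{c}]:elev[{e}]")
    return np.column_stack(columns), names


def fit_ols(X, y):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic example: 3 crops x 3 elevations x 3 replicates (values are invented)
crop_levels = ["green pea", "black pea", "barley"]
elev_levels = ["low", "mid", "high"]
rng = np.random.default_rng(0)
crop, elevation, y = [], [], []
for i, c in enumerate(crop_levels):
    for j, e in enumerate(elev_levels):
        for _ in range(3):
            crop.append(c)
            elevation.append(e)
            y.append(0.5 + 0.1 * i - 0.05 * j + rng.normal(0, 0.02))
y = np.array(y)

X, names = design_matrix(crop, elevation, crop_levels, elev_levels)
coef = fit_ols(X, y)
for name, b in zip(names, coef):
    print(f"{name:32s} {b: .3f}")
```

The model is saturated, with one parameter per crop × elevation cell, so its fitted values equal the cell means. The coefficients express each cell as a contrast against the green pea, low-elevation reference.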
Additionally, black pea genotypes were analyzed using whole-genome sequencing and compared to published Pisum genomic datasets to understand population structure and diversity.
Files Included
2023_crop_data.csv - Plot-level growth, phenology, and yield data for green pea, black pea, and barley from the 2023 field season.
peapca_supplement_genome.csv - Metadata and ADMIXTURE cluster assignments for the comparative Pisum genome accessions.
Pea_n18_NC_066579.covar - Covariance matrix for black pea chromosome NC_066579 (18 × 18).
Pea_n18_NC_066580.covar - Covariance matrix for black pea chromosome NC_066580 (18 × 18).
Pea_n18_NC_066581.covar - Covariance matrix for black pea chromosome NC_066581 (18 × 18).
Pea_n18_NC_066582.covar - Covariance matrix for black pea chromosome NC_066582 (18 × 18).
Pea_n18_NC_066583.covar - Covariance matrix for black pea chromosome NC_066583 (18 × 18).
Pea_n18_NC_066584.covar - Covariance matrix for black pea chromosome NC_066584 (18 × 18).
Pea_n18_NC_066585.covar - Covariance matrix for black pea chromosome NC_066585 (18 × 18).
biocultural_code_for_manuscript.R - R script for PCA and clustering on the covariance matrices; it also contains the code to generate all figures in the manuscript.
Description of data files
File: 2023_crop_data.csv
Tabulates ecological data on plant growth and phenology across crop types and treatments as well as across the 3 sites for green pea, barley, and black pea. Each row corresponds to a crop plot defined by crop type, elevation (site), and replication. The variable names include:
Sno | Serial number for each Site.
Site | Village where the plot was located. There are four sites: Kibber, Kiamo, Tashigang, Thinam; each at a different elevation.
Helper_ID | A numeric copy of the Site column, added for ease of coding and analysis.
Field no | Field identifier within the site.
Plot no | Plot identifier.
Crop | Crop species: green pea, black pea, or barley.
Water_treatment | Three water treatments, S1, S2, and S3, each with three replicates. S1 is the business-as-usual irrigation scheme advised by the farmers, S2 is reduced irrigation, and S3 simulates drought-like conditions with the least irrigation. Because rainfall was higher during the summer season, water treatment was not used as a predictor variable. Instead, the treatments were treated as replicates and averaged.
Irrigation_times | Number of times the field was watered during the growing season, with timing advised by the farmers. Values range from 2 (black pea in Kiamo) to 9 (green pea at several sites), depending on the water treatment.
Average leaves [X] | Mean number of leaves per plant X days after sowing.
Average height [X] (units cm) | Mean stem height in cm X days after sowing.
Flowering plants [X] | Proportion of plants that had flowered by day X.
Plants pods [X] | Proportion of plants bearing pods by day X.
Survival | Proportion of seeds that had produced a seedling 30 days after sowing.
Average leaf length [X] days (units cm) | Average leaf length X days after sowing.
Average leaf breadth [X] days (units cm) | Average leaf breadth X days after sowing.
Average stem thickness [X] days (units cm) | Average stem thickness X days after sowing.
Average no of pods per plant [X] | Average number of pods per plant X days after sowing.
Average pod height [X] (units cm) | Average pod height X days after sowing.
Average no of seeds per pod [X] | Average number of seeds per pod per plant X days after sowing.
Average weight of pod [X] (units grams) | Average weight of a pod X days after sowing.
Average fresh weight above ground [X] days (units grams) | Average fresh weight of the above-ground plant X days after sowing.
Average dry weight above ground [X] days (units grams) | Average dry weight of the above-ground plant X days after sowing.
Average dry weight below ground [X] days (units grams) | Average dry weight of the below-ground plant X days after sowing.
Total weight of pods [X] days (units grams) | Total weight of pods X days after sowing.
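The Water_treatment note says S1–S3 were pooled as replicates and averaged. Below is a minimal Python sketch of that pooling using the standard csv module. The following are assumptions for illustration only:
- the toy rows and their values;
- the lowercase crop labels;
- the choice to skip empty cells.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Toy rows in the layout of 2023_crop_data.csv; all values are invented for illustration
SAMPLE = """Site,Crop,Water_treatment,Survival
Kibber,black pea,S1,0.80
Kibber,black pea,S2,0.70
Kibber,black pea,S3,0.90
Kibber,green pea,S1,0.40
Kibber,green pea,S2,0.50
Kibber,green pea,S3,
"""


def average_over_treatments(rows, trait):
    """Pool S1-S3 as replicates and return the mean of `trait` per (Site, Crop)."""
    groups = defaultdict(list)
    for row in rows:
        value = (row.get(trait) or "").strip()
        if value:  # empty cells are skipped here, not counted as zero
            groups[(row["Site"], row["Crop"])].append(float(value))
    return {key: mean(values) for key, values in groups.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
means = average_over_treatments(rows, "Survival")
print(means)
```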
Note that some columns in the .csv file contain empty cells. In the leaves, flowering, and pods columns, an empty cell means that no leaves, flowers, or pods were present yet during the initial months. These cells become non-empty as the traits appear. Leaf-count cells are also empty for barley, because barley is a grass and its leaves were not counted.
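One way to encode these empty-cell rules when parsing the file is sketched below in Python. The header prefixes (e.g. "Average leaves 30") and the identifier column names are assumed from the variable list, so check them against the actual CSV header before use.

```python
# Column-name prefixes follow the variable list; the exact CSV headers are an assumption
ZERO_IF_EMPTY = ("Average leaves", "Flowering plants", "Plants pods")
ID_COLUMNS = {"Sno", "Site", "Helper_ID", "Field no", "Plot no", "Crop", "Water_treatment"}


def parse_cell(column, value, crop):
    """Convert one trait cell to a float, applying the README's rules for empty cells."""
    value = (value or "").strip()
    if value:
        return float(value)
    if column.startswith("Average leaves") and crop.strip().lower() == "barley":
        return None  # leaves were never counted for barley (a grass): truly missing
    if column.startswith(ZERO_IF_EMPTY):
        return 0.0  # trait had not appeared yet, so the count/proportion is zero
    return None  # any other empty cell is treated as missing


def clean_row(row):
    """Keep identifier columns as text and parse every trait column."""
    return {
        col: (val if col in ID_COLUMNS else parse_cell(col, val, row["Crop"]))
        for col, val in row.items()
    }


example = {"Site": "Kibber", "Crop": "Barley", "Average leaves 30": "",
           "Flowering plants 45": "", "Survival": "0.85"}
print(clean_row(example))
```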
File: peapca_supplement_genome.csv
This file contains metadata on comparative genome accessions and clustering assignments.
Run No. | Identifier for sequencing run
Sample ID | Identifier for each accession
Accession No. | NCBI accession number
Original taxa | Reported taxonomic name
Country | Country of sample origin
Type | Domesticated or wild pea sample
ADMIXTURE Group (SNPs) | Cluster assignment (q > 0.6) from SNP-based ADMIXTURE at K=5
ADMIXTURE Group (SVs) | Cluster assignment from SV-based ADMIXTURE at K=5
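The q > 0.6 rule can be sketched in Python as follows. The sketch assumes per-sample ancestry proportions like those in ADMIXTURE's .Q output: one row per sample and K = 5 columns, readable with np.loadtxt. The group labels K1–K5, and the "admixed" label for samples below the threshold, are hypothetical. The published file may name groups differently.

```python
import numpy as np


def assign_groups(q, threshold=0.6):
    """Assign each sample to its dominant ADMIXTURE cluster when that ancestry
    proportion exceeds `threshold`; otherwise label it "admixed"."""
    q = np.asarray(q, dtype=float)
    best = q.argmax(axis=1)
    return [f"K{k + 1}" if q[i, k] > threshold else "admixed" for i, k in enumerate(best)]


# Invented proportions for three samples at K = 5
q_example = [
    [0.90, 0.05, 0.03, 0.01, 0.01],
    [0.40, 0.35, 0.15, 0.05, 0.05],
    [0.02, 0.03, 0.05, 0.10, 0.80],
]
print(assign_groups(q_example))  # ['K1', 'admixed', 'K5']
```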
Description of .covar Files
The .covar files are numeric 18 × 18 covariance matrices generated from genomic variation among black pea accessions. Each file corresponds to one of the seven chromosomes: Pea_n18_NC_066579.covar -> chromosome NC_066579, Pea_n18_NC_066580.covar -> chromosome NC_066580, and so on through Pea_n18_NC_066585.covar. These files can be opened using:
- Text editors (e.g., Notepad)
- Spreadsheet programs (e.g., Microsoft Excel)
- R or Python; in R, for example: as.matrix(read.table("filename.covar", header = FALSE))
Each row and column represents one of the 18 accessions, and the values indicate pairwise covariance.
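PCA on a sample-by-sample covariance matrix like these amounts to an eigendecomposition. The minimal Python sketch below assumes that approach. The R script may differ in details such as score scaling; in R, eigen() plays the role that numpy.linalg.eigh plays here. The sketch uses a synthetic 18 × 18 matrix so that it runs without the data files.

```python
import io

import numpy as np


def load_covar(source):
    """Read a headerless, whitespace-delimited square matrix (path or file object)."""
    C = np.loadtxt(source)
    if C.ndim != 2 or C.shape[0] != C.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {C.shape}")
    return C


def pca_from_covar(C, n_components=3):
    """PCA by eigendecomposition of a sample-by-sample covariance matrix.

    Returns (scores, explained): per-sample coordinates on the leading
    components (eigenvectors scaled by sqrt(eigenvalue)) and each
    component's share of the summed eigenvalues."""
    C = (C + C.T) / 2  # symmetrize to absorb rounding in the text file
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    top = np.clip(eigvals[:n_components], 0, None)
    scores = eigvecs[:, :n_components] * np.sqrt(top)
    return scores, explained[:n_components]


# Tiny file-like demo of the loader (the real files are 18 x 18)
demo = load_covar(io.StringIO("2.0 0.5\n0.5 1.0\n"))

# Synthetic stand-in for e.g. Pea_n18_NC_066579.covar:
# np.cov treats each of the 18 rows as a variable, giving an 18 x 18 matrix
rng = np.random.default_rng(1)
C = np.cov(rng.normal(size=(18, 200)))
scores, explained = pca_from_covar(C)
print(scores.shape, np.round(explained, 3))
# With the real data: pca_from_covar(load_covar("Pea_n18_NC_066579.covar"))
```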
Data Sources
Comparative genomic data were obtained from:
Yang, T. et al. (2022). "Improved pea reference genome and pan-genome highlight genomic features and evolutionary characteristics." Nature Genetics, 54(10), 1553–1563. https://doi.org/10.1038/s41588-022-01178-6
The newly sequenced black pea sample from this study has been deposited at NCBI under: BioSample accession: SAMN44380099
Software and Code
All code for the PCA and clustering analyses is provided in biocultural_code_for_manuscript.R.
To run the script, you will need:
- R version 4.2.0 or higher
- R packages: RColorBrewer, tidyverse, reshape2, magick, ggbiplot, factoextra, ggpubr, ggfortify, cowplot, corrplot, ggpmisc, patchwork, hrbrthemes, Hmisc, RSpectra, plot.matrix, FactoMineR, scatterplot3d, kernlab
The script reads the .covar files, performs PCA, and runs k-means, hierarchical, and spectral clustering. Outputs include 2D and 3D PCA plots and cluster visualizations.
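The k-means step can be illustrated with the Python sketch below, which applies Lloyd's algorithm to PC scores. It uses a farthest-point initialization and invented two-group data. R's kmeans() instead defaults to the Hartigan–Wong algorithm with randomly chosen starting centers, so cluster labels and edge cases may differ. Hierarchical and spectral clustering are not shown.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Initialization: one random point, then repeatedly the point farthest from all chosen centroids
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Toy PC1/PC2 scores for 18 samples forming two well-separated groups (invented data)
rng = np.random.default_rng(2)
pcs = np.vstack([
    rng.normal([0.0, 0.0], 0.1, size=(9, 2)),
    rng.normal([5.0, 5.0], 0.1, size=(9, 2)),
])
labels, centers = kmeans(pcs, k=2)
print(labels)
```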
The crop dataset was collected during the March–September 2023 cropping season. It records the survival and growth performance of the three crops, which were then compared on these performance metrics. The comparative genomic dataset was obtained from:
Yang, T. et al. (2022). "Improved pea reference genome and pan-genome highlight genomic features and evolutionary characteristics." Nature Genetics, 54(10), 1553–1563.
These data were compared with our whole-genome sequencing data. The whole-genome sequencing data generated for black pea in this study have been deposited at NCBI under BioSample accession SAMN44380099.
The genetic analysis uses the following files: one covariance matrix for each of the 7 chromosomes of black pea. Each file is an 18 × 18 matrix with no header row, so it can be read in R with, for example, read.table("Pea_n18_NC_066579.covar", header = FALSE).
Pea_n18_NC_066579.covar
Pea_n18_NC_066580.covar
Pea_n18_NC_066581.covar
Pea_n18_NC_066582.covar
Pea_n18_NC_066583.covar
Pea_n18_NC_066584.covar
Pea_n18_NC_066585.covar
In addition, peapca_supplement_genome.csv provides SRR run and accession details for the Pisum species and subspecies against which our black pea sample was compared. These data were taken from Yang et al. (2022).
The ecological analysis can be performed using 2023_crop_data.csv, which can be read in R with the read.csv() function. The columns record flowering (proportion of flowering plants), survival (proportion of seeds germinating), leaf length (cm), stem height (cm), and number of pods after X days, where X is appended to the column names. Each row gives the values for one crop type (green pea, black pea, or barley) at one site (Kibber, Kiamo, Tashigang, or Thinam) and replicate.