Plasma multiplex cytokine and phenotyping data in pediatric asthma
Data files
May 28, 2026 version files 194.26 KB
- analyze_repository.py (11.09 KB)
- cytokines_cohort_2025.csv (42.77 KB)
- cytokines_cohort_2026.csv (70.20 KB)
- data_dictionary.csv (5.33 KB)
- dataset_metadata.json (9.21 KB)
- LICENSE.txt (755 B)
- participants.csv (41.62 KB)
- README.md (13.27 KB)
Abstract
The dataset documents plasma cytokine signatures and matched clinical phenotyping for 222 children aged 6–18 years hospitalised with bronchial asthma at a single tertiary pulmonology centre. The data permit independent replication of the analyses reported in the accompanying journal submission and are intended for reuse in (i) meta-analyses of paediatric T2 and non-T2 asthma cytokine profiles, (ii) methodological work on cross-panel multiplex calibration drift and z-score normalisation, and (iii) development or external validation of paediatric atopic-asthma phenotyping algorithms. The study uses a cross-sectional observational design with two non-overlapping recruitment windows, each corresponding to a single multiplex assay batch. Cohort 2025 (n = 121) was profiled with the Bio-Plex Pro Human Cytokine 17-plex Assay (Bio-Rad). Cohort 2026 (n = 101) was profiled with the MILLIPLEX MAP Human High Sensitivity T-Cell 21-plex panel (Merck Millipore; catalog # HT17MG-14K-PX25). Fifteen analytes are common to both panels.
1. Data description
Authors and affiliations.
S. Yu. Tereshchenko (corresponding author). Scientific Research Institute of Medical Problems of the North, Federal Research Center "Krasnoyarsk Science Centre" SB RAS, Krasnoyarsk, Russian Federation.
Funding.
This study was conducted as part of the exploratory research project ‘Development of technologies for assessing risk factors and markers of uncontrolled course of bronchial asthma in children’ (No. 1024070500020-3-3.2.3) at the Research Institute of Medical Problems of the North, FRC KSC SB RAS, in 2024–2026.
Geographic location of data collection.
Krasnoyarsk, Russian Federation.
Time period of data collection.
December 2024 – April 2026.
Ethical approval and consent.
The study was approved by the local institutional ethics committee. Written informed consent was obtained from a parent or legal guardian for every participant. The full IRB reference is provided in the Methods section of the associated article.
Background and purpose of the data.
The dataset documents plasma cytokine signatures and matched clinical phenotyping for 222 children aged 6–18 years hospitalised with bronchial asthma at a single tertiary pulmonology centre. The data permit independent replication of the analyses reported in the accompanying journal submission and are intended for reuse in (i) meta-analyses of paediatric T2 and non-T2 asthma cytokine profiles, (ii) methodological work on cross-panel multiplex calibration drift and z-score normalisation, and (iii) development or external validation of paediatric atopic-asthma phenotyping algorithms.
Study design. Cross-sectional observational design with two non-overlapping recruitment windows, each corresponding to a single multiplex assay batch. Cohort 2025 (n = 121) was profiled with the Bio-Plex Pro Human Cytokine 17-plex Assay (Bio-Rad Laboratories, Hercules, CA, USA; catalog # M5000031YV). Cohort 2026 (n = 101) was profiled with the MILLIPLEX MAP Human High Sensitivity T-Cell 21-plex panel (Merck Millipore, Darmstadt, Germany; catalog # HT17MG-14K-PX25). Both assays were run on a Bio-Plex MAGPIX Multiplex Reader (Bio-Rad Laboratories) according to the respective manufacturer's instructions. Fifteen analytes are common to both panels.
Methods (summary).
Venous plasma was sampled at hospital admission, prior to any in-hospital systemic corticosteroid administration, and stored at −80 °C until analysis. Values below the lower limit of detection (LOD) were imputed as LOD/√2 (de Jager et al. 2009) and flagged in the column LOD_censored. Within-cohort z-scores (mean 0, SD 1 per cytokine per cohort) are provided in z_score_within_cohort and are the recommended unit for any pooled cross-cohort analysis: the two cohorts were profiled on different panels and on different days, so the raw pg/mL values carry a substantial calibration offset between cohorts.
Atopy phenotype (strict): all three of (i) any atopic comorbidity, (ii) total IgE > 200 IU/mL, (iii) skin-prick test positive for mite, pollen, or animal dander.
Eosinophilic phenotype: blood eosinophils ≥ 470 cells/µL (Maison et al. 2022 age-specific 90th-percentile cutoff).
Four-cell phenotype: cross-tabulation of the atopy and eosinophilic axes.
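The two processing steps described above (LOD/√2 substitution, then standardisation per cytokine per cohort) can be sketched as follows. This is a minimal illustration, not the code used to build the deposit: the `raw_value` input column, the `lod` lookup keyed by (cohort, cytokine), the use of the sample SD (ddof = 1), and standardisation on the untransformed pg/mL scale are all assumptions.

```python
import numpy as np
import pandas as pd


def impute_and_standardise(raw: pd.DataFrame, lod: dict) -> pd.DataFrame:
    """Apply LOD/sqrt(2) substitution, then z-score per cytokine per cohort.

    `raw` is long-format with columns participant_id, cohort, cytokine and a
    hypothetical `raw_value` column in which below-LOD readings are NaN.
    `lod` is a hypothetical lookup: (cohort, cytokine) -> LOD in pg/mL.
    """
    out = raw.copy()
    censored = out["raw_value"].isna()
    out["LOD_censored"] = censored.astype(int)

    # Look up the LOD that applies to each row.
    lod_per_row = pd.Series(
        [lod[(c, k)] for c, k in zip(out["cohort"], out["cytokine"])],
        index=out.index,
    )
    # Keep measured values; substitute LOD/sqrt(2) where the reading was censored.
    out["value_pg_per_mL"] = out["raw_value"].where(~censored, lod_per_row / np.sqrt(2))

    grouped = out.groupby(["cohort", "cytokine"])["value_pg_per_mL"]
    # Assumption: sample SD (pandas default, ddof=1); the source only states "SD 1".
    out["z_score_within_cohort"] = (
        out["value_pg_per_mL"] - grouped.transform("mean")
    ) / grouped.transform("std")
    return out.drop(columns="raw_value")
```

Because the mean and SD are computed within each (cohort, cytokine) group, any additive or multiplicative calibration difference between the two panels is removed before pooling.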
Reuse considerations and known limitations.
- The recommended unit for pooled analyses is z_score_within_cohort, not value_pg_per_mL. Pooled analyses on raw concentrations will show a substantial inter-cohort offset that is not of biological origin.
- Specific IgE was not measured; the aeroallergen criterion relies on skin-prick testing alone.
- Viral exacerbation is a clinical label; it was not confirmed virologically.
- Pre-hospital corticosteroids may have been administered to a fraction of children by emergency medical services en route to admission, so exacerbation-related signals should be interpreted with this possible exposure in mind.
- This is a single-centre cohort; external replication in independent cohorts is the natural next step.
Anonymisation.
Data are fully anonymised in accordance with the HIPAA Safe Harbor standard (45 CFR §164.514) and the EU GDPR Art. 4(5). All eighteen Safe Harbor identifier categories were removed. Calendar dates are replaced by the single relative-time variable storage_days_minus80. Participant identifiers (P001–P222) are an irreversible random permutation; no look-up table is retained that could be used to reverse the mapping. The deposit contains at most three indirect identifiers, as required by Dryad's human-subjects guidance.
2. Files and variables
File inventory.
| File | Format | Rows × cols | Description |
|---|---|---|---|
| README.md | Markdown / UTF-8 | n/a | This file. |
| participants.csv | CSV / UTF-8 (RFC 4180) | 222 × 35 | One row per participant. Demographics, asthma clinical fields, comorbidities, IgE, eosinophils, skin-prick test, derived phenotype assignments. |
| cytokines_cohort_2025.csv | CSV / UTF-8 (RFC 4180) | 1331 × 6 | Long-format cytokine measurements for cohort 2025 (17-plex MagPix). |
| cytokines_cohort_2026.csv | CSV / UTF-8 (RFC 4180) | 2100 × 6 | Long-format cytokine measurements for cohort 2026 (MILLIPLEX HSTC 21-plex). |
| data_dictionary.csv | CSV / UTF-8 | 41 × 6 | Variable, type, unit, allowed values, and description for every column in every data file. |
| dataset_metadata.json | JSON / UTF-8 | n/a | DataCite Schema 4.4 metadata: title, creators, descriptions, methods, technical info, rights (CC0), formats, sizes, reuse potential, ethical considerations. |
| analyze_repository.py | Python 3 | n/a | Self-contained replication script. Reads only the deposited CSVs. |
| LICENSE.txt | Text / UTF-8 | n/a | CC0 1.0 Public Domain Dedication. |
Variables — complete definitions in data_dictionary.csv. The data dictionary specifies the variable name, type, unit, allowed values (or admissible range), and a short description for every column in every data file. The most analytically important derived variables are summarised below.
Cytokine files (cytokines_cohort_2025.csv, cytokines_cohort_2026.csv). Columns:
- participant_id: foreign key to participants.csv.
- cohort: 2025 or 2026; matches the cohort in participants.csv.
- cytokine: analyte name (controlled list per panel; 17 analytes for cohort 2025, 21 for cohort 2026, 15 common).
- value_pg_per_mL: post-imputation plasma concentration in pg/mL. For non-censored samples this is the raw assay output; for censored samples it equals LOD/√2.
- z_score_within_cohort: z-score within the cohort, mean 0 and SD 1 per cytokine per cohort. Recommended unit for any pooled cross-cohort analysis.
- LOD_censored: 1 if the raw value was reported as below the lower limit of detection (and therefore imputed); 0 otherwise.
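A pooled analysis would typically stack the two long-format files, keep only the analytes measured on both panels, and pivot to one row per participant. The sketch below shows one way to do that with the columns listed above. It assumes that shared analytes are spelled identically in both files and that each (participant_id, cytokine) pair occurs at most once; neither is verified here. The helper names are hypothetical.

```python
import pandas as pd


def pool_common_analytes(cyt_a: pd.DataFrame, cyt_b: pd.DataFrame) -> pd.DataFrame:
    """Stack two long-format cytokine tables, keeping analytes present in both."""
    # Assumption: an analyte shared by both panels has the same name in both files.
    common = set(cyt_a["cytokine"]) & set(cyt_b["cytokine"])
    pooled = pd.concat([cyt_a, cyt_b], ignore_index=True)
    return pooled[pooled["cytokine"].isin(common)].reset_index(drop=True)


def to_wide(pooled: pd.DataFrame, value: str = "z_score_within_cohort") -> pd.DataFrame:
    """Return one row per participant and one column per analyte.

    pivot() raises if a (participant_id, cytokine) pair is duplicated,
    which doubles as a basic integrity check.
    """
    return pooled.pivot(index="participant_id", columns="cytokine", values=value)
```

With the deposited files, the two inputs would come from `pd.read_csv("cytokines_cohort_2025.csv")` and `pd.read_csv("cytokines_cohort_2026.csv")`, and the wide table can then be joined to participants.csv on participant_id. The default value column is the within-cohort z-score, in line with the recommendation above.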
Atopy phenotype (atopy_phenotype_strict_3of3 in participants.csv):
- 1: strict-atopic (n = 76).
- 0: non-strict-atopic (n = 73).
- NA: at least one input missing (n = 73; most commonly because the skin-prick test was not performed).
Eosinophilic phenotype (eosinophilic_phenotype_above_470 in participants.csv):
- 1: blood eosinophils ≥ 470 cells/µL (n = 48).
- 0: < 470 cells/µL (n = 170).
- NA: eosinophil count missing (n = 4).
Four-cell phenotype (phenotype_4cell):
T2-high (atopy = 1 ∧ eos = 1), Atopy-only (atopy = 1 ∧ eos = 0), Eos-only (atopy = 0 ∧ eos = 1), T2-low (atopy = 0 ∧ eos = 0), or Unclassified if any input is missing.
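The phenotype rules can be written out as a short derivation. This is a sketch under assumptions: the four input column names (`atopic_comorbidity`, `total_ige_iu_ml`, `spt_positive`, `eos_cells_per_ul`) are hypothetical placeholders, and the real names should be taken from data_dictionary.csv. The thresholds and the rule that any missing input yields NA follow the definitions in this README.

```python
import pandas as pd

LABELS = {(1, 1): "T2-high", (1, 0): "Atopy-only", (0, 1): "Eos-only", (0, 0): "T2-low"}


def derive_phenotypes(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the strict atopy flag, the eosinophilic flag, and the four-cell label."""
    out = df.copy()

    # Hypothetical input columns; binary inputs are coded 1/0 with NaN for missing.
    atopy_inputs = out[["atopic_comorbidity", "total_ige_iu_ml", "spt_positive"]]
    any_missing = atopy_inputs.isna().any(axis=1)
    all_met = (
        (out["atopic_comorbidity"] == 1)
        & (out["total_ige_iu_ml"] > 200)
        & (out["spt_positive"] == 1)
    )
    # A missing input propagates to NA even if another criterion already fails.
    atopy = all_met.astype("Int64").mask(any_missing)

    eos_missing = out["eos_cells_per_ul"].isna()
    eos = (out["eos_cells_per_ul"] >= 470).astype("Int64").mask(eos_missing)

    out["atopy_phenotype_strict_3of3"] = atopy
    out["eosinophilic_phenotype_above_470"] = eos
    out["phenotype_4cell"] = [
        "Unclassified" if pd.isna(a) or pd.isna(e) else LABELS[(int(a), int(e))]
        for a, e in zip(atopy, eos)
    ]
    return out
```

Note the boundary behaviour: total IgE of exactly 200 IU/mL does not meet the atopy criterion (strict inequality), whereas an eosinophil count of exactly 470 cells/µL does meet the eosinophilic criterion.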
Missing-data convention. Empty cells are written as NA in all CSV files. A missing input to a derived variable propagates to NA in the derived column. No imputation is performed at deposit other than the cytokine LOD/√2 substitution flagged in LOD_censored.
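As a small illustration of this convention, pandas treats the literal string NA as missing by default when reading CSV, and a nullable integer dtype keeps 1/0/NA flags from being silently converted to floats. The two-row sample below is invented for the example and is not taken from the deposit.

```python
import io

import pandas as pd

# Invented sample in the deposit's layout: NA marks a missing derived flag.
sample = io.StringIO(
    "participant_id,eosinophilic_phenotype_above_470\n"
    "P001,1\n"
    "P002,NA\n"
)

# "NA" is among pandas' default missing-value markers, so no na_values argument
# is needed; the nullable Int64 dtype stores the flag as 1, 0, or <NA>.
df = pd.read_csv(sample, dtype={"eosinophilic_phenotype_above_470": "Int64"})

n_missing = int(df["eosinophilic_phenotype_above_470"].isna().sum())
counts = df["eosinophilic_phenotype_above_470"].value_counts(dropna=False)
```

Passing `dropna=False` to `value_counts` keeps the NA category visible, which is what the 1 / 0 / NA counts reported in this README require.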
3. Code and software
Replication script. analyze_repository.py is included in the deposit. It reads only the three deposited CSVs and reproduces every primary number reported in the accompanying article: cohort and phenotype counts, the four-cell cross-tabulation, the Mann-Whitney U test of IL-7 between strict-atopic and non-strict-atopic participants, the nested ordinary-least-squares adjustment chain with HC3 robust standard errors, the atopy × clinical-phase interaction test in the fully adjusted model, and the per-cohort directional-consistency check.
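For readers unfamiliar with HC3 robust standard errors, the estimator can be spelled out in plain numpy. This sketch is for illustration only and is not taken from analyze_repository.py; with statsmodels, which the deposit lists as a dependency, the same standard errors are available through `sm.OLS(y, X).fit(cov_type="HC3")`. The function name `ols_hc3` is hypothetical.

```python
import numpy as np


def ols_hc3(X: np.ndarray, y: np.ndarray):
    """Return OLS coefficients and HC3 heteroskedasticity-robust standard errors.

    X must already contain an intercept column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Leverage h_ii = x_i' (X'X)^-1 x_i, the diagonal of the hat matrix.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    # HC3 scales each squared residual by 1 / (1 - h_ii)^2, which gives
    # high-leverage observations more weight in the variance estimate.
    weights = (resid / (1.0 - leverage)) ** 2

    # Sandwich estimator: (X'X)^-1 [X' diag(weights) X] (X'X)^-1.
    meat = X.T @ (X * weights[:, None])
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))
```

The coefficients are identical to ordinary least squares; only the standard errors, and therefore the p-values, differ from the classical homoskedastic ones.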
Software versions used in the original analyses.
Python 3.10 / 3.12; pandas 2.x; numpy 1.26+; scipy 1.11+; statsmodels 0.14+; matplotlib 3.8+.
Dependencies of the replication script.
pandas, numpy, scipy, statsmodels. No specialised file readers — only the standard CSV reader is used. No proprietary or paid software is needed at any point.
How to run.
# Requires Python ≥ 3.9.
pip install pandas numpy scipy statsmodels
python analyze_repository.py
Expected output (key values). Manuscript ↔ replication:
| Reported in the article | Replication output |
|---|---|
| Cohort 2025 / 2026 sizes 121 / 101 | 121 / 101 |
| Atopy = 1 / 0 / NA: 76 / 73 / 73 | 76 / 73 / 73 |
| Eos = 1 / 0 / NA: 48 / 170 / 4 | 48 / 170 / 4 |
| 4-cell: T2-high / Atopy-only / Eos-only / T2-low / Unclassified = 21 / 53 / 12 / 60 / 76 | 21 / 53 / 12 / 60 / 76 |
| Mann-Whitney IL-7 atopy vs non, p ≈ 0.013, r ≈ −0.24 | p = 0.0133, r = −0.236 |
| OLS univariate β ≈ +0.51, p ≈ 0.005, N = 148 | +0.509, p = 0.0045, N = 148 |
| OLS + age + sex + ICS + cohort β ≈ +0.48, p ≈ 0.005, N = 148 | +0.481, p = 0.0051, N = 148 |
| OLS + ACT + phase + storage β ≈ +0.42, p ≈ 0.012, N = 147 | +0.424, p = 0.0119, N = 147 |
| OLS full (+ severity + exac) β ≈ +0.40, p ≈ 0.025, N = 146 | +0.399, p = 0.0251, N = 146 |
| Storage-days coef in full model β ≈ −0.005 SD/day, p ≈ 0.032 | −0.0049, p = 0.0317 |
| Atopy × phase interaction β ≈ +0.26, p ≈ 0.55 | +0.261, p = 0.554 |
| Per-cohort 2025 / 2026 p ≈ 0.084 / 0.078 | 0.0839 / 0.0779 |
Any small numerical drift (≤ 0.001 in coefficients) reflects rounding of the deposited z-scores to six decimal places and should not be read as a substantive difference.
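A re-run can be compared against the replication column of the table above with a simple tolerance check. The sketch below is one possible way to do this; the dictionary keys are hypothetical labels chosen for the example, while the coefficient values are copied from the table.

```python
# Coefficients from the "Replication output" column; the keys are hypothetical labels.
EXPECTED = {
    "ols_univariate": 0.509,
    "ols_age_sex_ics_cohort": 0.481,
    "ols_act_phase_storage": 0.424,
    "ols_full": 0.399,
    "storage_days_full": -0.0049,
    "atopy_x_phase": 0.261,
}


def drifted(observed: dict, tol: float = 0.001) -> list:
    """Return the labels whose coefficient differs from EXPECTED by more than tol."""
    eps = 1e-12  # guards against floating-point noise exactly at the boundary
    return [
        label
        for label, value in observed.items()
        if abs(value - EXPECTED[label]) > tol + eps
    ]
```

An empty list means every supplied coefficient is within the stated 0.001 tolerance.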
4. Access information
License. Creative Commons CC0 1.0 Universal Public Domain Dedication (LICENSE.txt). The data are released into the public domain. Re-users are kindly asked, but not required, to cite the originating publication.
Suggested citation for the dataset.
Tereshchenko, S. Yu. (2026). Plasma multiplex cytokine and phenotyping data in pediatric asthma. [Dataset]. Dryad. https://doi.org/10.5061/dryad.z08kprrw5.
Related works. A link to the article DOI will be added under relatedIdentifiers in dataset_metadata.json upon acceptance.
Contact and questions. Address curation or scientific questions to the corresponding author. Contact details and the canonical Dryad DOI will be available on the public dataset page.
Sources of metadata and controlled vocabularies.
DataCite Schema 4.4 (dataset_metadata.json); OECD Fields of Science and Technology classification for the research domain; ROR for institutional affiliations; ORCID for personal identifiers; SPDX license identifier CC0-1.0.
Change log.
- v1.0 (2026-05-17) — initial deposit accompanying journal submission.
