Co-development of gut microbial metabolism and visual neural circuitry over human infancy

Data files

Sep 23, 2025 version files (511.32 MB total)

allmeta.csv (118.42 KB)
bonham_margolis_mbio_allunirefs.csv.gz (511.20 MB)
README.md (4.62 KB)

Abstract

Infancy is a time of elevated neuroplasticity supporting rapid brain and sensory development. The gut microbiome, also undergoing extensive developmental changes in early life, may influence brain development through metabolism of neuroactive compounds. Here, we leverage longitudinal data from 194 infants across the first 18 months of life to show that microbial genes encoding enzymes that metabolize molecules playing a key role in modulating early neuroplasticity are associated with visual cortical neurodevelopment, measured by the Visual-Evoked Potential (VEP). Neuroactive compounds included neurotransmitters GABA and glutamate, the amino acid tryptophan, and short-chain fatty acids involved in myelination, including acetate and butyrate. Microbial gene sets around 4 months of age were strongly associated with the VEP from around 9 to 14 months of age and showed more associations than concurrently measured gene sets, suggesting microbial metabolism in early life may affect subsequent neural plasticity and development.

Dataset DOI: 10.5061/dryad.rn8pk0pq2

Description of the data and file structure

Data collected for Codevelopment of gut microbial metabolism and visual neural circuitry over human infancy, to be published in mBio (https://doi.org/10.1128/mbio.00835-25, preprint link: https://www.medrxiv.org/content/10.1101/2024.07.24.24310884).

Files and variables

Subject metadata and VEP features

The allmeta.csv file contains subject- and sample-specific metadata for all subjects and visits included in the study. There is one header row and 487 data rows. Column names and contents are described below. Cells with missing data are empty.

The following columns are included:

subject_id: type String - a unique identifier for the child participating in the study. N=
visit: type String - Visit number, one of "v1", "v2", or "v3". Visit 1 corresponds to ~3mo of age, visit 2 ~6mo, visit 3 ~12mo
age_vep_weeks: type Float64 - age in weeks when subjects underwent EEG evaluation. For visits in which both stool and VEP were collected, the two ages should be within 2 weeks of one another (months = weeks / 52 * 12)
stool_age: type Float64 - age in months when the stool sample was collected. For visits in which both stool and VEP were collected, the two ages should be within 2 weeks of one another (weeks = months / 12 * 52)
biospecimen: type String - unique ID for stool sample collected at given visit.
seqprep: type String - unique ID for preparation sent to sequencing facility.
S_well: Well ID for sequencing preparation (in 96 well plates)
peak_(amp|latency)_(N1|P1|N2): type Float64 - 6 columns, one for each extracted VEP feature (amplitude and latency) of each peak (N1, P1, and N2). Amplitudes are measured in millivolts (mV) and latencies are measured in milliseconds (ms).
n_trials: type Integer - number of VEP trials retained for a given visit.
cohort_*: type Bool - filtering columns for each cohort comparison. Values may be true or false. Cohorts "v1", "v2", and "v3" compare concurrently collected stool and VEP. Cohorts comparing stool from one visit to VEP from a different visit contain 2 visit identifiers and end with _stool or _vep. For example, filtering on rows where cohort_v1v2_stool is true selects the visit 1 stool samples used when visit 1 stool is compared to visit 2 VEP. To get the corresponding rows for visit 2 VEP, filter on cohort_v1v2_vep. After filtering on any one cohort_* column, the subject_id column should contain only unique values.
peak_(amp|latency)_(P1|N2)_corrected: type Float64 - values offset from the previous peak. P1_corrected is P1 - N1, and N2_corrected is N2 - P1.
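
The age conversions and cohort_* filters described above can be sketched in Python with the standard library alone. This is a minimal illustration, not the authors' analysis code (which is written in Julia). It assumes that Bool values are stored as the text true/false, and it runs on a small made-up table that only mimics the documented column names. The helper names (select_cohort, pair_stool_with_vep) are hypothetical.

```python
import csv
import io


def weeks_to_months(weeks):
    """Convert age in weeks to months (months = weeks / 52 * 12)."""
    return weeks / 52 * 12


def months_to_weeks(months):
    """Convert age in months to weeks (weeks = months / 12 * 52)."""
    return months / 12 * 52


def select_cohort(rows, cohort_column):
    """Keep rows whose cohort flag is true.

    Assumption: flags are written as the text "true" / "false".
    """
    return [r for r in rows if r[cohort_column].strip().lower() == "true"]


def pair_stool_with_vep(rows, cohort):
    """Match each stool row to the same subject's VEP row for a two-visit cohort."""
    stool = {r["subject_id"]: r for r in select_cohort(rows, f"cohort_{cohort}_stool")}
    vep = {r["subject_id"]: r for r in select_cohort(rows, f"cohort_{cohort}_vep")}
    return [(stool[s], vep[s]) for s in sorted(stool) if s in vep]


# Made-up rows; only the column names follow the description above.
demo_csv = io.StringIO(
    "subject_id,visit,age_vep_weeks,stool_age,cohort_v1v2_stool,cohort_v1v2_vep\n"
    "sub1,v1,,3.5,true,false\n"
    "sub1,v2,39.0,,false,true\n"
    "sub2,v1,16.0,3.6,false,false\n"
)
rows = list(csv.DictReader(demo_csv))
pairs = pair_stool_with_vep(rows, "v1v2")
for stool_row, vep_row in pairs:
    vep_months = weeks_to_months(float(vep_row["age_vep_weeks"]))
    print(stool_row["subject_id"], stool_row["stool_age"], vep_months)
```

To try this on the real table, replace demo_csv with a handle such as open("allmeta.csv", newline=""). Empty cells arrive as empty strings and need handling before conversion to float.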

Microbial genes

The bonham_margolis_mbio_allunirefs.csv.gz file is a gzip-compressed CSV file containing the abundance of each UniRef90 feature in each sample, in long format. It contains one header row and 46,226,678 data rows. The following columns are included:

feature: type String - UniRef90 identifier for a given gene function.
abundance: type Float64 - abundance of a UniRef90 feature in reads per kilobase per million reads (RPKM).
sample: corresponds to seqprep from the metadata table.
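
Because the table is in long format with tens of millions of rows, it can be read as a stream instead of being loaded whole. The sketch below is an illustration only: it sums abundance per sample for a chosen set of features using Python's gzip and csv modules. The feature and sample identifiers are made up, and the function name total_abundance_by_sample is hypothetical. Only the three column names come from the description above.

```python
import csv
import gzip
import io
from collections import defaultdict


def total_abundance_by_sample(handle, features):
    """Stream long-format rows and sum abundance per sample for chosen features."""
    wanted = set(features)
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        if row["feature"] in wanted:
            totals[row["sample"]] += float(row["abundance"])
    return dict(totals)


# Made-up, in-memory stand-in for the compressed file.
demo_bytes = gzip.compress(
    b"feature,abundance,sample\n"
    b"UniRef90_X1,1.5,prepA\n"
    b"UniRef90_X2,2.0,prepA\n"
    b"UniRef90_X1,0.5,prepB\n"
    b"UniRef90_X3,9.0,prepB\n"
)
with gzip.open(io.BytesIO(demo_bytes), mode="rt", newline="") as handle:
    totals = total_abundance_by_sample(handle, ["UniRef90_X1", "UniRef90_X2"])
print(totals)
```

For the real data, pass the file path to gzip.open in place of the in-memory buffer. The resulting keys can then be joined to the seqprep column of the metadata table.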

Taxonomic profiles

While not used in the main figures for this manuscript, taxonomic profiles for these samples can be found in the related dataset: https://doi.org/10.5061/dryad.dbrv15f9z

Code/software

All analysis code (written in Julia) and generated figures can be found on GitHub at https://github.com/Klepac-Ceraj-Lab/EEGMicrobiome/ and are archived on Zenodo at https://doi.org/10.5281/zenodo.15723691.

Open-source packages used for the analysis are documented in the Project.toml and Manifest.toml files found in the repository.

Human subjects data

Study procedures were offered in English or Xhosa depending on the language preference of the mother. This study was approved by the relevant university Health Research Ethics Committees (University of Cape Town study number: 666/2021). Informed consent was collected from mothers on behalf of themselves and their infants.

All data is de-identified and does not contain any personally identifiable information.

Stool Samples

We collected stool and the visual evoked potential (VEP) in a longitudinal cohort of 194 children in South Africa during the first 18 months of life (visit 1, N = 119, age 3.7 ± 0.9 months, visit 2, N = 144, age 8.6 ± 1.5 months, and visit 3, N = 130, age 14.1 ± 1.0 months).

Stool samples (n = 315) were collected in the clinic by the research assistant directly from the diaper, transferred to Zymo DNA/RNA Shield™ Fecal Collection Tubes (#R1101, Zymo Research Corp., Irvine, USA), and immediately frozen at −80 °C. Stool samples were not collected if the participant had taken antibiotics within the 2 weeks prior to sampling.

DNA extraction was performed at Medical Microbiology, University of Cape Town, South Africa, from stool samples collected in DNA/RNA Shield Fecal Collection Tubes using the Zymo Research Fecal DNA MiniPrep kit (#D4300, Zymo Research Corp., Irvine, USA) following the manufacturer’s protocol. To assess the quality of the extraction process, ZymoBIOMICS Microbial Community Standards (#D6300 and #D6310, Zymo Research Corp., Irvine, USA) were incorporated and subjected to the same process as the stool samples. The DNA yield and purity were determined using the NanoDrop ND-1000 (NanoDrop Technologies Inc., Wilmington, USA).

Shotgun metagenomic sequencing was performed on all samples at the Integrated Microbiome Research Resource (IMR, Dalhousie University, NS, Canada). A pooled library (max 96 samples per run) was prepared using the Illumina Nextera Flex Kit for MiSeq and NextSeq from 1 ng of each sample. Samples were then pooled onto a plate and sequenced on the Illumina NextSeq 2000 platform using 150 + 150 bp paired-end P3 cells, generating 24 million raw reads and 3.6 Gb of sequence per sample (43).

VEP

Electroencephalography (EEG) data were acquired from infants while they were seated in their caregiver’s lap in a dimly lit, quiet room using a 128-channel high-density HydroCel Geodesic Sensor Net (EGI, Eugene, OR), amplified with a NetAmps 400 high-input amplifier, and recorded via an Electrical Geodesics, Inc. (EGI, Eugene, OR) system with a 1,000 Hz sampling rate. EEG data were online referenced to the vertex (channel Cz) using EGI Netstation software. Impedances were kept below 100 kΩ in accordance with the impedance capabilities of the high-impedance amplifiers. Geodesic Sensor Nets with modified tall pedestals designed to improve the inclusion of infants with thick/curly/tall hair were used as needed across participants (41). Shea Moisture leave-in castor oil conditioner was applied to hair across the scalp prior to net placement to improve both impedances and participant comfort (41). This leave-in conditioner contains insulating ingredients; hence, there is no risk of electrical bridging, and it has not been found to disrupt the EEG signal during testing (unpublished data). Conditioning hair in this way allows nets to lie closer to the scalp for curly/coily hair types and makes for more comfortable net removal at the end of testing.

The visual-evoked potential (VEP) task was presented using Eprime 3.0 software (Psychology Software Tools, Pittsburgh, PA) on a Lenovo desktop computer with an external monitor (19.5 inches on the diagonal) facing the infant, approximately 65 cm away. A standard phase-reversal VEP was induced with a black and white checkerboard stimulus (1 × 1 cm squares within the board) that alternated presentation (black squares became white, white squares became black) every 500 ms for a total of 100 trials. Participants' looking was monitored by video and by an assistant throughout data collection. If the participant looked away during the VEP task, the task was rerun.

VEP data were exported from native Netstation .mff format to .raw format and then pre-processed using the HAPPE + ER pipeline within HAPPE v3.3 software, an automated open-source EEG processing software validated for infant data (42). A subset of the 128 channels was selected for pre-processing that excluded the rim electrodes, as these are typically artifact-laden (channels excluded from pre-processing are listed in Table S1). The HAPPE pre-processing pipeline was run with the user-selected specifications outlined in Table S1. Pre-processed VEP data were considered usable and moved forward to VEP extraction if HAPPE pre-processing ran successfully, at least 15 trials were retained following bad trial rejection, and at least one good channel was kept within the visual ROI. Note that channels marked as bad during pre-processing had their data interpolated as part of standard pre-processing pipelines for ERPs (42).
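
The three usability criteria above can be written as a single check. This is a sketch for illustration, not part of HAPPE or the authors' pipeline. The function name and its arguments are hypothetical, and only the thresholds (at least 15 retained trials, at least one good ROI channel) come from the text.

```python
def vep_is_usable(happe_succeeded, trials_retained, good_roi_channels,
                  min_trials=15, min_roi_channels=1):
    """Return True when a recording meets all three usability criteria."""
    return bool(
        happe_succeeded
        and trials_retained >= min_trials
        and good_roi_channels >= min_roi_channels
    )


# Made-up recordings: (pre-processing succeeded, trials retained, good ROI channels)
recordings = {
    "rec_a": (True, 42, 5),
    "rec_b": (True, 12, 5),   # too few trials retained
    "rec_c": (True, 30, 0),   # no good channel in the visual ROI
    "rec_d": (False, 50, 5),  # pre-processing did not complete
}
usable = [name for name, args in recordings.items() if vep_is_usable(*args)]
print(usable)
```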

Interpolated channels were included in analyses here, as is typically done in developmental samples, and given the low overall rates of interpolation present (e.g., an average of between 4 and 5 of 5 possible good channels in the region of interest were retained at each visit time point).

Visual-evoked potentials (VEPs)

VEP waveforms were extracted and quantified using the HAPPE + ER v3.3 GenerateERPs script (42). Electrodes in the occipital region were selected as a region of interest (i.e., E70, E71, E75, E76, and E83). The VEP waveform has three main components to be quantified: a negative N1 peak, a positive P1 peak, and a negative N2 peak. The windows for selecting the calculated features were based on preliminary visualizations of the waveforms at each visit, such that the selected windows would capture the most component peaks across all subjects. Due to normative maturation of the waveforms as infants age, one set of user-specified windows for calculating component features was used for visits 1 and 2, and another was used for visit 3. For visits 1 and 2, the window for calculating features was 40–100 ms for the N1 component, 75–175 ms for the P1 component, and 100–325 ms for the N2 component. For visit 3, the window was 35–80 ms for the N1 component, 75–130 ms for the P1 component, and 100–275 ms for the N2 component.

All VEPs were visually inspected to ensure that the automatically extracted values were correct, and values were adjusted if observable peaks occurred outside the automated window bounds. These visual checks ensure that peak amplitudes and latencies capture individual variability within and across visits. Participants were considered to have failed this visual inspection and were subsequently removed from the data set if their VEP did not produce three discernible peaks. HAPPE + ER parameters used in extracting the ERPs are summarized in Table S2.

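
The visit-specific search windows can be collected in a small lookup table. The sketch below is illustrative only and is not the HAPPE + ER configuration format. The names PEAK_WINDOWS_MS and peaks_outside_window are hypothetical, and treating the window bounds as inclusive is an assumption. Because peaks outside the automated window were adjusted by hand, a latency outside its window is not an error; the check only flags values that presumably came from manual adjustment.

```python
# Search windows in milliseconds, keyed by visit and component.
EARLY_WINDOWS = {"N1": (40, 100), "P1": (75, 175), "N2": (100, 325)}
PEAK_WINDOWS_MS = {
    "v1": EARLY_WINDOWS,
    "v2": EARLY_WINDOWS,
    "v3": {"N1": (35, 80), "P1": (75, 130), "N2": (100, 275)},
}


def in_window(visit, peak, latency_ms):
    """True if the latency lies inside the automated window (bounds assumed inclusive)."""
    low, high = PEAK_WINDOWS_MS[visit][peak]
    return low <= latency_ms <= high


def peaks_outside_window(visit, latencies):
    """List the components whose latency falls outside the automated window."""
    return [peak for peak in ("N1", "P1", "N2")
            if not in_window(visit, peak, latencies[peak])]


# Made-up latencies for one visit 3 recording.
example = {"N1": 90.0, "P1": 110.0, "N2": 200.0}
print(peaks_outside_window("v3", example))
```
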
To correct for the potential influence of earlier components on later components, corrected amplitudes and latencies were calculated and used in all analyses. Specifically, the P1 amplitude was corrected for the N1 amplitude (corrected P1 amplitude = P1 amplitude − N1 amplitude), the P1 latency was corrected for the N1 latency (corrected P1 latency = P1 latency − N1 latency), the N2 amplitude was corrected for the P1 amplitude (corrected N2 amplitude = N2 amplitude − P1 amplitude), and the N2 latency was corrected for the P1 latency (corrected N2 latency = N2 latency − P1 latency).
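
The peak-to-previous-peak correction can be illustrated in a few lines of Python. This is a sketch rather than the authors' code. The column names are expanded from the patterns peak_(amp|latency)_(N1|P1|N2) and peak_(amp|latency)_(P1|N2)_corrected listed under "Files and variables", the function name is hypothetical, and the input values are made up.

```python
def corrected_features(row):
    """Compute corrected P1 and N2 amplitude and latency as offsets from the previous peak."""
    corrected = {}
    for kind in ("amp", "latency"):
        n1 = row[f"peak_{kind}_N1"]
        p1 = row[f"peak_{kind}_P1"]
        n2 = row[f"peak_{kind}_N2"]
        corrected[f"peak_{kind}_P1_corrected"] = p1 - n1  # P1 relative to N1
        corrected[f"peak_{kind}_N2_corrected"] = n2 - p1  # N2 relative to P1
    return corrected


# Made-up raw peak values for a single recording.
raw = {
    "peak_amp_N1": -4.0, "peak_amp_P1": 10.0, "peak_amp_N2": -6.0,
    "peak_latency_N1": 70.0, "peak_latency_P1": 110.0, "peak_latency_N2": 200.0,
}
print(corrected_features(raw))
```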

VEP waveforms of the included participants by time point are included in Fig. 2A. Ninety-seven infants provided usable VEP data at visit 1, 130 infants provided usable VEP data at visit 2, and 131 infants provided usable VEP data at visit 3. For included participants, EEG data quality metrics are summarized in Table S3. t-tests for data quality metrics (i.e., number of trials collected, number of trials retained, number of channels retained in the ROI, and Pearson’s r for data pre- vs. post-wavelet thresholding at 5, 8, 12, and 20 Hz) were run between each visit combination (i.e., visit 1 vs. visit 2, visit 1 vs. visit 3, and visit 2 vs. visit 3). For visits that differed in data quality, follow-up post hoc correlations were run for the data quality measure with each VEP feature at each visit in the t-test. In no case did the data quality metric relate to VEP features at multiple visits, making it highly unlikely that the data quality difference contributed to the results.