Data from: Biomarker detection and validation for corneal involvement in patients with acute infectious conjunctivitis: A multi-country study

Seitzman, Gerami1; Lalitha, Prajna2; Prajna, N. Venkatesh2; Sansanayudh, Wiwan3; Satitpitakul, Vannarut4; Laovirojjanakul, Wipada5; Chen, Cindi1; Zhong, Lina1; Ruder, Kevin1; Redd, Travis6; Deiner, Michael1; Porco, Travis1; McLeod, Stephen1; Lietman, Thomas1; Hinterwirth, Armin1; Doan, Thuy1

Published Oct 18, 2025 on Dryad. https://doi.org/10.5061/dryad.4j0zpc8mm

Data files

Oct 18, 2025 version files (8.98 MB total)

biomarker_detection_count_data_md5-9b4b21f9.csv (8.58 MB)
biomarker_detection_sample_table_md5-7bd7fef2.csv (2.58 KB)
ENSG_ID2Name.txt.zip (395.48 KB)
README.md (2.89 KB)

Abstract

This dataset accompanies the study “Biomarker Detection and Validation for Corneal Involvement in Patients With Acute Infectious Conjunctivitis,” published in JAMA Ophthalmology (doi:10.1001/jamaophthalmol.2024.2891). The study utilized transcriptomic data and machine learning approaches to identify biomarkers associated with corneal involvement in conjunctivitis patients, with apolipoprotein E (APOE) emerging as a key biomarker. The dataset includes raw transcriptomic counts, sample metadata, and gene mapping files, enabling replication and further exploration of the findings.

Ethical considerations have been addressed, with all patient data anonymized and deidentified to protect privacy.

This dataset contains the gene count data used to identify biomarkers that predict corneal involvement in patients with acute infectious conjunctivitis, as described in the paper:

Seitzman GD, Prajna L, Prajna NV, et al. Biomarker Detection and Validation for Corneal Involvement in Patients With Acute Infectious Conjunctivitis. JAMA Ophthalmol. 2024;142(9):865–871. doi:10.1001/jamaophthalmol.2024.2891

List of files

biomarker_detection_count_data_md5-9b4b21f9.csv
biomarker_detection_sample_table_md5-7bd7fef2.csv
ENSG_ID2Name.txt.zip
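
The two CSV file names carry a short `md5-…` tag. Assuming that this tag is the leading hex digits of the file's MD5 checksum (an assumption; the README does not say so), a download could be checked with a sketch like the following. The helper names `md5_hex` and `matches_name_tag` are hypothetical.

```python
import hashlib
import os
import re


def md5_hex(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_name_tag(path):
    """Compare the file's MD5 with the 'md5-<hex>' tag in its name.

    Returns None when the file name carries no such tag.
    ASSUMPTION: the tag is a prefix of the full MD5 hex digest.
    """
    match = re.search(r"md5-([0-9a-f]+)", os.path.basename(path))
    if match is None:
        return None
    return md5_hex(path).startswith(match.group(1))
```

A result of `False` would mean either a corrupted download or that the tag is not an MD5 prefix after all, so it is worth confirming against the checksum Dryad reports before drawing conclusions.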

Description of the data and file structure

Below is a brief description of each data file.

`biomarker_detection_count_data_md5-9b4b21f9.csv`

The CSV file biomarker_detection_count_data_md5-9b4b21f9.csv contains counts for human genes found in the 58 conjunctival samples used in the study. RNA-Seq data were generated on an Illumina NovaSeq 6000 sequencing machine at the UCSF sequencing center. Sequencing reads were quality filtered using PriceSeqFilter and aligned to the GRCh38 human genome assembly using HISAT2 (version 2.1.0). Gene abundance was calculated using the default parameters in stringtie2 (version 1.3.4d). Annotation of transcripts was based on ENSEMBL GRCh38.87. The attached gene count matrix was then generated using the "prepDE.py" script according to the protocol found in the stringtie2 documentation.

Rows: counts for 58,302 genes (identified by ENSG gene_id)
Columns: one for each of 58 samples
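
As a minimal sketch of how the count matrix might be read, the function below uses only the Python standard library. It assumes the file is comma-separated, has one header row, and stores the ENSG gene_id in the first column with one column per sample after it; these layout details are inferred from the description above, not confirmed. The function name `read_count_matrix` is hypothetical.

```python
import csv


def read_count_matrix(path):
    """Read a gene-by-sample count matrix.

    Returns (sample_ids, counts), where counts maps gene_id to a list of
    floats in the same order as sample_ids.
    ASSUMPTION: first column is the gene_id, remaining columns are samples.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        sample_ids = header[1:]
        counts = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            counts[row[0]] = [float(value) for value in row[1:]]
    return sample_ids, counts
```

If the layout matches these assumptions, `len(sample_ids)` should be 58 and `len(counts)` should be 58,302 for the file described here.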

`biomarker_detection_sample_table_md5-7bd7fef2.csv`

The CSV file biomarker_detection_sample_table_md5-7bd7fef2.csv contains metadata about the samples used, including the number of input reads for normalizing the counts to reads/million. It contains the following columns:

Sample: unique, anonymized ID for each sample
N_reads: number of sequencing read pairs
Country: country of origin for each sample
DESEq: part of training set used for DESeq2 dimensionality reduction
Machine Learning: included in ML (all true, redundant column)
RT-qPCR Validation: Real-time quantitative PCR was performed on this sample (Yes / No)
Corneal Involvement: corneal involvement clinically detected (1 yes, 0 no)
Sex: sex of patient
Age: age of patient in years
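
The reads/million normalization mentioned above can be sketched as follows, using only the standard library. The column names `Sample` and `N_reads` are taken from the list above; that they appear verbatim in the CSV header is an assumption, and the function names are hypothetical.

```python
import csv


def read_sample_table(path):
    """Return a dict mapping each Sample ID to its metadata row (a dict)."""
    with open(path, newline="") as handle:
        return {row["Sample"]: row for row in csv.DictReader(handle)}


def reads_per_million(count, n_reads):
    """Scale a raw count by sequencing depth: count * 1e6 / N_reads."""
    return count * 1_000_000 / n_reads


def normalize_sample(counts_by_gene, n_reads):
    """Apply reads_per_million to every gene count of one sample."""
    return {
        gene: reads_per_million(count, n_reads)
        for gene, count in counts_by_gene.items()
    }
```

For a given sample, `n_reads` would come from `int(table[sample_id]["N_reads"])`, where `table` is the result of `read_sample_table`. This is a plain depth normalization; it is not the DESeq2 procedure used in the paper.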

`ENSG_ID2Name.txt.zip`

ENSG_ID2Name.txt.zip is a zip-compressed text file containing the mapping of ENSG IDs to gene names as they were used in the study.
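
The mapping can be read straight from the archive without unpacking it, as in the sketch below. The internal layout of the text file is not documented here, so the code assumes a single archive member with two tab-separated columns (ENSG ID, then gene name); adjust `delimiter` if the file differs. The function name `read_id_to_name` is hypothetical.

```python
import csv
import io
import zipfile


def read_id_to_name(zip_path, delimiter="\t"):
    """Return a dict mapping ENSG IDs to gene names.

    ASSUMPTIONS: the archive's first member is the mapping file, and each
    line holds an ID and a name separated by `delimiter`.
    """
    mapping = {}
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text, delimiter=delimiter):
                if len(row) >= 2:  # ignore blank or malformed lines
                    mapping[row[0]] = row[1]
    return mapping
```

If the file turns out to have a header line, that line will appear as an ordinary entry in the returned dict and can be dropped by key.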

Code/Software

Software used for determining gene counts:

PRICE Sequence Filter (version 1.2)
HISAT2 (version 2.1.0)
stringtie2 (version 1.3.4d), and stringtie2's prepDE.py Python script
