Antigen-Independent, Autonomous B-cell Receptor Signalling in Diffuse Large B-cell Lymphoma
Data files
Aug 14, 2025 version: 118.99 GB in total

| File | Size |
|---|---|
| 1_5252_1.fastq.gz | 3.88 GB |
| 1_5252_2.fastq.gz | 3.98 GB |
| 10_3567_1.fastq.gz | 3.93 GB |
| 10_3567_2.fastq.gz | 4.06 GB |
| 11_5244_1.fastq.gz | 3.79 GB |
| 11_5244_2.fastq.gz | 3.94 GB |
| 12_3850_1.fastq.gz | 3.81 GB |
| 12_3850_2.fastq.gz | 3.93 GB |
| 13_3844_1.fastq.gz | 3.80 GB |
| 13_3844_2.fastq.gz | 3.93 GB |
| 14_3872_1.fastq.gz | 3.82 GB |
| 14_3872_2.fastq.gz | 3.94 GB |
| 15_208_1.fastq.gz | 3.89 GB |
| 15_208_2.fastq.gz | 4.00 GB |
| 16_4328_1.fastq.gz | 4.25 GB |
| 16_4328_2.fastq.gz | 4.38 GB |
| 17_4609_II_1.fastq.gz | 3.82 GB |
| 17_4609_II_2.fastq.gz | 3.91 GB |
| 3_3267_1.fastq.gz | 3.79 GB |
| 3_3267_2.fastq.gz | 3.92 GB |
| 4_3752_1.fastq.gz | 3.81 GB |
| 4_3752_2.fastq.gz | 3.94 GB |
| 6_4760_1.fastq.gz | 4.39 GB |
| 6_4760_2.fastq.gz | 4.52 GB |
| 7_3882_1.fastq.gz | 3.81 GB |
| 7_3882_2.fastq.gz | 3.94 GB |
| 8_4391_1.fastq.gz | 3.92 GB |
| 8_4391_2.fastq.gz | 4.03 GB |
| 9_2997_1.fastq.gz | 3.90 GB |
| 9_2997_2.fastq.gz | 3.96 GB |
| README.md | 5.12 KB |
Abstract
Diffuse large B-cell lymphoma (DLBCL) comprises two major cell-of-origin subtypes: germinal center B-cell (GCB) type and activated B-cell (ABC) type. ABC-DLBCL is characterized by chronic active B-cell receptor (BCR) signalling and NF-κB activation, which activating mutations of the BCR signalling cascade explain in only a minority of cases. Here we demonstrate that autonomous BCR signalling, akin to its essential pathogenetic role in chronic lymphocytic leukemia (CLL), can explain chronic active BCR signalling in DLBCL. We show that 13 of 18 tested DLBCL-derived BCRs induced spontaneous calcium flux in murine triple knock-out pre-B cells in the absence of antigenic stimulation or external BCR crosslinking. Autonomous BCR signalling was associated with the IgM isotype, depended on somatic BCR mutations, and was largely restricted to non-GCB DLBCL. Autonomous BCR signalling represents a novel immunological driver mechanism originating from individual BCR sequences and adds a new dimension to currently proposed genetics- and transcriptomics-based DLBCL classifications.
https://doi.org/10.5061/dryad.612jm647m
This dataset contains raw whole-exome sequencing (WES) data in paired-end FASTQ format for 15 patient samples diagnosed with diffuse large B-cell lymphoma (DLBCL). These data were generated to support downstream analyses, including variant calling, copy number variation (CNV) detection, structural variant identification, and molecular classification using the LymphGen algorithm.
Principal Investigator Contact Information
Name: Hendrik Veelken
Institution: Leiden University Medical Center
Email: J.H.Veelken@lumc.nl
Alternate Contact Information
Name: Cornelis van Bergen
Institution: Leiden University Medical Center
Email: c.a.m.van_bergen@lumc.nl
Description of the Data and File Structure
File Format and Structure
- File type: FASTQ (gzipped)
- Sequencing type: Paired-end
- Platform: Illumina HiSeq 2000
- Capture Kit: Agilent SureSelect Human All Exon V7
- Reference genome: GRCh38
- Average coverage: ~50×
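Because the reads are paired-end and gzipped, a quick sanity check before alignment is to confirm that the two mate files list reads in the same order. The sketch below is a minimal Python illustration, not part of the authors' pipeline. It assumes standard four-line FASTQ records and Illumina-style headers in which both mates share the read name before the first whitespace (legacy `/1` and `/2` suffixes are stripped); the helpers `read_fastq`, `read_name`, and `check_pairs` are hypothetical names.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from text lines, assuming the standard 4-line FASTQ layout."""
    it = iter(lines)  # a single iterator, so islice keeps advancing through it
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4 or not chunk[0].startswith("@") or not chunk[2].startswith("+"):
            raise ValueError(f"Malformed FASTQ record near: {chunk[0]!r}")
        header, sequence, _, quality = chunk
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header}")
        yield FastqRecord(header, sequence, quality)


def read_name(header: str) -> str:
    """Assumption: Illumina-style header; keep the name before whitespace, drop /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def check_pairs(path_1: str, path_2: str, limit: int = 1000) -> int:
    """Compare read names of the first `limit` record pairs; return how many matched."""
    checked = 0
    with gzip.open(path_1, "rt") as f1, gzip.open(path_2, "rt") as f2:
        for r1, r2 in islice(zip(read_fastq(f1), read_fastq(f2)), limit):
            if read_name(r1.header) != read_name(r2.header):
                raise ValueError(f"Mate mismatch: {r1.header} vs {r2.header}")
            checked += 1
    return checked
```

For example, `check_pairs("6_4760_1.fastq.gz", "6_4760_2.fastq.gz")` inspects only the first 1,000 record pairs, so it is a spot check; it does not verify that both files hold the same total number of reads.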
Each sample is represented by two FASTQ files:
- _1.fastq.gz: forward read
- _2.fastq.gz: reverse read
Files follow the naming format:
[SampleID]_[SampleCase]_[ReadPair].fastq.gz
- SampleID: numeric identifier for the sample (e.g., 1, 3, 6...)
- SampleCase: case code used in the study; it may carry a suffix (e.g., 4609_II)
- ReadPair: either 1 (forward) or 2 (reverse)
Example:
6_4760_1.fastq.gz = Sample 6, Case 4760, forward read
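The naming convention can be parsed mechanically. The following Python sketch is illustrative only: the regular expression, the `FastqName` tuple, and the `parse_fastq_name` helper are our own, inferred from the file names in this dataset. Everything between the leading sample number and the trailing read digit is treated as the case code, so suffixed codes such as `4609_II` stay intact.

```python
import re
from typing import NamedTuple

# Hypothetical pattern inferred from [SampleID]_[SampleCase]_[ReadPair].fastq.gz;
# the greedy case group backtracks so only the final _1/_2 is taken as the read.
FASTQ_NAME = re.compile(r"^(?P<sample>\d+)_(?P<case>.+)_(?P<read>[12])\.fastq\.gz$")


class FastqName(NamedTuple):
    sample: int
    case: str
    read: int  # 1 = forward, 2 = reverse


def parse_fastq_name(filename: str) -> FastqName:
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename}")
    return FastqName(int(match["sample"]), match["case"], int(match["read"]))
```

For instance, `parse_fastq_name("17_4609_II_2.fastq.gz")` returns `FastqName(sample=17, case='4609_II', read=2)`.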
File List
| Case code | Forward read file | Reverse read file |
|---|---|---|
| 5252 | 1_5252_1.fastq.gz | 1_5252_2.fastq.gz |
| 3267 | 3_3267_1.fastq.gz | 3_3267_2.fastq.gz |
| 3752 | 4_3752_1.fastq.gz | 4_3752_2.fastq.gz |
| 4760 | 6_4760_1.fastq.gz | 6_4760_2.fastq.gz |
| 3882 | 7_3882_1.fastq.gz | 7_3882_2.fastq.gz |
| 4391 | 8_4391_1.fastq.gz | 8_4391_2.fastq.gz |
| 2997 | 9_2997_1.fastq.gz | 9_2997_2.fastq.gz |
| 3567 | 10_3567_1.fastq.gz | 10_3567_2.fastq.gz |
| 5244 | 11_5244_1.fastq.gz | 11_5244_2.fastq.gz |
| 3850 | 12_3850_1.fastq.gz | 12_3850_2.fastq.gz |
| 3844 | 13_3844_1.fastq.gz | 13_3844_2.fastq.gz |
| 3872 | 14_3872_1.fastq.gz | 14_3872_2.fastq.gz |
| 208 | 15_208_1.fastq.gz | 15_208_2.fastq.gz |
| 4328 | 16_4328_1.fastq.gz | 16_4328_2.fastq.gz |
| 4609_II | 17_4609_II_1.fastq.gz | 17_4609_II_2.fastq.gz |
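After downloading, it is worth confirming that every case in the file list has both mate files. A minimal Python sketch, assuming all FASTQ files sit in one directory; `EXPECTED_PREFIXES` is transcribed from the file list table, and `missing_mates` is a hypothetical helper.

```python
from typing import Iterable

# <SampleID>_<SampleCase> prefixes transcribed from the File List table.
EXPECTED_PREFIXES = {
    "1_5252", "3_3267", "4_3752", "6_4760", "7_3882", "8_4391", "9_2997",
    "10_3567", "11_5244", "12_3850", "13_3844", "14_3872", "15_208",
    "16_4328", "17_4609_II",
}


def missing_mates(filenames: Iterable[str]) -> dict[str, set[str]]:
    """Map each expected prefix to the mate numbers ('1', '2') that were not found."""
    present = set(filenames)
    missing: dict[str, set[str]] = {}
    for prefix in sorted(EXPECTED_PREFIXES):
        absent = {mate for mate in ("1", "2") if f"{prefix}_{mate}.fastq.gz" not in present}
        if absent:
            missing[prefix] = absent
    return missing
```

Calling `missing_mates(os.listdir(download_dir))` (with `download_dir` being wherever the files were saved) should return an empty dict when all 30 FASTQ files are present.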
Experimental Context
The data were generated and processed as follows:
- DNA Extraction: Genomic DNA was extracted from frozen tissue.
- WES Library Preparation: SureSelect Human All Exon V7 (Agilent) was used for capture.
- Sequencing Platform: Illumina HiSeq 2000.
- Read processing and variant calling:
  - Alignment with BWA v0.7.17 to GRCh38.
  - Post-processing with GATK v4.1.7.0 (duplicate marking, base recalibration).
  - Variant calling with Strelka2 v2.9.10.
  - Annotation using Ensembl-VEP v103, filtering by:
    - NF-κB pathway genes (KEGG map04064)
    - Frequently mutated genes in DLBCL (Chapuy et al., 2018)
    - Predicted-effect scores (CADD, SIFT, PolyPhen)
  - Variants annotated as benign in ClinVar (version 202008) were excluded.
- Copy Number Variation:
  - Detected using GATK4 CNV, with a panel of 24 normal samples.
- Structural Variants:
  - Translocations involving MYC, BCL2, and BCL6 were identified via targeted locus capture-based sequencing.
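The variant filters listed above, together with the Strelka2 GQX thresholds given under Genetic analyses (at least 15 for SNVs and 30 for INDELs), can be expressed as a single predicate. This Python sketch works on a hypothetical, already-parsed `Variant` record rather than on real VCF/VEP output. Several points are assumptions: the gene set is a placeholder, not the study's list; the consequence names are our mapping of the listed categories to VEP Sequence Ontology terms; and only ClinVar "benign" (not "likely benign") is excluded. CADD, SIFT, and PolyPhen scores are left as annotations because no cut-offs are stated.

```python
from dataclasses import dataclass, field

# Placeholder only, NOT the study's gene list (KEGG map04064 genes plus
# frequently mutated DLBCL genes would go here).
GENES_OF_INTEREST = {"MYD88", "CARD11", "CD79B"}

# Listed consequence categories as VEP SO terms (mapping is an assumption).
# Stored as frozensets so "a&b" and "b&a" compare equal.
ALLOWED_CONSEQUENCES = {
    frozenset(terms.split("&"))
    for terms in (
        "frameshift_variant", "inframe_deletion", "inframe_insertion",
        "missense_variant", "missense_variant&splice_region_variant",
        "splice_region_variant&synonymous_variant", "synonymous_variant",
        "stop_gained", "stop_lost", "frameshift_variant&stop_lost",
        "coding_sequence_variant",
    )
}

MIN_GQX = {"SNV": 15, "INDEL": 30}  # thresholds stated in Genetic analyses


@dataclass
class Variant:  # hypothetical record type
    gene: str
    kind: str                # "SNV" or "INDEL"
    gqx: float               # Strelka2 GQX
    consequence: str         # "&"-joined consequence string, VEP style
    clin_sig: set[str] = field(default_factory=set)


def keep(v: Variant) -> bool:
    """Return True if the variant passes the quality and annotation filters."""
    if v.gqx < MIN_GQX[v.kind]:
        return False
    if v.gene not in GENES_OF_INTEREST:
        return False
    if frozenset(v.consequence.split("&")) not in ALLOWED_CONSEQUENCES:
        return False
    return "benign" not in v.clin_sig
```

Matching whole consequence combinations (rather than any single term) mirrors the way the combinations are enumerated in the methods, where e.g. splice region variants are listed only together with missense or synonymous changes.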
Abbreviations
| Term | Definition |
|---|---|
| WES | Whole-Exome Sequencing |
| FASTQ | Format for storing raw sequencing reads |
| SNV | Single Nucleotide Variant |
| INDEL | Insertion or Deletion |
| CNV | Copy Number Variation |
| GATK | Genome Analysis Toolkit |
| VEP | Variant Effect Predictor |
| DLBCL | Diffuse Large B-Cell Lymphoma |
| CADD | Combined Annotation Dependent Depletion |
| SIFT | Sorting Intolerant From Tolerant |
| PolyPhen | Polymorphism Phenotyping |
Code/Software
The following tools and versions were used for data processing:
- Sarek Workflow v2.7 (nf-core/sarek)
- BWA v0.7.17 – read alignment
- GATK v4.1.7.0 – duplicate marking, base recalibration, CNV detection
- Strelka2 v2.9.10 – variant calling
- Ensembl-VEP v103 – variant annotation
- MultiQC v1.8 – quality control reporting
These tools were run in accordance with the best practice guidelines from the Broad Institute and the nf-core community.
Cell lines and biopsies
Fresh-frozen biopsies of histologically confirmed DLBCL samples were identified in the pathology archive at Leiden University Medical Center (LUMC). The study was approved by the Scientific Review Committee of the LUMC Department of Hematology under an applicable waiver of consent by the LUMC Ethical Committee (B16.048).
Genetic analyses
Whole exome sequencing (WES) was performed on fragmented DNA with the SureSelect Human All Exon V7 kit (Agilent) capture on the HiSeq2000 (Illumina) platform to an average coverage of 50×. If necessary, DNA was first amplified by isothermal alkaline genome amplification with Phi29 polymerase and random hexamer priming (REPLI-g kit; Qiagen).

For the variant calling analysis, FASTQ files were processed with the Sarek workflow v2.7 and aligned to the human reference genome GRCh38 using the Burrows-Wheeler Aligner (BWA) v0.7.17 [35,36]. Duplicate mapped reads were marked, regions flanking indels were locally realigned, and base quality scores were recalibrated to obtain more accurate base calls, following the Genome Analysis Toolkit (GATK) v4.1.7.0 best practices [37]. Single-nucleotide variants (SNVs) and short insertions and deletions (INDELs) were called using Strelka2 v2.9.10 [38]. Only high-confidence variants, defined by a GQX quality score of at least 15 for SNVs and at least 30 for INDELs, were retained.

The resulting variant call files were annotated with Ensembl-VEP (v103) and filtered in four steps. Variants were first filtered for genes of the NF-κB signalling pathway in the Kyoto Encyclopedia of Genes and Genomes (www.kegg.jp/entry/map04064) and for the most frequently mutated genes in DLBCL. Thereafter, variants were filtered by consequence, i.e. frameshift, in-frame deletion, in-frame insertion, missense, missense variant & splice region variant, splice region variant & synonymous variant, synonymous, stop gained, stop lost, frameshift variant & stop lost, and coding sequence variant. Finally, variants were annotated for predicted effects by CADD Phred, SIFT, and PolyPhen scores, and according to clinical impact. Variants annotated as benign in ClinVar 202008 were discarded. Workflow quality control metrics were calculated and aggregated by MultiQC v1.8 [41].

Recalibrated BAM files from the variant calling workflow were used to detect copy number variations (CNVs) with the GATK4 somatic CNV workflow, following the Broad Institute's recommended best practices [37], with a panel of 24 normal tissue samples. The modelled segment files were then filtered by the genomic coordinates of the genes and regions of interest.
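The last CNV step, filtering modelled segments by the coordinates of genes or regions of interest, amounts to an interval-overlap test. The Python sketch below assumes a tab-separated GATK-style `.seg` file in which `@`-prefixed header lines are followed by a column header containing `CONTIG`, `START`, and `END`, with 1-based inclusive coordinates. The helper names and the region dictionary are hypothetical, and no real gene coordinates are given here.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Segment(NamedTuple):
    contig: str
    start: int             # assumed 1-based, inclusive
    end: int               # assumed 1-based, inclusive
    row: dict[str, str]    # all columns of the segment line, as strings


def read_seg(lines: Iterable[str]) -> list[Segment]:
    """Skip '@' header lines, then parse the tab-separated table (assumed layout)."""
    body = (line for line in lines if not line.startswith("@"))
    reader = csv.DictReader(body, delimiter="\t")
    return [Segment(r["CONTIG"], int(r["START"]), int(r["END"]), r) for r in reader]


def segments_in_regions(
    segments: list[Segment], regions: dict[str, tuple[str, int, int]]
) -> Iterator[tuple[str, Segment]]:
    """Yield (region_name, segment) for every segment overlapping a region.

    `regions` maps a (hypothetical) region name to (contig, start, end),
    using the same 1-based inclusive convention as the segments.
    """
    for name, (contig, start, end) in regions.items():
        for seg in segments:
            # Closed intervals overlap when each starts before the other ends.
            if seg.contig == contig and seg.start <= end and seg.end >= start:
                yield name, seg
```

A linear scan is adequate for a handful of regions per sample; with many regions, sorting segments per contig or using an interval tree would scale better.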
- Eken, Janneke A.; Koning, Marvyn T.; Kupcova, Kristyna et al. (2024). Antigen-independent, autonomous B cell receptor signaling drives activated B cell DLBCL. Journal of Experimental Medicine. https://doi.org/10.1084/jem.20230941
