
Improving the efficiency of single cell genome sequencing based on overlapping pooling strategy

Cite this dataset

Tu, Jing; Yang, Zengyan; Lu, Na; Lu, Zuhong (2022). Improving the efficiency of single cell genome sequencing based on overlapping pooling strategy [Dataset]. Dryad.


Single-cell genome sequencing has become a useful tool in medical and biological research. However, it requires an independent library for each cell, so the cost grows in step with the number of cells. In this study, we report an efficient approach to single-cell copy number variation (CNV) analysis that combines an overlapping pooling strategy with a branch and bound (B&B) algorithm. Single cells are pooled in an overlapping manner before sequencing and are later assigned to specific cell types by estimating their CNV patterns with the B&B algorithm. Instead of one library per cell, only one library per pool is required. As long as the number of pools is smaller than the number of cells, fewer libraries are needed and the cost is lower. In computer simulations, we pooled 80 cells into 40 and 27 overlapping pools and classified the cells into cell types based on their CNV patterns. On average, 84% of cells were correctly classified with 40 pools and 76.5% with 27 pools, while only half or one-third as many sequencing libraries were required. Combined with traditional approaches, our method is expected to significantly improve the efficiency of single-cell genome sequencing.
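To make the pooling idea concrete, the following is a minimal sketch of one possible overlapping design. It is an illustration under stated assumptions, not the authors' procedure: the description above does not say how cells were assigned to pools or how many pools each cell joins, so the rule of two pools per cell, the function names (assign_pools, pool_members, pooled_profile), and the toy copy number profiles are all hypothetical. The pooled signal is idealized as the noise-free per-bin sum of the member cells' copy numbers.

```python
def assign_pools(n_cells, n_pools):
    """Place every cell in two different pools (hypothetical design rule)."""
    design = []
    for cell in range(n_cells):
        step = 1 + cell // n_pools          # shift grows with each pass over the pools
        first = cell % n_pools
        second = (cell + step) % n_pools    # differs from `first` while step < n_pools
        design.append((first, second))
    return design


def pool_members(design, n_pools):
    """Invert the design: list the cells contained in each pool."""
    members = [[] for _ in range(n_pools)]
    for cell, pools in enumerate(design):
        for pool in pools:
            members[pool].append(cell)
    return members


def pooled_profile(cells, profiles):
    """Idealized pool signal: per-bin sum of the member cells' copy numbers."""
    n_bins = len(profiles[0])
    return [sum(profiles[c][b] for c in cells) for b in range(n_bins)]


# 80 cells into 40 pools, so 40 libraries instead of 80.
design = assign_pools(n_cells=80, n_pools=40)
members = pool_members(design, n_pools=40)

# Toy profiles over three bins: all cells diploid, except cell 0 with a gain in bin 1.
profiles = [[2, 2, 2] for _ in range(80)]
profiles[0] = [2, 4, 2]

print(members[0])                            # cells sharing pool 0
print(pooled_profile(members[0], profiles))  # summed copy numbers for pool 0
```

Because each cell appears in more than one pool, its profile contributes to several pooled signals, which is what lets a search procedure such as B&B attribute CNV patterns back to individual cells. The search itself is not shown here.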


The dataset contains the statistics of the sequencing data and the copy number profiles of the single cells. 

The single-cell sequencing data of 80 single cells from 7 patients with triple-negative breast cancer (TNBC) were downloaded in FASTQ format from the National Center for Biotechnology Information (NCBI) [15] under Sequence Read Archive (SRA) accession SRP064210.

Basic statistics were computed on BAM files of reads mapped to the human reference genome hg19. We followed the protocol of Baslan et al. to obtain the copy number profile of each single cell.

Usage notes

The sequencing data statistics file gives the number of single cells used for analysis, along with the number of sequencing reads, mapping rate, genome coverage, and sequencing depth.
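For readers who want to relate these metrics to raw counts, here is a short sketch using common definitions. These definitions are assumptions, since the dataset description does not state exactly how the authors computed each value: mapping rate as mapped reads over total reads, genome coverage as the fraction of reference bases covered by at least one read, and sequencing depth as aligned bases divided by the reference length. The function name and all numbers are hypothetical toy values, not taken from the dataset.

```python
def sequencing_stats(total_reads, mapped_reads, covered_bases, aligned_bases, genome_length):
    """Compute three summary metrics from raw counts (assumed, commonly used definitions)."""
    return {
        # Fraction of reads that aligned to the reference.
        "mapping_rate": mapped_reads / total_reads,
        # Fraction of reference positions covered by at least one read.
        "genome_coverage": covered_bases / genome_length,
        # Mean number of aligned bases per reference position.
        "sequencing_depth": aligned_bases / genome_length,
    }


# Toy counts for one cell; genome_length is a round placeholder, not the exact hg19 size.
toy = sequencing_stats(
    total_reads=1_000_000,
    mapped_reads=950_000,
    covered_bases=60_000_000,
    aligned_bases=95_000_000,
    genome_length=3_000_000_000,
)

for name, value in toy.items():
    print(f"{name}: {value:.4f}")
```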

The copy number profiles file gives the copy number of every single cell at the bin size used.



National Natural Science Foundation of China, Award: 61971125

Government of Jiangsu Province, Award: 2019-SWYY-004