# Title of Dataset

---

PacBio sequencing data of Coptotermes gestroi

## Description of the data and file structure

For PacBio sequencing, genomic DNA was purified from 15 worker heads using phenol/chloroform. DNA purity was first checked on a NanoDrop (Thermo Fisher Scientific, USA), with OD ratios of 1.8 and 2.0 for 260/280 and 260/230, respectively. For high-throughput sequencing, DNA quantity was determined using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, USA). DNA integrity was profiled using the Femto Pulse system and the Genomic DNA 165 kb Kit (Agilent, USA), showing a size distribution of 5-50 kb with a major peak around 21 kb.

Shotgun library construction was conducted using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, USA). Genomic DNA was sheared using Megaruptor® (Diagenode, USA), and the SMRTbell adaptor-ligated library was subjected to gel size selection at 5-30 kb (major peak at 10 kb) using the BluePippin Size-Selection system (Sage Science, USA).

SMRT sequencing was performed on the Sequel system using the Sequel Sequencing Kit 3.0 and the SMRT Cell 1M V3 LR kit (Pacific Biosciences, USA) with 20-hour movie runs and a 2-hour pre-extension. The polymerase reads were processed with the instrument software SMRT Link 9.0 to generate subreads and circular consensus sequencing (CCS) reads, with parameter settings of minimum CCS read length at 10kb, minimum 3 full passes, and minimum predicted accuracy at 99%.

The PacBio sequencing data of C. gestroi comprise 27,544 sequences ranging from 91 bp to 20,975 bp, most of them between 5,000 bp and 10,000 bp. Of all the genomic reads searched, 23,116 contained repeat motifs that could be used to develop microsatellite markers.
The repeat motifs were predominantly tetranucleotide (45.1%) and trinucleotide (33.6%); the remaining classes were di- (14.2%), mono- (4.31%), penta- (2.7%), and hexanucleotide (0.06%).

## Sharing/Access information

Links to other publicly accessible locations of the data: None

Data was derived from the following sources: None

## Code/Software

Subreads and CCS reads were generated with the instrument software SMRT Link 9.0 (Pacific Biosciences, USA).
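
This README does not name the tool or thresholds used to detect repeat motifs, so the following Python sketch is only an illustration of how reads could be screened for microsatellites and tallied by motif class (mono- to hexanucleotide). It is not the authors' pipeline. It assumes perfect tandem repeats only, and the minimum repeat counts in `MIN_REPEATS` are hypothetical values chosen for the example; the function names are likewise illustrative.

```python
from collections import Counter

# Motif class names keyed by repeat-unit length in bases.
MOTIF_CLASSES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}

# ASSUMPTION: hypothetical minimum number of unit copies per class.
# The dataset README does not state the thresholds that were actually used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """Return True if unit is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(unit)
    return not any(
        n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return a list of (start, unit, copies) for perfect tandem repeats in seq."""
    seq = seq.upper()
    found = []
    pos = 0
    while pos < len(seq):
        hit = None
        for unit_len in range(1, 7):
            unit = seq[pos:pos + unit_len]
            # Skip truncated units, ambiguous bases, and non-primitive units.
            if len(unit) < unit_len or set(unit) - set("ACGT"):
                continue
            if not is_primitive(unit):
                continue
            copies = 1
            while seq[pos + copies * unit_len:pos + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= MIN_REPEATS[unit_len]:
                hit = (pos, unit, copies)
                break
        if hit:
            found.append(hit)
            pos += len(hit[1]) * hit[2]  # jump past the repeat tract
        else:
            pos += 1
    return found


def motif_class_percentages(reads):
    """Return the percentage of detected repeats in each motif class."""
    counts = Counter()
    for seq in reads:
        for _, unit, _ in find_ssrs(seq):
            counts[MOTIF_CLASSES[len(unit)]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}
```

To use it, read sequences from the deposited files into a list of strings and pass them to `motif_class_percentages`. Because the thresholds and the restriction to perfect repeats are assumptions, the resulting percentages should not be expected to match the values reported above.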