Dryad

Data from: Deep learning reveals hidden diversity of Synechococcus in the coastal water of China: Novel clades and their ecological insights

Data files

Oct 21, 2025 version files 94.22 MB


Abstract

Synechococcus is ubiquitous and diverse in marine environments and contributes significantly to ocean primary productivity. The genetic diversity of the genus has been extensively explored using the 16S-23S rRNA internal transcribed spacer (ITS) region. However, accurately identifying Synechococcus ITS sequences in large sequencing datasets is challenging because there is no standardized taxonomy and clade boundaries are ambiguous. To address these limitations, we developed Syn_Tool, a deep learning-based framework that integrates a curated Synechococcus ITS database for sequence identification, classification, and novel clade discovery. Applied to 1,087,323 ITS sequences from the coastal water of China, the largest Synechococcus dataset to date, Syn_Tool classified them into 42 clades: 28 known and 14 newly defined. Biogeographic analyses revealed a latitudinal diversity gradient driven by temperature, with 12 of the 14 newly defined clades (CSII-IV and CSVI-XIV) found primarily in estuarine regions, where rapid diversification may promote the emergence of novel genotypes. This study demonstrates the application of deep learning to classifying Synechococcus and understanding its ecological roles in dynamic marine ecosystems.