Data from: Population genomics of endangered lenoks (Brachymystax spp.) in China reveals the presence of cryptic species
Data files
Feb 12, 2026 version files 1.15 GB
Lenok_MT.nexus (2.42 KB)
Lenok_SNP.nexus (2.89 KB)
Lenok.BEAST.fa (11.09 KB)
Lenok.ILS_TREE.tar.gz (59.67 MB)
Lenok.MT.fa (1.74 MB)
Lenok.SNP.fa.gz (1.09 GB)
README.md (6.62 KB)
Supplemental_File.pdf (1.69 MB)
Supplemental_Table_S1.csv (6.67 KB)
Supplemental_Table_S4.csv (10.17 KB)
Supplemental_Table_S5.csv (1.22 KB)
Supplemental_Table_S6.csv (23.84 KB)
Abstract
Lenoks, species within the genus Brachymystax, are freshwater salmonids with a scattered distribution in the rivers of Siberia, Northeast China, Xinjiang, Hebei, and the Qinling Mountains. Owing to long-term population declines, all species assigned to Brachymystax are protected by law in China. However, the evolutionary history and species-level systematics of this genus remain controversial, complicating taxonomic designations and conservation efforts. In particular, the geographical separation of populations may have resulted in the formation of phenotypically similar cryptic species. We built a chromosome-level genome assembly of B. tsinlingensis and re-sequenced the genomes of 103 individuals of Chinese Brachymystax spp. from five geographically isolated locations. Population genomic and phylogenomic analyses based on nuclear SNPs and mitochondrial genomes revealed six different genetic lineages, of which at least one, the Hebei lineage, represents a cryptic species. Notably, the results suggest that the sympatric species B. lenok and B. tumensis are not close relatives, but the former is more closely related to the new species B. sp. Xinjiang with an estimated divergence time of c. 630 Ka, indicating that closely-related sympatric species may not have evolved via sympatric speciation in areas influenced by Pleistocene climate changes. We observed mito-nuclear phylogenomic discordance in Brachymystax caused by the strong gene flow between B. lenok and B. tumensis. Phylogenetic and demographic analyses emphasize the important role of interglacial refugia in promoting speciation and underscore the impact of historical gene flow. Compared to other lenoks, the Gansu population had the lowest genetic diversity, suggesting that particular attention to protect its genetic resources may be required. Finally, we suggest that cross-regional proliferation and release of lenoks should be banned in the future to protect the genetic integrity of these divergent groups.
Lenok_SNP.nexus
Description: Neighbor-joining (NJ) phylogenetic tree constructed based on genome-wide single nucleotide polymorphisms (SNPs)
Content: Including all 103 resequenced samples used in this study
Format: Nexus
Lenok_MT.nexus
Description: Neighbor-joining (NJ) phylogenetic tree constructed based on mitochondrial DNA (mtDNA) sequences
Content: Including the complete mitochondrial DNA sequences generated from the 103 resequenced samples in this study
Format: Nexus
Lenok.MT.fa
Description: Mitochondrial sequences generated for 103 resequenced samples using the mia software
Content: Including 103 mitochondrial DNA sequences with sequence IDs consistent with Supplemental Table S1
Format: Fasta
Lenok.SNP.fa.gz
Description: SNP consensus sequences used for constructing the phylogenetic tree (Lenok_SNP.nexus), which are core analytical sequences filtered from genome-wide SNP loci as described in the manuscript
Content: Including SNP consensus sequences of 103 samples with sequence IDs consistent with Supplemental Table S1, as well as the Atlantic salmon sequence (SLM01) as the outgroup
Format: Gzip-compressed fasta
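Because Lenok.SNP.fa.gz is about 1.09 GB compressed, it may be convenient to stream it rather than decompress it to disk first. The following is a minimal sketch using only the Python standard library; the function names `read_fasta` and `sequence_lengths` are illustrative and not part of this dataset, and the parser assumes an ordinary FASTA layout (a `>` header line followed by one or more sequence lines).

```python
import gzip


def read_fasta(lines):
    """Yield (sequence_id, sequence) pairs from an iterable of FASTA text lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited token of the header.
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def sequence_lengths(path):
    """Return {sequence_id: length} for a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return {seq_id: len(seq) for seq_id, seq in read_fasta(handle)}
```

Calling `sequence_lengths("Lenok.SNP.fa.gz")` should give one entry per sequence; if the file is an alignment, as its use for tree building suggests, all lengths would be expected to match. Only one sequence is held in memory at a time.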
Lenok.BEAST.fa
Description: SNP data used for Bayes factor analysis in the BEAST software, generated by random sampling of SNPs across the whole genome and conversion of homozygous/heterozygous loci to numeric codes
Content: Including data of 103 samples with sequence IDs consistent with Supplemental Table S1
Format: Fasta (0 = homozygous for reference allele, 1 = heterozygous, 2 = homozygous for alternative allele)
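The 0/1/2 encoding described above can be summarised per sample with a few lines of Python. This is a sketch under the assumption that each record in Lenok.BEAST.fa is a single string of the characters `0`, `1`, and `2`; any other character (for example a missing-data symbol, which the description does not mention) is counted separately as `other`. The function name and the dictionary keys are illustrative only.

```python
from collections import Counter

# Codes as documented for Lenok.BEAST.fa.
GENOTYPE_LABELS = {"0": "hom_ref", "1": "het", "2": "hom_alt"}


def genotype_summary(encoded):
    """Count genotype classes in one 0/1/2-encoded sequence string."""
    counts = Counter(encoded)
    summary = {label: counts.get(code, 0) for code, label in GENOTYPE_LABELS.items()}
    # Assumption: characters outside 0/1/2 are treated as uncalled sites.
    summary["other"] = sum(n for code, n in counts.items() if code not in GENOTYPE_LABELS)
    called = summary["hom_ref"] + summary["het"] + summary["hom_alt"]
    # Fraction of called sites that are heterozygous; None if nothing was called.
    summary["het_fraction"] = summary["het"] / called if called else None
    return summary
```

Applied to every record in the file, this gives a quick check that the encoding matches the documented scheme before the data are passed to BEAST.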
Lenok.ILS_TREE.tar.gz
Description: Tree files used for quIBL analysis. 100 bp SNP alignment fragments were randomly extracted for four target three-population combinations (HLJL-XIN-HLJT, SNX-HEB-HLJT, SNX-HEB-HLJL, SNX-HEB-XIN), and 500,000 tree files were generated for each combination through 100 independent quIBL runs
Content: Including four folders, each containing a single tree file with 500,000 trees
Format: Newick
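To confirm that the archive matches the description (four tree files with 500,000 trees each), the trees can be counted without unpacking the archive. The sketch below uses the standard `tarfile` module and assumes one Newick tree per line, each terminated by a semicolon; that layout is an assumption, not something stated in this README, and the function names are illustrative.

```python
import tarfile


def count_trees(lines):
    """Count Newick trees in an iterable of text lines, assuming one tree per line."""
    return sum(1 for line in lines if line.strip().endswith(";"))


def trees_per_member(archive_path):
    """Return {member_name: tree_count} for every regular file in a .tar.gz archive."""
    counts = {}
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directory entries
            with archive.extractfile(member) as raw:
                counts[member.name] = count_trees(
                    line.decode("utf-8", errors="replace") for line in raw
                )
    return counts
```

If the layout assumption holds, `trees_per_member("Lenok.ILS_TREE.tar.gz")` would be expected to list four files, one per triplet combination.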
Supplemental_Table_S1.csv
Description: Detailed record of the basic information of the 103 individuals subjected to genome-wide resequencing in this study
Format: Sample ID (unique identification number of the genome-wide resequenced individual); Population Name (abbreviation of the population to which the individual belongs); Breeds (species/lineage to which the individual belongs); Province (sampling province of the individual); City/Country (sampling city/district/county of the individual); Sample Site (specific sampling location of the individual (river/basin)); Depth (genome-wide resequencing depth); MSMC2 (whether the individual was used for MSMC2 analysis (population historical dynamics inference analysis))
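As a quick consistency check on the sample table, the number of individuals per population can be tallied with the standard `csv` module. This sketch assumes that the CSV header uses the column names exactly as listed above (in particular `Population Name`); if the header in the file differs, the `column` argument would need to be adjusted. The function name is illustrative.

```python
import csv
from collections import Counter


def samples_per_population(lines, column="Population Name"):
    """Tally rows per population from CSV text lines (an open file works too)."""
    reader = csv.DictReader(lines)
    return Counter(row[column].strip() for row in reader)


def samples_per_population_from_file(path, column="Population Name"):
    """Convenience wrapper that opens the CSV file and tallies it."""
    with open(path, newline="") as handle:
        return samples_per_population(handle, column)
```

For Supplemental_Table_S1.csv the counts should sum to the 103 resequenced individuals described above.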
Supplemental_Table_S4.csv
Description: Quality metrics of the raw sequencing data of the 103 individuals subjected to genome-wide resequencing in this study
Format: Sample ID (unique identification number of the genome-wide resequenced individual); Population Name (abbreviation of the population to which the individual belongs); RawReads (number of raw reads obtained by sequencing); RawBases (total amount of raw bases obtained by sequencing); Length (length of sequencing reads); Q20 (proportion of bases with a sequencing quality value ≥20); Q30 (proportion of bases with a sequencing quality value ≥30); GC (GC base content ratio of sequencing sequences)
Supplemental_Table_S5.csv
Description: Analysis results of the NewHybrids software, used to determine whether hybridization events occurred in HLJL and HLJT populations and identify hybrid offspring samples
Format: Sample (analysis serial number of the sample); IndivName (unique identification number of the sample for hybridization analysis, consistent with Sample ID); 1.000/0.000/0.000/0.000 (posterior probability of the sample belonging to homozygous parent 1 (P1)); 0.000/0.000/0.000/1.000 (posterior probability of the sample belonging to homozygous parent 2 (P2)); 0.000/0.500/0.500/0.000 (posterior probability of the sample belonging to F1 hybrid (P1×P2 first filial generation)); 0.250/0.250/0.250/0.250 (posterior probability of the sample belonging to F2 hybrid (F1×F1 second filial generation)); 0.500/0.250/0.250/0.000 (posterior probability of the sample belonging to backcross generation 1 (F1×P1)); 0.000/0.250/0.250/0.500 (posterior probability of the sample belonging to backcross generation 1 (F1×P2))
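The six posterior-probability columns can be reduced to a single most probable class per individual. The sketch below is one possible way to do this, assuming the CSV header contains the genotype-frequency strings exactly as listed above. The short class labels and the 0.9 assignment threshold are hypothetical choices made for illustration; they are not taken from the manuscript.

```python
import csv

# Column header (as documented above) -> short label (labels are illustrative).
CLASS_LABELS = {
    "1.000/0.000/0.000/0.000": "P1",
    "0.000/0.000/0.000/1.000": "P2",
    "0.000/0.500/0.500/0.000": "F1",
    "0.250/0.250/0.250/0.250": "F2",
    "0.500/0.250/0.250/0.000": "BC_P1",
    "0.000/0.250/0.250/0.500": "BC_P2",
}


def best_class(row, min_posterior=0.9):
    """Return (label, posterior) for the most probable class in one table row.

    min_posterior is a hypothetical cutoff; rows below it are "unassigned".
    """
    posteriors = {label: float(row[col]) for col, label in CLASS_LABELS.items()}
    label, prob = max(posteriors.items(), key=lambda item: item[1])
    return (label if prob >= min_posterior else "unassigned"), prob


def classify_file(path, min_posterior=0.9):
    """Map each IndivName in the CSV to its (label, posterior) pair."""
    with open(path, newline="") as handle:
        return {
            row["IndivName"]: best_class(row, min_posterior)
            for row in csv.DictReader(handle)
        }
```

A stricter or looser threshold changes only which individuals are reported as `unassigned`; the underlying posteriors are returned unchanged.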
Supplemental_Table_S6.csv
Description: Results of the best-likelihood run of the population genetic simulation analysis using the Fastsimcoal3 software
Format: Ancestral population size (ANCSIZE); population sizes of the six lineages (NPOPL, NPOPM, NPOPS, NPOPX, NPOPG, NPOPH, where L = HLJL, M = HLJT, S = SNX, X = XIN, G = GAN, H = HEB); population divergence times (e.g., TDIV1, TDIV2, TDIV3, TDIV5); population fusion/contact times (e.g., TINT_LM, TINT_HM, TINT_HL); lineage divergence completion times (TDIVCOMP1 to TDIVCOMP4); bidirectional migration rates (e.g., MIGLM, MIGML, MIGHM, MIGMH); maximum estimated likelihood value (MaxEstLhood); maximum observed likelihood value (MaxObsLhood)
Supplemental_File.pdf
Description: Including Supplemental Table S2, Supplemental Table S3, Supplemental Table S7, and Supplemental Figures S1-S10
Table_S2: Short summary of BUSCO
Table_S3: Summary of repeats
Table_S7: BF values of different species delimitation models
FIGURE S1. Four alternative models for Fastsimcoal3 simulation.
FIGURE S2. Heatmap of Hi-C analysis for the assembled reference genome. Chromosomes were sorted and named by size, with blue outlines.
FIGURE S3. Circos plot of sequence synteny between Brachymystax tsinlingensis and Salmo salar. Chromosomes and their lengths are labeled and colored on the outer ring.
FIGURE S4. Genomic landscape of Brachymystax tsinlingensis. A: Gene density; B: GC content; C: Total repeat density; D: SNP density.
FIGURE S5. Pattern of SNP numbers across different lenok groups.
FIGURE S6. Site frequency spectrum of six populations.
FIGURE S7. Admixture analysis results for K = 2-7 with corresponding CV values.
FIGURE S8. TreeMix analysis results for other different K values.
FIGURE S9. OptM output for all TreeMix results. Upper panel: Mean and standard deviation (SD) of the composite likelihood L(m) and proportion of variance explained across five iterations. Bottom panel: Second-order rate of change (Δm) across different values of m.
FIGURE S10. Distribution of Delta BIC values for the other three triplets (SNX-HEB-HLJL, SNX-HEB-HLJT, SNX-HEB-XIN).
