Data from: Two new species of Primula sect. Auganthus from Sichuan, China
Data files
Sep 17, 2025 version files 2.56 MB
- Morphological_Statistics_Table.csv (6.35 KB)
- README.md (3.31 KB)
- Single_Nucleotide_Polymorphisms_Matrix.fas (2.55 MB)
Abstract
In groups undergoing rapid radiations, species delimitation among phylogenetically close sister lineages has long been a challenge. During plant surveys in northwestern Sichuan, we unexpectedly discovered two putative new species of Primula that are morphologically similar yet distinct from each other. These species resemble P. xingshanensis, which has been assigned to sect. Auganthus based on morphological characters. To clarify the precise phylogenetic positions of the two putative new species and P. xingshanensis, we sampled related taxa and conducted phylogenetic analyses using chloroplast genomes and nuclear SNPs. The results showed that the two putative new species are sister to each other and closely related to P. sinensis, while P. xingshanensis is sister to P. rupestris. All belong to sect. Auganthus. Based on population genetic structure, morphological statistics, and artificial hybridization experiments, both putative new species should be accepted as distinct species, herein formally described as P. rongrong sp. nov. and P. fujiangensis sp. nov. Based on our field surveys and in accordance with the IUCN criteria, we assess the conservation status of P. rongrong as Least Concern (LC) and P. fujiangensis as Critically Endangered (CR).
Dataset DOI: 10.5061/dryad.g79cnp62s
Description:
This dataset contains the raw data used in the manuscript "Two New Species of Primula Sect. Auganthus from Sichuan, China". It comprises two main components:
- Morphometric data for leaf and floral traits.
- Genomic data in the form of a FASTA file containing aligned Single Nucleotide Polymorphism (SNP) sequences for phylogenetic and population genetic analysis.
File List:
This deposit contains the following two files:
- Morphological_Statistics_Table.csv: Comma-separated values file containing morphological measurements for all sampled individuals.
- Single_Nucleotide_Polymorphisms_Matrix.fas: A FASTA-formatted alignment file of SNP loci.
Data Dictionary:
File: Morphological_Statistics_Table.csv
This file contains the morphological measurement data. Each row represents one individual plant. The columns are defined as follows:
- Leaf_Area (cm²): The two-dimensional area of a single side of the leaf.
- Leaf_Perimeter (cm): The total length of the leaf's outer boundary.
- Leaf_Shape_Index (cm⁻¹): A shape index calculated as Leaf_Perimeter / Leaf_Area.
- Leaf_Length (cm): The length of the leaf blade, excluding the petiole.
- Leaf_Width (cm): The maximum width of the leaf blade.
- Petiole_Length (cm): The length of the leaf stalk (petiole).
- Petiole_Ratio: Ratio of petiole length to total leaf length: Petiole_Length / (Leaf_Length + Petiole_Length).
- Primary_Lobes_Number: The count of primary lobes on the leaf.
- Total_Serrations_Number: The total count of serrations on all leaf lobes.
- Lateral_Veins_Number: The number of lateral veins on the leaf.
- Inflorescence_Height (cm): The height of the inflorescence from its base to the top.
- Pedicel_Length (cm): The length of the stalk of a single flower.
- Bract_Length (cm): The length of a randomly selected bract on the inflorescence.
- Calyx_Tube_Length (cm): The length of the calyx tube of a flower.
- Nectar_Guide: Presence (1) or absence (0) of nectar guides on the flower.
- Throat_Color: Color of the flower's throat (White: 0, Purple: 1).
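The two derived columns can be recomputed from the raw measurements as a consistency check. The following is a minimal sketch, assuming the CSV header uses the column names exactly as listed above (the real header may differ, for example by including units); the example rows are made up and the helper names `derived_traits` and `check_rows` are hypothetical.

```python
import csv
import io


def derived_traits(row):
    """Recompute the two derived leaf columns from the raw measurements."""
    area = float(row["Leaf_Area"])
    perimeter = float(row["Leaf_Perimeter"])
    leaf_length = float(row["Leaf_Length"])
    petiole_length = float(row["Petiole_Length"])
    return {
        # Perimeter / Area, as defined in the data dictionary.
        "Leaf_Shape_Index": perimeter / area,
        # Petiole_Length / (Leaf_Length + Petiole_Length).
        "Petiole_Ratio": petiole_length / (leaf_length + petiole_length),
    }


def check_rows(handle, tolerance=0.01):
    """Yield (row_number, column, stored, recomputed) where values disagree."""
    for number, row in enumerate(csv.DictReader(handle), start=1):
        for column, value in derived_traits(row).items():
            stored = float(row[column])
            if abs(stored - value) > tolerance:
                yield number, column, stored, value


# Made-up example rows, NOT values from the dataset.
example = io.StringIO(
    "Leaf_Area,Leaf_Perimeter,Leaf_Shape_Index,"
    "Leaf_Length,Petiole_Length,Petiole_Ratio\n"
    "20.0,30.0,1.5,6.0,4.0,0.4\n"
    "10.0,25.0,2.0,5.0,5.0,0.5\n"
)
mismatches = list(check_rows(example))
```

To run the same check on the deposited file, one would pass an open handle for Morphological_Statistics_Table.csv to `check_rows` and adjust the column names and tolerance to match the actual header and rounding.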
File: Single_Nucleotide_Polymorphisms_Matrix.fas
This is a standard FASTA file containing SNP sequence data.
- Content: It contains aligned SNP loci from a total of 17 individuals, including three previously described species and two newly described species in the associated study.
- Format: Each sequence entry begins with a header line starting with a > character, followed by the individual identifier (e.g., >Sample_ID_01). The subsequent lines contain the nucleotide sequence for that individual.
- Correspondence: All individual identifiers in the header lines correspond one-to-one with the numbering used in the paper.
- Bioinformatics: This file is the product of a variant calling pipeline against the Primula veris reference genome, filtered to retain 150,248 high-quality SNP loci.
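The FASTA layout described above can be read with a few lines of standard-library Python. This is a minimal sketch rather than the authors' tooling: the helper names `read_fasta` and `alignment_length` are hypothetical, and the toy input uses placeholder identifiers, not records from the deposit.

```python
import io


def read_fasta(handle):
    """Return {identifier: sequence} from a FASTA-formatted text handle."""
    sequences = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            if name in sequences:
                raise ValueError(f"duplicate identifier: {name}")
            sequences[name] = []
        elif name is None:
            raise ValueError("sequence data before the first header line")
        else:
            # A sequence may be wrapped over several lines; collect them all.
            sequences[name].append(line)
    return {key: "".join(parts) for key, parts in sequences.items()}


def alignment_length(sequences):
    """Return the shared sequence length, or raise if lengths differ."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences are not aligned: lengths {sorted(lengths)}")
    return lengths.pop()


# Toy input with placeholder identifiers, NOT records from the deposit.
toy = io.StringIO(">Sample_ID_01\nACGT\nAC\n>Sample_ID_02\nACGTAN\n")
records = read_fasta(toy)
```

Applied to Single_Nucleotide_Polymorphisms_Matrix.fas, the description above suggests `len(records)` should be 17. Whether `alignment_length(records)` equals 150,248 depends on each alignment column corresponding to exactly one SNP locus, which is an assumption here.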
Usage Notes:
- For detailed descriptions of the methodologies used to collect morphological data and generate the SNP matrix, please see the Methods section of this dataset deposit.
Morphological Data Acquisition:
Thirty flowering individuals, with a minimum interval of two meters between any two, were randomly selected from each wild population for morphological observation. A single, moderately sized leaf was collected from each plant. Each leaf was photographed indoors on a black light-absorbing cloth with a scale bar for reference. A total of ten leaf morphological traits (Leaf Area, Leaf Perimeter, Leaf Shape Index, Leaf Length, Leaf Width, Petiole Length, Petiole Ratio, Primary Lobes Number, Total Serrations Number, Lateral Veins Number) were measured or counted from these digital images using ImageJ software (v. 1.54d). Six floral traits were recorded for each individual. Four of them (Inflorescence Height, Pedicel Length, Bract Length, Calyx Tube Length) were measured in the field using a vernier caliper. The remaining two, the presence of nectar guides (coded as: 0 = absent, 1 = present) and throat color (coded as: 0 = white, 1 = purple), were scored categorically.
Genomic SNP Data Acquisition and Bioinformatics:
Genomic DNA was extracted from silica-gel-dried leaf tissue using a modified CTAB (cetyltrimethylammonium bromide) protocol. DNA quality and concentration were assessed using a NanoDrop 1000 Spectrophotometer and agarose gel electrophoresis. Library preparation, genome skimming (low-coverage whole-genome sequencing), and initial quality control (FastQC) were performed by Wuhan Frasergen Genetic Information Co., Ltd. (Wuhan, China) on an Illumina HiSeq 6000 platform. Approximately 3 Gb of raw paired-end sequencing data (150 bp) was generated per sample.
Raw sequencing reads were processed with SOAPnuke (v.2.1.7) to remove adapters and low-quality reads, producing clean reads. These clean reads were then aligned to the Primula veris reference genome using BWA-MEM (v.0.7.17-r1188) with default parameters. The resulting Sequence Alignment/Map (SAM) files were converted to Binary Alignment/Map (BAM) format, sorted, and processed with Picard-tools (https://broadinstitute.github.io/picard/) to mark and remove PCR duplicates. Mapping statistics (e.g., mapping rate, coverage depth) were calculated using SAMtools (v.1.9). Variant calling was performed on all samples simultaneously using the HaplotypeCaller tool in GATK (v.4.0). The initial raw SNP set was rigorously filtered using PLINK (v.1.07) with the following thresholds: a minor allele frequency (MAF) > 0.05, a call rate > 0.95, and pruning for linkage disequilibrium (LD). This process resulted in a final high-quality dataset of 150,248 independent SNP loci retained for downstream phylogenetic and population genetic structure analysis.
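The MAF and call-rate thresholds quoted above can be illustrated per SNP as follows. This is a simplified sketch of the two criteria, not PLINK's implementation: it assumes biallelic SNPs in diploid individuals, encodes each genotype as an alternate-allele count (0, 1, 2) or `None` for a missing call, and omits the LD-pruning step entirely. All function names are hypothetical.

```python
def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype at one SNP."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF at one biallelic SNP from alternate-allele counts (0, 1, 2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    # Each diploid individual carries two alleles (assumption).
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_filters(genotypes, min_maf=0.05, min_call_rate=0.95):
    """Apply the two thresholds from the text: MAF > 0.05, call rate > 0.95."""
    return (minor_allele_frequency(genotypes) > min_maf
            and call_rate(genotypes) > min_call_rate)


# Made-up genotype vectors for 20 individuals, NOT data from the deposit.
common_snp = [0] * 16 + [1] * 2 + [2] * 2   # MAF 0.15, fully called
rare_snp = [0] * 19 + [1]                   # MAF 0.025, fully called
patchy_snp = [1] * 19 + [None]              # MAF 0.5, call rate exactly 0.95
```

Here `passes_filters(common_snp)` is true, while `rare_snp` fails on MAF and `patchy_snp` fails on call rate because the comparison is strict. With small sample sizes a strict call rate above 0.95 leaves little room for missing data: 16 of 17 called individuals gives about 0.94, so a locus would need calls in all 17 to pass if the filter were applied to 17 individuals.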
