Data from: The structure and allelic diversity of the self-incompatibility locus (S-locus) in diploid potatoes inferred from genome sequences and transcriptome data from styles and pollen
Data files
Dec 11, 2025 version, files: 25.46 GB
- Data_Figure6.txt (3.12 KB)
- Gene_Alignments_and_Phylogenies.zip (278.43 KB)
- Genomes.zip (24.98 GB)
- Genotypes_list.txt (1.60 KB)
- Pollen_RNAseq.zip (141.20 MB)
- QC_Metrics_for_Transcriptomes.xlsx (10.61 KB)
- README.md (3.76 KB)
- Style_Isoseq.zip (148.98 MB)
- Style_RNAseq.zip (192.91 MB)
Abstract
Gametophytic Self-Incompatibility is a reproductive strategy to prevent inbreeding and promote outcrossing, and it is controlled by the self-incompatibility locus (S-locus). Studies to understand molecular and evolutionary aspects of the SI system in the Solanaceae have been conducted using several genera, including Petunia, Nicotiana, and Solanum. S-RNases are pistil determinants of gametophytic SI, and multiple S-RNase alleles have been identified in a few potato species. SLFs, the pollen determinants of SI, are linked to S-RNases on chromosome 1. The S-RNase and SLFs present on each chromatid determine an individual’s self-compatible (SC) or self-incompatible (SI) haplotypes. We used long-read genome sequencing and RNA sequencing from styles and pollen to assess the diversity and composition of elements within the S-locus. RNA for RNA-seq was collected from stylar tissue and germinated pollen. For three genotypes, PacBio Iso-Sequencing was used to confirm the presence of alternative transcripts for the S-RNase gene. The combined datasets enabled us to localize both the male (pollen) and female (stylar) components of the S-locus. Sequences of the identified S-RNases and SLFs were used to evaluate their phylogenies along with other Solanaceae sequences for the same genes obtained from databases. Our analysis showed that SLF sequences are expressed in pollen but not in styles, vary in number between individuals, and are distributed across a 9-17 Mb region flanking one S-RNase gene. Preferential associations within haplotigs of specific S-RNase types and SLF types were not observed. Extensive sequence diversity was observed for S-RNases and SLFs, and phylogenetic analysis indicates that diversification of both genes predates the divergence between tomatoes and potatoes.
Dataset DOI: 10.5061/dryad.qz612jmv1
Description of the data and file structure
Files and variables
File: Genomes.zip
Description: Contains assembled, phased genomes in FASTA format and annotation files in GFF format. Genomes were assembled from PacBio HiFi data using Hifiasm v0.19.9, and gene prediction was done using Augustus v3.5.0. Each subfolder inside the Genomes folder is dedicated to a specific genotype. The identity of each WiDiPo genotype is listed in the Genotypes_list.txt file.
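As a quick sanity check on an annotation file, the number of predicted genes per assembled sequence can be tallied directly from the GFF text. The sketch below is a minimal illustration, not part of the dataset: it assumes tab-separated, nine-column GFF records with a `gene` feature type in column 3, and the contig names and coordinates in the example are invented.

```python
from collections import Counter
from typing import Iterable


def count_features(gff_lines: Iterable[str], feature_type: str = "gene") -> Counter:
    """Count records of one feature type per sequence ID in GFF-style text."""
    counts: Counter = Counter()
    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GFF record has nine tab-separated columns; ignore anything else.
        if len(fields) < 9:
            continue
        seqid, ftype = fields[0], fields[2]
        if ftype == feature_type:
            counts[seqid] += 1
    return counts


# Hypothetical records, made up for illustration only.
example_gff = [
    "# example annotation\n",
    "contig_1\tAUGUSTUS\tgene\t100\t900\t0.95\t+\t.\tID=g1\n",
    "contig_1\tAUGUSTUS\ttranscript\t100\t900\t0.95\t+\t.\tID=g1.t1;Parent=g1\n",
    "contig_1\tAUGUSTUS\tgene\t1500\t2400\t0.80\t-\t.\tID=g2\n",
    "contig_2\tAUGUSTUS\tgene\t50\t700\t0.99\t+\t.\tID=g3\n",
]

gene_counts = count_features(example_gff)
for seqid, n in sorted(gene_counts.items()):
    print(seqid, n)
```

To run it on a real file, pass an open file handle (for example `count_features(open(path))`, where `path` points to one of the unpacked GFF files) in place of the example list.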
File: Style_RNAseq.zip
Description: Contains assembled style transcriptomes in FASTA format, plus the transcripts functionally annotated as T2-Ribonucleases (GO0033887), provided as DNA and protein sequences. Transcripts carrying the IPR025886 domain are also included for each genotype. Transcriptomes were assembled from Illumina fastq data using the Trinity Assembler v2.15.1 and annotated using TRAPID 2.0. The identity of each WiDiPo genotype is listed in the Genotypes_list.txt file.
File: Style_Isoseq.zip
Description: Contains full-length and quasi-full-length style transcripts in FASTA format, plus the transcripts functionally annotated as T2-Ribonucleases (GO0033887), provided as DNA and protein sequences. PacBio Iso-Seq data were used to assemble the transcriptomes using the Trinity Assembler v2.15.1. Transcriptome annotations were done using TRAPID 2.0. The identity of each WiDiPo genotype is listed in the Genotypes_list.txt file.
File: Gene_Alignments_and_Phylogenies.zip
Description: Contains sequence alignments and phylogeny files for S-RNase and SLF genes. All sequence alignments for S-RNases and SLFs were done using MAFFT v7.490 (Katoh et al., 2002; Katoh and Standley, 2013), and maximum likelihood (ML) phylogenies were reconstructed with RAxML v8 (Stamatakis, 2014) using a GTR Gamma nucleotide model with rapid bootstrapping and a search for the best-scoring ML tree. Five hundred bootstrap replicates were calculated, and a consensus tree was created. There are three subfolders: 1) S-RNases cDNAs, 2) S-RNase proteins, and 3) SLFs cDNAs. Each subfolder contains the following file types: .fasta (alignment in FASTA format); .nex (alignment in NEXUS format); .phy (alignment in PHYLIP format); .newick (phylogenetic tree in Newick format). The identity of each WiDiPo genotype is listed in the Genotypes_list.txt file.
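Before loading a .fasta alignment into downstream tools, it can help to confirm that every aligned row has the same length. The following is a minimal sketch using only the Python standard library; the sequence names and residues in the example are invented and do not come from the dataset.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}; the name is the first word of the header."""
    chunks: Dict[str, list] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def alignment_length(records: Dict[str, str]) -> int:
    """Return the shared row length, or raise if rows differ (not a valid alignment)."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows have differing lengths: {sorted(lengths)}")
    return lengths.pop()


# Hypothetical aligned sequences, made up for illustration only.
example_alignment = [
    ">seq_A example\n",
    "ATG-\n",
    "CCGT\n",
    ">seq_B\n",
    "ATGA\n",
    "C-GT\n",
]

records = read_fasta(example_alignment)
n_columns = alignment_length(records)
print(len(records), "sequences,", n_columns, "alignment columns")
```

Multi-line sequences are joined before the length check, so wrapped FASTA files are handled the same way as single-line ones.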
File: Pollen_RNAseq.zip
Description: Contains assembled pollen transcriptomes in FASTA format and FASTA files of transcripts functionally classified as F-box domain proteins. Transcriptomes were assembled from Illumina fastq data using the Trinity Assembler v2.15.1 and annotated using TRAPID 2.0. Each subfolder inside the Pollen_RNAseq folder is dedicated to a specific genotype. The identity of each WiDiPo genotype is listed in the Genotypes_list.txt file.
File: QC_Metrics_for_Transcriptomes.xlsx
Description: Contains quality evaluation parameters for all the transcriptome datasets.
File: Data_Figure6.txt
Description: Contains the data used to generate the graph of relationships among haplotypes (Figure 6).
File: Genotypes_list.txt
Description: Lists the identity of each WiDiPo genotype in this dataset.
Code/software
Alignments and phylogenies can be visualized with MEGA.
All fasta files with GFF annotations can be visualized with JBrowse or IGV.
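For a quick look at which sequences a .newick tree contains without opening a viewer, the leaf names can be pulled out of the Newick string with a small scanner. This is a minimal sketch under stated assumptions: labels are unquoted and contain none of the characters `( ) , : ;`, and the tree string in the example is invented. A dedicated phylogenetics library is a better choice for anything beyond listing names.

```python
from typing import List


def newick_leaf_names(newick: str) -> List[str]:
    """Return leaf labels from a Newick string with unquoted labels."""
    leaves: List[str] = []
    token: List[str] = []
    # What the next text token represents: a leaf label follows "(" or ",",
    # an internal-node label (e.g. bootstrap support) follows ")",
    # and a branch length follows ":".
    expect = "leaf"
    for ch in newick.strip():
        if ch in "(),:;":
            label = "".join(token).strip()
            if label and expect == "leaf":
                leaves.append(label)
            token = []
            if ch in "(,":
                expect = "leaf"
            elif ch == ")":
                expect = "internal"
            elif ch == ":":
                expect = "length"
        else:
            token.append(ch)
    return leaves


# Hypothetical tree with branch lengths and support values, for illustration only.
example_tree = "((A:0.1,B:0.2)95:0.05,(C:0.3,D:0.4)80:0.1,E:0.5);"
leaf_names = newick_leaf_names(example_tree)
print(leaf_names)
```

The leaf names returned this way can then be matched against the entries in Genotypes_list.txt.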
