Deciphering salt stress responses in Solanum pimpinellifolium through high-throughput phenotyping
Data files
Aug 17, 2023 version files 111.40 GB
Abstract
Soil salinity is a major environmental stressor affecting agricultural productivity worldwide. Understanding plant responses to salt stress is crucial for developing resilient crop varieties. Wild relatives of cultivated crops, such as the wild tomato Solanum pimpinellifolium, can serve as a useful resource to further expand the resilience potential of the cultivated germplasm, S. lycopersicum. In this study, we employed high-throughput phenotyping under greenhouse and field conditions to explore salt stress responses of a S. pimpinellifolium diversity panel. Our study revealed extensive phenotypic variation in response to salt stress, with traits such as transpiration rate, shoot mass, and ion accumulation showing significant correlations with plant performance. We found that while transpiration was a key determinant of plant performance in the greenhouse, shoot mass strongly correlated with yield under field conditions. Conversely, ion accumulation was the least influential factor under greenhouse conditions. Through a genome-wide association study, we identified candidate genes not previously associated with salt stress, highlighting the power of high-throughput phenotyping in uncovering novel aspects of plant stress responses. Overall, this study contributes to our understanding of salt stress tolerance in S. pimpinellifolium and lays the groundwork for further investigations into the genetic basis of these traits, ultimately informing breeding efforts for salinity tolerance in tomato and other crops.
Methods
Illumina whole-genome sequencing of 265 Solanum pimpinellifolium accessions was newly performed in this study, using paired-end (PE 150) insert libraries sequenced on a NovaSeq instrument. SNP variant calling was then performed on these 265 newly sequenced accessions together with 226 additional accessions of wild and cultivated tomatoes that were previously resequenced and are available under the SRA project PRJNA454805.
SNP variant calling was performed following the method described in Abrouk et al. (2020) and available on GitHub (https://github.com/IBEXCluster/Wheat-SNPCaller), with a few modifications. Raw sequence reads were filtered with Trimmomatic-v0.38 (Bolger et al., 2014) using the following criteria: SLIDINGWINDOW:5:20; MINLEN:50. The filtered paired-end reads were then aligned for each sample individually against the LA2093_genome_v1.4 reference assembly (Wang et al., 2020a) using BWA-MEM (v-0.7.17) (Li and Durbin, 2010). Only reads mapping with a quality Q>20 were retained, followed by sorting and indexing using samtools (v1.8). Duplicated reads were marked and read groups were assigned using the Picard tools (http://broadinstitute.github.io/picard/). Variants were identified with GATK (v4.1.8.0) (McKenna et al., 2010) using the "--emitRefConfidence" option of the HaplotypeCaller algorithm to call SNPs and InDels for each accession (Van der Auwera et al., 2013). Individual g.vcf files for each sample were then compressed and indexed with tabix (v-0.2.6) (Li, 2011) and combined into chromosome-chunk g.vcf files using the GenomicsDBImport function of GATK. Joint genotyping was then performed for each chromosome chunk using the GenotypeGVCFs function of GATK. To obtain high-confidence variants, we excluded SNPs with the VariantFiltration function of GATK using the following criteria: QD < 2.0; FS > 60.0; MQ < 40.0; MQRankSum < -12.5; ReadPosRankSum < -8.0; and SOR > 3.0. Full-chromosome VCF files and subsequently the full-genome VCF were obtained using the gatherVCF algorithm of GATK (Danecek et al., 2011). In total, 37,687,189 SNPs were called from 491 individuals.
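The hard-filter criteria listed above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (that step was run with GATK VariantFiltration); it only restates the thresholds as a predicate over a site's INFO annotations. The function names are hypothetical, and two behaviours are assumptions of the sketch: a site is excluded when it fails any single test, and an annotation that is absent from a site is not tested.

```python
# Thresholds copied from the Methods text. Each entry maps an INFO annotation
# to a test that returns True when the site FAILS that criterion.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
    "SOR": lambda v: v > 3.0,
}


def failed_filters(info):
    """Return the names of the criteria that a site's annotations fail.

    `info` is a dict of annotation name -> numeric value.
    Assumption: annotations missing from `info` are skipped, not failed.
    """
    return [name for name, fails in HARD_FILTERS.items()
            if name in info and fails(info[name])]


def passes_hard_filters(info):
    """Assumption: a site is kept only if it fails none of the criteria."""
    return not failed_filters(info)
```

For example, a site with QD = 1.5 and otherwise good annotations would be reported as failing only the QD criterion, while a site with no MQRankSum annotation would simply not be tested on it.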
To obtain high-quality SNP data, we applied a SNP clustering filter to allow no more than three SNPs in a 10-bp window using VariantFiltration from GATK v4.1.8.0 (arguments: --cluster-size 3 --cluster-window-size 10). Additional filters were applied with VCFtools (v 0.1.17) to obtain high-quality SNPs: SNPs with low or high average depth were removed (retaining 6 ≤ DP ≤ 30), only bi-allelic sites were kept, sites with ≤10% missing data were accepted, SNPs located on unanchored chromosomes were removed, and accessions with > 10% missing data were removed. Finally, 20,325,817 SNPs and 482 individuals were kept.
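The logic of these filters can be sketched in plain Python. This is a simplified illustration, not the GATK or VCFtools implementation, and all function and parameter names are hypothetical. It assumes that the depth criterion means keeping sites with a mean depth between 6 and 30 inclusive, and that a cluster is any run of `cluster_size` SNPs that fits inside a window of `window` bp; the exact boundary handling in GATK may differ.

```python
def clustered_positions(positions, cluster_size=3, window=10):
    """Return the SNP positions (on one chromosome) that belong to a cluster.

    Assumption: a cluster is any run of `cluster_size` consecutive SNPs whose
    first and last positions are less than `window` bp apart, so that the run
    fits inside a window of `window` bp.
    """
    pos = sorted(positions)
    flagged = set()
    for i in range(len(pos) - cluster_size + 1):
        j = i + cluster_size - 1
        if pos[j] - pos[i] < window:
            flagged.update(pos[i:j + 1])
    return flagged


def keep_site(mean_depth, n_alleles, missing_fraction, chrom, anchored_chroms):
    """Apply the site-level criteria from the text to one SNP.

    Assumption: the depth filter keeps 6 <= mean depth <= 30.
    `anchored_chroms` is the set of chromosome names considered anchored.
    """
    return (
        6 <= mean_depth <= 30
        and n_alleles == 2               # bi-allelic sites only
        and missing_fraction <= 0.10     # at most 10% missing genotypes
        and chrom in anchored_chroms     # drop unanchored sequences
    )


def keep_accession(missing_fraction):
    """Accessions with more than 10% missing data are removed."""
    return missing_fraction <= 0.10
```

Under these assumptions, SNPs at positions 100, 104, and 109 would be flagged as a cluster, whereas SNPs at 100, 105, and 110 would not, because the third SNP falls outside a 10-bp window starting at the first.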
Usage notes
The data files can be opened with GATK or VCFtools.