Annotation and evolutionary analysis of chemosensory gene sequence data in the Colorado potato beetle, Leptinotarsa decemlineata

Cite this dataset

Schoville, Sean et al. (2023). Annotation and evolutionary analysis of chemosensory gene sequence data in the Colorado potato beetle, Leptinotarsa decemlineata [Dataset]. Dryad. https://doi.org/10.5061/dryad.8sf7m0cmk

Abstract

Plants and plant-feeding insects comprise the majority of global species diversity, and their coevolutionary dynamics provide an important window into the mechanisms that mediate niche evolution. In particular, there is considerable interest in understanding the nature of genetic changes that allow host-plant shifts to occur and to determine whether functional genomic diversity varies predictably in relation to host-plant breadth. Insect chemosensory proteins play a central role in mediating insect-plant interactions, as they directly influence plant detection and sensory stimuli during feeding. This large group of gene families is known to evolve rapidly, yet it remains unclear how these genes evolve in response to host-shifts and host specialization. Here we investigate whether selection at chemosensory genes is linked to host-plant expansion in the Colorado potato beetle (CPB), Leptinotarsa decemlineata (Coleoptera: Chrysomelidae), and whether rates of selection vary among ten closely related Leptinotarsa species. To develop functional hypotheses of chemosensory genes involved in the detection of potato host-plants, we combine gene expression analysis of the antennae and maxillary-labial palps using RNA sequencing with genomic evidence of natural selection. We show that expression of chemosensory genes differs among pest populations of Leptinotarsa decemlineata and numerous genes are under positive selection. We also find that rates of positive selection on olfactory receptors are higher in host-plant generalists, whereas rates are higher for gustatory receptors and olfactory binding proteins in host-plant specialists.

README

This data directory contains files supporting the project "Chemosensory genes in Leptinotarsa decemlineata", by Zachary Cohen, Michael S. Crossley, Robert F. Mitchell, Patamarerk Engsontia, Yolanda H. Chen, and Sean D. Schoville. Please contact Sean Schoville with any questions: sean.schoville@wisc.edu

Description of data and file structure

The files below contain genetic sequence data or derived data that have no units.

File: CPB_chemosensory_sequences.fasta

This file contains the nucleotide coding sequences of Colorado potato beetle chemosensory genes: each olfactory receptor (OR), ionotropic receptor (IR), odorant binding protein (OBP), and gustatory receptor (GR). These sequences were derived from a combination of genomic DNA and RNAseq data, with manual editing to annotate start/stop positions and splice sites. Sequence identifiers have a letter for a gene with alternatively spliced transcripts and an underscore with a number if part of a gene model. Annotation names are designated with the AnnotName field in the fasta header name.
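As a minimal sketch of how the annotation names might be pulled out of the fasta headers: the README only states that an AnnotName field is present in the header, so the exact syntax assumed here ("AnnotName=value" or "AnnotName:value") is a guess, and the function names and the example records are hypothetical. Check a few real headers before relying on it.

```python
import re
from typing import Dict, Iterable, Iterator, Optional, Tuple

# ASSUMPTION: the field is written as "AnnotName=<value>" or "AnnotName:<value>".
# The README names the field but does not document its exact syntax.
ANNOT_RE = re.compile(r"AnnotName[=:]\s*([^\s;|]+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from fasta-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def annot_name(header: str) -> Optional[str]:
    """Return the AnnotName value, or None if the assumed pattern is absent."""
    match = ANNOT_RE.search(header)
    return match.group(1) if match else None


def index_by_annotation(lines: Iterable[str]) -> Dict[str, str]:
    """Map annotation name (or the full header as a fallback) to sequence."""
    index = {}
    for header, seq in read_fasta(lines):
        index[annot_name(header) or header] = seq
    return index
```

With the real file this could be called as `index_by_annotation(open("CPB_chemosensory_sequences.fasta"))`; records whose header does not match the assumed pattern are keyed by their full header instead.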

Files (1 ct): Annotation_tables.xls

This Excel file contains information on the re-annotation of chemosensory genes (ORs and GRs) undertaken in this paper. Building upon prior gene models (Liu et al. 2015, Schoville et al. 2018, Mitchell et al. 2019), chemosensory genes were manually edited using the tissue-specific RNA sequence data. We focused on identifying new genes and resolving known problems in the annotation of OR and GR families. Earlier annotation efforts (for the IRs and OBPs) are documented in Schoville et al. 2018. A model species for agricultural pest genomics: the genome of the Colorado potato beetle, Leptinotarsa decemlineata (Coleoptera: Chrysomelidae). Scientific Reports 8(1): 1931. https://www.nature.com/articles/s41598-018-20154-1. Abbreviations in the tables are as follows: the number (No.) and name (Gene name) assigned to each gene; the newly added suffix (Suffix) or previous suffix (Original) that indicates whether the gene model has one of the following issues: CTE = C-terminal missing; JOI = exons from two scaffolds joined into one gene model; FIX = model completed manually using raw reads; PSE = pseudogene; genes with multiple suffixes use 1-letter abbreviations (FJ = FIX + JOI); de novo transcripts that match chemoreceptor genes (Transcriptome); coding sequence (mRNA) and protein sequence (Protein); genomic locations (columns 'Scaffold', 'Coordinates' [start-end position in genome] and 'Strand'); number of introns and splicing phases (INtrons|phases); number of amino acids (AA). Cells containing "not applicable" (N/A) have no suffix or note associated with them.

Files (2 ct): Annotation_tables*.csv

These csv files are identical to Annotation_tables.xls above, but in machine-readable format, with one file for each spreadsheet. Each file contains information on the re-annotation of chemosensory genes (ORs or GRs) undertaken in this paper. Building upon prior gene models (Liu et al. 2015, Schoville et al. 2018, Mitchell et al. 2019), chemosensory genes were manually edited using the tissue-specific RNA sequence data. We focused on identifying new genes and resolving known problems in the annotation of OR and GR families. Earlier annotation efforts (for the IRs and OBPs) are documented in Schoville et al. 2018. A model species for agricultural pest genomics: the genome of the Colorado potato beetle, Leptinotarsa decemlineata (Coleoptera: Chrysomelidae). Scientific Reports 8(1): 1931. https://www.nature.com/articles/s41598-018-20154-1. Abbreviations in the tables are as follows: the number (No.) and name (Gene name) assigned to each gene; the newly added suffix (Suffix) or previous suffix (Original) that indicates whether the gene model has one of the following issues: CTE = C-terminal missing; JOI = exons from two scaffolds joined into one gene model; FIX = model completed manually using raw reads; PSE = pseudogene; genes with multiple suffixes use 1-letter abbreviations (FJ = FIX + JOI); de novo transcripts that match chemoreceptor genes (Transcriptome); coding sequence (mRNA) and protein sequence (Protein); genomic locations (columns 'Scaffold', 'Coordinates' [start-end position in genome] and 'Strand'); number of introns and splicing phases (INtrons|phases); number of amino acids (AA). Cells containing "not applicable" (N/A) have no suffix or note associated with them.
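The suffix codes and the start-end coordinate convention described above can be decoded programmatically. The following is a minimal sketch, assuming the csv header row uses the column names quoted in this README ('Suffix', 'Coordinates') and that coordinates are two plain integers separated by a hyphen; the function names and the example rows are hypothetical, not taken from the actual files.

```python
import csv
import io
import re

# Suffix meanings as listed in the README.
SUFFIX_MEANINGS = {
    "CTE": "C-terminal missing",
    "JOI": "exons from two scaffolds joined into one gene model",
    "FIX": "model completed manually using raw reads",
    "PSE": "pseudogene",
}
# Combined abbreviation given in the README: FJ = FIX + JOI.
COMBINED = {"FJ": ("FIX", "JOI")}

# ASSUMPTION: coordinates look like "12345-67890" (start-end).
COORD_RE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")


def decode_suffix(suffix):
    """Translate a suffix cell into a list of human-readable issues."""
    code = (suffix or "").strip()
    if code in ("", "N/A"):
        return []
    parts = COMBINED.get(code, (code,))
    return [SUFFIX_MEANINGS.get(p, "unknown suffix: " + p) for p in parts]


def parse_coordinates(text):
    """Return (start, end) as integers, or None if the cell does not match."""
    match = COORD_RE.match(text or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def load_table(handle):
    """Read an annotation csv and add decoded 'issues' and 'span' entries."""
    rows = []
    for row in csv.DictReader(handle):
        row["issues"] = decode_suffix(row.get("Suffix"))
        row["span"] = parse_coordinates(row.get("Coordinates"))
        rows.append(row)
    return rows
```

Cells that do not fit the assumed coordinate pattern (for example, gene models joined across two scaffolds) come back as `None` rather than raising an error, so they can be inspected by hand.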

File (1 ct): RNAseq_read_counts.xls

This Excel file contains the read count data (number of reads per gene) for tissue and among-population differential expression analyses. We generated RNA sequencing datasets (NCBI: PRJNA716273; 2x125 bp HiSeq2500) to compare gene expression data in chemosensory organ tissues (antennae versus palps) of adult L. decemlineata. For further population comparison, we also obtained RNAseq reads (NCBI SRA Accessions: SRR1948057 and SRR1948059, 2 x 100 bp, HiSeq2000) generated from the antennae of adult male and female L. decemlineata from a site in Xinjiang, China by Liu et al. (2015). We additionally obtained published whole-body RNAseq reads for a non-pest population from Colorado (n=6) and a pest population from Wisconsin (n=9) to examine gene expression variation in comparisons of populations. Short read data were mapped to the CPB genome v1.1 (NCBI Project: PRJNA171749) and read counts per sample per gene were generated by combining our new chemoreceptor annotations with the L. decemlineata official gene set (OGSv1.1) from Schoville et al. (2018). Counts were obtained using the functions makeTxDbFromGFF, transcriptsBy, and summarizeOverlaps available in the R packages GenomicAlignments and GenomicFeatures (Lawrence et al., 2013). The first worksheet provides results from the tissue-specific samples, including published data from a population in China. Samples represent pooled adult individuals, separated by tissue type and sex. The second worksheet provides results from comparisons of adult populations using whole-body tissues, where each sample represents an individual beetle.

Files (4 ct): Leptinotarsa_*_msa.tgz

Each tarballed file contains a directory of files that represent the multiple sequence alignments (multi-fasta format) of chemoreceptor genes used for interspecific comparisons in the branch-site test in HyPhy. Among the ten Leptinotarsa species (NCBI PRJNA580490), gene models were isolated from de novo genomes of each Leptinotarsa species following the procedures described therein. In brief, chemosensory orthologs were first identified using a multiphase heuristic algorithm implemented by the spALN v2.3.3 program (Gotoh 2008), which efficiently maps cDNA sequences onto whole genomes (options -M4 -O6 -S0 -Q7 -LS were selected). These sequences were subsequently aligned to each other, using the CPB reference ortholog sequence, via a custom C program CATaNNN v4.0 (Weibel and Cohen 2018) and MAFFT v7.450 (Katoh and Standley 2013). We corrected for misalignment and quality using Guidance v2.02, at high stringency, with 30 bootstrap replicates per alignment (Penn et al. 2010). Each file is named by the gene name, and in the file each species name occurs at the sequence header (e.g. ">Ldecemlineata") followed by the sequence.
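The alignments can be read directly from a tarball without unpacking it to disk. This is a minimal sketch using only the Python standard library; it assumes each archive member is a plain-text multi-fasta file whose headers are species names, as described above, and the function names are hypothetical. A quick width check is included because every row of a multiple sequence alignment should have the same length.

```python
import io
import tarfile


def parse_fasta_text(text):
    """Return {header: sequence} for one multi-fasta alignment."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif line and name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def iter_alignments(tar_path):
    """Yield (member name, {species: aligned sequence}) for each file."""
    with tarfile.open(tar_path, "r:*") as tar:  # "r:*" auto-detects gzip
        for member in tar:
            if not member.isfile():
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            text = handle.read().decode("utf-8", errors="replace")
            yield member.name, parse_fasta_text(text)


def alignment_width(records):
    """Return the shared alignment length, or None if rows differ or are absent."""
    widths = {len(seq) for seq in records.values()}
    return widths.pop() if len(widths) == 1 else None
```

For example, looping over `iter_alignments(path_to_tgz)` and flagging any gene where `alignment_width` returns `None` (or a value that is not a multiple of three for codon alignments) is a cheap sanity check before running selection tests.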

Files (4 ct): CPB_.tar.gz

Each tarballed file contains a directory of files that represent the multiple sequence alignments (multi-fasta format) of chemoreceptor genes used for intraspecific comparisons in the Tajima's D and McDonald-Kreitman tests. We used whole-genome resequencing data (NCBI PRJNA580490) to compare nucleotide diversity and signatures of positive selection in olfactory genes between pest and non-pest lineages of L. decemlineata. We first used SNP genotypes identified by Pélissié et al. (2021) as input for BEAGLE 4.1 (Browning and Browning 2016) to generate phased, genome-wide haplotype sequences (two per individual) and then isolated chemosensory loci of the annotated families. Each file is named by the gene name, and the entries represent individual samples of Leptinotarsa decemlineata (e.g. ">CPBWGS_"). See Supplemental File 1 Table S1 for a key to the individual ids.
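For readers unfamiliar with Tajima's D, the statistic compares two estimates of diversity from an alignment of n haplotypes: the mean number of pairwise differences and the number of segregating sites S scaled by a1 = 1 + 1/2 + ... + 1/(n-1). The sketch below implements the standard formula (Tajima 1989) in plain Python. It is an illustration only, not the software or settings used in the paper; the function name is hypothetical, and skipping any alignment column that contains a character other than A, C, G, or T is a simplifying assumption.

```python
import math
from collections import Counter


def tajimas_d(seqs):
    """Return (S, pi, D) for a list of equal-length aligned haplotypes.

    D is None when there are no segregating sites.
    """
    n = len(seqs)
    if n < 4:
        # The variance term of D is zero for n = 2 and n = 3.
        raise ValueError("need at least 4 sequences")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    pairs = n * (n - 1) / 2
    seg_sites, pi = 0, 0.0
    for column in zip(*seqs):
        counts = Counter(base.upper() for base in column)
        # ASSUMPTION: drop columns with gaps or ambiguous bases entirely.
        if any(base not in "ACGT" for base in counts):
            continue
        if len(counts) > 1:
            seg_sites += 1
            identical = sum(c * (c - 1) / 2 for c in counts.values())
            pi += (pairs - identical) / pairs  # mean pairwise differences

    if seg_sites == 0:
        return 0, 0.0, None

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (pi - seg_sites / a1) / math.sqrt(variance)
    return seg_sites, pi, d
```

Negative values indicate an excess of rare variants relative to neutral expectations (consistent with a selective sweep or population expansion), while positive values indicate an excess of intermediate-frequency variants.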

File (1 ct): Trinity_denovo.zip

This fasta file contains the de novo transcriptome nucleotide sequences generated for chemosensory gene annotation. RNA sequencing reads (NCBI SRA Accessions: SRR1948057 and SRR1948059, 2 x 100 bp, HiSeq2000) from Wisconsin antennal and palp tissues were used to generate this assembly using the software Trinity v2.3.2 (Haas et al., 2013). We first trimmed adaptor sequences and low-quality bases from Illumina reads using Trimmomatic v0.36 (Bolger et al., 2014), and then used default settings to assemble a transcriptome in Trinity.

Methods

Chemosensory annotations: Building upon prior gene models (Liu et al. 2015, Schoville et al. 2018, Mitchell et al. 2019), chemosensory genes were manually edited using the tissue-specific RNA sequence data. We focused on identifying new genes and resolving known problems in the annotation of OR and GR families.

Gene expression analysis: We generated RNA sequencing datasets (NCBI: PRJNA716273; 2x125 bp HiSeq2500) to compare gene expression data in chemosensory organ tissues (antennae versus palps) of adult L. decemlineata. For further population comparison, we also obtained RNAseq reads (NCBI SRA Accessions: SRR1948057 and SRR1948059, 2 x 100 bp, HiSeq2000) generated from the antennae of adult male and female L. decemlineata from a site in Xinjiang, China by Liu et al. (2015). We additionally obtained published whole-body RNAseq reads for a non-pest population from Colorado (n=6) and a pest population from Wisconsin (n=9) to examine gene expression variation in comparisons of populations. 

Population genomic analysis: We used whole-genome resequencing data (NCBI PRJNA580490) to compare nucleotide diversity and signatures of positive selection in olfactory genes between pest and non-pest lineages of L. decemlineata. We first used SNP genotypes identified by Pélissié et al. (2021) as input for BEAGLE 4.1 (Browning and Browning 2016) to generate phased, genome-wide haplotype sequences (two per individual) and then isolated chemosensory loci of the annotated families.

Phylogenomic analysis: Among the ten Leptinotarsa species (NCBI PRJNA580490), gene models were isolated from de novo genomes of each Leptinotarsa species following the procedures described therein. In brief, chemosensory orthologs were first identified using a multiphase heuristic algorithm implemented by the spALN v2.3.3 program (Gotoh 2008), which efficiently maps cDNA sequences onto whole genomes (options -M4 -O6 -S0 -Q7 -LS were selected). These sequences were subsequently aligned to each other, using the CPB reference ortholog sequence, via a custom C program CATaNNN v4.0 (Weibel and Cohen 2018) and MAFFT v7.450 (Katoh and Standley 2013). We corrected for misalignment and quality using Guidance v2.02, at high stringency, with 30 bootstrap replicates per alignment (Penn et al. 2010).

Usage notes

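The fasta and csv files can be opened using text editors once the archives (.tgz, .tar.gz, .zip) have been decompressed. The .xls files require a spreadsheet program.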

Funding

United States Department of Agriculture, Award: 2015-67030-23495, NIFA AFRI

United States Department of Agriculture, Award: WIS02004, Hatch Formula Funds

Wisconsin Potato and Vegetable Growers Association