A novel method for using RNA-seq data to identify imprinted genes in social Hymenoptera with multiply mated queens

Cite this dataset

Howe, Jack et al. (2020). A novel method for using RNA-seq data to identify imprinted genes in social Hymenoptera with multiply mated queens [Dataset]. Dryad.


Genomic imprinting results in parent-of-origin dependent gene expression biased towards either the maternally- or paternally-derived allele at the imprinted locus. The kinship theory of genomic imprinting argues that this unusual expression pattern is a manifestation of intra-genomic conflict between the maternally- and paternally-derived halves of the genome that arises because they are not equally related to the genomes of social partners. The theory thus predicts that imprinting may evolve wherever there are close interactions among asymmetrically related kin. The social Hymenoptera with permanent caste differentiation are suitable candidates for testing the kinship theory because haplodiploid sex determination creates strong relatedness asymmetries and nursing workers interact closely with kin. However, progress in the search for imprinted genes in the social Hymenoptera has been slow, in part because tests for imprinting rely on reciprocal crosses that are impossible in most species. Here, we develop a method to systematically search for imprinting in haplodiploid social insects without crosses, using instead samples of pooled individuals collected from natural colonies. We tested this protocol using data available for the leaf-cutting ant Acromyrmex echinatior, providing the first genome-wide search for imprinting in any ant. While we identified several genes as potentially imprinted, none of the four genes tested could be verified as imprinted using digital droplet PCR, highlighting the need for higher quality genomic assemblies that accurately map duplicated genes.


Detailed information regarding data collection can be found in the manuscript, and a detailed description of each data file is included in README.txt; both are summarised briefly here:

ASE_data_frame.csv: gives the output of the bioinformatics pipeline run by Qiye Li and Zongji Wang. In short, they used the data published in Li et al. (2014) and aligned reads using a Burrows-Wheeler aligner, followed by BLAT. SNPs were then identified using RES-Scanner in both DNA and RNA, and the number of reads supporting each SNP allele was recorded. Fisher's exact tests were conducted for each SNP to test for differences in the ratio of alleles between DNA and RNA at the same locus. This file records those SNPs that showed significant differences between DNA and RNA, and gives the number of reads supporting each allele for each sample, as well as information for each gene. SNPs that could not be placed within an annotated gene were not included. The raw data can be downloaded with the original article at the NCBI GEO accession GSE51576. For questions regarding the bioinformatics steps preceding this table, contact QL or ZW.
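The per-SNP test described above can be illustrated with a minimal sketch. It is not the authors' pipeline (which used RES-Scanner, with downstream analysis in R); it is a plain-Python two-sided Fisher's exact test on a 2x2 table of reads supporting each allele in DNA versus RNA. The function name and the read counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: DNA reads and RNA reads. Columns: allele 1 and allele 2.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: balanced alleles in DNA, skewed alleles in RNA
p_value = fisher_exact_two_sided(50, 50, 90, 10)
print(p_value)
```

A small p-value here indicates that the allele ratio in RNA departs from the ratio in DNA at the same SNP, which is the criterion for inclusion in ASE_data_frame.csv.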

Compiled_ddPCR_results.csv: gives the data output from the ddPCR experiments, including the number of droplets that were positive for both alleles, positive for only one allele, and negative. These counts were used to calculate the relative proportion of the focal allele; the Poisson-distributed confidence intervals around this estimate are also given (as templates follow a Poisson distribution among droplets).
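As a rough illustration of how droplet counts translate into a proportion of the focal allele, the sketch below applies the standard Poisson correction for ddPCR: if a fraction p of droplets is positive for a target, the mean number of template copies per droplet is -ln(1 - p). This is an assumption about the general approach, not a reproduction of the authors' R script, and it does not compute the confidence intervals. All function names and counts are hypothetical.

```python
from math import log


def copies_per_droplet(positive, total):
    """Mean template copies per droplet under a Poisson model."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -log(1 - positive / total)


def focal_allele_fraction(both, focal_only, other_only, negative):
    """Proportion of the focal allele from the four droplet classes."""
    total = both + focal_only + other_only + negative
    # A droplet counts as positive for an allele whether or not it also
    # contains the other allele
    lam_focal = copies_per_droplet(both + focal_only, total)
    lam_other = copies_per_droplet(both + other_only, total)
    if lam_focal + lam_other == 0:
        raise ValueError("no positive droplets")
    return lam_focal / (lam_focal + lam_other)


# Hypothetical counts with equal numbers of droplets for each allele
print(focal_allele_fraction(100, 900, 900, 8100))
```

Under this model, a proportion near 0.5 is what unbiased biallelic expression would produce, whereas an imprinted gene would show a proportion skewed towards one parental allele.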

dilution_data.csv: also gives ddPCR data. These data show the output from the tests of the relative concentration of different genes following serial dilution. The Sample column gives the relative dilution levels (1 to 0.016).

Usage notes

See README.txt for a description of the data, files and script uploaded.

In short, there are three data files uploaded in input_data: one for the allele-specific expression (ASE_data_frame.csv), one for the ddPCR allele-specific PCR (compiled_ddPCR_results.csv), and one for the copy number dilution assay (dilution_data.csv). There is one RMarkdown script (Howe_et_al_RScript.Rmd), which conducts all analyses and relies only on basic R packages, which must be installed beforehand.

Some intermediate data values are missing from the ddPCR file (i.e. the absolute number of droplets for each channel) due to a recording error. The fraction and confidence intervals are nevertheless reported.


European Research Council, Award: 323085