Data from: Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression
Data files
Aug 06, 2024 version files 12.21 MB
all_mbased_and_qleelic_ASE_results.csv (5.15 MB)
CRM_predictions_all_tissues.tsv (299.66 KB)
cross_Kisumu_gene_diff.csv (1.93 MB)
cross_Kisumu_isoform_comparison.csv (2.53 MB)
NagongeraMother_KisumuMother_gene_diff.csv (1.86 MB)
README.md (7.35 KB)
segregating_sites_in_predicted_CRMs.csv (426.30 KB)
Aug 22, 2024 version files 12.21 MB
all_mbased_and_qleelic_ASE_results.csv (5.15 MB)
CRM_predictions_all_tissues.tsv (299.66 KB)
cross_Kisumu_gene_diff.csv (1.93 MB)
cross_Kisumu_isoform_comparison.csv (2.53 MB)
NagongeraMother_KisumuMother_gene_diff.csv (1.86 MB)
README.md (7.70 KB)
segregating_sites_in_predicted_CRMs.csv (426.30 KB)
Abstract
Malaria control relies on insecticides targeting the mosquito vector, but this is increasingly compromised by insecticide resistance, which can be achieved by elevated expression of detoxifying enzymes that metabolize the insecticide. In diploid organisms, gene expression is regulated both in cis, by regulatory sequences on the same chromosome, and in trans, by trans-acting factors that affect both alleles equally. Differing levels of transcription can be caused by mutations in cis-regulatory modules (CRMs), but few of these have been identified in mosquitoes. We crossed bendiocarb resistant and susceptible Anopheles gambiae strains and used RNAseq to identify cis-regulated genes that might be responsible for the resistant phenotype. Cis-regulatory module sequences controlling gene expression in tissues relevant to insecticide resistance were predicted using machine learning. We found 115 genes showing allele specific expression in hybrids of insecticide susceptible and resistant strains, suggesting cis regulation is an important mechanism of gene expression regulation in Anopheles gambiae. The genes showing allele specific expression included a higher proportion of Anopheles specific genes and were on average younger than genes with balanced allelic expression.
README: Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression
https://doi.org/10.5061/dryad.3n5tb2rr1
Description of the data and file structure
supplementary_tables_140524.xlsx (Zenodo)
Dyer_et_al_2024_supplementary_tables_Aug.xlsx contains supplementary tables 1 to 16 described in "Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression".
supplementary_Figures_and_Methods.pdf (Zenodo)
supplementary_Figures_and_Methods.pdf contains the supplementary figures and methods described in "Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression".
segregating_sites_in_predicted_CRMs.csv
segregating_sites_in_predicted_CRMs.csv is a csv table of the segregating sites in predicted cis regulatory modules flanking genes that showed allele specific expression in at least four of the six crosses described in "Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression". The columns are contig (chromosome arm), flanking_gene_1, flanking_gene_2, position (of the SNP), the PEST reference allele (0 in VCF formatted genotype), alt1 allele (1 in VCF formatted genotype), alt2 allele (2 in VCF formatted genotype), alt3 allele (3 in VCF formatted genotype), and then 16 VCF formatted genotype columns for Kisumu colony individuals and 14 VCF formatted genotype columns for Nagongera colony individuals. Missing data is indicated by .|.
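As an illustration of how the VCF formatted genotype columns might be read, the following is a minimal Python sketch rather than code from the study. It assumes each genotype is a pair of allele indices separated by | (it also accepts /), with .|. marking missing data as described above. The function names and the example genotypes are hypothetical.

```python
from collections import Counter


def parse_genotype(gt):
    """Return a tuple of allele indices, or None for missing data ('.|.')."""
    # Accept both phased ('|') and unphased ('/') separators (an assumption).
    alleles = gt.strip().replace("/", "|").split("|")
    if any(a == "." for a in alleles):
        return None
    return tuple(int(a) for a in alleles)


def allele_counts(genotypes):
    """Count allele indices across individuals, skipping missing genotypes."""
    counts = Counter()
    for gt in genotypes:
        parsed = parse_genotype(gt)
        if parsed is not None:
            counts.update(parsed)
    return counts


# Hypothetical genotypes for one site (not taken from the data file).
kisumu = ["0|0", "0|1", ".|."]
nagongera = ["1|1", "0|1", "1|2"]
print(allele_counts(kisumu))
print(allele_counts(nagongera))
```

Comparing the per-colony counts at a site gives a quick view of whether an allele is shared or largely private to one colony.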
CRM_predictions_all_tissues.tsv
CRM_predictions_all_tissues.tsv is a tsv table of all 4122 CRM predictions described in "Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression". The file is in BED format. Column 1 is chromosome arm (contig), column 2 is CRM start site, column 3 is CRM end site, column 4 is the CRM score, column 5 is flanking gene 1, column 6 is flanking gene 2, column 7 is the training sets (comma separated), column 8 is the model(s) under which the CRM was predicted (comma separated).
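The eight column layout above could be parsed along the following lines. This is a sketch under stated assumptions, not code from the study: it assumes tab separated lines with no header row, a numeric score, and standard BED coordinates (0-based start, end not included), none of which this README confirms. The CRM class, the function names and the example line are hypothetical.

```python
from typing import List, NamedTuple


class CRM(NamedTuple):
    contig: str
    start: int
    end: int
    score: float
    flanking_gene_1: str
    flanking_gene_2: str
    training_sets: List[str]
    models: List[str]


def parse_crm_line(line):
    """Parse one tab separated line with the eight columns listed above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, found {len(fields)}")
    contig, start, end, score, gene1, gene2, sets, models = fields
    return CRM(
        contig, int(start), int(end), float(score), gene1, gene2,
        [s.strip() for s in sets.split(",")],
        [m.strip() for m in models.split(",")],
    )


def crm_length(crm):
    """Length in bp, assuming BED style half-open coordinates."""
    return crm.end - crm.start


# Hypothetical line for illustration only (not taken from the data file).
example = "2L\t1000\t1500\t0.9\tgeneA\tgeneB\tsetA,setB\tmodelX"
crm = parse_crm_line(example)
print(crm.contig, crm_length(crm), crm.training_sets, crm.models)
```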
all_mbased_and_qleelic_ASE_results.csv
all_mbased_and_qleelic_ASE_results.csv is a csv table of all the allele specific expression analyses described in "Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression". This file was produced by processing the counts tables of reads matching each SNP. The raw data and counts tables are available at Gene Expression Omnibus accession GSE241768. For all crosses, data was analysed using the R package MBASED, and for crosses B5, K2, K4 and K6 the same data was additionally analysed using Qllelic. Columns 1-4 are ID (gene ID), TranscriptID, GeneName and GeneDescription. NA for GeneName and GeneDescription indicates this was not available due to lack of annotation of the corresponding gene. This is followed by 5 columns for each cross analysed with MBASED: major allele frequency, raw P value of ASE, raw P value of heterogeneity of ASE, FDR corrected P value of ASE and a Boolean (true/false) indicating whether that gene showed significant ASE for that cross. This is then followed by 7 columns for crosses B5, K2, K4 and K6 analysed with Qllelic: sumCOV is the aggregated total counts at that gene, matCOV is the aggregated maternal counts at that gene, AI is the degree of allelic imbalance (ASE) at that gene, BT_pval is the raw P value of ASE at that gene, BT_pval_CC is the QCC corrected P value of ASE at that gene, BT is a Boolean (true/false) of whether there is significant ASE at that gene and BT_CC is a Boolean (true/false) of whether there is significant ASE at that gene once QCC correction is applied. NA in any of the numerical columns indicates not applicable, as the number could not be calculated.
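The per-cross Boolean columns can be combined to ask whether a gene shows ASE in at least four of the six crosses, the threshold used elsewhere in this README. The sketch below is illustrative only and is not the authors' code. The exact header names of the per-cross columns are not listed here, so the function takes the already collected values for one gene, and all names are hypothetical.

```python
def to_bool(value):
    """Map 'true'/'false' (any case) to bool; 'NA' or anything else to None."""
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return None


def n_significant(flags):
    """Number of crosses in which the gene is flagged as showing ASE."""
    return sum(1 for flag in flags if to_bool(flag) is True)


def consistent_ase(flags, min_crosses=4):
    """True if the gene is flagged in at least min_crosses crosses."""
    return n_significant(flags) >= min_crosses


# Hypothetical flags for one gene across six crosses (not real data).
flags = ["TRUE", "true", "FALSE", "NA", "TRUE", "TRUE"]
print(n_significant(flags), consistent_ase(flags))
```

Treating NA as "not significant" is a design choice in this sketch. A stricter analysis might instead require a minimum number of crosses with non-missing calls.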
NagongeraMother_KisumuMother_gene_diff.csv
NagongeraMother_KisumuMother_gene_diff.csv is a csv table of the comparison of expression at the gene level between reciprocal crosses where the mother was from Nagongera (crosses B1, B3 and B5) versus the crosses where the mother was from Kisumu (crosses K2, K4 and K6). This is the table output by rna-seq-pop (which uses Kallisto to pseudoalign reads and DESeq2 to analyse differential expression at the gene level). Columns are GeneID, baseMean (the mean normalised count over the samples), log2FoldChange between Nagongera mother versus Kisumu mother samples, lfcSE (standard error estimate of the log fold change), stat (value of the test statistic for the gene), pvalue (raw P value for the test for the gene), padj (P value adjusted for multiple testing using FDR correction by DESeq2), FC (fold change), absolute_diff (difference in mean normalised count between the samples), GeneName and GeneDescription. NA indicates gene name or description was not available due to lack of annotation.
cross_Kisumu_gene_diff.csv
cross_Kisumu_gene_diff.csv is a csv table of the comparison of expression at the gene level between F1 progeny of Nagongera x Kisumu and the Kisumu strain. This is the table output by rna-seq-pop (which uses Kallisto to pseudoalign reads and DESeq2 to analyse differential expression at the gene level). Columns are GeneID, baseMean (the mean normalised count over the samples), log2FoldChange between cross versus Kisumu samples, lfcSE (standard error estimate of the log fold change), stat (value of the test statistic for the gene), pvalue (raw P value for the test for the gene), padj (P value adjusted for multiple testing using FDR correction by DESeq2), FC (fold change), absolute_diff (difference in mean normalised count between the samples), GeneName and GeneDescription. NA indicates gene name or description was not available due to lack of annotation.
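Both gene level tables share the same columns, so one reader can serve for either. The following is a minimal sketch, assuming the header row uses the column names listed above (GeneID, log2FoldChange, padj) and that missing adjusted P values are written as NA. The 0.05 cutoff, the function name and the sample rows are illustrative assumptions rather than values from the study.

```python
import csv
import io


def significant_genes(handle, alpha=0.05):
    """Return (GeneID, log2FoldChange) for rows with padj below alpha."""
    hits = []
    for row in csv.DictReader(handle):
        padj = row["padj"].strip()
        # Skip rows where no adjusted P value is available.
        if padj in ("", "NA"):
            continue
        if float(padj) < alpha:
            hits.append((row["GeneID"], float(row["log2FoldChange"])))
    return hits


# Hypothetical rows for illustration only (not taken from the data files).
sample = io.StringIO(
    "GeneID,log2FoldChange,padj\n"
    "geneA,2.0,0.001\n"
    "geneB,-1.0,0.20\n"
    "geneC,0.5,NA\n"
)
for gene_id, lfc in significant_genes(sample):
    # A log2 fold change converts to a linear fold change as 2 ** lfc.
    print(gene_id, lfc, 2 ** lfc)
```

With the real files, the same function could be called on an opened file, for example `with open("cross_Kisumu_gene_diff.csv", newline="") as fh: significant_genes(fh)`.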
cross_Kisumu_isoform_comparison.csv
cross_Kisumu_isoform_comparison.csv is a csv table of the comparison of expression at the isoform level between F1 progeny of Nagongera x Kisumu and the Kisumu strain. This is the table output by rna-seq-pop (which uses Kallisto to pseudoalign reads and sleuth to analyse differential expression at the transcript level). Columns are TranscriptID, pval (Wald test FDR adjusted p value), qval (P value corrected for multiple testing), b (beta value which is log fold change in isoform expression between the cross progeny and Kisumu), seb (standard error of beta value), mean_obs (log2 mean expression of the transcript across all samples), varobs (biological variance of expression), tech_var (technical variance of expression derived from bootstrapping), sigma_sq (estimate of variance without the technical variance), smooth_sigma_sq (smooth regression fit for shrinkage estimation), final_sigma_sq (the max of sigma_sq and smooth_sigma_sq used for covariance estimation of beta value), FC (fold change), GeneName and GeneDescription. NA indicates gene name or description was not available due to lack of annotation.
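The stated relationship between the three variance columns can be used as a quick sanity check after loading the table. This is a small sketch, not part of the published pipeline. The function name, the tolerance and the example values are assumptions for illustration.

```python
import math


def final_sigma_sq_is_consistent(sigma_sq, smooth_sigma_sq, final_sigma_sq,
                                 rel_tol=1e-6):
    """Check that final_sigma_sq equals max(sigma_sq, smooth_sigma_sq).

    A relative tolerance allows for rounding when values are written to csv.
    """
    expected = max(sigma_sq, smooth_sigma_sq)
    return math.isclose(final_sigma_sq, expected, rel_tol=rel_tol)


# Hypothetical values for two transcripts (not taken from the data file).
print(final_sigma_sq_is_consistent(0.2, 0.5, 0.5))
print(final_sigma_sq_is_consistent(0.2, 0.5, 0.2))
```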
Code/Software
The full methods, code and software versions used to produce these files are described in "Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression" (DOI:10.1098/rspb.2024.1142).
Version changes
22nd August 2024: Table S6 added to supplementary_tables_140524.xlsx. Table S6 lists the 115 genes that showed allele specific expression in the progeny of at least four of the six crosses between the Nagongera and Kisumu strains described in the study. The previous tables S6 to S14 were relabelled as S7 to S15.