DNA methylation differences between stick insect ecotypes
Data files
Sep 29, 2023 version files 619.70 MB
- 2018Methylation_covariates_batch_first.txt (929 B)
- 2018Methylation_first_run_host_predictor.txt (48 B)
- 2018Methylation_first_run_relatedness.cXX.txt (8.79 KB)
- 2018Methylation_info_spreadsheet_standard.csv (7.60 KB)
- Bayesian_regression_cutoff_beta_geno.txt (727 B)
- first.batch.compiled.12samples.no.intergenic.noCT.GA.low5.high60.annot.txt (29.07 MB)
- first.batch.compiled.noCT.GA.low5.high60.binomial.annot.12samples.txt (105.78 MB)
- first.low5.high60.noCT.GA.methylation.status.mRNA.GO.txt (486.09 KB)
- jtdistance.matrix.first.txt (5.37 KB)
- macau.first.output.CpG.tiles.low10.var.adjusted.txt (18.22 MB)
- MethylRaw_coverage_first_methyl_tiles_low10.txt (9.04 MB)
- MethylRaw_CpG_first_run_low2_high60_percentage.no.CT.GA.txt (31.73 MB)
- MethylRaw_methylC_first_methyl_tiles_low10.txt (8.67 MB)
- README.md (21.82 KB)
- tiles.macau.low10.genes.id.order.txt (2.65 MB)
- variants.raw.CT.GA.bial.noindel.qs20.cov0.whole.genomes.and.radseq.loci (414 MB)
Abstract
Epigenetic mechanisms, such as DNA methylation, can influence gene regulation and affect phenotypic variation, raising the possibility that they contribute to ecological adaptation. Addressing this issue requires high-resolution sequencing studies of natural populations to pinpoint epigenetic regions of potential ecological and evolutionary significance. However, such studies are still relatively uncommon, especially in insects, and are mainly restricted to a few model organisms. Here, we characterize patterns of DNA methylation for natural populations of Timema cristinae adapted to two host plant species (i.e., ecotypes). By integrating results from sequencing of whole transcriptomes, genomes, and methylomes, we investigate whether environmental, host, and genetic differences of these stick insects are associated with methylation levels of cytosine nucleotides in the CpG context. We report an overall genome-wide methylation level for T. cristinae of ~14%, with methylation enriched in gene bodies and depleted in repetitive elements. Genome-wide DNA methylation variation was strongly positively correlated with genetic distance (relatedness) but also exhibited significant host-plant effects. Using methylome-environment association analysis, we pinpointed specific genomic regions that are differentially methylated between ecotypes, with these regions being enriched for genes with functions in membrane processes. The observed association of methylation variation with genetic relatedness and with the ecologically important variable of host plant suggests a potential role for epigenetic modification in T. cristinae adaptation. To substantiate such adaptive significance, future studies could test whether methylation has a heritable component and the extent to which it responds to experimental manipulation in field and laboratory studies.
1. Title of Dataset: Data from: DNA methylation differences between stick insect ecotypes.
2. Author Information
Corresponding Investigator 1
Name: Dr. Clarissa F. de Carvalho
Institution: Universidade Federal de São Paulo (UNIFESP), São Paulo, SP 04021-001, Brazil
Email: clarissa.ferreira@unifesp.br
Corresponding Investigator 2
Name: Dr. Patrik Nosil
Institution: CEFE, Univ Montpellier, CNRS, EPHE, IRD, Univ Paul Valéry Montpellier 3, Montpellier 34293, France
Email: patrik.nosil@cefe.cnrs.fr
Co-investigator 1
Name: Dr. Romain Villoutreix
Institution: CEFE, Univ Montpellier, CNRS, EPHE, IRD, Univ Paul Valéry Montpellier 3, Montpellier 34293, France
Co-investigator 2
Name: Pr. Zachariah Gompert
Institution: Department of Biology, Utah State University, Logan, UT 84321, USA
Co-investigator 3
Name: Pr. Jon Slate
Institution: School of Biosciences, University of Sheffield; Sheffield, S10 2TN, UK
Co-investigator 4
Name: Pr. Jeffrey L. Feder
Institution: Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA
Co-investigator 5
Name: Rüdiger Riesch
Institution: Department of Biological Sciences, Centre for Ecology, Evolution and Behaviour, Royal Holloway University of London; Egham, TW20 0EX, UK
3. Date of data collection:
2017
4. Geographic location of data collection:
California, USA
5. Funding sources that supported the collection of the data:
This work was funded by supporting grants from ERC NatHisGen R/129639, Royal Society of London RG140369 (C.F.d.C, P.N.), the University of Sheffield, the Human Frontier Science Program (R.R.), and FAPESP 2020/07556-8 (C.F.d.C).
6. Recommended citation for this dataset:
de Carvalho et al. (2023), Data from: DNA methylation differences between stick insect ecotypes, Dryad, Dataset
DATA & FILE OVERVIEW
1. Description of dataset
These data were generated to characterize patterns of DNA methylation for natural populations of Timema cristinae adapted to two host plant species (i.e., ecotypes). By integrating results from sequencing of whole transcriptomes, genomes, and methylomes, we investigate whether environmental, host, and genetic differences of these stick insects are associated with methylation levels of cytosine nucleotides in the CpG context.
2. File list:
File 1 Name: 2018Methylation_info_spreadsheet_standard.csv
File 1 Description: Table with information regarding samples, locations, climatic information, and bisulfite conversion
File 2 Name: variants.raw.CT.GA.bial.noindel.qs20.cov0.whole.genomes.and.radseq.loci
File 2 Description: list of C/T and G/A polymorphisms from new and previously published T. cristinae data
File 3 Name: first.batch.compiled.noCT.GA.low5.high60.binomial.annot.12samples.txt
File 3 Description: Compilation of the annotation tables. Only loci covered by a minimum of 5 reads and a maximum of 60 reads were retained. We also selected loci present in at least 12 samples. This table shows the methylation status at each locus (based on the binomial distribution)
File 4 Name: first.batch.compiled.12samples.no.intergenic.noCT.GA.low5.high60.annot.txt
File 4 Description: Same as File 3, but here the mean methylation levels were calculated. Intergenic regions were removed to simplify the analyses
File 5 Name: first.low5.high60.noCT.GA.methylation.status.mRNA.GO.txt
File 5 Description: Table with gene ids, formatted as input for the script GO.enrichment.genes.R
File 6 Name: MethylRaw_CpG_first_run_low2_high60_percentage.no.CT.GA.txt
File 6 Description: Table compiling methylation cytosine reports among all 24 samples. Loci covered by fewer than 2 reads or by more than 60 reads (above the 99th quantile) were removed. The methylation levels were calculated as the number of methylated cytosines at a given locus over the sum of all reads covering it.
File 7 Name: jtdistance.matrix.first.txt
File 7 Description: Genetic distances between the 24 individuals based on the RAD-seq data
File 8 Name: MethylRaw_methylC_first_methyl_tiles_low10.txt
File 8 Description: methylated cytosine counts for MACAU
File 9 Name: MethylRaw_coverage_first_methyl_tiles_low10.txt
File 9 Description: coverage input for MACAU
File 10 Name: 2018Methylation_covariates_batch_first.txt
File 10 Description: Covariates used as input for MACAU, namely: PC1 and PC2 from climate, bisulfite conversion (calculated based on the lambda phage), and sequencing batch
File 11 Name: 2018Methylation_first_run_relatedness.cXX.txt
File 11 Description: Kinship matrix calculated based on RAD-seq. Performed using gemma with default parameters (Zhou et al. 2013).
File 12 Name: Bayesian_regression_cutoff_beta_geno.txt
File 12 Description: compiled coefficient results of the Bayesian regression across the different cut-offs designating DMRs
File 13 Name: macau.first.output.CpG.tiles.low10.var.adjusted.txt
File 13 Description: MACAU output. We disregarded the unassembled scaffolds (i.e. lgNA)
File 14 Name: 2018Methylation_first_run_host_predictor.txt
File 14 Description: host plant information used in MACAU
File 15 Name: tiles.macau.low10.genes.id.order.txt
File 15 Description: MACAU table with gene.id formatted to estimate GO enrichment. The annotation was performed similarly to the scripts above
METHODOLOGICAL INFORMATION
Timema cristinae individuals from the selected populations were collected on the same date (25th April 2017) in the Californian spring using sweep nets, and kept in plastic containers at room temperature. Individuals were digitally photographed the following day under standard conditions, flash frozen using liquid nitrogen, and preserved at -80 °C. All procedures were performed to ensure the methylation status was not considerably affected by variation in sampling conditions.
Half of each specimen’s body (cut longitudinally) was used to isolate its genomic DNA using DNeasy Blood and Tissue Kits (Qiagen). We included non-methylated cI857 Sam7 Lambda phage DNA (Promega Corporation) as a spike-in in each sample (1% of the final volume). We also submitted genomic DNA of one T. cristinae sample (individual 17_0015, ‘NO_BS.WGBS’ sample) both for BS-seq and, as a control for the BS treatment, for sequencing without sodium-bisulfite treatment. The sodium-bisulfite treatment and high-throughput sequencing were performed by Biomedicum Functional Genomics Unit (FuGU, Helsinki). The libraries were sequenced using the Illumina NextSeq 500 system, with High Output 2 x 150 bp runs. In total, three flow cells with four lanes were run.
In addition, we performed RNA-seq of 18 individuals, which were the only ones that yielded enough material for further sequencing. The RNA extractions, library preparations and sequencing were performed by Genome Quebec. Total RNA for each individual was extracted from the remaining half of the specimen’s body using the QIAcube animal tissue kit and protocol. Libraries were multiplexed and sequenced on one lane of HiSeq4000 to obtain 150 base pair paired-end reads. More detailed information can be found in the folders for each particular analysis in the Online Supplementary Materials.
DATA SPECIFIC INFORMATION FOR: 2018Methylation_info_spreadsheet_standard.csv
Main spreadsheet with information on each individual. Climatic differences were estimated using WorldClim (columns 12-30; see https://www.worldclim.org/data/bioclim.html)
- Number of variables: 46
- Number of cases/rows: 24
- Variable List:
  - id: individual unique id
  - file_id: individual id on the file
  - run: flow cell in which the individuals were sequenced
  - species: Timema species (T. cristinae)
  - location: locality in which the individual was collected (see Online Supplementary Materials)
  - host: host plant species on which the individuals were collected (A=Adenostoma and C=Ceanothus)
  - sex: sex of each individual (F=female; M=male)
  - morph: color-pattern morph of each individual
  - latitude: latitude of the location
  - longitude: longitude of the location
  - elevation: elevation at which the individuals were collected (m)
  - annual_mean_temperature: Annual Mean Temperature (BIO1 worldclim variable) (°C)
  - mean_diurnal_range: Mean Diurnal Range (mean of monthly (max temp - min temp); BIO2 worldclim variable) (°C)
  - isothermality: Isothermality (BIO2/BIO7 ×100; BIO3 worldclim variable)
  - temperature_seasonality: Temperature Seasonality (standard deviation ×100; BIO4 worldclim variable)
  - max_temp_warmest_month: Max Temperature of Warmest Month (BIO5 worldclim variable) (°C)
  - min_temp_coldest_month: Min Temperature of Coldest Month (BIO6 worldclim variable) (°C)
  - temp_anual_range: Temperature Annual Range (BIO5-BIO6; BIO7 worldclim variable) (°C)
  - mean_temp_wettest_quarter: Mean Temperature of Wettest Quarter (BIO8 worldclim variable) (°C)
  - mean_temp_driest_quarter: Mean Temperature of Driest Quarter (BIO9 worldclim variable) (°C)
  - mean_temp_warmest_quarter: Mean Temperature of Warmest Quarter (BIO10 worldclim variable) (°C)
  - mean_temp_coldest_quarter: Mean Temperature of Coldest Quarter (BIO11 worldclim variable) (°C)
  - annual_precipitation: Annual Precipitation (BIO12 worldclim variable) (mm)
  - precipitation_wettest_month: Precipitation of Wettest Month (BIO13 worldclim variable) (mm)
  - precipitation_driest_month: Precipitation of Driest Month (BIO14 worldclim variable) (mm)
  - precipitation_seasonality: Precipitation Seasonality (Coefficient of Variation; BIO15 worldclim variable)
  - precipitation_wettest_quarter: Precipitation of Wettest Quarter (BIO16 worldclim variable) (mm)
  - precipitation_driest_quarter: Precipitation of Driest Quarter (BIO17 worldclim variable) (mm)
  - precipitation_warmest_quarter: Precipitation of Warmest Quarter (BIO18 worldclim variable) (mm)
  - precipitation_coldest_quarter: Precipitation of Coldest Quarter (BIO19 worldclim variable) (mm)
  - clim_PC1: first principal component of the bioclimatic variables
  - clim_PC2: second principal component of the bioclimatic variables
  - clim_PC3: third principal component of the bioclimatic variables
  - BL: body length, estimated following Riesch et al. 2017 (cm)
  - BW: body width, estimated following Riesch et al. 2017 (cm)
  - HW: head width, estimated following Riesch et al. 2017 (cm)
  - sequencing batch: the batch of bisulfite sequencing
  - n_parsed: reads parsed after filtering
  - n_mapped: reads mapped with Bismark
  - mapping_efficiency_Tcristinae: efficiency of mapping with Bismark
  - methylated_cytosines_CpG: number of cytosines methylated in CpG context
  - unmethylated_cytosines_CpG: number of cytosines unmethylated in CpG context
  - percentage_CpG: percentage of the CpG cytosines that were methylated
  - methylated_cytosines_CpG_phage: number of methylated cytosines in the lambda phage
  - unmethylated_cytosines_CpG_phage: number of unmethylated cytosines in the lambda phage
  - percentage_CpG_phage: percentage of methylation of cytosines in CpG context in the lambda phage
- Missing data codes: none
- Abbreviations used: none
- Other relevant information: none
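The derived bioclimatic columns are defined in terms of other columns (BIO7 = BIO5 - BIO6 and BIO3 = BIO2/BIO7 × 100). The following is a minimal sketch of those two definitions, useful as a sanity check on a row of the spreadsheet. The function names are hypothetical, and small discrepancies against the stored values are expected if WorldClim rounded the variables before distribution.

```python
def temperature_annual_range(max_temp_warmest_month: float,
                             min_temp_coldest_month: float) -> float:
    """BIO7: Temperature Annual Range, defined as BIO5 - BIO6 (same unit as inputs)."""
    return max_temp_warmest_month - min_temp_coldest_month


def isothermality(mean_diurnal_range: float, temp_annual_range: float) -> float:
    """BIO3: Isothermality, defined as BIO2 / BIO7 x 100 (unitless)."""
    if temp_annual_range == 0:
        raise ValueError("annual range is zero; isothermality is undefined")
    return 100.0 * mean_diurnal_range / temp_annual_range


# Illustrative values only, not taken from the dataset.
bio7 = temperature_annual_range(30.0, 5.0)
bio3 = isothermality(12.5, bio7)
```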
DATA SPECIFIC INFORMATION FOR: variants.raw.CT.GA.bial.noindel.qs20.cov0.whole.genomes.and.radseq.loci
This is a list of C/T and G/A polymorphisms from new and previously published T. cristinae data
- Number of variables: 1
- Number of cases/rows: 15534254
- Variable List: not applicable
- Missing data codes: none
- Abbreviations used: none
- Other relevant information: none
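C/T and G/A polymorphisms are removed because an unmethylated cytosine reads as T after bisulfite treatment, so a genuine C/T SNP cannot be told apart from a conversion event. The sketch below illustrates that filtering step in Python. It is not the authors' Perl script (1.6_remove.CT.GA.polymorphisms.pl), and the `scaffold:position` key format and function names are assumptions made for illustration.

```python
def load_polymorphic_loci(lines):
    """Collect locus identifiers, one per line, into a set for fast lookup."""
    return {line.strip() for line in lines if line.strip()}


def remove_polymorphic_sites(report_rows, polymorphic_loci):
    """Keep only cytosine-report rows whose locus is not a known C/T or G/A SNP.

    Each row is a tuple starting with (scaffold, position, ...).
    The 'scaffold:position' key format is an assumption for this sketch.
    """
    kept = []
    for row in report_rows:
        scaffold, position = row[0], row[1]
        if f"{scaffold}:{position}" not in polymorphic_loci:
            kept.append(row)
    return kept


# Illustrative input: (scaffold, position, methylated reads, unmethylated reads)
loci = load_polymorphic_loci(["scaf1:10\n", "\n", "scaf2:5\n"])
rows = [("scaf1", 10, 3, 1), ("scaf1", 11, 0, 4)]
filtered = remove_polymorphic_sites(rows, loci)
```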
DATA SPECIFIC INFORMATION FOR: first.batch.compiled.noCT.GA.low5.high60.binomial.annot.12samples.txt
This file is a compilation of the annotation tables across the 24 individuals. Only loci covered by a minimum of 5 reads and a maximum of 60 reads were retained. We also selected loci present in at least 12 samples. This table shows the methylation status at each locus (based on the binomial distribution)
- Number of variables: 35
- Number of cases/rows: 796,200
- Variable List:
  - site: locus of the CpG with methylation information
  - columns 2-25: binomial information about the methylation status (0=unmethylated, 1=methylated, NA=missing data) for the 24 individuals as labeled in File 1
  - cds: boolean value for whether the site is located in a CDS
  - intron: boolean value for whether the site is located in an intron
  - mRNA: boolean value for whether the site is located in a predicted mRNA (gene)
  - upstream: boolean value for whether the site is located within 1 kbp upstream of the transcription start site
  - downstream: boolean value for whether the site is located within 1 kbp downstream of the transcription end site
  - gene.id: if the site is located within a gene, the gene id is printed; else, "NA"
  - gene: the name of the gene, isolated from other information
  - number: number of individuals with data
  - n.methylated: number of individuals with methylated status
  - status: general status of the site (0=unmethylated; 1=methylated)
- Missing data codes: NA
- Abbreviations used: CDS=coding sequence
- Other relevant information: none
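The README states only that methylation status is "based on the binomial distribution". One common way to do this, sketched below, is to test whether the number of methylated reads at a site exceeds what the bisulfite non-conversion error rate alone would produce. The error rate, the significance threshold `alpha`, and the function names are assumptions for illustration; the authors' script (2.1_get.methylation.status.individual.binomial.pl) may differ, for example in how multiple testing is handled.

```python
from math import comb


def binomial_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def methylation_status(methylated_reads: int, coverage: int, error_rate: float,
                       alpha: float = 0.05, min_cov: int = 5, max_cov: int = 60):
    """Return 1 (methylated), 0 (unmethylated), or None (coverage outside 5-60).

    A site is called methylated when the observed methylated reads are unlikely
    under non-conversion errors alone. alpha is an assumed threshold.
    """
    if coverage < min_cov or coverage > max_cov:
        return None
    p_value = binomial_sf(methylated_reads, coverage, error_rate)
    return 1 if p_value < alpha else 0
```

With an assumed error rate of 1%, a site with 8 methylated reads out of 10 is called methylated, a site with 0 out of 10 is called unmethylated, and a site covered by only 4 reads is treated as missing.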
DATA SPECIFIC INFORMATION FOR: first.batch.compiled.12samples.no.intergenic.noCT.GA.low5.high60.annot.txt
Same as File 3, but here the mean methylation levels were calculated. Intergenic regions were removed to simplify the analyses
- Number of variables: 13
- Number of cases/rows: 249066
- Variable List:
  - site: locus of the CpG with methylation information
  - number: number of individuals with data
  - mean: mean methylation level across the individuals (%)
  - cds: boolean value for whether the site is located in a CDS
  - intron: boolean value for whether the site is located in an intron
  - mRNA: boolean value for whether the site is located in a predicted mRNA (gene)
  - upstream: boolean value for whether the site is located within 1 kbp upstream of the transcription start site
  - downstream: boolean value for whether the site is located within 1 kbp downstream of the transcription end site
  - gene.id: if the site is located within a gene, the gene id is printed; else, "NA"
  - gene: the name of the gene, isolated from other information
  - dist.from.gene: distance between this site and a nearby gene (bp)
  - dist.from.start: if within a gene, the distance between this site and the transcription start site (bp)
  - order: the order of the CDS or the intron (e.g. CDS1, CDS2, CDS3)
- Missing data codes: NA
- Abbreviations used: CDS=coding sequence
- Other relevant information: none
DATA SPECIFIC INFORMATION FOR: first.low5.high60.noCT.GA.methylation.status.mRNA.GO.txt
Table with gene ids, formatted as input for the script GO.enrichment.genes.R
- Number of variables: 26
- Number of cases/rows: 8,472
- Variable List:
  - gene: gene id
  - columns 2-25: individuals and whether the gene status was methylated (1) or not (0) by binomial analyses
  - number: number of individuals with methylated status at each gene
- Missing data codes: none
- Abbreviations used: none
- Other relevant information: none
DATA SPECIFIC INFORMATION FOR: MethylRaw_CpG_first_run_low2_high60_percentage.no.CT.GA.txt
Table compiling methylation cytosine reports among all 24 samples. Loci covered by fewer than 2 reads or by more than 60 reads (above the 99th quantile) were removed. The methylation levels were calculated as the number of methylated cytosines at a given locus over the sum of all reads covering it.
- Number of variables: 25
- Number of cases/rows: 296,731
- Variable List:
  - site: CpG locus
  - columns 2-25: methylation level in each of the 24 individuals
- Missing data codes: none
- Abbreviations used: none
- Other relevant information: none
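The per-locus calculation described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and expressing the level as a percentage is an assumption based on the file name.

```python
def methylation_level(methylated_reads: int, unmethylated_reads: int,
                      min_cov: int = 2, max_cov: int = 60):
    """Methylation level (%) at one CpG locus in one individual.

    Returns None when coverage is below min_cov or above max_cov,
    mirroring the low2/high60 coverage filter described for this table.
    """
    coverage = methylated_reads + unmethylated_reads
    if coverage < min_cov or coverage > max_cov:
        return None
    return 100.0 * methylated_reads / coverage
```

For example, a locus with 3 methylated and 9 unmethylated reads has a level of 25%, while a locus covered by a single read or by 70 reads is dropped.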
DATA SPECIFIC INFORMATION FOR: jtdistance.matrix.first.txt
Genetic distance matrix between the 24 individuals based on the RAD-seq data. There is no header, and individuals are given as the row names
- Number of variables: 24
- Number of cases/rows: 24
- Variable List: not applicable
- Missing data codes: none
- Abbreviations used: none
- Other relevant information: none
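A genetic distance matrix like this one can be compared with a methylation distance matrix using a Mantel test, which correlates the off-diagonal entries of two matrices and assesses significance by permuting the individuals of one matrix. The sketch below is a plain-Python illustration of that idea with a one-sided test for positive correlation. It is not the authors' R script (4.1.methylation_genetic_mantel_bayesian.R), and the function names and defaults are assumptions.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test: returns (r, p) for a positive association."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of dist_b together (relabel individuals).
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value
```

Rows and columns are permuted together so that each permutation corresponds to relabelling individuals, which preserves the dependence structure within the matrix.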
DATA SPECIFIC INFORMATION FOR: MethylRaw_methylC_first_methyl_tiles_low10.txt
Methylated cytosine counts used as input for MACAU
- Number of variables: 25
- Number of cases/rows: 82,696
- Variable List:
  - site: id for the 1 kbp tiles
  - columns 2-25: methylated cytosine counts for each 1 kbp tile in each individual
- Missing data codes: none
- Abbreviations used: none
- Other relevant information: none
DATA SPECIFIC INFORMATION FOR: MethylRaw_coverage_first_methyl_tiles_low10.txt
Coverage counts used as input for MACAU
- Number of variables: 25
- Number of cases/rows: 82,696
- Variable List:
  - site: id for the 1 kbp tiles
  - columns 2-25: coverage for each 1 kbp tile in each individual
- Missing data codes: none
- Abbreviations used: none
- Other relevant information: none
DATA SPECIFIC INFORMATION FOR: 2018Methylation_covariates_batch_first.txt
Covariates used as input for MACAU, namely: PC1 and PC2 from climate, bisulfite conversion (calculated based on the lambda phage), and sequencing batch
- Number of variables: 4
- Number of cases/rows: 24
- Variable List:
  - first row: PC1 loadings of climatic variables
  - second row: PC2 loadings of climatic variables
  - third row: error rate of bisulfite conversion
  - fourth row: sequencing batch
- Missing data codes: none
- Abbreviations used: none
- Other relevant information: none
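Because the lambda phage spike-in is unmethylated, any cytosine in the phage that is still called methylated after bisulfite treatment reflects a failed conversion. A minimal sketch of how an error rate could be derived from the phage counts in the main spreadsheet follows. The function name is hypothetical, and the exact formula the authors used is an assumption.

```python
def bisulfite_error_rate(methylated_phage: int, unmethylated_phage: int) -> float:
    """Fraction of lambda-phage CpG cytosines called methylated (non-conversion rate)."""
    total = methylated_phage + unmethylated_phage
    if total == 0:
        raise ValueError("no phage cytosine calls; error rate is undefined")
    return methylated_phage / total


# Illustrative counts only, not taken from the dataset.
example_rate = bisulfite_error_rate(5, 995)
```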
DATA SPECIFIC INFORMATION FOR: 2018Methylation_first_run_relatedness.cXX.txt
Kinship matrix between the 24 individuals calculated based on RAD-seq. Performed using gemma with default parameters (Zhou et al. 2013).
- Number of variables: 24
- Number of cases/rows: 24
- Variable List: not applicable
- Missing data codes: none
- Abbreviations used: none
- Other relevant information: none
DATA SPECIFIC INFORMATION FOR: Bayesian_regression_cutoff_beta_geno.txt
This file contains the compiled coefficient results of the Bayesian regression across the different cut-offs designating DMRs
- Number of variables: 5
- Number of cases/rows: 20
- Variable List:
  - quantile: quantile of the empirical p-value distribution used to designate DMRs
  - var: variable explaining variation (geographical distance or host plant)
  - beta: coefficient output from the Bayesian regression
  - min: lower limit of the 95% ETPI
  - max: upper limit of the 95% ETPI
- Missing data codes: none
- Abbreviations used: geog=geographical distance; host=host plant
- Other relevant information: none
DATA SPECIFIC INFORMATION FOR: macau.first.output.CpG.tiles.low10.var.adjusted.txt
MACAU output. We disregarded the unassembled scaffolds (i.e. lgNA)
- Number of variables: 20
- Number of cases/rows: 82,696
- Variable List:
  - id: id of the 1 kbp tile
  - n: number of individuals analysed
  - acpt_rate: acceptance rate
  - beta: beta coefficient of the predictor's effects
  - se_beta: standard error of the predictor's effects
  - pvalue: p-value of the association between methylation count and the predictor (here, host plant)
  - h: heritability of the logit-transformed methylation proportion
  - se_h: standard error of the heritability
  - sigma2: variance component
  - se_sigma2: standard error of the variance component
  - alpha0: coefficient of the climate PC1 effects on methylation variation
  - se_alpha0: standard error of alpha0
  - alpha1: coefficient of the climate PC2 effects on methylation variation
  - se_alpha1: standard error of alpha1
  - alpha2: coefficient of the bisulfite error rate effects on methylation variation
  - se_alpha2: standard error of alpha2
  - alpha3: coefficient of the batch effects on methylation variation
  - se_alpha3: standard error of alpha3
  - alpha4: coefficient of the column of 1s automatically added at the end of the covariates file (standard)
  - se_alpha4: standard error of alpha4
- Missing data codes: none
- Abbreviations used: none
- Other relevant information: none
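DMRs are designated from this output using quantiles of the empirical p-value distribution (see Bayesian_regression_cutoff_beta_geno.txt). The sketch below shows one way such a cutoff could be applied to (tile id, p-value) pairs. The nearest-rank quantile definition and the function names are assumptions, and the authors' R scripts may compute the quantile differently.

```python
from math import ceil


def empirical_quantile(values, q: float) -> float:
    """Lower empirical quantile using the nearest-rank definition (an assumption)."""
    ordered = sorted(values)
    rank = max(1, ceil(q * len(ordered)))
    return ordered[rank - 1]


def candidate_dmrs(tiles, q: float = 0.01):
    """Return ids of tiles whose p-value is at or below the q-th empirical quantile.

    tiles is a list of (tile_id, pvalue) pairs, e.g. the id and pvalue columns.
    """
    cutoff = empirical_quantile([p for _, p in tiles], q)
    return [tile_id for tile_id, p in tiles if p <= cutoff]


# Illustrative values only, not taken from the dataset.
example_tiles = [("t1", 0.01), ("t2", 0.2), ("t3", 0.5), ("t4", 0.9)]
example_dmrs = candidate_dmrs(example_tiles, q=0.5)
```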
DATA SPECIFIC INFORMATION FOR: 2018Methylation_first_run_host_predictor.txt
Host plant predictor used in MACAU. 0 denotes Adenostoma and 1 denotes Ceanothus
- Number of variables: 1
- Number of cases/rows: 24
- Variable List: not applicable
- Missing data codes: none
- Abbreviations used: NA
- Other relevant information: none
DATA SPECIFIC INFORMATION FOR: tiles.macau.low10.genes.id.order.txt
MACAU table with gene.id formatted to estimate GO enrichment. The annotation was performed similarly to the scripts above
- Number of variables:
- Number of cases/rows:
- Variable List:
  - lg: linkage group
  - scaf: scaffold
  - pos1: first position in the 1 kbp tile
  - pos2: last position in the 1 kbp tile
  - pvalue: p-value from MACAU
  - gene: boolean for whether the tile is located within a gene
  - gene.id: id of the gene
- Missing data codes: none
- Abbreviations used: NA
- Other relevant information: none
- 2018Methylation_info_spreadsheet_standard.csv: Table with information regarding samples, locations, climatic information, and bisulfite conversion.
- BSseq_pipeline: The scripts below were used in the pipeline to process bisulfite reads
  - 1.1_parallel_trimmomatic.sh: Runs Trimmomatic to filter raw bisulfite reads
  - 1.2_sampling24k.sh: Samples a number of reads to reduce batch effects on downstream analyses
  - 1.3_bismark.mapping.to.phage.sh: Runs Bismark to map the bisulfite reads to the lambda phage (GenBank J02459)
  - 1.4_bismark.maping.to.tcristinae.sh: Runs Bismark to map the reads that did not map to the phage to the T. cristinae genome (v1.3c2)
  - 1.5_parallel_bismark_methylation_extractor.sh: Runs the Bismark function 'bismark_methylation_extractor' to call methylation into cytosine report tables
  - 1.6_remove.CT.GA.polymorphisms.pl: Removes the SNPs listed in the file 'variants.raw.CT.GA.bial.noindel.qs20.cov0.whole.genomes.and.radseq.loci' from the cytosine reports
  - variants.raw.CT.GA.bial.noindel.qs20.cov0.whole.genomes.and.radseq.loci: list of C/T and G/A polymorphisms from new and previously published T. cristinae data
  - Bismark_deduplicating_reads.sl: Runs deduplication
- Annotation: This section contains scripts used to annotate the methylation variation at the Timema cristinae species level using the 24 individuals
  - 2.1_get.methylation.status.individual.binomial.pl: Calculates the methylation status based on binomial distributions. Run on the cytosine reports of each individual
  - 2.2_retrieve_annotation_augustus.R: Determines the annotation of each methylation position based on the annotation file from Villoutreix et al. 2020.
  - 2.3_retrieve.genes.id.R: Gets the gene id based on the annotation file from Villoutreix et al. 2020, after running 2.2_retrieve_annotation_augustus.R
  - 2.4_retrieve_annotation_repeatmasker_1.3c2.R: Gets the repeats annotation based on the repetitive elements annotation from Villoutreix et al. 2020.
  - 2.5_retrieve.exon.intron.oreder.on.meth.table.R: Gets the exons and introns in order (e.g. CDS1, CDS2, etc.)
  - 2.6_level.methylation.exons.introns.R: Estimates methylation levels on different exons and introns.
  - 2.7_enrichment.genomic.features.R: Estimates the enrichment of methylation levels on different genomic features
  - 2.8_GO.enrichment.genes.R: Estimates the enrichment of GO terms in genes that are hypo- or hypermethylated.
  - first.batch.compiled.noCT.GA.low5.high60.binomial.annot.12samples.txt: Compilation of the annotation tables. Only loci covered by a minimum of 5 reads and a maximum of 60 reads were retained. We also selected loci present in at least 12 samples. This table shows the methylation status at each locus (based on the binomial distribution)
  - first.batch.compiled.12samples.no.intergenic.noCT.GA.low5.high60.annot.txt: Same as above, but here the mean methylation levels were calculated. Intergenic regions were removed to simplify the analyses
  - first.low5.high60.noCT.GA.methylation.status.mRNA.GO.txt: Table with gene ids, formatted as input for the script 2.8_GO.enrichment.genes.R.
- RNA-seq: This section contains scripts used to process RNA-seq data
  - 3.1_cutadapt_filtering.sh: Filters adapters from the data
  - 3.2_trimmomatic_filtering.sh: Runs Trimmomatic
  - 3.3_mapping_array_STAR_relaxed_pe.sh: Runs STAR to map RNA-seq data to the T. cristinae reference genome (v1.3c2)
  - 3.4_featureCounts_Tcristinae_genes.sl: Runs featureCounts
  - 3.5_plot_expression_methylation.R: Estimates the relationship between methylation levels and expression data
- Genome_wide_comparison: This section contains scripts and inputs from the genome-wide analyses
  - 4.1.methylation_genetic_mantel_bayesian.R: Runs Mantel tests and Bayesian regressions
  - MethylRaw_CpG_first_run_low2_high60_percentage.no.CT.GA.txt: Table compiling methylation cytosine reports among all 24 samples. Loci covered by fewer than 2 reads or by more than 60 reads (above the 99th quantile) were removed. The methylation levels were calculated as the number of methylated cytosines at a given locus over the sum of all reads covering it.
  - jtdistance.matrix.first.txt: Genetic distances between the 24 individuals based on the RAD-seq data
- MACAU: This section contains scripts, inputs and outputs related to MACAU analyses
  - 5.1_methylKit.tiles.R: Runs methylKit and summarizes the data into 1 kbp tiles
  - 5.2_tiles.filtering.before.macau.R: Removes tiles that are hypo- and hypermethylated following Lea et al. (2016)
  - 5.3_MethylRaw_formatting_first.sh: Formats the table generated by methylKit into MACAU inputs
  - 5.4_macau.first.sh: Runs MACAU
  - 5.5_ibd_methylation_by_host_diff_cutoffs.R: Runs Bayesian regressions on the outputs from MACAU
  - 5.6_GO.term.DMR.cutoff.R: Estimates GO enrichments on DMRs from different p-value cutoffs
  - MethylRaw_methylC_first_methyl_tiles_low10.txt: methylated cytosine counts for MACAU
  - MethylRaw_coverage_first_methyl_tiles_low10.txt: coverage input for MACAU
  - 2018Methylation_covariates_batch_first.txt: Covariates used as input for MACAU, namely: PC1 and PC2 from climate, bisulfite conversion (calculated based on the lambda phage), and sequencing batch
  - 2018Methylation_first_run_host_predictor.txt: Host-plant predictor for MACAU
  - 2018Methylation_first_run_relatedness.cXX.txt: Kinship matrix calculated based on RAD-seq. Performed using gemma with default parameters (Zhou et al. 2013).
  - macau.first.output.CpG.tiles.low10.var.adjusted.txt: MACAU output. We disregarded the unassembled scaffolds (i.e. lgNA).
  - tiles.macau.low10.genes.id.order.txt: MACAU table with gene.id formatted to estimate GO enrichment. The annotation was performed similarly to the scripts above