DNA methylation differences between stick insect ecotypes

Abstract

Epigenetic mechanisms, such as DNA methylation, can influence gene regulation and affect phenotypic variation, raising the possibility that they contribute to ecological adaptation. Addressing this issue requires high-resolution sequencing studies of natural populations to pinpoint epigenetic regions of potential ecological and evolutionary significance. However, such studies are still relatively uncommon, especially in insects, and are mainly restricted to a few model organisms. Here, we characterize patterns of DNA methylation for natural populations of Timema cristinae adapted to two host plant species (i.e., ecotypes). By integrating results from sequencing of whole transcriptomes, genomes, and methylomes, we investigate whether environmental, host, and genetic differences of these stick insects are associated with methylation levels of cytosine nucleotides in the CpG context. We report an overall genome-wide methylation level for T. cristinae of ~14%, with methylation enriched in gene bodies and depleted in repetitive elements. Genome-wide DNA methylation variation was strongly positively correlated with genetic distance (relatedness) but also exhibited significant host-plant effects. Using methylome-environment association analysis, we pinpointed specific genomic regions that are differentially methylated between ecotypes, with these regions being enriched for genes with functions in membrane processes. The observed association of methylation variation with genetic relatedness and with the ecologically important variable of host plant suggests a potential role for epigenetic modification in T. cristinae adaptation. To substantiate such adaptive significance, future studies could test if methylation has a heritable component and the extent to which it responds to experimental manipulation in field and laboratory studies.

GENERAL INFORMATION

1. Title of Dataset: Data from: DNA methylation differences between stick insect ecotypes.

2. Author Information

Corresponding Investigator 1

Name: Dr. Clarissa F. de Carvalho
Institution: Universidade Federal de São Paulo (UNIFESP), São Paulo, SP 04021-001, Brazil
Email: clarissa.ferreira@unifesp.br

Corresponding Investigator 2

Name: Dr. Patrik Nosil
Institution: CEFE, Univ Montpellier, CNRS, EPHE, IRD, Univ Paul Valéry Montpellier 3, Montpellier 34293, France
Email: patrik.nosil@cefe.cnrs.fr

Co-investigator 1

Name: Dr. Romain Villoutreix
Institution: CEFE, Univ Montpellier, CNRS, EPHE, IRD, Univ Paul Valéry Montpellier 3, Montpellier 34293, France

Co-investigator 2

Name: Pr. Zachariah Gompert
Institution: Department of Biology, Utah State University, Logan, UT 84321, USA

Co-investigator 3

Name: Pr. Jon Slate
Institution: School of Biosciences, University of Sheffield; Sheffield, S10 2TN, UK

Co-investigator 4

Name: Pr. Jeffrey L. Feder
Institution: Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA

Co-investigator 5

Name: Rüdiger Riesch
Institution: Department of Biological Sciences, Centre for Ecology, Evolution and Behaviour, Royal Holloway University of London; Egham, TW20 0EX, UK

3. Date of data collection:

2017

4. Geographic location of data collection:

California, USA

5. Funding sources that supported the collection of the data:

This work was funded by supporting grants from ERC NatHisGen R/129639, Royal Society of London RG140369 (C.F.d.C, P.N.), the University of Sheffield, the Human Frontier Science Program (R.R.), and FAPESP 2020/07556-8 (C.F.d.C).

6. Recommended citation for this dataset:

de Carvalho et al. (2023), Data from: DNA methylation differences between stick insect ecotypes, Dryad, Dataset

DATA & FILE OVERVIEW

1. Description of dataset

These data were generated to characterize patterns of DNA methylation for natural populations of Timema cristinae adapted to two host plant species (i.e., ecotypes). By integrating results from sequencing of whole transcriptomes, genomes, and methylomes, we investigate whether environmental, host, and genetic differences of these stick insects are associated with methylation levels of cytosine nucleotides in the CpG context.

2. File list:

File 1 Name: 2018Methylation_info_spreadsheet_standard.csv
File 1 Description: Table with information regarding samples, locations, climatic information, and bisulfite conversion

File 2 Name: variants.raw.CT.GA.bial.noindel.qs20.cov0.whole.genomes.and.radseq.loci
File 2 Description: list of C/T and G/A polymorphisms from new and previously published T. cristinae data

File 3 Name: first.batch.compiled.noCT.GA.low5.high60.binomial.annot.12samples.txt
File 3 Description: Compilation of the annotation tables. Here, only loci covered by a minimum of 5 reads and a maximum of 60 were retained. We also selected loci present in at least 12 samples. This table shows the methylation status at each locus (based on the binomial distribution)

File 4 Name: first.batch.compiled.12samples.no.intergenic.noCT.GA.low5.high60.annot.txt
File 4 Description: Same as File 3, but here the mean methylation levels were calculated. Intergenic regions were removed to ease the analyses

File 5 Name: first.low5.high60.noCT.GA.methylation.status.mRNA.GO.txt
File 5 Description: Table with gene ids. Formatted as input for the script GO.enrichment.genes.R

File 6 Name: MethylRaw_CpG_first_run_low2_high60_percentage.no.CT.GA.txt
File 6 Description: Table compiling methylation cytosine reports among all 24 samples. Loci covered by fewer than 2 reads or by more than 60 reads (above the 99th quantile) were removed. The methylation levels were calculated as methylated cytosines at a certain locus over the sum of all reads covering it.

File 7 Name: jtdistance.matrix.first.txt
File 7 Description: Genetic distances between the 24 individuals based on the RAD-seq data

File 8 Name: MethylRaw_methylC_first_methyl_tiles_low10.txt
File 8 Description: methylated cytosine counts for MACAU

File 9 Name: MethylRaw_coverage_first_methyl_tiles_low10.txt
File 9 Description: coverage input for MACAU

File 10 Name: 2018Methylation_covariates_batch_first.txt
File 10 Description: Covariates used as input for MACAU, namely: PC1 and PC2 from climate, bisulfite conversion (calculated based on the lambda phage), and sequencing batch

File 11 Name: 2018Methylation_first_run_relatedness.cXX.txt
File 11 Description: Kinship matrix calculated based on RAD-seq. Performed using GEMMA with default parameters (Zhou et al. 2013).

File 12 Name: Bayesian_regression_cutoff_beta_geno.txt
File 12 Description: compiled coefficient results of the Bayesian regression across the different cut-offs designating DMRs

File 13 Name: macau.first.output.CpG.tiles.low10.var.adjusted.txt
File 13 Description: MACAU output. We disregarded the unassembled scaffolds (i.e. lgNA)

File 14 Name: 2018Methylation_first_run_host_predictor.txt
File 14 Description: host plant information used in MACAU

File 15 Name: tiles.macau.low10.genes.id.order.txt
File 15 Description: MACAU table with gene.id formatted to estimate GO enrichment. The annotation was performed similarly to the scripts above

METHODOLOGICAL INFORMATION

Timema cristinae individuals from the selected populations were collected on the same date (25th April 2017) in the Californian spring using sweep nets, and kept in plastic containers at room temperature. Individuals were digitally photographed the following day under standard conditions, flash frozen using liquid nitrogen, and preserved at -80 °C. All procedures were performed to ensure that the methylation status was not considerably affected by variation in sampling conditions.

Half of each specimen’s body (cut longitudinally) was used to isolate its genomic DNA using DNeasy Blood and Tissue Kits (Qiagen). We included non-methylated cI857 Sam7 Lambda phage DNA (Promega Corporation) as a spike-in in each sample (1% of the final volume). We submitted genomic DNA of one T. cristinae sample (individual 17_0015, ‘NO_BS.WGBS’ sample) both for BS-seq and as a control for the BS treatment (i.e., sequenced without sodium-bisulfite treatment). The sodium-bisulfite treatment and high-throughput sequencing were performed by the Biomedicum Functional Genomics Unit (FuGU, Helsinki). The libraries were sequenced using the Illumina NextSeq 500 system, with High Output 2 x 150 bp runs. In total, three flow cells with four lanes were run.

In addition, we performed RNA-seq of 18 individuals, which were the only ones that yielded enough material for further sequencing. The RNA extractions, library preparations and sequencing were performed by Genome Quebec. Total RNA for each individual was extracted from the remaining half of the specimen's body using the QIAcube animal tissue kit and protocol. Libraries were multiplexed and sequenced on one lane of a HiSeq 4000 to obtain 150 base pair paired-end reads. More detailed information can be found in the folders for each particular analysis in the Online Supplementary Materials.

DATA SPECIFIC INFORMATION FOR: 2018Methylation_info_spreadsheet_standard.csv

Main spreadsheet with information on each individual. Climatic differences were estimated using WorldClim (columns 12-30; see https://www.worldclim.org/data/bioclim.html).

Number of variables: 46
Number of cases/rows: 24
Variable List:
 id: individual unique id
 file_id: individual id on the file
 run: flow cell in which the individuals were sequenced
 species: Timema species (T. cristinae)
 location: locality in which the individual was collected (see Online Supplementary Materials)
 host: host plant species on which the individuals were collected (A=Adenostoma and C=Ceanothus)
 sex: sex of each individual (F=female; M=male)
 morph: color-pattern morph of each individual
 latitude: latitude of the location
 longitude: longitude of the location
 elevation: elevation at which the individuals were collected (m)
 annual_mean_temperature: Annual Mean Temperature (BIO1 worldclim variable) (°C)
 mean_diurnal_range: Mean Diurnal Range (Mean of monthly (max temp - min temp); BIO2 worldclim variable) (°C)
 isothermality: Isothermality (BIO2/BIO7 ×100; BIO3 worldclim variable)
 temperature_seasonality: Temperature Seasonality (standard deviation ×100; BIO4 worldclim variable)
 max_temp_warmest_month: Max Temperature of Warmest Month (BIO5 worldclim variable) (°C)
 min_temp_coldest_month: Min Temperature of Coldest Month (BIO6 worldclim variable) (°C)
 temp_anual_range: Temperature Annual Range (BIO5-BIO6; BIO7 worldclim variable) (°C)
 mean_temp_wettest_quarter: Mean Temperature of Wettest Quarter (BIO8 worldclim variable) (°C)
 mean_temp_driest_quarter: Mean Temperature of Driest Quarter (BIO9 worldclim variable) (°C)
 mean_temp_warmest_quarter: Mean Temperature of Warmest Quarter (BIO10 worldclim variable) (°C)
 mean_temp_coldest_quarter: Mean Temperature of Coldest Quarter (BIO11 worldclim variable) (°C)
 annual_precipitation: Annual Precipitation (BIO12 worldclim variable) (mm)
 precipitation_wettest_month: Precipitation of Wettest Month (BIO13 worldclim variable) (mm)
 precipitation_driest_month: Precipitation of Driest Month (BIO14 worldclim variable) (mm)
 precipitation_seasonality: Precipitation Seasonality (Coefficient of Variation; BIO15 worldclim variable)
 precipitation_wettest_quarter: Precipitation of Wettest Quarter (BIO16 worldclim variable) (mm)
 precipitation_driest_quarter: Precipitation of Driest Quarter (BIO17 worldclim variable) (mm)
 precipitation_warmest_quarter: Precipitation of Warmest Quarter (BIO18 worldclim variable) (mm)
 precipitation_coldest_quarter: Precipitation of Coldest Quarter (BIO19 worldclim variable) (mm)
 clim_PC1: first principal component of the bioclimatic variables
 clim_PC2: second principal component of the bioclimatic variables
 clim_PC3: third principal component of the bioclimatic variables
 BL: body length, estimated following Riesch et al. 2017 (cm)
 BW: body width, estimated following Riesch et al. 2017 (cm)
 HW: head width, estimated following Riesch et al. 2017 (cm)
 sequencing batch: the batch of bisulfite sequencing
 n_parsed: reads parsed after filtering
 n_mapped: reads mapped with Bismark
 mapping_efficiency_Tcristinae: efficiency of mapping with Bismark
 methylated_cytosines_CpG: number of cytosines methylated in CpG context
 unmethylated_cytosines_CpG: number of cytosines unmethylated in CpG context
 percentage_CpG: percentage of the CpG cytosines that were methylated
 methylated_cytosines_CpG_phage: number of methylated cytosines in the lambda phage
 unmethylated_cytosines_CpG_phage: number of unmethylated cytosines in the lambda phage
 percentage_CpG_phage: percentage of methylated cytosines in CpG context in the lambda phage
Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none
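
Because the lambda phage spike-in is non-methylated, any phage cytosine that is still called methylated reflects incomplete bisulfite conversion. The Python sketch below illustrates how a column such as percentage_CpG_phage can be related to the two phage count columns. The function name and the example counts are hypothetical, and the formula is assumed from the column descriptions rather than taken from the authors' scripts.

```python
def percent_methylated(methylated: int, unmethylated: int) -> float:
    """Percentage of cytosine calls that were methylated."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("no cytosine calls to summarize")
    return 100.0 * methylated / total


# Hypothetical phage counts for a single sample (not values from the dataset).
phage_pct = percent_methylated(methylated=30, unmethylated=9970)

# With a non-methylated spike-in, the methylated percentage approximates the
# bisulfite non-conversion (error) rate; its complement is the conversion rate.
conversion_pct = 100.0 - phage_pct
```

The same helper applies to the T. cristinae columns (methylated_cytosines_CpG and unmethylated_cytosines_CpG), assuming percentage_CpG was derived in the same way.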

DATA SPECIFIC INFORMATION FOR: variants.raw.CT.GA.bial.noindel.qs20.cov0.whole.genomes.and.radseq.loci

This is a list of C/T and G/A polymorphisms from new and previously published T. cristinae data

Number of variables: 1
Number of cases/rows: 15534254
Variable List: not applicable
Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none

DATA SPECIFIC INFORMATION FOR: first.batch.compiled.noCT.GA.low5.high60.binomial.annot.12samples.txt

This data is a compilation of the annotation tables among the 24 individuals. Here, only loci covered by a minimum of 5 reads and a maximum of 60 were retained. We also selected loci present in at least 12 samples. This table shows the methylation status at each locus (based on the binomial distribution)

Number of variables: 35
Number of cases/rows: 796,200

Variable List:

 site: locus of the CpG with methylation information  

 columns 2-25: binomial information about the status of methylation (0=unmethylated and 1=methylated, NA=missing data) for the 24 individuals as labeled in File 1.   

 cds: boolean value for whether the site is located on a CDS  

 intron: boolean value for whether the site is located on an intron  

 mRNA: boolean value for whether the site is located on a predicted mRNA (gene)  

 upstream: boolean value for whether the site is located 1kbp upstream of the transcription starting site  

 downstream: boolean value for whether the site is located 1kbp downstream of the transcription ending site  

 gene.id: if the site was located within a gene, then the gene id is printed; else, "NA"  

 gene: the name of the gene, isolated from other information  

 number: number of individuals with data  

 n.methylated: number of individuals with methylated status  

 status: general status of the site (0=unmethylated; 1=methylated)

Missing data codes:
NA
Abbreviations used:
CDS=coding sequence
Other relevant information:
none
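
The description above states only that the methylation status is based on the binomial distribution. One common way to make such a call, shown here as an assumption and not as the logic of 2.1_get.methylation.status.individual.binomial.pl, is a one-sided binomial test of the methylated read count against the bisulfite non-conversion rate. The error rate, the significance threshold, and the function names are hypothetical; only the 5-60 read coverage window comes from the file description.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def methylation_status(meth_reads, total_reads, error_rate=0.005, alpha=0.01,
                       min_cov=5, max_cov=60):
    """Return 1 (methylated), 0 (unmethylated) or None (coverage outside window).

    error_rate and alpha are illustrative assumptions, not the authors' values.
    """
    if not (min_cov <= total_reads <= max_cov):
        return None
    # Probability of seeing this many methylated reads from non-conversion alone.
    p_value = binom_upper_tail(meth_reads, total_reads, error_rate)
    return 1 if p_value < alpha else 0
```

Under these assumed settings, a site with 8 methylated reads out of 10 is called methylated, whereas a site with a single methylated read out of 10 is not.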

DATA SPECIFIC INFORMATION FOR: first.batch.compiled.12samples.no.intergenic.noCT.GA.low5.high60.annot.txt

Same as File 3, but here the mean methylation levels were calculated. Intergenic regions were removed to ease the analyses

Number of variables: 13
Number of cases/rows: 249066

Variable List:

 site: locus of the CpG with methylation information  

 number: number of individuals with data  

 mean: mean methylation level across the individuals (%) 

 cds: boolean value for whether the site is located on a CDS  

 intron: boolean value for whether the site is located on an intron  

 mRNA: boolean value for whether the site is located on a predicted mRNA (gene)  

 upstream: boolean value for whether the site is located 1kbp upstream of the transcription starting site  

 downstream: boolean value for whether the site is located 1kbp downstream of the transcription ending site  

 gene.id: if the site was located within a gene, then the gene id is printed; else, "NA"  

 gene: the name of the gene, isolated from other information  

 dist.from.gene: distance between this site and a nearby gene (bp)  

 dist.from.start: if within a gene, the distance between this site and the transcription start site (bp)

 order: the order of the CDS or the intron (eg. CDS1, CDS2, CDS3)

Missing data codes:
NA
Abbreviations used:
CDS=coding sequence
Other relevant information:
none
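
The 'number' and 'mean' columns can be understood as a per-site summary over the individuals with data. The sketch below shows one way such a summary could be computed, assuming missing values are represented as None and that sites with data for fewer than 12 individuals are dropped, as described for File 3. The function name and the example values are hypothetical.

```python
def summarize_site(levels, min_samples=12):
    """Summarize per-individual methylation levels (%) at one site.

    levels: list with one entry per individual; None marks missing data.
    Returns None when fewer than min_samples individuals have data.
    """
    observed = [value for value in levels if value is not None]
    if len(observed) < min_samples:
        return None
    return {"number": len(observed), "mean": sum(observed) / len(observed)}


# Hypothetical site: 12 individuals with data, 12 missing.
example = summarize_site([40.0] * 6 + [60.0] * 6 + [None] * 12)
```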

DATA SPECIFIC INFORMATION FOR: first.low5.high60.noCT.GA.methylation.status.mRNA.GO.txt

Table with gene ids. Formatted as input for the script GO.enrichment.genes.R

Number of variables: 26
Number of cases/rows: 8,472

Variable List:

 gene: gene id  

 columns 2-25: individuals and whether the gene status was methylated (1) or not (0) by binomial analyses  

 number: number of individuals with methylated status at each gene

Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none

DATA SPECIFIC INFORMATION FOR: MethylRaw_CpG_first_run_low2_high60_percentage.no.CT.GA.txt

Table compiling methylation cytosine reports among all 24 samples. Loci covered by fewer than 2 reads or by more than 60 reads (above the 99th quantile) were removed. The methylation levels were calculated as methylated cytosines at a certain locus over the sum of all reads covering it.

Number of variables: 25
Number of cases/rows: 296,731

Variable List:

 site: CpG locus  

 columns 2-25: information about methylation level in each of the 24 individuals

Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none
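
As an illustration of the calculation described above, the sketch below parses one line of a Bismark genome-wide cytosine report (tab-separated: chromosome, position, strand, methylated count, unmethylated count, C-context, trinucleotide context) and applies the coverage window. Expressing the level as a percentage is an assumption based on the file name, and the function name, the scaffold name, and the "scaffold:position" site label are hypothetical.

```python
def site_level_from_report_line(line, min_cov=2, max_cov=60):
    """Return (site, methylation level in %) or None if the locus is filtered out."""
    chrom, pos, _strand, n_meth, n_unmeth, context, _tri = line.rstrip("\n").split("\t")
    n_meth, n_unmeth = int(n_meth), int(n_unmeth)
    coverage = n_meth + n_unmeth
    # Keep only CpG cytosines whose coverage lies within the window.
    if context != "CG" or coverage < min_cov or coverage > max_cov:
        return None
    return f"{chrom}:{pos}", 100.0 * n_meth / coverage


# Hypothetical report lines (not taken from the dataset).
kept = site_level_from_report_line("scafX\t100\t+\t3\t9\tCG\tCGA")
low_cov = site_level_from_report_line("scafX\t250\t-\t0\t1\tCG\tCGT")
high_cov = site_level_from_report_line("scafX\t300\t+\t50\t20\tCG\tCGG")
```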

DATA SPECIFIC INFORMATION FOR: jtdistance.matrix.first.txt

Genetic distance matrix between the 24 individuals based on the RAD-seq data. There is no header, and individuals are given as the row names

Number of variables: 24
Number of cases/rows: 24
Variable List:
not applicable
Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none

DATA SPECIFIC INFORMATION FOR: MethylRaw_methylC_first_methyl_tiles_low10.txt

These are the methylated cytosine counts used as input for MACAU

Number of variables: 25
Number of cases/rows: 82,696

Variable List:

 site: id for the 1kbp tiles  

 columns 2-25: methylated cytosine counts for each 1kbp tile in each individual

Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none

DATA SPECIFIC INFORMATION FOR: MethylRaw_coverage_first_methyl_tiles_low10.txt

This is the coverage data used as input for MACAU

Number of variables: 25
Number of cases/rows: 82,696

Variable List:

 site: id for the 1kbp tiles  

 columns 2-25: coverage for each 1kbp tile in each individual

Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none
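
The methylated-count table and the coverage table describe the same tiles and individuals, so a simple sanity check before running MACAU is that every methylated count is no larger than the matching coverage. The sketch below is one way to check this, assuming both files are whitespace-delimited with a header line and a tile id in the first column. The function names and the toy tables are hypothetical.

```python
import io


def read_tile_table(handle):
    """Read a whitespace-delimited table: first column is the tile id, the rest are integers."""
    header = handle.readline().split()
    rows = {}
    for line in handle:
        fields = line.split()
        if fields:
            rows[fields[0]] = [int(value) for value in fields[1:]]
    return header, rows


def inconsistent_tiles(counts, coverage):
    """List tiles that are missing from coverage, differ in length, or have count > coverage."""
    bad = []
    for tile, meth in counts.items():
        cov = coverage.get(tile)
        if cov is None or len(cov) != len(meth) or any(m > c for m, c in zip(meth, cov)):
            bad.append(tile)
    return bad


# Toy tables with two individuals (not values from the dataset).
counts_txt = "site ind1 ind2\ntileA 3 0\ntileB 12 5\n"
coverage_txt = "site ind1 ind2\ntileA 10 14\ntileB 11 20\n"
_, counts = read_tile_table(io.StringIO(counts_txt))
_, coverage = read_tile_table(io.StringIO(coverage_txt))
bad = inconsistent_tiles(counts, coverage)  # tileB has 12 methylated reads but coverage 11
```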

DATA SPECIFIC INFORMATION FOR: 2018Methylation_covariates_batch_first.txt

Covariates used as input for MACAU, namely: PC1 and PC2 from climate, bisulfite conversion (calculated based on the lambda phage), and sequencing batch

Number of variables: 4
Number of cases/rows: 24

Variable List:

 first column: PC1 loadings of climatic variables  

 second column: PC2 loadings of climatic variables  

 third column: error rate of bisulfite conversion  

 fourth column: sequencing batch

Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none

DATA SPECIFIC INFORMATION FOR: 2018Methylation_first_run_relatedness.cXX.txt

Kinship matrix between the 24 individuals calculated based on RAD-seq. Performed using GEMMA with default parameters (Zhou et al. 2013).

Number of variables: 24
Number of cases/rows: 24
Variable List:
not applicable
Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none

DATA SPECIFIC INFORMATION FOR: Bayesian_regression_cutoff_beta_geno.txt

This represents the compiled coefficient results of the Bayesian regression across the different cut-offs designating DMRs

Number of variables: 5
Number of cases/rows: 20

Variable List:

 quantile: quantile of the empirical p-value distribution used to designate DMRs  

 var: variable explaining variation, geographical distance or host plant  

 beta: coefficient output from Bayesian regression  

 min: lower bound of the 95% equal-tail probability interval (ETPI)  

 max: upper bound of the 95% ETPI

Missing data codes:
none
Abbreviations used:
geog=geographical distance
host=host plant
Other relevant information:
none
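
The 'quantile' column refers to cut-offs on the empirical distribution of MACAU p-values. The sketch below shows one way tiles could be designated as DMRs at a given quantile, assuming the lowest fraction q of p-values is retained. The rank convention, the function name, and the toy p-values are assumptions and may differ from 5.5_ibd_methylation_by_host_diff_cutoffs.R.

```python
def dmr_by_quantile(pvalues, q):
    """Select tiles whose p-value is at or below the q-th empirical quantile.

    pvalues: dict mapping tile id -> p-value.
    Returns (cutoff, sorted list of selected tile ids).
    """
    ordered = sorted(pvalues.values())
    # Number of tiles in the lowest fraction q (at least one tile).
    n_keep = max(1, round(q * len(ordered)))
    cutoff = ordered[n_keep - 1]
    selected = sorted(tile for tile, p in pvalues.items() if p <= cutoff)
    return cutoff, selected


# Toy p-values for 100 hypothetical tiles: 0.01, 0.02, ..., 1.00.
toy = {f"tile{i}": i / 100 for i in range(1, 101)}
cutoff, dmrs = dmr_by_quantile(toy, q=0.05)
```

Ties at the cut-off are all retained in this sketch, so the selected set can be slightly larger than the nominal fraction.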

DATA SPECIFIC INFORMATION FOR: macau.first.output.CpG.tiles.low10.var.adjusted.txt

MACAU output. We disregarded the unassembled scaffolds (i.e. lgNA)

Number of variables: 20
Number of cases/rows: 82,696

Variable List:

 id: id of the 1kbp tile

 n: number of individuals analysed

 acpt_rate: acceptance rate

 beta: beta coefficient of the predictor's effects

 se_beta: standard error of the predictor's effects

 pvalue: p-value of the association between methylation count and the predictor (here, host plant)

 h: heritability of the logit transformed methylation proportion

 se_h: standard error of the heritability

 sigma2: variance component

 se_sigma2: standard error of the variance component

 alpha0: coefficient of the climate PC1 effects on methylation variation

 se_alpha0: standard error of alpha0

 alpha1: coefficient of the climate PC2 effects on methylation variation

 se_alpha1: standard error of alpha1

 alpha2: coefficient of the bisulfite error rates effects on methylation variation

 se_alpha2: standard error of alpha2

 alpha3: coefficient of the batch effects on methylation variation

 se_alpha3: standard error of alpha3

 alpha4: coefficient of the intercept term (the column of 1s that is automatically appended to the end of the covariates file; standard)

 se_alpha4: standard error of alpha4

Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none

DATA SPECIFIC INFORMATION FOR: 2018Methylation_first_run_host_predictor.txt

Host plant predictor used in MACAU. 0 denotes Adenostoma and 1 denotes Ceanothus

Number of variables: 1
Number of cases/rows: 24
Variable List:
not applicable
Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none

DATA SPECIFIC INFORMATION FOR: tiles.macau.low10.genes.id.order.txt

MACAU table with gene.id formatted to estimate GO enrichment. The annotation was performed similarly to the scripts above

Number of variables: 7
Number of cases/rows:

Variable List:

 lg: linkage group  

 scaf: scaffold  

 pos1: first position in the 1kbp tile  

 pos2: last position in the 1kbp tile  

 pvalue: pvalue from MACAU  

 gene: boolean for tile located within a gene  

 gene.id: id of the gene

Missing data codes:
none
Abbreviations used:
none
Other relevant information:
none
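
The pos1 and pos2 columns give the boundaries of each 1kbp tile. The sketch below shows how a CpG position can be assigned to a tile, assuming 1-based coordinates and non-overlapping tiles that start at position 1 of each scaffold. This convention and the function name are assumptions, and the actual tiling was done with methylKit in 5.1_methylKit.tiles.R.

```python
def tile_bounds(position, tile_size=1000):
    """Return (pos1, pos2) of the non-overlapping tile containing a 1-based position."""
    if position < 1:
        raise ValueError("positions are assumed to be 1-based")
    pos1 = ((position - 1) // tile_size) * tile_size + 1
    return pos1, pos1 + tile_size - 1


first_tile = tile_bounds(1000)
second_tile = tile_bounds(1001)
```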

2018Methylation_info_spreadsheet_standard.csv: Table with information regarding samples, locations, climatic information, and bisulfite conversion.
BSseq_pipeline: The series of scripts below were used in the pipeline to process bisulfite reads
- 1.1_parallel_trimmomatic.sh: Runs Trimmomatic to filter raw bisulfite reads
- 1.2_sampling24k.sh: Samples a number of reads to reduce batch effects on downstream analyses
- 1.3_bismark.mapping.to.phage.sh: Runs Bismark to map the bisulfite reads to the lambda phage (GenBank J02459)
- 1.4_bismark.maping.to.tcristinae.sh: Runs Bismark to map the reads that did not map to the phage to the T. cristinae genome (v1.3c2)
- 1.5_parallel_bismark_methylation_extractor.sh: Runs Bismark function 'bismark_methylation_extractor' to call methylation into cytosine reports tables
- 1.6_remove.CT.GA.polymorphisms.pl: Removes the SNPs listed in the file 'variants.raw.CT.GA.bial.noindel.qs20.cov0.whole.genomes.and.radseq.loci' from the cytosine reports
- variants.raw.CT.GA.bial.noindel.qs20.cov0.whole.genomes.and.radseq.loci: list of C/T and G/A polymorphisms from new and previously published T. cristinae data
- Bismark_deduplicating_reads.sl: Runs deduplication
Annotation: This section contains scripts used to annotate the methylation variation at Timema cristinae species level using the 24 individuals
- 2.1_get.methylation.status.individual.binomial.pl: Calculates the methylation status based on binomial distributions. Run on the cytosine reports of each individual
- 2.2_retrieve_annotation_augustus.R: Determines the annotation of each methylation position based on the annotation file from Villoutreix et al. 2020.
- 2.3_retrieve.genes.id.R: Gets the gene id based on the annotation file from Villoutreix et al. 2020, after running 2.2_retrieve_annotation_augustus.R
- 2.4_retrieve_annotation_repeatmasker_1.3c2.R: Gets the repeats annotation based on the repetitive elements annotation from Villoutreix et al. 2020.
- 2.5_retrieve.exon.intron.oreder.on.meth.table.R: Gets the order of the exons and introns (e.g. CDS1, CDS2, etc.)
- 2.6_level.methylation.exons.introns.R: Estimates methylation levels on different exons and introns.
- 2.7_enrichment.genomic.features.R: Estimates the enrichment of methylation levels on different genomic features
- 2.8_GO.enrichment.genes.R: Estimates the enrichment of GO terms in genes that are hypo- or hypermethylated.
- first.batch.compiled.noCT.GA.low5.high60.binomial.annot.12samples.txt: Compilation of the annotation tables. Here, only loci covered by a minimum of 5 reads and a maximum of 60 were retained. We also selected loci present in at least 12 samples. This table shows the methylation status at each locus (based on the binomial distribution)
- first.batch.compiled.12samples.no.intergenic.noCT.GA.low5.high60.annot.txt: Same as above, but here the mean methylation levels were calculated. Intergenic regions were removed to ease the analyses
- first.low5.high60.noCT.GA.methylation.status.mRNA.GO: Table with gene ids. Formatted as input for the script 2.8_GO.enrichment.genes.R.
RNA-seq: This section contains scripts used to process RNA-seq data
- 3.1_cutadapt_filtering.sh: Filters adapters from the data
- 3.2_trimmomatic_filtering.sh: Runs Trimmomatic
- 3.3_mapping_array_STAR_relaxed_pe.sh: Runs STAR to map RNA-seq data to T. cristinae reference genome (v1.3c2)
- 3.4_featureCounts_Tcristinae_genes.sl: Performs the featureCounts function
- 3.5_plot_expression_methylation.R: Estimates relationship between methylation levels and expression data
Genome_wide_comparison: This section contains scripts and inputs from the genome-wide analyses
- 4.1.methylation_genetic_mantel_bayesian.R: Runs Mantel tests and Bayesian regressions
- MethylRaw_CpG_first_run_low2_high60_percentage.no.CT.GA: Table compiling methylation cytosine reports among all 24 samples. Loci covered by fewer than 2 reads or by more than 60 reads (above the 99th quantile) were removed. The methylation levels were calculated as methylated cytosines at a certain locus over the sum of all reads covering it.
- jtdistance.matrix.txt: Genetic distances between the 24 individuals based on the RAD-seq data
MACAU: This section contains scripts, inputs and outputs related to MACAU analyses
- 5.1_methylKit.tiles.R: Runs methylKit and summarizes the data into 1kbp tiles
- 5.2_tiles.filtering.before.macau.R: Removes tiles that are hypo- and hypermethylated following Lea et al. (2016)
- 5.3_MethylRaw_formatting_first.sh: Formats the table generated by methylKit into MACAU inputs
- 5.4_macau.first.sh: Runs MACAU
- 5.5_ibd_methylation_by_host_diff_cutoffs.R: Runs Bayesian regressions on the outputs from MACAU
- 5.6_GO.term.DMR.cutoff.R: Estimates GO enrichments on DMRs from different p-value cutoffs
- MethylRaw_methylC_first_methyl_tiles_low10: methylated cytosine counts for MACAU
- MethylRaw_coverage_first_methyl_tiles_low10: coverage input for MACAU
- 2018Methylation_covariates_batch_first.txt: Covariates used as input for MACAU, namely: PC1 and PC2 from climate, bisulfite conversion (calculated based on the lambda phage), and sequencing batch
- 2018Methylation_first_run_host_predictor.txt: Host-plant predictor for MACAU
- 2018Methylation_first_run_relatedness.cXX.txt: Kinship matrix calculated based on RAD-seq. Performed using GEMMA with default parameters (Zhou et al. 2013).
- macau.first.output.CpG.tiles.low10.var.adjusted.txt: MACAU output. We disregarded the unassembled scaffolds (i.e. lgNA).
- tiles.macau.low10.genes.id.order.txt: MACAU table with gene.id formatted to estimate GO enrichment. The annotation was performed similarly to the scripts above
