Molecular mechanisms of seasonal brain shrinkage and regrowth in Sorex araneus
Data files
Oct 05, 2023 version files (86.44 MB):
- README.md
- Supplemental1_SampleData.xlsx
- Supplemental2_Liver.xlsx
- Supplemental3_Hippocampus.xlsx
- Supplemental4_Cortex.xlsx
Abstract
Human brains typically grow through development, remain the same size in adulthood, and often shrink through age-related degeneration that induces cognitive decline and impaired functionality. In most cases, however, the neural and organismal changes that accompany shrinkage, especially early in the process, remain unknown. Paralleling neurodegenerative phenotypes, the Eurasian common shrew, Sorex araneus, shrinks its brain from autumn through winter, but then reverses this process by rapidly regrowing the brain in spring. To identify the molecular underpinnings of this unique brain size change and its parallels to human neurodegeneration, we analyzed multi-organ, season-specific transcriptomic and metabolomic data. Simultaneous with brain shrinkage, we discovered system-wide metabolic shifts from lipid to glucose metabolism, as well as neuroprotection of brain metabolic homeostasis through reduced cholesterol efflux. These mechanisms rely on finely tuned brain-liver crosstalk, which results in expression changes in human markers of aging and of neurodegeneration in Parkinson’s disease and Huntington’s disease. We propose that metabolic shifts, with signals that cross the blood-brain barrier, are central to seasonal brain size changes in S. araneus, with potential implications for the therapeutic treatment of human neurodegeneration.
README: Molecular mechanisms of seasonal brain shrinkage and regrowth in Sorex araneus
https://doi.org/10.5061/dryad.pc866t1w3
Scripts and code to reproduce the RNA-seq analysis of expression changes through Dehnel's phenomenon, specifically in the cortex, hippocampus, and liver. These tissues help reveal both the size and the metabolic changes underpinning Dehnel’s phenomenon.
Description of the data and file structure
The first data table, Supplemental1_SampleData, contains the phenotypes of the 24 individual shrews sampled and their sequencing metadata.
Sheet 1 (Data)
- Rows = Individuals
- Columns = Individual (#), Season, Sexual Maturity, Stage, Date, Sex, Body Mass, Brain Mass (g), Liver Mass (g), Spleen Mass (g), Heart Mass (g), Stage BoM Average (g), Stage BrM Average (g), Stage LM Average (g), Stage SM Average (g), Stage HM Average (g), Cortex RNA ID, Cortex RIN, Cortex Reads prefilter, Cortex Reads postfilter, Cortex filter difference, Cortex Pseudo aligned (%), Hippocampus ID, Hippocampus RIN, Hippocampus Reads prefilter, Hippocampus Reads postfilter, Hipp filter difference, Hippocampus Pseudo aligned Percent, Liver ID, Liver RIN, Liver Reads prefilter, Liver Reads postfilter, Liver filter difference, Liver Pseudo aligned Percent
Sheet 2 (Phenotype T-tests)
- Each box holds the t-tests for one organ
- Stages of Dehnel's phenomenon are both the columns and the rows, and each entry is an adjusted p-value (highlighted values are significant, p < 0.05).
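Sheet 2 reports adjusted p-values, but this README does not name the correction method. As a minimal sketch, assuming a Benjamini-Hochberg correction (what R's p.adjust(method = "BH") computes), adjusting a set of raw pairwise p-values works like this; the input values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH is the correction used; the README does not say.
    """
    n = len(pvalues)
    # Walk from the largest raw p-value down, keeping a running minimum so
    # adjusted values never increase as raw p-values decrease.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four pairwise stage comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```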
Next, we assessed the quality of the RNA-seq data, filtered low-quality reads and trimmed adapters, pseudoaligned the reads to the transcriptome, and quantified transcript abundance. We then ran three analyses for each tissue: 1) differential expression between stages of Dehnel’s phenomenon using DESeq2, 2) temporal expression patterns using TCseq, and 3) gene correlation networks, testing for correlations between network structure and traits. Throughout the analysis, we tested whether the resulting gene sets were enriched for KEGG pathways using the DAVID Functional Enrichment Tools. Data and results for each tissue are in three files, Supplemental2_Liver, Supplemental3_Hippocampus, and Supplemental4_Cortex, which share the sheet layout listed below.
Sheet 1 - Gene Counts (counts)
- Rows = Genes
- Columns = Samples
Sheet 2 - Normalized Gene Counts (counts)
- Rows = Genes
- Columns = Samples
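Sheet 2 holds normalized counts. Assuming these are DESeq2's default normalized counts, each sample's raw counts are divided by a median-of-ratios size factor. A minimal pure-Python sketch of that calculation (the gene names and counts are hypothetical):

```python
import math
from statistics import median


def size_factors(counts):
    """DESeq2-style median-of-ratios size factors, one per sample.

    counts maps gene -> raw counts per sample. Genes with a zero in any
    sample have a zero geometric mean and are skipped, as DESeq2 does.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {gene: [c / s for c, s in zip(row, factors)]
            for gene, row in counts.items()}


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
counts = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10], "geneD": [0, 7]}
print(size_factors(counts))
print(normalize(counts))
```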
Sheet 3 - DESeq2 Results
- Rows = Genes
- Columns = means, log-fold changes, p-values
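Sheets 4 and 5 take the downregulated and upregulated genes from Sheet 3. The significance cutoff is not stated here, so this sketch assumes padj < 0.05 with no fold-change threshold; the keys follow DESeq2's results() column names (log2FoldChange, padj), and the gene names are hypothetical. The sign of a log-fold change depends on which stage is the reference level of the contrast.

```python
def split_for_enrichment(results, alpha=0.05):
    """Split DESeq2-style result rows into (downregulated, upregulated) genes.

    Assumptions: padj < alpha with no fold-change cutoff; rows use DESeq2's
    results() column names, and padj of None stands in for R's NA.
    """
    down, up = [], []
    for row in results:
        padj = row["padj"]
        if padj is None or padj >= alpha:
            continue  # not significant, or filtered out by DESeq2
        if row["log2FoldChange"] < 0:
            down.append(row["gene"])
        elif row["log2FoldChange"] > 0:
            up.append(row["gene"])
    return down, up


# Hypothetical rows from one stage-versus-stage contrast.
rows = [
    {"gene": "geneA", "log2FoldChange": -1.2, "padj": 0.001},
    {"gene": "geneB", "log2FoldChange": 0.8, "padj": 0.03},
    {"gene": "geneC", "log2FoldChange": 2.0, "padj": 0.40},
    {"gene": "geneD", "log2FoldChange": -0.5, "padj": None},
]
down, up = split_for_enrichment(rows)
print(down, up)
```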
Sheet 4 - DAVID Gene Enrichment (Downregulated from Sheet 3)
- Rows = Pathways
- Columns = genes tested, total hits, percent hits, p-values
Sheet 5 - DAVID Gene Enrichment (Upregulated from Sheet 3)
- Rows = Pathways
- Columns = genes tested, total hits, percent hits, p-values
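The percent-hits column in Sheets 4 and 5 is the share of the tested gene list that falls in a pathway. The sketch below computes it together with a plain one-sided hypergeometric over-representation p-value. This is a stand-in: DAVID reports a modified Fisher's exact test (the EASE score), which is more conservative, so its p-values will differ. The counts in the usage example are hypothetical.

```python
from math import comb


def enrichment(list_hits, list_size, pop_hits, pop_size):
    """Percent hits and a one-sided hypergeometric over-representation p-value.

    list_hits: genes from the tested list in the pathway; list_size: genes
    tested; pop_hits / pop_size: the same counts for the background.
    Not DAVID's EASE score, which modifies Fisher's exact test.
    """
    percent = 100.0 * list_hits / list_size
    upper = min(list_size, pop_hits)
    # P(X >= list_hits) for X ~ Hypergeometric(pop_size, pop_hits, list_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return percent, tail / comb(pop_size, list_size)


# Hypothetical pathway: 3 of 300 tested genes, 40 of 30000 background genes.
print(enrichment(3, 300, 40, 30000))
```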
Sheet 6 - TimeCluster Z-score Inputs
- Rows = Genes
- Columns = Season Z-score
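Sheet 6 holds the z-scored inputs to TCseq. Assuming each gene's mean expression per stage is standardized across stages with the sample standard deviation (as R's scale() does), the transformation for one gene is sketched below; the input values are hypothetical.

```python
from statistics import mean, stdev


def stage_zscores(stage_means):
    """Z-score one gene's mean expression across stages.

    Assumption: sample standard deviation (n - 1), as R's scale() uses.
    Genes with constant expression get all-zero scores.
    """
    mu = mean(stage_means)
    sd = stdev(stage_means)
    if sd == 0:
        return [0.0] * len(stage_means)
    return [(x - mu) / sd for x in stage_means]


# Hypothetical normalized-count means for one gene at three stages.
print(stage_zscores([2.0, 4.0, 6.0]))
```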
Sheet 7 - TimeClusters Membership Matrix (output from TCseq)
- Rows = Genes
- Columns = Module Memberships
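Sheet 7 is a membership matrix, which is what soft clustering produces: each gene gets a degree of membership in every cluster rather than a single label. TCseq's timeclust() offers fuzzy c-means among its algorithms; assuming that option with the common fuzzifier m = 2, the membership of one gene given the cluster centroids is computed as below. The profile and centroids are hypothetical.

```python
def fuzzy_memberships(profile, centers, m=2.0):
    """Fuzzy c-means membership of one gene profile in each cluster.

    profile: z-scored expression across stages; centers: cluster centroid
    profiles; m: fuzzifier (> 1). Memberships sum to 1.
    Assumption: fuzzy c-means with m = 2, not confirmed by the README.
    """
    dists = [sum((p - c) ** 2 for p, c in zip(profile, center)) ** 0.5
             for center in centers]
    zeros = dists.count(0.0)
    if zeros:
        # A profile sitting on a centroid belongs only to the matching cluster(s).
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# Hypothetical two-stage profile and two centroids.
print(fuzzy_memberships([0.0, 0.0], [[1.0, 0.0], [-3.0, 0.0]]))
```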
Sheet 9 - WGCNA Module Memberships (Liver/Hippocampus)
- Rows = Genes
- Columns = modules, memberships, p-values
Sheet 10 - WGCNA Trait to Module Correlations (Liver/Hippocampus)
- Rows = Modules
- Columns = Traits
Code/Software
RNA-seq analyses require alignment to a reference and quantification of reads. The genome and the original unfiltered reads can be downloaded as described below. However, these steps can be skipped when reproducing the analysis, because the count data are saved in ./data/TISSUE/GeneCounts.
The reference (sorAra2; GCF_000181275.1) can be downloaded directly from NCBI or with the code below.
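mkdir -p ./data/ref/
# wget treats extra positional arguments as URLs, so set the output directory with -P
wget -P ./data/ref/ https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/181/275/GCF_000181275.2_SorAra2.0/GCF_000181275.2_SorAra2.0_genomic.gff.gz
wget -P ./data/ref/ https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/181/275/GCF_000181275.2_SorAra2.0/GCF_000181275.2_SorAra2.0_rna.fna.gz
# Decompress the transcript (RNA) FASTA in the same directory it was downloaded to
gunzip ./data/ref/GCF_000181275.2_SorAra2.0_rna.fna.gz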
RNA-seq data from this project can also be found on the NCBI Sequence Read Archive (SRA), . The list of samples and their accession numbers is in the data folder. The reads can be downloaded manually or with the get_rawseq.sh script, which relies on the SRA Toolkit (https://github.com/ncbi/sra-tools). Note that all scripts are meant to be run from the scripts folder, using the relative paths in this repository.
bash get_rawseq.sh
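The contents of get_rawseq.sh are not shown here. As a hypothetical equivalent, this sketch turns an accession list (one run accession per line; the accessions and output directory below are placeholders, not this project's) into SRA Toolkit calls that download each run with prefetch and convert it to paired FASTQ files with fasterq-dump --split-files.

```python
import shlex


def sra_commands(accession_lines, outdir):
    """SRA Toolkit calls to fetch each run and split it into paired FASTQs.

    accession_lines: one SRA run accession per line (blank lines ignored).
    Hypothetical stand-in for get_rawseq.sh, whose contents are not shown.
    """
    commands = []
    for line in accession_lines:
        acc = line.strip()
        if not acc:
            continue
        commands.append(["prefetch", acc])
        commands.append(["fasterq-dump", "--split-files", "--outdir", outdir, acc])
    return commands


# Placeholder accessions and output directory, not this project's runs.
for cmd in sra_commands(["SRR0000001", "", "SRR0000002\n"], "../data/raw"):
    print(shlex.join(cmd))  # or subprocess.run(cmd, check=True) to execute
```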
Quality control, filtering, trimming
Again, these scripts can be skipped if you are reproducing the analysis from counts; otherwise, proceed. Here we trim adapters and remove low-quality reads with fastp, using its default settings. fastp must be installed in your local environment (https://github.com/OpenGene/fastp).
bash fastp.sh
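fastp.sh is not reproduced here. Assuming paired-end reads named like fasterq-dump's --split-files output (ACC_1.fastq, ACC_2.fastq), a per-sample fastp call with default trimming and filtering, plus HTML and JSON reports, can be assembled as below; the sample name and directory names are hypothetical.

```python
def fastp_command(sample, raw_dir, out_dir):
    """fastp call for one paired-end sample, using fastp's default filters.

    Assumes inputs named <sample>_1.fastq / <sample>_2.fastq; the directory
    names are hypothetical.
    """
    return [
        "fastp",
        "-i", f"{raw_dir}/{sample}_1.fastq",           # read 1 in
        "-I", f"{raw_dir}/{sample}_2.fastq",           # read 2 in
        "-o", f"{out_dir}/{sample}_1.trimmed.fastq",   # read 1 out
        "-O", f"{out_dir}/{sample}_2.trimmed.fastq",   # read 2 out
        "-h", f"{out_dir}/{sample}.html",              # HTML QC report
        "-j", f"{out_dir}/{sample}.json",              # JSON QC report
    ]


print(" ".join(fastp_command("SRR0000001", "../data/raw", "../data/trimmed")))
```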
Mapping and quantification
Reads that passed quality control are then pseudoaligned to the reference transcriptome and quantified with kallisto. Pseudoalignment does not align reads base by base to the genome; instead, it identifies the transcripts each read is compatible with, which lets kallisto infer counts even when different coding regions share similar sequence (https://pachterlab.github.io/kallisto/about).
bash kallisto.sh
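To make the pseudoalignment idea concrete, here is a toy sketch: index every k-mer of each transcript, then intersect the transcript sets hit by a read's k-mers. Real kallisto builds a transcriptome de Bruijn graph, handles reverse complements, and estimates abundances with an EM algorithm, none of which this sketch attempts; the sequences are made up.

```python
def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcripts that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Transcripts compatible with all of the read's indexed k-mers.

    Toy version: forward strand only; k-mers missing from the index are skipped.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Made-up transcripts that share their first seven bases.
transcripts = {"tA": "ACGTACGGA", "tB": "ACGTACGTT"}
index = build_kmer_index(transcripts, k=5)
print(pseudoalign("ACGTACG", index, k=5))  # read in the shared region
print(pseudoalign("GTACGGA", index, k=5))  # read unique to tA
```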
Note: This will create new transcript abundances, separate from the ones used in this analysis. Downstream scripts use the abundances I generated, in ./data/TISSUE/TranscriptAbundances, and their naming convention, but feel free to update the paths in the scripts to point to the ones you generated.
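The README does not show how transcript abundances become the gene counts in ./data/TISSUE/GeneCounts. Assuming estimated counts are summed per gene, which is what tximport does for counts by default, the step looks like this for one kallisto abundance.tsv; the transcript IDs and the transcript-to-gene map are hypothetical (in practice the map would come from the reference GFF).

```python
import csv
import io


def gene_counts(abundance_tsv, tx2gene):
    """Sum kallisto est_counts per gene.

    abundance_tsv: text of a kallisto abundance.tsv (columns target_id,
    length, eff_length, est_counts, tpm). tx2gene: transcript -> gene map
    (hypothetical here). Transcripts missing from the map are ignored.
    """
    totals = {}
    reader = csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t")
    for row in reader:
        gene = tx2gene.get(row["target_id"])
        if gene is None:
            continue
        totals[gene] = totals.get(gene, 0.0) + float(row["est_counts"])
    return totals


# Hypothetical abundance.tsv content and transcript-to-gene map.
example = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "XM_0001.1\t1000\t800\t10.5\t3.2\n"
    "XM_0002.1\t900\t700\t4.5\t1.1\n"
    "XM_0003.1\t500\t300\t2\t0.9\n"
)
tx2gene = {"XM_0001.1": "geneX", "XM_0002.1": "geneX", "XM_0003.1": "geneY"}
print(gene_counts(example, tx2gene))
```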
Analyses
Each analysis was conducted with the R code below for each tissue type. For best results, run it in RStudio: matrices and figures are not saved to file, to avoid overwriting existing results. If you want them saved, edit the code to add saving steps.
Rscript Dehnel_Liver.R
DAVID Gene Set Enrichment and MetaboAnalyst 5.0
Both of these analyses were run online, at the links below. Ideally they would be scripted, but conflicts between packages and R versions prevented this. https://www.metaboanalyst.ca https://david.ncifcrf.gov/summary.jsp