Data from: The constrained maximal expression level owing to haploidy shapes gene content on the mammalian X chromosome

Hurst LD, Ghanbarian AT, Forrest ARR, Fantom Consortium, Huminiecki L

Date Published: December 23, 2015

DOI: http://dx.doi.org/10.5061/dryad.p4s57

 

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data (CC0, Open Data).

Title chicken.all_samples.galGal3.tpm.refgene.osc
Downloaded 33 times
Description Data for the analysis of the chicken chromosome Z. FANTOM5 chicken libraries consisted of 25 CAGE libraries including: chicken aortic smooth muscles, hepatocytes, mesenchymal stem cells, leg buds, wing buds, embryo extra-embryonic tissue (day 7 and day 15), and whole body developmental time course (from 5 hours 30 minutes to 20 days). The number of available datapoints to which TPM was normalized was limited by the number of annotated chicken RefSeq transcripts (which was approximately six times smaller than human, N = 4,426 on autosomes, and N = 241 on chromosome Z). Consequently, the cutoff for a gene to be classified as “on” was adjusted six times higher to 60 TPM.
Download chicken.all_samples.galGal3.tpm.refgene.osc.txt (893.9 Kb)
Download README.txt (48.87 Mb)
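The species-specific cutoffs stated in the descriptions in this package (10 TPM for human, 60 TPM for chicken) can be expressed as a small lookup. The following Python sketch is illustrative only: the names `ON_CUTOFF_TPM` and `is_on`, and the use of an inclusive comparison (`>=`), are assumptions and are not taken from the authors' code.

```python
# Cutoffs as stated in the package descriptions: 10 TPM (human), 60 TPM (chicken).
# The dictionary and function names are hypothetical, chosen for illustration.
ON_CUTOFF_TPM = {"human": 10.0, "chicken": 60.0}


def is_on(tpm, species):
    """Return True if a TPM value reaches the cutoff for the given species.

    Assumption: a value exactly equal to the cutoff counts as "on".
    """
    return tpm >= ON_CUTOFF_TPM[species]


# The chicken cutoff is the human cutoff scaled by six.
assert ON_CUTOFF_TPM["chicken"] == 6 * ON_CUTOFF_TPM["human"]
```

For example, `is_on(12.0, "human")` is true, whereas `is_on(12.0, "chicken")` is false because the chicken cutoff is six times higher.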
Title human.primary_cell.hCAGE.hg19.tpm.refgene.osc
Downloaded 12 times
Description The FANTOM5 dataset for human primary cells.
Download human.primary_cell.hCAGE.hg19.tpm.refgene.osc.txt (96.70 Mb)
Title human.cell_line.hCAGE.hg19.tpm.refgene.osc
Downloaded 9 times
Description The FANTOM5 dataset for human cancer cell-lines.
Download human.cell_line.hCAGE.hg19.tpm.refgene.osc.txt (48.87 Mb)
Title human.tissue.hCAGE.hg19.tpm.refgene.osc
Downloaded 18 times
Description The FANTOM5 dataset for human tissue. CAGE tags were mapped to RefSeq transcripts within +/-500 base pairs (bps) of their transcription start sites (TSSes) and normalized to tags per million (TPM), as previously described [37,45]. A signal of ten TPM was chosen as the cutoff for a gene to be classified as “on” (this cutoff was accepted as the standard for human data throughout the consortium). FANTOM5 is the most comprehensive expression dataset ever generated, including 952 human and 396 mouse tissues, primary cells and cancer cell-lines. FANTOM5 is based on cap analysis of gene expression (CAGE), a unique technology that characterizes TSSes across the entire genome in an unbiased fashion and at single-base resolution [21]. CAGE automatically sums the expression levels of all transcripts beginning at a given TSS.
Download human.tissue.hCAGE.hg19.tpm.refgene.osc.txt (38.51 Mb)
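The tags-per-million scaling mentioned in the description can be illustrated with a minimal Python sketch. It assumes plain library-size scaling (each count divided by the library total and multiplied by one million); the consortium's actual procedure is the one given in the cited references and may differ. The function name `to_tpm` and the toy transcript labels are hypothetical.

```python
def to_tpm(counts):
    """Scale raw tag counts for one library to tags per million (TPM).

    Assumption: simple library-size scaling, not necessarily the
    normalization used to produce the files in this package.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Toy library with hypothetical transcript labels.
example = to_tpm({"transcript_a": 1, "transcript_b": 3})
```

After scaling, the values in one library sum to one million, which makes libraries of different sequencing depth comparable.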
Title raw_Z_Exp_Anc_L
Downloaded 22 times
Description Data for Fig 2 "The comparison of change in gene expression (Z) since the human-chimpanzee common ancestor for five somatic tissues."
Download raw_Z_Exp_Anc_L.csv (2.998 Mb)
Title SUPPLEMENTARY TABLES
Downloaded 24 times
Description Data in Table S3 underlie Fig 4. Data in Table S7 partially underlie Fig 1. Data in Table S4 underlie Fig 3. Data in Tables S10-12 underlie Fig S1.
Download SUPPLEMENTARY TABLES.xlsx (332.9 Kb)
Title data for Fig1
Downloaded 9 times
Description R environment containing data underlying Fig 1. The environment contains the following variables, sorted in the same order as the gene list in refSeqs: chromosome (chromosomal location), chromosome_short (location on autosomes, chrX, or chrY), data_matrix (FANTOM5 data matrix in TPM for human tissues), MAX (maximal expression for each RefSeq), max (maximal expression for each tissue), strata_classification (strata classification for genes on chromosome X), refSeqs_2entrezIDs (Entrez IDs mapped to RefSeqs), boe (the breadth of expression).
Download env_fig1 (10.61 Mb)
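The summary variables listed for this environment (maximal expression per RefSeq, maximal expression per tissue, breadth of expression) can be sketched from a TPM matrix as follows. This Python sketch is an illustration under stated assumptions: breadth of expression is taken here to be the fraction of tissues in which a transcript reaches the 10 TPM cutoff, which may not match the authors' definition, and the function name, the return layout, and the toy identifiers `NM_A` and `NM_B` are hypothetical.

```python
def summarize(matrix, cutoff=10.0):
    """Summarize a TPM matrix given as {refseq_id: [tpm per tissue, ...]}.

    Returns (max_per_gene, max_per_tissue, breadth).
    Assumption: breadth is the fraction of tissues with TPM >= cutoff.
    """
    n_tissues = len(next(iter(matrix.values())))
    # Maximal expression of each transcript across tissues.
    max_per_gene = {gene: max(values) for gene, values in matrix.items()}
    # Maximal expression observed in each tissue across transcripts.
    max_per_tissue = [
        max(values[i] for values in matrix.values()) for i in range(n_tissues)
    ]
    # Fraction of tissues in which each transcript is classified as "on".
    breadth = {
        gene: sum(value >= cutoff for value in values) / n_tissues
        for gene, values in matrix.items()
    }
    return max_per_gene, max_per_tissue, breadth


# Toy matrix: two hypothetical transcripts measured in three tissues.
toy = {"NM_A": [0.0, 12.0, 30.0], "NM_B": [5.0, 2.0, 100.0]}
max_per_gene, max_per_tissue, breadth = summarize(toy)
```

In the toy matrix, `NM_B` has the higher maximal expression but the narrower breadth, which is the kind of contrast the two measures are meant to separate.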
Title GC-contents data for Fig S6 and S7
Downloaded 6 times
Description This R environment contains GC-content data for either proximal promoters or the isochore around the TSS (marked as big). The data are calculated for either the masked or the unmasked genome sequence.
Download env_gc_contents (29.92 Mb)
Title data for Fig S3
Downloaded 12 times
Description Numbers of ENCODE transcription factor binding sites mapped to TSSes of RefSeq genes in symmetrical windows of different sizes (from 250 to 20000 bps), depending on the ENCODE quality cut-off (strict or all).
Download FigS3_data.txt (1.939 Mb)
Title data underlying Fig S8
Downloaded 10 times
Description Breadth of expression and maximal expression are compared in three groups of observations: (1) autosomal paralogs of X-linked genes, (2) other autosomal paralogs matched by age, (3) X-linked paralogs. Newly formed paralogs are defined as those mapped by phylogenetic timing to the taxon Theria or younger. Pre-existing duplications are defined as those descending from duplication nodes mapped by phylogenetic timing to the taxon Amniota or older.
Download FigS8_data.txt (1.159 Mb)
Title data underlying Fig7
Downloaded 9 times
Download Fig7_data.txt (340.2 Kb)
Title TreeFam data for timing of gene duplications in R environments
Downloaded 5 times
Description These files are R environments. Use load() to load them into your R session, then use ls() to view the contents. You may use attach() to load the namespace, or access data members of the environment using the "$" reference operator. There is no warranty for this software.
Download env_duplicator_base (11.80 Mb)
Title Additional TreeFam gene duplication data with duplication timing
Downloaded 7 times
Download env_duplicator_vectors (4.961 Mb)

When using this data, please cite the original publication:

Hurst LD, Ghanbarian AT, Forrest ARR, Fantom Consortium, Huminiecki L (2015) The constrained maximal expression level owing to haploidy shapes gene content on the mammalian X chromosome. PLoS Biology 13(12): e1002315. http://dx.doi.org/10.1371/journal.pbio.1002315

Additionally, please cite the Dryad data package:

Hurst LD, Ghanbarian AT, Forrest ARR, Fantom Consortium, Huminiecki L (2015) Data from: The constrained maximal expression level owing to haploidy shapes gene content on the mammalian X chromosome. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.p4s57