Comparative analyses of central molecular networks uncover variation that can be targeted by biomedical research to develop insights and interventions into disease. The insulin/insulin-like signaling and target of rapamycin (IIS/TOR) molecular network regulates metabolism, growth, and aging. With the development of new molecular resources for reptiles, we show that genes in IIS/TOR are rapidly evolving within amniotes (mammals and reptiles, including birds). Additionally, we find evidence of natural selection that diversified the hormone-receptor binding relationships that initiate IIS/TOR signaling. Our results uncover substantial variation in the IIS/TOR network within and among amniotes and provide a critical step to unlocking information on vertebrate patterns of genetic regulation of metabolism, modes of reproduction, and rates of aging.
TRANSCRIPTOMES.tar
TRANSCRIPTOMES folder includes:
(1) The transcriptome assembly for each of the 18 reptile individuals sequenced (e.g. file name: SM1_centroids_nucleotides.fasta), see list below. These contain the longest open reading frames (ORFs) produced by Trinity, which were then clustered by UCLUST into centroids to reduce redundancy within a single species' transcriptome. A centroid may have collapsed multiple isoforms, truncated transcripts, and alleles from a gene, but it may also have collapsed very recent paralogs. Files are named by sequence identifier (e.g. SM) listed below.
(2) The Trinotate annotation databases for each transcriptome (e.g. file name: trinotate_annotation_report_SM1.xls). The prot_id corresponds to the centroid sequence in the transcriptome .fasta file.
(3) An ortholog key file (Ortholog_Key_3_20_2014.xlsx) that contains a list of each putative ortholog clustered by OrthoMCL, the best BLAST hit to UniProt, the number of species that were included in the ortholog, and the centroid IDs of those species (corresponding to the transcriptome assembly .fasta files). OrthoMCL was run on 74 total species, and we excluded from our alignments 8 species with very poor representation (for a maximum of 66 species contained within the alignments analyzed for the paper).
Sequence ID, Common Name, Species ID
SM1, Snapping turtle, Chelydra serpentina
SM2, Anolis lizard, Anolis sagrei
SM3, California alligator lizard, Elgaria multicarinata
SM4, African house snake, Lamprophis fuliginosus
SM5, Cottonmouth, Agkistrodon piscivorus
SM6, Sunbeam snake, Xenopeltis unicolor
SM7, Alligator, Alligator mississippiensis
SM8, Fence Lizard, Sceloporus undulatus
SM9, Bearded dragon, Pogona vitticeps
SM10, Stinkpot turtle, Sternotherus odoratus
SM11, Sideneck turtle, Pelusios castaneus
SM12, Skink, Scincella lateralis
SM13, Box turtle, Terrapene ornata
SM14, Viper Boa, Candoia aspera
SM15, Gecko, Eublepharis macularius
TC, Western aquatic garter snake, Thamnophis couchii
HS08, Western terrestrial garter snake –lakeshore (fast-living) ecotype, Thamnophis elegans
HS11, Western terrestrial garter snake –meadow (slow-living) ecotype, Thamnophis elegans
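As a starting point for working with the transcriptome files, the following is a minimal Python sketch for reading a centroid .fasta file and recovering the sequence identifier (e.g. SM1) from its file name. The function names read_fasta and sample_id_from_filename are hypothetical helpers, not part of this submission, and the example records are made up. The sketch assumes only that the first whitespace-delimited token of each header line is the centroid ID; check this against the actual headers before relying on it.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the first whitespace-delimited token is the centroid ID.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def sample_id_from_filename(filename: str) -> str:
    """Return the prefix before the first underscore, e.g. 'SM1'."""
    return filename.split("_", 1)[0]


# Made-up records for illustration only; real files would be opened with
# open("SM1_centroids_nucleotides.fasta") and passed to read_fasta().
example = [">centroid_1 len=9", "ATGGCC", "TAA", ">centroid_2", "ATGTAA"]
records = dict(read_fasta(example))
sample = sample_id_from_filename("SM1_centroids_nucleotides.fasta")
```

The resulting dictionary maps centroid IDs to sequences, which can then be matched against the prot_id column of the corresponding Trinotate report.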
ALIGNED_BEST_FIRST_1500_ORTHOLOGS.tar
ALIGNED_BEST_FIRST_15000_ORTHOLOGS folder: “Best” ortholog amino acid and nucleotide alignments. The 104,235 putative orthologs described in ALIGNED_ALL_ORTHOLOGS often contained more than two representative sequences per species. For the first 15,000 putative orthologs (those with the most species included in the alignments), we used UCLUST to find the best representative per species per ortholog by taking the sequence that was closest to the centroid for that ortholog. Alignments containing one representative per species (found by the centroid clustering explained in the methods) are given for Orthologs 1-15,000, regardless of how many species are contained in the centroid. After designating one representative sequence per species, alignments were performed as described in the methods (e.g. MSAProbs followed by TranslatorX). Amino acid and nucleotide alignments are given in their respective folders.
- Also included in each folder is an ortholog key file (Ortholog_Key_3_20_2014.xlsx) that contains a list of each putative ortholog clustered by OrthoMCL, the best BLAST hit to UniProt, the number of species that were included in the ortholog, and the centroid IDs of those species (corresponding to the transcriptome assembly .fasta file). OrthoMCL was run on 74 total species, and we excluded from our alignments 8 species with very poor representation (for a maximum of 66 species contained within the alignments analyzed for the paper).
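The idea of keeping one representative per species per ortholog can be illustrated with a short Python sketch. This is not the UCLUST procedure used for these alignments; it only shows the selection rule described above (keep, for each species, the sequence closest to the ortholog centroid). The function name best_per_species, the input tuples, and the identity values are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_per_species(hits: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """Pick, for each species, the sequence with the highest identity to the centroid.

    Each hit is (species_id, sequence_id, identity_to_centroid).
    On ties, the first sequence encountered is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for species, seq_id, identity in hits:
        if species not in best or identity > best[species][1]:
            best[species] = (seq_id, identity)
    return {species: seq_id for species, (seq_id, _) in best.items()}


# Made-up identities for one hypothetical ortholog cluster.
hits = [
    ("SM1", "seq_a", 0.91),
    ("SM1", "seq_b", 0.97),
    ("SM5", "seq_c", 0.88),
]
chosen = best_per_species(hits)
```

Here chosen holds one sequence ID per species, which is the property the alignments in this folder are described as having.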
ALIGNED_ALL_ORTHOLOGS.tar
ALIGNED_ALL_ORTHOLOGS folder: We used OrthoMCL to cluster ORF-centroids from all the species included in this study into putative orthologs. This folder contains the putative ortholog amino acid alignments and corresponding nucleotide alignments for every ortholog clustered by OrthoMCL in which two or more species were present (104,235 total alignments). Each ortholog is available as a separate file. Amino acid and nucleotide alignments are given in their respective folders.
- Also included is an ortholog key file (Ortholog_Key_3_20_2014.xlsx) that contains a list of each putative ortholog clustered by OrthoMCL, the best BLAST hit to UniProt, the number of species that were included in the ortholog, and the centroid IDs of those species (corresponding to the transcriptome assembly .fasta file). OrthoMCL was run on 74 total species, and we excluded from our alignments 8 species with very poor representation (for a maximum of 66 species contained within the alignments analyzed for the paper).
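Because each ortholog has an amino acid alignment and a corresponding nucleotide alignment, a simple consistency check can be useful before downstream analysis. The Python sketch below takes two dictionaries mapping sequence IDs to aligned sequences. It assumes that both files use the same sequence IDs and that the nucleotide alignment is a codon alignment exactly three times the length of the amino acid alignment; neither assumption is stated in this Readme, so treat any reported problem as a prompt to inspect the files rather than as an error. The function name check_alignment_pair and the example sequences are hypothetical.

```python
from typing import Dict, List


def check_alignment_pair(aa: Dict[str, str], nt: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems: List[str] = []
    if set(aa) != set(nt):
        problems.append("identifier sets differ")
    aa_lengths = {len(seq) for seq in aa.values()}
    nt_lengths = {len(seq) for seq in nt.values()}
    if len(aa_lengths) > 1:
        problems.append("amino acid sequences differ in length")
    if len(nt_lengths) > 1:
        problems.append("nucleotide sequences differ in length")
    if len(aa_lengths) == 1 and len(nt_lengths) == 1:
        # Assumption: one codon (3 nucleotide columns) per amino acid column.
        if next(iter(nt_lengths)) != 3 * next(iter(aa_lengths)):
            problems.append("nucleotide length is not 3x amino acid length")
    return problems


# Made-up alignment pair for illustration only.
aa_example = {"sp1": "M-K", "sp2": "MAK"}
nt_example = {"sp1": "ATG---AAA", "sp2": "ATGGCCAAA"}
issues = check_alignment_pair(aa_example, nt_example)
```

An empty list means the pair passed all three checks under the stated assumptions.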
IIS-TOR_TEST_GENES.tar
IIS-TOR_TEST_GENES folder: The hand-curated nucleotide and amino acid alignments for 61 IIS/TOR network genes. The supplementary text in the PNAS paper explains the quality controls. These were the final alignments used in the PAML and Ka/Ks analyses. This folder also contains an Excel file with the annotation for the IIS/TOR test genes.
Species.Identifier
Species.Identifiers file. For the species included in this study, this file contains (1) the Sequence Identifier (for the transcriptomes we developed) or the Ensembl identifier (for the genomes and transcriptomes we downloaded); (2) the species name; (3) the common name; (4) any "nicknames" that were used during the analysis; and (5) the identifiers used in the alignments. Further details on the individuals used in this study can be found in Supplementary Table 2 of the PNAS paper (McGaugh et al. 2015 PNAS).
Ortholog_Key
An ortholog key file (Ortholog_Key_3_20_2014.xlsx) that contains a list of each putative ortholog clustered by OrthoMCL, the best BLAST hit to UniProt, the number of species that were included in the ortholog, and the centroid IDs of those species (corresponding to the transcriptome assembly .fasta file). OrthoMCL was run on 74 total species, and we excluded from our alignments 8 species with very poor representation (for a maximum of 66 species contained within the alignments analyzed for the paper).
CONTROL_GENES.tar
CONTROL_GENES folder: The final nucleotide and amino acid alignments for the 1417 "control genes". This folder also contains an Excel file with the annotation for the control genes.
Readme
The Readme file explaining the contents of this Dryad submission.