Readme
Dataset 5: De novo assembled transcriptomes for each tissue type from frogs in Experiment B

Sequence identifier line descriptions for both nucleotide and amino acid sequences

1. Sequence identifier line descriptions for nucleotide sequence .fasta files (generated by Trinity de novo transcriptome assembly):

Example:

>c200577_g1_i2 len=1208 path=[4973:0-85 2889:86-86 2890:87-128 9019:129-141 9032:142-231 4708:232-287 1536:288-312 9129:313-340 9157:341-354 9171:355-358 8109:359-402 9279:403-416 9293:417-439 9311:440-452 9342:453-454 9344:455-462 9335:463-471 44:472-490 1172:491-494 1334:495-495 3354:496-546 2076:547-571 2701:572-572 2101:573-577 9462:578-590 9475:591-598 9483:599-604 2166:605-614 9505:615-617 5571:618-618 370:619-633 9518:634-642 9527:643-657 9543:658-658 267:659-683 292:684-689 9564:690-710 5656:711-711 8661:712-733 5676:734-753 4336:754-808 9637:809-848 9675:849-871 5801:872-895 9690:896-908 9703:909-913 9708:914-914 9709:915-922 9717:923-937 9732:938-943 9738:944-954 9749:955-968 9761:969-997 9790:998-1028 9817:1029-1029 9818:1030-1039 9828:1040-1047 9836:1048-1054 9848:1055-1055 9849:1056-1058 9852:1059-1071 851:1072-1074 854:1075-1085 865:1086-1104 9888:1105-1105 9889:1106-1111 9895:1112-1121 9905:1122-1133 9917:1134-1145 9929:1146-1146 926:1147-1160 9944:1161-1182 4233:1183-1207]
CCGCGACCGCAGGCGGCAGGAGCCGAACGAGAAGAACTCACAGAGGCGCCGGGATGAATAAAACGTCTGG
TTTTATTTCAGGTCCATTTTTTTACAGAAGATATATAAACATTTCCATAAAAGATTTCTCAAGCTTGCAT
AGAAAATTATAATATCTGGTTCAACTCTAATATTATATAAAAGCGCAAGTGCTTTTTGCCAAGTTTATTA
CATACATATATATTACATATGCTATATTATAAATAGGAAATATACTTCTCTATAGAAAGCTCCAGCGCGT
TCTGGGCACTTTTACGTCACTTTATGATCGATTGGGGGGTCCGACCTCTGGGACGCTCACGGATCCTAAC
AAATGCAGCACTTTACTGGTTTTTCCCTGCACAATGGCCCTCGCGGCTACCCACTGAGGCTGGCCCCATT
CTCTTCAATGCAATGGGTGGCCGGGGCACCACCGTGCAAGGAAAAACAGTTAAGTGCTGTGGCCCTTACA
TTCTCAGGATCAATGGGGGCTCCAGCCCCCAATCATAAAGTGATGGTATATCCTAGCAATATGACATCAC
TTTAGAAAACCCTTTTAGTTTGTAGAATAATAGCTCTTAAGTGACTCCTCTCCCGATGGGTACGTGTAAC
CTCGTCTGCTAGGACCGAGATGGTCCATTGAATAATACCTATTACGTTACTACGCCATCGTCCACATTAC
ATAATCACTTGACTAAATGGCTGCTGGAACAGCGGGAATATTACAGAAAATATAATCCAGAATAGGATGG
AGTTCTTGAGTATTTACATGTACAGTAAGTTACCTTATGCAGAACACTGTAAACTTATGAGATTGAAATC
TATGTGGCCCACGGGAGCCATTACATTTAAGGGACTCTCCAGGAATATATAATTCACTCTTAATAACTGA
GCACGTGTTATCCTGGCTTTGAAAAATTAATGCCTTGTACTAAGAATAGGTCACCAATGTTGGACCAGTG
GAGGGCGCCGATCAGCTGGTTCAAGGGGCCATGGCGCGTGAGCCTCTTCCTTTCTTTGCATCGTATTAGC
ACATGTTGTAGTTGAATGTGACACCGCTCCATTCCTAGTAGAGCCCCATACAAAAAGTATGGCGGGTGCT
AGTAGGAGCTCATGTAACAAACAGCTGACCAGTGGGGATGTCAAGAGTCGGACCCCACCAATCCAATATT
GATGGCCTATCCTTCAAA

Interpretation:
[>] = The '>' symbol marks the description line of each transcript and distinguishes it from the sequence data.
[c200577_g1_i2] = A unique transcript identifier: the scaffold identifier (i.e., 'c200577') followed by the gene and isoform identifiers (i.e., 'g1' and 'i2').
[len=1208] = The length of the sequence (i.e., 1208 nucleotides).
[path=[4973:0-85 2889:86-86...]] = The rest of the header lists the nodes of the de Bruijn graph (built during the de novo transcriptome assembly) traversed by the transcript. Each entry has the form node:start-end, giving the node identifier and the range of transcript positions covered by that node. Positions are counted from 0, so the final entry (4233:1183-1207) ends at position 1207, the last of the 1208 nucleotides.

Nucleic acid codes (standard IUB/IUPAC):
A --> adenine
C --> cytosine
G --> guanine
T --> thymine
U --> uracil
R --> G A (purine)
Y --> T C (pyrimidine)
K --> G T (keto)
M --> A C (amino)
S --> G C (strong)
W --> A T (weak)
B --> G T C
D --> G A T
H --> A C T
V --> G C A
N --> A G C T (any)
- (hyphen) --> gap of indeterminate length

2. Sequence identifier line descriptions for amino acid sequence .fasta files (generated by TransDecoder):

Example:

>c100587_g1_i1|m.2453 c100587_g1_i1|g.2453 ORF c100587_g1_i1|g.2453 c100587_g1_i1|m.2453 type:complete len:218 (+) c100587_g1_i1:76-729(+)
MSSKVSRDTLYEAVREVLGGAKRKKRKFLQTVELQISLKNYDPQKDKRFSGTVRLKSTPR
PKFSVCVLGDQQHCDEAKAVDMSHMDIDALKKLNKNKKMVKKLAKKYDAFLASESLIKQI
PRILGPGLNKAGKFPSLLTHNENLVAKVDEVKSTIKFQMKKVLCLAVAVGHVKMTEEELV
YNIHLAINFLVSLLKKNWQNVRALYVKSTMGKPQRLY*

Interpretation:
[>] = The '>' symbol marks the description line of each polypeptide sequence and distinguishes it from the sequence data.
[c100587_g1_i1|m.2453] = A unique protein identifier made of the source transcript identifier plus 'm.(number)'. It corresponds to the open reading frame (ORF) predicted in that transcript.
[c100587_g1_i1|g.2453] = The original transcript identifier (i.e., scaffold 'c100587' with gene and isoform identifiers 'g1' and 'i1'), followed by 'g.(number)'. After the word 'ORF', the header repeats these two identifiers in reverse order.
[type:complete] = The 'type' attribute indicates whether the protein is complete, i.e., the ORF contains both a start and a stop codon. Other values: '5prime_partial' means the start codon is missing, and presumably part of the N-terminus; '3prime_partial' means the stop codon is missing, and presumably part of the C-terminus; 'internal' means the ORF is both 5prime_partial and 3prime_partial.
[len:218] = The length of the polypeptide sequence in characters (i.e., 218). In this example the count includes the terminal stop symbol '*', so the protein has 217 amino acids plus the stop. It matches the ORF coordinates: (729 - 76 + 1) / 3 = 218 codons, the last being the stop codon.
[(+) c100587_g1_i1:76-729(+)] = The strand on which the coding region lies, (+) or (-), followed by the coordinates of the ORF in the transcript sequence (here, nucleotides 76 to 729 of c100587_g1_i1).
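To show how the header fields described in sections 1 and 2 fit together, the Python sketch below splits both header styles into their parts and runs two sanity checks. It is a minimal sketch written against the two example headers in this README, not against a formal Trinity or TransDecoder specification. The function names (`parse_trinity_header`, `parse_transdecoder_header`, `path_is_contiguous`, `orf_codons`) are hypothetical, and only the `c#_g#_i#` transcript identifiers used here are recognised. The contiguity check encodes an assumption (that path segments tile the transcript without gaps or overlaps, as they do in the example from 0 to 1207) rather than a documented guarantee.

```python
import re

# Hypothetical helpers; the patterns are based only on the two example headers in this
# README ("c#_g#_i#" identifiers), not on a formal file-format specification.
TRINITY_ID = re.compile(r"(?P<scaffold>c\d+)_(?P<gene>g\d+)_(?P<isoform>i\d+)")
PATH_NODE = re.compile(r"(\d+):(\d+)-(\d+)")
ORF_COORDS = re.compile(r"(\S+):(\d+)-(\d+)\(([+-])\)\s*$")


def parse_trinity_header(line):
    """Split a Trinity description line into identifier parts, length and graph path."""
    transcript_id = line.lstrip(">").split()[0]
    id_parts = TRINITY_ID.fullmatch(transcript_id)
    if id_parts is None:
        raise ValueError(f"unexpected transcript identifier: {transcript_id!r}")
    length = int(re.search(r"\blen=(\d+)", line).group(1))
    # Each path entry is node:start-end (0-based, inclusive transcript positions).
    path_text = line.split("path=[", 1)[1]
    path = [(int(n), int(s), int(e)) for n, s, e in PATH_NODE.findall(path_text)]
    return {"transcript_id": transcript_id, **id_parts.groupdict(),
            "length": length, "path": path}


def parse_transdecoder_header(line):
    """Split a TransDecoder description line into IDs, ORF type, length and coordinates."""
    protein_id = line.lstrip(">").split()[0]
    transcript_id, model_id = protein_id.split("|")
    orf_type = re.search(r"\btype:(\S+)", line).group(1)
    length = int(re.search(r"\blen:(\d+)", line).group(1))
    # The last token holds transcript:start-end(strand).
    source, start, end, strand = ORF_COORDS.search(line).groups()
    return {"protein_id": protein_id, "transcript_id": transcript_id,
            "model_id": model_id, "type": orf_type, "length": length,
            "source": source, "start": int(start), "end": int(end), "strand": strand}


def path_is_contiguous(record):
    """Assumed sanity check: path segments tile positions 0..length-1 with no gaps or overlaps."""
    expected_start = 0
    for _node, start, end in record["path"]:
        if start != expected_start:
            return False
        expected_start = end + 1
    return expected_start == record["length"]


def orf_codons(record):
    """Codons spanned by the inclusive ORF coordinates; abs() ignores coordinate order."""
    return (abs(record["end"] - record["start"]) + 1) // 3


if __name__ == "__main__":
    header = (">c100587_g1_i1|m.2453 c100587_g1_i1|g.2453 ORF c100587_g1_i1|g.2453 "
              "c100587_g1_i1|m.2453 type:complete len:218 (+) c100587_g1_i1:76-729(+)")
    orf = parse_transdecoder_header(header)
    print(orf["type"], orf["length"], orf_codons(orf))  # complete 218 218
```

For a complete ORF, comparing `orf_codons` with `len` is a quick way to confirm that a header's coordinates and length agree, as they do in the example (654 nucleotides, 218 codons).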
Accepted amino acid codes (standard IUB/IUPAC):
A  ALA  alanine
B  ASX  aspartate or asparagine
C  CYS  cysteine
D  ASP  aspartate
E  GLU  glutamate
F  PHE  phenylalanine
G  GLY  glycine
H  HIS  histidine
I  ILE  isoleucine
K  LYS  lysine
L  LEU  leucine
M  MET  methionine
N  ASN  asparagine
P  PRO  proline
Q  GLN  glutamine
R  ARG  arginine
S  SER  serine
T  THR  threonine
U  SEC  selenocysteine
V  VAL  valine
W  TRP  tryptophan
Y  TYR  tyrosine
Z  GLX  glutamate or glutamine
X       any
*       translation stop
-       gap of indeterminate length
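The nucleotide and amino acid code tables can also be used to sanity-check sequence lines. The Python sketch below is a minimal illustration with hypothetical names (`NUCLEOTIDE_CODES`, `NUCLEOTIDE_ALPHABET`, `AMINO_ACID_ALPHABET`, `invalid_characters`, `iupac_matches`). The alphabets are transcribed from the two tables, and comparisons are case-insensitive, which is an assumption not stated in this README. `iupac_matches` tests whether a concrete nucleotide sequence is consistent with a pattern written in ambiguity codes.

```python
# Hypothetical names; codes transcribed from the two IUB/IUPAC tables in this README.
# Each nucleotide code maps to the bases it stands for.
NUCLEOTIDE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
NUCLEOTIDE_ALPHABET = set(NUCLEOTIDE_CODES) | {"-"}  # "-" = gap of indeterminate length
AMINO_ACID_ALPHABET = set("ABCDEFGHIKLMNPQRSTUVWYZX") | {"*", "-"}


def invalid_characters(sequence, alphabet):
    """Return characters of sequence that are not accepted codes (case-insensitive, assumed)."""
    return {ch for ch in sequence.upper() if ch not in alphabet}


def iupac_matches(pattern, sequence):
    """True if every base in sequence is allowed by the code at the same position in pattern."""
    if len(pattern) != len(sequence):
        return False
    return all(base in NUCLEOTIDE_CODES.get(code, "")
               for code, base in zip(pattern.upper(), sequence.upper()))


if __name__ == "__main__":
    # First 20 residues of the TransDecoder example protein.
    print(invalid_characters("MSSKVSRDTLYEAVREVLGG", AMINO_ACID_ALPHABET))  # set()
    print(iupac_matches("RYN", "GCA"))  # True: R allows G, Y allows C, N allows A
```

Note that 'X' is an accepted amino acid code ("any") but not a nucleotide code, so the same character can be valid in a .pep file and invalid in a transcript .fasta file.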