Proteome database of 36 million proteins from 4,351 species, including marine microbial sequences
Citation
Rius, Mariana; Collier, Jackie; Rest, Joshua (2023), Proteome database of 36 million proteins from 4,351 species, including marine microbial sequences, Dryad, Dataset, https://doi.org/10.5061/dryad.4tmpg4ffn
Abstract
A fasta-formatted database of 36,866,870 predicted proteins representing 4,351 unique species from 117 phyla.
Methods
A database of 36,866,870 predicted proteins representing 4,351 unique species from 117 phyla (see table below) was constructed from the UniProt Reference Proteome (RP) at the 35% co-membership threshold, including 4,295 Representative Proteome Groups (RPGs) (Chen et al. 2011), together with all taxonomically identifiable transcriptomes of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) (Keeling et al. 2014) that had been processed through WinstonCleaner (https://github.com/kolecko007/WinstonCleaner). The database also includes proteins inferred from the annotated and assembled genomes of Aurantiochytrium limacinum ATCC MYA-1381, Schizochytrium aggregatum ATCC 28209, and Aplanochytrium kerguelensis PBS07 from the U.S. Department of Energy’s Joint Genome Institute (JGI); all Pfam PF00494 hits from the Aurantiochytrium sp. KH105 proteome, obtained from the Okinawa Institute of Science and Technology Marine Genomics Unit genome browser; all of UniProt's annotated Hondaea fermentalgiana proteins; and the annotated proteins of the breviate Lenisia limosa and its associated mutualistic epibionts (Hamann et al. 2016).
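As a quick sanity check on a downloaded copy, the number of records in the FASTA file can be compared with the protein count stated above. The following is a minimal sketch in Python, not part of the authors' workflow. The function names and the toy records are hypothetical, and it assumes only the standard FASTA layout (a header line starting with ">" followed by one or more sequence lines).

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect them until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of records, total number of residues)."""
    count = 0
    residues = 0
    for _, sequence in read_fasta(lines):
        count += 1
        residues += len(sequence)
    return count, residues


# Toy input with made-up headers and sequences, for illustration only.
example = [
    ">protein_a hypothetical header",
    "MKT",
    "LLV",
    ">protein_b",
    "MAS",
]
print(summarize(example))
```

To apply this to the real file, pass an open file handle instead of the toy list, for example `with open(path) as handle: summarize(handle)`, where `path` is wherever the downloaded database was saved. Because the function streams line by line, it does not need to hold all sequences in memory.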
Table: Unique phyla represented in the reference proteome database.
References
Chen C, Natale DA, Finn RD, Huang H, Zhang J, Wu CH, Mazumder R. 2011. Representative Proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation. PLoS ONE. 6(4):e18910.
Keeling PJ, Burki F, Wilcox HM, Allam B, Allen EE, Amaral-Zettler LA, Armbrust EV, Archibald JM, Bharti AK, Bell CJ, et al. 2014. The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol. 12(6):e1001889.
Hamann E, Gruber-Vodicka H, Kleiner M, Tegetmeyer HE, Riedel D, Littmann S, Chen J, Milucka J, Viehweger B, Becker KW, et al. 2016. Environmental Breviatea harbour mutualistic Arcobacter epibionts. Nature. 534:254–258.
Funding
Gordon and Betty Moore Foundation, Award: 4982