Dryad

Proteome database of 36 million proteins from 4,351 species, including marine microbial sequences

Cite this dataset

Rius, Mariana; Collier, Jackie; Rest, Joshua (2023). Proteome database of 36 million proteins from 4,351 species, including marine microbial sequences [Dataset]. Dryad. https://doi.org/10.5061/dryad.4tmpg4ffn

Abstract

A FASTA-formatted database of 36,866,870 predicted proteins representing 4,351 unique species from 117 phyla.
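The database is a single FASTA file with tens of millions of records, so it is best read as a stream rather than loaded whole. The sketch below is a minimal, dependency-free FASTA reader that counts records and residues, for example to check a download against the stated 36,866,870 proteins. It is not part of the dataset. The function names `read_fasta` and `summarize` are hypothetical, and the demo headers are invented.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time.

    Streaming keeps memory use roughly constant, which matters for a file
    holding tens of millions of records.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


def summarize(handle: TextIO) -> Tuple[int, int]:
    """Return (record count, total residue count) for a FASTA stream."""
    n_records = 0
    n_residues = 0
    for _header, seq in read_fasta(handle):
        n_records += 1
        n_residues += len(seq)
    return n_records, n_residues


if __name__ == "__main__":
    # Tiny in-memory example; the headers are made up for illustration.
    demo = io.StringIO(">prot1 example\nMKT\nAA\n>prot2 example\nMMM\n")
    print(summarize(demo))  # (2, 8)
```

On the real file, something like `with open(path) as fh: summarize(fh)` would apply. The file name and whether the download is compressed are assumptions here. If it is gzip-compressed, `gzip.open(path, "rt")` returns a text handle that the same functions accept.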

Methods

A database of 36,866,870 predicted proteins representing 4,351 unique species from 117 phyla (see table below) was constructed from the UniProt Reference Proteome (RP) set at the 35% co-membership threshold, which includes 4,295 Representative Proteome Groups (RPGs) (Chen et al. 2011), together with all taxonomically identifiable transcriptomes of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) (Keeling et al. 2014) that were processed through WinstonCleaner (https://github.com/kolecko007/WinstonCleaner). The database also included: proteins inferred from the annotated and assembled genomes of Aurantiochytrium limacinum ATCC MYA-1381, Schizochytrium aggregatum ATCC 28209, and Aplanochytrium kerguelensis PBS07, obtained from the U.S. Department of Energy’s Joint Genome Institute (JGI); all Aurantiochytrium sp. KH105 proteins with Pfam PF00494 hits, obtained from the Okinawa Institute of Science and Technology Marine Genomics Unit genome browser; all of UniProt's annotated Hondaea fermentalgiana proteins; and the annotated proteins of the breviate Lenisia limosa and its associated mutualistic epibionts (Hamann et al. 2016).
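Because most records come from UniProt Reference Proteomes, one way to explore taxonomic coverage is to read the organism fields in UniProt-style FASTA headers. These headers carry `OS=` (organism name) and `OX=` (NCBI taxonomy identifier). The sketch below tallies records per `OX=` value. It assumes, without confirmation from this record, that the UniProt-derived headers in this file keep that convention. MMETSP- and JGI-derived records probably use different headers, so they are counted as "unparsed". An `OX=` value can also point to a strain- or subspecies-level taxon, so the number of distinct IDs need not match the species count reported above. The names `count_taxa` and `OX_PATTERN` and all demo headers are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# UniProt FASTA headers carry the NCBI taxonomy identifier as "OX=<digits>".
OX_PATTERN = re.compile(r"(?:^|\s)OX=(\d+)")


def count_taxa(headers: Iterable[str]) -> Counter:
    """Count records per NCBI taxonomy ID parsed from UniProt-style headers.

    Headers without an OX= field (e.g. records from sources that use a
    different header convention) are tallied under "unparsed".
    """
    counts: Counter = Counter()
    for header in headers:
        match = OX_PATTERN.search(header)
        counts[match.group(1) if match else "unparsed"] += 1
    return counts


if __name__ == "__main__":
    # Illustrative headers only; accessions and the last ID are invented.
    demo_headers = [
        "sp|Q00001|PROT1_HUMAN Example protein OS=Homo sapiens OX=9606 PE=1 SV=1",
        "tr|Q00002|Q00002_HUMAN Example protein OS=Homo sapiens OX=9606 PE=4 SV=1",
        "tr|Q00003|Q00003_MOUSE Example protein OS=Mus musculus OX=10090 PE=4 SV=1",
        "contig_00001 hypothetical non-UniProt header",
    ]
    taxa = count_taxa(demo_headers)
    print(taxa)
    print(len([k for k in taxa if k != "unparsed"]))  # 2 distinct taxonomy IDs
```

The header strings can come from any FASTA reader that strips the leading `>`. Keeping unparsed records as their own bucket shows how much of the file falls outside the UniProt convention, so those records are not silently dropped.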

Unique phyla represented in the reference proteome database.

Archaea: Candidatus, Caudovirales, Crenarchaeota, Euryarchaeota, Hyperthermophilic, Nanoarchaeota, Thaumarchaeota

Bacteria: Abditibacteriota, Acidobacteria, Actinobacteria, Aquificae, Armatimonadetes, bacterium, Bacteroidetes, Balneolaeota, Calditrichaeota, candidate, Candidatus, Chlamydiae, Chlorobi, Chloroflexi, Chrysiogenetes, Coprothermobacterota, Cyanobacteria, Deferribacteres, Deinococcus-Thermus, Dictyoglomi, Elusimicrobia, Fibrobacteres, Firmicutes, Fusobacteria, Gemmatimonadetes, Haloplasmatales, Ignavibacteriae, Kiritimatiellaeota, Lentisphaerae, Natronospirillum, Nitrospinae, Nitrospirae, Planctomycetes, Proteobacteria, Rhodothermaeota, Spirochaetes, Synergistetes, Tenericutes, Thermobaculum, Thermodesulfobacteria, Thermotogae, Vampirococcus, Verrucomicrobia

Eukaryota: Annelida, Apicomplexa, Apusomonadidae, Arthropoda, Ascomycota, Bacillariophyta, Basidiomycota, Bigyra, Blastocladiomycota, Bolidophyceae, Brachiopoda, Breviatea, Cercozoa, Chlorophyta, Choanoflagellata, Chordata, Chromeraceae, Chromerida, Chrysophyceae, Chytridiomycota, Ciliophora, Cnidaria, Cryptomycota, Cryptophyta, Dictyochophyceae, Dinophyceae, Discosea, Echinodermata, Endomyxa, Euglenozoa, Evosea, Filasterea, Foraminifera, Fornicata, Glaucocystophyceae, Haptista, Heterolobosea, Ichthyosporea, Microsporidia, Mollusca, Mucoromycota, Nematoda, Oomycetes, Palpitomonas, Parabasalia, Pelagophyceae, Perkinsozoa, Phaeophyceae, Pinguiophyceae, Placozoa, Platyhelminthes, Porifera, Raphidophyceae, Rhodophyta, Rotifera, Rotosphaerida, Stereomyxa, Streptophyta, Synchromophyceae, Synurophyceae, Tardigrada, Tubulinea, Vitrellaceae, Xanthophyceae, Zoopagomycota

References

Chen C, Natale DA, Finn RD, Huang H, Zhang J, Wu CH, Mazumder R. 2011. Representative Proteomes: A Stable, Scalable and Unbiased Proteome Set for Sequence Analysis and Functional Annotation. PLOS ONE. 6(4):e18910.

Keeling PJ, Burki F, Wilcox HM, Allam B, Allen EE, Amaral-Zettler LA, Armbrust EV, Archibald JM, Bharti AK, Bell CJ, et al. 2014. The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol. 12(6):e1001889.

Hamann E, Gruber-Vodicka H, Kleiner M, Tegetmeyer HE, Riedel D, Littmann S, Chen J, Milucka J, Viehweger B, Becker KW, et al. 2016. Environmental Breviatea harbour mutualistic Arcobacter epibionts. Nature. 534:254–258. 


Funding

Gordon and Betty Moore Foundation, Award: 4982