Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases
Cite this dataset
Taujale, Rahil et al. (2020). Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases [Dataset]. Dryad. https://doi.org/10.5061/dryad.v15dv41sh
The GT-A sequences were collected by a similarity search using multiply aligned, manually curated GT-A fold profiles. The sequences were then aligned to the profiles to determine the GT-A domain boundaries and insertions.
This dataset includes all putative GT-A fold sequences that belong to one of the 53 GT-A fold families. These were collected by searching the NCBI nr and UniProt proteomes databases. The file hierarchy.tsv contains a table listing the hierarchy of the families. For each level and family, there are corresponding _nr.fasta, _nr.tsv, _uniprot.fasta and _uniprot.tsv files, which contain the sequences from the NCBI nr and UniProt proteomes databases in FASTA and TSV formats, respectively.
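The file layout described above can be navigated with a short script. The following is a minimal Python sketch, not part of the dataset: the helper names expected_files and read_fasta are hypothetical, and the family label "GT2" is used only as an illustrative file prefix, on the assumption that each file name is the level or family name followed by one of the four suffixes. The column layout of hierarchy.tsv and of the per-family .tsv files is not documented here, so the sketch does not parse them.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# The four per-level/per-family suffixes named in the dataset description.
SUFFIXES = ("_nr.fasta", "_nr.tsv", "_uniprot.fasta", "_uniprot.tsv")


def expected_files(name: str, root: str = ".") -> List[Path]:
    """Build the four file paths expected for one level or family.

    Assumption: files are named <name><suffix> directly under `root`.
    """
    return [Path(root) / f"{name}{suffix}" for suffix in SUFFIXES]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Accepts any iterable of lines, e.g. an open file handle.
    Sequences wrapped across several lines are joined into one string.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # "GT2" is an illustrative prefix; check hierarchy.tsv for actual names.
    for path in expected_files("GT2"):
        if path.suffix == ".fasta" and path.exists():
            with path.open() as handle:
                count = sum(1 for _ in read_fasta(handle))
            print(f"{path.name}: {count} sequences")
```

Because read_fasta is a generator, it reads large nr-derived files one record at a time instead of loading them into memory.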
National Institutes of Health, Award: R01 GM130915
National Institutes of Health, Award: T32 GM107004