Data from: Evolution of steroid receptor coactivator Taiman in arthropods
Data files
Jun 02, 2026 version files 780.76 KB
- 68_representative_arthropod_sequences_used_in_Fig.S3-S18.fasta (114.13 KB)
- Figure_6_AE1-containing_sequences.fasta (77.17 KB)
- MEME_suite_motifs_command_lines.txt (1.91 KB)
- README.md (12.50 KB)
- Suppl_Figure_S1_sequences_for_alignment.fasta (343.53 KB)
- Suppl_Figure_S10_sequences_L5_for_alignment.fasta (4.08 KB)
- Suppl_Figure_S10_sequences_L5_for_MEME.fasta (3.25 KB)
- Suppl_Figure_S10_sequences_L6_for_alignment.fasta (3.86 KB)
- Suppl_Figure_S10_sequences_L6_for_MEME.fasta (3.08 KB)
- Suppl_Figure_S11_sequences_PPxY_for_alignment.fasta (4.45 KB)
- Suppl_Figure_S12_sequences_bi-NLS_for_alignment.fasta (3.95 KB)
- Suppl_Figure_S12_sequences_biNLS_C-terminal_part_for_MEME.fasta (2.45 KB)
- Suppl_Figure_S12_sequences_biNLS_N-terminal_part_for_MEME.fasta (2.66 KB)
- Suppl_Figure_S12_sequences_N-cNLS_for_alignment.fasta (3.24 KB)
- Suppl_Figure_S12_sequences_N-cNLS_for_MEME_Trichoptera-Diptera.fasta (650 B)
- Suppl_Figure_S12_sequences_N-cNLS_for_MEME_Zygentoma-Coleoptera.fasta (1.88 KB)
- Suppl_Figure_S14_sequences_AIAILM_for_alignment.fasta (3.88 KB)
- Suppl_Figure_S15_sequences_Q-rich1_for_alignment.fasta (4.50 KB)
- Suppl_Figure_S15_sequences_Q-rich2_for_alignment.fasta (3.73 KB)
- Suppl_Figure_S16_sequences_PNV_for_alignment.fasta (4.65 KB)
- Suppl_Figure_S17_sequences_DLELGL_for_alignment.fasta (3.47 KB)
- Suppl_Figure_S18_sequences_L7_for_alignment.fasta (2.39 KB)
- Suppl_Figure_S2_sequences_for_alignment.fasta (34.90 KB)
- Suppl_Figure_S3_sequences_bHLH_for_alignment.fasta (6.44 KB)
- Suppl_Figure_S4_sequences_PAS-A_for_alignment.fasta (6.48 KB)
- Suppl_Figure_S5-S6_sequences_PAS-B_for_alignment.fasta (9.71 KB)
- Suppl_Figure_S7_sequences_of_N-terminal_TAI_region_for_alignment.fasta (4.05 KB)
- Suppl_Figure_S7_sequences_of_N-terminal_TAI_region_for_MEME.fasta (4.05 KB)
- Suppl_Figure_S8_sequences_L1_for_alignment.fasta (3.49 KB)
- Suppl_Figure_S8_sequences_L1_for_MEME.fasta (3.33 KB)
- Suppl_Figure_S8_sequences_L2_for_alignment.fasta (3.74 KB)
- Suppl_Figure_S8_sequences_L2_for_MEME.fasta (2.98 KB)
- Suppl_Figure_S9_sequences_L3_for_alignment.fasta (3.85 KB)
- Suppl_Figure_S9_sequences_L3_for_MEME.fasta (3.08 KB)
- Suppl_Figure_S9_sequences_L4_for_alignment.fasta (3.88 KB)
- Suppl_Figure_S9_sequences_L4_for_MEME.fasta (3.03 KB)
- Table_S1_Accession_numbers_for_Figure_1_Phylogeny_of_TAI_and_SRC_proteins.xlsx (23.20 KB)
- Table_S2_Accession_numbers_of_68_representative_arthropod_sequences_used_in_Fig.S3-S18.xlsx (14.16 KB)
- Table_S3_Accession_numbers_for_gene_models_in_Figure_5_.xlsx (10.24 KB)
- Table_S4_Accession_numbers_for_sequences_used_in_Figure_6.xlsx (12.43 KB)
- Table_S5_Accession_numbers_for_sequences_used_in_Figure_7A.xlsx (11.51 KB)
- Table_S6_Accession_numbers_for_sequences_used_in_Figure_7B.xlsx (10.77 KB)
Abstract
Insect steroid receptor coactivator TAIMAN (TAI) belongs to the p160/SRC/NCoA family of proteins together with the mammalian Steroid/Nuclear Receptor Coactivators 1-3. In the past two decades, TAI has been established as an indispensable component of both the juvenile hormone (JH) and ecdysone signaling pathways. Although association with the JH and ecdysone pathways has dominated TAI research, several publications have linked insect TAI to additional processes, including Hippo and Hedgehog signaling. To explore TAI diversity, we systematically searched for conserved sequence motifs in TAI proteins across representatives of nearly all insect orders and many non-insect arthropods. Our dataset enabled us to identify new conserved motifs and regions, and to interpret them in the context of insect and arthropod evolution. We demonstrated that the tai gene structure is conserved, with conserved alternative splicing patterns at both the 5’ and 3’ regions of the gene, and a tendency toward exon fusion in holometabolous species. However, species- and order-specific variations were identified; for example, the C-terminal LxxLL motif was gradually lost in Diptera, resulting in an atypical TAI structure in Drosophila and mosquitoes. Overall, we demonstrated that TAI conservation extends beyond the canonical bHLH and PAS domains. We anticipate that our analyses and the extensive TAI sequence dataset will serve as a comprehensive reference for future TAI research.
Dataset DOI: 10.5061/dryad.m905qfvgb
Description of the data and file structure
This dataset contains the sequences and command lines needed to replicate the analyses published in Evolution of steroid receptor coactivator Taiman in arthropods (Chen et al., 2026). We provide FASTA files of TAIMAN and Steroid Receptor Coactivator (SRC) protein sequences, or regions of them, used in the analyses; accession numbers for related data (e.g., genomic contigs and scaffolds); and the corresponding MEME Suite command lines. Most of the data were collected from GenBank (https://www.ncbi.nlm.nih.gov/genbank/about/), and a small fraction from VectorBase (https://vectorbase.org/vectorbase/app/). Alignments and analyses were performed in Geneious Prime (Biomatters, Auckland, New Zealand), using various plugins (cited in the original article, Chen et al., 2026), and in MEME Suite (https://meme-suite.org/meme/). We provide protein sequences for alignments (marked: *_for_alignment), sequences for analyses in MEME Suite (marked: *_for_MEME), and lists of complete TAI and SRC proteins. Accession numbers of the sequences used are listed in Tables S1-S6.
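The naming convention above lends itself to a quick inventory of a downloaded copy of the dataset. The following is a minimal sketch in Python, not part of the published workflow (the authors used Geneious Prime and MEME Suite). The function names `parse_fasta`, `file_role`, and `summarize` are hypothetical, and the sketch assumes the FASTA files are plain text with one `>` header line per record.

```python
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def file_role(filename: str) -> str:
    """Classify a file by the *_for_alignment / *_for_MEME markers."""
    stem = Path(filename).stem
    if "_for_alignment" in stem:
        return "alignment"
    if "_for_MEME" in stem:
        return "MEME"
    return "other"


def summarize(directory: Path) -> Dict[str, Tuple[str, int]]:
    """Map each .fasta file in `directory` to (role, number of records)."""
    summary: Dict[str, Tuple[str, int]] = {}
    for path in sorted(directory.glob("*.fasta")):
        with open(path, encoding="utf-8") as handle:
            count = sum(1 for _ in parse_fasta(handle))
        summary[path.name] = (file_role(path.name), count)
    return summary
```

Calling `summarize(Path("."))` in the directory holding the unpacked files would return one entry per FASTA file. The marker is matched anywhere in the file name rather than only at the end, because some MEME inputs carry a suffix after it (for example `..._for_MEME_Trichoptera-Diptera.fasta`).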
Files and variables
File: 68_representative_arthropod_sequences_used_in_Fig.S3-S18.fasta
Description: A FASTA file of representative arthropod TAIMAN protein sequences used in Supplementary Figures S3-S18.
File: MEME_suite_motifs_command_lines.txt
Description: Description of parameters used for analyses in MEME Suite.
File: Figure_6_AE1-containing_sequences.fasta
Description: A FASTA file of TAIMAN protein sequences used in Figure 6. TAIMAN sequences of Folsomia candida, Daphnia magna, Penaeus monodon, Scolopendra cingulata, Tityus serrulatus, and Tetranychus urticae were used for the Figure 6 inset only (SEYVRQELRAVVGAR motif).
File: Suppl_Figure_S1_sequences_for_alignment.fasta
Description: A FASTA file of TAIMAN and SRC protein sequences used to infer the phylogenetic tree in Figure 1 and Supplementary Figure S1.
File: Suppl_Figure_S2_sequences_for_alignment.fasta
Description: A FASTA file of TAIMAN protein sequences used in Supplementary Figure S2. It shows taiman gene multiplication in three species of horseshoe crabs (Limulidae, Merostomata, Chelicerata). Represents a subset of Suppl_Figure_S1_sequences_for_alignment.fasta.
File: Suppl_Figure_S3_sequences_bHLH_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the bHLH motif in Supplementary Figure S3.
File: Suppl_Figure_S4_sequences_PAS-A_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the PAS-A motif in Supplementary Figure S4.
File: Suppl_Figure_S5-S6_sequences_PAS-B_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the PAS-B motif in Supplementary Figure S5 and Figure S6.
File: Suppl_Figure_S7_sequences_of_N-terminal_TAI_region_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the N-terminal TAI region in Supplementary Figure S7.
File: Suppl_Figure_S8_sequences_L1_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the LxxLL1 motif in Supplementary Figure S8.
File: Suppl_Figure_S8_sequences_L2_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the LxxLL2 motif in Supplementary Figure S8.
File: Suppl_Figure_S9_sequences_L3_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the LxxLL3 motif in Supplementary Figure S9.
File: Suppl_Figure_S9_sequences_L4_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the LxxLL4 motif in Supplementary Figure S9.
File: Suppl_Figure_S10_sequences_L5_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the LxxLL5 motif in Supplementary Figure S10.
File: Suppl_Figure_S10_sequences_L6_for_alignment.fasta
Description: A FASTA file of TAIMAN protein sequences used to infer an alignment of the LxxLL6 motif in Supplementary Figure S10.
File: Suppl_Figure_S11_sequences_PPxY_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of two 'PPxY' motifs in Supplementary Figure S11.
File: Suppl_Figure_S12_sequences_bi-NLS_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the bipartite Nuclear Localization Signal (biNLS) motif in Supplementary Figure S12.
File: Suppl_Figure_S12_sequences_N-cNLS_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the classical monopartite Nuclear Localization Signal (N-cNLS) motif in Supplementary Figure S12.
File: Suppl_Figure_S14_sequences_AIAILM_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the 'AIAILM' motif in Supplementary Figure S14.
File: Suppl_Figure_S15_sequences_Q-rich1_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the glutamine-rich motif 1 (Q-r1) in Supplementary Figure S15.
File: Suppl_Figure_S15_sequences_Q-rich2_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the glutamine-rich motif 2 (Q-r2) in Supplementary Figure S15.
File: Suppl_Figure_S16_sequences_PNV_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the 'PNV' motif in Supplementary Figure S16.
File: Suppl_Figure_S17_sequences_DLELGL_for_alignment.fasta
Description: A FASTA file of arthropod TAIMAN protein sequences used to infer an alignment of the 'DLELGL' motif in Supplementary Figure S17.
File: Suppl_Figure_S18_sequences_L7_for_alignment.fasta
Description: A FASTA file of TAIMAN and SRC protein sequences used to infer an alignment of the LxxLL7 motif in Figure 7A.
File: Suppl_Figure_S7_sequences_of_N-terminal_TAI_region_for_MEME.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer sequence logos of the N-terminal TAI motifs using MEME Suite. The logos are presented in Figure 3B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Suppl_Figure_S8_sequences_L1_for_MEME.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer a sequence logo of the LxxLL1 motif using MEME Suite. The logo is presented in Figure 4B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Suppl_Figure_S8_sequences_L2_for_MEME.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer a sequence logo of the LxxLL2 motif using MEME Suite. The logo is presented in Figure 4B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Suppl_Figure_S9_sequences_L3_for_MEME.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer a sequence logo of the LxxLL3 motif using MEME Suite. The logo is presented in Figure 4B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Suppl_Figure_S9_sequences_L4_for_MEME.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer a sequence logo of the LxxLL4 motif using MEME Suite. The logo is presented in Figure 4B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Suppl_Figure_S10_sequences_L5_for_MEME.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer a sequence logo of the LxxLL5 motif using MEME Suite. The logo is presented in Figure 4B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Suppl_Figure_S10_sequences_L6_for_MEME.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer a sequence logo of the LxxLL6 motif using MEME Suite. The logo is presented in Figure 4B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Suppl_Figure_S12_sequences_biNLS_N-terminal_part_for_MEME.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer a sequence logo of the N-terminal part of the bipartite Nuclear Localization Signal (biNLS) motif using MEME Suite. The logo is presented in Figure 4B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Suppl_Figure_S12_sequences_biNLS_C-terminal_part_for_MEME.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer a sequence logo of the C-terminal part of the bipartite Nuclear Localization Signal (biNLS) motif using MEME Suite. The logo is presented in Figure 4B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Suppl_Figure_S12_sequences_N-cNLS_for_MEME_Zygentoma-Coleoptera.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer a sequence logo of the (N-terminal) classical monopartite Nuclear Localization Signal (N-cNLS) motif from insect orders Zygentoma-Coleoptera using MEME Suite. The logo is presented in Figure 4B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Suppl_Figure_S12_sequences_N-cNLS_for_MEME_Trichoptera-Diptera.fasta
Description: A FASTA file of insect TAIMAN protein sequences used to infer a sequence logo of the (N-terminal) classical monopartite Nuclear Localization Signal (N-cNLS) motif from insect orders Trichoptera-Diptera using MEME Suite. The logo is presented in Figure 4B. MEME Suite command line is provided in the file: MEME_suite_motifs_command_lines.txt
File: Table_S1_Accession_numbers_for_Figure_1_Phylogeny_of_TAI_and_SRC_proteins.xlsx
Description: Accession numbers of TAIMAN and SRC protein sequences used to infer the phylogenetic tree in Figure 1 and Supplementary Figures S1 and S2.
File: Table_S2_Accession_numbers_of_68_representative_arthropod_sequences_used_in_Fig.S3-S18.xlsx
Description: Accession numbers of TAIMAN protein sequences used in Supplementary Figures S3-S18.
File: Table_S3_Accession_numbers_for_gene_models_in_Figure_5_.xlsx
Description: Accession numbers of the taiman contig/scaffold, gene, and transcript sequences used to infer the taiman gene models in Figure 5.
File: Table_S4_Accession_numbers_for_sequences_used_in_Figure_6.xlsx
Description: Accession numbers of TAIMAN sequences used in Figure 6. Folsomia candida, Daphnia magna, Penaeus monodon, Scolopendra cingulata, Tityus serrulatus, and Tetranychus urticae were used for the Figure 6 inset (SEYVRQELRAVVGAR motif).
File: Table_S5_Accession_numbers_for_sequences_used_in_Figure_7A.xlsx
Description: Accession numbers of TAIMAN and SRC protein sequences used in Figure 7A.
File: Table_S6_Accession_numbers_for_sequences_used_in_Figure_7B.xlsx
Description: Accession numbers of TAIMAN protein sequences used in Figure 7B.
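For readers who want to re-download records listed in Tables S1-S6, the following Python sketch builds request URLs for the NCBI E-utilities `efetch` endpoint. It is an illustration under stated assumptions, not the procedure used by the authors: the helper name `efetch_fasta_urls` and the batch size are hypothetical, `db=protein` is assumed to suit the protein accessions only (the contig, scaffold, gene, and transcript accessions in Table S3 would need a different database), and VectorBase identifiers cannot be retrieved this way. The sketch only constructs URLs and performs no network access.

```python
from typing import Iterable, List
from urllib.parse import urlencode

# NCBI E-utilities endpoint for retrieving records by identifier.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_urls(accessions: Iterable[str], batch_size: int = 100) -> List[str]:
    """Build one efetch URL per batch of protein accession numbers.

    The batch size of 100 is an arbitrary choice for this sketch.
    """
    # Drop empty cells and surrounding whitespace copied from a spreadsheet.
    cleaned = [acc.strip() for acc in accessions if acc and acc.strip()]
    urls: List[str] = []
    for start in range(0, len(cleaned), batch_size):
        batch = cleaned[start:start + batch_size]
        query = urlencode({
            "db": "protein",      # assumption: protein accessions only
            "id": ",".join(batch),
            "rettype": "fasta",
            "retmode": "text",
        })
        urls.append(f"{EFETCH_URL}?{query}")
    return urls
```

The accession column would first need to be exported from the relevant .xlsx table, for instance as a plain-text list with one accession per line. Each returned URL can then be opened with any HTTP client, keeping requests within the usage limits NCBI publishes for E-utilities.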
Access information
Other publicly accessible locations of the data:
GenBank
- https://www.ncbi.nlm.nih.gov/genbank/about/
- https://www.ncbi.nlm.nih.gov/home/about/policies/
- All data used are in the public domain. Public domain information on the National Library of Medicine (NLM) Web pages may be freely distributed and copied.
VectorBase
- All data on these websites are provided freely for public use through the contributions of many researchers involved in generating genome sequences, functional genomics datasets, and additional information.
