Gene duplication is an important evolutionary process thought to facilitate the evolution of phenotypic diversity. We investigated whether gene duplication was associated with the evolution of phenotypic differences in a highly social insect, the honeybee Apis mellifera. We hypothesized that the genetic redundancy provided by gene duplication could promote the evolution of social and sexual phenotypes associated with advanced societies. We found a positive correlation between sociality and the rate of gene duplication across the Apoidea, indicating that gene duplication may be associated with sociality. We also discovered that genes showing biased expression between A. mellifera alternative phenotypes were found more frequently than expected among duplicated genes than among singletons. Moreover, duplicated genes had higher levels of caste-, sex-, behavior-, and tissue-biased expression than singletons, as expected if gene duplication facilitated phenotypic differentiation. We also found that duplicated genes were maintained in the A. mellifera genome through the processes of conservation, neofunctionalization, and specialization, but not subfunctionalization. Overall, we conclude that gene duplication may have facilitated the evolution of social and sexual phenotypes, as well as tissue differentiation. Thus, this study further supports the idea that gene duplication allows species to evolve an increased range of phenotypic diversity.
Amell_Ashby_casteFPKM
Column 1 contains the A. mellifera gene IDs. Columns 2 through 4 contain the average FPKM for each caste (drone, queen, worker) across all samples. These expression levels were derived from Ashby et al.
Amell_SingleCopyGenes.txt
File with all the single-copy genes in A. mellifera. The EOG column gives the OrthoDB v9.1 EOG ID from which the single-copy genes were derived. The A. mellifera column gives the single-copy gene in A. mellifera, and the B. terrestris column gives the single-copy ortholog in B. terrestris.
Amell_DupPairs.txt
File with all the duplicate genes in A. mellifera. The EOG column gives the OrthoDB v9.1 EOG ID from which the gene duplicates were derived. The Gene 1 and Gene 2 columns give the paralogs in A. mellifera, in no particular order.
Amell_DupsPairs.txt
Amell_Bterr_EvoProc.txt
File with the gene families used to identify the evolutionary processes maintaining paralogs in the A. mellifera genome. Column 1 (AMELL_D1) is the paralog with the highest sequence similarity to the single-copy ortholog in B. terrestris, while column 2 (AMELL_D2) is the paralog with the least sequence similarity. BTERR_ortho is the single-copy ortholog of the duplicates in B. terrestris. Details on how sequence similarity was measured can be found in the Methods.
Bterr_caste_FPKM.txt
Column 1 contains the B. terrestris gene IDs. Columns 2 through 4 contain the average FPKM for each caste across all samples. These expression levels were derived from Harrison et al.
Bterr_casteFPKM.txt
Amell_Expression.txt
File that includes expression and other evolutionary measures for duplicates and singletons in A. mellifera.
Column            Description
Gene              A. mellifera gene ID
EOG               OrthoDB v9.1 group ID
Homology          Duplicate gene or singleton
Ashby_DQ_logFC    Expression ratio (Drones/Queens); Ashby et al. dataset
Ashby_DQ_FDR      FDR (Drones/Queens); Ashby et al. dataset
Ashby_DW_logFC    Expression ratio (Drones/Workers); Ashby et al. dataset
Ashby_DW_FDR      FDR (Drones/Workers); Ashby et al. dataset
Ashby_QW_logFC    Expression ratio (Queens/Workers); Ashby et al. dataset
Ashby_QW_FDR      FDR (Queens/Workers); Ashby et al. dataset
N_Tau             Tissue specificity, calculated across 10 tissues in nurses; Jasper et al. dataset
F_Tau             Tissue specificity, calculated across 10 tissues in foragers; Jasper et al. dataset
Jasper_NF_logFC   Expression ratio (Nurse/Forager) in brains; Jasper et al. dataset
Jasper_NF_FDR     FDR (Nurse/Forager) in brains; Jasper et al. dataset
V_DQ_logFC        Expression ratio (Drones/Queens); Vleurinck et al. dataset
V_DQ_FDR          FDR (Drones/Queens); Vleurinck et al. dataset
V_DW_logFC        Expression ratio (Drones/Workers); Vleurinck et al. dataset
V_DW_FDR          FDR (Drones/Workers); Vleurinck et al. dataset
V_QW_logFC        Expression ratio (Queens/Workers); Vleurinck et al. dataset
V_QW_FDR          FDR (Queens/Workers); Vleurinck et al. dataset
C_QW_logFC        Expression ratio (Queens/Workers); Cameron et al. dataset
C_QW_FDR          FDR (Queens/Workers); Cameron et al. dataset
Amell_Expression.csv
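A minimal sketch of how the Amell_Expression table described above might be read to compare the share of caste-biased genes between duplicates and singletons. It assumes a comma-separated file with a header row that uses the column names listed above, that the Homology column holds the labels "Duplicate" and "Singleton", and an FDR cutoff of 0.05. None of these assumptions is confirmed by the file description, and the inline sample rows are invented purely for illustration.

```python
import csv
import io
from collections import Counter

# Invented rows for illustration only; real values come from Amell_Expression.csv.
# The "Duplicate"/"Singleton" labels and the comma delimiter are assumptions.
SAMPLE = """Gene,Homology,Ashby_QW_logFC,Ashby_QW_FDR
geneA,Duplicate,2.1,0.001
geneB,Duplicate,-0.2,0.60
geneC,Singleton,0.1,0.80
geneD,Singleton,-1.5,0.01
geneE,Singleton,0.3,NA
"""

def biased_fraction(handle, fdr_col="Ashby_QW_FDR", cutoff=0.05):
    """Return {homology label: fraction of genes with FDR below cutoff}."""
    total = Counter()
    biased = Counter()
    for row in csv.DictReader(handle):
        try:
            fdr = float(row[fdr_col])
        except ValueError:
            continue  # skip genes with no usable FDR value (e.g. "NA")
        total[row["Homology"]] += 1
        if fdr < cutoff:
            biased[row["Homology"]] += 1
    return {label: biased[label] / total[label] for label in total}

fractions = biased_fraction(io.StringIO(SAMPLE))
print(fractions)
```

To run this on the real file, replace the `io.StringIO(SAMPLE)` handle with `open("Amell_Expression.csv", newline="")` and adjust the delimiter and labels to match what the file actually contains.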
ParseDups_ODBv9.1.pl
Perl script that uses an OrthoDB v9.1 tab-delimited file to identify EOGs with a certain number of genes per species. It was used to identify genes that were duplicated in A. mellifera but single copy in every other species in the Apoidea.
e.g. perl parse_full_orthodb9v1_byStuff.pl ODB_v9.1_input.txt 7460.1.1, 7461.1.1, 7462.1.1, 7463.1.1, 30195.1.1, 88501.1.1, 132113.1.1, 143995.1.1, 166423.1.1, 178035.1.1, 516756.1.1, 597456.1.1 ODB_v9.1_output.txt
Further details on how to run the script can be found in the file.
ParseDups_ODBv9.pl
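The filtering idea behind the script described above can be sketched roughly as follows. This is not the Perl script itself. It is a Python illustration that assumes a simplified three-field input (EOG ID, species taxon ID, gene ID), which may not match the real OrthoDB v9.1 file layout or the script's argument format. The function name and the example rows are hypothetical.

```python
from collections import defaultdict

def find_lineage_specific_duplicates(rows, target, others, target_copies=2):
    """Return EOG IDs where `target` has `target_copies` genes and every
    species in `others` has exactly one gene."""
    counts = defaultdict(lambda: defaultdict(int))  # EOG -> species -> gene count
    for eog, species, _gene in rows:
        counts[eog][species] += 1
    hits = []
    for eog, per_species in counts.items():
        if per_species.get(target, 0) != target_copies:
            continue  # target species is not duplicated in this group
        if all(per_species.get(sp, 0) == 1 for sp in others):
            hits.append(eog)
    return sorted(hits)

# Hypothetical rows: (EOG ID, taxon ID, gene ID). 7460 is the NCBI taxon ID
# for A. mellifera and 30195 is the one for B. terrestris.
rows = [
    ("EOG_1", "7460", "amel_1"), ("EOG_1", "7460", "amel_2"),
    ("EOG_1", "30195", "bter_1"),
    ("EOG_2", "7460", "amel_3"), ("EOG_2", "30195", "bter_2"),
    ("EOG_3", "7460", "amel_4"), ("EOG_3", "7460", "amel_5"),
    ("EOG_3", "30195", "bter_3"), ("EOG_3", "30195", "bter_4"),
]
dups = find_lineage_specific_duplicates(rows, target="7460", others=["30195"])
print(dups)
```

In this toy input only EOG_1 qualifies: EOG_2 is single copy in both species, and EOG_3 is duplicated in both.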
Apoidea_DupRates.txt
Text file listing 10 bee species in the Apoidea, along with the number of species-specific duplicates, divergence time, duplication rate, and social state for each. Divergence times were taken from Cardinal and Danforth (2013). Social state was determined from Kapheim et al. (2015).