Novel genomic insights into body size evolution in cetaceans and a resolution of Peto’s Paradox
Data files
Feb 02, 2022 version files 124.57 MB
Gene_Multi-alignments.zip
Abstract
Cetaceans (whales, dolphins, and porpoises) have undergone a radical transformation from the typical terrestrial mammalian body plan to a streamlined one while exhibiting a dramatic interspecific range of body sizes. However, the molecular mechanisms underlying the diversifying evolution of cetacean body size remain largely unknown. Here, using genomic and phenotypic data from 22 cetaceans, we investigate genome-wide gene-phenotype correlations and explore the genetic basis underlying the high diversity of body size in cetaceans. Functional enrichment analysis showed that body size-related genes in cetaceans were enriched in pathways associated with immunity, cell growth, and metabolism, suggesting potential roles in the diversifying evolution of body size in cetaceans. A series of genes were also found to coevolve with body size; these genes are mainly involved in immune surveillance, tumor suppression, and the development of ‘cheater’ tumors. This in turn suggests that these genes play a role in tumor control and thus in resolving Peto’s paradox, the observation that the expansion in body size, and thereby cell number, does not correlate with increased cancer incidence in larger whales. The present study could provide novel insights into the evolution of great body size variation in cetaceans.
Methods
Gene Multi-alignments data
All gene sets were aligned at the codon level using the Prank program with the option ‘-codon’. All alignments have been deposited at Dryad. After the alignments were generated, the Gblocks program was used to trim potentially unreliable and gap-rich regions. The parameters used were relatively strict, with the sequence type set to codon, to obtain as many bases as possible (‘-t=c -b1=5 -b2=6 -b3=8 -b4=5 -b5’). The *html file shows the trimming details for each gene.
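Because the alignments are at the codon level, a quick consistency check before downstream analysis can be useful. The following is a minimal sketch, not part of the deposited scripts: it assumes the alignments are FASTA-formatted text, and the function names `read_fasta` and `check_codon_alignment` and the example sequences are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def check_codon_alignment(records):
    """Return a list of problems; an empty list means the alignment looks consistent."""
    problems = []
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        problems.append("sequences differ in length: %s" % sorted(lengths))
    for name, seq in records.items():
        if len(seq) % 3 != 0:
            problems.append("%s: length %d is not a multiple of 3" % (name, len(seq)))
    return problems


# Made-up two-species alignment, for illustration only.
example = """>speciesA
ATGGCC---TAA
>speciesB
ATGGCCGCATAA
"""
alignment = read_fasta(example)
print(check_codon_alignment(alignment))
```

An empty list indicates that all sequences have the same length and that this length is a multiple of three, which is what a codon-level alignment should satisfy.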
Usage notes
Supplementary data
Tables S1–S4. Phenotype data and sequencing data used in this study.
Table S5. PGLS results for genes with significant correlations between evolutionary rate (dN/dS) and body size (body length and body mass, respectively).
Table S6. Summary of an enrichment analysis of candidate genes in three p-value categories by DAVID.
Table S7. Enrichment results for PAGs and NAGs by DAVID.
Table S8. Summary of positive selection in cetaceans.
Code
1.cds_prepare
01Getfa_gbk.pl: If the input is a gbk file, this script extracts the CDS, protein sequence, and gene name.
02gff2cds.pl: If the input is a gff file, this script extracts the CDS, protein sequence, and gene name.
03rm_sort_gene.pl: Deletes sequences shorter than 150 bp.
04Select_longest.pl: If a gene has more than one transcript, selects the longest one.
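The filtering logic described for 03rm_sort_gene.pl and 04Select_longest.pl can be illustrated with a short Python sketch. This is not a translation of the Perl scripts: the input layout (tuples of gene, transcript ID, and sequence), the function name `filter_and_select_longest`, and the example records are all assumptions made for illustration. Only the 150 bp threshold comes from the description above.

```python
MIN_LENGTH = 150  # threshold stated in the description of 03rm_sort_gene.pl


def filter_and_select_longest(transcripts, min_length=MIN_LENGTH):
    """Keep the longest transcript per gene, ignoring sequences below min_length.

    transcripts: iterable of (gene, transcript_id, sequence) tuples
                 (hypothetical layout, not the scripts' actual input format).
    Returns a dict of {gene: (transcript_id, sequence)}.
    """
    longest = {}
    for gene, transcript_id, seq in transcripts:
        if len(seq) < min_length:
            continue  # drop short sequences
        current = longest.get(gene)
        # On ties, the transcript seen first is kept.
        if current is None or len(seq) > len(current[1]):
            longest[gene] = (transcript_id, seq)
    return longest


# Made-up records, for illustration only.
transcripts = [
    ("GENE1", "t1", "ATG" * 60),  # 180 bp
    ("GENE1", "t2", "ATG" * 80),  # 240 bp, longest for GENE1
    ("GENE2", "t1", "ATG" * 40),  # 120 bp, below the threshold
]
selected = filter_and_select_longest(transcripts)
for gene, (transcript_id, seq) in selected.items():
    print(gene, transcript_id, len(seq))
```

In this example only GENE1 survives, represented by transcript t2. How the original scripts break ties between equally long transcripts is not stated, so the first-seen rule here is an arbitrary choice.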
2.paml_result
05collect.paml.Freeratio.pl: Batch-collects dN/dS values from the free-ratio results.
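As an illustration of this kind of batch collection, the sketch below pulls per-branch dN/dS values out of a results table. It is not the deposited Perl script. It assumes each branch row has the columns branch, t, N, S, dN/dS, dN, dS, N*dS, S*dS, which should be checked against the actual output files. The function name `collect_dnds` and the numbers in the example are made up.

```python
import re

# Assumed row layout (verify against your own output files):
#   branch  t  N  S  dN/dS  dN  dS  N*dS  S*dS
# e.g. "   4..5   0.053   648.5   215.5  0.0858  0.0044  0.0515   2.9  11.1"
BRANCH_ROW = re.compile(r"^\s*(\d+\.\.\d+)((?:\s+-?\d+(?:\.\d+)?){8})\s*$")


def collect_dnds(text):
    """Return {branch: dN/dS} for every line matching the assumed row layout."""
    result = {}
    for line in text.splitlines():
        match = BRANCH_ROW.match(line)
        if match:
            values = [float(v) for v in match.group(2).split()]
            # values = [t, N, S, dN/dS, dN, dS, N*dS, S*dS]
            result[match.group(1)] = values[3]
    return result


# Made-up excerpt, for illustration only.
example_output = """
 branch          t       N       S   dN/dS      dN      dS  N*dS  S*dS

   4..1       0.021   648.5   215.5  0.1200  0.0025  0.0208   1.6   4.5
   4..5       0.053   648.5   215.5  0.0858  0.0044  0.0515   2.9  11.1
"""
print(collect_dnds(example_output))
```

Keys are branch labels in the form `parent..child`. Mapping them to species names would need the tree's node numbering, which is not covered here.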
3.pgls
06write1stepR.pl: Prepares the R script for the PGLS analysis.
07getresults.pl: Batch-extracts information from the PGLS results.
08step23writeR.pl: Prepares the R script for the two-step verification.
09setpgls.pl: Batch-extracts information from the results of the two-step verification.