Novel genomic insights into body size evolution in cetaceans and a resolution of Peto’s Paradox

Cite this dataset

Sun, Di et al. (2022). Novel genomic insights into body size evolution in cetaceans and a resolution of Peto’s Paradox [Dataset]. Dryad. https://doi.org/10.5061/dryad.280gb5mqd

Abstract

Cetaceans (whales, dolphins, and porpoises) have undergone a radical transformation from the typical terrestrial mammalian body plan to a streamlined one while exhibiting a dramatic inter-specific range of body sizes. However, the molecular mechanisms underlying the diversifying evolution of cetacean body size are largely unknown. Here, using genome and phenotypic data from 22 cetaceans, we investigate genome-wide gene-phenotype correlations and explore the genetic basis underlying the high diversity of body size in cetaceans. Functional enrichment results showed that body size-related genes in cetaceans were enriched in pathways associated with immunity, cell growth, and metabolism, suggesting their potential roles in the diversifying evolution of body size in cetaceans. A series of genes was also found to coevolve with body size; these genes are mainly involved in immune surveillance, tumor suppression, and the development of ‘cheater’ tumors. This in turn suggests that these genes play a role in tumor control and thus help resolve Peto’s paradox, the observation that the expansion in body size, and thereby cell number, does not correlate with an increase in cancer incidence in larger whales. The present study could provide novel insights into the evolution of the great body size variation in cetaceans.

Methods

Gene multiple alignment data

All gene sets were aligned at the codon level using the Prank program with the option ‘-codon’. All alignments have been deposited at Dryad. After the alignments were generated, the Gblocks program was used to trim potentially unreliable regions and gap regions. The parameters used were relatively strict, aiming to obtain as many bases as possible, with the sequence type set to codon (‘-t=c -b1=5 -b2=6 -b3=8 -b4=5 -b5’). The *html file shows the trimming details for each gene.
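The two steps above can be sketched as command lines assembled in Python. This is only an illustration under stated assumptions: the file names are hypothetical, the executables are assumed to be on the PATH as `prank` and `Gblocks`, and the PRANK output name `gene1.best.fas` is an assumption about that program's naming. The value of `-b5` is not given in the text, so that option is left out rather than guessed.

```python
def prank_command(cds_fasta, out_prefix):
    """Build a PRANK call for a codon-level alignment."""
    return ["prank", f"-d={cds_fasta}", f"-o={out_prefix}", "-codon"]


def gblocks_command(alignment, options):
    """Build a Gblocks call; the options are passed through unchanged."""
    return ["Gblocks", alignment, *options]


# Options as listed in the Methods text. The -b5 value is truncated there,
# so it is deliberately omitted here instead of being filled in.
GBLOCKS_OPTIONS = ["-t=c", "-b1=5", "-b2=6", "-b3=8", "-b4=5"]

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(prank_command("gene1.cds.fasta", "gene1")))
    print(" ".join(gblocks_command("gene1.best.fas", GBLOCKS_OPTIONS)))
```

Each list can be handed to `subprocess.run` once the paths point at real files; building the commands separately makes it easy to log them per gene before running a batch.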

Usage notes

Supplementary data

Tables S1–S4. Phenotype data and sequencing data used in this study.

Table S5. PGLS results for genes with significant correlations between evolutionary rate (dN/dS) and body size (body length and body size, respectively).

Table S6. Summary of the DAVID enrichment analysis of candidate genes in three p-value categories.

Table S7. Enrichment results for PAGs and NAGs by DAVID.

Table S8. Summary of positive selection in cetaceans.

Code

1.cds_prepare

01Getfa_gbk.pl: If the input is a GenBank (gbk) file, this script extracts the CDS, protein sequence, and gene name.

02gff2cds.pl: If the input is a GFF file, this script extracts the CDS, protein sequence, and gene name.
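As a rough illustration of what extracting a CDS from GFF annotation involves, the Python sketch below joins the CDS segments of each transcript and reverse-complements those on the minus strand. It is not the deposited Perl script: it assumes GFF3-style `Parent=` attributes, a genome already loaded as a dictionary, and that all segments of one transcript share a strand. The function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string; other symbols (e.g. N) pass through."""
    return seq.translate(COMPLEMENT)[::-1]


def cds_from_gff(gff_lines, genome):
    """Return {parent_id: CDS sequence} from GFF3 lines and a {seqid: sequence} dict."""
    parts = defaultdict(list)  # parent_id -> [(seqid, start, end, strand), ...]
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        parent = attrs.get("Parent")
        if parent is None:
            continue
        parts[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    sequences = {}
    for parent, segments in parts.items():
        segments.sort(key=lambda s: s[1])  # ascending genomic order
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(genome[sid][start - 1:end] for sid, start, end, _ in segments)
        if segments[0][3] == "-":
            seq = reverse_complement(seq)
        sequences[parent] = seq
    return sequences
```

Concatenating in genomic order first and reverse-complementing once at the end gives the same result as reversing each segment separately, with less bookkeeping.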

03rm_sort_gene.pl: Deletes sequences shorter than 150 bp.

04Select_longest.pl: If a gene has more than one transcript, selects the longest one.
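The length filter and the longest-transcript selection described above can be sketched in Python as follows. This is a minimal illustration, not the deposited Perl code: the function names are hypothetical, and the rule for mapping a FASTA header to a gene name is supplied by the caller because the header layout of the real files is not described here.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records, min_len=150):
    """Keep only records whose sequence has at least min_len bases."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def longest_per_gene(records, gene_of):
    """Keep the longest sequence per gene; gene_of maps a header to a gene name."""
    best = {}
    for header, seq in records:
        gene = gene_of(header)
        # On ties the first transcript seen is kept.
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return list(best.values())
```

For headers of the hypothetical form `GENE|transcript`, a call such as `longest_per_gene(drop_short(records), lambda h: h.split("|")[0])` applies both steps in order.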

2.paml_result

05collect.paml.Freeratio.pl: Batch-collects dN/dS values from the free-ratio results.
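Collecting per-branch dN/dS values might look like the Python sketch below. It assumes the result files contain a codeml-style table introduced by the line `dN & dS for each branch`, with rows of the form `branch t N S dN/dS dN dS N*dN S*dS`; that layout is an assumption about the output files, and the function name is hypothetical.

```python
import re

# A branch label such as "23..24" followed by exactly eight numeric columns.
BRANCH_ROW = re.compile(r"^\s*(\d+\.\.\d+)((?:\s+[-+0-9.eE]+){8})\s*$")


def parse_branch_table(text):
    """Return {branch: {"dN/dS": w, "dN": dn, "dS": ds}} from codeml-style output."""
    results = {}
    in_table = False
    for line in text.splitlines():
        if line.strip().startswith("dN & dS for each branch"):
            in_table = True
            continue
        if not in_table:
            continue
        match = BRANCH_ROW.match(line)
        if match:
            # Column order assumed: t, N, S, dN/dS, dN, dS, N*dN, S*dS.
            t, n, s, omega, dn, ds, n_dn, s_ds = map(float, match.group(2).split())
            results[match.group(1)] = {"dN/dS": omega, "dN": dn, "dS": ds}
    return results
```

Running this over every gene's result file and writing one row per gene and branch yields the kind of table a downstream regression step can read.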

3.pgls

06write1stepR.pl: Prepares the R script for PGLS.

07getresults.pl: Batch-extracts information from the PGLS results.

08step23writeR.pl: Prepares the R script for the two-step verification.

09setpgls.pl: Batch-extracts information from the results of the two-step verification.
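For orientation, PGLS (phylogenetic generalized least squares) is a generalized least squares regression in which the error covariance reflects shared ancestry among species. The generic form is sketched below. Which variable was treated as the response, any transformation applied, and the covariance model used are not stated here, so y, X, and V are placeholders rather than the settings of the deposited scripts.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma^{2}\mathbf{V}\right) \\
\hat{\boldsymbol{\beta}} &= \left(\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}\right)^{-1}
\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{y}
\end{aligned}
```

Here V is the species-by-species covariance matrix implied by the phylogeny. When V is the identity matrix, the estimator reduces to ordinary least squares.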