Convergent evolution and predictability of gene copy numbers associated with diets in mammals
Data files
Dec 06, 2023 version files 2.75 MB
- carnivore.tsv
- Corrected_DFA_Table.csv
- Data_Preparation_Script_Median.R
- DFA.R
- herbivore.tsv
- Mammal_Data.xlsx
- omnivore.tsv
- Phylogenetic_Correction_Script_Median.R
- Phylogenetic_Tree_Script_Median.R
- PhyloTree.newick
- RAxML_bipartitions.result_FIN4_raw_rooted_wBoots_4098mam1out_OK.newick
- README.md
Abstract
Convergent evolution, the evolution of the same or similar phenotypes in phylogenetically independent lineages, is a widespread phenomenon in nature. If the genetic basis for convergent evolution is predictable to some extent, it may be possible to infer organismic phenotypes and adaptability based on genome sequence data. While repeated amino acid changes have been studied in association with convergent evolution, relatively little is known about the potential contribution of repeated gene copy number changes. In this study, we explore whether certain gene copy number changes are linked to diet shifts in mammals and assess if trophic ecology can be inferred from the copy numbers of a specific set of genes. Using 86 mammalian genome sequences, we identified several genes with higher copy numbers in herbivores, carnivores, and omnivores, even after phylogenetic corrections. We were able to confirm previous findings on genes such as amylase, olfactory receptor, and xenobiotic metabolism genes, and identify novel genes whose copy numbers correlate with dietary patterns. For example, omnivores exhibited higher copy numbers of genes encoding gene expression regulators. We also established a discriminant function based on the copy numbers of 13 genes that can help predict trophic ecology based on genome sequence data. These findings highlight a possible association between convergent evolution and repeated copy number changes in specific genes, suggesting the potential to develop a method for predicting animal ecology and adaptability from genome sequence data.
README: Data_Preparation_Script_Median.R
This README file was generated on 2023-11-28 by Jun Kitano.
GENERAL INFORMATION
Title of Dataset: Analysis of convergent evolution and predictability of gene copy numbers associated with diets in mammals
Author Information
Principal Investigator Contact Information
Name: Jun Kitano
Institution: National Institute of Genetics
Address: Mishima, Shizuoka, Japan
Email: jkitano@nig.ac.jp
Date of data collection: 2022-2023
Geographic location of data collection: Mishima, Japan
Information about funding sources that supported the collection of the data: NIG Summer Internship Program, JSPS Kakenhi (23KJ0483, 22H04925, and 22H04983), JST CREST (JPMJCR19S2 and JPMJCR20S2), and MEXT (JPMXD1521474594).
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0 1.0) Public Domain
Links to publications that cite or use the data:
Wilhoit, K., Yamanouchi, S., Chen, B.Y., Yamasaki, Y.Y., Ishikawa, A., Inoue, J., Iwasaki, W., and Kitano, J. (2024). Convergent evolution and predictability of gene copy numbers associated with diets in mammals.
Links to other publicly accessible locations of the data: None
Links/relationships to ancillary data sets: None
Was data derived from another source? Yes
A. If yes, list source(s): Dryad database (doi:10.5061/dryad.qd450 and doi.org/10.5061/dryad.tb03d03) and an original paper (doi: 10.1098/rspb.2014.2103).
Recommended citation for this dataset:
Gainsbury AM, Tallowin OJS, Meiri S. 2018. An updated global data set for diet preferences in terrestrial mammals: testing the validity of extrapolation. Mamm. Rev. 48:160–167. doi:10.5061/dryad.qd450.
Upham NS, Esselstyn JA, Jetz W. 2019. Inferring the mammal tree: Species-level sets of phylogenies for questions in ecology, evolution, and conservation. PLoS Biol. 17:e3000494. doi.org/10.5061/dryad.tb03d03.
Tucker MA, Rogers TL. 2014. Examining predator-prey body size, trophic level and body mass across marine and terrestrial mammals. Proc. Biol. Sci. 281. doi: 10.1098/rspb.2014.2103.
DATA & FILE OVERVIEW
- File List:
A) herbivore.tsv
B) carnivore.tsv
C) omnivore.tsv
D) Data_Preparation_Script_Median.R
E) Mammal_Data.xlsx
F) Phylogenetic_Tree_Script_Median.R
G) RAxML_bipartitions.result_FIN4_raw_rooted_wBoots_4098mam1out_OK.newick
H) PhyloTree.newick
I) Phylogenetic_Correction_Script_Median.R
J) DFA.R
K) Corrected_DFA_Table.csv
Relationship between files, if important: None
Additional related data collected that was not included in the current data package: None
Are there multiple versions of the dataset? No
A. If yes, name of file(s) that was updated: NA
i. Why was the file updated? NA
ii. When was the file updated? NA
#########################################################################
DATA-SPECIFIC INFORMATION FOR: herbivore.tsv, carnivore.tsv, and omnivore.tsv
Number of variables: 348
Number of cases/rows: 77 (herbivore.tsv), 19 (carnivore.tsv), and 61 (omnivore.tsv)
Variable List:
group_id: orthologous gene ID
pMCMC: significance test after phylogenetic correction
positive.median: median copy number of the target category (e.g., herbivore)
negative.median: median copy number of the non-target category (e.g., non-herbivore)
ratio: Ratio of the median copy number of the target category divided by that of the non-target category
*.tag: protein sequence identification number
*.len: translated amino acid length
*.prod: gene product
*.gcn: gene copy number
Abbreviations of species names
Acijub*: Acinonyx jubatus
Ailmel*: Ailuropoda melanoleuca
Balacusca*: Balaenoptera acutorostrata
Bisbisbis*: Bison bison
Bosmut*: Bos mutus
Bostar*: Bos taurus
Bubbub*: Bubalus bubalis
Caljac*: Callithrix jacchus
Calurs*: Callorhinus ursinus
Cambac*: Camelus bactrianus
Camdro*: Camelus dromedarius
Canlupfam*: Canis lupus
Caphir*: Capra hircus
Carsyr*: Carlito syrichta
Cersimsim*: Ceratotherium simum
Chlsab*: Chlorocebus sabaeus
Chrasi*: Chrysochloris asiatica
Colangpal*: Colobus angolensis palliatus
Concri*: Condylura cristata
Dasnov*: Dasypus novemcinctus
Delleu*: Delphinapterus leucas
Desrot*: Desmodus rotundus
Echtel*: Echinops telfairi
Eleedw*: Elephantulus edwardii
Enhlutken*: Enhydra lutris
Eptfus*: Eptesicus fuscus
Equasi*: Equus asinus asinus
Equcab*: Equus caballus
Equprz*: Equus przewalskii
Erieur*: Erinaceus europaeus
Eumjub*: Eumetopias jubatus
Felcat*: Felis catus
Glomel*: Globicephala melas
Gorgorgor*: Gorilla gorilla
Hiparm*: Hipposideros armiger
Homsap*: Homo sapiens
Lagobl*: Lagenorhynchus obliquidens
Lepwed*: Leptonychotes weddellii
Lipvex*: Lipotes vexillifer
Loxafr*: Loxodonta africana
Lyncan*: Lynx canadensis
Macmul*: Macaca mulatta
Manjav*: Manis javanica
Minnat*: Miniopterus natalensis
Mondom*: Monodelphis domestica
Monmon*: Monodon monoceros
Musmus*: Mus musculus
Musputfur*: Mustela putorius furo
Myoluc*: Myotis lucifugus
Neoasiasi*: Neophocaena asiaeorientalis
Neosch*: Neomonachus schauinslandi
Neovis*: Neovison vison
Nomleu*: Nomascus leucogenys
Odorosdiv*: Odobenus rosmarus
Odovirtex*: Odocoileus virginianus
Orcorc*: Orcinus orca
Ornana*: Ornithorhynchus anatinus
Oryafeafe*: Orycteropus afer
Orycun*: Oryctolagus cuniculus
Otogar*: Otolemur garnettii
Oviari*: Ovis aries
Panpar*: Panthera pardus
Pantig*: Panthera tigris altaica
Pantro*: Pan troglodytes
Phacin*: Phascolarctos cinereus
Phycat*: Physeter catodon
Phydis*: Phyllostomus discolor
Ponabe*: Pongo abelii
Pteale*: Pteropus alecto
Ptevam*: Pteropus vampyrus
Pumcon*: Puma concolor
Ratnor*: Rattus norvegicus
Rouaeg*: Rousettus aegyptiacus
Sarhar*: Sarcophilus harrisii
Sorara*: Sorex araneus
Sursur*: Suricata suricatta
Susscr*: Sus scrofa
Trimanlat*: Trichechus manatus
Turtru*: Tursiops truncatus
Ursame*: Ursus americanus
Ursarc*: Ursus arctos
Ursmar*: Ursus maritimus
Vicpac*: Vicugna pacos
Vomurs*: Vombatus ursinus
Vulvul*: Vulpes vulpes
Zalcal*: Zalophus californianus
Missing data codes: Empty cells indicate that the orthologous gene was not found.
Specialized formats or other abbreviations used: None
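The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' pipeline (which is written in R). It assumes that each species' copy number sits in a column named with the species abbreviation followed by `.gcn` (for example `Bostar.gcn`), and that empty cells are skipped rather than counted as zero when medians are taken; the uploaded scripts may handle either point differently. The group ID `OG1` and the demo values are hypothetical.

```python
import csv
import io
from statistics import median


def read_copy_numbers(handle):
    """Return {group_id: {species_prefix: copy number or None}} from a TSV handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        counts = {}
        for column, value in row.items():
            if column is not None and column.endswith(".gcn"):
                prefix = column[: -len(".gcn")]
                # Empty cells mean the orthologous gene was not found.
                counts[prefix] = float(value) if value not in (None, "") else None
        table[row["group_id"]] = counts
    return table


def median_ratio(counts, target_species):
    """Median copy number in the target species divided by that in all other species.

    Assumption: species with missing values are left out of both medians.
    """
    target = [n for sp, n in counts.items() if sp in target_species and n is not None]
    other = [n for sp, n in counts.items() if sp not in target_species and n is not None]
    if not target or not other or median(other) == 0:
        return None
    return median(target) / median(other)


# Hypothetical four-species excerpt; the real files hold many more columns.
demo = (
    "group_id\tAcijub.gcn\tBostar.gcn\tOviari.gcn\tFelcat.gcn\n"
    "OG1\t1\t4\t6\t\n"
)
table = read_copy_numbers(io.StringIO(demo))
ratio = median_ratio(table["OG1"], {"Bostar", "Oviari"})
print(ratio)
```

To apply this to a real file, pass an open file object instead of the in-memory demo, for example `open("herbivore.tsv", newline="")`.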
#########################################################################
Data_Preparation_Script_Median.R
R script for preparing the diet and copy number data used in the subsequent analyses
#########################################################################
DATA-SPECIFIC INFORMATION FOR: Mammal_Data.xlsx
Number of variables: 5
Number of cases/rows: 86
Variable List:
* binomial: ID for the species
* species: species names
* CommonName: common names for the species
* TrophicLevel: categorical classification of trophic ecology
* TrophicPosition: quantitative data on the trophic position. NA, not available.
Missing data codes: NA (data not available)
Specialized formats or other abbreviations used: None
#########################################################################
Phylogenetic_Tree_Script_Median.R
R script for preparation of the phylogenetic tree
#########################################################################
RAxML_bipartitions.result_FIN4_raw_rooted_wBoots_4098mam1out_OK.newick
Input file of a mammalian phylogenetic tree in newick format
#########################################################################
PhyloTree.newick
Output file of Phylogenetic_Tree_Script_Median.R in newick format
#########################################################################
Phylogenetic_Correction_Script_Median.R
R script for phylogenetic correction
#########################################################################
DFA.R
R script for discriminant function analysis
#########################################################################
DATA-SPECIFIC INFORMATION FOR: Corrected_DFA_Table.csv
This is an input file for DFA.R.
Number of variables: 42
Number of cases/rows: 86
Variable List:
* binomial: ID for the species
* diet: categorical classification of trophic ecology
* habitat: categorical classification of habitat
* Columns 4-42: copy numbers of the orthologous genes, one column per orthologous gene ID
Missing data codes: None
Specialized formats or other abbreviations used: None
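As an illustration of how a table with this layout might be inspected before running a discriminant function analysis, the sketch below splits each row into its diet label and a vector of copy numbers, then summarizes the median copy number per diet. It is only a sketch in Python and does not reproduce DFA.R. It assumes the first three header names are exactly `binomial`, `diet`, and `habitat`; the gene column names (`OG1`, `OG2`), species IDs, and diet and habitat labels in the demo are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import median

META_COLUMNS = ("binomial", "diet", "habitat")


def load_dfa_table(handle):
    """Return (gene_columns, rows); each row is (binomial, diet, [copy numbers])."""
    reader = csv.DictReader(handle)
    gene_columns = [c for c in reader.fieldnames if c not in META_COLUMNS]
    rows = []
    for row in reader:
        features = [float(row[c]) for c in gene_columns]
        rows.append((row["binomial"], row["diet"], features))
    return gene_columns, rows


def median_by_diet(gene_columns, rows):
    """Return {diet: {gene column: median copy number across species}}."""
    grouped = defaultdict(list)
    for _, diet, features in rows:
        grouped[diet].append(features)
    return {
        diet: {gene: median(col) for gene, col in zip(gene_columns, zip(*vectors))}
        for diet, vectors in grouped.items()
    }


# Hypothetical three-species excerpt with two gene columns.
demo = (
    "binomial,diet,habitat,OG1,OG2\n"
    "Bostar,herbivore,terrestrial,4,1\n"
    "Oviari,herbivore,terrestrial,6,1\n"
    "Felcat,carnivore,terrestrial,1,3\n"
)
genes, rows = load_dfa_table(io.StringIO(demo))
summary = median_by_diet(genes, rows)
print(summary)
```

The per-row feature vectors returned by `load_dfa_table` have the shape that most classifier libraries expect, so they could be passed to a discriminant analysis routine of the reader's choice.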
Methods
Publicly available data were used. The scripts for processing the data are uploaded here.