Natural selection and language genes in humans
Data files
Aug 29, 2025 version files, 5.31 MB total
- Lang_Supplemental_File1.zip (3.70 MB)
- Language_suppl_figures.pdf (1.46 MB)
- README.md (14.72 KB)
- Supplemental_Table_S1.csv (33.83 KB)
- Supplemental_Table_S2.csv (15.15 KB)
- Supplemental_Table_S3A.csv (1.79 KB)
- Supplemental_Table_S3B.csv (3.58 KB)
- Supplemental_Table_S4.csv (1.24 KB)
- Supplemental_Table_S5.csv (6.70 KB)
- Supplemental_Table_S6.csv (75.54 KB)
Abstract
In this study, we construct lists of candidate genes for articulate language. Analysis of coding regions of over 100 candidate genes for the effects of natural selection (directional episodic selection and relaxed/intensified selection) in the various lineages of primates (thirty-four nonhuman primate species, plus Homo sapiens, Neanderthals, and Denisovans) revealed a burst of increased selection effects on neural genes at the node leading to the Homo sapiens-Neanderthal-Denisova triad, followed by bursts of selection effects on neural genes related to language in both the Denisovan and Neanderthal lineages. Those latter increases in involvement of neural genes in Neanderthals and Denisovans can be contrasted with the missing or slight response to selection on those same genes in the H. sapiens lineage. The genes involved in these bursts can mostly be classified as involved in synapse structure and maintenance. We develop a hypothesis for how synaptic efficiency could be related to language acquisition in these lineages.
Dataset DOI: 10.5061/dryad.sbcc2frk5
Description of the data and file structure
The data in this collection are supplemental to the paper "Natural Selection and Language Genes in Humans". DNA sequence matrices for 175 genes for 42 taxa can be found in "Lang_Supplemental_File1.zip". All sequences used to construct these matrices were obtained from GenBank. These matrices were used to perform all tests for selection reported in the paper, which were run with DATAMONKEY (https://www.datamonkey.org/). The study examines natural selection in these genes, some of which are involved in language acquisition. Once we obtained measures of selection, genes showing evidence of selection were examined for interactions using STRING (https://string-db.org/).
Files and variables
File: Lang_Supplemental_File1.zip
Description: This zip file holds phylogenetic matrices for 175 genes that we examined for natural selection. These files can be imported directly into DATAMONKEY (https://www.datamonkey.org/) and analyzed using the BUSTED, aBSREL and RELAX algorithms, with lemurs as outgroups. This online program is freely available and can also be downloaded as a standalone desktop application (announced in the following published paper: https://pmc.ncbi.nlm.nih.gov/articles/PMC5850112/).
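Before uploading a .fasta matrix, a quick local check can confirm that it looks like a codon alignment. The following is a minimal sketch, not part of the published workflow. It assumes that every sequence in a file has the same length and that this length is a multiple of three, which is what a codon-aware alignment would normally produce. The taxon labels and sequences in the example are invented for illustration.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {taxon_label: sequence}, joining wrapped lines."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {taxon: "".join(parts) for taxon, parts in records.items()}


def check_codon_alignment(records: Dict[str, str]) -> Dict[str, object]:
    """Report taxon count, whether all rows share one length, and codon framing."""
    lengths = {len(seq) for seq in records.values()}
    aligned = len(lengths) == 1
    length = lengths.pop() if aligned else None
    return {
        "n_taxa": len(records),
        "aligned": aligned,
        "length": length,
        # Assumption: a codon-aware alignment has a length divisible by 3.
        "whole_codons": aligned and length % 3 == 0,
    }


# Hypothetical two-taxon example; real files would be opened with open(path).
example = [">SAP", "ATGGCC---TAA", ">NEA", "ATGGCCGCATAA"]
report = check_codon_alignment(read_fasta(example))
```

For a real file, `read_fasta(open(path))` can be passed to `check_codon_alignment` in the same way. The .nex files would need a separate reader.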
The files included in this zip archive are listed below. The file type is at the end of the name. Three file types are given here (.txt, .nex and .fasta), and all three are compatible with DATAMONKEY. The naming convention is as follows: the gene name is given first in capital letters, and the information between the gene name and the file type describes how the phylogenetic matrix was aligned. All files have an "nt_ali." tag indicating that the file is made of nucleotides that have been aligned using a codon-aware algorithm (TranslatorX, a freely available alignment program; see Abascal, Federico, Rafael Zardoya, and Maximilian J. Telford. "TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations." Nucleic Acids Research 38, no. suppl_2 (2010): W7-W13; available at http://161.111.160.230/index_v5.html). The "proteins", "res.", "mafft." and "DNA_sequence.txt" tags simply refer to the format of the data matrix before it was converted to a TranslatorX codon-aware alignment.
ABDH8_proteins.txt.nt_ali.fasta
ABDH8_res.nt_ali.fasta.nex
ABDH8*_res.nt_ali.fasta
ABTB3.BTBD11.mafft.txt.nt_ali.fasta
ADSL_DNA_sequence.txt.nt_ali.fasta
AKT1.sequence.txt.nt_ali.fasta
ANXA1_mafft.txt.nt_ali.fasta
BTB.12_mafft.txt.nt_ali.fasta.txt
ARFGEF2.sequence.txt.nt_ali.fasta
ARHGAP32_DNA_sequence.txt.nt_ali.fasta
ARMCX2_res.nt_ali.fasta.nex
ASPM.mafft.txt.nt_ali.fasta
AUTS2_mafft.txt.nt_ali.clean.nex
BCL2L1.mafft.txt.nt_ali.fasta
BSN_sequence.txt.nt_ali.fasta
CACNA1C_mafft.txt.nt_ali.fasta
CACNA1C*_mafft.txt.nt_ali.fasta
CACNA1E.sequence.txt.nt_ali.fasta
CACNA1I_mafft.txt.nt_ali1.fasta
CACNA1I_mafft.txt.nt_aliA.fasta
CACNA1S_proteins.txt.nt_ali.fasta
CACNA2D2.sequence.txt.nt_ali.fasta
CACNB4.mafft.txt.nt_ali.fasta
CDH13.mafft.txt.nt_ali.fasta
CDKL5_res.nt_ali.fasta.nex
CFAP99.mafft.txt.nt_ali.fasta
CHEK1_proteins.txt.nt_ali.fasta
CHRM3.mafft.txt.nt_ali.fasta
CHRNA4_proteins.txt.nt_ali.fasta
CLCN2.sequence.txt.nt_ali.fasta
CLDN5.mafft.txt.nt_ali.fasta
CLOCK.sequence.txt.nt_ali.fasta
CNTNAP2.mafft.txt.nt_ali.fasta
CNTNAP4.sequence.txt.nt_ali.fasta
COMT.mafft.txt.nt_ali (modified).fasta
CORO2B_proteins.txt.nt_ali.fasta
CRHR1**.mafft.txt.nt_ali.fasta
CSMD1.mafft.txt.nt_ali.fasta
CSNK1G2_proteins.txt.nt_ali.fasta
CTNNB1.mafft.txt.nt_ali.fasta
CTNND2.mafft.txt.nt_ali.fasta
CUX1.mafft.txt.nt_ali.fasta
CUX2.mafft.txt.nt_ali.fasta
DBN1.sequence.txt.nt_ali.fasta
DCC.sequence.txt.nt_ali.fasta
DGKQ_proteins.txt.nt_ali.fasta
DHCR7_proteins.txt.nt_ali.fasta
DISC1.sequence.txt.nt_ali.fasta
DLG2.mafft.txt.nt_ali.fasta
DNAAF4_DYX1C1_mafft.txt.nt_ali.fasta
DRD1_proteins.txt.nt_ali.fasta
DTNBP1_mafft.txt.nt_ali.fasta
EDNRA.sequence.txt.nt_ali.fasta
EDNRB.sequence.txt.nt_ali.fasta
EGR3_proteins.txt.nt_ali.fasta
EN2.mafft.txt.nt_ali.fasta
EPN2_proteins.txt.nt_ali.fasta
ESR1.mafft.txt.nt_ali.fasta
EVC2_DNA_sequence.txt.nt_ali.fasta
FGFR1.sequence.txt.nt_ali.fasta
FGFR2.sequence.txt.nt_ali.fasta
FILIP1_proteins.txt.nt_ali.fasta
FOXP2.mafft.txt.nt_ali.fasta
GABRA4.mafft.txt.nt_ali.fasta
GABRB3.mafft.txt.nt_ali.fasta
GABRG2_proteins.txt.nt_ali.fasta
GGCX_DNA_sequence.txt.nt_ali.fasta
GLRA2.mafft.txt.nt_ali.fasta
GNPTAB.mafft.txt.nt_ali.fasta
GPR83_proteins.txt.nt_ali.fasta
GPR101_proteins.txt.nt_ali.fasta
GRID2_mafft.txt.nt_ali.fasta
GRIN2A_proteins.txt.nt_ali.fasta
GRIN2B.mafft.txt.nt_ali.fasta
GTF2I.sequence.txt.nt_ali.fasta
HAPLN1.mafft.txt.nt_ali.fasta
HSPBP1_proteins.txt.nt_ali.fasta
HTR1A.sequence.txt.nt_ali.fasta
HTR2B_DNA_sequence.txt.nt_ali.fasta
IL1RAPL1.mafft.txt.nt_ali.fasta
ITPR2.mafft.txt.nt_ali.fasta
JAZF1_proteins.txt.nt_ali.fasta
KATNA1_DNA_sequence.txt.nt_ali.fasta
KATNA1_DNA_sequence.txt.nt_aliA.nex
KCNA4_proteins.txt.nt_ali.fasta
KCNC4_proteins.txt.nt_ali.fasta
KCNC4_res.nt_ali.fasta
KCNH1.mafft.txt.nt_ali.fasta
KCNH8.mafft.txt.nt_ali.fasta
KCNJ3_proteins.txt.nt_ali.fasta
KCNK6.mafft.txt.nt_ali.fasta
KCNS3.mafft.txt.nt_ali.fasta
KCNT2_proteins.txt.nt_ali.fasta
KIT.sequence.txt.nt_ali.fasta
KRT90_res.nt_ali.fasta
LGR6.mafft.txt.nt_ali.fasta
LHX1_proteins.txt.nt_ali.fasta
LRP2_proteins.txt.nt_ali.fasta
LRRK2.mafft.txt.nt_ali.fasta
LSAMP_mafft.txt.nt_ali.fasta
LUZP1_DNA_sequence.txt.nt_ali.fasta
LYNX1_proteins.txt.nt_ali.fasta
MAD1l1.sequence.txt.nt_ali.fasta
MAOA.mafft.txt.nt_ali.fasta
MAPT.mafft.txt.nt_ali.fasta
MCAT_proteins.txt.nt_ali.fasta
MCPH1.mafft.txt.nt_ali.fasta
MGLL_proteins.txt.nt_ali.fasta
NAV3_proteins.txt.nt_ali.fasta
NBPF3.mafft.txt.nt_ali.fasta
NCAM1.sequence.txt.nt_ali.fasta
NEFM.maafft.txt.nt_ali.fasta
NLGN1.mafft.txt.nt_ali.fasta
NOS1.mafft.txt.nt_ali.fasta
NOVA1_DNA_sequence.txt.nt_ali.fasta
NPAS1.sequence.txt.nt_ali.fasta
NPAS3_proteins.txt.nt_ali.fasta
NPSR1.mafft.txt.nt_ali.fasta
NR4A2_proteins.txt.nt_ali.fasta
NRCAM.mafft.txt.nt_ali.fasta
NRXN1_mafft.txt.nt_ali.fasta
NTRK2.mafft.txt.nt_ali.fasta
OPRL1_proteins.txt.nt_ali.fasta
PARK7.mafft.txt.nt_ali.fasta
PCDH15.mafft.txt.nt_ali.fasta
PCDH17.sequence.txt.nt_ali.fasta
PCNT.mafft.txt.nt_ali.fasta
PCOLCE2_proteins.txt.nt_ali.fasta
PDE4B.mafft.txt.nt_ali.fasta
POMT1_proteins.txt.nt_ali.fasta
PRKCB.mafft.txt.nt_ali.fasta
PRKCB**.mafft.txt.nt_ali.fasta
RAPGEF4.mafft.txt.nt_ali.fasta
RELN.mafft.txt.nt_ali.fasta
RGMB_proteins.txt.nt_ali.fasta
RIMS1**_res.nt_ali.fasta
RNFT2_proteins.txt.nt_ali.fasta
RPRM_proteins.txt.nt_ali.fasta
RTN4R_proteins.txt.nt_ali.fasta
S100B.mafft.txt.nt_ali.fasta
SCAND1_proteins.txt.nt_ali.fasta
SCN10A_proteins.txt.nt_ali.fasta
SCN11A.mafft.txt.nt_ali.fasta
SLC6A11.mafft.txt.nt_ali.fasta
SLC6A12.mafft.txt.nt_ali.fasta
SLC17A8.mafft.txt.nt_ali.fasta
SLITRK1_DNA_sequence.txt.nt_ali.fasta
Snap25_mafft.txt.nt_trim_ali.fasta
SNTA1_proteins.txt.nt_ali.fasta
SPP1.sequence.txt.nt_ali.fasta
SRD5A1_proteins.txt.nt_ali.fasta
SRGAP2_proteins.txt.nt_ali.fasta
SST_proteins.txt.nt_ali.fasta
STX8_proteins.txt.nt_ali.fasta
STX11_proteins.txt.nt_ali.fasta
STX16.mafft.txt.nt_ali.fasta
SUSD6_proteins.txt.nt_ali.fasta
SV2c_trimfix_align.fasta.nex
SYNE1.mafft.txt.nt_ali.fasta
SYT2.mafft.txt.nt_ali.fasta
SYT7.sequence.txt.nt_ali.fasta
TAC3_proteins.txt.nt_ali.fasta
TF.mafft.txt.nt_ali.fasta
THSD7B_proteins.txt.nt_ali.fasta
TIAM1_MAFFT.txt.nt_ali.fasta
TRPC4_proteins.txt.nt_ali.fasta
UBASH3B.mafft.txt.nt_ali.fasta
UBE2W_proteins.txt.nt_ali.fasta
USP10.mafft.txt.nt_ali.fasta
VAPA.mafft.txt.nt_ali.fasta
VAT1L.mafft.txt.nt_ali.fasta
VCAN_DNA_sequence.txt.nt_ali.fasta
WFS1.mafft.txt.nt_ali.fasta
WNK1.mafft.txt.nt_ali.fasta
WNT7A_proteins.txt.nt_ali.fasta
YIPF3_proteins.txt.nt_ali.fasta
ZNF197.mafft.txt.nt_ali.fasta
ZNF713_proteins.txt.nt_ali.fasta
ZNRF1_proteins.txt.nt_ali.fasta
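The naming convention described above can be split into its parts programmatically. The sketch below is a heuristic, not a tool distributed with the dataset. It assumes the gene name is the leading run of letters and digits, that the pre-alignment tag is one of the strings seen in the listing ("proteins", "res", "mafft", "sequence", "DNA_sequence"), and that the file type is whatever follows the final dot. The names `AlignmentName` and `parse_alignment_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Tags seen between the gene name and "nt_ali" in the listing above.
SOURCE_TAGS = ("DNA_sequence", "proteins", "sequence", "mafft", "res")
CANONICAL = {tag.lower(): tag for tag in SOURCE_TAGS}

# A tag must sit between "_" or "." delimiters so that gene names
# containing e.g. "res" are not mistaken for a tag.
TAG_PATTERN = re.compile(
    r"(?<=[_.])(" + "|".join(SOURCE_TAGS) + r")(?=[_.])", re.IGNORECASE
)


class AlignmentName(NamedTuple):
    gene: str               # leading gene symbol, as written in the file name
    source: Optional[str]   # pre-alignment format tag, or None if not recognised
    extension: str          # text after the final dot


def parse_alignment_name(filename: str) -> AlignmentName:
    """Split a matrix file name into gene, pre-alignment tag and file type."""
    gene_match = re.match(r"[A-Za-z0-9]+", filename)
    gene = gene_match.group(0) if gene_match else ""
    tag_match = TAG_PATTERN.search(filename)
    source = CANONICAL[tag_match.group(1).lower()] if tag_match else None
    extension = filename.rsplit(".", 1)[-1]
    return AlignmentName(gene, source, extension)


parsed = parse_alignment_name("FOXP2.mafft.txt.nt_ali.fasta")
```

File names that depart from the convention (for example a misspelled tag or an alias after the gene symbol) return `None` for the tag or only the first symbol, so results for such files should be checked by hand.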
File: Language_suppl_figures.pdf
Description: This .pdf file holds the supplemental figures for "Natural Selection and Language Genes in Humans". All figures were produced using STRING.
Supplemental Figure S1. Interaction networks of two control gene sets. A. Network for 100 randomly chosen genes from the human genome. B. Network for the neural gene control dataset described in the text.
Supplemental Figure S2. A. Gene network for 21 randomly chosen control neural genes (out of 53). B. Gene network for 21 control genes randomly chosen from the human genome. C. Gene network for 21 genes identified as under selection by the aBSREL routine. Three sub-networks are clear: 1) NPAS3-AUTS2-RELN-CUX2, 2) GABRG2-MCPH1 and 3) CACNA1E-CACNA2D2-CACNA1C-NOS1.
Files: Supplemental_Table_S*
Supplemental_Table_S1.csv
Lists of candidate genes for language from 33 studies.
The headers represent published articles used in the study that are discussed in "Natural Selection and Language Genes in Humans". The candidate genes described in each publication are listed under the publication name. Empty cells can be ignored.
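Because each column of this table is an independent list of unequal length, it is convenient to read it column by column and drop the empty padding cells. The following is a minimal sketch using only the Python standard library. The function name `read_column_lists` is hypothetical, and the headers and gene assignments in the example are invented placeholders, not the contents of the real table.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_column_lists(handle: TextIO) -> Dict[str, List[str]]:
    """Return {header: [non-empty cells]} for a CSV whose columns are lists."""
    reader = csv.reader(handle)
    headers = [h.strip() for h in next(reader)]
    columns: Dict[str, List[str]] = {h: [] for h in headers if h}
    for row in reader:
        for header, cell in zip(headers, row):
            value = cell.strip()
            # Empty cells only pad shorter columns, so they are skipped.
            if header and value:
                columns[header].append(value)
    return columns


# Placeholder data: two invented study headers with made-up gene assignments.
sample = io.StringIO("Study A,Study B\nFOXP2,CNTNAP2\nAUTS2,\n")
genes_by_study = read_column_lists(sample)
```

With the real file, `read_column_lists(open("Supplemental_Table_S1.csv", newline=""))` would be used instead of the in-memory sample. The same reader applies to the other tables whose columns are lists padded with empty cells.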
Supplemental_Table_S2.csv
1000 neural genes: lists 1000 genes thought to be involved in language and indicates whether each gene has synonymous (silent) or nonsynonymous (replacement) variation. Empty cells can be ignored.
The four columns in the table are:
A. Gene symbol (gene symbol for the 1000 genes)
B. replacement only (whether the gene has replacement substitutions ONLY)
C. silent only (whether the gene has silent substitutions ONLY)
D. silent and replacement (whether the gene has BOTH silent and replacement substitutions)
Supplemental_Table_S3A.csv
This table contains lists of the candidate and control genes we used in tests for natural selection. We also indicate in this table whether or not the control genes have synonymous or nonsynonymous variation. Under each header we list control and candidate genes; empty cells can be ignored. There are three columns with the following content:
A. Final candidate gene list
B. Final control gene list
C. Indicates whether the control gene in column B has synonymous or nonsynonymous variation.
Supplemental_Table_S3B.csv
This table contains gene lengths for all of the candidate and control genes. Under each header we list the genes; empty cells can be ignored.
The six columns in the table are:
A. Candidate genes
B. Candidate gene Length
C. control genes with syn and nonsyn variation
D. Length of genes with syn and nonsyn variation
E. Control genes without syn or nonsyn variation
F. Length of control genes without syn or nonsyn variation
Supplemental_Table_S4.csv
List of primate taxa used. Under the headers we list the taxon name and the abbreviation used in the study; empty cells can be ignored.
The two columns are:
A. Taxon abbreviation
B. Taxon name
Supplemental_Table_S5.csv
List of phylogenetic matrices used in the study. These files can be found in Supplemental File 1, and the description of the file name components is given in the caption for Supplemental File 1. Empty cells can be ignored. NA indicates control genes that were not analyzed due to alignment problems.
The four columns are:
A. Candidate genes
B. Cand_File_Name
C. Control genes
D. Cont_File_Name
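The gene-to-file pairing in this table can be turned into a lookup that skips the NA entries. The following is a minimal sketch. It assumes the CSV header row uses the column names given above as its labels, which may not match the file exactly. The function name `map_genes_to_files` and the gene names in the example are hypothetical placeholders.

```python
import csv
import io
from typing import Dict, TextIO


def map_genes_to_files(handle: TextIO, gene_col: str, file_col: str) -> Dict[str, str]:
    """Build {gene: matrix file name}, skipping empty cells and NA entries."""
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(handle):
        gene = (row.get(gene_col) or "").strip()
        file_name = (row.get(file_col) or "").strip()
        # NA marks control genes that were not analyzed; empty cells are padding.
        if not gene or not file_name or file_name.upper() == "NA":
            continue
        mapping[gene] = file_name
    return mapping


# Placeholder rows: GENE_A and GENE_B are invented, not genes from the table.
sample = io.StringIO(
    "Control genes,Cont_File_Name\n"
    "GENE_A,GENE_A.mafft.txt.nt_ali.fasta\n"
    "GENE_B,NA\n"
    ",\n"
)
control_files = map_genes_to_files(sample, "Control genes", "Cont_File_Name")
```

Passing the column names as arguments keeps the sketch usable for both the candidate pair (columns A and B) and the control pair (columns C and D), once the actual header strings in the file are confirmed.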
Supplemental_Table_S6.csv
Results of aBSREL, RELAX and BUSTED
There are 25 columns in the table that describe the results of the aBSREL, RELAX and BUSTED tests as described in "Natural Selection and Language Genes in Humans".
Empty cells can be ignored. The column labels and their meanings are listed below:
A. aBSREL-candidate gene: aBSREL-candidate genes and control genes tested
B. Triad or Single: Indicates whether the tested group was the triad (sapiens+Neanderthal+Denisova) or a single taxon. "triad" indicates that the triad node was tested; other listings indicate that either NEA (Neanderthal), DEN (Denisova) or SAP (sapiens) had identical sequences and the test was not done on the triad.
C. aBSREL-Candidate/Control gene result: Result for the candidate and control genes using the aBSREL test with significance level; "NA" indicates not performed due to matrix size.
D. RELAX: first screen: List of the 122 candidate genes tested using RELAX
E. RELAX-Denisova/bb: Results of RELAX test for Denisova/bb: Denisova vs primate backbone contrasted
F. RELAX-Neanderthal/bb: Results of RELAX test for Neanderthal/bb: Neanderthal vs primate backbone contrasted
G. RELAX-sapiens/bb: Results of RELAX test for sapiens/bb: sapiens vs primate backbone contrasted
H. RELAX-triad/bb: Results of RELAX test for the triad/bb: (sapiens+Neanderthal+Denisova) vs primate backbone contrasted
I. triad or single archaic: Indicates whether the test involved the triad or if Denisova and Neanderthal were identical
J. RELAX-Candidate gene: Candidate genes tested for multiple nodes: gene name of the 20 candidate genes with positive results from previous test
K. DENvBB: Results of RELAX test for DENvBB: with Denisova vs primate backbone
L. NEAvBB: Results of RELAX test for NEAvBB: with Neanderthal vs primate backbone
M. SVBB: Results of RELAX test for SvBB: with sapiens vs primate backbone
N. triadvBB: Results of RELAX test for triadvBB: with triad (sapiens+Neanderthal+Denisova) vs primate backbone
O. apes: Results of RELAX test for apes : with apes vs rest of primate backbone
P. C+triad: Results of RELAX test for C+triad: with chimps plus triad vs rest of primate backbone
Q. G+C+triad: Results of RELAX test for G+C+triad: with gorillas, chimps plus triad vs rest of primate backbone
R. BUSTED-candidate-gene: Candidate genes tested with BUSTED for the 122 candidate genes
S. Triad: Whether the clade tested used the (sapiens+Neanderthal+Denisova) triad or if two of the triad were identical
T. BUSTED-candidate gene -result: Results of BUSTED tests for 119 candidate genes
U. BUSTED-control-gene: Control genes tested with BUSTED
V. BUSTED-control gene -result: Results of BUSTED tests for 54 control genes
W. summary-aBSREL: gene list-summary-aBSREL gives the list of genes detected using aBSREL
X. summary-busted: gene list-summary-busted gives the list of genes detected using BUSTED
Y. summary-relax: gene list-summary-relax gives the list of genes detected using RELAX
NP=not performed due to anomalies in matrix
Human subjects data
There are no PPI data in this paper. Individuals are NOT identified.
