Supplementary data for: Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression

Vanderpool, Dan1; Minh, Bin Quang2; Lanfear, Robert2; Hughes, Daniel3; Murali, Shwetha3; Harris, R. Alan3; Raveendran, Muthuswamy3; Muzny, Donna M.3; Gibbs, Richard A.3; Worley, Kim C.3; Rogers, Jeffrey3; Hahn, Matthew W.1; Hibbins, Mark S.1; Williamson, Robert J.4

Published Nov 03, 2020 on Dryad. https://doi.org/10.5061/dryad.rfj6q577d

Abstract

Our understanding of the evolutionary history of primates is undergoing continual revision due to ongoing genome sequencing efforts. Bolstered by growing fossil evidence, these data have led to increased acceptance of once controversial hypotheses regarding phylogenetic relationships, hybridization and introgression, and the biogeographical history of primate groups. Among these findings is a pattern of recent introgression between species within all major primate groups examined to date, though little is known about introgression deeper in time. To address this and other phylogenetic questions, here we present new reference genome assemblies for three Old World Monkey species: Colobus angolensis ssp. palliatus (the black and white colobus), Macaca nemestrina (southern pig-tailed macaque), and Mandrillus leucophaeus (the drill). We combine these data with 23 additional primate genomes to estimate both the species tree and individual gene trees using thousands of loci. While our species tree is largely consistent with previous phylogenetic hypotheses, the gene trees reveal high levels of genealogical discordance associated with multiple primate radiations. We use strongly asymmetric patterns of gene tree discordance around specific branches to identify multiple instances of introgression between ancestral primate lineages. In addition, we exploit recent fossil evidence to perform fossil-calibrated molecular dating analyses across the tree. Taken together, our genome-wide data help to resolve multiple contentious sets of relationships among primates, while also providing insight into the biological processes and technical artifacts that led to the disagreements in the first place.

***********************************************

ALIGNMENTS

Aligned single-copy orthologs from primate datasets obtained via NCBI. Alignments were performed using GUIDANCE2 with MAFFT.

An example of an alignment script is below.

#!/bin/bash

#PBS -k o
#PBS -l nodes=1:ppn=8,vmem=40gb,walltime=45:00:00
#PBS -M ddvanderpool@gmail.com
#PBS -m abe
#PBS -N 1_CODON_AlignQSUB
#PBS -j oe

cd /N/u/danvand/Carbonate/work_Primates/ALIGN_all_PRIMATE_GROUPS_4TAX/CODON_UNALIGNED

for file in *unalign.fa;
do
dest="/N/u/danvand/Carbonate/work_Primates/ALIGN_all_PRIMATE_GROUPS_4TAX/CODON_UNALIGNED/";
f=${file%\_TRANS_unalign.fa};

/N/u/danvand/Carbonate/src/guidance.v2.02/www/Guidance/guidance.pl --seqFile "$dest""$file" --dataset "$f" --msaProgram MAFFT --outOrder as_input --proc_num 10 --seqCutoff 0.93 --colCutoff 0.95 --seqType codon --bootstraps 60 --outDir $PWD;

## The above line runs the file through GUIDANCE2, which generates column and sequence scores using 60 bootstrap replicates of MAFFT codon alignments.

/N/u/danvand/Carbonate/bin/maskLowScoreResidues.pl "$f".MAFFT.aln.With_Names "$f".MAFFT.Guidance2_res_pair_res.scr "$f".GUIDANCE.MASK_SEQ.fa 0.93 nuc;

## The above line uses the scores file to mask low-confidence residues.

/N/u/danvand/Carbonate/bin/trimal -in "$dest""$f".GUIDANCE.MASK_SEQ.fa -out "$dest""$f".GUIDANCE.MASK_SEQ_TRIMal.fa -gt 0.5 -cons 50;

## The above line runs trimAl to cut out sites present in fewer than 50% of the taxa (-gt 0.5), while retaining at least 50% of the original alignment columns (-cons 50).

/N/u/danvand/Carbonate/pythonscripts/Clean.N.py "$dest""$f".GUIDANCE.MASK_SEQ_TRIMal.fa;

## The above line is an ad hoc fix: trimAl was counting the masked residues (N's) as data and not trimming them. The easiest fix was to convert the N's to gaps.

/N/u/danvand/Carbonate/pythonscripts/remove_aln_gaps_update.pl "$dest""$f".GUIDANCE.MASK_SEQ_TRIMal.fa.NoNs.fa .5 .9 200 > "$dest""$f"_NO_GAP.fa;

## The above line takes the new gapped alignment and deletes a site if more than 1/2 of the taxa lack it. It masks a whole sequence if more than 10% of that sequence is missing or if it is under 200 bp.

/N/u/danvand/Carbonate/pythonscripts/ReplaceString.py "$f"_NO_GAP.fa "_.5_.9_200" "" "$f"_TRANS_GUIDANCE_TRIMAL_NoNcol_CODON.fa;

## The above line deletes the FASTA header annotation left by the previous script.

/N/u/danvand/Carbonate/pythonscripts/Remove_All_N_Taxa.py "$f"_TRANS_GUIDANCE_TRIMAL_NoNcol_CODON.fa;

## The above line removes taxa whose sequences are composed only of N's. This turns out to matter, and it serves as a final check on all sequences before they are finalized.

mv "$dest"*GUIDANCE* "$dest"../CODON_ALIGNED;
rm "$dest"*MAFFT* "$dest"*Seqs* "$dest"COS* "$dest"END* "$dest"Sample* "$dest"log "$dest"*NO_GAP.fa

## The above lines clean up intermediate files and keep only the output of the last few steps, so that a later step can be changed without realigning everything.

done;
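
The helper scripts called in the loop are not reproduced in this file. As a rough illustration of the N-to-gap step described in the comments, a minimal shell sketch might look like the following. The function name n_to_gap and the sed approach are assumptions made for illustration; this is not the content of Clean.N.py.

```shell
#!/bin/bash
# Hypothetical stand-in for the N-to-gap step; NOT the authors' Clean.N.py.
# Reads FASTA on stdin and writes FASTA on stdout.
n_to_gap() {
    # Header lines (starting with ">") are left untouched; on sequence
    # lines every masked residue (N or n) is replaced with a gap character.
    sed '/^>/!s/[Nn]/-/g'
}
```

It would be used as a filter, for example: n_to_gap < input.fa > output.NoNs.fa (both file names are placeholders).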

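Similarly, the final check that removes taxa made up only of N's could be sketched as below. The function name drop_all_n_taxa is hypothetical, and treating gap characters the same as N's is an assumption; the behavior of Remove_All_N_Taxa.py itself is not documented here.

```shell
#!/bin/bash
# Hypothetical stand-in for the all-N taxon check; NOT Remove_All_N_Taxa.py.
# Reads FASTA on stdin and writes only records with real residues to stdout.
# Multi-line sequences are joined onto a single line in the output.
drop_all_n_taxa() {
    awk '
        function emit() {
            # Keep the record only if something other than N, n or "-" remains.
            if (header != "" && seq ~ /[^Nn-]/) print header "\n" seq
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
        { seq = seq $0 }                                # accumulate sequence
        END { emit() }                                  # flush the last record
    '
}
```

Example: drop_all_n_taxa < aligned.fa > aligned.checked.fa (placeholder file names).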
***********************************************
