A core question in evolutionary biology is whether convergent phenotypic evolution is driven by convergent molecular changes in proteins or regulatory regions. We combined phylogenomic, developmental, and epigenomic analysis of 11 new genomes of paleognathous birds, including an extinct moa, to show that convergent evolution of regulatory regions, more so than protein-coding genes, is prevalent among developmental pathways associated with independent losses of flight. A Bayesian analysis of 284,001 conserved noncoding elements, 60,665 of which are corroborated as enhancers by open chromatin states during development, identified 2355 independent accelerations along lineages of flightless paleognaths, with functional consequences for driving gene expression in the developing forelimb. Our results suggest that the genomic landscape associated with morphological convergence in ratites has a substantial shared regulatory component.
Assemblies and associated code and log files
01_assembly.tar.gz
Genome annotations and associated files
02_annotation.tar.gz
Homology calls and associated files
03_homology.tar.gz
Whole genome alignment and associated files
04_whole_genome_alignment.tar.gz
Phylogenomic datasets and associated code
05_phylogenomics.tar.gz
Datasets for protein-coding gene analysis
06_protein_coding_datasets.tar.gz
Scripts and associated files for the analysis of protein-coding genes
06_protein_coding_scripts.tar.gz
Results from protein-coding analysis
06_protein_coding_results.tar.gz
Scripts and associated files for CNEE analysis
07_cnee_scripts.tar.gz
Datasets for CNEE analysis
07_cnees_datasets.tar.gz
Results of PhyloAcc analysis of CNEEs
07_cnees_phyloAcc_results.tar.gz
Parsed CNEE analysis files and output
07_cnees_processed.tar.gz
Data and results from ATAC-seq
08_ATACseq.tar.gz
Assembly fasta files
All new assemblies; these are the pre-GenBank versions used in our analysis
assemblies.tar.gz
Conserved element BED files
BED files of conserved elements and CNEEs on both galGal4 and galGal5 coordinates
cons_elem_beds.tar.gz
Enhancer screen image files
Raw images associated with in vivo tests of enhancer function
enhancer_screen_images.zip
ATAC-seq peak calls
ATAC-seq peaks called on galGal4 coordinates from a variety of tissues and time points (in chicken)
final_consistent_peaks.tar.gz
Annotations in GFF format
Gene annotations for each new assembly in GFF3 format
gffs.tar.gz
Aligned protein sequences (expanded set)
Protein alignments from homology groups estimated across birds (including new paleognaths). The expanded set additionally includes sequences from moa and flightless cormorant.
HOG_fastas_aligned_filtered_expanded.tar.gz
Codon alignments (expanded set)
Codon alignments (produced with PRANK) from homology groups estimated across birds (including new paleognaths). The expanded set additionally includes sequences from moa and flightless cormorant.
PAML_final_PRANK_fastas_expanded.tar.gz
Codon alignments
Codon alignments (produced with PRANK) from homology groups estimated across birds (including new paleognaths).
PAML_final_PRANK_fastas_original.tar.gz
Aligned protein sequences
Protein alignments from homology groups estimated across birds (including new paleognaths).
HOG_fastas_aligned_filtered_original.tar.gz
Protein FASTA files
FASTA files of annotated proteins from new genomes. One file per genome.
protein_fastas.tar.gz
Species trees
Species trees in Newick format
species_trees.tar.gz
Transcript FASTA files
Transcript sequences for newly annotated genomes in FASTA format. One file per species.
transcript_fastas.tar.gz
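Most of the files listed above are gzip-compressed tar archives, so it can be useful to check what an archive contains before unpacking it. The following is a minimal sketch using only the Python standard library; the helper name `list_archive_members` is hypothetical, and the script assumes that `species_trees.tar.gz` has already been downloaded to the working directory. The internal layout of the archives is not described here, so the sketch makes no assumptions about member names.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return (name, size_in_bytes) for each regular file in a .tar.gz archive.

    Hypothetical helper for illustration; it reads the archive index
    without extracting any files to disk.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


if __name__ == "__main__":
    # Assumption: the archive was downloaded to the current directory.
    archive = Path("species_trees.tar.gz")
    if archive.exists():
        for name, size in list_archive_members(archive):
            print(f"{size:>12}  {name}")
    else:
        print(f"{archive} not found; download it first")
```

The same function should work for any of the other `.tar.gz` files listed above; `enhancer_screen_images.zip` is a zip archive and would need the `zipfile` module instead.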