Biologists routinely use molecular markers to identify conservation units, to quantify genetic connectivity, to estimate population sizes, and to identify targets of selection. Many imperiled eagle populations require such efforts and would benefit from enhanced genomic resources. We sequenced, assembled, and annotated the first eagle genome using DNA from a male golden eagle (Aquila chrysaetos) captured in western North America. We constructed genomic libraries that were sequenced using Illumina technology and assembled the high-quality data to a depth of ~40x coverage. The genome assembly includes 2,552 scaffolds >10 Kb and 415 scaffolds >1.2 Mb. We annotated 16,571 genes that are involved in myriad biological processes, including such disparate traits as beak formation and color vision. We also identified repetitive regions spanning 92 Mb (~6% of the assembly), including LINEs, SINEs, LTR-RTs, and DNA transposons. The mitochondrial genome encompasses 17,332 bp and is ~91% identical to that of the Mountain Hawk-Eagle (Nisaetus nipalensis). Finally, the data reveal that several anonymous microsatellites commonly used for population studies are embedded within protein-coding genes and thus may not have evolved in a neutral fashion. Because the genome sequence includes ~800,000 novel polymorphisms, markers can now be chosen based on their proximity to functional genes involved in migration, carnivory, and other biological processes.
annotated genes_proteins
16,571 genes were annotated in the golden eagle genome; this FASTA file contains the protein sequences (see readme.pdf for more information). The corresponding file archived in fortress is kmer70_min10000_scaffolds_revisedassembly.all.maker.proteins.
kmer70_min10000_scaffolds_revisedassembly.all.maker.proteins.fasta
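A quick sanity check on a FASTA file such as this one is to count its records and compare the total with the number of annotated genes. The sketch below is illustrative only: the function name `read_fasta` is hypothetical, and it assumes a plain multi-line FASTA layout in which the record identifier is the first whitespace-delimited token of each header line.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines.

    Assumption: the record ID is the first whitespace-delimited token
    after '>' on each header line.
    """
    record_id = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            record_id = (line[1:].split() or [""])[0]
            chunks = []
        elif record_id is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if record_id is not None:
        yield record_id, "".join(chunks)


# Usage sketch (file name taken from the record above):
# with open("kmer70_min10000_scaffolds_revisedassembly.all.maker.proteins.fasta") as handle:
#     print(sum(1 for _ in read_fasta(handle)))
```

If each gene contributes exactly one protein record, the count should equal the number of annotated genes; that one-to-one relationship is an assumption and is not stated in the description.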
annotated genes_transcripts
16,571 genes were annotated in the golden eagle genome; this FASTA file contains the transcripts (see readme.pdf for additional information). The corresponding file archived in fortress is kmer70_min10000_scaffolds_revisedassembly.all.maker.transcripts.
kmer70_min10000_scaffolds_revisedassembly.all.maker.transcripts.fasta
mitochondrial genome_GE_MITObim_revised
Golden eagle mitochondrial genome sequence assembled with MITObim v. 1.6 and MIRA 3.4.1.1.
GE_MITObim_revised.fasta
xenobiotic sequences_Golden_Eagle_BLASTN_output
blastn (DiaGrid)
Input Options
Query File: GoldenEagle sequences with no significant match (E < 10^-6) to chicken genome
Database Selection: nt
Custom Database Format: N
Expectation Value: 1.0E-6
Word Size: 28
Search and Result Restriction Options
Max Target Seqs: 1000
Output Formatting Options
Format: 6
Custom Selection: qseqid sseqid qstart qend sstart send evalue bitscore pident length mismatch gapopen staxids sscinames scomnames sblastnames sskingdoms stitle
Golden_Eagle_BLASTN_output.txt
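The custom tabular layout listed under "Custom Selection" can be read back into named fields. The following is a minimal sketch, not the procedure used to produce the archive: `parse_blast_tab` and `best_hits` are hypothetical helper names, and it assumes the output is tab-delimited with the 18 columns in the order given above and, at most, a single optional header row beginning with `qseqid`.

```python
import csv
from typing import Dict, Iterable, Iterator

# Column order copied from the "Custom Selection" line of the BLAST settings.
COLUMNS = (
    "qseqid sseqid qstart qend sstart send evalue bitscore pident length "
    "mismatch gapopen staxids sscinames scomnames sblastnames sskingdoms stitle"
).split()
INT_FIELDS = {"qstart", "qend", "sstart", "send", "length", "mismatch", "gapopen"}
FLOAT_FIELDS = {"evalue", "bitscore", "pident"}


def parse_blast_tab(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per hit, with numeric columns converted."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blanks, comment lines, and an optional header row (assumption).
        if not row or row[0].startswith("#") or row[0] == "qseqid":
            continue
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        hit: Dict[str, object] = dict(zip(COLUMNS, row))
        for name in INT_FIELDS:
            hit[name] = int(row[COLUMNS.index(name)])
        for name in FLOAT_FIELDS:
            hit[name] = float(row[COLUMNS.index(name)])
        yield hit


def best_hits(hits: Iterable[Dict[str, object]]) -> Dict[str, Dict[str, object]]:
    """Keep the highest-bitscore hit for each query sequence."""
    best: Dict[str, Dict[str, object]] = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

Because up to 1000 targets were reported per query, reducing the table to one best hit per query and then tallying the `sskingdoms` field is one possible way to summarize the likely origin of the unmatched sequences.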
transposable elements_RepeatProteinMask output_ge_v2_all
############################
##### RepeatProteinMask ####
############################
## version: 4.0.2
## command:
RepeatProteinMask -noLowSimple -pvalue 1e-4 -engine abblast kmer70-v2-min200-scaffolds.fa
## output files:
ge_v2_all.annot
ge_v2_all.annot
transposable elements_repeatmasker_kmer70-v2-min200-scaffolds.fa
############################
###### RepeatMasker ########
############################
## RepeatMasker version 4.0.2
## RepeatMaskerLibrary-20130422 version
## command:
RepeatMasker -nolow -no_is -norna -dir . \
-lib RepeatMaskerLib.embl.lib \
kmer70-v2-min200-scaffolds.fa
## output files:
repeatmasker_kmer70-v2-min200-scaffolds.fa.masked
repeatmasker_kmer70-v2-min200-scaffolds.fa.tbl
repeatmasker_kmer70-v2-min200-scaffolds.fa.tar.gz
transposable elements_repeatmasker_kmer70-v2-min200-scaffolds.fa
############################
###### RepeatMasker ########
############################
## RepeatMasker version 4.0.2
## RepeatMaskerLibrary-20130422 version
## command:
RepeatMasker -nolow -no_is -norna -dir . \
-lib RepeatMaskerLib.embl.lib \
kmer70-v2-min200-scaffolds.fa
## output files:
repeatmasker_kmer70-v2-min200-scaffolds.fa.masked
repeatmasker_kmer70-v2-min200-scaffolds.fa.tbl
repeatmasker_kmer70-v2-min200-scaffolds.fa.tbl
annotated genes_gff
16,571 genes were annotated in the golden eagle genome; this is the .gff file associated with the annotations (see readme.pdf for additional information).
genome.all.gff
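To see which feature types a GFF file of this kind contains, the features can be tallied by source and type. This is a sketch under stated assumptions: the helper name `count_feature_types` is hypothetical, the file is assumed to follow the nine-column GFF3 layout, and an embedded `##FASTA` section (if present) is assumed to mark the end of the feature lines.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_feature_types(lines: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count GFF3 features keyed by (source, type).

    Columns: seqid, source, type, start, end, score, strand, phase, attributes.
    """
    counts: "Counter[Tuple[str, str]]" = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # ignore malformed lines rather than guess
        counts[(fields[1], fields[2])] += 1
    return counts


# Usage sketch:
# with open("genome.all.gff") as handle:
#     counts = count_feature_types(handle)
#     print(counts[("maker", "gene")])
```

The `("maker", "gene")` key assumes that the gene models carry the source label `maker`; the actual label should be checked against the file itself before relying on the count.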
transposable elements_repeatmodeler_kmer70-v2-min200-scaffolds.fa.tar
############################
####### RepeatModeler ######
############################
## version: 1.0.7
## command:
BuildDatabase -name goldenEagle_v2_tmp -engine ncbi kmer70-v2-min200-scaffolds.fa
RepeatModeler -database goldenEagle_v2_tmp
RepeatMasker -nolow -no_is -norna -dir . -lib Ach-v2.consensi.fa.classified kmer70-v2-min200-scaffolds.fa
## output files:
Ach-v2.consensi.fa.classified
repeatmodeler_kmer70-v2-min200-scaffolds.fa.masked
repeatmodeler_kmer70-v2-min200-scaffolds.fa.tbl
repeatmodeler_kmer70-v2-min200-scaffolds.fa.tar.gz
transposable elements_repeatmodeler_kmer70-v2-min200-scaffolds.fa
############################
####### RepeatModeler ######
############################
## version: 1.0.7
## command:
BuildDatabase -name goldenEagle_v2_tmp -engine ncbi kmer70-v2-min200-scaffolds.fa
RepeatModeler -database goldenEagle_v2_tmp
RepeatMasker -nolow -no_is -norna -dir . -lib Ach-v2.consensi.fa.classified kmer70-v2-min200-scaffolds.fa
## output files:
Ach-v2.consensi.fa.classified
repeatmodeler_kmer70-v2-min200-scaffolds.fa.masked
repeatmodeler_kmer70-v2-min200-scaffolds.fa.tbl
repeatmodeler_kmer70-v2-min200-scaffolds.fa.tbl
transposable elements_Ach-v2.consensi.fa
############################
####### RepeatModeler ######
############################
## version: 1.0.7
## command:
BuildDatabase -name goldenEagle_v2_tmp -engine ncbi kmer70-v2-min200-scaffolds.fa
RepeatModeler -database goldenEagle_v2_tmp
RepeatMasker -nolow -no_is -norna -dir . -lib Ach-v2.consensi.fa.classified kmer70-v2-min200-scaffolds.fa
## output files:
Ach-v2.consensi.fa.classified
repeatmodeler_kmer70-v2-min200-scaffolds.fa.masked
repeatmodeler_kmer70-v2-min200-scaffolds.fa.tbl
Ach-v2.consensi.fa.classified
transposable elements_kmer70-v2-min200-scaffolds.fa.2.7.7.80.10.50.200
############################
########### trf ############
############################
## version: 4.04
## command:
trf kmer70-v2-min200-scaffolds.fa 2 7 7 80 10 50 200 -f -h
## output files:
kmer70-v2-min200-scaffolds.fa.2.7.7.80.10.50.200.dat
kmer70-v2-min200-scaffolds.fa.2.7.7.80.10.50.200.dat
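The seven numbers in the trf command and in the output file name are the positional parameters of Tandem Repeats Finder, in the order Match, Mismatch, Delta, PM, PI, Minscore, MaxPeriod. The sketch below recovers them from an output file name; `decode_trf_name` is a hypothetical helper written for illustration and is not part of trf.

```python
from typing import Dict, Tuple

# Positional parameter order of the trf command line.
TRF_PARAM_NAMES = ("match", "mismatch", "delta", "pm", "pi", "minscore", "maxperiod")


def decode_trf_name(filename: str) -> Tuple[str, Dict[str, int]]:
    """Split a trf output name into (input file name, named parameters)."""
    stem = filename[:-4] if filename.endswith(".dat") else filename
    parts = stem.split(".")
    values = parts[-7:]
    if len(parts) < 8 or not all(v.isdigit() for v in values):
        raise ValueError(f"not a trf-style output name: {filename!r}")
    params = dict(zip(TRF_PARAM_NAMES, (int(v) for v in values)))
    return ".".join(parts[:-7]), params


source, params = decode_trf_name(
    "kmer70-v2-min200-scaffolds.fa.2.7.7.80.10.50.200.dat"
)
print(source)               # kmer70-v2-min200-scaffolds.fa
print(params["minscore"])   # 50
print(params["maxperiod"])  # 200
```

For this run, that reading gives a match weight of 2, mismatch and indel penalties of 7, match and indel probabilities of 80 and 10, a minimum alignment score of 50, and a maximum period size of 200.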
microsatellites_kmer70_min200_scaffolds_revisedassembly_allmicrosatellites.fasta
I used the MISA.pl script (http://pgrc.ipk-gatersleben.de/misa/misa.html) to identify all microsatellites present in scaffolds greater than 200 bp.
Misa.pl kmer70_min200_scaffolds_revisedassembly.fasta
kmer70_min200_scaffolds_revisedassembly_allmicrosatellites.fasta.misa
genome assembly_kmer70_min10000_scaffolds_revisedassembly
The golden eagle genome was assembled in ABySS with the following parameters: abyss-pe s=202 n=10 k=70 l=30, followed by file specifications for 1) 'lib' -- the paired-end and mate-pair reads, 2) 'se' -- the unpaired and mate-pair reads used as single-end reads in assembly, and 3) 'mp' -- the paired-end and mate-pair reads used in scaffolding.
kmer70_min10000_scaffolds_revisedassembly.tar.gz
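Summary figures such as the number of scaffolds above a size threshold can be recomputed from the assembly FASTA. The following is a minimal sketch: the helper names are hypothetical, and it assumes that the archive unpacks to a standard FASTA file of scaffolds (the name of the file inside the archive is not given here).

```python
from typing import Iterable, List


def scaffold_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def count_longer_than(lengths: Iterable[int], threshold: int) -> int:
    """Count scaffolds strictly longer than `threshold` bases."""
    return sum(1 for n in lengths if n > threshold)


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for n in ordered:
        running += n
        if running * 2 >= total:
            return n
    return 0


# Usage sketch, with a placeholder path for the unpacked FASTA:
# with open("scaffolds.fasta") as handle:
#     lengths = scaffold_lengths(handle)
#     print(count_longer_than(lengths, 10_000), count_longer_than(lengths, 1_200_000))
```

The thresholds in the usage sketch correspond to the ">10 Kb" and ">1.2 Mb" scaffold counts quoted in the abstract; whether the recomputed values match depends on which scaffolds the archive actually contains.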