Researchers have assembled thousands of eukaryotic genomes using Illumina reads, but traditional mate-pair libraries cannot span all repetitive elements, resulting in highly fragmented assemblies. However, both chromosome conformation capture techniques, such as Hi-C and Dovetail Genomics Chicago libraries, and long-read sequencing, such as Pacific Biosciences and Oxford Nanopore, help span and resolve repetitive regions and therefore improve genome assemblies. One important livestock species of arid regions that does not have a high-quality contiguous reference genome is the dromedary (Camelus dromedarius). Draft genomes exist but are highly fragmented, and a high-quality reference genome is needed to understand adaptation to desert environments and artificial selection during domestication. Dromedaries are among the last livestock species to have been domesticated, and together with wild and domestic Bactrian camels, they are the only representatives of the Camelini tribe, which highlights their evolutionary significance. Here we describe our efforts to improve the North African dromedary genome. We used Chicago and Hi-C sequencing libraries from Dovetail Genomics to resolve the order of previously assembled contigs, producing almost chromosome-level scaffolds. Remaining gaps were filled with Pacific Biosciences long reads, and then scaffolds were comparatively mapped to chromosomes. Long reads added 99.32 Mbp to the total length of the new assembly. Dovetail Chicago and Hi-C libraries increased the longest scaffold over 12-fold, from 9.71 Mbp to 124.99 Mbp, and the scaffold N50 over 50-fold, from 1.48 Mbp to 75.02 Mbp. We demonstrate that Illumina de novo assemblies can be substantially upgraded by combining chromosome conformation capture and long-read sequencing.
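For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: the length of the scaffold at which the running sum of lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not part of the deposited analysis; only the four Mbp values in the fold-change lines come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half the total
        # defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical toy scaffold lengths (NOT taken from the dromedary assemblies).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)

# Fold changes implied by the values reported in the abstract (Mbp).
longest_fold = 124.99 / 9.71   # longest scaffold, after / before
n50_fold = 75.02 / 1.48        # scaffold N50, after / before
```

With the toy lengths, the total is 150, so the running sum first reaches 75 at the second scaffold. The two ratios are consistent with the "over 12-fold" and "over 50-fold" statements.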
CamDro2 predicted proteins
CamDro2 genome predicted proteins produced by a single round of MAKER (Holt and Yandell: MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 2011 12:491).
dromedary.pbjelly.pilon.abyss.pilon.all.fasta.all.maker.proteins.renamed.annotated.fasta.zip
CamDro2 predicted mRNA
CamDro2 genome predicted mRNA transcripts produced by a single round of MAKER (Holt and Yandell: MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 2011 12:491).
dromedary.pbjelly.pilon.abyss.pilon.all.fasta.all.maker.transcripts.renamed.annotated.fasta.zip
CamDro1 assembly (GCA_000803125.1) improved by Dovetail Genomics Chicago and Hi-C libraries, gzipped (www.gzip.org) FASTA file
Dovetail Genomics created Chicago and Dovetail Hi-C libraries from a low-passage cell culture line (Perelman et al. 2018) derived from ear fibroblasts of the same dromedary used in CamDro1 (GCA_000803125.1). Dovetail Genomics created both Chicago and Hi-C libraries with the DpnII restriction enzyme, sequenced these libraries on six lanes of an Illumina HiSeq sequencer, and then scaffolded the CamDro1 assembly using the HiRise pipeline (Putnam et al. 2016). First, the CamDro1 assembly was split at gaps and scaffolded using Dovetail Chicago data. Then, the Chicago assembly was improved by scaffolding with Hi-C data, creating a Hi-C assembly. The assembly is provided as a gzipped (www.gzip.org) FASTA file.
dromedary.fasta.gz
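The "split at gaps" step described above can be illustrated with a minimal sketch that breaks a scaffold sequence into contigs wherever a run of N characters occurs. This is only an illustration of the general idea, not the HiRise implementation; the function name `split_at_gaps`, the `min_gap` parameter, and the toy sequence are assumptions introduced here.

```python
import re


def split_at_gaps(sequence, min_gap=1):
    """Split a scaffold into contigs at runs of at least `min_gap` N/n bases.

    Hypothetical helper for illustration; HiRise's own gap handling may differ.
    """
    gap_pattern = re.compile(r"[Nn]{%d,}" % min_gap)
    # Drop empty pieces produced by leading or trailing gaps.
    return [piece for piece in gap_pattern.split(sequence) if piece]


# Toy scaffold with two gaps (not real dromedary sequence).
toy_scaffold = "ACGTNNNNNGGCCNNTTAA"
contigs = split_at_gaps(toy_scaffold)
long_gap_contigs = split_at_gaps(toy_scaffold, min_gap=3)
```

Raising `min_gap` keeps short N runs inside a contig, which is one possible design choice when short runs mark ambiguous bases and not true gaps.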
CamDro2 genome gzipped FASTA file
CamDro2 genome gzipped (www.gzip.org/) FASTA file. Chromosomes are identified by >1, >2, ..., >36, >X (for chromosomes 1, 2, ..., 36, X). There is no Y chromosome because the sequenced dromedary was female.
dromedary.pbjelly.pilon.abyss.pilon.chromosomes.fasta.gz
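As a quick sanity check after downloading, the chromosome naming scheme described above can be verified with a short script. The sketch below reads a gzipped FASTA file, records the length of each record, and reports which of the expected names (1 to 36 and X) are absent. The function names are hypothetical, and whether the file also contains records other than the 37 chromosomes is not stated here, so the check only looks for missing names.

```python
import gzip

# Expected record names per the description: 1..36 plus X.
EXPECTED_CHROMOSOMES = [str(i) for i in range(1, 37)] + ["X"]


def fasta_lengths(path):
    """Map each record name (first word after '>') to its sequence length."""
    lengths = {}
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def missing_chromosomes(lengths):
    """Return the expected chromosome names that are absent from `lengths`."""
    return [name for name in EXPECTED_CHROMOSOMES if name not in lengths]
```

A possible use, assuming the file sits in the working directory: `missing_chromosomes(fasta_lengths("dromedary.pbjelly.pilon.abyss.pilon.chromosomes.fasta.gz"))` should return an empty list if all 37 names are present.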
CamDro2 gene annotations in GFF format (zipped)
dromedary.pbjelly.pilon.abyss.pilon.all.renamed.annotated.maker.qualityfilter.gff.zip
cDNA transcripts used to train Augustus for the second round of MAKER
dromedary.pbjelly.pilon.abyss.pilon.all.maker.transcripts.cdna.for.augustus.training.zip
analysis-steps-for-manuscript
These are the rough analysis steps for the manuscript. If you find these steps helpful, please cite the manuscript!
RH-alpaca-probe-sequences
RH alpaca probe sequences for Steps 20, 28, and 33 of analysis-steps-for-manuscript.txt. The RH probe sequences are in 37 files (one each for chromosomes 1, 2, ..., 36, and X) contained in the master zip file. Although the files have a .txt extension, they are FASTA-formatted.
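Because the probe files carry a .txt extension but hold FASTA records, tools that guess the format from the file name may need help. The following is a minimal sketch that reads every .txt member of a zip archive as FASTA using only the Python standard library. The function name `read_fasta_from_zip` and any archive path passed to it are assumptions; the actual name of the master zip file is not given here.

```python
import io
import zipfile


def read_fasta_from_zip(zip_path):
    """Return {member_name: {record_name: sequence}} for each .txt member.

    Members are parsed as FASTA regardless of their .txt extension.
    """
    records = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".txt"):
                continue  # skip anything that is not a probe file
            sequences = {}
            name = None
            with archive.open(member) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if line.startswith(">"):
                        name = line[1:]
                        sequences[name] = ""
                    elif name is not None and line:
                        sequences[name] += line
            records[member] = sequences
    return records
```

If the archive matches the description, the returned dictionary would be expected to have 37 keys, one per chromosome file.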