Premise of the study: We used moderately low-coverage (17×) whole-genome sequencing of Artocarpus camansi (Moraceae) to develop genomic resources for Artocarpus and Moraceae. Methods and Results: A de novo assembly of Illumina short reads (251,378,536 pairs, 2 × 100 bp) accounted for 93% of the predicted genome size. Predicted coding regions were used in a three-way orthology search with published genomes of Morus notabilis and Cannabis sativa. Phylogenetic markers for Moraceae were developed from 333 inferred single-copy exons. Ninety-eight putative MADS-box genes were identified. Analysis of all predicted coding regions resulted in preliminary annotation of 49,089 genes. An analysis of synonymous substitutions for pairs of orthologs (Ks analysis) in M. notabilis and A. camansi strongly suggested a lineage-specific whole-genome duplication in Artocarpus. Conclusions: This study substantially increases the genomic resources available for Artocarpus and Moraceae and demonstrates the value of low-coverage de novo assemblies for nonmodel organisms with moderately large genomes.
Genome assemblies, annotations, targets, and bait sequences from Gardner et al. 2016
A_camansi_v1.0_scaffolds.fasta -- Genome assembly v. 1.0 (Ray 2.3.1)
A_camansi_v1.1_scaffolds.fasta -- Genome assembly v. 1.1 (L_RNA_Scaffolder)
A_camansi_v1.1_genepredictions.gff -- Gene predictions in GFF format (Augustus)
A_camansi_v1.1_genepredictions.fna -- Predicted coding sequences (Augustus)
A_camansi_v1.1_genepredictions.faa.aa -- Predicted protein sequences (Augustus)
A_camansi_v1.1_genepredictions_annotations.txt -- Gene annotations (Trinotate)
artocarpus_333genes.fasta -- 333 phylogenetic marker sequences
artocarpus_mbgenes.fasta -- 98 putative MADS-box gene sequences
artocarpus_volgenes.fasta -- 27 putative volatile gene sequences
artocarpus_baits.fasta -- 120-mer bait sequences (designed by MYcroarray)
Gardner_et_al_2016.tar.gz
HybPiper assemblies for Johnson et al. (2016)
Output of HybPiper for 22 Artocarpus species and six outgroups, including Ficus and Morus. BWA was used to map the reads to the targets, which comprise 333 genes used for phylogenetics (prefix "gene"), 98 MADS-box genes (prefix "MB"), and 27 genes functionally annotated as involved in volatile compounds (prefix "Vol"). After running the main HybPiper script (reads_first.py), we also ran "intronerate.py" to extract exon sequences, "paralog_investigator.py" to extract putative paralog sequences, "depth_calculator.py" to estimate depth of coverage in the recovered exon sequences, and "cleanup.py" to remove redundant files from the SPAdes contig assembly.
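The three locus classes can be separated by the name prefixes given above. The following is a minimal sketch, assuming each locus name begins directly with "gene", "MB", or "Vol"; the example names are hypothetical and the exact naming in the archive may differ.

```python
from collections import Counter

# Prefixes as described in the text; the mapping labels are illustrative.
PREFIX_CLASSES = {
    "gene": "phylogenetic",
    "MB": "MADS-box",
    "Vol": "volatile",
}


def classify_locus(name):
    """Return the class of a locus name based on its prefix, or 'unknown'."""
    for prefix, label in PREFIX_CLASSES.items():
        if name.startswith(prefix):
            return label
    return "unknown"


def count_classes(names):
    """Count how many locus names fall into each class."""
    return Counter(classify_locus(name) for name in names)


# Hypothetical locus names, for illustration only.
example_names = ["gene001", "gene002", "MB01", "Vol03"]
example_counts = count_classes(example_names)
```

Applied to the full target set, the counts would be expected to match the 333, 98, and 27 loci listed above (458 in total).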
Data from: Johnson M., E.M. Gardner, J. Shaw, Y. Liu, R. Medina, B. Goffinet, N.J.C. Zerega, and N. Wickett. HybPiper: extracting phylogenetic datasets from high-throughput sequencing reads using targeted bait capture. Applications in Plant Sciences
artocarpus_hybseq_bwa.tar.gz
Analyses from Johnson et al. 2016
README for HybPiper_artocarpus_analysis.tar.gz
artocarpus_trimmed.exon.tar.gz
Trimmed coding sequence (exon) alignments for 28 taxa for 333 genes. Sequences were aligned in MAFFT and trimmed using trimAl, discarding all columns with more than 80% missing data.
supercontig_trimmed_fasta.tar.gz
Trimmed supercontig sequence alignments (containing both exons and flanking "splash zone" intron sequence) for 28 taxa for 333 genes. Sequences were aligned in MAFFT and trimmed using trimAl, discarding all columns with more than 80% missing data.
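The column-trimming rule used for both alignment sets can be illustrated with a short sketch. This is not trimAl itself, only a plain-Python rendering of the stated rule; it assumes that "missing data" means gap characters ("-"), and the function name and example alignment are invented for illustration.

```python
def trim_columns(seqs, max_missing=0.8, missing_chars="-"):
    """Drop alignment columns whose missing fraction exceeds max_missing.

    seqs maps sequence names to aligned strings of equal length.
    """
    if not seqs:
        return {}
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(seqs[name]) != length for name in names):
        raise ValueError("sequences must be aligned to equal length")
    keep = []
    for i in range(length):
        missing = sum(seqs[name][i] in missing_chars for name in names)
        # Keep the column unless MORE than max_missing of the taxa lack data.
        if missing / len(names) <= max_missing:
            keep.append(i)
    return {name: "".join(seqs[name][i] for i in keep) for name in names}


# Invented five-taxon alignment; the third column is all gaps and is dropped.
example_alignment = {
    "a": "AC-G",
    "b": "AC-G",
    "c": "A--G",
    "d": "AC-T",
    "e": "AC-G",
}
example_trimmed = trim_columns(example_alignment)
```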
artocarpus_hybpiper_genelengths.txt
Lengths of exon sequence (CDS) recovered for 458 loci for 28 taxa using the BWA method. File generated using "get_seq_lengths.py" in HybPiper and used to generate the heatmap figure in the HybPiper manuscript (with "gene_recovery_heatmap.R").
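A table of this kind can be converted into per-sample recovery fractions, which is the quantity a recovery heatmap typically displays. The sketch below assumes a tab-delimited layout with a header row of locus names, a second row of target lengths, and one row per sample; that layout should be checked against the actual file, and the numbers in the example are invented.

```python
import csv
import io


def recovery_fractions(handle):
    """Return {sample: {locus: recovered_length / target_length}}."""
    reader = csv.reader(handle, delimiter="\t")
    loci = next(reader)[1:]  # header row: label column, then locus names
    targets = [float(x) for x in next(reader)[1:]]  # target-length row
    result = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample = row[0]
        lengths = [float(x) for x in row[1:]]
        result[sample] = {
            locus: (length / target if target > 0 else 0.0)
            for locus, length, target in zip(loci, lengths, targets)
        }
    return result


# Invented table in the assumed layout, for illustration only.
example_table = (
    "Species\tgene001\tMB01\n"
    "MeanLength\t1000.0\t500.0\n"
    "sampleA\t900\t0\n"
    "sampleB\t1000\t250\n"
)
example_fractions = recovery_fractions(io.StringIO(example_table))
```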
allbaitsuppercase.fna
Nucleotide "target" file used with HybPiper. For 333 loci, there are two orthologous sequences per gene: one from the Artocarpus camansi draft genome and one from the Morus notabilis genome. For the remaining genes, only an Artocarpus ortholog is present.
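The number of orthologs per gene in a target file can be checked with a short script. This sketch assumes headers of the form ">Source-gene", with the gene name after the last hyphen; the source labels and sequences in the example are hypothetical.

```python
from collections import defaultdict


def sources_per_gene(fasta_text):
    """Map each gene name to the set of source labels found in the headers."""
    genes = defaultdict(set)
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()[0]
            # Assumed layout: everything after the last hyphen is the gene.
            source, _, gene = header.rpartition("-")
            genes[gene].add(source)
    return dict(genes)


# Hypothetical target file content, for illustration only.
example_targets = (
    ">Artocarpus-gene001\nATGGCC\n"
    ">Morus-gene001\nATGGCA\n"
    ">Artocarpus-MB01\nATGAAA\n"
)
example_sources = sources_per_gene(example_targets)
two_source_genes = sorted(g for g, s in example_sources.items() if len(s) == 2)
```

For the file described above, 333 genes would be expected to have two sources and the remaining genes one.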
artocarpus_bwa.supercontig.supermatrix.raxml.tre
RAxML phylogeny generated from a concatenated supermatrix of supercontig sequences for 22 Artocarpus species and six outgroups, using the 333 "phylogenetic" loci. Tree generated from nucleotide data using the GTRCAT model, with one partition per gene.
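A one-partition-per-gene scheme is usually passed to RAxML as a partition file that lists the column range of each gene in the supermatrix. The sketch below shows how such lines could be built from per-gene alignment lengths; the gene names and lengths are invented, and this is not necessarily how the partition file for this analysis was produced.

```python
def raxml_partitions(gene_lengths, datatype="DNA"):
    """Build partition lines such as 'DNA, gene001 = 1-300'.

    gene_lengths is an ordered list of (gene_name, alignment_length) pairs,
    in the same order the genes were concatenated.
    """
    lines = []
    start = 1  # partition coordinates are 1-based and inclusive
    for gene, length in gene_lengths:
        if length <= 0:
            raise ValueError(f"alignment length for {gene} must be positive")
        end = start + length - 1
        lines.append(f"{datatype}, {gene} = {start}-{end}")
        start = end + 1
    return lines


# Invented gene names and lengths, for illustration only.
example_partitions = raxml_partitions([("gene001", 300), ("gene002", 450)])
```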
artocarpus_hybseq.exon.raxml.names.tre
RAxML phylogeny generated from a concatenated supermatrix of exon sequences extracted by HybPiper for 22 Artocarpus species and six outgroups, using the 333 "phylogenetic" loci. Node labels indicate bootstrap support from 200 "fast bootstrap" replicates.
Analyses from: Johnson M., E.M. Gardner, J. Shaw, Y. Liu, R. Medina, B. Goffinet, N.J.C. Zerega, and N. Wickett. HybPiper: extracting phylogenetic datasets from high-throughput sequencing reads using targeted bait capture. Applications in Plant Sciences
HybPiper_artocarpus_analysis.tar.gz