Cichlid fishes (family Cichlidae) are models for evolutionary and ecological research. Massively parallel sequencing approaches have been successfully applied to study relatively recent diversification in groups of African and Neotropical cichlids, but such technologies have yet to be used for addressing larger scale phylogenetic questions of cichlid evolution. Here we describe a process for identifying putative single-copy exons from five African cichlid genomes and sequence the targeted exons for a range of divergent (> tens of millions of years) taxa with probes designed from a single reference species (Oreochromis niloticus, Nile tilapia). Targeted sequencing of 923 exons across ten cichlid species that represent the family's major lineages and geographic distribution resulted in a complete taxon matrix of 564 exons (649,549 bp), representing 559 genes. Maximum likelihood and Bayesian analyses in both species tree and concatenation frameworks yielded the same fully resolved and highly supported topology, which matched the expected backbone phylogeny of the major cichlid lineages. This work adds to the body of evidence that it is possible to use a relatively divergent reference genome for exon target design and successful capture across a broad phylogenetic range of species. Furthermore, our results show that the use of a third-party laboratory coupled with accessible bioinformatics tools makes such phylogenomics projects feasible for research groups that lack direct access to genomics facilities. We expect that these resources will be used in further cichlid evolution studies and hope the protocols and identified targets will also be useful for phylogenetic studies of a wider range of organisms.
Andinoacara pulcher sorted BAM file
Sorted BAM file for Andinoacara pulcher generated using SAMtools. Mapping and assembly of reads was conducted with bowtie2 version 2.1.0 using the 'very sensitive' preset parameters, with a maximum fragment length for paired-end alignments (-X) of 600, with the 923 tilapia exons as the reference.
AndinoacaraStandaloneFiltered_Ref923_PairedEndandSingletonsAligned_Presetverysensitive.sorted.bam
Biotoecus dicentrarchus sorted BAM file
Sorted BAM file for Biotoecus dicentrarchus generated using SAMtools. Mapping and assembly of reads was conducted with bowtie2 version 2.1.0 using the 'very sensitive' preset parameters, with a maximum fragment length for paired-end alignments (-X) of 600, with the 923 tilapia exons as the reference.
BiotoecusStandaloneFiltered_Ref923_PairedEndandSingletonsAligned_Presetverysensitive.sorted.bam
Cichla temensis sorted BAM file
Sorted BAM file for Cichla temensis generated using SAMtools. Mapping and assembly of reads was conducted with bowtie2 version 2.1.0 using the 'very sensitive' preset parameters, with a maximum fragment length for paired-end alignments (-X) of 600, with the 923 tilapia exons as the reference.
CichlaStandaloneFiltered_Ref923_PairedEndandSingletonsAligned_Presetverysensitive.sorted.bam
Crenicichla frenata sorted BAM file
Sorted BAM file for Crenicichla frenata generated using SAMtools. Mapping and assembly of reads was conducted with bowtie2 version 2.1.0 using the 'very sensitive' preset parameters, with a maximum fragment length for paired-end alignments (-X) of 600, with the 923 tilapia exons as the reference.
CrenicichlaStandaloneFiltered_Ref923_PairedEndandSingletonsAligned_Presetverysensitive.sorted.bam
Etroplus suratensis sorted BAM file
Sorted BAM file for Etroplus suratensis generated using SAMtools. Mapping and assembly of reads was conducted with bowtie2 version 2.1.0 using the 'very sensitive' preset parameters, with a maximum fragment length for paired-end alignments (-X) of 600, with the 923 tilapia exons as the reference.
EtroplusStandaloneFiltered_Ref923_PairedEndandSingletonsAligned_Presetverysensitive.sorted.bam
‘Geophagus’ brasiliensis sorted BAM file
Sorted BAM file for ‘Geophagus’ brasiliensis generated using SAMtools. Mapping and assembly of reads was conducted with bowtie2 version 2.1.0 using the 'very sensitive' preset parameters, with a maximum fragment length for paired-end alignments (-X) of 600, with the 923 tilapia exons as the reference.
GeophagusStandaloneFiltered_Ref923_PairedEndandSingletonsAligned_Presetverysensitive.sorted.bam
Herichthys cyanoguttatus sorted BAM file
Sorted BAM file for Herichthys cyanoguttatus generated using SAMtools. Mapping and assembly of reads was conducted with bowtie2 version 2.1.0 using the 'very sensitive' preset parameters, with a maximum fragment length for paired-end alignments (-X) of 600, with the 923 tilapia exons as the reference.
HerichthysStandaloneFiltered_Ref923_PairedEndandSingletonsAligned_Presetverysensitive.sorted.bam
Heterochromis multidens sorted BAM file
Sorted BAM file for Heterochromis multidens generated using SAMtools. Mapping and assembly of reads was conducted with bowtie2 version 2.1.0 using the 'very sensitive' preset parameters, with a maximum fragment length for paired-end alignments (-X) of 600, with the 923 tilapia exons as the reference.
HeterochromisStandaloneFiltered_Ref923_PairedEndandSingletonsAligned_Presetverysensitive.sorted.bam
Reference sequences from Oreochromis niloticus
List of 923 reference sequences (FASTA format) from Oreochromis niloticus used for mapping and assembly.
Oreochromis_Refseq_923.fna
Oreochromis niloticus sorted BAM file
Sorted BAM file for Oreochromis niloticus generated using SAMtools. Mapping and assembly of reads was conducted with bowtie2 version 2.1.0 using the 'very sensitive' preset parameters, with a maximum fragment length for paired-end alignments (-X) of 600, with the 923 tilapia exons as the reference.
OreochromisStandaloneFiltered_PairedEndandSingletonsAligned_Presetverysensitive.sorted.bam
Paratilapia polleni sorted BAM file
Sorted BAM file for Paratilapia polleni generated using SAMtools. Mapping and assembly of reads was conducted with bowtie2 version 2.1.0 using the 'very sensitive' preset parameters, with a maximum fragment length for paired-end alignments (-X) of 600, with the 923 tilapia exons as the reference.
ParatilapiaStandaloneFiltered_Ref923_PairedEndandSingletonsAligned_Presetverysensitive.sorted.bam
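The mapping settings repeated in the BAM file descriptions above can be sketched as a command builder. This is an illustrative sketch, not the authors' actual invocation: the read file names and the index prefix are hypothetical, and it assumes the 'very sensitive' preset refers to bowtie2's end-to-end `--very-sensitive` option (the descriptions do not say whether the local variant was used).

```python
import shlex


def bowtie2_command(index_prefix, reads_1, reads_2, singletons, out_sam):
    """Build a bowtie2 argument list mirroring the settings in the descriptions.

    All file names passed in are placeholders chosen for illustration.
    """
    return [
        "bowtie2",
        "--very-sensitive",   # assumed: end-to-end 'very sensitive' preset
        "-X", "600",          # maximum fragment length for paired-end alignments
        "-x", index_prefix,   # index of the 923 tilapia exons (built with bowtie2-build)
        "-1", reads_1,        # paired-end mates, file 1
        "-2", reads_2,        # paired-end mates, file 2
        "-U", singletons,     # unpaired (singleton) reads
        "-S", out_sam,        # SAM output, later converted and sorted with SAMtools
    ]


# Hypothetical inputs for one species.
cmd = bowtie2_command(
    "Oreochromis_Refseq_923",
    "Andinoacara_R1.fastq",
    "Andinoacara_R2.fastq",
    "Andinoacara_singletons.fastq",
    "Andinoacara.sam",
)
print(shlex.join(cmd))
```

The builder only assembles the command; running it would require bowtie2 and an index built from Oreochromis_Refseq_923.fna. Converting the SAM output to a sorted BAM file is left out because the SAMtools syntax for sorting differs between versions.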
Alignment files for exons captured for all ten species
Alignment files for exons captured for all ten species plus the reference sequence from the Nile tilapia (Oreochromis niloticus). Each individual exon alignment was sorted into one of three categories: 'good', 'passable', or 'poor'. Only alignments in the 'good' and 'passable' categories were used for phylogenetic analysis.
Included are the concatenated alignments used for Bayesian (MrBayes; nexus format) and maximum likelihood (RAxML; phylip format) phylogenetic analyses and individual exon alignments (phylip format) used in RAxML bootstrap analyses that were ultimately analyzed in a species tree framework using MP-EST.
Ilves&Lopez-Fernandez.MER_alignments.zip
Summary of steps
This file describes the steps used to identify SCP exons across five cichlid genomes and to process Illumina MiSeq paired-end sequence data from 923 exons across a broad phylogenetic range of ten cichlid species.
This is most certainly not an elegant or efficient way to conduct these searches and analyses (efficiency could be greatly improved through more automation, i.e., the use of scripts), but the steps below are feasible for those with minimal programming expertise (and a lot of patience).
Ilves&Lopez-Fernandez_SummaryOfSteps.txt
Sequence files
Consensus sequences in FASTA format for all exons captured, generated with the following quality filters: base and mapping qualities of at least 20 and a minimum depth of coverage of 10. Sequence positions that did not meet all of these criteria are coded as N. Only sequences with a minimum of 50 bases are included. Each file is named with the ENSEMBL exon ID for Oreochromis niloticus and the genus name of the taxon.
Sequences_min50bases.zip
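The quality filters described for the consensus sequences can be illustrated with a small sketch. It is a simplification under stated assumptions, not the pipeline used to produce the files: each position is represented by a hypothetical tuple of (base, base quality, mapping quality, depth), and the 50-base minimum is assumed to count only called (non-N) bases, which the description does not specify.

```python
def consensus_with_masking(calls, min_base_q=20, min_map_q=20, min_depth=10):
    """Return a consensus string, coding positions that fail any filter as N.

    calls: iterable of (base, base_quality, mapping_quality, depth) tuples,
    a hypothetical per-position summary used only for this illustration.
    """
    return "".join(
        base if (bq >= min_base_q and mq >= min_map_q and depth >= min_depth) else "N"
        for base, bq, mq, depth in calls
    )


def passes_length_filter(seq, min_bases=50):
    """Assumption: only called (non-N) bases count toward the minimum."""
    return sum(1 for b in seq if b != "N") >= min_bases


# Four example positions: only the first meets all three thresholds.
example = [
    ("A", 30, 40, 12),  # passes every filter
    ("C", 15, 40, 12),  # base quality below 20
    ("G", 30, 10, 12),  # mapping quality below 20
    ("T", 30, 40, 5),   # depth of coverage below 10
]
masked = consensus_with_masking(example)
print(masked)                        # ANNN
print(passes_length_filter(masked))  # False
```

A position must satisfy all three thresholds to be kept, matching the statement that positions failing any criterion are coded as N.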