Genomic DNA sequences and other genomic resources are essential for elucidating the genomic bases of adaptive divergence and reproductive isolation. Here, we describe the construction, characterization and screening of a nonarrayed BAC library for lake whitefish (Coregonus clupeaformis). We then show how the combined use of BAC library screening and next-generation sequencing can lead to efficient full-length assembly of candidate genes. The lake whitefish BAC library consists of 181 050 clones derived from a single heterozygous fish. The mean insert size is 92 kb, representing 5.2 haploid genome equivalents. Ten BAC clones were isolated following a quantitative real-time PCR screening approach that targeted five previously identified candidate genes. Sequencing of these clones on a 454 GS FLX system yielded 178 000 reads with a mean length of 358 bp, for a total of 63.8 Mb. De novo assembly and annotation then allowed retrieval of contigs corresponding to each candidate gene, which also contained up- and/or downstream noncoding sequences. These results suggest that the lake whitefish BAC library, combined with next-generation sequencing technologies, will be a key resource for better understanding both adaptive divergence and reproductive isolation in lake whitefish species pairs, as well as salmonid evolution in general.
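The coverage and read-yield figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the haploid genome size is not stated in this record, so it is back-calculated from the reported clone count, mean insert size and genome equivalents rather than taken from a measurement.

```python
def genome_equivalents(n_clones: int, mean_insert_bp: int, haploid_genome_bp: float) -> float:
    """Library coverage: total cloned DNA divided by haploid genome size."""
    return n_clones * mean_insert_bp / haploid_genome_bp


def implied_genome_size_bp(n_clones: int, mean_insert_bp: int, equivalents: float) -> float:
    """Haploid genome size implied by a reported coverage (an inference, not a measurement)."""
    return n_clones * mean_insert_bp / equivalents


# Values reported in the abstract.
N_CLONES = 181_050
MEAN_INSERT_BP = 92_000      # 92 kb
EQUIVALENTS = 5.2            # haploid genome equivalents
N_READS = 178_000            # rounded read count
MEAN_READ_BP = 358           # rounded mean read length

total_insert_bp = N_CLONES * MEAN_INSERT_BP
genome_bp = implied_genome_size_bp(N_CLONES, MEAN_INSERT_BP, EQUIVALENTS)
total_read_bp = N_READS * MEAN_READ_BP

print(f"Total cloned DNA:    {total_insert_bp / 1e9:.2f} Gb")
print(f"Implied genome size: {genome_bp / 1e9:.2f} Gb")
print(f"Read yield:          {total_read_bp / 1e6:.1f} Mb")
```

The read yield computed from the rounded inputs (about 63.7 Mb) differs slightly from the reported 63.8 Mb, which is what one would expect when both the read count and the mean length have been rounded.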
CARB assembly (ace) file
De novo assembly of pyrosequencing data (454 Genome Sequencer FLX System, with long-read GS FLX Titanium chemistry) for 2 pooled BAC clones targeted for carboxylesterase (CARB). GS De novo Assembler software (Roche, Basel, Switzerland) was used with the following parameters: large or complex genome option, trimming database of the pCC1BAC HindIII vector, screening database of the E. coli str. K12 substr. DH10B genome (GenBank [accession number CP000948]), minimum overlap length 200 bp, minimum overlap identity 98% and minimum contig length 500 bp. Raw data available at Sequence Read Archive at NCBI: SRP003484
CARB.ace
HSC assembly (ace) file
De novo assembly of pyrosequencing data (454 Genome Sequencer FLX System, with long-read GS FLX Titanium chemistry) for 2 pooled BAC clones targeted for heat shock cognate 70 kDa protein (HSC). GS De novo Assembler software (Roche, Basel, Switzerland) was used with the following parameters: large or complex genome option, trimming database of the pCC1BAC HindIII vector, screening database of the E. coli str. K12 substr. DH10B genome (GenBank [accession number CP000948]), minimum overlap length 200 bp, minimum overlap identity 98% and minimum contig length 500 bp. Raw data available at Sequence Read Archive at NCBI: SRP003484
HSC.ace
MDH assembly (ace) file
De novo assembly of pyrosequencing data (454 Genome Sequencer FLX System, with long-read GS FLX Titanium chemistry) for 2 pooled BAC clones targeted for cytoplasmic malate dehydrogenase (MDH). GS De novo Assembler software (Roche, Basel, Switzerland) was used with the following parameters: large or complex genome option, trimming database of the pCC1BAC HindIII vector, screening database of the E. coli str. K12 substr. DH10B genome (GenBank [accession number CP000948]), minimum overlap length 200 bp, minimum overlap identity 98% and minimum contig length 500 bp. Raw data available at Sequence Read Archive at NCBI: SRP003484
MDH.ace
GAPDH assembly (ace) file
De novo assembly of pyrosequencing data (454 Genome Sequencer FLX System, with long-read GS FLX Titanium chemistry) for 2 pooled BAC clones targeted for glyceraldehyde-3-phosphate dehydrogenase (GAPDH). GS De novo Assembler software (Roche, Basel, Switzerland) was used with the following parameters: large or complex genome option, trimming database of the pCC1BAC HindIII vector, screening database of the E. coli str. K12 substr. DH10B genome (GenBank [accession number CP000948]), minimum overlap length 200 bp, minimum overlap identity 98% and minimum contig length 500 bp. Raw data available at Sequence Read Archive at NCBI: SRP003484
GAPDH.ace
MHC assembly (ace) file
De novo assembly of pyrosequencing data (454 Genome Sequencer FLX System, with long-read GS FLX Titanium chemistry) for 2 pooled BAC clones targeted for MHC class II beta (MHC). GS De novo Assembler software (Roche, Basel, Switzerland) was used with the following parameters: large or complex genome option, heterozygosity option, trimming database of the pCC1BAC HindIII vector, screening database of the E. coli str. K12 substr. DH10B genome (GenBank [accession number CP000948]), minimum overlap length 200 bp, minimum overlap identity 98% and minimum contig length 500 bp. Raw data available at Sequence Read Archive at NCBI: SRP003484
MHC.ace
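For readers who want a quick overview of what each ace file contains, the following is a minimal sketch that lists contigs from the `CO` header lines of an ACE file using only the Python standard library. It assumes the standard ACE layout, in which each contig header reads `CO <name> <bases> <reads> <base segments> <U|C>`; the names `ContigSummary` and `summarize_ace` and the sample lines are hypothetical, and the 500 bp default simply mirrors the minimum contig length stated in the descriptions above. The base count is the length recorded in the header, which in ACE files generally includes pad characters.

```python
from typing import Iterable, List, NamedTuple


class ContigSummary(NamedTuple):
    name: str
    n_bases: int   # length as recorded in the CO header (may include pads)
    n_reads: int


def summarize_ace(lines: Iterable[str], min_length: int = 500) -> List[ContigSummary]:
    """Collect contig name, length and read count from ACE 'CO' header lines."""
    contigs = []
    for line in lines:
        fields = line.split()
        # A contig header has exactly six whitespace-separated fields.
        if len(fields) == 6 and fields[0] == "CO":
            try:
                n_bases, n_reads = int(fields[2]), int(fields[3])
            except ValueError:
                continue  # not a well-formed header; skip it
            if n_bases >= min_length:
                contigs.append(ContigSummary(fields[1], n_bases, n_reads))
    return contigs


# Hypothetical sample lines, for illustration only (not taken from the deposited files).
sample = [
    "AS 2 5",
    "CO contig00001 1200 3 1 U",
    "ACGTACGTACGT",
    "CO contig00002 300 2 1 U",
    "ACGTACGT",
]
for contig in summarize_ace(sample):
    print(contig.name, contig.n_bases, contig.n_reads)
```

To apply this to one of the deposited files, the same function can be given an open file handle, for example `summarize_ace(open("CARB.ace"))`. A full-featured parser would be preferable for anything beyond a summary, since this sketch ignores read alignments and quality values.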