Data from: Development of SNP genotyping arrays in two shellfish species

Lapègue, Sylvie1; Harrang, Estelle1; Heurtebise, Serge1; Flahauw, Emilie1; Donnadieu, Cécile2; Gayral, Philippe3; Ballenghien, Marion3; Genestout, Lucie4; Barbotte, Laetitia4; Mahla, Rachid4; Haffray, Pierrick5; Klopp, Christophe6

Published Jan 16, 2014 on Dryad. https://doi.org/10.5061/dryad.jr233

Data files

Jan 16, 2014 version, files total 39.46 MB

Alignments gigas in silico.xls (24.27 MB)
Alignments in vitro edulis.zip (28.89 KB)
Alignments in vitro gigas.zip (19.43 KB)
merge.pileup-vcf.gz (15.14 MB)

Abstract

Use of SNPs has been favored due to their abundance in plant and animal genomes, accompanied by the falling cost and rising throughput capacity for detection and genotyping. Here, we present in vitro (obtained from targeted sequencing) and in silico discovery of SNPs, and the design of medium-throughput genotyping arrays for two oyster species, the Pacific oyster, Crassostrea gigas, and European flat oyster, Ostrea edulis. Two sets of 384 SNP markers were designed for two Illumina GoldenGate arrays and genotyped on more than 1000 samples for each species. In each case, oyster samples were obtained from wild and selected populations and from three-generation families segregating for traits of interest in aquaculture. The rate of successfully genotyped polymorphic SNPs was about 60% for each species. Effects of SNP origin and quality on genotyping success (Illumina functionality score) were analyzed and compared with other model and non-model species. Furthermore, a simulation was made based on a subset of the C. gigas SNP array with a minor allele frequency of 0.3 and typical crosses used in shellfish hatcheries. This simulation indicated that at least 150 markers were needed to perform an accurate parental assignment. Such panels might provide valuable tools to improve our understanding of the connectivity between wild (and selected) populations and could contribute to future selective breeding programs.
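As a rough illustration of why marker number matters for parental assignment, the sketch below computes how often an unrelated candidate parent would escape exclusion by chance. It is not the simulation described in the abstract: it assumes biallelic loci at Hardy-Weinberg proportions, independent loci, no genotyping errors, and a candidate unrelated to the offspring, and it excludes a candidate only when parent and offspring are opposite homozygotes. The function names are hypothetical. Candidates in hatchery crosses are often related, which makes exclusion harder than this calculation suggests.

```python
def exclusion_per_locus(maf):
    """Probability that an unrelated candidate and an offspring are
    opposite homozygotes (AA vs BB) at one biallelic locus under HWE."""
    p, q = 1.0 - maf, maf
    # Candidate AA and offspring BB, or candidate BB and offspring AA.
    return 2.0 * (p ** 2) * (q ** 2)


def prob_not_excluded(maf, n_loci):
    """Probability that an unrelated candidate is compatible with the
    offspring at every one of n_loci independent loci."""
    return (1.0 - exclusion_per_locus(maf)) ** n_loci


if __name__ == "__main__":
    for n in (50, 100, 150):
        print(n, prob_not_excluded(0.3, n))
```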

Alignments of C. gigas in silico sequences

For the in silico SNPs, in 2009 we investigated the 6th assembly of the Crassostrea gigas EST database (http://public-contigbrowser.sigenae.org:9090/Crassostrea_gigas/index.html). The database contained the results of the assembly of 55,851 public ESTs from dbEST and 417 GenBank mRNA sequences. The assembly, performed with TGICL (http://compbio.dfci.harvard.edu/tgi/software/; parameters -l 60 -p 96 -s 100000 -O '-p 75 -s 500'), produced an alignment file from which 1370 SNPs were extracted. We first looked for SNPs that met the initial criteria: a minimum depth of seven sequences, a minimum allele count of three, and no other SNP in the 60 bp segment flanking the analyzed SNP on either side. As these conditions proved too stringent and yielded few SNPs, we relaxed the criteria to a minimum depth of five sequences and a minimum allele count of two, and allowed another SNP within 120 bp of the SNP of interest, as long as there was only one and it was not close to it.
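The two sets of criteria above can be sketched as a simple filter. This is an illustration under stated assumptions, not the procedure actually run on the TGICL alignment: the `Snp` record and all function names are hypothetical, "minimum allele count" is read as the read count of the rarer allele, the flanking rule is read as a distance on either side, and the "not close" condition of the relaxed rule is left out because no distance is given for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    contig: str
    pos: int               # position on the contig
    allele_counts: tuple   # reads supporting each allele, e.g. (5, 3)

    @property
    def depth(self):
        return sum(self.allele_counts)

    @property
    def minor_count(self):
        return min(self.allele_counts)


def neighbors(snp, all_snps, flank):
    """Other SNPs on the same contig within `flank` bp on either side."""
    return [o for o in all_snps
            if o.contig == snp.contig and o.pos != snp.pos
            and abs(o.pos - snp.pos) <= flank]


def passes(snp, all_snps, min_depth, min_allele_count, flank, max_neighbors):
    return (snp.depth >= min_depth
            and snp.minor_count >= min_allele_count
            and len(neighbors(snp, all_snps, flank)) <= max_neighbors)


# Thresholds taken from the text; the flank interpretation is an assumption.
STRICT = dict(min_depth=7, min_allele_count=3, flank=60, max_neighbors=0)
RELAXED = dict(min_depth=5, min_allele_count=2, flank=120, max_neighbors=1)

# Made-up candidates, for illustration only.
snps = [
    Snp("c1", 100, (5, 3)),
    Snp("c1", 300, (4, 2)),
    Snp("c1", 350, (9, 1)),
    Snp("c2", 50, (3, 1)),
]

strict_hits = [s for s in snps if passes(s, snps, **STRICT)]
relaxed_hits = [s for s in snps if passes(s, snps, **RELAXED)]
```

In this toy set the relaxed rule admits a second candidate that has lower depth and one neighboring SNP within the window.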

Alignments gigas in silico.xls

Alignments of O. edulis in silico sequences

For the in silico SNPs, we investigated O. edulis transcriptome sequence data from eight individuals from the natural range (Cahais et al. 2012, Gayral et al. 2011). For the present study, the 454 and Illumina reads were assembled using a multi-kmer strategy (kmers: 37, 41, 45, 49, 53, 57 and 61, assembled with Velvet version 1.1.03). Contigs longer than 100 bp from every assembly were then meta-assembled with TGICL (http://compbio.dfci.harvard.edu/tgi/software/). The Illumina reads were remapped onto the contigs using BWA (0.5.9-r16), and a compressed alignment file was produced using SAMtools view (version 0.1.11). The alignment file was then used to call the SNPs with SAMtools pileup and varFilter (version 0.1.11). In this database, we looked for SNPs located on different contigs, with a depth ranging from 20 to 500 at the position and no other SNPs in the surrounding 120 bp. The minimum SNP quality score was initially set at 20 but, because of the large number of SNPs available, we ultimately used only SNPs with the highest score, 227.
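The selection rules above can be sketched as follows. This is a minimal illustration on in-memory records, not a parser for merge.pileup-vcf.gz and not the authors' script: the `Call` record and `select_snps` function are hypothetical, "surrounding 120 bp" is read as a distance on either side, isolation is checked against all called SNPs on the contig, and "different contigs" is read as keeping at most one SNP per contig (here, the first that qualifies).

```python
from collections import namedtuple

# Hypothetical record: one called SNP with its read depth and quality score.
Call = namedtuple("Call", "contig pos depth qual")


def select_snps(calls, min_depth=20, max_depth=500, min_qual=227, window=120):
    """Keep at most one isolated, well-covered, top-quality SNP per contig."""
    by_contig = {}
    for c in calls:
        by_contig.setdefault(c.contig, []).append(c)

    selected = []
    for group in by_contig.values():
        for c in sorted(group, key=lambda x: x.pos):
            # No other called SNP on this contig within `window` bp.
            isolated = all(o is c or abs(o.pos - c.pos) > window for o in group)
            if isolated and min_depth <= c.depth <= max_depth and c.qual >= min_qual:
                selected.append(c)
                break  # one SNP per contig
    return selected


# Made-up calls, for illustration only.
calls = [
    Call("ctgA", 150, 45, 227),
    Call("ctgA", 800, 60, 227),   # qualifies, but ctgA is already represented
    Call("ctgB", 200, 30, 227),   # too close to the next call
    Call("ctgB", 260, 35, 227),
    Call("ctgC", 90, 700, 227),   # depth above 500
    Call("ctgD", 400, 120, 99),   # quality below 227
    Call("ctgE", 500, 20, 227),   # depth exactly at the lower bound
]

picked = select_snps(calls)
```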

merge.pileup-vcf.gz

Alignments of O. edulis in vitro sequences

For the European flat oyster, in vitro sequencing investigated 40 loci from two EST libraries (Morga et al. 2011, 2012). Primers were designed using the Primer3 software package (Rozen and Skaletsky 2000). A total of 22 oysters were used to investigate polymorphisms: 16 from four different natural populations collected on the Atlantic and Mediterranean coasts, and six belonging to the first generations of three families selected for resistance to bonamiosis. The PCR and sequencing protocols used were the same as those given in Harrang et al. (2013). Sequence alignment was performed with ClustalW via the BioEdit interface (Hall 1999). The validity of each SNP was checked individually on nucleotide sequences and sequence alignments. A total of 420 in vitro SNPs were detected in the dataset of 40 sequenced fragments. Among them, the indels (n = 34) were discarded. Moreover, 347 SNPs were also discarded because of neighboring polymorphisms or low functionality scores. However, as we wanted some genes of interest to be represented in the SNP dataset, we kept some (n = 13) that had neighboring polymorphisms. To favor genotyping, those neighboring polymorphic nucleotides were treated as degenerate nucleotides. In total, 52 in vitro SNPs were included in the array, representing 35 different gene fragments.
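Treating a neighboring polymorphism as a degenerate nucleotide can be illustrated with the standard IUPAC two-base ambiguity codes. The sketch below is an assumption-laden example, not the authors' design pipeline: the function name, the toy sequence, and the bracketed `[A/G]` notation for the target SNP are used here only for illustration.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def mask_neighbors(seq, target_pos, target_alleles, neighbor_snps):
    """Return a design sequence with the target SNP written as [a/b] and
    each neighboring SNP replaced by its IUPAC degenerate code.

    seq            reference sequence (string)
    target_pos     0-based index of the SNP to genotype
    target_alleles pair of alleles at the target, e.g. ("A", "G")
    neighbor_snps  dict {0-based index: (allele1, allele2)}
    """
    out = []
    for i, base in enumerate(seq):
        if i == target_pos:
            out.append("[{}/{}]".format(*target_alleles))
        elif i in neighbor_snps:
            out.append(IUPAC[frozenset(neighbor_snps[i])])
        else:
            out.append(base)
    return "".join(out)


# Toy fragment: target SNP A/G at index 4, neighboring T/C SNP at index 7.
design = mask_neighbors("ACGTACGTAC", 4, ("A", "G"), {7: ("T", "C")})
print(design)
```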

Alignments in vitro edulis.zip

Alignments of C. gigas in vitro sequences

For the Pacific oyster, in vitro sequencing investigated 103 loci from ESTs retrieved from the GenBank database (http://www.ncbi.nlm.nih.gov/) or from specific libraries that had been obtained to detect genes differentially regulated during summer mortality events (Fleury et al. 2009). Primers were designed using the Primer3 software package (Rozen and Skaletsky 2000). For a first set of ESTs (n = 61), 24 oysters belonging to a third generation of selection for summer mortality resistance were used in the SNP discovery phase (Sauvage 2008; Sauvage et al. 2007). A second set of ESTs (n = 42) was then added and 10 of the 24 oysters were used for sequencing, as described in Sauvage et al. (2007), together with a third set of five SNPs from the 20 developed by Bai et al. (2009). Sequence alignment was performed with ClustalW via the BioEdit interface (Hall 1999) and DNAMAN version 4.1 (www.lynnon.com). The validity of each SNP was checked manually on the chromatograms and sequence alignments. A total of 321 in vitro SNPs were detected in the first dataset of 61 sequenced fragments, and 380 in the second dataset of 42 sequenced fragments. Among those 701 SNPs, 72 were selected (39 and 33 from the two datasets, respectively) because they had high functionality scores and no neighboring polymorphisms. However, as we wanted to be sure that some genes of interest were represented in the SNP dataset, we kept two SNPs for several ESTs. Therefore, our 72 selected in vitro SNPs were obtained from 65 different ESTs.
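The counts reported above can be cross-checked with a short tally. The only assumption added here is that no EST contributes more than two selected SNPs, which is how "for several ESTs we kept two SNPs" is read; the variable names are hypothetical.

```python
# Counts taken from the text above.
detected = {"set 1 (61 fragments)": 321, "set 2 (42 fragments)": 380}
selected = {"set 1 (61 fragments)": 39, "set 2 (42 fragments)": 33}
n_ests = 65  # distinct ESTs carrying the selected SNPs

total_detected = sum(detected.values())
total_selected = sum(selected.values())

# Assumption: each EST carries either one or two selected SNPs.
ests_with_two = total_selected - n_ests
ests_with_one = n_ests - ests_with_two

# Share of detected SNPs that were retained for the array.
selection_rate = total_selected / total_detected

print(total_detected, total_selected, ests_with_two, ests_with_one)
```

Under that assumption, the difference between the number of selected SNPs and the number of ESTs gives the number of ESTs represented by two SNPs.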

Alignments in vitro gigas.zip