Data from: Intra-individual polymorphism in chloroplasts from NGS data: where does it come from and how to handle it?
Scarcelli, N., Institut de Recherche pour le Développement
Mariac, C., Institut de Recherche pour le Développement
Couvreur, T. L. P., Université de Yaoundé I
Faye, A., Institut de Recherche pour le Développement
Richard, D., Institut de Recherche pour le Développement
Sabot, F., Institut de Recherche pour le Développement
Berthouly-Salazar, C., Institut de Recherche pour le Développement
Vigouroux, Y., Institut de Recherche pour le Développement
Published Sep 14, 2015 on Dryad.
https://doi.org/10.5061/dryad.31733
Cite this dataset
Scarcelli, N. et al. (2015). Data from: Intra-individual polymorphism in chloroplasts from NGS data: where does it come from and how to handle it? [Dataset]. Dryad. https://doi.org/10.5061/dryad.31733
Abstract
Next generation sequencing allows access to a large quantity of genomic data. In plants, several studies have used whole chloroplast genome sequences to infer phylogeography or phylogeny. Even though the chloroplast is a haploid organelle, NGS plastome data have revealed a non-negligible number of intra-individual polymorphic SNPs. Such observations could have several causes, such as sequencing errors, heteroplasmy, or the transfer of chloroplast sequences into the nuclear and mitochondrial genomes. The occurrence of allelic diversity has important practical consequences for the identification of diversity and the analysis of chloroplast data and, beyond that, raises significant evolutionary questions. In this study, we show that the observed intra-individual polymorphism of chloroplast sequence data is probably the result of plastid DNA transferred into the mitochondrial and/or nuclear genomes. We further assess the error rates of nine different bioinformatics pipelines for SNP and genotype calling, using SNPs identified by Sanger sequencing. Specific pipelines are adequate to deal with this issue, optimizing both specificity and sensitivity. Our results will allow proper use of whole chloroplast NGS sequences and better handling of NGS chloroplast sequence diversity.
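As a rough illustration of how a pipeline's SNP calls can be scored against Sanger-validated SNPs, the sketch below computes sensitivity and specificity over a set of Sanger-sequenced positions. The function name, its inputs, and the example positions are hypothetical and are not taken from the dataset; the exact error-rate definitions used in the study may differ.

```python
def sensitivity_specificity(assessed, sanger_snps, called_snps):
    """Score SNP calls against a Sanger truth set (hypothetical helper).

    assessed    -- positions covered by Sanger sequencing
    sanger_snps -- positions where Sanger sequencing found a SNP
    called_snps -- positions where the NGS pipeline called a SNP
    """
    assessed = set(assessed)
    truth = set(sanger_snps) & assessed
    calls = set(called_snps) & assessed

    tp = len(truth & calls)             # true SNPs that were called
    fn = len(truth - calls)             # true SNPs that were missed
    fp = len(calls - truth)             # calls not confirmed by Sanger
    tn = len(assessed - truth - calls)  # non-SNP sites correctly left uncalled

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up positions, for illustration only.
sens, spec = sensitivity_specificity(
    assessed=range(1, 11),
    sanger_snps=[2, 5, 7, 9],
    called_snps=[2, 5, 8],
)
print(sens, spec)
```

Restricting both sets to the Sanger-assessed positions keeps sites without a known truth value out of the counts.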
Usage notes
A571_Run4-TAG-10
Dioscorea abyssinica A571 sample - Whole genome sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
A571_Run4-R1-TAG-10.bam
A571_Run7-TAG-9
Dioscorea abyssinica A571 sample - Whole genome sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
A571_Run7-R1-TAG-9.bam
A571_RunHi-TAG-9
Dioscorea abyssinica A571 sample - Whole genome sequencing (HiSeq) - .bam file containing reads after mapping on the chloroplast genome
A571_RunHi-R1-TAG-9.bam
CR659_Run10-TAG-70
Dioscorea rotundata CR659 sample - Whole genome sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
CR659_Run10-R1-TAG-70.bam
P458_Run4-TAG-25
Dioscorea praehensilis P458 sample - Whole genome sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
P458_Run4-R1-TAG-25.bam
Pa-Mayo9-R11T57_R1R2.fastq
Podococcus acaulis Pa-Mayoko9 sample - Chloroplast enrichment sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
Pa-Ndjo3-R11T18_R1R2.fastq
Podococcus acaulis Pa-Ndjo3 sample - Chloroplast enrichment sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
Pa-Ndjo9-R11T16_R1R2.fastq
Podococcus acaulis Pa-Ndjo9 sample - Chloroplast enrichment sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
Pa-Ndjo11-R11T19_R1R2.fastq
Podococcus barteri Pb-Ndjo11 sample - Chloroplast enrichment sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
Pb-Aloum8-R11T73_R1R2.fastq
Podococcus barteri Pb-Aloum8 sample - Chloroplast enrichment sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
Pb-Campo1-R11T81_R1R2.fastq
Podococcus barteri Pb-Campo1 sample - Chloroplast enrichment sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
Pb-Kola5-R11T54_R1R2.fastq
Podococcus barteri Pb-Kola5 sample - Chloroplast enrichment sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
Pb-Oyem-R11T47_R1R2.fastq
Podococcus barteri Pb-Oyem sample - Chloroplast enrichment sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
Pb-Podo5-1-R2T53_R1R2.fastq
Podococcus barteri Podo 5-1 sample - Whole genome sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
Pb-Podo5-R11T79_R1R2.fastq
Podococcus barteri Podo 5 sample - Long range PCR sequencing (MiSeq) - .bam file containing reads after mapping on the chloroplast genome
TOG6208_mt_Run1-TAG-1
O. glaberrima TOG6028 sample - Whole genome sequencing (MiSeq) - .bam file containing reads after mapping on the mitochondrial genome. Only reads properly mapped on the chloroplast genome were used.
TOG6208_mt_Run1-TAG-6
O. glaberrima TOG6028 sample - LR-PCR sequencing (MiSeq) - .bam file containing reads after mapping on the mitochondrial genome. Only reads properly mapped on the chloroplast genome were used.
TOG6208_mt_Run1-TAG-LR
O. glaberrima TOG6028 sample - LR-PCR (MiSeq) - .bam file containing reads after mapping on the mitochondrial genome. Only reads properly mapped on the chloroplast genome were used.
TOG6208_nr_Run1-TAG-1
O. glaberrima TOG6028 sample - Whole genome sequencing (MiSeq) - .bam file containing reads after mapping on the nuclear genome. Only reads properly mapped on the chloroplast genome were used.
TOG6208_nr_Run1-TAG-6
O. glaberrima TOG6028 sample - Chloroplast enrichment (MiSeq) - .bam file containing reads after mapping on the nuclear genome. Only reads properly mapped on the chloroplast genome were used.
TOG6208_nr_Run1-TAG-LR
O. glaberrima TOG6028 sample - LR-PCR (MiSeq) - .bam file containing reads after mapping on the nuclear genome. Only reads properly mapped on the chloroplast genome were used.
TOG6208_cp_RUN1_TAG1
O. glaberrima TOG6028 sample - genomic library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used.
TOG6208_cp_RUN1-TAG2-4_10-12.bam
O. glaberrima TOG6028 sample - LRPCR library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used.
TOG6208_cp_RUN1-AG2-4_10-12.bam
TOG6208_cp_RUN1_TAG6
O. glaberrima TOG6028 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used.
PE08106-E1_cp_RUN5-TAG35
Pennisetum glaucum PE08106-E1 sample - genomic library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used.
RUN2_TAG32
Pennisetum glaucum PE11356 sample - LRPCR library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used.
RUN2_TAG11-20
Dioscorea rotundata CR629 sample - genomic library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used.
RUN4_TAG-27
RUN4_TAG28
RUN4_TAG29
RUN4_TAG24
RUN4_TAG26
RUN4_TAG-12.bam
Dioscorea praehensilis P603 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used.
RUN4_TAG25
Dioscorea praehensilis P458 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used.
RUN4_TAG30
Dioscorea praehensilis A571 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used.
RUN3-TAG-55_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-55_R1_paired
RUN3-TAG-56_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-56_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN5-TAG35_R1_paired.fastq
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN5-TAG35_R2_paired.fastq
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-57_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-57_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-59_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-59_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-65_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-65_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-66_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-66_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-67_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-67_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-70_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-70_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
RUN3-TAG-55
Pennisetum glaucum 18311 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used
RUN3-TAG-56
Pennisetum glaucum 18945 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used
RUN3-TAG-57
Pennisetum glaucum 19529 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used
RUN3-TAG-59
Pennisetum glaucum 9024 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used
RUN3-TAG-65
Pennisetum glaucum PE01514 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used
RUN3-TAG-66
Pennisetum glaucum PE02747 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used
RUN3-TAG-67
Pennisetum glaucum PE05720 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used
RUN3-TAG-70
Pennisetum glaucum PE05727 sample - chloroplast enrichment library (MiSeq) - .bam file containing reads after mapping on the chloroplast genome. Only reads properly mapped were used
filtcutRUN4-TAG-12_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-12_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-24_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-24_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-25_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-26_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-25_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-26_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-27_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-27_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-28_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-29_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-28_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-29_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-30_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-30_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-31_R1_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
filtcutRUN4-TAG-31_R2_paired
Fastq data : see sup. table 1 (MER-15-0195) for more details
Dioscorea_Bcf
VCF produced when calling SNPs on Dioscorea samples with the Bcf method
Dioscorea_Bcf_A571
VCF produced when calling SNPs on Dioscorea abyssinica samples with the Bcf method
Dioscorea_BcfB
VCF produced when calling SNPs on Dioscorea samples with the BcfB method
Dioscorea_BcfB_A571
VCF produced when calling SNPs on Dioscorea abyssinica samples with the BcfB method
Dioscorea_BVar15
VCF produced when calling SNPs on Dioscorea samples with the BVar15 method
Dioscorea_BVar15_A571
VCF produced when calling SNPs on Dioscorea abyssinica samples with the BVar15 method
Dioscorea_BVar50
VCF produced when calling SNPs on Dioscorea samples with the BVar50 method
Dioscorea_BVar50_A571
VCF produced when calling SNPs on Dioscorea abyssinica samples with the BVar50 method
Dioscorea_Hap
VCF produced when calling SNPs on Dioscorea samples with the Hap method
Dioscorea_Hap_A571
VCF produced when calling SNPs on Dioscorea abyssinica samples with the Hap method
Dioscorea_Uni
VCF produced when calling SNPs on Dioscorea samples with the Uni method
Dioscorea_Uni1
VCF produced when calling SNPs on Dioscorea samples with the Uni1 method
Dioscorea_Uni1_A571
VCF produced when calling SNPs on Dioscorea abyssinica samples with the Uni1 method
Dioscorea_Uni_A571
VCF produced when calling SNPs on Dioscorea abyssinica samples with the Uni method
Dioscorea_Var15
VCF produced when calling SNPs on Dioscorea samples with the Var15 method
Dioscorea_Var15_A571
VCF produced when calling SNPs on Dioscorea abyssinica samples with the Var15 method
Dioscorea_Var50
VCF produced when calling SNPs on Dioscorea samples with the Var50 method
Dioscorea_Var50_A571
VCF produced when calling SNPs on Dioscorea abyssinica samples with the Var50 method
SNP_mil_HET50-MINFREQ-15
VCF produced when calling SNPs on Pennisetum glaucum (enriched, genomic and LRPCR libraries) with Mpileup -Bg and Varscan (min_freq_for_hom = 0.5 and --min-var-freq 0.15)
SNP_PODO_HET50-MINFREQ-15
VCF produced when calling SNPs on Podococcus spp. (enriched, genomic and LRPCR libraries) with Mpileup -Bg and Varscan (min_freq_for_hom = 0.5 and --min-var-freq 0.15)
SNP_yam_HET50-MINFREQ-15
VCF produced when calling SNPs on Dioscorea (enriched, genomic and LRPCR libraries) with Mpileup -Bg and Varscan (min_freq_for_hom = 0.5 and --min-var-freq 0.15)
SNP_rice_HET50-MINFREQ-0
VCF produced when calling SNPs on Oryza glaberrima (enriched, genomic and LRPCR libraries) with Mpileup -Bg and Varscan (min_freq_for_hom = 0.5 and --min-var-freq 0)
SNP_rice_HET50-MINFREQ-5
VCF produced when calling SNPs on Oryza glaberrima (enriched, genomic and LRPCR libraries) with Mpileup -Bg and Varscan (min_freq_for_hom = 0.5 and --min-var-freq 5)
SNP_rice_HET50-MINFREQ-10
VCF produced when calling SNPs on Oryza glaberrima (enriched, genomic and LRPCR libraries) with Mpileup -Bg and Varscan (min_freq_for_hom = 0.5 and --min-var-freq 10)
SNP_rice_HET50-MINFREQ-15
VCF produced when calling SNPs on Oryza glaberrima (enriched, genomic and LRPCR libraries) with Mpileup -Bg and Varscan (min_freq_for_hom = 0.5 and --min-var-freq 15)
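The two Varscan thresholds named in the SNP_* descriptions can be read as a simple rule on the variant allele frequency at a site. The sketch below illustrates that rule only, assuming frequencies expressed as fractions (0.15 and 0.5). It ignores the coverage, base-quality, strand and p-value filters that Varscan also applies, and the function and label names are hypothetical.

```python
def classify_site(ref_reads, alt_reads, min_var_freq=0.15, min_freq_for_hom=0.5):
    """Classify one site from its read counts (simplified, hypothetical helper).

    min_var_freq     -- lowest variant allele frequency reported as a variant
    min_freq_for_hom -- lowest variant allele frequency called homozygous
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        return "no_call"
    alt_freq = alt_reads / depth
    if alt_freq < min_var_freq:
        return "reference"
    if alt_freq >= min_freq_for_hom:
        return "homozygous_variant"
    # In a haploid organelle, a 'heterozygous' call corresponds to
    # intra-individual polymorphism at this site.
    return "heterozygous"


# Made-up read counts, for illustration only.
for ref, alt in [(90, 10), (70, 30), (40, 60)]:
    print(ref, alt, classify_site(ref, alt))
```

Under this rule, raising min_var_freq removes low-frequency variants, while min_freq_for_hom decides whether the remaining variants are reported as fixed or as polymorphic within the individual.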