Data from: How to optimize the precision of allele and haplotype frequency estimates using pooled-sequencing data
Data files
Oct 02, 2017 version files 417.27 MB
- DataHoechst.csv (13.21 KB)
- DataPicogreen.csv (17.62 KB)
- DNAconcentration_estimation_script_Fig.2-3.html (4.02 MB)
- DNAconcentration_estimation_script_Fig.2-3.Rmd (21.26 KB)
- GenomicData.zip (408.24 MB)
- PipetteMetrology.csv (29.92 KB)
- README_for_Simulation_script_Fig.1Fig.S1-4.txt (3.16 KB)
- Simulation_script_Fig.1Fig.S1-4.html (4.90 MB)
- Simulation_script_Fig.1Fig.S1-4.Rmd (31.70 KB)
Abstract
Sequencing pools of individuals rather than individuals separately reduces the cost of estimating allele frequencies at many loci in many populations. Theoretical and empirical studies show that sequencing pools comprising a limited number of individuals (typically fewer than 50) provides reliable allele frequency estimates, provided that the DNA pooling and DNA sequencing steps are carefully controlled. Both unequal contributions of individuals to the DNA pool and the mean and variance of sequencing depth can affect the standard error of allele frequency estimates. To our knowledge, no study has separately investigated the effects of these two factors on allele frequency estimates, so there is currently no method to estimate a priori the relative importance of unequal individual DNA contributions independently of sequencing depth. We develop a new analytical model for allele frequency estimation that explicitly distinguishes these two effects. Our model shows that the DNA pooling variance in a pooled sequencing experiment depends solely on two factors: the number of individuals within the pool and the coefficient of variation of individual DNA contributions to the pool. We present a new method to experimentally estimate this coefficient of variation when planning a pooled sequencing design in which samples are pooled either before or after DNA extraction. Using this analytical and experimental framework, we provide guidelines to optimize the design of pooled sequencing experiments. Finally, we sequence replicated pools of inbred lines of the plant Medicago truncatula and show that the predictions from our model generally hold true when estimating the frequency of known multilocus haplotypes using pooled sequencing.
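The dependence of pooling variance on pool size and on the coefficient of variation (CV) of individual contributions can be illustrated with a small sketch. This is not the authors' model or one of the scripts in this dataset. It relies only on a standard identity: if individual i contributes a fraction w_i of the pool, the sum of squared fractions equals (1 + CV^2) / n, where CV is computed with the population standard deviation. The function names, the gamma distribution for contributions, and the haploid (inbred line) genotypes are assumptions made for illustration.

```python
import random
import statistics


def pool_weights(contributions):
    """Fraction of the pooled DNA contributed by each individual."""
    total = sum(contributions)
    return [c / total for c in contributions]


def sum_squared_weights(contributions):
    """Sum of squared pool fractions; equals (1 + CV^2) / n exactly."""
    return sum(w * w for w in pool_weights(contributions))


def coefficient_of_variation(contributions):
    """CV of contributions, using the population standard deviation."""
    return statistics.pstdev(contributions) / statistics.fmean(contributions)


def simulate_pool_variance(n, cv, p, reps, rng):
    """Monte Carlo variance of the allele frequency in the pooled DNA.

    Assumptions (illustrative only): contributions are gamma distributed
    with the requested CV (cv > 0), and each individual is haploid or fully
    inbred, carrying the allele with probability p. Sequencing is ignored.
    """
    shape = 1.0 / (cv * cv)  # a gamma with this shape has CV equal to cv
    freqs = []
    for _ in range(reps):
        contributions = [rng.gammavariate(shape, 1.0) for _ in range(n)]
        weights = pool_weights(contributions)
        genotypes = [1 if rng.random() < p else 0 for _ in range(n)]
        freqs.append(sum(w * g for w, g in zip(weights, genotypes)))
    return statistics.pvariance(freqs)


if __name__ == "__main__":
    rng = random.Random(1)
    n, cv, p = 20, 0.5, 0.3
    simulated = simulate_pool_variance(n, cv, p, reps=20000, rng=rng)
    equal_contributions = p * (1 - p) / n
    print("simulated variance:", simulated)
    print("variance with equal contributions:", equal_contributions)
```

Under these assumptions the simulated variance exceeds the equal-contribution value p(1 - p) / n, and the excess grows with the CV of contributions and shrinks as more individuals are pooled.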
  
  
  
  