Name: Demographic model selection using random forests and the site frequency spectrum
Keywords: Haplotrema vancouverense

Phylogeographic data sets have grown from tens to thousands of loci in recent years, but extant statistical methods do not take full advantage of these large data sets. For example, approximate Bayesian computation (ABC) is a commonly used method for the explicit comparison of alternative demographic histories, but when applied to next-generation sequencing (NGS) data sets it is limited by the “curse of dimensionality” and by issues related to the simulation and summarization of data. We implement here several improvements to overcome these difficulties. We use a Random Forest (RF) classifier for model selection to circumvent the curse of dimensionality, and we apply a binned representation of the multidimensional site frequency spectrum (mSFS) to address issues related to the simulation and summarization of large SNP data sets. We evaluate the performance of these improvements using simulation and find low overall error rates (~7%). We then apply the approach to data from Haplotrema vancouverense, a land snail endemic to the Pacific Northwest of North America. Fifteen demographic models were compared, and our results support a model of recent dispersal from coastal to inland rainforests. Our results demonstrate that binning is an effective strategy for the construction of an mSFS and imply that the statistical power of RF when applied to demographic model selection is at least comparable to that of traditional ABC algorithms. Importantly, by combining these strategies, large sets of models with differing numbers of populations can be evaluated.
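To illustrate what a binned mSFS might look like in practice, the following is a minimal Python sketch, not the code used in the study. It assumes equal-width bins over the allele counts in each population; the function names (`bin_index`, `binned_msfs`), the default number of bins, and the input layout are all hypothetical choices made for illustration. The idea is that each SNP is assigned to one cell of a coarse grid according to its allele count in every population, and the cell tallies form a fixed-length summary vector.

```python
from collections import Counter
from itertools import product


def bin_index(count, n_chrom, n_bins):
    """Map an allele count in 0..n_chrom to a bin index in 0..n_bins-1."""
    # Assumption: equal-width bins over the possible counts 0..n_chrom.
    return count * n_bins // (n_chrom + 1)


def binned_msfs(snp_counts, n_chroms, n_bins=4):
    """Return a flat vector of SNP tallies, one entry per combination of bins.

    snp_counts: list of tuples, one per SNP, giving the allele count
                observed in each population.
    n_chroms:   number of sampled chromosomes in each population.
    """
    tally = Counter(
        tuple(bin_index(c, n, n_bins) for c, n in zip(snp, n_chroms))
        for snp in snp_counts
    )
    # Enumerate cells in a fixed order so vectors are comparable across data sets.
    cells = product(range(n_bins), repeat=len(n_chroms))
    return [tally.get(cell, 0) for cell in cells]


# Toy example: five SNPs, two populations, ten chromosomes sampled in each.
toy_snps = [(1, 0), (2, 7), (9, 9), (6, 1), (0, 10)]
toy_vector = binned_msfs(toy_snps, [10, 10], n_bins=2)
print(toy_vector)
```

With P populations and k bins the vector has k^P entries regardless of sample size, which is what keeps the summary tractable. Vectors of this kind, computed from data simulated under each candidate model, could then serve as the feature rows for a random forest classifier.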

Barcodes_Grp2_Mar2016_MEC-17-0128

Barcodes associated with Grp2_i03_Mar2016.fastq.gz.

Barcodes_Grp1_Mar2016_MEC-17-0128

Barcodes associated with Grp1_i12_Mar2016.fastq.gz.

Barcodes_Grp3_Mar2016_MEC-17-0128

Barcodes associated with Grp3_i04_Mar2016.fastq.gz.

Barcodes_Grp4_Mar2016_MEC-17-0128

Barcodes associated with Grp4_i05_Mar2016.fastq.gz.

Barcodes_Grp5_Mar2016_MEC-17-0128

Barcodes associated with Grp5_i06_Mar2016.fastq.gz.

Barcodes_Grp1_Nov2015_MEC-17-0128

Barcodes associated with Grp1_i06_Nov2015.fastq.gz.

i4_barcodes_July2016_MEC-17-0128

Barcodes associated with ZBCi4-V1T-1_S45_L006_R1_001.fastq.gz.

i5_barcodes_July2016_MEC-17-0128

Barcodes associated with ZBCi5-V1T-1_S46_L006_R1_001.fastq.gz.

i6_barcodes_July2016_MEC-17-0128

Barcodes associated with ZBCi6-V1T-1_S47_L006_R1_001.fastq.gz.

i7_barcodes_July2016_MEC-17-0128

Barcodes associated with ZBCi7-V1T-1_S48_L006_R1_001.fastq.gz.

i8_barcodes_July2016_MEC-17-0128

Barcodes associated with ZBCi8-V1T-1_S49_L006_R1_001.fastq.gz.

i12_barcodes_July2016_MEC-17-0128

Barcodes associated with ZBCi12-V1T-1_S50_L006_R1_001.fastq.

i3_barcodes_July2016_MEC-17-0128

Barcodes associated with ZBCi3-V1T-1_S44_L006_R1_001.fastq.gz.

i2_barcodes_July2016_MEC-17-0128

Barcodes associated with ZBCi2-V1T-1_S43_L006_R1_001.fastq.gz.

i1_barcodes_July2016_MEC-17-0128

Barcodes associated with ZBCi1-V1T-1_S42_L006_R1_001.fastq.gz.

ZBCi8-V1T-1_S49_L006_R1_001.fastq

ZBCi2-V1T-1_S43_L006_R1_001.fastq

ZBCi1-V1T-1_S42_L006_R1_001.fastq

ZBCi3-V1T-1_S44_L006_R1_001.fastq

ZBCi4-V1T-1_S45_L006_R1_001.fastq

ZBCi5-V1T-1_S46_L006_R1_001.fastq

ZBCi6-V1T-1_S47_L006_R1_001.fastq

ZBCi7-V1T-1_S48_L006_R1_001.fastq

Grp1_i06_Nov2015.fastq

Grp2_i03_Mar2016.fastq

Grp1_i12_Mar2016.fastq

Grp5_i06_Mar2016.fastq

Grp3_i04_Mar2016.fastq

Grp4_i05_Mar2016.fastq

ZBCi12-V1T-1_S50_L006_R1_001.fastq

params_ex

Example of a params file used in pyRAD for this study.

Haplo_July2016_77Samples_p60.unlinked_snps

pyRAD output: unlinked snps

Haplo_July2016_77Samples_p60.snps

pyRAD output: SNPs

Haplo_July2016_77Samples_p60.alleles

pyRAD output: alleles

Haplo_July2016_77Samples_p60.loci

pyRAD output: loci
