Data from: Evolutionary history inferred from the de novo assembly of a non-model organism, the blue-eyed black lemur

Meyer WK, Venkat A, Kermany AR, van de Geijn B, Zhang S, Przeworski M

Date Published: July 22, 2015

DOI: http://dx.doi.org/10.5061/dryad.rn745

 

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data. CC0 Open Data

Title EfEmSangerSequencingForDryad
Downloaded 9 times
Description Sanger sequencing results for up to 16 individuals (9 blue-eyed black lemurs and 7 black lemurs from the Duke Lemur Center) sequenced at 17 amplicons spanning scaffold2503 (the scaffold containing the orthologs of HERC2/OCA2) and aligned using EBioX
Download EfEmSangerSequencingForDryad.zip (26.36 Kb)
Title Variable sites VCF files
Downloaded 8 times
Description These files contain sites with a high posterior probability (> 0.8) of being polymorphic in the 8-sample dataset (4 samples from each species), along with their genotype likelihoods in these samples as estimated by ANGSD (http://popgen.dk/wiki/index.php/ANGSD). Genotype likelihoods were estimated using the GATK model (-GL 2). The probability of a site being variable was estimated using ngsStat (https://github.com/mfumagalli/ngsTools), based on posterior probabilities calculated using the genotype likelihoods and the site frequency spectrum estimated from high quality sites (minimum mapping quality 1, minimum quality 20, minimum 3 samples with data, minimum depth 9) within each species.
Download VariableSitesVCFFiles1.tar.gz (966.3 Mb)
Title Variable sites VCF files
Downloaded 7 times
Description These files contain sites with a high posterior probability (> 0.8) of being polymorphic in the 8-sample dataset (4 samples from each species), along with their genotype likelihoods in these samples as estimated by ANGSD (http://popgen.dk/wiki/index.php/ANGSD). Genotype likelihoods were estimated using the GATK model (-GL 2). The probability of a site being variable was estimated using ngsStat (https://github.com/mfumagalli/ngsTools), based on posterior probabilities calculated using the genotype likelihoods and the site frequency spectrum estimated from high quality sites (minimum mapping quality 1, minimum quality 20, minimum 3 samples with data, minimum depth 9) within each species.
Download VariableSitesVCFFiles2.tar.gz (465.7 Mb)
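The VCF descriptions above rely on the posterior probability that a site is polymorphic, given per-sample genotype likelihoods and a site frequency spectrum (SFS) prior. ngsStat's own implementation is not reproduced here; the following is a minimal Python sketch of the general idea under stated assumptions: biallelic sites, genotype likelihoods on the natural scale (VCF GL/PL fields are log10 or phred scaled and would need converting first), an unfolded SFS prior with 2n + 1 entries, and "polymorphic" taken to mean a sample allele count k that is neither 0 nor 2n. The function names are hypothetical.

```python
from math import comb


def likelihood_given_k(gls):
    """Return P(data | k) for k = 0..2n non-reference alleles in the sample.

    gls: list of per-sample (L(g=0), L(g=1), L(g=2)) genotype likelihoods on
    the natural scale. Uses the recursion
    h_j(k) = h_{j-1}(k) L_j(0) + 2 h_{j-1}(k-1) L_j(1) + h_{j-1}(k-2) L_j(2),
    then divides by C(2n, k) to average over genotype configurations.
    """
    h = [1.0]  # zero samples processed: allele count 0 with weight 1
    for l0, l1, l2 in gls:
        nxt = [0.0] * (len(h) + 2)
        for k, val in enumerate(h):
            nxt[k] += val * l0            # sample is homozygous reference
            nxt[k + 1] += 2.0 * val * l1  # heterozygote: C(2, 1) = 2 orderings
            nxt[k + 2] += val * l2        # homozygous non-reference
        h = nxt
    two_n = len(h) - 1
    return [h[k] / comb(two_n, k) for k in range(two_n + 1)]


def prob_polymorphic(gls, sfs_prior):
    """Posterior probability that 0 < k < 2n, given an SFS prior over k."""
    like = likelihood_given_k(gls)
    if len(sfs_prior) != len(like):
        raise ValueError("prior must have 2n + 1 entries")
    post = [p * l for p, l in zip(sfs_prior, like)]
    total = sum(post)
    return 1.0 - (post[0] + post[-1]) / total
```

With four samples per species, as in the 8-sample dataset, the prior has nine entries; a site would then be kept when `prob_polymorphic` exceeds 0.8.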
Title Eulemur flavifrons fastq file for PSMC
Downloaded 12 times
Description This fastq file was generated from the all-sites vcf file for Harlow (the E. flavifrons individual used for the de novo assembly) and was used to infer historical Ne using PSMC (Li and Durbin 2011). The vcf was generated by running GATK (McKenna et al. 2010; DePristo et al. 2011) on the filtered bam file (mapping quality 20 or above; paired-end reads aligned to the same scaffold, with correctly oriented read pairs mapping within three standard deviations of the mean insert size; duplicates removed), with the EMIT_ALL_SITES option and a minimum base quality of 20. The fastq was generated from this all-sites vcf using the vcf2fq function within the vcfutils.pl script, with a minimum depth of 26 and a maximum depth of 104 (0.5× and 2× the mean depth, respectively) and a minimum quality (QUAL*GQ) of 20.
Download Lemur.allLibs.mergedtoInsert_Sorted_Quake_...fq.gz (750.3 Mb)
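The depth cutoffs for Harlow follow directly from the mean depth: 26 and 104 are 0.5× and 2× a mean of 52×. The small Python sketch below reproduces that arithmetic and applies the resulting per-site depth mask. It is only an illustration of the thresholds, not the vcfutils.pl code, and it leaves out the quality criterion; the function names are hypothetical.

```python
def depth_bounds(mean_depth, low=0.5, high=2.0):
    """Return (min_depth, max_depth) as a fraction and multiple of the mean depth."""
    return round(low * mean_depth), round(high * mean_depth)


def depth_mask(depths, min_depth, max_depth):
    """True for sites whose read depth lies within [min_depth, max_depth]."""
    return [min_depth <= d <= max_depth for d in depths]


lo, hi = depth_bounds(52)
print(lo, hi)  # 26 104
print(depth_mask([10, 26, 60, 104, 150], lo, hi))  # [False, True, True, True, False]
```

Sites failing the mask would be masked in the consensus fastq rather than dropped, so the sequence keeps its genomic coordinates.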
Title Eulemur macaco fastq files for PSMC
Downloaded 6 times
Description These two files should be combined into one fastq representing the whole genome. The combined fastq file was generated from the all-sites vcf file for Harmonia (the E. macaco individual used for the de novo assembly) and was used to infer historical Ne using PSMC (Li and Durbin 2011). The vcf was generated by running GATK (McKenna et al. 2010; DePristo et al. 2011) on the filtered bam file (mapping quality 10 or above, duplicates removed), with the EMIT_ALL_SITES option and a minimum base quality of 20. The fastq was generated from this all-sites vcf using a modification of the vcf2fq function within the vcfutils.pl script (vcf2fqnonref, in https://github.com/sorrywm/genome_analysis/vcfutils_mod.pl), with a minimum depth of 10 and a maximum depth of 41 (0.5× and 2× the mean depth, respectively) and a minimum quality (QUAL*GQ) of 20.
Download MergedBamsFromBWADefaultOneRound_PEandSE_s...tq.gz (866.9 Mb)
Title Eulemur macaco fastq files for PSMC
Downloaded 8 times
Description These two files should be combined into one fastq representing the whole genome. The combined fastq file was generated from the all-sites vcf file for Harmonia (the E. macaco individual used for the de novo assembly) and was used to infer historical Ne using PSMC (Li and Durbin 2011). The vcf was generated by running GATK (McKenna et al. 2010; DePristo et al. 2011) on the filtered bam file (mapping quality 10 or above, duplicates removed), with the EMIT_ALL_SITES option and a minimum base quality of 20. The fastq was generated from this all-sites vcf using a modification of the vcf2fq function within the vcfutils.pl script (vcf2fqnonref, in https://github.com/sorrywm/genome_analysis/vcfutils_mod.pl), with a minimum depth of 10 and a maximum depth of 41 (0.5× and 2× the mean depth, respectively) and a minimum quality (QUAL*GQ) of 20.
Download MergedBamsFromBWADefaultOneRound_PEandSE_s...tq.gz (332.6 Mb)
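The two E. macaco files are meant to be combined into a single whole-genome fastq. One way to do this is to decompress each part in turn and write one gzipped output, as in the minimal Python sketch below. The file names in the usage line are hypothetical placeholders, because the full names are truncated on this page. The sketch assumes each part holds complete fastq records and joins the parts in the order given.

```python
import gzip


def combine_fastq_gz(parts, out_path, chunk_size=1 << 20):
    """Decompress each gzipped fastq part in order and write one gzipped fastq."""
    with gzip.open(out_path, "wb") as out:
        for part in parts:
            last = b"\n"
            with gzip.open(part, "rb") as src:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
                    last = chunk[-1:]
            # A part without a trailing newline would otherwise glue its last
            # line onto the first header line of the next part.
            if last != b"\n":
                out.write(b"\n")


# Hypothetical placeholder names; substitute the two downloaded files.
# combine_fastq_gz(["macaco_part1.fq.gz", "macaco_part2.fq.gz"], "macaco_combined.fq.gz")
```

Streaming in fixed-size chunks keeps memory use flat, which matters for files of several hundred megabytes compressed.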
Title Dataset S1: Annotated transcripts in the 1% FST tail
Downloaded 13 times
Description This list contains all transcripts mapped to regions in the 1% tail of FST from the full dataset in either species. Columns represent Ensembl transcript ID, region of high FST where the transcript was annotated, start position of the transcript within the region, end position of the transcript within the region, proportion of the transcript mapped to the region, mean percent identity for parts of the transcript that mapped, Ensembl gene ID, and gene symbol. We annotated genes in the 1% FST tail of all 20 kb non-overlapping windows by aligning human protein sequences to the blue-eyed black lemur genome. We obtained protein sequences for human genome build hg18 and used TBLASTN version 2.2.22+ (Altschul et al. 1990, 1997) with an e-value threshold of 5 × 10^-5 to identify orthologs within the regions of the blue-eyed black lemur reference genome corresponding to the 1% FST tail. We then took the list of all human proteins with hits within candidate regions and performed TBLASTN for these proteins against the entire lemur genome. We retained proteins whose best genome-wide match (the match with the lowest e-value or the highest mean percent identity) for any subset of the protein sequence overlapped the candidate region. In cases in which multiple proteins mapped to the same location (>50% of protein length overlapping, presumably representing multiple transcripts of the same gene or multiple genes in the same family), we retained the protein with the largest total length spanned by initial TBLASTN hits or the largest mean percent identity.
Download MeyerVenkatEtAlDatasetS1New.txt (63.74 Kb)
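The ortholog-filtering steps described for Dataset S1 can be sketched in Python as below. This is a simplified illustration, not the authors' pipeline. Each protein is represented by single TBLASTN hit intervals held in a hypothetical `Hit` record. The "best match" is ranked by lowest e-value, with percent identity as the tie-breaker. The >50% overlap rule is assumed to be measured against the shorter of the two mapped spans. Only the 5 × 10^-5 e-value threshold comes from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    protein: str     # human protein identifier (hypothetical field names)
    scaffold: str    # lemur scaffold carrying the TBLASTN hit
    start: int       # hit start on the scaffold (inclusive)
    end: int         # hit end on the scaffold (inclusive)
    evalue: float
    identity: float  # percent identity of the hit


E_VALUE_MAX = 5e-5  # threshold stated in the Dataset S1 description


def best_genome_wide(hits):
    """Keep each protein's best hit: lowest e-value, then highest identity."""
    best = {}
    for h in hits:
        if h.evalue > E_VALUE_MAX:
            continue
        cur = best.get(h.protein)
        if cur is None or (h.evalue, -h.identity) < (cur.evalue, -cur.identity):
            best[h.protein] = h
    return best


def in_candidate(best, region):
    """Proteins whose best genome-wide hit overlaps (scaffold, start, end)."""
    scaffold, start, end = region
    return [h for h in best.values()
            if h.scaffold == scaffold and h.start <= end and start <= h.end]


def collapse_redundant(hits, min_frac=0.5):
    """Among hits overlapping by more than min_frac of the shorter span,
    keep the one with the longest span, then the highest identity."""
    span = lambda h: h.end - h.start + 1
    ranked = sorted(hits, key=lambda h: (span(h), h.identity), reverse=True)
    kept = []
    for h in ranked:
        redundant = any(
            k.scaffold == h.scaffold
            and min(h.end, k.end) - max(h.start, k.start) + 1
            > min_frac * min(span(h), span(k))
            for k in kept
        )
        if not redundant:
            kept.append(h)
    return kept
```

Applied in order (`best_genome_wide`, then `in_candidate` for each high-FST region, then `collapse_redundant`), this yields one representative protein per location, mirroring the retention rules described above.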

When using this data, please cite the original publication:

Meyer WK, Venkat A, Kermany AR, van de Geijn B, Zhang S, Przeworski M (2015) Evolutionary history inferred from the de novo assembly of a non-model organism, the blue-eyed black lemur. Molecular Ecology 24(17): 4392–4405. http://dx.doi.org/10.1111/mec.13327

Additionally, please cite the Dryad data package:

Meyer WK, Venkat A, Kermany AR, van de Geijn B, Zhang S, Przeworski M (2015) Data from: Evolutionary history inferred from the de novo assembly of a non-model organism, the blue-eyed black lemur. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.rn745