21 January 2016

This repository contains data and code associated with the article:

Potter S, Bragg JG, Bi K, Peter BM, Moritz C
Phylogenomics at the tips: inferring lineages and their demographic history in a tropical lizard, Carlia amax.
Molecular Ecology

The file library_sample_Carlia_amax.csv contains sample information, library names, and accessions for data on the NCBI Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra/). All are in SRA BioProject PRJNA289283, BioSamples SAMN04420533-SAMN04420579.

Additionally, there are three directories, containing data and workflows:

1. data
-------------------------
Sample information, locus information, and assembled sequence data:

SNP/ -- directory of analyses using SNP datasets and associated data files, including the locus lists for these datasets
    BenPeterSpatialExpansion/ -- directory of files required for the spatial expansion analysis
    BFD/ -- directory containing the SNAPP file for the BFD analysis using 12 ingroup samples
    locus_list_2053snps.txt -- text file listing the exon names and positions of the 2053 SNPs used in analyses
    locus_list_2084snps.txt -- text file listing the exon names and positions of the 2084 SNPs used in analyses when an outgroup was included
    PCoA/ -- directory containing the .ape files used in the PCoA analysis, one with one SNP per locus (1pl) and another using all SNPs
    SNAPP/ -- directory containing files used for the SNAPP analysis with and without an outgroup (ae: without; gae: with)
    SpaceMix/ -- directory containing the .csv files used in the SpaceMix analysis, with and without English Company Island samples (ECI; noECI)
    Structure/ -- directory containing the .struct file for input to Structure analyses
SEQUENCE/ --
    BPP/ -- directory of sequence data files for the BPP analysis, including text files for sample identification
ref/ -- directory with reference sequences used for mapping (references.fasta)
lib_sample_Camax.csv -- file containing sample ID and information, CA number and reference to SRA data
-------------------------------------

2. design
-----------------------
Information on the sequence capture kit

targetExons.fa -- the list of target exons used for probe design

# scripts for identifying target loci can be found at Dryad Repository: doi:10.5061/dryad.34274
-- perl scripts used to identify target loci: dryad_34274/design/scripts/
-------------------------------------

3. code
--------------------
workflow/ -- folder of perl scripts used to clean and assemble raw sequence data
    # cleanup scripts are found at https://github.com/MVZSEQ in the SCPP folder
    2-scrubReads.pl -- perl scripts to clean raw sequence data
    assembly/assemble/pl -- perl scripts for sequence assembly
    assembly/maptotarget/pl -- perl scripts for mapping to target exons
snp/ -- code for mapping reads to references.fasta and calling SNPs
    # Perl scripts were used to call dependencies including bowtie2, samtools and GATK.
    # These perl scripts were called using shell script wrappers (names follow in parentheses):
    mapsnp2targets.pl -- perl script to map snp to target reads (sub_mapsnp2targets.sh)
    readGroups.pl -- perl script to add readGroups to BAM file (sub_readGroups.sh)
    gatkSNPcalltype_camax.pl -- perl script to use GATK to call SNPs (sub_gatkSNPcalltype_camax.sh)
    gatkProcess.pl -- perl script to process GATK SNPs (sub_gatkProcess.sh)
inf/ -- code for manipulating SNP matrices and performing inference
    processVCF.r -- an R script that reads a vcf file and writes input for various inferential packages
        -- this script calls functions in snpFilterFunctions.r and requires a sample list and a .vcf file
asm/
    # scripts for assembling a subset of sequencing libraries to create reference sequences
    # (similar to code provided in http://dx.doi.org/10.5061/dryad.34274, provided here for archival purposes)
-------------------------------------

Scripts are provided here primarily for archival purposes. For updated versions, see:
https://github.com/MVZSEQ
https://github.com/jasongbragg/exon-capture-phylo
https://github.com/MozesBlom/EAPhy
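
Illustrative note on the "one SNP per locus" (1pl) datasets listed under data/SNP/PCoA/: the sketch below shows one way a locus list could be thinned so that each exon contributes a single SNP. It is not the authors' code (that lives in code/inf/). It assumes each line holds an exon name and a position separated by whitespace, and it keeps the first SNP listed per exon; the actual file layout and the actual selection rule may differ. The function name and the example records are hypothetical.

```python
def thin_one_snp_per_locus(lines):
    """Keep the first SNP listed for each exon.

    Assumption (not confirmed by the repository): each line looks like
    '<exon_name> <position>', separated by whitespace. Lines that do not
    match this shape (blank lines, headers) are skipped.
    """
    kept = {}  # exon name -> position of the first SNP seen for that exon
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        exon, position = fields[0], int(fields[1])
        # setdefault leaves an existing entry untouched, so only the
        # first SNP encountered for each exon is retained.
        kept.setdefault(exon, position)
    return list(kept.items())


# Hypothetical records, for illustration only.
example = [
    "exon_A 101",
    "exon_A 250",
    "exon_B 37",
]
thinned = thin_one_snp_per_locus(example)
print(thinned)
```

To apply the same idea to a real file, the lines of locus_list_2053snps.txt could be passed in after confirming that the column layout matches the assumption above.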