Porites coral species RADseq data

Citation: Simmonds SE, Chou V, Cheng SH, Rachmawati R, Campulong H, Mahardika GN, Barber PH (2018) Evidence of host-associated divergence from coral-eating snails (genus Coralliophila) in the Coral Triangle. Coral Reefs DOI: 10.1007/s00338-018-1661-6

Contact: Dr. Sara Simmonds, skoch@ucla.edu

Coral reads were processed using the ipyrad v0.7.17 pipeline (http://ipyrad.readthedocs.io/). Sequence data were demultiplexed, low-quality base calls were filtered out, adapter sequences were removed, and reads were dereplicated. To focus on coral genes and exclude any DNA from symbiotic micro-organisms, we then mapped reads to a reference transcriptome (Porites lobata, available at reefgenomics.org) using the program BWA v0.7.16. From there, highly similar reads were clustered together and aligned. Heterozygosity and the sequencing error rate were then jointly estimated and used in consensus base calling. Any samples that did not sequence well were removed from further analyses (N = 11). Only loci with less than 20% missing data across taxa were kept, and reads were thinned to one single nucleotide polymorphism (SNP) per locus to remove linked loci.

------- ipyrad params file (v.0.7.17)-------------------------------------------
coralreference                 ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/Volumes/LaCie/Coral_RADseq/SEreference ## [1] [project_dir]: Project dir (made in curdir if not present)
                               ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
/Volumes/LaCie/Coral_RADseq/data/barcodes.txt ## [3] [barcodes_path]: Location of barcodes file
/Volumes/LaCie/Coral_RADseq/SEdenovo/coraltest2_fastqs/*fastq.gz ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
reference                      ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)
/Volumes/LaCie/Coral_RADseq/SEreference/Porites_lobata_cds_100.final.clstr.fna ## [6] [reference_sequence]: Location of reference sequence file
rad                            ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
TGCAGG,                        ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5                              ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33                             ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6                              ## [11] [mindepth_statistical]: Min depth for statistical base calling
6                              ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000                          ## [13] [maxdepth]: Max cluster depth within samples
0.85                           ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0                              ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
2                              ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35                             ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2                              ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
5, 5                           ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
8, 8                           ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
4                              ## [21] [min_samples_locus]: Min # samples per locus for output
20, 20                         ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
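
Each entry in the params file above follows the layout "value ## [index] [name]: description". For anyone who wants to inspect or compare these settings programmatically, the following is a minimal Python sketch that reads that layout into a dictionary. It is not part of ipyrad and is not how ipyrad itself loads the file; the names PARAM_LINE and parse_params are hypothetical, and the example text is a shortened excerpt of the entries listed above.

```python
import re

# Layout assumed from the params file above:
#   <value> ## [<index>] [<name>]: <description>
# PARAM_LINE and parse_params are illustrative names, not ipyrad functions.
PARAM_LINE = re.compile(
    r"^(?P<value>.*?)\s*##\s*\[(?P<index>\d+)\]\s*\[(?P<name>\w+)\]:\s*(?P<desc>.*)$"
)


def parse_params(text):
    """Return {name: value} for every parameter line in the text.

    Values are kept as strings; an unset parameter maps to an empty string.
    """
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match is None:
            continue  # skips the dashed header line and blank lines
        params[match.group("name")] = match.group("value").strip()
    return params


# Shortened excerpt of the params file listed above.
example = """\
------- ipyrad params file (v.0.7.17)-------------------------------------------
coralreference                 ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
                               ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
reference                      ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)
0.85                           ## [14] [clust_threshold]: Clustering threshold for de novo assembly
4                              ## [21] [min_samples_locus]: Min # samples per locus for output
"""

params = parse_params(example)
print(params["assembly_method"])
print(float(params["clust_threshold"]))
```

Values are left as strings because several entries hold comma-separated pairs (for example "5, 5") or a trailing comma ("TGCAGG,"), so any conversion to numbers or tuples is best done per parameter by the caller.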