SUMMARY
=======

Contains the custom Perl scripts of the pipeline used to demultiplex raw reads and to drop a base position that was causing a lane effect.

#### Repository sections:

+ [0Pipeline_CleanDemultiplex](./0Pipeline_CleanDemultiplex) contains the custom Perl scripts used to demultiplex raw reads, plus the output data.
+ [0.1DropBase](./0.1DropBase) contains custom scripts to delete one base of the sequenced reads with a sequencing error that was causing a lane effect. Output data is available at the Sequence Read Archive (SRA), accession SRP035472.

0Pipeline_CleanDemultiplex
--------------------------

Contains the custom Perl scripts of the pipeline used to demultiplex raw reads. The demultiplexing pipeline uses [CleanFastq_Pipeline_queue.pl](./0Pipeline_CleanDemultiplex/CleanFastq_Pipeline_queue.pl), a Perl script by Nils Arrigo that:

1. demultiplexes (fastq-multx);
2. cleans reads according to PHRED scores (fastq-clean) and clips out primers;
3. clips out barcodes (default), discards reads not starting with the restriction site, and trims all R1 reads to the same length (R2 reads are kept at varying lengths); and
4. collects and produces quality-check stats.

The enzyme motif was set to SbfI and MaxLen to 85; to change these, edit lines 51 and 52 of the script.

The *Berberis* data is single-end but the pipeline expects paired-end data, so we copied and renamed the fastq files to create a temporary fake R2 pair. This was done for each lane with:

```
cd ~/RAD_raw_data/LaneName
mkdir fakePair
cp *.fastq.gz fakePair/
cd fakePair/
rename _R1_ _R2_ *.fastq.gz
cd ..
cp *.fastq.gz fakePair/
```

The directory `fakePair` was deleted after demultiplexing was complete.

Job scripts (*.job) as run on the HPCC are shown in [0Pipeline_CleanDemultiplex](./0Pipeline_CleanDemultiplex); for an example see [bsub.demultiplexingBer1.job](./0Pipeline_CleanDemultiplex/bsub.demultiplexingBer1.job). Output data was written to [data.out](./0Pipeline_CleanDemultiplex/data.out).
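As a quick sanity check on demultiplexed output, per-sample read counts can be tallied directly from the FASTQ files, since a standard 4-line-per-record FASTQ holds one read per four lines. This is a minimal sketch, not part of the pipeline; `demo.fastq` is a toy file standing in for the real per-sample files:

```shell
# Toy two-read FASTQ standing in for a real demultiplexed sample file.
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n' > demo.fastq

# A standard FASTQ stores one read per 4 lines, so reads = lines / 4.
for f in demo.fastq; do
    n=$(( $(wc -l < "$f") / 4 ))
    echo "$f: $n reads"
done
```

The same loop run over a directory of per-sample files (e.g. `*.fastq` in the output directory) gives a quick table of read counts per sample.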
[Reads stats](./0Pipeline_CleanDemultiplex/data.out/ReadStatsBerL1_2_3.txt) were explored with basic analyses in R; see the output in [0.nreads_look_BERL_1_2_3.html](./0Pipeline_CleanDemultiplex/data.out/0.nreads_look_BERL_1_2_3.html).

0.1DropBase
-----------

Contains custom scripts to delete one base of the sequenced reads with a sequencing error that was causing a lane effect.

Before running Stacks (section **1stacks** of the repository), base 70 was deleted in all demultiplexed reads, because an error in the sequences of lane BERL3 causes a lane effect. We used [DropBpMultipleFastq.pl](./DropBpMultipleFastq.pl), as in [bsub.deleting70_1.job](./DropBpMultipleFastq.pl/bsub.deleting70_1.job), to drop this position in all reads of all lanes. We ran each lane as an independent job on the cluster; see the *.job files in `./DropBpMultipleFastq.pl`.

Output data is available at the Sequence Read Archive (SRA), accession SRP035472. These are the files used to run Stacks.

#### Reproducibility

The analyses were performed on the High Performance Computing Cluster supported by the Research and Specialist Computing Support service at the University of East Anglia (UEA HPCC) and on a personal computer running Mac OS X 10.7.5 (11G63), Kernel Version Darwin 11.4.2.
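The core operation of **0.1DropBase** above, deleting position 70 from every read, can be sketched with awk. This is an illustration only, not the `DropBpMultipleFastq.pl` script actually used; it assumes standard 4-line FASTQ records and removes the 70th character from both the sequence and the quality lines, and `demo.fastq` is a made-up input name:

```shell
# Build a toy one-read FASTQ with an 85-bp read; base 70 is marked 'X'
# so the deletion is visible (illustration only, not the real data).
seq70=$(printf 'A%.0s' $(seq 1 69))X$(printf 'A%.0s' $(seq 1 15))
printf '@read1\n%s\n+\n%s\n' "$seq70" "$seq70" > demo.fastq

# In a 4-line FASTQ record the sequence is line 2 and the quality string
# is line 4, i.e. the even-numbered lines of the file; substr() splices
# out the 70th character from those lines.
awk 'NR % 2 == 0 { $0 = substr($0, 1, 69) substr($0, 71) } { print }' \
    demo.fastq > demo_no70.fastq
```

After this step the reads are 84 bp instead of 85, with position 70 removed from sequence and quality strings alike, which keeps the two in register.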