##First run the python script extract.all.unusual.reads.from.BWA.aligments... (full command below).
##This script uses the SAM flags to identify strange read arrangements and group them into categories.
##For each line (input bam file) it outputs an Excel file of all the abnormally mapped reads
##with mapping quality >= 29 (in bwa_read_configs).
##It also outputs a forward and reverse fastq file of all the unusually mapped read pairs.
##These can then be remapped with Novoalign.
python extract.all.unusual.reads.from.BWA.aligments.to.realign.with.novoalign.py
#p.s. this assumes the 2 TB hard drive is connected; that's where I keep the bams and fastqs

##Now realign these odd reads with Novoalign.
##First build the index:
./novoindex -k 14 -s 1 Mim.genome.novo.index ~/Desktop/SV_validation/epcr/Mimulus_guttatus.main_genome.scaffolds.fasta

##Next, Novoalign wants estimates of the mean insert size and standard deviation for paired reads,
##so run the following to estimate these from 100K reads in the BWA alignment
##(only proper pairs are used, i.e. flag & 2 == True):
python find_mean_insert_sz_and_sd.py > isizes.txt

##Then run Novoalign on all reads; I bundled this into another python script:
python run.novoalign.on.all.reads.py
##This produces a set of sorted bams, one for each line.

##Now parse these bams using the same logic as the large bams from the BWA alignment.
##The goal is to figure out which reads map to the same location in both aligners
##(+/- 50 bp, which is shorter than most reads and seems reasonable: how could a 75 bp read
##map to the same location within 50 bp and still cause an SV? maybe it could!):
python extract.all.reads.from.Novo.aligments.and.compare.with.bwa.py
#This produces 2 files: SVs.confirmed.by.novoalign.txt and SVs.fail.in.novoalign.txt

##Finally, cluster the confirmed class of SVs:
python cluster.SVs.and.report.py
##This produces files with putative deletions, inversions, and translocations. These files are then
##independently filtered with the scripts extract.credible.inv.set.py, extract.credible.transloc.set.py,
##and extract.credible.deletion.set.py.
##These scripts apply the filters described in the manuscript, but before running they need information
##about coverage and the genomic distribution of abnormally mapped reads. This is generated by running:
python get.coverage.above.map28.qual.distrib.py
python make.missmapped.read.db.py
python make.random.distrb.of.missmapped.reads.py

#Now run:
python extract.credible.inv.set.py
python extract.credible.transloc.set.py
python extract.credible.deletion.set.py
###These are the final data sets.
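
##For reference, a minimal sketch of the flag-based categorisation done by the first script,
##assuming pysam; the category names, bam path, and function name are illustrative only,
##not the actual contents of extract.all.unusual.reads.from.BWA.aligments.to.realign.with.novoalign.py.

import pysam

def classify(read):
    # Bin an abnormally mapped read by its SAM flags / mate position.
    if read.is_unmapped or read.mate_is_unmapped:
        return "unmapped_end"
    if read.reference_id != read.next_reference_id:
        return "mate_on_other_scaffold"   # translocation candidate
    if read.is_reverse == read.mate_is_reverse:
        return "same_orientation"         # inversion candidate
    if not read.is_proper_pair:
        return "improper_insert"          # deletion/insertion candidate
    return None                           # normally mapped, ignore

bam = pysam.AlignmentFile("line1.sorted.bam", "rb")  # placeholder path
for read in bam.fetch(until_eof=True):
    if read.mapping_quality >= 29:        # same quality cutoff as above
        category = classify(read)
        if category is not None:
            pass  # record read name, position, and category for the per-line output
bam.close()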
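
##Also for reference, a sketch of the insert-size estimation step, assuming pysam; the bam path
##and helper name are placeholders, not the actual find_mean_insert_sz_and_sd.py.

import pysam

def estimate_insert_size(bam_path, n_pairs=100000):
    # Mean and sd of template length from properly paired reads (flag & 2 == True).
    sizes = []
    bam = pysam.AlignmentFile(bam_path, "rb")
    for read in bam.fetch(until_eof=True):
        # count each pair once: first-in-pair read with a positive template length
        if read.flag & 2 and read.is_read1 and read.template_length > 0:
            sizes.append(read.template_length)
            if len(sizes) >= n_pairs:
                break
    bam.close()
    mean = sum(sizes) / float(len(sizes))
    sd = (sum((s - mean) ** 2 for s in sizes) / float(len(sizes))) ** 0.5
    return mean, sd

mean, sd = estimate_insert_size("line1.sorted.bam")  # placeholder bam
print("mean insert size: %.1f  sd: %.1f" % (mean, sd))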
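
##And a sketch of the +/- 50 bp comparison used to call a read pair "confirmed"; the function names
##are illustrative, and requiring BOTH ends to be reproduced is my reading of the step above,
##not necessarily what extract.all.reads.from.Novo.aligments.and.compare.with.bwa.py does.

def same_locale(bwa_chrom, bwa_pos, novo_chrom, novo_pos, window=50):
    # A read maps to "the same location" if it lands on the same scaffold
    # within `window` bp of its BWA position.
    return bwa_chrom == novo_chrom and abs(bwa_pos - novo_pos) <= window

def pair_confirmed(bwa_pair, novo_pair, window=50):
    # Each *_pair is ((chrom, pos) for read 1, (chrom, pos) for read 2);
    # here a pair supports an SV only if both ends are reproduced by Novoalign.
    return all(
        same_locale(b[0], b[1], n[0], n[1], window)
        for b, n in zip(bwa_pair, novo_pair)
    )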