Data from: Raw whole Drosophila genome sequence traces have contaminant sequences from bacterial symbionts

 

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data (CC0, Open Data).

Title newcompseq
Description There is more information in the Pipeline Readme, but here is the information specific to this program: 1) BLAST all the individual 454 reads from the genome against the available Drosophila genome assemblies. In my case, I BLASTed each of 4 files of genomic reads, one from each of 4 Drosophila species, against the genomes of Drosophila pseudoobscura pseudoobscura and Drosophila persimilis. The program I used for this analysis is newcompseq.plx. Because I wanted to run several copies of the program in parallel, I had to rename some of the temporary files it opens so that the parallel copies would not write to the same files. The 454 reads that align to neither genome are the interesting ones, and the program writes them to the designated output file.
Download newcompseq.plx (2.741 Kb)
Download README.txt (2.823 Kb)
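The filtering idea in step 1 can be sketched in Python. This is a minimal illustration, not the Perl in newcompseq.plx. It assumes the reads are in FASTA and that each genome search wrote BLAST tabular output (`-outfmt 6`, query ID in the first column). The function names are hypothetical. A read is kept only if its ID appears in none of the hit tables.

```python
"""Sketch: keep reads that have no BLAST hit against any reference genome."""
from typing import Dict, Iterable, Optional, Set


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {read_id: sequence}; the ID is the first word of the header."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line  # sequences may span several lines
    return records


def aligned_ids(tabular_hits: Iterable[str]) -> Set[str]:
    """Query IDs from BLAST tabular output (-outfmt 6); column 1 is the query ID."""
    return {line.split("\t")[0] for line in tabular_hits
            if line.strip() and not line.startswith("#")}


def unaligned_reads(reads: Dict[str, str], *hit_tables: Iterable[str]) -> Dict[str, str]:
    """Reads whose ID appears in none of the hit tables (one table per genome)."""
    hit: Set[str] = set()
    for table in hit_tables:
        hit |= aligned_ids(table)
    return {rid: seq for rid, seq in reads.items() if rid not in hit}


if __name__ == "__main__":
    reads = read_fasta([">r1 run1", "ACGTACGT", ">r2", "GGCCTTAA", ">r3", "TTAAGGCC"])
    pseudoobscura_hits = ["r1\tchr2\t99.0\t8\t0\t0\t1\t8\t100\t107\t1e-3\t16.4"]
    persimilis_hits = ["r3\tscaffold_7\t100.0\t8\t0\t0\t1\t8\t50\t57\t1e-3\t16.4"]
    print(unaligned_reads(reads, pseudoobscura_hits, persimilis_hits))  # {'r2': 'GGCCTTAA'}
```

Parallel copies each need their own temporary file names, as described for newcompseq.plx. In Python, `tempfile.mkstemp()` creates a uniquely named file for that purpose.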
Title bogotana
Description 2) I BLASTed all the interesting sequences in the output from newcompseq.plx against NCBI's database to see what these unexplained sequences align to. I used a separate copy of the program for each species I analyzed. For example, the interesting sequences from Drosophila miranda's set of 454 reads were BLASTed against NCBI using a program called miranda.pl.
Download bogotana.pl (4.138 Kb)
Title miranda
Description 2) Same as the bogotana entry; this is the copy of the step-2 NCBI BLAST script used for the Drosophila miranda reads.
Download miranda.pl (4.138 Kb)
Title per
Description 2) Same as the bogotana entry; this is another species-specific copy of the step-2 script that BLASTs one species' interesting sequences against NCBI.
Download per.pl (4.138 Kb)
Title pseudo
Description 2) Same as the bogotana entry; this is another species-specific copy of the step-2 script that BLASTs one species' interesting sequences against NCBI.
Download pseudo.pl (4.138 Kb)
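The step-2 scripts do the same job for different species, so one alternative is a single script that takes the species as a parameter. The sketch below illustrates that idea under stated assumptions; it is not the authors' code. It uses the NCBI BLAST Common URL API (`Blast.cgi` with `CMD=Put`, `PROGRAM`, `DATABASE`, `QUERY`). It assumes `blastn` against `nt` and a hypothetical `<species>.interesting.fasta` naming scheme. It only builds the request bodies and does not contact NCBI.

```python
"""Sketch: one parameterized step-2 script instead of one copy per species."""
from typing import List, Tuple
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"  # NCBI BLAST Common URL API endpoint


def species_files(species: str) -> Tuple[str, str]:
    """Hypothetical naming scheme: (input FASTA, output report) for one species."""
    return f"{species}.interesting.fasta", f"{species}.ncbi.txt"


def batches(records: List[str], size: int) -> List[List[str]]:
    """Split FASTA records into fixed-size batches to limit the number of submissions."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def put_request_body(fasta: str, program: str = "blastn", database: str = "nt") -> str:
    """Form-encoded body for a CMD=Put search submission (sent as an HTTP POST)."""
    return urlencode({"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": fasta})


if __name__ == "__main__":
    for species in ["bogotana", "miranda", "per", "pseudo"]:
        in_file, out_file = species_files(species)
        print(f"{species}: read {in_file}, write {out_file}")
    records = [">r2\nGGCCTTAA", ">r5\nACGTTGCA", ">r9\nTTTTCCCC"]
    for batch in batches(records, 2):
        body = put_request_body("\n".join(batch))
        # A real run would POST `body` to BLAST_URL, read the RID from the reply,
        # and poll with CMD=Get; this sketch stops before any network access.
        print(body)
```

Each submission returns a request ID (RID) that is then polled with `CMD=Get`. NCBI's usage guidelines limit how often a client may submit and poll, so batching sequences keeps the number of requests down.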
Title BlastStat
Description 3) The messy BLAST output is then parsed and analyzed by a program called BlastStat.pl, which reports statistics on what these sequences aligned to. It isolates the sequences whose hits are neither Drosophila nor human and writes the output to a file ending in .stat. These sequences are then placed in a separate file ending in .weird for further analysis. In practice, I made the .weird file manually by deleting the summary statistics at the top of each .stat file.
Download BlastStat.pl (8.701 Kb)
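The classification in step 3 can be sketched as follows. The sketch assumes the top hit for each query has already been reduced to a `(query_id, organism_name)` pair. That input format, the genus-based rule, and the example hits are assumptions for illustration. The code matches on the genus (the first word of the organism name) rather than searching for the word "Drosophila". This keeps hits such as "Wolbachia endosymbiont of Drosophila simulans" from being counted as fly sequences.

```python
"""Sketch: tally top BLAST hits and set aside those that are neither fly nor human."""
from collections import Counter
from typing import Iterable, List, Tuple

EXPECTED_GENERA = {"Drosophila", "Homo"}  # assumption: "expected" means Drosophila or human


def genus(organism: str) -> str:
    """First word of an organism name, e.g. 'Homo sapiens' -> 'Homo'."""
    words = organism.split()
    return words[0] if words else "unknown"


def blast_stats(top_hits: Iterable[Tuple[str, str]]) -> Tuple[Counter, List[Tuple[str, str]]]:
    """Count hits per genus and return the (query_id, organism) pairs outside EXPECTED_GENERA."""
    counts: Counter = Counter()
    weird: List[Tuple[str, str]] = []
    for query_id, organism in top_hits:
        g = genus(organism)
        counts[g] += 1
        if g not in EXPECTED_GENERA:
            weird.append((query_id, organism))
    return counts, weird


if __name__ == "__main__":
    # Made-up example hits. A plain substring test for "Drosophila" would wrongly
    # treat the r9 hit (a bacterium) as a fly sequence; the genus test does not.
    hits = [("r2", "Drosophila persimilis"), ("r5", "Homo sapiens"),
            ("r9", "Wolbachia endosymbiont of Drosophila simulans")]
    counts, weird = blast_stats(hits)
    print(counts.most_common())
    print(weird)  # [('r9', 'Wolbachia endosymbiont of Drosophila simulans')]
```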
Title compWeird
Description 4) The .weird file is then used to BLAST all the weird sequences against one another to check for duplicates within the data set. The program I wrote to do this comparison is called compWeird.plx. It takes longer to finish than you might expect, so I allowed extra time for this step. compWeird.plx produces a file ending in .single that identifies which sequences are duplicates and which are not.
Download compWeird.plx (5.046 Kb)
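compWeird.plx does this comparison with BLAST. The Python sketch below swaps in a much simpler rule so the idea stays self-contained. Two reads count as duplicates if the shorter one occurs exactly, on either strand, inside the longer one. That rule and the function names are assumptions. The all-against-all loop also shows why this step is slow: n sequences need n(n-1)/2 comparisons.

```python
"""Sketch: all-against-all duplicate check, using exact containment instead of BLAST."""
from itertools import combinations
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_duplicate(a: str, b: str) -> bool:
    """Assumed rule: the shorter read occurs exactly in the longer one, on either strand."""
    short, longer = sorted((a.upper(), b.upper()), key=len)
    return short in longer or revcomp(short) in longer


def duplicate_pairs(seqs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare every pair once: n*(n-1)/2 comparisons, so run time grows quadratically."""
    return [(x, y) for x, y in combinations(seqs, 2) if is_duplicate(seqs[x], seqs[y])]


def classify(seqs: Dict[str, str], pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Label each read 'duplicate' or 'single', roughly the information in a .single file."""
    dups = {read_id for pair in pairs for read_id in pair}
    return {read_id: ("duplicate" if read_id in dups else "single") for read_id in seqs}


if __name__ == "__main__":
    seqs = {"r9": "ACGTTGCAAT", "r11": "TTGCAA", "r12": "GGGGCCCC", "r13": "ATTGCAACGT"}
    pairs = duplicate_pairs(seqs)
    print(pairs)  # [('r9', 'r11'), ('r9', 'r13'), ('r11', 'r13')]
    print(classify(seqs, pairs))
```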
Title dupEliminate
Description 5) To eliminate the duplicates, I used a program called dupEliminate.pl. For each set of duplicates, this program chooses the longest sequence and writes it to a file ending in .noDupSeqs.
Download dupEliminate.pl (7.363 Kb)
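One way to implement the keep-the-longest rule is sketched below. It assumes duplicates arrive as pairs of read IDs and that duplication is transitive: if A duplicates B and B duplicates C, all three form one group. The union-find grouping, the tie-break by lexicographically smallest ID, and the function names are illustrative choices. dupEliminate.pl may do something different.

```python
"""Sketch: group duplicate reads and keep the longest read of each group."""
from typing import Dict, Iterable, List, Tuple


def group_duplicates(ids: Iterable[str], pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Union-find: reads linked by any chain of duplicate pairs end up in one group."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for i in parent:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def keep_longest(seqs: Dict[str, str], pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """For each group keep only the longest read (ties: lexicographically smallest ID)."""
    kept: Dict[str, str] = {}
    for group in group_duplicates(seqs, pairs):
        best = max(sorted(group), key=lambda i: len(seqs[i]))  # max keeps the first tie
        kept[best] = seqs[best]
    return kept


def to_fasta(records: Dict[str, str]) -> str:
    """Format the surviving reads as FASTA, e.g. for a .noDupSeqs file."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records.items())


if __name__ == "__main__":
    seqs = {"r9": "ACGTTGCAAT", "r11": "TTGCAA", "r12": "GGGGCCCC", "r13": "ATTGCAACGT"}
    pairs = [("r9", "r11"), ("r9", "r13"), ("r11", "r13")]
    print(to_fasta(keep_longest(seqs, pairs)), end="")
```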
Title weirdStat
Description 6) Finally, the species that the weird sequences aligned to are summarized by a program called weirdStat.pl, which produces a file ending in .speciesSummary. The summary shows, for example, which species were found most often and which were identified only once or twice.
Download weirdStat.pl (14.46 Kb)
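The step-6 summary is essentially a species count. Below is a minimal sketch using `collections.Counter`, assuming one species name per weird sequence. The report layout and the placeholder taxon names are hypothetical, since the actual .speciesSummary format is not described here.

```python
"""Sketch: summarize which species the weird sequences matched, and how often."""
from collections import Counter
from typing import Dict, Iterable, List


def species_summary(species_hits: Iterable[str], top_n: int = 3) -> Dict[str, List]:
    """Most frequent species, plus species seen exactly once or exactly twice."""
    counts = Counter(species_hits)
    return {
        "most_common": counts.most_common(top_n),
        "seen_once": sorted(s for s, c in counts.items() if c == 1),
        "seen_twice": sorted(s for s, c in counts.items() if c == 2),
    }


def format_summary(summary: Dict[str, List]) -> str:
    """Plain-text report; the layout is an assumption, not the real .speciesSummary format."""
    lines = ["Most frequent:"]
    lines += [f"  {s}\t{c}" for s, c in summary["most_common"]]
    lines.append("Seen once: " + ", ".join(summary["seen_once"]))
    lines.append("Seen twice: " + ", ".join(summary["seen_twice"]))
    return "\n".join(lines)


if __name__ == "__main__":
    hits = ["Taxon A"] * 3 + ["Taxon B"] * 2 + ["Taxon C"]  # placeholder names
    print(format_summary(species_summary(hits, top_n=2)))
```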

When using this data, please cite the original publication:

O'Connell KE, Noor MAF (2010) Raw whole Drosophila genome sequence traces have contaminant sequences from bacterial symbionts. Drosophila Information Service 93: 127-131.

Additionally, please cite the Dryad data package:

O'Connell KE, Noor MAF (2010) Data from: Raw whole Drosophila genome sequence traces have contaminant sequences from bacterial symbionts. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.8085