An unconstrained reference sequence facilitates the detection of selection. In Drosophila, sequence variation in short introns seems to be least influenced by selection and dominated by mutation and drift. Here, we test this with genome-wide sequences using an African population (Malawi) of D. melanogaster and data from the related outgroup species D. simulans, D. sechellia, D. erecta, and D. yakuba. The distribution of mutations deviates from equilibrium and the content of A and T (AT) nucleotides shows an excess of variance among introns. We explain this by a complex mutational pattern: a shift in mutational bias towards AT, leading to a slight non-equilibrium in base composition, and context-dependent mutation rates, with GC sites mutating most frequently in AT-rich introns. By comparing the corresponding allele frequency spectra of AT-rich versus GC-rich introns, we can rule out the influence of directional selection or biased gene conversion (BGC) on the mutational pattern. Compared to neutral equilibrium expectations, polymorphism spectra show an excess of low-frequency and a paucity of intermediate-frequency variants, irrespective of the direction of mutation. Combining the information from different outgroups with the polymorphism data and using a generalized linear model, we find evidence for shared ancestral polymorphism between D. melanogaster and D. simulans/D. sechellia, arguing against a bottleneck in D. melanogaster. Generally, we find that short introns can be used as a neutral reference on a genome-wide level if the spatially and temporally varying mutational pattern is accounted for.
Short intron sequence alignment of the 2L chromosome arm
We analyzed whole-genome data (Langley C. et al., accepted 2012) of D. melanogaster (Release 1.0) from a sample of six inbred Malawi isofemale lines from the 50 genomes Drosophila Population Genomics Project (DPGP) (http://www.dpgp.org/). We downloaded (http://genome.ucsc.edu/) aligned single sequences of D. simulans, D. sechellia, D. erecta and D. yakuba (Release 5) (Begun et al., 2007; Clark et al., 2007) and combined them with the six D. melanogaster sequences into a multiple alignment for all autosomes. We used positions 8 to 30 in short introns (shorter than 66 bp), as these are thought to be the least constrained sites in the Drosophila genome (Halligan and Keightley, 2006; Parsch et al., 2010). We used the D. melanogaster FlyBase annotation (release 5.31) to identify these sites. Introns that overlapped with coding regions were excluded from the analyses. Python scripts were written to concatenate all 23 considered positions per short intron into one alignment file for each chromosome arm.
alignment_2L
Short intron sequence alignment of the 2R chromosome arm
We analyzed whole-genome data (Langley C. et al., accepted 2012) of D. melanogaster (Release 1.0) from a sample of six inbred Malawi isofemale lines from the 50 genomes Drosophila Population Genomics Project (DPGP) (http://www.dpgp.org/). We downloaded (http://genome.ucsc.edu/) aligned single sequences of D. simulans, D. sechellia, D. erecta and D. yakuba (Release 5) (Begun et al., 2007; Clark et al., 2007) and combined them with the six D. melanogaster sequences into a multiple alignment for all autosomes. We used positions 8 to 30 in short introns (shorter than 66 bp), as these are thought to be the least constrained sites in the Drosophila genome (Halligan and Keightley, 2006; Parsch et al., 2010). We used the D. melanogaster FlyBase annotation (release 5.31) to identify these sites. Introns that overlapped with coding regions were excluded from the analyses. Python scripts were written to concatenate all 23 considered positions per short intron into one alignment file for each chromosome arm.
alignment_2R
Short intron sequence alignment of the 3L chromosome arm
We analyzed whole-genome data (Langley C. et al., accepted 2012) of D. melanogaster (Release 1.0) from a sample of six inbred Malawi isofemale lines from the 50 genomes Drosophila Population Genomics Project (DPGP) (http://www.dpgp.org/). We downloaded (http://genome.ucsc.edu/) aligned single sequences of D. simulans, D. sechellia, D. erecta and D. yakuba (Release 5) (Begun et al., 2007; Clark et al., 2007) and combined them with the five D. melanogaster sequences into a multiple alignment for all autosomes. We used positions 8 to 30 in short introns (shorter than 66 bp), as these are thought to be the least constrained sites in the Drosophila genome (Halligan and Keightley, 2006; Parsch et al., 2010). We used the D. melanogaster FlyBase annotation (release 5.31) to identify these sites. Introns that overlapped with coding regions were excluded from the analyses. Python scripts were written to concatenate all 23 considered positions per short intron into one alignment file for each chromosome arm.
alignment_3L
Short intron sequence alignment of the 3R chromosome arm
We analyzed whole-genome data (Langley C. et al., accepted 2012) of D. melanogaster (Release 1.0) from a sample of six inbred Malawi isofemale lines from the 50 genomes Drosophila Population Genomics Project (DPGP) (http://www.dpgp.org/). We downloaded (http://genome.ucsc.edu/) aligned single sequences of D. simulans, D. sechellia, D. erecta and D. yakuba (Release 5) (Begun et al., 2007; Clark et al., 2007) and combined them with the five D. melanogaster sequences into a multiple alignment for all autosomes. We used positions 8 to 30 in short introns (shorter than 66 bp), as these are thought to be the least constrained sites in the Drosophila genome (Halligan and Keightley, 2006; Parsch et al., 2010). We used the D. melanogaster FlyBase annotation (release 5.31) to identify these sites. Introns that overlapped with coding regions were excluded from the analyses. Python scripts were written to concatenate all 23 considered positions per short intron into one alignment file for each chromosome arm.
alignment_3R
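The extraction step described for each chromosome arm (keep positions 8 to 30 of every short intron, then concatenate the 23-site windows into one alignment) could be sketched roughly as follows. This is a minimal illustration, not the authors' scripts: the function names, the dictionary layout (sequence name mapped to aligned sequence), the 1-based counting from the 5' end of the intron, and the use of the aligned length as the intron length are all assumptions made here.

```python
from typing import Dict, List, Optional

# Assumed conventions (not stated in the source): positions are 1-based and
# counted from the 5' end of the intron; "short" means fewer than 66 columns.
START, END, MAX_LEN = 8, 30, 66


def intron_window(aligned: Dict[str, str],
                  start: int = START,
                  end: int = END,
                  max_len: int = MAX_LEN) -> Optional[Dict[str, str]]:
    """Return columns start..end (inclusive) of one aligned intron.

    `aligned` maps a sequence name to its aligned intron sequence.
    Returns None when the intron is not short or is too short to hold
    the whole window.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    length = lengths.pop()
    if length >= max_len or length < end:
        return None
    # Convert the 1-based inclusive window to a Python slice.
    return {name: seq[start - 1:end] for name, seq in aligned.items()}


def concatenate_windows(introns: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate the retained windows of all short introns, per sequence.

    Assumes every intron carries the same set of sequence names.
    """
    parts: Dict[str, List[str]] = {}
    for intron in introns:
        window = intron_window(intron)
        if window is None:
            continue  # skip introns that are not short
        for name, seq in window.items():
            parts.setdefault(name, []).append(seq)
    return {name: "".join(chunks) for name, chunks in parts.items()}
```

With these defaults each retained intron contributes exactly 23 columns, so the concatenated alignment length is 23 times the number of short introns kept. Filtering out introns that overlap coding regions would happen before this step and is not shown.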