Data from: Nucleotide variation in the Egfr locus of Drosophila melanogaster
Data files
Oct 17, 2009 version files 7.77 MB
- EGFRalleles.zip
- EGFRgenbank.doc
- PRBDG_SF1.pdf
- PRBDG_SF2.pdf
- PRBDG_SF3.pdf
- PRBDG_SF4.pdf
- PRBDG_ST1.pdf
- PRBDG_ST2.pdf
- PRBDG_ST3.pdf
- PRBDG_ST4.pdf
- PRBDG_ST5.pdf
Abstract
The Epidermal growth factor receptor is an essential gene with diverse pleiotropic roles in development throughout the animal kingdom. Analysis of sequence diversity in 10.9 kb covering the complete coding region and 6.4 kb of potential regulatory regions in a sample of 250 alleles from three populations of Drosophila melanogaster suggests that the intensity of different population genetic forces varies along the locus. A total of 238 independent common SNPs and 20 indel polymorphisms were detected, with just six common replacements among >1475 amino acids, four of which are in the short alternate first exon. Sequence diversity is lowest in a 2-kb portion of intron 2, which is also highly conserved in comparison with D. simulans and D. pseudoobscura. Linkage disequilibrium decays to background levels within 500 bp of most sites, so haplotypes generally span no more than five polymorphisms. The two North American samples, from North Carolina and California, have diverged in allele frequency at a handful of individual SNPs, whereas the Kenyan sample is both more divergent and more polymorphic. The effect of sample size on inference of the roles of population structure, uneven recombination, and weak selection in patterning nucleotide variation in the locus is discussed.
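For readers who want to explore linkage disequilibrium in the allele data themselves, the following is a minimal sketch of one common pairwise statistic, r², computed between two biallelic sites. It assumes phased haplotypes coded as 0/1, which is an illustrative encoding and not necessarily the format of the files listed above. The function name `ld_r2` is hypothetical, and the abstract does not state which LD measure the authors used.

```python
from typing import Sequence


def ld_r2(site_a: Sequence[int], site_b: Sequence[int]) -> float:
    """Return r^2 between two biallelic sites.

    Each argument holds one entry per haplotype, coded 0 or 1
    (an assumed encoding, chosen for illustration only).
    """
    if len(site_a) != len(site_b) or len(site_a) == 0:
        raise ValueError("sites must be non-empty and of equal length")

    n = len(site_a)
    p_a = sum(site_a) / n  # frequency of allele 1 at the first site
    p_b = sum(site_b) / n  # frequency of allele 1 at the second site
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denom


# Two sites in complete association, then two independent sites.
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))
print(ld_r2([0, 1, 0, 1], [0, 0, 1, 1]))
```

Plotting r² against the physical distance between site pairs is one way to visualize the kind of decay described in the abstract.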