Both SGDP and 1000G tables are organized similarly.

The first columns correspond to the genotype (0, 1 or 2) of the different individuals:
- from 1 to 20 for SGDP
- from 1 to 100 for 1000G

Then:
- Chromosome
- Position
- Alternative allele

YRI recombination map
- rec. rate
- centimorgan cumulative distance
- distance to exons in cM
- distance to exons in bp

JPT recombination map
- rec. rate
- distance to exons in cM
- distance to exons in bp

CEU recombination map
- rec. rate
- distance to exons in cM
- distance to exons in bp

- DeCode recombination rate
- GERP raw score
- GERP RS score
- Bstat
- Ancestral state
- Pantro4 state
- Previous site (hg19)
- hg19 state
- Following site (hg19)
- Synonymous or Non synonymous SNP
- LoF
- Distance to a transcribed region

##############################################################################################################################

The genotype table contains only autosomal, polymorphic SNPs that are not at CpG sites. These 3 filters were performed using awk:

i=1000GPAN.genot_annot #raw genotype table with annotations

#keep autosomes only
awk '{if($101!="X" && $101!="Y") print $0}' < $i > $i.autosome

#keep sites that are not fixed for the alternative allele (sum of the 100 genotypes < 200)
awk '{a=0; for(i=1;i<=100;i++){a+=$i}; if(a<200){print $0}}' < $i.autosome > $i.autosome.polymorphic

#remove CpG sites: C>T followed by G, then G>A preceded by C
awk '{if(!(toupper($116)=="C" && $103=="T" && toupper($120)=="G")) print $0}' < $i.autosome.polymorphic > toto
awk '{if(!(toupper($116)=="G" && $103=="A" && toupper($118)=="C")) print $0}' < toto > $i.autosome.polymorphic.noCpG.bed
rm toto
wc -l $i.autosome.polymorphic.noCpG.bed

#to get the distance to transcribed regions
awk '{i=$102+1; print $101"\t"$102"\t"i}' < $i.autosome.polymorphic.noCpG.bed > 1000G.position
bedtools closest -d -b merged.sorted.gene_region.bed -a 1000G.position > 1000G.transcribedDistances
#merged.sorted.gene_region.bed is downloaded from UCSC and is used to get the distance to transcribed regions in human hg19 positions

#keep position and distance, and collapse the duplicated lines reported for ties
cut -f1,2,3,7 1000G.transcribedDistances > toto
uniq toto > tmp
mv tmp 1000G.transcribedDistances

#append the distance as the last column of the genotype table
cut -f4 1000G.transcribedDistances > tDistances
paste $i.autosome.polymorphic.noCpG.bed tDistances > tmp
rm tDistances
mv tmp $i.autosome.polymorphic.noCpG.bed
rm toto 1000G.position
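
The final paste step only gives a correct table if the distance file has exactly one line per SNP, in the same order as the genotype table. A minimal sketch of a row-count check that could be run just before paste is shown below. The helper name check_same_rows and the toy files are hypothetical and are not part of the original pipeline; the check only compares line counts, so it would not detect rows that are present but in a different order.

```shell
# Hypothetical helper (not part of the original pipeline): succeed only if
# both files have the same number of lines, so that paste keeps rows aligned.
check_same_rows() {
    n1=$(awk 'END {print NR}' "$1")
    n2=$(awk 'END {print NR}' "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "row mismatch: $1 has $n1 lines, $2 has $n2 lines" >&2
        return 1
    fi
    echo "ok: $n1 rows in both files"
}

# Toy demonstration on made-up files (two SNPs, two distances)
tmpdir=$(mktemp -d)
printf '1\t100\tA\n1\t200\tT\n' > "$tmpdir/table"
printf '0\n1500\n' > "$tmpdir/dist"

# paste only runs when the row counts agree
check_same_rows "$tmpdir/table" "$tmpdir/dist" && paste "$tmpdir/table" "$tmpdir/dist"
rm -r "$tmpdir"
```

In the pipeline above, the equivalent call would compare $i.autosome.polymorphic.noCpG.bed with tDistances before the paste command.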