Supplementary datasets for: Polymorphic short tandem repeats make widespread contributions to blood and serum traits
Data files
Nov 13, 2023 version: files total 58.91 GB
-
black_alanine_aminotransferase_str_gwas_results.filtered.tab
404 B
-
black_alanine_aminotransferase_str_gwas_results.tab.gz
24.20 MB
-
black_albumin_str_gwas_results.filtered.tab
370 B
-
black_albumin_str_gwas_results.tab.gz
23.77 MB
-
black_alkaline_phosphatase_str_gwas_results.filtered.tab
396 B
-
black_alkaline_phosphatase_str_gwas_results.tab.gz
38.45 MB
-
black_apolipoprotein_a_str_gwas_results.filtered.tab
388 B
-
black_apolipoprotein_a_str_gwas_results.tab.gz
30.29 MB
-
black_apolipoprotein_b_str_gwas_results.filtered.tab
388 B
-
black_apolipoprotein_b_str_gwas_results.tab.gz
18.09 MB
-
black_aspartate_aminotransferase_str_gwas_results.filtered.tab
408 B
-
black_aspartate_aminotransferase_str_gwas_results.tab.gz
30.53 MB
-
black_c_reactive_protein_str_gwas_results.filtered.tab
392 B
-
black_c_reactive_protein_str_gwas_results.tab.gz
24.14 MB
-
black_calcium_str_gwas_results.filtered.tab
370 B
-
black_calcium_str_gwas_results.tab.gz
22.76 MB
-
black_cholesterol_str_gwas_results.filtered.tab
378 B
-
black_cholesterol_str_gwas_results.tab.gz
19.58 MB
-
black_creatinine_str_gwas_results.filtered.tab
376 B
-
black_creatinine_str_gwas_results.tab.gz
47.83 MB
-
black_cystatin_c_str_gwas_results.filtered.tab
376 B
-
black_cystatin_c_str_gwas_results.tab.gz
41.07 MB
-
black_eosinophil_count_str_gwas_results.filtered.tab
388 B
-
black_eosinophil_count_str_gwas_results.tab.gz
42.55 MB
-
black_eosinophil_percent_str_gwas_results.filtered.tab
392 B
-
black_eosinophil_percent_str_gwas_results.tab.gz
44.76 MB
-
black_gamma_glutamyltransferase_str_gwas_results.filtered.tab
406 B
-
black_gamma_glutamyltransferase_str_gwas_results.tab.gz
35.40 MB
-
black_glucose_str_gwas_results.filtered.tab
370 B
-
black_glucose_str_gwas_results.tab.gz
9.98 MB
-
black_glycated_haemoglobin_str_gwas_results.filtered.tab
396 B
-
black_glycated_haemoglobin_str_gwas_results.tab.gz
41.80 MB
-
black_haematocrit_str_gwas_results.filtered.tab
378 B
-
black_haematocrit_str_gwas_results.tab.gz
36.59 MB
-
black_haemoglobin_concentration_str_gwas_results.filtered.tab
406 B
-
black_haemoglobin_concentration_str_gwas_results.tab.gz
37.69 MB
-
black_hdl_cholesterol_str_gwas_results.filtered.tab
386 B
-
black_hdl_cholesterol_str_gwas_results.tab.gz
34.05 MB
-
black_igf_1_str_gwas_results.filtered.tab
366 B
-
black_igf_1_str_gwas_results.tab.gz
51.25 MB
-
black_ldl_cholesterol_direct_str_gwas_results.filtered.tab
400 B
-
black_ldl_cholesterol_direct_str_gwas_results.tab.gz
16.42 MB
-
black_lymphocyte_count_str_gwas_results.filtered.tab
388 B
-
black_lymphocyte_count_str_gwas_results.tab.gz
43.20 MB
-
black_lymphocyte_percent_str_gwas_results.filtered.tab
392 B
-
black_lymphocyte_percent_str_gwas_results.tab.gz
35.77 MB
-
black_mean_corpuscular_haemoglobin_concentration_str_gwas_results.filtered.tab
440 B
-
black_mean_corpuscular_haemoglobin_concentration_str_gwas_results.tab.gz
9.55 MB
-
black_mean_corpuscular_haemoglobin_str_gwas_results.filtered.tab
412 B
-
black_mean_corpuscular_haemoglobin_str_gwas_results.tab.gz
47.56 MB
-
black_mean_corpuscular_volume_str_gwas_results.filtered.tab
402 B
-
black_mean_corpuscular_volume_str_gwas_results.tab.gz
53.73 MB
-
black_mean_platelet_volume_str_gwas_results.filtered.tab
396 B
-
black_mean_platelet_volume_str_gwas_results.tab.gz
62.08 MB
-
black_mean_sphered_cell_volume_str_gwas_results.filtered.tab
404 B
-
black_mean_sphered_cell_volume_str_gwas_results.tab.gz
44.28 MB
-
black_neutrophil_count_str_gwas_results.filtered.tab
388 B
-
black_neutrophil_count_str_gwas_results.tab.gz
36.57 MB
-
black_neutrophil_percent_str_gwas_results.filtered.tab
392 B
-
black_neutrophil_percent_str_gwas_results.tab.gz
31.86 MB
-
black_phosphate_str_gwas_results.filtered.tab
374 B
-
black_phosphate_str_gwas_results.tab.gz
17.99 MB
-
black_platelet_count_str_gwas_results.filtered.tab
384 B
-
black_platelet_count_str_gwas_results.tab.gz
65.21 MB
-
black_platelet_crit_str_gwas_results.filtered.tab
382 B
-
black_platelet_crit_str_gwas_results.tab.gz
56.48 MB
-
black_platelet_distribution_width_str_gwas_results.filtered.tab
410 B
-
black_platelet_distribution_width_str_gwas_results.tab.gz
47.66 MB
-
black_red_blood_cell_count_str_gwas_results.filtered.tab
396 B
-
black_red_blood_cell_count_str_gwas_results.tab.gz
47.18 MB
-
black_red_blood_cell_distribution_width_str_gwas_results.filtered.tab
422 B
-
black_red_blood_cell_distribution_width_str_gwas_results.tab.gz
44.42 MB
-
black_shbg_str_gwas_results.filtered.tab
364 B
-
black_shbg_str_gwas_results.tab.gz
35.77 MB
-
black_total_bilirubin_str_gwas_results.filtered.tab
386 B
-
black_total_bilirubin_str_gwas_results.tab.gz
11.98 MB
-
black_total_protein_str_gwas_results.filtered.tab
382 B
-
black_total_protein_str_gwas_results.tab.gz
28.81 MB
-
black_triglycerides_str_gwas_results.filtered.tab
382 B
-
black_triglycerides_str_gwas_results.tab.gz
29.46 MB
-
black_urate_str_gwas_results.filtered.tab
366 B
-
black_urate_str_gwas_results.tab.gz
31.10 MB
-
black_urea_str_gwas_results.filtered.tab
364 B
-
black_urea_str_gwas_results.tab.gz
19.15 MB
-
black_vitamin_d_str_gwas_results.filtered.tab
374 B
-
black_vitamin_d_str_gwas_results.tab.gz
7.21 MB
-
black_white_blood_cell_count_str_gwas_results.filtered.tab
400 B
-
black_white_blood_cell_count_str_gwas_results.tab.gz
43.46 MB
-
chinese_alanine_aminotransferase_str_gwas_results.filtered.tab
140.25 KB
-
chinese_alanine_aminotransferase_str_gwas_results.tab.gz
17.15 MB
-
chinese_albumin_str_gwas_results.filtered.tab
151.06 KB
-
chinese_albumin_str_gwas_results.tab.gz
16.74 MB
-
chinese_alkaline_phosphatase_str_gwas_results.filtered.tab
212.46 KB
-
chinese_alkaline_phosphatase_str_gwas_results.tab.gz
27.19 MB
-
chinese_apolipoprotein_a_str_gwas_results.filtered.tab
184.68 KB
-
chinese_apolipoprotein_a_str_gwas_results.tab.gz
21.28 MB
-
chinese_apolipoprotein_b_str_gwas_results.filtered.tab
98.55 KB
-
chinese_apolipoprotein_b_str_gwas_results.tab.gz
12.72 MB
-
chinese_aspartate_aminotransferase_str_gwas_results.filtered.tab
169.94 KB
-
chinese_aspartate_aminotransferase_str_gwas_results.tab.gz
21.77 MB
-
chinese_c_reactive_protein_str_gwas_results.filtered.tab
139.99 KB
-
chinese_c_reactive_protein_str_gwas_results.tab.gz
17.05 MB
-
chinese_calcium_str_gwas_results.filtered.tab
148.56 KB
-
chinese_calcium_str_gwas_results.tab.gz
16.18 MB
-
chinese_cholesterol_str_gwas_results.filtered.tab
109.86 KB
-
chinese_cholesterol_str_gwas_results.tab.gz
13.76 MB
-
chinese_creatinine_str_gwas_results.filtered.tab
240.07 KB
-
chinese_creatinine_str_gwas_results.tab.gz
33.66 MB
-
chinese_cystatin_c_str_gwas_results.filtered.tab
222.73 KB
-
chinese_cystatin_c_str_gwas_results.tab.gz
28.81 MB
-
chinese_eosinophil_count_str_gwas_results.filtered.tab
230.87 KB
-
chinese_eosinophil_count_str_gwas_results.tab.gz
30.21 MB
-
chinese_eosinophil_percent_str_gwas_results.filtered.tab
229.74 KB
-
chinese_eosinophil_percent_str_gwas_results.tab.gz
31.79 MB
-
chinese_gamma_glutamyltransferase_str_gwas_results.filtered.tab
192.64 KB
-
chinese_gamma_glutamyltransferase_str_gwas_results.tab.gz
25.21 MB
-
chinese_glucose_str_gwas_results.filtered.tab
53.79 KB
-
chinese_glucose_str_gwas_results.tab.gz
6.99 MB
-
chinese_glycated_haemoglobin_str_gwas_results.filtered.tab
235.42 KB
-
chinese_glycated_haemoglobin_str_gwas_results.tab.gz
29.45 MB
-
chinese_haematocrit_str_gwas_results.filtered.tab
208.38 KB
-
chinese_haematocrit_str_gwas_results.tab.gz
25.95 MB
-
chinese_haemoglobin_concentration_str_gwas_results.filtered.tab
213.41 KB
-
chinese_haemoglobin_concentration_str_gwas_results.tab.gz
26.77 MB
-
chinese_hdl_cholesterol_str_gwas_results.filtered.tab
215.04 KB
-
chinese_hdl_cholesterol_str_gwas_results.tab.gz
23.95 MB
-
chinese_igf_1_str_gwas_results.filtered.tab
286.08 KB
-
chinese_igf_1_str_gwas_results.tab.gz
36.04 MB
-
chinese_ldl_cholesterol_direct_str_gwas_results.filtered.tab
92.87 KB
-
chinese_ldl_cholesterol_direct_str_gwas_results.tab.gz
11.53 MB
-
chinese_lymphocyte_count_str_gwas_results.filtered.tab
233.37 KB
-
chinese_lymphocyte_count_str_gwas_results.tab.gz
30.78 MB
-
chinese_lymphocyte_percent_str_gwas_results.filtered.tab
187.66 KB
-
chinese_lymphocyte_percent_str_gwas_results.tab.gz
25.55 MB
-
chinese_mean_corpuscular_haemoglobin_concentration_str_gwas_results.filtered.tab
55.33 KB
-
chinese_mean_corpuscular_haemoglobin_concentration_str_gwas_results.tab.gz
6.84 MB
-
chinese_mean_corpuscular_haemoglobin_str_gwas_results.filtered.tab
263.31 KB
-
chinese_mean_corpuscular_haemoglobin_str_gwas_results.tab.gz
34.05 MB
-
chinese_mean_corpuscular_volume_str_gwas_results.filtered.tab
296.46 KB
-
chinese_mean_corpuscular_volume_str_gwas_results.tab.gz
38.50 MB
-
chinese_mean_platelet_volume_str_gwas_results.filtered.tab
340.37 KB
-
chinese_mean_platelet_volume_str_gwas_results.tab.gz
44.18 MB
-
chinese_mean_sphered_cell_volume_str_gwas_results.filtered.tab
257.13 KB
-
chinese_mean_sphered_cell_volume_str_gwas_results.tab.gz
31.56 MB
-
chinese_neutrophil_count_str_gwas_results.filtered.tab
179.69 KB
-
chinese_neutrophil_count_str_gwas_results.tab.gz
25.92 MB
-
chinese_neutrophil_percent_str_gwas_results.filtered.tab
170.37 KB
-
chinese_neutrophil_percent_str_gwas_results.tab.gz
22.65 MB
-
chinese_phosphate_str_gwas_results.filtered.tab
105.28 KB
-
chinese_phosphate_str_gwas_results.tab.gz
12.69 MB
-
chinese_platelet_count_str_gwas_results.filtered.tab
329.90 KB
-
chinese_platelet_count_str_gwas_results.tab.gz
46.23 MB
-
chinese_platelet_crit_str_gwas_results.filtered.tab
289.18 KB
-
chinese_platelet_crit_str_gwas_results.tab.gz
40.25 MB
-
chinese_platelet_distribution_width_str_gwas_results.filtered.tab
248.04 KB
-
chinese_platelet_distribution_width_str_gwas_results.tab.gz
34.10 MB
-
chinese_red_blood_cell_count_str_gwas_results.filtered.tab
262.49 KB
-
chinese_red_blood_cell_count_str_gwas_results.tab.gz
33.67 MB
-
chinese_red_blood_cell_distribution_width_str_gwas_results.filtered.tab
240.12 KB
-
chinese_red_blood_cell_distribution_width_str_gwas_results.tab.gz
31.58 MB
-
chinese_shbg_str_gwas_results.filtered.tab
218.47 KB
-
chinese_shbg_str_gwas_results.tab.gz
25.19 MB
-
chinese_total_bilirubin_str_gwas_results.filtered.tab
61.43 KB
-
chinese_total_bilirubin_str_gwas_results.tab.gz
8.57 MB
-
chinese_total_protein_str_gwas_results.filtered.tab
191.76 KB
-
chinese_total_protein_str_gwas_results.tab.gz
20.26 MB
-
chinese_triglycerides_str_gwas_results.filtered.tab
163.44 KB
-
chinese_triglycerides_str_gwas_results.tab.gz
20.87 MB
-
chinese_urate_str_gwas_results.filtered.tab
180.57 KB
-
chinese_urate_str_gwas_results.tab.gz
21.85 MB
-
chinese_urea_str_gwas_results.filtered.tab
107.36 KB
-
chinese_urea_str_gwas_results.tab.gz
13.58 MB
-
chinese_vitamin_d_str_gwas_results.filtered.tab
43.07 KB
-
chinese_vitamin_d_str_gwas_results.tab.gz
5.07 MB
-
chinese_white_blood_cell_count_str_gwas_results.filtered.tab
211.48 KB
-
chinese_white_blood_cell_count_str_gwas_results.tab.gz
31.06 MB
-
finemapping_alanine_aminotransferase.tgz
370.72 MB
-
finemapping_albumin.tgz
362.18 MB
-
finemapping_alkaline_phosphatase.tgz
517.21 MB
-
finemapping_apolipoprotein_a.tgz
462.59 MB
-
finemapping_apolipoprotein_b.tgz
244.86 MB
-
finemapping_aspartate_aminotransferase.tgz
460.07 MB
-
finemapping_c_reactive_protein.tgz
375.34 MB
-
finemapping_calcium.tgz
332.36 MB
-
finemapping_cholesterol.tgz
239 MB
-
finemapping_creatinine.tgz
727.81 MB
-
finemapping_cystatin_c.tgz
625.63 MB
-
finemapping_eosinophil_count.tgz
668.41 MB
-
finemapping_eosinophil_percent.tgz
672.83 MB
-
finemapping_gamma_glutamyltransferase.tgz
503.29 MB
-
finemapping_glucose.tgz
157.20 MB
-
finemapping_glycated_haemoglobin.tgz
661.70 MB
-
finemapping_haematocrit.tgz
593.32 MB
-
finemapping_haemoglobin_concentration.tgz
598.53 MB
-
finemapping_hdl_cholesterol.tgz
489.93 MB
-
finemapping_igf_1.tgz
752.02 MB
-
finemapping_ldl_cholesterol_direct.tgz
178.27 MB
-
finemapping_lymphocyte_count.tgz
680.13 MB
-
finemapping_lymphocyte_percent.tgz
563.94 MB
-
finemapping_mean_corpuscular_haemoglobin_concentration.tgz
153.97 MB
-
finemapping_mean_corpuscular_haemoglobin.tgz
728.63 MB
-
finemapping_mean_corpuscular_volume.tgz
793.63 MB
-
finemapping_mean_platelet_volume.tgz
927.75 MB
-
finemapping_mean_sphered_cell_volume.tgz
717.92 MB
-
finemapping_neutrophil_count.tgz
555.10 MB
-
finemapping_neutrophil_percent.tgz
502.47 MB
-
finemapping_phosphate.tgz
253.71 MB
-
finemapping_platelet_count.tgz
915.71 MB
-
finemapping_platelet_crit.tgz
923.41 MB
-
finemapping_platelet_distribution_width.tgz
718.04 MB
-
finemapping_red_blood_cell_count.tgz
808.48 MB
-
finemapping_red_blood_cell_distribution_width.tgz
680.56 MB
-
finemapping_shbg.tgz
501.11 MB
-
finemapping_total_bilirubin.tgz
148.55 MB
-
finemapping_total_protein.tgz
498.35 MB
-
finemapping_triglycerides.tgz
447.31 MB
-
finemapping_urate.tgz
459.04 MB
-
finemapping_urea.tgz
283.96 MB
-
finemapping_vitamin_d.tgz
98.38 MB
-
finemapping_white_blood_cell_count.tgz
695.96 MB
-
irish_alanine_aminotransferase_str_gwas_results.filtered.tab
404 B
-
irish_alanine_aminotransferase_str_gwas_results.tab.gz
22.12 MB
-
irish_albumin_str_gwas_results.filtered.tab
370 B
-
irish_albumin_str_gwas_results.tab.gz
21.70 MB
-
irish_alkaline_phosphatase_str_gwas_results.filtered.tab
396 B
-
irish_alkaline_phosphatase_str_gwas_results.tab.gz
35.25 MB
-
irish_apolipoprotein_a_str_gwas_results.filtered.tab
388 B
-
irish_apolipoprotein_a_str_gwas_results.tab.gz
27.61 MB
-
irish_apolipoprotein_b_str_gwas_results.filtered.tab
388 B
-
irish_apolipoprotein_b_str_gwas_results.tab.gz
16.51 MB
-
irish_aspartate_aminotransferase_str_gwas_results.filtered.tab
408 B
-
irish_aspartate_aminotransferase_str_gwas_results.tab.gz
27.86 MB
-
irish_c_reactive_protein_str_gwas_results.filtered.tab
392 B
-
irish_c_reactive_protein_str_gwas_results.tab.gz
22.12 MB
-
irish_calcium_str_gwas_results.filtered.tab
370 B
-
irish_calcium_str_gwas_results.tab.gz
20.80 MB
-
irish_cholesterol_str_gwas_results.filtered.tab
378 B
-
irish_cholesterol_str_gwas_results.tab.gz
17.79 MB
-
irish_creatinine_str_gwas_results.filtered.tab
376 B
-
irish_creatinine_str_gwas_results.tab.gz
43.51 MB
-
irish_cystatin_c_str_gwas_results.filtered.tab
376 B
-
irish_cystatin_c_str_gwas_results.tab.gz
37.48 MB
-
irish_eosinophil_count_str_gwas_results.filtered.tab
388 B
-
irish_eosinophil_count_str_gwas_results.tab.gz
38.77 MB
-
irish_eosinophil_percent_str_gwas_results.filtered.tab
392 B
-
irish_eosinophil_percent_str_gwas_results.tab.gz
40.81 MB
-
irish_gamma_glutamyltransferase_str_gwas_results.filtered.tab
406 B
-
irish_gamma_glutamyltransferase_str_gwas_results.tab.gz
32.53 MB
-
irish_glucose_str_gwas_results.filtered.tab
370 B
-
irish_glucose_str_gwas_results.tab.gz
9.09 MB
-
irish_glycated_haemoglobin_str_gwas_results.filtered.tab
396 B
-
irish_glycated_haemoglobin_str_gwas_results.tab.gz
38.10 MB
-
irish_haematocrit_str_gwas_results.filtered.tab
378 B
-
irish_haematocrit_str_gwas_results.tab.gz
33.34 MB
-
irish_haemoglobin_concentration_str_gwas_results.filtered.tab
406 B
-
irish_haemoglobin_concentration_str_gwas_results.tab.gz
34.36 MB
-
irish_hdl_cholesterol_str_gwas_results.filtered.tab
386 B
-
irish_hdl_cholesterol_str_gwas_results.tab.gz
31.01 MB
-
irish_igf_1_str_gwas_results.filtered.tab
366 B
-
irish_igf_1_str_gwas_results.tab.gz
46.78 MB
-
irish_ldl_cholesterol_direct_str_gwas_results.filtered.tab
400 B
-
irish_ldl_cholesterol_direct_str_gwas_results.tab.gz
14.95 MB
-
irish_lymphocyte_count_str_gwas_results.filtered.tab
388 B
-
irish_lymphocyte_count_str_gwas_results.tab.gz
39.68 MB
-
irish_lymphocyte_percent_str_gwas_results.filtered.tab
392 B
-
irish_lymphocyte_percent_str_gwas_results.tab.gz
32.91 MB
-
irish_mean_corpuscular_haemoglobin_concentration_str_gwas_results.filtered.tab
440 B
-
irish_mean_corpuscular_haemoglobin_concentration_str_gwas_results.tab.gz
8.68 MB
-
irish_mean_corpuscular_haemoglobin_str_gwas_results.filtered.tab
412 B
-
irish_mean_corpuscular_haemoglobin_str_gwas_results.tab.gz
43.25 MB
-
irish_mean_corpuscular_volume_str_gwas_results.filtered.tab
402 B
-
irish_mean_corpuscular_volume_str_gwas_results.tab.gz
49.08 MB
-
irish_mean_platelet_volume_str_gwas_results.filtered.tab
396 B
-
irish_mean_platelet_volume_str_gwas_results.tab.gz
56.82 MB
-
irish_mean_sphered_cell_volume_str_gwas_results.filtered.tab
404 B
-
irish_mean_sphered_cell_volume_str_gwas_results.tab.gz
40.46 MB
-
irish_neutrophil_count_str_gwas_results.filtered.tab
388 B
-
irish_neutrophil_count_str_gwas_results.tab.gz
33.24 MB
-
irish_neutrophil_percent_str_gwas_results.filtered.tab
392 B
-
irish_neutrophil_percent_str_gwas_results.tab.gz
29.05 MB
-
irish_phosphate_str_gwas_results.filtered.tab
374 B
-
irish_phosphate_str_gwas_results.tab.gz
16.41 MB
-
irish_platelet_count_str_gwas_results.filtered.tab
384 B
-
irish_platelet_count_str_gwas_results.tab.gz
59.41 MB
-
irish_platelet_crit_str_gwas_results.filtered.tab
382 B
-
irish_platelet_crit_str_gwas_results.tab.gz
51.68 MB
-
irish_platelet_distribution_width_str_gwas_results.filtered.tab
410 B
-
irish_platelet_distribution_width_str_gwas_results.tab.gz
43.58 MB
-
irish_red_blood_cell_count_str_gwas_results.filtered.tab
396 B
-
irish_red_blood_cell_count_str_gwas_results.tab.gz
43.11 MB
-
irish_red_blood_cell_distribution_width_str_gwas_results.filtered.tab
422 B
-
irish_red_blood_cell_distribution_width_str_gwas_results.tab.gz
40.35 MB
-
irish_shbg_str_gwas_results.filtered.tab
364 B
-
irish_shbg_str_gwas_results.tab.gz
32.65 MB
-
irish_total_bilirubin_str_gwas_results.filtered.tab
386 B
-
irish_total_bilirubin_str_gwas_results.tab.gz
10.98 MB
-
irish_total_protein_str_gwas_results.filtered.tab
382 B
-
irish_total_protein_str_gwas_results.tab.gz
26.29 MB
-
irish_triglycerides_str_gwas_results.filtered.tab
382 B
-
irish_triglycerides_str_gwas_results.tab.gz
26.98 MB
-
irish_urate_str_gwas_results.filtered.tab
366 B
-
irish_urate_str_gwas_results.tab.gz
28.36 MB
-
irish_urea_str_gwas_results.filtered.tab
364 B
-
irish_urea_str_gwas_results.tab.gz
17.56 MB
-
irish_vitamin_d_str_gwas_results.filtered.tab
374 B
-
irish_vitamin_d_str_gwas_results.tab.gz
6.55 MB
-
irish_white_blood_cell_count_str_gwas_results.filtered.tab
400 B
-
irish_white_blood_cell_count_str_gwas_results.tab.gz
39.83 MB
-
README.md
2.05 MB
-
south_asian_alanine_aminotransferase_str_gwas_results.filtered.tab
404 B
-
south_asian_alanine_aminotransferase_str_gwas_results.tab.gz
22.61 MB
-
south_asian_albumin_str_gwas_results.filtered.tab
370 B
-
south_asian_albumin_str_gwas_results.tab.gz
22.18 MB
-
south_asian_alkaline_phosphatase_str_gwas_results.filtered.tab
396 B
-
south_asian_alkaline_phosphatase_str_gwas_results.tab.gz
35.94 MB
-
south_asian_apolipoprotein_a_str_gwas_results.filtered.tab
388 B
-
south_asian_apolipoprotein_a_str_gwas_results.tab.gz
28.21 MB
-
south_asian_apolipoprotein_b_str_gwas_results.filtered.tab
388 B
-
south_asian_apolipoprotein_b_str_gwas_results.tab.gz
16.89 MB
-
south_asian_aspartate_aminotransferase_str_gwas_results.filtered.tab
408 B
-
south_asian_aspartate_aminotransferase_str_gwas_results.tab.gz
28.44 MB
-
south_asian_c_reactive_protein_str_gwas_results.filtered.tab
392 B
-
south_asian_c_reactive_protein_str_gwas_results.tab.gz
22.54 MB
-
south_asian_calcium_str_gwas_results.filtered.tab
370 B
-
south_asian_calcium_str_gwas_results.tab.gz
21.26 MB
-
south_asian_cholesterol_str_gwas_results.filtered.tab
378 B
-
south_asian_cholesterol_str_gwas_results.tab.gz
18.28 MB
-
south_asian_creatinine_str_gwas_results.filtered.tab
376 B
-
south_asian_creatinine_str_gwas_results.tab.gz
44.58 MB
-
south_asian_cystatin_c_str_gwas_results.filtered.tab
376 B
-
south_asian_cystatin_c_str_gwas_results.tab.gz
38.39 MB
-
south_asian_eosinophil_count_str_gwas_results.filtered.tab
388 B
-
south_asian_eosinophil_count_str_gwas_results.tab.gz
39.67 MB
-
south_asian_eosinophil_percent_str_gwas_results.filtered.tab
392 B
-
south_asian_eosinophil_percent_str_gwas_results.tab.gz
41.80 MB
-
south_asian_gamma_glutamyltransferase_str_gwas_results.filtered.tab
406 B
-
south_asian_gamma_glutamyltransferase_str_gwas_results.tab.gz
33.15 MB
-
south_asian_glucose_str_gwas_results.filtered.tab
370 B
-
south_asian_glucose_str_gwas_results.tab.gz
9.37 MB
-
south_asian_glycated_haemoglobin_str_gwas_results.filtered.tab
396 B
-
south_asian_glycated_haemoglobin_str_gwas_results.tab.gz
39.29 MB
-
south_asian_haematocrit_str_gwas_results.filtered.tab
378 B
-
south_asian_haematocrit_str_gwas_results.tab.gz
34.18 MB
-
south_asian_haemoglobin_concentration_str_gwas_results.filtered.tab
406 B
-
south_asian_haemoglobin_concentration_str_gwas_results.tab.gz
35.15 MB
-
south_asian_hdl_cholesterol_str_gwas_results.filtered.tab
386 B
-
south_asian_hdl_cholesterol_str_gwas_results.tab.gz
31.69 MB
-
south_asian_igf_1_str_gwas_results.filtered.tab
366 B
-
south_asian_igf_1_str_gwas_results.tab.gz
47.82 MB
-
south_asian_ldl_cholesterol_direct_str_gwas_results.filtered.tab
400 B
-
south_asian_ldl_cholesterol_direct_str_gwas_results.tab.gz
15.36 MB
-
south_asian_lymphocyte_count_str_gwas_results.filtered.tab
388 B
-
south_asian_lymphocyte_count_str_gwas_results.tab.gz
40.41 MB
-
south_asian_lymphocyte_percent_str_gwas_results.filtered.tab
392 B
-
south_asian_lymphocyte_percent_str_gwas_results.tab.gz
33.53 MB
-
south_asian_mean_corpuscular_haemoglobin_concentration_str_gwas_results.filtered.tab
440 B
-
south_asian_mean_corpuscular_haemoglobin_concentration_str_gwas_results.tab.gz
8.90 MB
-
south_asian_mean_corpuscular_haemoglobin_str_gwas_results.filtered.tab
412 B
-
south_asian_mean_corpuscular_haemoglobin_str_gwas_results.tab.gz
44.49 MB
-
south_asian_mean_corpuscular_volume_str_gwas_results.filtered.tab
402 B
-
south_asian_mean_corpuscular_volume_str_gwas_results.tab.gz
50.32 MB
-
south_asian_mean_platelet_volume_str_gwas_results.filtered.tab
396 B
-
south_asian_mean_platelet_volume_str_gwas_results.tab.gz
58.07 MB
-
south_asian_mean_sphered_cell_volume_str_gwas_results.filtered.tab
404 B
-
south_asian_mean_sphered_cell_volume_str_gwas_results.tab.gz
41.32 MB
-
south_asian_neutrophil_count_str_gwas_results.filtered.tab
388 B
-
south_asian_neutrophil_count_str_gwas_results.tab.gz
33.89 MB
-
south_asian_neutrophil_percent_str_gwas_results.filtered.tab
392 B
-
south_asian_neutrophil_percent_str_gwas_results.tab.gz
29.70 MB
-
south_asian_phosphate_str_gwas_results.filtered.tab
374 B
-
south_asian_phosphate_str_gwas_results.tab.gz
16.74 MB
-
south_asian_platelet_count_str_gwas_results.filtered.tab
384 B
-
south_asian_platelet_count_str_gwas_results.tab.gz
60.60 MB
-
south_asian_platelet_crit_str_gwas_results.filtered.tab
382 B
-
south_asian_platelet_crit_str_gwas_results.tab.gz
52.74 MB
-
south_asian_platelet_distribution_width_str_gwas_results.filtered.tab
410 B
-
south_asian_platelet_distribution_width_str_gwas_results.tab.gz
44.53 MB
-
south_asian_red_blood_cell_count_str_gwas_results.filtered.tab
396 B
-
south_asian_red_blood_cell_count_str_gwas_results.tab.gz
44.11 MB
-
south_asian_red_blood_cell_distribution_width_str_gwas_results.filtered.tab
422 B
-
south_asian_red_blood_cell_distribution_width_str_gwas_results.tab.gz
41.40 MB
-
south_asian_shbg_str_gwas_results.filtered.tab
364 B
-
south_asian_shbg_str_gwas_results.tab.gz
33.30 MB
-
south_asian_total_bilirubin_str_gwas_results.filtered.tab
386 B
-
south_asian_total_bilirubin_str_gwas_results.tab.gz
11.24 MB
-
south_asian_total_protein_str_gwas_results.filtered.tab
382 B
-
south_asian_total_protein_str_gwas_results.tab.gz
26.91 MB
-
south_asian_triglycerides_str_gwas_results.filtered.tab
382 B
-
south_asian_triglycerides_str_gwas_results.tab.gz
27.52 MB
-
south_asian_urate_str_gwas_results.filtered.tab
366 B
-
south_asian_urate_str_gwas_results.tab.gz
28.96 MB
-
south_asian_urea_str_gwas_results.filtered.tab
364 B
-
south_asian_urea_str_gwas_results.tab.gz
17.90 MB
-
south_asian_vitamin_d_str_gwas_results.filtered.tab
374 B
-
south_asian_vitamin_d_str_gwas_results.tab.gz
6.72 MB
-
south_asian_white_blood_cell_count_str_gwas_results.filtered.tab
400 B
-
south_asian_white_blood_cell_count_str_gwas_results.tab.gz
40.74 MB
-
white_british_alanine_aminotransferase_str_gwas_results.filtered.tab
404 B
-
white_british_alanine_aminotransferase_str_gwas_results.tab.gz
680.72 MB
-
white_british_albumin_str_gwas_results.filtered.tab
370 B
-
white_british_albumin_str_gwas_results.tab.gz
648.48 MB
-
white_british_alkaline_phosphatase_str_gwas_results.filtered.tab
396 B
-
white_british_alkaline_phosphatase_str_gwas_results.tab.gz
670.77 MB
-
white_british_apolipoprotein_a_str_gwas_results.filtered.tab
388 B
-
white_british_apolipoprotein_a_str_gwas_results.tab.gz
658.69 MB
-
white_british_apolipoprotein_b_str_gwas_results.filtered.tab
388 B
-
white_british_apolipoprotein_b_str_gwas_results.tab.gz
659.96 MB
-
white_british_aspartate_aminotransferase_str_gwas_results.filtered.tab
408 B
-
white_british_aspartate_aminotransferase_str_gwas_results.tab.gz
676.41 MB
-
white_british_c_reactive_protein_str_gwas_results.filtered.tab
392 B
-
white_british_c_reactive_protein_str_gwas_results.tab.gz
684.16 MB
-
white_british_calcium_str_gwas_results.filtered.tab
370 B
-
white_british_calcium_str_gwas_results.tab.gz
639.04 MB
-
white_british_cholesterol_str_gwas_results.filtered.tab
378 B
-
white_british_cholesterol_str_gwas_results.tab.gz
656.69 MB
-
white_british_creatinine_str_gwas_results.filtered.tab
376 B
-
white_british_creatinine_str_gwas_results.tab.gz
665.59 MB
-
white_british_cystatin_c_str_gwas_results.filtered.tab
376 B
-
white_british_cystatin_c_str_gwas_results.tab.gz
657.77 MB
-
white_british_eosinophil_count_str_gwas_results.filtered.tab
1.30 KB
-
white_british_eosinophil_count_str_gwas_results.tab.gz
677.07 MB
-
white_british_eosinophil_percent_str_gwas_results.filtered.tab
1.31 KB
-
white_british_eosinophil_percent_str_gwas_results.tab.gz
675.22 MB
-
white_british_gamma_glutamyltransferase_str_gwas_results.filtered.tab
406 B
-
white_british_gamma_glutamyltransferase_str_gwas_results.tab.gz
685.28 MB
-
white_british_glucose_str_gwas_results.filtered.tab
370 B
-
white_british_glucose_str_gwas_results.tab.gz
656.11 MB
-
white_british_glycated_haemoglobin_str_gwas_results.filtered.tab
396 B
-
white_british_glycated_haemoglobin_str_gwas_results.tab.gz
660.36 MB
-
white_british_haematocrit_str_gwas_results.filtered.tab
1.30 KB
-
white_british_haematocrit_str_gwas_results.tab.gz
653.85 MB
-
white_british_haemoglobin_concentration_str_gwas_results.filtered.tab
1.32 KB
-
white_british_haemoglobin_concentration_str_gwas_results.tab.gz
656.92 MB
-
white_british_hdl_cholesterol_str_gwas_results.filtered.tab
386 B
-
white_british_hdl_cholesterol_str_gwas_results.tab.gz
662.84 MB
-
white_british_igf_1_str_gwas_results.filtered.tab
366 B
-
white_british_igf_1_str_gwas_results.tab.gz
670.03 MB
-
white_british_ldl_cholesterol_direct_str_gwas_results.filtered.tab
400 B
-
white_british_ldl_cholesterol_direct_str_gwas_results.tab.gz
571.12 MB
-
white_british_lymphocyte_count_str_gwas_results.filtered.tab
1.30 KB
-
white_british_lymphocyte_count_str_gwas_results.tab.gz
671.22 MB
-
white_british_lymphocyte_percent_str_gwas_results.filtered.tab
1.31 KB
-
white_british_lymphocyte_percent_str_gwas_results.tab.gz
672.46 MB
-
white_british_mean_corpuscular_haemoglobin_concentration_str_gwas_results.filtered.tab
440 B
-
white_british_mean_corpuscular_haemoglobin_concentration_str_gwas_results.tab.gz
637.29 MB
-
white_british_mean_corpuscular_haemoglobin_str_gwas_results.filtered.tab
412 B
-
white_british_mean_corpuscular_haemoglobin_str_gwas_results.tab.gz
654.29 MB
-
white_british_mean_corpuscular_volume_str_gwas_results.filtered.tab
402 B
-
white_british_mean_corpuscular_volume_str_gwas_results.tab.gz
647.88 MB
-
white_british_mean_platelet_volume_str_gwas_results.filtered.tab
396 B
-
white_british_mean_platelet_volume_str_gwas_results.tab.gz
651.97 MB
-
white_british_mean_sphered_cell_volume_str_gwas_results.filtered.tab
404 B
-
white_british_mean_sphered_cell_volume_str_gwas_results.tab.gz
649.25 MB
-
white_british_neutrophil_count_str_gwas_results.filtered.tab
1.30 KB
-
white_british_neutrophil_count_str_gwas_results.tab.gz
661.66 MB
-
white_british_neutrophil_percent_str_gwas_results.filtered.tab
1.31 KB
-
white_british_neutrophil_percent_str_gwas_results.tab.gz
664.34 MB
-
white_british_phosphate_str_gwas_results.filtered.tab
374 B
-
white_british_phosphate_str_gwas_results.tab.gz
653.38 MB
-
white_british_platelet_count_str_gwas_results.filtered.tab
1.30 KB
-
white_british_platelet_count_str_gwas_results.tab.gz
679.61 MB
-
white_british_platelet_crit_str_gwas_results.filtered.tab
1.30 KB
-
white_british_platelet_crit_str_gwas_results.tab.gz
665.62 MB
-
white_british_platelet_distribution_width_str_gwas_results.filtered.tab
410 B
-
white_british_platelet_distribution_width_str_gwas_results.tab.gz
639.26 MB
-
white_british_red_blood_cell_count_str_gwas_results.filtered.tab
1.31 KB
-
white_british_red_blood_cell_count_str_gwas_results.tab.gz
645.26 MB
-
white_british_red_blood_cell_distribution_width_str_gwas_results.filtered.tab
422 B
-
white_british_red_blood_cell_distribution_width_str_gwas_results.tab.gz
652.76 MB
-
white_british_shbg_str_gwas_results.filtered.tab
364 B
-
white_british_shbg_str_gwas_results.tab.gz
677.77 MB
-
white_british_total_bilirubin_str_gwas_results.filtered.tab
386 B
-
white_british_total_bilirubin_str_gwas_results.tab.gz
574.01 MB
-
white_british_total_protein_str_gwas_results.filtered.tab
382 B
-
white_british_total_protein_str_gwas_results.tab.gz
645.37 MB
-
white_british_triglycerides_str_gwas_results.filtered.tab
382 B
-
white_british_triglycerides_str_gwas_results.tab.gz
674.09 MB
-
white_british_urate_str_gwas_results.filtered.tab
366 B
-
white_british_urate_str_gwas_results.tab.gz
676.93 MB
-
white_british_urea_str_gwas_results.filtered.tab
364 B
-
white_british_urea_str_gwas_results.tab.gz
659.44 MB
-
white_british_vitamin_d_str_gwas_results.filtered.tab
374 B
-
white_british_vitamin_d_str_gwas_results.tab.gz
674.81 MB
-
white_british_white_blood_cell_count_str_gwas_results.filtered.tab
1.32 KB
-
white_british_white_blood_cell_count_str_gwas_results.tab.gz
662.23 MB
-
white_other_alanine_aminotransferase_str_gwas_results.filtered.tab
404 B
-
white_other_alanine_aminotransferase_str_gwas_results.tab.gz
23.65 MB
-
white_other_albumin_str_gwas_results.filtered.tab
370 B
-
white_other_albumin_str_gwas_results.tab.gz
23.21 MB
-
white_other_alkaline_phosphatase_str_gwas_results.filtered.tab
396 B
-
white_other_alkaline_phosphatase_str_gwas_results.tab.gz
37.60 MB
-
white_other_apolipoprotein_a_str_gwas_results.filtered.tab
388 B
-
white_other_apolipoprotein_a_str_gwas_results.tab.gz
29.61 MB
-
white_other_apolipoprotein_b_str_gwas_results.filtered.tab
388 B
-
white_other_apolipoprotein_b_str_gwas_results.tab.gz
17.67 MB
-
white_other_aspartate_aminotransferase_str_gwas_results.filtered.tab
408 B
-
white_other_aspartate_aminotransferase_str_gwas_results.tab.gz
29.80 MB
-
white_other_c_reactive_protein_str_gwas_results.filtered.tab
392 B
-
white_other_c_reactive_protein_str_gwas_results.tab.gz
23.63 MB
-
white_other_calcium_str_gwas_results.filtered.tab
370 B
-
white_other_calcium_str_gwas_results.tab.gz
22.25 MB
-
white_other_cholesterol_str_gwas_results.filtered.tab
378 B
-
white_other_cholesterol_str_gwas_results.tab.gz
19.09 MB
-
white_other_creatinine_str_gwas_results.filtered.tab
376 B
-
white_other_creatinine_str_gwas_results.tab.gz
46.46 MB
-
white_other_cystatin_c_str_gwas_results.filtered.tab
376 B
-
white_other_cystatin_c_str_gwas_results.tab.gz
40.04 MB
-
white_other_eosinophil_count_str_gwas_results.filtered.tab
388 B
-
white_other_eosinophil_count_str_gwas_results.tab.gz
41.48 MB
-
white_other_eosinophil_percent_str_gwas_results.filtered.tab
392 B
-
white_other_eosinophil_percent_str_gwas_results.tab.gz
43.65 MB
-
white_other_gamma_glutamyltransferase_str_gwas_results.filtered.tab
406 B
-
white_other_gamma_glutamyltransferase_str_gwas_results.tab.gz
34.74 MB
-
white_other_glucose_str_gwas_results.filtered.tab
370 B
-
white_other_glucose_str_gwas_results.tab.gz
9.73 MB
-
white_other_glycated_haemoglobin_str_gwas_results.filtered.tab
396 B
-
white_other_glycated_haemoglobin_str_gwas_results.tab.gz
40.75 MB
-
white_other_haematocrit_str_gwas_results.filtered.tab
378 B
-
white_other_haematocrit_str_gwas_results.tab.gz
35.66 MB
-
white_other_haemoglobin_concentration_str_gwas_results.filtered.tab
406 B
-
white_other_haemoglobin_concentration_str_gwas_results.tab.gz
36.71 MB
-
white_other_hdl_cholesterol_str_gwas_results.filtered.tab
386 B
-
white_other_hdl_cholesterol_str_gwas_results.tab.gz
33.25 MB
-
white_other_igf_1_str_gwas_results.filtered.tab
366 B
-
white_other_igf_1_str_gwas_results.tab.gz
50.05 MB
-
white_other_ldl_cholesterol_direct_str_gwas_results.filtered.tab
400 B
-
white_other_ldl_cholesterol_direct_str_gwas_results.tab.gz
16.02 MB
-
white_other_lymphocyte_count_str_gwas_results.filtered.tab
388 B
-
white_other_lymphocyte_count_str_gwas_results.tab.gz
42.42 MB
-
white_other_lymphocyte_percent_str_gwas_results.filtered.tab
392 B
-
white_other_lymphocyte_percent_str_gwas_results.tab.gz
35.17 MB
-
white_other_mean_corpuscular_haemoglobin_concentration_str_gwas_results.filtered.tab
440 B
-
white_other_mean_corpuscular_haemoglobin_concentration_str_gwas_results.tab.gz
9.29 MB
-
white_other_mean_corpuscular_haemoglobin_str_gwas_results.filtered.tab
412 B
-
white_other_mean_corpuscular_haemoglobin_str_gwas_results.tab.gz
46.40 MB
-
white_other_mean_corpuscular_volume_str_gwas_results.filtered.tab
402 B
-
white_other_mean_corpuscular_volume_str_gwas_results.tab.gz
52.43 MB
-
white_other_mean_platelet_volume_str_gwas_results.filtered.tab
396 B
-
white_other_mean_platelet_volume_str_gwas_results.tab.gz
60.71 MB
-
white_other_mean_sphered_cell_volume_str_gwas_results.filtered.tab
404 B
-
white_other_mean_sphered_cell_volume_str_gwas_results.tab.gz
43.22 MB
-
white_other_neutrophil_count_str_gwas_results.filtered.tab
388 B
-
white_other_neutrophil_count_str_gwas_results.tab.gz
35.53 MB
-
white_other_neutrophil_percent_str_gwas_results.filtered.tab
392 B
-
white_other_neutrophil_percent_str_gwas_results.tab.gz
31.01 MB
-
white_other_phosphate_str_gwas_results.filtered.tab
374 B
-
white_other_phosphate_str_gwas_results.tab.gz
17.52 MB
-
white_other_platelet_count_str_gwas_results.filtered.tab
384 B
-
white_other_platelet_count_str_gwas_results.tab.gz
63.51 MB
-
white_other_platelet_crit_str_gwas_results.filtered.tab
382 B
-
white_other_platelet_crit_str_gwas_results.tab.gz
55.20 MB
-
white_other_platelet_distribution_width_str_gwas_results.filtered.tab
410 B
-
white_other_platelet_distribution_width_str_gwas_results.tab.gz
46.52 MB
-
white_other_red_blood_cell_count_str_gwas_results.filtered.tab
396 B
-
white_other_red_blood_cell_count_str_gwas_results.tab.gz
46.08 MB
-
white_other_red_blood_cell_distribution_width_str_gwas_results.filtered.tab
422 B
-
white_other_red_blood_cell_distribution_width_str_gwas_results.tab.gz
43.13 MB
-
white_other_shbg_str_gwas_results.filtered.tab
364 B
-
white_other_shbg_str_gwas_results.tab.gz
34.96 MB
-
white_other_total_bilirubin_str_gwas_results.filtered.tab
386 B
-
white_other_total_bilirubin_str_gwas_results.tab.gz
11.74 MB
-
white_other_total_protein_str_gwas_results.filtered.tab
382 B
-
white_other_total_protein_str_gwas_results.tab.gz
28.13 MB
-
white_other_triglycerides_str_gwas_results.filtered.tab
382 B
-
white_other_triglycerides_str_gwas_results.tab.gz
28.82 MB
-
white_other_urate_str_gwas_results.filtered.tab
366 B
-
white_other_urate_str_gwas_results.tab.gz
30.30 MB
-
white_other_urea_str_gwas_results.filtered.tab
364 B
-
white_other_urea_str_gwas_results.tab.gz
18.75 MB
-
white_other_vitamin_d_str_gwas_results.filtered.tab
374 B
-
white_other_vitamin_d_str_gwas_results.tab.gz
7.01 MB
-
white_other_white_blood_cell_count_str_gwas_results.filtered.tab
400 B
-
white_other_white_blood_cell_count_str_gwas_results.tab.gz
42.59 MB
Abstract
Short tandem repeats (STRs) are genomic regions consisting of repeated sequences of 1-6bp in succession. Single nucleotide polymorphism (SNP) based genome-wide association studies (GWAS) do not fully capture STR effects. To study these effects, we imputed 445,720 STRs into genotype arrays from 408,153 White British UK Biobank participants and tested for association with 44 blood phenotypes. Using two fine-mapping methods, we identify 119 candidate causal STR-trait associations and estimate that STRs account for 5.2–7.6% of causal variants identifiable from GWAS for these traits. These are among the strongest associations for multiple phenotypes, including a coding CTG repeat associated with apolipoprotein B levels, a promoter CGG repeat with platelet traits and an intronic poly-A repeat with mean platelet volume. Our study suggests that STRs make widespread contributions to complex traits, provides stringently selected candidate causal STRs, and demonstrates the need to consider a more complete view of genetic variation in GWAS.
README: Supplementary Datasets for “Polymorphic short tandem repeats make widespread contributions to blood and serum traits.”, Margoliash et al. 2023
https://doi.org/10.5061/dryad.z612jm6jk
Please see the manuscript for details of how these datasets were produced. Alternatively, the relevant portion of the methods from the paper has been pasted into the methods section on this page, with some distortions due to the difficulty of copying mathematical formulae.
The two supplementary datasets are:
STR association tests, by population and phenotype
These files are named {population}_{phenotype}_str_gwas_results.tab.gz
Association statistics referred to in the paper are from the White British population unless otherwise specified. Association tests in the White British population were performed genome-wide, while STR association tests in the other populations were performed in fine-mapping regions identified by the White British association tests.
Populations are:
- white_british
- black
- south_asian
- chinese
- irish
- white_other
Phenotypes are written exactly as listed in Supplementary Table 1.
We filtered out STRs with total minor allele dosage less than 20. This amounted to very few STRs for each population-phenotype combination. A list of those STRs (if any) is available per population and phenotype in the files named {population}_{phenotype}_str_gwas_results.filtered.tab
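The naming scheme above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and the phenotype argument is assumed to already be in the lowercase, underscore-separated form used in the file listing.

```python
# Populations as listed in this README.
POPULATIONS = ["white_british", "black", "south_asian", "chinese", "irish", "white_other"]


def association_filename(population: str, phenotype: str, filtered: bool = False) -> str:
    """Build the name of an association file (hypothetical helper).

    filtered=False gives the main results file, filtered=True gives the
    list of STRs removed by the minor allele dosage filter.
    """
    if population not in POPULATIONS:
        raise ValueError(f"unknown population: {population!r}")
    suffix = "filtered.tab" if filtered else "tab.gz"
    return f"{population}_{phenotype}_str_gwas_results.{suffix}"
```

For example, association_filename("white_other", "urea") gives white_other_urea_str_gwas_results.tab.gz, which matches the file listing.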
Association file columns:
- chromosome
- base_pair_location: beginning of the repeat (hg19, 1-indexed, inclusive)
- alleles: lengths of the alleles in the population, measured in number of repeat units. For example, the allele 5 for an AC repeat implies the bases “ACACACACAC” (possibly with some impurity). Occasionally the repeat unit will be listed as none. This occurs when it was hard to determine the repeat unit from the period, as there were multiple repeat units present in the reference allele of length equal to the period and with similar frequencies. In that case, the period of the repeat will still be given, and the length of an allele in base pairs can still be calculated by multiplying the allele by the period.
- beta: measured effect size of the linear association of the rank-inverse-normalized phenotype against the length-dosages of unnormalized STR genotypes, measured in number of repeat units. Phenotypes are measured in unspecified units as they are rank-inverse-normalized, so these betas should only be compared to betas from other studies with sufficient reason to believe that such a comparison is meaningful. p-values may be more comparable between studies.
- standard_error: See caveats for beta
- allele_frequencies
- p_value: p-values less than 1e-300 exceeded our software’s numeric precision and are listed as 0
- ref_allele: measured in number of repeat units
- repeat_unit: the standardized repeat unit of this STR, or none if there was no one clear repeat unit
- period: the length of the repeat unit
- end_pos (hg19): end of the repeat (1-indexed, inclusive)
- start_pos (hg38)
- end_pos (hg38)
- n: this study worked with imputed calls and no call-level filters, as such n will be equivalent for each variant associated with the same phenotype
- number_of_common_alleles: the number of alleles in the population with frequency >= 1%
- mean_{phenotype}_per_summed_gt: the mean phenotype value for each sum of allele lengths, where each participant’s contribution to the phenotype mean for each length-sum is weighted by the imputed probability of their true genotype sum being equal to that length-sum. Can be used for plotting graphs of mean phenotype value vs summed-length. Summed gts are measured in number of summed repeat units.
- summed_0.05_significance_CI: The 95% symmetric confidence interval for each of the means above
- summed_5e-8_significance_CI: The (1 - 5e-8) symmetric confidence interval for each of the means above. I.e. this interval is expected to contain the true mean with a probability of 1 - 5e-8, which is very close to one.
- mean_{phenotype}_per_paired_gt: the mean phenotype value for each unordered pair of allele lengths, where each participant’s contribution to the phenotype mean for each pair is weighted by the imputed probability of their true genotype pair being equal to that pair. Can be used for plotting graphs of mean phenotype value vs length pair. Each gt in each pair is measured in number of repeat units.
- paired_0.05_significance_CI: The 95% symmetric confidence interval for each of the means above
- paired_5e-8_significance_CI: The (1 - 5e-8) symmetric confidence interval for each of the means above
In addition, the filtered files have the column locus_filtered, which has the same value MAC<20 for each row.
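Two of the column conventions above are easy to trip over when plotting: allele lengths are in repeat units rather than base pairs, and very small p-values are stored as 0. A minimal sketch of helpers for both follows. The function names are hypothetical, and placing p = 0 at the 1e-300 floor is one reasonable plotting choice, not something the dataset prescribes.

```python
import math

# p-values below this are listed as 0 in the association files.
P_VALUE_FLOOR = 1e-300


def allele_length_bp(allele_repeat_units: float, period: int) -> float:
    """Convert an allele length from repeat units to base pairs."""
    return allele_repeat_units * period


def neg_log10_p(p_value: float) -> float:
    """Return -log10(p), placing p == 0 at the floor (a lower bound of about 300)."""
    return -math.log10(max(p_value, P_VALUE_FLOOR))
```

With these, allele 5 of an AC repeat (period 2) is 10 bp long, consistent with the “ACACACACAC” example above.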
Tarballs of fine-mapping outputs by phenotype
Fine-mapping tarballs are named finemapping_{phenotype}.tgz. Each tarball contains two summary tables: finemapping_first_pass.tab and finemapping_followup.tab, describing the results from standard fine-mapping runs and followup fine-mapping runs under alternate conditions, respectively. Additionally, each tarball contains all the raw fine-mapping outputs for each fine-mapping run in each trait-region for both SuSiE and FINEMAP. Note that fine-mapping was only performed in the White British population.
Summary file (finemapping_first_pass.tab and finemapping_followup.tab) notes and column descriptions:
- positions are hg19 and 1-indexed
- Regions are denoted by {chrom}_{start_pos_inclusive}_{end_pos_inclusive}
- variant names for SNPs are SNP_{position}_{ref}_{alt}
- variant names for STRs are STR_{start_position}
- p_val, coeff and se are from the association of that variant with the trait being fine-mapped
- susie_cs values denote which pure credible set (cs) a variant is present in, or -1 if it is in no pure cs
- finemap_pip and susie_alpha values were used as CP values in the paper
- susie_cs values of -1 imply that the susie_alpha value should be ignored (treated as zero)
- only variants included in fine-mapping are present in these files
- some variants lack susie values if they were not included in the susie fine-mapping (probably because they had too high a p-value)
The followup files only contain variants from the regions subjected to follow-up fine-mapping conditions.
- followup files for traits with no regions that were followed up on are empty.
- In addition to the columns in the first pass file (except for coeff or se), there are additional FINEMAP or SuSiE columns for each extra condition.
- the best_guess column corresponds to the use of best guess genotypes from imputation for fine-mapping
- the ratio columns correspond to the prior of favoring SNP over STR causality by a 4-to-1 ratio
- the repeat column corresponds to the repeat FINEMAP run with no changed settings
- the total_prob column corresponds to the FINEMAP run with a prior of there being 4 total causal variants
- the prior_std_derived and prior_std_low columns correspond to the priors for the effect sizes of causal variants of 0.05% and 0.0025%, respectively
- the conv_tol column corresponds to the flag --prob-conv-sss-tol 0.0001
- the mac column corresponds to the non-major allele dosage threshold of 100
- the p_thresh column corresponds to the p_value threshold of 1e-4
- values for the additional columns will be missing for variants which were not fine-mapped in specifically those conditions (say, a variant with p-value of 1e-3 in the 1e-4 threshold column)
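The naming conventions and the susie_cs rule above can be applied with a few small parsers. The sketch below assumes that the chromosome label and the ref/alt alleles contain no underscores; all function and class names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Variant(NamedTuple):
    kind: str                  # "SNP" or "STR"
    position: int              # hg19, 1-indexed
    ref: Optional[str] = None  # only set for SNPs
    alt: Optional[str] = None  # only set for SNPs


def parse_variant_name(name: str) -> Variant:
    """Parse SNP_{position}_{ref}_{alt} or STR_{start_position}."""
    parts = name.split("_")
    if parts[0] == "STR" and len(parts) == 2:
        return Variant("STR", int(parts[1]))
    if parts[0] == "SNP" and len(parts) == 4:
        return Variant("SNP", int(parts[1]), parts[2], parts[3])
    raise ValueError(f"unrecognized variant name: {name!r}")


def parse_region(region: str) -> Tuple[str, int, int]:
    """Parse {chrom}_{start_pos_inclusive}_{end_pos_inclusive}."""
    chrom, start, end = region.split("_")
    return chrom, int(start), int(end)


def susie_cp(susie_alpha: float, susie_cs: int) -> float:
    """SuSiE CP: alpha is treated as zero when the variant is in no pure cs."""
    return 0.0 if susie_cs == -1 else susie_alpha
```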
Raw fine-mapping output file descriptions
This consists of files for both FINEMAP and SuSiE.
The files for FINEMAP are (described at http://christianbenner.com/, with screenshots of the website pasted below if the URL doesn't work):
FINEMAP_first_pass_{region}_finemap_output.log
FINEMAP_first_pass_{region}_finemap_output.snp
FINEMAP_first_pass_{region}_finemap_output.config
FINEMAP_first_pass_{region}_finemap_output.credX
and the following files for SuSiE:
SuSiE_first_pass_{region}_alpha.tab
SuSiE_first_pass_{region}_colnames.txt
SuSiE_first_pass_{region}_csX.txt
SuSiE_first_pass_{region}_lbf.tab
SuSiE_first_pass_{region}_lbf_variable.tab
SuSiE_first_pass_{region}_lfsr.tab
SuSiE_first_pass_{region}_sigma2.txt
SuSiE_first_pass_{region}_V.tab
- Except for the colnames and cs files, these are arrays written from the output fields of the susie() function described at https://stephenslab.github.io/susieR/reference/susie.html (with screenshots pasted below if the URL doesn't work)
- colnames contains one variant name per line, each line corresponding to one column of the alpha array
- csX contains three rows:
- the first contains the 1-indexed numbers of the variables included in the credible set, in ascending order
- the second contains the coverage of the credible set (always greater than 0.9, which was the requested coverage)
- the third contains three numbers: the min, mean and median absolute correlations between the variables in the credible set (or between 100 randomly subsampled variables if there are more than 100 variables)
- the 1-indexed number in the filename corresponds to the row of the alpha.tab array for that credible set
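A csX file as described above can be read with a short parser. The field delimiter is not stated in this README, so the sketch below accepts whitespace or commas as an assumption. The names are hypothetical, and the 0.8 default is the minimum absolute correlation threshold mentioned in the Notes.

```python
import re
from typing import List, NamedTuple


class CredibleSet(NamedTuple):
    variables: List[int]  # 1-indexed positions into the colnames file
    coverage: float
    min_abs_corr: float
    mean_abs_corr: float
    median_abs_corr: float


def _tokens(row: str) -> List[str]:
    # Assumption: fields are separated by whitespace and/or commas.
    return [tok for tok in re.split(r"[,\s]+", row.strip()) if tok]


def parse_cs_text(text: str) -> CredibleSet:
    """Parse the three rows of a csX file."""
    rows = [line for line in text.splitlines() if line.strip()]
    if len(rows) != 3:
        raise ValueError(f"expected 3 rows, found {len(rows)}")
    variables = [int(tok) for tok in _tokens(rows[0])]
    coverage = float(_tokens(rows[1])[0])
    min_c, mean_c, median_c = (float(tok) for tok in _tokens(rows[2]))
    return CredibleSet(variables, coverage, min_c, mean_c, median_c)


def cs_variant_names(cs: CredibleSet, colnames: List[str]) -> List[str]:
    """Map the 1-indexed variable numbers to variant names."""
    return [colnames[i - 1] for i in cs.variables]


def is_pure(cs: CredibleSet, threshold: float = 0.8) -> bool:
    """Apply the purity criterion on the minimum absolute correlation."""
    return cs.min_abs_corr > threshold
```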
Additionally, for each of the regions we followed up on, this tarball contains the same files as above, but with the following prefixes, corresponding to the follow-up fine-mapping condition being tested:
FINEMAP_derived_effect_size_prior - (effect size prior of 0.05%)
FINEMAP_low_effect_size_prior - (effect size prior of 0.0025%)
FINEMAP_mac_threshold_100 - (non-major allele dosage threshold of 100)
FINEMAP_prior_4_signals - (prior of 4 causal variants per region)
FINEMAP_prior_snps_over_strs - (prior of favoring SNP over STR causality by a 4-to-1 ratio)
FINEMAP_pval_threshold_1e4 - (p_value threshold of 1e-4)
FINEMAP_stricter_stopping_threshold - (using the flag --prob-conv-sss-tol 0.0001)
SuSiE_prior_snps_over_strs - (prior of favoring SNP over STR causality by a 4-to-1 ratio)
SuSiE_best_guess_genotypes - (the use of best guess genotypes from imputation for fine-mapping)
See the paper and Supplementary Note 3 for more details on each follow-up condition.
We do not have raw output files for FINEMAP_repeat runs.
Notes:
- For FINEMAP, I have focused on the posterior probabilities in the finemap_output.snp files
- For SuSiE I focused on the cs files with min absolute correlation > 0.8, and then the values in alpha.tab with rows identified by the CS number and columns identified by the variants in the first row of the CS file
- the FINEMAP log files report v1.4.1, which appears to be a bug: the actual FINEMAP output to the command line indicates that v1.4.2 was run.
Sharing/Access information
This data is also accessible at https://gymreklab.com/science/2023/09/08/Margoliash-et-al-paper.html
Code/Software
The code repository used to produce these results is https://github.com/LiterallyUniqueLogin/ukbiobank_strs and is available as a repository frozen at the time of publication on Zenodo at the DOI: https://doi.org/10.5281/zenodo.8436632
FINEMAP documentation screenshots
SuSiE documentation screenshots
Methods
Please see the Cell Genomics article or bioRxiv preprint for detailed methods.
Alternatively, the relevant portion of the methods from the paper has been pasted below, but lacking references and with some distortions due to the difficulty of copying mathematical formulae. For references to tables, figures, notes and the key resources table, please refer to the paper.
Selection of UK Biobank participants
We downloaded the fam file and sample file for version 2 of the phased SNP array data (referred to in the UKB documentation as the ‘haplotype’ dataset) using the ukbgene utility (ver Jan 28 2019 14:09:15 - using Glibc2.28(stable)) described in UKB Data Showcase Resource ID 664 (see Key Resources Table). The IDs from the sample file already excluded 968 individuals previously identified as having excessive principal component-adjusted SNP array heterozygosity or excessive SNP array missingness after call-level filtering indicating potential DNA contamination. We further removed withdrawn participants, indicated by non-positive IDs in the sample file as well as by IDs in email communications from the UKB access management team. After the additional filtering, data for 487,279 individuals remained.
We downloaded the sample quality control (QC) file (described in the sample QC section of UKB Data Showcase Resource ID 531 (see Key Resources Table)) from the European Genome-Phenome Archive (accession EGAF00001844707) using pyEGA3. We subsetted the non-withdrawn individuals above to the 408,870 (83.91%) participants identified as White-British by column in.white.British.ancestry.subset of the sample QC file. This field was computed by the UKB team to only include individuals whose self-reported ethnic background was White British and whose genetic principal components were not outliers compared to the other individuals in that group. In concordance with previous analyses of this cohort we additionally removed data for:
● 2 individuals with an excessive number of inferred relatives, removed due to plausible SNP array contamination (participants listed in sample QC file column excluded.from.kinship.inference that had not already been removed by the UKB team prior to phasing)
● 308 individuals whose self-reported sex did not match the genetically inferred sex, removed due to concern for sample mislabeling (participants where sample QC file columns Submitted.Gender and Inferred.Gender did not match)
● 407 additional individuals with putative sex chromosome aneuploidies removed as their genetic signals might differ significantly from the rest of the population (listed in sample QC file column putative.sex.chromosome.aneuploidy)
Following these additional filters the data for 408,153 individuals remained (99.82% of the White British individuals considered above).
SNP and indel dataset preprocessing
We obtained both phased hard-called and imputed SNP and short indel genotypes made available by the UKB. These variants were provided in reference genome hg19 coordinates, and all analyses in this study, unless otherwise specified, were performed with hg19 coordinates.
Phased hard-called genotypes: We downloaded the bgen files containing the hard-called SNP and indel haplotypes (release version 2) and the corresponding sample and fam files using the ukbgene utility (UKB Data Showcase Resource 664 (see Key Resources Table)). These variants had been genotyped using microarrays and phased using SHAPEIT3 with the 1000 genomes phase 3 reference panel. Variants genotyped on the microarray were excluded from phasing and downstream analyses if they failed QC on more than one microarray genotyping batch, had overall call-missingness rate greater than 5% or had minor allele frequency less than 0.01%. Of the resulting 658,720 variants, 99.5% were single nucleotide variants, 0.2% were short indels (average length 1.9bp, maximal length 26bp), and 0.2% were short deletions (average length 1.9bp, maximal length 29bp).
Imputed genotypes: We similarly downloaded imputed SNP data using the ukbgene utility (release version 3). Variants had been imputed with IMPUTE4 using the Haplotype Reference Consortium panel, with additional variants from the UK10K and 1000 Genomes phase 3 reference panels. The resulting imputed dataset contains 93,095,623 variants, consisting of 96.0% single nucleotide variants, 1.3% short insertions (average length 2.5bp, maximum length 661bp), and 2.6% short deletions (average length 3.1bp, maximum length 129bp). This set does not include the 11 classic human leukocyte antigen alleles imputed separately.
We used bgen-reader 4.0.8 to access the downloaded bgen files in python. We used plink2 v2.00a3LM (build AVX2 Intel 28 Oct 2020) to convert bgen files from both hard-called and imputed SNPs to the plink2 format for downstream analyses. For hard-called genotypes, we used plink to set the first allele to match the hg19 reference genome. Imputed genotypes already matched the reference. Unless otherwise noted, our pipeline worked with imputed genotypes as non-reference allele dosages, i.e. $\Pr(\text{heterozygous}) + 2\Pr(\text{homozygous non-reference})$ for each individual.
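A non-reference allele dosage is conventionally the expected number of non-reference alleles carried by an individual, Pr(heterozygous) + 2 Pr(homozygous non-reference). The following is a minimal sketch of that computation from three imputed genotype probabilities. The function name is hypothetical and this is not the pipeline's own code.

```python
def non_reference_dosage(p_hom_ref: float, p_het: float, p_hom_alt: float) -> float:
    """Expected count of non-reference alleles for a biallelic variant."""
    total = p_hom_ref + p_het + p_hom_alt
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"genotype probabilities sum to {total}, expected 1")
    return p_het + 2.0 * p_hom_alt
```

A confidently called heterozygote therefore has dosage 1, and an individual imputed as 30% heterozygous and 60% homozygous non-reference has dosage 1.5.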
STR imputation
We previously published a reference panel containing phased haplotypes of SNP variants alongside 445,720 autosomal STR variants in 2,504 individuals from the 1000 Genomes Project (see Key Resources Table). This panel focuses on STRs ascertained to be highly polymorphic and well-imputed in European individuals. Notably, this excludes many STRs known to be implicated in repeat expansion diseases, STRs that are primarily polymorphic only in non-European populations, or STRs that are too mutable to be in strong linkage disequilibrium (LD) with nearby SNPs.
The IDs listed in the ‘str’ column of Supplemental Table 2 at that URL describe which variants in the reference panel are STRs and which are other types of variants. That produces a list of 445,715 unique variant IDs and 5 IDs which are each assigned to four separate variants in the reference panel VCFs. For the IDs with multiple assignments, we selected the variant that appeared first in the VCF and discarded the others, leaving 445,720 unique STR variants each with unique IDs.
While our analyses with these STRs were performed using hg19 coordinates unless otherwise stated, we also provide hg38 reference coordinates for these STRs in the supplemental tables. We obtained those coordinates using LiftOver which resulted in identical coordinates as in HipSTR’s hg38 STR reference panel (see Key Resources Table). All STRs successfully lifted over to hg38 coordinates.
To select shared variants for imputation, we note that 641,582 (97.4%) of SNP and indel variants that were hard-called and phased in the UKB participants were present in our SNP-STR reference panel. As a quality control step, we filtered variants that had highly discordant minor allele frequencies between the 1000 Genomes European subpopulations (see Key Resources Table) and White British individuals from the UKB. We first took a maximal unrelated set of the White British individuals (see Phenotype Methods below) and then visually inspected the alternate allele frequency of the overlapping variants (Figure S1) and chose to remove the 110 variants with an alternate allele frequency difference of more than 12%.
We used Beagle v5.1 (build 25Nov19.28d) with the tool’s provided human genetic maps (see Key Resources Table) and non-default flag ap=true to impute STRs from the SNP-STR panel into the hard-called SNP haplotypes, using the remaining 641,472 SNPs and indels shared with the panel. Though we performed the above comparison between reference panel Europeans and UKB White British individuals, we performed this STR imputation into all UKB participants using all the individuals in the reference panel. We chose Beagle because it can handle multi-allelic loci. Due to computational constraints, we ran Beagle per chromosome on batches of 1000 participants at a time with roughly 18GB of memory. We merged the resulting VCFs across batches and extracted only the STR variants. Lastly, we added back the INFO fields present in the SNP-STR reference panel that Beagle removed during imputation.
Estimated allele frequencies (Figure 1b) were computed as follows: for each allele length $L$ of each STR, we summed the imputed probability that the STR on a given chromosome has length $L$ over both chromosomes of all unrelated participants. That sum was divided by the total number of chromosomes considered to obtain the estimated frequency of each allele.
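The frequency estimate described above can be sketched as follows. The input layout (one dictionary of allele length to imputed probability per chromosome copy) is an assumption made for illustration and is not the format of the imputed VCFs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One chromosome copy: allele length (repeat units) -> imputed probability.
Haplotype = Dict[float, float]


def estimated_allele_frequencies(
    participants: List[Tuple[Haplotype, Haplotype]],
) -> Dict[float, float]:
    """Sum per-chromosome allele probabilities and divide by chromosome count."""
    totals: Dict[float, float] = defaultdict(float)
    n_chromosomes = 0
    for haplotypes in participants:
        for hap in haplotypes:
            n_chromosomes += 1
            for length, prob in hap.items():
                totals[length] += prob
    return {length: total / n_chromosomes for length, total in sorted(totals.items())}
```

Because each chromosome's probabilities sum to one, the estimated frequencies for an STR also sum to one.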
Inferring repeat units
Each STR in the SNP-STR reference panel was previously annotated with a repeat period - the length of its repeat unit - but not the repeat unit itself. We inferred the repeat unit of each STR in the panel as follows: we considered the STR’s reference allele and given period. We then took each k-mer in the reference allele where k is the repeat period, standardized those k-mers, and took their counts. We define the standardization of a k-mer to be the sequence produced by looking at all cyclic rotations of that k-mer and choosing the first one lexicographically. For example, the standardization of the k-mer CAG would be AGC. If the most common standardized k-mer was less than twice as frequent as the second most common standardized k-mer, we did not call a repeat unit for that STR (11,962 STRs; 2.68%). Otherwise, the most common standardized k-mer was labeled as the forward-strand (based on the reference genome) repeat unit for that STR. To infer the strand-independent repeat unit for the STR we looked at all rotations of the forward-strand repeat unit in both the forward and reverse-complement directions and chose whichever comes first lexicographically. For example, the repeat unit for the STR TGTGTGTG would be AC, while the forward-strand repeat unit would be GT. In the large majority of cases the repeat unit identified by this approach is the unit which is duplicated or deleted in alternate alleles, but this method of identifying repeat units does not consider alternate alleles and so does not make that guarantee.
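The repeat unit inference described above can be sketched in a few functions. This is an illustrative reimplementation, not the study's code. It assumes that "each k-mer in the reference allele" means every overlapping window of length k, and that alleles contain only A, C, G and T.

```python
from collections import Counter
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def standardize(kmer: str) -> str:
    """Return the lexicographically first cyclic rotation of a k-mer."""
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def forward_strand_repeat_unit(ref_allele: str, period: int) -> Optional[str]:
    """Most common standardized k-mer, or None if there is no clear winner."""
    kmers = (ref_allele[i:i + period] for i in range(len(ref_allele) - period + 1))
    top = Counter(standardize(k) for k in kmers).most_common(2)
    if not top:
        return None
    # No call if the winner is less than twice as frequent as the runner-up.
    if len(top) == 2 and top[0][1] < 2 * top[1][1]:
        return None
    return top[0][0]


def repeat_unit(ref_allele: str, period: int) -> Optional[str]:
    """Strand-independent repeat unit: first rotation across both strands."""
    forward = forward_strand_repeat_unit(ref_allele, period)
    if forward is None:
        return None
    return min(forward, standardize(reverse_complement(forward)))
```

This reproduces the examples in the text: standardize("CAG") gives AGC, and for TGTGTGTG with period 2 the forward-strand unit is GT while the strand-independent unit is AC.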
Phenotypes and covariates
IDs listed in this section refer to the UKB Data Showcase (see Key Resources Table).
We analyzed a total of 44 blood traits measured in the UKB. 19 phenotypes were chosen from Category Blood Count (Data Field ID 100081) and 25 from Category Blood Biochemistry (Data Field ID 17518). We refer to them as blood cell count and biomarker phenotypes respectively. The blood cell counts were measured in fresh whole blood while all the biomarkers were measured in serum except for glycated haemoglobin which was measured in packed red blood cells (details in Resource ID 5636). The phenotypes we analyzed are listed in Table S1, along with the categorical covariates specific to each phenotype that were included during association testing.
We analyzed all the blood cell count phenotypes available except for the nucleated red blood cell, basophil, monocyte, and reticulocyte phenotypes. Nucleated red blood cell percentage was omitted from our study as any value between the bounds of 0% and 2% was recorded as exactly either 0% or 2% making the data inappropriate for study as a continuous trait. Nucleated red blood cell count was omitted similarly. Basophil and monocyte phenotypes were omitted as those cells deteriorate significantly during the up-to-24-hours between blood draw and measurement. This timing likely differed consistently for different clinics, and different clinics drew from distinct within-White British ancestry groups, which could lead to confounding with true genetic effects. See Resource ID 1453 for more information. Reticulocytes were excluded from our initial pipeline. This left us with 19 blood cell count phenotypes. For each blood cell count phenotype we included the machine ID (1 of 4 possible IDs) as a categorical covariate during the association tests to account for batch effects.
Biomarker measurements were subject to censoring of values below and above the measuring machine’s reportable range (Resource IDs 1227, 2405). Table S1 includes the range limits and the number of data points censored in each direction. Five biomarkers (direct bilirubin, lipoprotein(a), oestradiol, rheumatoid factor, testosterone) were omitted from our study for having >40,000 censored measurements across the population (approximately 10% of all data), since those would require analysis with models that take censoring into account. The remaining biomarkers had fewer than 2,000 censored measurements. We excluded censored measurements for those biomarkers from downstream analyses as they consisted of a small number of data points. For each serum biomarker, we included aliquot number (0-3) as a categorical covariate during association testing as an additional step to mitigate the dilution issue (described in Resource ID 5636). Glycated haemoglobin was not subject to the dilution issue, being measured in packed red blood cells and not serum, so no aliquot covariate was published in the UKB showcase or included in our analysis.
For each phenotype we took the subset of the 408,153 individuals above that had a measurement for that phenotype during the initial assessment visit or the first repeat assessment visit, preferentially choosing the measurement at the initial assessment when measurements were taken at both visits. We include a binary categorical covariate in association testing to distinguish between phenotypes measured at the initial assessment and those measured at the repeat assessment. Each participant’s age at their measurement’s assessment was retrieved from Data Field ID 21003.
The initial and repeat assessment visits were the only times the biomarkers were measured. The blood cell count phenotypes were additionally measured for those participants who attended the first imaging visit. We did not use those measurements and for each phenotype excluded the <200 participants whose only measurement for that phenotype was taken during the first imaging visit as we could not properly account for the batch effect of a group that small (Table S1).
No covariate values were missing. Before each association test, we checked that each category of each categorical covariate was obtained by at least 0.1% of the tested participants. We excluded the participants with covariate values not matching this criterion, as those quantities would be too small to properly account for batch effects. In practice, this meant that for each biomarker phenotype, we excluded the <100 participants that were measured using aliquot 4, and that for 8 of the biomarker phenotypes, we additionally excluded the ≤125 participants that were measured using aliquot 3 (Table S1).
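The category-size check above amounts to counting how many tested participants fall into each category of a covariate and dropping those in categories below 0.1%. A minimal sketch with hypothetical names follows. It applies the check once to a single covariate, and whether the original pipeline re-checked after exclusion is not stated here.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def keep_common_categories(
    categories: Sequence[Hashable], min_fraction: float = 0.001
) -> List[bool]:
    """Return a per-participant mask: True if their category is common enough."""
    counts = Counter(categories)
    threshold = min_fraction * len(categories)
    return [counts[c] >= threshold for c in categories]
```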
For each phenotype, we then selected a maximally-sized genetically unrelated subset of the remaining individuals using PRIMUS v1.9.0. When multiple such maximal subsets existed (for instance, wherever a single individual needed to be chosen from a family of two), one subset was chosen arbitrarily, thus introducing some lack of reproducibility. Precomputed measures of genetic relatedness between participants (described in UKB paper supplement section 3.7.1) were downloaded using ukbgene (Resource ID 664). We ran PRIMUS with non-default options --no_PR -t 0.04419417382, where the t cutoff is equal to $2^{-9/2}$, chosen so that two individuals are considered to be related if they are relatives of third degree or closer. This left between 304,658 and 335,585 unrelated participants per phenotype (Table S1).
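The numeric cutoff passed to PRIMUS can be checked against the usual kinship-coefficient convention, in which relatives of degree d have expected kinship 2^-(d+1) and the boundary between degree d and degree d+1 is placed at 2^-(d+1.5). The short sketch below shows that convention; the function name is hypothetical.

```python
def kinship_lower_bound(degree: int) -> float:
    """Lowest kinship coefficient still classified as the given degree."""
    return 2.0 ** -(degree + 1.5)
```

For third-degree relatives this gives 2^-4.5, which agrees with the 0.04419417382 cutoff to the digits shown.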
Genetic sex and ancestry principal components (PCs) were included as covariates for all phenotypes. Participant sex was extracted from fam file available with hard-called genotypes (see above). The top 40 ancestry PCs were extracted from the corresponding columns of the sample QC file (see the Participants Methods section above).
We then rank-inverse-normalized phenotype values for association testing. The remaining unrelated individuals for each phenotype were ranked by phenotype value from least to greatest (ties broken arbitrarily) and the phenotype value for association testing for each individual was taken to be the corresponding quantile of the standard normal distribution (see the paper for the exact formula). We use rank-inverse normalization as it is standard practice, though it has only moderate empirical support and lacks a strong theoretical foundation.
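A rank-inverse normal transform of this kind can be sketched as below. The exact rank offset used in the paper is not reproduced on this page, so the (rank - 0.5) / n form here is an assumption and one of several common variants. Ties are broken by input order, which is one arbitrary choice.

```python
from statistics import NormalDist
from typing import List, Sequence


def rank_inverse_normalize(values: Sequence[float], offset: float = 0.5) -> List[float]:
    """Map each value to the standard normal quantile of its rank.

    Assumption: quantile = (rank - offset) / n with offset 0.5.
    """
    n = len(values)
    # sorted() is stable, so tied values keep their input order.
    order = sorted(range(n), key=lambda i: values[i])
    normal = NormalDist()
    transformed = [0.0] * n
    for rank, index in enumerate(order, start=1):
        transformed[index] = normal.inv_cdf((rank - offset) / n)
    return transformed
```

The output depends only on the ordering of the phenotype values, which is why the betas in the association files are in unspecified units.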
For each phenotype and its remaining unrelated individuals, we standardized all covariates to have mean zero and variance one for numeric stability.
Association testing
We performed STR and SNP association testing separately. We developed associaTR to streamline performing association tests between STR length and quantitative traits. While our approach relies on a standard linear model, linear mixed models based on STR length dosages would likely result in increased power and will be considered in future studies. As our downstream analyses required STR and SNP associations to be comparable, we also used a standard linear model for SNP association testing.
We used plink2 v2.00a3LM (build AVX2 Intel 28 Oct 2020) for association testing of imputed SNPs and indels. For each analysis, plink first converts the input datasets to its pgen file format. To avoid performing this operation for every invocation of plink, we first used plink to convert the SNP and indel bgen files to pgen files a single time. We invoked plink once per chromosome per phenotype. We used the plink flag --mac 20 to filter loci with minor allele dosage less than 20. Plink calculates minor allele counts across all individuals before subsetting to individuals with a supplied phenotype, so this uniformly filtered 22,396,837 (24.1%) of the input loci from each phenotype’s association test, leaving 70,698,786 SNPs and indels. Plink fit the same linear model described above in the STR associations, except that the genotype term is the vector of dosages of the non-reference SNP or indel allele.
p-values calculated from association testing are two-sided.
Comparison with Pan-UKB pipeline
We compared the results of our pipeline to results available on the Pan UKBB website (see Key Resources Table) using bilirubin as an example trait. We matched variants between datasets on chromosome, position, reference and alternate alleles, excluding variants not present in both pipelines. We found our pipeline produced largely similar but somewhat less significant p-values than those reported for European participants in Pan UKBB (Figure S2).
Identifying indels which are STR alleles
Some STR variant alleles are represented both as alleles in our SNP-STR reference panel and as indel variants in the UKB imputed variants panel. We excluded the indel representations of those alleles from fine-mapping, as they represent identical variants and could confound the fine-mapping process. For each STR we constructed the following interval:
where period is the length of the repeat unit, and start and end give the coordinates of the STR in base pairs. We call an indel an STR-indel if it only represents either a deletion of base pairs from the reference or an insertion of base pairs into the reference (not both), overlaps only a single STR based on the interval above, and represents an insertion or deletion of full copies of that STR’s repeat unit. We conservatively did not mark any STR-indels for STRs whose repeat units were not called (see above) or for which the insertion or deletion was not a whole number of copies of any rotation of the repeat unit.
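The "whole number of copies of any rotation of the repeat unit" condition can be sketched as below. This covers only that one check, not the interval overlap test, and it assumes `indel_seq` holds just the inserted or deleted bases (without any anchoring reference base). The function names are hypothetical.

```python
def rotations(unit):
    """All cyclic rotations of a repeat unit, e.g. 'AC' -> {'AC', 'CA'}."""
    return {unit[i:] + unit[:i] for i in range(len(unit))}


def is_whole_copies_of_rotation(indel_seq, repeat_unit):
    """True if indel_seq is one or more full copies of some rotation of repeat_unit."""
    period = len(repeat_unit)
    if period == 0 or len(indel_seq) == 0 or len(indel_seq) % period != 0:
        return False
    copies = len(indel_seq) // period
    return any(rot * copies == indel_seq for rot in rotations(repeat_unit))


print(is_whole_copies_of_rotation("CACA", "AC"))  # two copies of the rotation 'CA'
print(is_whole_copies_of_rotation("ACA", "AC"))   # not a whole number of copies
```

Rotations matter because an indel inside a repeat tract can be written starting at any phase of the repeat unit.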
Fine-mapping
For each phenotype, we selected contiguous regions to fine-map in the following manner:
- Choose a variant (SNP or indel or STR) with p-value < 5e-8 not in the major histocompatibility complex (MHC) region (chr6:25e6-33.5e6).
- While there is a variant (SNP or indel or STR) with p-value < 5e-8 not in the MHC region and within 250kb of a previously chosen variant, include that variant in the region and repeat.
- This fine-mapping region is (min variant bp – 125kb, max variant bp + 125kb).
- Start again from step 1 to create another region, starting with any variant with p-value < 5e-8 not already in a fine-mapping region.
This is similar to the peak selection algorithm above but is designed to produce slightly wider regions so that we could fine-map nearby peaks jointly. We excluded the MHC because it is known to be difficult to effectively fine-map. Note that peaks within 125kb of the end of a chromosome will necessarily be smaller than the minimum 125kb width in that direction.
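The region selection steps above can be sketched as a single pass over sorted significant variants. This is an illustration under stated assumptions, not the pipeline's code: "within 250kb" is taken as a gap of at most 250,000 bp, region starts are clipped at position 1, and clipping at chromosome ends is omitted because chromosome lengths are not part of the input. The function name and the tuple layout `(chrom, pos, p_value)` are hypothetical.

```python
MHC = ("6", 25_000_000, 33_500_000)  # chr6:25e6-33.5e6, as given in the text


def fine_mapping_regions(variants, p_thresh=5e-8, link=250_000, pad=125_000):
    """Group significant non-MHC variants into padded fine-mapping regions.

    variants: iterable of (chrom, pos, p_value) tuples.
    Returns a list of (chrom, start, end) tuples.
    """
    hits = sorted(
        (chrom, pos)
        for chrom, pos, p in variants
        if p < p_thresh and not (chrom == MHC[0] and MHC[1] <= pos <= MHC[2])
    )
    clusters = []
    for chrom, pos in hits:
        # Chain a hit onto the current cluster if it lies within `link` bp
        # of the cluster's right-most variant on the same chromosome.
        if clusters and clusters[-1][0] == chrom and pos - clusters[-1][2] <= link:
            clusters[-1][2] = pos
        else:
            clusters.append([chrom, pos, pos])
    return [(chrom, max(1, lo - pad), hi + pad) for chrom, lo, hi in clusters]


example = [
    ("1", 1_000_000, 1e-9),
    ("1", 1_200_000, 1e-10),
    ("1", 1_500_000, 1e-3),   # not significant, ignored
    ("1", 2_000_000, 1e-12),
    ("6", 30_000_000, 1e-20),  # inside the MHC, ignored
]
print(fine_mapping_regions(example))
```

Because hits are sorted by position, chaining neighbours within 250kb gives the same regions as repeatedly adding any qualifying variant near a previously chosen one.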
This produced 14,494 trait-regions. Due to computational challenges during fine-mapping (see below), we excluded three regions (urate 4:8165642-11717761, total bilirubin 12:19976272-22524428 and alkaline phosphatase 1:19430673-24309348) from downstream analyses, leaving 14,491 trait-regions.
We used two fine-mapping methods to analyze each region:
SuSiE: For each fine-mapping trait-region, for each STR and SNP and indel variant in that region that was not filtered before association testing, was not an STR-indel variant (see above) and had p-value ≤ 5e-4 (chosen to reduce computational burden), we loaded the dosages for that variant from the set of participants used in association testing for that phenotype. For those regions, we also loaded the rank-inverse-normalized phenotype values and covariates corresponding to that phenotype. We separately regressed the covariates out of the phenotype values and out of each variant’s dosages and streamed the residual values to HDF5 arrays using h5py v3.6.0. We used rhdf5 v2.38.0 to load the h5 files into R. We used an R script to run SuSiE v0.11.42 on that data with non-default values min_abs_corr=0 and scaled_prior_variance=0.005. min_abs_corr=0 forced SuSiE to output all credible sets it found so that we could determine the appropriate minimum absolute correlation filter threshold in downstream analyses. We set scaled_prior_variance to 0.005 which we considered to be a more realistic guess of the per-variant percentage of signal explained than the default of 20%, although we determined that this parameter had no effect on the results (Note S3). The SuSiE results for some regions did not converge within the default number of iterations (100) or produced the default maximum number of credible sets (10) and all those credible sets seemed plausible (minimum pair-wise absolute correlation ≥ 0.2 or size ≤ 50). We reran those regions with the additional parameters L=30 (maximum number of credible sets) and max_iter=500. No regions failed to converge in under 500 iterations. We re-analyzed several loci that produced 30 plausible credible sets again with L=50. No regions produced 50 plausible credible sets. 
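Regressing covariates out of the phenotype and out of each variant's dosages, as described above, can be sketched with ordinary least squares. The sketch assumes an intercept column is included alongside the covariates and omits the HDF5 streaming step; `residualize` is a hypothetical helper name.

```python
import numpy as np


def residualize(y, covariates):
    """Return the residuals of y after least-squares regression on covariates.

    y may be a vector (n,) or a matrix (n, m) whose columns are
    residualized independently (e.g. one column per variant's dosages).
    Assumption: an intercept column is added to the covariates.
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(y.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


rng = np.random.default_rng(0)
covs = rng.normal(size=(200, 3))
phenotype = covs @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=200)
dosages = rng.normal(size=(200, 5))

phenotype_resid = residualize(phenotype, covs)
dosage_resid = residualize(dosages, covs)
```

The residuals are orthogonal to every covariate, so the fine-mapping input no longer carries covariate-explained signal.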
SuSiE failed to finish for two regions (urate 4:8165642-11717761, total bilirubin 12:19976272-22524428) in under 48 hours; we excluded those regions from downstream analyses. A prior version of our pipeline had applied a custom filter to some SuSiE fine-mapping runs that caused SNPs with total minor allele dosage less than 20 across the entire population to be excluded. For consistency, any regions run with that filter that produced STRs included in our confidently fine-mapped set were rerun without that filter. Results from the rerun are reported in Table S4.
SuSiE calculates credible sets for independent signals and calculates an alpha value for each variant for each signal: the probability that that variant is the causal variant in that signal. We used each variant’s highest alpha value from among credible sets with purity ≥ 0.8 as its causal probability (CP) in our downstream analyses (or zero if it was in no such credible sets). See Note S1.
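The CP rule for SuSiE output can be sketched as follows. The input layout is an assumption made for illustration: `alpha` is a signals-by-variants array, `credible_sets` maps a signal index to the variant indices in its credible set, and `purities` maps a signal index to that set's purity. These names do not correspond to any particular SuSiE output object.

```python
import numpy as np


def causal_probabilities(alpha, credible_sets, purities, min_purity=0.8):
    """Per-variant CP: highest alpha among sufficiently pure credible sets.

    Variants that are in no credible set with purity >= min_purity get CP 0.
    """
    alpha = np.asarray(alpha, dtype=float)
    cp = np.zeros(alpha.shape[1])
    for signal, members in credible_sets.items():
        if purities[signal] >= min_purity:
            idx = np.asarray(members, dtype=int)
            cp[idx] = np.maximum(cp[idx], alpha[signal, idx])
    return cp


alpha = np.array([[0.9, 0.1, 0.0],
                  [0.0, 0.3, 0.7]])
credible_sets = {0: [0], 1: [1, 2]}
purities = {0: 0.95, 1: 0.50}
print(causal_probabilities(alpha, credible_sets, purities))
```

In this toy input the second credible set fails the purity filter, so its two variants receive a CP of zero.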
FINEMAP: We selected the STR and SNP and indel variants in each fine-mapping region that were not filtered before association testing and had p-value < 0.05 (chosen to reduce computational burden). We excluded STR-indels (see above). We constructed a FINEMAP input file for each region containing the effect size of each variant and the effect size’s standard error. All MAF values were set to nan and the ref and alt columns were set to nan for STRs as this information is not required. We then took the unrelated participants for the phenotype, loaded their dosage genotypes for those variants and saved them to an HDF5 array with h5py v3.6.0. To construct the LD input file required by FINEMAP, we computed the Pearson correlation between dosages of each pair of variants. We then ran FINEMAP v1.4 with non-default options --sss --n-causal-snps 20. In regions where FINEMAP gave non-zero probability to there being 20 causal variants, we reran FINEMAP with the option --n-causal-snps 40 and used the results from the rerun. FINEMAP did not suggest 40 causal variants in any region. FINEMAP caused a core dump when running on the region alkaline phosphatase 1:19430673-24309348, so we excluded that region from downstream analyses. (For convenience, for the regions containing no STRs, we directly ran FINEMAP with --n-causal-snps 40, unless those regions contained fewer than 40 variants, in which case we ran FINEMAP with --n-causal-snps <#variants>).
We used FINEMAP’s posterior inclusion probability (PIP) output for each variant in each region as its CP in downstream analyses.
Alternative fine-mapping conditions
We reran SuSiE and FINEMAP using alternative settings on trait-regions that contained one or more STRs with p-value < 1e-10 and CP ≥ 0.8 in both the original SuSiE and FINEMAP runs. Each new run differed from the original run in exactly one condition. We restricted our set of high-confidence fine-mapped STRs (Table S5) to those that had p-value < 1e-10 and CP ≥ 0.8 in the original runs and maintained CP ≥ 0.8 in a selected set of those alternate conditions.
For SuSiE, we evaluated using best-guess genotypes vs. genotype dosages as input. For FINEMAP, we tested varying the p-value threshold, choice of non-major allele frequency threshold, effect size prior, number of causal variants per region, and stopping threshold. Additionally, we reran FINEMAP with no changed settings to examine potential FINEMAP instability.
See Note S3 for a more detailed discussion of these various settings and their impact on fine-mapping results.
WGS validation of imputed fine-mapped STRs
We worked with WGS CRAM files for 200,025 UKB participants on the UKB Research Analysis Platform cloud solution provided by DNAnexus. These data were aligned to reference genome hg38. HipSTR was unable to load the index files for the CRAM files of 10 participants, possibly due to file corruption. Removing those participants left us with 200,015 participants. We inadvertently truncated the participant list, leaving 200,000 participants. From that participant list, we called genotypes of the 409 STRs in Table S4 using HipSTR in batches of 500 participants, using the flag --min-reads 10 and allowing HipSTR to estimate stutter-error models from the data. We merged batches using MergeSTR. We performed call-level filtering using DumpSTR with the flags --hipstr-min-call-Q 0.9 --hipstr-min-call-DP 10 --hipstr-max-call-DP 10000 --hipstr-min-supp-reads 2 --hipstr-max-call-stutter 0.15 --hipstr-max-call-flank-indel 0.1. After calling all 200,000 individuals we summarized their genotypes separately per population, noting that 166,638 individuals were in our set of QC’ed (potentially related) White British UKB participants, accounting for 40.8% of the QC’ed White British participants.
We did not apply any locus-level filters, such as Hardy-Weinberg equilibrium, to our WGS results. We report per-locus WGS call rates for QC’ed (potentially related) individuals in each population. We used LiftOver to lift the hg38 WGS calls to the hg19 reference genome (see Key Resources Table). To compare the WGS calls to the imputed STR calls, we used CompareSTR from TRTools branch compareSTR_upgrade using the flags --ignore-phasing --balanced-accuracy --vcf2-beagle-probabilities. We report multiple metrics at each locus, specifically concordance, the mean absolute summed-length difference, $r^2$ and dosage $r^2$.
We report (summed-length) per-locus concordances as $\mathbb{E}_{x \in X}[\Pr_{x,\mathrm{imp}}(s_{x,\mathrm{WGS}})]$. This metric has the advantage of being intuitive but is biased upwards for loci with a single very common allele and so should be interpreted cautiously for such loci. We also report mean absolute summed-length differences as $\mathbb{E}_{x \in X}[\sum_{s \in S} \Pr_{x,\mathrm{imp}}(s) \cdot |s_{x,\mathrm{WGS}} - s|]$. This metric has similar caveats to the concordance metric. However, for highly multi-allelic loci where concordance is low, this metric can help quantify how close (or not) imputed calls are to the actual genotypes. We calculated $r^2$ as the square of the weighted Pearson correlation between $s_{x,\mathrm{WGS}}$ and $s$ for each sample $x \in X$ and all possible summed-lengths $s \in S$ (so that there are $|X| \cdot |S|$ total values being correlated), weighting by the imputation probabilities $\Pr_{x,\mathrm{imp}}(s)$. This correlation measure is more comparable across loci with different numbers of alleles than concordance. It has the downside of being less intuitive and of being more sensitive to the WGS-vs-imputation concordance of rare long and short alleles than the WGS-vs-imputation concordance of common average-length alleles. We report dosage $r^2$ as the square of the Pearson correlation between $s_{x,\mathrm{WGS}}$ and the dosage $\sum_{s \in S} s \cdot \Pr_{x,\mathrm{imp}}(s)$ for each sample $x \in X$. Dosage $r^2$ is always greater than or equal to the weighted $r^2$ measure. While the weighted $r^2$ measure more directly measures the concordance of individual imputation probabilities with the WGS calls, the dosage $r^2$ measure better estimates how analyses like GWAS, which condense imputed probabilities into dosages, will perform.
Lastly, at each locus, we report the frequency of each summed-length according to WGS calls, and for all samples with each WGS summed-length we report the probability that imputation concurs with that length: $\mathbb{E}_{x \in X \mid s_{x,\mathrm{WGS}} = s}[\Pr_{x,\mathrm{imp}}(s_{x,\mathrm{WGS}})]$.
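The four per-locus metrics defined above can be sketched in NumPy. This is an illustration of the definitions as written here, not the CompareSTR implementation; the input layout (a samples-by-lengths probability matrix whose rows sum to one) and the function name are assumptions.

```python
import numpy as np


def wgs_vs_imputation_metrics(wgs, lengths, probs):
    """Concordance, mean absolute difference, weighted r^2 and dosage r^2.

    wgs:     (n,) WGS summed length per sample
    lengths: (k,) possible summed lengths S
    probs:   (n, k) imputation probability of each summed length per sample
    """
    wgs = np.asarray(wgs, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    probs = np.asarray(probs, dtype=float)

    diff = wgs[:, None] - lengths[None, :]
    concordance = (probs * (diff == 0)).sum(axis=1).mean()
    mean_abs_diff = (probs * np.abs(diff)).sum(axis=1).mean()

    # Weighted Pearson correlation over all n * k (WGS length, s) pairs.
    a = np.broadcast_to(wgs[:, None], probs.shape)
    b = np.broadcast_to(lengths[None, :], probs.shape)
    w = probs / probs.sum()
    mean_a, mean_b = (w * a).sum(), (w * b).sum()
    cov = (w * (a - mean_a) * (b - mean_b)).sum()
    var_a = (w * (a - mean_a) ** 2).sum()
    var_b = (w * (b - mean_b) ** 2).sum()
    weighted_r2 = cov ** 2 / (var_a * var_b)

    dosage = probs @ lengths
    dosage_r2 = np.corrcoef(wgs, dosage)[0, 1] ** 2

    return {
        "concordance": concordance,
        "mean_abs_diff": mean_abs_diff,
        "weighted_r2": weighted_r2,
        "dosage_r2": dosage_r2,
    }
```

Collapsing probabilities into a dosage removes within-sample variance while leaving the covariance with the WGS length unchanged, which is why dosage r^2 cannot fall below the weighted r^2.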
Replication in other populations
We separated the participants not in the White British group into population groups using the self-reported ethnicities summarized by UKB showcase data field 21000 (see Key Resources Table). This field uses UKB showcase data coding 1001. We defined the following five populations based on those codings (counts give the maximal number of unrelated QC’ed participants, ignoring per-phenotype missingness):
● Black (African and Caribbean, n=7,562, codings 4, 4001, 4002, 4003)
● South Asian (Indian, Pakistani and Bangladeshi, n=7,397, codings 3001, 3002, 3003)
● Chinese (n=1,525, coding 5)
● Irish (n=11,978, coding 1002)
● Other White (White non-Irish non-British, n=15,838, coding 1003)
Self-reported ethnicities were collected from participants at three visits (initial assessment, repeat assessment, first imaging). The above groups also exclude participants who self-reported ethnicity at more than one visit and where their answers corresponded to more than one population (after ignoring ‘prefer not to answer’ code=-3 responses). We did not include any participants who were neither in the White British population nor any of the above populations. Unlike for the determination of White British participants, genetic principal components were not used as filters for these categories.
For the association tests in these populations, we applied the same procedures for sample quality control, unrelatedness filtering, phenotype transformations, and preparing genotypes and covariates as in the White British group. The only changes in procedure were that (a) we removed categorical covariate values where there were fewer than 50 participants with that value (in which case we also removed those participants from analysis, as that would be too few to properly control for batch effects), whereas for White British individuals we used a cutoff of 0.1% instead, and (b) we also applied this cutoff to the visit of measurement categorical covariate, resulting in some association tests that excluded individuals whose first measurement of the phenotype occurred outside the initial assessment visit. See Table S9 for details.
STRs were marked as replicating in another population (Figure 2) if any of the traits confidently fine-mapped to that STR shared the same direction of effect as the White British association and reached association p-value < 0.05 after multiple hypothesis correction (i.e. if there are three confidently fine-mapped traits, then an STR is marked as replicating in the Black population if any of them has association p-value < 0.05/3 = 0.0167 in the Black population).
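The replication rule can be sketched as a small function. It assumes the multiple hypothesis correction divides 0.05 by the number of traits confidently fine-mapped to the STR, as in the worked example in the text; the input dictionaries and the function name are hypothetical.

```python
def replicates(discovery_effects, replication_results, alpha=0.05):
    """Decide whether an STR replicates in another population.

    discovery_effects:   dict trait -> effect size in the White British analysis
    replication_results: dict trait -> (effect size, p-value) in the other population
    """
    n_traits = len(discovery_effects)
    if n_traits == 0:
        return False
    threshold = alpha / n_traits  # e.g. 0.05 / 3 for three fine-mapped traits
    for trait, discovery_effect in discovery_effects.items():
        if trait not in replication_results:
            continue
        effect, p_value = replication_results[trait]
        same_direction = effect * discovery_effect > 0
        if same_direction and p_value < threshold:
            return True
    return False


discovery = {"platelet_count": 0.04, "platelet_crit": 0.03, "mean_platelet_volume": -0.02}
other_population = {
    "platelet_count": (0.05, 0.01),
    "platelet_crit": (0.01, 0.40),
    "mean_platelet_volume": (0.02, 0.001),
}
print(replicates(discovery, other_population))
```

In this made-up input the first trait passes both conditions, while the third has a small p-value but the opposite direction of effect and so would not count on its own.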
We validated imputed STR lengths using WGS data in these populations as was done in the White British population, and report these results in Tables S4 and S5. The numbers of samples in our QC’ed set that had WGS data were 2,990 Black, 3,373 South Asian, 619 Chinese, 5,174 Irish and 6,428 Other White samples, all roughly 40% of their respective populations.