Data for: Distinct impact modes of polygenic disposition to dyslexia in the adult brain
Data files
Dec 06, 2024 version files (8.56 GB)
- data_s1.tar.gz (8.56 GB)
- README.md (17.55 KB)
Abstract
Dyslexia is a common and partially heritable condition that impacts reading ability. In a study of up to 35,231 adults, we explored the structural brain correlates of genetic disposition to dyslexia. Individual dyslexia-disposing genetic variants showed distinct patterns of association with brain structure. Independent component analysis revealed various brain networks that each had their own genomic profiles related to dyslexia susceptibility. Circuits involved in motor coordination, vision, and language were implicated. Polygenic scores for eight traits genetically correlated with dyslexia, including cognitive, behavioural, and reading-related psychometric measures, showed partial similarities to dyslexia in terms of brain-wide associations. Notably, the microstructure of the internal capsule was consistently implicated across all of these genetic dispositions, while the lower volume of the motor cortex was more specifically associated with dyslexia genetic disposition alone. These findings reveal genetic and neurobiological features that may contribute to dyslexia and its associations with other traits at the population level.
README: Data for: Distinct impact modes of polygenic disposition to dyslexia in the adult brain
Access this dataset on Dryad: https://doi.org/10.5061/dryad.80gb5mkz6
Descriptions
This dataset was generated as part of a research study and consists solely of group-level statistics. It does not include any individual-level information, such as individual brain scans or genetic data. The underlying UK Biobank data, which were used to generate these statistical maps, were collected with ethical approval from the National Research Ethics Service Committee North West-Haydock (reference: 11/NW/0382). All procedures adhered to the World Medical Association guidelines. Access to the UK Biobank data was granted following approval by the UK Biobank core (application number 16066, P.I. Clyde Francks). Written informed consent was obtained from all participants, and their data were accessed following de-identification.
The dataset includes:
- Data analysis scripts
- Group-level IDPs: MRI phenotypes derived via ICA and PCA decompositions across various dimensions.
  - FBA: Fixel-Based Analysis
    - ICA-X:
      - melodic_IC.nii.gz: 4D z-transformed component volumes at dimension X.
    - PCA-324:
      - melodic_pca.nii.gz: First 324 PCA components ordered by increasing explained variance along the 4th dimension.
  - TBM: Tensor-Based Morphometry
    - ICA-X:
      - melodic_IC.nii.gz: 4D z-transformed component volumes at dimension X.
    - PCA-324:
      - melodic_pca.nii.gz: First 324 PCA components ordered by increasing explained variance along the 4th dimension.
- Impact Modes: Decomposed impact modes from variant-wise parametric t-maps.
  - FBA: Diffusion Impact Modes
    - hist.svg: Histogram of variant-wise weight distribution for diffusion impact mode #10.
    - melodic_mix: Variant-wise weights for diffusion impact mode #10.
    - mix_plot.R: Heatmap plotting script for variant-wise impact mode weights.
    - perm_rows.txt: Variant-row permutations (for visualization).
    - weights.png: Visualization of variant-wise weights.
  - TBM: T1 Morphometry Impact Modes
    - melodic_mix: Impact mode weights.
    - variant-list.txt: rsIDs of all variants in impact mode decompositions.
- PGS Maps: Brain-wide association maps of polygenic scores (PGS) for various traits.
  - FBA: Fixel-Based Analysis
    - Trait X:
      - parametric-tstat.nii.gz: GLM t-maps for trait X PGS.
      - randomise-neg_vox_corrp_tstat1.nii.gz: Nonparametric GLM voxel-wise p-values (corrected with 5000 permutations).
  - TBM: T1 Tensor-Based Morphometry
    - Trait X:
      - parametric-tstat.nii.gz: GLM t-maps for trait X PGS.
      - randomise-neg_vox_corrp_tstat1.nii.gz: Nonparametric GLM voxel-wise p-values (corrected with 5000 permutations).
  - Dyslexia:
    - logJac: PGS GLM analyses for dyslexia, using log-transformed Jacobians as outcomes.
      - parametric-tstat.nii.gz: Dyslexia PGS GLM.
    - logJac-correctedFor-VNR-education: PGS GLM analyses for dyslexia, using log-transformed Jacobians as outcomes, adjusted for fluid intelligence and education.
      - Multiple outputs for pooled, sex-specific, and corrected maps.
    - logJac-correctedFor-VNR-education-headSize: PGS GLM analyses for dyslexia, using log-transformed Jacobians as outcomes, adjusted for fluid intelligence, education, and head size.
    - rawJac: Analyses using raw Jacobians.
- Scripts
  - edu_proc.sh: Educational attainment estimation.
  - script-dMRI-fixel_based_analysis.sh: Diffusion MRI analysis script.
  - script-pipeline.sh: Polygenic score analysis pipeline script.
- Templates: Templates and masks for visualization and transformation.
  - analysis_voxel_mask.mif.gz: Fiber orientation density template mask.
  - MNI-transform: Warps for transforming between study and MNI spaces.
  - fixels: Template fixel directory for visualization in MRtrix.
  - hybrid-T1-and-ADC-template-just-for-visualisation.nii.gz: T1 and diffusion ADC combined contrast.
- Tractography: Tractography streamlines initiated from dyslexia-associated clusters.
  - dentate_seed.tck
  - foceps_major_seed.tck
  - internal_capsule_seed.tck
  - SLF_seed.tck: superior longitudinal fasciculus
- Variant Maps: Association of allele dosage of genomic variants with MRI phenotypes.
  - FBA: Fixel-Wise Maps
    - rs-IDs:
      - parametric-t-stat.nii.gz: t-values for parametric GLM of allele dosage.
      - randomise-pos_vox_corrp_tstat1.nii.gz: Nonparametric voxel-wise corrected p-value map for the positive contrast.
      - randomise-neg_vox_corrp_tstat1.nii.gz: Nonparametric voxel-wise corrected p-value map for the negative contrast.
  - TBM: Voxel-Wise Maps
    - rs-IDs: Similar outputs as FBA, specific to TBM.
    - overlap: Spatial overlap among 35 genome-wide significant dyslexia variants.
    - TBM-correctedFor-headSize: Adjusted maps for head size.
References
- C. Sudlow, J. Gallacher, N. Allen, V. Beral, P. Burton, J. Danesh, P. Downey, P. Elliott, J. Green, M. Landray, B. Liu, P. Matthews, G. Ong, J. Pell, A. Silman, A. Young, T. Sprosen, T. Peakman, R. Collins, UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12, e1001779 (2015).\
- K. L. Miller, F. Alfaro-Almagro, N. K. Bangerter, D. L. Thomas, E. Yacoub, J. Xu, A. J. Bartsch, S. Jbabdi, S. N. Sotiropoulos, J. L. R. Andersson, L. Griffanti, G. Douaud, T. W. Okell, P. Weale, I. Dragonu, S. Garratt, S. Hudson, R. Collins, M. Jenkinson, P. M. Matthews, S. M. Smith, Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience 19, 1523-1536 (2016).\
- F. Alfaro-Almagro, M. Jenkinson, N. K. Bangerter, J. L. R. Andersson, L. Griffanti, G. Douaud, S. N. Sotiropoulos, S. Jbabdi, M. Hernandez-Fernandez, E. Vallee, D. Vidaurre, M. Webster, P. McCarthy, C. Rorden, A. Daducci, D. C. Alexander, H. Zhang, I. Dragonu, P. M. Matthews, K. L. Miller, S. M. Smith, Image processing and Quality Control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage 166, 400-424 (2018).\
- S. M. Smith, G. Douaud, W. Chen, T. Hanayik, F. Alfaro-Almagro, K. Sharp, L. T. Elliott, An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature Neuroscience 24, 737-745 (2021).\
- C. Bycroft, C. Freeman, D. Petkova, G. Band, L. T. Elliott, K. Sharp, A. Motyer, D. Vukcevic, O. Delaneau, J. O'Connell, A. Cortes, S. Welsh, A. Young, M. Effingham, G. McVean, S. Leslie, N. Allen, P. Donnelly, J. Marchini, The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203-209 (2018).\
- J. Ashburner, C. Good, K. J. Friston, Tensor based morphometry. NeuroImage 11, S465 (2000).\
- B. B. Avants, N. J. Tustison, G. Song, P. A. Cook, A. Klein, J. C. Gee, A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54, 2033-2044 (2011).\
- M. Jenkinson, S. Smith, A global optimisation method for robust affine registration of brain images. Med Image Anal 5, 143-156 (2001).\
- J. D. Tournier, R. Smith, D. Raffelt, R. Tabbara, T. Dhollander, M. Pietsch, D. Christiaens, B. Jeurissen, C.-H. Yeh, A. Connelly, MRtrix3: A fast, flexible and open software framework for medical image processing and visualisation. NeuroImage 202, 116137 (2019).\
- T. Dhollander, A. Clemente, M. Singh, F. Boonstra, O. Civier, J. D. Duque, N. Egorova, P. Enticott, I. Fuelscher, S. Gajamange, S. Genc, E. Gottlieb, C. Hyde, P. Imms, C. Kelly, M. Kirkovski, S. Kolbe, X. Liang, A. Malhotra, R. Mito, G. Poudel, T. J. Silk, D. N. Vaughan, J. Zanin, D. Raffelt, K. Caeyenberghs, Fixel-based Analysis of Diffusion MRI: Methods, Applications, Challenges and Opportunities. Neuroimage 241, 118417 (2021).\
- D. Raffelt, J. D. Tournier, S. Rose, G. R. Ridgway, R. Henderson, S. Crozier, O. Salvado, A. Connelly, Apparent Fibre Density: a novel measure for the analysis of diffusion-weighted magnetic resonance images. Neuroimage 59, 3976-3994 (2012).\
- A. Hyvarinen, Fast and robust fixed-point algorithms for independent component analysis. IEEE transactions on Neural Networks 10, 626-634 (1999).\
- C. F. Beckmann, S. M. Smith, Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Transactions on Medical Imaging 23, 137-152 (2004).\
- S. M. Smith, A. Hyvärinen, G. Varoquaux, K. L. Miller, C. F. Beckmann, Group-PCA for very large fMRI datasets. Neuroimage 101, 738-749 (2014).\
- C. Doust, P. Fontanillas, E. Eising, S. D. Gordon, Z. Wang, G. Alagöz, B. Molz, S. Aslibekyan, A. Auton, E. Babalola, R. K. Bell, J. Bielenberg, K. Bryc, E. Bullis, D. Coker, G. C. Partida, D. Dhamija, S. Das, S. L. Elson, T. Filshtein, K. Fletez-Brant, W. Freyman, P. M. Gandhi, K. Heilbron, B. Hicks, D. A. Hinds, E. M. Jewett, Y. Jiang, K. Kukar, K.-H. Lin, M. Lowe, J. McCreight, M. H. McIntyre, S. J. Micheletti, M. E. Moreno, J. L. Mountain, P. Nandakumar, E. S. Noblin, J. O’Connell, A. A. Petrakovitz, G. D. Poznik, M. Schumacher, A. J. Shastri, J. F. Shelton, J. Shi, S. Shringarpure, V. Tran, J. Y. Tung, X. Wang, W. Wang, C. H. Weldon, P. Wilton, A. Hernandez, C. Wong, C. T. Tchakouté, F. Abbondanza, A. G. Allegrini, T. F. M. Andlauer, C. L. Barr, M. Bernard, K. Blokland, M. Bonte, D. I. Boomsma, T. Bourgeron, D. Brandeis, M. Carreiras, F. Ceroni, V. Csépe, P. S. Dale, P. F. de Jong, J. F. Démonet, E. L. de Zeeuw, Y. Feng, M.-C. J. Franken, M. Gerritse, A. Gialluisi, S. L. Guger, M. E. Hayiou-Thomas, J. Hernández-Cabrera, J.-J. Hottenga, C. Hulme, P. R. Jansen, J. Kere, E. N. Kerr, T. Koomar, K. Landerl, G. T. Leonard, Z. Liao, M. W. Lovett, H. Lyytinen, A. Martinelli, U. Maurer, J. J. Michaelson, N. Mirza-Schreiber, K. Moll, A. T. Morgan, B. Müller-Myhsok, D. F. Newbury, M. M. Nöthen, T. Paus, Z. Pausova, C. E. Pennell, R. J. Plomin, K. M. Price, F. Ramus, S. Reilly, L. Richer, K. Rimfeld, G. Schulte-Körne, C. Y. Shapland, N. H. Simpson, M. J. Snowling, J. F. Stein, L. J. Strug, H. Tiemeier, J. B. Tomblin, D. T. Truong, E. van Bergen, M. P. van der Schroeff, M. Van Donkelaar, E. Verhoef, C. A. Wang, K. E. Watkins, A. J. O. Whitehouse, K. G. Wigg, M. Wilkinson, G. Zhu, B. S. Pourcain, C. Francks, R. E. Marioni, J. Zhao, S. Paracchini, J. B. Talcott, A. P. Monaco, J. F. Stein, J. R. Gruen, R. K. Olson, E. G. Willcutt, J. C. DeFries, B. F. Pennington, S. D. Smith, M. J. Wright, N. G. Martin, A. Auton, T. C. Bates, S. E. Fisher, M. Luciano, 23andMe Research Team, Quantitative Trait Working Group of the GenLang Consortium, Discovery of 42 genome-wide significant loci associated with dyslexia. Nature Genetics 54, 1621-1629 (2022).\
- F. Privé, J. Arbel, H. Aschard, B. J. Vilhjálmsson, Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores. HGG Adv 3, 100136 (2022).\
- L. R. Lloyd-Jones, J. Zeng, J. Sidorenko, L. Yengo, G. Moser, K. E. Kemper, H. Wang, Z. Zheng, R. Magi, T. Esko, A. Metspalu, N. R. Wray, M. E. Goddard, J. Yang, P. M. Visscher, Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat Commun 10, 5086 (2019).\
- T. Ge, C. Y. Chen, Y. Ni, Y. A. Feng, J. W. Smoller, Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 10, 1776 (2019).\
- C. F. Beckmann, C. E. Mackay, N. Filippini, S. M. Smith, Group comparison of resting-state FMRI data using multi-subject ICA and dual regression. NeuroImage 47, S148 (2009).\
- A. M. Winkler, G. R. Ridgway, M. A. Webster, S. M. Smith, T. E. Nichols, Permutation inference for the general linear model. Neuroimage 92, 381-397 (2014).\
- J. D. Tournier, F. Calamante, A. Connelly, in Proceedings of the international society for magnetic resonance in medicine. (John Wiley & Sons, Inc, New Jersey, 2010), vol. 1670.\
- L. M. Oblong, S. Soheili-Nezhad, N. Trevisan, Y. Shi, C. F. Beckmann, E. Sprooten, Principal and independent genomic components of brain structure and function. Genes, Brain and Behavior 23, e12876 (2024).\
- B. K. Bulik-Sullivan, P. R. Loh, H. K. Finucane, S. Ripke, J. Yang, N. Patterson, M. J. Daly, A. L. Price, B. M. Neale, LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291-295 (2015).\
- B. Bulik-Sullivan, H. K. Finucane, V. Anttila, A. Gusev, F. R. Day, P. R. Loh, L. Duncan, J. R. Perry, N. Patterson, E. B. Robinson, M. J. Daly, A. L. Price, B. M. Neale, An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236-1241 (2015).\
- D. Demontis, G. B. Walters, G. Athanasiadis, R. Walters, K. Therrien, T. T. Nielsen, L. Farajzadeh, G. Voloudakis, J. Bendl, B. Zeng, W. Zhang, J. Grove, T. D. Als, J. Duan, F. K. Satterstrom, J. Bybjerg-Grauholm, M. Bækved-Hansen, O. O. Gudmundsson, S. H. Magnusson, G. Baldursson, K. Davidsdottir, G. S. Haraldsdottir, E. Agerbo, G. E. Hoffman, S. Dalsgaard, J. Martin, M. Ribasés, D. I. Boomsma, M. Soler Artigas, N. Roth Mota, D. Howrigan, S. E. Medland, T. Zayats, V. M. Rajagopal, A. Havdahl, A. Doyle, A. Reif, A. Thapar, B. Cormand, C. Liao, C. Burton, C. H. D. Bau, D. L. Rovaris, E. Sonuga-Barke, E. Corfield, E. H. Grevet, H. Larsson, I. R. Gizer, I. Waldman, I. Brikell, J. Haavik, J. Crosbie, J. McGough, J. Kuntsi, J. Glessner, K. Langley, K.-P. Lesch, L. A. Rohde, M. H. Hutz, M. Klein, M. Bellgrove, M. Tesli, M. C. O’Donovan, O. A. Andreassen, P. W. L. Leung, P. M. Pan, R. Joober, R. Schachar, S. Loo, S. H. Witt, T. Reichborn-Kjennerud, T. Banaschewski, Z. Hawi, M. J. Daly, O. Mors, M. Nordentoft, O. Mors, D. M. Hougaard, P. B. Mortensen, M. J. Daly, S. V. Faraone, H. Stefansson, P. Roussos, B. Franke, T. Werge, B. M. Neale, K. Stefansson, A. D. Børglum, ADHD Working Group of the Psychiatric Genomics Consortium, iPSYCH-Broad Consortium, Genome-wide analyses of ADHD identify 27 risk loci, refine the genetic architecture and implicate several cognitive domains. Nature Genetics 55, 198-208 (2023).\
- V. M. Rajagopal, A. Ganna, J. R. I. Coleman, A. Allegrini, G. Voloudakis, J. Grove, T. D. Als, H. T. Horsdal, L. Petersen, V. Appadurai, A. Schork, A. Buil, C. M. Bulik, J. Bybjerg-Grauholm, M. Bækvad-Hansen, D. M. Hougaard, O. Mors, M. Nordentoft, T. Werge, R. Belliveau, C. E. Carey, F. Cerrato, K. Chambert, C. Churchhouse, M. J. Daly, A. Dumont, J. Goldstein, C. S. Hansen, D. P. Howrigan, H. Huang, J. Maller, A. R. Martin, J. Martin, M. Mattheisen, J. Moran, B. M. Neale, J. Pallesen, D. S. Palmer, C. B. Pedersen, M. G. Pedersen, T. Poterba, S. Ripke, F. K. Satterstrom, W. K. Thompson, P. Turley, R. K. Walters, P. B. Mortensen, G. Breen, P. Roussos, R. Plomin, E. Agerbo, A. D. Børglum, D. Demontis, iPSYCH-Broad Consortium, Genome-wide association study of school grades identifies genetic overlap between language ability, psychopathology and creativity. Scientific Reports 13, 429 (2023).\
- E. Eising, N. Mirza-Schreiber, E. L. de Zeeuw, C. A. Wang, D. T. Truong, A. G. Allegrini, C. Y. Shapland, G. Zhu, K. G. Wigg, M. L. Gerritse, B. Molz, G. Alagöz, A. Gialluisi, F. Abbondanza, K. Rimfeld, M. van Donkelaar, Z. Liao, P. R. Jansen, T. F. M. Andlauer, T. C. Bates, M. Bernard, K. Blokland, M. Bonte, A. D. Børglum, T. Bourgeron, D. Brandeis, F. Ceroni, V. Csépe, P. S. Dale, P. F. de Jong, J. C. DeFries, J. F. Démonet, D. Demontis, Y. Feng, S. D. Gordon, S. L. Guger, M. E. Hayiou-Thomas, J. A. Hernández-Cabrera, J. J. Hottenga, C. Hulme, J. Kere, E. N. Kerr, T. Koomar, K. Landerl, G. T. Leonard, M. W. Lovett, H. Lyytinen, N. G. Martin, A. Martinelli, U. Maurer, J. J. Michaelson, K. Moll, A. P. Monaco, A. T. Morgan, M. M. Nöthen, Z. Pausova, C. E. Pennell, B. F. Pennington, K. M. Price, V. M. Rajagopal, F. Ramus, L. Richer, N. H. Simpson, S. D. Smith, M. J. Snowling, J. Stein, L. J. Strug, J. B. Talcott, H. Tiemeier, M. P. van der Schroeff, E. Verhoef, K. E. Watkins, M. Wilkinson, M. J. Wright, C. L. Barr, D. I. Boomsma, M. Carreiras, M. J. Franken, J. R. Gruen, M. Luciano, B. Müller-Myhsok, D. F. Newbury, R. K. Olson, S. Paracchini, T. Paus, R. Plomin, S. Reilly, G. Schulte-Körne, J. B. Tomblin, E. van Bergen, A. J. O. Whitehouse, E. G. Willcutt, B. St Pourcain, C. Francks, S. E. Fisher, Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people. Proc Natl Acad Sci U S A 119, e2202764119 (2022).\
- C. M. Williams, H. Peyre, R. Toro, F. Ramus, Sex differences in the brain are not reduced to differences in body size. Neuroscience and Biobehavioral Reviews 130, 509-511 (2021).\
- A. Okbay, Y. Wu, N. Wang, H. Jayashankar, M. Bennett, S. M. Nehzati, J. Sidorenko, H. Kweon, G. Goldman, T. Gjorgjieva, Y. Jiang, B. Hicks, C. Tian, D. A. Hinds, R. Ahlskog, P. K. E. Magnusson, S. Oskarsson, C. Hayward, A. Campbell, D. J. Porteous, J. Freese, P. Herd, C. Watson, J. Jala, D. Conley, P. D. Koellinger, M. Johannesson, D. Laibson, M. N. Meyer, J. J. Lee, A. Kong, L. Yengo, D. Cesarini, P. Turley, P. M. Visscher, J. P. Beauchamp, D. J. Benjamin, A. I. Young, Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nat Genet 54, 437-449 (2022).
Code/Software
The MRtrix3 'mrview' software is required for visualising the fixel-wise data and the fiber orientation density template (.mif.gz).
Any NIFTI brain volume viewer (e.g., mrview or FSLeyes) can visualise the NIFTI files.
Methods
Experimental Design
UK Biobank data were accessed following approval of application number 16066, P.I. Clyde Francks. UK Biobank is an in-depth investigation of more than 500,000 volunteers in the UK who are assessed for health, lifestyle, genomics, and many other variables (1). Multimodal brain MRI data (2, 3) had also been released for approximately 10% of the individuals when the present study was initiated in 2022 (4). The UK Biobank received ethical approval from the National Research Ethics Service Committee North West-Haydock (reference 11/NW/0382), and all of their procedures were performed in accordance with the World Medical Association guidelines. Written informed consent was provided by all of the enrolled participants. Genotyping has been performed using either BiLEVE Axiom or Axiom arrays from Affymetrix, which target highly overlapping sets of ~800,000 genomic variants with more than 95% similarity (5). The UK Biobank has also released common genome-wide variants imputed to the Haplotype Reference Consortium and UK10K haplotypes (5). In this study, we focused on participants who also underwent brain MRI at one of the four imaging sites and for whom at least one usable T1-weighted and/or diffusion MRI (dMRI) scan had been produced (see the next section). The genetic analyses were focused on the largest ancestry group within this cohort, recorded as ‘white British’ using a combination of self-report and genomic principal component analysis (this group constitutes ~85% of the overall dataset: data field #22006). Pairs of genetically related subjects with kinship coefficients above 0.044 were identified in the target sample (70). Individuals related to the largest number of others were recursively removed until no two individuals were related at or above this kinship threshold, leaving 35,231 individuals (18,363 females). The resulting sample encompassed individuals aged from 45 to 82 years, with a mean age of 64.2 years and a standard deviation of 7.7 years. 
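The relatedness pruning described above can be sketched as a greedy procedure. This is a minimal illustration, not the authors' script: the function name, the `(id_a, id_b, kinship)` input format, the strict `>` comparison, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import defaultdict


def prune_related(pairs, threshold=0.044):
    """Greedily remove individuals until no remaining pair exceeds the threshold.

    pairs: iterable of (id_a, id_b, kinship) tuples (hypothetical input format).
    Returns the set of removed IDs.
    """
    neighbours = defaultdict(set)
    for a, b, kinship in pairs:
        if kinship > threshold:  # assumption: strictly above the threshold
            neighbours[a].add(b)
            neighbours[b].add(a)

    removed = set()
    while True:
        # Individuals that still have at least one relative in the sample.
        candidates = [(len(rel), ind) for ind, rel in neighbours.items() if rel]
        if not candidates:
            break
        # Remove the individual related to the most others.
        # Ties are broken by ID order, which is an arbitrary choice.
        _, worst = max(candidates)
        for other in neighbours[worst]:
            neighbours[other].discard(worst)
        neighbours[worst] = set()
        removed.add(worst)
    return removed


# Toy example with made-up kinship values.
example_pairs = [("A", "B", 0.25), ("A", "C", 0.10), ("B", "D", 0.01), ("C", "D", 0.05)]
removed = prune_related(example_pairs)
```

In the toy example, individuals A and D remain, and they are not related to each other above the threshold.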
We then included bi-allelic genetic variants with minor allele frequency ≥ 0.01, imputation quality score higher than 0.7, and Hardy-Weinberg equilibrium P-value greater than 10⁻⁷, yielding 8,366,177 autosomal single nucleotide variants (SNVs) and 1,092,696 short insertion-deletions (indels). We accessed minimally processed and brain-extracted T1-weighted brain MRI volumes of 42,798 individuals (2, 3) for tensor-based morphometry using symmetric image normalization (SyN) registration (6, 7). For the present study, we generated a study-specific average brain template in a randomly chosen subset of 1000 individuals. The template was generated through 11 consecutive Advanced Normalization Tools (ANTs v2.3.5) registrations that iteratively refined the template shape using rigid, affine, and diffeomorphic SyN transformations at incremental resolutions up to native resolution (i.e. 1 mm³).
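The variant inclusion criteria stated at the start of the paragraph above amount to a simple per-variant filter. The sketch below only restates those thresholds; the function name and argument names are hypothetical and do not come from the dataset's scripts.

```python
def passes_variant_qc(n_alleles, maf, info, hwe_p):
    """Return True if a variant meets the stated inclusion criteria.

    n_alleles: number of alleles at the site (2 for bi-allelic)
    maf:       minor allele frequency
    info:      imputation quality score
    hwe_p:     Hardy-Weinberg equilibrium P-value
    """
    return (
        n_alleles == 2      # bi-allelic variants only
        and maf >= 0.01     # minor allele frequency of at least 0.01
        and info > 0.7      # imputation quality higher than 0.7
        and hwe_p > 1e-7    # HWE P-value greater than 10^-7
    )


# Toy examples with made-up values.
kept = passes_variant_qc(n_alleles=2, maf=0.05, info=0.95, hwe_p=0.3)
dropped = passes_variant_qc(n_alleles=2, maf=0.005, info=0.95, hwe_p=0.3)
```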
Statistical analysis
Structural MRI: tensor-based morphometry
Thereafter, all individuals’ original T1-weighted brain volumes were histogram matched, winsorized at 1-99 percentiles, and non-linearly registered to our study-specific template using SyN. Registration parameters included a variance for a total field of three, a variance for an update field of zero, a resolution downsampling scheme of 6×, 4×, 2×, and 1× (i.e. full resolution), and Gaussian smoothing at standard deviations of 4, 2, 1 and zero voxels. A cross-correlation metric with a radius of four voxels was used. The affine registration matrix was composed with the SyN deformation field and the final warps were subsequently converted to Jacobian determinant maps, which encode the amount of regional brain tissue ‘shrinkage’ or ‘expansion’ in the brain of each individual as compared to our study-specific, average T1 template. ANTs affine registrations failed in 2,098 individuals; instead of removing them, we opted to use a comparable linear registration method, FSL Flirt (8) to initialize SyN, while controlling for a potential batch effect in subsequent analyses as a binary covariate.
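As an illustration of the intensity winsorization step mentioned above, the sketch below clamps values to the 1st and 99th percentiles. It is a simplified stand-in written for this README: the percentile definition (linear interpolation between ranks) is an assumption and may differ from what ANTs does internally.

```python
def winsorize(values, lower_pct=1.0, upper_pct=99.0):
    """Clamp values to the given lower and upper percentiles."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Linear interpolation between the two closest ranks (an assumption).
        pos = (n - 1) * p / 100.0
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return ordered[lo] * (1 - frac) + ordered[hi] * frac

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(v, low), high) for v in values]


# Toy intensities 0..100: only the two extreme values are clamped.
clamped = winsorize(list(range(101)))
```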
Diffusion MRI: data preprocessing and fixel-based analysis
We retrieved minimally-preprocessed dMRI volumes of 37,930 subjects from the UK Biobank (2, 3). These data have been collected at 2 mm³ isotropic resolution across 100 different diffusion-encoding directions evenly distributed on two spherical shells at b-values of 1000 and 2000 s/mm², as well as eight blip-reversed b≅0 volumes. Diffusion images have been corrected for off-resonance warps, gradient non-linearity, eddy currents, and head motion by the UK Biobank team (2, 3). For the present study, we reran these corrections on raw data for a first batch of 8,247 individuals whose corrected b-vector tables were not available, while accounting for a potential batch effect in the subsequent regression model fits through the use of a binary covariate. After data preprocessing, we constructed a study-specific fiber orientation density (FOD) template using MRtrix3 v3.0.3 (9) from a random subset of 890 individuals who passed registration quality control by visual inspection out of 1000. This procedure started with N4 bias field correction and intensity normalisation of the preprocessed diffusion volumes, and estimation of the average tract response function (10). Thereafter, spherical deconvolution was performed using the estimated response function to generate subject-wide FOD volumes. These volumes were subsequently non-linearly registered to a common space and an average FOD template was generated iteratively. The FOD template was then ‘fixelated’ to identify the principal directions of white-matter tracts in each voxel. The same procedures were repeated in all 37,930 individuals to generate FOD volumes, which were then registered to the study-specific FOD template (9). FOD registrations passed quality control in 37,884 individuals following a visual inspection of each individual’s template-transformed zeroth-order harmonic map, representing average isotropic diffusion in each voxel.
FOD volumes were segmented to obtain fixel-wise readouts, which were then transformed, rotated, and corresponded to the template’s fixel-wise space (9). We considered apparent fiber density (AFD) readouts as a measure of white-matter microstructure for subsequent analyses (11). In combination with genetic data, the sample available was 31,695 adult individuals (16,198 female).
Optimizing polygenic scoring
We first concatenated the voxel-wise Jacobian and fixel-wise AFD maps across all individuals and then applied MELODIC independent component analysis (12, 13) to extract imaging-derived phenotypes (IDPs). MELODIC was performed separately for each imaging modality and at various dimensions to extract IDPs at incremental levels of spatial detail, following a geometric series corresponding with dimensions 11, 18, 29, 47, 76, 124, 200, and 324. Due to the large size of this data matrix (6.2×10¹⁰ voxels in structural MRI), we used 8,000 internal eigenmaps for independent source decomposition (14). In addition, principal component analysis was performed on the same data, and the first 324 principal components were extracted as additional IDPs. Altogether, a total of 1,153 IDPs were extracted from voxel-wise Jacobian maps and an equal number of IDPs from the fixel-wise AFD data. These IDPs were derived for the purpose of optimizing our polygenic scoring, but they were not used for our voxel- or fixel-based imaging genetic analyses, nor our impact mode analysis, which forms the bulk of the findings in this study. We used summary statistics from the largest genome-wide association study (GWAS) of dyslexia that has been performed to date, carried out by 23andMe, Inc. (15). This GWAS was based on 51,800 individuals of European ancestry who answered ‘Yes’ to the question ‘Have you been diagnosed with dyslexia?’, and 1,087,070 control individuals who answered ‘No’. The SNP-wise effect sizes from this GWAS were then applied to the genotype data of UK Biobank individuals, to estimate the polygenic disposition of each UK Biobank individual to dyslexia based on the combined effects of their autosome-wide genetic variants. Our primary approach for polygenic scoring was based on the Lassosum2 model (16). We observed a strong correlation between Lassosum2 PGS and two automated PGS methods, SBayesR (17) and PRS-CSauto (18).
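Applying SNP-wise effect sizes to genotype data, as described above, reduces to a weighted sum of allele dosages once the per-variant weights have been estimated. The sketch below shows only that final scoring step, under the assumption that dosages and weights refer to the same effect allele; the variant IDs and the function name are made up, and the Lassosum2 weight estimation itself is not shown.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping variant ID -> effect-allele dosage (0 to 2)
    weights: dict mapping variant ID -> per-allele weight from a PGS model
    Variants without a weight are ignored.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Toy example with hypothetical variants and weights.
example_dosages = {"var1": 2, "var2": 1, "var3": 0}
example_weights = {"var1": 0.5, "var2": -0.25}
score = polygenic_score(example_dosages, example_weights)
```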
Lassosum2 generally explained the highest proportion of variance in brain IDPs (Supplementary Fig. S1) and was therefore used for the main analysis. This method fits a sparse elastic-net regression and optimizes two shrinkage penalties, including L1-norm (λ) and L2-norm (δ). A grid search across 30 λ and 10 δ values was utilized for optimization with respect to maximizing the top association with an IDP. The associations of dyslexia PGS were quantified with all 1,153 IDPs in each imaging modality using linear regression. A set of confound covariates was controlled for, including subject age at imaging visit (data field #21003, instance 2), age², sex (data field #31), age×sex, age²×sex, the first ten principal components of genomic ancestry (data field #22009), genotyping array (data field #22000, either BiLEVE or Axiom), three dummy covariates encoding four UK Biobank neuroimaging sites (data field #54, instance 2), and the number of days elapsed since MRI scanning began at the site (as a measure of slow drifts in MRI hardware performance; data field #53, instance 2). For structural MRI data, the type of affine registration (i.e. ANTs or Flirt) was further controlled as a covariate. Structural MRI analysis was performed either without (main analysis) or with (secondary analysis) correction for head size as a confounding covariate (data field #25000). For diffusion MRI data, the batch effect associated with diffusion preprocessing (i.e. either performed by our team or by the UK Biobank) was added to the covariates, and the analyses were done without (main analysis) and with (secondary analysis) the global mean apparent fiber density per individual as an extra covariate. We found that high δ values in the range of 10² to 10⁴ slightly increased the accuracy of Lassosum2 over automated models PRS-CSauto and SBayesR, and λ in the range of 10⁻⁵ to 10⁻² resulted in the highest accuracy of trait prediction (Supplementary Fig. S1).
These shrinkage parameters were therefore used for subsequent analyses.
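To make the shared confound model concrete, the sketch below builds one row of a design matrix from the covariates listed above. It is illustrative only: the intercept column, the 0 to 3 site coding with site 0 as the reference level, and all names are assumptions, and the modality-specific covariates (registration type, preprocessing batch, head size, mean AFD) are left out for brevity.

```python
def confound_row(age, sex, ancestry_pcs, is_bileve_array, site, days_since_site_start):
    """Build one individual's confound covariates as a flat list.

    sex is coded 0/1; site is coded 0-3 (hypothetical coding, site 0 = reference).
    """
    if len(ancestry_pcs) != 10:
        raise ValueError("expected ten ancestry principal components")
    if site not in (0, 1, 2, 3):
        raise ValueError("site must be coded 0-3 in this sketch")

    # Three dummy covariates encode the four imaging sites.
    site_dummies = [1.0 if site == k else 0.0 for k in (1, 2, 3)]

    return (
        [1.0, age, age ** 2, sex, age * sex, age ** 2 * sex]  # intercept is an assumption
        + list(ancestry_pcs)
        + [float(is_bileve_array)]
        + site_dummies
        + [float(days_since_site_start)]
    )


# Toy individual with made-up values.
row = confound_row(64.0, 1, [0.0] * 10, True, 2, 100)
```

In practice, age is often mean-centred before squaring to reduce collinearity between the linear and quadratic terms; the source does not say whether that was done here.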
Voxel- and fixel-wise brain associations with dyslexia polygenic scores
We tested the brain-wide associations of dyslexia PGS with the voxel-wise and fixel-wise data in the UK Biobank. Both parametric (fsl_glm 6.0.3 (19)) and non-parametric (randomise v2.9 (20)) linear regression models were fitted to the data, the former to yield t-value maps for visualization and impact mode analysis, and the latter to generate brain-wide multiple comparisons-corrected P-value maps. To reduce computation costs, voxel-wise permutations were performed at half (2 mm³ isotropic) resolution with a wall-time of 9 days for 5,000 permutations per statistical contrast. The Randomise C++ code was modified to prevent short integer overflows due to the study sample size. No cluster enhancement was applied. The same set of covariates was used as for the optimization in the previous section. In all cases, we observed that a parametric t-value of > 4.5 was equivalent to a non-parametric brain-wide corrected P-value of smaller than 0.05. To check the validity of our findings obtained with Lassosum2, we applied other methods for deriving PGS: SBayesR, PRS-CS, and PRS-CSauto. PRS-CS applies continuous shrinkage on variant-wise weights using Bayesian priors and is optimized using a single global shrinkage hyperparameter (ϕ). We explored four different ϕ values for optimizing PRS-CS, which were 10⁻⁶, 10⁻⁴, 0.01, and 1 (Supplementary Fig. S1). PRS-CSauto and SBayesR are automated polygenic scoring methods and therefore did not require hyperparameter optimization on an independent dataset. We found that dyslexia Lassosum2 PGS was strongly correlated with dyslexia PGS derived from PRS-CSauto (Pearson’s r = 0.87 and 0.93 following optimization on structural or diffusion-derived measures, respectively) and SBayesR (r = 0.74 and 0.84, same order). Compared to Lassosum2, these additional PGS exhibited highly similar brain-wide associations (Supplementary Fig. S2).
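The agreement between PGS methods reported above is expressed as Pearson's r across individuals. For readers who want to run a similar check on their own scores, a minimal standard-library sketch follows; the input vectors are toy values, not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy scores from two hypothetical PGS methods for four individuals.
r_same_direction = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_opposite = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```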
To describe the white matter tracts that run through regions where fixels showed significant associations of AFD with dyslexia PGS, we ran probabilistic fiber tractography using the second-order Integration over Fiber Orientation Distributions (iFOD2) algorithm in the template space (21).
Dyslexia locus-based neuroimaging association
42 individual genomic loci were significantly associated with dyslexia after genome-wide multiple testing correction in the 23andMe Inc. GWAS for dyslexia (15). 35 of these variants passed our genetic quality control process in the UK Biobank data (see the Materials and Methods section ‘Genetics’, above). At each of these 35 loci, the dosage of the dyslexia-disposing allele was calculated and used in separate linear regression models to find brain-wide associations with regional volume and white-matter microstructure (i.e. voxel-wise Jacobian values and fixel-wise AFD values, respectively), using the same approach and covariates as when testing voxel-wise and fixel-wise PGS associations. These covariates included age, age², sex, age×sex, age²×sex, ten principal components of genomic ancestry, genotyping array, UK Biobank imaging site, the number of days elapsed since MRI scanning began at the site, the type of affine registration (for structural MRI), and preprocessing being either performed by our team or by the UK Biobank team (for diffusion MRI). For each variant, we also performed secondary analyses in which head size or subject-average AFD across all fixels was additionally included as a confound covariate, in the T1 and diffusion data modalities respectively.
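The source does not state how the dosage of the dyslexia-disposing allele was computed. One common convention for imputed data, shown here purely as an assumption, is to take the expected count of the alternate allele from the genotype probabilities and flip it when the disposing allele is the reference allele. The function and argument names are hypothetical.

```python
def disposing_allele_dosage(p_het, p_hom_alt, disposing_is_alt):
    """Expected count (0 to 2) of the disposing allele for one individual.

    p_het:            probability of the heterozygous genotype
    p_hom_alt:        probability of the homozygous-alternate genotype
    disposing_is_alt: True if the disposing allele is the alternate allele
    """
    alt_dosage = p_het + 2.0 * p_hom_alt
    # If the disposing allele is the reference allele, count the other strand.
    return alt_dosage if disposing_is_alt else 2.0 - alt_dosage


# Toy genotype probabilities.
dosage_alt = disposing_allele_dosage(0.0, 1.0, disposing_is_alt=True)
dosage_ref = disposing_allele_dosage(0.0, 1.0, disposing_is_alt=False)
```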
Impact mode decomposition
A PGS approximates polygenic influences through a single scalar value. Such a score represents a weighted average of all disposing allele counts and is agnostic to variability in the brain-wide associations of individual genetic variants. We aimed to model the heterogeneity and the hidden covariance patterns in the brain-wide genomic associations. To achieve this, we first created a brain-wide univariate association map (i.e. a voxel-wise or fixel-wise t-value map generated by parametric regression) for each of the 13,766 top independent dyslexia GWAS loci, obtained by clumping at a GWAS P-value threshold of less than 0.01, a linkage disequilibrium r² threshold of less than 0.1, and a genomic window size of 500 kb (using the same set of covariates as in all sections above). These voxel- or fixel-wise t-value maps were then concatenated across all 13,766 variants and decomposed by MELODIC into ten independent components, separately for each imaging modality. To enhance the sensitivity of the ICA rotations to local effects rather than genetic associations with global measures, voxel-wise Jacobian determinant values were normalized to total brain volume before ICA. The default MELODIC ICA data transformations, including variance normalization and mean signal removal, were not applied, as these moments reflect meaningful signals in t-value maps (22). We refer to the extracted independent components as genomic impact modes, which reflect combinations of distinct genomic variants and spatial profiles through a limited number of features.
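The clumping step that selects independent loci can be illustrated with a simplified greedy procedure in Python. This is a sketch under stated assumptions, not the tool used in the study: the `clump` function, the variant dictionaries, and the `r2` lookup are hypothetical, and the 500 kb window is interpreted here as the maximum distance from an index variant.

```python
def clump(variants, r2, p_max=0.01, r2_max=0.1, window_bp=500_000):
    """Greedy LD clumping (simplified illustration).

    variants : list of dicts with keys "id", "chrom", "pos", "p"
    r2       : callable (id_a, id_b) -> LD r^2 between two variants
    Variants are visited from the smallest to the largest P-value; a variant
    is kept as an index variant unless it lies within `window_bp` of an
    already kept index variant on the same chromosome and has r^2 >= r2_max
    with it.
    """
    index_variants = []
    candidates = sorted((v for v in variants if v["p"] < p_max),
                        key=lambda v: v["p"])
    for v in candidates:
        in_ld_with_index = any(
            v["chrom"] == k["chrom"]
            and abs(v["pos"] - k["pos"]) <= window_bp
            and r2(v["id"], k["id"]) >= r2_max
            for k in index_variants
        )
        if not in_ld_with_index:
            index_variants.append(v)
    return index_variants

# Toy example: identifiers, positions, P-values and LD values are made up.
toy_variants = [
    {"id": "A", "chrom": 1, "pos": 1_000, "p": 1e-8},
    {"id": "B", "chrom": 1, "pos": 2_000, "p": 1e-5},    # in LD with A, nearby
    {"id": "C", "chrom": 1, "pos": 900_000, "p": 1e-4},  # outside A's window
    {"id": "D", "chrom": 1, "pos": 3_000, "p": 0.5},     # fails P-value filter
    {"id": "E", "chrom": 2, "pos": 1_500, "p": 1e-3},    # other chromosome
]
toy_ld = {frozenset(("A", "B")): 0.8, frozenset(("A", "C")): 0.5}
kept = clump(toy_variants, lambda a, b: toy_ld.get(frozenset((a, b)), 0.0))
kept_ids = [v["id"] for v in kept]
```

Each retained index variant would then contribute one t-value map to the matrix passed to the decomposition.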
Polygenic scores of additional traits related to dyslexia
We first used LD score regression (23, 24) to confirm that we could detect previously reported genetic correlations between dyslexia and each of eight other behavioural, cognitive or education-related traits, based on summary statistics from the 23andMe dyslexia GWAS (15) and other large-scale GWAS: attention-deficit/hyperactivity disorder (ADHD (25)), verbal numerical reasoning (a.k.a. fluid intelligence) (Pan-UKB team. https://pan.ukbb.broadinstitute.org. 2020.), the first principal component of school grades in mathematics and language (26), General Certificate of Secondary Education (GCSE) education (Pan-UKB team. https://pan.ukbb.broadinstitute.org. 2020.), word reading, non-word reading, spelling, and phonemic awareness (27). All of these traits showed significant genetic correlations with dyslexia in our analysis (|rg| > 0.4, all P < 10⁻²³, Supplementary Fig. S4). To compare and contrast with the dyslexia PGS, we then used lassosum2 to generate PGS in the UK Biobank data for each of these eight additional traits, and mapped their brain-wide associations with the voxel-wise and fixel-wise data, using the same approach as for the dyslexia PGS.
Further post hoc regression analyses
We performed further regression analyses of the association between dyslexia PGS and voxel-wise volumes, this time using logarithm-transformed rather than raw Jacobian determinant values to take allometry into account (28), and assessed whether this made a difference. In another post hoc analysis, two extra covariates were added to assess voxel-wise and fixel-wise associations with dyslexia PGS independently of fluid intelligence and educational attainment: these covariates were ‘fluid intelligence’ (data-field #20016) and the number of years of education estimated from the data fields ‘qualifications’ (#6138) and ‘age completed full-time education’ (#845), following a previously published approach (29). Apart from the inclusion of these two covariates, the linear regression models were the same as in the primary analyses described above in the Materials and Methods section ‘Voxel- and fixel-wise brain associations with dyslexia polygenic scores’.
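The logarithmic transformation mentioned above can be sketched in a few lines of Python. The helper name `log_jacobian` and the input check are assumptions for illustration; the point is only that, after the transform, local expansion and contraction by the same factor become symmetric around zero.

```python
import numpy as np

def log_jacobian(jacobian):
    """Natural-log transform of Jacobian determinant values.

    Jacobian determinants are expected to be strictly positive; a value of 1
    means no local volume change. (Hypothetical helper for illustration.)
    """
    jacobian = np.asarray(jacobian, dtype=float)
    if np.any(jacobian <= 0):
        raise ValueError("Jacobian determinants must be strictly positive")
    return np.log(jacobian)

# Halving and doubling of local volume map to -log(2) and +log(2).
log_values = log_jacobian([0.5, 1.0, 2.0])
```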