Diagnostic Accuracy of Adrenal Imaging for Subtype Diagnosis in Primary Aldosteronism: Systematic Review and Meta-Analysis

Cite this dataset

Zhou, Yaqiong et al. (2020). Diagnostic Accuracy of Adrenal Imaging for Subtype Diagnosis in Primary Aldosteronism: Systematic Review and Meta-Analysis [Dataset]. Dryad.


Objectives: Accurate subtype classification in primary aldosteronism (PA) is critical in assessing the optimal treatment options. This study aimed to evaluate the diagnostic accuracy of adrenal imaging for unilateral PA classification.

Methods: Systematic searches of PubMed, EMBASE, and the Cochrane databases were performed from January 1, 2000, to February 1, 2020, for all studies that used computed tomography (CT) or magnetic resonance imaging (MRI) in determining unilateral PA and validated the results against invasive adrenal vein sampling (AVS). Summary diagnostic accuracies were assessed using a bivariate random-effects model. Subgroup analyses, meta-regression and sensitivity analysis were performed to explore the possible sources of heterogeneity.

Results: A total of 25 studies, involving 4669 subjects, were identified. The overall analysis revealed a pooled sensitivity of 68% (95% confidence interval [CI]: 61 to 74) and specificity of 57% (95% CI: 50 to 65) for CT/MRI in identifying unilateral PA. Sensitivity was higher in the contrast-enhanced CT group than in the traditional CT group [77% (95% CI: 66 to 85) vs. 58% (95% CI: 50 to 66)]. Subgroup analysis stratified by screening test for PA showed that sensitivity was higher in the aldosterone-to-renin ratio (ARR) group than in the non-ARR group [78% (95% CI: 69 to 84) vs. 66% (95% CI: 58 to 72)]. Diagnostic accuracy in PA patients aged ≤40 years was reported in 4 studies; the overall sensitivity was 71% and the specificity was 79%. Meta-regression revealed a significant impact of sample size on sensitivity and of age and study quality on specificity.

Conclusion: CT/MRI does not have the sensitivity or specificity needed to identify unilateral PA correctly and is therefore not a reliable alternative to invasive AVS. Even in young patients (≤40 years), 21% of patients would have undergone unnecessary adrenalectomy based on imaging results alone.


Search Strategy

The study followed the guidelines specified in the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) [11]. We searched the PubMed, EMBASE, and Cochrane Library databases from January 1, 2000, to February 1, 2020, using the following terms in combination, both as MeSH or Emtree terms and as text words: "primary aldosteronism", "adrenal vein sampling" and "hyperaldosteronism". The electronic search strategy for PubMed is shown in supplementary Table S1. To reflect modern practice, we limited the search to studies published after January 1, 2000. Only articles published in English were included, and the reference lists of relevant studies were also searched. All studies were carefully examined to exclude overlapping or potentially duplicate data.

Eligibility Criteria

We included a study if: 1) it used CT or MRI as a diagnostic test for PA subtyping; 2) it used AVS as the reference standard, with successful AVS determined by the selectivity index (SI), defined as the adrenal/peripheral vein cortisol ratio, and unilateral PA determined by the lateralization index (LI), defined as the ratio of the aldosterone/cortisol ratio of the dominant adrenal gland to that of the non-dominant adrenal gland; and 3) absolute numbers of true-positive, true-negative, false-positive, and false-negative results were provided or could be derived. Identified studies had to be independent. In the case of multiple reports on the same population or subpopulation, the most recent or most comprehensive report was used.
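The two AVS indices defined above can be sketched in a few lines. This is a minimal illustration, not the procedure of any included study: the function and parameter names are hypothetical, the "dominant" side is assumed to be the one with the higher aldosterone/cortisol ratio, and the use of "greater than or equal to" at the cut-off is an assumption, since the text only names the cut-off values.

```python
def selectivity_index(adrenal_cortisol, peripheral_cortisol):
    """SI: adrenal vein cortisol divided by peripheral vein cortisol."""
    return adrenal_cortisol / peripheral_cortisol


def lateralization_index(aldo_left, cortisol_left, aldo_right, cortisol_right):
    """LI: aldosterone/cortisol ratio of the dominant side divided by
    that of the non-dominant side.

    Assumption: the dominant side is the one with the higher ratio.
    """
    left = aldo_left / cortisol_left
    right = aldo_right / cortisol_right
    dominant, non_dominant = max(left, right), min(left, right)
    return dominant / non_dominant


def is_unilateral(li, cutoff):
    """Classify as unilateral when LI reaches the chosen cut-off.

    Assumption: the comparison is inclusive (>=); the cut-off itself
    (for example 2 or 4) must be supplied by the caller.
    """
    return li >= cutoff
```

For example, with made-up hormone values, `lateralization_index(400, 50, 100, 50)` gives an LI of 4, which would count as unilateral at either cut-off used in the subgroup analyses (2 or 4).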

Data Extraction and Quality Assessment

Data extraction from the eligible studies was performed by 2 independent investigators (Z.Y.Q and W.P.J) using a standardized data extraction form. The form included the following characteristics of each study: first author's name and year of publication; study population characteristics, including sample size, geographical location, mean age and sex; diagnostic criteria characteristics, including screening test and confirmatory test for PA; AVS characteristics, including use of adrenocorticotropic hormone (ACTH) stimulation, SI and LI; and diagnostic test characteristics, including imaging methodology and whether contrast was administered. Differences between reviewers were resolved by discussion and consensus when necessary.

The methodological quality of the identified studies was assessed by 2 independent reviewers (Z.Y.Q and W.P.J) using the modified Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) criteria. If a study was judged as "low" on all domains relating to bias or applicability, it was considered a high-quality study. If a study was judged as "high" and/or "unclear" in more than 1 domain, it was considered a low-quality study. If a study was judged as "unclear" in 1 domain, it was considered an unclear-quality study [12]. Discrepancies were resolved by discussion and consensus.
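The three grading rules above can be written as a small decision function. This is a sketch under stated assumptions: the function name and the list-of-strings input are hypothetical, and the case of exactly one domain judged "high" is not covered by the rules as written, so the sketch returns `None` for it rather than guessing.

```python
def classify_quality(judgements):
    """Map per-domain QUADAS-2 judgements to an overall study grade.

    `judgements` is a list of "low", "high" or "unclear" strings,
    one per bias/applicability domain (hypothetical input format).
    """
    flagged = [j for j in judgements if j in ("high", "unclear")]
    if not flagged:
        return "high-quality"      # "low" on all domains
    if len(flagged) > 1:
        return "low-quality"       # "high" and/or "unclear" in more than 1 domain
    if flagged[0] == "unclear":
        return "unclear-quality"   # "unclear" in exactly 1 domain
    return None                    # exactly one "high": not specified in the text
```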

Statistical Analysis

Measures of diagnostic accuracy are reported as point estimates with 95% confidence intervals (CIs). Sensitivity, specificity, the positive likelihood ratio (+LR) and the negative likelihood ratio (-LR) were modelled based on the true-positive, true-negative, false-positive, and false-negative results for each study [13]. The ratio of +LR to -LR gives a single global accuracy measure, the diagnostic odds ratio (DOR). Summary sensitivity, specificity, +LRs, -LRs and DORs were assessed using a bivariate random-effects model. The approach assumes bivariate normal distributions for the logit transformations of sensitivity and specificity from the individual studies. These bivariate models can be analysed using linear mixed model techniques that are now widely available in statistical packages, such as Stata gllamm [14, 15]. A hierarchical summary receiver operating characteristic (ROC) curve analysis was performed, yielding point estimates for each study and pooled characteristics, including the 95% prediction region and the 95% confidence region.
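For a single study, the accuracy measures named above follow directly from the 2x2 table. The sketch below shows only these per-study definitions; it does not implement the bivariate random-effects pooling, and the function name and the counts in the usage note are illustrative assumptions, not data from the included studies.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Per-study accuracy measures from 2x2 counts.

    Raises ZeroDivisionError when a required denominator is zero;
    continuity corrections for empty cells are not shown here.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    # DOR is the ratio of +LR to -LR, algebraically (tp * tn) / (fp * fn).
    dor = positive_lr / negative_lr
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_lr": positive_lr,
        "negative_lr": negative_lr,
        "dor": dor,
    }
```

With made-up counts of 80 true positives, 40 false positives, 20 false negatives and 60 true negatives, this gives a sensitivity of 0.80, a specificity of 0.60, a +LR of 2, a -LR of about 0.33 and a DOR of 6.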

Sources of statistical heterogeneity were explored by subgroup analyses, sensitivity analysis and meta-regression analysis [16]. Heterogeneity was quantified with the I² statistic, interpreted as follows: <50%, low heterogeneity; 50% to 75%, moderate heterogeneity; and >75%, high heterogeneity.
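The interpretation bands above can be applied mechanically. In the sketch below, the function names are hypothetical, and computing I² from Cochran's Q and its degrees of freedom uses the standard Higgins and Thompson definition, which is assumed here rather than stated in the text.

```python
def i_squared(q, df):
    """I² as a percentage: 100 * (Q - df) / Q, truncated at 0.

    Assumption: the standard Higgins-Thompson definition based on
    Cochran's Q with df = number of studies - 1.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def heterogeneity_label(i2):
    """Bands from the text: <50% low, 50% to 75% moderate, >75% high."""
    if i2 < 50:
        return "low"
    if i2 <= 75:
        return "moderate"
    return "high"
```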

Several studies demonstrated that MRI has poorer resolution and slower acquisition than CT, with a risk of respiratory artefacts, and that MRI is inferior to adrenal CT in PA subtype evaluation [17-20]. Contrast materials can improve the visibility of adrenal structures imaged by CT and MRI scans and might have a positive effect on diagnostic accuracy [21]. Imaging method and contrast use were therefore treated as potential confounders and examined in subgroup analyses. Moreover, a large sample size may reflect experienced interventional radiologists and support the credibility of the included studies, so small sample size was treated as another potential confounder. The different diagnostic criteria for PA, the AVS procedure (with or without ACTH stimulation), different cut-offs for the LI criteria, and methodological quality might also affect diagnostic accuracy [20]. Accordingly, subgroup analyses were performed by the following factors: imaging methodology (CT or CT/MRI), contrast use, AVS procedure (with or without ACTH stimulation), cut-off value for the LI (2 or 4), diagnostic criteria for PA, sample size (divided at 100 subjects), and methodological quality (high-quality, low-quality and unclear-quality).

Potential publication bias was examined using the Deeks test [22]. The Cohen κ test was employed to assess the inter-rater reliability between the 2 observers for quality assessment. If there was no agreement, a third reviewer was involved to resolve the disagreement, and final decisions were determined by consensus. Statistical analyses were performed using Stata version 13.0 (StataCorp LP, TX, USA) and Review Manager version 5.3.
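As an illustration of the inter-rater statistic mentioned above, Cohen's κ compares the observed agreement between two raters with the agreement expected by chance. The sketch below is a plain unweighted κ for categorical judgements; the function name and the input format (two equal-length lists of labels) are assumptions, and the analysis itself was run in the statistical software named above, not with this code.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1.0 - expected)
```

With two raters who agree on 3 of 4 made-up judgements, for example `["low", "low", "high", "high"]` against `["low", "low", "high", "low"]`, the observed agreement is 0.75, the chance agreement is 0.5, and κ is 0.5.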


National Natural Science Foundation of China, Award: 81970262