University Library, University of Illinois at Urbana-Champaign

Evidence-Based Medicine (EBM)

Tutorials & Handouts

Tools for Critical Appraisal

Studying a Study Text

Why Appraise?

After locating evidence and determining how useful the information will be for the patient's unique situation, you will need to determine how best to apply the evidence to practice.


  • Setting - How similar the study setting is to your own
  • Population - How similar your patient is to those studied
  • Providers - Whether studied outcomes are realistic in your clinical setting
  • Study Design - The quality of the overall methodology
  • Study Synthesis - Whether potential benefit outweighs potential risk
  • How the intervention relates to the patient's values and experiences
  • The available alternatives
  • What the outcome will be if no intervention is applied


  • Were there enough subjects in the study to establish that the findings did not occur by chance? 
  • Were subjects randomly allocated? Were the groups comparable? If not, could this have introduced bias?
  • Are the measurements/tools validated by other studies?
  • Was the study "blinded"?
  • Could there be confounding factors?
  • Is the study reproducible?
  • What were the outcome measures? 
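The first question in this checklist (whether a study enrolled enough subjects) can be made concrete with a rough sample-size calculation. The sketch below assumes a two-arm trial with a binary outcome and uses the standard normal-approximation formula for comparing two proportions. The function name, the default 5% significance level, and the default 80% power are illustrative assumptions, not values taken from this guide.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects needed per arm to detect p_control vs p_treated.

    Normal-approximation formula for a two-sided test of two proportions
    (unpooled variance). Illustrative only; real trials often adjust for
    dropout, clustering, or unequal allocation.
    """
    if p_control == p_treated:
        raise ValueError("the two event rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical event rates: 30% in the control arm, 20% in the treated arm.
print(sample_size_per_group(0.30, 0.20))
```

Comparing the result with the number of subjects actually enrolled gives a quick sense of whether a non-significant finding might simply reflect an underpowered study.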


  • Is the evidence relevant to your question?
  • Is the methods section clear?
  • Is the method or study design appropriate for the question?
  • In a systematic review, are individual studies considered for validity?
  • Are there specific inclusion/exclusion criteria?
  • Are studies consistent?
  • How precise are results? 
  • Are results statistically significant?

USPSTF Hierarchy of Research Design (Benefits and Harms)

I. Properly powered and conducted RCT; well-conducted systematic review or meta-analysis of homogeneous RCTs
II-1. Well-designed controlled trial without randomization
II-2. Well-designed cohort or case-control analysis study
II-3. Multiple time-series, with or without the intervention; results from uncontrolled studies that yield results of large magnitude
III. Opinions of respected authorities, based on clinical experience; descriptive studies or case reports; reports of expert committees

GRADE - What makes evidence more/less certain?

What makes evidence less certain?

For each of risk of bias, imprecision, inconsistency, indirectness, and publication bias, authors have the option of decreasing their level of certainty by one or two levels (e.g., from high to moderate).
The GRADE Domains for rating down

1. Risk of bias
Bias occurs when the results of a study do not represent the truth because of inherent limitations in design or conduct of a study.[8] In practice, it is difficult to know to what degree potential biases influence the results and therefore certainty is lower in the estimated effect if the studies informing the estimated effect could be biased.

There are several tools available to rate the risk of bias in individual randomised trials[9] and observational studies.[10, 11]

GRADE is used to rate the body of evidence at the outcome level rather than the study level. Authors must therefore make a judgement about whether the risk of bias in the individual studies is sufficiently large that their confidence in the estimated treatment effect is lower. Key considerations for risk of bias, and the process for moving from risk of bias at the study level to risk of bias for a body of evidence, are described in detail in the GRADE guidelines series #4: Rating the quality of evidence – study limitations (risk of bias).[8]

2. Imprecision
The GRADE approach to rating imprecision focuses on the 95% confidence interval around the best estimate of the absolute effect.[12] Certainty is lower if the clinical decision is likely to be different depending on whether the true effect lies at the upper or the lower end of the confidence interval. Authors may also choose to rate down for imprecision if the effect estimate comes from only one or two small studies or if there were few events.[13] A detailed description of imprecision is available in the GRADE guidelines series #6: Rating the quality of evidence – imprecision.[12]
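The imprecision check described above can be sketched as a small calculation: build a 95% confidence interval around an absolute effect (here a risk difference) and see whether a decision threshold falls inside it. This is a minimal sketch that assumes a simple Wald interval. The function names, the trial counts, and the threshold value are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_t, n_t, events_c, n_c, level=0.95):
    """Return (risk difference, lower, upper) using a Wald interval."""
    p_t = events_t / n_t
    p_c = events_c / n_c
    rd = p_t - p_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rd, rd - z * se, rd + z * se


def crosses_threshold(lower, upper, threshold):
    """True if the decision threshold lies strictly inside the interval,
    meaning the two ends of the interval would support different decisions."""
    return lower < threshold < upper


# Hypothetical data: 30/200 events with treatment, 50/200 with control.
rd, lower, upper = risk_difference_ci(30, 200, 50, 200)
# Hypothetical threshold: treat only if risk falls by at least 5 points.
print(round(rd, 3), round(lower, 3), round(upper, 3))
print(crosses_threshold(lower, upper, -0.05))
```

When the threshold sits inside the interval, the data are compatible with both an effect worth acting on and one that is not, which is the situation in which rating down for imprecision is considered.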

3. Inconsistency
Certainty in a body of evidence is highest when there are several studies that show consistent effects. When considering whether or not certainty should be rated down for inconsistency, authors should inspect the similarity of point estimates and the overlap of their confidence intervals, as well as statistical criteria for heterogeneity (e.g., the I² statistic and the chi-squared test).[14] A full discussion of inconsistency is available in the GRADE guidelines series #7: Rating the quality of evidence – inconsistency.[14]
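The heterogeneity statistics mentioned above can be computed directly from study-level results. The sketch below assumes each study reports an effect estimate and its variance on a common scale (for example, log odds ratios), and uses inverse-variance weights to obtain Cochran's Q (the chi-squared heterogeneity statistic) and I². The function name and the example numbers are hypothetical.

```python
def cochran_q_and_i2(effects, variances):
    """Return (Q, degrees of freedom, I² in percent) for a set of studies.

    Q is the weighted sum of squared deviations from the fixed-effect pooled
    estimate; I² = max(0, (Q - df) / Q) * 100 describes the share of
    variability attributed to heterogeneity rather than chance.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching effects and variances for >= 2 studies")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical log odds ratios and their variances from four studies.
q, df, i2 = cochran_q_and_i2([-0.40, -0.10, -0.55, 0.05],
                             [0.04, 0.03, 0.06, 0.05])
print(round(q, 2), df, round(i2, 1))
```

Q is referred to a chi-squared distribution with df degrees of freedom to obtain a p-value. That step is left out here to keep the sketch within the standard library.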

4. Indirectness
Evidence is most certain when studies directly compare the interventions of interest in the population of interest, and report the outcome(s) critical for decision-making. Certainty can be rated down if the patients studied are different from those for whom the recommendation applies. Indirectness can also occur when the interventions studied differ from those that will be delivered in practice (for example, a study of a new surgical procedure in a highly specialized centre only indirectly applies to centres with less experience). Indirectness also occurs when the outcome studied is a surrogate for a different outcome – typically one that is more important to patients. A full discussion of indirectness is available in the GRADE guidelines series #8: Rating the quality of evidence – indirectness.[15]

5. Publication bias
Publication bias is perhaps the most vexing of the GRADE domains, because it requires making inferences about missing evidence. Several statistical and visual methods are helpful in detecting publication bias, despite having serious limitations. Publication bias is more common with observational data and when most of the published studies are funded by industry. A full discussion of publication bias is available in the GRADE guidelines series #5: rating the quality of evidence – publication bias.[16]

What increases confidence in the evidence?

In rare circumstances, certainty in the evidence can be rated up (see table 2). First, when there is a very large magnitude of effect, we might be more certain that there is at least a small effect. Second, certainty can be rated up when there is a clear dose-response gradient. Third, it can be rated up when residual confounding is likely to decrease rather than increase the magnitude of effect. A more complete discussion of reasons to rate up is available in the GRADE guidelines series #9: Rating up the quality of evidence.
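The bookkeeping behind rating down and rating up can be illustrated with a short sketch. It assumes the four GRADE certainty levels (high, moderate, low, very low) and takes the starting level as an explicit input. The function name, the domain keys, and the step sizes permitted for rating up are assumptions made for illustration. GRADE ratings are judgements about a body of evidence, so this arithmetic only records decisions that reviewers have already made.

```python
LEVELS = ["very low", "low", "moderate", "high"]

DOWN_DOMAINS = {"risk_of_bias", "imprecision", "inconsistency",
                "indirectness", "publication_bias"}
UP_DOMAINS = {"large_effect", "dose_response", "residual_confounding"}


def grade_certainty(start, downgrades=None, upgrades=None):
    """Apply per-domain rating decisions to a starting certainty level.

    downgrades / upgrades map a domain name to the number of levels (0-2)
    the reviewers chose to move. The result is clamped to the scale.
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    for name, allowed, given in (("downgrade", DOWN_DOMAINS, downgrades),
                                 ("upgrade", UP_DOMAINS, upgrades)):
        for domain, steps in given.items():
            if domain not in allowed:
                raise ValueError(f"unknown {name} domain: {domain}")
            if steps not in (0, 1, 2):
                raise ValueError(f"{domain}: steps must be 0, 1 or 2")
    index = (LEVELS.index(start)
             - sum(downgrades.values())
             + sum(upgrades.values()))
    index = max(0, min(len(LEVELS) - 1, index))  # stay within the scale
    return LEVELS[index]


# Hypothetical judgement: serious risk of bias, nothing else of concern.
print(grade_certainty("high", downgrades={"risk_of_bias": 1}))
```

Keeping the per-domain decisions in a dictionary mirrors how they are usually reported alongside the final rating, so the reasoning behind a given level remains visible.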

GRADE certainty table