Attempts to rate evidence on the basis of study design, rigor, and validity have evolved continually since Evidence-Based Medicine emerged in the 1990s. Multiple frameworks attempt to rate or grade evidence; several of them are summarized below.
I. Properly powered and conducted RCT; well-conducted systematic review or meta-analysis of homogeneous RCTs
II-1. Well-designed controlled trial without randomization
II-2. Well-designed cohort or case-control analysis study
II-3. Multiple time-series, with or without the intervention; results from uncontrolled studies that yield results of large magnitude
III. Opinions of respected authorities, based on clinical experience; descriptive studies or case reports; reports of expert committees
Describing the strength of a recommendation is an important part of communicating its importance to clinicians and other users. Although most of the grade definitions have evolved since the USPSTF's inception, none has changed more noticeably than the definition of a C recommendation, which has undergone three major revisions since 1998. Despite these revisions, the essence of the C recommendation has remained consistent: at the population level, the balance of benefits and harms is very close, and the magnitude of net benefit is small. Grade C recommendations are particularly sensitive to patient values and circumstances. Determining whether the service should be offered or provided to an individual patient will typically require an informed conversation between the clinician and patient.
GRADE is a transparent framework for developing and presenting summaries of evidence and provides a systematic approach for making clinical practice recommendations. GRADE has four levels of evidence – also known as certainty in evidence or quality of evidence: very low, low, moderate, and high (Table 1). Evidence from randomised controlled trials starts at high quality and, because of residual confounding, evidence that includes observational data starts at low quality.
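The GRADE logic above can be sketched as a small function. This is an illustrative sketch, not an official GRADE tool: the function names and the numeric up/down-rating interface are my own, but the four levels, the starting points (randomised trials start at high, observational evidence at low), and the idea that concerns such as risk of bias or imprecision rate certainty down are taken from the GRADE framework itself.

```python
# Illustrative sketch of GRADE certainty levels (not an official tool).
# Evidence from randomised trials starts at "high"; evidence including
# observational data starts at "low". Concerns rate certainty down;
# special strengths (e.g. a very large effect) can rate it up.
LEVELS = ["very low", "low", "moderate", "high"]

def start_level(study_design: str) -> str:
    """Return the GRADE starting certainty for a body of evidence."""
    return "high" if study_design == "randomised trial" else "low"

def rate(study_design: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Move up or down the four GRADE levels, clamped to the scale."""
    i = LEVELS.index(start_level(study_design))
    i = max(0, min(len(LEVELS) - 1, i - downgrades + upgrades))
    return LEVELS[i]
```

For example, a body of randomised trials with two serious concerns would end up at "low", while observational evidence rated up once would reach "moderate".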
The CEBM “levels of evidence” were first produced in 1998 for Evidence-Based On Call to make the process of finding appropriate evidence feasible and its results explicit. As currently stated, Level 1 evidence for treatment benefits can be interpreted in two different ways. The intended interpretation is: “either n-of-1 randomized trials or systematic reviews of randomized trials”. The wrong interpretation is: “either systematic reviews of randomized trials or systematic reviews of n-of-1 trials”.
More about levels of evidence from CEBM can be found at: https://www.cebm.net/2011/06/explanation-2011-ocebm-levels-evidence/
In the UpToDate grading system, the strength of any recommendation depends on two factors: the tradeoff between benefits, risks, and burden, and the quality of the evidence regarding the treatment effect. The tradeoff between benefits, risks, and burden is graded in two categories: 1, in which the tradeoff is clear enough that most patients, despite differences in values, would make the same choice, leading to a strong recommendation; and 2, in which the tradeoff is less clear, and individual patients' values will likely lead to different choices, leading to a weak recommendation. Methodological quality is graded in three categories: randomized trials that show consistent results, or observational studies with very strong treatment effects; randomized trials with limitations, or observational studies with exceptional strengths; and observational studies without exceptional strengths, or randomized trials with major weaknesses.
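The two-factor structure of this system can be sketched in a few lines. This is a minimal illustration, assuming the conventional A/B/C letter labels for the three methodological-quality categories (the labels are my assumption; the passage above describes the categories without naming them).

```python
# Illustrative sketch of a two-factor UpToDate-style grade:
# strength of recommendation (1 = strong, 2 = weak) combined with
# methodological quality. The A/B/C labels are an assumption here,
# standing in for the three quality categories described in the text.
def grade(tradeoff_clear: bool, quality_label: str) -> str:
    """Combine the benefit/risk tradeoff and evidence quality into one grade."""
    strength = "1" if tradeoff_clear else "2"  # 1 = strong, 2 = weak
    return strength + quality_label
```

Under this sketch, a clear tradeoff supported by consistent randomized trials would yield "1A", while an unclear tradeoff supported only by plain observational studies would yield "2C".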
What do the ratings mean?
All of the resources in this tool are based on intervention evaluations or studies that have evidence of effectiveness, feasibility, reach, sustainability, and transferability. The ratings indicate how strong the evidence is.
4 out of 4
These resources are based on rigorous evidence. Resources with this rating include systematic reviews of published intervention evaluations or studies that have evidence of effectiveness, feasibility, reach, sustainability, and transferability.
3 out of 4
These resources are based on strong evidence. Resources with this rating include non-systematic reviews of published intervention evaluations or studies that have evidence of effectiveness, feasibility, reach, sustainability, and transferability.
2 out of 4
These resources are based on moderate evidence. Resources with this rating include intervention evaluations or studies with peer review that have evidence of effectiveness, feasibility, reach, sustainability, and transferability.
1 out of 4
These resources are based on weak evidence. Resources with this rating include intervention evaluations or studies without peer review that have evidence of effectiveness, feasibility, reach, sustainability, and transferability.
What are the differences between the ratings?
4 vs. 3: A rating of 4 requires a formal, comprehensive, and systematic review of all relevant literature, whereas a rating of 3 requires only an informal, non-comprehensive, non-systematic review of some but not all relevant literature.
3 vs. 2: A rating of 3 requires a review of multiple evaluations or studies, whereas a rating of 2 requires only one evaluation or study.
2 vs. 1: A rating of 2 requires peer review, whereas a rating of 1 does not.
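Taken together, the three pairwise distinctions above form a simple decision ladder. The following is an illustrative sketch only; the parameter names are my own and do not come from the tool itself.

```python
# Illustrative sketch (parameter names are my own, not the tool's):
# the three pairwise distinctions form a decision ladder that rates
# a resource's evidence base from 1 to 4.
def rating(systematic_review: bool, reviews_multiple_studies: bool,
           peer_reviewed: bool) -> int:
    """Rate the strength of evidence behind a resource on a 1-4 scale."""
    if systematic_review:         # formal, comprehensive, systematic review
        return 4
    if reviews_multiple_studies:  # non-systematic review of multiple studies
        return 3
    if peer_reviewed:             # a single peer-reviewed evaluation or study
        return 2
    return 1                      # no peer review
```

For example, a single peer-reviewed evaluation rates 2, and a systematic review of published evaluations rates 4.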