Rating the quality of evidence and making recommendations

Introduction

Urologists require clinical expertise to integrate a patient’s circumstances and values with the best available evidence to initiate decision making related to the medical and surgical treatment of their patients. Using “best evidence” implies that a hierarchy of evidence exists and that clinicians are more confident about decisions based on evidence that offers greater protection against bias and random error [1].

Protection against bias and greater confidence in decisions arise from high-quality research evidence. We can consider quality of evidence as a continuum that reflects the confidence in estimates of the magnitude of effect of alternative patient management interventions on the outcomes of interest. However, gradations of this continuum are useful for communication with practicing clinicians, providing useful summaries of what is known for specific clinical questions to aid interpretation of clinical research.

Aiding interpretation becomes increasingly important considering that much of clinicians’ practice is guided by recommendations from experts summarized in clinical practice guidelines and textbooks such as this new book, Evidence-Based Urology. To integrate recommendations with their own clinical judgment, clinicians need to understand the basis for the clinical recommendations that experts offer them. A systematic approach to grading the quality of evidence and the resulting recommendations for clinicians thus represents an important step in providing evidence-based recommendations.

In this chapter we will describe the key features of “quality of evidence” and how we asked the authors of individual chapters to evaluate the available evidence and formulate their recommendations using a pragmatic approach that, out of necessity, falls short of the full development of evidence-based guidelines. The approach that most authors used was based on the work of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group [2–6]. Over 20 international organizations, including the World Health Organization, the American College of Physicians, the American College of Chest Physicians, the American Thoracic Society, the European Respiratory Society, UpToDate® and the Cochrane Collaboration, are now using the GRADE system.

Question formulation and recommendations in this book

The editors asked authors to formulate clinical questions that are particularly relevant to urology practice using the framework of identifying the patient population(s), the intervention(s) examined (or exposure), alternative interventions (comparison) and the outcomes of interest [7]. Authors were further asked to identify relevant studies related to these questions or sets of questions. For example, in Chapter 30, the authors address the question of whether asymptomatic patients with metastatic kidney cancer benefit from a cytoreductive debulking radical nephrectomy with regard to overall survival.

The authors were further asked to base the answers to their questions on evaluations of the scientific literature, in particular focusing on recent, methodologically rigorous systematic reviews of randomized controlled trials (RCTs) [8]. If authors could not identify a recent and rigorous systematic review, they were asked to search for RCTs and summarize the findings of these studies to answer their clinical questions. Only if RCTs did not answer the specific question (or did not provide information on a particular outcome) were observational studies included. Thus, the search strategy we suggested focused on relevant systematic reviews or meta-analyses (a pooled statistical summary of relevant studies), followed by searches for randomized trials and observational studies if systematic reviews did not exist or did not include sufficient information to answer the questions posed.
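The stepwise search described above can be illustrated with a minimal sketch. It assumes that the studies have already been screened for relevance and rigor, and it simplifies the per-outcome judgment to a single question; the function name, the design labels and the study names are hypothetical and are not part of the instructions given to authors.

```python
from typing import Dict, List, Optional, Tuple

# Study designs in the order authors were asked to consult them.
SEARCH_ORDER = ("systematic_review", "rct", "observational")


def select_evidence(found: Dict[str, List[str]]) -> Tuple[Optional[str], List[str]]:
    """Return the highest-priority study design that has usable studies.

    `found` maps a design label to the studies judged sufficient to answer
    the question (hypothetical input; screening is assumed to be done).
    """
    for design in SEARCH_ORDER:
        studies = found.get(design, [])
        if studies:
            return design, studies
    # No usable evidence of any design was identified.
    return None, []


# Example: no rigorous systematic review was found, so RCTs are used
# and the observational study is not needed.
design, studies = select_evidence(
    {"systematic_review": [], "rct": ["trial A"], "observational": ["cohort B"]}
)
```

In practice the fallback applies outcome by outcome, so observational studies may supplement RCTs for outcomes that the trials did not report.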

Evaluating the quality of evidence and making recommendations

Many authors applied the GRADE system for evaluating the quality of evidence and presenting their recommendations. This approach begins with an initial assessment of the quality of evidence, followed by judgments about the direction (for or against) and strength of recommendations. Since clinicians are most interested in the best course of action, the GRADE system usually presents the strength of the recommendation first as strong (Grade 1) or weak (Grade 2), followed by the quality of the evidence as high (A), moderate (B), low (C) and very low (D). Authors of this book adopted a version of the grading system that combines the low and very low categories. Furthermore, the editors asked authors to phrase recommendations in a way that would express their strength. For strong (Grade 1) recommendations, many authors chose the words: “We recommend . . . (for or against a particular course of action).” For weak (Grade 2) recommendations, they used: “We suggest . . . (using or not using)” what they believed to be an optimal management approach. They then indicated the methodological quality of the supporting evidence, labeling it as A (high quality), B (moderate quality) or C (low or very low quality). Thus, recommendations could fall into one of the following six categories: 1A, 1B, 1C, 2A, 2B, and 2C (Table 4.1).

Table 4.1 Grading recommendations

Grade of recommendation	Balance of desirable versus undesirable effects	Methodological quality of supporting evidence
Strong recommendation, high-quality evidence 1A	Desirable effects clearly outweigh undesirable effects, or vice versa	Consistent evidence from randomized controlled trials without important limitations or exceptionally strong evidence from observational studies
Strong recommendation, moderate-quality evidence 1B	Desirable effects clearly outweigh undesirable effects, or vice versa	Evidence from randomized controlled trials with important limitations (inconsistent results, methodological flaws, indirect or imprecise), or very strong evidence from observational studies
Strong recommendation, low- or very low-quality evidence 1C	Desirable effects clearly outweigh undesirable effects, or vice versa	Evidence for at least one critical outcome from observational studies, case series, or from randomized controlled trials with serious flaws or indirect evidence
Weak recommendation, high-quality evidence 2A	Desirable effects closely balanced with undesirable effects	Consistent evidence from randomized controlled trials without important limitations or exceptionally strong evidence from observational studies
Weak recommendation, moderate-quality evidence 2B	Desirable effects closely balanced with undesirable effects	Evidence from randomized controlled trials with important limitations (inconsistent results, methodological flaws, indirect or imprecise), or very strong evidence from observational studies
Weak recommendation, low- or very low-quality evidence 2C	Desirable effects closely balanced with undesirable effects	Evidence for at least one critical outcome from observational studies, case series, or from randomized controlled trials with serious flaws or indirect evidence

The GRADE system suggests the use of the wording “we recommend” for strong (Grade 1) recommendations and “we suggest” for weak (Grade 2) recommendations. The categories of low and very low quality that GRADE includes in its four-category system are collapsed here into a single category, resulting in three categories of quality of evidence.
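The labeling convention summarized in Table 4.1 can be expressed as a simple lookup. The following is a minimal sketch of the notation only; the function and variable names are hypothetical, and the judgment about strength and quality must still come from the authors' assessment of the evidence.

```python
# Strength of recommendation: grade number and suggested opening wording.
STRENGTH = {
    "strong": ("1", "We recommend"),
    "weak": ("2", "We suggest"),
}

# Quality of evidence: low and very low are collapsed into a single
# category C, as in the version of the system adopted for this book.
QUALITY = {
    "high": "A",
    "moderate": "B",
    "low": "C",
    "very low": "C",
}


def grade_label(strength: str, quality: str) -> str:
    """Combine strength and quality into a label such as '1B'."""
    number, _ = STRENGTH[strength]
    return number + QUALITY[quality]


def opening_phrase(strength: str) -> str:
    """Return the wording that signals the strength of a recommendation."""
    return STRENGTH[strength][1]


# All labels that can arise from the combinations above.
ALL_LABELS = sorted({grade_label(s, q) for s in STRENGTH for q in QUALITY})
```

Because very low and low quality share the letter C, the eight possible combinations of strength and quality reduce to the six categories listed in the table.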

Strength of the recommendation

In determining the strength of recommendations, the GRADE system focuses on the degree of confidence in the balance between desirable effects of an intervention on the one hand and undesirable effects on the other (see Table 4.1). Desirable effects or benefits include favorable health outcomes, decreased burden of treatment, and decreased resource use (usually measured as costs). Undesirable effects or downsides include rare major adverse events, common minor side effects, greater burden of treatment, and more resource consumption. We define burdens as the demands of adhering to a recommendation that patients or caregivers (e.g. family) may dislike, such as taking medication, need for inconvenient laboratory monitoring, repeated imaging studies or office visits. If desirable effects of an intervention outweigh undesirable effects, we recommend that clinicians offer the intervention to typical patients. The balance between desirable and undesirable effects, and the uncertainty associated with that balance, will determine the strength of recommendations.

Table 4.2 describes the factors GRADE relies on to determine the strength of recommendation. When chapter authors were confident that the desirable effects of adherence to a recommendation outweighed the undesirable effects or vice versa, they offered a strong recommendation. Such confidence usually requires evidence of high or moderate quality that provides precise estimates of both benefits and downsides, and their clear balance in favor of, or against, one of the management options. The authors offered a weak recommendation when low-quality evidence resulted in appreciable uncertainty about the magnitude of benefits and/or downsides or the benefits and downsides were finely balanced. We will describe the factors influencing the quality of evidence in subsequent sections of this chapter. Other reasons for not being confident in the balance between desirable and undesirable effects include: imprecise estimates of benefits or harms, uncertainty or variation in how different individuals value particular outcomes and thus their preferences regarding management alternatives, small benefits, or situations when benefits may not be worth the costs (including the costs of implementing the recommendation). Although the balance between desirable and undesirable effects, and thus the strength of a recommendation, is a continuum, the GRADE system classifies recommendations for or against an intervention into two categories: strong or weak. This is inevitably arbitrary. The GRADE Working Group believes that the simplicity and behavioral implications of this explicit grading outweigh the disadvantages.

Table 4.2 Determinants of strength of recommendation

Factors that influence the strength of a recommendation	Comment
Balance between desirable and undesirable effects	A strong recommendation is more likely as the difference between the desirable and undesirable consequences becomes larger. A weak recommendation is more likely as the net benefit becomes smaller and the certainty around that net benefit decreases
Quality of the evidence	A strong recommendation becomes more likely with higher quality of evidence
Values and preferences	A strong recommendation is more likely as the variability of or uncertainty about patient values and preferences decreases. A weak recommendation is more likely as the variability or uncertainty about patient values and preferences increases
Costs (resource allocation)	A weak recommendation is more likely as the incremental costs of an intervention (more resources consumed) increase

Clinical decision making in the setting of weak recommendations remains a challenge. In such settings, urologists should have more detailed conversations with their patients than for strong recommendations, to explore the individual patient’s values and to ensure that the ultimate decision is consistent with these. For highly motivated patients, decision aids that present both the benefits and downsides of therapy are likely to improve understanding and reduce decision-making conflict, and may promote a decision most consistent with the patient’s underlying values and preferences [9]. By contrast, clinicians can interpret strong recommendations as providing, for typical patients, a mandate to give a simple explanation of the intervention along with a suggestion that the patient will benefit from its use. Further elaboration will seldom be necessary. On the other hand, when clinicians face weak recommendations, they should more carefully consider the benefits, harms and burdens in the context of the patient before them, and ensure that the treatment decision is consistent with the patient’s values and preferences. These situations arise when appreciable numbers of patients, because of variability in values and preferences, will make different choices.

As benefits and risks become more finely balanced or more uncertain, decisions to administer an effective therapy also become more cost sensitive. We have not asked authors to explicitly include cost in the recommendations, but cost will bear on the implementation of many recommendations in clinical practice [10].
