Gastroparesis is a symptomatic condition of delayed gastric emptying with no mechanical obstruction. There are several etiologies of gastroparesis, including diabetic gastroparesis and postsurgical gastroparesis. In many patients, a cause cannot be found, and the condition is termed idiopathic gastroparesis. In some of these patients, a viral etiology may be suspected due to a sudden onset of symptoms associated with a viral-like prodrome.
A variety of symptoms are reported by patients with gastroparesis. These can include nausea, vomiting, early satiety, postprandial fullness, bloating, loss of appetite, abdominal distension, and abdominal pain. Patients may experience any combination of symptoms with varying degrees of severity. The symptoms are often chronic; however, patients may also have periodic exacerbations of their symptoms. These symptoms may impair physical, emotional and social functioning and well-being, and reduce the patient’s health-related quality of life (HRQL). Many patients experience weight loss due to their symptoms.
The diagnosis of gastroparesis is made by demonstrating delayed gastric emptying in a symptomatic patient without evidence of obstruction. Delayed gastric emptying is most commonly assessed using a validated measurement of gastric emptying. At present, the best validated and approved method of measurement is scintigraphy of the solid phase of a meal. Other ways to assess gastric emptying include wireless motility capsule testing and breath tests using stable C-13 isotopes. Absence of obstruction is most commonly determined by upper endoscopy. An alternative test is an upper gastrointestinal radiographic series, which can also assess the small intestine.
There is a need for new safe and effective treatments for gastroparesis with favorable benefit-risk profiles. Pharmacologic treatment of gastroparesis typically involves two classes of agents: prokinetic agents and antiemetic agents. Metoclopramide, a dopamine type 2 receptor antagonist, has both prokinetic and antiemetic properties and is the only drug currently approved by the FDA for gastroparesis, specifically for diabetic gastroparesis for up to 12 weeks of treatment.
Understanding the relevant symptoms of gastroparesis is important in treating patients with this disorder. In gastroparesis, the symptom experience and severity are obtained from the patient. Consequently, patient-reported symptom scales that capture overall gastroparesis severity are necessary for evaluating treatments for gastroparesis. The 2015 draft FDA guidance notes that the most frequently reported symptoms associated with gastroparesis are nausea (92–96%), vomiting (68–88%), post-prandial fullness (54–77%), early satiety (42–86%), and upper abdominal pain (36–85%). A well-defined patient-reported outcome (PRO) instrument that measures clinically important signs and symptoms of gastroparesis would be a useful assessment tool for clinical trials to support labeling claims for treatment of gastroparesis.
Patient-reported outcomes in gastroparesis clinical studies
Over the past 30 years, a number of different PRO measures have been used to assess symptom severity and frequency, and the impact of symptom severity on patient physical, emotional and social functioning and well-being, or HRQL. Various symptom measures have been included in randomized clinical trials and other clinical studies (see Table 33.1). The most frequently utilized symptom scales were derived from the Gastroparesis Cardinal Symptom Index (GCSI). Several clinical studies have included either the original GCSI or the more recently developed GCSI Daily Diary (GCSI-DD). Often the GCSI is administered as part of the complete Patient Assessment of Upper Gastrointestinal Disorders Symptoms (PAGI-SYM). The current version of the GCSI-DD, the ANMS GCSI-DD, contains symptom severity items for nausea, early satiety, post-prandial fullness and upper abdominal pain, and a vomiting frequency item.
|Symptom measure|
|---|
|VAS Nausea Severity|
|DG Symptom Severity (DGSSD; NRS)|
|Gastrointestinal Symptom Rating Scale|
|Gastrointestinal Symptom Assessment|
|Overall Treatment Effect|
|Global Relief of Symptoms|
|Patient-Reported Symptom Severity/Frequency (TSS)|
|Global Assessment of Symptoms|
Many studies focus on investigator-developed symptom scales or a subset of gastroparesis-related symptoms, depending on the treatment being evaluated or the objectives of the clinical study. Several studies focus on only nausea and vomiting. A number of studies include global assessments of symptom severity, change in symptom severity, or relief of gastroparesis-related symptoms. More recently, a new diabetic gastroparesis-related scale, the Diabetic Gastroparesis Symptom Severity Diary (DGSSD), was developed. The DGSSD was developed as a daily diary and contains symptom severity items on nausea, abdominal pain, bloating, post-prandial fullness, and early satiety, and a vomiting frequency item. Evidence supporting the content validity, reliability, and validity of the DGSSD was recently published. Camilleri et al. reported on a 12-week phase 2b randomized clinical trial comparing relamorelin versus placebo in patients with diabetic gastroparesis. No treatment differences were observed between active and placebo treatment based on vomiting frequency and a 4-item composite score (nausea, abdominal pain, post-prandial fullness, bloating severity).
Previous clinical trials and other clinical studies have incorporated other PRO measures to characterize the impact of gastroparesis-related symptoms on patient physical, social and emotional functioning and well-being (i.e., HRQL). For example, the NIDDK Gastroparesis Clinical Research Consortium (GPCRC) includes a very comprehensive battery of HRQL and emotional well-being instruments, including the Patient Assessment of Upper Gastrointestinal Disorders-Quality of Life (PAGI-QOL) Scale, SF-36 Health Survey, Brief Pain Inventory, Beck Depression Inventory, State-Trait Anxiety Scale, Patient Health Questionnaire-15, and several patient-completed global rating scales. Other researchers have incorporated the SF-36 Health Survey, a generic measure of functioning and well-being, into gastroparesis clinical studies.
Next, we discuss PRO instrument development and regulatory issues. We then illustrate the development of a gastroparesis-related PRO measure using examples from the ANMS GCSI-DD, which is undergoing qualification evaluation by the FDA as a primary or key secondary endpoint for clinical trials in diabetic and idiopathic gastroparesis. Currently, the ANMS GCSI-DD is the gastroparesis symptom endpoint with the greatest supportive evidence available on content validity and psychometric characteristics (i.e., reliability, validity, responsiveness). The gastroparesis-related symptoms included in the ANMS GCSI-DD are also consistent with the most recent draft FDA guidance on gastroparesis-related endpoints. The conceptual framework for the ANMS GCSI-DD core composite is summarized in Fig. 33.1 and for the total score in Fig. 33.2.
Developing gastroparesis-related patient-reported outcomes
Developing symptom assessments or other measures of the impact of gastroparesis-related symptoms on physical, emotional and social functioning and well-being (i.e., HRQL) is no different than PRO development for other disease conditions. A systematic program of qualitative and quantitative research is required, in which the instrument developers proceed through a standardized series of activities that eventually results in a measure with solid evidence of content validity and evidence supporting the measurement qualities (i.e., reliability, validity, responsiveness) of the final instrument. In addition, guidelines for determining the clinical significance of observed scores for between-group comparisons and within-patient changes are also needed. The development of a new PRO instrument is accomplished through a series of standardized stages, proceeding from background research (i.e., literature reviews, clinician interviews) through qualitative studies (i.e., concept elicitation studies), instrument construction, and cognitive debriefing studies to the conduct of psychometric studies.
As an initial stage, medical literature reviews are conducted to provide background information to develop clinician interview guides and to gain insight into important symptoms and concepts associated with the disorder, in this case gastroparesis. Next, multiple clinician interviews are conducted to get further information about diagnosis, presenting symptoms and the course of symptoms in gastroparesis. The information from the literature and the clinician interviews is then used to develop a preliminary conceptual framework for the targeted PRO (e.g., physical functioning, emotional well-being). This conceptual framework is then used to develop patient interview guides to collect qualitative data on the patient experience with gastroparesis symptoms or other relevant HRQL-related concepts. Concept elicitation interviews and focus groups are then conducted in the targeted patient population until no new information is identified, that is, until saturation of concepts is reached. The textual data are coded and summarized to determine the relevant concepts, for example, relevant and core symptoms.
The draft PRO instrument is developed to measure the core concepts, with instructions on the recall period (e.g., past 24 hours, past 7 days) and how to respond to the items, the various item stems, and the item response scales. There may be multiple iterations of review and development before the final draft is complete. The draft measure is administered to a sample of the target population in cognitive debriefing interviews to ensure that the patients completely understand the questionnaire instructions, recall period, item content, and response scales. This activity might also take several rounds of review and revision before the PRO instrument is ready for field testing. Content validity is demonstrated on the basis of these qualitative results.
Finally, one or more studies need to be designed and conducted to evaluate the measurement characteristics (i.e., reliability, validity, responsiveness to change). These psychometric studies may be stand-alone observational studies or may be psychometric analyses of clinical and PRO data collected in phase II or III clinical trials. The psychometric study will be developed to evaluate internal consistency reliability (i.e., homogeneity of the item content) and test-retest reliability (i.e., score stability when clinical status is unchanged). Next, construct and concurrent validity are evaluated by demonstrating significant correlations with other measures of the same or related concepts. For example, we might examine the correlations between the ANMS GCSI-DD and the PAGI-SYM scores. Known groups validity is also evaluated to demonstrate that the PRO scores vary significantly in groups expected to vary in ‘known’ ways. For example, the mean scores on the PRO might be examined by clinician or patient rating of disease severity, where groups rated as mild, moderate or severe are compared. Finally, for clinical trial endpoints it is essential to determine whether the PRO measure is sensitive to changes in clinical status. Changes in the PRO scores over time are compared between groups rated as improving, remaining stable or worsening. Clinician-rated and/or patient-rated changes in disease status, or in static severity scores, can be used for the evaluation of responsiveness. Sometimes clinical measures are available, such as gastric emptying tests, although gastroparesis-related symptoms and gastric emptying test results are often not well correlated.
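As an illustration of the internal consistency step, the following minimal sketch computes Cronbach's alpha, one commonly used internal consistency statistic. The chapter does not name the statistic used for any particular instrument, so the choice of alpha, the function name `cronbach_alpha`, and the response data are assumptions made for this example only.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    item_scores: list of respondents, each a list of k item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every respondent must answer all items")

    # Variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*item_scores)]
    # Variance of the summed score across respondents.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total score has zero variance")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five respondents rating four severity items (0-4).
hypothetical_responses = [
    [0, 1, 0, 1],
    [2, 2, 1, 2],
    [3, 4, 3, 3],
    [1, 1, 2, 1],
    [4, 3, 4, 4],
]
print(round(cronbach_alpha(hypothetical_responses), 3))
```

Values closer to 1 indicate more homogeneous item content. Test-retest reliability would be assessed separately, for example by correlating scores from two administrations in patients whose clinical status is unchanged.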
For PRO endpoints for clinical trials and other studies, interpretation guidelines are needed to evaluate the clinical significance of changes or differences in scores. The interest is first in interpreting between-group differences, and this is most often done using an estimated minimal important difference (MID). The MID is defined as a difference that is important to patients, or their clinicians, and that represents a meaningful change. There is also interest in interpreting clinically significant within-patient changes, as a significant between-group difference does not indicate that every patient in a treatment group demonstrates clinically meaningful improvement. A series of analyses are conducted, using multiple anchors, to determine this responder threshold. For example, an anchor may be the patient-rated change in disease status from baseline to the study endpoint. These types of global anchors ask patients to rate their change during the clinical study on a scale such as very much worsened, much worsened, worsened, no change, improved, much improved and very much improved. A responder threshold can be defined as the magnitude of change observed in the much improved group.
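The anchor-based approach described above can be sketched as follows. This is a simplified illustration that uses a single anchor, whereas the text notes that multiple anchors are used in practice. The function names and the change scores are hypothetical; change is taken here as endpoint minus baseline, so negative values indicate improvement on a severity scale.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_anchor(changes, anchor_ratings):
    """Mean PRO change score within each global anchor category."""
    groups = defaultdict(list)
    for change, rating in zip(changes, anchor_ratings):
        groups[rating].append(change)
    return {rating: mean(values) for rating, values in groups.items()}


def responder_threshold(changes, anchor_ratings, category="much improved"):
    """Threshold defined as the mean change in one anchor category."""
    by_anchor = mean_change_by_anchor(changes, anchor_ratings)
    if category not in by_anchor:
        raise ValueError(f"no patients rated themselves '{category}'")
    return by_anchor[category]


# Hypothetical change scores (endpoint minus baseline) and anchor ratings.
changes = [-1.5, -0.6, -0.1, 0.2, -1.2, -0.4]
ratings = [
    "much improved", "much improved", "no change",
    "worsened", "much improved", "improved",
]

print(mean_change_by_anchor(changes, ratings))
print(responder_threshold(changes, ratings))
```

In this sketch the threshold is the mean change among patients who rated themselves much improved; a patient would then be counted as a responder if his or her own change is at least that large.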
Regulatory issues and patient-reported outcomes
In 2009, the Food and Drug Administration (FDA) issued a guidance on the use of patient-reported outcome measures for product labeling. Within this guidance, the FDA summarized important issues related to PRO endpoints for registration clinical trials, and provided information on the types of evidentiary requirements for demonstrating the content validity and psychometric characteristics of new or existing PROs for product labeling. There was a strong emphasis on conceptual frameworks and content validity based on qualitative research (i.e., concept elicitation, cognitive interviewing) before proceeding to the evaluation of measurement properties (i.e., reliability, validity, responsiveness) of a new PRO measure. The European Agency for the Evaluation of Medicinal Products (EMEA, now the European Medicines Agency, EMA) has also released a reflection paper on measures of health-related quality of life in the evaluation of drug applications. The EMEA adopted a more flexible and general approach, whereas the FDA provides more direct recommendations. These documents emphasize sound measurement science, systematic development, and psychometric evaluation of PRO measures, and are consistent with best practices in PRO research.
PRO measures should be developed based on a clearly defined conceptual framework. Current practice requires input from published literature, clinicians, and patients with the condition over the course of the instrument development process. The patient’s perspective is critical for understanding key PRO domains and for ensuring that the measures, questions, and response scales are understandable to patients. When patients’ input differs from the initial conceptual framework, revisions should be made to accurately represent their input. With a sound understanding of the target concepts, well-developed items can then be constructed to represent the specified domains. Evaluation of the relevance and completeness of the instrument’s content is based on subjective judgment and may vary by researcher or regulatory perspective. The FDA guidance focuses on qualitative research to identify key concepts for PRO instruments and to support content validity. The 2009 FDA guidance places more emphasis on qualitative research and content validity than on the analysis of measurement properties. Qualitative research, with focus groups and cognitive interviewing, requires many judgments from reviewers of interview transcripts when combining patient-derived information into larger concepts. Different researchers could identify different concepts based on the same data.
The guidance suggests that PRO measures with extended recall periods are subject to recall bias. Diaries and daily data capture, focusing on current status or recall during the past 24 hours, seem to be preferred over instruments with recall periods of 1–4 weeks. Daily diary capture of PROs also has recognized problems. Prolonged recall periods could increase measurement error through recall bias, but might improve the capture of less frequent events and the effect of such events on daily life. Patients’ recollections of the experience during the reference period are what matters most, and any response process, even for current status, relies on cognitive processing and (to some extent) on memory. Appropriate recall periods should be determined by the research question, disease, domain, and study context. Symptoms, functioning, and general health perceptions can be validly measured by use of different recall periods. Aggregated daily symptom assessments (e.g., pain, fatigue) are moderately to strongly correlated with weekly symptom assessments, although mean scores might differ. For example, Revicki et al. found that GCSI daily diary data correlated >0.90 with GCSI scores based on two-week recall. More importantly for clinical trials, very little evidence suggests any difference between daily and weekly measures in the detection of treatment differences (when the treatment is effective).
The FDA PRO guidance also addresses interpretation of differences or changes in PRO scores in between-group and within-group comparisons. The guidance provides less emphasis on minimal important differences and between-group comparisons. Specific attention is focused on demonstrating clinically significant within-patient changes in PRO endpoints. This new emphasis recognizes that even with statistically significant between-treatment group differences, not all patients benefit from a treatment. The focus on determining clinically significant responder thresholds provides a meaningful approach to identifying the proportion of patients in a treatment group that achieve a clinical benefit. Various anchor-based methods are utilized in identifying these responder definitions.
More recently, the FDA issued a draft guidance on gastroparesis clinical trials and endpoints. This guidance summarizes the FDA's thinking about clinical trial design, study entry criteria, and clinical and symptom severity endpoints for evaluating treatments for diabetic and idiopathic gastroparesis. Core symptoms include nausea, early satiety, post-prandial fullness and abdominal pain, all assessed based on a symptom severity response scale. In addition, measuring the frequency of vomiting episodes is also considered an important endpoint, indicating more severe disease. These core symptoms, identified by the FDA, are included in the ANMS GCSI-DD, the DGSSD, and other symptom outcome scales developed for gastroparesis clinical trials (see Table 33.1).
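The comparison of aggregated daily diary scores with recall-based scores can be illustrated with a short sketch. It averages each patient's available daily scores and correlates the result with a recall-based score using the Pearson coefficient. The function names, the handling of missed diary days (simply skipped here), and all data values are assumptions for illustration; they do not reproduce the analysis reported by Revicki et al.

```python
from math import sqrt


def diary_mean(daily_scores):
    """Mean of the available daily scores; None marks a missed diary day."""
    observed = [score for score in daily_scores if score is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one week of daily scores for four patients, and the
# same patients' scores on a questionnaire with a longer recall period.
daily_diaries = [
    [1, 2, 1, None, 2, 1, 1],
    [3, 3, 4, 3, None, 3, 4],
    [0, 1, 0, 0, 1, 0, 0],
    [2, 2, 3, 2, 2, 3, 2],
]
recall_scores = [1.5, 3.5, 0.5, 2.0]

aggregated = [diary_mean(week) for week in daily_diaries]
print(aggregated)
print(round(pearson_r(aggregated, recall_scores), 3))
```

A high correlation indicates that the two recall periods rank patients similarly, even if the mean scores differ.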
Development of the ANMS GCSI-DD
The ANMS GCSI-DD is a patient-reported outcome instrument that captures the daily relevant symptoms of gastroparesis. The original Gastroparesis Cardinal Symptom Index (GCSI) was developed to assess the core symptoms of gastroparesis and represents a subset of the longer, 20-item, Patient Assessment of Upper Gastrointestinal Disorders Symptoms (PAGI-SYM) questionnaire, which was developed to assess symptoms of gastroparesis, functional dyspepsia and gastroesophageal reflux disease.
The GCSI quantifies the severity of nine gastroparesis symptoms: nausea, retching, vomiting, stomach fullness, inability to finish a meal, excessive fullness, loss of appetite, bloating and abdominal distension. The symptoms that comprise the GCSI were elicited through focus groups and interviews with patients with diabetic and idiopathic gastroparesis, with input from experts who care for gastroparesis patients, as recommended by the FDA Guidance for PRO development. In its original conception, the GCSI assessed the severity of symptoms over a two-week recall period. A six-point Likert response scale, with 0=none, 1=very mild, 2=mild, 3=moderate, 4=severe, and 5=very severe, was used to rate the severity of each symptom. The nine symptom severity items may be used to calculate three symptom subscale scores: a nausea/vomiting subscale (nausea, retching, vomiting), a fullness/early satiety subscale (stomach fullness, inability to finish a meal, excessive fullness, loss of appetite), and a bloating subscale (bloating and abdominal distension). A total GCSI composite score may also be calculated as the mean of the three subscale scores.
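The scoring structure described above can be sketched in a few lines. The subscale membership and the total score (mean of the three subscale scores) follow the text; computing each subscale score as the mean of its items is an assumption of this sketch, and the item keys and function name are hypothetical labels rather than the official item wording.

```python
from statistics import mean

# Subscale membership as described in the text; the keys are hypothetical
# short names chosen for this sketch.
GCSI_SUBSCALES = {
    "nausea_vomiting": ["nausea", "retching", "vomiting"],
    "fullness_early_satiety": [
        "stomach_fullness",
        "unable_to_finish_meal",
        "excessive_fullness",
        "loss_of_appetite",
    ],
    "bloating": ["bloating", "abdominal_distension"],
}


def score_gcsi(responses):
    """Return (subscale scores, total score) for one completed GCSI.

    responses: dict mapping each of the nine item names to a 0-5 rating
    (0=none ... 5=very severe). Subscale scores are item means (an
    assumption here); the total is the mean of the three subscale scores.
    """
    for items in GCSI_SUBSCALES.values():
        for item in items:
            if item not in responses:
                raise ValueError(f"missing item: {item}")
            if not 0 <= responses[item] <= 5:
                raise ValueError(f"{item} must be rated 0-5")

    subscales = {
        name: mean([responses[item] for item in items])
        for name, items in GCSI_SUBSCALES.items()
    }
    total = mean(list(subscales.values()))
    return subscales, total


# Hypothetical responses from one patient.
example = {
    "nausea": 3, "retching": 1, "vomiting": 2,
    "stomach_fullness": 4, "unable_to_finish_meal": 3,
    "excessive_fullness": 4, "loss_of_appetite": 1,
    "bloating": 2, "abdominal_distension": 4,
}
subscale_scores, total_score = score_gcsi(example)
print(subscale_scores, round(total_score, 2))
```

Because the total is the mean of subscale scores rather than of all nine items, each subscale contributes equally regardless of how many items it contains.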
The development of the ANMS GCSI-DD evolved based on several modifications to the original GCSI. First, given the concerns among regulatory agencies about recall bias, the ANMS GCSI-DD symptom assessments are based on a 24-hour recall period. Daily symptom assessment minimizes recall bias related to the patient’s symptom experience. Gastroparesis patients who participated in a GCSI-DD cognitive debriefing study indicated that a daily symptom assessment is needed to capture fluctuations in their symptom experience. As patient recall over two weeks may not be reliable, the GCSI-DD was developed to assess symptoms on a daily basis in patients with idiopathic or diabetic gastroparesis. Note that Revicki et al., in a small sample of gastroparesis patients, found that the GCSI daily diary symptom scores summarized over two weeks were correlated >0.90 with GCSI scores with a two-week recall period. The original GCSI-DD maintained the same items as the GCSI, with the only difference being the recall period.
Second, several of the symptoms were reworded to enhance understandability. The modifications were undertaken based on feedback from the FDA and from patients with gastroparesis during cognitive debriefing interviews. This was relevant for the symptoms of early satiety, postprandial fullness, bloating, and upper abdominal pain. The wording of early satiety was clarified to state “not able to finish a normal-sized meal (for a healthy person)”. Post-prandial fullness was reworded as “Feeling excessively full after meals”, and bloating was clarified by adding the following statement: “Feeling like you need to loosen your clothes”. These three symptoms are moderately to strongly correlated. Post-prandial fullness was selected for the final daily diary instead of bloating, as this symptom relates more closely to gastric emptying, which is disordered in gastroparesis. Bloating is more often used to describe a symptom of irritable bowel syndrome, which can coexist in some patients with gastroparesis. For abdominal pain, upper abdominal pain is assessed, as this is the usual location of pain/discomfort that might occur in gastroparesis. Lower abdominal pain is more frequently observed in irritable bowel syndrome and may confound the overall symptom assessment. Upper abdominal pain was originally defined as above the navel, but this was recently changed to above the belly button to improve patient understanding. The revised wording of these symptoms was demonstrated to be understandable to patients through cognitive debriefing interviews.
Third, the original response options for the GCSI/GCSI-DD (none, very mild, mild, moderate, severe, and very severe) represent the range of symptom severity seen in gastrointestinal disorders. This response scale has been used successfully in patients with gastroparesis and, based on a cognitive debriefing study and language translation-related patient interviews, is well understood by patients with varying levels of education. The recent cognitive debriefing study in diabetic and idiopathic gastroparesis patients found that all the patients understood the response scale and could use it in rating the severity of their symptoms. However, item response theory (IRT) analysis of the GCSI-DD items indicated that there were overlapping probability curves for the ‘very mild’ and ‘mild’ response categories. In addition, very few respondents selected the ‘very mild’ response. Therefore, the ‘very mild’ response option was removed. Subsequent IRT analyses demonstrated that the items with the revised response scale fit the graded response model and were well ordered. Thus, a revised 5-point ordinal scale is now used for the ANMS GCSI-DD, with 0=none, 1=mild, 2=moderate, 3=severe, and 4=very severe.
Finally, the response scale for vomiting was changed from a severity response scale to a frequency response scale, as the severity response might not completely capture severity if more than one vomiting episode occurs. For vomiting, the number of vomiting episodes (throwing up with food or liquid coming out) is used, not the number of trips to the bathroom to throw up. This takes into account the fluid and electrolyte shifts that are related to the number of emesis episodes. It has also been suggested that vomiting be added to help capture worsening of gastroparesis symptoms. Note that vomiting frequency and vomiting severity are well correlated (r=0.80). In the scoring of vomiting episodes, the number of episodes of vomiting per day is used, with the episodes capped at 4, so that the scoring is similar to the symptom severity scores of the other items. Therefore, vomiting episodes are scored as 0=none; 1=one episode; 2=two episodes; 3=three episodes and 4=four or more episodes. Recent qualitative research confirmed that patients with gastroparesis understood the instructions for rating the number of vomiting episodes. Note that the actual number of vomiting episodes is captured in the daily diary and can be treated as a separate vomiting score.
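The capped vomiting score described above amounts to a simple transformation of the daily episode count. A minimal sketch follows; the function and variable names are hypothetical and the diary counts are invented for illustration.

```python
MAX_VOMITING_SCORE = 4  # "four or more episodes" is scored as 4


def vomiting_item_score(episodes):
    """Map a daily count of vomiting episodes to the 0-4 item score."""
    if episodes < 0:
        raise ValueError("episode count cannot be negative")
    return min(episodes, MAX_VOMITING_SCORE)


# Hypothetical week of diary entries (raw episode counts per day).
raw_counts = [0, 1, 6, 3, 0, 2, 9]
item_scores = [vomiting_item_score(count) for count in raw_counts]

print(item_scores)      # capped scores, on the same 0-4 range as severity items
print(sum(raw_counts))  # raw counts stay available as a separate vomiting score
```

Keeping both values preserves the actual episode count for separate analysis while placing the vomiting item on the same range as the severity items.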
The GCSI was originally developed for clinical studies in both diabetic and idiopathic gastroparesis. Subsequent studies have suggested that the types of symptoms in diabetic and idiopathic gastroparesis are similar, although patients with idiopathic gastroparesis may have more severe upper abdominal pain, and patients with diabetic gastroparesis may have more severe nausea and vomiting. Revicki completed a secondary analysis comparing psychometric characteristics of GCSI-DD core symptom scores in patients with idiopathic or diabetic gastroparesis. Confirmatory factor analyses, IRT analyses, and differential item functioning (DIF) analyses demonstrated (1) comparable and unidimensional factor structure for idiopathic and diabetic gastroparesis samples; (2) comparable item parameters and IRT model fit for idiopathic and diabetic gastroparesis samples; and (3) no evidence of uniform or non-uniform DIF for GCSI-DD items between idiopathic and diabetic gastroparesis samples.
Several of the symptoms in the original GCSI were interrelated and assessed similar types of symptoms. In addition, upper abdominal pain can be present in some patients with gastroparesis. The core relevant symptoms of gastroparesis were reduced to five symptoms: nausea, early satiety, postprandial fullness, upper abdominal pain, and vomiting. The symptom of bloating was also included as an exploratory symptom, based on the recommendations of regulatory authorities.
The revised ANMS GCSI-DD instrument has been used by patients with both idiopathic and diabetic gastroparesis. Based on extensive qualitative research, the instrument was found to be understood by patients, easily implemented in clinical studies, relevant to patients, and able to capture the main symptoms of gastroparesis. Evidence supports the content validity of the daily diary, and patients with gastroparesis understand the instructions, item stems, and response scales for the daily diary.
Subsequent evaluation of the measurement characteristics of the ANMS GCSI-DD demonstrated excellent internal consistency reliability and test-retest reliability, structural validity, and good concurrent and known groups validity. ANMS GCSI-DD item and composite scores vary significantly by measures of clinician-rated disease severity and patient-rated disease severity. Table 33.2 summarizes mean item scores by clinician-rated severity (p<0.01), and Fig. 33.3 summarizes core symptom scores and total symptom scores by patient-rated severity groups (p<0.001). We have some preliminary data on responsiveness and sensitivity to changes in clinical status; however, confirmatory results will be derived from phase 2 clinical trial data. Table 33.3 summarizes results comparing mean baseline to 4-week changes in ANMS GCSI-DD symptom severity scores between responders and non-responders (p<0.0001). Preliminary clinical responder thresholds range from a 0.7- to 1-point improvement in ANMS GCSI-DD composite scores.
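To show how a responder threshold of this kind might be applied to individual patients, the sketch below classifies a patient as a responder when the improvement in composite score from baseline meets a chosen threshold. The function name and the example scores are hypothetical, and the sketch assumes the composite scores have already been computed and that lower scores indicate less severe symptoms. Because the preliminary thresholds span a range, the example is evaluated at both ends of it.

```python
def is_responder(baseline, follow_up, threshold):
    """True if the composite score improved by at least `threshold` points.

    Improvement is baseline minus follow-up, because lower composite
    scores are assumed to indicate less severe symptoms.
    """
    if threshold <= 0:
        raise ValueError("threshold must be a positive improvement")
    return (baseline - follow_up) >= threshold


# Hypothetical patient: composite score 3.0 at baseline, 2.1 at week 4.
baseline_score, week4_score = 3.0, 2.1

for threshold in (0.7, 1.0):  # ends of the preliminary range in the text
    status = is_responder(baseline_score, week4_score, threshold)
    print(f"threshold {threshold}: responder = {status}")
```

The same patient can be classified differently depending on which threshold is chosen, which is one reason the thresholds remain preliminary pending confirmatory trial data.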