Questionnaire structure | Definition | Acceptability level |
---|---|---|
Principal components factor analysis | Principal components factor analysis can be conducted to determine the item-scale structure of a questionnaire | All items included in a scale should load on a single unrotated factor >0.40 to support unidimensionality. No item redundancy (inter-item correlations <0.80). Variance explained by the second factor >20% after varimax rotation to be considered a second dimension of the questionnaire. No items should load >0.40 on more than one factor |
Test–retest reliability | The stability of a measuring instrument, which is assessed by administering the instrument to respondents on two different occasions and examining the correlation between test and retest scores | Intraclass correlation coefficient ≥0.70 is considered satisfactory |
Internal consistency reliability | To evaluate the extent to which individual items of the instrument are consistent with each other and reflect an underlying scheme or construct | Cronbach’s α coefficient ≥0.70 is considered acceptable |
Known groups validity | Known groups validity: ability of a measure to distinguish between groups known to differ, such as between different disease severity groups | Statistically significant differences are expected between these groups |
Convergent and divergent validity | Convergent validity: dimensions measuring similar or overlapping concepts are expected to be substantially correlated (r ≥ 0.40). Divergent validity: dimensions measuring dissimilar concepts are expected to correlate less strongly or not at all | Once the hypothesized domains have been determined, similar and dissimilar measures should be incorporated into the validation study to test for convergent and divergent validity |

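As a rough illustration of the internal consistency criterion above, Cronbach’s α can be computed from a matrix of item scores with the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of total score), where k is the number of items. The sketch below is not taken from the source: the function name `cronbach_alpha` and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Transpose so that each tuple holds all responses to one item.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, acceptable (>= 0.70): {alpha >= 0.70}")
```

Sample variances are used for both the items and the total score, so the ratio is consistent; a real validation study would typically rely on an established statistics package rather than a hand-written function.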
Responsiveness | Definition/test | Criteria for acceptability |
---|---|---|
Ability to detect change | To be useful for clinical trials, a measure must reflect changes in scores when change has occurred in health condition under study | Statistically significant change in scores |

Interpretation of magnitude of change | Definition/test | Criteria for acceptability |
---|---|---|
Anchor-based method | A clinical anchor or patient-reported anchor can be used to establish whether or not a patient has improved. Scores of those improving at least "slightly" inform the MID, the change in score at end of treatment expected to indicate a meaningful change [28] | This varies from measure to measure |
Distribution-based method | Effect size [28] | ≥0.5 |
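The distribution-based criterion can be made concrete with a small calculation. The source does not say which effect size statistic is meant; the sketch below assumes the standardized effect size (mean change from baseline divided by the standard deviation of baseline scores). The function name and the scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores (an assumed definition)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up scores must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical paired questionnaire scores for six patients.
baseline_scores = [10, 12, 9, 14, 11, 13]
end_of_treatment_scores = [13, 14, 12, 15, 13, 16]

es = standardized_effect_size(baseline_scores, end_of_treatment_scores)
print(f"effect size = {es:.2f}, meets the >= 0.5 criterion: {es >= 0.5}")
```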

Questionnaire as diagnostic | Definition | Acceptability level |
---|---|---|
Sensitivity and specificity | Sensitivity is defined as the proportion of subjects with the disease that are diagnosed as having the disease (true-positive rate), and specificity is defined as the proportion of subjects without a disease that are diagnosed as not having the disease (true-negative rate) | Point at which the sensitivity/specificity ratio is closest to unity (this maximizes sensitivity and specificity) |
Positive and negative predictive value | Positive predictive value is the proportion of patients diagnosed with a condition who actually have the condition. Negative predictive value is the proportion of patients not diagnosed with the condition who do not have the condition | Predictive values greater than 70 % are considered acceptable |
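The diagnostic definitions above reduce to simple ratios over a 2×2 table of questionnaire result against true disease status. The following minimal sketch makes them concrete and also shows one way to apply the "ratio closest to unity" rule for choosing a cutoff score. The helper names, the counts, and the candidate cutoffs are all hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 counts.

    tp: diseased, test positive        fn: diseased, test negative
    fp: not diseased, test positive    tn: not diseased, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def cutoff_closest_to_unity(candidates):
    """Pick the cutoff whose sensitivity/specificity ratio is nearest to 1.

    candidates maps each cutoff score to a (sensitivity, specificity) pair.
    """
    return min(
        candidates,
        key=lambda cutoff: abs(candidates[cutoff][0] / candidates[cutoff][1] - 1),
    )


# Hypothetical validation sample of 200 men.
metrics = diagnostic_metrics(tp=80, fp=20, fn=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")

# Hypothetical cutoff scores with their (sensitivity, specificity).
cutoffs = {9: (0.95, 0.60), 11: (0.82, 0.80), 13: (0.65, 0.93)}
print("chosen cutoff:", cutoff_closest_to_unity(cutoffs))
```

Predictive values depend on how common the condition is in the sample, so values obtained in a validation sample may not carry over to a population with a different prevalence.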
16.4 Regulatory Requirements for Newly Developed Patient-Reported Outcomes
New measures being developed for use in a clinical trial program that may lead to US labeling claims need to satisfy the requirements of the FDA guidance for industry, PRO measures: use in medical product development to support labeling claims [15]. This document sets forth the psychometric tests and properties that need to be demonstrated for a new PRO to be considered validated. The FDA guidance also emphasizes that the content of the tool be developed by interviewing patients and that the language used in the PRO reflect the experience of patients. Additionally, the guidance suggests that an end point model be developed to support why the particular end points under investigation have been chosen (the research and model of Patrick et al. [13] are a good starting point for any new PE measure). Any change to a measure, including the medium of administration, the recall period, or the target population, would require revalidation.
16.5 Assessment Techniques
Methods used to assess treatment outcome for PE include (1) patient self-report, (2) clinician judgment, (3) structured interview, (4) omnibus measures of sexual function, (5) focused self-report inventories designed specifically to evaluate outcome of treatments for PE, (6) patient diaries, and (7) timing of ejaculatory latency by stopwatch.
16.5.1 Self-Report
Investigators have been notoriously skeptical of the value of patient self-report when conducting outcome studies. A recent study by Rosen et al. [16] reported that self-estimated and stopwatch-measured IELT were interchangeable. The authors correctly assigned PE status with 80 % specificity and 80 % sensitivity. Previously, Althof [17] correlated 13 patients’ self-report of IELT during a telephone interview, a face-to-face clinical interview, and stopwatch-assessed IELT. Correlation coefficients between telephone interview and actual time and between structured interview and actual time were 0.619 and 0.627, respectively. These data suggest that self-report of IELT might suffice in a clinical situation. Clinical research, however, demands a higher standard of objectivity, requiring IELT measurements. Given these data, patient-reported IELT appears adequate for treating clinicians.
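To make the comparison of self-reported and stopwatch-measured IELT concrete, the sketch below computes a correlation coefficient between the two. The paired values are invented for illustration and are not data from the studies cited above. The source also does not state which correlation statistic was used, so Pearson's r is an assumption here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of equal length (n >= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance_sum = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance_sum / (spread_x * spread_y)


# Hypothetical IELT values in minutes for five patients.
self_reported_ielt = [1.0, 2.0, 0.5, 3.0, 1.5]
stopwatch_ielt = [0.8, 1.5, 0.6, 2.0, 1.9]

r = pearson_r(self_reported_ielt, stopwatch_ielt)
print(f"r = {r:.3f}")
```

A correlation coefficient only describes how closely the two measures track each other; it does not show whether patients systematically over- or underestimate their latency.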
16.5.2 Clinician Judgment
There are no reported studies on clinician concordance in diagnosing PE. Clinician subjectivity in interpreting the current diagnostic criterion sets would likely hamper agreement. Subjectivity in interpretation would also interfere with reliability in gauging improvement with treatment or impact on the psychosocial parameters. Patient self-report and clinician judgment, however, remain unobtrusive means of obtaining data and are the least burdensome for patients.
16.5.3 Inventories of Sexual Function
The Derogatis sexual function inventory (DSFI) [18] and the Golombok Rust inventory of sexual satisfaction (GRISS) [19] are omnibus sexual inventories, neither of which is specifically designed to assess any one particular sexual dysfunction. The DSFI is composed of 254 items comprising 10 subscales. This instrument offers investigators a reliable and valid means of measuring psychologic distress; however, it has few questions devoted to ejaculatory latency or voluntary control.
Similarly, the GRISS is a 28-item questionnaire designed to assess the existence and severity of sexual problems. It consists of 12 subscales, one of which is rapid ejaculation. The measure has good reliability and satisfactory validity but is more helpful with diagnosis than outcome.
16.5.4 Structured Interviews
Designed by Metz et al., the premature ejaculation severity index (PESI) (Metz M, Pryor J, Nessvacil R, personal communication, 1997) is a 10-item interview scale that offers a severity of distress score. It may be most helpful to clinicians interested in pre- and posttherapy changes. It has limited utility as a research instrument, however, because the validity and reliability of this measure have never been established.
16.5.5 Timing of Ejaculatory Latency by Stopwatch
Use of a stopwatch to time ejaculatory latency from vaginal penetration until ejaculation is a simple, objective, and reproducible outcome measure. The intrusiveness of stopwatch assessment may seem more severe to those unfamiliar with this approach than to subjects who participate in IELT studies. Surprisingly, most couples do not object to being asked to time their lovemaking. In fact, some men report that they enjoy competing with themselves to see if they can improve their ejaculatory latency. What remains unknown is what influence, if any, asking couples to time lovemaking has on the outcome. Specifically, does timing increase, decrease, or have no effect on ejaculatory latency? Is it possible that timing oneself serves as an occult treatment intervention?
It is, however, unlikely that physicians would be able to routinely require patients to time intercourse episodes before making a diagnosis of PE. It is too burdensome for both physician and patient. As mentioned previously, patient self-reported ejaculatory latency should suffice for diagnosis and treatment in a non-research setting.
16.6 Patient-Reported Outcomes for Premature Ejaculation
Table 16.2 lists the PROs available to identify/diagnose men with PE and PROs for detecting change when treating men with PE. Each measure’s psychometric properties are described as well. Table 16.2 was compiled by reviewing the literature; only those measures in which the reliability and validity are documented have been included.
Table 16.2
Psychometric properties of premature ejaculation measures
Column groups: reliability, validity, diagnostic tests
Instrument | Population | Factor analysis | Internal consistency | Test–retest | Convergent-divergent | Known groups | Responsiveness | MID | Sensitivity/specificity | Positive and negative predictive value |
---|---|---|---|---|---|---|---|---|---|---|
CIPE | n = 169; IELT: mean = 1.6 (SD = 1.2) minutes; 61 % lifelong, 39 % acquired |