Reliability: a description of how well repeated measurements of the same parameter, with the same instrument, agree with one another
Inter-rater reliability: a description of how well measurements of the same parameter, with the same instrument, by different evaluators, agree with one another
Internal consistency: a description of how well individual components of an instrument measure the same parameter
Responsiveness: a description of how well an instrument detects change in a parameter
Validity: a description of how well an instrument measures the parameter it is intended to measure
Face validity: a description of how well an instrument appears, on inspection, to measure the parameter it is intended to measure
Convergent validity: a description of how well an instrument’s measurements agree with the measurements of other instruments designed to measure the same parameter
Construct validity: a description of how well an instrument’s measurements correlate with parameters known to be associated with the parameter in question
Discriminant validity: a description of how weakly an instrument’s measurements correlate with parameters known not to be associated with the parameter in question
General Health-Related Quality of Life Measures
The most commonly used nonspecific, multidimensional measure is the RAND Medical Outcomes Short Form-36 (SF-36). This questionnaire comprises several domains, including global health perception, mental health, energy and vitality, pain, social function, physical function, physical role, and mental role [2]. It is one example of a nonspecific questionnaire that has been used to assess the impact of TDS and its treatment. Such generic questionnaires typically do not perform as well as testosterone deficiency-specific measures [3]. Occasionally, authors use a single global question to measure overall quality of life. One example of this type of measure is the Quality of Life uniscale (QL uniscale), also known as the Spitzer uniscale QL [4]. One-item instruments are rarely useful because they are not sensitive enough to capture the gradual changes most patients experience over time. Other questionnaires assess quality of life across multiple domains; the most frequently used general instruments are summarized in Table 4.2. Alternative general multidimensional HRQoL questionnaires include the RAND Medical Outcomes Short Form-12 (SF-12 v1 and SF-12 v2), the Sickness Impact Profile, the Endicott Quality of Life Enjoyment and Satisfaction Scale, the Psychological General Well-Being Index, the Pusan National University Hospital Quality of Life Scoring System, the Life Satisfaction Scale, the Visual Analogue Mood Scale, and the Likert scale plus visual analogue scale. While these may capture the constitutional or cognitive effects of hypogonadism, only three of them include a sexual function domain: the Pusan National University Hospital Quality of Life Scoring System, the Life Satisfaction Scale, and the Endicott Quality of Life Enjoyment and Satisfaction Scale. However, none of these scales contains both an energy/vitality domain and a sexual function domain.
Table 4.2
General HRQoL measures used to assess testosterone deficiency
Author, year | Title | Domains measured |
---|---|---|
Ware, 1992 | SF-36 | Constitutional, cognitive |
Ware, 1996 | SF-12 | Constitutional, cognitive, physiologic |
Dupuy, 1984 | Psychological general well-being index | Constitutional, cognitive, physiologic |
Bergner, 1981 | Sickness Impact Profile | Constitutional, cognitive |
Endicott, 1981 | Endicott Quality of Life Enjoyment and Satisfaction Scale | Sexual, constitutional, cognitive, physiologic |
Schmidt, 2004 | Visual analogue mood Scale | Constitutional, cognitive |
Haren, 2005 | Likert Scale plus visual analogue Scale | Constitutional, cognitive |
Fugl-Meyer, 1997 | Life Satisfaction Scale | Sexual, cognitive |
Herr, 2000 | QL uniscale | Global assessment of quality of health |
Park, 2003 | Pusan National University Hospital quality of life scoring system | Sexual, constitutional, cognitive, physiologic |
Langham et al. assessed the use of nonspecific tools for monitoring response to treatment of hypogonadism [3]. Of the 14 intervention studies reviewed, seven showed a positive effect of treatment in a specific domain, either over time or between groups; five showed no change, and two showed a negative effect of treatment. The generic tools performed poorly when evaluated for clinical face validity. They outperformed specific tools only in incorporating a global rating separate from the other domains.
General HRQoL questionnaires are useful to compare across populations or between different disease states. They can also augment the utility of disease-specific measures to make the findings applicable across a wider patient population.
Testosterone Deficiency Specific Questionnaires
Disease-specific questionnaires for TDS are available and offer better validity, relevance, and responsiveness than general HRQoL instruments. These instruments and the domains they address are listed in Table 4.3. Unfortunately, even these specific questionnaires are imperfect screening tools for testosterone deficiency. One review of several specific questionnaires found evidence that they detect change after treatment better than the general HRQoL instruments do [3]. The reviewers also judged all of these questionnaires to be clinically relevant for assessing symptoms of hypogonadism.
Table 4.3
Testosterone deficiency specific questionnaires
Author, year | Title | Domains measured |
---|---|---|
Heinemann, 1999 | Aging Males’ Symptoms Scale | Sexual, constitutional, cognitive, physiologic |
Morley, 2000 | Androgen Deficiency in Aging Males | Sexual, constitutional, cognitive |
Smith, 2000 | Massachusetts Male Aging Study | Sexual, constitutional, cognitive, physiologic |
Wiltink, 2009 | Hypogonadism-Related Symptom Scale | Sexual, constitutional, cognitive |
Corona, 2009 | Androtest | Sexual, constitutional, cognitive |
McMillan, 2003 | Age-Related Hormone Deficiency-Dependent Quality of Life Questionnaire | Sexual, constitutional, cognitive |
Aging Males’ Symptoms Scale
The most extensively studied specific questionnaire is the Aging Males’ Symptoms (AMS) Scale. It is a 17-item questionnaire that assesses patient complaints in three domains: psychological (five items), somatovegetative (seven items), and sexual (five items). Each item is rated on a five-point Likert scale, with 1 indicating no complaint and 5 indicating severe bother. The item ratings are summed to a total score (range 17–85), which is graded to characterize overall symptom severity and the likelihood of androgen deficiency: 17–26 points, “no/little symptoms”; 27–36 points, “mild symptoms”; 37–49 points, “moderate symptoms”; and 50–85 points, “severe symptoms.”
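The scoring and grading rules above can be sketched in a few lines of Python. This is a minimal illustration, not a validated scoring tool: the function name `score_ams` and the input format (item ratings grouped by domain) are hypothetical choices for this example, and the assignment of individual AMS item numbers to domains is deliberately not reproduced.

```python
from typing import Dict, List, Tuple

# Items per domain as described in the text: 5 + 7 + 5 = 17 items in total.
AMS_DOMAIN_SIZES = {"psychological": 5, "somatovegetative": 7, "sexual": 5}

# Total-score bands: (lowest score, highest score, label).
AMS_BANDS = [
    (17, 26, "no/little symptoms"),
    (27, 36, "mild symptoms"),
    (37, 49, "moderate symptoms"),
    (50, 85, "severe symptoms"),
]


def score_ams(responses: Dict[str, List[int]]) -> Tuple[int, str]:
    """Sum AMS item ratings (1 = no complaint ... 5 = severe) and grade the total.

    `responses` is a hypothetical input format: domain name -> list of ratings.
    """
    total = 0
    for domain, n_items in AMS_DOMAIN_SIZES.items():
        ratings = responses[domain]
        if len(ratings) != n_items:
            raise ValueError(f"{domain}: expected {n_items} ratings, got {len(ratings)}")
        if any(not 1 <= r <= 5 for r in ratings):
            raise ValueError(f"{domain}: every rating must be between 1 and 5")
        total += sum(ratings)
    # With 17 valid ratings the total always falls between 17 and 85.
    for low, high, label in AMS_BANDS:
        if low <= total <= high:
            return total, label
    raise AssertionError("total outside 17-85; input validation failed")


example = {
    "psychological": [2, 2, 3, 1, 2],
    "somatovegetative": [3, 2, 2, 3, 2, 1, 2],
    "sexual": [3, 3, 2, 2, 3],
}
print(score_ams(example))  # (38, 'moderate symptoms')
```

Validating the item count per domain guards against a partially completed form silently producing a low (and falsely reassuring) total.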
The AMS Scale was originally developed in Germany in 1999 [5]. It was created to explore the idea that men may experience changes analogous to female menopause. More than 200 variables were assessed using factor analysis to identify complaints and domains specifically related to aging. The goals in developing the scale were to assess symptoms of aging (independent of disease-related symptoms) in groups of men under different conditions, to evaluate the severity of symptoms over time, and to measure changes before and after androgen replacement. Initial testing was performed in men over 40 years of age. At its inception, the AMS Scale was not intended as a screening tool for androgen deficiency, although it was intended to measure treatment effect. Standardized normative scores have been published only for German and Japanese patients. The scale has been used internationally and translated into many other languages, including English, Dutch, French, Spanish, Portuguese, Italian, Swedish, Korean, and Indonesian.
The AMS Scale has good reliability, with Cronbach’s alpha values for internal consistency between 0.7 and 0.9 across countries, time periods, and domain subscores [6]. These values were derived from a meta-analysis of studies performed in Germany, the United Kingdom, Spain, Portugal, Italy, France, Sweden, Thailand, and Korea. The small numbers of patients assessed in some countries limit the reliability determination. Validity was also assessed and showed that the scale can discriminate between degrees of hypogonadism [6].
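Cronbach’s alpha, the internal-consistency statistic cited above, is conventionally computed as alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The Python sketch below implements that standard formula; the function name and the small response matrix are hypothetical and are not AMS data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows[r][i] is respondent r's rating on item i. Sample variances (n - 1)
    are used consistently for the items and for the total score.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items, answered by every respondent")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 4 respondents x 3 items.
ratings = [[1, 2, 2], [2, 2, 3], [3, 4, 3], [4, 4, 4]]
print(round(cronbach_alpha(ratings), 3))  # 0.931
```

Alpha rises toward 1 as items vary together (the total-score variance grows relative to the summed item variances), which is why values of 0.7 to 0.9 are read as good internal consistency.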
Several studies have used the AMS to evaluate symptom response to androgen replacement. Patients’ AMS scores mirror physicians’ evaluations of treatment efficacy [7]. In a study of nearly 900 men, the average total score among men over 50 years of age was 45.3 ± 13.2 before treatment and 29.9 ± 9.1 after treatment. Similar results were seen in men younger than 50 years of age. Before therapy, only 5.7% of patients had scores in the “no/little symptoms” range; after treatment with injectable testosterone enanthate, this proportion rose to 41.8%. Unfortunately, the only randomized, double-blind, placebo-controlled study to assess testosterone replacement therapy in men with laboratory-proven androgen deficiency found no relationship between the AMS score and free testosterone level [8]. That study also found no statistically significant difference in AMS scores between the placebo and treatment groups after 6 months of testosterone replacement.
Additional studies of the relationship between total AMS score and serum testosterone levels report variable results. Several abstracts and articles have shown a positive correlation between total or free testosterone and the total AMS score [9, 10]. Although the AMS score alone predicted testosterone levels, adding body mass index (BMI) and age further improved predictive ability. In contrast, other studies did not replicate this finding and suggest that testosterone is not related to the total AMS score [11–13].
A number of factors may explain these conflicting results. The method used to determine hormone levels and patient demographics may influence study results [14]. AMS scores have also been associated with lower household income and major depressive disorder [15]. Although the scale has been criticized as too long and time-consuming to complete [16], it can be administered by telephone or in person, which makes it useful for research studies. Currently, the AMS Scale should be viewed as an ancillary tool to track symptom severity and change in symptoms over time. It has limited utility in screening patients for androgen deficiency, but this weakness is mitigated when the patient’s AMS score is combined with BMI and age (Appendix 1).
Androgen Deficiency in Aging Males
The Saint Louis University Androgen Deficiency in Aging Males (ADAM) questionnaire consists of ten “yes or no” questions about symptoms related to sexual function, mood, and energy level. The result is positive if the patient answers “yes” to question 1 or question 7, or “yes” to any three of the other questions [14]. The scale was initially developed by identifying ten symptoms common to patients with low bioavailable testosterone. The ADAM questionnaire has been translated and validated in Chinese and Arabic; several other nonvalidated translations are also available.
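The ADAM positivity rule described above is simple enough to express directly in code. The following is a minimal Python sketch; the function name `adam_positive` and the representation of answers as ten booleans (question 1 first) are assumptions made for illustration.

```python
from typing import Sequence


def adam_positive(answers: Sequence[bool]) -> bool:
    """Apply the ADAM positivity rule to ten yes/no answers (True = "yes").

    answers[0] is question 1 and answers[6] is question 7.
    """
    if len(answers) != 10:
        raise ValueError("the ADAM questionnaire has exactly ten questions")
    # A "yes" to question 1 or question 7 is sufficient on its own.
    if answers[0] or answers[6]:
        return True
    # Otherwise, at least three "yes" answers are needed among the other eight questions.
    other_yes = sum(1 for i, yes in enumerate(answers) if i not in (0, 6) and yes)
    return other_yes >= 3


only_q7 = [False] * 10
only_q7[6] = True
print(adam_positive(only_q7))  # True

two_others = [False, True, True] + [False] * 7
print(adam_positive(two_others))  # False
```

Because a single “yes” to either key question is enough, the rule favors sensitivity over specificity, which is consistent with the performance figures reported for the questionnaire.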
Initial results for the ADAM questionnaire reported 88% sensitivity and 60% specificity [17]. Later studies supported a sensitivity of 80–90% but reported a lower specificity of 19–36% [18–22]. One study found age and diabetes mellitus to be associated with a positive ADAM questionnaire independent of a low bioavailable testosterone level [18]. Internal consistency is adequate, with a Cronbach’s alpha of 0.71–0.74 [19, 20].
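To illustrate why low specificity limits a screening questionnaire, the Python sketch below computes sensitivity, specificity, and positive predictive value (PPV) from a 2 × 2 table. The counts are hypothetical, chosen only to represent a screen with 90% sensitivity and 30% specificity applied to 200 men, 50 of whom have low testosterone; they are not data from the cited studies, and the class name is an illustrative choice.

```python
from dataclasses import dataclass


@dataclass
class ScreeningTable:
    """Counts from comparing a questionnaire result against laboratory testosterone."""
    true_pos: int   # questionnaire positive, testosterone low
    false_pos: int  # questionnaire positive, testosterone normal
    false_neg: int  # questionnaire negative, testosterone low
    true_neg: int   # questionnaire negative, testosterone normal

    def sensitivity(self) -> float:
        # Proportion of men with low testosterone whom the questionnaire flags.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Proportion of men with normal testosterone whom the questionnaire clears.
        return self.true_neg / (self.true_neg + self.false_pos)

    def ppv(self) -> float:
        # Proportion of positive questionnaires that reflect truly low testosterone.
        return self.true_pos / (self.true_pos + self.false_pos)


# Hypothetical cohort: 50 men with low testosterone, 150 with normal levels.
table = ScreeningTable(true_pos=45, false_pos=105, false_neg=5, true_neg=45)
print(f"sensitivity={table.sensitivity():.2f}")  # sensitivity=0.90
print(f"specificity={table.specificity():.2f}")  # specificity=0.30
print(f"PPV={table.ppv():.2f}")                  # PPV=0.30
```

In this example, 105 of the 150 positive screens are false positives, so a positive result alone says little about whether a given man is hypogonadal. The PPV also depends on how common low testosterone is in the population being screened.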
As with the AMS, correlating the ADAM questionnaire with testosterone levels is problematic. Results conflict: several publications document that the ADAM questionnaire is a sensitive indicator of low bioavailable testosterone [18–22], whereas two studies found no correlation between serum bioavailable testosterone and a positive ADAM questionnaire [23, 24]. The first question (“Do you have a decrease in libido?”) has been shown to be more specific than the other individual questions or the total score. One study supported the idea that men with a free testosterone level below 70 ng/dL will have at least one symptom from each domain (sexual, energy, mood) [22].
There have been several modifications of the ADAM questionnaire. The most common is to calculate a cumulative score from the total number of “yes” responses, with each “yes” counting as one point. This approach has produced mixed results, with one positive and one negative study [22, 23]. A six-item short version of the questionnaire increased specificity in a Chinese patient population [20]. A more recent variation applies a 1–5 Likert scale to each question [25]. Unlike the AMS, a score of 1 indicates severe symptoms and a score of 5 indicates minimal or no trouble with the item. Therefore, across the ten items, a total score of 10 represents the most symptomatic patient and a score of 50 the least symptomatic. This quantitative version of the ADAM questionnaire was tested in 57 men scheduled to undergo radical prostatectomy and was found to correlate positively with both serum testosterone values and the Sexual Health Inventory for Men (SHIM) questionnaire.
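The two scoring modifications described above (a cumulative count of “yes” answers and a 1–5 rating per item) can be sketched as follows. The function names are hypothetical, and because the text does not give cutoffs for either version, none are applied here; the sketch only computes the raw totals and makes the reversed direction of the quantitative version explicit.

```python
from typing import Sequence


def adam_yes_count(answers: Sequence[bool]) -> int:
    """Cumulative ADAM score: one point per "yes" answer (range 0-10)."""
    if len(answers) != 10:
        raise ValueError("expected ten answers")
    return sum(1 for yes in answers if yes)


def qadam_total(ratings: Sequence[int]) -> int:
    """Quantitative ADAM total: ten items rated 1-5 (range 10-50).

    Unlike the AMS, 1 = severe symptoms and 5 = minimal or no trouble,
    so a LOWER total means MORE symptoms.
    """
    if len(ratings) != 10:
        raise ValueError("expected ten ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("every rating must be between 1 and 5")
    return sum(ratings)


print(adam_yes_count([True, False, True] + [False] * 7))  # 2
print(qadam_total([1] * 10), qadam_total([5] * 10))      # 10 50
```

Keeping the two scores in separate functions avoids mixing their opposite directions: a higher cumulative count means more symptoms, whereas a higher quantitative total means fewer.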