Chapter 9 Statistics for the Clinical Scientist




INTRODUCTION


Statistics is the science of (1) designing experiments or studies; (2) collecting, organizing, and summarizing data; (3) analyzing data; and (4) interpreting and communicating the results of the analyses. For medical research studies, the data often consist of measurements taken from human subjects (clinical studies) or from animals.


The essential goal of statistics is to make conclusions (or inferences) about a parameter, such as a mean, proportion, regression coefficient, or correlation. The true value of the parameter can be determined by obtaining the measurements of the entire population to which inferences are to be made. However, this is most often impractical or impossible to do. So, instead we take a subset of the population, called a sample. Then we analyze the sample to estimate or test the parameter value. As an example, suppose we would like to know the mean fetal humeral soft tissue thickness (HSTT) of singleton gestations for women at 20 weeks’ gestation. It would be impractical to (sonographically) measure the fetal HSTT for all such women, so we randomly sample, say, 50 such women. We then estimate the mean fetal HSTT from these 50 women. The mean fetal HSTT from these 50 women (called the sample mean) will undoubtedly not be the same as the (theoretical) mean that we would have obtained had we measured all such women (called the population mean), but it should be “close.” How close is it? Would it be closer if we had randomly sampled 100 women? 500 women? These are the questions that the science of statistics is designed to answer.
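The sampling idea above can be illustrated with a short simulation. This is a sketch, not part of the chapter: the "population" of HSTT values and its parameters (mean 12 mm, SD 1.5 mm) are invented purely for demonstration.

```python
# Sketch: how closely the sample mean tracks the population mean as n grows.
# The HSTT population below is simulated with invented parameters.
import random

random.seed(42)

# Hypothetical population: 100,000 HSTT measurements (mm), roughly normal.
population = [random.gauss(mu=12.0, sigma=1.5) for _ in range(100_000)]
population_mean = sum(population) / len(population)

for n in (50, 100, 500):
    sample = random.sample(population, n)   # simple random sample of size n
    sample_mean = sum(sample) / n           # estimate of the population mean
    print(f"n={n:4d}  sample mean={sample_mean:.3f}  "
          f"error={abs(sample_mean - population_mean):.3f}")
```

Rerunning with different seeds shows the same pattern: the estimate varies from sample to sample, but larger samples tend to land closer to the population mean.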


The next section outlines two essential characteristics of variables. The remaining sections of this chapter present the basic statistical methods used to accomplish the four goals of statistics defined above.



LEVEL OF MEASUREMENT AND VARIABLE STRUCTURE


Any medical research study consists of a set of variables, such as gestational age, birth weight, body mass index (BMI), gender, Bishop score, blood pressure (BP), cholesterol level, severity of disease, dosage level, type of treatment, or type of operation. For statistical purposes, it is important to characterize each variable in two ways: (1) level of measurement and (2) variable structure.





EXPERIMENTAL/RESEARCH DESIGN


The three core principles of experimental design are (1) randomization, (2) replication, and (3) control or blocking. Random selection of subjects from the study population or random assignment of subjects to treatment groups is necessary to avoid bias in the results. A random sample of size n from a population is a sample whose likelihood of having been selected is the same as any other sample of size n. Such a sample is said to be representative of the population. See any standard statistics text for a discussion of how to randomize.2,3


Replication refers to the measurement of subjects under the same condition. This is necessary to determine how the measurements vary for that condition. So, the measurements of several subjects receiving the same treatment would be considered replications.


Control, or blocking, is a technique to adjust an analysis for another variable (sometimes called a confounding variable or covariate). The variability associated with such a variable is accounted for in the analysis in a way that increases the precision of the comparisons of interest. See the discussion of the randomized complete block design for an example.



Completely Randomized Design (CRD)


In the CRD, subjects are randomly assigned to experimental groups. As an example, suppose that the thickness of the endometrium is observed via ultrasound for women randomly assigned to three different fertility treatments (treatments T1, T2, and T3). Twenty-seven subjects are available for study, so 9 subjects are randomly selected to receive treatment T1, 9 are randomly selected from the remaining 18 to receive treatment T2, and the remaining 9 receive treatment T3. The data layout appears in Table 9-1.
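The assignment scheme just described can be sketched in a few lines. The subject labels here are hypothetical; shuffling the roster once and slicing it into thirds is equivalent to drawing 9 subjects at random for T1, then 9 of the remaining 18 for T2, and so on.

```python
# Sketch of the CRD assignment: 27 subjects randomized into three groups of 9.
import random

random.seed(1)

subjects = [f"S{i:02d}" for i in range(1, 28)]   # hypothetical IDs S01..S27
random.shuffle(subjects)                          # random order = random assignment

groups = {
    "T1": subjects[0:9],     # first 9 shuffled subjects receive treatment T1
    "T2": subjects[9:18],    # next 9 receive treatment T2
    "T3": subjects[18:27],   # remaining 9 receive treatment T3
}

for treatment, members in groups.items():
    print(treatment, sorted(members))
```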



In addition to randomness, an important assumption of the CRD is that the samples are independent (i.e., the outcome measurement—in this case, endometrial thickness—for one subject is not related to that of any other subject). For instance, two related subjects (e.g., sisters) should not both be included in the data set because they are genetically related and hence their endometrial thicknesses may be correlated.


The principles of the CRD are the same for a survey, where subjects form groups in a natural way. For instance, in comparing the mean cholesterol level among first-, second-, and third-trimester pregnant women, a random sample of all pregnant women can be obtained, and then each woman can be naturally classified as being in the first, second, or third trimester. In this case, the sample size for each group will likely not be the same. Alternatively, random samples of 10 first-trimester, 10 second-trimester, and 10 third-trimester women can be obtained (stratified sampling).
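The contrast between the two sampling schemes can be made concrete. The roster of women and their trimesters below is invented; the point is that simple random sampling lets the group sizes fall where they may, while stratified sampling fixes them.

```python
# Sketch: simple random sampling vs. stratified sampling from an invented
# roster of pregnant women labeled by trimester (1, 2, or 3).
import random

random.seed(7)

frame = [(i, random.choice([1, 2, 3])) for i in range(300)]  # hypothetical frame

# Scheme 1: simple random sample of 30; trimester group sizes are random.
srs = random.sample(frame, 30)
sizes = {t: sum(1 for _, tri in srs if tri == t) for t in (1, 2, 3)}
print("simple random sample group sizes:", sizes)

# Scheme 2: stratified sampling -- exactly 10 women from each trimester.
stratified = []
for t in (1, 2, 3):
    stratum = [rec for rec in frame if rec[1] == t]
    stratified.extend(random.sample(stratum, 10))
print("stratified sample size per trimester:",
      {t: sum(1 for _, tri in stratified if tri == t) for t in (1, 2, 3)})
```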


Note in these examples that the principles of randomization and replication are used. The following design is used for controlling or blocking.






DESCRIPTIVE STATISTICAL METHODS


After the survey has been taken, the study has been carried out, or the experiment has been performed, then the data are obtained and entered into a spreadsheet or other usable form. After the data entry and data error checking have been completed, the first statistical analyses are performed; these are most often descriptive analyses—organizing and summarizing the data in the sample. Many different statistical methods are used to help organize and summarize the data. The descriptive statistical methods used for continuous variables differ from those used for discrete variables.



Continuous Variables


The principal descriptive features of continuous variables are (1) central location and (2) variability (or dispersion).


An important numerical measure of the centrality of a continuous variable is the mean:



$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$



where $n$ = number of observations and $x_i$ = the $i$th measurement in the sample.


Alternative measures of centrality are the median (middle measurement, or average of the two middle measurements if n is even, in the list of measurements rank-ordered from smallest to largest) and the mode (most frequent measurement).


A simple numerical measure of the variability or dispersion of a continuous variable is the range: the largest measurement minus the smallest measurement. The larger the variability in the measurements, the larger the range will be. A more common numerical measure of variability is the variance:



$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$



Note that the variance is approximately the average of the squared deviations (or distances) between each measurement and the mean. The more dispersed the measurements are, the larger $s^2$ is. Because $s^2$ is measured in the square of the units of the actual measurements, the standard deviation is often preferred as the measure of variability:



$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$


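All of the centrality and variability measures above are available in Python's standard-library `statistics` module. The seven measurements below are made up for illustration.

```python
# Descriptive statistics for a small hypothetical sample (n = 7).
import statistics

x = [4.1, 4.5, 4.5, 5.0, 5.2, 5.8, 6.3]

mean = statistics.mean(x)        # sum(x) / n
median = statistics.median(x)    # middle value of the sorted list
mode = statistics.mode(x)        # most frequent value (4.5 here)
rng = max(x) - min(x)            # range: largest minus smallest
var = statistics.variance(x)     # s^2, with n - 1 in the denominator
sd = statistics.stdev(x)         # s = sqrt(s^2)

print(f"mean={mean:.3f} median={median} mode={mode} "
      f"range={rng:.1f} variance={var:.3f} sd={sd:.3f}")
```

Note that `statistics.variance` uses the $n-1$ (sample) denominator, matching the formula above; `statistics.pvariance` would give the population version with denominator $n$.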

Also of interest for a continuous variable is its distribution (i.e., a representation of the frequency of each measurement or intervals of measurements). There are many forms for the graphical display of the distribution of a continuous variable, such as a stem-and-leaf plot, boxplot, or bar chart. From such a display, the central location, dispersion, and indication of “rare” measurements (low frequency) and “common” measurements (high frequency) can be identified visually.2


It is often of interest to know if two continuous variables are linearly related to one another. The Pearson correlation coefficient, r, is used to determine this. The values of r range from –1 to +1. If r is close to –1, then the two variables have a strong negative correlation (i.e., as the value of one variable goes up, the value of the other tends to go down [consider “number of years in practice after residency” and “risk of errors in surgery”]). If r is close to +1, then the two variables are positively correlated (e.g., “fetal humeral soft tissue thickness” and “gestational age”). If r is close to zero, then the two variables are not linearly related. One set of guidelines for interpreting r in medical studies is: |r| > 0.5 ⇒ “strong linear relationship,” 0.3 < |r| ≤ 0.5 ⇒ “moderate linear relationship,” 0.1 < |r| ≤ 0.3 ⇒ “weak linear relationship,” and |r| ≤ 0.1 ⇒ “no linear relationship.”9,10
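The Pearson coefficient is straightforward to compute from its standard formula. The gestational-age and HSTT pairs below are invented to show a clear positive trend.

```python
# Pearson correlation coefficient r from its standard formula (no libraries).
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # co-deviation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical gestational age (weeks) vs. fetal HSTT (mm) pairs.
ga   = [16, 18, 20, 22, 24, 26]
hstt = [8.0, 9.1, 9.8, 11.2, 11.9, 13.1]

r = pearson_r(ga, hstt)
print(f"r = {r:.3f}")   # close to +1: a strong positive linear relationship
```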



Discrete Variables


For summarization of a discrete variable, a frequency table is used: for each discrete value of the variable, its frequency (how many times it occurs in the sample of, say, n observations) and relative frequency (frequency/n) are recorded. Of course, the sum of the frequencies over all of the discrete levels of the variable must be n, and the sum of the relative frequencies must be 1.0 (100%).
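A frequency table of this kind is easy to build with `collections.Counter`. The sample of severity categories below is hypothetical.

```python
# Frequency and relative-frequency table for a discrete variable.
from collections import Counter

sample = ["mild", "mild", "moderate", "severe", "mild",
          "moderate", "mild", "severe", "moderate", "mild"]
n = len(sample)

freq = Counter(sample)
print(f"{'level':<10}{'frequency':>10}{'relative':>10}")
for level, count in freq.most_common():
    print(f"{level:<10}{count:>10}{count / n:>10.2f}")

# Checks from the text: frequencies sum to n, relative frequencies to 1.0.
assert sum(freq.values()) == n
assert abs(sum(c / n for c in freq.values()) - 1.0) < 1e-9
```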


For two discrete variables the data are summarized in a contingency table. A two-way contingency table is a cross-classification of subjects according to the levels of each of the two discrete variables. An example of a contingency table is given in Table 9-4.



Here, there are a + b + c + d subjects in the sample; there are “a” subjects under the standard treatment who died, “b” subjects under the standard treatment who survived, and so on. If it is desired to know if two discrete variables are associated with one another, then a measure of association must be used. Which measure of association would be appropriate for a given situation depends on whether the variables are nominal or ordinal or a mixture.11,12


In medical research, two of the important measures that characterize the relationship between two discrete variables are the risk ratio and the odds ratio. The risk of an event, p, is simply the probability or likelihood that the event occurs. The odds of an event is defined as p/(1–p) and is an alternative measure of how often an event occurs. The risk ratio (sometimes called relative risk) and odds ratio are simply ratios of the risks and odds, respectively, of an event for two different conditions. In terms of the contingency table given in Table 9-4, these terms are defined as follows:

$$\text{risk ratio} = \frac{a/(a+b)}{c/(c+d)}, \qquad \text{odds ratio} = \frac{a/b}{c/d} = \frac{ad}{bc}$$




Note that a risk ratio of 4.0 means that the risk of death under the standard treatment is four times that under the new treatment. An odds ratio of 4.0 means that the odds of death are four times higher under the standard treatment than under the new treatment.


Another important measure in clinical practice is the risk difference, RD, the absolute difference between the risk of death under the standard treatment and the risk of death under the new treatment:



$$\text{RD} = \frac{a}{a+b} - \frac{c}{c+d}$$



The inverse of the risk difference is the number needed to treat, NNT:



$$\text{NNT} = \frac{1}{\text{RD}}$$



NNT is an estimate of how many subjects would need to receive the new treatment before there would be one fewer (or one additional) death, as compared to the standard treatment.


For example, in a study by Marcoux and colleagues13 341 women with endometriosis-associated subfertility were randomized into two groups, laparoscopic ablation and laparoscopy alone. The study outcome was “pregnancy > 20 weeks’ gestation.” Of the 172 women in the first group, 50 became pregnant; of the 169 women in the second group, 29 became pregnant (a = 50, b = 122, c = 29, and d = 140). The risk of pregnancy is 0.291 in the first group and 0.172 in the second group; the corresponding odds of pregnancy are 0.410 and 0.207. The risk ratio is 1.69 and the odds ratio is 1.98. The likelihood of pregnancy is 69% higher after ablation than after laparoscopy alone; the odds of pregnancy are about doubled after ablation compared to laparoscopy alone. The risk difference is



$$\text{RD} = 0.291 - 0.172 = 0.119$$



and number needed to treat is



$$\text{NNT} = \frac{1}{0.119} \approx 8.4$$



Rounding upward, approximately 9 women must undergo ablation to achieve 1 additional pregnancy.
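All of the quantities in this worked example follow directly from the 2×2 counts. The short script below recomputes them from a = 50, b = 122, c = 29, and d = 140 using the definitions in this section.

```python
# Reproducing the Marcoux et al. calculations from the 2x2 counts.
import math

a, b, c, d = 50, 122, 29, 140

risk1 = a / (a + b)          # risk of pregnancy, ablation group
risk2 = c / (c + d)          # risk of pregnancy, laparoscopy-alone group
odds1 = risk1 / (1 - risk1)  # = a / b
odds2 = risk2 / (1 - risk2)  # = c / d

rr  = risk1 / risk2          # risk ratio
orr = odds1 / odds2          # odds ratio (= a*d / (b*c))
rd  = risk1 - risk2          # risk difference
nnt = 1 / rd                 # number needed to treat

print(f"risks: {risk1:.3f}, {risk2:.3f}  odds: {odds1:.3f}, {odds2:.3f}")
print(f"RR={rr:.2f}  OR={orr:.2f}  RD={rd:.3f}  "
      f"NNT={nnt:.1f} -> round up to {math.ceil(nnt)}")
```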

Aug 27, 2016 | Posted in UROLOGY