1 00:00:09,142 --> 00:00:13,079 I'm a research professor at the Duke Clinical Research Institute. 2 00:00:13,079 --> 00:00:18,184 And I've been doing patient-reported outcomes research for about 15 years or so. 3 00:00:18,184 --> 00:00:23,723 And I'm very happy to be invited to speak to you tonight about it. 4 00:00:23,723 --> 00:00:28,428 This is actually a great time to be talking about this because 5 00:00:28,428 --> 00:00:32,766 there are so many exciting developments in patient-centered research these days. 6 00:00:32,766 --> 00:00:34,934 And so, we've got this lecture. 7 00:00:34,934 --> 00:00:39,572 And I think our -- the next lecture will also be about self-report 8 00:00:39,572 --> 00:00:40,707 approaches to research. 9 00:00:40,707 --> 00:00:44,944 So, I'd like to go over a few different things today. 10 00:00:44,944 --> 00:00:49,983 It's a huge, huge field, but I'll try to pick out what I think 11 00:00:49,983 --> 00:00:55,355 are some of the most basic issues that would be useful for everyone to know. 12 00:00:55,355 --> 00:00:58,224 First of all, why measure patient-reported health status? 13 00:00:59,492 --> 00:01:02,796 What are the different types of patient-reported outcomes? 14 00:01:02,796 --> 00:01:06,966 When I say PROs, it's going to be patient-reported outcomes. 15 00:01:06,966 --> 00:01:10,270 How are patient-reported outcome measures developed and evaluated? 16 00:01:10,270 --> 00:01:14,007 Just very briefly, I want to go over that. 17 00:01:14,007 --> 00:01:18,578 We want to talk about some "new" methods for doing patient-reported 18 00:01:18,578 --> 00:01:21,081 outcomes research using item response theory. 19 00:01:21,081 --> 00:01:26,086 And I put new in quotes because it's new for health measurement, 20 00:01:26,086 --> 00:01:29,389 but it's not new for measurement in general. 21 00:01:29,389 --> 00:01:33,226 Education testing folks have been doing this for a long time. 
22 00:01:33,226 --> 00:01:37,897 So, we're a little late to the game, but there are some exciting developments. 23 00:01:37,897 --> 00:01:39,899 So, we want to mention those. 24 00:01:39,899 --> 00:01:40,900 And then finally, 25 00:01:40,900 --> 00:01:45,905 I want to say a word about interpreting scores that you'd get out of patient-reported 26 00:01:45,905 --> 00:01:47,073 outcome measures. Okay? 27 00:01:47,073 --> 00:01:49,242 So, let's start with the first. 28 00:01:49,242 --> 00:01:51,811 Why measure patient-reported health status? 29 00:01:51,811 --> 00:01:55,215 Well, let's think very generally about a situation where 30 00:01:55,215 --> 00:01:59,419 we're interested in trying to measure the benefits of some intervention. 31 00:01:59,419 --> 00:02:02,088 And so, how do we assess benefit? 32 00:02:02,088 --> 00:02:07,026 If the intervention is the kind of intervention that's supposed to act on 33 00:02:07,026 --> 00:02:12,365 the body, it's a pill or some procedure, we've got all sorts of measures 34 00:02:12,365 --> 00:02:17,737 that we use to assess the effect of some agent on the body, right? 35 00:02:17,737 --> 00:02:23,910 We've got copies of some virus in the blood, tumor size, blood pressure, peak VO2. 36 00:02:23,910 --> 00:02:28,448 These are all examples of these endpoints we use in studies 37 00:02:28,448 --> 00:02:32,752 that tell us how the body has changed, all right? 38 00:02:32,752 --> 00:02:38,324 But for endpoints to really matter, to have endpoints that will inform decisions, 39 00:02:38,324 --> 00:02:42,862 they've got to be endpoints that are relevant to the stakeholders 40 00:02:42,862 --> 00:02:45,331 here, chiefly patients, clinicians, and payers. 41 00:02:45,665 --> 00:02:51,371 And there are probably even more stakeholders, but those are sort of the big 42 00:02:51,371 --> 00:02:51,971 three. 
43 00:02:51,971 --> 00:02:57,844 And it's not always the case that some of those endpoints that we use 44 00:02:57,844 --> 00:03:03,116 to measure the exact effect on the body translate well into day-to-day functioning. 45 00:03:03,116 --> 00:03:05,952 So, let's take peak VO2, all right? 46 00:03:05,952 --> 00:03:12,859 Peak VO2 is a measurement that's used in a lot of trials for COPD or heart failure. 47 00:03:12,859 --> 00:03:18,131 And one might ask, "Well, how does someone's peak VO2 relate to their 48 00:03:18,131 --> 00:03:23,970 -- the extent of physical functioning that they can perform during their day?" Right? 49 00:03:23,970 --> 00:03:26,139 Well, across a few different studies, 50 00:03:26,139 --> 00:03:31,444 we see that it's a -- there's a moderate size correlation in what our group did. 51 00:03:31,444 --> 00:03:35,448 The correlation between peak VO2 and PROMIS Physical Function scores is 0.53. 52 00:03:35,448 --> 00:03:41,087 Now, if you want to get a little geekier on the statistics, you know, that tells you 53 00:03:41,087 --> 00:03:43,756 that there's about 72 percent of the variance 54 00:03:43,756 --> 00:03:46,759 in physical functioning that's not accounted for by peak 55 00:03:55,401 --> 00:03:56,169 VO2, right? 56 00:03:56,169 --> 00:04:01,241 So, it's still the case that the majority of the differences among people 57 00:04:01,241 --> 00:04:06,312 in their day-to-day physical functioning is not explained by their peak VO2, right? 58 00:04:06,312 --> 00:04:10,583 They are used for making an assessment of their physical functioning, 59 00:04:10,583 --> 00:04:17,223 because we know that just measuring peak VO2 is not going to give us the best indication 60 00:04:17,223 --> 00:04:19,158 of how this new treatment 61 00:04:19,158 --> 00:04:24,130 we're testing is actually going to change their day-to-day lives, all right? 62 00:04:24,130 --> 00:04:27,267 Another quick example, overactive bladder syndrome, all right? 
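[Editor's note on the peak VO2 figure above] The "72 percent" follows directly from the quoted correlation of 0.53: squaring a correlation gives the share of variance the two measures have in common, and the remainder is what is left unexplained. A minimal sketch of that arithmetic is below; the function name is hypothetical, and only the 0.53 comes from the talk.

```python
# Illustrative arithmetic only; r = 0.53 is the correlation quoted in the talk.

def variance_unexplained(r: float) -> float:
    """Share of variance in one variable NOT accounted for by the other."""
    return 1.0 - r ** 2

r = 0.53
explained = r ** 2                      # shared variance (r squared)
unexplained = variance_unexplained(r)   # the remainder

print(f"explained: {explained:.0%}, unexplained: {unexplained:.0%}")
```

So even a "moderate" correlation leaves most of the person-to-person differences in physical functioning unaccounted for by peak VO2.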
63 00:04:27,267 --> 00:04:30,970 So, for a lot of studies on overactive bladder syndrome, 64 00:04:30,970 --> 00:04:36,142 one of the things they'll use to measure volume is pad weight, all right? 65 00:04:36,142 --> 00:04:39,846 They'll have people bring in their pads in Ziplock bags 66 00:04:39,846 --> 00:04:44,284 or whatever, and actually measure the volume that has leaked, all right? 67 00:04:44,284 --> 00:04:48,721 Well, one might ask, "Well, what is a meaningful reduction in volume?" 68 00:04:49,088 --> 00:04:53,059 I might have an intervention that is reducing the volume to some degree. 69 00:04:53,059 --> 00:04:56,429 But for a patient, what constitutes a meaningful reduction in volume? 70 00:04:56,429 --> 00:05:00,700 It might be the case that, you know, "Unless you can reduce my volume 71 00:05:00,700 --> 00:05:05,605 to the point where I don't have to go out wearing a pad, I don't care. 72 00:05:05,605 --> 00:05:08,341 You might be very excited that it's a statistically 73 00:05:08,341 --> 00:05:12,312 significant reduction of 30 percent volume, but it's not particularly relevant to me, 74 00:05:12,312 --> 00:05:16,249 because it's not going to drive what I'm doing so much." Right? 75 00:05:16,249 --> 00:05:18,284 So, it's not always straightforward 76 00:05:18,284 --> 00:05:22,755 how to assess benefit in terms of sort of these harder outcomes. 77 00:05:22,755 --> 00:05:27,193 And we want to turn to actually querying patients to find out 78 00:05:27,193 --> 00:05:31,864 what effect some of these treatments are having on their day-to-day lives. 79 00:05:31,864 --> 00:05:37,236 So, when we talk about treatment benefit, you know, the FDA is quite explicit 80 00:05:37,236 --> 00:05:41,708 in their definition of benefit in terms of feeling, function, and survival. 
81 00:05:42,208 --> 00:05:47,146 And when we're talking about feeling and functioning, we're really in the domain 82 00:05:47,146 --> 00:05:51,918 now of trying to figure out what's happening in people's day-to-day lives. 83 00:05:51,918 --> 00:05:57,457 And the best way to assess that oftentimes is through a patient-reported outcome measure. 84 00:05:57,457 --> 00:06:01,260 And we'll stick with the FDA for a second here, 85 00:06:01,260 --> 00:06:04,697 and they've got a very good definition -- whoops. 86 00:06:04,697 --> 00:06:05,832 Sorry about that. 87 00:06:07,467 --> 00:06:08,067 No signal. 88 00:06:08,067 --> 00:06:08,668 I'm sorry. 89 00:06:08,668 --> 00:06:11,070 Oh, here we go. 90 00:06:14,774 --> 00:06:16,275 Hopefully, this will work. 91 00:06:16,275 --> 00:06:18,978 Because if it doesn't, I'll have a very awkward 92 00:06:18,978 --> 00:06:21,080 interpretive dance version of this entire talk. 93 00:06:21,080 --> 00:06:23,483 Okay. Here we go. 94 00:06:24,350 --> 00:06:29,922 So, the FDA defines a patient-reported outcome as "a measurement based on a report 95 00:06:29,922 --> 00:06:34,694 that comes directly from the patient about the status of a patient's 96 00:06:34,694 --> 00:06:36,295 health condition without amendment 97 00:06:36,295 --> 00:06:41,467 or interpretation of the patient's response by a clinician or anyone else." Okay? 98 00:06:41,467 --> 00:06:46,639 And we're talking here about all sorts of things that match this definition. 99 00:06:46,639 --> 00:06:50,610 It could be symptom diaries. It could be multi-item surveys. 100 00:06:50,610 --> 00:06:54,981 It could be in person interviews where responses are being recorded. 101 00:06:55,314 --> 00:07:00,953 The data can be captured electronically, paper and pencil, on a phone, all right? 102 00:07:00,953 --> 00:07:04,390 All these things could be patient-reported outcome measures. 
103 00:07:04,390 --> 00:07:11,030 And just to throw up an example of one, here's the NIH PROMIS fatigue short form, 104 00:07:11,030 --> 00:07:16,269 right, where we're asking multiple items to try to get at someone's fatigue. 105 00:07:16,269 --> 00:07:21,107 And the responses to those items will be combined in some way 106 00:07:21,107 --> 00:07:25,545 to arrive at a single score that reflects their self-reported fatigue. 107 00:07:25,778 --> 00:07:27,113 All right? Okay. 108 00:07:27,113 --> 00:07:32,752 So, there we have the sort of, why are we interested in patient-reported outcomes? 109 00:07:32,752 --> 00:07:33,186 Okay? 110 00:07:33,186 --> 00:07:36,789 Let's talk about the different types of patient-reported outcomes. 111 00:07:36,789 --> 00:07:42,495 And we're going to walk through a model that I've adapted from a wonderful article 112 00:07:42,495 --> 00:07:44,030 by Wilson and Cleary. 113 00:07:44,030 --> 00:07:46,699 It was published in JAMA in '95. 114 00:07:46,699 --> 00:07:51,270 Their terminology is a little bit older, but it still is one 115 00:07:51,270 --> 00:07:53,539 that sort of drives our field. 116 00:07:54,540 --> 00:07:59,045 So, we begin with these biological and physiological variables. 117 00:07:59,045 --> 00:08:03,015 And we've talked a bit about these, right? 118 00:08:03,015 --> 00:08:07,487 So, things like CD4 count, HbA1c, or tumor size. 119 00:08:07,487 --> 00:08:11,290 Again, these are things that are measuring how 120 00:08:11,624 --> 00:08:17,430 the state of the body or how the body has changed, right? 121 00:08:17,430 --> 00:08:22,134 Those bodily changes give rise to different symptoms, right? 122 00:08:22,134 --> 00:08:26,873 And so, symptom status like pain intensity, nausea, urinary straining. 123 00:08:27,206 --> 00:08:32,512 Symptom status is the first time we start to see a potential 124 00:08:32,512 --> 00:08:35,181 for a patient-reported outcome here, right? 
125 00:08:37,216 --> 00:08:40,686 Changes in symptoms can affect the person's day-to-day functioning. 126 00:08:40,686 --> 00:08:46,526 And so, functional status then is sort of the next step in the causal chain. 127 00:08:46,526 --> 00:08:49,595 It might include things like walking or what 128 00:08:49,595 --> 00:08:54,267 people refer to as activities of daily living, or social interactions, right? 129 00:08:54,267 --> 00:08:59,305 There are all sorts of ways in which the person needs to function 130 00:08:59,305 --> 00:09:03,175 during the day that could be compromised by certain symptoms, 131 00:09:04,210 --> 00:09:04,644 right? 132 00:09:04,644 --> 00:09:08,548 Now, the person's symptoms and their functional status together 133 00:09:08,548 --> 00:09:13,286 might inform how the person sort of views their overall health. 134 00:09:13,286 --> 00:09:18,057 And so, general health perceptions is the next type of patient-reported 135 00:09:18,057 --> 00:09:20,192 outcome in the chain here. 136 00:09:20,192 --> 00:09:23,229 And sometimes we assess that very generally 137 00:09:23,229 --> 00:09:28,000 with a simple question like one from PROMIS or the SF-36. 138 00:09:28,000 --> 00:09:34,206 "In general, would you say your health is excellent, very good, good, fair, poor?" 139 00:09:34,206 --> 00:09:35,141 All right? 140 00:09:35,141 --> 00:09:39,345 It doesn't seem like a particularly exciting item, but 141 00:09:39,345 --> 00:09:43,583 it's amazing that responses to this item are independent 142 00:09:43,583 --> 00:09:47,787 predictors of mortality after controlling for other clinical characteristics. 143 00:09:47,787 --> 00:09:52,692 So, it's a good kind of a catchall type of item. 144 00:09:53,159 --> 00:09:54,327 People's overall health 145 00:09:54,327 --> 00:09:59,432 then informs their evaluations of their overall quality of life, right? 
146 00:09:59,432 --> 00:10:02,568 And so, we might ask something like, "Overall, 147 00:10:02,568 --> 00:10:07,607 how satisfied are you with your life as a whole these days?" Right? 148 00:10:07,607 --> 00:10:12,678 So, much broader question, and it encompasses much more than just their health. 149 00:10:12,678 --> 00:10:18,117 And so, we've got a lot of these nonmedical factors, financial, you know, socioeconomic 150 00:10:18,117 --> 00:10:21,621 considerations that are informing one's overall quality of life. 151 00:10:21,621 --> 00:10:27,460 But certainly, one's health related quality of life could be a major determinant of that. 152 00:10:27,960 --> 00:10:28,661 Okay? 153 00:10:28,661 --> 00:10:35,701 So, we've got this sort of chain, this causal chain of patient-reported concepts here. 154 00:10:35,701 --> 00:10:39,605 Let's talk a little bit more about these. 155 00:10:39,605 --> 00:10:46,412 One of the things we know is that any of these things, symptom status, 156 00:10:46,412 --> 00:10:51,751 the functional status, my overall health perceptions, any of those things 157 00:10:51,751 --> 00:10:56,122 could be influenced by other characteristics about the person. 158 00:10:56,489 --> 00:10:59,959 So, it's not just what's happened to my body. 159 00:10:59,959 --> 00:11:04,630 It's not just the drug I've got, or the disease I have. 160 00:11:04,630 --> 00:11:07,333 But there are individual characteristics, all right? 161 00:11:07,333 --> 00:11:09,301 Any clinician here knows that 162 00:11:09,301 --> 00:11:13,939 certain patients seem to have personalities that amplify the experience of pain. 163 00:11:13,939 --> 00:11:19,378 And so, you have two patients with the exact same structural or physiological problem 164 00:11:19,378 --> 00:11:22,181 who have very different pain experiences, right? 165 00:11:22,181 --> 00:11:23,883 So, there are characteristics 166 00:11:23,883 --> 00:11:28,554 of the individual that can also influence all these different things. 
167 00:11:28,554 --> 00:11:34,093 And I've drawn thicker arrows there going from left to right to indicate 168 00:11:34,093 --> 00:11:37,063 the general understanding that these other characteristics 169 00:11:37,063 --> 00:11:42,168 of the person have an increasing effect as we go out here. 170 00:11:42,168 --> 00:11:48,974 So, the individual state of the body is probably not going to be as strong a determinant 171 00:11:48,974 --> 00:11:54,513 of one's overall quality of life as other characteristics of the person, right? 172 00:11:54,547 --> 00:11:59,452 Similarly, not only are there interesting characteristics about the individual 173 00:11:59,452 --> 00:12:00,419 that influence 174 00:12:00,419 --> 00:12:04,824 their symptom status, functional status, their general health perceptions, 175 00:12:04,824 --> 00:12:10,229 but there are also characteristics of the person's environment, all right? 176 00:12:10,229 --> 00:12:15,601 If you think about how much your functional status is affected 177 00:12:15,601 --> 00:12:20,506 by the physical environment you're in, it's quite profound, right? 178 00:12:20,506 --> 00:12:23,442 What types of supports are there? 179 00:12:23,442 --> 00:12:25,144 We've got some right here, right? 180 00:12:25,144 --> 00:12:29,081 If I have a little bit of trouble walking, these rails are wonderful, right? 181 00:12:29,081 --> 00:12:34,153 And I might report when I got up to the top that I don't have that much difficulty 182 00:12:34,153 --> 00:12:36,122 going upstairs in the environment I'm in. 183 00:12:36,122 --> 00:12:40,326 But if you put me in an environment where you didn't have these awesome rails, 184 00:12:40,326 --> 00:12:41,460 I might be reporting 185 00:12:41,460 --> 00:12:43,996 that I have a bit more difficulty walking upstairs, 186 00:12:43,996 --> 00:12:46,999 because of a difference in the environment I'm in, right? 
187 00:12:46,999 --> 00:12:53,539 So, the environment can play a large role in the person's experience of their symptoms, 188 00:12:53,539 --> 00:12:57,476 the impact on their functioning, their general health perceptions. 189 00:12:57,476 --> 00:13:02,314 And again, that influence might be increasing as we go from 190 00:13:02,314 --> 00:13:06,352 left to right in the -- these concepts here. 191 00:13:06,352 --> 00:13:06,919 Okay? 192 00:13:06,919 --> 00:13:11,490 I've talked -- I've alluded to this a little bit, 193 00:13:11,490 --> 00:13:14,560 but there's an important point here -- there's actually two important points here. 194 00:13:15,394 --> 00:13:20,232 One of the important points is that we see a dilution of effects 195 00:13:20,232 --> 00:13:24,670 of biological interventions as we go from left to right, all right? 196 00:13:24,670 --> 00:13:30,209 So, if I have an intervention that is to change the state of the body, 197 00:13:30,209 --> 00:13:31,710 it's going to change 198 00:13:31,710 --> 00:13:36,882 some balance of chemicals in the body in order to improve the disease, right? 199 00:13:37,383 --> 00:13:41,587 Or if it's a surgical intervention, it's going to take something out that's 200 00:13:41,587 --> 00:13:44,824 causing a problem, right? Those are changes on the body. 201 00:13:44,824 --> 00:13:48,394 We're going to try to observe the effects of that intervention. 202 00:13:48,394 --> 00:13:53,599 But one of the things implied by this model is that the effect of that intervention 203 00:13:53,599 --> 00:13:57,803 will be seen most acutely in those things that I measured in terms 204 00:13:57,803 --> 00:14:02,074 of their -- of the biology or physiology of the body, right? 205 00:14:02,074 --> 00:14:04,310 Next, it will be symptoms. 206 00:14:04,310 --> 00:14:06,579 Next, it will be functioning. 
207 00:14:06,579 --> 00:14:12,885 But because there are a lot of other things, other characteristics of the individual, 208 00:14:12,885 --> 00:14:17,857 other characteristics of their environment that contribute to the symptom experience, 209 00:14:17,857 --> 00:14:18,757 functional status, 210 00:14:18,757 --> 00:14:23,729 general health perceptions, that effect of that biological intervention gets diluted. 211 00:14:23,729 --> 00:14:29,602 And so, it's harder for me to show an effect of my intervention 212 00:14:29,602 --> 00:14:34,006 when I'm measuring things further out here, all right? 213 00:14:34,006 --> 00:14:35,941 It's harder to show an effect 214 00:14:35,941 --> 00:14:40,145 when I'm measuring things further out, because so much more contributes to it. 215 00:14:40,145 --> 00:14:42,715 It's just a lot messier. It's more complex. 216 00:14:42,715 --> 00:14:47,887 At the same time, though, for many patients, those are the things they are interested in. 217 00:14:47,887 --> 00:14:53,058 They want to know how this is going to improve my functional status during the day. 218 00:14:53,425 --> 00:14:56,028 That would be really important to show that. 219 00:14:56,028 --> 00:14:59,965 But we're saying that it's going to probably be a little bit 220 00:14:59,965 --> 00:15:04,536 more difficult to show that than it would be to show symptom improvement, right? 221 00:15:04,536 --> 00:15:08,140 Or improvement in some blood levels of some relevant chemicals, right? 222 00:15:08,140 --> 00:15:08,440 Okay. 223 00:15:08,440 --> 00:15:13,045 So, dilution of effects of biological interventions as we move from left to right. 224 00:15:13,045 --> 00:15:15,314 That's one of the implications of this. 225 00:15:15,648 --> 00:15:19,018 And another one that shouldn't be too surprising 226 00:15:19,018 --> 00:15:22,788 is that the correlation between successive boxes decreases, right? 
227 00:15:22,788 --> 00:15:27,393 So, the correlation between the biological and physical -- physiological variables 228 00:15:27,393 --> 00:15:31,997 and symptom status is much higher than the correlation between biological 229 00:15:31,997 --> 00:15:35,768 and physiological variables and general health perceptions, right? Okay? 230 00:15:35,768 --> 00:15:39,571 So, those are also things to be aware of. 231 00:15:39,571 --> 00:15:42,074 And again, and why is that? 232 00:15:42,241 --> 00:15:48,213 It's because the -- all these other things are affecting the chain as we go down. 233 00:15:48,213 --> 00:15:53,052 So, the further out you are, the more of those things are affecting 234 00:15:53,052 --> 00:15:54,186 the boxes differently. 235 00:15:54,186 --> 00:15:54,720 Okay? 236 00:15:54,720 --> 00:15:59,758 So, this is a helpful overall model, I think, to try to characterize 237 00:15:59,758 --> 00:16:04,229 the main types of patient-reported outcomes that we might be interested in, 238 00:16:04,229 --> 00:16:10,202 and to plan our study and really ask, "Well, what do we want to measure here? 239 00:16:10,436 --> 00:16:12,538 Are we interested just in symptoms? 240 00:16:12,538 --> 00:16:15,007 Are we interested in impact on functioning? 241 00:16:15,007 --> 00:16:19,578 Are we interested in how the person thinks about their overall health?" Right? 242 00:16:19,578 --> 00:16:25,884 And we know now that if we are interested in things that are further out to the right, 243 00:16:25,884 --> 00:16:30,456 we might have to work a little bit harder to show that effect. 244 00:16:30,456 --> 00:16:34,693 We might have to do things like, for example, try to measure 245 00:16:34,693 --> 00:16:37,162 some of these characteristics in the environment 246 00:16:37,162 --> 00:16:41,000 and characteristics of the individual and include those in our study 247 00:16:41,000 --> 00:16:46,638 as a way of trying to improve our measurement of our treatment effect, all right? 
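[Editor's note on dilution along the chain] One way to see why effects fade toward the right-hand boxes: if each box were driven only by the box before it plus unrelated influences (personal and environmental), then in a simple linear chain the correlation between the first box and any later box would be the product of the adjacent correlations. The sketch below assumes exactly that simplified chain, and the adjacent correlation of 0.6 is a made-up number for illustration, not a value from the talk or from the Wilson and Cleary article.

```python
from itertools import accumulate
from operator import mul

# Boxes of the chain as named in the talk, left to right.
boxes = [
    "biological/physiological variables",
    "symptom status",
    "functional status",
    "general health perceptions",
    "overall quality of life",
]

# HYPOTHETICAL correlations between each pair of neighbouring boxes.
adjacent_r = [0.6, 0.6, 0.6, 0.6]

# Under the simple-chain assumption, the correlation between the first box
# and each later box is the running product of the adjacent correlations.
implied_r = list(accumulate(adjacent_r, mul))

for name, r in zip(boxes[1:], implied_r):
    print(f"{boxes[0]} vs {name}: r = {r:.2f}")
```

With these assumed inputs the implied correlation shrinks at every step, which is the pattern the model predicts: the further right the outcome, the more of its variation comes from somewhere other than the body.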
248 00:16:46,638 --> 00:16:47,172 Okay. 249 00:16:47,172 --> 00:16:47,906 All right. 250 00:16:47,906 --> 00:16:51,677 So, that was one of the types of different PROs. 251 00:16:51,677 --> 00:16:56,181 Let's go through a quick review of developing and evaluating PRO measures. 252 00:16:56,181 --> 00:16:59,785 And I don't want to spend too long on this. 253 00:16:59,785 --> 00:17:04,857 And I know in our next lecture there will be some similar material covered. 254 00:17:04,857 --> 00:17:10,963 But I do want to give you an appreciation of the overall shape of how this goes. 255 00:17:11,130 --> 00:17:12,231 Okay? All right. 256 00:17:12,231 --> 00:17:16,368 So, the first thing we're going to do is to determine 257 00:17:16,368 --> 00:17:21,774 what PRO concept we want to measure and why we want to measure it. 258 00:17:21,774 --> 00:17:25,711 And let's just take the example of fatigue, all right? 259 00:17:25,711 --> 00:17:29,982 So, a bunch of investigators are sitting around, "We think we need 260 00:17:30,382 --> 00:17:32,051 a better fatigue measure." Okay? 261 00:17:32,051 --> 00:17:38,023 Well, when we're all talking about it, we've got an -- a kind of a notion 262 00:17:38,023 --> 00:17:44,396 of what we mean by fatigue and sort of where the boundaries of that concept might be. 263 00:17:44,730 --> 00:17:47,699 But it's still kind of muddy, all right? 264 00:17:47,699 --> 00:17:52,571 But we've got some rough concept of what might constitute fatigue, all right? 265 00:17:52,571 --> 00:17:58,577 But to check ourselves, though, we want to go and we want to talk to patients 266 00:17:58,577 --> 00:18:00,079 and maybe their caregivers 267 00:18:00,079 --> 00:18:05,117 who actually are having experiences that we think would be described as fatigue. 268 00:18:05,117 --> 00:18:09,822 And so, the second general step is to collect some qualitative data 269 00:18:09,822 --> 00:18:12,858 to understand the meaning of this concept, right? 
270 00:18:12,858 --> 00:18:15,461 So, we might do focus groups or individual interviews. 271 00:18:15,461 --> 00:18:20,032 Talk to people about, you know, "Tell me what it's like to have this. 272 00:18:20,032 --> 00:18:21,333 What's your day like? 273 00:18:21,333 --> 00:18:26,238 What -- is fatigue the same as tired?" Or I'm going to talk about -- 274 00:18:26,238 --> 00:18:31,443 and as we do that, that fuzzy concept we have, it should hopefully become a bit 275 00:18:31,443 --> 00:18:36,982 better defined for us, so that we can say that as a result of talking to folks, 276 00:18:36,982 --> 00:18:41,553 we now have a much clearer picture of this thing we're trying to measure. 277 00:18:41,720 --> 00:18:45,290 And let's try to write now a preliminary definition of fatigue 278 00:18:45,290 --> 00:18:49,561 and what the parts of fatigue are that we're going to try to measure. 279 00:18:49,561 --> 00:18:49,962 Okay? 280 00:18:49,962 --> 00:18:54,032 Once we've got that, we're going to need to write some items. 281 00:18:54,032 --> 00:18:55,634 So, that's the next step. 282 00:18:55,634 --> 00:19:00,172 Let's write items that we think are going to measure that concept, all right? 283 00:19:00,172 --> 00:19:04,376 I just put eight items here because they fit nicely on the screen. 284 00:19:04,376 --> 00:19:08,247 But you can write four items, or you can write 400 items. 285 00:19:08,280 --> 00:19:12,017 It really depends on the concept you're measuring, okay? 286 00:19:12,017 --> 00:19:17,356 And one important thing here is that as soon as we start writing 287 00:19:17,356 --> 00:19:20,526 the items, we've got a preliminary measurement model. 288 00:19:20,526 --> 00:19:27,232 It's a preliminary model or a picture of what -- how we think this concept is defined, 289 00:19:27,232 --> 00:19:28,000 all right? 
290 00:19:28,000 --> 00:19:33,539 We think that there's this thing, fatigue, that's inside the person that is going 291 00:19:33,539 --> 00:19:39,144 to cause their responses to each of the items we're going to ask them, all right? 292 00:19:39,144 --> 00:19:42,347 Fatigue is going to cause them to respond in certain ways. 293 00:19:42,347 --> 00:19:47,553 If you're more fatigued, you're going to be more likely to say, "I've -- you know, I have 294 00:19:47,553 --> 00:19:52,224 -- I did not have the energy I needed to get things done today." All right? 295 00:19:52,224 --> 00:19:54,393 Okay. So, that's our kind of picture. 296 00:19:54,393 --> 00:19:57,729 And by the way, there are different types of measurement models. 297 00:19:57,930 --> 00:20:01,833 We're not going to get into them, because it's just too mind-numbing 298 00:20:01,833 --> 00:20:03,468 for a late afternoon evening. 299 00:20:03,468 --> 00:20:07,372 But I've presented the most popular type of model we see here. 300 00:20:07,372 --> 00:20:09,007 It's called the unidimensional model 301 00:20:09,007 --> 00:20:13,579 where all the items are thought to be measuring one thing, and that's fatigue. 302 00:20:13,579 --> 00:20:15,847 Okay? So, I've got this preliminary model. 303 00:20:15,847 --> 00:20:18,450 We're going to come back to that, though. 304 00:20:19,151 --> 00:20:19,551 Okay? 305 00:20:19,551 --> 00:20:25,591 So, now I've got these items that I and my other investigator friends wrote, right? 306 00:20:25,591 --> 00:20:29,995 And they're probably awesome items, because we all have PhDs, right? 307 00:20:29,995 --> 00:20:33,832 Absolutely wrong. Some of them will be terrible items. 308 00:20:33,832 --> 00:20:41,273 And so, the next thing we've got to do is to go test these items with the patients 309 00:20:41,273 --> 00:20:45,644 or respondents from the population we are interested in studying. 310 00:20:45,644 --> 00:20:49,481 And we're going to sit down and talk to people about the items. 
311 00:20:49,481 --> 00:20:52,451 We're going to ask them how they understand the item. 312 00:20:52,451 --> 00:20:54,820 What do they think the item is asking? 313 00:20:54,820 --> 00:20:57,789 How do they come up with their response? All right? 314 00:20:57,789 --> 00:20:59,891 We want to make sure that people 315 00:20:59,891 --> 00:21:03,428 are thinking about these items the same way we're thinking about them. 316 00:21:03,895 --> 00:21:09,134 And we will definitely, as a result of that, if we do it well, find out 317 00:21:09,134 --> 00:21:13,739 that some of our items are really crummy, and that we might have missed 318 00:21:13,739 --> 00:21:16,708 some aspects of this thing we're trying to measure. 319 00:21:16,708 --> 00:21:19,645 A person says, "Well, okay, I don't really understand 320 00:21:19,645 --> 00:21:24,249 what you're getting at with this question, you know, but I'll tell you that 321 00:21:24,249 --> 00:21:29,254 it reminds me that a much better thing to be asking here is blah." All right? 322 00:21:29,254 --> 00:21:32,024 So, we might get ideas for a new item. 323 00:21:32,024 --> 00:21:37,596 So, a very important step in talking with the patients and people with the disease 324 00:21:37,596 --> 00:21:41,767 is to find out how they're understanding the items and make sure 325 00:21:41,767 --> 00:21:46,104 we've got items that are understood by everyone in the same way. 326 00:21:46,104 --> 00:21:50,776 Once we've got that set of items, we can now go and administer 327 00:21:50,776 --> 00:21:54,279 that set of items to a large sample of people. 328 00:21:54,446 --> 00:21:59,251 And we want people who have the diseases of interest, 329 00:21:59,251 --> 00:22:06,458 and we would love to get people who represent the whole range of symptom severity, 330 00:22:06,458 --> 00:22:10,062 all right? The whole range of severity. 331 00:22:10,062 --> 00:22:17,969 So, collecting data on a big set of people allows us then to use psychometric analyses. 
332 00:22:17,969 --> 00:22:20,372 And it's just another word 333 00:22:20,372 --> 00:22:25,677 for statistical analyses that are specific to developing and evaluating measures. 334 00:22:26,311 --> 00:22:26,611 Okay? 335 00:22:26,611 --> 00:22:31,817 We're going to use psychometric analyses to see how -- to do a couple of things here. 336 00:22:31,817 --> 00:22:36,421 The first thing is we want to see how well are those items working statistically, 337 00:22:36,421 --> 00:22:40,092 all right? Remember I said we developed that sort of preliminary model? 338 00:22:40,092 --> 00:22:44,396 Well, we've got a whole bunch of analyses we can do to say, "Hey, 339 00:22:44,396 --> 00:22:47,132 how well do those items fit together like that? 340 00:22:47,299 --> 00:22:52,904 How well do the items really seem to be caused by the same thing?" Or, "Are 341 00:22:52,904 --> 00:22:57,476 some of these items really not hanging well with the other ones?" And 342 00:22:57,476 --> 00:23:02,381 it may be that they're measuring something a little bit different than fatigue, right? 343 00:23:02,381 --> 00:23:06,585 We do those psychometric analyses to see whether our items are behaving, 344 00:23:06,585 --> 00:23:11,189 they're working the way we thought they would work to be 345 00:23:11,189 --> 00:23:15,394 telling us something about this common thing called -- we're calling fatigue. 346 00:23:15,927 --> 00:23:19,431 The other reason why we're going to do psychometric analyses 347 00:23:19,431 --> 00:23:25,904 is, once we've got the right model, we want to find out how to get a score, right? 348 00:23:25,904 --> 00:23:30,575 We've got eight items right now, but we would like to figure out 349 00:23:30,575 --> 00:23:33,445 how to estimate a score for each person, 350 00:23:33,445 --> 00:23:38,150 one number that will now represent that person's fatigue for us, okay? 351 00:23:38,150 --> 00:23:38,950 Okay. 
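[Editor's note on the two jobs of psychometric analysis] The two questions just described, whether items "hang together" and how to turn several item responses into one number, can be illustrated with a deliberately simple sketch. It assumes a plain summed score and a classical "item-rest" correlation (each item against the sum of the other items); that is one common starting point, not necessarily the analysis any particular instrument uses. The responses are invented, and the fourth item is constructed so that it does not track the others.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# INVENTED responses: 6 respondents x 4 items, each scored 1-5.
responses = [
    [1, 1, 2, 3],
    [2, 2, 2, 1],
    [3, 3, 4, 4],
    [4, 4, 3, 2],
    [5, 5, 5, 3],
    [2, 3, 2, 5],
]

# Job 2: one number per person (here, simply the sum of the item responses).
sum_scores = [sum(person) for person in responses]

# Job 1: does each item move with the rest of the items?
n_items = len(responses[0])
item_rest = []
for i in range(n_items):
    item = [person[i] for person in responses]
    rest = [sum(person) - person[i] for person in responses]
    item_rest.append(pearson(item, rest))

print("sum scores:", sum_scores)
for i, r in enumerate(item_rest, start=1):
    print(f"item {i}: item-rest correlation = {r:.2f}")
```

An item with a very low item-rest correlation, like the fourth one here, is a candidate for "measuring something a little bit different" and would be looked at more closely before scoring.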
352 00:23:38,950 --> 00:23:42,554 And I'm happy to take lots of questions afterward 353 00:23:42,554 --> 00:23:46,992 about more details of the types of analyses that we do. 354 00:23:46,992 --> 00:23:53,432 They're -- they can get a little bit squirrelly, but I do like talking about them. 355 00:23:53,432 --> 00:23:58,437 So, if we have time after, I'm happy to go into that. 356 00:23:58,437 --> 00:24:01,473 Then we're ready for our next step. 357 00:24:01,473 --> 00:24:05,076 We've got a measure that we think measures fatigue. 358 00:24:05,510 --> 00:24:09,948 We've got scores that we can get out of this measure now. 359 00:24:09,948 --> 00:24:15,520 And now, the question is to evaluate the reliability and the validity of this measure. 360 00:24:15,520 --> 00:24:19,591 And by validity here, we mean that the measure is measuring 361 00:24:19,591 --> 00:24:23,328 what it's supposed to measure and nothing else, all right? 362 00:24:23,328 --> 00:24:27,032 So, we're sort of hitting the bull's eye, all right? 363 00:24:27,032 --> 00:24:28,500 In terms of reliability, 364 00:24:28,500 --> 00:24:32,971 we mean here that we're measuring this thing with very little error. 365 00:24:33,638 --> 00:24:36,942 Another word for reliability might be precision here. 366 00:24:36,942 --> 00:24:44,316 And so, in the metaphor of a target, validity is how close you are to the bull's eye, 367 00:24:44,316 --> 00:24:49,020 reliability is how tight a cluster do you have, all right? 368 00:24:49,020 --> 00:24:49,621 Okay. 369 00:24:53,758 --> 00:24:56,061 Now, if you get into the patient-reported 370 00:24:56,061 --> 00:25:00,732 outcome literature a bit and start to look at tests -- or I'm sorry, 371 00:25:00,732 --> 00:25:05,403 measures, you'll see that there are a lot of different kinds of validity mentioned. 372 00:25:05,403 --> 00:25:08,073 There are lots of different kinds of reliability.
373 00:25:08,073 --> 00:25:14,713 We don't have time to go into all of them, but I just want to make you aware of this. 374 00:25:14,713 --> 00:25:20,285 So, you might see, for example, reference to content validity, or construct validity, 375 00:25:20,285 --> 00:25:21,152 or responsiveness. 376 00:25:21,152 --> 00:25:25,857 These are all different ways of assessing the degree to which 377 00:25:25,857 --> 00:25:30,795 the measure is measuring what it's supposed to measure, all right? 378 00:25:30,795 --> 00:25:35,734 I'll give you one quick example of something called convergent validity. 379 00:25:35,734 --> 00:25:41,706 The rationale for convergent validity is that if I've got a measure that's supposed 380 00:25:41,706 --> 00:25:47,712 to measure fatigue, and over here is another well-established measure that is thought to 381 00:25:47,712 --> 00:25:53,285 measure fatigue, well, scores on my measure and the established measure should converge. 382 00:25:53,285 --> 00:25:56,288 They should be correlated in different samples. 383 00:25:56,288 --> 00:25:59,190 And so, here are actual data 384 00:25:59,190 --> 00:26:03,962 from the NIH PROMIS depression domain, some measure of depression. 385 00:26:03,962 --> 00:26:09,734 And the distribution of scores is on the histogram at the bottom 386 00:26:09,734 --> 00:26:15,006 there, and off to the left is the distribution for another 387 00:26:15,006 --> 00:26:17,909 really well-known depression measure, the CESD. 388 00:26:17,909 --> 00:26:22,213 And when we were developing the PROMIS measures, 389 00:26:22,213 --> 00:26:27,018 we looked at the convergence between scores on PROMIS measures 390 00:26:27,018 --> 00:26:31,823 and scores on corresponding measures of the same domains here. 391 00:26:32,023 --> 00:26:36,494 And so, we can see for this one, the correlation's 0.84, 392 00:26:36,494 --> 00:26:38,697 there's very strong convergence, right?
393 00:26:38,697 --> 00:26:43,168 It gives us more confidence that the PROMIS depression measures 394 00:26:43,168 --> 00:26:48,940 are measuring the same type of depression that the CESD is measuring, right? 395 00:26:48,940 --> 00:26:53,845 Although minor geek point here, if you compare the two frequency 396 00:26:53,845 --> 00:26:59,484 distributions, you'll see the CESD exhibits a fairly significant floor effect, we'd say. 397 00:26:59,484 --> 00:27:02,020 Everybody's piled up at the bottom, right? 398 00:27:02,020 --> 00:27:07,092 The CESD is not quite as sensitive for lower levels of depressive symptomatology, right? 399 00:27:07,092 --> 00:27:11,429 So -- whereas the PROMIS depression distribution is much closer to normal, 400 00:27:11,429 --> 00:27:15,767 it does a better job of discriminating among people with lower levels. 401 00:27:15,767 --> 00:27:20,839 So, those are some other fun things we check out when we're comparing measures. 402 00:27:20,839 --> 00:27:21,206 Okay? 403 00:27:21,206 --> 00:27:25,176 So, that's just an example of a particular kind of validity. 404 00:27:25,343 --> 00:27:29,047 Okay? Let's talk about reliability for a second. 405 00:27:29,047 --> 00:27:33,952 And I'm just going to go through three quick examples of reliability. 406 00:27:33,952 --> 00:27:39,257 The idea here is that if my health state has not changed, right, 407 00:27:39,257 --> 00:27:44,996 if I'm basically in the exact same health, I should get the same score 408 00:27:44,996 --> 00:27:49,501 if you use different sets of items from the same measure. 409 00:27:49,501 --> 00:27:55,640 So, if I've got a 40-item measure, and those items are a reliable measure, 410 00:27:55,640 --> 00:28:01,746 then any random subset of those items should be giving me the same score, right? 411 00:28:01,746 --> 00:28:03,882 And that's known as internal consistency.
412 00:28:03,882 --> 00:28:08,887 If you see -- ever see Cronbach's alpha, that's a measure of internal consistency 413 00:28:08,887 --> 00:28:09,988 reliability, all right? 414 00:28:09,988 --> 00:28:12,490 This idea that different sets of items 415 00:28:12,490 --> 00:28:17,829 from the same measure would all end up giving me the same score, all right? 416 00:28:17,829 --> 00:28:19,631 If they don't, it means 417 00:28:19,631 --> 00:28:24,369 some of the items aren't hanging together with the other ones too well. 418 00:28:24,369 --> 00:28:32,010 If I've not changed, I should get the same score over time on this measure, all right? 419 00:28:32,010 --> 00:28:36,948 If I know my true health state has remained the same 420 00:28:36,948 --> 00:28:43,221 but the scores are bouncing around, that means there's some error there, right? 421 00:28:43,221 --> 00:28:44,556 There's just noise. 422 00:28:44,556 --> 00:28:47,258 And that's known as test-retest reliability. 423 00:28:47,258 --> 00:28:51,730 And then in the special case where we've got measures 424 00:28:51,730 --> 00:28:57,135 that require some raters to assign some ratings to things they observe 425 00:28:57,135 --> 00:29:00,705 or written responses people provide, this doesn't happen 426 00:29:00,705 --> 00:29:05,276 that often, but that kind of reliability is known as interrater reliability. 427 00:29:05,276 --> 00:29:10,415 It means that I'm going to get the same score on this measurement system 428 00:29:10,415 --> 00:29:13,752 regardless of who is scoring the thing, all right? 429 00:29:13,752 --> 00:29:15,954 So, internal consistency, test-retest, and interrater 430 00:29:15,954 --> 00:29:19,991 reliability are three of the big kinds of reliability you see. 431 00:29:19,991 --> 00:29:20,358 Okay?
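For readers who want to see internal consistency as a calculation, here is a minimal sketch of Cronbach's alpha in Python. It assumes the usual textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The function name `cronbach_alpha` and the response data are hypothetical and made up for illustration; they are not from the lecture or from PROMIS.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of item responses.

    responses: list of rows, one row per respondent, each row a list of
    numeric item scores (same number of items in every row).
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")

    # Variance of each item (column) across respondents.
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(col) for col in item_columns)

    # Variance of each respondent's total (summed) score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical data: 5 respondents answering 4 fatigue items on a 1-5 scale.
example = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 5, 4, 5],
]
print(round(cronbach_alpha(example), 3))
```

Values close to 1 suggest the items are hanging together; lower values suggest some items may be measuring something a little different from the rest.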