I'M A RESEARCH PROFESSOR AT THE DUKE CLINICAL RESEARCH INSTITUTE, AND I HAVE BEEN DOING PATIENT REPORTED OUTCOMES RESEARCH FOR ABOUT 15 YEARS OR SO. I'M VERY HAPPY TO BE INVITED TO SPEAK TO YOU TONIGHT ABOUT IT. THIS IS ACTUALLY A GREAT TIME TO BE TALKING ABOUT THIS BECAUSE THERE ARE SO MANY EXCITING DEVELOPMENTS IN PATIENT CENTERED RESEARCH THESE DAYS. SO WE'VE GOT THIS LECTURE AND I THINK THE NEXT LECTURE WILL ALSO BE ABOUT SELF-REPORT APPROACHES TO RESEARCH. SO I'D LIKE TO GO OVER A FEW DIFFERENT THINGS. IT'S A HUGE FIELD. I TRIED TO PICK OUT WHAT I THINK ARE SOME OF THE MOST BASIC ISSUES THAT WOULD BE USEFUL FOR EVERYONE TO KNOW. FIRST OF ALL, WHY MEASURE PATIENT REPORTED HEALTH STATUS? WHAT ARE THE DIFFERENT TYPES OF PATIENT REPORTED OUTCOMES? WHEN I SAY PROs, I MEAN PATIENT REPORTED OUTCOMES. HOW ARE PATIENT REPORTED OUTCOME MEASURES DEVELOPED AND EVALUATED? WE WANT TO GO OVER THAT. WE WANT TO TALK ABOUT SOME NEW METHODS FOR DOING PATIENT REPORTED OUTCOMES RESEARCH USING ITEM RESPONSE THEORY. I PUT NEW IN QUOTES. IT'S NEW FOR HEALTH MEASUREMENT BUT NOT FOR MEASUREMENT IN GENERAL. EDUCATION TESTING FOLKS HAVE BEEN DOING THIS FOR A LONG TIME. WE'RE LATE TO THE GAME. THERE ARE EXCITING DEVELOPMENTS SO WE WANT TO MENTION THOSE. AND THEN FINALLY I WANT TO SAY A WORD ABOUT INTERPRETING SCORES THAT YOU GET OUT OF PATIENT REPORTED OUTCOME MEASURES.
SO LET'S START WITH THE FIRST. WHY MEASURE PATIENT REPORTED HEALTH STATUS? WELL, LET'S THINK VERY GENERALLY ABOUT A SITUATION WHERE WE'RE INTERESTED IN TRYING TO MEASURE THE BENEFITS OF SOME INTERVENTION. SO HOW DO WE ASSESS BENEFIT? IF THE INTERVENTION IS THE KIND OF INTERVENTION THAT'S SUPPOSED TO ACT ON THE BODY, A PILL OR SOME PROCEDURE, WE'VE GOT ALL SORTS OF MEASURES THAT WE USE TO ASSESS THE EFFECT OF SOME AGENT ON A BODY. WE'VE GOT COPIES OF SOME VIRUS IN THE BLOOD, TUMOR SIZE, BLOOD PRESSURE, PEAK VO2. ALL EXAMPLES OF ENDPOINTS WE USE THAT TELL US HOW THE BODY HAS CHANGED.
BUT FOR ENDPOINTS TO REALLY MATTER, TO HAVE ENDPOINTS THAT WILL INFORM DECISIONS, THEY'VE GOT TO BE ENDPOINTS THAT ARE RELEVANT TO THE STAKEHOLDERS, HERE CHIEFLY PATIENTS, CLINICIANS AND PAYERS. THERE ARE PROBABLY EVEN MORE STAKEHOLDERS, BUT THOSE ARE THE BIG 3. IT'S NOT ALWAYS THE CASE THAT THE ENDPOINTS WE USE TO MEASURE THE EXACT EFFECT ON THE BODY TRANSLATE WELL INTO DAY TO DAY FUNCTIONING. PEAK VO2 IS USED IN A LOT OF TRIALS FOR COPD OR HEART FAILURE. AND ONE MIGHT ASK, WELL, HOW DOES SOMEONE'S PEAK VO2 RELATE TO THE EXTENT OF PHYSICAL FUNCTIONING THAT THEY CAN PERFORM DURING THEIR DAY? WELL, ACROSS A FEW DIFFERENT STUDIES YOU SEE THERE IS A MODERATE SIZED CORRELATION. IN WHAT OUR GROUP DID, THE CORRELATION BETWEEN PEAK VO2 AND PROMIS PHYSICAL FUNCTION SCORES WAS .53. NOW, IF YOU WANT TO GET GEEKIER ON THE STATISTICS, THAT TELLS YOU THAT THERE IS ABOUT 72% OF THE VARIANCE IN PHYSICAL FUNCTIONING THAT'S NOT ACCOUNTED FOR BY PEAK VO2. SO IT'S STILL THE CASE THAT THE MAJORITY OF THE DIFFERENCES AMONG PEOPLE IN THEIR DAY TO DAY PHYSICAL FUNCTIONING IS NOT EXPLAINED BY THEIR PEAK VO2. THAT ARGUES FOR MAKING AN ASSESSMENT OF THEIR PHYSICAL FUNCTIONING. WE KNOW THAT JUST MEASURING PEAK VO2 WON'T GIVE US THE BEST INDICATION OF HOW THIS NEW TREATMENT WE'RE TESTING IS GOING TO CHANGE THEIR DAY TO DAY LIVES. I MIGHT HAVE AN INTERVENTION THAT IS REDUCING THE VOLUME TO SOME DEGREE. FOR A PATIENT, WHAT CONSTITUTES A MEANINGFUL REDUCTION IN VOLUME? IT MIGHT BE THE CASE THAT UNLESS YOU CAN REDUCE MY VOLUME TO THE POINT WHERE I DON'T HAVE TO GO OUT WEARING A PAD, I DON'T CARE. YOU MIGHT BE VERY EXCITED THAT IT'S A STATISTICALLY SIGNIFICANT REDUCTION OF 30% IN VOLUME, BUT IT'S NOT PARTICULARLY RELEVANT TO ME. IT'S NOT GOING TO DRIVE WHAT I'M DOING SO MUCH. SO IT'S NOT ALWAYS STRAIGHTFORWARD HOW TO ASSESS BENEFIT IN TERMS OF THESE HARDER OUTCOMES.
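TO MAKE THE ARITHMETIC BEHIND THAT 72% FIGURE CONCRETE, HERE IS A MINIMAL SKETCH IN PYTHON. IT ASSUMES A SIMPLE LINEAR RELATIONSHIP BETWEEN THE TWO MEASURES, SO THE SHARE OF VARIANCE ACCOUNTED FOR IS THE SQUARE OF THE CORRELATION. THE FUNCTION NAME IS MADE UP FOR ILLUSTRATION; ONLY THE .53 CORRELATION COMES FROM THE TALK.

```python
def unexplained_variance(r: float) -> float:
    """Share of variance in one measure not accounted for by the other.

    Assumes a simple linear relationship, so explained variance is r squared.
    """
    return 1.0 - r ** 2


r = 0.53  # correlation between peak VO2 and physical function scores
print(f"explained: {r ** 2:.0%}, unexplained: {unexplained_variance(r):.0%}")
```

A CORRELATION OF .53 SQUARES TO ABOUT .28, WHICH LEAVES ABOUT 72% OF THE VARIANCE UNACCOUNTED FOR.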
WE WANT TO TURN TO QUERYING PATIENTS TO FIND OUT WHAT EFFECT THESE TREATMENTS ARE ACTUALLY HAVING ON THEIR DAY TO DAY LIVES. WHEN WE TALK ABOUT TREATMENT BENEFIT, THE FDA IS EXPLICIT IN THEIR DEFINITION OF BENEFIT IN TERMS OF FEELING, FUNCTION AND SURVIVAL. WHEN WE'RE TALKING ABOUT FEELING AND FUNCTIONING WE'RE REALLY IN THE DOMAIN NOW OF TRYING TO FIGURE OUT WHAT'S HAPPENING IN PEOPLE'S DAY TO DAY LIVES. THE BEST WAY TO ASSESS THAT OFTENTIMES IS THROUGH A PATIENT REPORTED OUTCOME MEASURE. AND WE'LL STICK WITH THE FDA FOR A SECOND HERE, AND THEY'VE GOT A VERY GOOD DEFINITION. SORRY ABOUT THAT. NO SIGNAL. I'M SORRY. HERE WE GO. HOPEFULLY THIS WILL WORK. IF IT DOESN'T I HAVE AN AWKWARD INTERPRETIVE DANCE VERSION OF THIS ENTIRE TALK. HERE WE GO. SO THE FDA DEFINES A PATIENT REPORTED OUTCOME AS A MEASUREMENT BASED ON A REPORT THAT COMES DIRECTLY FROM THE PATIENT ABOUT THE STATUS OF A PATIENT'S HEALTH CONDITION, WITHOUT AMENDMENT OR INTERPRETATION OF THE PATIENT'S RESPONSE BY A CLINICIAN OR ANYONE ELSE. AND WE'RE TALKING HERE ABOUT ALL SORTS OF THINGS THAT MATCH THIS DEFINITION. COULD BE SYMPTOM DIARIES, COULD BE MULTI-ITEM SURVEYS, COULD BE IN-PERSON INTERVIEWS WHERE RESPONSES ARE BEING RECORDED. THE DATA CAN BE CAPTURED ELECTRONICALLY, ON PAPER AND PENCIL, ON A PHONE. ALL OF THESE THINGS COULD BE PATIENT REPORTED OUTCOME MEASURES. AND JUST TO THROW UP AN EXAMPLE OF ONE, HERE IS THE NIH PROMIS FATIGUE SHORT FORM, WHERE WE'RE ASKING MULTIPLE ITEMS TO TRY TO GET AT SOMEONE'S FATIGUE AND THE RESPONSES WILL BE COMBINED IN SOME WAY TO ARRIVE AT A SINGLE SCORE THAT REFLECTS THEIR SELF-REPORTED FATIGUE.
OKAY. SO THAT WAS SORT OF WHY WE ARE INTERESTED IN PATIENT REPORTED OUTCOMES. LET'S TALK ABOUT THE DIFFERENT TYPES OF PATIENT REPORTED OUTCOMES. AND WE'RE GOING TO WALK THROUGH A MODEL THAT I'VE ADAPTED FROM A WONDERFUL ARTICLE BY WILSON AND CLEARY, PUBLISHED IN JAMA IN '95. THE TERMINOLOGY IS A LITTLE BIT OLDER BUT IT'S STILL ONE THAT SORT OF DRIVES OUR FIELD.
SO WE BEGIN WITH THESE BIOLOGICAL AND PHYSIOLOGICAL VARIABLES. WE'VE TALKED ABOUT THESE. THINGS LIKE CD4 COUNT, HBA1C, TUMOR SIZE. THESE ARE THINGS THAT ARE MEASURING THE STATE OF THE BODY OR HOW THE BODY HAS CHANGED. THOSE BODILY CHANGES GIVE RISE TO DIFFERENT SYMPTOMS. AND SO SYMPTOM STATUS, LIKE PAIN INTENSITY, NAUSEA, URINARY DRIPPING, IS THE FIRST TIME WE START TO SEE A POTENTIAL FOR A PATIENT REPORTED OUTCOME HERE. CHANGES IN SYMPTOMS CAN AFFECT THE PERSON'S DAY TO DAY FUNCTIONING. FUNCTIONAL STATUS IS THE NEXT STEP IN THE CAUSAL CHAIN: WALKING, OR WHAT PEOPLE REFER TO AS ACTIVITIES OF DAILY LIVING, OR SOCIAL INTERACTIONS. THESE ARE ALL SORTS OF WAYS IN WHICH THE PERSON NEEDS TO FUNCTION DURING THE DAY THAT COULD BE COMPROMISED BY CERTAIN SYMPTOMS. NOW, THE PERSON'S SYMPTOMS AND THEIR FUNCTIONAL STATUS, TOGETHER, MIGHT INFORM HOW THE PERSON SORT OF VIEWS THEIR OVERALL HEALTH. SO GENERAL HEALTH PERCEPTIONS IS THE NEXT TYPE OF PATIENT REPORTED OUTCOME IN THE CHAIN HERE. AND SOMETIMES WE ASSESS THAT VERY GENERALLY WITH A SIMPLE QUESTION LIKE ONE FROM PROMIS OR THE SF-36: IN GENERAL, WOULD YOU SAY YOUR HEALTH IS EXCELLENT, VERY GOOD, GOOD, FAIR OR POOR? IT DOESN'T SEEM EXCITING, BUT IT'S AMAZING THAT RESPONSES ARE INDEPENDENT PREDICTORS OF MORTALITY AFTER CONTROLLING FOR OTHER CLINICAL CHARACTERISTICS, SO IT'S A GOOD CATCH-ALL TYPE OF ITEM. PEOPLE'S OVERALL HEALTH, THEN, INFORMS THEIR EVALUATIONS OF THEIR OVERALL QUALITY OF LIFE. AND SO WE MIGHT ASK SOMETHING LIKE, OVERALL, HOW SATISFIED ARE YOU WITH YOUR LIFE AS A WHOLE THESE DAYS? A MUCH BROADER QUESTION, ENCOMPASSING MUCH MORE THAN JUST THEIR HEALTH. WE HAVE A LOT OF THESE NONMEDICAL FACTORS, FINANCIAL AND SOCIOECONOMIC CONSIDERATIONS, THAT ARE INFORMING ONE'S OVERALL QUALITY OF LIFE. BUT CERTAINLY ONE'S HEALTH RELATED QUALITY OF LIFE COULD BE A MAJOR DETERMINANT OF THAT. SO WE'VE GOT THIS CAUSAL CHAIN OF PATIENT REPORTED CONCEPTS HERE. LET'S TALK A LITTLE BIT MORE ABOUT THESE.
ONE OF THE THINGS WE KNOW IS THAT ANY OF THESE THINGS, SYMPTOM STATUS, FUNCTIONAL STATUS, MY OVERALL HEALTH PERCEPTIONS, COULD BE INFLUENCED BY OTHER CHARACTERISTICS OF THE PERSON. SO IT'S NOT JUST WHAT'S HAPPENED TO MY BODY. IT'S NOT JUST THE DRUG I'VE GOT OR THE DISEASE I HAVE. THERE ARE INDIVIDUAL CHARACTERISTICS. ANY CLINICIAN HERE KNOWS THAT CERTAIN PATIENTS SEEM TO HAVE PERSONALITIES THAT AMPLIFY THE EXPERIENCE OF PAIN. SO YOU COULD HAVE TWO PATIENTS WITH THE EXACT SAME STRUCTURAL OR PHYSIOLOGICAL PROBLEM WHO HAVE VERY DIFFERENT PAIN EXPERIENCES. SO THERE ARE CHARACTERISTICS OF THE INDIVIDUAL THAT CAN ALSO INFLUENCE ALL THESE DIFFERENT THINGS. AND I'VE DRAWN THICKER ARROWS GOING FROM LEFT TO RIGHT TO INDICATE THE GENERAL UNDERSTANDING THAT THESE OTHER CHARACTERISTICS OF THE PERSON HAVE AN INCREASING EFFECT AS WE GO OUT HERE. SO THE INDIVIDUAL STATE OF THE BODY IS PROBABLY NOT GOING TO BE AS STRONG A DETERMINANT OF ONE'S OVERALL QUALITY OF LIFE AS OTHER CHARACTERISTICS OF THE PERSON. SIMILARLY, NOT ONLY ARE THERE INTERESTING CHARACTERISTICS OF THE INDIVIDUAL THAT INFLUENCE THEIR SYMPTOM STATUS AND FUNCTIONAL STATUS, BUT ALSO CHARACTERISTICS OF THE PERSON'S ENVIRONMENT. IF YOU THINK ABOUT HOW YOUR FUNCTIONAL STATUS IS AFFECTED BY THE PHYSICAL ENVIRONMENT YOU'RE IN, IT'S QUITE PROFOUND. WHAT TYPES OF SUPPORTS ARE THERE? WE HAVE SOME RIGHT HERE. IF I HAVE TROUBLE WALKING, THESE RAILS ARE WONDERFUL. I MIGHT REPORT WHEN I GOT TO THE TOP THAT I DON'T HAVE THAT MUCH DIFFICULTY GOING UP STAIRS IN THE ENVIRONMENT I'M IN. IF YOU PUT ME IN AN ENVIRONMENT WHERE YOU DON'T HAVE THESE AWESOME RAILS, I MIGHT REPORT THAT I HAVE MORE DIFFICULTY WALKING UP STAIRS. THERE IS A PARENTHETICAL THERE: IN THE ENVIRONMENT I'M IN. SO THE ENVIRONMENT CAN PLAY A LARGE ROLE IN THE PERSON'S EXPERIENCE OF THEIR SYMPTOMS, THE IMPACT ON THEIR FUNCTIONING, AND THEIR GENERAL HEALTH PERCEPTIONS.
AND AGAIN, THAT INFLUENCE MIGHT BE INCREASING AS WE GO FROM LEFT TO RIGHT IN THESE CONCEPTS HERE. I'VE ALLUDED TO THIS A LITTLE BIT, BUT THERE ARE TWO IMPORTANT POINTS HERE. ONE OF THE IMPORTANT POINTS IS THAT WE SEE A DILUTION OF THE EFFECTS OF BIOLOGICAL INTERVENTIONS AS WE GO FROM LEFT TO RIGHT. SAY I HAVE AN INTERVENTION THAT IS TO CHANGE THE STATE OF THE BODY, CHANGE A BALANCE OF CHEMICALS IN THE BODY TO IMPROVE THE DISEASE, OR A SURGICAL INTERVENTION THAT WILL TAKE OUT SOMETHING CAUSING A PROBLEM. THOSE ARE CHANGES TO THE BODY. WE'RE GOING TO TRY TO OBSERVE THE EFFECTS OF THAT INTERVENTION. BUT ONE OF THE THINGS IMPLIED BY THIS MODEL IS THAT THE EFFECT OF THAT INTERVENTION WILL BE SEEN MOST ACUTELY IN THOSE THINGS THAT I MEASURE IN TERMS OF THE BIOLOGY OR PHYSIOLOGY OF THE BODY. NEXT IT WILL BE SYMPTOMS. NEXT, FUNCTIONING. BECAUSE THERE ARE A LOT OF OTHER THINGS, OTHER CHARACTERISTICS OF THE INDIVIDUAL AND OTHER CHARACTERISTICS OF THEIR ENVIRONMENT, THAT CONTRIBUTE TO SYMPTOM EXPERIENCE, FUNCTIONAL STATUS AND GENERAL HEALTH PERCEPTIONS, THE EFFECT OF THAT BIOLOGICAL INTERVENTION GETS DILUTED. IT'S HARDER FOR ME TO SHOW AN EFFECT OF MY INTERVENTION WHEN I'M MEASURING THINGS FURTHER OUT HERE, BECAUSE SO MUCH MORE CONTRIBUTES TO IT. IT'S JUST A LOT MESSIER, MORE COMPLEX. AT THE SAME TIME, THOUGH, FOR MANY PATIENTS, THOSE ARE THE THINGS THEY ARE INTERESTED IN. THEY WANT TO KNOW HOW THIS IS GOING TO IMPROVE THEIR FUNCTIONAL STATUS DURING THE DAY. IT WOULD BE REALLY IMPORTANT TO SHOW THAT. BUT WE'RE SAYING IT'S PROBABLY GOING TO BE A LITTLE BIT MORE DIFFICULT TO SHOW THAT THAN IT WOULD BE TO SHOW SYMPTOM IMPROVEMENT, OR IMPROVEMENT IN BLOOD LEVELS OF SOME RELEVANT CHEMICALS. SO DILUTION OF THE EFFECTS OF BIOLOGICAL INTERVENTIONS AS WE MOVE FROM LEFT TO RIGHT. THAT'S ONE OF THE IMPLICATIONS OF THIS. ANOTHER ONE IS THAT THE CORRELATION BETWEEN SUCCESSIVE BOXES DECREASES.
SO THE CORRELATION BETWEEN THE BIOLOGICAL AND PHYSIOLOGICAL VARIABLES AND SYMPTOM STATUS IS MUCH HIGHER THAN THE CORRELATION BETWEEN BIOLOGICAL AND PHYSIOLOGICAL VARIABLES AND GENERAL HEALTH PERCEPTIONS. SO THOSE ARE ALSO THINGS TO BE AWARE OF. AGAIN, WHY IS THAT? IT'S BECAUSE ALL THESE OTHER THINGS AFFECT THE CHAIN AS WE GO DOWN. THE FURTHER OUT YOU ARE, THE MORE THOSE THINGS ARE AFFECTING THE BOXES DIFFERENTLY. SO THIS IS A HELPFUL OVERALL MODEL, I THINK, TO TRY TO CHARACTERIZE THE MAIN TYPES OF PATIENT REPORTED OUTCOMES WE MIGHT BE INTERESTED IN, AND TO PLAN OUR STUDY AND REALLY ASK, WHAT DO WE WANT TO MEASURE HERE? ARE WE INTERESTED JUST IN SYMPTOMS, IN IMPACT ON FUNCTIONING, ARE WE INTERESTED IN HOW THE PERSON THINKS ABOUT THEIR OVERALL HEALTH? AND WE KNOW, NOW, THAT IF WE'RE INTERESTED IN THINGS THAT ARE FURTHER OUT TO THE RIGHT, WE MIGHT HAVE TO WORK A LITTLE BIT HARDER TO SHOW THAT EFFECT. WE MIGHT HAVE TO DO THINGS LIKE, FOR EXAMPLE, TRY TO MEASURE SOME OF THESE CHARACTERISTICS OF THE ENVIRONMENT AND CHARACTERISTICS OF THE INDIVIDUAL AND INCLUDE THOSE IN OUR STUDY AS A WAY OF TRYING TO IMPROVE OUR MEASUREMENT OF OUR TREATMENT EFFECT.
OKAY. ALL RIGHT. SO THAT WAS WHAT ARE THE DIFFERENT TYPES OF PROs. LET'S GO THROUGH A QUICK REVIEW OF DEVELOPING AND EVALUATING PRO MEASURES. I DON'T WANT TO SPEND TOO LONG ON THIS, AND I KNOW THAT IN OUR NEXT LECTURE THERE WILL BE SOME SIMILAR MATERIAL COVERED. BUT I DO WANT TO GIVE YOU AN APPRECIATION OF THE OVERALL SHAPE OF HOW THIS GOES. ALL RIGHT. SO THE FIRST THING WE'RE GOING TO DO IS TO DETERMINE WHAT PRO CONCEPT WE WANT TO MEASURE AND WHY WE WANT TO MEASURE IT. AND LET'S JUST TAKE THE EXAMPLE OF FATIGUE. SO A BUNCH OF INVESTIGATORS ARE SITTING AROUND SAYING, WE NEED A BETTER FATIGUE MEASURE. WELL, WHEN WE'RE ALL TALKING ABOUT IT, WE HAVE GOT KIND OF A NOTION OF WHAT WE MEAN BY FATIGUE, AND SORT OF WHERE THE BOUNDARIES OF THAT CONCEPT MIGHT BE. BUT IT'S STILL KIND OF MUDDY.
WE'VE GOT SOME ROUGH CONCEPT OF WHAT MIGHT CONSTITUTE FATIGUE, BUT TO CHECK OURSELVES WE WANT TO GO AND TALK TO PATIENTS, AND MAYBE THEIR CAREGIVERS, WHO ARE HAVING EXPERIENCES THAT WE THINK WOULD BE DESCRIBED AS FATIGUE. SO THE SECOND GENERAL STEP IS TO COLLECT SOME QUALITATIVE DATA TO UNDERSTAND THE MEANING OF THIS CONCEPT. SO WE MIGHT DO FOCUS GROUPS OR INDIVIDUAL INTERVIEWS. TALK TO PEOPLE: TELL ME WHAT IT'S LIKE TO HAVE THIS. WHAT'S YOUR DAY LIKE? IS FATIGUE THE SAME AS TIRED? AND AS WE DO THAT, THAT FUZZY CONCEPT WE HAVE SHOULD HOPEFULLY BE BETTER DEFINED FOR US, SO THAT WE CAN SAY THAT AS A RESULT OF TALKING TO FOLKS, WE NOW HAVE A MUCH CLEARER PICTURE OF THIS THING WE'RE TRYING TO MEASURE. AND LET'S TRY TO WRITE NOW A PRELIMINARY DEFINITION OF FATIGUE AND WHAT THE PARTS ARE OF WHAT WE'RE GOING TO TRY TO MEASURE. ONCE WE'VE GOT THAT, WE'RE GOING TO NEED TO WRITE SOME ITEMS. LET'S WRITE ITEMS THAT WE THINK WILL MEASURE THAT CONCEPT. I JUST PUT 8 ITEMS HERE. THEY FIT NICELY ON THE SCREEN. YOU CAN WRITE 4 ITEMS OR YOU CAN WRITE 400 ITEMS. IT REALLY DEPENDS ON THE CONCEPT YOU'RE MEASURING. AND ONE IMPORTANT THING HERE IS THAT AS SOON AS WE START WRITING THE ITEMS, WE HAVE A PRELIMINARY MODEL OR PICTURE OF HOW WE THINK THIS CONCEPT IS DEFINED. WE THINK THAT THERE IS THIS THING, FATIGUE, THAT'S INSIDE THE PERSON, THAT IS GOING TO CAUSE THEIR RESPONSES TO EACH OF THE ITEMS WE'RE GOING TO ASK THEM. FATIGUE IS GOING TO CAUSE THEM TO RESPOND IN CERTAIN WAYS. IF YOU'RE MORE FATIGUED YOU'LL BE MORE LIKELY TO SAY, I DID NOT HAVE THE ENERGY I NEEDED TO GET THINGS DONE TODAY. BY THE WAY, THERE ARE DIFFERENT TYPES OF MEASUREMENT MODELS. WE WON'T GET INTO THEM, IT'S TOO MIND NUMBING FOR LATE AFTERNOON, BUT I'VE PRESENTED THE MOST POPULAR ONE. UNIDIMENSIONAL: ALL THE ITEMS ARE MEASURING ONE THING. THAT'S FATIGUE. SO I HAVE THIS PRELIMINARY MODEL.
WE'RE GOING TO COME BACK TO THAT, THOUGH. SO NOW I HAVE ITEMS THAT I AND MY OTHER INVESTIGATOR FRIENDS WROTE. PROBABLY AWESOME ITEMS. WE ALL HAVE Ph.D.s, RIGHT? ABSOLUTELY WRONG. SOME OF THEM WILL BE TERRIBLE ITEMS. THE NEXT THING WE HAVE TO DO IS GO TEST THESE ITEMS WITH THE PATIENTS OR RESPONDENTS FROM THE POPULATION WE ARE INTERESTED IN STUDYING. AND WE'RE GOING TO SIT DOWN AND TALK TO PEOPLE ABOUT THE ITEMS. WE'RE GOING TO ASK THEM HOW THEY UNDERSTAND THE ITEM. WHAT DO YOU THINK THE ITEM IS ASKING? HOW DO THEY COME UP WITH THEIR RESPONSE? WE WANT TO MAKE SURE THAT PEOPLE ARE THINKING ABOUT THESE ITEMS THE SAME WAY WE'RE THINKING ABOUT THEM. WE WILL DEFINITELY, AS A RESULT OF THAT, IF WE DO IT WELL, FIND OUT THAT SOME OF OUR ITEMS ARE REALLY CRUMMY, AND THAT WE MIGHT HAVE MISSED SOME ASPECTS OF THIS THING THAT WE'RE TRYING TO MEASURE. A PERSON SAYS, I DON'T REALLY UNDERSTAND WHAT YOU'RE GETTING AT WITH THIS QUESTION, BUT I'LL TELL YOU, IT REMINDS ME OF A BETTER THING TO BE ASKING HERE, BLAH. SO, A NEW ITEM. SO IT'S A VERY IMPORTANT STEP TO BE TALKING WITH THE PATIENTS, PEOPLE WITH THE DISEASE, TO FIND OUT HOW THEY'RE UNDERSTANDING THE ITEMS AND TO MAKE SURE WE HAVE ITEMS UNDERSTOOD BY EVERYONE THE SAME WAY. ONCE WE'VE GOT THAT SET OF ITEMS, WE CAN NOW GO AND ADMINISTER THAT SET OF ITEMS TO A LARGE SAMPLE OF PEOPLE. WE WANT PEOPLE WHO HAVE THE DISEASES OF INTEREST, AND WE WOULD LOVE TO GET PEOPLE WHO REPRESENT THE WHOLE RANGE OF SYMPTOM SEVERITY. COLLECTING DATA ALLOWS US, THEN, TO USE PSYCHOMETRIC ANALYSES, WHICH IS JUST ANOTHER WORD FOR STATISTICAL ANALYSES THAT ARE SPECIFIC TO DEVELOPING AND EVALUATING MEASURES, TO DO A COUPLE OF THINGS HERE. THE FIRST THING: WE WANT TO SEE HOW WELL THOSE ITEMS ARE WORKING STATISTICALLY. REMEMBER, I SAID WE DEVELOP A PRELIMINARY MODEL. WE HAVE A BUNCH OF ANALYSES WE CAN DO TO SAY HOW WELL THOSE ITEMS FIT TOGETHER LIKE THAT.
DO THE ITEMS REALLY SEEM TO BE CAUSED BY THE SAME THING, OR ARE SOME OF THESE ITEMS REALLY NOT HANGING WELL WITH THE OTHER ONES? IT MAY BE THAT THEY ARE REALLY MEASURING SOMETHING A LITTLE BIT DIFFERENT THAN FATIGUE. WE DO THOSE PSYCHOMETRIC ANALYSES TO SEE WHETHER OUR ITEMS ARE BEHAVING, WORKING THE WAY WE THOUGHT THEY WOULD WORK, TELLING US SOMETHING ABOUT THIS COMMON THING WE'RE CALLING FATIGUE. THE OTHER REASON WE'RE GOING TO DO PSYCHOMETRIC ANALYSIS IS THAT ONCE WE HAVE THE RIGHT MODEL WE WANT TO FIND OUT HOW TO GET A SCORE. WE'VE GOT 8 ITEMS RIGHT NOW. WE'D LIKE TO FIGURE OUT A SCORE FOR EACH PERSON, ONE NUMBER THAT REPRESENTS THAT PERSON'S FATIGUE FOR US. OKAY. AND I'M HAPPY TO TAKE LOTS OF QUESTIONS AFTER ABOUT MORE DETAILS OF THE TYPES OF ANALYSES THAT WE DO. THEY CAN GET A LITTLE BIT SQUIRRELY BUT I LIKE TALKING ABOUT THEM. IF WE HAVE TIME AFTER, I'M HAPPY TO GO INTO THAT. THEN WE'RE READY FOR THE NEXT STEP. WE'VE GOT A MEASURE WE THINK MEASURES FATIGUE. WE HAVE SCORES WE CAN GET OUT OF THIS MEASURE NOW. NOW THE QUESTION IS TO EVALUATE THE RELIABILITY AND VALIDITY OF THIS MEASURE. BY VALIDITY, WE MEAN THAT THE MEASURE IS MEASURING WHAT IT'S SUPPOSED TO MEASURE AND NOTHING ELSE. SO WE'RE SORT OF HITTING THE BULL'S EYE. BY RELIABILITY, WE MEAN HERE THAT WE'RE MEASURING THIS THING WITH VERY LITTLE ERROR. ANOTHER WORD FOR RELIABILITY MIGHT BE PRECISION HERE. SO IN THE METAPHOR OF A TARGET, VALIDITY IS HOW CLOSE YOU ARE TO THE BULL'S EYE. RELIABILITY IS HOW TIGHT A CLUSTER YOU HAVE. OKAY. NOW, IF YOU GET INTO THE LITERATURE AND START TO LOOK AT TESTS, I'M SORRY, MEASURES, YOU'LL SEE THAT THERE ARE A LOT OF DIFFERENT KINDS OF VALIDITY MENTIONED. THERE ARE LOTS OF KINDS OF RELIABILITY. WE DON'T HAVE TIME TO GO INTO ALL OF THEM. I JUST WANT TO MAKE YOU AWARE OF THIS. YOU MIGHT SEE, FOR EXAMPLE, REFERENCE TO CONTENT VALIDITY OR CONSTRUCT VALIDITY. RESPONSIVENESS.
THESE ARE ALL DIFFERENT WAYS OF ASSESSING THE DEGREE TO WHICH THE MEASURE IS MEASURING WHAT IT'S SUPPOSED TO MEASURE. I'LL GIVE YOU A QUICK EXAMPLE, SOMETHING CALLED CONVERGENT VALIDITY. THE RATIONALE IS THAT IF I HAVE A MEASURE THAT'S SUPPOSED TO MEASURE FATIGUE, AND OVER HERE IS ANOTHER WELL ESTABLISHED MEASURE THAT IS THOUGHT TO MEASURE FATIGUE, WELL, SCORES ON MY MEASURE AND THE ESTABLISHED MEASURE SHOULD CONVERGE. THEY SHOULD BE CORRELATED IN DIFFERENT SAMPLES. HERE ARE DATA FROM THE NIH PROMIS DEPRESSION DOMAIN. THE DISTRIBUTION OF SCORES IS ON THE HISTOGRAM ON THE BOTTOM THERE. OFF TO THE LEFT IS THE DISTRIBUTION FOR ANOTHER REALLY WELL-KNOWN DEPRESSION MEASURE, THE CES-D. WHEN WE WERE DEVELOPING THE PROMIS MEASURES, WE LOOKED AT THE CONVERGENCE BETWEEN SCORES ON PROMIS MEASURES AND SCORES ON CORRESPONDING MEASURES OF THE SAME DOMAINS. WE CAN SEE FOR THIS ONE, WITH A CORRELATION OF .84, THERE IS VERY STRONG CONVERGENCE. IT GIVES US MORE CONFIDENCE THAT THE PROMIS DEPRESSION MEASURES ARE MEASURING THE SAME TYPE OF DEPRESSION THAT THE CES-D IS MEASURING. ALTHOUGH, MINOR GEEK POINT HERE: IF YOU COMPARE THE TWO FREQUENCY DISTRIBUTIONS YOU'LL SEE THE CES-D EXHIBITS A FAIRLY SUBSTANTIAL FLOOR EFFECT. EVERYONE IS PILED UP AT THE BOTTOM. THE CES-D IS NOT AS SENSITIVE FOR LOWER LEVELS OF DEPRESSIVE SYMPTOMATOLOGY. RIGHT? THE PROMIS DEPRESSION DISTRIBUTION IS MUCH CLOSER TO NORMAL. IT DOES A BETTER JOB OF DISCRIMINATING AMONG PEOPLE WITH LOWER LEVELS. THOSE ARE SOME OTHER FUN THINGS WE CHECK OUT WHEN WE COMPARE MEASURES. SO THAT'S JUST AN EXAMPLE OF A PARTICULAR KIND OF VALIDITY.
LET'S TALK ABOUT RELIABILITY FOR A SECOND. I'M GOING TO GO THROUGH 3 QUICK EXAMPLES OF RELIABILITY. IF MY HEALTH STATE HAS NOT CHANGED, IF I'M BASICALLY IN THE EXACT SAME HEALTH, I SHOULD GET THE SAME SCORE IF I USED A DIFFERENT SET OF ITEMS FROM THE SAME MEASURE.
IF I HAVE A 40 ITEM MEASURE AND IT'S A RELIABLE MEASURE, THEN ANY RANDOM SUBSET OF ITEMS SHOULD GIVE ME THE SAME SCORE. THAT'S INTERNAL CONSISTENCY. IF YOU SEE CRONBACH'S ALPHA, THAT'S A MEASURE OF INTERNAL CONSISTENCY RELIABILITY, THIS IDEA THAT DIFFERENT SETS OF ITEMS FROM THE SAME MEASURE WOULD ALL END UP GIVING ME THE SAME SCORE. IF THEY DON'T, IT MEANS SOME OF THE ITEMS AREN'T HANGING TOGETHER WITH THE OTHER ONES TOO WELL. IF I HAVE NOT CHANGED, I SHOULD GET THE SAME SCORE OVER TIME ON THIS MEASURE. IF I KNOW MY TRUE HEALTH STATE HAS REMAINED THE SAME BUT THE SCORES ARE BOUNCING AROUND, THAT MEANS THERE IS SOME ERROR THERE. THERE IS JUST NOISE. THAT'S TEST-RETEST RELIABILITY. AND THEN THERE IS A SPECIAL CASE WHERE WE'VE GOT MEASURES THAT REQUIRE SOME RATERS TO ASSIGN RATINGS TO THINGS THEY OBSERVE, OR TO WRITTEN RESPONSES PEOPLE PROVIDE. IT DOESN'T HAPPEN THAT OFTEN. THAT KIND OF RELIABILITY IS KNOWN AS INTERRATER RELIABILITY. IT MEANS I'LL GET THE SAME SCORE ON THIS MEASUREMENT SYSTEM REGARDLESS OF WHO IS SCORING THE THING. SO INTERNAL CONSISTENCY, TEST-RETEST AND INTERRATER RELIABILITY ARE 3 OF THE BIG KINDS OF RELIABILITY YOU SEE.
ALL RIGHT. I TALK FAST. AGAIN, I'M CAFFEINATED. LET'S DO A QUICK RECAP HERE. WE DETERMINE WHAT CONCEPT WE WANT TO MEASURE AND WHY. WE GO AND COLLECT QUALITATIVE DATA FROM PEOPLE TO MAKE SURE THAT WE REALLY HAVE DEFINED THIS THING. ONCE WE'VE GOT IT DEFINED, WE CAN WRITE ITEMS FOR IT. WE'RE GOING TO TAKE THE ITEMS TO THE ACTUAL TYPES OF PEOPLE WHO ARE GOING TO BE TAKING THIS MEASURE. FIND OUT IF THEY UNDERSTAND IT. REVISE THE ITEMS, MAKE SURE THE REVISIONS WORK WITH NEW PEOPLE. THEN WE'RE GOING TO GIVE THE ITEMS TO A WHOLE BUNCH OF PEOPLE SO THAT WE CAN DO PSYCHOMETRIC ANALYSES. AND THE PSYCHOMETRIC ANALYSES ALLOW US TO TEST HOW WELL OUR ORIGINAL MODEL FITS THE DATA, OR WHETHER THERE IS A BETTER MODEL WE HAVE TO USE. THEY ALLOW US TO DERIVE A SCORE, OR A SET OF SCORES MAYBE, BUT SOME SCORE THAT WE CAN USE.
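GOING BACK TO INTERNAL CONSISTENCY FOR A MOMENT, THE STATISTIC USUALLY REPORTED FOR IT IS CRONBACH'S ALPHA. HERE IS A MINIMAL SKETCH OF THE STANDARD FORMULA IN PYTHON: ALPHA = K / (K - 1) * (1 - SUM OF ITEM VARIANCES / VARIANCE OF TOTAL SCORES), WHERE K IS THE NUMBER OF ITEMS. THE FUNCTION NAME AND THE RESPONSE DATA BELOW ARE INVENTED FOR ILLUSTRATION AND DO NOT COME FROM ANY REAL MEASURE.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical responses: 5 people answering 3 items on a 1-5 scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 3, 4],
    [5, 5, 4],
]
print(round(cronbach_alpha(responses), 2))
```

WHEN ITEMS MOVE TOGETHER ACROSS PEOPLE, THE VARIANCE OF THE TOTAL SCORE IS LARGE RELATIVE TO THE ITEM VARIANCES AND ALPHA APPROACHES 1. ITEMS THAT DON'T HANG TOGETHER PULL IT DOWN.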
THEN FINALLY, WE CAN TAKE THAT OUT AND TEST IT NOW TO SEE WHAT KIND OF RELIABILITY AND VALIDITY IT EXHIBITS. ALL RIGHT. SO THAT WAS THE DEVELOPMENT AND EVALUATION PROCESS IN A VERY SMALL NUTSHELL.
SO I'D LIKE TO NOW TALK ABOUT SOME VERY INTERESTING DEVELOPMENTS THAT HAVE OCCURRED IN HOW WE DEVELOP PATIENT REPORTED OUTCOME MEASURES, AND SOME OF THE COOL THINGS WE CAN DO WITH THEM NOW. IT'S ESPECIALLY RELEVANT BECAUSE NIH HAS INVESTED A GOOD DEAL IN DEVELOPING MEASURES USING SOME OF THESE NEWER APPROACHES. REMEMBER, I SAID NEW. IT'S NEW FOR HEALTH BUT NOT NEW FOR THE MEASUREMENT WORLD. WE'RE GOING TO BE TALKING ABOUT SOMETHING CALLED ITEM RESPONSE THEORY. WHAT I WANT TO FOCUS ON HERE IS WHAT'S DIFFERENT BETWEEN A REGULAR OLD MEASURE AND ONE USING ITEM RESPONSE THEORY, AND THEN TELL YOU ONE COOL THING YOU CAN DO. SO LET'S START WITH CONTRASTING DIFFERENT TYPES OF MEASURES. HERE IS YOUR BASIC OFF THE SHELF MEASURE, THE FACIT FATIGUE SCALE, DEVELOPED BY DAVE CELLA AND COLLEAGUES TO MEASURE FATIGUE, ESPECIALLY IN CANCER POPULATIONS. THIS IS A GREAT EXAMPLE OF YOUR TRADITIONAL OFF THE SHELF MEASURE. I SAY OFF THE SHELF BECAUSE YOU GO AND GRAB IT AND YOU'RE READY TO GO. EVERYONE IS GOING TO COMPLETE THE SAME ITEMS ON THE FACIT FATIGUE. ALL THE ITEMS ARE NECESSARY TO OBTAIN A SCORE. YOU HAVE TO COMPLETE ALL THE ITEMS BECAUSE YOU AVERAGE OR SUM UP ALL THE ITEMS TO GET THE SCORE. AND THE SCORE ON THE FACIT FATIGUE MIGHT NOT BE ON THE SAME METRIC AS OTHER MEASURES OF THE SAME THING. FOR EXAMPLE, THE SF-36 HAS AN ENERGY DOMAIN, SUPPOSED TO BE MEASURING ABOUT THE SAME THING. BUT THOSE SCORES ARE ON A DIFFERENT METRIC. THE ABSOLUTE VALUES OF THOSE SCORES MEAN A DIFFERENT THING FROM THOSE ON THE FACIT MEASURE. IT'S LIKE TWO DIFFERENT THERMOMETERS. BOTH MEASURE TEMPERATURE, BUT ONE GIVES ONE SET OF NUMBERS, THE OTHER GIVES ANOTHER, AND THEY'RE NOT COMPARABLE. THIS IS WHAT WE GET WITH OFF THE SHELF MEASURES.
IN CONTRAST, WHEN WE USE ITEM RESPONSE THEORY WE'RE ABLE TO CREATE SOMETHING CALLED AN ITEM BANK. AN ITEM BANK IS A LARGE COLLECTION OF ITEMS MEASURING A SINGLE DOMAIN, LIKE FATIGUE OR PAIN INTENSITY. ANY AND ALL OF THE ITEMS FROM THE BANK CAN BE USED TO PROVIDE A SCORE FOR THAT DOMAIN. SO YOU CAN PLUCK OUT ONE ITEM AND ESTIMATE A SCORE ON FATIGUE, OR YOU CAN PLUCK OUT 40 ITEMS AND COMBINE THEM ALL AND ESTIMATE A SCORE ON FATIGUE. THAT MEANS THAT ITEM BANKS CAN BE DYNAMIC, NOT FIXED. WE CAN ADD ITEMS TO THEM OVER TIME. IF WE END UP GETTING BETTER ITEMS, WE CAN FEED IN SOME NEW ITEMS AND GET RID OF ITEMS THAT DON'T SEEM TO BE QUITE AS USEFUL. SO IT'S A DYNAMIC RESOURCE. LET'S LOOK AT THE DOMAIN OF PHYSICAL FUNCTIONING AS AN EXAMPLE HERE. OKAY. SO WE KNOW THERE IS THIS RANGE OF PHYSICAL FUNCTIONING OUT THERE IN THE WORLD. WE KNOW THAT WE WOULD LIKE TO SYSTEMATICALLY MEASURE THAT RANGE. AND SO TO DO THAT, WE WILL CREATE AN ITEM BANK FOR PHYSICAL FUNCTIONING. IN THIS CASE, THE ITEM BANK IS A COLLECTION OF ITEMS THAT ALL MEASURE PHYSICAL FUNCTION. AND SO WE'VE GOT EXAMPLES HERE: ARE YOU ABLE TO GET IN AND OUT OF BED? ARE YOU ABLE TO STAND UP WITHOUT LOSING YOUR BALANCE FOR ONE MINUTE? ARE YOU ABLE TO RUN FIVE MILES? ANY ITEM OR SET OF ITEMS FROM THIS BANK COULD BE USED TO GENERATE A SCORE FOR A PERSON FOR PHYSICAL FUNCTIONING. ANY OF THEM. WHY IS THAT? IT'S BECAUSE WE'VE PUT THESE ITEMS THROUGH THE WRINGER TO MAKE SURE THAT THEY FIT AN ITEM RESPONSE THEORY MODEL, A STATISTICAL MODEL THAT ALLOWS ALL THESE PROPERTIES TO HOLD. AND NOTE HERE, TOO, THAT THESE ITEMS AS I'VE LAID THEM OUT HERE ARE KIND OF ARRAYED IN ORDER OF SEVERITY. ARE YOU ABLE TO GET IN AND OUT OF BED? THAT'S THE KIND OF QUESTION YOU'D ASK OF A PERSON WHO LOOKED LIKE THEY WERE INCREDIBLY WEAK. THAT'S AN ITEM THAT'S GOOD FOR DISCRIMINATING AMONG PEOPLE AT THE LOWER END OF FUNCTIONING. ARE YOU ABLE TO RUN FIVE MILES IS SOMETHING THAT WOULD BE INFORMATIVE TO ASK OF A PERSON WHO IS AT THE HIGHER END.
AND THIS TELLS US ANOTHER INTERESTING PROPERTY OF ITEM BANKS, WHICH IS THAT WITH ITEM RESPONSE THEORY, NOT ALL ITEMS ARE THE SAME. SOME ITEMS ARE BETTER USED FOR LOWER SEVERITY OR HIGHER SEVERITY. WHEN WE'RE THINKING ABOUT WHICH ITEMS WE MIGHT WANT, WE HAVE TO THINK ABOUT THE POPULATION WE'RE STUDYING. WE HAVE THE OPTION TO TAILOR OUR MEASURES. MORE ABOUT THAT IN A SECOND HERE.
SO BACK TO OFF THE SHELF. IF I USE MY TRADITIONAL OFF THE SHELF MEASURE, I TAKE IT OFF THE SHELF AND I GIVE IT TO THE PATIENT OR SEND A WEB LINK, HOWEVER IT IS. THEY ALL FILL OUT THE SAME MEASURE AND THAT'S IT. SO IT'S EASY TO FIGURE OUT WHAT TO DO. YOU TAKE THE OFF THE SHELF MEASURE AND ADMINISTER IT. IT'S NOT AS STRAIGHTFORWARD WITH AN ITEM BANK. YOU HAVE SOME DECISIONS TO MAKE. LET'S GO THROUGH THOSE HERE. BASICALLY, I'VE GOT ONE OF TWO MAIN STRATEGIES HERE: A FIXED LENGTH MEASURE OR COMPUTERIZED ADAPTIVE TESTS. WITH A FIXED LENGTH MEASURE, EVERYONE IS GOING TO GET THE SAME NUMBER OF ITEMS. IT'S A FIXED LENGTH FOR EVERYONE. AND I'VE GOT A COUPLE OF DIFFERENT OPTIONS THERE. THERE ARE SOME VERY HANDY READY-MADE FIXED LENGTH MEASURES. IF YOU WANT TO USE MEASURES FROM THE NIH PROMIS SYSTEM, YOU CAN GO TO THE ASSESSMENT WEBSITE AND FIND READY-MADE SHORT FORMS. THESE ARE FORMS WITH ABOUT 6 TO 10 ITEMS PER DOMAIN THAT ARE GOOD FOR GENERAL POPULATIONS. THEY'RE GENERALLY SENSITIVE ACROSS THE RANGE OF FUNCTIONING THAT THE PROMIS FOLKS HAVE PICKED, SORT OF A GOOD OFF THE SHELF VERSION OF PROMIS MEASURES. SO THERE ARE READY-MADE FIXED LENGTH MEASURES. I COULD ALSO, IF I HAVE ACHIEVED THE REQUISITE LEVEL OF GEEKINESS, GO INTO THE SYSTEM AND PULL OUT THE ITEMS THAT I THINK ARE GOING TO BE MOST SENSITIVE FOR MY POPULATION. SO IF I HAVE A TRIAL WHERE I'VE GOT SEVERELY ILL HEART FAILURE PATIENTS, I DON'T WANT TO BE ASKING PEOPLE HOW MUCH TROUBLE THEY HAVE RUNNING FIVE MILES. IT'S A WASTE OF THEIR TIME.
I AM REALLY GOING TO GO IN AND SELECT THE ITEMS THAT ARE GOING TO BE MOST SENSITIVE FOR MY POPULATION, AND I'M GOING TO GIVE EVERYONE IN MY TRIAL THOSE ITEMS. SO I COULD TAILOR THEM AND MAKE MY OWN.
THE OTHER WAY OF USING ITEMS FROM AN ITEM BANK IS THROUGH COMPUTERIZED ADAPTIVE TESTS. A LOT OF YOU MIGHT BE FAMILIAR WITH COMPUTERIZED ADAPTIVE TESTING. THEY USE IT A LOT FOR EDUCATION, LIKE THE SAT AND THE GRE, AND A LOT OF THE CERTIFICATION TESTS IN MEDICINE, NURSING, AND GOVERNMENT SAFETY TESTS. THEY HAVE THESE BIG BANKS OF ITEMS. YOU SIT DOWN AT THE COMPUTER AND THEY SHOW YOU ONE AT A TIME. SO SAME DEAL HERE. THE IDEA IS THAT THE NEXT ITEM YOU RECEIVE IS DEPENDENT UPON YOUR ANSWERS TO PREVIOUS ITEMS. IF I'VE GOT A RANGE OF ITEMS ACROSS THE SPECTRUM OF SEVERITY, A COMPUTERIZED ADAPTIVE TEST, OR CAT, MIGHT START WITH ONE RIGHT IN THE MIDDLE. AND IF I ANSWER THAT I HAVE A LOT OF DIFFICULTY DOING THAT, IT PUTS ME ON THE MORE SEVERE SIDE. THIS GUY SOUNDS MORE SEVERE. LET ME ASK HIM A QUESTION ON THE MORE SEVERE END. I'M GOING TO ANSWER THAT. THAT WILL SAY, OH, SEEMS LIKE IT'S AROUND HERE. I'LL ASK A QUESTION HERE. SO IN SOME WAYS IT'S VERY SIMILAR TO THE WAY A CLINICIAN MIGHT BE ASKING QUESTIONS OF A PATIENT. START WITH A GENERAL ONE, AND THEN AS YOU GET A SENSE OF HOW SEVERE OR MILD THEY ARE, ASK MORE QUESTIONS TO GET A MORE PRECISE PICTURE OF WHAT THAT PERSON LOOKS LIKE. THE IDEA WITH A CAT IS THAT IN THEORY, EVERY PERSON COULD GET A DIFFERENT SET OF ITEMS, BECAUSE THE ITEMS THAT GET PULLED OUT OF THAT BIG ITEM BANK FOR YOU ARE THE ONES GETTING US TO THE MOST PRECISE ESTIMATE OF YOUR SCORE AS QUICKLY AS POSSIBLE. AND WE FIND OFTEN THAT YOU CAN GET A VERY PRECISE ESTIMATE OF PEOPLE'S SCORES BY ASKING ONLY 4 QUESTIONS. BUT THEY HAVE TO BE THE RIGHT FOUR QUESTIONS. HOW DO YOU FIGURE OUT WHICH ARE THE RIGHT FOUR QUESTIONS? WELL, YOU HAVE AN ALGORITHM THAT IS DRAWING ITEMS FROM THAT ITEM BANK. HAVING AN ITEM BANK WITH THESE DIFFERENT OPTIONS ALLOWS A FEW THINGS TO HAPPEN.
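FOR A SENSE OF WHAT SUCH AN ITEM-DRAWING ALGORITHM MIGHT LOOK LIKE, HERE IS A ROUGH SKETCH IN PYTHON. IT IS NOT THE ALGORITHM USED BY ANY PARTICULAR SYSTEM. IT ASSUMES YES/NO ITEMS THAT FOLLOW A TWO-PARAMETER LOGISTIC MODEL, PICKS THE UNASKED ITEM WITH THE MOST STATISTICAL INFORMATION AT THE CURRENT SEVERITY ESTIMATE, AND UPDATES THE ESTIMATE AS A POSTERIOR MEAN OVER A GRID. THE ITEM NAMES, PARAMETER VALUES AND FUNCTION NAMES ARE ALL INVENTED FOR ILLUSTRATION.

```python
import math

# Hypothetical item bank: item id -> (discrimination a, severity b).
# All values are invented for illustration.
BANK = {
    "very_mild": (1.5, -2.0),
    "mild": (1.5, -1.0),
    "middle": (1.5, 0.0),
    "severe": (1.5, 1.0),
    "very_severe": (1.5, 2.0),
}

GRID = [g / 10.0 for g in range(-40, 41)]  # candidate severity levels, -4 to 4


def p_yes(theta, a, b):
    """Two-parameter logistic model: chance of endorsing the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information the item provides at severity theta."""
    p = p_yes(theta, a, b)
    return a * a * p * (1.0 - p)


def estimate_theta(responses, bank):
    """Posterior mean of theta on GRID, with a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for item_id, said_yes in responses:
            p = p_yes(t, *bank[item_id])
            w *= p if said_yes else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def run_cat(bank, answer_fn, max_items=4):
    """Ask up to max_items questions, each chosen for the current estimate."""
    theta, responses = 0.0, []
    remaining = list(bank)
    while remaining and len(responses) < max_items:
        best = max(remaining, key=lambda i: information(theta, *bank[i]))
        remaining.remove(best)
        responses.append((best, answer_fn(best)))
        theta = estimate_theta(responses, bank)
    return theta, [item_id for item_id, _ in responses]


# A simulated respondent who endorses every item they are shown.
theta_hat, asked = run_cat(BANK, lambda item_id: True)
print(asked, round(theta_hat, 2))
```

THE FIRST QUESTION COMES FROM THE MIDDLE OF THE BANK, AND EACH ANSWER MOVES THE ESTIMATE AND THEREFORE THE CHOICE OF THE NEXT ITEM. REAL SYSTEMS TYPICALLY ADD STOPPING RULES BASED ON THE PRECISION OF THE ESTIMATE; THE FIXED LIMIT OF FOUR ITEMS HERE IS JUST A SIMPLIFICATION.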
IMAGINE WE HAVE THIS FATIGUE ITEM BANK HERE. AND IT HAS, SAY, 40 ITEMS IN IT. WELL, I COULD USE ITEMS FROM THAT ITEM BANK IN ALL DIFFERENT SETTINGS. I CAN USE THEM IN DIFFERENT WAYS. A CHEMOTHERAPY TRIAL MIGHT HAVE ITEMS 1-3. FOR ARTHRITIS, A STUDY MIGHT BE USING A CAT, A COMPUTERIZED ADAPTIVE TEST. WHAT'S GREAT IS THAT BECAUSE ALL OF THE ITEMS BEING PULLED FOR ALL THESE STUDIES WERE FROM THE ITEM BANK, THEY ALL HAVE THE SAME METRIC, THE SAME MEANING. THE SCORES ARE COMPARABLE ACROSS ALL THE DIFFERENT STUDIES. THAT CAN HELP FACILITATE META-ANALYSIS AND THE LIKE. RIGHT NOW THERE IS A LOT OF INTEREST IN PRAGMATIC CLINICAL TRIALS OR COMPARATIVE EFFECTIVENESS RESEARCH, WHERE WE TRY TO DO RESEARCH WITHIN HEALTHCARE SYSTEMS. I MIGHT HAVE DIFFERENT HEALTHCARE SYSTEMS, EACH USING DIFFERENT MEASURES. WELL, RIGHT NOW, EACH OF THOSE MEASURES RESULTS IN A DIFFERENT METRIC, SO IT'S REALLY DIFFICULT TO POOL RESULTS. IF, HOWEVER, ALL OF THOSE ITEMS FROM THE DIFFERENT MEASURES COULD BE PLACED WITHIN AN ITEM BANK, WHICH MEANS WE DO THE KINDS OF ITEM RESPONSE THEORY ANALYSES THAT TEST WHETHER THESE GUYS CAN SIT IN THE SAME BANK, NOW WE'RE IN A GREAT POSITION BECAUSE WE CAN GET ONE METRIC OUT OF THAT. EVEN THOUGH DIFFERENT SITES MIGHT USE DIFFERENT SETS OF ITEMS, IF THEY'RE ALL IN THE SAME ITEM BANK, WE HAVE A CROSSWALK GOING. WE CAN GET THE SAME METRIC OUT OF ALL OF THEM.
SO A QUICK REVIEW HERE, COMPARING THE TRADITIONAL PRO MEASURE, THE OFF THE SHELF ONE, WITH A PRO ITEM BANK. WITH THE TRADITIONAL MEASURE, ALL ITEMS ARE REQUIRED TO COMPUTE A SCORE. WITH ITEM BANKS, ANY AND ALL SUBSETS OF THE ITEMS CAN GENERATE A SCORE. WITH A TRADITIONAL MEASURE, EVERYBODY HAS GOT TO TAKE THE SAME ITEMS. WITH A BANK, DIFFERENT PEOPLE COULD GET DIFFERENT ITEMS. WITH THE TRADITIONAL MEASURE, YOU CAN PULL IT OFF THE SHELF. BUT WHEN WE'VE GOT AN ITEM BANK, YOU'VE GOT TO USE ITEMS IN THE BANK TO CREATE MEASURES FOR A SPECIFIC USE. AND WE WENT THROUGH SOME OF THE DIFFERENT OPTIONS THERE: FIXED LENGTH OR CATS.
IF IT'S FIXED, YOU CAN GET READY-MADE ONES THAT SOMEONE MIGHT HAVE CREATED FOR YOU, OR YOU CAN CREATE YOUR OWN CUSTOM ONE. AND FINALLY, WITH TRADITIONAL PRO MEASURES, THE SCORES ARE NOT EASILY COMPARABLE TO SCORES FROM OTHER MEASURES OF THE SAME THING. WITH AN ITEM BANK, THERE IS MUCH GREATER OPPORTUNITY FOR A CROSSWALK BETWEEN SCORES FROM DIFFERENT MEASURES, AS LONG AS THE ITEMS FROM THE DIFFERENT MEASURES CAN BE PUT IN THE SAME BANK.
ALL RIGHT. I MENTIONED THAT NIH HAS INVESTED IN ITEM BANKS QUITE HEAVILY. AND TO GREAT BENEFIT, I THINK. A FEW EXAMPLES HERE. THE BIGGEST IS PROBABLY PROMIS, THE PATIENT REPORTED OUTCOMES MEASUREMENT INFORMATION SYSTEM. OVER ABOUT TEN YEARS OF DEVELOPMENT, PROMIS HAS AMASSED QUITE A FEW EXCELLENT MEASURES OF THE MAIN TYPES OF HEALTH CONCEPTS THAT ARE RELEVANT ACROSS CHRONIC DISEASES, FOR ADULTS AND FOR CHILDREN AND ADOLESCENTS. THEY'RE FREELY AVAILABLE. YOU CAN GO TO NIHPROMIS.ORG FOR MORE INFORMATION THERE. NIH ALSO SUPPORTED THE NIH TOOLBOX, WHICH IS FOR ASSESSING NEUROLOGICAL AND BEHAVIORAL FUNCTION FOR AGES 3-85. AGAIN, FREELY AVAILABLE. GO TO NIHTOOLBOX.ORG. AND FINALLY, THERE IS ONE FOR CHRONIC NEUROLOGIC CONDITIONS. IT'S FREE, AND YOU CAN GO TO THE WEBSITE ON THE SCREEN. CHECK THOSE OUT AND YOU CAN GET MORE INFORMATION ABOUT HOW ITEM BANKS WORK.
I SAID THAT THERE ARE A LOT OF COOL THINGS YOU CAN DO WITH ITEM RESPONSE THEORY. I WANT TO TELL YOU ABOUT A SPECIFIC ONE. IT'S DIFFERENTIAL ITEM FUNCTIONING. CHECK THIS OUT. IN THE PAST 7 DAYS, DID YOU CRY? YES OR NO. I MADE A YES/NO VERSION OF A KIND OF ITEM YOU OFTEN SEE IN DEPRESSION MEASURES, AND I THINK WE WOULD ALL AGREE THAT, IN GENERAL, RESPONSES TO THIS ITEM MIGHT INFORM US TO SOME DEGREE ABOUT THE LIKELIHOOD THAT A PERSON IS DEPRESSED. THE MORE LIKELY YOU ARE TO HAVE CRIED IN THE LAST WEEK, THE MORE LIKELY YOU ARE TO HAVE BEEN DEPRESSED. THIS HAS BEEN SHOWN TO BE THE CASE.
IMAGINE THAT WE WANTED TO COMPARE MEN'S AND WOMEN'S DEPRESSION, THOUGH. WE HAD AN ITEM LIKE THIS AND A FEW OTHER SIMILAR ITEMS. WE FOUND THAT MEN AND WOMEN DIFFERED IN THEIR DEPRESSION LEVELS. WHAT WE'D LIKE TO KNOW IS THAT MEN AND WOMEN TRULY DO DIFFER IN UNDERLYING DEPRESSION LEVELS, AND NOT THAT THEY ACTUALLY DIFFER IN THE WAY THE ITEMS WORK FOR THEM. SO HANG ON FOR A SECOND. WE'RE TALKING ABOUT DIFFERENTIAL ITEM FUNCTIONING. IT MEANS THE ITEM BEHAVES DIFFERENTLY FOR TWO OR MORE GROUPS. SO CHECK THIS OUT. THIS IS A VERY SIMPLE ITEM RESPONSE THEORY CURVE HERE. YOU CAN SEE WE'RE MODELING THE PROBABILITY OF ANSWERING YES TO THAT QUESTION, ON THE Y AXIS, AS A FUNCTION OF THE PERSON'S UNDERLYING DEPRESSION, ON THE X AXIS. THIS, BY THE WAY, IS WHY IT'S CALLED ITEM RESPONSE THEORY. THESE ARE MODELS OF THE LIKELIHOOD OF RESPONDING WITH A GIVEN ANSWER BASED ON YOUR UNDERLYING LEVEL OF WHATEVER IS BEING MEASURED. IN THIS CASE, DEPRESSION. SO YOU CAN SEE HERE THAT AS DEPRESSION INCREASES, SO DOES THE PROBABILITY THAT THE PERSON IS GOING TO ANSWER YES TO DID YOU CRY IN THE PAST WEEK. SO WHEN WE SAY THIS ITEM IS INFORMATIVE ABOUT DEPRESSION, WE'RE SAYING THAT PEOPLE WHO ANSWER YES ARE MOST LIKELY TO BE ON THE HIGH END OF DEPRESSION. SO WE'RE GOING TO MAKE AN INFERENCE THAT THEY'RE ON THE HIGHER END OF DEPRESSION. THIS IS A MAP THAT MAPS THE RESPONSES TO THAT ITEM TO THE UNDERLYING DEPRESSION. AND SO IF WE TAKE SOMEONE WHO IS, SAY, ONE STANDARD DEVIATION ABOVE THE MEAN OF 0 ON DEPRESSION, THEY HAVE A .9 PROBABILITY OF ANSWERING YES ON THAT. OKAY. SO WE HAVE SAID THAT DIFFERENTIAL ITEM FUNCTIONING IS WHAT HAPPENS WHEN AN ITEM BEHAVES DIFFERENTLY FOR TWO OR MORE GROUPS. ANOTHER WAY TO SAY THAT: THE MAP BETWEEN DEPRESSION AND THE ITEM IS DIFFERENT FOR TWO OR MORE GROUPS. I WAS TALKING ABOUT MEN AND WOMEN WITH THIS CRYING ITEM. LET'S LOOK AT THIS. THIS IS THE MAP COMPUTED SEPARATELY FOR FEMALES AND MALES. WHAT IS IT SHOWING HERE?
IF WE'VE GOT A MALE WHO WE KNOW, SUPPOSING FOR JUST A SECOND THAT WE COULD KNOW IT, IS ONE STANDARD DEVIATION ABOVE THE MEAN ON DEPRESSION, THE MALE'S LIKELIHOOD OF ANSWERING YES IS .3. THERE IS A 30% CHANCE HE'LL ANSWER YES, I'VE CRIED IN THE PAST WEEK. A WOMAN ONE STANDARD DEVIATION ABOVE THE MEAN HAS A .9 PROBABILITY OF ANSWERING YES. NOW, WHY? ON AVERAGE, WOMEN WILL RESPOND THAT THEY CRIED IN THE PAST WEEK MORE OFTEN THAN MEN, INDEPENDENT OF THEIR UNDERLYING LEVEL OF DEPRESSION. YOU CAN SEE HOW MESSED UP YOU'D GET IF YOU JUST USED ONE VERSION OF THE MAP AND APPLIED IT THE SAME WAY FOR MALES AND FEMALES HERE. SO THIS IS AN EXAMPLE OF DIFFERENTIAL ITEM FUNCTIONING. AND THE QUESTION WE ASK IS, ARE THERE GROUPS FOR WHOM THESE ITEMS MIGHT BEHAVE DIFFERENTLY? WILL THEY BEHAVE DIFFERENTLY FOR YOUNGER VERSUS OLDER PEOPLE? DIFFERENTLY FOR DIFFERENT CULTURAL GROUPS? WE CAN EXPLICITLY TEST FOR THAT USING ITEM RESPONSE THEORY AND CONDUCTING TESTS OF DIFFERENTIAL ITEM FUNCTIONING. THEN WE'VE GOT A CHOICE. WE CAN DROP THOSE ITEMS, IF WE HAVE OTHER ITEMS THAT ARE JUST AS GOOD, OR WE CAN SAY, YOU KNOW WHAT? IT'S REALLY INFORMATIVE. NOW WE KNOW WE NEED TO INTERPRET IT, SCORE IT, DIFFERENTLY FOR MEN AND WOMEN. IT'S STILL USEFUL. WE JUST HAVE TO SCORE IT DIFFERENTLY. BY THE WAY, WHEN YOU SEE TESTS FOR POTENTIAL RACE BIAS IN ACHIEVEMENT TESTS AND IQ TESTS, WHAT THEY'RE DOING IS DIFFERENTIAL ITEM FUNCTIONING. FOR TWO PEOPLE AT THE EXACT SAME UNDERLYING LEVEL OF ABILITY, DO THEY HAVE DIFFERENT PROBABILITIES OF GETTING THIS ITEM CORRECT, OR NOT, BECAUSE THEY MAY HAVE DIFFERENT CULTURAL BACKGROUNDS? DIFFERENTIAL ITEM FUNCTIONING IS VERY IMPORTANT, A REALLY TERRIFIC TECHNIQUE WE HAVE GOT. OKAY. SO THAT WAS OUR QUICK FORAY INTO SOME OF THE THINGS THAT ITEM RESPONSE THEORY HAS BROUGHT US, NAMELY, MEASURES BASED ON ITEM BANKS, AND HOW THEY DIFFER FROM YOUR TRADITIONAL OFF-THE-SHELF MEASURES.
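THE CRYING-ITEM EXAMPLE CAN BE WRITTEN DOWN AS TWO RESPONSE CURVES. THE SKETCH BELOW IS AN ILLUSTRATION, NOT A FITTED MODEL: IT ASSUMES A LOGISTIC CURVE WITH THE SAME SLOPE FOR BOTH GROUPS (A SIMPLIFYING ASSUMPTION, WITH A HYPOTHETICAL SLOPE VALUE) AND CHOOSES EACH GROUP'S ITEM LOCATION SO THAT THE CURVES REPRODUCE THE TWO PROBABILITIES MENTIONED ABOVE, .9 FOR WOMEN AND .3 FOR MEN AT ONE STANDARD DEVIATION ABOVE THE MEAN. ALL FUNCTION AND VARIABLE NAMES ARE INVENTED FOR THE EXAMPLE.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def location_for(target_p, theta, a):
    """Item location b such that P(yes | theta) equals target_p."""
    return theta - logit(target_p) / a


A_COMMON = 1.5  # hypothetical slope, assumed equal for both groups

# Group-specific locations chosen to match the probabilities from the talk
# for a person one standard deviation above the mean (theta = 1.0).
ITEM_LOCATION = {
    "female": location_for(0.9, theta=1.0, a=A_COMMON),
    "male": location_for(0.3, theta=1.0, a=A_COMMON),
}


def p_yes(theta, group):
    """Probability of answering 'yes, I cried' at depression level theta."""
    b = ITEM_LOCATION[group]
    return 1.0 / (1.0 + math.exp(-A_COMMON * (theta - b)))


# Same underlying depression, different chance of saying yes: that gap is the DIF.
for theta in (-1.0, 0.0, 1.0, 2.0):
    print(theta, round(p_yes(theta, "female"), 2), round(p_yes(theta, "male"), 2))
```

UNDER THESE ASSUMPTIONS THE MALE CURVE SITS TO THE RIGHT OF THE FEMALE CURVE, SO A YES FROM A MAN POINTS TO A HIGHER DEPRESSION LEVEL THAN A YES FROM A WOMAN. SCORING BOTH GROUPS WITH ONE SHARED CURVE WOULD IGNORE THAT.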
AND SOME OTHER REALLY COOL TOOLS, TOO.
LET'S MOVE TO THE LAST SECTION THAT I WANTED TO GO OVER WITH YOU, INTERPRETING SCORES ON PRO MEASURES. IF I WERE TEACHING A CLASS ON BLOOD PRESSURE HERE, I MAY NOT NEED TO HAVE THIS SECTION ON WHAT'S GOING TO CONSTITUTE A CLINICALLY MEANINGFUL CHANGE IN BLOOD PRESSURE. I KNOW WE ALL DEBATE ABOUT WHAT THE DANGER LEVEL IS THERE. BUT WHEN WE'RE TALKING ABOUT MORE CLINICAL ENDPOINTS LIKE THAT, MEASURES THAT HAVE BEEN USED IN CLINICAL PRACTICE HAVE GENERATED SOME EXPERIENCE AND SOME SENSE OF WHAT CONSTITUTES A BIG CHANGE, A SMALL CHANGE, OR NO CHANGE AT ALL WORTH LOOKING AT. WE DON'T HAVE THAT WITH ALMOST ALL PATIENT REPORTED OUTCOME MEASURES. THESE ARE MEASURES DESIGNED PRIMARILY FOR RESEARCH. CLINICIANS DON'T HAVE A LOT OF EXPERIENCE USING THEM. WE HAVEN'T GENERATED THE EXPERIENCE WE NEED IN ORDER TO UNDERSTAND WHAT MIGHT BE A BIG OR SMALL CHANGE. I WANT TO GO THROUGH A QUICK EXAMPLE. THIS IS A STUDY WE DID, PUBLISHED IN JAMA A WHILE BACK. THIS WAS THE HF-ACTION TRIAL FOR HEART FAILURE. IT COMPARED AN EXERCISE INTERVENTION WITH USUAL CARE, AND THE PRIMARY OUTCOME WAS SURVIVAL. SECONDARY OUTCOMES WERE PHYSICAL FUNCTIONING AND OVERALL QUALITY OF LIFE. HERE I'M GOING TO FOCUS ON CHANGES IN A MEASURE CALLED THE KANSAS CITY CARDIOMYOPATHY QUESTIONNAIRE. THE DIFFERENCE BETWEEN THE GROUPS WAS 1.93, 95% CONFIDENCE INTERVAL. THE CONCLUSION THERE: THE EXERCISE ARM HAS A STATISTICALLY GREATER RATE OF CHANGE. 1.93. SO WHAT? WHAT ARE WE TO MAKE OF THIS? HOW DOES THIS PARAMETER RELATE TO DAY TO DAY FUNCTIONING? IS IT CLINICALLY MEANINGFUL? IS IT A POLICY RELEVANT EFFECT THAT HAS BEEN DETECTED? THERE ARE SEVERAL OPTIONS FOR TRYING TO ANSWER THIS QUESTION. AND ACTUALLY, THERE IS EXCITING WORK DEVELOPING NEW OPTIONS. I WANT TO GIVE YOU ONE QUICK EXAMPLE OF A STRATEGY SOME PEOPLE USE TO ANSWER THIS TYPE OF QUESTION.
AND THE RATIONALE HERE IS THAT IF I'M NOT SURE WHAT CHANGES IN SCORES ON THIS MEASURE MEAN, ONE THING I CAN DO IS GET A POINT OF REFERENCE BY LOOKING AT CHANGES ON SOMETHING I DO UNDERSTAND, AND SEEING HOW CHANGES IN MY SCORES MAP TO THE THING I DO UNDERSTAND. IT'S CALLED AN ANCHOR METHOD. I FIND SOME OTHER MEASURE I UNDERSTAND BETTER AND SEE HOW IT'S RELATED TO MY PATIENT REPORTED OUTCOME SCORES. SO IN THIS CASE, THE DEVELOPER OF THE KANSAS CITY MEASURE, JOHN, AND COLLEAGUES PUBLISHED AN ARTICLE. THEY HAD CARDIOLOGISTS MAKE AN ASSESSMENT, 6 MONTHS AFTER TREATMENT, OF THE CHANGE IN THEIR PATIENTS' HEALTH. THEY COULD RATE NO CHANGE, OR LARGE, MODERATE OR SMALL DETERIORATION, OR SMALL, MODERATE OR LARGE IMPROVEMENT. THOSE WERE THE OPTIONS. A 7 POINT SCALE SAYING, TELL US HOW YOUR PATIENT'S CARDIAC HEALTH HAS CHANGED OVER THE LAST 6 MONTHS. AND ALL OF THOSE PATIENTS TOOK THE KANSAS CITY QUESTIONNAIRE AT BASELINE AND WHEN THEY CAME IN FOR A 6 MONTH VISIT. SO WHAT THEY WERE ABLE TO DO, THEN, IS LOOK AT THE CHANGE IN SCORE BETWEEN BASELINE AND 6 MONTHS. LOOK AT ALL THE FOLKS WHO WERE DESCRIBED AS HAVING LARGE DETERIORATION, AND LOOK AT WHAT THEIR MEAN CHANGE WAS. THEN MODERATE DETERIORATION, AND SO ON. HERE ARE THE RESULTS OF THAT. THE MEAN CHANGE IS ON THE Y AXIS THERE, WITH 0 RIGHT IN THE MIDDLE. I'LL DRAW YOUR ATTENTION TO THESE TWO AREAS HERE. THESE ARE THE MEAN CHANGES IN KANSAS CITY SCORE THAT CORRESPOND TO PATIENTS FOR WHOM THE PHYSICIAN SAID THERE WAS A SMALL DETERIORATION, AND PATIENTS FOR WHOM THE PHYSICIAN SAID THERE WAS A SMALL IMPROVEMENT. AND COINCIDENTALLY, THEY'RE BOTH ABOUT A FIVE POINT CHANGE. ONE IS A FIVE POINT IMPROVEMENT, ONE A FIVE POINT DECLINE IN SCORE. IT DOESN'T ALWAYS WORK OUT THAT WAY. BUT IN THIS CASE, IT DID. KIND OF NICE.
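THE ANCHOR CALCULATION DESCRIBED ABOVE AMOUNTS TO GROUPING CHANGE SCORES BY THE CLINICIAN'S RATING AND AVERAGING WITHIN EACH GROUP. HERE IS A MINIMAL SKETCH OF THAT STEP. THE RECORDS BELOW ARE INVENTED PURELY FOR ILLUSTRATION AND ARE NOT THE DATA FROM THE PUBLISHED ARTICLE. THE NUMBERS WERE PICKED SO THAT THE SMALL-CHANGE GROUPS AVERAGE ABOUT FIVE POINTS, ECHOING THE RESULT DESCRIBED IN THE TALK. FUNCTION AND VARIABLE NAMES ARE HYPOTHETICAL.

```python
from collections import defaultdict
from statistics import mean

# Invented (clinician rating, change in score from baseline to 6 months) pairs.
records = [
    ("moderate deterioration", -12.0), ("moderate deterioration", -8.0),
    ("small deterioration", -7.0), ("small deterioration", -5.0),
    ("small deterioration", -3.0),
    ("no change", -1.0), ("no change", 0.0), ("no change", 1.0),
    ("small improvement", 3.0), ("small improvement", 5.0),
    ("small improvement", 7.0),
    ("moderate improvement", 8.0), ("moderate improvement", 12.0),
]


def mean_change_by_rating(rows):
    """Average change score within each clinician-rating category."""
    grouped = defaultdict(list)
    for rating, change in rows:
        grouped[rating].append(change)
    return {rating: mean(changes) for rating, changes in grouped.items()}


means = mean_change_by_rating(records)

# Point of reference: the typical size of a change that clinicians called "small".
anchor = (abs(means["small improvement"]) + abs(means["small deterioration"])) / 2

for rating, value in means.items():
    print(f"{rating:>24}: {value:+.1f}")
print("anchor for a small, noticeable change:", anchor)
```

THE RESULTING ANCHOR IS A ROUGH POINT OF REFERENCE RATHER THAN AN EXACT CUTOFF.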
SO IN THIS CASE, WE'RE ABLE TO SAY IT LOOKS LIKE, ON AVERAGE, A FIVE POINT CHANGE FROM BASELINE SEEMS TO CORRESPOND TO THE CLINICIAN'S PERCEPTION THAT THIS PERSON HAD A SMALL, NOTICEABLE IMPROVEMENT OR DECLINE. THAT DOESN'T MEAN FIVE IS A HOLY NUMBER, NO. IT'S JUST A HELPFUL POINT OF REFERENCE. THAT'S ALL WE NEED. SO WE CAN USE THAT WHEN WE EXPRESS THE RESULTS FROM THIS STUDY. WE PLOTTED THE DISTRIBUTION OF CHANGE SCORES. THE CHANGE FROM BASELINE IS ON THE X AXIS, AND THE FREQUENCY IN PERCENT IS ON THE Y AXIS. WE PLOT THAT SEPARATELY FOR THE EXERCISE AND USUAL CARE ARMS THERE. AND RECALL THAT THE DIFFERENCE WAS 1.93. THAT'S KIND OF LIKE THE DIFFERENCE IN THE MEANS OF THOSE DISTRIBUTIONS. THIS IS A LITTLE MORE INFORMATIVE, THOUGH. WE CAN DRAW A REFERENCE LINE AT 5, AN IMPROVEMENT OF 5. AGAIN, MAYBE IT'S 6. WE'RE NOT PRETENDING THERE IS ANYTHING EXACT ABOUT THIS. BUT TO HELP US GET A SENSE, WE KNOW THAT A FIVE, ON AVERAGE, WAS WHEN CLINICIANS SAID, I'M NOTICING AN IMPROVEMENT. IF WE DRAW THAT REFERENCE LINE THERE, WE CAN FIND OUT WHAT PERCENT OF PEOPLE IN EACH TRIAL ARM EXCEEDED THAT. IN THIS CASE, IN THE EXERCISE GROUP, 54% OF FOLKS EXCEEDED THAT, COMPARED TO 29% OF CONTROLS. IT HELPS US GET A SENSE OF WHAT HAPPENED IN THIS STUDY WITH RESPECT TO THESE KANSAS CITY SCORES. AS I SAY, THERE ARE OTHER APPROACHES TO TRY TO CREATE MEANING AROUND THE SCORES YOU GET. THIS IS JUST ONE EXAMPLE OF THOSE. BUT I THINK THE IMPORTANT POINT IS THAT WHEN YOU HAVE A NEW MEASURE, EFFORTS HAVE TO BE MADE TO ACTUALLY TRY TO FIGURE OUT WHAT WILL CONSTITUTE A MEANINGFUL CHANGE OR MEANINGFUL DIFFERENCE.
OKAY. SO LET'S QUICKLY REVIEW HERE. THIS WAS A HIGH SPEED SURVEY OF SOME KEY POINTS IN THE FIELD. WE TALKED ABOUT THE VALUE OF MEASURING PATIENT REPORTED HEALTH STATUS, AND THE DIFFERENT TYPES OF PATIENT REPORTED OUTCOMES SUMMARIZED BY THAT WILSON-CLEARY MODEL, THAT IDEA OF THE DILUTION OF EFFECTS GOING FROM LEFT TO RIGHT IN THAT MODEL.
AND ALL THE OTHER STUFF ABOUT THE INDIVIDUAL AND ABOUT THE ENVIRONMENT THAT CONTRIBUTES TO THE PATIENT REPORTED OUTCOMES THAT WE MIGHT OBSERVE. WE TALKED ABOUT THE DIFFERENT STEPS FOR DEVELOPING AND EVALUATING PATIENT REPORTED OUTCOMES. AND WE ALSO WENT INTO SOME DEPTH ABOUT NEW DEVELOPMENTS IN THE FIELD THAT HAVE ARISEN AS A RESULT OF APPLYING ITEM RESPONSE THEORY TO HEALTH MEASUREMENT, AND THE CREATION OF ITEM BANKS THAT ALLOW US TO ASSESS HEALTH MORE EFFICIENTLY. AND THEN FINALLY, WE TALKED ABOUT THE IMPORTANCE OF DOING SOMETHING TO HELP US INTERPRET THE CLINICAL OR POLICY RELEVANCE OF CHANGES OR DIFFERENCES IN SCORES ON PATIENT REPORTED OUTCOME MEASURES, BECAUSE THEY'RE STRANGE METRICS, NOT USED ROUTINELY IN CLINICAL PRACTICE RIGHT NOW. WE DON'T HAVE THE BODY OF EXPERIENCE THAT ALLOWS PEOPLE TO LOOK AT A DIFFERENCE AND SAY, THIS IS ACTUALLY A DIFFERENCE WORTH PAYING FOR.
OKAY. I THINK WE'VE GOT SOME TIME FOR QUESTIONS, IF THERE ARE ANY. YES, SIR. [INAUDIBLE QUESTION] WHAT A GREAT POINT. THE QUESTION WAS, WHAT ABOUT THE ELEMENT OF TIME IN THE REPORTING? ITEMS USUALLY HAVE A RECALL PERIOD: IN THE PAST 7 DAYS, OR IN THE PAST MONTH. TO WHAT EXTENT IS IT A CONCERN THAT THERE MAY BE INACCURACY IN THAT REPORTING? A GREAT POINT WAS RAISED ABOUT A POPULATION LIKE THE ELDERLY, WHERE YOU MIGHT HAVE MEMORY PROBLEMS. MIGHT WE BE ASKING PEOPLE TO REPORT ON SOMETHING THAT'S BEYOND THEIR ABILITY? WHAT A GREAT QUESTION. I DIDN'T INCLUDE IT IN THE EVALUATION PROCESS THERE, BUT IT IS INCREASINGLY THE CASE THAT PEOPLE WILL DO A RECALL STUDY. THIS HAS BEEN DONE FOR A NUMBER OF THE PROMIS MEASURES. WE CONDUCTED ONE FOR SOME OF THE PROMIS FUNCTION MEASURES. IN A RECALL STUDY, YOU'LL HAVE PEOPLE MAKE DAILY RATINGS OF WHATEVER IT IS YOU'RE MEASURING OVER, SAY, A MONTH. AND AT THE SAME TIME, THEY DO A WEEKLY RATING, MAYBE A MONTHLY RATING. THERE ARE COMPLEX DESIGNS YOU CAN COME UP WITH HERE, BUT THE IDEA IS TO LOOK AT THE CORRESPONDENCE BETWEEN A 7 DAY RECALL AND THE AVERAGE DAILY RECALL.
THE SHORT ANSWER IS YES, IT'S VERY IMPORTANT. IT CAN BE EVALUATED EMPIRICALLY. THE FDA IS VERY INTERESTED IN EVIDENCE THAT SOMEONE CAN REPORT ACCURATELY. AND THERE ARE REALLY INTERESTING RESULTS SHOWING THE TYPES OF RECALL BIAS PEOPLE EXHIBIT. THE LAST FEW DAYS ARE OFTEN MORE INFLUENTIAL IN DETERMINING THEIR REPORT. THE MOOD THEY'RE IN AT THE TIME THEY TAKE THE MEASURE MIGHT BE RELATED TO THEIR REPORTS. SO THIS IS AN EMERGING SCIENCE, BUT A VERY IMPORTANT ASPECT OF THIS. I'M SO GLAD YOU BROUGHT THAT UP. I DIDN'T INCLUDE THAT. WHAT OTHER COMMENTS OR QUESTIONS DO FOLKS HAVE? YES, SIR.
>> [INAUDIBLE QUESTION] THE QUESTION IS, HOW DO YOU ESTABLISH AN ABSOLUTE BASELINE WHEN PEOPLE'S SUBJECTIVE SENSE OF WHAT A SIX IS, FOR EXAMPLE, COULD VARY AMONG PEOPLE? SO, GREAT QUESTION. I MEAN, HERE IS THE ESSENTIAL FACT. I WAS GOING TO SAY PROBLEM, BUT IT'S NOT A PROBLEM. IT'S A FEATURE. IT'S THE FACT OF HAVING A SUBJECTIVE REPORT AS THE MEASURE. WE COULD GO AND TRY TO SAY, WELL, LET'S GET MEASURES OF THE C FIBERS FOR THESE 2 PEOPLE WHO BOTH ANSWER 6. IF WE FIND OUT THAT THE MAGNITUDE OF FIRING OF THE C FIBERS IS DIFFERENT BETWEEN THE TWO, THAT MIGHT BE INTERESTING. BUT THAT'S EXPENSIVE, AND I'M NOT SURE WHAT YOU'RE GOING TO DO WITH IT. WHAT A LOT OF PEOPLE WILL DO IN THIS CASE IS SAY, LET'S DO A LONGITUDINAL DESIGN, WHERE EACH PERSON IS THEIR OWN CONTROL. ANOTHER OPTION IS TO HAVE PEOPLE RATE DIFFERENT REFERENCE STATES, SO THEY MIGHT BE RATING HOW BAD THEY THINK IT WOULD BE TO BE IN THIS OTHER STATE. YOU CAN COMPARE WHAT PEOPLE SAID THERE. IT GETS A LITTLE BIT WONKY. I THINK IT'S BETTER TO USE PEOPLE AS THEIR OWN CONTROL. IT'S ALSO THE CASE, THOUGH, THAT SOME OF THOSE DIFFERENCES BETWEEN PEOPLE ARE EXACTLY THE THINGS WE'RE TRYING TO MEASURE. IF I HAVE TWO PEOPLE WHOSE BODIES FROM THE NECK DOWN ARE IDENTICAL, BUT ONE IS GIVING A PAIN INTENSITY RATING OF 4 AND THE OTHER IS SAYING 8, THOSE ARE DIFFERENT EXPERIENCES THAT PEOPLE ARE HAVING.
AND SOME OF IT MAY BE BECAUSE THEY USE THE SCALE DIFFERENTLY, BUT A LOT OF IT MAY BE BECAUSE IT'S A DIFFERENT EXPERIENCE. THAT'S WHY WE'RE ASKING THEM, BECAUSE IT'S THAT EXPERIENCE THAT IS PROBABLY DRIVING BEHAVIOR LIKE SEEKING CARE. OTHER QUESTIONS? YES, MA'AM.
>> [INAUDIBLE QUESTION] THE QUESTION IS, DO WE SEE A LOT OF THESE QUESTIONS ON MOBILE DEVICES? I THINK THE NEXT LECTURE WILL BE ON DIFFERENT MODES OF ADMINISTRATION. ABSOLUTELY. THE ABILITY TO USE AN iPAD IN THE CLINIC, OR THE PHONES THAT PEOPLE ARE OUT WALKING AROUND WITH, IS VERY EXCITING, ESPECIALLY TO BE ABLE TO MEASURE THINGS IN REAL TIME. INSTEAD OF ME ASKING YOU WHAT THE PAST 7 DAYS WERE LIKE, I CAN PING YOU EVERY NOW AND THEN WITH A VERY SHORT THING, A FEW TIMES A DAY. SO THAT IS ABSOLUTELY OF TREMENDOUS INTEREST, AND THERE ARE A LOT OF EXCITING APPLICATIONS BEING DEVELOPED TO DO THAT. YES, MA'AM.
>> [INAUDIBLE QUESTION] >> THE QUESTION IS, WHEN WE HAVE A PATIENT REPORTED MEASURE, IS THERE SUCH A THING AS A PROTOCOL DEVIATION? THERE ABSOLUTELY IS. IF WE'RE REQUIRING THAT THE PERSON COMPLETE AT LEAST ONE ITEM, AND THEY HAVE NOT COMPLETED ANY, AND THE STUDY COORDINATOR WAS SUPPOSED TO MAKE SURE THEY COMPLETED SOME OR ANSWERED THE QUESTION THAT SAYS I'M NOT ANSWERING BECAUSE, THAT MIGHT BE VIEWED AS A PROTOCOL DEVIATION. ANOTHER EXAMPLE IS THAT SOMETIMES THE TIMING OF WHEN THEY ANSWER THE MEASURES DURING A CLINIC VISIT IS CRITICAL. SO LET'S SAY I'M DOING A HEART FAILURE STUDY. I'VE GOT A GUY GOING THROUGH A STRESS TEST. HIS ANSWERS ABOUT HIS PHYSICAL FUNCTION RIGHT WHEN HE COMES INTO THE CLINIC WILL LOOK A LOT DIFFERENT THAN AFTER HE FINISHES THE STRESS TEST, WHEN HE'S NOT FEELING SO HOT. SO IF I'M DESIGNING THAT PROTOCOL, AND THE THING I'M MEASURING IS WHAT HIS FUNCTIONING WAS LIKE THE WEEK PRIOR TO COMING INTO THE CLINIC, I WANT HIM TO TAKE THE MEASURE RIGHT AWAY.
IF WE FIND OUT THAT THEY FORGOT TO ADMINISTER THAT MEASURE AT THE BEGINNING OF THE VISIT, DID THE STRESS TEST, AND ADMINISTERED IT AFTERWARD, THAT'S A PROBLEM. THERE ARE OTHER THINGS YOU MIGHT THINK OF. SO YES, IT'S POSSIBLE TO HAVE PROTOCOL DEVIATIONS RELATED TO PATIENT REPORTED OUTCOMES. OTHER QUESTIONS OR COMMENTS? OKAY. WELL, YOU'VE BEEN SO ATTENTIVE AT THIS LATE HOUR. I APPRECIATE IT, AND ENJOY THE REST OF YOUR EVENING. [APPLAUSE]