>> Good morning, everyone. Welcome to our CTU statistics lecture. My name is Tianxia Wu, and I will present logistic regression analysis. First, I will introduce regression analysis, odds and the odds ratio, and the logistic regression model. Then we will discuss the assumptions, model evaluation, and diagnostics. Later, I will present two examples of logistic regression analysis. The main references for this lecture are Dr. Allison's book, Logistic Regression Using SAS, and the SAS/STAT 14.2 User's Guide. In the previous lecture, Gina introduced linear regression. What is the difference between linear and logistic regression, and what is logistic regression used for? To answer these questions, we need to start from the types of variables. Based on measurement, there are two types of variables. One is called a quantitative variable. A quantitative variable can be continuous, taking any value in an interval.
For example, height and weight. A quantitative variable can also be discrete, taking only specific numerical values, but those numerical values have a clear quantitative interpretation; for example, the number of doctor visits. The other type is the qualitative variable. Qualitative variables include categorical (or nominal), binary, and ordinal variables. A categorical variable has two or more categories, but there is no intrinsic ordering to the categories; for example, race and colors. A categorical variable with only two categories is called binary. An ordinal variable is similar; the only difference is that an ordinal variable has a clear ordering of the categories. For example, the pain scale: no pain, mild, moderate, and severe. Based on the study objective and design, a variable can be dependent or independent. The dependent variable is also called the response variable.
The independent variables include explanatory variables, exposures, or factors. For example, suppose a study is designed to evaluate diabetes in relation to diet, exercise, and sweets. In this example, diabetes is the dependent variable, and diet, exercise, and sweets are the independent variables. To understand the difference between logistic and linear regression, we need to talk about regression analysis. Regression analysis is a statistical tool for investigating functional relationships between dependent variables and independent variables. The regression family includes many members, and which one should be used depends mainly on the type of the dependent variable. For a quantitative dependent variable following a normal distribution, we use linear regression. For a count dependent variable following a Poisson distribution, we use Poisson regression. For a qualitative dependent variable, we use logistic regression.
For a binary dependent variable, we use binary logistic regression, also called logistic regression for short. For an ordinal dependent variable, we use ordinal logistic regression. For a nominal dependent variable, we use multinomial logistic regression. In this lecture we will talk about binary logistic regression; there are other regressions as well. To appreciate logistic regression, you have to understand the definitions of probability and odds. A binary variable Y has two possible values: usually Y takes 1 if the event occurs and 0 if the event does not occur. p represents the probability that the event occurs. The odds of an event are defined as the ratio of the probability of the event occurring to the probability of the event not occurring, that is, p/(1 - p). In other words, the odds are the ratio of how likely the event is to occur to how likely it is not to occur. So why do we need odds?
It is because odds are a more sensible scale for multiplicative comparisons. For example, suppose the probability of winning the next election is 0.3 for candidate A and 0.6 for candidate B. It is meaningful to claim that B's probability is twice as great as A's. However, if p equals 0.6 for candidate A, B's probability cannot be twice as great as A's, because a probability cannot be more than 1. On the odds scale, though, p = 0.6 for candidate A gives odds of 0.6/0.4 = 1.5. If B's probability is 0.75, B's odds are 0.75/0.25 = 3, so B's odds are twice as great as A's. The value 0.75 is obtained by solving the odds equation for p: p = odds/(1 + odds) = 3/4. The odds ratio compares two odds. For example, the odds ratio of A to B is 1.5 divided by 3, which equals 0.5.
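The election arithmetic can be reproduced in a few lines. This is a minimal Python sketch for illustration only; the helper names `odds` and `prob_from_odds` are hypothetical, and `prob_from_odds` is simply the odds definition p/(1 - p) solved for p.

```python
def odds(p: float) -> float:
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)


def prob_from_odds(o: float) -> float:
    """Odds definition solved for p: p = odds / (1 + odds)."""
    if o < 0.0:
        raise ValueError("odds must be non-negative")
    return o / (1.0 + o)


odds_a = odds(0.6)    # candidate A: 0.6 / 0.4
odds_b = odds(0.75)   # candidate B: 0.75 / 0.25

# Which probability has twice A's odds? Solve p / (1 - p) = 2 * odds_a.
p_twice = prob_from_odds(2 * odds_a)

print(f"odds(A) = {odds_a:.2f}, odds(B) = {odds_b:.2f}")
print(f"probability with twice A's odds = {p_twice:.2f}")
print(f"OR(A vs B) = {odds_a / odds_b:.2f}")
print(f"OR(B vs A) = {odds_b / odds_a:.2f}")
```

The two odds ratios, 0.5 and 2, are reciprocals; on the log scale they are symmetric (ln 0.5 = -ln 2), which is one reason the log of the odds is the natural scale for modeling.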
This indicates that the odds of winning the election for candidate A are 50% lower than those for candidate B. The odds ratio of B to A is 3 divided by 1.5, which equals 2, indicating that the odds of winning the election for B are two times those for A. Now we come to define the logit model, or logistic regression model. We just learned that the odds are defined as the probability divided by one minus the probability, p/(1 - p). Applying the natural logarithm transformation to the odds, we get logit(p) = ln[p/(1 - p)]. This table shows the three variables: the probability ranges from 0 to 1; the odds range from 0 to positive infinity, which means there is no upper bound; and the logit has no upper or lower bound, taking all real numbers. This table also shows the one-to-one relationship between the three variables on their different scales.
For example, a probability of 0.5 corresponds to odds of 1 and a logit of 0. Let's look at this again. For a binary variable, we use probability to describe the event occurring. Then we define the odds as p divided by 1 - p, and then we apply the log transformation to get the logit. This transformation leads to a linear relationship with the predictor variables, or independent variables. The logistic regression model looks like this: logit(p) = ln[p/(1 - p)] = β0 + β1X1 + ... + βkXk, which is very similar to linear regression. In linear regression, the left side is the response variable; in logistic regression, it is the logit, the log transformation of the odds. On the right side, all the X's are the same as in the linear regression model and can be any type of variable, so X can be quantitative or qualitative.
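The transformation chain and its inverse can be sketched in Python as below. The table reproduces the probability, odds, and logit scales; the coefficients `b0`, `b1`, `b2` and the covariate values are made up for illustration and do not come from the lecture. Inverting the logit, p = 1/(1 + exp(-η)), is what turns a linear predictor η into a predicted probability.

```python
import math


def logit(p: float) -> float:
    """Log odds ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Inverse logit: maps any real eta back into (0, 1).

    The two branches avoid overflow in math.exp for large |eta|.
    """
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Probability in (0, 1) -> odds in (0, inf) -> logit over all reals.
print(f"{'p':>5} {'odds':>8} {'logit':>8}")
for p in (0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99):
    print(f"{p:>5} {p / (1 - p):>8.3f} {logit(p):>8.3f}")

# Hypothetical model: logit(p) = b0 + b1*x1 + b2*x2 (illustrative numbers).
b0, b1, b2 = -1.0, 0.5, 0.8
x1, x2 = 2.0, 1.0
eta = b0 + b1 * x1 + b2 * x2   # linear predictor on the logit scale
p_hat = inv_logit(eta)         # back-transformed predicted probability
print(f"eta = {eta:.2f}, predicted p = {p_hat:.3f}")
```

Whatever value the linear predictor takes, the back-transformed probability stays strictly between 0 and 1, which is exactly what a straight-line model for p itself could not guarantee.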
As for the betas, the βi are the slopes. In linear regression, a slope can be interpreted as the change in the response variable for each unit increase in X. In logistic regression, the slope β is the change in the log odds for each unit increase in the independent variable Xi; equivalently, β is the log of the odds ratio. Because the log odds ratio has no direct meaning, we back-transform by exponentiating the slope. The exponentiated slope, exp(β), is the odds ratio for each unit increase in X, which we will show in detail in the examples. So what is logistic regression used for? Because a logistic regression model is a linear model for the natural log of the odds of the dependent variable in terms of the independent variables, it can be used to evaluate the association of one or more independent variables with a binary dependent variable.
It can also be used to study the relationship between each independent variable and the binary outcome while holding constant the values of the other independent variables, that is, adjusting for the other independent variables, and to estimate the probability of a particular outcome given the values of the independent variables, that is, prediction. When there is only one independent variable, the model is called simple logistic regression, and the odds ratio based on simple logistic regression is called the unadjusted odds ratio. If the model includes more than one independent variable, we call it multiple logistic regression. The odds ratio estimated from multiple logistic regression is called the adjusted odds ratio, which represents the unique contribution of an independent variable after adjusting for the other variables in the model.
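To see why the exponentiated slope is read as an odds ratio "holding the other variables constant", the sketch below uses a hypothetical two-covariate model; the coefficients `b0`, `b_age`, `b_sex` are invented and are not estimates from any data in this lecture. For each fixed value of the other covariate, and at every starting age, the odds ratio for a one-year increase in age comes out as exp(b_age).

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def odds(p: float) -> float:
    return p / (1.0 - p)


# Hypothetical fitted model: logit(p) = b0 + b_age*age + b_sex*sex,
# with sex coded 0/1. These coefficients are illustrative only.
b0, b_age, b_sex = -3.0, 0.04, 0.7


def prob(age: float, sex: int) -> float:
    return inv_logit(b0 + b_age * age + b_sex * sex)


print(f"exp(b_age) = {math.exp(b_age):.4f}")
for sex in (0, 1):               # hold sex fixed ...
    for age in (40, 60, 80):     # ... and vary the starting age
        ratio = odds(prob(age + 1, sex)) / odds(prob(age, sex))
        print(f"sex={sex} age {age}->{age + 1}: OR = {ratio:.4f}")

# A 10-year increase multiplies the odds by exp(10 * b_age), not 10 * exp(b_age).
print(f"OR for +10 years = {math.exp(10 * b_age):.4f}")
```

The ratio is constant because the other terms of the linear predictor cancel when the two log odds are subtracted; that cancellation is what "adjusted for the other variables" means inside the model. It says nothing about whether the adjusted estimate will be larger or smaller than an unadjusted one fitted to real data.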
The adjusted odds ratio is often, though not always, lower than the corresponding unadjusted one. Just like linear regression, logistic regression should also meet some basic assumptions. Four basic assumptions of linear regression also apply to logistic regression. The first one is independent observations, which means the subjects are independent of each other and there is only one observation from each subject. This means that the dataset cannot be longitudinal and should be cross-sectional, the same as in linear regression. The second one is no multicollinearity. This means the independent variables should not be highly correlated with each other. There are two methods to check this assumption. One is to run a Pearson correlation analysis for all independent variables.
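Both multicollinearity checks can be sketched in plain Python. The predictor values below are hypothetical (`age_months` is deliberately built to track `age`), and the helper names are illustrative. One point worth making explicit: the VIF depends only on the independent variables (VIF_j = 1/(1 - R_j²), where R_j² comes from regressing X_j on the other predictors), so treating the 0/1 outcome as continuous in a linear regression is simply a convenient way to get regression software to print it. Numerically, the VIFs equal the diagonal of the inverse of the predictors' correlation matrix, which is what this sketch computes; the 0.8, 4, and 10 cutoffs are the ones quoted in the lecture.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear predictors")
        a[col], a[piv] = a[piv], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vif(columns):
    """VIF_j = j-th diagonal element of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]


# Hypothetical predictors; the outcome is not needed for either check.
data = {
    "age": [65, 70, 72, 68, 80, 77, 74, 69, 71, 83],
    "duration": [5, 12, 30, 8, 20, 1, 16, 3, 25, 11],
    "age_months": [783, 835, 871, 816, 958, 930, 880, 832, 853, 990],
}
names = list(data)

for a, b in combinations(names, 2):
    r = pearson(data[a], data[b])
    flag = "  <- |r| > 0.8" if abs(r) > 0.8 else ""
    print(f"r({a}, {b}) = {r:+.3f}{flag}")

for name, v in zip(names, vif([data[n] for n in names])):
    note = "serious" if v > 10 else "possible" if v > 4 else "ok"
    print(f"VIF({name}) = {v:.1f} ({note})")
```

The pairwise check only sees two variables at a time, whereas the VIF can also expose a predictor that is nearly a linear combination of several others, which is why the two methods complement each other.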
Any pair with an absolute correlation coefficient greater than 0.8 indicates multicollinearity, and one variable of the pair should be excluded from the model. The second method is to run a linear regression that treats the binary 0/1 outcome as continuous in order to get the variance inflation factor (VIF). The linear model includes all the independent variables. If any independent variable is categorical with more than two categories, you need to recode it as dummy (indicator) variables. A VIF greater than 4 indicates possible multicollinearity, and a VIF greater than 10 indicates serious multicollinearity. The third one is linearity for continuous independent variables. This means there is a linear relationship between logit(p) and each continuous X. In linear regression, we use a plot of Y against X to check linearity. This doesn't work in logistic regression, because the dependent variable Y has only two possible values. So we use the Box-Tidwell test.
This means testing the interaction term between X and log X. For example, age is a continuous variable, so we can test the interaction between age and log(age). If the p-value is less than 0.05, the assumption is violated. One solution is to categorize the continuous variable and then treat it as a categorical variable; another is to use piecewise linearization. This will be shown in the examples. The fourth one is no influential values, which can be checked with diagnostics, as I will show in the next slide. Model evaluation: model evaluation tests how well the model fits the data. It is also called the goodness-of-fit test, which compares the observed probability with the expected, or predicted, probability. There are many methods for model evaluation.
Here we introduce three of them. One is the Hosmer and Lemeshow test, or HL test. A high p-value indicates that the fitted model cannot be rejected and that the model cannot be significantly improved by adding nonlinearities or interactions. A low p-value indicates that the model does not fit well, or lack of fit, and that the model can be significantly improved by adding nonlinearities and/or interactions. The second method is the deviance and Pearson goodness-of-fit statistics. The third one is the generalized R-square, a measure of predictive power. In linear regression, R-square can be interpreted as the proportion of the variance in the dependent variable explained by the independent variables. The generalized R-square cannot be interpreted this way, but we can still use it to compare different models. Diagnostics.
The purpose of diagnostics is to identify influential observations, outliers, and extreme values. The residual is defined as the difference between the observed probability and the predicted probability. The following four statistics are often used and are usually presented in plots. The first is the Pearson residual. Possible influential observations are those with an absolute Pearson residual greater than 2. The second is the deviance residual. Possible influential observations are those with an absolute deviance residual greater than 2. Both Pearson and deviance residuals approximately follow the standard normal distribution; therefore, the cutoff for outliers is 2. The third one is the hat matrix diagonal, or leverage. This statistic ranges from 0 to 1.
Possible influential observations are those with leverage greater than 3k/n, where k is the number of parameters including the intercept and n is the total number of observations. The fourth one is the confidence interval displacement C. Possible influential observations are those with a C statistic greater than 1. Even though there are cutoffs for these statistics, it is recommended to investigate the cases for which the value is high relative to the others. These slides show scatter plots of the four statistics. This one is for the Pearson residual; you can see there are two outliers here. These two outliers also show up in the deviance residual plot, this one and this one here. For leverage, there is one outlier here, and for the C statistic, there are five observations with values greater than 1.
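The four diagnostic statistics can be computed directly from a fitted model. The sketch below fits a one-predictor logistic regression by maximum likelihood (Newton-Raphson) and then computes, for each observation, the Pearson residual, the deviance residual, the leverage h = w·x'(X'WX)⁻¹x with w = p(1 - p), and the confidence interval displacement C = (Pearson residual)²·h/(1 - h)², flagging values beyond the cutoffs quoted in the lecture. The data are made up for illustration, the function names are hypothetical, and the formulas are the standard textbook definitions; the sketch is meant to mirror, not reproduce, what a procedure such as PROC LOGISTIC prints.

```python
import math


def fit_simple_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood fit of logit(p) = b0 + b1*x via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Information matrix X'WX = [[i00, i01], [i01, i11]].
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: (X'WX)^{-1} times the score.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def diagnostics(x, y, b0, b1):
    """Per-observation (Pearson residual, deviance residual, leverage, C)."""
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = i00 * i11 - i01 * i01
    rows = []
    for xi, yi, pi, wi in zip(x, y, p, w):
        r_p = (yi - pi) / math.sqrt(wi)
        loglik_i = math.log(pi) if yi == 1 else math.log(1.0 - pi)
        r_d = math.copysign(math.sqrt(-2.0 * loglik_i), yi - pi)
        # Leverage: w_i * (1, x_i) (X'WX)^{-1} (1, x_i)'.
        h = wi * (i11 - 2.0 * i01 * xi + i00 * xi * xi) / det
        c = r_p ** 2 * h / (1.0 - h) ** 2
        rows.append((r_p, r_d, h, c))
    return rows


# Hypothetical data: one continuous predictor and a 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
y = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]

b0, b1 = fit_simple_logistic(x, y)
rows = diagnostics(x, y, b0, b1)
n, k = len(x), 2                      # k counts the intercept
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, leverage cutoff 3k/n = {3 * k / n:.2f}")
for i, (r_p, r_d, h, c) in enumerate(rows, start=1):
    flags = []
    if abs(r_p) > 2:
        flags.append("|Pearson| > 2")
    if abs(r_d) > 2:
        flags.append("|deviance| > 2")
    if h > 3 * k / n:
        flags.append("h > 3k/n")
    if c > 1:
        flags.append("C > 1")
    print(f"{i:2d} x={x[i - 1]:2d} y={y[i - 1]} Pearson={r_p:+.2f} "
          f"dev={r_d:+.2f} h={h:.3f} C={c:.3f} {', '.join(flags)}")
```

Because the trace of the hat matrix equals the number of parameters, the leverages always sum to k; k/n is therefore the average leverage, and the 3k/n cutoff marks observations with three times the average.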
This slide shows the differences and similarities between linear and logistic regression. The dependent variable is the main difference: for linear regression, it should be continuous with a normal distribution; for logistic regression, it is binary (yes or no) and follows a Bernoulli distribution. Both should meet the four assumptions: independent observations; no multicollinearity; linearity for continuous X, meaning that Y and X in linear regression, or logit(p) and X in logistic regression, have a linear relationship; and no influential values. For effect selection, that is, independent variable selection, we can use the stepwise, forward, or backward method, or predictive power, which basically means R-square. For parameter estimation, linear regression uses least squares and logistic regression uses maximum likelihood. For predictive power, linear regression uses R-square.
In logistic regression, we use the generalized R-square or the area under the receiver operating characteristic curve (AUROC). For result presentation, linear regression reports the slope and the semi-partial R-square. Next, I'm going to present two examples of logistic regression. The first example is a neuralgia trial, designed to evaluate treatment effects by comparison with placebo. This is a randomized controlled trial with three arms. The subjects are elderly patients with neuralgia, randomly assigned to receive drug A, drug B, or placebo. Stratification using sex and age as covariates is applied. The sample size is 60 in total, 20 per group. The response variable is whether the patient reported pain or not, denoted as 1 and 0. The covariates are sex, age, and duration. This table shows part of the data.
550 00:26:05,730 --> 00:26:07,198 THE COLUMNS ARE THE CASE 551 00:26:07,265 --> 00:26:08,600 NUMBER, PAIN, 552 00:26:08,667 --> 00:26:10,568 TREATMENT, SEX, 553 00:26:10,635 --> 00:26:20,779 AGE, AND DURATION. 554 00:26:20,845 --> 00:26:23,615 WE USE THE SAS PROCEDURE 555 00:26:23,682 --> 00:26:24,416 LOGISTIC. 556 00:26:24,482 --> 00:26:26,785 THE FIRST LINE CALLS 557 00:26:26,851 --> 00:26:33,124 PROC LOGISTIC AND SPECIFIES 558 00:26:33,191 --> 00:26:34,392 THE DATA SET, NEURALGIA. 559 00:26:34,459 --> 00:26:38,129 THE SECOND LINE LISTS 560 00:26:38,196 --> 00:26:40,131 ALL THE QUALITATIVE, OR 561 00:26:40,198 --> 00:26:40,632 CATEGORICAL, INDEPENDENT 562 00:26:40,699 --> 00:26:41,866 VARIABLES 563 00:26:41,933 --> 00:26:45,637 AND SPECIFIES THE REFERENCE 564 00:26:45,704 --> 00:26:48,506 LEVEL FOR THE ODDS RATIOS. 565 00:26:48,573 --> 00:26:55,480 HERE WE SPECIFIED THE 566 00:26:55,547 --> 00:26:57,549 REFERENCE LEVEL FOR TREATMENT, 567 00:26:57,615 --> 00:27:00,051 AND FOR SEX IT IS MALE. 568 00:27:00,118 --> 00:27:03,288 IF YOU DON'T SPECIFY THIS, BY 569 00:27:03,355 --> 00:27:05,156 DEFAULT THE REFERENCE LEVEL IS 570 00:27:05,223 --> 00:27:12,263 THE LAST ORDERED LEVEL. 571 00:27:12,330 --> 00:27:14,299 HERE I SPECIFY WHICH LEVEL I 572 00:27:14,366 --> 00:27:18,737 WANT TO USE AS THE REFERENCE. 573 00:27:18,803 --> 00:27:19,938 THE THIRD LINE SPECIFIES THE 574 00:27:20,005 --> 00:27:21,673 MODEL. 575 00:27:21,740 --> 00:27:28,646 THE DEPENDENT VARIABLE IS PAIN, 576 00:27:28,713 --> 00:27:30,248 AND THE INDEPENDENT VARIABLES 577 00:27:30,315 --> 00:27:30,982 ARE TREATMENT, AGE, SEX, AND 578 00:27:31,049 --> 00:27:35,286 DURATION. 579 00:27:35,353 --> 00:27:37,622 HERE WE USE PAIN(EVENT='1') 580 00:27:37,689 --> 00:27:39,958 TO SPECIFY THE 581 00:27:40,025 --> 00:27:42,961 RESPONSE LEVEL TO BE MODELED.
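The choice of modeled level changes the interpretation, not the fit. Because logit(1 - p) = -logit(p), modeling P(pain = 0) instead of P(pain = 1) flips the sign of every coefficient, so each odds ratio becomes its reciprocal. A small Python check of that identity follows; the helper names are illustrative and are not SAS options.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def odds_ratio_other_level(odds_ratio):
    """Odds ratio obtained when the other response level is modeled.

    Every coefficient b becomes -b, so exp(b) becomes exp(-b) = 1 / exp(b).
    """
    return 1.0 / odds_ratio


# Modeling the complementary event negates the log-odds.
for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 4), round(logit(1.0 - p), 4))

print(odds_ratio_other_level(4.0))  # prints 0.25
```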
582 00:27:43,028 --> 00:27:44,195 BY DEFAULT, THE PROCEDURE 583 00:27:44,262 --> 00:27:45,630 MODELS THE PROBABILITY OF THE 584 00:27:45,697 --> 00:27:48,533 SMALLEST VALUE. 585 00:27:48,600 --> 00:27:52,470 SO HERE, IF I DON'T SPECIFY 1, 586 00:27:52,537 --> 00:28:03,014 THE PROCEDURE WILL MODEL 0. 587 00:28:04,215 --> 00:28:05,216 HERE, LACKFIT REQUESTS THE HOSMER-LEMESHOW (H.L.) 588 00:28:05,283 --> 00:28:10,655 GOODNESS-OF-FIT TEST, 589 00:28:10,722 --> 00:28:13,391 AND INFLUENCE REQUESTS THE INFLUENCE DIAGNOSTICS. 590 00:28:13,458 --> 00:28:15,760 NEXT WE CALCULATE THE ODDS RATIO FOR 591 00:28:15,827 --> 00:28:19,230 TREATMENT 592 00:28:19,297 --> 00:28:23,268 AND ITS CONFIDENCE INTERVAL. 593 00:28:23,334 --> 00:28:24,736 HERE, SPECIFYING 594 00:28:24,803 --> 00:28:30,141 CL=PL 595 00:28:30,208 --> 00:28:31,242 CALCULATES THE CONFIDENCE 596 00:28:31,309 --> 00:28:33,611 INTERVAL BASED ON THE PROFILE 597 00:28:33,678 --> 00:28:36,047 LIKELIHOOD FUNCTION, WHICH IS 598 00:28:36,114 --> 00:28:37,849 MORE ACCURATE THAN THE WALD METHOD, 599 00:28:37,916 --> 00:28:38,716 ESPECIALLY WHEN THE SAMPLE SIZE 600 00:28:38,783 --> 00:28:44,022 IS SMALL. 601 00:28:44,089 --> 00:28:47,025 TO ANALYZE DATA, WE CANNOT JUST 602 00:28:47,092 --> 00:28:48,159 RUN THE PROGRAM. 603 00:28:48,226 --> 00:28:49,060 WE NEED TO CHECK THE 604 00:28:49,127 --> 00:28:51,329 ASSUMPTIONS. 605 00:28:51,396 --> 00:28:56,668 FOR THIS DATA, AS IF THE DEPENDENT 606 00:28:56,734 --> 00:29:01,639 VARIABLE WERE CONTINUOUS, WE 607 00:29:01,706 --> 00:29:03,475 CHECK THE ANCOVA ASSUMPTIONS 608 00:29:03,541 --> 00:29:09,647 FOR THE COVARIATES. 609 00:29:09,714 --> 00:29:11,015 IN ANCOVA, THE COVARIATES 610 00:29:11,082 --> 00:29:13,651 MUST MEET TWO ASSUMPTIONS. 611 00:29:13,718 --> 00:29:18,323 THE FIRST ASSUMPTION IS THAT THE 612 00:29:18,389 --> 00:29:19,524 COVARIATES SHOULD NOT 613 00:29:19,591 --> 00:29:20,058 BE ASSOCIATED WITH THE 614 00:29:20,125 --> 00:29:29,200 TREATMENT GROUP.
615 00:29:29,267 --> 00:29:31,469 BECAUSE SEX AND AGE WERE 616 00:29:31,536 --> 00:29:32,737 USED AS COVARIATES IN THE 617 00:29:32,804 --> 00:29:34,472 RANDOMIZATION, 618 00:29:34,539 --> 00:29:37,842 THERE IS NO PROBLEM FOR SEX AND AGE. 619 00:29:37,909 --> 00:29:40,945 THEY ARE PERFECTLY BALANCED. 620 00:29:41,012 --> 00:29:43,148 DURATION IS ALSO VERY 621 00:29:43,214 --> 00:29:44,849 SIMILAR ACROSS GROUPS; THE MEANS ARE VERY 622 00:29:44,916 --> 00:29:49,854 CLOSE. 623 00:29:49,921 --> 00:29:52,090 SO DURATION IS NOT 624 00:29:52,157 --> 00:29:59,430 ASSOCIATED WITH TREATMENT. 625 00:29:59,497 --> 00:30:01,666 THIS IS VERY, VERY IMPORTANT. 626 00:30:01,733 --> 00:30:03,668 FOR EXAMPLE, IF THE PATIENTS IN THE 627 00:30:03,735 --> 00:30:06,905 DRUG A GROUP WERE MUCH OLDER 628 00:30:06,971 --> 00:30:08,640 THAN THOSE IN THE DRUG B GROUP, 629 00:30:08,706 --> 00:30:11,776 IF AGE WERE ASSOCIATED WITH 630 00:30:11,843 --> 00:30:13,211 PAIN, AND IF WE FOUND A 631 00:30:13,278 --> 00:30:14,712 DIFFERENCE BETWEEN THE TWO 632 00:30:14,779 --> 00:30:14,913 GROUPS, 633 00:30:14,979 --> 00:30:17,215 WE WOULD NOT KNOW WHETHER THE 634 00:30:17,282 --> 00:30:18,550 DIFFERENCE WAS CAUSED BY THE DRUG 635 00:30:18,616 --> 00:30:21,052 OR BY AGE. 636 00:30:21,119 --> 00:30:25,590 IN THIS CASE, THE AGE EFFECT 637 00:30:25,657 --> 00:30:27,125 CANNOT BE ADJUSTED FOR BY 638 00:30:27,192 --> 00:30:34,766 ANCOVA, 639 00:30:34,832 --> 00:30:36,801 BECAUSE THE AGE 640 00:30:36,868 --> 00:30:37,869 EFFECT IS CONFOUNDED WITH THE 641 00:30:37,936 --> 00:30:40,238 TREATMENT EFFECT. 642 00:30:40,305 --> 00:30:46,277 THIS WOULD INDICATE THAT THE STUDY IS 643 00:30:46,344 --> 00:30:50,748 POORLY DESIGNED OR HAS FAILED. 644 00:30:50,815 --> 00:30:51,950 THE SECOND ASSUMPTION IS HOMOGENEITY OF 645 00:30:52,016 --> 00:30:53,518 REGRESSION SLOPES.
646 00:30:53,585 --> 00:30:56,487 THIS MEANS THAT THE SLOPES OF THE 647 00:30:56,554 --> 00:30:57,922 REGRESSION LINES, 648 00:30:57,989 --> 00:31:00,124 ONE FOR EACH 649 00:31:00,191 --> 00:31:00,658 TREATMENT GROUP, SHOULD BE 650 00:31:00,725 --> 00:31:08,633 SIMILAR. 651 00:31:08,700 --> 00:31:12,070 TO TEST THIS ASSUMPTION, WE 652 00:31:12,136 --> 00:31:12,870 EVALUATE THE INTERACTION 653 00:31:12,937 --> 00:31:13,805 BETWEEN TREATMENT AND EACH 654 00:31:13,871 --> 00:31:15,073 COVARIATE: 655 00:31:15,139 --> 00:31:17,308 IN THIS CASE, AGE BY 656 00:31:17,375 --> 00:31:19,010 TREATMENT, SEX BY TREATMENT, 657 00:31:19,077 --> 00:31:19,711 AND DURATION BY 658 00:31:19,777 --> 00:31:21,546 TREATMENT. 659 00:31:21,613 --> 00:31:23,982 ALL THE INTERACTIONS HAVE P 660 00:31:24,048 --> 00:31:26,884 GREATER THAN 0.1, 661 00:31:26,951 --> 00:31:29,854 WHICH MEANS BOTH ASSUMPTIONS 662 00:31:29,921 --> 00:31:33,858 ARE MET. 663 00:31:33,925 --> 00:31:36,461 SECOND, WE CHECK 664 00:31:36,527 --> 00:31:38,396 THE FOUR ASSUMPTIONS. 665 00:31:38,463 --> 00:31:40,832 BASED ON THE DESIGN, THERE IS NO 666 00:31:40,898 --> 00:31:44,669 PROBLEM WITH INDEPENDENCE OF 667 00:31:44,736 --> 00:31:45,370 OBSERVATIONS. 668 00:31:45,436 --> 00:31:54,679 FOR THE SECOND ASSUMPTION, 669 00:31:54,746 --> 00:31:56,014 NO MULTICOLLINEARITY, 670 00:31:56,080 --> 00:31:57,815 SINCE WE ALREADY KNOW THE 671 00:31:57,882 --> 00:32:00,018 TREATMENT IS NOT ASSOCIATED 672 00:32:00,084 --> 00:32:01,552 WITH THE THREE COVARIATES, 673 00:32:01,619 --> 00:32:05,657 WE ONLY NEED TO CHECK 674 00:32:05,723 --> 00:32:07,825 AGE, DURATION, AND SEX. 675 00:32:07,892 --> 00:32:14,165 WE RAN PEARSON 676 00:32:14,232 --> 00:32:16,234 CORRELATIONS, AND ALL THREE 677 00:32:16,301 --> 00:32:16,834 CORRELATION COEFFICIENTS ARE 678 00:32:16,901 --> 00:32:18,403 VERY SMALL.
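The pairwise correlation screen described here can be sketched in plain Python. The sketch assumes sex is coded 0/1, so that its Pearson correlation with a continuous covariate is defined (the point-biserial correlation), and it borrows the 0.8 cutoff that the second example uses for "strongly correlated". The covariate values are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def correlation_screen(columns, threshold=0.8):
    """Return (name1, name2, r) for every pair whose |r| exceeds threshold."""
    flagged = []
    for (n1, c1), (n2, c2) in combinations(columns.items(), 2):
        r = pearson_r(c1, c2)
        if abs(r) > threshold:
            flagged.append((n1, n2, r))
    return flagged


# Hypothetical covariate values for six patients (sex coded 0/1).
covariates = {
    "age": [68, 74, 70, 81, 66, 72],
    "duration": [12, 5, 20, 8, 15, 3],
    "sex": [0, 1, 1, 0, 0, 1],
}
for a, b in combinations(covariates, 2):
    print(a, b, round(pearson_r(covariates[a], covariates[b]), 3))
print("strongly correlated pairs:", correlation_screen(covariates))
```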
679 00:32:18,469 --> 00:32:28,913 SO THERE IS NO PROBLEM WITH 680 00:32:32,884 --> 00:32:35,486 MULTICOLLINEARITY. 681 00:32:35,553 --> 00:32:37,422 THE THIRD ASSUMPTION IS THE LINEAR RELATIONSHIP BETWEEN 682 00:32:37,488 --> 00:32:39,624 THE LOGIT OF PAIN AND AGE 683 00:32:39,691 --> 00:32:41,359 AND DURATION, CHECKED USING AGE AND 684 00:32:41,426 --> 00:32:42,393 LOG AGE, AND DURATION AND LOG 685 00:32:42,460 --> 00:32:45,063 DURATION. 686 00:32:45,129 --> 00:32:52,236 THIS TABLE SHOWS THAT 687 00:32:52,303 --> 00:32:57,108 AGE HAS A P VALUE LESS THAN 688 00:32:57,175 --> 00:32:57,842 0.01, BUT THERE IS NO PROBLEM WITH 689 00:32:57,909 --> 00:32:59,844 DURATION. 690 00:32:59,911 --> 00:33:04,082 THE FOLLOWING SLIDES WILL SHOW 691 00:33:04,148 --> 00:33:08,686 HOW TO DEAL WITH THIS 692 00:33:08,753 --> 00:33:10,154 NONLINEARITY. 693 00:33:10,221 --> 00:33:13,024 THE FOURTH ASSUMPTION IS NO 694 00:33:13,091 --> 00:33:13,891 INFLUENTIAL VALUES, WHICH WILL 695 00:33:13,958 --> 00:33:16,794 BE SHOWN IN THE SLIDES ON 696 00:33:16,861 --> 00:33:21,466 DIAGNOSTICS. 697 00:33:21,532 --> 00:33:25,236 FOR THE CONTINUOUS INDEPENDENT 698 00:33:25,303 --> 00:33:26,537 VARIABLES IN THE MODEL, 699 00:33:26,604 --> 00:33:31,876 EVEN THOUGH THERE IS NO NORMAL 700 00:33:31,943 --> 00:33:32,977 DISTRIBUTION ASSUMPTION, 701 00:33:33,044 --> 00:33:39,717 THERE SHOULD BE 702 00:33:39,784 --> 00:33:43,788 NO OUTLIERS OR EXTREME VALUES. 703 00:33:43,855 --> 00:33:47,525 WE CAN SIMPLY CALCULATE THE 704 00:33:47,592 --> 00:33:48,192 KURTOSIS AND SKEWNESS FOR EACH 705 00:33:48,259 --> 00:33:49,560 VARIABLE. 706 00:33:49,627 --> 00:33:51,162 IF THEY ARE LESS THAN ONE IN ABSOLUTE VALUE, THERE 707 00:33:51,229 --> 00:33:56,534 WILL BE NO PROBLEM. 708 00:33:56,601 --> 00:34:01,472 THIS SLIDE SHOWS HOW TO DEAL 709 00:34:01,539 --> 00:34:02,907 WITH THE NONLINEARITY. 710 00:34:02,974 --> 00:34:05,710 HERE WE SEE THE FIRST MODEL.
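The linearity check of age against log age described above appears to follow the same idea as the Box-Tidwell test that the lecture names for Model 4. That test adds the term x·ln(x) to the model alongside x, refits, and tests the coefficient of the extra term; a significant term suggests the logit is not linear in x. The minimal Python sketch below only builds the extra column. The refit and the test are left to the logistic procedure, and the ages are hypothetical.

```python
import math


def box_tidwell_term(values):
    """Return x * ln(x) for each value: the extra Box-Tidwell column.

    The log needs strictly positive values (ages and durations qualify);
    a variable that can be zero or negative would have to be shifted first.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Tidwell term requires strictly positive values")
    return [v * math.log(v) for v in values]


ages = [59, 65, 70, 76, 83]  # hypothetical ages within the lecture's 59-83 range
for age, term in zip(ages, box_tidwell_term(ages)):
    print(age, round(term, 2))
```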
711 00:34:05,777 --> 00:34:07,545 WE JUST USE AGE AS A 712 00:34:07,612 --> 00:34:12,650 CONTINUOUS COVARIATE 713 00:34:12,717 --> 00:34:15,286 AND IGNORE THE NONLINEARITY. 714 00:34:15,353 --> 00:34:16,921 DURATION WAS DROPPED FROM 715 00:34:16,988 --> 00:34:19,157 THE MODEL BECAUSE ITS P VALUE 716 00:34:19,223 --> 00:34:26,831 IS GREATER THAN 0.9. 717 00:34:26,898 --> 00:34:30,168 TO SOLVE THE PROBLEM, WE CAN 718 00:34:30,234 --> 00:34:32,603 CONVERT THE CONTINUOUS 719 00:34:32,670 --> 00:34:35,640 VARIABLE INTO A CATEGORICAL ONE. 720 00:34:35,706 --> 00:34:37,074 THIS TABLE SHOWS TWO CATEGORICAL 721 00:34:37,141 --> 00:34:39,944 VARIABLES FOR AGE: 722 00:34:40,011 --> 00:34:43,114 ONE WITH TWO CATEGORIES, 723 00:34:43,181 --> 00:34:45,817 USING 70 AS THE CUTOFF, 724 00:34:45,883 --> 00:34:48,753 AND ONE WITH THREE 725 00:34:48,820 --> 00:34:49,987 CATEGORIES, WITH 68 AND 72 AS 726 00:34:50,054 --> 00:34:53,191 CUTOFFS. 727 00:34:53,257 --> 00:34:56,694 FIRST, WE CAN TREAT AGE AS 728 00:34:56,761 --> 00:34:58,663 CATEGORICAL. 729 00:34:58,729 --> 00:35:01,332 FOR CATEGORICAL AGE, THERE IS NO 730 00:35:01,399 --> 00:35:03,201 LINEARITY ASSUMPTION. 731 00:35:03,267 --> 00:35:06,170 IN MODEL 2, WE USE AGE 732 00:35:06,237 --> 00:35:08,473 WITH TWO CATEGORIES, 733 00:35:08,539 --> 00:35:11,843 AND THE MODEL HAS FOUR 734 00:35:11,909 --> 00:35:15,213 DEGREES OF FREEDOM: 735 00:35:15,279 --> 00:35:17,048 TWO FOR TREATMENT, ONE FOR 736 00:35:17,114 --> 00:35:24,188 SEX, AND ONE FOR AGE. 737 00:35:24,255 --> 00:35:25,389 IN MODEL 3, AGE HAS THREE 738 00:35:25,456 --> 00:35:27,425 CATEGORIES, 739 00:35:27,492 --> 00:35:30,161 SO AGE HAS 2 DEGREES OF FREEDOM AND 740 00:35:30,228 --> 00:35:32,296 THE MODEL HAS FIVE DEGREES OF FREEDOM. 741 00:35:32,363 --> 00:35:36,767 THE SECOND SOLUTION IS 742 00:35:36,834 --> 00:35:38,936 PIECEWISE LINEARIZATION.
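A Python sketch of the two ingredients above: assigning ages to categories from cutoffs, and counting model degrees of freedom as (levels - 1) per categorical variable plus 1 per continuous term. The lecture gives the cutoffs but not which side a value exactly equal to a cutoff falls on; here it is assumed to go to the upper category.

```python
from bisect import bisect_right


def categorize(value, cutoffs):
    """Category index 0..len(cutoffs) for a value, given sorted cutoffs.

    Assumption: a value equal to a cutoff goes to the upper category.
    """
    return bisect_right(cutoffs, value)


def model_df(categorical_levels, n_continuous=0):
    """(levels - 1) per categorical variable plus 1 per continuous term."""
    return sum(k - 1 for k in categorical_levels) + n_continuous


ages = [59, 67, 68, 70, 71, 72, 83]
print([categorize(a, [70]) for a in ages])      # two-category age
print([categorize(a, [68, 72]) for a in ages])  # three-category age
print(model_df([3, 2, 2]))  # Model 2: treatment (3 levels), sex, two-category age -> 4
print(model_df([3, 2, 3]))  # Model 3: treatment, sex, three-category age -> 5
```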
743 00:35:39,003 --> 00:35:43,474 THE METHOD IS TO DIVIDE AGE 744 00:35:43,541 --> 00:35:45,309 INTO TWO LINEAR SECTIONS, 745 00:35:45,376 --> 00:35:47,245 AGE LESS THAN 70 AND AGE 746 00:35:47,311 --> 00:35:49,947 GREATER THAN 70. 747 00:35:50,014 --> 00:35:52,416 EACH SECTION IS ASSUMED TO BE 748 00:35:52,483 --> 00:35:54,385 LINEAR. 749 00:35:54,452 --> 00:35:58,456 THIS GIVES US 750 00:35:58,523 --> 00:36:04,529 MODEL 4, WITH TREATMENT, SEX, 751 00:36:04,595 --> 00:36:10,167 AND THREE AGE TERMS: AGE AS 752 00:36:10,234 --> 00:36:13,371 CONTINUOUS, AGE2 WITH TWO 753 00:36:13,437 --> 00:36:14,305 CATEGORIES, AND THE INTERACTION 754 00:36:14,372 --> 00:36:16,607 BETWEEN AGE AND AGE2. 755 00:36:16,674 --> 00:36:18,643 SO THIS MODEL HAS A TOTAL OF 756 00:36:18,709 --> 00:36:20,578 SIX DEGREES OF FREEDOM: 757 00:36:20,645 --> 00:36:24,015 TWO FOR TREATMENT AND ONE EACH FOR SEX, AGE, AGE2, AND THE INTERACTION. 758 00:36:24,081 --> 00:36:29,387 WE THEN TESTED THE 759 00:36:29,453 --> 00:36:30,021 LINEARITY USING THE BOX-TIDWELL 760 00:36:30,087 --> 00:36:30,888 TEST 761 00:36:30,955 --> 00:36:36,327 AND FOUND THAT THE P VALUE 762 00:36:36,394 --> 00:36:39,263 IS GREATER THAN 0.1. 763 00:36:39,330 --> 00:36:40,431 THIS TABLE SHOWS THE JOINT TESTS FOR 764 00:36:40,498 --> 00:36:42,066 MODEL 4. 765 00:36:42,133 --> 00:36:45,036 THE MODEL INCLUDES AGE, AGE2, 766 00:36:45,102 --> 00:36:46,637 THE INTERACTION BETWEEN 767 00:36:46,704 --> 00:36:50,274 THESE TWO, SEX, AND TREATMENT. 768 00:36:50,341 --> 00:36:52,944 YOU SEE THE INTERACTION IS 769 00:36:53,010 --> 00:36:55,613 SIGNIFICANT. 770 00:36:55,680 --> 00:36:57,148 THE SIGNIFICANT INTERACTION 771 00:36:57,214 --> 00:37:00,551 INDICATES THAT THE TWO AGE 772 00:37:00,618 --> 00:37:03,588 SECTIONS HAVE DIFFERENT 773 00:37:03,654 --> 00:37:04,689 RELATIONSHIPS BETWEEN THE LOGIT OF PAIN AND 774 00:37:04,755 --> 00:37:08,759 AGE. 775 00:37:08,826 --> 00:37:10,161 THIS SLIDE COMPARES 776 00:37:10,227 --> 00:37:13,297 THE FOUR MODELS.
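In Model 4 the logit contains b_age·AGE + b_age2·AGE2 + b_int·AGE·AGE2. The age slope is therefore b_age below the cutoff and b_age + b_int above it, and each section's odds ratio per year is exp of its slope. The Python sketch assumes AGE2 = 1 for ages above 70 (the lecture does not state the coding), and the coefficient names are hypothetical. As a check, it backs the slopes out of the Model 4 odds ratios quoted in this lecture: 0.88 below 70, and 1.815 (an 81.5% increase) above 70.

```python
import math


def piecewise_terms(age, cutoff=70):
    """Model 4 age columns: AGE, AGE2 (assumed 1 above the cutoff), AGE * AGE2."""
    age2 = 1 if age > cutoff else 0
    return [age, age2, age * age2]


def section_slopes(b_age, b_int):
    """Log-odds slope per year of age below and above the cutoff.

    With logit = ... + b_age*AGE + b_age2*AGE2 + b_int*AGE*AGE2, the slope
    is b_age when AGE2 = 0 and b_age + b_int when AGE2 = 1.
    """
    return b_age, b_age + b_int


print(piecewise_terms(65), piecewise_terms(75))  # [65, 0, 0] [75, 1, 75]

# Back out the slopes implied by the Model 4 odds ratios quoted in the lecture.
b_age = math.log(0.88)                    # odds ratio 0.88 per year below 70
b_int = math.log(1.815) - math.log(0.88)  # extra slope giving 1.815 per year above 70
below, above = section_slopes(b_age, b_int)
print(round(math.exp(below), 3), round(math.exp(above), 3))  # 0.88 1.815
```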
777 00:37:13,364 --> 00:37:18,002 MODEL 1 USES AGE AS 778 00:37:18,069 --> 00:37:18,436 CONTINUOUS, BUT WITH 779 00:37:18,502 --> 00:37:21,172 NONLINEARITY. 780 00:37:21,238 --> 00:37:25,109 MODEL 2 USES AGE WITH 2 CATEGORIES. 781 00:37:25,176 --> 00:37:26,844 FOR A CATEGORICAL INDEPENDENT 782 00:37:26,911 --> 00:37:27,812 VARIABLE, THERE IS NO 783 00:37:27,878 --> 00:37:33,684 LINEARITY REQUIREMENT. 784 00:37:33,751 --> 00:37:36,354 MODEL 3 USES AGE WITH 3 CATEGORIES, AND IN MODEL 4 785 00:37:36,420 --> 00:37:41,859 AGE ENTERS AS THREE TERMS. 786 00:37:41,926 --> 00:37:44,362 HERE ARE THE H.L. 787 00:37:44,428 --> 00:37:46,631 GOODNESS-OF-FIT TEST P VALUES. 788 00:37:46,697 --> 00:37:48,866 YOU SEE MODEL 4 FITS BETTER 789 00:37:48,933 --> 00:37:53,704 THAN THE OTHER THREE. 790 00:37:53,771 --> 00:37:59,076 THIS TABLE SHOWS THE 791 00:37:59,143 --> 00:38:02,380 AGE EFFECT IN THE FOUR MODELS. 792 00:38:02,446 --> 00:38:07,051 FOR MODEL 1, THE AGE RANGE IS 793 00:38:07,118 --> 00:38:11,288 59 TO 83, AND THE ODDS RATIO IS 1.21. 794 00:38:11,355 --> 00:38:12,790 THE CONFIDENCE INTERVAL DOESN'T 795 00:38:12,857 --> 00:38:15,026 INCLUDE 1, WHICH MEANS THE 796 00:38:15,092 --> 00:38:17,028 ODDS RATIO IS SIGNIFICANTLY 797 00:38:17,094 --> 00:38:21,399 DIFFERENT FROM 1. 798 00:38:21,465 --> 00:38:25,870 THE 1.21 MEANS THAT OVER THE 799 00:38:25,936 --> 00:38:29,340 WHOLE RANGE OF AGE, FOR EACH 800 00:38:29,407 --> 00:38:34,412 ONE-YEAR INCREASE, THE ODDS OF PAIN 801 00:38:34,478 --> 00:38:38,349 INCREASE BY 21%. 802 00:38:38,416 --> 00:38:40,351 FOR MODEL 2, AGE IS 803 00:38:40,418 --> 00:38:44,021 ALSO SIGNIFICANT; 804 00:38:44,088 --> 00:38:49,326 THE ODDS RATIO IS ALSO SIGNIFICANT. 805 00:38:49,393 --> 00:38:51,862 FOR THREE-CATEGORY AGE, THERE IS NO DIFFERENCE FOR 806 00:38:51,929 --> 00:38:54,198 THIS INTERVAL COMPARED WITH 807 00:38:54,265 --> 00:38:59,236 LESS THAN 68, 808 00:38:59,303 --> 00:38:59,770 BECAUSE THE CONFIDENCE INTERVAL 809 00:38:59,837 --> 00:39:01,839 INCLUDES 1,
810 00:39:01,906 --> 00:39:03,941 WHICH MEANS THE ODDS RATIO IS NOT 811 00:39:04,008 --> 00:39:07,211 SIGNIFICANTLY DIFFERENT FROM 1. 812 00:39:07,278 --> 00:39:08,713 FOR THESE TWO COMPARISONS, THE 813 00:39:08,779 --> 00:39:11,582 ODDS RATIOS ARE SIGNIFICANT. 814 00:39:11,649 --> 00:39:15,686 FOR MODEL 4, YOU SEE THAT 815 00:39:15,753 --> 00:39:19,390 FOR AGE LESS THAN 70, THE ODDS 816 00:39:19,457 --> 00:39:22,693 RATIO IS 0.88, 817 00:39:22,760 --> 00:39:26,330 EVEN LESS THAN 1, 818 00:39:26,397 --> 00:39:27,765 BUT ITS CONFIDENCE INTERVAL 819 00:39:27,832 --> 00:39:29,934 INCLUDES 1, 820 00:39:30,000 --> 00:39:32,903 SO THIS ODDS RATIO IS NOT 821 00:39:32,970 --> 00:39:33,404 SIGNIFICANT, 822 00:39:33,471 --> 00:39:36,707 WHICH MEANS THAT FOR AGE LESS THAN 823 00:39:36,774 --> 00:39:40,911 70, AGE HAS NO EFFECT ON 824 00:39:40,978 --> 00:39:41,645 PAIN. 825 00:39:41,712 --> 00:39:44,515 BUT FOR AGE GREATER THAN 70, 826 00:39:44,582 --> 00:39:55,092 THE AGE EFFECT IS SIGNIFICANT. 827 00:39:59,063 --> 00:40:04,401 THIS SLIDE SHOWS THE DIAGNOSTICS. 828 00:40:04,468 --> 00:40:07,404 BASED ON THE PEARSON RESIDUALS, 829 00:40:07,471 --> 00:40:11,575 THERE ARE TWO OUTLIERS, 830 00:40:11,642 --> 00:40:13,677 CASES 12 AND 17, 831 00:40:13,744 --> 00:40:17,715 AND THESE TWO OUTLIERS ALSO 832 00:40:17,782 --> 00:40:18,215 SHOW UP IN THE DEVIANCE 833 00:40:18,282 --> 00:40:19,750 RESIDUALS: 834 00:40:19,817 --> 00:40:21,786 12 AND 17. 835 00:40:21,852 --> 00:40:23,888 BASED ON THE LEVERAGE, THERE IS 836 00:40:23,954 --> 00:40:28,759 ONLY ONE OUTLIER. 837 00:40:28,826 --> 00:40:30,861 BASED ON THE C (CONFIDENCE INTERVAL DISPLACEMENT) STATISTICS, 838 00:40:30,928 --> 00:40:32,897 FIVE CASES ARE GREATER 839 00:40:32,963 --> 00:40:37,501 THAN 1. 840 00:40:37,568 --> 00:40:41,138 LOOKING AT THESE DIAGNOSTICS, 841 00:40:41,205 --> 00:40:45,910 WE USUALLY PAY MORE ATTENTION 842 00:40:45,976 --> 00:40:49,313 TO THE CASES FLAGGED BY MORE 843 00:40:49,380 --> 00:40:59,723 THAN ONE STATISTIC.
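For a 0/1 outcome y with fitted probability p, three quantities are involved. The Pearson residual is (y - p)/sqrt(p(1 - p)). The deviance residual is the signed square root of -2[y ln p + (1 - y) ln(1 - p)]. The confidence interval displacement C combines the Pearson residual r with the leverage h as r²h/(1 - h)². A Python sketch of these formulas follows. The leverage itself comes from the hat matrix, which the procedure computes and which is not reproduced here.

```python
import math


def pearson_residual(y, p):
    """(y - p) / sqrt(p * (1 - p)) for a 0/1 outcome y and fitted probability p."""
    return (y - p) / math.sqrt(p * (1.0 - p))


def deviance_residual(y, p):
    """Signed square root of the observation's contribution to the deviance."""
    contribution = -2.0 * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return math.copysign(math.sqrt(contribution), y - p)


def ci_displacement(pearson, leverage):
    """Confidence interval displacement C = r**2 * h / (1 - h)**2."""
    return pearson ** 2 * leverage / (1.0 - leverage) ** 2


# An observation with no event (y = 0) but a fitted probability of 0.849.
r_p = pearson_residual(0, 0.849)
r_d = deviance_residual(0, 0.849)
print(round(r_p, 3), round(r_d, 3))
```

For instance, an observation with y = 0 and p = 0.849 (the fitted probability reported for case 8 in the second example) gets a Pearson residual below -2, which is why such a case stands out in the residual plots.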
844 00:41:01,525 --> 00:41:02,893 SO THIS SHOWS THE FIVE POSSIBLE 845 00:41:02,960 --> 00:41:05,496 OUTLIERS. 846 00:41:05,563 --> 00:41:07,231 HOW DO WE HANDLE THEM? FIRST, AS I 847 00:41:07,298 --> 00:41:10,067 JUST SAID, WE NEED TO PAY MORE 848 00:41:10,134 --> 00:41:16,240 ATTENTION TO THE OUTLIERS OR 849 00:41:16,307 --> 00:41:17,341 INFLUENTIAL OBSERVATIONS 850 00:41:17,408 --> 00:41:20,511 FLAGGED BY MORE THAN ONE 851 00:41:20,578 --> 00:41:23,147 STATISTIC, LIKE THESE THREE. 852 00:41:23,214 --> 00:41:26,217 TO HANDLE THESE OUTLIERS, 853 00:41:26,283 --> 00:41:28,452 THE FIRST 854 00:41:28,519 --> 00:41:31,689 SOLUTION IS TO CHECK WHETHER THE 855 00:41:31,755 --> 00:41:32,823 INFLUENTIAL OBSERVATIONS ARE 856 00:41:32,890 --> 00:41:36,393 CAUSED BY TECHNICAL ERRORS. 857 00:41:36,460 --> 00:41:40,130 THE SECOND IS TO RUN THE LOGISTIC 858 00:41:40,197 --> 00:41:42,633 REGRESSION ANALYSIS AGAIN, 859 00:41:42,700 --> 00:41:43,167 EXCLUDING THE INFLUENTIAL 860 00:41:43,234 --> 00:41:45,803 OBSERVATIONS. 861 00:41:45,870 --> 00:41:47,504 THIS ANALYSIS IS ALSO CALLED 862 00:41:47,571 --> 00:41:53,711 SENSITIVITY ANALYSIS, 863 00:41:53,777 --> 00:41:56,714 AND USUALLY WE REPORT BOTH 864 00:41:56,780 --> 00:42:02,253 RESULTS. 865 00:42:02,319 --> 00:42:04,088 THIS TABLE SHOWS THE ODDS RATIOS 866 00:42:04,154 --> 00:42:11,262 CALCULATED FROM MODEL 4. 867 00:42:11,328 --> 00:42:13,397 WE CAN INTERPRET THE ODDS 868 00:42:13,464 --> 00:42:15,032 RATIOS 869 00:42:15,099 --> 00:42:18,802 THIS WAY. 870 00:42:18,869 --> 00:42:22,172 FOR EXAMPLE, FOR TREATMENT A 871 00:42:22,239 --> 00:42:28,212 VERSUS B, THE ODDS 872 00:42:28,279 --> 00:42:29,380 RATIO IS 1.3, BUT ITS CONFIDENCE INTERVAL 873 00:42:29,446 --> 00:42:30,514 INCLUDES 1, 874 00:42:30,581 --> 00:42:32,650 SO THERE IS NO 875 00:42:32,716 --> 00:42:33,417 SIGNIFICANT DIFFERENCE BETWEEN 876 00:42:33,484 --> 00:42:35,185 DRUG A AND DRUG B
877 00:42:35,252 --> 00:42:45,863 AFTER ADJUSTING FOR AGE AND SEX. 878 00:42:45,930 --> 00:42:50,000 THE ODDS RATIO FOR TREATMENT A 879 00:42:50,067 --> 00:42:53,671 VERSUS PLACEBO IS 0.057, 880 00:42:53,737 --> 00:42:54,972 WHICH IS SIGNIFICANT BECAUSE ITS CONFIDENCE INTERVAL 881 00:42:55,039 --> 00:42:56,473 DOESN'T INCLUDE 1. 882 00:42:56,540 --> 00:43:00,210 BECAUSE THE ODDS RATIO IS LESS 883 00:43:00,277 --> 00:43:04,548 THAN 1, WE USUALLY TAKE 1 MINUS 884 00:43:04,615 --> 00:43:06,617 THE ODDS RATIO AND INTERPRET IT AS LOWER ODDS. 885 00:43:06,684 --> 00:43:08,786 HERE YOU CAN SAY THAT THE 886 00:43:08,852 --> 00:43:11,455 ODDS OF PAIN FOR PATIENTS 887 00:43:11,522 --> 00:43:15,859 TAKING DRUG A ARE 94.3% LOWER 888 00:43:15,926 --> 00:43:18,495 THAN FOR THOSE TAKING PLACEBO, 889 00:43:18,562 --> 00:43:24,969 ADJUSTING FOR AGE AND SEX. 890 00:43:25,035 --> 00:43:30,207 FOR SEX, THE ODDS 891 00:43:30,274 --> 00:43:32,176 RATIO IS 4.675. 892 00:43:32,242 --> 00:43:35,279 WE CAN SAY THE ADJUSTED ODDS 893 00:43:35,346 --> 00:43:37,915 OF PAIN FOR MALE PATIENTS 894 00:43:37,982 --> 00:43:40,818 ARE 4.7 TIMES THE ODDS FOR 895 00:43:40,884 --> 00:43:42,753 FEMALE PATIENTS AFTER 896 00:43:42,820 --> 00:43:44,688 CONTROLLING FOR TREATMENT AND 897 00:43:44,755 --> 00:43:54,832 AGE. 898 00:43:55,299 --> 00:43:58,035 FOR AGE LESS THAN 70, AGE 899 00:43:58,102 --> 00:43:59,903 HAD NO EFFECT ON PAIN AFTER 900 00:43:59,970 --> 00:44:00,504 ADJUSTING FOR SEX AND 901 00:44:00,571 --> 00:44:03,240 TREATMENT. 902 00:44:03,307 --> 00:44:05,876 FOR AGE GREATER THAN 70, WE 903 00:44:05,943 --> 00:44:08,512 CAN SEE THAT FOR PATIENTS OVER 904 00:44:08,579 --> 00:44:11,048 70 YEARS OLD, EACH ONE-YEAR 905 00:44:11,115 --> 00:44:16,286 INCREASE IN AGE IS ASSOCIATED 906 00:44:16,353 --> 00:44:19,056 WITH AN 81.5% INCREASE IN THE ODDS 907 00:44:19,123 --> 00:44:21,125 OF HAVING PAIN, ADJUSTING FOR 908 00:44:21,191 --> 00:44:31,568 TREATMENT AND SEX.
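The percent-change reading of an odds ratio used above works as follows: for OR > 1 the odds are (OR - 1) x 100% higher, and for OR < 1 they are (1 - OR) x 100% lower. The Python sketch below reproduces the two interpretations from the lecture's odds ratios. It also includes a Wald interval, exp(b ± 1.96·SE), only because that interval needs nothing but the estimate and its standard error. The lecture's intervals use the profile likelihood (CL=PL), which requires refitting and can differ in small samples, and the transcript does not give the standard errors.

```python
import math


def describe_odds_ratio(odds_ratio):
    """Express an odds ratio as a percent change in the odds."""
    if odds_ratio >= 1.0:
        return f"{(odds_ratio - 1.0) * 100:.1f}% higher odds"
    return f"{(1.0 - odds_ratio) * 100:.1f}% lower odds"


def wald_ci(beta, se, z=1.959964):
    """Wald confidence interval for an odds ratio: exp(beta -/+ z * se)."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


print(describe_odds_ratio(0.057))  # drug A vs placebo: 94.3% lower odds
print(describe_odds_ratio(1.815))  # per year of age above 70: 81.5% higher odds
```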
909 00:44:32,703 --> 00:44:35,205 THE SECOND EXAMPLE 910 00:44:35,272 --> 00:44:36,573 INVESTIGATES SIX BIOMARKERS FOR 911 00:44:36,640 --> 00:44:47,117 PREDICTING CANCER REMISSION. 912 00:44:49,853 --> 00:44:52,489 THE RESPONSE IS WHETHER OR NOT 913 00:44:52,556 --> 00:44:54,792 CANCER REMISSION OCCURRED. 914 00:44:54,858 --> 00:44:57,528 THE DATA HAVE 27 INDEPENDENT SUBJECTS 915 00:44:57,594 --> 00:44:59,430 AND CONTAIN 6 PREDICTOR, OR 916 00:44:59,496 --> 00:45:06,103 INDEPENDENT, VARIABLES: 917 00:45:06,170 --> 00:45:07,671 CELL, SMEAR, INFIL, LI, BLAST, 918 00:45:07,738 --> 00:45:08,639 AND TEMP. 919 00:45:08,705 --> 00:45:11,842 FIRST WE NEED TO CHECK THE 920 00:45:11,909 --> 00:45:12,142 ASSUMPTIONS. 921 00:45:12,209 --> 00:45:14,611 THE FIRST ASSUMPTION, 922 00:45:14,678 --> 00:45:18,515 INDEPENDENT OBSERVATIONS, WE 923 00:45:18,582 --> 00:45:21,752 DON'T NEED TO CHECK, BECAUSE 924 00:45:21,819 --> 00:45:22,219 THESE 27 SUBJECTS ARE 925 00:45:22,286 --> 00:45:23,220 INDEPENDENT. 926 00:45:23,287 --> 00:45:33,797 NEXT WE NEED TO CHECK 927 00:45:37,234 --> 00:45:40,637 MULTICOLLINEARITY. 928 00:45:40,704 --> 00:45:45,409 THIS SHOWS THE PEARSON 929 00:45:45,476 --> 00:45:46,610 CORRELATIONS FOR ALL 930 00:45:46,677 --> 00:45:51,815 PAIRS OF THE SIX VARIABLES. 931 00:45:51,882 --> 00:45:56,086 CLEARLY YOU SEE HERE THAT INFIL AND 932 00:45:56,153 --> 00:46:00,657 SMEAR ARE STRONGLY CORRELATED, 933 00:46:00,724 --> 00:46:01,925 GREATER THAN 0.8. 934 00:46:01,992 --> 00:46:05,062 FROM THIS PAIR, ONE SHOULD BE 935 00:46:05,129 --> 00:46:12,369 EXCLUDED FROM THE ANALYSIS. 936 00:46:12,436 --> 00:46:16,406 THE SECOND METHOD IS TO RUN A 937 00:46:16,473 --> 00:46:20,444 LINEAR REGRESSION TO CHECK THE VIF 938 00:46:20,511 --> 00:46:21,311 BY TREATING REMISSION AS A 939 00:46:21,378 --> 00:46:24,915 CONTINUOUS VARIABLE.
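A VIF depends only on the predictors, which is why treating remission as continuous for this check does no harm: VIF_j = 1/(1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. Equivalently, the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. The Python sketch below uses that identity with a small Gauss-Jordan inversion. The predictor values are hypothetical and are not the remission data.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF of each predictor = diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# Hypothetical predictor values (not the remission data); two are nearly collinear.
predictors = {
    "smear": [0.80, 0.90, 0.70, 0.95, 0.60, 0.85],
    "infil": [0.75, 0.88, 0.65, 0.90, 0.55, 0.80],
    "li": [1.9, 1.4, 0.8, 0.7, 1.3, 0.6],
}
for name, vif in vifs(predictors).items():
    print(name, round(vif, 1))
```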
940 00:46:24,982 --> 00:46:29,253 HERE, BASED ON THE VIF IN THE 941 00:46:29,319 --> 00:46:37,027 LAST COLUMN, YOU WILL SEE THAT 942 00:46:37,094 --> 00:46:44,434 THESE THREE VARIABLES SHOW 943 00:46:44,501 --> 00:46:45,335 SEVERE MULTICOLLINEARITY. 944 00:46:45,402 --> 00:46:47,437 BECAUSE INFIL HAS THE LARGEST 945 00:46:47,504 --> 00:46:49,173 VIF, 946 00:46:49,239 --> 00:46:52,476 WE DROP INFIL FIRST. 947 00:46:52,543 --> 00:46:56,613 AFTER WE EXCLUDE 948 00:46:56,680 --> 00:47:02,286 INFIL, YOU SEE THE VIF VALUES 949 00:47:02,352 --> 00:47:05,222 ARE FINE; 950 00:47:05,289 --> 00:47:07,958 ALL ARE ACCEPTABLE. 951 00:47:08,025 --> 00:47:08,959 THEREFORE INFIL IS DROPPED 952 00:47:09,026 --> 00:47:12,629 FROM THE ANALYSIS. 953 00:47:12,696 --> 00:47:14,531 ALTERNATIVELY, YOU 954 00:47:14,598 --> 00:47:17,234 MIGHT TRY DROPPING THE OTHER ONE, SMEAR, 955 00:47:17,301 --> 00:47:18,702 BECAUSE THESE TWO ARE CLOSELY 956 00:47:18,769 --> 00:47:22,706 CORRELATED. 957 00:47:22,773 --> 00:47:26,543 BUT BECAUSE THE VIF FOR INFIL 958 00:47:26,610 --> 00:47:34,518 IS THE LARGEST, WE FIRST 959 00:47:34,585 --> 00:47:36,486 CONSIDER DROPPING INFIL. 960 00:47:36,553 --> 00:47:41,458 THEN WE CHECK 961 00:47:41,525 --> 00:47:45,295 LINEARITY, 962 00:47:45,362 --> 00:47:50,367 USING CELL AND LOG CELL, 963 00:47:50,434 --> 00:47:52,936 AND LI AND LOG LI. 964 00:47:53,003 --> 00:47:59,042 LOOK AT THIS: THE P VALUES 965 00:47:59,109 --> 00:48:00,877 ARE ALL GOOD, 966 00:48:00,944 --> 00:48:03,780 SO WE DON'T NEED TO WORRY ABOUT 967 00:48:03,847 --> 00:48:14,024 LINEARITY. 968 00:48:17,327 --> 00:48:17,694 THEN WE RUN STEPWISE 969 00:48:17,761 --> 00:48:21,865 REGRESSION. 970 00:48:21,932 --> 00:48:24,635 WE USE A SIGNIFICANCE LEVEL OF 971 00:48:24,701 --> 00:48:25,502 0.3
972 00:48:25,569 --> 00:48:26,937 BOTH TO ALLOW A VARIABLE INTO 973 00:48:27,004 --> 00:48:27,804 THE MODEL AND FOR IT TO STAY IN THE 974 00:48:27,871 --> 00:48:28,405 MODEL, 975 00:48:28,472 --> 00:48:30,140 THAT IS, 0.3 FOR EACH. 976 00:48:30,207 --> 00:48:32,175 THIS TABLE SHOWS A SUMMARY OF 977 00:48:32,242 --> 00:48:38,949 THE STEPWISE SELECTION. 978 00:48:39,016 --> 00:48:41,451 YOU WILL SEE THE P VALUES OF ALL THREE 979 00:48:41,518 --> 00:48:45,622 VARIABLES ARE LESS THAN 0.3. 980 00:48:45,689 --> 00:48:50,360 THE TABLE ALSO SHOWS THE AUROC, 981 00:48:50,427 --> 00:48:56,566 SO YOU SEE, AS EACH VARIABLE IS ADDED, HOW 982 00:48:56,633 --> 00:49:03,974 MUCH THE AUROC INCREASES. 983 00:49:04,041 --> 00:49:06,943 BASED ON THE THREE SELECTED 984 00:49:07,010 --> 00:49:10,947 VARIABLES, WE RUN A LOGISTIC REGRESSION 985 00:49:11,014 --> 00:49:12,482 USING THE SELECTED PREDICTORS, 986 00:49:12,549 --> 00:49:15,419 AND THEN WE CHECK THE GOODNESS OF 987 00:49:15,485 --> 00:49:15,585 FIT. 988 00:49:15,652 --> 00:49:18,822 THE HL TEST HAS A P 989 00:49:18,889 --> 00:49:23,794 VALUE OF 0.5054, 990 00:49:23,860 --> 00:49:26,630 SO WE CAN SEE THE MODEL FITS WELL. 991 00:49:26,697 --> 00:49:30,367 IN THIS TABLE, THE LEFT SIDE 992 00:49:30,434 --> 00:49:33,503 SHOWS THE ODDS RATIOS BASED ON A 993 00:49:33,570 --> 00:49:34,671 UNIT OF 1, 994 00:49:34,738 --> 00:49:41,578 WHICH IS THE DEFAULT. 995 00:49:41,645 --> 00:49:42,979 YOU SEE THEY ARE HARD 996 00:49:43,046 --> 00:49:44,481 TO INTERPRET. 997 00:49:44,548 --> 00:49:49,052 SO WE CAN ADJUST THE 998 00:49:49,119 --> 00:49:51,221 UNIT: FOR LI, WE CHANGE THE UNIT 999 00:49:51,288 --> 00:49:52,989 TO 0.1, 1000 00:49:53,056 --> 00:49:56,927 AND YOU SEE IT LOOKS MUCH BETTER. 1001 00:49:56,993 --> 00:49:59,062 FOR TEMP, WE CHANGE THE UNIT TO 1002 00:49:59,129 --> 00:50:00,764 0.01, 1003 00:50:00,831 --> 00:50:04,668 AND FOR CELL WE CHANGE IT TO 1004 00:50:04,735 --> 00:50:06,336 0.01. 1005 00:50:06,403 --> 00:50:07,971 THEN THEY HAVE INTERPRETABLE ODDS 1006 00:50:08,038 --> 00:50:09,473 RATIOS.
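Changing the unit rescales the odds ratio to exp(c·b) for an increase of c units in a predictor with log-odds slope b. This is why a per-0.1 odds ratio is much easier to read for a biomarker like LI, whose values are small. A Python sketch follows. The slopes 2.89 and 4.5 are the LI slopes reported at the end of this example, with all cases and with case 8 excluded. In PROC LOGISTIC the UNITS statement is one way to request such odds ratios, though the lecture does not show the syntax it used.

```python
import math


def odds_ratio_for_unit(beta, unit):
    """Odds ratio for an increase of `unit` in a predictor with log-odds slope beta."""
    return math.exp(beta * unit)


# LI slopes reported later in this example: all cases, and case 8 excluded.
for slope in (2.89, 4.5):
    per_1 = odds_ratio_for_unit(slope, 1.0)
    per_01 = odds_ratio_for_unit(slope, 0.1)
    print(f"slope {slope}: OR per 1 unit = {per_1:.2f}, per 0.1 unit = {per_01:.2f}")
```

The per-0.1 values come out as 1.34 and 1.57, matching the odds ratios quoted in the lecture for the two fits.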
1007 00:50:09,539 --> 00:50:13,910 FOR LI, WE CAN INTERPRET IT 1008 00:50:13,977 --> 00:50:20,650 AS: FOR EACH 0.1-UNIT INCREASE, 1009 00:50:20,717 --> 00:50:23,453 THE ODDS OF REMISSION INCREASE BY 1010 00:50:23,520 --> 00:50:27,324 47%. 1011 00:50:27,391 --> 00:50:29,226 IF WE USE A SIGNIFICANCE LEVEL 1012 00:50:29,292 --> 00:50:33,497 OF 0.15, 1013 00:50:33,563 --> 00:50:35,399 WHICH IS ALSO COMMONLY USED, 1014 00:50:35,465 --> 00:50:39,836 THEN THE MODEL INCLUDES ONLY ONE 1015 00:50:39,903 --> 00:50:40,437 BIOMARKER, LI, 1016 00:50:40,504 --> 00:50:43,907 LIKE THIS, 1017 00:50:43,974 --> 00:50:45,742 AND ITS HL TEST P VALUE 1018 00:50:45,809 --> 00:50:54,918 IS ALSO OKAY. 1019 00:50:54,985 --> 00:50:56,820 THIS SLIDE SHOWS THE DIAGNOSTIC 1020 00:50:56,887 --> 00:51:03,493 PLOTS FOR THE FOUR STATISTICS. 1021 00:51:03,560 --> 00:51:07,297 YOU SEE IN THE PEARSON RESIDUALS 1022 00:51:07,364 --> 00:51:09,232 THERE IS ONE OUTLIER, 1023 00:51:09,299 --> 00:51:13,970 AND THIS ONE ALSO SHOWS UP IN 1024 00:51:14,037 --> 00:51:24,414 THE DEVIANCE RESIDUALS. 1025 00:51:30,287 --> 00:51:31,288 FOR THE LEVERAGE, THERE IS NO 1026 00:51:31,354 --> 00:51:38,028 OUTLIER. 1027 00:51:38,094 --> 00:51:39,262 THE CONFIDENCE 1028 00:51:39,329 --> 00:51:42,699 INTERVAL DISPLACEMENT 1029 00:51:42,766 --> 00:51:48,438 C AND CBAR STATISTICS ALSO SHOW IT. 1030 00:51:48,505 --> 00:51:50,841 THE OUTLIER IN EACH OF THE 1031 00:51:50,907 --> 00:51:54,077 PLOTS IS THE SAME: IT IS CASE 1032 00:51:54,144 --> 00:51:54,911 8. 1033 00:51:54,978 --> 00:52:05,121 CASE 8. 1034 00:52:09,025 --> 00:52:11,561 THIS TABLE SHOWS THE PREDICTED 1035 00:52:11,628 --> 00:52:12,696 PROBABILITIES FOR CASES 1 TO 8 1036 00:52:12,762 --> 00:52:15,298 FROM THIS MODEL. 1037 00:52:15,365 --> 00:52:18,502 THE FIRST COLUMN IS THE CASE 1038 00:52:18,568 --> 00:52:20,770 NUMBER, 1039 00:52:20,837 --> 00:52:23,373 AND THE SECOND COLUMN IS THE 1040 00:52:23,440 --> 00:52:25,942 OBSERVED VALUE FOR REMISSION:
1041 00:52:26,009 --> 00:52:28,278 1 MEANS REMISSION OCCURRED, 1042 00:52:28,345 --> 00:52:29,880 AND 0 MEANS NO REMISSION. 1043 00:52:29,946 --> 00:52:32,516 NEXT IS THE LI VALUE. 1044 00:52:32,582 --> 00:52:36,353 THE FROM COLUMN IS THE SAME AS REMISSION, 1045 00:52:36,419 --> 00:52:40,991 AND THE INTO COLUMN IS THE 1046 00:52:41,057 --> 00:52:44,361 PREDICTED REMISSION STATUS, 1047 00:52:44,427 --> 00:52:47,330 DETERMINED BY PHAT. 1048 00:52:47,397 --> 00:52:49,566 PHAT IS THE PREDICTED PROBABILITY 1049 00:52:49,633 --> 00:52:53,203 THAT REMISSION EQUALS 1. 1050 00:52:53,270 --> 00:52:58,375 SO HERE, PHAT HAS A 1051 00:52:58,441 --> 00:53:01,177 VALUE GREATER THAN 0.5, 1052 00:53:01,244 --> 00:53:07,217 SO THE INTO COLUMN IS 1. 1053 00:53:07,284 --> 00:53:08,852 IF PHAT IS LESS THAN 0.5, IT WOULD BE 0, 1054 00:53:08,919 --> 00:53:11,821 LIKE THIS ONE: 1055 00:53:11,888 --> 00:53:13,823 LESS THAN 0.5 GIVES 0. 1056 00:53:13,890 --> 00:53:15,759 SO THIS IS THE PREDICTED 1057 00:53:15,825 --> 00:53:17,060 REMISSION STATUS BASED ON THE 1058 00:53:17,127 --> 00:53:21,798 PREDICTED PROBABILITY. 1059 00:53:21,865 --> 00:53:24,834 THESE TWO COLUMNS ARE THE 1060 00:53:24,901 --> 00:53:28,805 CONFIDENCE LIMITS FOR 1061 00:53:28,872 --> 00:53:30,473 PHAT. 1062 00:53:30,540 --> 00:53:32,909 XP1 IS THE CROSS-VALIDATED 1063 00:53:32,976 --> 00:53:33,910 PROBABILITY 1064 00:53:33,977 --> 00:53:36,313 THAT REMISSION EQUALS 1, 1065 00:53:36,379 --> 00:53:43,153 WHICH IS CALCULATED USING 1066 00:53:43,219 --> 00:53:44,521 THE LEAVE-ONE-OUT PRINCIPLE, 1067 00:53:44,588 --> 00:53:49,159 AND XP0 IS EQUAL TO 1 MINUS XP1. 1068 00:53:49,225 --> 00:53:52,796 FOR EXAMPLE, FOR CASE #1, 1069 00:53:52,862 --> 00:53:53,396 THE OBSERVED REMISSION STATUS 1070 00:53:53,463 --> 00:53:56,132 IS 1, 1071 00:53:56,199 --> 00:54:01,371 AND THE PREDICTED STATUS IS ALSO 1. 1072 00:54:01,438 --> 00:54:01,972 THE PREDICTED PROBABILITY IS 1073 00:54:02,038 --> 00:54:04,608 0.849.
1074 00:54:04,674 --> 00:54:06,142 AND THE CONFIDENCE LIMITS ARE 1075 00:54:06,209 --> 00:54:09,112 THESE. 1076 00:54:09,179 --> 00:54:14,551 IF WE EXCLUDE THE CASE #1 1077 00:54:14,618 --> 00:54:17,787 DATA, THEN THE PREDICTED 1078 00:54:17,854 --> 00:54:20,890 PROBABILITY FOR CASE #1 IS 0.8, 1079 00:54:20,957 --> 00:54:24,561 SO IT'S VERY CLOSE, 1080 00:54:24,628 --> 00:54:28,632 WHICH MEANS THE CASE #1 DATA IS 1081 00:54:28,698 --> 00:54:33,136 NOT AN INFLUENTIAL OBSERVATION. 1082 00:54:33,203 --> 00:54:36,406 FOR CASE #8, THE 1083 00:54:36,473 --> 00:54:39,109 OBSERVED REMISSION IS 0, 1084 00:54:39,175 --> 00:54:44,748 SO THERE IS NO REMISSION, 1085 00:54:44,814 --> 00:54:46,182 BUT THE PREDICTED REMISSION 1086 00:54:46,249 --> 00:54:49,486 STATUS IS 1. 1087 00:54:49,552 --> 00:54:52,722 BECAUSE CASES #8 AND #1 HAVE 1088 00:54:52,789 --> 00:54:56,793 THE SAME LI VALUE, THE PHAT, 1089 00:54:56,860 --> 00:54:57,394 THE PREDICTED PROBABILITY, IS THE 1090 00:54:57,460 --> 00:54:59,629 SAME. 1091 00:54:59,696 --> 00:55:01,431 BUT THIS ONE IS DIFFERENT: 1092 00:55:01,498 --> 00:55:05,168 WHEN CASE #8 IS 1093 00:55:05,235 --> 00:55:07,404 EXCLUDED, THE 1094 00:55:07,470 --> 00:55:11,441 PREDICTED PROBABILITY IS 1095 00:55:11,508 --> 00:55:11,608 0.9, 1096 00:55:11,675 --> 00:55:12,442 SO THERE IS A BIG DIFFERENCE 1097 00:55:12,509 --> 00:55:16,713 HERE. 1098 00:55:16,780 --> 00:55:19,582 THIS ALSO SHOWS UP IN THE 1099 00:55:19,649 --> 00:55:21,951 PEARSON AND DEVIANCE RESIDUALS, 1100 00:55:22,018 --> 00:55:23,920 BECAUSE THE PREDICTED 1101 00:55:23,987 --> 00:55:24,654 PROBABILITY AND THE OBSERVED 1102 00:55:24,721 --> 00:55:33,697 OUTCOME ARE VERY DIFFERENT. 1103 00:55:33,763 --> 00:55:35,732 SO BASED ON THE PREDICTION 1104 00:55:35,799 --> 00:55:41,171 MODEL, WE CAN PREDICT THE 1105 00:55:41,237 --> 00:55:42,038 PROBABILITY OF REMISSION = 1 1106 00:55:42,105 --> 00:55:44,107 FOR A GIVEN LI VALUE.
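The prediction step is the inverse logit, p = 1/(1 + exp(-(b0 + b1·LI))), followed by the 0.5 cutoff that produces the INTO column. The fitted intercept appears on the slide but not in the transcript. The Python sketch therefore backs out an approximate intercept from the LI slope reported in this example (2.89) and the reported prediction of 0.635 at LI = 1.5. Treat it as an illustration of the mechanics, not as the fitted model. How a probability of exactly 0.5 is classified is also an assumption here.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def predicted_probability(b0, b1, x):
    """Inverse logit: P(remission = 1 | LI = x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    z = b0 + b1 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(p_hat, cutoff=0.5):
    """Predicted status (the INTO column): 1 if p_hat exceeds the cutoff, else 0."""
    return 1 if p_hat > cutoff else 0


b1 = 2.89                       # LI slope reported in the lecture
b0 = logit(0.635) - b1 * 1.5    # approximate intercept implied by p = 0.635 at LI = 1.5
for li in (0.5, 1.0, 1.5, 2.0):
    p = predicted_probability(b0, b1, li)
    print(f"LI = {li}: p_hat = {p:.3f}, predicted status = {classify(p)}")
```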
1107 00:55:44,174 --> 00:55:48,645 FOR EXAMPLE, LI EQUAL TO 1.5. 1108 00:55:48,712 --> 00:55:49,713 YOU SUBSTITUTE 1.5 INTO 1109 00:55:49,779 --> 00:55:53,083 THIS MODEL, 1110 00:55:53,149 --> 00:55:55,085 YOU COMPUTE THE EXPONENTIAL, YOU 1111 00:55:55,151 --> 00:55:59,255 GET THIS, AND THEN YOU GET 1112 00:55:59,322 --> 00:56:04,494 THE PREDICTED PROBABILITY AS 0.635. 1113 00:56:04,561 --> 00:56:12,569 THIS MEANS FOR LI EQUAL TO 1.5, 1114 00:56:12,635 --> 00:56:14,704 THE PREDICTED PROBABILITY OF 1115 00:56:14,771 --> 00:56:24,013 REMISSION EQUAL TO 1 IS 0.635. 1116 00:56:24,080 --> 00:56:29,352 I ALSO 1117 00:56:29,419 --> 00:56:31,855 CALCULATED THE PREDICTION MODEL 1118 00:56:31,921 --> 00:56:33,790 AFTER EXCLUDING CASE #8. 1119 00:56:33,857 --> 00:56:36,593 YOU CAN SEE THE DIFFERENCE. 1120 00:56:36,659 --> 00:56:39,429 THE SLOPE HERE IS 2.89, 1121 00:56:39,496 --> 00:56:43,066 AND HERE IT IS 4.5. 1122 00:56:43,133 --> 00:56:45,034 THE ODDS RATIO, BASED ON A 1123 00:56:45,101 --> 00:56:47,003 UNIT OF 0.1, 1124 00:56:47,070 --> 00:56:49,839 IS 1.34 HERE 1125 00:56:49,906 --> 00:56:54,677 AND 1.57 HERE. 1126 00:56:54,744 --> 00:56:56,846 AND THE HL TEST RESULTS ARE HERE; BOTH 1127 00:56:56,913 --> 00:56:59,215 SHOW NO EVIDENCE OF LACK OF FIT. 1128 00:56:59,282 --> 00:57:01,651 WHAT WE DID IN THIS CASE IS WHAT WE 1129 00:57:01,718 --> 00:57:05,088 CALL A SENSITIVITY ANALYSIS. 1130 00:57:05,155 --> 00:57:10,160 WE SHOULD REPORT BOTH RESULTS. 1131 00:57:10,226 --> 00:57:20,236 NOW, THE SUMMARY. 1132 00:57:20,303 --> 00:57:21,571 LOGISTIC REGRESSION 1133 00:57:21,638 --> 00:57:23,106 ANALYSIS IS USED TO EVALUATE 1134 00:57:23,173 --> 00:57:25,909 THE ASSOCIATION OF ONE OR MORE 1135 00:57:25,975 --> 00:57:27,877 INDEPENDENT VARIABLES WITH A 1136 00:57:27,944 --> 00:57:31,047 BINARY DEPENDENT VARIABLE.
1137 00:57:31,114 --> 00:57:33,383 IT IS ALSO USED TO ESTIMATE THE PROBABILITY 1138 00:57:33,449 --> 00:57:35,718 OF A PARTICULAR OUTCOME GIVEN THE 1139 00:57:35,785 --> 00:57:39,155 VALUES OF THE INDEPENDENT VARIABLES. 1140 00:57:39,222 --> 00:57:40,290 FOR LOGISTIC REGRESSION, WE 1141 00:57:40,356 --> 00:57:42,959 REPORT THE ODDS RATIO 1142 00:57:43,026 --> 00:57:45,829 AND ITS 95% CONFIDENCE INTERVAL. 1143 00:57:45,895 --> 00:57:48,598 WE CAN ALSO REPORT THE 1144 00:57:48,665 --> 00:57:53,169 PREDICTION MODEL. 1145 00:57:53,236 --> 00:57:54,504 COMPARED WITH LINEAR 1146 00:57:54,571 --> 00:57:55,605 REGRESSION, THE MAIN 1147 00:57:55,672 --> 00:57:56,706 DIFFERENCE IN LOGISTIC 1148 00:57:56,773 --> 00:57:57,740 REGRESSION 1149 00:57:57,807 --> 00:58:04,080 IS THAT THE 1150 00:58:04,147 --> 00:58:04,681 DEPENDENT VARIABLE IS A BINARY 1151 00:58:04,747 --> 00:58:05,748 VARIABLE. 1152 00:58:05,815 --> 00:58:08,084 THE MODEL ALSO NEEDS TO MEET FOUR 1153 00:58:08,151 --> 00:58:13,289 BASIC ASSUMPTIONS: 1154 00:58:13,356 --> 00:58:14,691 INDEPENDENT OBSERVATIONS, NO 1155 00:58:14,757 --> 00:58:21,331 MULTICOLLINEARITY, LINEARITY 1156 00:58:21,397 --> 00:58:24,133 BETWEEN EACH CONTINUOUS VARIABLE AND THE LOGIT, 1157 00:58:24,200 --> 00:58:27,337 AND NO INFLUENTIAL OBSERVATIONS. 1158 00:58:27,403 --> 00:58:28,571 WE SHOULD ALSO CHECK THE HL 1159 00:58:28,638 --> 00:58:30,773 GOODNESS-OF-FIT TEST. 1160 00:58:30,840 --> 00:58:31,674 ANY QUESTIONS? 1161 00:58:31,741 THANK YOU VERY MUCH.
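The odds ratios quoted in the sensitivity analysis (1.34 with slope 2.89, and 1.57 with slope 4.5 after excluding case #8, both per 0.1 unit of LI) come from exponentiating the slope multiplied by the unit. A minimal sketch, assuming a Wald-type 95% interval; the standard error is an input you would take from the fitted model (none is given in this passage), and SAS can also report profile-likelihood intervals, so this is one option rather than the lecture's exact method. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio(b1, unit=1.0):
    """Odds ratio for an increase of `unit` in the predictor: exp(unit * b1)."""
    return math.exp(unit * b1)


def wald_ci(b1, se, unit=1.0, level=0.95):
    """Wald confidence interval for the odds ratio per `unit` change."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    return (math.exp(unit * (b1 - z * se)), math.exp(unit * (b1 + z * se)))


# Slopes quoted in the sensitivity analysis, per 0.1 unit of LI.
print(round(odds_ratio(2.89, 0.1), 2))  # all cases: 1.34
print(round(odds_ratio(4.5, 0.1), 2))   # case #8 excluded: 1.57
```

Scaling by the unit before exponentiating is why the reported odds ratios are modest even though the slopes (per 1 unit of LI) are large.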