1 00:00:04,471 --> 00:00:07,107 THANK YOU ALL FOR JOINING TODAY. 2 00:00:07,173 --> 00:00:08,108 FOR THOSE WHO DON'T KNOW, I'M 3 00:00:08,174 --> 00:00:09,242 GINA NORATO. 4 00:00:09,309 --> 00:00:11,878 I'M THE LEAD OF THE OFFICE OF 5 00:00:11,945 --> 00:00:15,048 BIOSTATISTICS FOR THE CLINICAL 6 00:00:15,115 --> 00:00:17,183 TRIALS UNIT IN NINDS. 7 00:00:17,250 --> 00:00:18,118 TODAY WE'RE GOING TO BE TALKING 8 00:00:18,184 --> 00:00:20,887 ABOUT LINEAR REGRESSION AS PART 9 00:00:20,954 --> 00:00:25,525 OF OUR SIX-PART STATISTICAL 10 00:00:25,592 --> 00:00:26,593 LECTURE SERIES THIS SPRING. 11 00:00:26,659 --> 00:00:27,894 PLEASE FEEL FREE, I'M OKAY IF 12 00:00:27,961 --> 00:00:30,864 YOU WOULD LIKE TO SUBMIT 13 00:00:30,930 --> 00:00:32,799 QUESTIONS WHILE I AM PRESENTING 14 00:00:32,866 --> 00:00:36,169 VIA CHAT OR THE Q & A OPTION. 15 00:00:36,236 --> 00:00:38,004 JUST KNOW THAT IT IS 16 00:00:38,071 --> 00:00:38,471 LIKELY BEING RECORDED. 17 00:00:38,538 --> 00:00:43,042 SO IF YOU WOULD RATHER NOT BE ON 18 00:00:43,109 --> 00:00:44,144 THE RECORDING FOR THE FUTURE, JUST 19 00:00:44,210 --> 00:00:46,479 KEEP THAT IN MIND, BUT I'M HAPPY 20 00:00:46,546 --> 00:00:49,749 TO READ YOUR QUESTIONS AND ANSWER 21 00:00:49,816 --> 00:00:51,551 AS WE GO. 22 00:00:51,618 --> 00:00:52,118 OKAY. 23 00:00:52,185 --> 00:00:54,721 SO, JUST TO GIVE A LITTLE 24 00:00:54,788 --> 00:00:55,355 OUTLINE OF WHAT WE'RE GOING TO 25 00:00:55,422 --> 00:00:57,791 TRY TO GET THROUGH TODAY: SO 26 00:00:57,857 --> 00:01:00,226 FIRST I WANT TO GIVE AN OVERVIEW 27 00:01:00,293 --> 00:01:01,494 OF WHAT REGRESSION IS AND WHY WE 28 00:01:01,561 --> 00:01:01,928 USE IT.
29 00:01:01,995 --> 00:01:04,731 AND THEN, I'M JUST GOING TO GET 30 00:01:04,798 --> 00:01:06,966 INTO THE MEAT OF HOW WE RUN 31 00:01:07,033 --> 00:01:08,601 THROUGH MODELING, THE DIFFERENT 32 00:01:08,668 --> 00:01:10,170 ASSUMPTIONS THAT WE NEED TO 33 00:01:10,236 --> 00:01:12,472 CONSIDER WHEN WE DEAL WITH 34 00:01:12,539 --> 00:01:13,039 REGRESSION. 35 00:01:13,106 --> 00:01:14,974 WHAT THE MODEL STRUCTURE LOOKS 36 00:01:15,041 --> 00:01:15,742 LIKE. 37 00:01:15,809 --> 00:01:20,113 HOW TO DEAL WITH THINGS LIKE 38 00:01:20,180 --> 00:01:21,514 COVARIATES AND CONFOUNDING IN 39 00:01:21,581 --> 00:01:23,583 THE MODEL, AND ADD INTERACTION 40 00:01:23,650 --> 00:01:25,251 TERMS, AND THROUGHOUT WE WILL BE 41 00:01:25,318 --> 00:01:27,954 TALKING ABOUT CHECKING MODEL 42 00:01:28,021 --> 00:01:28,254 ASSUMPTIONS. 43 00:01:28,321 --> 00:01:30,523 THEN, I WILL TALK ABOUT HOW TO 44 00:01:30,590 --> 00:01:32,125 REPORT THE RESULTS OF THESE 45 00:01:32,192 --> 00:01:35,295 REGRESSIONS, GIVING SOME SAMPLE 46 00:01:35,361 --> 00:01:35,562 LANGUAGE. 47 00:01:35,628 --> 00:01:37,096 AND ALSO TALK ABOUT SOME OTHER 48 00:01:37,163 --> 00:01:40,800 SIMILAR TESTS SUCH AS T-TESTS, 49 00:01:40,867 --> 00:01:41,267 CORRELATIONS AND ANOVA. 50 00:01:41,334 --> 00:01:46,206 THINGS THAT ARE SORT OF IN LINE 51 00:01:46,272 --> 00:01:47,740 WITH REGRESSION, AND TALK ABOUT 52 00:01:47,807 --> 00:01:49,342 WHEN WE WOULD USE ONE VERSUS THE 53 00:01:49,409 --> 00:01:52,111 OTHER. 54 00:01:52,178 --> 00:01:52,712 AND CONCLUDE WITH SUMMARY AND 55 00:01:52,779 --> 00:01:53,379 KEY POINTS. 56 00:01:53,446 --> 00:01:55,715 SO GETTING INTO THE OVERVIEW, SO 57 00:01:55,782 --> 00:01:58,017 WHAT IS LINEAR REGRESSION?
58 00:01:58,084 --> 00:01:59,719 SO REGRESSION IS BASICALLY JUST 59 00:01:59,786 --> 00:02:01,254 A STATISTICAL MODEL THAT 60 00:02:01,321 --> 00:02:03,756 ESTIMATES THE LINEAR 61 00:02:03,823 --> 00:02:04,357 RELATIONSHIP BETWEEN ONE 62 00:02:04,424 --> 00:02:06,059 CONTINUOUS VARIABLE AND ONE OR 63 00:02:06,125 --> 00:02:07,994 MORE OTHER VARIABLES OF 64 00:02:08,061 --> 00:02:08,261 INTEREST. 65 00:02:08,328 --> 00:02:12,131 IT MUST BE USED WITH 66 00:02:12,198 --> 00:02:14,067 CONTINUOUSLY MEASURED OUTCOME 67 00:02:14,133 --> 00:02:14,734 METRICS. 68 00:02:14,801 --> 00:02:16,503 SO, SOMETHING WITH A SMOOTH 69 00:02:16,569 --> 00:02:16,970 CONTINUOUS SCALE. 70 00:02:17,036 --> 00:02:20,106 SOMETHING LIKE A MEASURE OF 71 00:02:20,173 --> 00:02:24,544 WEIGHT OR TIME, VOLUME, BLOOD 72 00:02:24,611 --> 00:02:26,913 PRESSURE, AND SOME LARGE-RANGE 73 00:02:26,980 --> 00:02:28,081 FUNCTIONAL SCALES. 74 00:02:28,147 --> 00:02:30,683 IMPORTANTLY, THEY SHOULD SORT OF 75 00:02:30,750 --> 00:02:33,586 BEHAVE UNBOUNDED. EVEN IF THERE 76 00:02:33,653 --> 00:02:35,688 IS A LOGICAL BOUND TO YOUR 77 00:02:35,755 --> 00:02:37,290 MEASUREMENT, LIKE WE CAN'T HAVE 78 00:02:37,357 --> 00:02:40,093 A NEGATIVE TIME, WE WANT TO MAKE 79 00:02:40,159 --> 00:02:42,061 SURE THAT THE DISTRIBUTION OF 80 00:02:42,128 --> 00:02:43,029 THE VALUES IS SUCH THAT WE'RE 81 00:02:43,096 --> 00:02:45,498 NOT REALLY UP AGAINST THE 82 00:02:45,565 --> 00:02:51,804 BOUNDARIES, SO WE HAVE THE NICE 83 00:02:51,871 --> 00:02:54,140 SMOOTH CONTINUOUS BEHAVIOR AND WE 84 00:02:54,207 --> 00:02:56,009 HAVE NICE GRANULARITY TO THE 85 00:02:56,075 --> 00:02:57,410 MEASUREMENT AS WELL. IF WE'RE 86 00:02:57,477 --> 00:02:58,378 DEALING WITH WHOLE NUMBERS OR 87 00:02:58,444 --> 00:03:00,346 COUNT VALUES, THAT WILL FALL 88 00:03:00,413 --> 00:03:01,714 UNDER A DIFFERENT TYPE OF 89 00:03:01,781 --> 00:03:02,015 REGRESSION. 90 00:03:02,081 --> 00:03:03,683 SO JUST CONSIDER THAT AS WELL.
91 00:03:03,750 --> 00:03:08,054 AND FOR REGRESSION WE ARE 92 00:03:08,121 --> 00:03:08,588 PRIMARILY CONCERNED ABOUT 93 00:03:08,655 --> 00:03:11,758 PREDICTING Y FROM X OR MULTIPLE 94 00:03:11,824 --> 00:03:12,325 Xs AND MULTIPLE COVARIATES. 95 00:03:12,392 --> 00:03:16,129 I WANT TO BE CLEAR, WHEN WE SAY 96 00:03:16,195 --> 00:03:16,796 PREDICTION HERE, WE'RE TALKING 97 00:03:16,863 --> 00:03:18,865 ABOUT PURELY FROM A CALCULATING 98 00:03:18,932 --> 00:03:20,333 OR STATISTICAL POINT OF VIEW. 99 00:03:20,400 --> 00:03:23,236 WE DON'T REALLY MEAN TO HAVE ANY 100 00:03:23,303 --> 00:03:27,307 CAUSAL PREDICTION OR CAUSAL 101 00:03:27,373 --> 00:03:27,707 RELATIONSHIP HERE. 102 00:03:27,774 --> 00:03:29,442 THAT SORT OF INVESTIGATION WOULD 103 00:03:29,509 --> 00:03:32,111 BE UNDER A TOTALLY DIFFERENT 104 00:03:32,178 --> 00:03:33,313 BRANCH OF STATISTICS. 105 00:03:33,379 --> 00:03:35,348 SO WE'RE JUST TALKING ABOUT IF 106 00:03:35,415 --> 00:03:37,317 WE KNOW SOME CHARACTERISTICS OF 107 00:03:37,383 --> 00:03:38,685 A PERSON, HOW WELL CAN WE 108 00:03:38,751 --> 00:03:41,054 PREDICT SOME OTHER VARIABLE THAT 109 00:03:41,120 --> 00:03:43,323 WE HAVE OBSERVED ALSO. 110 00:03:43,389 --> 00:03:48,094 SO THIS IS STILL PURELY FROM 111 00:03:48,161 --> 00:03:54,968 KIND OF A CORRELATIONAL TYPE OF 112 00:03:55,034 --> 00:03:57,670 INTERPRETATION, NOT CAUSATION. 113 00:03:57,737 --> 00:03:59,339 AND SO, JUST TO GO THROUGH SOME 114 00:03:59,405 --> 00:04:02,942 TERMINOLOGY AND JUST TO HAVE A 115 00:04:03,009 --> 00:04:04,143 BASIS OR A FRAMEWORK FROM WHERE 116 00:04:04,210 --> 00:04:04,877 TO GO FROM HERE. 117 00:04:04,944 --> 00:04:08,114 SO WE'RE GOING TO BE TALKING A 118 00:04:08,181 --> 00:04:09,315 LOT ABOUT Xs AND Ys. 119 00:04:09,382 --> 00:04:12,251 SO OUR Y IS GOING TO BE OUR 120 00:04:12,318 --> 00:04:12,685 DEPENDENT VARIABLE.
121 00:04:12,752 --> 00:04:15,054 WE ALSO CALL THIS OUR 122 00:04:15,121 --> 00:04:15,788 RESPONSE VARIABLE OR OUTCOME 123 00:04:15,855 --> 00:04:16,623 MEASURE OR VARIABLE. 124 00:04:16,689 --> 00:04:18,358 AND OUR Xs ARE GOING TO BE OUR 125 00:04:18,424 --> 00:04:20,994 INDEPENDENT VARIABLES. 126 00:04:21,060 --> 00:04:23,796 THEY ARE ALSO CALLED EXPLANATORY 127 00:04:23,863 --> 00:04:24,931 VARIABLES OR PREDICTOR 128 00:04:24,998 --> 00:04:25,365 VARIABLES. 129 00:04:25,431 --> 00:04:26,366 AND DEPENDING ON OUR SCIENTIFIC 130 00:04:26,432 --> 00:04:29,636 VIEW OF THESE THINGS WE CAN ALSO 131 00:04:29,702 --> 00:04:32,105 CALL THEM COVARIATES OR 132 00:04:32,171 --> 00:04:32,405 CONFOUNDERS. 133 00:04:32,472 --> 00:04:33,773 THESE ARE MORE LIKE A 134 00:04:33,840 --> 00:04:34,240 QUALITATIVE LABEL. 135 00:04:34,307 --> 00:04:36,109 BUT THEY ARE ALL BASICALLY 136 00:04:36,175 --> 00:04:38,444 TREATED THE SAME IN THE MODEL. 137 00:04:38,511 --> 00:04:40,813 THE MODEL DOESN'T VIEW THEM ANY 138 00:04:40,880 --> 00:04:41,514 DIFFERENTLY. 139 00:04:41,581 --> 00:04:45,952 AND THEN, YOU'LL WANT TO PUT 140 00:04:46,019 --> 00:04:47,453 YOUR PREALGEBRA OR ALGEBRA HATS 141 00:04:47,520 --> 00:04:50,590 ON, BECAUSE WE ARE GOING TO BE 142 00:04:50,657 --> 00:04:51,090 TALKING ABOUT LINES AND 143 00:04:51,157 --> 00:04:51,457 GRAPHING. 144 00:04:51,524 --> 00:04:53,126 SO WE'RE GOING TO BE TALKING 145 00:04:53,192 --> 00:04:55,628 ABOUT INTERCEPTS, WHERE THIS IS 146 00:04:55,695 --> 00:04:58,197 REALLY A PREDICTION OF Y WHERE 147 00:04:58,264 --> 00:05:00,833 ALL OTHER VARIABLES OF INTEREST ARE 148 00:05:00,900 --> 00:05:01,868 ZERO. 149 00:05:01,934 --> 00:05:04,203 THAT KIND OF ALGEBRAIC 150 00:05:04,270 --> 00:05:05,238 INTERPRETATION OF INTERCEPT AND 151 00:05:05,304 --> 00:05:05,438 SLOPE. 152 00:05:05,505 --> 00:05:06,973 AND ALSO WE WILL BE TALKING 153 00:05:07,040 --> 00:05:08,007 ABOUT COEFFICIENTS.
154 00:05:08,074 --> 00:05:10,576 WE CAN THINK BROADLY OF THESE AS 155 00:05:10,643 --> 00:05:12,845 BEING SLOPES, BUT I WANTED TO 156 00:05:12,912 --> 00:05:13,579 CALL THIS A COEFFICIENT BECAUSE 157 00:05:13,646 --> 00:05:15,448 IT IS KIND OF HOW THEY ARE 158 00:05:15,515 --> 00:05:17,884 WRITTEN ABOUT IN THE OUTPUTS OF 159 00:05:17,950 --> 00:05:20,086 REGRESSIONS AND THINGS LIKE 160 00:05:20,153 --> 00:05:23,056 THAT, THEY ARE COEFFICIENTS. 161 00:05:23,122 --> 00:05:23,623 SO, FOR CONTINUOUSLY MEASURED 162 00:05:23,690 --> 00:05:25,525 VARIABLES OF X WE CAN THINK OF 163 00:05:25,591 --> 00:05:27,393 THESE AS A SLOPE, WHERE THE 164 00:05:27,460 --> 00:05:28,761 CHANGE IN THE PREDICTION OF Y 165 00:05:28,828 --> 00:05:33,833 FOR ONE UNIT CHANGE IN X IS SORT 166 00:05:33,900 --> 00:05:35,635 OF THE INTERPRETATION THERE. 167 00:05:35,702 --> 00:05:37,804 BUT, YOU KNOW, FOR A BINARY OR 168 00:05:37,870 --> 00:05:38,671 CATEGORICAL VARIABLE THIS ISN'T 169 00:05:38,738 --> 00:05:40,139 REALLY A SLOPE, IT IS JUST A 170 00:05:40,206 --> 00:05:42,575 SIMPLE CHANGE, BUT THE SAME SORT 171 00:05:42,642 --> 00:05:43,710 OF LOGIC APPLIES. 172 00:05:43,776 --> 00:05:46,245 THIS IS A CHANGE IN THE 173 00:05:46,312 --> 00:05:46,813 PREDICTION OF Y BETWEEN TWO 174 00:05:46,879 --> 00:05:48,581 DIFFERENT STEP LEVELS OF X. 175 00:05:48,648 --> 00:05:51,117 AND WE CALL THOSE BROADLY JUST 176 00:05:51,184 --> 00:05:52,585 COEFFICIENTS OR ESTIMATES EVEN 177 00:05:52,652 --> 00:05:53,352 MORE BROADLY. 178 00:05:53,419 --> 00:05:55,688 BUT IN TERMS OF WRITING A 179 00:05:55,755 --> 00:06:01,027 REGRESSION EQUATION, WE CALL 180 00:06:01,094 --> 00:06:03,796 THEM COEFFICIENTS. 181 00:06:03,863 --> 00:06:05,865 SO JUST TO GIVE -- JUST TO KIND 182 00:06:05,932 --> 00:06:07,266 OF GIVE OUR LITTLE OVERVIEW 183 00:06:07,333 --> 00:06:07,834 HERE. 184 00:06:07,900 --> 00:06:09,402 SO JUMPING RIGHT INTO SOME 185 00:06:09,469 --> 00:06:10,136 ASSUMPTIONS FOR REGRESSION.
186 00:06:10,203 --> 00:06:13,573 SO THERE ARE FOUR ASSUMPTIONS 187 00:06:13,639 --> 00:06:16,109 THAT WE NEED TO BE CONSIDERING 188 00:06:16,175 --> 00:06:20,113 WHEN WE DEAL WITH REGRESSION. 189 00:06:20,179 --> 00:06:20,780 LINEARITY, INDEPENDENCE, NORMAL 190 00:06:20,847 --> 00:06:23,549 DISTRIBUTION OF ERRORS AND EQUAL 191 00:06:23,616 --> 00:06:24,684 VARIANCE OF ERRORS. AND QUITE 192 00:06:24,751 --> 00:06:25,351 NICELY, IF YOU WRITE THEM IN 193 00:06:25,418 --> 00:06:28,287 THIS WAY, THEY HAVE THE LINE 194 00:06:28,354 --> 00:06:29,255 ACRONYM TO MAKE THIS A LITTLE 195 00:06:29,322 --> 00:06:30,556 BIT EASIER TO REMEMBER. 196 00:06:30,623 --> 00:06:33,192 SO LET'S GO THROUGH EACH OF 197 00:06:33,259 --> 00:06:34,527 THEM. 198 00:06:34,594 --> 00:06:36,129 SO FOR LINEARITY, WE WANT TO 199 00:06:36,195 --> 00:06:39,766 MAKE SURE THAT A LINEAR MODEL TO 200 00:06:39,832 --> 00:06:40,733 DESCRIBE THE RELATIONSHIP IS 201 00:06:40,800 --> 00:06:42,034 EVEN APPROPRIATE AT ALL. 202 00:06:42,101 --> 00:06:44,270 WE CAN DO THIS VERY SIMPLY IN 203 00:06:44,337 --> 00:06:46,572 MOST CASES BY INVESTIGATING THE 204 00:06:46,639 --> 00:06:50,076 BIVARIATE RELATIONSHIP OR THE 205 00:06:50,143 --> 00:06:50,977 SET OF BIVARIATE RELATIONSHIPS 206 00:06:51,043 --> 00:06:52,378 THAT WE'RE INTERESTED IN. 207 00:06:52,445 --> 00:06:54,847 PERHAPS ANOTHER SORT OF MODEL IS 208 00:06:54,914 --> 00:06:57,183 BEST FOR DESCRIBING THE 209 00:06:57,250 --> 00:06:57,517 RELATIONSHIP. 210 00:06:57,583 --> 00:07:00,520 A NONLINEAR, PERHAPS, OR SOME 211 00:07:00,586 --> 00:07:02,789 OTHER SPLINE MODEL OR 212 00:07:02,855 --> 00:07:04,724 POLYNOMIAL SORT OF REGRESSION 213 00:07:04,791 --> 00:07:07,293 HERE WITH OTHER SORTS OF COMPLEX 214 00:07:07,360 --> 00:07:08,194 TERMS. 215 00:07:08,261 --> 00:07:08,628 NOT PURELY A LINEAR 216 00:07:08,694 --> 00:07:09,762 RELATIONSHIP.
217 00:07:09,829 --> 00:07:13,366 SO IT CAN BE REALLY IMPORTANT TO 218 00:07:13,432 --> 00:07:14,000 DO AN INITIAL INVESTIGATION OF 219 00:07:14,066 --> 00:07:15,468 WHAT YOU'RE DEALING WITH TO MAKE SURE 220 00:07:15,535 --> 00:07:18,171 THIS IS EVEN AN APPROPRIATE 221 00:07:18,237 --> 00:07:23,109 DESCRIPTION OF YOUR DATA AT ALL. 222 00:07:23,176 --> 00:07:24,811 NEXT WE ASSUME, WE NEED TO MAKE 223 00:07:24,877 --> 00:07:27,680 SURE THAT OUR OBSERVATIONS FOR 224 00:07:27,747 --> 00:07:28,514 OUR DATA ARE INDEPENDENT. 225 00:07:28,581 --> 00:07:30,249 SO WE HAVE THIS INDEPENDENCE 226 00:07:30,316 --> 00:07:31,083 ASSUMPTION. 227 00:07:31,150 --> 00:07:34,120 AND THIS CAN BE DETERMINED, 228 00:07:34,187 --> 00:07:37,456 WHETHER WE ARE ADHERING TO THIS 229 00:07:37,523 --> 00:07:39,025 ASSUMPTION, BY DESIGN OR 230 00:07:39,091 --> 00:07:39,592 COLLECTION ALONE. 231 00:07:39,659 --> 00:07:41,494 SO IS THERE CLUSTERING OF DATA 232 00:07:41,561 --> 00:07:43,629 FROM PEOPLE COMING WITHIN FAMILIES, 233 00:07:43,696 --> 00:07:45,865 OR MAYBE WE HAVE THESE 234 00:07:45,932 --> 00:07:46,465 INTERVENTIONAL UNITS, SCHOOLS 235 00:07:46,532 --> 00:07:49,769 AND CLASSES IN SOME SETTINGS, OR 236 00:07:49,836 --> 00:07:53,005 WE HAVE PAIRED DATA OR REPEATED 237 00:07:53,072 --> 00:07:54,640 MEASUREMENTS, CLUSTERING WITHIN 238 00:07:54,707 --> 00:07:57,777 A PERSON OF REPEATED 239 00:07:57,844 --> 00:07:58,911 MEASUREMENTS OVER TIME. 240 00:07:58,978 --> 00:07:59,312 LONGITUDINAL DATA. 241 00:07:59,378 --> 00:08:01,914 THIS WOULD ALL MEAN WE'RE NOT IN 242 00:08:01,981 --> 00:08:03,416 A BASIC LINEAR REGRESSION 243 00:08:03,482 --> 00:08:03,683 FRAMEWORK. 244 00:08:03,749 --> 00:08:05,151 MAYBE WE NEED TO RUN A DIFFERENT 245 00:08:05,218 --> 00:08:07,153 SORT OF MODEL TO ACCOUNT FOR 246 00:08:07,220 --> 00:08:12,058 THIS CORRELATED BEHAVIOR 247 00:08:12,124 --> 00:08:13,326 ACROSS OBSERVATIONS WITHIN 248 00:08:13,392 --> 00:08:13,559 GROUPS.
249 00:08:13,626 --> 00:08:16,596 WE ALSO NEED TO MAKE SURE THAT 250 00:08:16,662 --> 00:08:17,697 WE'RE MEETING THE ASSUMPTION OF 251 00:08:17,763 --> 00:08:19,932 NORMAL DISTRIBUTION OF ERRORS. 252 00:08:19,999 --> 00:08:21,534 SO, IMPORTANTLY, THIS IS THE 253 00:08:21,601 --> 00:08:23,903 NORMAL DISTRIBUTION OF ERRORS. 254 00:08:23,970 --> 00:08:25,504 AND NOT NECESSARILY THAT OUR 255 00:08:25,571 --> 00:08:27,840 OUTCOME MEASURE IS NORMALLY 256 00:08:27,907 --> 00:08:28,274 DISTRIBUTED. 257 00:08:28,341 --> 00:08:30,042 SO THIS IS SPECIFICALLY ABOUT 258 00:08:30,109 --> 00:08:30,610 WHEN YOU RUN YOUR MODEL, AND 259 00:08:30,676 --> 00:08:32,778 WE'RE GOING TO TALK ABOUT WHAT 260 00:08:32,845 --> 00:08:33,913 ERRORS ARE IN A LITTLE BIT. 261 00:08:33,980 --> 00:08:35,982 BUT WHEN YOU RUN YOUR MODEL AND 262 00:08:36,048 --> 00:08:38,184 OBTAIN THE ERRORS FROM THE 263 00:08:38,251 --> 00:08:40,186 MODEL, WHAT IS THE BEHAVIOR OF 264 00:08:40,253 --> 00:08:42,054 THOSE? THE NEXT TWO ASSUMPTIONS ARE 265 00:08:42,121 --> 00:08:44,056 RELATED TO SPECIFICALLY ERRORS 266 00:08:44,123 --> 00:08:45,258 AND NOT ACTUALLY ANY OF THE 267 00:08:45,324 --> 00:08:47,627 VARIABLES YOU'RE USING IN YOUR 268 00:08:47,693 --> 00:08:48,060 REGRESSION. 269 00:08:48,127 --> 00:08:50,496 WE CAN INVESTIGATE THE BEHAVIOR 270 00:08:50,563 --> 00:08:54,000 OF THESE ERRORS EITHER USING 271 00:08:54,066 --> 00:08:58,804 SOMETHING LIKE A HISTOGRAM OR BY 272 00:08:58,871 --> 00:09:00,873 DOING QQ PLOTS, WHICH IS 273 00:09:00,940 --> 00:09:04,977 OBSERVING THE QUANTILES OF YOUR 274 00:09:05,044 --> 00:09:07,813 ERRORS VERSUS THE THEORETICAL 275 00:09:07,880 --> 00:09:08,648 NORMAL DISTRIBUTION QUANTILES. 276 00:09:08,714 --> 00:09:12,084 WE SHOULD SEE NICE HISTOGRAM 277 00:09:12,151 --> 00:09:13,219 BEHAVIOR AND/OR SIMILARLY, WE 278 00:09:13,286 --> 00:09:14,887 SHOULD SEE SOMETHING ON THIS X 279 00:09:14,954 --> 00:09:17,757 EQUALS Y LINE FOR A QQ PLOT.
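The quantile comparison behind a QQ plot can be sketched in a few lines. This is an illustrative Python sketch, not the R output shown in the talk; the function name `qq_points`, the plotting positions `(i - 0.5) / n`, and the residual values are all assumptions invented for the example.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pair each sorted residual with the normal quantile it would match
    if the residuals were normally distributed."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Reference normal with the residuals' own mean and standard deviation,
    # so well-behaved points fall near the x = y line.
    reference = NormalDist(mean(residuals), stdev(residuals))
    # (i - 0.5) / n is one common choice of plotting position (an assumption).
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals, invented for illustration only.
example_residuals = [-2.1, -1.3, -0.8, -0.4, -0.1, 0.2, 0.5, 0.9, 1.2, 1.9]
points = qq_points(example_residuals)
```

Plotting the first element of each pair against the second gives the QQ plot. Large, systematic departures from the x = y line are the thing to look for, rather than small wiggles.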
280 00:09:17,823 --> 00:09:20,593 I JUST WANT TO DEMONSTRATE, THIS 281 00:09:20,660 --> 00:09:23,696 IS A SET OF PLOTS THAT ARE 282 00:09:23,763 --> 00:09:25,598 RESAMPLED FROM A KNOWN UNDERLYING 283 00:09:25,665 --> 00:09:26,532 NORMAL DISTRIBUTION. 284 00:09:26,599 --> 00:09:29,235 YOU CAN SEE EVEN IN CASES WHERE 285 00:09:29,302 --> 00:09:29,769 WE DO HAVE GOOD NORMALLY 286 00:09:29,835 --> 00:09:31,704 DISTRIBUTED DATA FOR THE 287 00:09:31,771 --> 00:09:33,839 ERROR, THESE AREN'T GOING TO LIE 288 00:09:33,906 --> 00:09:36,108 PERFECTLY, THEY WILL NEVER LIE 289 00:09:36,175 --> 00:09:36,876 PERFECTLY ON THE X EQUALS Y 290 00:09:36,943 --> 00:09:38,177 LINE. 291 00:09:38,244 --> 00:09:40,079 PART OF WHAT WE WILL TALK ABOUT 292 00:09:40,146 --> 00:09:41,213 WITH MEETING ASSUMPTIONS IS 293 00:09:41,280 --> 00:09:42,882 UNDERSTANDING YOU WILL NEVER 294 00:09:42,949 --> 00:09:45,084 HAVE PERFECT BEHAVIOR IN THIS 295 00:09:45,151 --> 00:09:48,387 WAY. EVEN WHEN YOU KNOW THE 296 00:09:48,454 --> 00:09:48,921 UNDERLYING DISTRIBUTION IS 297 00:09:48,988 --> 00:09:50,856 NORMAL, IF THE SAMPLE SIZE IS 298 00:09:50,923 --> 00:09:52,391 SMALL, WE WILL NOT HAVE THIS 299 00:09:52,458 --> 00:09:53,926 PERFECT X EQUALS Y LINE. 300 00:09:53,993 --> 00:09:55,995 BUT WE SHOULD SEE IF THERE'S 301 00:09:56,062 --> 00:09:57,563 ANYTHING REALLY MAJOR GOING ON 302 00:09:57,630 --> 00:10:01,233 HERE WHEN WE DO THIS 303 00:10:01,300 --> 00:10:01,801 INVESTIGATION. 304 00:10:01,867 --> 00:10:03,035 AND THEN WE SHOULD ALSO 305 00:10:03,102 --> 00:10:04,537 INVESTIGATE EQUAL VARIANCE OF 306 00:10:04,603 --> 00:10:05,404 ERRORS.
307 00:10:05,471 --> 00:10:07,940 SO, WE DO THIS BY LOOKING AT, 308 00:10:08,007 --> 00:10:10,176 THERE ARE A COUPLE OF DIFFERENT 309 00:10:10,242 --> 00:10:11,577 VERSIONS OF THIS TYPE OF PLOT, 310 00:10:11,644 --> 00:10:13,879 BUT WE WILL MAINLY BE LOOKING AT 311 00:10:13,946 --> 00:10:15,982 FITTED VALUES, PREDICTED VALUES 312 00:10:16,048 --> 00:10:18,384 OF OUR OUTCOME MEASURE, AGAINST 313 00:10:18,451 --> 00:10:19,085 OUR RESIDUALS THAT WE ESTIMATE 314 00:10:19,151 --> 00:10:21,954 FROM THE MODEL. 315 00:10:22,021 --> 00:10:23,689 WE SHOULD SEE THAT THE SPREAD, 316 00:10:23,756 --> 00:10:24,690 THE VERTICAL SPREAD 317 00:10:24,757 --> 00:10:27,860 AROUND THIS ZERO LINE, IS EVEN 318 00:10:27,927 --> 00:10:30,463 ACROSS ALL PREDICTED VALUES OF 319 00:10:30,529 --> 00:10:31,831 Y. 320 00:10:31,897 --> 00:10:36,402 AND WE CALL THIS 321 00:10:36,469 --> 00:10:37,303 HOMOSCEDASTICITY. 322 00:10:37,370 --> 00:10:42,141 WE WANT TO SEE THIS ON THE LEFT 323 00:10:42,208 --> 00:10:42,675 WITH RECTANGULAR BEHAVIOR, 324 00:10:42,742 --> 00:10:46,479 AND NOT WHERE FOR CERTAIN PREDICTED VALUES 325 00:10:46,545 --> 00:10:48,080 OF Y WE HAVE LESS VARIABILITY 326 00:10:48,147 --> 00:10:49,815 AND FOR OTHER VALUES OF Y, MORE 327 00:10:49,882 --> 00:10:53,352 VARIABILITY, WHETHER WE HAVE THIS 328 00:10:53,419 --> 00:10:54,787 SHAPE, OR OTHER TYPE OF CONE 329 00:10:54,854 --> 00:10:55,421 SHAPE WE SEE FLIPPED THE OTHER 330 00:10:55,488 --> 00:10:56,122 WAY. 331 00:10:56,188 --> 00:10:59,191 WE WANT TO MAKE SURE THAT ACROSS 332 00:10:59,258 --> 00:11:02,194 ALL PREDICTED VALUES OF Y THAT 333 00:11:02,261 --> 00:11:06,332 THE PREDICTABILITY, ALL OF THE 334 00:11:06,399 --> 00:11:07,199 RESIDUALS, CAN BE EXPLAINED 335 00:11:07,266 --> 00:11:10,770 WITH THE SAME STANDARD 336 00:11:10,836 --> 00:11:11,037 DEVIATION. 337 00:11:11,103 --> 00:11:15,041 SO THOSE ARE THE FOUR 338 00:11:15,107 --> 00:11:16,375 ASSUMPTIONS.
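A rough numeric stand-in for the residuals-versus-fitted plot is to compare the spread of the residuals at low and high fitted values. This Python sketch is only a crude illustration of the equal-variance idea, not a formal test and not something shown in the talk; the function name `spread_by_fitted_half` and both sets of residuals are hypothetical.

```python
from statistics import pstdev

def spread_by_fitted_half(fitted, residuals):
    """Return the residual standard deviation in the lower and upper
    halves of the fitted values."""
    pairs = sorted(zip(fitted, residuals))  # order observations by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return pstdev(low), pstdev(high)

# Invented values for illustration only.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
even_resid = [1, -1, 1, -1, 1, -1, 1, -1]          # rectangular band
cone_resid = [0.1, -0.1, 0.2, -0.2, 2, -2, 3, -3]  # cone opening to the right

even_low, even_high = spread_by_fitted_half(fitted, even_resid)
cone_low, cone_high = spread_by_fitted_half(fitted, cone_resid)
```

Similar spreads in both halves match the rectangular pattern described above, while a much larger spread in one half corresponds to a cone shape.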
339 00:11:16,442 --> 00:11:18,611 SO TO GET INTO SPECIFICALLY WHAT 340 00:11:18,677 --> 00:11:23,015 THE MODEL LOOKS LIKE. 341 00:11:23,082 --> 00:11:27,053 SO IN A SIMPLE SCENARIO OF ONE 342 00:11:27,119 --> 00:11:28,954 X VERSUS A Y, REGRESSION WOULD 343 00:11:29,021 --> 00:11:30,689 TRY TO FIND THE RELATIONSHIP 344 00:11:30,756 --> 00:11:32,158 BETWEEN X AND Y WITH A LINEAR 345 00:11:32,224 --> 00:11:35,428 FIT WITH THE BLUE LINE. 346 00:11:35,494 --> 00:11:35,995 SPECIFICALLY, FOR ONE 347 00:11:36,062 --> 00:11:38,798 INDEPENDENT VARIABLE X AND 348 00:11:38,864 --> 00:11:40,032 DEPENDENT MEASURE Y, THE 349 00:11:40,099 --> 00:11:42,601 RELATIONSHIP FOR PREDICTING THE 350 00:11:42,668 --> 00:11:43,969 ESTIMATED Y, WHICH WE WILL CALL Y-HAT, 351 00:11:44,036 --> 00:11:46,605 FOR EACH SUBJECT CAN BE WRITTEN 352 00:11:46,672 --> 00:11:48,274 IN THIS WAY. 353 00:11:48,340 --> 00:11:50,109 WHERE WE ARE ABLE TO FORM THIS 354 00:11:50,176 --> 00:11:52,078 REGRESSION LINE WHICH IS 355 00:11:52,144 --> 00:11:57,750 BASICALLY MADE UP OF OUR 356 00:11:57,817 --> 00:11:58,617 PREDICTED Y VALUES. 357 00:11:58,684 --> 00:12:00,453 AND FROM THIS LITTLE EQUATION IS 358 00:12:00,519 --> 00:12:03,656 WHERE WE'RE ESTIMATING THESE BETAS 359 00:12:03,722 --> 00:12:04,957 OR THESE COEFFICIENTS FROM THE 360 00:12:05,024 --> 00:12:09,061 OUTPUT OF OUR REGRESSION MODEL. 361 00:12:09,128 --> 00:12:11,097 SO BETA-1 IS OUR ESTIMATED 362 00:12:11,163 --> 00:12:13,132 CHANGE IN Y FOR ONE UNIT CHANGE 363 00:12:13,199 --> 00:12:13,466 IN X. 364 00:12:13,532 --> 00:12:14,834 THIS IS OUR LITTLE SLOPE THAT WE 365 00:12:14,900 --> 00:12:19,905 WOULD OBTAIN FROM THE MODEL.
366 00:12:19,972 --> 00:12:23,342 AND BETA-0, IN PRETTY MUCH ALL 367 00:12:23,409 --> 00:12:26,512 CASES WE WILL CALL BETA-0 THE 368 00:12:26,579 --> 00:12:29,515 INTERCEPT. BETA-1 IS THE COEFFICIENT FOR THE FIRST 369 00:12:29,582 --> 00:12:30,616 OBSERVED VARIABLE, NOT ALWAYS A SLOPE, 370 00:12:30,683 --> 00:12:34,453 BUT IN THIS CASE IT IS A SLOPE. 371 00:12:34,520 --> 00:12:35,054 BETA-0 IS ALWAYS THE INTERCEPT 372 00:12:35,121 --> 00:12:35,888 IN A FORMULA LIKE THIS. 373 00:12:35,955 --> 00:12:39,525 THIS IS THE ESTIMATED VALUE OF Y 374 00:12:39,592 --> 00:12:41,494 WHEN ALL Xs ARE ZERO. 375 00:12:41,560 --> 00:12:45,397 OR IN THE SIMPLE ALGEBRAIC WAY, 376 00:12:45,464 --> 00:12:47,900 THIS IS THE SIMPLE INTERCEPT ON 377 00:12:47,967 --> 00:12:49,368 THE GRAPH WHERE THE LINE CROSSES X EQUALS 378 00:12:49,435 --> 00:12:49,768 ZERO. 379 00:12:49,835 --> 00:12:53,272 AND THE DEGREE OF ERROR FROM THE 380 00:12:53,339 --> 00:12:55,174 TRUE Y, SO ALL THESE BLACK 381 00:12:55,241 --> 00:12:57,309 POINTS ARE OUR TRUE OBSERVED Y 382 00:12:57,376 --> 00:12:57,543 VALUES. 383 00:12:57,610 --> 00:13:00,279 THE DISTANCE THAT EACH OF THOSE 384 00:13:00,346 --> 00:13:03,883 POINTS IS FROM OUR -- OUR 385 00:13:03,949 --> 00:13:06,919 PREDICTED LINE IS OUR SET OF ERROR 386 00:13:06,986 --> 00:13:07,186 TERMS. 387 00:13:07,253 --> 00:13:08,120 THIS EPSILON 388 00:13:08,187 --> 00:13:08,988 SUB I FOR EACH PERSON. 389 00:13:09,054 --> 00:13:12,925 SO EACH PERSON IS GIVEN AN ERROR 390 00:13:12,992 --> 00:13:15,628 OF HOW FAR THEIR PREDICTION, 391 00:13:15,694 --> 00:13:18,130 THEIR OBSERVED BEHAVIOR IS FROM 392 00:13:18,197 --> 00:13:18,697 THE PREDICTED BEHAVIOR. 393 00:13:18,764 --> 00:13:21,901 THIS IS THE SET OF ERRORS I HAVE 394 00:13:21,967 --> 00:13:23,669 BEEN TALKING ABOUT FOR MEETING 395 00:13:23,736 --> 00:13:25,171 OUR ASSUMPTIONS OR NOT. 396 00:13:25,237 --> 00:13:27,373 AND THIS PIECE OF THE EQUATION 397 00:13:27,439 --> 00:13:33,779 HERE IS OUR PREDICTED Y, OR 398 00:13:33,846 --> 00:13:35,514 Y-HAT.
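The slide equation is not captured in the captions. A reconstruction consistent with the spoken description, using the symbols named in the talk (beta-0, beta-1, epsilon sub i, y-hat), would be:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i
  && \text{(observed outcome for subject } i\text{)} \\
\hat{y}_i &= \beta_0 + \beta_1 x_i
  && \text{(predicted outcome, the regression line)} \\
\varepsilon_i &= y_i - \hat{y}_i
  && \text{(error: observed minus predicted)}
\end{aligned}
```

Here $\beta_0$ is the intercept (the predicted $y$ when $x = 0$) and $\beta_1$ is the change in predicted $y$ for a one-unit change in $x$.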
399 00:13:35,581 --> 00:13:36,982 AND SO THIS IS KIND OF THE CRUX 400 00:13:37,049 --> 00:13:39,818 OF THE REGRESSION, WHICH IS TO 401 00:13:39,885 --> 00:13:41,954 ESTIMATE THE BETA-0 AND BETA-1, 402 00:13:42,021 --> 00:13:43,155 OR AS MANY COEFFICIENTS AS WE 403 00:13:43,222 --> 00:13:45,357 HAVE, SUCH THAT THE SUM OF ALL 404 00:13:45,424 --> 00:13:47,326 PREDICTION ERRORS, OR ALL OF OUR 405 00:13:47,393 --> 00:13:49,195 EPSILONS, IS AS SMALL AS 406 00:13:49,261 --> 00:13:49,662 POSSIBLE. 407 00:13:49,728 --> 00:13:51,931 WE FORMALIZE THAT, THIS IS JUST 408 00:13:51,997 --> 00:13:54,366 A SUMMATION ACROSS ALL OF THESE 409 00:13:54,433 --> 00:13:58,437 RED LINES HERE SQUARED. 410 00:13:58,504 --> 00:14:01,040 SO THIS LITTLE Y-SUB-I MINUS 411 00:14:01,106 --> 00:14:04,109 Y-HAT-SUB-I IS THE LITTLE DISTANCE OF 412 00:14:04,176 --> 00:14:05,444 ERRORS AND WE SQUARE THEM SO 413 00:14:05,511 --> 00:14:07,079 WE'RE NOT CANCELING OUT A 414 00:14:07,146 --> 00:14:08,247 BUNCH OF THE ERRORS BY HAVING 415 00:14:08,314 --> 00:14:10,649 SOME POSITIVE AND SOME NEGATIVE. 416 00:14:10,716 --> 00:14:12,618 SO WE'RE JUST INTERESTED IN THE 417 00:14:12,685 --> 00:14:12,918 MAGNITUDE. 418 00:14:12,985 --> 00:14:15,621 AND SUMMING ALL OF THOSE AND 419 00:14:15,688 --> 00:14:16,889 MINIMIZING, FITTING THE LINE 420 00:14:16,956 --> 00:14:24,296 SUCH THAT WE'RE MINIMIZING THAT 421 00:14:24,363 --> 00:14:27,700 SUM OF SQUARED ERRORS. 422 00:14:27,766 --> 00:14:31,337 SO I WANT TO USE THAT FRAMEWORK 423 00:14:31,403 --> 00:14:34,139 AND WALK THROUGH SOME EXAMPLES 424 00:14:34,206 --> 00:14:36,108 OF DOING REGRESSION USING SAMPLE 425 00:14:36,175 --> 00:14:36,909 DATA.
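For the one-predictor case, the coefficients that minimize the sum of squared errors have a standard closed form, which can be sketched as below. This is an illustrative Python sketch rather than the R workflow used in the talk; the function names and the five data points are made up for the example.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx               # slope
    beta0 = y_bar - beta1 * x_bar   # intercept
    return beta0, beta1

def sum_squared_errors(x, y, beta0, beta1):
    """Sum over subjects of (observed y - predicted y) squared."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))

# Invented data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = fit_simple_regression(x, y)
best = sum_squared_errors(x, y, beta0, beta1)
# Any other line has a larger sum of squared errors:
worse = sum_squared_errors(x, y, beta0 + 0.5, beta1 - 0.1)
```

Nudging either coefficient away from the fitted values can only increase the sum of squared errors, which is the sense in which the fitted line is "best".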
426 00:14:36,976 --> 00:14:42,248 AND THIS IS REAL DATA THAT 427 00:14:42,314 --> 00:14:44,917 IS AVAILABLE ONLINE, SOURCED FROM 428 00:14:44,984 --> 00:14:49,388 THE UC IRVINE MACHINE LEARNING 429 00:14:49,455 --> 00:14:51,724 REPOSITORY, USING 279 CARDIAC 430 00:14:51,790 --> 00:14:52,858 PATIENTS, AND I SPECIFICALLY 431 00:14:52,925 --> 00:14:54,226 PULLED THE DATA USING THE 432 00:14:54,293 --> 00:14:54,960 PACKAGE AVAILABLE THERE. 433 00:14:55,027 --> 00:14:59,031 THIS IS A SET OF PATIENTS THAT 434 00:14:59,098 --> 00:15:01,000 WERE REFERRED FOR CORONARY 435 00:15:01,066 --> 00:15:02,134 ANGIOGRAPHY AT THE CLEVELAND 436 00:15:02,201 --> 00:15:02,368 CLINIC. 437 00:15:02,434 --> 00:15:06,305 NO PATIENTS HAD A HISTORY OF MI 438 00:15:06,372 --> 00:15:08,040 OR CARDIOMYOPATHIC DISEASE. 439 00:15:08,107 --> 00:15:11,176 AND ALL PATIENTS UNDERWENT ECG 440 00:15:11,243 --> 00:15:13,846 AT REST, AND CHOLESTEROL, AND 441 00:15:13,912 --> 00:15:15,981 FASTING BLOOD SUGAR MEASUREMENT 442 00:15:16,048 --> 00:15:18,550 AND OTHER DATA COLLECTION 443 00:15:18,617 --> 00:15:20,019 MEASUREMENTS HERE. 444 00:15:20,085 --> 00:15:23,622 AND ALSO UNDERWENT EXERCISE ECG, 445 00:15:23,689 --> 00:15:25,124 AND CARDIAC FLUOROSCOPY. 446 00:15:25,190 --> 00:15:27,059 SO WE HAVE A SMALL SET OF 447 00:15:27,126 --> 00:15:30,195 VARIABLES. AGAIN, THIS IS JUST A 448 00:15:30,262 --> 00:15:32,197 PUBLICLY AVAILABLE DATASET THAT 449 00:15:32,264 --> 00:15:34,833 I'M USING TO HELP WALK US 450 00:15:34,900 --> 00:15:36,602 THROUGH SOME OF THESE EXAMPLES. 451 00:15:36,669 --> 00:15:39,071 I WILL BE SHOWING CODE SNIPPETS 452 00:15:39,138 --> 00:15:41,106 AND OUTPUTS FROM R SOFTWARE 453 00:15:41,173 --> 00:15:42,241 SPECIFICALLY, WHICH IS WHAT I'M 454 00:15:42,308 --> 00:15:48,280 MOST OF THE TIME USING.
455 00:15:48,347 --> 00:15:51,116 I GUESS I SHOULD JUST JUMP TO 456 00:15:51,183 --> 00:15:51,850 THE DISCLAIMER PART, WHICH IS 457 00:15:51,917 --> 00:15:53,085 THAT MOST SOFTWARE IS GOING TO 458 00:15:53,152 --> 00:15:54,920 BE MORE OR LESS THE SAME IN 459 00:15:54,987 --> 00:15:56,255 TERMS OF OUTPUT. 460 00:15:56,322 --> 00:15:59,291 BUT YOU MAY NEED TO MAKE SURE 461 00:15:59,358 --> 00:16:02,428 THAT YOU'RE LOOKING AT YOUR 462 00:16:02,494 --> 00:16:02,961 DOCUMENTATION FOR WHATEVER 463 00:16:03,028 --> 00:16:06,098 SOFTWARE YOU MAY BE USING TO DO 464 00:16:06,165 --> 00:16:07,299 THIS, TO SET THE MODEL PROPERLY, 465 00:16:07,366 --> 00:16:07,966 HOW THE SOFTWARE WOULD LIKE YOU 466 00:16:08,033 --> 00:16:09,201 TO SET IT. 467 00:16:09,268 --> 00:16:12,237 AND SIMILARLY THAT YOUR DATA AND 468 00:16:12,304 --> 00:16:13,205 EACH VARIABLE IS SPECIFIED AS 469 00:16:13,272 --> 00:16:15,107 EXPECTED FOR USE WITH 470 00:16:15,174 --> 00:16:15,441 REGRESSION. 471 00:16:15,507 --> 00:16:18,143 BECAUSE IT IS NOT ALWAYS, IT IS 472 00:16:18,210 --> 00:16:20,045 NOT ALWAYS THE SAME IN EVERY 473 00:16:20,112 --> 00:16:20,312 SOFTWARE. 474 00:16:20,379 --> 00:16:21,647 AND IF YOU HAVE SPECIFIC 475 00:16:21,714 --> 00:16:23,949 QUESTIONS ABOUT THIS, YOU KNOW, 476 00:16:24,016 --> 00:16:26,852 OUR TEAM IS AVAILABLE FOR 477 00:16:26,919 --> 00:16:28,520 QUESTIONS 478 00:16:28,587 --> 00:16:30,789 AS YOU WORK. BUT BROADLY, TO 479 00:16:30,856 --> 00:16:32,791 POINT OUT SOME OF THE KEY ISSUES 480 00:16:32,858 --> 00:16:36,095 THAT MAY COME UP WITH THIS, 481 00:16:36,161 --> 00:16:37,629 ONE IS DATA THAT'S 482 00:16:37,696 --> 00:16:39,698 CATEGORICAL, SO DATA WITH THREE 483 00:16:39,765 --> 00:16:41,834 OR MORE LEVELS SPECIFICALLY, BUT 484 00:16:41,900 --> 00:16:44,269 THAT ARE IN THE DATASET AS NUMBERS. 485 00:16:44,336 --> 00:16:46,505 ZERO, ONE, TWO, ETC.
486 00:16:46,572 --> 00:16:48,340 THAT MAY HAVE BEEN RECODED FROM 487 00:16:48,407 --> 00:16:52,111 BEING WRITTEN-OUT WORDS TO 488 00:16:52,177 --> 00:16:54,279 ZERO, ONE, TWO. 489 00:16:54,346 --> 00:17:00,352 I THINK MOST SOFTWARE BY DEFAULT WILL 490 00:17:00,419 --> 00:17:01,420 THINK THIS IS A NUMERICAL 491 00:17:01,487 --> 00:17:02,821 VARIABLE. 492 00:17:02,888 --> 00:17:05,157 WE DON'T WANT TO TREAT THIS AS A 493 00:17:05,224 --> 00:17:06,058 NUMERICAL VARIABLE. 494 00:17:06,125 --> 00:17:10,362 WE WANT THIS CODED AS A CATEGORICAL 495 00:17:10,429 --> 00:17:12,564 VARIABLE, SO WE WOULD NEED TO 496 00:17:12,631 --> 00:17:13,699 SPECIFY TO THE SOFTWARE THAT'S 497 00:17:13,766 --> 00:17:16,568 WHAT THIS IS, AND NOT HAVE IT 498 00:17:16,635 --> 00:17:18,971 AUTOMATICALLY ENTER THE REGRESSION 499 00:17:19,037 --> 00:17:21,140 AS A CONTINUOUS NUMERICAL 500 00:17:21,206 --> 00:17:21,774 VARIABLE, AND WE WANT AN ESTIMATE 501 00:17:21,840 --> 00:17:25,711 FOR EACH OF THE LEVELS. 502 00:17:25,778 --> 00:17:27,946 BINARY 0/1 VARIABLES DON'T NEED 503 00:17:28,013 --> 00:17:31,049 TO BE RECODED BECAUSE THEY INVOLVE 504 00:17:31,116 --> 00:17:32,284 ONLY ONE LEVEL CHANGE. 505 00:17:32,351 --> 00:17:34,253 BUT FOR CATEGORICAL VARIABLES WE WANT 506 00:17:34,319 --> 00:17:40,292 TO SPECIFY THESE THE RIGHT WAY 507 00:17:40,359 --> 00:17:41,593 FOR OUR SOFTWARE. 508 00:17:41,660 --> 00:17:44,930 SO IN THIS DATA, WE COULD ASK A 509 00:17:44,997 --> 00:17:46,498 QUESTION SUCH AS: IS AGE 510 00:17:46,565 --> 00:17:48,934 RELATED TO THE MAXIMAL HEART 511 00:17:49,001 --> 00:17:50,936 RATE ACHIEVED IN THESE PATIENTS 512 00:17:51,003 --> 00:17:52,571 UNDERGOING EXERCISE TESTING? 513 00:17:52,638 --> 00:17:54,173 I HAVE INCLUDED A GRAPH HERE TO 514 00:17:54,239 --> 00:17:57,009 SHOW WHAT THIS DATA LOOKS LIKE. 515 00:17:57,075 --> 00:17:59,278 AGAIN, WE HAVE ALMOST 300 DATA 516 00:17:59,344 --> 00:17:59,511 POINTS.
517 00:17:59,578 --> 00:18:01,747 AGE IN YEARS, AND THE MAXIMUM 518 00:18:01,814 --> 00:18:03,582 HEART RATE ACHIEVED IN BEATS PER 519 00:18:03,649 --> 00:18:04,216 MINUTE. 520 00:18:04,283 --> 00:18:06,919 AND PERHAPS WE'RE INTERESTED IN 521 00:18:06,985 --> 00:18:07,519 DESCRIBING THIS RELATIONSHIP. 522 00:18:07,586 --> 00:18:10,155 AS I SAID BEFORE, WE MIGHT, WE 523 00:18:10,222 --> 00:18:12,758 SHOULD VISUALLY LOOK AT THE DATA 524 00:18:12,825 --> 00:18:13,892 BEFORE WE DO ANYTHING. 525 00:18:13,959 --> 00:18:14,793 THE RELATIONSHIP DOES LOOK LIKE 526 00:18:14,860 --> 00:18:16,929 SOMETHING THAT WE COULD DESCRIBE 527 00:18:16,995 --> 00:18:18,997 WITH A LINE. IT LOOKS ROUGHLY 528 00:18:19,064 --> 00:18:20,098 LINEAR. 529 00:18:20,165 --> 00:18:22,367 THERE ARE NO MAJOR OUTLIERS OR ODD 530 00:18:22,434 --> 00:18:24,236 DISTRIBUTIONS THAT I CAN SEE 531 00:18:24,303 --> 00:18:26,305 THAT MIGHT ALREADY GIVE US SORT 532 00:18:26,371 --> 00:18:27,573 OF SOME RED FLAGS WITH RUNNING 533 00:18:27,639 --> 00:18:27,873 REGRESSION. 534 00:18:27,940 --> 00:18:29,208 SO AGAIN, THIS IS JUST A GOOD 535 00:18:29,274 --> 00:18:31,276 WAY TO CHECK OUR DATA AND MAKE 536 00:18:31,343 --> 00:18:32,878 SURE THAT ALL OF IT LOOKS THE 537 00:18:32,945 --> 00:18:36,081 WAY IT SHOULD BEFORE WE EVEN 538 00:18:36,148 --> 00:18:38,350 THINK ABOUT RUNNING THE 539 00:18:38,417 --> 00:18:38,650 REGRESSION. 540 00:18:38,717 --> 00:18:42,087 AND USING THE -- THE -- 541 00:18:42,154 --> 00:18:43,188 TERMINOLOGY FROM PRIOR SLIDES, 542 00:18:43,255 --> 00:18:44,122 WE CAN WRITE SOMETHING LIKE THIS 543 00:18:44,189 --> 00:18:46,358 TO SHOW WHAT IT IS WE'RE 544 00:18:46,425 --> 00:18:50,062 ACTUALLY GOING TO TRY TO BE 545 00:18:50,128 --> 00:18:50,696 ESTIMATING. I'M CALLING THIS THE 546 00:18:50,762 --> 00:18:51,396 HEART RATE MAX. 547 00:18:51,463 --> 00:18:52,898 THIS IS OUR OUTCOME OF INTEREST.
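The equation on this slide is not captured in the captions. Written in the notation from the earlier slides, with "heart rate max" as the outcome and age as the single predictor, a plausible reconstruction is:

```latex
\text{HeartRateMax}_i = \beta_0 + \beta_1 \,\text{Age}_i + \varepsilon_i,
\qquad
\widehat{\text{HeartRateMax}}_i = \beta_0 + \beta_1 \,\text{Age}_i
```

The labels $\text{HeartRateMax}$ and $\text{Age}$ are illustrative names chosen here, not the dataset's column names.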
548 00:18:52,965 --> 00:18:56,468 THEN WE WANT TO ESTIMATE BETA-0 549 00:18:56,535 --> 00:18:57,636 AND BETA-1. 550 00:18:57,703 --> 00:18:58,937 PARTICULARLY, WE'RE INTERESTED 551 00:18:59,004 --> 00:19:04,076 IN, AGAIN, THIS BETA-1, THIS 552 00:19:04,142 --> 00:19:04,710 SLOPE, THIS RELATIONSHIP BETWEEN 553 00:19:04,776 --> 00:19:05,143 AGE AND HEART RATE. 554 00:19:05,210 --> 00:19:09,381 AND AGE IS OUR KEY COVARIATE OF 555 00:19:09,448 --> 00:19:09,648 INTEREST. 556 00:19:09,715 --> 00:19:10,883 OUR INDEPENDENT VARIABLE. 557 00:19:10,949 --> 00:19:13,018 SO THESE ARE THE ONLY 558 00:19:13,085 --> 00:19:14,653 THINGS WE NEED TO CONSIDER TO 559 00:19:14,720 --> 00:19:17,356 ANSWER THIS QUESTION THE WAY 560 00:19:17,422 --> 00:19:21,393 THAT WE PHRASED IT. 561 00:19:21,460 --> 00:19:23,262 AND SO IN R, THE WAY YOU WOULD 562 00:19:23,328 --> 00:19:25,030 SPECIFY THAT IS LIKE THIS. 563 00:19:25,097 --> 00:19:27,366 AGAIN, THIS IS OBVIOUSLY GOING 564 00:19:27,432 --> 00:19:30,903 TO BE DIFFERENT DEPENDING ON 565 00:19:30,969 --> 00:19:31,436 WHAT SOFTWARE YOU'RE USING. 566 00:19:31,503 --> 00:19:33,705 THERE IS THIS LITTLE LM FUNCTION 567 00:19:33,772 --> 00:19:34,940 THAT RUNS LINEAR REGRESSION. 568 00:19:35,007 --> 00:19:38,877 WE TELL THE SOFTWARE WHAT DATA 569 00:19:38,944 --> 00:19:39,311 WE'RE PULLING FROM. 570 00:19:39,378 --> 00:19:40,812 THIS REALLY POORLY NAMED 571 00:19:40,879 --> 00:19:43,682 VARIABLE, I THINK, FOR A 572 00:19:43,749 --> 00:19:44,783 PUBLICLY AVAILABLE DATASET IS 573 00:19:44,850 --> 00:19:46,518 OUR MAXIMUM HEART RATE ACHIEVED 574 00:19:46,585 --> 00:19:48,520 VARIABLE AND OUR SINGLE 575 00:19:48,587 --> 00:19:50,589 PREDICTOR IS AGE, AND THE 576 00:19:50,656 --> 00:19:53,292 REGRESSION IS SPLIT ON EITHER 577 00:19:53,358 --> 00:19:55,761 SIDE INTO OUTCOME AND PREDICTOR 578 00:19:55,827 --> 00:19:57,963 VARIABLES BY THIS LITTLE THING.
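For readers without the slide, the model being described can be written out as follows. This is a reconstruction from the spoken description; the subscript i for the individual patient and the error term symbol are notational assumptions, not taken from the slide.

```latex
\[
  \text{HRmax}_i \;=\; \beta_0 \;+\; \beta_1 \,\text{Age}_i \;+\; \varepsilon_i
\]
% HRmax_i : maximum heart rate achieved (beats per minute), the outcome
% Age_i   : age in years, the single predictor
% beta_0  : intercept; beta_1 : slope for age; epsilon_i : error term
```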
579 00:19:58,030 --> 00:19:59,831 SO WE RUN THE 580 00:19:59,898 --> 00:20:02,267 MODEL, WE CALL THE OUTPUT USING 581 00:20:02,334 --> 00:20:03,435 THE SUMMARY FUNCTION AND WE GET 582 00:20:03,502 --> 00:20:04,903 THIS OUTPUT. 583 00:20:04,970 --> 00:20:07,005 WHICH IS JUST A SIMPLE TABLE 584 00:20:07,072 --> 00:20:09,308 BROKEN DOWN INTO A COUPLE 585 00:20:09,374 --> 00:20:11,810 COLUMNS AND ONE ROW FOR EACH 586 00:20:11,877 --> 00:20:12,911 BETA OR EACH COEFFICIENT THAT 587 00:20:12,978 --> 00:20:14,947 WE'RE ESTIMATING FROM THE 588 00:20:15,013 --> 00:20:15,247 REGRESSION. 589 00:20:15,314 --> 00:20:17,082 AND THE COLUMNS GIVE US ALL OF 590 00:20:17,149 --> 00:20:18,617 THE STATISTICAL DETAILS THAT WE 591 00:20:18,684 --> 00:20:18,817 NEED. 592 00:20:18,884 --> 00:20:20,919 FIRST IT GIVES US THE ESTIMATE, 593 00:20:20,986 --> 00:20:22,955 WHICH IS THE ACTUAL VALUE OF THE 594 00:20:23,021 --> 00:20:23,855 BETA THAT WE'RE ESTIMATING. 595 00:20:23,922 --> 00:20:25,624 WE ALSO HAVE A STANDARD ERROR 596 00:20:25,691 --> 00:20:28,026 FOR THAT ESTIMATE. 597 00:20:28,093 --> 00:20:29,695 A T-VALUE TO, AGAIN, GO ALONG 598 00:20:29,761 --> 00:20:30,929 WITH THAT ESTIMATE. 599 00:20:30,996 --> 00:20:39,504 AND THE P-VALUE, WHICH IS 600 00:20:39,571 --> 00:20:40,038 DENOTED IN R WITH THIS 601 00:20:40,105 --> 00:20:40,772 TERMINOLOGY, AND THE STAR HERE 602 00:20:40,839 --> 00:20:44,076 THAT TELLS US IT IS SIGNIFICANT. 603 00:20:44,142 --> 00:20:46,178 THAT'S IMPORTANT TO US. 604 00:20:46,244 --> 00:20:47,079 IT WILL LOOK FAIRLY SIMILAR FOR 605 00:20:47,145 --> 00:20:48,347 MOST OTHER SOFTWARE AS WELL. 606 00:20:48,413 --> 00:20:51,783 WHEN WE LOOK AT THIS, THIS IS 607 00:20:51,850 --> 00:20:53,385 HOW WE WOULD FILL IN OUR 608 00:20:53,452 --> 00:20:54,853 EQUATION. I PUT IT OVER HERE ON 609 00:20:54,920 --> 00:20:55,587 THIS GRAPH. 610 00:20:55,654 --> 00:20:57,923 I'VE ALSO SHOWN THE ESTIMATED 611 00:20:57,990 --> 00:20:58,890 LINE THAT WE GET.
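To make the columns of that table concrete, the following is a minimal Python sketch that computes the estimate, standard error, and t-value for a one-predictor regression by hand. It is not the R `lm` call from the slide. The five data points are invented for illustration and are not the lecture's dataset, and the p-value column is left out because it needs a t-distribution that the Python standard library does not provide.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Closed-form least squares for y = b0 + b1 * x with basic inference."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma = sqrt(rss / (n - 2))         # residual standard error
    se_b1 = sigma / sqrt(sxx)
    se_b0 = sigma * sqrt(1 / n + x_bar ** 2 / sxx)
    return {
        "intercept": {"estimate": b0, "std_error": se_b0, "t_value": b0 / se_b0},
        "slope": {"estimate": b1, "std_error": se_b1, "t_value": b1 / se_b1},
    }


# Made-up toy data: age in years, maximum heart rate in beats per minute.
ages = [30, 40, 50, 60, 70]
max_heart_rates = [176, 165, 153, 146, 132]

table = simple_linear_regression(ages, max_heart_rates)
for name, row in table.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
```

Each row of the printed result corresponds to one coefficient, mirroring the one-row-per-beta layout described above.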
612 00:20:58,957 --> 00:21:04,096 THIS RED LINE IS ACTUALLY THIS Y 613 00:21:04,162 --> 00:21:05,430 EQUALS MX PLUS B LINE DIRECTLY 614 00:21:05,497 --> 00:21:08,133 OVERLAID ON OUR DATA. 615 00:21:08,200 --> 00:21:12,304 AND AGAIN, OUR BETA-0 IS OUR 616 00:21:12,371 --> 00:21:12,838 ESTIMATE. 617 00:21:12,904 --> 00:21:13,805 204 ROUGHLY. 618 00:21:13,872 --> 00:21:18,910 AND OUR SLOPE IS A NEGATIVE SLOPE 619 00:21:18,977 --> 00:21:23,415 OF ABOUT ONE. 620 00:21:23,482 --> 00:21:26,618 SO AGAIN, THIS IS SORT OF HOW WE 621 00:21:26,685 --> 00:21:28,086 FILL THIS IN HERE, INTO EACH. 622 00:21:28,153 --> 00:21:31,356 AND SO, I WANT TO TALK ABOUT THE 623 00:21:31,423 --> 00:21:34,092 SPECIFIC 624 00:21:34,159 --> 00:21:34,793 INTERPRETATION 625 00:21:34,860 --> 00:21:36,762 OF EACH OF 626 00:21:36,828 --> 00:21:38,030 THESE. 627 00:21:38,096 --> 00:21:40,098 SO THE INTERCEPT, AGAIN, IS THE 628 00:21:40,165 --> 00:21:41,266 ESTIMATED HEART RATE FOR AN 629 00:21:41,333 --> 00:21:44,903 INDIVIDUAL AT AGE ZERO. 630 00:21:44,970 --> 00:21:47,806 AND THIS IS 204.15 BEATS PER 631 00:21:47,873 --> 00:21:48,240 MINUTE. 632 00:21:48,306 --> 00:21:50,442 THIS IS ALSO NOTED TO BE 633 00:21:50,509 --> 00:21:52,978 STATISTICALLY SIGNIFICANT. 634 00:21:53,045 --> 00:21:55,947 AND THIS IS COMPARING TO ZERO, 635 00:21:56,014 --> 00:21:57,849 STATISTICALLY GREATER THAN ZERO. 636 00:21:57,916 --> 00:22:01,753 IF WE WANT THIS TO BE A MORE 637 00:22:01,820 --> 00:22:03,722 INTERPRETABLE INTERCEPT, WE CAN 638 00:22:03,789 --> 00:22:04,623 DO SOMETHING CALLED MEAN 639 00:22:04,690 --> 00:22:06,458 CENTERING, OR CENTERING. SOME OF 640 00:22:06,525 --> 00:22:07,559 YOU MAY HAVE HEARD OF THIS 641 00:22:07,626 --> 00:22:10,262 BEFORE. THIS WOULD BASICALLY 642 00:22:10,328 --> 00:22:14,433 JUST DO SOMETHING WHICH GIVES US A 643 00:22:14,499 --> 00:22:15,267 MORE INTERPRETABLE INTERCEPT.
644 00:22:15,333 --> 00:22:16,535 WE WOULD TAKE THE AGE OF EACH 645 00:22:16,601 --> 00:22:18,437 PERSON IN THE COHORT, CALCULATE 646 00:22:18,503 --> 00:22:22,974 A COHORT AVERAGE AGE, AND FOR 647 00:22:23,041 --> 00:22:24,476 EACH PERSON CALCULATE THE DIFFERENCE 648 00:22:24,543 --> 00:22:25,377 FROM THAT COHORT MEAN. 649 00:22:25,444 --> 00:22:28,447 AND SO EACH PERSON'S NEW MEAN 650 00:22:28,513 --> 00:22:31,083 CENTERED AGE WOULD BE RELATIVE 651 00:22:31,149 --> 00:22:31,850 TO THAT MEAN AGE. 652 00:22:31,917 --> 00:22:35,220 THEN IF WE USE THAT RECALCULATED 653 00:22:35,287 --> 00:22:35,554 AGE VARIABLE, 654 00:22:35,620 --> 00:22:38,156 THE INTERCEPT WOULD BE THE 655 00:22:38,223 --> 00:22:38,957 ESTIMATED HEART RATE AT THE 656 00:22:39,024 --> 00:22:41,426 COHORT MEAN AGE, INSTEAD OF AT 657 00:22:41,493 --> 00:22:41,860 ZERO AGE. 658 00:22:41,927 --> 00:22:44,096 IT WOULD STILL BE AT ZERO, BUT 659 00:22:44,162 --> 00:22:45,197 NOW ZERO MEANS THE MEAN. 660 00:22:45,263 --> 00:22:46,998 WE CAN DO THINGS LIKE THAT TO 661 00:22:47,065 --> 00:22:48,767 OUR VARIABLES TO HELP US 662 00:22:48,834 --> 00:22:50,235 INTERPRET INTERCEPTS A LITTLE 663 00:22:50,302 --> 00:22:50,836 BIT BETTER. 664 00:22:50,902 --> 00:22:55,574 ALTHOUGH, I THINK, BROADLY, 665 00:22:55,640 --> 00:22:57,642 WE AREN'T USUALLY INTERESTED IN 666 00:22:57,709 --> 00:22:58,543 INTERPRETING INTERCEPTS. 667 00:22:58,610 --> 00:23:01,046 AND THEN THE AGE COEFFICIENT. 668 00:23:01,113 --> 00:23:03,215 WE WOULD INTERPRET 669 00:23:03,281 --> 00:23:08,320 THIS AS THE ESTIMATED HEART RATE 670 00:23:08,386 --> 00:23:09,955 DECREASES BY 1.0002 BEATS PER 671 00:23:10,021 --> 00:23:12,924 MINUTE PER YEAR INCREASE IN AGE. 672 00:23:12,991 --> 00:23:15,460 AGAIN, THAT'S OUR NICE 673 00:23:15,527 --> 00:23:16,294 INTERPRETATION OF SLOPE THAT 674 00:23:16,361 --> 00:23:18,764 MOST OF US ARE FAMILIAR WITH.
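The mean-centering step described above can be sketched in a few lines of Python. The data are invented for illustration, and `fit_line` is a hypothetical helper written for this example. The point it shows is that centering the predictor moves the intercept to the predicted value at the mean age while leaving the slope unchanged.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up toy data, not the lecture's dataset.
ages = [30, 40, 50, 60, 70]
max_heart_rates = [176, 165, 153, 146, 132]

# Mean centering: subtract the cohort mean age from each person's age.
mean_age = sum(ages) / len(ages)
centered_ages = [age - mean_age for age in ages]

raw_intercept, raw_slope = fit_line(ages, max_heart_rates)
centered_intercept, centered_slope = fit_line(centered_ages, max_heart_rates)

print("raw:     ", raw_intercept, raw_slope)            # intercept is at age 0
print("centered:", centered_intercept, centered_slope)  # intercept is at mean age
```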
675 00:23:18,830 --> 00:23:22,134 THE TRICK IS JUST KNOWING THE UNITS. 676 00:23:22,200 --> 00:23:24,569 AGAIN, IT IS IN THE UNITS OF OUR OUTCOME 677 00:23:24,636 --> 00:23:25,137 VARIABLE, WHICH IS MEASURED IN 678 00:23:25,203 --> 00:23:27,339 BEATS PER MINUTE, AND THEN PER 679 00:23:27,405 --> 00:23:28,573 ONE-UNIT INCREASE IN AGE. 680 00:23:28,640 --> 00:23:29,975 AGAIN, THIS IS A NEGATIVE SLOPE, 681 00:23:30,041 --> 00:23:31,743 SO WE HAVE A DECREASE IN HEART 682 00:23:31,810 --> 00:23:32,477 RATE OVER TIME. 683 00:23:32,544 --> 00:23:34,513 THIS IS WHERE HAVING THE FIGURE 684 00:23:34,579 --> 00:23:36,848 ALSO HELPS. YOU CAN MAKE SURE 685 00:23:36,915 --> 00:23:37,649 THAT YOU'RE INTERPRETING IT THE 686 00:23:37,716 --> 00:23:38,817 RIGHT WAY. 687 00:23:38,884 --> 00:23:39,951 IT SHOULD ALL JIBE WITH HOW 688 00:23:40,018 --> 00:23:42,387 YOU'RE THINKING ABOUT IT. 689 00:23:42,454 --> 00:23:44,256 THIS IS STATISTICALLY 690 00:23:44,322 --> 00:23:44,890 SIGNIFICANTLY DIFFERENT FROM A 691 00:23:44,956 --> 00:23:46,558 SLOPE OF ZERO, WHICH IS NO 692 00:23:46,625 --> 00:23:49,895 RELATIONSHIP, AND NO CHANGE 693 00:23:49,961 --> 00:23:50,362 ACROSS INCREASING AGE. 694 00:23:50,428 --> 00:23:52,798 AND I WANT TO POINT OUT THAT 695 00:23:52,864 --> 00:23:56,368 MAYBE THIS IS LIKE A TECHNICAL 696 00:23:56,434 --> 00:23:58,136 POINT, BUT THIS SLOPE, I WANT TO 697 00:23:58,203 --> 00:24:00,605 JUST BE CAREFUL ESPECIALLY WHEN 698 00:24:00,672 --> 00:24:03,975 WE TALK ABOUT AGE AND PER YEAR. 699 00:24:04,042 --> 00:24:06,044 THIS IS A COHORT-LEVEL BEHAVIOR, 700 00:24:06,111 --> 00:24:07,913 IN THAT WE'RE NOT TALKING 701 00:24:07,979 --> 00:24:10,348 ABOUT AS A PERSON AGES, BECAUSE 702 00:24:10,415 --> 00:24:11,650 THIS ISN'T LONGITUDINAL DATA. 703 00:24:11,716 --> 00:24:13,118 THIS IS SAYING AS WE LOOK AT 704 00:24:13,185 --> 00:24:16,188 PEOPLE WHO ARE OLDER WE HAVE 705 00:24:16,254 --> 00:24:16,888 THIS RELATIONSHIP.
706 00:24:16,955 --> 00:24:18,924 IF YOU WANTED TO ACTUALLY SAY AS 707 00:24:18,990 --> 00:24:22,828 A PERSON AGES OR WITH -- AS A 708 00:24:22,894 --> 00:24:24,529 PERSON GETS OLDER YOU WOULD HAVE 709 00:24:24,596 --> 00:24:28,233 TO BE DOING REPEATED MEASURES ON 710 00:24:28,300 --> 00:24:29,301 THE SAME PERSON. 711 00:24:29,367 --> 00:24:32,204 SO THIS IS A FUNNY 712 00:24:32,270 --> 00:24:36,341 INTERPRETATION HERE, AND MOST OF 713 00:24:36,408 --> 00:24:38,176 US ARE GOING TO EXTRAPOLATE TO SAY, 714 00:24:38,243 --> 00:24:39,411 AS YOU ARE OLDER YOUR MAXIMUM 715 00:24:39,477 --> 00:24:40,412 HEART RATE IS LOWER. 716 00:24:40,478 --> 00:24:44,850 BUT TO BE CAUTIOUS AND SAY THIS 717 00:24:44,916 --> 00:24:45,951 IS NOT ACTUALLY LONGITUDINAL 718 00:24:46,017 --> 00:24:46,685 DATA, THIS IS A CROSS-SECTION, 719 00:24:46,751 --> 00:24:48,820 WE MIGHT NOT HAVE A GOOD 720 00:24:48,887 --> 00:24:51,957 INTERPRETATION OF THAT HERE. 721 00:24:52,023 --> 00:24:54,125 AND JUST TO SPEAK TO THE 722 00:24:54,192 --> 00:24:55,360 PREDICTION POINT OF THIS A 723 00:24:55,427 --> 00:24:55,894 LITTLE BIT. 724 00:24:55,961 --> 00:25:00,098 SO I JUST WANT TO POINT OUT THAT 725 00:25:00,165 --> 00:25:01,900 YOU COULD HAVE A SCENARIO WHERE 726 00:25:01,967 --> 00:25:04,135 MAYBE, WITH THIS MODEL, WITH 727 00:25:04,202 --> 00:25:06,137 THIS PREDICTED MODEL, MAYBE YOU 728 00:25:06,204 --> 00:25:07,973 WANT TO KNOW THE BEHAVIOR OF 729 00:25:08,039 --> 00:25:08,373 SOMEBODY WITH SOME 730 00:25:08,440 --> 00:25:09,241 CHARACTERISTICS. 731 00:25:09,307 --> 00:25:10,842 MAYBE IT IS JUST AGE, OR MAYBE 732 00:25:10,909 --> 00:25:13,478 IF WE HAD A LARGER MODEL THIS 733 00:25:13,545 --> 00:25:15,480 MIGHT BE MORE USEFUL, IF WE HAD 734 00:25:15,547 --> 00:25:16,481 DIFFERENT SETS OF 735 00:25:16,548 --> 00:25:17,415 CHARACTERISTICS ON A PERSON.
736 00:25:17,482 --> 00:25:20,051 SAY, FOR EXAMPLE, WE JUST WANTED 737 00:25:20,118 --> 00:25:23,355 TO KNOW WHAT THE PREDICTED MAXIMUM 738 00:25:23,421 --> 00:25:26,157 HEART RATE ACHIEVED FOR SOMEONE 739 00:25:26,224 --> 00:25:28,793 AGE 50 WOULD BE. WE WOULD PLUG 740 00:25:28,860 --> 00:25:29,394 THAT IN AND DO THE CALCULATION 741 00:25:29,461 --> 00:25:29,961 IN THE WAY YOU WOULD EXPECT. 742 00:25:30,028 --> 00:25:34,466 AND GET A NUMBER OF 154.15 BEATS 743 00:25:34,532 --> 00:25:35,634 PER MINUTE. 744 00:25:35,700 --> 00:25:36,334 YOU CAN SEE NOW YOU WOULDN'T GET 745 00:25:36,401 --> 00:25:38,570 A NUMBER FROM THIS, BUT YOU CAN SEE 746 00:25:38,637 --> 00:25:41,039 THE WAY THAT THIS IS HAPPENING. 747 00:25:41,106 --> 00:25:43,074 AGAIN, WE'RE JUST USING AGE: 748 00:25:43,141 --> 00:25:44,976 WHERE IS IT ON THE PREDICTED LINE 749 00:25:45,043 --> 00:25:46,711 AND WHAT IS THE Y ATTRIBUTED TO 750 00:25:46,778 --> 00:25:49,981 THAT VALUE ON THE PREDICTED 751 00:25:50,048 --> 00:25:55,420 LINE. 752 00:25:55,487 --> 00:25:56,988 AND IN R AND IN MOST OTHER 753 00:25:57,055 --> 00:25:58,957 SOFTWARE THERE ARE GOING TO BE 754 00:25:59,024 --> 00:26:01,159 A BUNCH OF OTHER PIECES OF THE 755 00:26:01,226 --> 00:26:02,193 OUTPUT AS WELL. 756 00:26:02,260 --> 00:26:03,461 I'M NOT GOING TO TALK A WHOLE 757 00:26:03,528 --> 00:26:04,829 LOT ABOUT A LOT OF THIS STUFF. 758 00:26:04,896 --> 00:26:06,998 JUST FOR THE SAKE OF TIME. 759 00:26:07,065 --> 00:26:09,467 REALLY. 760 00:26:09,534 --> 00:26:10,702 AND COMPLEXITY. 761 00:26:10,769 --> 00:26:11,770 BUT THERE ARE A COUPLE PIECES 762 00:26:11,836 --> 00:26:15,407 HERE THAT I'M SHOWING IN THE R 763 00:26:15,473 --> 00:26:16,074 OUTPUT. 764 00:26:16,141 --> 00:26:17,876 THE FIRST IS RESIDUAL STANDARD 765 00:26:17,943 --> 00:26:19,277 ERROR. THIS IS THE STANDARD 766 00:26:19,344 --> 00:26:21,680 DEVIATION OF THOSE ERRORS.
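The age-50 prediction mentioned above is just the fitted line evaluated at one point. Here is a minimal Python sketch using the two coefficients as quoted in the talk (intercept 204.15, slope -1.0002); the function name is an assumption made for this example. With these rounded coefficients the result lands within about 0.01 of the figure quoted in the talk, a gap that is presumably down to rounding of the displayed coefficients.

```python
# Coefficients as quoted in the lecture (rounded as displayed).
INTERCEPT = 204.15   # beats per minute at age zero
AGE_SLOPE = -1.0002  # change in beats per minute per one-year increase in age


def predict_max_heart_rate(age):
    """Evaluate the fitted line at a given age (in years)."""
    return INTERCEPT + AGE_SLOPE * age


prediction = predict_max_heart_rate(50)
print(round(prediction, 2))
```

The same caution from earlier applies: this is a cohort-level prediction for someone aged 50, not a statement about how one person's heart rate changes as they age.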
767 00:26:21,746 --> 00:26:23,114 AGAIN, THIS IS THE PIECE OF 768 00:26:23,181 --> 00:26:25,050 THE PUZZLE THAT WE TALKED ABOUT 769 00:26:25,116 --> 00:26:28,086 EARLIER, WHICH IS THAT ALL THOSE 770 00:26:28,153 --> 00:26:31,056 ERRORS SHOULD HAVE THE SAME -- 771 00:26:31,122 --> 00:26:32,157 THERE SHOULD BE UNIFORM 772 00:26:32,223 --> 00:26:33,124 VARIABILITY ACROSS THE ERRORS. 773 00:26:33,191 --> 00:26:35,226 THIS IS 774 00:26:35,293 --> 00:26:36,928 WHERE THAT PIECE COMES IN. 775 00:26:36,995 --> 00:26:39,798 THEY ARE ALL ABLE TO BE WELL 776 00:26:39,864 --> 00:26:41,399 DESCRIBED BY A SINGLE STANDARD DEVIATION 777 00:26:41,466 --> 00:26:42,400 OR VARIABILITY. 778 00:26:42,467 --> 00:26:43,935 THIS GIVES YOU THAT STATISTIC. 779 00:26:44,002 --> 00:26:46,705 WE ALSO HAVE A MULTIPLE R 780 00:26:46,771 --> 00:26:48,573 SQUARED, ADJUSTED R SQUARED. 781 00:26:48,640 --> 00:26:50,141 THE REPORTING OF THIS 782 00:26:50,208 --> 00:26:53,378 TYPE OF THING VARIES BY 783 00:26:53,445 --> 00:26:54,012 SOFTWARE AS WELL. 784 00:26:54,079 --> 00:26:56,247 REFER TO YOUR DOCUMENTATION FOR 785 00:26:56,314 --> 00:26:57,282 WHAT IS SPECIFICALLY BEING 786 00:26:57,349 --> 00:26:58,016 REPORTED THERE. 787 00:26:58,083 --> 00:26:59,985 AND AGAIN, A LOT OF PEOPLE 788 00:27:00,051 --> 00:27:00,919 LIKE TO TALK ABOUT R SQUARED. 789 00:27:00,986 --> 00:27:04,089 I'M NOT GOING TO REALLY SAY TOO 790 00:27:04,155 --> 00:27:06,057 MUCH ABOUT THAT HERE BECAUSE I 791 00:27:06,124 --> 00:27:07,592 DON'T REALLY FIND THEM TO BE 792 00:27:07,659 --> 00:27:08,460 THAT HELPFUL, HONESTLY. 793 00:27:08,526 --> 00:27:10,395 I THINK THEY ARE REALLY 794 00:27:10,462 --> 00:27:12,664 DEPENDENT ON YOUR DATA AND 795 00:27:12,731 --> 00:27:15,367 Ns AND THE NUMBER OF VARIABLES 796 00:27:15,433 --> 00:27:16,601 THAT YOU PUT IN HERE.
797 00:27:16,668 --> 00:27:20,205 ALTHOUGH, IN SOME WAYS ADJUSTED 798 00:27:20,271 --> 00:27:21,206 AND MULTIPLE R SQUAREDS ACCOUNT 799 00:27:21,272 --> 00:27:23,508 FOR THE VARIABLES IN THE MODEL. 800 00:27:23,575 --> 00:27:24,809 ESSENTIALLY WHAT R SQUARED IS 801 00:27:24,876 --> 00:27:27,345 TRYING TO DO IS DESCRIBE THE 802 00:27:27,412 --> 00:27:29,047 PORTION OF VARIANCE EXPLAINED RELATIVE TO 803 00:27:29,114 --> 00:27:32,017 THE VARIANCE LEFT IN THE ERRORS. 804 00:27:32,083 --> 00:27:34,619 SO, HOW WELL DOES THE MODEL 805 00:27:34,686 --> 00:27:37,055 CAPTURE ALL OF THE VARIABILITY 806 00:27:37,122 --> 00:27:37,555 IN THE OUTCOME? 807 00:27:37,622 --> 00:27:41,493 AND I THINK YOU CAN KIND OF SEE 808 00:27:41,559 --> 00:27:43,395 VISUALLY THAT WHILE THERE IS A 809 00:27:43,461 --> 00:27:47,665 NEGATIVE SLOPE HERE, WHILE THERE 810 00:27:47,732 --> 00:27:49,134 IS SOME EXPLANATORY POWER OF 811 00:27:49,200 --> 00:27:52,103 THIS LINE, OBVIOUSLY IT IS 812 00:27:52,170 --> 00:27:54,205 NOT PERFECTLY PREDICTIVE OF A 813 00:27:54,272 --> 00:27:55,774 PERSON'S MAXIMUM HEART RATE 814 00:27:55,840 --> 00:27:56,041 ACHIEVED. 815 00:27:56,107 --> 00:27:58,143 THERE IS SOME OTHER VARIABILITY 816 00:27:58,209 --> 00:27:59,344 THAT IS IN 817 00:27:59,411 --> 00:28:00,078 EACH PERSON'S OBSERVATION THAT 818 00:28:00,145 --> 00:28:02,647 WE ARE NOT ABLE TO CAPTURE WITH 819 00:28:02,714 --> 00:28:04,115 THE SIMPLE MODEL. 820 00:28:04,182 --> 00:28:06,684 BUT I THINK MOST OF US HAVE AN 821 00:28:06,751 --> 00:28:07,886 INTUITIVE UNDERSTANDING WHEN WE 822 00:28:07,952 --> 00:28:09,254 LOOK AT THIS, THERE ARE A LOT 823 00:28:09,320 --> 00:28:10,889 MORE VARIABLES AT PLAY HERE THAT 824 00:28:10,955 --> 00:28:13,491 WOULD BE INFLUENCING A PERSON'S 825 00:28:13,558 --> 00:28:15,360 OBSERVED HEART RATE.
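The residual standard error and the two R squared values mentioned above can be computed from the observed and fitted values alone. The sketch below uses the standard textbook definitions, which may differ in detail from what a given package prints. The data and fitted values are made up for illustration, and `fit_statistics` is a hypothetical helper name.

```python
from math import sqrt


def fit_statistics(observed, fitted, n_predictors):
    """Residual standard error, R squared, and adjusted R squared."""
    n = len(observed)
    mean_observed = sum(observed) / n
    # Variability left over after the model (residual sum of squares).
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    # Total variability of the outcome around its mean.
    tss = sum((o - mean_observed) ** 2 for o in observed)
    df_residual = n - n_predictors - 1
    residual_standard_error = sqrt(rss / df_residual)
    r_squared = 1 - rss / tss
    adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
    return residual_standard_error, r_squared, adjusted_r_squared


# Made-up outcome values and fitted values from a one-predictor line.
observed = [176, 165, 153, 146, 132]
fitted = [175.8, 165.1, 154.4, 143.7, 133.0]

rse, r2, adj_r2 = fit_statistics(observed, fitted, n_predictors=1)
print(rse, r2, adj_r2)
```

The adjusted version shrinks R squared according to the number of predictors, which is the sense in which it "accounts for the variables in the model".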
826 00:28:15,427 --> 00:28:17,862 THERE IS ALSO AN F-STATISTIC AND 827 00:28:17,929 --> 00:28:19,597 A P-VALUE FOR, THIS SHOULD SAY 828 00:28:19,664 --> 00:28:21,633 MODEL, FOR THE MODEL OVERALL. 829 00:28:21,699 --> 00:28:25,904 AGAIN, I DON'T REALLY WANT TO 830 00:28:25,970 --> 00:28:27,038 SPEND TIME TALKING ABOUT IT SPECIFICALLY. 831 00:28:27,105 --> 00:28:28,273 WE WILL TALK ABOUT F-TESTS 832 00:28:28,339 --> 00:28:28,973 LATER. 833 00:28:29,040 --> 00:28:31,776 THIS IS JUST TELLING YOU: IS THE 834 00:28:31,843 --> 00:28:32,811 MODEL USEFUL TO RUN AT ALL? 835 00:28:32,877 --> 00:28:34,579 I THINK HONESTLY THAT HAS MORE 836 00:28:34,646 --> 00:28:36,848 TO DO WITH THE GOALS OF THE 837 00:28:36,915 --> 00:28:38,716 INVESTIGATION THAN IT DOES WITH 838 00:28:38,783 --> 00:28:44,522 F-STATISTICS, BUT -- YEAH. 839 00:28:44,589 --> 00:28:47,225 AND SO FROM THAT MODEL THAT WE 840 00:28:47,292 --> 00:28:48,760 JUST RAN, WE WOULD ALSO CHECK 841 00:28:48,827 --> 00:28:49,060 MODEL FIT. 842 00:28:49,127 --> 00:28:52,363 THIS IS ACTUALLY A SET OF 843 00:28:52,430 --> 00:28:53,031 OUTPUTS FROM R. 844 00:28:53,098 --> 00:28:55,800 R AND SOME OTHER SOFTWARE WILL 845 00:28:55,867 --> 00:28:57,669 HAVE BUILT-IN MODEL CHECKING 846 00:28:57,735 --> 00:28:59,270 THAT YOU COULD DO DEPENDING ON 847 00:28:59,337 --> 00:29:00,705 HOW YOU ACCESS THOSE. 848 00:29:00,772 --> 00:29:02,207 AGAIN, THAT'S GOING TO DEPEND ON 849 00:29:02,273 --> 00:29:02,941 YOUR SOFTWARE. 850 00:29:03,007 --> 00:29:05,743 BUT IN R, YOU CAN PULL UP THESE 851 00:29:05,810 --> 00:29:07,011 THINGS TO INVESTIGATE MODEL FIT. 852 00:29:07,078 --> 00:29:08,713 SO ON THE LEFT, WE WOULD BE 853 00:29:08,780 --> 00:29:13,751 INVESTIGATING THE NORMALITY OF 854 00:29:13,818 --> 00:29:16,287 THE RESIDUALS AND ON THE RIGHT, 855 00:29:16,354 --> 00:29:22,026 IS THERE THAT NICE EVEN 856 00:29:22,093 --> 00:29:24,462 DISTRIBUTION OF RESIDUALS ACROSS 857 00:29:24,529 --> 00:29:26,264 ALL VALUES OF Y.
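The left-hand normality plot described above is a normal quantile-quantile plot of the residuals. As a rough sketch of what such a plot contains, the Python below pairs each sorted, standardized residual with the standard normal quantile it would be expected to match. The residuals are made up, `qq_points` is a hypothetical helper, and the plotting position (i - 0.5) / n is one common convention. Statistical packages may use a slightly different one, so this is not a reproduction of R's output.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair theoretical normal quantiles with standardized, sorted residuals."""
    n = len(residuals)
    center = mean(residuals)
    spread = stdev(residuals)
    standardized = [(r - center) / spread for r in sorted(residuals)]
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))


# Made-up residuals from a small fitted model.
residuals = [0.2, -0.1, -1.4, 2.3, -1.0]

for expected, observed in qq_points(residuals):
    # Points near the line observed == expected suggest roughly normal residuals.
    print(round(expected, 3), round(observed, 3))
```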
858 00:29:26,331 --> 00:29:29,968 SO, ON THE LEFT, AGAIN, WE 859 00:29:30,034 --> 00:29:30,902 TALKED A LITTLE BIT ABOUT 860 00:29:30,969 --> 00:29:34,105 HOW WE'RE NOT REALLY EVER GOING 861 00:29:34,172 --> 00:29:36,674 TO SEE A PERFECT X EQUALS Y 862 00:29:36,741 --> 00:29:36,875 HERE. 863 00:29:36,941 --> 00:29:38,776 WE DO SEE A LITTLE BIT OF THIS 864 00:29:38,843 --> 00:29:39,477 TAILING OFF HERE. 865 00:29:39,544 --> 00:29:40,545 SO MAYBE THERE IS A LITTLE BIT 866 00:29:40,612 --> 00:29:43,214 OF SKEWED BEHAVIOR. 867 00:29:43,281 --> 00:29:45,984 BUT PERHAPS NOT ENOUGH TO REALLY 868 00:29:46,050 --> 00:29:47,051 DO ANYTHING ABOUT, JUST WORTH 869 00:29:47,118 --> 00:29:48,186 NOTING. 870 00:29:48,253 --> 00:29:49,654 AGAIN, I THINK VISUALLY, THE 871 00:29:49,721 --> 00:29:52,457 MODEL THAT WE FIT SORT OF JIBES 872 00:29:52,524 --> 00:29:53,124 WITH WHAT WE WOULD EXPECT TO 873 00:29:53,191 --> 00:29:53,291 SEE. 874 00:29:53,358 --> 00:29:58,062 SO I DON'T THINK THAT WE'RE 875 00:29:58,129 --> 00:30:00,098 SEEING ANY BEHAVIOR UNEXPECTED 876 00:30:00,165 --> 00:30:02,634 OR DUE TO POOR MODEL FIT. 877 00:30:02,700 --> 00:30:04,302 AND THE RESIDUAL BEHAVIOR FOR 878 00:30:04,369 --> 00:30:05,670 VARIANCE IS ALSO PRETTY GOOD. 879 00:30:05,737 --> 00:30:09,140 IT MAY BE MORE CIRCULAR, BUT IT 880 00:30:09,207 --> 00:30:09,641 IS STILL FAIRLY EVEN. 881 00:30:09,707 --> 00:30:10,675 THERE IS A LITTLE BIT MORE 882 00:30:10,742 --> 00:30:11,576 VARIABILITY TOWARDS THE MIDDLE 883 00:30:11,643 --> 00:30:14,746 THAN AT THE END, PERHAPS. 884 00:30:14,812 --> 00:30:16,614 BUT AGAIN, NOTHING SUPER CRAZY 885 00:30:16,681 --> 00:30:24,656 LOOKING HERE FOR MODEL FIT. 886 00:30:24,722 --> 00:30:26,124 SO MOVING ONTO ADDING 887 00:30:26,191 --> 00:30:26,558 COVARIATES. 888 00:30:26,624 --> 00:30:30,995 SO, WE CAN MAKE A MODEL MUCH 889 00:30:31,062 --> 00:30:33,097 MORE COMPLEX BY ADDING 890 00:30:33,164 --> 00:30:34,332 COVARIATES INTO THE REGRESSION.
891 00:30:34,399 --> 00:30:37,468 SO PERHAPS WE WANT TO CONTROL 892 00:30:37,535 --> 00:30:40,205 FOR FASTING BLOOD SUGAR, SERUM 893 00:30:40,271 --> 00:30:46,077 CHOLESTEROL, AND THE SLOPE OF THE ST 894 00:30:46,144 --> 00:30:49,147 SEGMENT, THREE LEVELS: 895 00:30:49,214 --> 00:30:49,814 UPSLOPING, FLAT, AND 896 00:30:49,881 --> 00:30:50,114 DOWNSLOPING. 897 00:30:50,181 --> 00:30:52,083 PERHAPS WE WANT TO ASK ABOUT AGE 898 00:30:52,150 --> 00:30:53,184 WHILE ALSO CONTROLLING FOR THE 899 00:30:53,251 --> 00:30:54,953 OTHER VARIABLES. 900 00:30:55,019 --> 00:30:58,690 IN THIS CASE, FASTING BLOOD 901 00:30:58,756 --> 00:30:59,057 SUGAR IS BINARY. 902 00:30:59,123 --> 00:31:02,794 SPLIT AT 120. 903 00:31:02,860 --> 00:31:06,397 AND SERUM CHOLESTEROL IS A 904 00:31:06,464 --> 00:31:06,798 CONTINUOUS METRIC. 905 00:31:06,864 --> 00:31:09,234 SO WE CAN RUN THAT MODEL. THE OUTPUT 906 00:31:09,300 --> 00:31:10,768 LOOKS SIMILAR, BUT JUST WITH 907 00:31:10,835 --> 00:31:11,502 MORE ROWS, RIGHT? 908 00:31:11,569 --> 00:31:12,937 BECAUSE WE HAVE MORE 909 00:31:13,004 --> 00:31:13,538 COEFFICIENTS. 910 00:31:13,605 --> 00:31:14,606 AND RECALL THAT THE 911 00:31:14,672 --> 00:31:16,007 INTERPRETATION OF THE INTERCEPT 912 00:31:16,074 --> 00:31:17,442 IS HOLDING ALL OF THE OTHER 913 00:31:17,508 --> 00:31:19,177 ITEMS AT ZERO, WHICH MEANS 914 00:31:19,244 --> 00:31:20,411 EACH CATEGORICAL VARIABLE NEEDS TO HAVE A 915 00:31:20,478 --> 00:31:23,081 REFERENCE GROUP. 916 00:31:23,147 --> 00:31:23,915 SO FOR CONTINUOUS VARIABLES THIS 917 00:31:23,982 --> 00:31:26,351 IS ZERO NATURALLY. 918 00:31:26,417 --> 00:31:27,485 AND FOR BINARY VARIABLES OR 919 00:31:27,552 --> 00:31:29,320 CATEGORICAL VARIABLES WE JUST 920 00:31:29,387 --> 00:31:30,521 NEED TO BE CLEAR ABOUT WHAT THE 921 00:31:30,588 --> 00:31:33,258 REFERENCE GROUPS ARE AND PERHAPS 922 00:31:33,324 --> 00:31:33,858 WE ACTUALLY WANT TO SET THOSE 923 00:31:33,925 --> 00:31:35,793 BEFORE WE RUN THE MODEL.
924 00:31:35,860 --> 00:31:36,961 AS FAR AS THE OUTPUT, THE 925 00:31:37,028 --> 00:31:38,563 MISSING ITEM IN THE ROWS IS THE 926 00:31:38,630 --> 00:31:40,398 REFERENCE GROUP. 927 00:31:40,465 --> 00:31:42,267 AND SO, WE SEE TWO ROWS 928 00:31:42,333 --> 00:31:45,036 ACCOUNTED FOR HERE FOR ST SLOPE, 929 00:31:45,103 --> 00:31:46,404 THE DOWNSLOPING AND UPSLOPING 930 00:31:46,471 --> 00:31:50,208 GROUPS. THE REFERENCE IN THIS 931 00:31:50,275 --> 00:31:52,043 CASE WOULD BE FLAT. 932 00:31:52,110 --> 00:31:53,811 FASTING BLOOD SUGAR TRUE, THAT 933 00:31:53,878 --> 00:31:56,547 MEANS IT IS GREATER THAN 120. 934 00:31:56,614 --> 00:31:58,583 THE REFERENCE LEVEL IS LESS THAN 120. 935 00:31:58,650 --> 00:32:00,852 AND AGAIN, FOR AGE AND 936 00:32:00,918 --> 00:32:02,020 CHOLESTEROL, THE 937 00:32:02,086 --> 00:32:04,555 REFERENCE REALLY IS ZERO. 938 00:32:04,622 --> 00:32:05,757 AS FAR AS THAT. 939 00:32:05,823 --> 00:32:07,892 AND BOTH OF THESE ARE NOT 940 00:32:07,959 --> 00:32:09,227 CENTERED 941 00:32:09,294 --> 00:32:12,563 IN THIS MODEL. 942 00:32:12,630 --> 00:32:14,565 AND WE CAN WRITE OUT THIS 943 00:32:14,632 --> 00:32:16,401 EQUATION IN A SIMILAR WAY 944 00:32:16,467 --> 00:32:17,502 THAT WE WOULD WRITE OUT OUR 945 00:32:17,568 --> 00:32:19,737 SIMPLE EQUATION THAT WE LOOKED 946 00:32:19,804 --> 00:32:20,538 AT. 947 00:32:20,605 --> 00:32:22,440 OUR BETA-0 AND BETA-1. 948 00:32:22,507 --> 00:32:24,008 WE ADD ALL OF THE OTHER 949 00:32:24,075 --> 00:32:25,743 VARIABLES IN. 950 00:32:25,810 --> 00:32:27,578 SO WE HAVE A BINARY VARIABLE 951 00:32:27,645 --> 00:32:29,380 THAT IS ATTACHED TO OUR BETA-2 952 00:32:29,447 --> 00:32:32,083 WHICH TELLS US ABOUT THE FASTING 953 00:32:32,150 --> 00:32:33,418 BLOOD SUGAR BEHAVIOR. 954 00:32:33,484 --> 00:32:35,920 WHETHER OR 955 00:32:35,987 --> 00:32:39,257 NOT IT'S GREATER THAN 120.
956 00:32:39,324 --> 00:32:41,592 WE HAVE OUR CONTINUOUS 957 00:32:41,659 --> 00:32:42,794 CHOLESTEROL MEASUREMENT ATTACHED 958 00:32:42,860 --> 00:32:43,761 TO BETA-3. 959 00:32:43,828 --> 00:32:46,564 AND THEN TWO ADDITIONAL 960 00:32:46,631 --> 00:32:48,966 COEFFICIENTS WHICH ARE ATTACHED 961 00:32:49,033 --> 00:32:49,534 TO INDICATOR VARIABLES ABOUT 962 00:32:49,600 --> 00:32:51,169 WHETHER THE ST SLOPE IS 963 00:32:51,235 --> 00:32:54,706 DOWNSLOPING OR UPSLOPING. 964 00:32:54,772 --> 00:32:55,807 AND AGAIN, THE THINGS IN 965 00:32:55,873 --> 00:32:58,476 PARENTHESES HERE, I'M CALLING 966 00:32:58,543 --> 00:32:59,944 THESE INDICATOR VARIABLES: IS IT 967 00:33:00,011 --> 00:33:02,180 THAT LEVEL OR THE REFERENCE? 968 00:33:02,246 --> 00:33:03,581 IF THEY ARE NOT DOWN OR UP, BOTH 969 00:33:03,648 --> 00:33:04,282 OF THESE WOULD BE ZERO. 970 00:33:04,349 --> 00:33:06,951 AND WE WOULD BE TALKING ABOUT 971 00:33:07,018 --> 00:33:10,688 THE REFERENCE, WHICH IS CONTAINED 972 00:33:10,755 --> 00:33:12,757 IN THE INTERCEPT. 973 00:33:12,824 --> 00:33:14,559 SO LET'S JUST STEP THROUGH THE 974 00:33:14,625 --> 00:33:16,127 INTERPRETATION. 975 00:33:16,194 --> 00:33:17,195 SO, WHEN AGE IS ZERO, THE 976 00:33:17,261 --> 00:33:19,063 FASTING BLOOD SUGAR IS LESS THAN 977 00:33:19,130 --> 00:33:19,731 120, 978 00:33:19,797 --> 00:33:22,333 THE CHOLESTEROL IS 979 00:33:22,400 --> 00:33:22,667 ZERO, 980 00:33:22,734 --> 00:33:23,568 AND ST SLOPE IS FLAT, 981 00:33:23,634 --> 00:33:28,673 THE HEART RATE IS ESTIMATED TO 982 00:33:28,740 --> 00:33:29,040 BE 178.9. 983 00:33:29,107 --> 00:33:32,009 FOR AGE, AGAIN, THIS IS SORT OF 984 00:33:32,076 --> 00:33:33,745 SIMILAR TO WHAT WE TALKED ABOUT 985 00:33:33,811 --> 00:33:33,978 BEFORE. 986 00:33:34,045 --> 00:33:36,247 BUT WE HAVE THIS ADDITIONAL 987 00:33:36,314 --> 00:33:38,316 LANGUAGE OF WHEN HOLDING ALL 988 00:33:38,383 --> 00:33:38,850 OTHER VARIABLES CONSTANT.
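Written out, the multi-covariate model described above would look roughly like this. The talk names beta-0 through beta-3 explicitly; labeling the two ST slope coefficients beta-4 and beta-5, and the notation I(.) for a 0/1 indicator, are assumptions made for this sketch.

```latex
\[
  \text{HRmax}_i \;=\; \beta_0
    + \beta_1\,\text{Age}_i
    + \beta_2\, I(\text{FBS}_i > 120)
    + \beta_3\,\text{Chol}_i
    + \beta_4\, I(\text{ST}_i = \text{down})
    + \beta_5\, I(\text{ST}_i = \text{up})
    + \varepsilon_i
\]
% I(.) equals 1 when the condition holds and 0 otherwise.
% For a flat ST slope both ST indicators are 0, so that group is
% absorbed into the intercept beta_0 (the reference group).
```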
989 00:33:38,916 --> 00:33:41,753 SO WHEN WE LOOK AT PEOPLE WITH 990 00:33:41,819 --> 00:33:44,088 SIMILAR OTHER CHARACTERISTICS OF 991 00:33:44,155 --> 00:33:45,390 FASTING BLOOD SUGAR, 992 00:33:45,456 --> 00:33:47,492 CHOLESTEROL, AND ST SLOPE, THE 993 00:33:47,558 --> 00:33:50,762 ESTIMATED HEART RATE DECREASES, 994 00:33:50,828 --> 00:33:53,731 NEGATIVE SLOPE, BY .8649 BEATS 995 00:33:53,798 --> 00:33:56,134 PER MINUTE FOR ONE YEAR INCREASE 996 00:33:56,200 --> 00:33:58,703 IN AGE. 997 00:33:58,770 --> 00:34:01,339 AND THE INTERPRETATIONS FOLLOW 998 00:34:01,406 --> 00:34:04,075 SIMILARLY TO THAT. 999 00:34:04,142 --> 00:34:05,309 SO, THIS IS BINARY. 1000 00:34:05,376 --> 00:34:07,011 THIS FASTING BLOOD SUGAR IS 1001 00:34:07,078 --> 00:34:07,345 BINARY. 1002 00:34:07,412 --> 00:34:09,447 SO AGAIN, COMPARING TO OUR 1003 00:34:09,514 --> 00:34:11,983 REFERENCE GROUP. 1004 00:34:12,049 --> 00:34:13,284 THE MAXIMUM HEART RATE IS 2.0059 1005 00:34:13,351 --> 00:34:18,856 BEATS PER MINUTE HIGHER IN THE 1006 00:34:18,923 --> 00:34:19,857 GREATER THAN 120 MILLIGRAM PER DECILITER 1007 00:34:19,924 --> 00:34:21,926 PATIENTS COMPARED 1008 00:34:21,993 --> 00:34:28,099 TO LESS THAN 120 MILLIGRAM PER 1009 00:34:28,166 --> 00:34:28,533 DECILITER PATIENTS. 1010 00:34:28,599 --> 00:34:30,468 FOR THE CHOLESTEROL, THE 1011 00:34:30,535 --> 00:34:32,603 ESTIMATED HEART RATE INCREASES 1012 00:34:32,670 --> 00:34:35,239 BY .0343 BEATS PER MINUTE FOR 1013 00:34:35,306 --> 00:34:36,474 EVERY ONE UNIT INCREASE IN 1014 00:34:36,541 --> 00:34:36,774 CHOLESTEROL. 1015 00:34:36,841 --> 00:34:38,543 AGAIN, WE CAN ALSO LOOK AT 1016 00:34:38,609 --> 00:34:40,812 SIGNIFICANCE FOR ALL OF THESE, 1017 00:34:40,878 --> 00:34:42,513 BUT THE LAST TWO WERE NOT 1018 00:34:42,580 --> 00:34:42,814 SIGNIFICANT.
1019 00:34:42,880 --> 00:34:47,318 AND THEN WE HAVE THIS 1020 00:34:47,385 --> 00:34:48,986 THREE-LEVEL CATEGORICAL VARIABLE 1021 00:34:49,053 --> 00:34:50,521 THAT IS REPRESENTED BY TWO LINES 1022 00:34:50,588 --> 00:34:52,056 OF OUTPUT HERE. 1023 00:34:52,123 --> 00:34:54,792 WE CAN INTERPRET BOTH OF THOSE 1024 00:34:54,859 --> 00:34:55,960 IN RELATIONSHIP TO THE REFERENCE 1025 00:34:56,027 --> 00:34:57,361 GROUP OF FLAT ST 1026 00:34:57,428 --> 00:34:58,029 SLOPE. 1027 00:34:58,096 --> 00:35:01,732 SO THE ESTIMATED HEART RATE IS 1028 00:35:01,799 --> 00:35:04,068 ROUGHLY FIVE BEATS PER MINUTE 1029 00:35:04,135 --> 00:35:06,037 HIGHER ON AVERAGE FOR ST 1030 00:35:06,103 --> 00:35:07,505 DOWNSLOPES COMPARED TO FLAT ST. 1031 00:35:07,572 --> 00:35:08,072 THIS IS POSITIVE, 1032 00:35:08,139 --> 00:35:11,242 FOR A POSITIVE CHANGE FROM THE 1033 00:35:11,309 --> 00:35:12,376 REFERENCE GROUP. 1034 00:35:12,443 --> 00:35:14,178 AND ROUGHLY 18 BEATS PER 1035 00:35:14,245 --> 00:35:16,581 MINUTE HIGHER FOR ST UPSLOPES 1036 00:35:16,647 --> 00:35:17,782 COMPARED TO FLAT ST AS THE 1037 00:35:17,849 --> 00:35:19,217 REFERENCE GROUP. 1038 00:35:19,283 --> 00:35:20,485 YOU CAN SEE THAT ACTUALLY THAT 1039 00:35:20,551 --> 00:35:21,886 CHANGE IS SIGNIFICANT, THAT IS 1040 00:35:21,953 --> 00:35:24,088 NOTED HERE ON THIS SLIDE. 1041 00:35:24,155 --> 00:35:26,324 WHAT WE CAN DO WHEN WE HAVE 1042 00:35:26,390 --> 00:35:28,726 VARIABLES WITH THREE OR MORE 1043 00:35:28,793 --> 00:35:34,398 LEVELS, WE CAN TALK ABOUT, LIKE 1044 00:35:34,465 --> 00:35:35,867 IN THE ANOVA FRAMEWORK, IF YOU 1045 00:35:35,933 --> 00:35:38,603 WANT TO SAY, F STATISTICS OF THE 1046 00:35:38,669 --> 00:35:40,204 EFFECT OVERALL OF THAT 1047 00:35:40,271 --> 00:35:40,471 VARIABLE.
1048 00:35:40,538 --> 00:35:42,039 THIS CAN BE A USEFUL WAY TO DO 1049 00:35:42,106 --> 00:35:43,908 SOME OF THE REPORTING OF 1050 00:35:43,975 --> 00:35:45,576 VARIABLES THAT HAVE MULTIPLE 1051 00:35:45,643 --> 00:35:46,310 LEVELS. 1052 00:35:46,377 --> 00:35:50,915 BECAUSE NOW, WE CAN SAY THAT ST 1053 00:35:50,982 --> 00:35:53,384 SLOPE OVERALL, WHICH IS TWO 1054 00:35:53,451 --> 00:35:54,085 DEGREES OF FREEDOM, THREE 1055 00:35:54,151 --> 00:35:55,953 LEVELS, IS SIGNIFICANT. WHAT WE 1056 00:35:56,020 --> 00:36:00,091 CAN DO IS, IF WE WANTED ALL 1057 00:36:00,157 --> 00:36:03,661 PAIRWISE COMPARISONS, WE CAN 1058 00:36:03,728 --> 00:36:05,062 STEP ASIDE TO DO PAIRWISE 1059 00:36:05,129 --> 00:36:07,632 COMPARISONS BETWEEN EACH OF 1060 00:36:07,698 --> 00:36:09,000 THESE LEVELS. PERHAPS WE SET UP 1061 00:36:09,066 --> 00:36:10,568 THE REGRESSION SUCH THAT WE 1062 00:36:10,635 --> 00:36:12,403 ONLY CARE ABOUT COMPARING 1063 00:36:12,470 --> 00:36:14,005 EACH OF THESE TO FLAT ST SLOPE 1064 00:36:14,071 --> 00:36:15,706 IN THE WAY THAT I ALREADY 1065 00:36:15,773 --> 00:36:16,541 SPECIFIED IT. 1066 00:36:16,607 --> 00:36:18,075 MAYBE WE ALSO JUST WANT TO 1067 00:36:18,142 --> 00:36:18,576 REPORT THIS ONE. 1068 00:36:18,643 --> 00:36:20,077 DEPENDING ON WHAT YOUR GOALS 1069 00:36:20,144 --> 00:36:26,984 ARE, YOU CAN DO IT EITHER WAY.
1070 00:36:27,051 --> 00:36:29,287 AND FOR THE LAST EXAMPLE I WANT TO 1071 00:36:29,353 --> 00:36:30,087 TALK ABOUT, AGAIN, KIND OF 1072 00:36:30,154 --> 00:36:33,057 GETTING BACK TO A SMALLER MODEL, 1073 00:36:33,124 --> 00:36:36,093 BUT ADDING COMPLEXITY IN TERMS 1074 00:36:36,160 --> 00:36:36,694 OF ADDING AN INTERACTION TERM, 1075 00:36:36,761 --> 00:36:38,963 WHAT WE CAN ASK OF A MODEL, TOO, 1076 00:36:39,030 --> 00:36:40,064 IS: IS THERE A DIFFERENT 1077 00:36:40,131 --> 00:36:42,533 RELATIONSHIP BETWEEN AGE AND 1078 00:36:42,600 --> 00:36:43,301 HEART RATE DEPENDING ON WHETHER 1079 00:36:43,367 --> 00:36:44,235 THE INDIVIDUAL HAS HEART 1080 00:36:44,302 --> 00:36:44,602 DISEASE? 1081 00:36:44,669 --> 00:36:47,972 SO I PUT THE GRAPHS HERE, 1082 00:36:48,039 --> 00:36:48,673 GRAPHED SEPARATELY BY HEART 1083 00:36:48,739 --> 00:36:50,841 DISEASE STATUS FOR THE 1084 00:36:50,908 --> 00:36:51,142 INDIVIDUAL. 1085 00:36:51,208 --> 00:36:52,677 AND AGAIN, DOING A VISUAL 1086 00:36:52,743 --> 00:36:53,311 INSPECTION TO SEE WHAT IS GOING 1087 00:36:53,377 --> 00:36:55,646 ON AND WHETHER WE HAVE BEHAVIOR 1088 00:36:55,713 --> 00:36:58,950 THAT IS EASILY EXPLAINABLE BY A 1089 00:36:59,016 --> 00:36:59,250 REGRESSION. 1090 00:36:59,317 --> 00:37:01,586 WE WRITE THE MODEL AS FOLLOWS. 1091 00:37:01,652 --> 00:37:04,355 SO WE HAVE THE INTERCEPT AS 1092 00:37:04,422 --> 00:37:06,123 BETA-0, OUR BETA-1 AGE, 1093 00:37:06,190 --> 00:37:08,292 OUR BETA-2, WHICH IS ATTACHED TO 1094 00:37:08,359 --> 00:37:09,126 OUR INDICATOR VARIABLE OF 1095 00:37:09,193 --> 00:37:10,861 WHETHER OR NOT A PERSON HAS 1096 00:37:10,928 --> 00:37:11,395 HEART DISEASE. 1097 00:37:11,462 --> 00:37:14,465 AND THEN OUR BETA-3, WHICH IS 1098 00:37:14,532 --> 00:37:18,603 ATTACHED TO THE INTERACTION, THE 1099 00:37:18,669 --> 00:37:19,570 INTERACTION BETWEEN THOSE TWO 1100 00:37:19,637 --> 00:37:19,804 PIECES.
1101 00:37:19,870 --> 00:37:23,040 AND I JUST WANT TO OUTLINE 1102 00:37:23,107 --> 00:37:25,009 SPECIFICALLY HOW THIS GIVES US 1103 00:37:25,076 --> 00:37:26,410 TWO DIFFERENT SLOPES. 1104 00:37:26,477 --> 00:37:29,814 SO, BECAUSE OF THE WAY THAT 1105 00:37:29,880 --> 00:37:31,015 THESE INDICATOR VARIABLES BREAK 1106 00:37:31,082 --> 00:37:33,818 THE EQUATION APART, WE END UP 1107 00:37:33,884 --> 00:37:34,986 WITH TWO EQUATIONS, FOR THOSE 1108 00:37:35,052 --> 00:37:35,586 WITH HEART DISEASE AND FOR THOSE 1109 00:37:35,653 --> 00:37:36,487 WITHOUT HEART DISEASE. 1110 00:37:36,554 --> 00:37:39,590 SO THE EQUATION FOR THOSE WITH 1111 00:37:39,657 --> 00:37:41,459 HEART DISEASE, THESE INDICATOR 1112 00:37:41,525 --> 00:37:42,059 PIECES TURN TO ONE. 1113 00:37:42,126 --> 00:37:47,131 AND SO WE HAVE BETA-0 PLUS 1114 00:37:47,198 --> 00:37:49,166 BETA-1 TIMES AGE PLUS BETA-2 1115 00:37:49,233 --> 00:37:52,069 PLUS BETA-3 TIMES AGE. 1116 00:37:52,136 --> 00:37:54,372 IF WE RESHUFFLE IT USING OUR 1117 00:37:54,438 --> 00:37:57,341 ALGEBRA, WE GET BETA-0 PLUS 1118 00:37:57,408 --> 00:37:59,777 BETA-2, PLUS BETA-1 PLUS BETA-3 1119 00:37:59,844 --> 00:38:00,611 BOTH TIMES AGE. 1120 00:38:00,678 --> 00:38:02,913 WE CAN THINK OF THIS AS BEING 1121 00:38:02,980 --> 00:38:04,448 OUR AGE SLOPE. 1122 00:38:04,515 --> 00:38:05,483 THE EQUATION FOR THOSE WITHOUT 1123 00:38:05,549 --> 00:38:08,352 HEART DISEASE, WE HAVE THE 1124 00:38:08,419 --> 00:38:08,686 ZEROS, 1125 00:38:08,753 --> 00:38:10,888 NOT INDICATING HEART DISEASE. 1126 00:38:10,955 --> 00:38:13,958 AND SO, BOTH BETA-2 AND BETA-3 1127 00:38:14,025 --> 00:38:16,093 DROP OUT OF THE CALCULATION. 1128 00:38:16,160 --> 00:38:18,696 SO WE'RE JUST LEFT WITH BETA-0 1129 00:38:18,763 --> 00:38:21,399 PLUS BETA-1 TIMES AGE.
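Written out, the interaction model and the two group-specific equations just described look roughly as follows. This is a sketch in LaTeX: the symbols HR (maximum heart rate achieved), Age, and HD (the 0/1 heart disease indicator) are notation assumed here, not taken from the slides.

```latex
\begin{align*}
\mathrm{E}[\mathrm{HR}] &= \beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{HD} + \beta_3\,(\mathrm{Age}\times\mathrm{HD}) \\
\mathrm{HD}=1:\quad \mathrm{E}[\mathrm{HR}] &= \beta_0 + \beta_1\,\mathrm{Age} + \beta_2 + \beta_3\,\mathrm{Age}
  = (\beta_0+\beta_2) + (\beta_1+\beta_3)\,\mathrm{Age} \\
\mathrm{HD}=0:\quad \mathrm{E}[\mathrm{HR}] &= \beta_0 + \beta_1\,\mathrm{Age}
\end{align*}
```

So the age slope is beta-1 without heart disease and beta-1 plus beta-3 with heart disease, and beta-3 is the difference between the two slopes.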
1130 00:38:21,465 --> 00:38:24,068 SO HIGHLIGHTED IN RED, WE CAN 1131 00:38:24,135 --> 00:38:24,835 ACTUALLY SEE ALGEBRAICALLY, WE 1132 00:38:24,902 --> 00:38:26,637 HAVE IN THOSE WITHOUT HEART 1133 00:38:26,704 --> 00:38:27,972 DISEASE, A BETA-1 SLOPE FOR AGE. 1134 00:38:28,039 --> 00:38:29,907 AND THOSE WITH HEART DISEASE, WE 1135 00:38:29,974 --> 00:38:32,410 HAVE BETA-1 PLUS SOME OTHER UNIT 1136 00:38:32,476 --> 00:38:33,444 TO DESCRIBE THE CHANGE IN THE 1137 00:38:33,511 --> 00:38:35,746 SLOPE FOR THOSE WHO HAVE HEART 1138 00:38:35,813 --> 00:38:35,980 DISEASE. 1139 00:38:36,047 --> 00:38:38,783 SO THAT'S, THAT'S HOW WE DO IT 1140 00:38:38,849 --> 00:38:40,217 MATHEMATICALLY, THAT IS HOW IT 1141 00:38:40,284 --> 00:38:42,887 WILL APPEAR IN THE OUTPUT. 1142 00:38:42,953 --> 00:38:45,222 WE HAVE BETA-0, BETA-1, BETA-2, 1143 00:38:45,289 --> 00:38:47,658 BETA-3 FOR EACH IN THE ROWS OF 1144 00:38:47,725 --> 00:38:49,994 THE OUTPUT. 1145 00:38:50,061 --> 00:38:50,928 IN R THE INTERACTION TERM IS 1146 00:38:50,995 --> 00:38:52,196 DENOTED WITH THIS LITTLE COLON 1147 00:38:52,263 --> 00:38:53,664 SEPARATING THE TWO VARIABLES. 1148 00:38:53,731 --> 00:38:55,599 AGAIN, THIS MIGHT VARY DEPENDING 1149 00:38:55,666 --> 00:38:56,834 ON YOUR SOFTWARE AND BOTH OF THE 1150 00:38:56,901 --> 00:38:58,936 MAIN EFFECTS. 1151 00:38:59,003 --> 00:39:01,005 WE CAN SEE THE INTERACTION TERM 1152 00:39:01,072 --> 00:39:04,008 IS SIGNIFICANT HERE IN THAT EACH 1153 00:39:04,075 --> 00:39:05,309 GROUP IS WELL ESTIMATED TO 1154 00:39:05,376 --> 00:39:07,478 HAVE THEIR OWN SLOPE. 1155 00:39:07,545 --> 00:39:09,647 THIS IS A BENEFICIAL ESTIMATION 1156 00:39:09,714 --> 00:39:10,014 TO DO. 1157 00:39:10,081 --> 00:39:12,983 THAT CHANGE IN THE SLOPE 1158 00:39:13,050 --> 00:39:15,119 BETWEEN GROUPS IS 0.6911 BEATS 1159 00:39:15,186 --> 00:39:16,120 PER MINUTE PER YEAR. 1160 00:39:16,187 --> 00:39:18,756 YOU CAN SEE IT VISUALLY HERE.
1161 00:39:18,823 --> 00:39:21,392 THAT, THAT THE SLOPE IS LESS 1162 00:39:21,459 --> 00:39:23,360 NEGATIVE IN THOSE WITH HEART 1163 00:39:23,427 --> 00:39:24,161 DISEASE, THAT ACTUALLY THE 1164 00:39:24,228 --> 00:39:25,496 ESTIMATION OF THE SLOPE IS MORE 1165 00:39:25,563 --> 00:39:27,898 POSITIVE IN THOSE WITH HEART 1166 00:39:27,965 --> 00:39:31,736 DISEASE COMPARED TO NO HEART 1167 00:39:31,802 --> 00:39:32,036 DISEASE. 1168 00:39:32,103 --> 00:39:35,539 SO AGAIN, JUST TO BE SPECIFIC, 1169 00:39:35,606 --> 00:39:36,107 IN INDIVIDUALS WITHOUT HEART 1170 00:39:36,173 --> 00:39:37,942 DISEASE, THE MAXIMUM HEART RATE 1171 00:39:38,008 --> 00:39:42,713 ACHIEVED IS ESTIMATED TO BE 1172 00:39:42,780 --> 00:39:42,947 1.0525, 1173 00:39:43,013 --> 00:39:44,682 THIS BETA-1 HERE, LOWER FOR 1174 00:39:44,749 --> 00:39:45,783 EVERY YEAR INCREASE WITH AGE. 1175 00:39:45,850 --> 00:39:48,085 EVERY YEAR INCREASE IN AGE. 1176 00:39:48,152 --> 00:39:49,420 AND INDIVIDUALS WITH HEART 1177 00:39:49,487 --> 00:39:52,089 DISEASE, AGAIN, WE HAVE THIS 1178 00:39:52,156 --> 00:39:56,127 BETA-1 PLUS BETA-3, TO GET OUR 1179 00:39:56,193 --> 00:39:59,663 NEW ESTIMATED SLOPE, AND THIS 1180 00:39:59,730 --> 00:40:01,599 SLOPE IS NEGATIVE 0.3614 BEATS 1181 00:40:01,665 --> 00:40:04,535 PER MINUTE PER YEAR INCREASE IN 1182 00:40:04,602 --> 00:40:04,969 AGE. 1183 00:40:05,035 --> 00:40:08,339 SO, AGAIN, JUST FORMALIZING IT 1184 00:40:08,405 --> 00:40:08,806 HERE. 1185 00:40:08,873 --> 00:40:11,909 AND IN SOME SOFTWARE AND IN R, 1186 00:40:11,976 --> 00:40:13,644 YOU CAN ACTUALLY ASK THE 1187 00:40:13,711 --> 00:40:15,446 SOFTWARE TO GIVE YOU THESE 1188 00:40:15,513 --> 00:40:17,081 NUMBERS SPECIFICALLY SO YOU 1189 00:40:17,148 --> 00:40:19,083 DON'T HAVE TO CALCULATE IT. 1190 00:40:19,150 --> 00:40:21,185 WITH THIS COMES THE CONVENIENCE 1191 00:40:21,252 --> 00:40:22,052 OF ACTUALLY HAVING A SIGNIFICANCE 1192 00:40:22,119 --> 00:40:24,088 TEST FOR EACH SLOPE.
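The hand calculation of the two age slopes can be sketched in a few lines of Python. The age coefficient (-1.0525) and the heart disease slope (-0.3614) are the values quoted in the talk. The interaction estimate is read here as 0.6911, an assumption that is consistent with those two numbers. The function and variable names are hypothetical, chosen only for illustration.

```python
# Coefficients quoted in the lecture, in beats per minute per year of age.
beta_age = -1.0525          # beta-1: age slope when the heart disease indicator is 0
beta_interaction = 0.6911   # beta-3: change in the age slope when the indicator is 1 (assumed reading)


def age_slope(has_heart_disease: bool) -> float:
    """Return the age slope for one group: beta-1 plus beta-3 times the indicator."""
    indicator = 1 if has_heart_disease else 0
    return beta_age + beta_interaction * indicator


slope_without = age_slope(False)
slope_with = age_slope(True)
print(f"without heart disease: {slope_without:.4f}")
print(f"with heart disease:    {slope_with:.4f}")
```

This only reproduces the point estimates. The significance test for each slope still has to come from the fitted model, since it depends on the covariance between the two coefficient estimates.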
1193 00:40:24,155 --> 00:40:25,022 SO HERE FORMALLY, WE CAN SEE 1194 00:40:25,089 --> 00:40:27,725 THAT THE NO HEART DISEASE GROUP 1195 00:40:27,792 --> 00:40:29,426 STILL DOES HAVE THIS 1196 00:40:29,493 --> 00:40:30,027 SIGNIFICANTLY NEGATIVE 1197 00:40:30,094 --> 00:40:30,728 RELATIONSHIP BETWEEN AGE AND HEART 1198 00:40:30,795 --> 00:40:31,328 RATE. 1199 00:40:31,395 --> 00:40:35,099 AND IN THE HEART DISEASE GROUP, 1200 00:40:35,166 --> 00:40:36,967 WHILE ALSO, WHILE BEING 1201 00:40:37,034 --> 00:40:39,270 DIFFERENT, IT ALSO IS NOT 1202 00:40:39,336 --> 00:40:40,571 SIGNIFICANTLY DIFFERENT FROM 1203 00:40:40,638 --> 00:40:40,771 ZERO. 1204 00:40:40,838 --> 00:40:44,074 SO THERE, IN THE HEART DISEASE 1205 00:40:44,141 --> 00:40:44,909 GROUP, THIS NEGATIVE SLOPE IS 1206 00:40:44,975 --> 00:40:46,410 ACTUALLY NOT STATISTICALLY 1207 00:40:46,477 --> 00:40:48,279 SIGNIFICANTLY DIFFERENT FROM ZERO. 1208 00:40:48,345 --> 00:40:50,047 SO, THIS CAN BE A HELPFUL WAY TO 1209 00:40:50,114 --> 00:40:54,451 FRAME THE RESULTS AS WELL. 1210 00:40:54,518 --> 00:40:57,454 SO, I JUST WANTED TO WALK 1211 00:40:57,521 --> 00:40:59,223 THROUGH A COUPLE LIKE MODEL 1212 00:40:59,290 --> 00:41:01,025 BUILDING FRAMEWORK OR POSSIBLE 1213 00:41:01,091 --> 00:41:02,827 ISSUES THAT MAY COME UP TO THINK 1214 00:41:02,893 --> 00:41:06,463 ABOUT WHILE YOU BUILD YOUR 1215 00:41:06,530 --> 00:41:06,697 MODELS. 1216 00:41:06,764 --> 00:41:08,999 SO, WE OBVIOUSLY RAN THROUGH A 1217 00:41:09,066 --> 00:41:11,635 BUNCH OF EXAMPLES, IT CAN BE 1218 00:41:11,702 --> 00:41:12,703 KIND OF OVERWHELMING TO THINK 1219 00:41:12,770 --> 00:41:13,938 ABOUT HOW TO BUILD A MODEL. 1220 00:41:14,004 --> 00:41:15,573 I THINK IT IS REALLY IMPORTANT 1221 00:41:15,639 --> 00:41:16,907 TO LOOK AT YOUR KEY QUESTION OF 1222 00:41:16,974 --> 00:41:17,174 INTEREST.
1223 00:41:17,241 --> 00:41:18,943 BE VERY CLEAR ABOUT THAT IN 1224 00:41:19,009 --> 00:41:20,444 TERMS OF WHAT YOUR OUTCOME 1225 00:41:20,511 --> 00:41:22,847 MEASURE IS. 1226 00:41:22,913 --> 00:41:24,315 DO WE HAVE A KEY PREDICTOR 1227 00:41:24,381 --> 00:41:27,451 VARIABLE OF INTEREST? 1228 00:41:27,518 --> 00:41:28,586 DO WE HAVE ANY CONFOUNDING 1229 00:41:28,652 --> 00:41:31,222 VARIABLES THAT WE KNOW FROM THE 1230 00:41:31,288 --> 00:41:32,223 LITERATURE OR PRIOR RESEARCH 1231 00:41:32,289 --> 00:41:33,591 THAT WE DEFINITELY WANT TO 1232 00:41:33,657 --> 00:41:33,824 INCLUDE? 1233 00:41:33,891 --> 00:41:34,925 THINKING ABOUT ALL OF THOSE 1234 00:41:34,992 --> 00:41:37,127 THINGS IN ADVANCE ARE GOING TO 1235 00:41:37,194 --> 00:41:40,097 BE HELPFUL TO HELP TARGET YOUR 1236 00:41:40,164 --> 00:41:43,200 BUILDING OF THE MODEL. 1237 00:41:43,267 --> 00:41:46,437 YOU SHOULD ASK ARE WE INCLUDING 1238 00:41:46,503 --> 00:41:47,905 ALL RELEVANT VARIABLES AND 1239 00:41:47,972 --> 00:41:50,207 LEANING ON THE LITERATURE 1240 00:41:50,274 --> 00:41:51,508 PERHAPS AND THE UNDERSTANDING OF 1241 00:41:51,575 --> 00:41:52,877 WHAT IS RELATED TO YOUR OUTCOME 1242 00:41:52,943 --> 00:41:55,179 OF INTEREST. 1243 00:41:55,246 --> 00:41:58,949 AND MORE IMPORTANTLY, ARE WE 1244 00:41:59,016 --> 00:41:59,483 COMPARING VARIABLE 1245 00:41:59,550 --> 00:41:59,884 INTERRELATEDNESS. 1246 00:41:59,950 --> 00:42:01,051 WE CAN CAUSE ISSUE WITH MODEL 1247 00:42:01,118 --> 00:42:03,354 FIT WHEN WE HAVE A BUNCH OF X 1248 00:42:03,420 --> 00:42:05,256 VARIABLES THAT ARE ALL HIGHLY 1249 00:42:05,322 --> 00:42:06,357 CORRELATED WITH EACH OTHER. 
1250 00:42:06,423 --> 00:42:08,292 WE DEFINITELY WANT TO MAKE SURE 1251 00:42:08,359 --> 00:42:10,861 THAT WE'RE AVOIDING ANY ISSUES 1252 00:42:10,928 --> 00:42:11,695 WITH ANY OF OUR INDEPENDENT VARIABLES 1253 00:42:11,762 --> 00:42:14,798 BEING TOO HIGHLY CORRELATED OR 1254 00:42:14,865 --> 00:42:16,300 PERHAPS EVEN EXACTLY CORRELATED, 1255 00:42:16,367 --> 00:42:18,302 IN WHICH CASE WE WOULD HAVE ISSUES 1256 00:42:18,369 --> 00:42:19,536 WITH MODEL CONVERGENCE, THE 1257 00:42:19,603 --> 00:42:20,537 MODEL WON'T RUN, OR ESTIMATE 1258 00:42:20,604 --> 00:42:22,072 THOSE ASPECTS OF THE MODEL. 1259 00:42:22,139 --> 00:42:24,275 THOSE ARE ALSO THINGS TO 1260 00:42:24,341 --> 00:42:24,541 CONSIDER. 1261 00:42:24,608 --> 00:42:27,144 FOR A SCENARIO WHERE WE DO HAVE A 1262 00:42:27,211 --> 00:42:29,213 COMPLEX SET OF VARIABLES OR A 1263 00:42:29,280 --> 00:42:30,347 LOT OF VARIABLES, WE CAN 1264 00:42:30,414 --> 00:42:33,083 CONSIDER THINGS SUCH AS STEPWISE 1265 00:42:33,150 --> 00:42:36,453 REGRESSION OR OTHER SORTS OF 1266 00:42:36,520 --> 00:42:37,955 VARIABLE SELECTION TECHNIQUES TO 1267 00:42:38,022 --> 00:42:38,322 BUILD THE MODEL. 1268 00:42:38,389 --> 00:42:40,224 I WOULD SAY MOST OF THE TIME, WE 1269 00:42:40,291 --> 00:42:41,425 DO WANT A LITTLE BIT MORE 1270 00:42:41,492 --> 00:42:42,993 CONTROL OVER THE VARIABLES THAT 1271 00:42:43,060 --> 00:42:43,694 WE INCLUDE. 1272 00:42:43,761 --> 00:42:45,229 SO I WOULD SAY IN MOST CASES THAT 1273 00:42:45,296 --> 00:42:49,066 MIGHT BE KIND OF A LAST RESORT. 1274 00:42:49,133 --> 00:42:50,901 I THINK, IN MOST CASES WE 1275 00:42:50,968 --> 00:42:53,037 KNOW WHAT VARIABLES WE WANT TO 1276 00:42:53,103 --> 00:42:55,072 INCLUDE AND PERHAPS WE'RE ONLY 1277 00:42:55,139 --> 00:42:56,073 LIMITED BY SAMPLE SIZE IN TERMS 1278 00:42:56,140 --> 00:42:58,208 OF BEING ABLE TO HAVE A BIGGER 1279 00:42:58,275 --> 00:42:59,777 MODEL.
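One simple way to screen for the highly correlated predictors mentioned above is to compute pairwise Pearson correlations before fitting. The sketch below uses only the Python standard library. The 0.8 cutoff, the helper names, and the small data set are illustrative assumptions, not values or code from the lecture.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_correlated_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| at or above threshold."""
    names = list(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Made-up illustrative values, not data from the lecture.
predictors = {
    "weight_kg": [60, 72, 81, 95, 68, 77],
    "weight_lb": [132.3, 158.7, 178.6, 209.4, 149.9, 169.8],  # same quantity, different units
    "age": [34, 61, 45, 52, 29, 70],
}
flagged = flag_correlated_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b}: r = {r:.3f}")
```

Pairwise correlation only catches two-variable redundancy. A predictor that is a combination of several others would need a different check, such as variance inflation factors.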
1280 00:42:59,843 --> 00:43:01,145 YOU CAN GO THROUGH ITERATIONS OF 1281 00:43:01,211 --> 00:43:04,048 MODEL BUILDING WITHIN REASON TO 1282 00:43:04,114 --> 00:43:05,516 EXPLORE MODEL FIT AND FINE-TUNE 1283 00:43:05,582 --> 00:43:06,216 WHAT VARIABLES YOU WANT TO 1284 00:43:06,283 --> 00:43:07,251 INCLUDE. 1285 00:43:07,318 --> 00:43:12,890 I THINK MOST OF THE WORK 1286 00:43:12,957 --> 00:43:16,994 THAT WE DO IN NINDS AS FAR AS 1287 00:43:17,061 --> 00:43:18,862 REGRESSION IS CONCERNED, WHEN WE 1288 00:43:18,929 --> 00:43:21,765 GET TO THAT POINT, WE'RE TALKING 1289 00:43:21,832 --> 00:43:22,333 ABOUT EXPLORATORY ANALYSIS. 1290 00:43:22,399 --> 00:43:23,167 IF YOU FEEL YOU ARE ABLE TO LOOK 1291 00:43:23,233 --> 00:43:25,869 AT THE DATA AND THINK ABOUT THE 1292 00:43:25,936 --> 00:43:26,070 MODEL. 1293 00:43:26,136 --> 00:43:29,206 I THINK THE KEY HERE IS TO BE 1294 00:43:29,273 --> 00:43:29,873 MINDFUL THAT YOU SHOULD PROBABLY 1295 00:43:29,940 --> 00:43:32,576 STILL BE REPORTING ALL OF WHAT 1296 00:43:32,643 --> 00:43:34,178 YOU'RE INVESTIGATING, YOU KNOW 1297 00:43:34,244 --> 00:43:35,980 AT THE BOTTOM, IN TERMS OF THE 1298 00:43:36,046 --> 00:43:37,648 THINGS THAT YOU TRIED. 1299 00:43:37,715 --> 00:43:39,016 THE ISSUES THAT YOU RAN INTO. 1300 00:43:39,083 --> 00:43:41,485 WHY YOU INCLUDED VARIABLES AND 1301 00:43:41,552 --> 00:43:42,019 NOT OTHERS. 1302 00:43:42,086 --> 00:43:43,487 THE FIRST MODEL THAT YOU 1303 00:43:43,554 --> 00:43:45,489 SPECIFIED OR THE 50th. 1304 00:43:45,556 --> 00:43:47,024 SO BE MINDFUL THAT YOU DO WANT 1305 00:43:47,091 --> 00:43:49,393 TO SPEAK TO SOME OF THAT AS 1306 00:43:49,460 --> 00:43:49,693 WELL. 1307 00:43:49,760 --> 00:43:51,295 AS FAR AS PUBLISHING RESULTS AND 1308 00:43:51,362 --> 00:43:54,164 COMING UP WITH THE FINAL MODEL.
1309 00:43:54,231 --> 00:43:56,133 AND WE ALSO NEED TO THINK ABOUT 1310 00:43:56,200 --> 00:43:57,668 WHAT OUR SAMPLE SIZE IS AND HOW 1311 00:43:57,735 --> 00:44:00,004 MANY VARIABLES WE CAN INCLUDE. 1312 00:44:00,070 --> 00:44:01,071 THERE ARE A VARIETY OF RULES OF 1313 00:44:01,138 --> 00:44:01,872 THUMB FOR THIS. 1314 00:44:01,939 --> 00:44:04,074 JUST BE MINDFUL, IF YOU DON'T 1315 00:44:04,141 --> 00:44:05,142 HAVE A HUGE SAMPLE SIZE, YOU 1316 00:44:05,209 --> 00:44:06,744 CAN'T RUN A HUGE MODEL. 1317 00:44:06,810 --> 00:44:10,514 AND EVERY VARIABLE YOU ADD IS 1318 00:44:10,581 --> 00:44:11,815 GOING TO PULL POWER AND IF WE 1319 00:44:11,882 --> 00:44:13,450 DON'T HAVE ENOUGH PEOPLE TO WELL 1320 00:44:13,517 --> 00:44:13,717 ESTIMATE EACH 1321 00:44:13,784 --> 00:44:16,086 OF THESE THINGS, IT CAN BE 1322 00:44:16,153 --> 00:44:19,089 DIFFICULT TO HAVE A WELL-BEHAVED 1323 00:44:19,156 --> 00:44:19,656 MODEL. 1324 00:44:19,723 --> 00:44:21,358 SO JUST BE MINDFUL, WE CANNOT 1325 00:44:21,425 --> 00:44:22,693 RUN MODELS WITH 20 VARIABLES 1326 00:44:22,760 --> 00:44:28,165 WHEN WE HAVE 20 PEOPLE, TO BE 1327 00:44:28,232 --> 00:44:29,233 DRAMATIC HERE. 1328 00:44:29,299 --> 00:44:32,169 SO LET'S QUICKLY GO THROUGH SOME 1329 00:44:32,236 --> 00:44:34,071 RESULTS REPORTING FOR EACH OF 1330 00:44:34,138 --> 00:44:34,405 THE EXAMPLES. 1331 00:44:34,471 --> 00:44:36,240 I WANT TO DO THIS. 1332 00:44:36,306 --> 00:44:38,475 WE TALKED ABOUT THE WAYS THAT WE 1333 00:44:38,542 --> 00:44:39,877 CAN SAY SOME OF THESE THINGS, 1334 00:44:39,943 --> 00:44:41,912 BUT I JUST WANT TO BE SPECIFIC 1335 00:44:41,979 --> 00:44:43,514 HOW WE WOULD WRITE THEM. 1336 00:44:43,580 --> 00:44:44,615 I THINK SOMETIMES PEOPLE 1337 00:44:44,681 --> 00:44:45,416 STRUGGLE WITH THIS PART. 1338 00:44:45,482 --> 00:44:48,085 AND AGAIN, SO JUST GOING BACK TO 1339 00:44:48,152 --> 00:44:48,986 THE PREVIOUS EXAMPLE.
1340 00:44:49,053 --> 00:44:50,721 THIS IS OUR SIMPLE LINEAR 1341 00:44:50,788 --> 00:44:51,688 REGRESSION EXAMPLE, WHERE WE HAD 1342 00:44:51,755 --> 00:44:53,157 JUST AGE AND HEART RATE. 1343 00:44:53,223 --> 00:44:54,758 AND HERE ARE OUR LITTLE FIGURES 1344 00:44:54,825 --> 00:44:55,492 FOR THIS. 1345 00:44:55,559 --> 00:44:59,797 SO IN THE METHODS WE CAN WRITE 1346 00:44:59,863 --> 00:45:01,065 SOMETHING LIKE: A SIMPLE LINEAR 1347 00:45:01,131 --> 00:45:02,299 REGRESSION WAS PERFORMED TO 1348 00:45:02,366 --> 00:45:03,667 INVESTIGATE THE RELATIONSHIP 1349 00:45:03,734 --> 00:45:05,369 BETWEEN AGE IN YEARS AND THE 1350 00:45:05,436 --> 00:45:07,171 MAXIMUM HEART RATE ACHIEVED BY 1351 00:45:07,237 --> 00:45:09,573 THE PATIENT IN BEATS PER MINUTE, 1352 00:45:09,640 --> 00:45:12,242 HERE ACTUALLY. 1353 00:45:12,309 --> 00:45:14,178 MODEL CHECKING WAS PERFORMED TO 1354 00:45:14,244 --> 00:45:14,878 CHECK RESIDUAL BEHAVIOR. 1355 00:45:14,945 --> 00:45:18,282 IN THE RESULTS 1356 00:45:18,348 --> 00:45:19,516 SECTION, YOU CAN WRITE SOMETHING 1357 00:45:19,583 --> 00:45:21,218 LIKE: LINEAR REGRESSION SHOWS 1358 00:45:21,285 --> 00:45:22,152 THAT AGE WAS SIGNIFICANTLY 1359 00:45:22,219 --> 00:45:25,756 RELATED TO MAXIMUM HEART 1360 00:45:25,823 --> 00:45:28,092 RATE ACHIEVED WITH A DECLINE OF 1361 00:45:28,158 --> 00:45:31,228 ONE HEARTBEAT PER MINUTE PER YEAR OF 1362 00:45:31,295 --> 00:45:33,664 AGE, P LESS THAN 0.001. 1363 00:45:33,730 --> 00:45:36,767 INVESTIGATIONS OF RESIDUAL 1364 00:45:36,834 --> 00:45:37,301 DISTRIBUTION AND BEHAVIOR 1365 00:45:37,367 --> 00:45:39,670 INDICATED A RELATIVELY NORMAL 1366 00:45:39,736 --> 00:45:40,003 DISTRIBUTION. 1367 00:45:40,070 --> 00:45:41,305 SO, AGAIN, I WOULD SAY THAT WE 1368 00:45:41,371 --> 00:45:43,307 VERY RARELY SEE THIS SORT OF 1369 00:45:43,373 --> 00:45:44,108 THING WRITTEN IN PAPERS. 1370 00:45:44,174 --> 00:45:46,376 I DON'T THINK IT IS A BAD THING.
1371 00:45:46,443 --> 00:45:48,078 I THINK YOU SHOULD PROBABLY BE 1372 00:45:48,145 --> 00:45:48,612 DOING IT. 1373 00:45:48,679 --> 00:45:51,215 YOU SHOULD BE INVESTIGATING 1374 00:45:51,281 --> 00:45:53,884 MODEL FIT, BUT YOU WON'T ALWAYS 1375 00:45:53,951 --> 00:45:54,518 SEE IT SPECIFICALLY STATED THAT IT WAS 1376 00:45:54,585 --> 00:45:55,652 DONE HERE. 1377 00:45:55,719 --> 00:45:57,254 I WOULD SAY, ESPECIALLY WHEN IT 1378 00:45:57,321 --> 00:46:00,624 IS INTUITIVE TO DO SO, 1379 00:46:00,691 --> 00:46:01,291 PRESENTATION MIGHT BE HELPFUL. 1380 00:46:01,358 --> 00:46:02,359 IF THE MODEL IS LONG, WHICH IN 1381 00:46:02,426 --> 00:46:04,628 THIS CASE IT IS NOT, YOU CAN 1382 00:46:04,695 --> 00:46:06,063 ALSO INCLUDE A TABLE OF THE 1383 00:46:06,130 --> 00:46:07,364 ESTIMATES AND THE STANDARD 1384 00:46:07,431 --> 00:46:09,933 ERRORS AND REFER TO IT, VERSUS 1385 00:46:10,000 --> 00:46:11,368 HAVING IT ALL WRITTEN OUT IN TEXT. 1386 00:46:11,435 --> 00:46:13,170 BUT WE WILL GET TO THAT, TOO. 1387 00:46:13,237 --> 00:46:15,739 BUT IN SOME CASES A FIGURE MAY 1388 00:46:15,806 --> 00:46:19,143 ALSO BE HELPFUL WITH THE MODEL 1389 00:46:19,209 --> 00:46:19,910 FIT ATTACHED. 1390 00:46:19,977 --> 00:46:22,613 AND AGAIN, HERE I'M SHOWING THE 1391 00:46:22,679 --> 00:46:25,149 ESTIMATE AND ERROR. 1392 00:46:25,215 --> 00:46:26,350 YOU CAN ALSO SHOW CONFIDENCE 1393 00:46:26,416 --> 00:46:27,217 INTERVALS IF YOU LIKE. 1394 00:46:27,284 --> 00:46:29,386 BUT SOME MEASURE OF VARIABILITY. 1395 00:46:29,453 --> 00:46:31,155 AND CONFIDENCE INTERVALS CAN BE 1396 00:46:31,221 --> 00:46:34,191 EXTRACTED FROM THIS DATA.
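As a rough illustration of extracting a confidence interval from a reported estimate and standard error, the sketch below uses the large-sample normal approximation (estimate plus or minus 1.96 standard errors). The slope of -1.05 echoes the value quoted in the talk, but the standard error of 0.10 is a made-up placeholder, and the function name is hypothetical.

```python
def normal_ci(estimate, std_error, z=1.96):
    """Approximate 95% confidence interval: estimate +/- z * standard error."""
    margin = z * std_error
    return estimate - margin, estimate + margin


# Placeholder inputs: the standard error is invented for illustration only.
slope = -1.05
slope_se = 0.10
lower, upper = normal_ci(slope, slope_se)
print(f"slope {slope:.2f}, approximate 95% CI ({lower:.3f}, {upper:.3f})")
```

With small samples, the multiplier should come from the t distribution with the model's residual degrees of freedom rather than 1.96. Statistical software typically reports that interval directly.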
1397 00:46:34,258 --> 00:46:38,228 SO IN OUR ADDED COVARIATE 1398 00:46:38,295 --> 00:46:40,264 EXAMPLE, WE HAVE MORE COVARIATES 1399 00:46:40,330 --> 00:46:43,233 TO TALK ABOUT, WE WOULD 1400 00:46:43,300 --> 00:46:46,170 ENUMERATE ALL OF THOSE THINGS IN 1401 00:46:46,236 --> 00:46:47,771 THE METHODS: A LINEAR REGRESSION 1402 00:46:47,838 --> 00:46:48,906 WAS PERFORMED TO LOOK AT THE AGE 1403 00:46:48,972 --> 00:46:52,075 IN YEARS AND THE MAXIMUM HEART 1404 00:46:52,142 --> 00:46:54,444 RATE ACHIEVED BY PATIENTS. ADDITIONAL 1405 00:46:54,511 --> 00:46:56,313 COVARIATES WERE FASTING BLOOD 1406 00:46:56,380 --> 00:46:58,615 SUGAR, GREATER OR LESS THAN 1407 00:46:58,682 --> 00:47:01,018 120 MILLIGRAMS PER DECILITER, 1408 00:47:01,084 --> 00:47:02,252 AND CHOLESTEROL, CONTINUOUS. 1409 00:47:02,319 --> 00:47:04,254 I GUESS, THAT WOULD BE IMPLIED 1410 00:47:04,321 --> 00:47:09,660 BY THE WAY I WROTE THAT. 1411 00:47:09,726 --> 00:47:11,895 AND ST SLOPE TYPE. 1412 00:47:11,962 --> 00:47:15,732 AND MODEL CHECKING WAS PERFORMED. 1413 00:47:15,799 --> 00:47:17,734 I DIDN'T SHOW THAT ON THE 1414 00:47:17,801 --> 00:47:19,169 PRESENTATION, BUT I DID IT. 1415 00:47:19,236 --> 00:47:21,238 SO IN THE RESULTS WE WOULD WRITE: 1416 00:47:21,305 --> 00:47:22,239 LINEAR REGRESSION SHOWED THAT 1417 00:47:22,306 --> 00:47:23,807 AFTER CONTROLLING FOR OTHER 1418 00:47:23,874 --> 00:47:26,777 VARIABLES, AGE WAS 1419 00:47:26,843 --> 00:47:30,747 SIGNIFICANTLY RELATED TO MAXIMUM 1420 00:47:30,814 --> 00:47:31,848 HEART RATE ACHIEVED WITH A DECLINE 1421 00:47:31,915 --> 00:47:36,086 OF 0.86, NEGATIVE SLOPE, 1422 00:47:36,153 --> 00:47:37,254 STANDARD ERROR INCLUDED, FOR 1423 00:47:37,321 --> 00:47:39,223 EVERY YEAR IN PATIENT AGE. 1424 00:47:39,289 --> 00:47:40,324 ST SLOPE WAS SIGNIFICANTLY 1425 00:47:40,390 --> 00:47:41,992 RELATED TO HEART RATE. 1426 00:47:42,059 --> 00:47:46,129 THIS IS WHERE I'M PULLING IN THE 1427 00:47:46,196 --> 00:47:51,168 OTHER TABLE THAT I SHOWED.
1428 00:47:51,235 --> 00:47:52,869 IT MAKES IT NICER TO WRITE. 1429 00:47:52,936 --> 00:47:54,071 SO ST SLOPE WAS RELATED TO THE 1430 00:47:54,137 --> 00:47:55,405 HEART RATE. 1431 00:47:55,472 --> 00:47:57,507 AND THEN, WE CAN GO ON TO SAY, 1432 00:47:57,574 --> 00:48:02,813 SPECIFICALLY PATIENTS WITH AN 1433 00:48:02,879 --> 00:48:03,947 UPSLOPING ST SEGMENT SHOWED 18.8 1434 00:48:04,014 --> 00:48:05,882 BEATS HIGHER HEART RATE, FROM 1435 00:48:05,949 --> 00:48:07,818 THIS GUY, THAN THOSE WITH FLAT 1436 00:48:07,884 --> 00:48:09,152 SEGMENTS ON AVERAGE. 1437 00:48:09,219 --> 00:48:10,220 AND, AGAIN, INCLUDING THE 1438 00:48:10,287 --> 00:48:12,823 STANDARD ERROR 1439 00:48:12,889 --> 00:48:14,958 AND THE P-VALUE. 1440 00:48:15,025 --> 00:48:17,527 FBS AND CHOLESTEROL WERE NOT 1441 00:48:17,594 --> 00:48:18,629 SIGNIFICANTLY RELATED TO HEART 1442 00:48:18,695 --> 00:48:19,329 RATE. 1443 00:48:19,396 --> 00:48:20,631 INVESTIGATIONS OF RESIDUAL 1444 00:48:20,697 --> 00:48:22,332 DISTRIBUTION AND BEHAVIOR 1445 00:48:22,399 --> 00:48:26,970 INDICATED A RELATIVELY NORMAL 1446 00:48:27,037 --> 00:48:27,404 DISTRIBUTION. 1447 00:48:27,471 --> 00:48:29,840 THEN FOR THE INTERACTION EXAMPLE 1448 00:48:29,906 --> 00:48:32,075 THAT WE RAN THROUGH, AGAIN, WE 1449 00:48:32,142 --> 00:48:33,710 WANT TO JUST BE CLEAR HOW WE'RE 1450 00:48:33,777 --> 00:48:35,512 SPECIFYING THE PIECES OF THE 1451 00:48:35,579 --> 00:48:36,213 MODEL AND INCLUDING THE 1452 00:48:36,280 --> 00:48:37,180 INTERACTION TERM. 1453 00:48:37,247 --> 00:48:40,884 SO WE WOULD SAY: TO INVESTIGATE 1454 00:48:40,951 --> 00:48:42,586 WHETHER THIS RELATIONSHIP VARIES 1455 00:48:42,653 --> 00:48:44,321 BY HEART DISEASE DIAGNOSIS, AN 1456 00:48:44,388 --> 00:48:44,988 INTERACTION TERM FOR HEART 1457 00:48:45,055 --> 00:48:50,260 DISEASE WAS ALSO INCLUDED IN THE 1458 00:48:50,327 --> 00:48:50,460 MODEL. 1459 00:48:50,527 --> 00:48:53,997 AND WE WOULD REPORT THE RESULTS 1460 00:48:54,064 --> 00:48:54,998 AS FOLLOWS.
1461 00:48:55,065 --> 00:48:56,233 LINEAR REGRESSION SHOWED THAT 1462 00:48:56,300 --> 00:48:57,834 THE INTERACTION BETWEEN AGE AND 1463 00:48:57,901 --> 00:48:58,435 HEART DISEASE WAS SIGNIFICANT 1464 00:48:58,502 --> 00:49:00,604 FOR PREDICTING HEART RATE. 1465 00:49:00,671 --> 00:49:02,205 AND FIRST, I'M CHOOSING TO 1466 00:49:02,272 --> 00:49:03,073 REPORT THE ACTUAL ESTIMATED 1467 00:49:03,140 --> 00:49:03,573 CHANGE IN SLOPE. 1468 00:49:03,640 --> 00:49:05,676 THIS IS THE FIRST PIECE THAT 1469 00:49:05,742 --> 00:49:09,079 TELLS US THAT THE SLOPE, THE 1470 00:49:09,146 --> 00:49:09,913 INTERACTION IS SIGNIFICANT. 1471 00:49:09,980 --> 00:49:12,582 THEN I CAN SAY, SPECIFICALLY 1472 00:49:12,649 --> 00:49:14,117 WHILE INDIVIDUALS WITH HEART 1473 00:49:14,184 --> 00:49:15,686 DISEASE SHOWED NO RELATIONSHIP 1474 00:49:15,752 --> 00:49:17,888 BETWEEN AGE AND HEART RATE, 1475 00:49:17,954 --> 00:49:19,156 REPORTING THAT NONSIGNIFICANT 1476 00:49:19,222 --> 00:49:20,324 SLOPE HERE, FROM EITHER YOUR 1477 00:49:20,390 --> 00:49:21,525 CALCULATION OR THE ONE THAT YOU 1478 00:49:21,591 --> 00:49:25,095 CAN EXTRACT FROM THE SOFTWARE, 1479 00:49:25,162 --> 00:49:26,897 INDIVIDUALS WITHOUT HEART 1480 00:49:26,963 --> 00:49:29,766 DISEASE SHOWED A SIGNIFICANT 1481 00:49:29,833 --> 00:49:33,036 DECLINE OF 1.05 BEATS PER MINUTE 1482 00:49:33,103 --> 00:49:34,037 PER YEAR OF AGING. 1483 00:49:34,104 --> 00:49:36,373 AND REPORTING THAT SLOPE FOR 1484 00:49:36,440 --> 00:49:37,207 THAT GROUP. 1485 00:49:37,274 --> 00:49:39,576 SO AGAIN REPORTING EACH SLOPE 1486 00:49:39,643 --> 00:49:44,581 SEPARATELY FOR EACH GROUP. 1487 00:49:44,648 --> 00:49:46,383 AND INVESTIGATIONS SHOWED A NORMAL 1488 00:49:46,450 --> 00:49:47,751 DISTRIBUTION. 1489 00:49:47,818 --> 00:49:49,186 I WOULD SAY IN INTERACTION 1490 00:49:49,252 --> 00:49:51,254 CASES, A FIGURE IS USUALLY VERY 1491 00:49:51,321 --> 00:49:52,889 USEFUL FOR HELPING PEOPLE 1492 00:49:52,956 --> 00:49:53,190 UNDERSTAND.
1493 00:49:53,256 --> 00:49:55,559 I MEAN, I EVEN WISH I HAD 1494 00:49:55,625 --> 00:49:57,260 IT HERE, EVEN JUST TO 1495 00:49:57,327 --> 00:49:58,929 DEMONSTRATE THE SLOPES. IT CAN BE A 1496 00:49:58,995 --> 00:50:00,097 LOT EASIER TO SEE THE DATA AND 1497 00:50:00,163 --> 00:50:01,965 UNDERSTAND WHAT WE'RE TALKING 1498 00:50:02,032 --> 00:50:02,399 ABOUT 1499 00:50:02,466 --> 00:50:04,368 THAN JUST READING THE NUMBERS. 1500 00:50:04,434 --> 00:50:06,970 SO, I WOULD ALWAYS ADVOCATE FOR 1501 00:50:07,037 --> 00:50:12,342 USING FIGURES WHEN POSSIBLE. 1502 00:50:12,409 --> 00:50:15,312 SO NOW, JUST TO WALK THROUGH 1503 00:50:15,379 --> 00:50:16,413 SOME OTHER COMMON TESTS THAT 1504 00:50:16,480 --> 00:50:18,615 HAVE A RELATIONSHIP OR HAVE 1505 00:50:18,682 --> 00:50:22,619 QUESTIONS OF RELATIONSHIPS TO 1506 00:50:22,686 --> 00:50:22,919 REGRESSION. 1507 00:50:22,986 --> 00:50:24,020 STARTING WITH CORRELATION. 1508 00:50:24,087 --> 00:50:25,922 SO THIS COMES UP A LOT WHERE 1509 00:50:25,989 --> 00:50:26,990 PEOPLE ARE CONFUSED ABOUT 1510 00:50:27,057 --> 00:50:28,558 WHETHER THEY WANT TO USE A 1511 00:50:28,625 --> 00:50:30,861 CORRELATION OR A REGRESSION. 1512 00:50:30,927 --> 00:50:32,763 SO FOR CORRELATION, WE'RE 1513 00:50:32,829 --> 00:50:33,663 PRIMARILY INTERESTED IN THE 1514 00:50:33,730 --> 00:50:35,832 STRENGTH AND DIRECTION OF THE 1515 00:50:35,899 --> 00:50:37,467 RELATIONSHIP IN A COMMUTATIVE 1516 00:50:37,534 --> 00:50:39,803 WAY, NOT REALLY IN TERMS OF ONE 1517 00:50:39,870 --> 00:50:40,971 PREDICTING THE OTHER. 1518 00:50:41,037 --> 00:50:42,239 MAYBE WE JUST WANT TO KNOW IF 1519 00:50:42,305 --> 00:50:44,941 THEY ARE CORRELATED WITH EACH 1520 00:50:45,008 --> 00:50:45,142 OTHER. 1521 00:50:45,208 --> 00:50:47,611 AND AGAIN, FOR THIS, 1522 00:50:47,677 --> 00:50:49,546 BOTH VARIABLES MUST BE 1523 00:50:49,613 --> 00:50:52,082 CONTINUOUS, AND WE WOULD ONLY BE 1524 00:50:52,149 --> 00:50:52,916 INVESTIGATING TWO VARIABLES.
1525 00:50:52,983 --> 00:50:53,984 YOU CAN OBVIOUSLY DO 1526 00:50:54,050 --> 00:50:56,319 CORRELATIONS IN MANY PAIRS OF 1527 00:50:56,386 --> 00:50:58,088 VARIABLES, BUT THIS WOULD ONLY 1528 00:50:58,155 --> 00:51:04,828 ANSWER THE QUESTION ABOUT TWO AT 1529 00:51:04,895 --> 00:51:06,263 A TIME. 1530 00:51:06,329 --> 00:51:07,764 T-TESTS ARE EQUIVALENT TO 1531 00:51:07,831 --> 00:51:12,569 REGRESSION WHEN WE HAVE A SINGLE 1532 00:51:12,636 --> 00:51:13,170 BINARY INDEPENDENT VARIABLE. 1533 00:51:13,236 --> 00:51:14,704 SO MAYBE THIS DOESN'T SEEM LIKE 1534 00:51:14,771 --> 00:51:17,140 A QUESTION ANYBODY WOULD ASK. 1535 00:51:17,207 --> 00:51:19,409 BUT IN A SENSE WHERE PERHAPS YOU 1536 00:51:19,476 --> 00:51:20,811 DO HAVE TWO GROUPS AND ONE 1537 00:51:20,877 --> 00:51:22,012 CONTINUOUS MEASURE AND YOU WANT 1538 00:51:22,078 --> 00:51:23,847 TO KNOW IF IT IS DIFFERENT FOR 1539 00:51:23,914 --> 00:51:26,817 EACH OF YOUR GROUPS, YOU MIGHT 1540 00:51:26,883 --> 00:51:27,651 JUST THINK BY DEFAULT TO RUN A 1541 00:51:27,717 --> 00:51:27,884 T-TEST. 1542 00:51:27,951 --> 00:51:30,387 IF YOU THINK THAT EVENTUALLY 1543 00:51:30,454 --> 00:51:31,955 YOU'RE GOING TO WANT TO RUN 1544 00:51:32,022 --> 00:51:33,623 REGRESSION AND CONTROL FOR OTHER 1545 00:51:33,690 --> 00:51:34,391 VARIABLES IN THERE, YOU SHOULD 1546 00:51:34,458 --> 00:51:37,127 PROBABLY JUST PUT YOURSELF IN A 1547 00:51:37,194 --> 00:51:39,830 REGRESSION FRAMEWORK FROM THE 1548 00:51:39,896 --> 00:51:40,096 BEGINNING. 1549 00:51:40,163 --> 00:51:41,298 SO IF YOU'RE ONLY REALLY 1550 00:51:41,364 --> 00:51:43,700 INTERESTED IN THE TWO VARIABLES 1551 00:51:43,767 --> 00:51:44,000 TOGETHER. 1552 00:51:44,067 --> 00:51:44,501 A SIMPLE T-TEST IS FINE. 
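The equivalence between a t-test and a regression on a single binary predictor can be checked numerically. The sketch below, in plain Python, compares the pooled (equal-variance) two-sample t statistic with the t statistic for the slope of a least-squares fit on a 0/1 indicator. The heart rate values and function names are made up for illustration and do not come from the lecture data.

```python
import math


def pooled_t(group0, group1):
    """Equal-variance two-sample t statistic for mean(group1) - mean(group0)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


def ols_slope_and_t(x, y):
    """Slope and its t statistic from a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    std_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_error


# Made-up heart rates for two groups (illustrative only).
no_disease = [150, 142, 160, 155, 148]
disease = [138, 131, 145, 129, 140]

x = [0] * len(no_disease) + [1] * len(disease)   # binary group indicator
y = no_disease + disease

t_test_stat = pooled_t(no_disease, disease)
slope, regression_t = ols_slope_and_t(x, y)
print(f"difference in means (slope): {slope:.2f}")
print(f"t-test t = {t_test_stat:.4f}, regression t = {regression_t:.4f}")
```

The slope equals the difference in group means and the two t statistics agree, which is why starting in a regression framework costs nothing if more covariates may be added later. Note that this matches the equal-variance t-test, not Welch's unequal-variance version.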
1553 00:51:44,568 --> 00:51:46,403 IF YOU THINK YOU'RE USING THAT 1554 00:51:46,470 --> 00:51:49,072 AND WANT TO BUILD ON IT OR ADD 1555 00:51:49,139 --> 00:51:52,242 INTERACTION TERMS AND ADD SOME 1556 00:51:52,309 --> 00:51:54,077 COVARIATES OR POTENTIAL 1557 00:51:54,144 --> 00:51:55,212 CONFOUNDING FACTORS THEN MAYBE 1558 00:51:55,278 --> 00:51:56,213 WE JUST WANT TO BE IN A 1559 00:51:56,279 --> 00:52:01,218 REGRESSION FRAMEWORK FROM THE 1560 00:52:01,284 --> 00:52:01,418 START. 1561 00:52:01,485 --> 00:52:03,553 AND LASTLY, ANOVA, AND I 1562 00:52:03,620 --> 00:52:05,422 GUESS, WHAT I WANT TO SAY ABOUT 1563 00:52:05,489 --> 00:52:07,824 THAT, IS BASICALLY THAT ANOVA IS 1564 00:52:07,891 --> 00:52:08,758 A SPECIAL CASE OF LINEAR 1565 00:52:08,825 --> 00:52:09,559 REGRESSION. 1566 00:52:09,626 --> 00:52:11,461 THEY ARE REALLY THE SAME. 1567 00:52:11,528 --> 00:52:12,963 THEY ARE DOING THE SAME THING. 1568 00:52:13,029 --> 00:52:15,298 IN ANOVA, THE INDEPENDENT 1569 00:52:15,365 --> 00:52:17,501 VARIABLE CAN ONLY BE 1570 00:52:17,567 --> 00:52:18,068 CATEGORICAL. 1571 00:52:18,134 --> 00:52:20,170 AND IN ALL OTHER CASES YOU'RE 1572 00:52:20,237 --> 00:52:22,739 BASICALLY DOING REGRESSION. 1573 00:52:22,806 --> 00:52:24,774 SO IF YOU'RE GOING TO LEARN HOW 1574 00:52:24,841 --> 00:52:27,611 TO DO ANYTHING, I WOULD 1575 00:52:27,677 --> 00:52:28,311 RECOMMEND UNDERSTANDING 1576 00:52:28,378 --> 00:52:29,212 REGRESSION. 1577 00:52:29,279 --> 00:52:31,681 BECAUSE YOU CAN RUN ANOVAS WHEN 1578 00:52:31,748 --> 00:52:32,616 YOU UNDERSTAND REGRESSION. 1579 00:52:32,682 --> 00:52:33,550 WHEN YOU LEARN ANOVA, YOU 1580 00:52:33,617 --> 00:52:36,086 ONLY KNOW HOW TO DO ANOVA.
1581 00:52:36,152 --> 00:52:38,755 SO IT IS MORE 1582 00:52:38,822 --> 00:52:39,523 GENERALIZABLE AND MORE FLEXIBLE 1583 00:52:39,589 --> 00:52:40,223 TO UNDERSTAND REGRESSION. 1584 00:52:40,290 --> 00:52:42,392 FORMALLY, LIKE I SAID, ANOVA IS 1585 00:52:42,459 --> 00:52:46,730 JUST A SPECIAL CASE OF LINEAR 1586 00:52:46,796 --> 00:52:47,030 REGRESSION. 1587 00:52:47,097 --> 00:52:51,268 AND WE HAVE, WELL I GUESS, WHAT I 1588 00:52:51,334 --> 00:52:53,136 WOULD CALL EXTENSIONS OF 1589 00:52:53,203 --> 00:52:55,138 REGRESSION. LOGISTIC REGRESSION 1590 00:52:55,205 --> 00:52:55,739 WOULD USE A BINARY OUTCOME 1591 00:52:55,805 --> 00:52:55,972 MEASURE. 1592 00:52:56,039 --> 00:52:56,873 AND WE'LL BE TALKING ABOUT THAT 1593 00:52:56,940 --> 00:53:01,211 IN A FUTURE LECTURE FOR THIS 1594 00:53:01,278 --> 00:53:01,444 SERIES. 1595 00:53:01,511 --> 00:53:02,979 POISSON REGRESSION IS USING 1596 00:53:03,046 --> 00:53:07,984 A COUNT OUTCOME MEASURE. 1597 00:53:08,051 --> 00:53:09,686 SO, POSITIVE WHOLE NUMBERS. 1598 00:53:09,753 --> 00:53:11,154 COX REGRESSION USES SURVIVAL 1599 00:53:11,221 --> 00:53:11,555 OUTCOME MEASURES. 1600 00:53:11,621 --> 00:53:15,025 AND THEN LINEAR MIXED MODELS, 1601 00:53:15,091 --> 00:53:17,394 WHICH IS BASICALLY REGRESSION 1602 00:53:17,460 --> 00:53:18,728 WITH INTERCORRELATED DATA. 1603 00:53:18,795 --> 00:53:20,530 SO MAYBE WE HAVE REPEATED 1604 00:53:20,597 --> 00:53:24,668 MEASURES OVER TIME OR CLUSTERED 1605 00:53:24,734 --> 00:53:26,136 DATA OR LONGITUDINAL DATA, SO A 1606 00:53:26,202 --> 00:53:27,704 DIFFERENT TYPE OF MODEL WOULD BE 1607 00:53:27,771 --> 00:53:29,205 USED TO ANSWER QUESTIONS FOR 1608 00:53:29,272 --> 00:53:35,478 OUTCOME MEASURES OF THAT TYPE.
1609 00:53:35,545 --> 00:53:37,080 SO IN SUMMARY, LINEAR REGRESSION 1610 00:53:37,147 --> 00:53:40,550 IS A USEFUL AND FLEXIBLE WAY TO 1611 00:53:40,617 --> 00:53:42,319 UNDERSTAND THE PREDICTIVE, BUT 1612 00:53:42,385 --> 00:53:42,953 NOT CAUSAL, BEHAVIOR BETWEEN ONE 1613 00:53:43,019 --> 00:53:44,120 OR MORE VARIABLES AND A 1614 00:53:44,187 --> 00:53:47,023 CONTINUOUS OUTCOME MEASURE. 1615 00:53:47,090 --> 00:53:48,625 AND AS WE DISCUSSED, YOU MUST 1616 00:53:48,692 --> 00:53:49,859 MEET CERTAIN ASSUMPTIONS FOR THE 1617 00:53:49,926 --> 00:53:50,827 MODEL TO BE CONSIDERED 1618 00:53:50,894 --> 00:53:51,695 APPROPRIATE TO USE. 1619 00:53:51,761 --> 00:53:54,297 WE NEED TO CHECK THOSE 1620 00:53:54,364 --> 00:53:55,432 ASSUMPTIONS, SOMETIMES AT 1621 00:53:55,498 --> 00:53:57,434 THE START BY UNDERSTANDING OUR 1622 00:53:57,500 --> 00:54:01,171 DESIGN, WHETHER WE HAVE 1623 00:54:01,237 --> 00:54:01,771 INDEPENDENCE, 1624 00:54:01,838 --> 00:54:04,040 AND WHETHER WE HAVE A 1625 00:54:04,107 --> 00:54:06,109 LINEAR BEHAVIOR TO DESCRIBE, BUT 1626 00:54:06,176 --> 00:54:07,177 ALSO WE CHECK ASSUMPTIONS AFTER 1627 00:54:07,243 --> 00:54:12,816 RUNNING THE MODEL, WHICH IS -- 1628 00:54:12,882 --> 00:54:14,851 WHEN WE OBTAIN RESIDUALS FROM 1629 00:54:14,918 --> 00:54:16,453 OUR RUN MODEL, ARE THEY BEHAVING 1630 00:54:16,519 --> 00:54:18,588 IN A WAY THAT MEETS THE ASSUMPTIONS 1631 00:54:18,655 --> 00:54:22,592 OF REGRESSION, WHICH IS THE 1632 00:54:22,659 --> 00:54:23,693 NORMAL DISTRIBUTION OF 1633 00:54:23,760 --> 00:54:24,327 RESIDUALS. 1634 00:54:24,394 --> 00:54:25,528 ADDITIONAL VARIABLES CAN BE 1635 00:54:25,595 --> 00:54:27,864 ADDED AS NEEDED TO ELABORATE ON 1636 00:54:27,931 --> 00:54:29,232 A RELATIONSHIP BETWEEN, YOU 1637 00:54:29,299 --> 00:54:32,002 KNOW, X AND Y OR INVESTIGATE 1638 00:54:32,068 --> 00:54:33,803 INTERACTIONS OR ADDRESS CONFOUNDING 1639 00:54:33,870 --> 00:54:34,070 BEHAVIORS.
1640 00:54:34,137 --> 00:54:36,673 SO THIS CAN OBVIOUSLY INCREASE IN 1641 00:54:36,740 --> 00:54:38,074 COMPLEXITY VERY QUICKLY. 1642 00:54:38,141 --> 00:54:40,877 VISUAL -- I THINK BECAUSE OF 1643 00:54:40,944 --> 00:54:43,246 THAT, VISUAL INSPECTIONS OF THE 1644 00:54:43,313 --> 00:54:44,581 DATA ARE REALLY KEY TO 1645 00:54:44,648 --> 00:54:47,117 UNDERSTANDING THE DATA OVERALL: 1646 00:54:47,183 --> 00:54:49,719 LOOKING AT RESIDUAL BEHAVIOR, 1647 00:54:49,786 --> 00:54:50,787 UNDERSTANDING DISTRIBUTION, 1648 00:54:50,854 --> 00:54:52,088 VARIABILITY, MODEL PERFORMANCE. 1649 00:54:52,155 --> 00:54:54,324 YOU NEED TO SEE YOUR DATA TO 1650 00:54:54,391 --> 00:54:55,291 REALLY UNDERSTAND IT. 1651 00:54:55,358 --> 00:54:59,329 SO DON'T BE AFRAID TO MAKE THE 1652 00:54:59,396 --> 00:54:59,763 VISUALIZATIONS. 1653 00:54:59,829 --> 00:55:01,097 AND DATA REPORTING SHOULD 1654 00:55:01,164 --> 00:55:02,298 INCLUDE INFORMATION ON MOST OR 1655 00:55:02,365 --> 00:55:04,401 ALL MODELS INVESTIGATED DURING 1656 00:55:04,467 --> 00:55:06,169 THE ANALYTICAL PROCESS. 1657 00:55:06,236 --> 00:55:09,706 ESPECIALLY IF IT IS EXPLORATORY, 1658 00:55:09,773 --> 00:55:10,440 IT SHOULD BE AS REPRODUCIBLE AS 1659 00:55:10,507 --> 00:55:12,409 POSSIBLE IN TERMS OF THE THINGS 1660 00:55:12,475 --> 00:55:13,243 THAT YOU TRIED AND WHAT FAILED, 1661 00:55:13,309 --> 00:55:15,211 AND REPORTING AS MUCH AS YOU CAN 1662 00:55:15,278 --> 00:55:19,983 OF THAT IN THE PUBLISHED 1663 00:55:20,050 --> 00:55:20,283 LITERATURE. 1664 00:55:20,350 --> 00:55:25,155 AND I WANT TO CONCLUDE BY 1665 00:55:25,221 --> 00:55:29,726 TALKING ABOUT THIS QUOTE FROM 1666 00:55:29,793 --> 00:55:31,294 STATISTICIAN GEORGE BOX, OF THE BOX-COX 1667 00:55:31,361 --> 00:55:33,329 TRANSFORMATION, IF YOU'VE HEARD 1668 00:55:33,396 --> 00:55:34,564 OF SUCH A THING.
1669 00:55:34,631 --> 00:55:36,066 HE IS CREDITED WITH SAYING 1670 00:55:36,132 --> 00:55:37,267 SOMETHING LIKE ALL MODELS ARE 1671 00:55:37,333 --> 00:55:40,236 WRONG, BUT SOME ARE USEFUL. 1672 00:55:40,303 --> 00:55:43,139 THIS IS ORIGINALLY 1673 00:55:43,206 --> 00:55:45,408 EXTRAPOLATED FROM SOMETHING HE WROTE 1674 00:55:45,475 --> 00:55:48,845 ABOUT IN 1976 IN A JASA PAPER 1675 00:55:48,912 --> 00:55:52,082 CALLED SCIENCE AND STATISTICS. 1676 00:55:52,148 --> 00:55:53,249 HE WROTE ABOUT THIS CONCEPT IN 1677 00:55:53,316 --> 00:55:54,718 TWO PARTICULAR PASSAGES. 1678 00:55:54,784 --> 00:56:02,092 I JUST WANT TO READ THEM HERE. 1679 00:56:02,158 --> 00:56:04,094 SO ABOUT PARSIMONY, HE WRITES: 1680 00:56:04,160 --> 00:56:06,963 SINCE ALL MODELS ARE WRONG, 1681 00:56:07,030 --> 00:56:08,765 THE SCIENTIST CANNOT OBTAIN A 1682 00:56:08,832 --> 00:56:10,567 CORRECT ONE BY EXCESSIVE 1683 00:56:10,633 --> 00:56:10,867 ELABORATION. 1684 00:56:10,934 --> 00:56:14,971 ON THE CONTRARY, HE SHOULD SEEK AN 1685 00:56:15,038 --> 00:56:17,107 ECONOMICAL DESCRIPTION OF NATURAL 1686 00:56:17,173 --> 00:56:18,908 PHENOMENA. THE ABILITY TO DEVISE SIMPLE 1687 00:56:18,975 --> 00:56:23,747 MODELS IS THE SIGNATURE OF A 1688 00:56:23,813 --> 00:56:25,815 GREAT SCIENTIST; OVERELABORATION 1689 00:56:25,882 --> 00:56:29,052 IS OFTEN THE MARK OF 1690 00:56:29,119 --> 00:56:29,719 MEDIOCRITY. 1691 00:56:29,786 --> 00:56:30,253 AND ABOUT WORRYING SELECTIVELY, 1692 00:56:30,320 --> 00:56:32,522 HE SAYS: SINCE ALL MODELS ARE 1693 00:56:32,589 --> 00:56:35,158 WRONG, THE SCIENTIST MUST BE 1694 00:56:35,225 --> 00:56:40,396 ALERT TO WHAT IS IMPORTANTLY 1695 00:56:40,463 --> 00:56:40,597 WRONG. 1696 00:56:40,663 --> 00:56:42,165 IT IS INAPPROPRIATE TO BE 1697 00:56:42,232 --> 00:56:45,635 CONCERNED ABOUT MICE WHEN THERE 1698 00:56:45,702 --> 00:56:46,503 ARE TIGERS ABROAD. 1699 00:56:46,569 --> 00:56:49,072 THIS IS OVERLY DRAMATIC.
1700 00:56:49,139 --> 00:56:50,840 BUT THE GOAL IS NOT TO INCLUDE EVERY 1701 00:56:50,907 --> 00:56:54,110 VARIABLE OR TO HAVE PERFECT 1702 00:56:54,177 --> 00:56:57,046 MODEL FIT, BECAUSE THAT REALLY 1703 00:56:57,113 --> 00:56:58,948 OVEREMPHASIZES HOW MUCH WE 1704 00:56:59,015 --> 00:57:00,216 SHOULD BELIEVE THE MODELS 1705 00:57:00,283 --> 00:57:00,850 ANYWAY. 1706 00:57:00,917 --> 00:57:03,787 WE SHOULD BE ASKING WHETHER THE MODEL IS 1707 00:57:03,853 --> 00:57:04,854 ILLUMINATING, WHETHER WE ARE LEARNING SOMETHING 1708 00:57:04,921 --> 00:57:06,923 FROM IT, AND WHETHER IT IS GOOD ENOUGH FOR 1709 00:57:06,990 --> 00:57:08,158 THIS PARTICULAR APPLICATION AND 1710 00:57:08,224 --> 00:57:10,627 THIS PARTICULAR COHORT FOR OUR 1711 00:57:10,693 --> 00:57:11,027 PARTICULAR GOALS. 1712 00:57:11,094 --> 00:57:13,630 AND MODEL FIT IS IMPORTANT IN 1713 00:57:13,696 --> 00:57:14,430 THE SENSE OF WHETHER WE 1714 00:57:14,497 --> 00:57:16,733 UNDERSTAND THE LIMITATIONS OF 1715 00:57:16,800 --> 00:57:18,434 THE MODEL AND WHERE WE MIGHT 1716 00:57:18,501 --> 00:57:21,905 POSSIBLY BE FAILING. 1717 00:57:21,971 --> 00:57:22,472 I THINK THAT'S BROADLY MORE 1718 00:57:22,539 --> 00:57:24,974 IMPORTANT AS YOU GO THROUGH THE 1719 00:57:25,041 --> 00:57:26,476 MODELING PROCESS AND LEARN TO DO 1720 00:57:26,543 --> 00:57:27,677 THESE THINGS IN REGRESSION AND, 1721 00:57:27,744 --> 00:57:31,714 HONESTLY, OTHER THINGS. 1722 00:57:31,781 --> 00:57:35,118 I THINK THIS IS AN INTERESTING 1723 00:57:35,185 --> 00:57:36,119 AND USEFUL QUOTE TO KEEP IN MIND 1724 00:57:36,186 --> 00:57:36,920 AS YOU DO SCIENCE. 1725 00:57:36,986 --> 00:57:38,621 ANY QUESTIONS IN THE LAST MINUTE 1726 00:57:38,688 --> 00:57:38,855 OR TWO? 1727 00:57:38,922 --> 00:57:41,057 FEEL FREE TO PUT THEM IN THE 1728 00:57:41,124 --> 00:57:41,724 CHAT.
1729 00:57:41,791 --> 00:57:43,259 IF YOU HAVE OTHER QUESTIONS OR 1730 00:57:43,326 --> 00:57:44,594 YOU'D LIKE TO SPEAK FURTHER, I'M 1731 00:57:44,661 --> 00:57:46,796 ALWAYS AVAILABLE TO BE EMAILED 1732 00:57:46,863 --> 00:57:57,040 AS WELL. 1733 00:58:19,162 --> 00:58:19,395 ALL RIGHT. 1734 00:58:19,462 --> 00:58:21,764 I DON'T SEE ANY QUESTIONS. 1735 00:58:21,831 --> 00:58:23,600 THANK YOU SO MUCH FOR JOINING, 1736 00:58:23,666 --> 00:58:23,766 ALL. 1737 00:58:23,833 --> 00:58:25,602 LIKE I SAID, PLEASE REACH OUT IF 1738 00:58:25,668 --> 00:58:26,669 YOU HAVE FURTHER QUESTIONS. 1739 00:58:26,736 --> 00:58:27,904 AND THE RECORDING WILL BE 1740 00:58:27,971 --> 00:58:30,907 AVAILABLE AND SLIDES WILL BE 1741 00:58:30,974 --> 00:58:31,741 DISSEMINATED AT SOME POINT, 1742 00:58:31,808 IF YOU'RE INTERESTED IN THAT.