1 00:00:05,533 --> 00:00:07,535 WELL, GOOD AFTERNOON, EVERYONE. 2 00:00:07,535 --> 00:00:10,138 WELCOME TO THE ANNUAL ADA 3 00:00:10,138 --> 00:00:11,139 LOVELACE LECTURE THAT'S 4 00:00:11,139 --> 00:00:13,842 SPONSORED BY THE INTRAMURAL 5 00:00:13,842 --> 00:00:16,077 RESEARCH PROGRAM OF THE NATIONAL 6 00:00:16,077 --> 00:00:26,488 LIBRARY OF MEDICINE. 7 00:00:26,488 --> 00:00:29,224 I'M THE SCIENTIFIC DIRECTOR, 8 00:00:29,224 --> 00:00:34,396 WITH APPROACHES TO INFORMATICS 9 00:00:34,396 --> 00:00:38,333 PROGRAMS IN HUMAN HEALTH. 10 00:00:38,333 --> 00:00:39,634 THIS CELEBRATES THE ACHIEVEMENTS 11 00:00:39,634 --> 00:00:41,703 OF WOMEN IN SCIENCE, TECHNOLOGY, 12 00:00:41,703 --> 00:00:44,406 ENGINEERING AND MATHEMATICS. 13 00:00:44,406 --> 00:00:46,474 IT AIMS TO INCREASE THE 14 00:00:46,474 --> 00:00:49,811 PARTICIPATION OF WOMEN WITHIN 15 00:00:49,811 --> 00:00:51,413 THE STEM SECTORS AND SUPPORTS 16 00:00:51,413 --> 00:00:52,881 WOMEN ALREADY WORKING IN STEM. 17 00:00:52,881 --> 00:00:56,317 YOU MAY NOT KNOW THIS, BUT ADA 18 00:00:56,317 --> 00:00:57,986 LOVELACE WAS AN ENGLISH 19 00:00:57,986 --> 00:01:03,091 MATHEMATICIAN AND WRITER. 20 00:01:03,091 --> 00:01:13,334 SHE WAS WORKING ON CHARLES 21 00:01:13,334 --> 00:01:14,903 BABBAGE'S COMPUTER, THE FIRST 22 00:01:14,903 --> 00:01:16,738 PERSON TO RECOGNIZE THE MACHINE 23 00:01:16,738 --> 00:01:18,706 HAD APPLICATION BEYOND PURE 24 00:01:18,706 --> 00:01:20,074 CALCULATION AND PUBLISHED THE 25 00:01:20,074 --> 00:01:21,709 FIRST ALGORITHM INTENDED TO BE 26 00:01:21,709 --> 00:01:23,978 CARRIED OUT BY SUCH A MACHINE. 27 00:01:23,978 --> 00:01:25,346 AND SO AS A RESULT SHE'S 28 00:01:25,346 --> 00:01:28,750 REGARDED AS ACTUALLY THE FIRST 29 00:01:28,750 --> 00:01:29,751 COMPUTER PROGRAMMER. 30 00:01:29,751 --> 00:01:31,986 TODAY'S TALK WILL BE BROADCAST 31 00:01:31,986 --> 00:01:34,489 LIVE AND ARCHIVED THROUGH THE 32 00:01:34,489 --> 00:01:35,423 NIH VIDEOCAST WEBSITE. 
33 00:01:35,423 --> 00:01:38,259 IF YOU HAVE ANY QUESTIONS DURING 34 00:01:38,259 --> 00:01:39,861 THE PRESENTATION, PLEASE POST 35 00:01:39,861 --> 00:01:41,529 THOSE IN THE Q&A WINDOW. 36 00:01:41,529 --> 00:01:44,098 AND THEN WE WILL SAVE THOSE 37 00:01:44,098 --> 00:01:46,835 UNTIL AFTER THE PRESENTATION AND 38 00:01:46,835 --> 00:01:49,504 THEN WE'LL HOPEFULLY HAVE TIME 39 00:01:49,504 --> 00:01:52,173 WHERE OUR PRESENTER CAN ANSWER 40 00:01:52,173 --> 00:01:53,374 THE QUESTIONS. 41 00:01:53,374 --> 00:01:56,544 SO, I'M DELIGHTED TO INTRODUCE 42 00:01:56,544 --> 00:02:01,816 THE 2023 ADA LOVELACE LECTURER, 43 00:02:01,816 --> 00:02:05,186 DR. LUCILA OHNO-MACHADO, DEPUTY 44 00:02:05,186 --> 00:02:08,690 DEAN FOR BIOMEDICAL INFORMATICS 45 00:02:08,690 --> 00:02:11,759 AND CHAIR AT YALE UNIVERSITY. 46 00:02:11,759 --> 00:02:14,429 AS A DEPUTY DIRECTOR, I'M SORRY, 47 00:02:14,429 --> 00:02:16,831 AS DEPUTY DEAN FOR BIOMEDICAL 48 00:02:16,831 --> 00:02:22,270 INFORMATICS, DR. OHNO-MACHADO 49 00:02:22,270 --> 00:02:23,905 OVERSEES RESEARCH ACROSS THE 50 00:02:23,905 --> 00:02:27,175 YALE ACADEMIC HEALTH SYSTEM AND 51 00:02:27,175 --> 00:02:28,643 HAS A LONGSTANDING CAREER 52 00:02:28,643 --> 00:02:31,346 CONDUCTING RESEARCH IN 53 00:02:31,346 --> 00:02:33,548 DEVELOPING PREDICTIVE MODELS AND 54 00:02:33,548 --> 00:02:37,485 DATA SHARING AROUND CLINICAL 55 00:02:37,485 --> 00:02:37,685 DATA. 56 00:02:37,685 --> 00:02:40,355 PRIOR TO JOINING YALE SHE WAS AT 57 00:02:40,355 --> 00:02:41,890 THE UNIVERSITY OF CALIFORNIA SAN 58 00:02:41,890 --> 00:02:45,927 DIEGO AND WE WERE COLLEAGUES 59 00:02:45,927 --> 00:02:46,127 THERE. 
60 00:02:46,127 --> 00:02:52,166 AND WHILE AT UCSD SHE 61 00:02:52,166 --> 00:02:53,735 ORGANIZED AN INITIATIVE TO SHARE 62 00:02:53,735 --> 00:02:54,469 DATA ACROSS THE FIVE UNIVERSITY 63 00:02:54,469 --> 00:02:56,037 OF CALIFORNIA MEDICAL SYSTEMS 64 00:02:56,037 --> 00:02:58,573 AND LATER EXTENDED THE APPROACH 65 00:02:58,573 --> 00:03:00,408 TO VARIOUS INSTITUTIONS ACROSS 66 00:03:00,408 --> 00:03:02,043 THE ENTIRE STATE OF CALIFORNIA 67 00:03:02,043 --> 00:03:07,482 AND IN FACT AROUND THE COUNTRY. 68 00:03:07,482 --> 00:03:11,920 PRIOR TO JOINING UCSD SHE WAS A 69 00:03:11,920 --> 00:03:16,157 CHAIR AT BRIGHAM AND WOMEN'S IN 70 00:03:16,157 --> 00:03:19,561 BOSTON AND FACULTY AT HARVARD. 71 00:03:19,561 --> 00:03:21,930 WELCOME AND THANK YOU FOR 72 00:03:21,930 --> 00:03:23,131 JOINING US TODAY. 73 00:03:23,131 --> 00:03:26,000 AND THE FLOOR IS YOURS. 74 00:03:26,000 --> 00:03:28,636 >> THANK YOU SO MUCH, RICHARD, 75 00:03:28,636 --> 00:03:29,704 FOR THIS OPPORTUNITY. 76 00:03:29,704 --> 00:03:32,273 I WILL SHARE MY SCREEN AND I 77 00:03:32,273 --> 00:03:38,479 WILL ASK YOU TO LET ME KNOW IF I 78 00:03:38,479 --> 00:03:41,416 AM SHARING THE CORRECT MODE OR I 79 00:03:41,416 --> 00:03:46,087 NEED TO DO SOME ADJUSTMENT IN 80 00:03:46,087 --> 00:03:46,621 HERE. 81 00:03:46,621 --> 00:03:50,592 LET ME JUST GO QUICKLY OVER THE 82 00:03:50,592 --> 00:03:51,693 POWERPOINT PRESENTATION, JUST 83 00:03:51,693 --> 00:03:54,228 ONE SECOND. 84 00:03:54,228 --> 00:04:00,868 85 00:04:00,868 --> 00:04:03,638 I'M JUST TRYING TO GET IT IN THE 86 00:04:03,638 --> 00:04:12,180 PRESENTATION MODE HERE. 87 00:04:12,180 --> 00:04:14,349 88 00:04:14,349 --> 00:04:15,450 DOES THAT SOUND CORRECT? 89 00:04:15,450 --> 00:04:15,683 >> YES. 90 00:04:15,683 --> 00:04:16,851 >> THANK YOU SO MUCH. 91 00:04:16,851 --> 00:04:19,320 WHAT I'M PLANNING TO DO IS 92 00:04:19,320 --> 00:04:21,055 INTRODUCE MEDICAL A.I. 
WITHIN 93 00:04:21,055 --> 00:04:22,256 THE FIELD OF BIOMEDICAL 94 00:04:22,256 --> 00:04:24,659 INFORMATICS AND DATA SCIENCE AND 95 00:04:24,659 --> 00:04:27,395 DESCRIBE SOME OF THE MODELS AND 96 00:04:27,395 --> 00:04:28,863 HOW STATISTICAL AND MACHINE 97 00:04:28,863 --> 00:04:30,431 LEARNING CAN HELP PREDICT 98 00:04:30,431 --> 00:04:31,966 CLINICAL OUTCOMES, BUT ALSO 99 00:04:31,966 --> 00:04:35,570 INTRODUCE THE PROBLEM OF 100 00:04:35,570 --> 00:04:36,170 EVALUATING PERSONALIZED 101 00:04:36,170 --> 00:04:39,774 PREDICTION, SO A CRITICAL POINT 102 00:04:39,774 --> 00:04:42,443 IN CLINICAL PRECISION MEDICINE 103 00:04:42,443 --> 00:04:44,145 AND IDENTIFY TECHNICAL, SOCIAL, 104 00:04:44,145 --> 00:04:46,381 ETHICAL CHALLENGES FOR HEALTH 105 00:04:46,381 --> 00:04:46,614 EQUITY. 106 00:04:46,614 --> 00:04:51,853 AND IN WHICH MEDICAL A.I. IS AN 107 00:04:51,853 --> 00:04:54,656 IMPORTANT PORTION OF. 108 00:04:54,656 --> 00:04:55,590 OBJECTIVES, I'LL GO THROUGH 109 00:04:55,590 --> 00:04:58,459 DIFFERENT TYPES OF A.I., 110 00:04:58,459 --> 00:05:00,662 DEMYSTIFY A.I., WILL NOT FOCUS 111 00:05:00,662 --> 00:05:02,296 ON GENERATIVE A.I. BUT TALK A 112 00:05:02,296 --> 00:05:05,767 LITTLE BIT ABOUT IT BECAUSE 113 00:05:05,767 --> 00:05:09,037 OTHERWISE I KNOW PEOPLE GET 114 00:05:09,037 --> 00:05:11,339 DISAPPOINTED, NOT INTRODUCING A 115 00:05:11,339 --> 00:05:13,374 LITTLE BIT ABOUT GENERATIVE A.I. 116 00:05:13,374 --> 00:05:16,277 THEN USES OF A.I. MODELS IN 117 00:05:16,277 --> 00:05:19,881 HEALTH CARE AND ROLE OF 118 00:05:19,881 --> 00:05:22,884 INFORMATICS IN DATA SCIENCE. 119 00:05:22,884 --> 00:05:25,920 DIFFERENT TYPES ARE OUT THERE, 120 00:05:25,920 --> 00:05:26,788 TERABYTES IN ELECTRONIC HEALTH 121 00:05:26,788 --> 00:05:29,857 RECORDS, A WHOLE LOT OF OTHER 122 00:05:29,857 --> 00:05:31,025 SOURCES. 
123 00:05:31,025 --> 00:05:32,894 AND WE FINALLY HAVE 124 00:05:32,894 --> 00:05:34,262 COMPUTATIONAL POWER AND METHODS 125 00:05:34,262 --> 00:05:36,831 TO DEVELOP MODELS THAT CAN BE OF 126 00:05:36,831 --> 00:05:39,801 USE IN CLINICAL MEDICINE. 127 00:05:39,801 --> 00:05:41,469 I SEE BIOMEDICAL INFORMATICS AND 128 00:05:41,469 --> 00:05:45,640 DATA SCIENCE AS A SERIES OF 129 00:05:45,640 --> 00:05:50,078 SUBSPECIALITIES, AND I LIST SOME 130 00:05:50,078 --> 00:05:50,278 HERE. 131 00:05:50,278 --> 00:05:55,349 AND I HIGHLIGHT EVALUATION 132 00:05:55,349 --> 00:05:59,520 METHODS FOR DATA SHARING, 133 00:05:59,520 --> 00:06:01,055 PRIVACY PRESERVING METHODS, THE 134 00:06:01,055 --> 00:06:02,590 ONES I'M MORE CRITICALLY 135 00:06:02,590 --> 00:06:02,924 INVOLVED WITH. 136 00:06:02,924 --> 00:06:04,325 BUT IT TAKES, YOU KNOW, A 137 00:06:04,325 --> 00:06:06,427 VILLAGE IN ORDER TO COME UP WITH 138 00:06:06,427 --> 00:06:10,398 AN APPLICATION THAT CAN BE 139 00:06:10,398 --> 00:06:13,234 DEPLOYED IN THE CLINIC. 140 00:06:13,234 --> 00:06:16,838 SO, I SEE OURSELVES STRIVING TO 141 00:06:16,838 --> 00:06:18,639 GET PRECISION HEALTH, PRECISION 142 00:06:18,639 --> 00:06:21,976 MEDICINE, TO OUR PATIENTS, AND 143 00:06:21,976 --> 00:06:26,013 THAT INVOLVES INTERSECTING 144 00:06:26,013 --> 00:06:27,148 CIRCLES OF MEDICAL, 145 00:06:27,148 --> 00:06:31,486 COMPUTATIONAL BIOLOGY AND PUBLIC 146 00:06:31,486 --> 00:06:36,190 HEALTH COLLEAGUES WITH 147 00:06:36,190 --> 00:06:37,458 ALGORITHMS, TOOLS, CONTROLLED 148 00:06:37,458 --> 00:06:39,827 VOCABULARY, ONTOLOGY, DATA 149 00:06:39,827 --> 00:06:40,495 MANAGEMENT, INFORMATION 150 00:06:40,495 --> 00:06:41,262 RETRIEVAL TECHNIQUES. 151 00:06:41,262 --> 00:06:42,864 IT IS NOT NEW. 152 00:06:42,864 --> 00:06:45,666 I GO AFTER THE FIRST A.I. 153 00:06:45,666 --> 00:06:47,168 SYSTEMS THAT WERE CALLED EXPERT 154 00:06:47,168 --> 00:06:50,805 SYSTEMS IN THE '80s AND 155 00:06:50,805 --> 00:06:51,139 '90s. 
156 00:06:51,139 --> 00:06:56,778 THEN NEURAL NETWORKS CAME UP IN 157 00:06:56,778 --> 00:06:58,312 THE '90s AND SUCCESS AT THAT 158 00:06:58,312 --> 00:07:02,083 DECADE BUT NOT IN THE FOLLOWING 159 00:07:02,083 --> 00:07:04,385 ONE IN WHICH MOSTLY SUPPORT 160 00:07:04,385 --> 00:07:06,254 VECTOR MACHINE APPLICATIONS WERE 161 00:07:06,254 --> 00:07:08,890 OUT THERE, LIMITED APPLICATION. 162 00:07:08,890 --> 00:07:12,093 THEN IN THE 2010s AND 20s WE 163 00:07:12,093 --> 00:07:14,428 SAW RESURGENCE OF NEURAL 164 00:07:14,428 --> 00:07:15,163 NETWORKS, ESPECIALLY DEEP 165 00:07:15,163 --> 00:07:16,430 LEARNING. 166 00:07:16,430 --> 00:07:19,567 AND NOW IN THE 20S AND 30s, 167 00:07:19,567 --> 00:07:21,202 GENERATIVE A.I. IS WHAT'S 168 00:07:21,202 --> 00:07:22,937 EXCITING MOST OF US, AND WE 169 00:07:22,937 --> 00:07:24,739 DON'T KNOW WHAT'S COMING FOR THE 170 00:07:24,739 --> 00:07:26,240 '30s AND '40s BUT I'M SURE 171 00:07:26,240 --> 00:07:29,610 IT WILL BE EXCITING AS WELL. 172 00:07:29,610 --> 00:07:33,581 THE NEW ENGLAND JOURNAL AND 173 00:07:33,581 --> 00:07:35,183 OTHER CLINICAL SCIENTIFIC 174 00:07:35,183 --> 00:07:36,918 PUBLICATIONS HAVE SPECIAL 175 00:07:36,918 --> 00:07:38,719 ISSUES, EVEN SPINOFF JOURNALS 176 00:07:38,719 --> 00:07:40,454 RELATED TO A.I. 177 00:07:40,454 --> 00:07:44,425 JAMA AS WELL WITH A COLLECTION 178 00:07:44,425 --> 00:07:46,961 OF ARTICLES FOCUSING ON LARGE 179 00:07:46,961 --> 00:07:48,830 LANGUAGE MODELS, FOR EXAMPLE IN 180 00:07:48,830 --> 00:07:49,997 OTHER A.I. TECHNOLOGIES. 181 00:07:49,997 --> 00:07:52,066 IT'S OUT THERE. 182 00:07:52,066 --> 00:07:54,068 IT'S POPULAR NOW, AS OPPOSED TO 183 00:07:54,068 --> 00:07:58,105 THE TIMES WHERE WE STUDIED IT 184 00:07:58,105 --> 00:08:04,178 AND IT WAS A VERY NICHE AREA. 
185 00:08:04,178 --> 00:08:08,616 I MUST REMIND YOU THAT ELIZA, 186 00:08:08,616 --> 00:08:11,853 THE FIRST CHATBOT, WAS 1966, 187 00:08:11,853 --> 00:08:13,487 TERMINALS DIDN'T EVEN HAVE 188 00:08:13,487 --> 00:08:13,921 MONITORS. 189 00:08:13,921 --> 00:08:17,525 IT WAS ALL DONE ON PAPER AND 190 00:08:17,525 --> 00:08:20,127 PRINTED HERE, LATER THEY GOT 191 00:08:20,127 --> 00:08:22,697 THIS MONOCHROMATIC MONITOR. 192 00:08:22,697 --> 00:08:26,400 TODAY IF YOU'RE INTERESTED IN 193 00:08:26,400 --> 00:08:29,103 KNOWING HOW GOOD ELIZA WAS THERE 194 00:08:29,103 --> 00:08:32,506 ARE EMULATORS, YOU CAN GO ONLINE 195 00:08:32,506 --> 00:08:34,909 AND HAVE A CHAT. 196 00:08:34,909 --> 00:08:35,676 IT'S ENTERTAINING. 197 00:08:35,676 --> 00:08:38,512 EXPERT SYSTEMS CAME AFTER THAT, 198 00:08:38,512 --> 00:08:41,048 DONE ON THESE MAINFRAME 199 00:08:41,048 --> 00:08:47,054 COMPUTERS, MINICOMPUTERS AFTER 200 00:08:47,054 --> 00:08:47,688 THAT. 201 00:08:47,688 --> 00:08:56,964 THERE WAS CASNET FOR GLAUCOMA, 202 00:08:56,964 --> 00:08:58,633 MYCIN, AND INTERNIST, NOT 203 00:08:58,633 --> 00:08:59,700 DIRECTLY DATA DRIVEN AS SYSTEMS 204 00:08:59,700 --> 00:09:04,538 WE HAVE TODAY. 205 00:09:04,538 --> 00:09:06,774 SO, DATA DRIVEN MODELS, HERE AN 206 00:09:06,774 --> 00:09:08,175 EXAMPLE OF SUPERVISED LEARNING, 207 00:09:08,175 --> 00:09:09,644 EXAMPLES WITH A GOLD STANDARD, 208 00:09:09,644 --> 00:09:12,914 AND THEN YOU SELECT A MODEL IN 209 00:09:12,914 --> 00:09:15,483 THIS CASE FOR EXAMPLE LINEAR 210 00:09:15,483 --> 00:09:16,784 REGRESSION MODEL, AND YOU TUNE 211 00:09:16,784 --> 00:09:19,687 ITS PARAMETERS UNTIL THE MODEL 212 00:09:19,687 --> 00:09:20,588 FITS THE DATA. 213 00:09:20,588 --> 00:09:22,757 THEN ONCE YOU HAVE A MODEL YOU 214 00:09:22,757 --> 00:09:24,392 APPLY THIS MODEL TO PREVIOUSLY 215 00:09:24,392 --> 00:09:26,727 UNSEEN CASES AND THEN YOU CAN 216 00:09:26,727 --> 00:09:28,696 ESTIMATE THE OUTCOMES OF 217 00:09:28,696 --> 00:09:29,063 INTEREST. 
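The supervised-learning steps just described (labeled examples, a linear regression model, parameters tuned to fit, then prediction on an unseen case) can be sketched roughly as follows. This is a minimal illustration and not the speaker's code: the data points and the names `fit_line` and `predict` are made up for the example.

```python
# Illustrative sketch only: the training data below is invented, not from the talk.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Examples with a "gold standard" outcome (hypothetical values).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)   # tune parameters until the line fits
unseen = predict(model, 10.0)        # estimate the outcome for an unseen case
print(model, unseen)
```

Once `fit_line` has returned, the parameters are fixed, so the same unseen input always yields the same estimate.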
218 00:09:29,063 --> 00:09:36,003 SO THIS IS, YOU KNOW, PRIOR TO 219 00:09:36,003 --> 00:09:36,170 A.I. 220 00:09:36,170 --> 00:09:37,271 THIS IS STATISTICS, IT'S BEEN 221 00:09:37,271 --> 00:09:39,206 THERE FOR A WHILE. 222 00:09:39,206 --> 00:09:42,743 MOST MODELS IN MEDICINE ARE 223 00:09:42,743 --> 00:09:48,416 STILL BASED ON STATISTICAL 224 00:09:48,416 --> 00:09:51,352 LEARNING OR REGRESSION MODELS OF 225 00:09:51,352 --> 00:09:52,820 THIS TYPE. 226 00:09:52,820 --> 00:09:54,355 A.I. FOR CLINICIANS AND 227 00:09:54,355 --> 00:09:55,589 RESEARCHERS, THERE ARE THREE 228 00:09:55,589 --> 00:09:56,157 COMPONENTS. 229 00:09:56,157 --> 00:09:57,024 FIRST DATA. 230 00:09:57,024 --> 00:10:01,395 DATA CAN BE FROM COHORT STUDIES, 231 00:10:01,395 --> 00:10:02,196 ELECTRONIC HEALTH RECORDS, FROM 232 00:10:02,196 --> 00:10:03,698 A LOT OF SOURCES. 233 00:10:03,698 --> 00:10:07,034 THEN YOU HAVE MODELS IN WHICH 234 00:10:07,034 --> 00:10:08,669 YOU CAN DO STATISTICAL MACHINE 235 00:10:08,669 --> 00:10:11,472 LEARNING ON IT, WHILE PRESERVING 236 00:10:11,472 --> 00:10:13,774 PRIVACY OF THE INDIVIDUALS. 237 00:10:13,774 --> 00:10:15,209 AND THEN YOU HAVE 238 00:10:15,209 --> 00:10:16,310 IMPLEMENTATION. 239 00:10:16,310 --> 00:10:18,212 AND FOR IMPLEMENTATION YOU NEED 240 00:10:18,212 --> 00:10:20,414 TO PAY SPECIAL ATTENTION TO 241 00:10:20,414 --> 00:10:21,148 PERFORMANCE MEASURES, 242 00:10:21,148 --> 00:10:22,984 RESPONSIBLE USE OF A.I. 243 00:10:22,984 --> 00:10:25,686 AND THE MODEL AS A MEDICAL 244 00:10:25,686 --> 00:10:26,887 DEVICE, BECAUSE IT CAN CHANGE 245 00:10:26,887 --> 00:10:30,691 CARE AND IT CAN CHANGE OUTCOMES. 246 00:10:30,691 --> 00:10:33,928 I WILL NOT GO IN ORDER BUT I 247 00:10:33,928 --> 00:10:36,163 WILL TOUCH ON THESE ITEMS 248 00:10:36,163 --> 00:10:36,497 INDIVIDUALLY. 
249 00:10:36,497 --> 00:10:37,798 IN TERMS OF MODELS, THE 250 00:10:37,798 --> 00:10:41,002 STATISTICAL AND MACHINE LEARNING 251 00:10:41,002 --> 00:10:42,536 MODELS, MEDICAL A.I., THEY ARE 252 00:10:42,536 --> 00:10:43,504 USED IN PRACTICE. 253 00:10:43,504 --> 00:10:48,809 YOU HAVE, FOR EXAMPLE, THE MODEL 254 00:10:48,809 --> 00:10:51,979 FOR END STAGE LIVER DISEASE 255 00:10:51,979 --> 00:10:55,216 BEING USED AS CRITERION FOR 256 00:10:55,216 --> 00:10:56,751 LIVER TRANSPLANTATION ORDER 257 00:10:56,751 --> 00:10:58,119 PRIORITY. 258 00:10:58,119 --> 00:11:03,190 YOU HAVE SEVERAL MODELS FOR 259 00:11:03,190 --> 00:11:05,826 ACUTE KIDNEY INJURY, SEPSIS 260 00:11:05,826 --> 00:11:08,062 PREDICTION, MANY ARE ALREADY 261 00:11:08,062 --> 00:11:09,964 INTEGRATED INTO ELECTRONIC 262 00:11:09,964 --> 00:11:16,037 HEALTH RECORD SYSTEMS. 263 00:11:16,037 --> 00:11:17,671 AND THE LOCAL POPULATION, THEY 264 00:11:17,671 --> 00:11:21,275 ARE NOT WELL CALIBRATED, I'LL 265 00:11:21,275 --> 00:11:22,576 GIVE EXAMPLES SUBSEQUENTLY. 266 00:11:22,576 --> 00:11:25,813 I SEE WE HAVE A HUGE OPPORTUNITY 267 00:11:25,813 --> 00:11:27,748 HERE IN MEDICAL A.I. 268 00:11:27,748 --> 00:11:30,818 I PUT THIS AS ONE IS THE 269 00:11:30,818 --> 00:11:32,119 OPPORTUNITY TO DO NO HARM 270 00:11:32,119 --> 00:11:36,023 BECAUSE UNFORTUNATELY IF MEDICAL 271 00:11:36,023 --> 00:11:37,291 A.I. IS USED INAPPROPRIATELY IT 272 00:11:37,291 --> 00:11:39,126 CAN CAUSE HARM. 273 00:11:39,126 --> 00:11:41,529 TWO, TO PROMOTE HEALTH EQUITY. 274 00:11:41,529 --> 00:11:43,431 AND WE'LL SEE WAYS IN WHICH WE 275 00:11:43,431 --> 00:11:44,665 CAN DO THAT. 276 00:11:44,665 --> 00:11:47,902 SO, FIRST I WANTED TO JUST GIVE 277 00:11:47,902 --> 00:11:50,671 A VERY BRIEF OVERVIEW OF MEDICAL 278 00:11:50,671 --> 00:11:51,539 A.I. 279 00:11:51,539 --> 00:11:54,442 WHAT'S A.I. VERSUS REGRESSION 280 00:11:54,442 --> 00:11:54,842 MODELS. 
281 00:11:54,842 --> 00:11:55,876 SO IMAGINE YOU HAVE LIKE HERE 282 00:11:55,876 --> 00:12:00,114 JUST TO GIVE AN EXAMPLE, TWO 283 00:12:00,114 --> 00:12:01,682 VARIABLES, AGE AND CERTAIN 284 00:12:01,682 --> 00:12:02,483 LABORATORY VALUE. 285 00:12:02,483 --> 00:12:07,621 AND YOU WANT TO SEPARATE THE 286 00:12:07,621 --> 00:12:10,057 GREEN CASES, FOR EXAMPLE, THAT 287 00:12:10,057 --> 00:12:14,028 DID NOT SURVIVE AND RED ONES 288 00:12:14,028 --> 00:12:16,163 THAT DID SURVIVE. 289 00:12:16,163 --> 00:12:18,966 YOU CAN USE A REGRESSION MODEL, 290 00:12:18,966 --> 00:12:22,403 HERE LOGISTIC MODEL, THE VALUES 291 00:12:22,403 --> 00:12:23,904 OF INPUTS, COEFFICIENTS USED AND 292 00:12:23,904 --> 00:12:25,940 THEN YOU RUN THROUGH A FUNCTION 293 00:12:25,940 --> 00:12:28,175 AND IN THIS CASE LOGISTIC 294 00:12:28,175 --> 00:12:29,376 FUNCTION, AND YOU GET A 295 00:12:29,376 --> 00:12:31,879 PROBABILITY OF THE OUTCOME. 296 00:12:31,879 --> 00:12:33,681 SO THE PROBABILITY COMING OUT OF 297 00:12:33,681 --> 00:12:33,881 THAT. 298 00:12:33,881 --> 00:12:37,418 SO THIS IS A TWO-DIMENSIONAL 299 00:12:37,418 --> 00:12:39,320 EXAMPLE, AND YOU GET THIS 300 00:12:39,320 --> 00:12:42,423 CONTINUOUS OUTPUT FOR AN OUTCOME 301 00:12:42,423 --> 00:12:44,925 THAT IS EITHER GREEN OR RED. 302 00:12:44,925 --> 00:12:47,795 SO BINARY OUTPUT. 303 00:12:47,795 --> 00:12:50,931 BUT YOU GET A PROBABILITY OF, 304 00:12:50,931 --> 00:12:53,000 FOR EXAMPLE, OF GREEN. 305 00:12:53,000 --> 00:12:55,436 NOW, STATISTICAL LEARNING SUCH 306 00:12:55,436 --> 00:12:56,971 AS LOGISTIC REGRESSION IS GOOD 307 00:12:56,971 --> 00:12:58,272 FOR INTERPRETATION, WORKS WELL 308 00:12:58,272 --> 00:13:01,008 FOR CERTAIN PROBLEMS, HAS VERY 309 00:13:01,008 --> 00:13:02,643 FEW PARAMETERS, EASY TO COMPUTE, 310 00:13:02,643 --> 00:13:06,013 AND IT'S EASY TO INTERPRET. 311 00:13:06,013 --> 00:13:14,488 HERE AN EXAMPLE WITH THREE 312 00:13:14,488 --> 00:13:17,024 VARIABLES AND SO ON. 
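The two-variable logistic model described above (inputs multiplied by coefficients, summed, then passed through the logistic function to give a probability) might look like the sketch below. The coefficient values are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any real clinical model.

```python
import math


def logistic(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(age, lab_value, coefficients):
    """Weighted sum of the inputs, passed through the logistic function."""
    intercept, b_age, b_lab = coefficients
    z = intercept + b_age * age + b_lab * lab_value
    return logistic(z)


# Hypothetical coefficients (intercept, age, laboratory value), for illustration only.
coefficients = (-6.0, 0.05, 1.0)

# z = -6.0 + 0.05 * 60 + 1.0 * 3.0 = 0.0, and logistic(0) = 0.5
p = predict_probability(60, 3.0, coefficients)
print(p)
```

The sign of each coefficient is what makes the model easy to interpret: a positive coefficient raises the estimated probability as that input grows, and a negative one lowers it.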
313 00:13:17,024 --> 00:13:18,893 DIFFERENT STATISTICAL SOFTWARE 314 00:13:18,893 --> 00:13:21,395 MAY GIVE SLIGHTLY DIFFERENT 315 00:13:21,395 --> 00:13:23,197 COEFFICIENTS BUT TYPICALLY IT'S 316 00:13:23,197 --> 00:13:23,597 VERY SIMILAR. 317 00:13:23,597 --> 00:13:26,000 ONCE THEY ARE TRAINED, THAT IS 318 00:13:26,000 --> 00:13:28,569 ONCE THE MODEL IS FIT, THE 319 00:13:28,569 --> 00:13:32,039 PARAMETERS DO NOT CHANGE, WILL 320 00:13:32,039 --> 00:13:34,675 ALWAYS GENERATE THE SAME OUTPUT, 321 00:13:34,675 --> 00:13:38,445 SO REPRODUCIBILITY IS PERFECT. 322 00:13:38,445 --> 00:13:39,680 SOME MODELS ARE PROPRIETARY, 323 00:13:39,680 --> 00:13:43,184 HOWEVER, BUT THEY ARE ALL 324 00:13:43,184 --> 00:13:45,586 INTERPRETABLE WHICH FACTORS ARE 325 00:13:45,586 --> 00:13:46,554 PROTECTIVE, MEANING WHICH ONES 326 00:13:46,554 --> 00:13:48,055 DECREASE THE RISK OF DISEASE, 327 00:13:48,055 --> 00:13:49,623 WHICH ONES INCREASE THE RISK OF 328 00:13:49,623 --> 00:13:53,227 DISEASE, THIS IS VERY WELL 329 00:13:53,227 --> 00:13:53,561 KNOWN. 330 00:13:53,561 --> 00:13:55,763 THE MODELS ARE ONLY AS GOOD AS 331 00:13:55,763 --> 00:13:57,898 DATA THEY ARE TRAINED ON, 332 00:13:57,898 --> 00:13:59,633 HOWEVER, AND I'LL TALK A LITTLE 333 00:13:59,633 --> 00:14:04,171 BIT MORE ABOUT THAT. 334 00:14:04,171 --> 00:14:07,575 EXAMPLES HERE VERY POPULAR 335 00:14:07,575 --> 00:14:08,509 FRAMINGHAM CARDIOVASCULAR RISK 336 00:14:08,509 --> 00:14:11,478 MODEL, AS I MENTIONED BEFORE. 
337 00:14:11,478 --> 00:14:13,714 SO WHY NEURAL NETWORKS CAME UP 338 00:14:13,714 --> 00:14:15,249 IS BECAUSE THERE ARE CERTAIN 339 00:14:15,249 --> 00:14:16,917 PROBLEMS BETTER SERVED BY HAVING 340 00:14:16,917 --> 00:14:18,719 MORE PARAMETERS, AND HAVING A 341 00:14:18,719 --> 00:14:20,221 DIFFERENT TYPE OF MODEL THAT, 342 00:14:20,221 --> 00:14:22,856 FOR EXAMPLE, CAN GIVE THE 343 00:14:22,856 --> 00:14:25,492 SEPARATION THAT IS SHOWN IN THE 344 00:14:25,492 --> 00:14:27,561 FIGURE HERE, FOR PROBLEMS IN 345 00:14:27,561 --> 00:14:33,067 WHICH THERE IS NOT -- LINEAR 346 00:14:33,067 --> 00:14:35,002 SEPARATION IS NOT GOOD. 347 00:14:35,002 --> 00:14:36,036 TO ILLUSTRATE AN ARTIFICIAL 348 00:14:36,036 --> 00:14:37,638 NEURAL NETWORK OF THE TYPE OF 349 00:14:37,638 --> 00:14:40,874 THE '90s, FOR EXAMPLE, WAS 350 00:14:40,874 --> 00:14:42,743 MORE SHALLOW, HAD ONLY ONE 351 00:14:42,743 --> 00:14:43,711 HIDDEN LAYER. 352 00:14:43,711 --> 00:14:48,682 THE WAY IT WORKS IS 353 00:14:48,682 --> 00:14:50,651 SIMILAR, IF YOU THINK OF LAYERS 354 00:14:50,651 --> 00:14:54,021 OF LOGISTIC MODELS, PUT IT ALL 355 00:14:54,021 --> 00:14:57,191 TOGETHER, A MORE COMPLEX, MORE 356 00:14:57,191 --> 00:14:58,492 PARAMETERS, AND PERFORM HERE THE 357 00:14:58,492 --> 00:15:01,996 VERY SAME FUNCTION WHICH IS TO 358 00:15:01,996 --> 00:15:04,665 ESTIMATE BINARY OUTCOME BY 359 00:15:04,665 --> 00:15:08,369 GIVING A CONTINUOUS OUTCOME 360 00:15:08,369 --> 00:15:10,571 BETWEEN ZERO AND ONE, LIKE 60% 361 00:15:10,571 --> 00:15:16,610 PROBABILITY OF DYING, FOR 362 00:15:16,610 --> 00:15:17,444 EXAMPLE. 363 00:15:17,444 --> 00:15:19,179 NEURAL NETWORKS WORK WELL, BETTER 364 00:15:19,179 --> 00:15:21,348 FOR SOME TYPES OF PROBLEMS, HAVE 365 00:15:21,348 --> 00:15:23,884 MORE PARAMETERS TO LEARN DATA, 366 00:15:23,884 --> 00:15:24,618 REQUIRE MORE COMPUTATION. 
367 00:15:24,618 --> 00:15:27,321 THEY ARE WAY MORE FLEXIBLE BUT 368 00:15:27,321 --> 00:15:28,956 WAY COMPUTATIONALLY MORE 369 00:15:28,956 --> 00:15:29,890 EXPENSIVE THAN REGRESSION 370 00:15:29,890 --> 00:15:30,557 MODELS. 371 00:15:30,557 --> 00:15:32,960 THEY ARE LESS INTERPRETABLE AS 372 00:15:32,960 --> 00:15:33,494 WELL. 373 00:15:33,494 --> 00:15:35,729 BUT THEY ARE REPRODUCIBLE WHICH 374 00:15:35,729 --> 00:15:37,231 IS SOMETHING THAT SOMETIMES 375 00:15:37,231 --> 00:15:40,000 PEOPLE DON'T UNDERSTAND. 376 00:15:40,000 --> 00:15:42,469 IT'S WHEN YOU CALCULATE THOSE 377 00:15:42,469 --> 00:15:45,406 WEIGHTS OR COEFFICIENTS, YOU CAN 378 00:15:45,406 --> 00:15:46,874 FIX THEM, AND THEN YOU CAN 379 00:15:46,874 --> 00:15:48,208 REPRODUCE YOUR RESULTS. 380 00:15:48,208 --> 00:15:53,647 IT'S NOT THAT IT'S CONSTANTLY 381 00:15:53,647 --> 00:15:54,415 BEING RETRAINED. 382 00:15:54,415 --> 00:15:56,083 THAT'S UP TO YOU TO CONTROL. 383 00:15:56,083 --> 00:15:58,919 JUST WANTED TO DEMYSTIFY THAT. 384 00:15:58,919 --> 00:15:59,820 AGAIN, THE DEEP LEARNING THAT 385 00:15:59,820 --> 00:16:03,590 CAME BACK AT THE BEGINNING OF 386 00:16:03,590 --> 00:16:07,161 THIS CENTURY IS NOTHING BUT MANY 387 00:16:07,161 --> 00:16:10,464 MORE LAYERS OF NEURONS OR NODES, 388 00:16:10,464 --> 00:16:13,767 ALSO MANY MORE NODES, SO YOU 389 00:16:13,767 --> 00:16:17,604 HAVE THOUSANDS OF PARAMETERS IN 390 00:16:17,604 --> 00:16:19,573 NEW MODELS OR LARGE LANGUAGE 391 00:16:19,573 --> 00:16:22,009 MODELS MAY HAVE MILLIONS OF 392 00:16:22,009 --> 00:16:25,679 PARAMETERS. 393 00:16:25,679 --> 00:16:26,480 SO COMPUTATIONALLY IT'S VERY HARD 394 00:16:26,480 --> 00:16:28,182 TO DO. 395 00:16:28,182 --> 00:16:32,853 IT TAKES A LONG TIME. 396 00:16:32,853 --> 00:16:34,822 REQUIRES CERTAIN TYPES OF 397 00:16:34,822 --> 00:16:36,557 HARDWARE, SOMETIMES, IN ORDER TO 398 00:16:36,557 --> 00:16:39,626 GET THE NETWORK TO BE TRAINED. 
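The shallow network described earlier, layers of logistic units whose weights are fixed after training, can be sketched as a forward pass. This is an illustration under stated assumptions: the weights below are arbitrary made-up numbers, not a trained model, and the function name `forward` is hypothetical. The point is that with fixed weights the same input always gives the same output.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer of logistic units feeding a single logistic output unit."""
    hidden = [
        logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return logistic(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical fixed weights: two inputs, two hidden units, one output.
HIDDEN_WEIGHTS = [[0.5, -0.2], [-0.3, 0.8]]
HIDDEN_BIASES = [0.0, 0.1]
OUTPUT_WEIGHTS = [1.0, -1.0]
OUTPUT_BIAS = 0.0

case = [1.0, 2.0]
first = forward(case, HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)
second = forward(case, HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)
print(first, first == second)
```

Nothing in the forward pass updates the weights, so retraining only happens if someone deliberately runs it.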
399 00:16:39,626 --> 00:16:43,764 THE WAY IT'S TRAINED IS THE SAME, 400 00:16:43,764 --> 00:16:45,566 BACKPROPAGATION. IN BACKPROPAGATION 401 00:16:45,566 --> 00:16:48,035 YOU START MANY TIMES RANDOM OR 402 00:16:48,035 --> 00:16:51,138 START WITH A SET OF WEIGHTS, 403 00:16:51,138 --> 00:16:52,373 TRANSFER LEARNING, ADJUST THOSE 404 00:16:52,373 --> 00:16:54,842 UNTIL IT FITS THE DATA THAT YOU 405 00:16:54,842 --> 00:16:58,579 HAVE AT HAND. 406 00:16:58,579 --> 00:17:00,114 SO, AGAIN, THERE'S NO MAGIC IN 407 00:17:00,114 --> 00:17:02,249 A.I. MODELS OF THIS KIND. 408 00:17:02,249 --> 00:17:04,184 THEY ARE CLASSIFIERS. 409 00:17:04,184 --> 00:17:05,819 THEY ARE JUST ADJUSTING THE 410 00:17:05,819 --> 00:17:07,488 PARAMETERS UNTIL THEY FIT THE 411 00:17:07,488 --> 00:17:12,459 DATA THEY ARE PRESENTED WITH. 412 00:17:12,459 --> 00:17:16,163 NOW, I ALSO CAN SAY NEURAL 413 00:17:16,163 --> 00:17:18,198 NETWORKS ARE NOT THE ONLY A.I. 414 00:17:18,198 --> 00:17:20,167 TYPE OF METHODS. 415 00:17:20,167 --> 00:17:25,239 ANOTHER ONE CALLED RANDOM FOREST 416 00:17:25,239 --> 00:17:26,940 DERIVES ITSELF FROM 417 00:17:26,940 --> 00:17:28,675 CLASSIFICATION TREES, IN 418 00:17:28,675 --> 00:17:29,576 PICTURES THEY DO ORTHOGONAL 419 00:17:29,576 --> 00:17:31,412 SEPARATIONS OF DATA AND KEEP 420 00:17:31,412 --> 00:17:33,981 DOING IT, SO HERE FIRST YOU 421 00:17:33,981 --> 00:17:36,450 DIVIDE INTO TWO, AND THEN AMONG 422 00:17:36,450 --> 00:17:38,786 THOSE TWO YOU DIVIDE 423 00:17:38,786 --> 00:17:41,388 ORTHOGONALLY, IN EACH QUADRANT 424 00:17:41,388 --> 00:17:42,389 YOU KEEP DOING THE SAME UNTIL 425 00:17:42,389 --> 00:17:45,426 YOU FIT THE DATA YOU HAVE. 426 00:17:45,426 --> 00:17:48,095 SO THESE ARE CLASSIFICATION 427 00:17:48,095 --> 00:17:48,429 TREES. 428 00:17:48,429 --> 00:17:49,830 YOU START PARTITIONS, ACCORDING 429 00:17:49,830 --> 00:17:51,064 TO ONE VARIABLE, THEN ACCORDING 430 00:17:51,064 --> 00:17:53,434 TO THE SECOND VARIABLE. 
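The orthogonal partitioning just described, split on one variable and then split each side on a second variable, can be sketched as a tiny hand-built classification tree. The variables, thresholds, and labels are hypothetical and chosen only for illustration; a real tree would learn its splits from data.

```python
# Each internal node is (variable_name, threshold, left_subtree, right_subtree);
# each leaf is just a class label. All thresholds here are made up.
TREE = (
    "age", 65.0,
    ("lab_value", 2.0, "green", "red"),   # branch for age < 65
    ("lab_value", 1.5, "green", "red"),   # branch for age >= 65
)


def classify(tree, case):
    """Walk the tree: go left when the case's value is below the threshold."""
    while not isinstance(tree, str):
        variable, threshold, left, right = tree
        tree = left if case[variable] < threshold else right
    return tree


print(classify(TREE, {"age": 50.0, "lab_value": 1.8}))
print(classify(TREE, {"age": 70.0, "lab_value": 1.8}))
```

A single tree like this stays readable because every prediction can be traced as a short chain of threshold comparisons.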
431 00:17:53,434 --> 00:17:54,968 CLASSIFICATION TREES, THEY 432 00:17:54,968 --> 00:17:57,204 ARE SIMPLE AND INTERPRETABLE. 433 00:17:57,204 --> 00:17:59,706 HOWEVER RANDOM FOREST, IT'S 434 00:17:59,706 --> 00:18:01,575 SEVERAL OF THOSE TREES, 435 00:18:01,575 --> 00:18:02,643 CONSIDERED TOGETHER. 436 00:18:02,643 --> 00:18:05,679 HIGHLY ADAPTABLE, WAY MORE 437 00:18:05,679 --> 00:18:05,946 FLEXIBLE. 438 00:18:05,946 --> 00:18:07,381 BUT NO LONGER INTERPRETABLE AS 439 00:18:07,381 --> 00:18:08,849 THE CLASSIFICATION TREES WERE. 440 00:18:08,849 --> 00:18:11,285 SO THOSE ARE A.I. MODELS, AGAIN 441 00:18:11,285 --> 00:18:16,156 ALONG THE SAME LINES, MORE 442 00:18:16,156 --> 00:18:18,158 FLEXIBILITY, LESS 443 00:18:18,158 --> 00:18:18,559 INTERPRETABILITY. 444 00:18:18,559 --> 00:18:24,164 SO, DATA-DRIVEN MODELS AND 445 00:18:24,164 --> 00:18:24,965 MACHINE LEARNING AND STATISTICAL 446 00:18:24,965 --> 00:18:27,468 MODELS, THERE ARE HUNDREDS OF 447 00:18:27,468 --> 00:18:29,036 PREDICTIVE MODELS FOR DIAGNOSIS 448 00:18:29,036 --> 00:18:31,438 AND PROGNOSIS OF DISEASE. 449 00:18:31,438 --> 00:18:31,972 CLASSIFICATION CAN OUTPERFORM 450 00:18:31,972 --> 00:18:33,040 HUMANS, FOR EXAMPLE, BUT THE 451 00:18:33,040 --> 00:18:34,608 WHOLE TRICK IS HOW YOU PRESENT 452 00:18:34,608 --> 00:18:37,544 THIS DATA AND HOW YOU FORMULATE 453 00:18:37,544 --> 00:18:39,713 SO THAT MODEL CAN FIT ITS 454 00:18:39,713 --> 00:18:42,783 PARAMETERS TO YOUR DATA. 455 00:18:42,783 --> 00:18:44,518 CLASSIFIERS SUCH AS THE ONES I 456 00:18:44,518 --> 00:18:47,287 PRESENTED ARE THE MOST COMMON 457 00:18:47,287 --> 00:18:48,822 TYPE, DISCRIMINATIVE MODELS THAT 458 00:18:48,822 --> 00:18:51,558 DISCRIMINATE FROM ONE CLASS TO 459 00:18:51,558 --> 00:18:52,759 ANOTHER CLASS. 460 00:18:52,759 --> 00:18:55,362 NOW, I COULD NOT GIVE THIS TALK 461 00:18:55,362 --> 00:18:57,931 WITHOUT SAYING A LITTLE THING 462 00:18:57,931 --> 00:19:00,067 ABOUT GENERATIVE MODELS. 
463 00:19:00,067 --> 00:19:01,301 GENERATIVE MODELS ARE DIFFERENT. 464 00:19:01,301 --> 00:19:03,170 SUPPOSE YOU HAVE A SOURCE HERE 465 00:19:03,170 --> 00:19:08,175 ON THE TOP, AND YOU WANT TO 466 00:19:08,175 --> 00:19:08,809 GENERATE FAKE DATA, ARTIFICIAL 467 00:19:08,809 --> 00:19:12,579 DATA, OUT OF THE SOURCE. 468 00:19:12,579 --> 00:19:15,249 WHAT YOU DO IS THIS WAS DONE IN 469 00:19:15,249 --> 00:19:18,819 PART IN RESPONSE TO SAYING, 470 00:19:18,819 --> 00:19:21,154 WELL, NEURAL NETWORKS, THOSE 471 00:19:21,154 --> 00:19:22,122 MODELS ARE NOT CREATIVE, THEY 472 00:19:22,122 --> 00:19:24,825 CAN NOT GENERATE ANYTHING NEW. 473 00:19:24,825 --> 00:19:25,826 WELL, GENERATIVE MODELS CAN. 474 00:19:25,826 --> 00:19:27,928 THE WAY THEY DO IT, THEY HAVE 475 00:19:27,928 --> 00:19:30,864 TWO COMPONENTS NOW. 476 00:19:30,864 --> 00:19:32,966 ONE IS GENERATOR THAT PREDICTS 477 00:19:32,966 --> 00:19:33,467 FAKE DATA. 478 00:19:33,467 --> 00:19:36,436 THE OTHER ONE IS DISCRIMINATOR 479 00:19:36,436 --> 00:19:37,471 NETWORK, EXACTLY THE CLASSIFIER 480 00:19:37,471 --> 00:19:40,807 THAT I WAS TALKING TO YOU ABOUT 481 00:19:40,807 --> 00:19:42,442 BEFORE. 482 00:19:42,442 --> 00:19:45,412 SO, AGAIN, JUST QUICKLY, 483 00:19:45,412 --> 00:19:46,480 TRAINING A GENERATIVE 484 00:19:46,480 --> 00:19:47,514 ADVERSARIAL NETWORK AGAIN, WHICH 485 00:19:47,514 --> 00:19:52,553 IS USED IN LARGE LANGUAGE MODELS 486 00:19:52,553 --> 00:19:54,821 THAT CHAT GPT CONSISTS OF, REAL 487 00:19:54,821 --> 00:19:57,858 DATA, YOU BUILD A GENERATOR THAT 488 00:19:57,858 --> 00:19:59,259 CREATES ARTIFICIAL DATA, THEN 489 00:19:59,259 --> 00:20:00,861 YOU THROW REAL DATA AND 490 00:20:00,861 --> 00:20:02,930 ARTIFICIAL DATA INTO A 491 00:20:02,930 --> 00:20:04,998 CLASSIFIER, IT CLASSIFIES INTO 492 00:20:04,998 --> 00:20:07,267 REAL OR ARTIFICIAL. 
493 00:20:07,267 --> 00:20:11,138 SO THE GOAL HERE IS IF YOU CAN 494 00:20:11,138 --> 00:20:14,508 CLEARLY CLASSIFY TO REAL AND 495 00:20:14,508 --> 00:20:15,809 ARTIFICIAL YOU GIVE FEEDBACK TO 496 00:20:15,809 --> 00:20:17,744 GENERATOR AND SAID, WELL, YOU'RE 497 00:20:17,744 --> 00:20:20,314 NOT DOING REALISTIC ENOUGH DATA. 498 00:20:20,314 --> 00:20:23,717 AND THEN THE GENERATOR ITERATES, 499 00:20:23,717 --> 00:20:25,552 CREATES NEW TYPES OF ARTIFICIAL 500 00:20:25,552 --> 00:20:27,454 DATA, YOU TRY THIS UNTIL 501 00:20:27,454 --> 00:20:29,723 CLASSIFIER CAN NO LONGER TELL 502 00:20:29,723 --> 00:20:31,625 THE DIFFERENCE, AND THAT'S WHEN 503 00:20:31,625 --> 00:20:34,528 YOU TRAINED THIS GENERATOR IN A 504 00:20:34,528 --> 00:20:36,496 WAY THAT PRODUCED FAKE DATA 505 00:20:36,496 --> 00:20:38,699 THAT'S REAL ENOUGH. 506 00:20:38,699 --> 00:20:41,835 SO, THIS IS ABOUT TRAINING 507 00:20:41,835 --> 00:20:42,469 GENERATIVE ADVERSARIAL NETWORK 508 00:20:42,469 --> 00:20:47,608 THAT, AGAIN, HAS MILLIONS OF 509 00:20:47,608 --> 00:20:48,442 PARAMETERS AND COMPUTATIONAL 510 00:20:48,442 --> 00:20:49,610 VERY EXPENSIVE. 511 00:20:49,610 --> 00:20:51,712 HOWEVER, ONCE IT'S DONE, IT'S A 512 00:20:51,712 --> 00:20:52,713 SET OF MULTIPLICATIONS YOU HAVE 513 00:20:52,713 --> 00:20:54,381 TO DO. 
514 00:20:54,381 --> 00:20:59,753 SO USING IT AGAIN OR USING A 515 00:20:59,753 --> 00:21:00,921 GENERATIVE ADVERSARIAL NETWORK 516 00:21:00,921 --> 00:21:01,755 LOCALLY, FOR EXAMPLE, 517 00:21:01,755 --> 00:21:04,424 CREATING -- USING A LARGE 518 00:21:04,424 --> 00:21:06,860 LANGUAGE MODEL LOCALLY, ALL YOU 519 00:21:06,860 --> 00:21:09,563 NEED IS A GENERATOR THAT 520 00:21:09,563 --> 00:21:11,832 COMES ALREADY PACKAGED, AND 521 00:21:11,832 --> 00:21:14,768 THEN YOU USE YOUR DATA TO 522 00:21:14,768 --> 00:21:16,870 GENERATE NEW DATA, IN THIS CASE 523 00:21:16,870 --> 00:21:19,673 HERE IN CLINICAL MEDICINE IT 524 00:21:19,673 --> 00:21:21,975 COULD BE BULLET POINTS ABOUT A 525 00:21:21,975 --> 00:21:24,111 PATIENT, FOR EXAMPLE, OR 526 00:21:24,111 --> 00:21:27,848 RESPONSE TO A PATIENT REQUEST. 527 00:21:27,848 --> 00:21:29,816 AND THE GENERATOR CREATES TEXT 528 00:21:29,816 --> 00:21:32,653 THAT ANSWERS THAT QUESTION, AND 529 00:21:32,653 --> 00:21:35,756 THE ARTIFICIAL DATA IS CREATED. 530 00:21:35,756 --> 00:21:37,791 THE PHYSICIAN VETS AND SAYS, 531 00:21:37,791 --> 00:21:40,160 YES, IT'S CORRECT, OR, NO, IT'S 532 00:21:40,160 --> 00:21:40,627 NOT CORRECT. 533 00:21:40,627 --> 00:21:42,829 AND THEN IT RETURNS OR NOT TO 534 00:21:42,829 --> 00:21:43,830 THE PATIENT. 535 00:21:43,830 --> 00:21:46,366 NOW, PEOPLE HAVE BEEN SAYING, 536 00:21:46,366 --> 00:21:48,568 WELL, BUT THEY HAVE -- 537 00:21:48,568 --> 00:21:50,003 GENERATORS HAVE HALLUCINATIONS 538 00:21:50,003 --> 00:21:55,042 AND THEY COME WITH VERY 539 00:21:55,042 --> 00:21:55,876 UNFEASIBLE DATA. 540 00:21:55,876 --> 00:21:57,978 IT'S EXACTLY WHAT THEY WERE 541 00:21:57,978 --> 00:21:58,645 CREATED FOR. 
542 00:21:58,645 --> 00:22:00,814 REMEMBER, THEY ARE -- THEY WERE 543 00:22:00,814 --> 00:22:02,649 CREATED TO GENERATE NEW DATA, SO 544 00:22:02,649 --> 00:22:07,821 I THINK IT'S SOMEWHAT UNFAIR TO 545 00:22:07,821 --> 00:22:09,256 SAY IT'S HALLUCINATIONS, WHAT IT 546 00:22:09,256 --> 00:22:11,124 REALLY IS THEY WERE NOT TRAINED 547 00:22:11,124 --> 00:22:12,526 ENOUGH AND THEY WERE NOT TRAINED 548 00:22:12,526 --> 00:22:15,495 IN YOUR TYPE OF DATA IN ORDER 549 00:22:15,495 --> 00:22:17,664 NOT TO GENERATE SOMETHING THAT 550 00:22:17,664 --> 00:22:18,999 WASN'T USEFUL. 551 00:22:18,999 --> 00:22:21,201 SO, AGAIN, THIS FEEDBACK LOOP, 552 00:22:21,201 --> 00:22:22,703 ONCE YOU STOP TRAINING, IT 553 00:22:22,703 --> 00:22:24,571 DOESN'T EXIST. 554 00:22:24,571 --> 00:22:27,841 SO THAT'S A PROBLEM BECAUSE IF 555 00:22:27,841 --> 00:22:29,843 YOU WANT A LARGE LANGUAGE MODEL 556 00:22:29,843 --> 00:22:32,479 TO BE TRAINING YOUR DATA, AGAIN, 557 00:22:32,479 --> 00:22:34,648 THE COMPUTATIONAL EXPENSE WILL 558 00:22:34,648 --> 00:22:36,983 BE HIGH, AND, YOU KNOW, IT'S NOT 559 00:22:36,983 --> 00:22:38,418 SOMETHING EASY TO DO. 560 00:22:38,418 --> 00:22:41,488 BUT IF YOU WANT IT TO BE TRAINED 561 00:22:41,488 --> 00:22:42,989 TOGETHER WITH OTHER DATA, YOU 562 00:22:42,989 --> 00:22:47,761 MIGHT NEED TO SUBMIT YOUR REAL 563 00:22:47,761 --> 00:22:49,596 DATA ELSEWHERE WHICH ALSO MIGHT 564 00:22:49,596 --> 00:22:52,199 CREATE A PRIVACY ISSUE. 565 00:22:52,199 --> 00:22:54,835 IN TERMS OF DEMYSTIFYING MODELS 566 00:22:54,835 --> 00:22:57,604 THERE ARE REASONS TO USE NEURAL 567 00:22:57,604 --> 00:23:04,644 NETWORKS AND REGRESSION, AGAIN 568 00:23:04,644 --> 00:23:05,979 LARGE LANGUAGE MODELS WITHOUT 569 00:23:05,979 --> 00:23:08,782 LOCAL TRAINING ARE OF CONCERN. 570 00:23:08,782 --> 00:23:10,517 ONCE IT'S CALCULATED, FIXED, IT 571 00:23:10,517 --> 00:23:11,585 IS REPRODUCIBLE. 572 00:23:11,585 --> 00:23:14,855 SO A.I. MODELS ARE REPRODUCIBLE. 
573 00:23:14,855 --> 00:23:19,126 LET ME TALK A LITTLE BIT ABOUT 574 00:23:19,126 --> 00:23:19,993 PERFORMANCE MEASURES YOU SHOULD 575 00:23:19,993 --> 00:23:21,895 BE AWARE FOR RESPONSIBLE USE OF 576 00:23:21,895 --> 00:23:23,697 THOSE MODELS BECAUSE MODEL CAN 577 00:23:23,697 --> 00:23:27,267 BE A MEDICAL DEVICE. 578 00:23:27,267 --> 00:23:28,802 SO, WILL CLINICIANS USE IT, HOW 579 00:23:28,802 --> 00:23:30,504 WILL THEY USE IT, IS THERE A 580 00:23:30,504 --> 00:23:34,074 CONTROL GROUP BEFORE YOU FULLY 581 00:23:34,074 --> 00:23:35,242 IMPLEMENT THAT IN CLINIC, WHEN 582 00:23:35,242 --> 00:23:38,445 WILL THE MODEL BE EVALUATED OR 583 00:23:38,445 --> 00:23:40,413 REEVALUATE AND HOW. 584 00:23:40,413 --> 00:23:42,182 PERSONALIZED MEDICINE, A LOT OF 585 00:23:42,182 --> 00:23:45,986 IT WILL DEPEND ON HOW GOOD THIS 586 00:23:45,986 --> 00:23:47,387 PREDICTIVE MODELS ARE. 587 00:23:47,387 --> 00:23:51,224 AND I GAVE THE EXAMPLE OF ACUTE 588 00:23:51,224 --> 00:23:52,859 KIDNEY INJURY BEING A MAJOR 589 00:23:52,859 --> 00:23:55,295 CAUSE OF MORBIDITY AND 590 00:23:55,295 --> 00:23:56,163 MORTALITY. 591 00:23:56,163 --> 00:23:59,599 THERE ARE HUNDREDS OF PREDICTION 592 00:23:59,599 --> 00:24:01,501 MODELS FOR AKI OUT THERE, COMING 593 00:24:01,501 --> 00:24:04,704 FROM ALL OVER THE PLACE. 594 00:24:04,704 --> 00:24:05,672 MANY PREDICTIVE MODELS EXIST, 595 00:24:05,672 --> 00:24:09,042 SOME ARE DEEP, SOME ARE NOT. 596 00:24:09,042 --> 00:24:12,112 HOW THE MODEL IS EVALUATED. 597 00:24:12,112 --> 00:24:16,049 SO THIS IS SOMETHING, A MODEL WE 598 00:24:16,049 --> 00:24:19,553 PRODUCED USING UCSD DATA WHILE 599 00:24:19,553 --> 00:24:21,054 AGO, WITH NEPHROLOGISTS. 600 00:24:21,054 --> 00:24:23,190 IF YOU'RE FAMILIAR WITH 601 00:24:23,190 --> 00:24:26,226 PERFORMANCE METRICS, AND AREA 602 00:24:26,226 --> 00:24:28,161 UNDER THE CURVE, 81 IS 603 00:24:28,161 --> 00:24:30,163 REASONABLE, WITHIN THE 604 00:24:30,163 --> 00:24:31,498 CONFIDENCE INTERVALS. 
605 00:24:31,498 --> 00:24:33,033 PRESENTED HERE IS REASONABLE, SO 606 00:24:33,033 --> 00:24:36,636 THIS MODEL COULD BE USED IN 607 00:24:36,636 --> 00:24:36,903 PRACTICE. 608 00:24:36,903 --> 00:24:39,372 HOWEVER, WHEN WE WERE VALIDATING 609 00:24:39,372 --> 00:24:44,110 IT WITH A MAYO CLINIC COHORT, WE 610 00:24:44,110 --> 00:24:45,278 NOTED A PARTICULAR PROBLEM. 611 00:24:45,278 --> 00:24:48,181 IN HERE I'LL EXPLAIN WHY, BUT 612 00:24:48,181 --> 00:24:49,349 YOU WANT THE ACTUAL PROBABILITY 613 00:24:49,349 --> 00:24:54,354 FOR A GROUP OF PATIENTS TO BE 614 00:24:54,354 --> 00:24:56,790 ALONG THE 45-DEGREE LINE, THAT IS, 615 00:24:56,790 --> 00:24:58,758 CLOSE TO THE ESTIMATED AVERAGE 616 00:24:58,758 --> 00:24:59,693 PROBABILITY FOR THAT GROUP. 617 00:24:59,693 --> 00:25:03,763 YOU SEE, FOR EXAMPLE, IF I GO 618 00:25:03,763 --> 00:25:06,466 HERE TO THE ESTIMATED 619 00:25:06,466 --> 00:25:08,969 PROBABILITY OF 0.4, 40% CHANCE 620 00:25:08,969 --> 00:25:13,206 OF HAVING AKI FOR A PARTICULAR 621 00:25:13,206 --> 00:25:17,844 GROUP, AT THE UCSD SAMPLE IT WAS 622 00:25:17,844 --> 00:25:19,379 FINE, RELATIVELY CLOSE, 623 00:25:19,379 --> 00:25:21,047 45-DEGREE LINE, WHEREAS AT THE 624 00:25:21,047 --> 00:25:23,750 MAYO CLINIC THIS ACTUAL 625 00:25:23,750 --> 00:25:25,585 PROBABILITY WAS ACTUALLY 626 00:25:25,585 --> 00:25:26,620 DOUBLED, 0.8. 627 00:25:26,620 --> 00:25:29,489 THEREFORE THE MODEL WAS NOT WELL 628 00:25:29,489 --> 00:25:31,558 CALIBRATED, IN GENERAL, TO THE 629 00:25:31,558 --> 00:25:35,462 MAYO COHORT, IF YOU SEE ALL THE 630 00:25:35,462 --> 00:25:39,833 POINTS BEING ON THE UPPER SIDE 631 00:25:39,833 --> 00:25:40,734 OF THE CURVE. 632 00:25:40,734 --> 00:25:43,303 HOW DO YOU EVALUATE THESE 633 00:25:43,303 --> 00:25:43,904 MODELS? 
634 00:25:43,904 --> 00:25:45,272 I'LL GO QUICKLY, JUST REMIND 635 00:25:45,272 --> 00:25:48,808 YOU, YOU HAVE ON THE X-AXIS 636 00:25:48,808 --> 00:25:49,676 PREDICTION ESTIMATES AND ON THE 637 00:25:49,676 --> 00:25:51,177 Y-AXIS THE NUMBER OF PEOPLE. 638 00:25:51,177 --> 00:25:53,079 LET'S SAY YOU HAVE A POPULATION 639 00:25:53,079 --> 00:25:54,981 THAT IS NORMAL, AND A POPULATION 640 00:25:54,981 --> 00:25:56,917 WITH A PARTICULAR DISEASE. 641 00:25:56,917 --> 00:25:58,551 YOU HAVE ALMOST LIKE HAVING A 642 00:25:58,551 --> 00:26:00,120 TEST, RIGHT? 643 00:26:00,120 --> 00:26:02,789 HAVING THIS PREDICTIVE MODEL 644 00:26:02,789 --> 00:26:05,025 SAYING IF IT'S ABOVE 50% THEN I 645 00:26:05,025 --> 00:26:06,493 WILL DECLARE THE PERSON HAS 646 00:26:06,493 --> 00:26:08,929 DISEASE, IF IT'S BELOW I WILL 647 00:26:08,929 --> 00:26:11,364 DECLARE A PERSON DOES NOT HAVE 648 00:26:11,364 --> 00:26:11,932 DISEASE. 649 00:26:11,932 --> 00:26:13,333 AND WITH THAT, OF COURSE, YOU 650 00:26:13,333 --> 00:26:16,603 WILL HAVE A LOT OF TRUE 651 00:26:16,603 --> 00:26:18,571 NEGATIVES, TRUE POSITIVES, A FEW 652 00:26:18,571 --> 00:26:20,573 FALSE NEGATIVES, A FEW FALSE 653 00:26:20,573 --> 00:26:25,178 POSITIVES FOR THIS PARTICULAR 654 00:26:25,178 --> 00:26:28,915 THRESHOLD OF 0.5. 655 00:26:28,915 --> 00:26:30,283 NOW, EACH THRESHOLD WILL CREATE 656 00:26:30,283 --> 00:26:34,854 A TWO BY TWO TABLE, YOU CAN 657 00:26:34,854 --> 00:26:35,855 CALCULATE SPECIFICITY, POSITIVE 658 00:26:35,855 --> 00:26:37,357 AND NEGATIVE PREDICTIVE VALUE, 659 00:26:37,357 --> 00:26:38,258 OVERALL ACCURACY. 
660 00:26:38,258 --> 00:26:40,360 YOU JUST LOOK AT TWO BY TWO 661 00:26:40,360 --> 00:26:45,966 TABLE, NOTHING ELSE BUT THE REAL 662 00:26:45,966 --> 00:26:47,534 GOLD STANDARD, NO DISEASE OR 663 00:26:47,534 --> 00:26:49,569 DISEASE, AND WHAT THE ESTIMATE 664 00:26:49,569 --> 00:26:51,338 IS FOR THAT PARTICULAR 665 00:26:51,338 --> 00:26:56,042 THRESHOLD, NO DISEASE OR 666 00:26:56,042 --> 00:26:56,376 DISEASE. 667 00:26:56,376 --> 00:27:00,013 SO, WE KNOW THAT FOR A FACT, AND WE 668 00:27:00,013 --> 00:27:04,651 CAN DO THE CALCULATION AND SOME 669 00:27:04,651 --> 00:27:06,453 NUMBERS, SENSITIVITY OF 80%, 670 00:27:06,453 --> 00:27:07,954 SPECIFICITY OF 90, AND SO ON, 671 00:27:07,954 --> 00:27:10,390 THAT'S FOR THIS PARTICULAR 672 00:27:10,390 --> 00:27:10,657 THRESHOLD. 673 00:27:10,657 --> 00:27:13,159 NOW, YOU CAN DO ANOTHER 674 00:27:13,159 --> 00:27:14,761 THRESHOLD IN WHICH YOU'RE 675 00:27:14,761 --> 00:27:16,296 CONSERVATIVE AND WILL SAY PEOPLE 676 00:27:16,296 --> 00:27:18,198 HAVE DISEASE WITH LOWER 677 00:27:18,198 --> 00:27:21,401 THRESHOLD, AND YOU HAVE A 678 00:27:21,401 --> 00:27:22,502 SENSITIVITY OF 100% AND 679 00:27:22,502 --> 00:27:24,237 SPECIFICITY OF 80%. 680 00:27:24,237 --> 00:27:26,272 OR YOU CAN GO IN THE MIDDLE LIKE 681 00:27:26,272 --> 00:27:29,976 WE SAID OR YOU CAN GO MORE 682 00:27:29,976 --> 00:27:32,178 LIBERAL AND YOU WILL SAY I WILL 683 00:27:32,178 --> 00:27:33,813 ONLY DECLARE DISEASE IF I'M 684 00:27:33,813 --> 00:27:36,649 REALLY SURE ABOUT IT, SO HIGHER 685 00:27:36,649 --> 00:27:40,020 THRESHOLD FOR THIS ESTIMATE, 686 00:27:40,020 --> 00:27:41,221 SPECIFICITY EQUALS 100%, 687 00:27:41,221 --> 00:27:42,789 SENSITIVITY ONLY 60%. 688 00:27:42,789 --> 00:27:45,158 SO I GAVE YOU THREE THRESHOLDS 689 00:27:45,158 --> 00:27:47,694 AND THREE TWO BY TWO TABLES, 690 00:27:47,694 --> 00:27:47,894 RIGHT? 
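The three thresholds described above can be reproduced with a small sketch. The estimates below are made-up numbers chosen only so that the three thresholds give the sensitivity and specificity values mentioned in the talk; the function names and the 0.4 / 0.5 / 0.7 cutoffs are illustrative assumptions, not the speaker's data.

```python
def two_by_two(estimates, has_disease, threshold):
    """Count (TP, FP, TN, FN) when 'estimate >= threshold' is declared disease."""
    tp = fp = tn = fn = 0
    for p, diseased in zip(estimates, has_disease):
        declared = p >= threshold
        if declared and diseased:
            tp += 1
        elif declared and not diseased:
            fp += 1
        elif not declared and diseased:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(estimates, has_disease, threshold):
    """Return (sensitivity, specificity) for one threshold."""
    tp, fp, tn, fn = two_by_two(estimates, has_disease, threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: 10 people without disease, then 5 with disease.
ESTIMATES = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.25, 0.3, 0.45, 0.6,
             0.42, 0.55, 0.75, 0.8, 0.9]
HAS_DISEASE = [False] * 10 + [True] * 5

if __name__ == "__main__":
    for t in (0.4, 0.5, 0.7):  # low, middle, and high threshold
        sens, spec = sensitivity_specificity(ESTIMATES, HAS_DISEASE, t)
        print(f"threshold={t}: sensitivity={sens:.0%}, specificity={spec:.0%}")
```

Each threshold yields its own two-by-two table, so each contributes one (1 - specificity, sensitivity) point to an ROC plot.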
691 00:27:47,894 --> 00:27:52,165 SO IF YOU PUT THEM ALL ON A PLOT 692 00:27:52,165 --> 00:27:53,767 OF SENSITIVITY OVER ONE MINUS 693 00:27:53,767 --> 00:27:57,771 SPECIFICITY YOU HAVE AN ROC 694 00:27:57,771 --> 00:28:00,273 CURVE, NOTHING BUT THE PLOT OF 695 00:28:00,273 --> 00:28:02,442 THE TWO VALUES. 696 00:28:02,442 --> 00:28:04,010 AND ROC CURVES YOU'RE MORE USED 697 00:28:04,010 --> 00:28:07,047 TO THEY ARE MORE CONTINUOUS LIKE 698 00:28:07,047 --> 00:28:07,547 THIS. 699 00:28:07,547 --> 00:28:11,217 NOW, YOU CAN TELL FROM THIS THAT 700 00:28:11,217 --> 00:28:16,489 IF YOU HAVE 45-DEGREE LINE THAT 701 00:28:16,489 --> 00:28:18,425 MEANS NO DISCRIMINATION, NO 702 00:28:18,425 --> 00:28:20,293 VALUE IN THIS CLASSIFIER, AND IF 703 00:28:20,293 --> 00:28:23,263 YOU HAVE -- THAT COMES WITH AN 704 00:28:23,263 --> 00:28:26,566 AREA UNDER THE ROC CURVE OF 0.5. 705 00:28:26,566 --> 00:28:28,902 NOW, YOU TYPICALLY HAVE AN AREA 706 00:28:28,902 --> 00:28:32,539 THAT IS BETWEEN 0.5 AND 1, 707 00:28:32,539 --> 00:28:36,576 OTHERWISE YOUR MODEL DOESN'T DO 708 00:28:36,576 --> 00:28:37,077 ANYTHING. 709 00:28:37,077 --> 00:28:38,144 AND YOU CALCULATE THINGS 710 00:28:38,144 --> 00:28:38,912 THIS WAY. 711 00:28:38,912 --> 00:28:41,881 THERE ARE MANY PAPERS THAT COME 712 00:28:41,881 --> 00:28:43,616 ONLY WITH PERFORMANCE METRIC OF 713 00:28:43,616 --> 00:28:46,286 THE AREA UNDER THE ROC CURVE 714 00:28:46,286 --> 00:28:49,456 WHICH IS A DISCRIMINATIVE 715 00:28:49,456 --> 00:28:50,123 PERFORMANCE. 716 00:28:50,123 --> 00:28:51,925 BASED ON SENSITIVITY AND 717 00:28:51,925 --> 00:28:52,425 SPECIFICITY ALONE. 718 00:28:52,425 --> 00:28:54,160 BUT THEY DON'T TELL ANYTHING 719 00:28:54,160 --> 00:28:56,029 ABOUT THE QUALITY OF THOSE 720 00:28:56,029 --> 00:28:56,429 ESTIMATES. 721 00:28:56,429 --> 00:28:58,231 THEY TALK ABOUT THE ORDER BUT 722 00:28:58,231 --> 00:28:58,965 NOT THE QUALITY. 
723 00:28:58,965 --> 00:29:02,035 LET ME GIVE AN EXAMPLE OF HOW 724 00:29:02,035 --> 00:29:03,002 THIS IS. 725 00:29:03,002 --> 00:29:04,504 LET'S SAY OF THE ORIGINAL 726 00:29:04,504 --> 00:29:06,539 ESTIMATES, LET'S SAY WE ONLY 727 00:29:06,539 --> 00:29:09,309 HAVE 20 PATIENTS AND WE 728 00:29:09,309 --> 00:29:15,415 ESTIMATED A VALUE FOR THOSE WHO 729 00:29:15,415 --> 00:29:16,116 WERE ALIVE AND DECEASED, THESE 730 00:29:16,116 --> 00:29:17,350 ARE THE NUMBERS. 731 00:29:17,350 --> 00:29:19,552 I ORDERED THEM HERE AND PUT IN 732 00:29:19,552 --> 00:29:21,054 THE TWO COLUMNS. 733 00:29:21,054 --> 00:29:23,256 AN INTERPRETATION OF THE AREA 734 00:29:23,256 --> 00:29:26,126 UNDER THE ROC CURVE IS THE 735 00:29:26,126 --> 00:29:28,695 CALCULATION OF HOW MANY TIMES IF 736 00:29:28,695 --> 00:29:31,464 I DO PAIRS OF ALIVE AND DECEASED 737 00:29:31,464 --> 00:29:34,868 PATIENTS, HOW MANY TIMES I'M 738 00:29:34,868 --> 00:29:36,002 RIGHT, MEANING THIS VALUE FOR 739 00:29:36,002 --> 00:29:38,538 ALIVE HAS TO BE SMALLER THAN A 740 00:29:38,538 --> 00:29:40,373 VALUE FOR THE DECEASED. 741 00:29:40,373 --> 00:29:41,474 IF EVERYTHING WERE PERFECT, THEN 742 00:29:41,474 --> 00:29:45,712 YOU WOULD HAVE A DIVIDING LINE, 743 00:29:45,712 --> 00:29:48,848 LET'S SAY AT 0.3, AND AFTER THAT 744 00:29:48,848 --> 00:29:50,350 EVERYONE WOULD HAVE DISEASE OR 745 00:29:50,350 --> 00:29:50,917 NOT. 746 00:29:50,917 --> 00:29:52,652 BUT AS YOU CAN SEE, THAT'S NOT 747 00:29:52,652 --> 00:29:53,887 THE CASE. 748 00:29:53,887 --> 00:29:55,388 SO, ONE INTERPRETATION IS TO 749 00:29:55,388 --> 00:29:58,091 CALCULATE THE AREA UNDER THE ROC 750 00:29:58,091 --> 00:30:04,931 CURVE, AS THE NUMBER OF 751 00:30:04,931 --> 00:30:06,733 CONCORDANT PAIRS PLUS HALF THE TIES, DIVIDED BY 100 752 00:30:06,733 --> 00:30:07,967 PAIRS, TEN ON THIS SIDE, TEN ON 753 00:30:07,967 --> 00:30:08,635 THIS OTHER SIDE. 
754 00:30:08,635 --> 00:30:11,171 IF YOU DO THAT, YOU WILL SEE 755 00:30:11,171 --> 00:30:13,239 THAT YOU WILL CALCULATE AN AREA 756 00:30:13,239 --> 00:30:16,776 UNDER THE ROC CURVE, IN THIS 757 00:30:16,776 --> 00:30:17,944 CASE 0.785. 758 00:30:17,944 --> 00:30:21,281 AND YOU SAY, WELL, THAT'S A GOOD 759 00:30:21,281 --> 00:30:21,548 MODEL. 760 00:30:21,548 --> 00:30:22,248 ACCEPTABLE MODEL. 761 00:30:22,248 --> 00:30:24,117 HERE ARE THE ESTIMATES. 762 00:30:24,117 --> 00:30:26,186 HOW DO I KNOW THESE ESTIMATES 763 00:30:26,186 --> 00:30:26,953 ARE CORRECT? 764 00:30:26,953 --> 00:30:29,789 AND THE ANSWER IS I DON'T, 765 00:30:29,789 --> 00:30:31,457 BECAUSE ROC CURVES DON'T DO 766 00:30:31,457 --> 00:30:32,058 THAT. 767 00:30:32,058 --> 00:30:34,494 IN FACT, IF YOU DIVIDE ALL 768 00:30:34,494 --> 00:30:37,130 ESTIMATES BY 10, AS I'M SAYING 769 00:30:37,130 --> 00:30:38,531 HERE, THE ORDER REMAINS THE 770 00:30:38,531 --> 00:30:41,301 SAME, THE NUMBER OF CONCORDANT 771 00:30:41,301 --> 00:30:44,103 AND DISCORDANT PAIRS REMAINS THE 772 00:30:44,103 --> 00:30:44,971 SAME. 773 00:30:44,971 --> 00:30:47,640 YET IT'S MUCH BETTER, RIGHT? 774 00:30:47,640 --> 00:30:50,143 RISK FOR DISEASE, I WOULD MUCH 775 00:30:50,143 --> 00:30:55,348 RATHER HAVE A RISK OF 2.7% THAN 776 00:30:55,348 --> 00:30:57,016 27% RISK. 777 00:30:57,016 --> 00:30:59,586 SO, THIS IS CALLED, YOU KNOW, 778 00:30:59,586 --> 00:31:02,121 DIFFERENCES IN CALIBRATION OF 779 00:31:02,121 --> 00:31:04,424 THE MODELS, THERE ARE METHODS TO 780 00:31:04,424 --> 00:31:05,792 RECALIBRATE WHICH WE'VE BEEN 781 00:31:05,792 --> 00:31:08,861 APPLYING, AND IN THIS CASE YOU 782 00:31:08,861 --> 00:31:11,364 SEE THE BEST RECALIBRATION WOULD 783 00:31:11,364 --> 00:31:13,700 BRING THAT PARTICULAR ESTIMATE, 784 00:31:13,700 --> 00:31:16,102 ALL THE WAY FROM 27 TO 78%, 785 00:31:16,102 --> 00:31:18,571 WHICH IS A HUGE DIFFERENCE. 
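The pairwise reading of the area under the ROC curve, and the point that dividing every estimate by 10 leaves it unchanged, can be checked with a short sketch. The eight estimates below are invented for illustration (they are not the 20 patients on the slide), and counting a tie as half a concordant pair is the usual convention, assumed here.

```python
def auc_by_pairs(alive, deceased):
    """AUC as the share of (alive, deceased) pairs ordered correctly.

    A pair is concordant when the alive patient's estimate is lower than the
    deceased patient's estimate; a tie counts as half a concordant pair.
    """
    concordant = 0
    ties = 0
    for a in alive:
        for d in deceased:
            if a < d:
                concordant += 1
            elif a == d:
                ties += 1
    return (concordant + 0.5 * ties) / (len(alive) * len(deceased))


# Hypothetical risk estimates for 4 survivors and 4 patients who died.
ALIVE = [0.1, 0.2, 0.3, 0.4]
DECEASED = [0.3, 0.5, 0.6, 0.2]

# Same ordering, ten times smaller absolute risk.
ALIVE_SCALED = [p / 10 for p in ALIVE]
DECEASED_SCALED = [p / 10 for p in DECEASED]

if __name__ == "__main__":
    print(auc_by_pairs(ALIVE, DECEASED))                # 0.75
    print(auc_by_pairs(ALIVE_SCALED, DECEASED_SCALED))  # 0.75
```

Both calls return the same value because only the ordering of the estimates enters the calculation, which is why discrimination alone says nothing about whether 27% or 2.7% is the right number.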
786 00:31:18,571 --> 00:31:22,008 SO, THE WAY YOU APPLY THE MODELS 787 00:31:22,008 --> 00:31:23,376 IN CLINICAL MEDICINE, IT'S 788 00:31:23,376 --> 00:31:25,311 EXTREMELY IMPORTANT THAT THE 789 00:31:25,311 --> 00:31:26,479 MODELS BE CALIBRATED. 790 00:31:26,479 --> 00:31:29,549 AND THIS CAN BE SEEN HERE THAT 791 00:31:29,549 --> 00:31:32,151 ALTHOUGH THE AREA UNDER THE ROC 792 00:31:32,151 --> 00:31:36,222 CURVE IS EXACTLY THE SAME, THERE 793 00:31:36,222 --> 00:31:37,957 ARE SIGNIFICANT DIFFERENCES IN 794 00:31:37,957 --> 00:31:39,726 TERMS OF ABSOLUTE ERROR OF THE 795 00:31:39,726 --> 00:31:41,961 ESTIMATES, AND SO ON. 796 00:31:41,961 --> 00:31:45,431 THIS IS A WHOLE NEW AREA THAT 797 00:31:45,431 --> 00:31:47,233 UNFORTUNATELY JOURNAL EDITORS 798 00:31:47,233 --> 00:31:49,636 HAVE NOT ALWAYS EMPHASIZED, BUT 799 00:31:49,636 --> 00:31:51,170 IT'S CRITICALLY IMPORTANT FOR 800 00:31:51,170 --> 00:31:54,607 CLINICAL MEDICINE. 801 00:31:54,607 --> 00:31:57,210 THERE ARE CALIBRATION INDICES IN 802 00:31:57,210 --> 00:32:00,179 THE, AGAIN, ESTIMATING THE 803 00:32:00,179 --> 00:32:01,681 PROBABILITY IS REASONABLY 804 00:32:01,681 --> 00:32:02,015 CORRECT. 805 00:32:02,015 --> 00:32:04,317 AND ONE WAY TO THINK ABOUT THIS 806 00:32:04,317 --> 00:32:07,320 IS HOW WELL DO PREDICTED 807 00:32:07,320 --> 00:32:09,255 PROBABILITIES MATCH THE ACTUAL 808 00:32:09,255 --> 00:32:11,257 PROBABILITIES, FOR EXAMPLE, 809 00:32:11,257 --> 00:32:14,627 PROGNOSIS OF CANCER AND SO ON. 
810 00:32:14,627 --> 00:32:16,229 ACTUAL PROBABILITY OF METASTASIS 811 00:32:16,229 --> 00:32:19,432 IS 10% FOR A PATIENT, IF 10 OUT 812 00:32:19,432 --> 00:32:23,403 OF 100 PATIENTS JUST LIKE THIS 813 00:32:23,403 --> 00:32:24,370 DEVELOP CANCER, THE WHOLE 814 00:32:24,370 --> 00:32:26,472 PROBLEM IS TO DECIDE WHO IS JUST 815 00:32:26,472 --> 00:32:31,144 LIKE THAT PATIENT, HOW YOU 816 00:32:31,144 --> 00:32:32,912 CALCULATE THAT NEIGHBOR PATIENTS 817 00:32:32,912 --> 00:32:34,614 FOR THAT PARTICULAR CASE. 818 00:32:34,614 --> 00:32:38,084 AND THE WAY IT IS DONE, IT IS 819 00:32:38,084 --> 00:32:39,919 IMPERFECT BUT IT HAPPENS TODAY, 820 00:32:39,919 --> 00:32:44,390 IS THAT YOU SORT ALL THE SYSTEM 821 00:32:44,390 --> 00:32:46,959 ESTIMATES, AVERAGE WITHIN THAT 822 00:32:46,959 --> 00:32:48,227 GROUP, YOU COMPARE TO THE 823 00:32:48,227 --> 00:32:49,862 PROPORTION OF EVENTS THAT 824 00:32:49,862 --> 00:32:51,664 ACTUALLY HAPPENED IN THAT GROUP. 825 00:32:51,664 --> 00:32:54,701 SO ON THE RIGHT-HAND SIDE IS THE 826 00:32:54,701 --> 00:32:56,569 REAL OUTCOMES, GOLD STANDARD, ON 827 00:32:56,569 --> 00:32:59,172 THE LEFT-HAND SIDE IS THE SYSTEM 828 00:32:59,172 --> 00:32:59,806 ESTIMATES. 829 00:32:59,806 --> 00:33:03,142 THIS IS ANY A.I. SYSTEM. 830 00:33:03,142 --> 00:33:04,410 ANY STATISTICAL LEARNING SYSTEM. 831 00:33:04,410 --> 00:33:06,379 AGAIN, YOU CAN CALCULATE THIS 832 00:33:06,379 --> 00:33:08,348 WAY, AND THEN PLOT THAT 833 00:33:08,348 --> 00:33:10,249 CALIBRATION CURVE THAT I 834 00:33:10,249 --> 00:33:15,355 MENTIONED TO YOU ABOUT, AND THAT 835 00:33:15,355 --> 00:33:18,191 WAS SO DIFFERENT BETWEEN THE 836 00:33:18,191 --> 00:33:20,626 EVALUATION DONE AT UCSD VERSUS 837 00:33:20,626 --> 00:33:23,463 THE EVALUATION DONE AT MAYO 838 00:33:23,463 --> 00:33:24,130 CLINIC. 
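The grouping procedure just described (sort the estimates, split them into groups, compare the average estimate in each group with the proportion of events observed there) can be sketched as follows. The equal-size grouping, the function name, and the toy numbers are assumptions for illustration; published calibration indices differ in how they form groups.

```python
def calibration_table(estimates, outcomes, n_groups):
    """Return one (mean_estimate, observed_proportion) row per group.

    Patients are sorted by estimate and cut into n_groups consecutive groups
    of (nearly) equal size; the last group absorbs any remainder.
    """
    pairs = sorted(zip(estimates, outcomes))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:end]
        mean_estimate = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(o for _, o in chunk) / len(chunk)
        rows.append((mean_estimate, observed))
    return rows


# Hypothetical system estimates and gold-standard outcomes (1 = event).
ESTIMATES = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5]
OUTCOMES = [0, 0, 0, 1, 1, 1, 0, 1]

if __name__ == "__main__":
    for mean_estimate, observed in calibration_table(ESTIMATES, OUTCOMES, 2):
        print(f"estimated {mean_estimate:.2f}  observed {observed:.2f}")
```

Plotting observed against estimated for each row gives the calibration curve; in this toy data both groups fall above the 45-degree line, the same pattern of underestimation described for the external cohort.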
839 00:33:24,130 --> 00:33:26,966 SO, AGAIN, METHODS EXIST, NOT 840 00:33:26,966 --> 00:33:29,702 ALWAYS USED, AND IT'S CRITICALLY 841 00:33:29,702 --> 00:33:33,573 IMPORTANT TO VALIDATE THE MODEL 842 00:33:33,573 --> 00:33:34,974 IN THE POPULATION THAT IT'S 843 00:33:34,974 --> 00:33:36,609 GOING TO BE USED, BUT THAT'S NOT 844 00:33:36,609 --> 00:33:39,979 ALWAYS DONE. 845 00:33:39,979 --> 00:33:42,448 AGAIN, HERE JUST SHOWING HOW 846 00:33:42,448 --> 00:33:43,750 UNDERESTIMATION WOULD HAPPEN IF 847 00:33:43,750 --> 00:33:46,452 WE HAD APPLIED THE UCSD MODEL TO 848 00:33:46,452 --> 00:33:50,590 THE MAYO COHORT WITHOUT 849 00:33:50,590 --> 00:33:51,224 ADJUSTMENTS. 850 00:33:51,224 --> 00:33:54,327 NOW, THERE WAS A REVIEW OF 851 00:33:54,327 --> 00:33:56,662 SYSTEMATIC REVIEWS FOR MODELS OF 852 00:33:56,662 --> 00:33:57,463 ACUTE KIDNEY INJURY, AGAIN 853 00:33:57,463 --> 00:34:01,033 THERE'S SO MANY OF THEM, WHAT 854 00:34:01,033 --> 00:34:04,570 AUTHORS FOUND IS THAT IN THIS -- 855 00:34:04,570 --> 00:34:09,609 EACH ONE OF THESE REVIEWS OF 856 00:34:09,609 --> 00:34:11,277 THIS, THIS IS SYSTEMATIC, SO 857 00:34:11,277 --> 00:34:14,514 EACH OF THE SYSTEMATIC REVIEWS, 858 00:34:14,514 --> 00:34:16,416 THE MODELS PRESENTED HAVE VERY, 859 00:34:16,416 --> 00:34:18,918 VERY FEW EXTERNAL VALIDATIONS SO 860 00:34:18,918 --> 00:34:26,659 YOU TAKE THE FIRST ROW, 861 00:34:26,659 --> 00:34:29,495 CORONARY ANGIOPLASTY PATIENTS 862 00:34:29,495 --> 00:34:31,798 ONLY, THERE WERE 70 PREDICTION 863 00:34:31,798 --> 00:34:32,799 MODELS WITH INTERNAL VALIDATION, 864 00:34:32,799 --> 00:34:35,868 MEANING AT THEIR OWN SITE THEY 865 00:34:35,868 --> 00:34:37,603 PERFORMED WELL, BUT THERE WERE 866 00:34:37,603 --> 00:34:39,572 ONLY 9 WITH EXTERNAL VALIDATION. 867 00:34:39,572 --> 00:34:41,541 SO, WHAT DOES THIS MEAN? 
868 00:34:41,541 --> 00:34:44,710 IT MEANS YOU CANNOT JUST CREATE 869 00:34:44,710 --> 00:34:47,580 A MODEL IN YOUR PLACE AND NOT 870 00:34:47,580 --> 00:34:48,681 VALIDATE ELSEWHERE. 871 00:34:48,681 --> 00:34:50,216 IN FACT, THE MORE YOU CAN SHARE 872 00:34:50,216 --> 00:34:53,252 HOW YOU DID YOUR MODEL AND 873 00:34:53,252 --> 00:34:56,556 PERHAPS AGGREGATE DATA FROM 874 00:34:56,556 --> 00:34:58,458 OTHER SITES, THE BETTER IT IS IN 875 00:34:58,458 --> 00:35:01,160 EXTERNAL VALIDATION. 876 00:35:01,160 --> 00:35:02,929 AGAIN, ALWAYS NECESSARY. 877 00:35:02,929 --> 00:35:05,498 SO, NOW WE GET INTO POLYGENIC 878 00:35:05,498 --> 00:35:06,532 RISK SCORES FOR PRECISION 879 00:35:06,532 --> 00:35:06,966 MEDICINE. 880 00:35:06,966 --> 00:35:09,068 WE'LL GET TO THAT POINT. 881 00:35:09,068 --> 00:35:10,803 AND THEY ARE GOING TO ASSESS 882 00:35:10,803 --> 00:35:12,605 RISK FOR MANY DISEASES. 883 00:35:12,605 --> 00:35:14,340 CAN WE IMPROVE THOSE METHODS? 884 00:35:14,340 --> 00:35:16,843 CAN THOSE METHODS BE 885 00:35:16,843 --> 00:35:17,944 CONTINUOUSLY UPDATED? 886 00:35:17,944 --> 00:35:20,313 CAN WE DO THIS WITHOUT 887 00:35:20,313 --> 00:35:22,748 INTRODUCING MORE BIAS THAT LEADS 888 00:35:22,748 --> 00:35:24,083 TO MORE DISCRIMINATION OF 889 00:35:24,083 --> 00:35:25,618 PEOPLE, CAN WE PROTECT PRIVACY 890 00:35:25,618 --> 00:35:26,886 OF THE INDIVIDUALS? 891 00:35:26,886 --> 00:35:29,255 WHAT TO DO WITH INDIVIDUALS FROM 892 00:35:29,255 --> 00:35:29,922 MIXED ANCESTRY? 893 00:35:29,922 --> 00:35:33,159 SO A LOT OF QUESTIONS NOW, AND 894 00:35:33,159 --> 00:35:35,161 WE KNOW FROM A.I. THAT WE NEED A 895 00:35:35,161 --> 00:35:36,796 LOT OF DATA. 896 00:35:36,796 --> 00:35:38,297 HOW CAN WE DO THIS? 897 00:35:38,297 --> 00:35:40,600 WELL, WE NEED TO BUILD ACCESS TO 898 00:35:40,600 --> 00:35:41,567 LARGE REPOSITORIES TO IMPROVE 899 00:35:41,567 --> 00:35:42,468 RESEARCH. 
900 00:35:42,468 --> 00:35:46,639 WE NEED TO DO THIS IN A WAY THAT 901 00:35:46,639 --> 00:35:48,174 PRESERVES PRIVACY AND AGGREGATE 902 00:35:48,174 --> 00:35:49,609 DATA FROM DIFFERENT COUNTRIES 903 00:35:49,609 --> 00:35:51,143 AND USE FOR NEW ANALYSIS. 904 00:35:51,143 --> 00:35:53,279 THAT GOES A LITTLE BIT BACK 905 00:35:53,279 --> 00:35:55,047 TOWARDS THE HOW TO PROTECT 906 00:35:55,047 --> 00:35:57,049 PRIVACY WHILE BUILDING 907 00:35:57,049 --> 00:36:00,086 EVALUATION MODELS, WHICH WE'VE 908 00:36:00,086 --> 00:36:02,054 BEEN VERY INTERESTED IN, BECAUSE 909 00:36:02,054 --> 00:36:04,023 WE USE ELECTRONIC HEALTH RECORDS 910 00:36:04,023 --> 00:36:05,691 A LOT, AND WE THINK IS IT RIGHT 911 00:36:05,691 --> 00:36:09,462 TO SHARE BECAUSE PEOPLE HAVE NOT 912 00:36:09,462 --> 00:36:10,830 ALWAYS BEEN ASKED THAT THEIR 913 00:36:10,830 --> 00:36:12,131 DATA BE USED FOR RESEARCH. 914 00:36:12,131 --> 00:36:16,936 IS IT RIGHT NOT TO SHARE? 915 00:36:16,936 --> 00:36:18,037 BECAUSE NEW DISCOVERIES DEPEND 916 00:36:18,037 --> 00:36:20,473 ON SHARING, BUILDING GOOD MODELS 917 00:36:20,473 --> 00:36:22,141 DEPENDS ON HAVING A LOT OF DATA. 918 00:36:22,141 --> 00:36:25,411 CAN WE SHARE WITHOUT MOVING DATA 919 00:36:25,411 --> 00:36:26,345 AROUND, ON FEDERATED ANALYSIS, 920 00:36:26,345 --> 00:36:28,915 IS THIS PRACTICAL, IS IT 921 00:36:28,915 --> 00:36:29,181 ACCURATE? 
922 00:36:29,181 --> 00:36:31,450 SO ITEMS THAT WE HAVE LOOKED FOR 923 00:36:31,450 --> 00:36:33,920 CERTAIN TIME, AND JUST HERE TO 924 00:36:33,920 --> 00:36:35,454 GIVE THE MOTIVATION, 925 00:36:35,454 --> 00:36:36,122 COMBINATIONS OF VALUES CAN 926 00:36:36,122 --> 00:36:37,890 IDENTIFY PEOPLE SO EVEN THOUGH 927 00:36:37,890 --> 00:36:40,560 YOU MIGHT NOT ENTER BIOMETRIC 928 00:36:40,560 --> 00:36:43,596 DATA YOU HAVE UNIQUE 929 00:36:43,596 --> 00:36:49,569 COMBINATIONS, YOU CAN -- YOU 930 00:36:49,569 --> 00:36:51,571 KNOW SOMETHING ABOUT ELIZA, YOU 931 00:36:51,571 --> 00:36:52,972 KNOW A AND B, YOU GO TO THE 932 00:36:52,972 --> 00:36:56,509 TABLE OF THE STUDY AND YOU MATCH 933 00:36:56,509 --> 00:36:56,976 THAT. 934 00:36:56,976 --> 00:37:00,079 IN FACT, YOU DON'T EVEN NEED 935 00:37:00,079 --> 00:37:00,680 UNIQUENESS. 936 00:37:00,680 --> 00:37:04,150 IF ALL THE ENTRIES YOU FIND, A 937 00:37:04,150 --> 00:37:08,187 EQUALS 10, B EQUALS 20, HAVE THE 938 00:37:08,187 --> 00:37:11,924 SAME DIAGNOSIS OR IF 99% IT, SO 939 00:37:11,924 --> 00:37:12,224 ON. 940 00:37:12,224 --> 00:37:14,193 THERE ARE A LOT OF ISSUES IN 941 00:37:14,193 --> 00:37:17,263 PRIVACY THAT EVEN SO CALLED 942 00:37:17,263 --> 00:37:22,902 DE-IDENTIFIED DATA NEEDS TO BE 943 00:37:22,902 --> 00:37:24,203 EXAMINED. 944 00:37:24,203 --> 00:37:27,139 GENOME ARE THE BEST BIOMETRICS, 945 00:37:27,139 --> 00:37:27,940 SHOULD THEY BE REGULATED THE 946 00:37:27,940 --> 00:37:29,742 SAME AS OTHER IDENTIFIERS IN 947 00:37:29,742 --> 00:37:29,942 HIPAA? 948 00:37:29,942 --> 00:37:33,145 THAT'S SOMETHING WE SHOULD THINK 949 00:37:33,145 --> 00:37:33,546 ABOUT. 
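The two disclosure risks described above (a combination of values that points to exactly one record, and a combination shared by several records that all carry the same diagnosis) can be checked on a study table with a short sketch. The field names `a`, `b`, and `dx` and the records themselves are hypothetical.

```python
from collections import defaultdict


def disclosure_risks(records, quasi_identifiers, sensitive):
    """Find risky combinations of quasi-identifier values.

    Returns (unique, homogeneous):
      unique      - combinations matching exactly one record
      homogeneous - combinations matching several records that all share
                    the same sensitive value (no uniqueness needed)
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record[sensitive])
    unique = [k for k, v in groups.items() if len(v) == 1]
    homogeneous = [k for k, v in groups.items()
                   if len(v) > 1 and len(set(v)) == 1]
    return unique, homogeneous


# Hypothetical "de-identified" study table.
RECORDS = [
    {"a": 10, "b": 20, "dx": "flu"},
    {"a": 10, "b": 20, "dx": "flu"},
    {"a": 10, "b": 30, "dx": "asthma"},
    {"a": 15, "b": 20, "dx": "flu"},
    {"a": 15, "b": 20, "dx": "asthma"},
]

if __name__ == "__main__":
    unique, homogeneous = disclosure_risks(RECORDS, ["a", "b"], "dx")
    print("unique combinations:", unique)
    print("homogeneous combinations:", homogeneous)
```

Anyone who knows that a person has A equal to 10 and B equal to 20 learns the diagnosis from this table even though two records match, which is the case where uniqueness is not needed.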
950 00:37:33,546 --> 00:37:35,848 THEY SHOULD BE TREATED AS 951 00:37:35,848 --> 00:37:38,384 IDENTIFIERS BECAUSE NOT ONLY 952 00:37:38,384 --> 00:37:39,785 THEY ARE GOOD IDENTIFIERS, 953 00:37:39,785 --> 00:37:40,886 BETTER THAN FINGERPRINTS AND 954 00:37:40,886 --> 00:37:42,188 OTHER THINGS BECAUSE THEY CAN 955 00:37:42,188 --> 00:37:44,924 BREACH THE PRIVACY OF RELATIVES, 956 00:37:44,924 --> 00:37:47,593 AS YOU KNOW, FROM THE PRESS. 957 00:37:47,593 --> 00:37:52,264 SO, WE HAVE DONE A LOT OF WORK 958 00:37:52,264 --> 00:37:55,234 IN INSTITUTIONAL PEOPLE CENTRIC 959 00:37:55,234 --> 00:37:59,105 STUDIES OF CONSENT, CONSENT 960 00:37:59,105 --> 00:37:59,805 MANAGEMENT, TIERED CONSENT AND 961 00:37:59,805 --> 00:38:02,541 SO ON BUT ALSO PRIVACY 962 00:38:02,541 --> 00:38:04,610 TECHNOLOGY PRACTICES THAT HELP 963 00:38:04,610 --> 00:38:07,279 WITH PROTECTING THE PRIVACY, ONE 964 00:38:07,279 --> 00:38:08,681 OF THEM IS FEDERATED 965 00:38:08,681 --> 00:38:11,083 COMPUTATION, AGAIN BY ITSELF IT 966 00:38:11,083 --> 00:38:12,852 DOESN'T PROTECT PRIVACY BUT DOES 967 00:38:12,852 --> 00:38:14,153 GIVE SOME LEVEL OF COMFORT THAT 968 00:38:14,153 --> 00:38:17,323 THE DATA ARE NOT GOING OUT THERE 969 00:38:17,323 --> 00:38:19,625 AND BEING UNCONTROLLED. 970 00:38:19,625 --> 00:38:21,060 SO, COMPUTING WITH DISTRIBUTED 971 00:38:21,060 --> 00:38:22,194 DATA, THERE ARE SEVERAL 972 00:38:22,194 --> 00:38:24,330 ALGORITHMS TO DO THAT. 973 00:38:24,330 --> 00:38:25,798 SO THAT EACH INSTITUTION KEEPS 974 00:38:25,798 --> 00:38:27,933 THEIR OWN DATA BUT WE CAN 975 00:38:27,933 --> 00:38:30,836 DEVELOP AN A.I. MODEL OUT OF 976 00:38:30,836 --> 00:38:31,671 THAT. 
977 00:38:31,671 --> 00:38:41,180 THERE ARE TWO TYPES OF 978 00:38:41,180 --> 00:38:42,148 PARTITIONS DIDACTIC, WE COMBINE 979 00:38:42,148 --> 00:38:44,984 THEM IN THE ANALYSIS OR VERTICAL 980 00:38:44,984 --> 00:38:45,818 IN WHICH ONE INSTITUTION HAS 981 00:38:45,818 --> 00:38:47,420 SOME PART OF DATA, THE OTHER 982 00:38:47,420 --> 00:38:48,821 INSTITUTIONS HAVE THE OTHER 983 00:38:48,821 --> 00:38:51,590 PART, AND WE NEED TO COMPUTE 984 00:38:51,590 --> 00:38:53,893 WITH THEM WITHOUT PUTTING THIS 985 00:38:53,893 --> 00:38:54,627 DATA TOGETHER, WITHOUT 986 00:38:54,627 --> 00:38:56,062 CENTRALIZING THE DATA. 987 00:38:56,062 --> 00:38:59,865 SO WE HAVE DONE THAT, AS WAS 988 00:38:59,865 --> 00:39:02,368 MENTIONED BEFORE, WITH CLINICAL 989 00:39:02,368 --> 00:39:11,811 DATA RESEARCH NETWORKS. 990 00:39:11,811 --> 00:39:14,513 AGAIN, NESTED HUBS, WE CAN 991 00:39:14,513 --> 00:39:15,648 CALCULATE WITH, FOR EXAMPLE, 992 00:39:15,648 --> 00:39:16,515 ELECTRONIC HEALTH RECORDS FROM 993 00:39:16,515 --> 00:39:21,020 THE V.A. IF WE ARRANGE THE 994 00:39:21,020 --> 00:39:23,656 NETWORK IN CERTAIN WAYS SO THAT, 995 00:39:23,656 --> 00:39:26,392 AGAIN, WE DON'T NEED TO MOVE ANY 996 00:39:26,392 --> 00:39:27,960 DATA AROUND. 997 00:39:27,960 --> 00:39:31,497 DURING COVID, WE DID THAT FOR 998 00:39:31,497 --> 00:39:33,933 DISCOVERY FROM CLINICAL RECORDS 999 00:39:33,933 --> 00:39:36,769 IN WHICH THE CLINICIANS DIDN'T 1000 00:39:36,769 --> 00:39:40,406 KNOW THE SAFETY OF CERTAIN 1001 00:39:40,406 --> 00:39:41,607 MEDICATIONS, FOR PATIENTS WITH 1002 00:39:41,607 --> 00:39:45,678 COVID-19, AND WE WERE ABLE TO 1003 00:39:45,678 --> 00:39:47,046 CONSULT THE LARGE NETWORK AND 1004 00:39:47,046 --> 00:39:51,417 DETERMINE IF THERE WERE ANY 1005 00:39:51,417 --> 00:39:52,218 SIGNALS THAT SOME MEDICATIONS 1006 00:39:52,218 --> 00:39:53,652 WOULD BE UNSAFE. 1007 00:39:53,652 --> 00:39:59,959 SO WE DID THAT WITH A WHOLE V.A. 
1008 00:39:59,959 --> 00:40:01,861 NETWORK, WITH INSTITUTIONS SHOWN 1009 00:40:01,861 --> 00:40:06,365 HERE IN RED, AS WELL AS ONE 1010 00:40:06,365 --> 00:40:06,999 INSTITUTION IN GERMANY THAT, 1011 00:40:06,999 --> 00:40:08,701 AGAIN, BECAUSE WE COULD DO THIS 1012 00:40:08,701 --> 00:40:11,337 ALL IN A FEDERATED MANNER WAS 1013 00:40:11,337 --> 00:40:12,404 ABLE TO PARTICIPATE. 1014 00:40:12,404 --> 00:40:14,707 IT WAS IN THE BEGINNING OF THE 1015 00:40:14,707 --> 00:40:16,542 PANDEMIC, AND WE COULD CONSULT 1016 00:40:16,542 --> 00:40:20,079 NOT ONLY WITH PATIENTS WITH 1017 00:40:20,079 --> 00:40:23,716 COVID BUT ALSO CONTROLS. 1018 00:40:23,716 --> 00:40:26,986 WE DID FEDERATED ANALYSIS, IN 1019 00:40:26,986 --> 00:40:28,921 THIS CASE SIMPLE LOGISTIC 1020 00:40:28,921 --> 00:40:33,893 REGRESSION TO EVALUATE AND USING 1021 00:40:33,893 --> 00:40:35,795 DATA FROM SEVERAL INSTITUTIONS, 1022 00:40:35,795 --> 00:40:39,165 WE DID NOT HAVE DIFFERENCES IN 1023 00:40:39,165 --> 00:40:43,302 MORTALITY RELATED TO RACE AND 1024 00:40:43,302 --> 00:40:45,271 ETHNICITY, AND COMFORTING FOR 1025 00:40:45,271 --> 00:40:46,472 ALL THE INSTITUTIONS INVOLVED, 1026 00:40:46,472 --> 00:40:50,643 BUT ALSO MEANING THAT WE 1027 00:40:50,643 --> 00:40:54,146 PROBABLY JUST GET THE PATIENTS 1028 00:40:54,146 --> 00:40:56,549 HOSPITALIZED AND WE'RE MISSING A 1029 00:40:56,549 --> 00:40:58,517 LOT OF OTHER -- THOSE WHO DID 1030 00:40:58,517 --> 00:41:01,253 NOT GET INTO THE HEALTHCARE 1031 00:41:01,253 --> 00:41:03,322 SYSTEM. 
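A federated logistic regression over horizontally partitioned data can be sketched as below: each site computes a gradient on its own rows and only that summary leaves the site. This is a generic gradient-aggregation sketch under assumed names, toy data, learning rate, and step count; it is not the speaker's published algorithm, and sharing gradients does not by itself guarantee privacy.

```python
import math


def local_gradient(weights, rows):
    """Run at one site: summed log-loss gradient over that site's rows.

    rows is a list of (features, label) with label 0 or 1. Only the gradient
    vector and the row count are returned, never the rows themselves.
    """
    grad = [0.0] * len(weights)
    for features, label in rows:
        z = sum(w * x for w, x in zip(weights, features))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, x in enumerate(features):
            grad[j] += (p - label) * x
    return grad, len(rows)


def federated_fit(sites, n_features, learning_rate=0.5, steps=200):
    """Coordinator: combine per-site gradients and update shared weights."""
    weights = [0.0] * n_features
    for _ in range(steps):
        total = [0.0] * n_features
        n_rows = 0
        for rows in sites:  # in practice each call would run remotely
            grad, count = local_gradient(weights, rows)
            total = [t + g for t, g in zip(total, grad)]
            n_rows += count
        weights = [w - learning_rate * t / n_rows
                   for w, t in zip(weights, total)]
    return weights


# Hypothetical data: features are [intercept, exposure]; label is the outcome.
SITE_A = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 2.0], 1)]
SITE_B = [([1.0, 1.5], 1), ([1.0, 3.0], 1), ([1.0, 0.5], 0),
          ([1.0, 1.2], 0), ([1.0, 0.8], 1)]

if __name__ == "__main__":
    print("federated:", federated_fit([SITE_A, SITE_B], 2))
    print("pooled:   ", federated_fit([SITE_A + SITE_B], 2))
```

Because the summed gradient over all sites equals the gradient on the pooled data, the federated fit matches the centralized one up to floating-point rounding while no patient-level rows are moved.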
1032 00:41:03,322 --> 00:41:06,992 WE ALSO HAVE WORKED WITH TRIBAL 1033 00:41:06,992 --> 00:41:10,896 NATIONS, IN WHICH THE STRONG 1034 00:41:10,896 --> 00:41:12,097 HEART STUDY COLLABORATORS WORKED 1035 00:41:12,097 --> 00:41:16,202 WITH THEIR GROUPS IN ARIZONA, 1036 00:41:16,202 --> 00:41:17,903 OKLAHOMA, NORTH AND SOUTH 1037 00:41:17,903 --> 00:41:20,773 DAKOTA, AND THE INTERESTING 1038 00:41:20,773 --> 00:41:22,541 METHODOLOGIC PROBLEM HERE IS 1039 00:41:22,541 --> 00:41:24,577 THAT THE PHENOTYPES ARE 1040 00:41:24,577 --> 00:41:26,979 CENTRALIZED IN THE OKLAHOMA 1041 00:41:26,979 --> 00:41:28,914 COORDINATING CENTER, BUT THE 1042 00:41:28,914 --> 00:41:30,850 GENOTYPES ARE CENTRALIZED IN 1043 00:41:30,850 --> 00:41:31,350 TEXAS. 1044 00:41:31,350 --> 00:41:33,085 SO, THERE ARE TWO INSTITUTIONS, 1045 00:41:33,085 --> 00:41:36,255 AGAIN, THAT HAVE DIFFERENT 1046 00:41:36,255 --> 00:41:39,825 PORTIONS OF THE PARTICIPANTS' 1047 00:41:39,825 --> 00:41:43,229 RECORDS, WE SHOWED THROUGH 1048 00:41:43,229 --> 00:41:43,863 VERTICAL PARTITIONS ANALYSIS 1049 00:41:43,863 --> 00:41:46,165 FEDERATED ANALYSIS THIS WAY, WE 1050 00:41:46,165 --> 00:41:50,636 COULD COMPUTE WITH THIS DATA 1051 00:41:50,636 --> 00:41:52,171 TOGETHER WITHOUT MOVING THE DATA 1052 00:41:52,171 --> 00:41:55,674 AROUND SO THERE'S AN ALGORITHM 1053 00:41:55,674 --> 00:41:56,175 THAT'S BEEN PUBLISHED, 1054 00:41:56,175 --> 00:42:00,813 SUBSEQUENT ONES TOO IN WHICH WE 1055 00:42:00,813 --> 00:42:03,015 USED SPECIFIC TECHNIQUES TO 1056 00:42:03,015 --> 00:42:03,682 LEARN GLOBAL REGRESSION 1057 00:42:03,682 --> 00:42:06,418 PARAMETERS, DO THIS IN AN 1058 00:42:06,418 --> 00:42:10,789 ITERATIVE PROCESS WITHOUT, 1059 00:42:10,789 --> 00:42:12,124 AGAIN, MOVING DATA AROUND. 
1060 00:42:12,124 --> 00:42:16,829 WE ARE ALSO CURRENTLY WORKING ON 1061 00:42:16,829 --> 00:42:17,496 HORIZONTAL PARTITIONS, AND 1062 00:42:17,496 --> 00:42:20,666 WORKING WITH THE "ALL OF US" 1063 00:42:20,666 --> 00:42:24,103 PROGRAM AND MILLION VETERAN 1064 00:42:24,103 --> 00:42:28,007 PROGRAM TO DO PHENOTYPE, 1065 00:42:28,007 --> 00:42:32,578 GENOTYPE, TESTS ON HORIZONTAL 1066 00:42:32,578 --> 00:42:34,413 PARTITIONS, WORKING ON LOCAL 1067 00:42:34,413 --> 00:42:36,382 ANCESTRY FOR GENOTYPES AND 1068 00:42:36,382 --> 00:42:39,685 PARTICULARLY WE HAVE A CENTER OF 1069 00:42:39,685 --> 00:42:40,552 EXCELLENCE IN GENOME SCIENCE 1070 00:42:40,552 --> 00:42:49,061 FOCUSED ON THE STUDY OF ADMIXED 1071 00:42:49,061 --> 00:42:49,962 INDIVIDUALS OFTEN LEFT OUT 1072 00:42:49,962 --> 00:42:52,631 BECAUSE IT'S HARDER TO COMPUTE. 1073 00:42:52,631 --> 00:42:57,636 ON THE CONTRARY WE BELIEVE 1074 00:42:57,636 --> 00:43:02,942 INCLUSION OF ADMIXED INDIVIDUALS 1075 00:43:02,942 --> 00:43:09,281 IS CRITICAL FOR ANALYSIS. 1076 00:43:09,281 --> 00:43:10,082 WE'RE METHODOLOGISTS BUT AT THE 1077 00:43:10,082 --> 00:43:11,884 CORE IS THE DATA. 1078 00:43:11,884 --> 00:43:15,421 NO MATTER HOW MUCH A.I. FAIRNESS WE 1079 00:43:15,421 --> 00:43:17,890 STUDY AND TRY TO FIX PROBLEMS 1080 00:43:17,890 --> 00:43:20,726 WITH REPRESENTATION OF CERTAIN 1081 00:43:20,726 --> 00:43:23,462 GROUPS OF INDIVIDUALS, THERE IS 1082 00:43:23,462 --> 00:43:26,432 NO SUBSTITUTE FOR FIXING THE DATA 1083 00:43:26,432 --> 00:43:28,767 THEMSELVES, MEANING COLLECTING 1084 00:43:28,767 --> 00:43:32,204 THE DATA FROM MORE DIVERSE 1085 00:43:32,204 --> 00:43:33,005 POPULATIONS. 1086 00:43:33,005 --> 00:43:36,308 SO WE'VE BEEN DOING THAT THROUGH 1087 00:43:36,308 --> 00:43:37,343 CLINICAL DATA NETWORKS, ALSO NOW 1088 00:43:37,343 --> 00:43:40,579 AS PART OF THE "ALL OF US" 1089 00:43:40,579 --> 00:43:40,813 PROGRAM. 
1090 00:43:40,813 --> 00:43:43,882 SO PRECISION X, X COULD BE 1091 00:43:43,882 --> 00:43:45,918 MEDICINE, NURSING, PUBLIC 1092 00:43:45,918 --> 00:43:48,087 HEALTH, ANY OTHER HEALTH-RELATED 1093 00:43:48,087 --> 00:43:50,789 DISCIPLINE, IT REQUIRES 1094 00:43:50,789 --> 00:43:51,290 DIVERSITY. 1095 00:43:51,290 --> 00:43:52,925 THE GENOMIC, FOR EXAMPLE, 1096 00:43:52,925 --> 00:43:54,293 VARIANTS OF SIGNIFICANCE HAVE 1097 00:43:54,293 --> 00:43:59,665 BEEN SHOWN TO BE SIGNIFICANT IN 1098 00:43:59,665 --> 00:44:00,332 CERTAIN POPULATIONS, 1099 00:44:00,332 --> 00:44:01,734 PARTICULARLY EUROPEANS, SOME 1100 00:44:01,734 --> 00:44:04,470 ASIAN POPULATIONS WELL 1101 00:44:04,470 --> 00:44:06,638 REPRESENTED BUT NOT ALL. 1102 00:44:06,638 --> 00:44:08,640 THE PHENOTYPIC DATA AS WELL, WE 1103 00:44:08,640 --> 00:44:09,842 OPERATE ON ELECTRONIC HEALTH 1104 00:44:09,842 --> 00:44:10,809 RECORDS, BUT WHAT ABOUT DATA 1105 00:44:10,809 --> 00:44:14,413 FROM THOSE WHO DON'T HAVE AN 1106 00:44:14,413 --> 00:44:16,715 ELECTRONIC HEALTH RECORD? 1107 00:44:16,715 --> 00:44:18,083 ALSO PERSONAL MONITORING, 1108 00:44:18,083 --> 00:44:19,818 BEHAVIOR DATA, ET CETERA. 1109 00:44:19,818 --> 00:44:21,353 ENVIRONMENTAL DATA IS SOMETHING 1110 00:44:21,353 --> 00:44:25,457 WE NEED TO WORK A LOT ON BECAUSE 1111 00:44:25,457 --> 00:44:27,860 EVERYONE IS DIFFERENT. 1112 00:44:27,860 --> 00:44:31,030 AND CONCEPTS OF LABELING SOMEONE 1113 00:44:31,030 --> 00:44:36,335 FROM A CERTAIN DEMOGRAPHIC AS A 1114 00:44:36,335 --> 00:44:37,536 PROXY FOR ENVIRONMENTAL 1115 00:44:37,536 --> 00:44:39,071 EXPOSURES OR DIET, FOR EXAMPLE, 1116 00:44:39,071 --> 00:44:41,940 ARE NOT CORRECT.
1117 00:44:41,940 --> 00:44:43,442 SO, "ALL OF US" RESEARCH PROGRAM 1118 00:44:43,442 --> 00:44:45,744 THAT'S WHY WE AS INFORMATICS 1119 00:44:45,744 --> 00:44:46,678 PEOPLE GOT EXCITED ABOUT IT 1120 00:44:46,678 --> 00:44:50,549 BECAUSE THIS IS OUR CHANCE TO 1121 00:44:50,549 --> 00:44:54,686 HELP COLLECT DATA AT SCALE FROM 1122 00:44:54,686 --> 00:44:56,922 DIVERSE INDIVIDUALS WHO FORMERLY 1123 00:44:56,922 --> 00:44:57,556 WERE UNDERREPRESENTED IN 1124 00:44:57,556 --> 00:45:02,928 BIOMEDICAL RESEARCH. 1125 00:45:02,928 --> 00:45:04,263 CURRENT PROTOCOL, YOU CAN FIND 1126 00:45:04,263 --> 00:45:07,599 AN EXCITING COLLECTION OF ITEMS, 1127 00:45:07,599 --> 00:45:11,737 CONSENT IS VERY DETAILED, VERY 1128 00:45:11,737 --> 00:45:14,406 STUDIED, SO VERY COMFORTABLE 1129 00:45:14,406 --> 00:45:16,775 ASSISTING WITH THAT INITIATIVE. 1130 00:45:16,775 --> 00:45:19,111 THE CALIFORNIA CONSORTIUM THAT 1131 00:45:19,111 --> 00:45:23,882 WE LED HAS MORE THAN 50,000 1132 00:45:23,882 --> 00:45:25,851 PARTICIPANTS SO FAR, EXTENDING 1133 00:45:25,851 --> 00:45:27,286 TO THE EAST COAST AND PUERTO 1134 00:45:27,286 --> 00:45:29,455 RICO, AS WELL. 1135 00:45:29,455 --> 00:45:34,793 WE HOPE TO RECRUIT 10% OF THE 1136 00:45:34,793 --> 00:45:36,728 PARTICIPANTS FOR THE "ALL OF US" 1137 00:45:36,728 --> 00:45:38,530 CONSORTIUM AS WELL AS 1138 00:45:38,530 --> 00:45:39,965 PARTICIPATE IN VARIOUS 1139 00:45:39,965 --> 00:45:41,500 INITIATIVES OF ETHICS AND 1140 00:45:41,500 --> 00:45:44,069 SCIENCE COMMITTEE AND A WHOLE 1141 00:45:44,069 --> 00:45:45,370 LOT OF OTHER EXCITING THINGS 1142 00:45:45,370 --> 00:45:48,407 THAT ARE HAPPENING IN THERE. 1143 00:45:48,407 --> 00:45:50,943 SO, FOR US, INFORMATICS, MEDICAL 1144 00:45:50,943 --> 00:45:54,913 A.I. PEOPLE, DOCUMENTING LACK OF 1145 00:45:54,913 --> 00:45:56,348 DIVERSITY IS NOT ENOUGH. 
1146 00:45:56,348 --> 00:45:58,884 WE NEED TO HELP REALIZE THIS 1147 00:45:58,884 --> 00:46:00,252 DREAM OF PERSONALIZED HEALTH 1148 00:46:00,252 --> 00:46:01,954 CARE, BY HELPING WITH ETHICAL 1149 00:46:01,954 --> 00:46:04,256 COLLECTION OF DATA FROM DIVERSE 1150 00:46:04,256 --> 00:46:06,892 POPULATIONS, ON NEW METHODS TO 1151 00:46:06,892 --> 00:46:09,328 ENGAGE AND RETAIN DIVERSE 1152 00:46:09,328 --> 00:46:10,629 POPULATIONS, NEW ALGORITHMS AND 1153 00:46:10,629 --> 00:46:13,899 TOOLS TO ANALYZE THIS DATA, AND 1154 00:46:13,899 --> 00:46:16,368 HELP DISSEMINATE FINDINGS TO 1155 00:46:16,368 --> 00:46:18,704 VARIOUS CONSTITUENCIES, THEY 1156 00:46:18,704 --> 00:46:20,038 CANNOT BE LIMITED TO THE 1157 00:46:20,038 --> 00:46:21,140 SCIENTIFIC COMMUNITY ONLY. 1158 00:46:21,140 --> 00:46:24,443 SO OUR VISION IS THAT NO ONE 1159 00:46:24,443 --> 00:46:27,312 WILL BE LEFT OUT, WE WILL 1160 00:46:27,312 --> 00:46:28,480 REPLACE DEMOGRAPHICS WITH 1161 00:46:28,480 --> 00:46:29,548 OBJECTIVE VARIABLES IN THESE 1162 00:46:29,548 --> 00:46:31,183 MODELS AND WILL DEVELOP NEW 1163 00:46:31,183 --> 00:46:35,287 METHODS AND TOOLS THAT ALLOW FOR 1164 00:46:35,287 --> 00:46:37,990 GENETIC FINDINGS, ANALYSIS OF 1165 00:46:37,990 --> 00:46:39,491 CLINICALLY RELEVANT DATA, 1166 00:46:39,491 --> 00:46:41,560 EVENTUALLY PRECISION X TO BE 1167 00:46:41,560 --> 00:46:42,561 APPLICABLE TO ALL. 1168 00:46:42,561 --> 00:46:44,096 WITH THAT I WILL LEAVE SOME TIME 1169 00:46:44,096 --> 00:46:45,764 FOR QUESTIONS AND I WANT TO 1170 00:46:45,764 --> 00:46:48,667 ACKNOWLEDGE A LOT OF PROJECTS 1171 00:46:48,667 --> 00:46:52,437 THAT HELPED US GET TO WHERE WE 1172 00:46:52,437 --> 00:46:54,540 ARE, AND A LONG LIST OF 1173 00:46:54,540 --> 00:46:58,377 COLLABORATORS IN EACH ONE 1174 00:46:58,377 --> 00:46:59,511 OF THESE PROJECTS. 1175 00:46:59,511 --> 00:47:00,746 AGAIN, THANK YOU ALL FOR 1176 00:47:00,746 --> 00:47:06,618 ATTENDING.
1177 00:47:06,618 --> 00:47:07,653 >> THANKS, LUCILA. 1178 00:47:07,653 --> 00:47:10,989 YOU COVERED A LOT OF TERRITORY. 1179 00:47:10,989 --> 00:47:11,456 INTERESTING. 1180 00:47:11,456 --> 00:47:12,124 I LEARNED A LOT. 1181 00:47:12,124 --> 00:47:14,126 WE HAVE A NUMBER OF QUESTIONS. 1182 00:47:14,126 --> 00:47:17,062 WHY DON'T WE DIVE IN. 1183 00:47:17,062 --> 00:47:19,464 FIRST ONE FROM ALI, WHAT UNMET 1184 00:47:19,464 --> 00:47:20,365 HEALTH CHALLENGES DO YOU FORESEE 1185 00:47:20,365 --> 00:47:28,941 A.I. SOLVING WITHIN THE NEXT 1186 00:47:28,941 --> 00:47:29,174 DECADE? 1187 00:47:29,174 --> 00:47:36,081 >> I THINK A NEED TO ASSIST AND 1188 00:47:36,081 --> 00:47:36,915 MAKE HEALTHCARE PROVIDERS MORE 1189 00:47:36,915 --> 00:47:39,251 FOCUSED ON WHAT THEY ARE GOOD 1190 00:47:39,251 --> 00:47:40,552 AT, LESS FOCUSED ON THE 1191 00:47:40,552 --> 00:47:44,189 PAPERWORK AND OTHER THINGS. 1192 00:47:44,189 --> 00:47:46,525 I THINK THAT WILL HELP A LOT. 1193 00:47:46,525 --> 00:47:48,493 PATTERN RECOGNITION OF THINGS 1194 00:47:48,493 --> 00:47:51,196 THAT ESCAPE THE EYE OF HUMANS 1195 00:47:51,196 --> 00:47:57,769 WILL BE DEFINITELY OUT THERE. 1196 00:47:57,769 --> 00:47:59,838 AND ANY, AGAIN, ASSISTANCE WITH 1197 00:47:59,838 --> 00:48:03,675 CHAT I THINK CHATBOTS WILL 1198 00:48:03,675 --> 00:48:05,210 IMPROVE OVER TIME AND MIGHT BE 1199 00:48:05,210 --> 00:48:07,813 ABLE TO SOLVE SIMPLE THINGS. 1200 00:48:07,813 --> 00:48:13,885 THEN RESERVE THE SPECIALIZED 1201 00:48:13,885 --> 00:48:15,487 TIME FOR MORE COMPLEX 1202 00:48:15,487 --> 00:48:15,787 SITUATIONS. 
1203 00:48:15,787 --> 00:48:19,558 >> YOU MADE THE INTERESTING 1204 00:48:19,558 --> 00:48:22,628 COMMENT ABOUT THESE GENERATIVE 1205 00:48:22,628 --> 00:48:24,363 ADVERSARIAL WHERE YOU TRAIN THEM 1206 00:48:24,363 --> 00:48:25,998 NOT TO DISTINGUISH BETWEEN TRUE 1207 00:48:25,998 --> 00:48:27,666 AND ARTIFICIAL BUT WE REALLY 1208 00:48:27,666 --> 00:48:29,034 WANT TO GENERATE TRUE 1209 00:48:29,034 --> 00:48:30,435 INFORMATION FROM THE GENERATIVE 1210 00:48:30,435 --> 00:48:30,702 MODELS. 1211 00:48:30,702 --> 00:48:34,873 I WONDER IF THERE'S A WAY OF 1212 00:48:34,873 --> 00:48:37,943 TURNING IT AROUND TO AVOID THE 1213 00:48:37,943 --> 00:48:38,243 ARTIFICIAL. 1214 00:48:38,243 --> 00:48:40,412 >> OH, THAT'S ONLY BECAUSE IF 1215 00:48:40,412 --> 00:48:43,515 YOU TRAIN THEM WITH YOUR DATA, 1216 00:48:43,515 --> 00:48:51,256 AND WITH YOUR SCRUTINY OF THE 1217 00:48:51,256 --> 00:48:52,491 CLASSIFIERS, YES, IT'S CORRECT. 1218 00:48:52,491 --> 00:48:54,626 I WILL GO BACK TO IMPROPER USE 1219 00:48:54,626 --> 00:48:56,662 OF SOME METHODS AND MODELS THAT 1220 00:48:56,662 --> 00:49:00,532 WERE CREATED FOR ONE PURPOSE AND 1221 00:49:00,532 --> 00:49:02,067 NOW WE'RE TRYING TO ADAPT TO 1222 00:49:02,067 --> 00:49:03,435 SOME OTHER PURPOSE WITHOUT 1223 00:49:03,435 --> 00:49:05,704 THINKING ABOUT THE TRAINING 1224 00:49:05,704 --> 00:49:09,074 SETS, HOW THE CORRECTIONS ARE 1225 00:49:09,074 --> 00:49:09,274 MADE. 1226 00:49:09,274 --> 00:49:10,742 >> YEAH, AND I THINK ONE OF THE 1227 00:49:10,742 --> 00:49:11,977 THINGS THAT CONCERNS ME IS 1228 00:49:11,977 --> 00:49:14,746 PEOPLE WHO ARE USING THESE A.I. 1229 00:49:14,746 --> 00:49:15,614 APPROACHES WITHOUT REALLY 1230 00:49:15,614 --> 00:49:16,281 UNDERSTANDING THE DETAILS ABOUT 1231 00:49:16,281 --> 00:49:17,582 WHAT THEY ARE DOING. 1232 00:49:17,582 --> 00:49:21,286 AND I THINK THAT'S VERY 1233 00:49:21,286 --> 00:49:21,586 DANGEROUS.
1234 00:49:21,586 --> 00:49:23,722 THE NEXT QUESTION IS HOW DO YOU 1235 00:49:23,722 --> 00:49:27,492 HANDLE MEDICAL IMAGE DATA LIKE 1236 00:49:27,492 --> 00:49:28,927 MRIs, CTs, ET CETERA? 1237 00:49:28,927 --> 00:49:32,764 >> YEAH, AGAIN, I THINK THERE 1238 00:49:32,764 --> 00:49:34,733 ARE -- THAT'S IMAGING IS ONE 1239 00:49:34,733 --> 00:49:36,268 AREA THAT A.I., YOU KNOW, HAS 1240 00:49:36,268 --> 00:49:40,038 PROVEN TO BE SO HELPFUL AND WILL 1241 00:49:40,038 --> 00:49:42,074 CONTINUE TO DO SO. 1242 00:49:42,074 --> 00:49:44,409 IMAGES ARE ALSO VERY HARD TO 1243 00:49:44,409 --> 00:49:45,610 DE-IDENTIFY, RIGHT? 1244 00:49:45,610 --> 00:49:48,613 SO THAT MAKES IT ALSO HARD TO 1245 00:49:48,613 --> 00:49:54,186 SHIP OUT ALL THE IMAGES TO SOME 1246 00:49:54,186 --> 00:49:57,956 VERY INTENSE COMPUTE CLUSTER 1247 00:49:57,956 --> 00:49:59,491 SOMEWHERE, THAT WE CANNOT 1248 00:49:59,491 --> 00:50:00,125 NECESSARILY TRUST. 1249 00:50:00,125 --> 00:50:05,163 BUT I THINK THE TRUST ISSUE IS 1250 00:50:05,163 --> 00:50:06,498 RESOLVED WITH AGREEMENTS, AND 1251 00:50:06,498 --> 00:50:09,968 WITH LEGAL SUPPORT, SO I DON'T 1252 00:50:09,968 --> 00:50:11,737 SEE THAT AS INSURMOUNTABLE. 1253 00:50:11,737 --> 00:50:15,240 I JUST THINK WE NEED TO, YOU 1254 00:50:15,240 --> 00:50:16,808 KNOW, MAKE SURE WE PROPERLY 1255 00:50:16,808 --> 00:50:19,511 HANDLE THE PRIVACY OF THE 1256 00:50:19,511 --> 00:50:20,612 PATIENTS AND PARTICIPANTS. 1257 00:50:20,612 --> 00:50:22,280 >> YOU KNOW, EXTENDING THAT, 1258 00:50:22,280 --> 00:50:24,182 DO YOU THINK IF YOU CONVERT 1259 00:50:24,182 --> 00:50:27,152 IMAGES INTO FEATURES THAT THEN 1260 00:50:27,152 --> 00:50:30,122 IT'S MORE -- YOU KNOW, IT'S 1261 00:50:30,122 --> 00:50:33,458 HARDER TO REIDENTIFY THE 1262 00:50:33,458 --> 00:50:33,759 INDIVIDUAL?
1263 00:50:33,759 --> 00:50:36,461 >> YEAH, THAT COULD BE SO, BUT 1264 00:50:36,461 --> 00:50:39,698 EVEN THAT CONVERSION, RIGHT, IF 1265 00:50:39,698 --> 00:50:42,367 YOU USE AN AUTOENCODER OR SOME 1266 00:50:42,367 --> 00:50:45,237 OTHER THING, THAT REQUIRES 1267 00:50:45,237 --> 00:50:46,304 COMPUTATIONAL POWER THAT AT 1268 00:50:46,304 --> 00:50:49,207 TIMES IS NOT EASY TO OBTAIN. 1269 00:50:49,207 --> 00:50:49,841 >> YEAH, YEAH. 1270 00:50:49,841 --> 00:50:51,710 NEXT QUESTION, DO YOU THINK THE 1271 00:50:51,710 --> 00:50:55,981 USE OF SYNTHETIC DATA IS A 1272 00:50:55,981 --> 00:50:58,083 SOLUTION FOR DATA PRIVACY? 1273 00:50:58,083 --> 00:50:58,717 >> I DON'T. 1274 00:50:58,717 --> 00:51:02,554 I THINK SYNTHETIC DATA MAY BE 1275 00:51:02,554 --> 00:51:04,589 USEFUL, LET'S PUT IT, FOR 1276 00:51:04,589 --> 00:51:06,458 EXAMPLE, ELECTRONIC HEALTH 1277 00:51:06,458 --> 00:51:09,060 RECORD DATA, SYNTHETIC, IT MAY 1278 00:51:09,060 --> 00:51:11,897 BE USEFUL TO TRAIN SOMEBODY TO 1279 00:51:11,897 --> 00:51:13,632 OPERATE AN ELECTRONIC HEALTH 1280 00:51:13,632 --> 00:51:14,633 RECORD SYSTEM WITH REALISTIC 1281 00:51:14,633 --> 00:51:16,401 DATA, FOR EXAMPLE. 1282 00:51:16,401 --> 00:51:18,537 NOW, TO CREATE A MEDICAL A.I.
1283 00:51:18,537 --> 00:51:23,108 SYSTEM THAT LEARNS FROM THAT 1284 00:51:23,108 --> 00:51:25,310 DATA I WOULD SAY I DON'T THINK 1285 00:51:25,310 --> 00:51:26,278 IT'S USEFUL BECAUSE SYNTHETIC 1286 00:51:26,278 --> 00:51:29,648 DATA, IF IT IS OF THE SOURCE 1287 00:51:29,648 --> 00:51:32,784 THAT SOME INSTITUTIONS OR 1288 00:51:32,784 --> 00:51:34,820 COMPANIES PORTRAY AS RETAINING 1289 00:51:34,820 --> 00:51:36,521 EVERY SINGLE ASSOCIATION THAT 1290 00:51:36,521 --> 00:51:39,391 WAS AVAILABLE IN THE ORIGINAL 1291 00:51:39,391 --> 00:51:43,428 DATA, IF YOU HAVE THAT THEN YOU 1292 00:51:43,428 --> 00:51:44,729 ACTUALLY DON'T NEED SYNTHETIC 1293 00:51:44,729 --> 00:51:47,199 DATA BECAUSE YOU ALREADY HAVE, 1294 00:51:47,199 --> 00:51:48,433 YOU KNOW, THE MODEL THAT 1295 00:51:48,433 --> 00:51:49,768 GENERATES ALL OF THAT SO YOU 1296 00:51:49,768 --> 00:51:51,970 HAVE THE KNOWLEDGE THAT YOU WERE 1297 00:51:51,970 --> 00:51:54,039 LOOKING FOR. 1298 00:51:54,039 --> 00:51:58,310 SO I WOULD SAY I DON'T BELIEVE 1299 00:51:58,310 --> 00:51:59,811 IN SYNTHETIC DATA. 1300 00:51:59,811 --> 00:52:01,413 MY COLLEAGUES ALSO SAY, WELL, 1301 00:52:01,413 --> 00:52:03,582 MEDICAL DATA IS SO NOISY 1302 00:52:03,582 --> 00:52:06,551 ALREADY, WHY ARE YOU TRYING TO 1303 00:52:06,551 --> 00:52:08,820 PUT IN EVEN MORE NOISE INTO DATA 1304 00:52:08,820 --> 00:52:09,921 AND SO ON. 1305 00:52:09,921 --> 00:52:12,891 I DO BELIEVE IN AGREEMENTS. 1306 00:52:12,891 --> 00:52:14,326 I BELIEVE IN TRANSPARENCY OF 1307 00:52:14,326 --> 00:52:17,929 USER DATA. 1308 00:52:17,929 --> 00:52:19,898 SYNTHETIC DATA ARE DERIVED FROM 1309 00:52:19,898 --> 00:52:21,933 REAL DATA, RIGHT? 
1310 00:52:21,933 --> 00:52:24,703 TO THE EXTENT THERE IS STILL 1311 00:52:24,703 --> 00:52:28,173 SOME PRIVACY RISK THERE, THAT 1312 00:52:28,173 --> 00:52:29,608 HAS TO BE, BECAUSE IF YOU 1313 00:52:29,608 --> 00:52:32,010 ONLY -- THE ONLY WAY TO PRESERVE 1314 00:52:32,010 --> 00:52:33,945 PRIVACY IS REMOVING ANY SIGNAL 1315 00:52:33,945 --> 00:52:38,917 FROM THE DATA AND MAKING IT 1316 00:52:38,917 --> 00:52:39,417 USELESS. 1317 00:52:39,417 --> 00:52:40,318 >> NEXT QUESTION, COULD YOU 1318 00:52:40,318 --> 00:52:43,722 PLEASE COMMENT MORE ON THE BEST 1319 00:52:43,722 --> 00:52:45,357 PRACTICE FOR DE-IDENTIFYING 1320 00:52:45,357 --> 00:52:46,892 DATA, BUT PRESERVING ITS 1321 00:52:46,892 --> 00:52:49,427 CAPABILITY OF LINKING TO OTHER 1322 00:52:49,427 --> 00:52:52,197 DATA SOURCES WHEN NEEDED, ANY 1323 00:52:52,197 --> 00:52:53,798 CONCERNS REGARDING PRIVACY AND 1324 00:52:53,798 --> 00:52:54,032 ETHICS? 1325 00:52:54,032 --> 00:52:55,600 >> I HAVE A LOT OF CONCERNS WITH 1326 00:52:55,600 --> 00:53:00,205 LINKAGE OF DATA THAT IS NOT 1327 00:53:00,205 --> 00:53:01,773 CONSENTED SIMPLY BECAUSE THE 1328 00:53:01,773 --> 00:53:04,442 LONGER -- THINK OF THE RECORD AS 1329 00:53:04,442 --> 00:53:06,411 HAVING A CERTAIN SIZE, AND 1330 00:53:06,411 --> 00:53:08,046 REMEMBER YOU CAN MATCH 1331 00:53:08,046 --> 00:53:11,116 INFORMATION YOU KNOW AND THEN 1332 00:53:11,116 --> 00:53:12,317 IDENTIFY AN INDIVIDUAL, THE 1333 00:53:12,317 --> 00:53:17,155 TARGET INDIVIDUAL IN THAT 1334 00:53:17,155 --> 00:53:17,389 DATASET. 1335 00:53:17,389 --> 00:53:18,657 THE LONGER YOU MAKE THE RECORD 1336 00:53:18,657 --> 00:53:20,625 THE MORE CHANCES TO IDENTIFY THE 1337 00:53:20,625 --> 00:53:23,695 INDIVIDUAL, RIGHT?
1338 00:53:23,695 --> 00:53:27,299 THE LINKAGE ITSELF INCREASES THE 1339 00:53:27,299 --> 00:53:29,134 RISK OF PRIVACY BREACH BY A LOT. 1340 00:53:29,134 --> 00:53:31,903 WHETHER THE PEOPLE WHO ARE 1341 00:53:31,903 --> 00:53:34,139 CONSENTING FOR LINKAGE 1342 00:53:34,139 --> 00:53:35,607 UNDERSTAND THAT IS UNCLEAR, AND 1343 00:53:35,607 --> 00:53:37,943 WHETHER PEOPLE ARE EVEN 1344 00:53:37,943 --> 00:53:40,812 CONSENTING FOR THAT, IT'S ALSO 1345 00:53:40,812 --> 00:53:41,079 UNCLEAR. 1346 00:53:41,079 --> 00:53:42,647 SO I THINK THERE ARE TREMENDOUS 1347 00:53:42,647 --> 00:53:45,250 ETHICAL ISSUES TO BE RESOLVED, 1348 00:53:45,250 --> 00:53:47,919 BUT OF COURSE KNOWING THAT 1349 00:53:47,919 --> 00:53:49,521 LINKAGE HELPS RESEARCH BECAUSE 1350 00:53:49,521 --> 00:53:53,992 IT MAKES THE REFERENCE MORE 1351 00:53:53,992 --> 00:53:55,460 COMPLETE, LESS -- DEDUPLICATE 1352 00:53:55,460 --> 00:53:55,927 RECORDS. 1353 00:53:55,927 --> 00:53:59,898 WITH EVERYTHING, IT'S A BALANCE, 1354 00:53:59,898 --> 00:54:00,231 I BELIEVE. 1355 00:54:00,231 --> 00:54:00,899 >> UH-HUH, YEAH. 1356 00:54:00,899 --> 00:54:02,300 TWO MORE QUESTIONS. 1357 00:54:02,300 --> 00:54:04,936 IS THE LEGAL SYSTEM READY TO 1358 00:54:04,936 --> 00:54:07,539 HANDLE MEDICAL MISTAKES MADE BY 1359 00:54:07,539 --> 00:54:07,872 A.I.? 1360 00:54:07,872 --> 00:54:09,941 IN OTHER WORDS, WHO IS TO BE 1361 00:54:09,941 --> 00:54:14,412 SUED IF A CERTIFIED A.I.-BASED 1362 00:54:14,412 --> 00:54:14,846 SYSTEM CAUSES HARM? 1363 00:54:14,846 --> 00:54:17,682 >> THAT'S A GOOD QUESTION. 1364 00:54:17,682 --> 00:54:19,584 I DON'T THINK THE STRUCTURE IS 1365 00:54:19,584 --> 00:54:19,784 THERE.
1366 00:54:19,784 --> 00:54:22,687 I CAN SAY THAT THERE HAS BEEN A 1367 00:54:22,687 --> 00:54:25,991 LOT OF DISCUSSION AND A CURRENT 1368 00:54:25,991 --> 00:54:28,893 PANEL I HAVE BEEN IN WHICH WILL 1369 00:54:28,893 --> 00:54:34,065 ISSUE A PAPER VERY SOON ON 1370 00:54:34,065 --> 00:54:36,368 RACIAL AND ETHNIC DISCRIMINATION 1371 00:54:36,368 --> 00:54:39,537 IN A.I. MODELS, FOR EXAMPLE, 1372 00:54:39,537 --> 00:54:42,807 WHICH IS, YOU KNOW, BEYOND 1373 00:54:42,807 --> 00:54:43,541 MISTAKES, RIGHT? 1374 00:54:43,541 --> 00:54:50,148 BECAUSE IT DEPENDS WHAT YOU 1375 00:54:50,148 --> 00:54:50,682 CONSIDER MISTAKES. 1376 00:54:50,682 --> 00:54:52,550 THAT TOPS EVERYTHING HOW TO TRY 1377 00:54:52,550 --> 00:54:57,155 TO AVOID, AND IF IT HAPPENS, WHO 1378 00:54:57,155 --> 00:54:59,391 SHOULD BE RESPONSIBLE FOR. 1379 00:54:59,391 --> 00:55:02,360 BUT I WOULD SAY IN MANY WAYS IT 1380 00:55:02,360 --> 00:55:04,362 CAN BE CONSIDERED LIKE A 1381 00:55:04,362 --> 00:55:06,665 DIAGNOSTIC TEST, AND, YOU KNOW, 1382 00:55:06,665 --> 00:55:09,567 THE LEGAL SYSTEM HAS WAYS TO 1383 00:55:09,567 --> 00:55:09,868 HANDLE THAT. 1384 00:55:09,868 --> 00:55:11,603 >> RIGHT, RIGHT. 1385 00:55:11,603 --> 00:55:13,071 YEAH, MY QUESTION WHICH IS 1386 00:55:13,071 --> 00:55:14,172 RELATED TO THAT IS THAT, YOU 1387 00:55:14,172 --> 00:55:16,908 KNOW, HOW DO YOU GO ABOUT 1388 00:55:16,908 --> 00:55:17,509 VALIDATING DEEP NEURAL NETWORK 1389 00:55:17,509 --> 00:55:20,145 MODELS SO THAT THEY CAN BE USED 1390 00:55:20,145 --> 00:55:22,213 FOR DIAGNOSTICS AND OBVIOUSLY IT 1391 00:55:22,213 --> 00:55:26,117 HAS IMPLICATIONS FOR ACCURACY AS 1392 00:55:26,117 --> 00:55:26,484 WELL. 1393 00:55:26,484 --> 00:55:26,918 >> RIGHT.
1394 00:55:26,918 --> 00:55:29,354 SO LIKE I MENTIONED, YOU DON'T 1395 00:55:29,354 --> 00:55:30,789 NEED TO RETRAIN NECESSARILY WITH 1396 00:55:30,789 --> 00:55:32,724 YOUR DATA BUT ONCE YOU HAVE A 1397 00:55:32,724 --> 00:55:38,129 FIXED MODEL YOU CAN APPLY IT TO 1398 00:55:38,129 --> 00:55:40,065 YOUR DATA, AND SAY -- AND ASSESS 1399 00:55:40,065 --> 00:55:42,133 THE PERFORMANCE IN YOUR DATA 1400 00:55:42,133 --> 00:55:46,671 BOTH IN TERMS OF DISCRIMINATIVE 1401 00:55:46,671 --> 00:55:48,039 PERFORMANCE AND CALIBRATION, SO 1402 00:55:48,039 --> 00:55:50,775 THAT SHOULD BE THE MINIMUM THAT 1403 00:55:50,775 --> 00:55:55,313 SHOULD BE DONE BEFORE TURNING ON 1404 00:55:55,313 --> 00:55:56,181 A PARTICULAR MODEL. 1405 00:55:56,181 --> 00:55:58,750 BUT I'M FIRST TO SAY IT'S 1406 00:55:58,750 --> 00:56:00,485 PROBABLY NOT DONE. 1407 00:56:00,485 --> 00:56:01,820 >> YEAH, YEAH. 1408 00:56:01,820 --> 00:56:03,722 ALL RIGHT. 1409 00:56:03,722 --> 00:56:07,358 LAST QUESTION IS SO THE OCTOBER 1410 00:56:07,358 --> 00:56:09,694 2023 DATA SNAPSHOT FOR ALL OF US 1411 00:56:09,694 --> 00:56:14,766 SHOWS A 3% PARTICIPATION RATE 1412 00:56:14,766 --> 00:56:16,201 FOR ASIAN PARTICIPANTS VERSUS 16 1413 00:56:16,201 --> 00:56:18,203 AND 17% FOR BLACK AND HISPANICS. 1414 00:56:18,203 --> 00:56:21,339 DO YOU HAVE ANY INSIGHT INTO THE 1415 00:56:21,339 --> 00:56:22,640 LOW PARTICIPATION NUMBER FOR 1416 00:56:22,640 --> 00:56:26,611 ASIANS, MORE PRIVACY CONCERNS, 1417 00:56:26,611 --> 00:56:27,378 LESS ADVERTISING, ANY INSIGHTS 1418 00:56:27,378 --> 00:56:34,686 ABOUT WHY THERE'S SUCH A BIG 1419 00:56:34,686 --> 00:56:35,153 DIFFERENCE? 1420 00:56:35,153 --> 00:56:38,089 >> WELL, I CAN SAY AS A 1421 00:56:38,089 --> 00:56:40,492 RECRUITMENT CENTER WE DO 1422 00:56:40,492 --> 00:56:46,397 EVERYTHING WE CAN TO EXPAND AND 1423 00:56:46,397 --> 00:56:47,699 NOT FORGET ANY PARTICULAR 1424 00:56:47,699 --> 00:56:48,333 POPULATION. 1425 00:56:48,333 --> 00:56:49,567 IT DOES TAKE TIME.
1426 00:56:49,567 --> 00:56:54,139 I WOULD SAY I CAN CONFIRM THOSE 1427 00:56:54,139 --> 00:56:55,740 NUMBERS RIGHT NOW -- CAN'T 1428 00:56:55,740 --> 00:56:56,808 CONFIRM THOSE NUMBERS NOW BUT I 1429 00:56:56,808 --> 00:56:58,143 WOULD SAY THE NUMBERS IN THE 1430 00:56:58,143 --> 00:57:03,548 "ALL OF US" PROGRAM ARE HIGHER 1431 00:57:03,548 --> 00:57:04,849 FOR UNDERREPRESENTED 1432 00:57:04,849 --> 00:57:05,416 BIOMEDICAL -- IN BIOMEDICAL 1433 00:57:05,416 --> 00:57:07,218 RESEARCH POPULATIONS THAN ANY 1434 00:57:07,218 --> 00:57:08,186 OTHER COHORT STUDY. 1435 00:57:08,186 --> 00:57:11,456 SO THAT'S NOT TO SAY WE DON'T 1436 00:57:11,456 --> 00:57:18,396 HAVE A LOT OF IMPROVEMENT TO DO, 1437 00:57:18,396 --> 00:57:20,532 BUT IT'S UNPRECEDENTED, THE 1438 00:57:20,532 --> 00:57:21,065 REPRESENTATION OF CERTAIN 1439 00:57:21,065 --> 00:57:21,332 SEGMENTS. 1440 00:57:21,332 --> 00:57:22,867 >> YEAH, YEAH. 1441 00:57:22,867 --> 00:57:24,035 BLACKS AND HISPANICS, YOU KNOW, 1442 00:57:24,035 --> 00:57:25,503 THOSE NUMBERS ARE BETTER THAN I 1443 00:57:25,503 --> 00:57:27,405 WOULD HAVE THOUGHT, BUT ASIANS 1444 00:57:27,405 --> 00:57:32,443 BEING SO LOW IS A LITTLE 1445 00:57:32,443 --> 00:57:32,877 SURPRISING. 1446 00:57:32,877 --> 00:57:35,380 >> WE HAVE WORK TO DO. 1447 00:57:35,380 --> 00:57:37,482 >> ONE LAST QUESTION, SINCE THIS 1448 00:57:37,482 --> 00:57:39,384 IS THE ADA LOVELACE LECTURE, WE 1449 00:57:39,384 --> 00:57:41,553 OFTEN HEAR THERE ARE NOT ENOUGH 1450 00:57:41,553 --> 00:57:43,121 WOMEN IN STEM, AND MANY OF US 1451 00:57:43,121 --> 00:57:45,857 SAY THAT THERE ARE ENOUGH WOMEN 1452 00:57:45,857 --> 00:57:47,859 IN MEDICAL SCHOOLS AND IN 1453 00:57:47,859 --> 00:57:50,395 Ph.D. PROGRAMS AND IN 1454 00:57:50,395 --> 00:57:51,629 INFORMATICS MAYBE. 
1455 00:57:51,629 --> 00:57:53,698 WHAT HAVE YOUR EXPERIENCES BEEN 1456 00:57:53,698 --> 00:57:55,400 ABOUT THE REPRESENTATION OF 1457 00:57:55,400 --> 00:57:57,602 WOMEN IN CLINICAL SCIENCES AND 1458 00:57:57,602 --> 00:57:59,370 INFORMATICS, DO YOU THINK 1459 00:57:59,370 --> 00:58:00,405 THERE'S STILL A SHORTAGE OF 1460 00:58:00,405 --> 00:58:01,840 WOMEN IN THESE AREAS? 1461 00:58:01,840 --> 00:58:04,809 IF SO, WHAT CAN WE DO TO ADDRESS 1462 00:58:04,809 --> 00:58:05,009 THAT? 1463 00:58:05,009 --> 00:58:11,883 >> YEAH, I WOULD SAY STATISTICS 1464 00:58:11,883 --> 00:58:14,219 IN MEDICAL SCHOOLS SHOW EQUAL 1465 00:58:14,219 --> 00:58:15,854 REPRESENTATION FOR INCOMING 1466 00:58:15,854 --> 00:58:17,822 CLASSES, RIGHT? 1467 00:58:17,822 --> 00:58:19,390 IF YOU CONSIDER REPRESENTATION 1468 00:58:19,390 --> 00:58:21,492 AT THE PROFESSORSHIP LEVEL AND 1469 00:58:21,492 --> 00:58:23,494 LEADERSHIP POSITIONS YOU SEE A 1470 00:58:23,494 --> 00:58:24,829 DIFFERENT PICTURE. 1471 00:58:24,829 --> 00:58:28,066 PROGRESS IS BEING MADE. 1472 00:58:28,066 --> 00:58:31,603 I DARE SAY MORE IN MEDICINE THAN 1473 00:58:31,603 --> 00:58:32,570 PROBABLY ENGINEERING, BUT A 1474 00:58:32,570 --> 00:58:35,506 WHOLE LOT MORE NEEDS TO BE DONE. 1475 00:58:35,506 --> 00:58:38,276 AND I THINK WE'RE, YOU KNOW, 1476 00:58:38,276 --> 00:58:42,080 INCREASINGLY HAVING MORE ROLE 1477 00:58:42,080 --> 00:58:43,781 MODELS, MORE EXAMPLES, AND WE 1478 00:58:43,781 --> 00:58:45,383 WILL GET THERE. 1479 00:58:45,383 --> 00:58:47,318 HOPEFULLY WE CAN ACCELERATE A 1480 00:58:47,318 --> 00:58:51,923 LITTLE BIT THIS PACE, BUT IT HAS 1481 00:58:51,923 --> 00:58:55,560 BEEN A PRODUCT OF, YOU KNOW, 1482 00:58:55,560 --> 00:58:56,895 UNDERREPRESENTATION AT HIGH 1483 00:58:56,895 --> 00:58:57,128 LEVELS. 
1484 00:58:57,128 --> 00:58:58,329 >> WELL, LET'S HOPE EVERYTHING 1485 00:58:58,329 --> 00:59:00,999 KIND OF TRICKLES UP AND WE GET 1486 00:59:00,999 --> 00:59:03,234 BROADER REPRESENTATION AT THOSE 1487 00:59:03,234 --> 00:59:03,768 HIGHEST LEVELS. 1488 00:59:03,768 --> 00:59:06,371 WE'RE AT THE END OF THE HOUR. 1489 00:59:06,371 --> 00:59:10,108 LET ME JUST THANK DR. 1490 00:59:10,108 --> 00:59:11,776 OHNO-MACHADO AGAIN FOR THE 1491 00:59:11,776 --> 00:59:12,310 ENLIGHTENING PRESENTATION. 1492 00:59:12,310 --> 00:59:13,945 THE RECORDING WILL BE ARCHIVED 1493 00:59:13,945 --> 00:59:15,280 THROUGH THE NIH VIDEOCAST 1494 00:59:15,280 --> 00:59:16,180 WEBSITE. 1495 00:59:16,180 --> 00:59:17,148 IF YOU HAVE ANY ADDITIONAL 1496 00:59:17,148 --> 00:59:18,983 QUESTIONS PLEASE FEEL FREE TO 1497 00:59:18,983 --> 00:59:22,954 REACH OUT TO ME, OR TO DR. 1498 00:59:22,954 --> 00:59:24,055 OHNO-MACHADO DIRECTLY, AND I'M 1499 00:59:24,055 --> 00:59:26,124 SURE SHE WOULD BE HAPPY TO 1500 00:59:26,124 --> 00:59:30,728 ANSWER THEM. 1501 00:59:30,728 --> 00:59:31,262 THANK YOU, AGAIN, LUCILA. 1502 00:59:31,262 --> 00:59:41,262 THANK YOU SO.