1 00:00:05,560 --> 00:00:08,520 WELCOME TO THE NIH NATURAL 2 00:00:08,520 --> 00:00:10,240 LANGUAGE SPECIAL INTEREST 3 00:00:10,240 --> 00:00:12,920 GROUP EVENT. 4 00:00:12,920 --> 00:00:14,360 TODAY WE HAVE A SPECIAL ONE. 5 00:00:14,360 --> 00:00:19,520 WE'RE THRILLED TO WELCOME 6 00:00:19,520 --> 00:00:20,960 DR. LARRY HUNTER, PROFESSOR OF 7 00:00:20,960 --> 00:00:21,520 PHARMACOLOGY AND COMPUTER 8 00:00:21,520 --> 00:00:23,880 SCIENCE AT BOULDER, COLORADO. 9 00:00:23,880 --> 00:00:27,800 DR. HUNTER RECEIVED HIS Ph.D. IN 10 00:00:27,800 --> 00:00:29,520 COMPUTER SCIENCE FROM YALE 11 00:00:29,520 --> 00:00:32,680 UNIVERSITY IN 1989 AND THEN 12 00:00:32,680 --> 00:00:34,280 JOINED NIH AS A STAFF SCIENTIST, 13 00:00:34,280 --> 00:00:35,880 FIRST AT THE NATIONAL LIBRARY OF 14 00:00:35,880 --> 00:00:38,000 MEDICINE AND THEN AT THE 15 00:00:38,000 --> 00:00:39,720 NATIONAL CANCER INSTITUTE BEFORE 16 00:00:39,720 --> 00:00:42,320 GOING TO COLORADO IN 2000. 17 00:00:42,320 --> 00:00:43,440 DR. HUNTER IS WIDELY RECOGNIZED 18 00:00:43,440 --> 00:00:50,160 AS ONE OF THE FOUNDERS OF 19 00:00:50,160 --> 00:00:52,200 BIOINFORMATICS AND PUBLISHED ONE 20 00:00:52,200 --> 00:00:54,800 OF THE PAPERS IN MACHINE LEARNING 21 00:00:54,800 --> 00:00:55,800 PREDICTIONS OF MOLECULAR 22 00:00:55,800 --> 00:00:56,120 FINDINGS. 23 00:00:56,120 --> 00:00:58,640 HE SERVED AS THE FIRST 24 00:00:58,640 --> 00:01:00,120 PRESIDENT OF THE INTERNATIONAL 25 00:01:00,120 --> 00:01:02,280 SOCIETY FOR COMPUTATIONAL 26 00:01:02,280 --> 00:01:02,520 BIOLOGY. 27 00:01:02,520 --> 00:01:04,600 AND HE CREATED SEVERAL OF THE 28 00:01:04,600 --> 00:01:06,880 MOST IMPORTANT CONFERENCES IN 29 00:01:06,880 --> 00:01:09,520 THE FIELD INCLUDING ISMB, PSB 30 00:01:09,520 --> 00:01:13,480 AND THE ROCKY MOUNTAIN 31 00:01:13,480 --> 00:01:17,360 CONFERENCE ON BIOINFORMATICS. 32 00:01:17,360 --> 00:01:19,040 HIS INTERESTS SPAN COGNITIVE 33 00:01:19,040 --> 00:01:20,000 SCIENCE TO DRUG DESIGN.
34 00:01:20,000 --> 00:01:22,600 HE HAS PUBLISHED MORE THAN 200 35 00:01:22,600 --> 00:01:24,960 SCIENTIFIC PAPERS, HOLDS TWO 36 00:01:24,960 --> 00:01:26,680 PATENTS AND HAS BEEN ELECTED A 37 00:01:26,680 --> 00:01:30,480 FELLOW OF BOTH THE ISCB AND THE 38 00:01:30,480 --> 00:01:32,920 AMERICAN COLLEGE OF MEDICAL 39 00:01:32,920 --> 00:01:33,480 INFORMATICS. 40 00:01:33,480 --> 00:01:35,560 HIS PRIMARY FOCUS RECENTLY HAS 41 00:01:35,560 --> 00:01:39,480 BEEN THE INTEGRATION OF NATURAL 42 00:01:39,480 --> 00:01:42,280 LANGUAGE PROCESSING, KNOWLEDGE 43 00:01:42,280 --> 00:01:45,760 REPRESENTATION, MACHINE LEARNING 44 00:01:45,760 --> 00:01:50,000 AND ADVANCED VISUALIZATION 45 00:01:50,000 --> 00:01:52,160 TECHNIQUES TO LOOKING AT HIGH 46 00:01:52,160 --> 00:01:53,240 THROUGHPUT MOLECULAR BIOLOGY. 47 00:01:53,240 --> 00:01:55,680 WITH THAT, DR. HUNTER, THE FLOOR 48 00:01:55,680 --> 00:01:56,560 IS YOURS. 49 00:01:56,560 --> 00:02:00,120 >>IT'S A REAL PLEASURE TO BE 50 00:02:00,120 --> 00:02:01,160 BACK AT NIH EVEN VIRTUALLY. 51 00:02:01,160 --> 00:02:06,960 I APPRECIATE YOU COMING TO THIS. 52 00:02:06,960 --> 00:02:10,120 I'M GOING TO TALK A LITTLE MORE 53 00:02:10,120 --> 00:02:10,800 BROADLY BUT THERE'S GOOD CONTENT 54 00:02:10,800 --> 00:02:12,000 IN HERE. 55 00:02:12,000 --> 00:02:14,320 I WANT TO TALK MORE BROADLY 56 00:02:14,320 --> 00:02:18,080 ABOUT KNOWLEDGE-BASED SYSTEMS.
57 00:02:18,080 --> 00:02:20,640 I KNOW THERE'S A SWING BACK AND 58 00:02:20,640 --> 00:02:22,640 FORTH IN THE HISTORY OF 59 00:02:22,640 --> 00:02:25,960 ARTIFICIAL INTELLIGENCE BETWEEN 60 00:02:25,960 --> 00:02:29,320 NEURAL NETWORK OR OTHER PURELY 61 00:02:29,320 --> 00:02:33,120 DATA-DRIVEN AND KNOWLEDGE-BASED 62 00:02:33,120 --> 00:02:34,800 SYSTEMS, NOT ALWAYS SYMBOLIC BUT 63 00:02:34,800 --> 00:02:38,200 OFTEN, AND I WANT TO TALK ABOUT 64 00:02:38,200 --> 00:02:40,160 WHY KNOWLEDGE-BASED SYSTEMS ARE 65 00:02:40,160 --> 00:02:41,200 IMPORTANT GIVEN THE PERFORMANCE OF 66 00:02:41,200 --> 00:02:42,120 NEURAL NETWORKS. 67 00:02:42,120 --> 00:02:45,000 THE IDEA OF A KNOWLEDGE-BASED 68 00:02:45,000 --> 00:02:46,400 SYSTEM IS WRITING PROGRAMS AS 69 00:02:46,400 --> 00:02:48,040 IF THEY KNEW ABOUT BIOLOGY AND 70 00:02:48,040 --> 00:02:48,440 MEDICINE. 71 00:02:48,440 --> 00:02:50,120 THAT KIND OF KNOWLEDGE AS WE'LL 72 00:02:50,120 --> 00:02:52,800 TALK ABOUT IN A HUMAN BEING MAY 73 00:02:52,800 --> 00:02:55,600 INCLUDE THINGS LIKE ANSWERING 74 00:02:55,600 --> 00:02:57,040 QUESTIONS OR PASSING EXAMS, 75 00:02:57,040 --> 00:02:59,600 USING EXISTING KNOWLEDGE TO 76 00:02:59,600 --> 00:03:02,520 EVALUATE OR RANK BIOMEDICAL 77 00:03:02,520 --> 00:03:04,720 HYPOTHESES AS MORE COMPATIBLE WITH 78 00:03:04,720 --> 00:03:07,600 WHAT WE KNOW, OR INTERPRETING DATA 79 00:03:07,600 --> 00:03:10,480 IN LIGHT OF EXISTING KNOWLEDGE: 80 00:03:10,480 --> 00:03:13,040 WHAT DOES IT TELL US ABOUT 81 00:03:13,040 --> 00:03:14,600 THE OPEN QUESTIONS IN 82 00:03:14,600 --> 00:03:16,160 BIOMEDICINE AND HOW DOES IT 83 00:03:16,160 --> 00:03:17,120 INTERACT WITH WHAT WE THOUGHT WE 84 00:03:17,120 --> 00:03:17,400 ALREADY KNEW.
85 00:03:17,400 --> 00:03:19,040 THE HISTORY OF TAKING PRIOR 86 00:03:19,040 --> 00:03:22,160 KNOWLEDGE AND USING IT AS PART 87 00:03:22,160 --> 00:03:25,000 OF ANALYZING DATA GOES BACK TO 88 00:03:25,000 --> 00:03:25,640 BAYESIAN INTERPRETATION BUT 89 00:03:25,640 --> 00:03:26,800 THERE'S OTHER FRAMEWORKS AS 90 00:03:26,800 --> 00:03:27,400 WELL. 91 00:03:27,400 --> 00:03:30,000 I'M GOING TO TALK ABOUT 92 00:03:30,000 --> 00:03:31,600 AUTOMATED REASONING THAT ACTS ON 93 00:03:31,600 --> 00:03:33,240 COMPUTATIONAL REPRESENTATIONS OF 94 00:03:33,240 --> 00:03:33,960 KNOWLEDGE. 95 00:03:33,960 --> 00:03:35,720 THOSE ARE PRIMARILY SYMBOLIC BUT 96 00:03:35,720 --> 00:03:39,520 ALSO SUBSYMBOLIC OR 97 00:03:39,520 --> 00:03:40,000 MACHINE-LEARNING BASED 98 00:03:40,000 --> 00:03:46,360 APPROACHES TO REPRESENTATION, AND 99 00:03:46,360 --> 00:03:47,600 REPRESENTATION LEARNING IS A HOT 100 00:03:47,600 --> 00:03:49,760 TOPIC AND AN ARTICLE LAYS OUT 101 00:03:49,760 --> 00:03:51,400 THE LANDSCAPE OF KNOWLEDGE-BASED 102 00:03:51,400 --> 00:03:51,600 SYSTEMS. 103 00:03:51,600 --> 00:03:54,680 WHY I THINK THIS IS SO IMPORTANT 104 00:03:54,680 --> 00:03:59,240 RIGHT NOW IS BECAUSE THERE'S AN 105 00:03:59,240 --> 00:04:00,960 AREA OF SCIENTIFIC WORK NOT YET 106 00:04:00,960 --> 00:04:02,680 WELL SUPPORTED BY AUTOMATION. 107 00:04:02,680 --> 00:04:06,000 I CONSIDER THAT DISCUSSION 108 00:04:06,000 --> 00:04:06,280 INFORMATICS. 109 00:04:06,280 --> 00:04:09,400 BIOINFORMATICS THE LAST 30 YEARS 110 00:04:09,400 --> 00:04:11,120 HAS BEEN GETTING INCREASINGLY 111 00:04:11,120 --> 00:04:12,800 GOOD AT PROCESSING RAW DATA THAT 112 00:04:12,800 --> 00:04:15,200 COMES FROM INSTRUMENTS, DNA 113 00:04:15,200 --> 00:04:16,880 SEQUENCERS, TO PRODUCE THE THINGS 114 00:04:16,880 --> 00:04:18,200 THAT GO IN THE RESULTS SECTION OF 115 00:04:18,200 --> 00:04:19,440 YOUR PAPER.
116 00:04:19,440 --> 00:04:22,640 WHAT ARE STATISTICALLY 117 00:04:22,640 --> 00:04:25,600 OVERREPRESENTED TRANSCRIPT 118 00:04:25,600 --> 00:04:25,880 ABUNDANCES. 119 00:04:25,880 --> 00:04:27,000 WHAT ARE VARIANTS ASSOCIATED 120 00:04:27,000 --> 00:04:29,480 WITH A PARTICULAR PHENOTYPE. 121 00:04:29,480 --> 00:04:32,560 THERE'S A WIDE RANGE OF 122 00:04:32,560 --> 00:04:33,960 BIOINFORMATICS TOOLS THAT MAKE IT 123 00:04:33,960 --> 00:04:35,600 POSSIBLE TO ANALYZE LARGE 124 00:04:35,600 --> 00:04:37,440 AMOUNTS OF DATA AND TURN IT INTO 125 00:04:37,440 --> 00:04:41,960 THE FIGURES THAT GO IN THE 126 00:04:41,960 --> 00:04:42,320 RESULTS SECTION. 127 00:04:42,320 --> 00:04:43,600 HOWEVER, FOR THE DISCUSSION SECTION 128 00:04:43,600 --> 00:04:46,120 OF THE PAPER WHERE WE INTERPRET 129 00:04:46,120 --> 00:04:47,320 THOSE RESULTS AND TRY TO 130 00:04:47,320 --> 00:04:48,800 UNDERSTAND WHAT THEY MEAN FOR 131 00:04:48,800 --> 00:04:50,440 HUMAN HEALTH AND UNDERSTANDING 132 00:04:50,440 --> 00:04:53,680 OF THE FUNDAMENTAL MECHANISMS OF 133 00:04:53,680 --> 00:04:54,920 BIOLOGY, COMPUTATIONAL TOOLS ARE 134 00:04:54,920 --> 00:04:58,160 NOT REALLY ALL THAT GREAT AT 135 00:04:58,160 --> 00:05:00,440 HELPING SCIENTISTS UNDERSTAND 136 00:05:00,440 --> 00:05:01,960 THE RESULTS AND SO WHAT I'VE 137 00:05:01,960 --> 00:05:03,400 BEEN INTERESTED IN FOR THE LAST 138 00:05:03,400 --> 00:05:06,880 FEW YEARS IS COMPUTATIONAL 139 00:05:06,880 --> 00:05:07,440 SUPPORT FOR WRITING THE 140 00:05:07,440 --> 00:05:09,040 DISCUSSION SECTION OF A PAPER. 141 00:05:09,040 --> 00:05:10,440 THAT INCLUDES A WIDE VARIETY OF 142 00:05:10,440 --> 00:05:13,680 DIFFERENT ACTIVITIES, GENERATING 143 00:05:13,680 --> 00:05:15,600 HYPOTHESES THAT WOULD EXPLAIN 144 00:05:15,600 --> 00:05:20,320 THE RESULTS WE COVER -- 145 00:05:20,320 --> 00:05:21,000 OBSERVED.
146 00:05:21,000 --> 00:05:23,320 DRAWING DIAGRAMS THAT CAPTURE A 147 00:05:23,320 --> 00:05:26,560 SET OF RESULTS, SOMETIMES WE CALL 148 00:05:26,560 --> 00:05:27,600 THOSE MOLECULAR CARTOONS, THAT 149 00:05:27,600 --> 00:05:31,600 ARE COMMON IN LOTS OF PAPERS IN 150 00:05:31,600 --> 00:05:32,640 PATHWAYS AND ALSO DESCRIBING 151 00:05:32,640 --> 00:05:36,920 EVENTS THAT UNFOLD OVER TIME AND 152 00:05:36,920 --> 00:05:38,200 MAYBE EVEN AT THE TISSUE LEVEL 153 00:05:38,200 --> 00:05:39,920 AND I'M GOING TO TALK MOSTLY IN 154 00:05:39,920 --> 00:05:44,680 THE NLP PIECE OF THIS TALK ABOUT 155 00:05:44,680 --> 00:05:45,520 PROVIDING CONTEXT FOR THE 156 00:05:45,520 --> 00:05:47,720 RESULTS: HOW DO THE RESULTS 157 00:05:47,720 --> 00:05:49,520 WE FOUND IN THIS EXPERIMENT 158 00:05:49,520 --> 00:05:50,360 RELATE TO THE EXISTING 159 00:05:50,360 --> 00:05:51,680 LITERATURE AND ADDRESS OPEN 160 00:05:51,680 --> 00:05:52,320 QUESTIONS OR PERHAPS CREATE NEW 161 00:05:52,320 --> 00:05:56,680 ONES? 162 00:05:56,680 --> 00:06:00,520 SO THE SINGLE MOST WIDELY USED 163 00:06:00,520 --> 00:06:01,200 COMPUTATIONAL APPROACH FOR 164 00:06:01,200 --> 00:06:02,600 HELPING PEOPLE INTERPRET RESULTS 165 00:06:02,600 --> 00:06:07,600 IS CALLED ENRICHMENT AND WHAT WE 166 00:06:07,600 --> 00:06:10,960 DO OFTEN IS TAKE A LIST OF GENES 167 00:06:10,960 --> 00:06:13,440 OR GENE PRODUCTS OR METABOLITES 168 00:06:13,440 --> 00:06:15,520 OR OTHER FACTORS THAT COME FROM 169 00:06:15,520 --> 00:06:16,840 AN EXPERIMENT THAT SEEM TO BE 170 00:06:16,840 --> 00:06:18,280 RELEVANT SOMEHOW TO THE QUESTION 171 00:06:18,280 --> 00:06:21,560 THAT WAS BEING ASKED AND THEN 172 00:06:21,560 --> 00:06:23,600 IDENTIFY A SET OF THINGS THAT WE 173 00:06:23,600 --> 00:06:26,360 KNOW ABOUT THOSE GENES OR GENE 174 00:06:26,360 --> 00:06:28,080 PRODUCTS OFTEN BASED IN ONTOLOGY 175 00:06:28,080 --> 00:06:30,120 TERMS THAT HAVE BEEN ANNOTATED 176 00:06:30,120 --> 00:06:31,160 TO THEM MANUALLY.
177 00:06:31,160 --> 00:06:34,200 WE CALCULATE A BACKGROUND 178 00:06:34,200 --> 00:06:38,200 DISTRIBUTION ABOUT WHAT WE WOULD 179 00:06:38,200 --> 00:06:41,280 EXPECT TO SEE BY CHANCE. SOME 180 00:06:41,280 --> 00:06:42,920 BIOLOGICAL PHENOMENA ARE 181 00:06:42,920 --> 00:06:43,160 COMMON. 182 00:06:43,160 --> 00:06:45,880 INFLAMMATION IS A COMMON THING 183 00:06:45,880 --> 00:06:47,160 ONE OBSERVES IN MANY 184 00:06:47,160 --> 00:06:50,480 EXPERIMENTS, CHANGES IN 185 00:06:50,480 --> 00:06:52,160 METABOLISM. WE LOOK FOR TERMS THAT 186 00:06:52,160 --> 00:06:53,680 APPEAR MORE FREQUENTLY THAN 187 00:06:53,680 --> 00:06:55,280 WOULD BE EXPECTED BY CHANCE BY 188 00:06:55,280 --> 00:06:57,360 WHATEVER THE PROCESS WAS THAT PRODUCED 189 00:06:57,360 --> 00:06:58,200 THIS UNDERLYING LIST. 190 00:06:58,200 --> 00:07:01,160 THERE'S A LOT OF DIFFERENT 191 00:07:01,160 --> 00:07:02,760 ENRICHMENT ENGINES OUT THERE. 192 00:07:02,760 --> 00:07:04,160 NIH RUNS ONE CALLED DAVID WHICH 193 00:07:04,160 --> 00:07:04,560 IS POPULAR. 194 00:07:04,560 --> 00:07:07,800 THIS IS A TYPICAL RESULT FROM 195 00:07:07,800 --> 00:07:18,360 ONE OF THESE ENRICHMENT STUDIES. 196 00:07:20,840 --> 00:07:22,680 YOU SEE A LIST OF TERMS THAT ARE 197 00:07:22,680 --> 00:07:26,240 SIMILAR TO EACH OTHER AND MAY 198 00:07:26,240 --> 00:07:27,520 NOTICE THESE ARE SOMEWHAT VAGUE 199 00:07:27,520 --> 00:07:28,560 AS WELL AS OVERLAPPING. 200 00:07:28,560 --> 00:07:31,320 THEY HELP US UNDERSTAND A LITTLE 201 00:07:31,320 --> 00:07:34,200 BIT OF WHAT'S GOING ON IN OUR 202 00:07:34,200 --> 00:07:35,800 EXPERIMENT RESULTS BUT ARE NOT 203 00:07:35,800 --> 00:07:36,960 HELPFUL IN WRITING THE 204 00:07:36,960 --> 00:07:37,400 DISCUSSION SECTION. 205 00:07:37,400 --> 00:07:39,600 WHAT CAN WE DO MORE THAN 206 00:07:39,600 --> 00:07:47,720 ENRICHMENT? 207 00:07:47,720 --> 00:07:49,840 WE CAN DO OTHER KINDS OF 208 00:07:49,840 --> 00:07:51,080 STATISTICAL NORMALIZATION.
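The enrichment calculation described in this part of the talk can be sketched with a hypergeometric tail test. This is a minimal illustration, not the method any particular enrichment engine uses: the function names, the toy gene identifiers and the two annotation terms are all assumptions made up for the example, and real tools add multiple-testing correction and other refinements.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background
    annotated:  background genes carrying the term
    sample:     size of the experimental gene list
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


def enrich(gene_list, annotations, background):
    """Return (term, hits, p_value) tuples sorted by ascending p-value."""
    background = set(background)
    sample = set(gene_list) & background
    terms = {t for g in background for t in annotations.get(g, ())}
    results = []
    for term in terms:
        annotated = {g for g in background if term in annotations.get(g, ())}
        hits = len(sample & annotated)
        if hits:
            p = enrichment_p_value(len(background), len(annotated), len(sample), hits)
            results.append((term, hits, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: eight made-up genes and two annotation terms.
ANNOTATIONS = {
    "g1": {"inflammatory response", "metabolic process"},
    "g2": {"inflammatory response"},
    "g3": {"inflammatory response"},
    "g4": {"metabolic process"},
    "g5": {"metabolic process"},
    "g6": {"metabolic process"},
    "g7": {"metabolic process"},
    "g8": set(),
}
BACKGROUND = list(ANNOTATIONS)
RESULTS = enrich(["g1", "g2", "g3"], ANNOTATIONS, BACKGROUND)
for term, hits, p in RESULTS:
    print(f"{term}: hits={hits}, p={p:.4f}")
```

In this toy case the term carried by all three listed genes ranks first, while the term that is common across the whole background ranks last, which is the "expected by chance" effect the background distribution is meant to capture.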
209 00:07:51,080 --> 00:07:52,080 FOR EXAMPLE, LOOK AT TERMS THAT 210 00:07:52,080 --> 00:07:54,200 TEND TO BE ENRICHED IN MANY 211 00:07:54,200 --> 00:07:58,240 EXPERIMENTS RATHER THAN ONE, SO 212 00:07:58,240 --> 00:08:00,200 WE'RE LOOKING AT A DIFFERENT 213 00:08:00,200 --> 00:08:01,120 BASELINE THAN LOOKING AT RANDOM 214 00:08:01,120 --> 00:08:05,560 GENE PRODUCTS. 215 00:08:05,560 --> 00:08:07,640 WE NEED SPECIFIC AND IMPORTANT 216 00:08:07,640 --> 00:08:10,240 HYPOTHESES OF WHAT'S GOING ON IN 217 00:08:10,240 --> 00:08:13,160 THE LARGE SCALE RESULTS. 218 00:08:13,160 --> 00:08:15,600 WHAT REGULATORS NEED TO MAKE 219 00:08:15,600 --> 00:08:17,720 DECISIONS ABOUT THE SAFETY OF 220 00:08:17,720 --> 00:08:20,280 CHEMICAL COMPOUNDS OR THE 221 00:08:20,280 --> 00:08:22,400 EFFECTIVENESS OF CHEMICAL ENTITIES, 222 00:08:22,400 --> 00:08:24,240 MAYBE DRUGS, ARE MECHANISTIC 223 00:08:24,240 --> 00:08:24,600 ACCOUNTS. 224 00:08:24,600 --> 00:08:25,800 WHO ARE THE PLAYERS? 225 00:08:25,800 --> 00:08:31,960 HOW DO THEY INTERACT AND WHAT DO 226 00:08:31,960 --> 00:08:34,640 THEY DO OR SUSTAIN OR CHANGE? 227 00:08:34,640 --> 00:08:36,960 AND THERE'S A METALEVEL AT WHICH 228 00:08:36,960 --> 00:08:38,160 DISCUSSION SECTIONS WORK. 229 00:08:38,160 --> 00:08:40,120 WE DON'T JUST TALK ABOUT WHAT WE 230 00:08:40,120 --> 00:08:41,560 BELIEVE THE FACTS TO BE BUT TALK 231 00:08:41,560 --> 00:08:43,400 ABOUT WHAT THE EVIDENCE FOR THE 232 00:08:43,400 --> 00:08:46,320 CLAIMS IS. 233 00:08:46,320 --> 00:08:50,000 CITATIONS TO THE LITERATURE, 234 00:08:50,000 --> 00:08:51,320 DISCUSSIONS OF OTHER DATA, KINDS 235 00:08:51,320 --> 00:08:52,720 OF INFERENCE ARE ALL IMPORTANT 236 00:08:52,720 --> 00:08:55,120 IN DISCUSSING RESULTS. HOW CAN 237 00:08:55,120 --> 00:08:57,240 COMPUTER PROGRAMS HELP US DO 238 00:08:57,240 --> 00:08:58,240 THESE THINGS?
239 00:08:58,240 --> 00:09:00,120 I LAY THIS OUT TO GIVE YOU A 240 00:09:00,120 --> 00:09:02,600 LANDSCAPE OF WHAT OUR GOALS ARE 241 00:09:02,600 --> 00:09:07,000 IN DISCUSSION INFORMATICS. 242 00:09:07,000 --> 00:09:09,280 IN ORDER TO DO THIS WORK WE NEED 243 00:09:09,280 --> 00:09:09,920 COMPUTATIONAL REPRESENTATIONS OF 244 00:09:09,920 --> 00:09:14,680 WHAT IS ALREADY KNOWN. 245 00:09:14,680 --> 00:09:16,240 DISCUSSION INFORMATICS HAS A 246 00:09:16,240 --> 00:09:17,680 BROAD REPRESENTATION OF EXISTING 247 00:09:17,680 --> 00:09:19,000 BIOMEDICAL KNOWLEDGE AND OUR 248 00:09:19,000 --> 00:09:22,240 APPROACH IN MY LAB HAS BEEN TO 249 00:09:22,240 --> 00:09:24,120 USE SYMBOLIC APPROACHES USING 250 00:09:24,120 --> 00:09:25,960 ONTOLOGY AS THE FOUNDATION AND 251 00:09:25,960 --> 00:09:31,400 NOT JUST ONTOLOGIES BECAUSE THEY 252 00:09:31,400 --> 00:09:42,000 THEMSELVES ARE SORT OF ATOMIC 253 00:09:42,960 --> 00:09:48,720 AND I SHOW A CARTOON GIVING A 254 00:09:48,720 --> 00:09:52,240 SENSE OF WHAT WE'RE LOOKING FOR 255 00:09:52,240 --> 00:09:54,840 WHEN WE'RE TRYING TO REPRESENT 256 00:09:54,840 --> 00:09:59,280 BIOLOGICAL KNOWLEDGE. 257 00:09:59,280 --> 00:10:02,440 WE'RE TRYING TO TAKE CELLULAR 258 00:10:02,440 --> 00:10:04,560 LOCATIONS, WHAT'S IN THE NUCLEUS 259 00:10:04,560 --> 00:10:07,400 AND CYTOPLASM, AND MOLECULAR 260 00:10:07,400 --> 00:10:11,000 PLAYERS, PROTEINS, 261 00:10:11,000 --> 00:10:13,400 POST-TRANSLATIONALLY MODIFIED 262 00:10:13,400 --> 00:10:16,320 PROTEINS, HIGHER ORDER 263 00:10:16,320 --> 00:10:18,920 COMBINATIONS OF THINGS, 264 00:10:18,920 --> 00:10:20,840 MOLECULAR PROCESSES LIKE 265 00:10:20,840 --> 00:10:22,280 OXIDATIVE STRESS, AND THE ARROWS 266 00:10:22,280 --> 00:10:26,480 INDICATE CAUSAL OR TEMPORAL 267 00:10:26,480 --> 00:10:30,360 RELATIONSHIPS SUCH AS REGULATION 268 00:10:30,360 --> 00:10:31,600 OR ENZYMATIC TRANSFORMATION OF 269 00:10:31,600 --> 00:10:33,120 THE ENTITIES.
270 00:10:33,120 --> 00:10:35,200 WE CAPTURE ALL THAT STUFF IN A 271 00:10:35,200 --> 00:10:38,040 WAY THAT COMMUNICATES WELL TO 272 00:10:38,040 --> 00:10:38,360 SCIENTISTS. 273 00:10:38,360 --> 00:10:39,640 THERE'S BEEN A WIDE VARIETY OF 274 00:10:39,640 --> 00:10:43,600 ATTEMPTS TO DO THIS. 275 00:10:43,600 --> 00:10:46,600 THE COMPUTATIONAL METHODS ARE 276 00:10:46,600 --> 00:10:47,720 OFTEN CALLED KNOWLEDGE GRAPHS. 277 00:10:47,720 --> 00:10:50,640 THIS GOES BACK FOR A DECADE, 278 00:10:50,640 --> 00:10:53,320 MAYBE A LITTLE BIT MORE, THE 279 00:10:53,320 --> 00:10:56,360 EARLIER VERSIONS OF THIS WORK 280 00:10:56,360 --> 00:10:58,760 AND WHAT MICHELLE CALLED A 281 00:10:58,760 --> 00:10:59,360 MASH-UP. 282 00:10:59,360 --> 00:11:02,200 HE DIDN'T WORRY ABOUT THE 283 00:11:02,200 --> 00:11:03,240 SEMANTIC CONSISTENCY OF THE 284 00:11:03,240 --> 00:11:07,440 TERMS USED OR ONTOLOGICAL 285 00:11:07,440 --> 00:11:07,720 UNDERPINNINGS. 286 00:11:07,720 --> 00:11:11,040 HE JUST TRIED TO CAPTURE THE 287 00:11:11,040 --> 00:11:14,840 RELATIONSHIPS BETWEEN DRUGS AND 288 00:11:14,840 --> 00:11:16,280 DISEASES AND SYMPTOMS AND 289 00:11:16,280 --> 00:11:17,200 BIOLOGICAL PROCESSES, SIDE 290 00:11:17,200 --> 00:11:20,040 EFFECTS AND THE LIKE. 291 00:11:20,040 --> 00:11:23,520 AND THERE'S BEEN CONSISTENT 292 00:11:23,520 --> 00:11:24,360 IMPROVEMENTS IN THIS.
293 00:11:24,360 --> 00:11:26,960 THE WORK AT UCSF AND THEN AT 294 00:11:26,960 --> 00:11:31,360 PENN WAS IN TRYING TO ASSEMBLE 295 00:11:31,360 --> 00:11:39,600 THESE INTO A MORE SEMANTICALLY 296 00:11:39,600 --> 00:11:44,640 COHERENT APPROACH AND THE WORK 297 00:11:44,640 --> 00:11:51,360 IN SAUDI ARABIA LOOKED AT 298 00:11:51,360 --> 00:11:52,480 NEUROSYMBOLIC 299 00:11:52,480 --> 00:11:53,800 REPRESENTATIONS 300 00:11:53,800 --> 00:11:56,480 AND OPENS DOORS TO THE LEARNED 301 00:11:56,480 --> 00:11:57,560 REPRESENTATIONS AS INPUTS TO 302 00:11:57,560 --> 00:11:59,600 NEURAL NETWORKS TO TAKE 303 00:11:59,600 --> 00:12:01,440 ADVANTAGE OF THE POWER OF DEEP 304 00:12:01,440 --> 00:12:03,480 LEARNING. 305 00:12:03,480 --> 00:12:05,800 AND ALSO TO DO VECTOR SPACE 306 00:12:05,800 --> 00:12:06,840 CALCULATIONS AND EDGE PREDICTION 307 00:12:06,840 --> 00:12:08,200 IN THE GRAPHS AND THE LIKE. 308 00:12:08,200 --> 00:12:09,400 THERE'S BEEN A GREAT DEAL OF 309 00:12:09,400 --> 00:12:11,480 WORK IN THIS AREA OVER THE LAST 310 00:12:11,480 --> 00:12:15,240 COUPLE OF DECADES. 311 00:12:15,240 --> 00:12:16,400 THE REPRESENTATIONS OF KNOWLEDGE 312 00:12:16,400 --> 00:12:19,480 IN GRAPHS, OFTEN WHEN COMBINED 313 00:12:19,480 --> 00:12:22,280 WITH NEURAL NETWORK KINDS OF 314 00:12:22,280 --> 00:12:24,680 LEARNING, GIVE US GREAT POWER TO 315 00:12:24,680 --> 00:12:26,880 TRY AND DO THIS OPEN TASK OF 316 00:12:26,880 --> 00:12:28,240 SUPPORTING THE WRITING OF 317 00:12:28,240 --> 00:12:29,960 DISCUSSION SECTIONS AND THE 318 00:12:29,960 --> 00:12:31,600 INTERPRETATION OF OUR MOLECULAR 319 00:12:31,600 --> 00:12:34,000 BIOLOGY RESULTS. 320 00:12:34,000 --> 00:12:36,280 THE GENE ONTOLOGY FOLKS HAVE 321 00:12:36,280 --> 00:12:37,400 ALSO BEEN TRYING TO MOVE AWAY 322 00:12:37,400 --> 00:12:40,280 FROM SIMPLE TRIPLES INTO MORE 323 00:12:40,280 --> 00:12:41,240 COMPLEX GRAPHS.
324 00:12:41,240 --> 00:12:48,840 AND THEY'VE BEEN FOCUSSING ON 325 00:12:48,840 --> 00:12:50,240 CAUSAL ACTIVITY MODELS AND 326 00:12:50,240 --> 00:12:51,800 THERE'S A URL AT THE BOTTOM. 327 00:12:51,800 --> 00:12:54,160 THE IDEA IS TO TRY TO TAKE A SET 328 00:12:54,160 --> 00:12:58,280 OF MOLECULAR RESULTS AND TIE 329 00:12:58,280 --> 00:13:01,920 THEM TOGETHER INTO 330 00:13:01,920 --> 00:13:03,360 REPRESENTATIONS AROUND 331 00:13:03,360 --> 00:13:06,840 REGULATION, TRANSPORT, SO THE 332 00:13:06,840 --> 00:13:07,640 UNDERLYING MECHANISMS THAT 333 00:13:07,640 --> 00:13:10,280 DESCRIBE THE BIOLOGICAL PROCESS 334 00:13:10,280 --> 00:13:12,760 ARE CAPTURED IN THE 335 00:13:12,760 --> 00:13:13,560 REPRESENTATIONS. 336 00:13:13,560 --> 00:13:17,000 THIS WORK IS SLOW, AS ALL 337 00:13:17,000 --> 00:13:20,520 GENE ONTOLOGY WORK IS, BECAUSE IT 338 00:13:20,520 --> 00:13:22,560 DEPENDS ON MANUAL ANNOTATION OF 339 00:13:22,560 --> 00:13:24,000 THE LITERATURE BUT THEY'RE 340 00:13:24,000 --> 00:13:25,000 POWERFUL AND MAKE POSSIBLE 341 00:13:25,000 --> 00:13:26,640 CERTAIN KINDS OF INFERENCE AND 342 00:13:26,640 --> 00:13:28,600 THESE ARE HIGHLY RELIABLE 343 00:13:28,600 --> 00:13:31,600 DIAGRAMS COMPARED TO THE ONES 344 00:13:31,600 --> 00:13:37,400 ASSEMBLED AUTOMATICALLY SUCH AS 345 00:13:37,400 --> 00:13:38,120 THE OTHERS. 346 00:13:38,120 --> 00:13:40,920 OUR WORK LIVES IN THE MIDDLE. 347 00:13:40,920 --> 00:13:42,120 WE TRY TO BE CAREFUL WITH THE 348 00:13:42,120 --> 00:13:47,400 WAY WE ASSEMBLE THESE.
349 00:13:47,400 --> 00:13:51,520 THESE ARE NOT FULLY AUTOMATIC 350 00:13:51,520 --> 00:13:53,920 AND WE HAVE A SYSTEM THAT 351 00:13:53,920 --> 00:13:57,400 WAS DEVELOPED BY A GRADUATE 352 00:13:57,400 --> 00:14:02,200 STUDENT NAMED TIFFANY CALLAHAN 353 00:14:02,200 --> 00:14:09,080 THAT HAS A MANUSCRIPT UNDER 354 00:14:09,080 --> 00:14:11,920 REVIEW AND THIS LINKS THE KIND OF 355 00:14:11,920 --> 00:14:13,200 INFORMATION WE HAVE ABOUT 356 00:14:13,200 --> 00:14:14,360 INDIVIDUAL PATIENTS, WHICH YOU 357 00:14:14,360 --> 00:14:16,320 CAN SEE UP AT THE TOP OF THIS. 358 00:14:16,320 --> 00:14:18,560 SO THINGS LIKE THE DRUGS THAT 359 00:14:18,560 --> 00:14:23,960 WE'RE GIVING THEM AND THE 360 00:14:23,960 --> 00:14:26,680 DISEASES THAT THEY HAVE, THE 361 00:14:26,680 --> 00:14:28,520 KINDS OF CHEMICALS AND 362 00:14:28,520 --> 00:14:31,120 CO-FACTORS AND CATALYSTS THAT 363 00:14:31,120 --> 00:14:33,400 PLAY A ROLE IN BIOLOGICAL 364 00:14:33,400 --> 00:14:35,000 ACTIVITIES AND ALSO VACCINES.
365 00:14:35,000 --> 00:14:37,720 WE DID WORK WITH COVID HERE AND 366 00:14:37,720 --> 00:14:39,880 TIE THEM USING THE ELEMENTS IN 367 00:14:39,880 --> 00:14:41,520 GREEN, THE RELATIONSHIPS AND THE 368 00:14:41,520 --> 00:14:44,160 DEEPER KNOWLEDGE OF THE 369 00:14:44,160 --> 00:14:45,040 UNDERLYING MOLECULAR BIOLOGY 370 00:14:45,040 --> 00:14:47,600 THAT BOTTOMS OUT AT THE BOTTOM 371 00:14:47,600 --> 00:14:50,520 IN DATABASES OF KNOWLEDGE ABOUT 372 00:14:50,520 --> 00:14:52,280 MOLECULES SUCH AS THE HUMAN 373 00:14:52,280 --> 00:14:54,880 PROTEIN ATLAS OR THE GENOTYPE 374 00:14:54,880 --> 00:14:56,200 TISSUE EXPRESSION PROJECT SO WE 375 00:14:56,200 --> 00:14:57,200 KNOW WHAT GENES ARE BEING 376 00:14:57,200 --> 00:14:59,240 EXPRESSED IN WHAT TISSUES AND 377 00:14:59,240 --> 00:15:03,280 ALL THIS TIES TOGETHER ON A 378 00:15:03,280 --> 00:15:06,760 MOLECULAR BASIS FOR THE 379 00:15:06,760 --> 00:15:08,040 DIAGNOSES, LABS AND THERAPIES 380 00:15:08,040 --> 00:15:08,680 PATIENTS GET. 381 00:15:08,680 --> 00:15:12,720 THE IDEA IS TO TRY TO UNDERSTAND 382 00:15:12,720 --> 00:15:14,240 THE MOLECULAR BASIS OF THESE EVEN THOUGH 383 00:15:14,240 --> 00:15:15,120 WE DON'T HAVE MOLECULAR DATA 384 00:15:15,120 --> 00:15:17,480 ABOUT THE PATIENTS. 385 00:15:17,480 --> 00:15:19,880 IN SOME SENSE DIAGNOSES AND 386 00:15:19,880 --> 00:15:22,760 DRUGS AND LAB RESULTS IMPLY 387 00:15:22,760 --> 00:15:23,600 MOLECULAR INFORMATION ABOUT 388 00:15:23,600 --> 00:15:24,280 INDIVIDUAL PATIENTS. 389 00:15:24,280 --> 00:15:26,360 THIS IS HOW WE TIE THEM ALL 390 00:15:26,360 --> 00:15:27,600 TOGETHER. 391 00:15:27,600 --> 00:15:28,600 TIFFANY'S WORK DID AMAZING 392 00:15:28,600 --> 00:15:28,800 STUFF. 393 00:15:28,800 --> 00:15:30,080 I'LL TALK MORE ABOUT IT IN THE 394 00:15:30,080 --> 00:15:31,600 FUTURE BUT YOU CAN SEE WHAT'S 395 00:15:31,600 --> 00:15:33,680 GOING ON AT HER GITHUB ACCOUNT.
396 00:15:33,680 --> 00:15:36,640 THIS IS CONTINUALLY KEPT UP TO 397 00:15:36,640 --> 00:15:39,880 DATE AND WE HOPE THIS IS A RESOURCE 398 00:15:39,880 --> 00:15:41,200 FOR THE COMMUNITY. 399 00:15:41,200 --> 00:15:49,080 WE CAN DO A LOT OF THINGS WITH 400 00:15:49,080 --> 00:15:52,880 THINGS LIKE PHEKNOWLATOR AND 401 00:15:52,880 --> 00:15:55,160 THIS IS A QUERY WHERE WE TRIED TO 402 00:15:55,160 --> 00:15:57,400 LOOK AT POTENTIAL DRUG-DRUG 403 00:15:57,400 --> 00:16:03,200 INTERACTION AND TRIED TO FIND 404 00:16:03,200 --> 00:16:04,320 SYNERGIES AND FIND POTENTIAL 405 00:16:04,320 --> 00:16:07,600 ADVERSE EFFECTS FROM TWO DRUGS 406 00:16:07,600 --> 00:16:09,000 BEING PRESCRIBED SIMULTANEOUSLY 407 00:16:09,000 --> 00:16:11,280 AND LOOKED AT PATHWAYS THROUGH 408 00:16:11,280 --> 00:16:12,240 PROTEINS AND PROCESSES. 409 00:16:12,240 --> 00:16:19,120 THIS QUERY CAN BE WRITTEN IN 410 00:16:19,120 --> 00:16:21,760 SPARQL, A QUERY LANGUAGE FOR 411 00:16:21,760 --> 00:16:24,080 THE WEB ONTOLOGY LANGUAGE AND 412 00:16:24,080 --> 00:16:24,880 STANDARDS FOR EXPRESSING 413 00:16:24,880 --> 00:16:26,280 KNOWLEDGE GRAPHS AND SO THERE'S 414 00:16:26,280 --> 00:16:27,920 A GOOD SET OF UNDERLYING 415 00:16:27,920 --> 00:16:30,960 COMPUTATIONAL TOOLS THAT HAVE 416 00:16:30,960 --> 00:16:32,960 BEEN STANDARDIZED AND THERE'S A 417 00:16:32,960 --> 00:16:33,920 LOT OF PUBLIC AND PRIVATE TOOLS 418 00:16:33,920 --> 00:16:35,840 FOR BUILDING THESE SORTS OF 419 00:16:35,840 --> 00:16:37,560 KNOWLEDGE GRAPHS AND QUERYING 420 00:16:37,560 --> 00:16:37,760 THEM.
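The shape of the drug-drug query described here, two drugs connected through proteins and a shared biological process, can be sketched in plain Python over a list of triples. This is a stand-in for the graph pattern only, not the query from the talk and not a query engine: the triples, the predicate names `targets` and `participates_in`, and the identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not real data.
TRIPLES = [
    ("drug:A", "targets", "protein:P1"),
    ("drug:B", "targets", "protein:P2"),
    ("protein:P1", "participates_in", "process:X"),
    ("protein:P2", "participates_in", "process:X"),
    ("protein:P2", "participates_in", "process:Y"),
]


def shared_process_paths(triples, drug_a, drug_b):
    """Find drug -> protein -> process <- protein <- drug paths."""
    targets = defaultdict(set)    # drug -> proteins it targets
    processes = defaultdict(set)  # protein -> processes it participates in
    for subject, predicate, obj in triples:
        if predicate == "targets":
            targets[subject].add(obj)
        elif predicate == "participates_in":
            processes[subject].add(obj)

    paths = []
    for protein_a in sorted(targets[drug_a]):
        for protein_b in sorted(targets[drug_b]):
            # A process shared by both proteins links the two drugs.
            for process in sorted(processes[protein_a] & processes[protein_b]):
                paths.append((drug_a, protein_a, process, protein_b, drug_b))
    return paths


PATHS = shared_process_paths(TRIPLES, "drug:A", "drug:B")
for path in PATHS:
    print(" -> ".join(path))
```

Each returned path is a candidate mechanistic hypothesis of the kind shown on the slide; in a real knowledge graph the same pattern would normally be expressed as a query against a triple store rather than as nested loops.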
421 00:16:37,760 --> 00:16:43,040 SO SPARQL IS THE EQUIVALENT OF 422 00:16:43,040 --> 00:16:45,400 A SQL QUERY AND YOU CAN SEE 423 00:16:45,400 --> 00:16:47,040 ON THE FAR LEFT AND RIGHT ARE 424 00:16:47,040 --> 00:16:49,120 DRUGS ON THE GRAPH AND THE 425 00:16:49,120 --> 00:16:50,280 PURPLE ELEMENTS ARE PROTEINS -- 426 00:16:50,280 --> 00:16:52,480 NO, THE GRAY ELEMENTS ARE 427 00:16:52,480 --> 00:16:55,480 PROTEINS, THE PURPLE ELEMENTS ARE 428 00:16:55,480 --> 00:16:56,760 BIOLOGICAL PROCESSES. 429 00:16:56,760 --> 00:16:58,800 YOU SEE SETS OF PROCESSES THAT 430 00:16:58,800 --> 00:17:01,160 RELATE THE DRUGS THAT PROVIDE 431 00:17:01,160 --> 00:17:02,720 MECHANISTIC HYPOTHESES ABOUT 432 00:17:02,720 --> 00:17:05,360 THEIR SYNERGIES OR POTENTIAL 433 00:17:05,360 --> 00:17:06,720 TOXICITIES. 434 00:17:06,720 --> 00:17:07,920 THESE HAVE BEEN USED AND IT'S A 435 00:17:07,920 --> 00:17:09,600 BASIC WAY OF USING THE KNOWLEDGE 436 00:17:09,600 --> 00:17:09,960 GRAPH. 437 00:17:09,960 --> 00:17:12,720 IT STILL HAS BEEN USEFUL IN 438 00:17:12,720 --> 00:17:14,760 TRYING TO INTERPRET EXISTING 439 00:17:14,760 --> 00:17:15,200 DATA. 440 00:17:15,200 --> 00:17:17,480 A RICHER WAY TO GO AFTER THIS 441 00:17:17,480 --> 00:17:19,600 WAS DONE BY ANOTHER FAIRLY 442 00:17:19,600 --> 00:17:24,840 RECENT GRADUATE IN THE LAB, 443 00:17:24,840 --> 00:17:28,160 IGNACIO TRIPODI, WHO WAS INTERESTED IN 444 00:17:28,160 --> 00:17:31,320 THE MECHANISMS OF TOXICITY.
445 00:17:31,320 --> 00:17:32,520 THIS IS IMPORTANT BECAUSE WE SEE 446 00:17:32,520 --> 00:17:33,600 A LOT OF PROGRESS IN THE LAST 447 00:17:33,600 --> 00:17:36,960 DECADE OR SO OF TRYING TO 448 00:17:36,960 --> 00:17:38,840 PREDICT TOXICITY FROM CELLULAR 449 00:17:38,840 --> 00:17:41,800 ASSAYS, GENE EXPRESSION OF 450 00:17:41,800 --> 00:17:43,600 USUALLY HUMAN CELL LINES WITH 451 00:17:43,600 --> 00:17:46,200 POTENTIAL TOXICANTS AND USING 452 00:17:46,200 --> 00:17:47,600 THE GENE EXPRESSIONS TO TRY 453 00:17:47,600 --> 00:17:49,800 THROUGH MACHINE LEARNING TO MAKE 454 00:17:49,800 --> 00:17:50,520 PREDICTIONS ABOUT TOXICITY. 455 00:17:50,520 --> 00:17:53,480 SO TOXIC OR NOT OR PERHAPS TOXIC 456 00:17:53,480 --> 00:17:59,600 IN PARTICULAR DOMAINS, 457 00:17:59,600 --> 00:18:00,760 CARDIOTOXICITY AND SO FORTH. 458 00:18:00,760 --> 00:18:02,120 THOUGH WE'RE FAIRLY ACCURATE AT 459 00:18:02,120 --> 00:18:08,400 MAKING THE PREDICTIONS, REGULATORS, 460 00:18:08,400 --> 00:18:11,560 FDA AND OTHERS HAVE BEEN 461 00:18:11,560 --> 00:18:14,040 RESISTANT TO TAKE THEM UP AND 462 00:18:14,040 --> 00:18:16,360 REGULATORS BEFORE THEY MAKE 463 00:18:16,360 --> 00:18:17,400 DECISIONS THAT AFFECT 464 00:18:17,400 --> 00:18:19,600 POTENTIALLY BILLIONS OF DOLLARS 465 00:18:19,600 --> 00:18:23,240 IN REVENUE AND THE HEALTH OF THE 466 00:18:23,240 --> 00:18:26,680 PUBLIC, REALLY DEMAND 467 00:18:26,680 --> 00:18:27,240 MECHANISTIC ACCOUNTS OF 468 00:18:27,240 --> 00:18:29,240 POTENTIAL TOXICITIES AND 469 00:18:29,240 --> 00:18:34,320 THEREFORE ANIMAL EXPERIMENTS 470 00:18:34,320 --> 00:18:39,920 UNTIL RECENTLY. 471 00:18:39,920 --> 00:18:42,640 AND I LIKE TO TEASE HIM THAT HE'S THE 472 00:18:42,640 --> 00:18:46,320 ONLY ARGENTINIAN VEGAN AND HE 473 00:18:46,320 --> 00:18:48,280 WANTED TO GET AWAY FROM THIS AND 474 00:18:48,280 --> 00:18:51,160 TRY TO INFER MECHANISMS OF 475 00:18:51,160 --> 00:18:53,680 ACTION FROM THE SAME CELLULAR 476 00:18:53,680 --> 00:18:54,480 TRANSCRIPTOME DATA.
477 00:18:54,480 --> 00:18:58,320 WHAT HE DID WAS TAKE A MOLECULAR 478 00:18:58,320 --> 00:19:00,160 TOXICOLOGY TEXTBOOK I'M SHOWING 479 00:19:00,160 --> 00:19:03,720 HERE AND CURATED 11 DIFFERENT 480 00:19:03,720 --> 00:19:05,760 MECHANISMS AND USED THE OPEN 481 00:19:05,760 --> 00:19:06,960 BIOMEDICAL ONTOLOGIES AND THEIR 482 00:19:06,960 --> 00:19:08,560 CAUSAL PATHWAYS FOR THEM AND 483 00:19:08,560 --> 00:19:10,680 THEN HE COULD USE THE 484 00:19:10,680 --> 00:19:12,880 TRANSCRIPTOMIC DATA TO INFER 485 00:19:12,880 --> 00:19:15,600 WHICH MECHANISMS WERE OPERATING 486 00:19:15,600 --> 00:19:17,160 IN SOME PARTICULAR EXPERIMENT. 487 00:19:17,160 --> 00:19:19,400 THIS IS A COMPLICATED TASK 488 00:19:19,400 --> 00:19:24,080 BECAUSE THE MECHANISMS DEPEND -- 489 00:19:24,080 --> 00:19:25,800 THERE'S NOT NECESSARILY ONE AND 490 00:19:25,800 --> 00:19:27,280 WHICH ONE IS ACTIVE DEPENDS ON 491 00:19:27,280 --> 00:19:28,320 DOSE AND TIME COURSE OF THE 492 00:19:28,320 --> 00:19:30,560 PHENOMENON. 493 00:19:30,560 --> 00:19:32,920 HERE'S A TYPICAL MECHANISM. 494 00:19:32,920 --> 00:19:35,600 THIS IS A CAUSAL SEQUENCE OF 495 00:19:35,600 --> 00:19:36,760 PROCESSES REPRESENTED BY GO 496 00:19:36,760 --> 00:19:39,600 BIOLOGICAL PROCESSES AND EACH ONE 497 00:19:39,600 --> 00:19:41,840 IN THE PURPLE BOX IS A GO 498 00:19:41,840 --> 00:19:45,800 BIOLOGICAL PROCESS AND IT'S 499 00:19:45,800 --> 00:19:47,760 THE SIMPLEST ONE, AND THIS 500 00:19:47,760 --> 00:19:51,760 STARTS WITH A SERIES OF STEPS.
501 00:19:51,760 --> 00:19:53,960 A POSITIVE REGULATION OF 502 00:19:53,960 --> 00:19:55,600 MITOCHONDRIAL MEMBRANE THAT GETS 503 00:19:55,600 --> 00:19:59,080 MORE POROUS AND BECAUSE OF THAT 504 00:19:59,080 --> 00:20:03,600 IN THE INFLUX OF CALCIUM THERE'S 505 00:20:03,600 --> 00:20:06,240 A RELEASE OF CYTOCHROME C AND 506 00:20:06,240 --> 00:20:11,400 WHEN IT'S FOUND IN THE CYTOPLASM 507 00:20:11,400 --> 00:20:21,160 IT ACTIVATES CASPASES IN THIS 508 00:20:21,160 --> 00:20:24,600 PATHWAY AND THIS IS THE WAY THE 509 00:20:24,600 --> 00:20:33,160 TOXICITY HAPPENS. THEN HE 510 00:20:33,160 --> 00:20:34,560 DEVELOPED A SYSTEM YOU CAN LOOK 511 00:20:34,560 --> 00:20:37,920 UP HERE WHICH TAKES THE 512 00:20:37,920 --> 00:20:40,240 KNOWLEDGE GRAPH APOPTOTIC 513 00:20:42,160 --> 00:20:46,280 AND FILLS IN IMPLICIT ENTITIES 514 00:20:46,280 --> 00:20:49,120 AND CREATES THE NEURAL NETWORK 515 00:20:49,120 --> 00:20:50,480 REPRESENTATIONS, THE GRAPHICAL 516 00:20:50,480 --> 00:20:51,160 REPRESENTATION LEARNING APPROACH, 517 00:20:51,160 --> 00:20:54,000 AND DOES THE SAME THING FOR THE 518 00:20:54,000 --> 00:20:55,600 GENE EXPRESSION SO WE RUN 519 00:20:55,600 --> 00:20:58,520 TREATED VERSUS UNTREATED CELLS 520 00:20:58,520 --> 00:21:06,600 IN TIME COURSES TO LOOK AT 521 00:21:06,600 --> 00:21:07,400 DIFFERENTIALLY EXPRESSED GENES 522 00:21:07,400 --> 00:21:09,440 IN THE ONES THAT HAVE TOXICANTS 523 00:21:09,440 --> 00:21:11,600 VERSUS CONTROLS AND DID A 524 00:21:11,600 --> 00:21:14,200 MECHANISTIC INFERENCE TO FIGURE 525 00:21:14,200 --> 00:21:16,320 OUT WHICH MECHANISMS HE'S 526 00:21:16,320 --> 00:21:18,520 REPRESENTED ALIGN BEST WITH THE 527 00:21:18,520 --> 00:21:22,120 NODE EMBEDDINGS WITH THE DATA 528 00:21:22,120 --> 00:21:23,600 HE'S ANALYZING AND THEN FROM 529 00:21:23,600 --> 00:21:26,560 THOSE MECHANISMS HE CAN GENERATE 530 00:21:26,560 --> 00:21:31,360 BOTH DIAGRAMS AND NARRATIVE 531 00:21:31,360 --> 00:21:32,480 ACCOUNTS OF WHAT THE 532 00:21:32,480 -->
00:21:37,680 HYPOTHETICAL MECHANISM IS AND 533 00:21:37,680 --> 00:21:39,160 CREATED DIMENSIONAL VECTORS. 534 00:21:39,160 --> 00:21:40,280 YOU GET NICE SEPARATION. 535 00:21:40,280 --> 00:21:45,400 THIS IS A DIAGRAM OF VARIOUS 536 00:21:45,400 --> 00:21:47,120 DIFFERENT THINGS SORT OF RELATED 537 00:21:47,120 --> 00:21:50,080 TO EACH OTHER, BREAST CANCER 538 00:21:50,080 --> 00:21:51,480 GENES, PARTICULAR CHANGES THAT 539 00:21:51,480 --> 00:21:52,520 HAPPEN AND BREAST CANCER ITSELF 540 00:21:52,520 --> 00:21:56,760 AND THE LIKE, AND YOU CAN SEE 541 00:21:56,760 --> 00:21:58,320 NICE SEPARATION BUT STILL 542 00:21:58,320 --> 00:22:00,960 RELATIONSHIPS BETWEEN THE 543 00:22:00,960 --> 00:22:04,320 PHENOMENA. SO THE IDEA IS 544 00:22:04,320 --> 00:22:10,320 WE TAKE THE CELLS TREATED WITH A TOXIN, FIND THE 545 00:22:10,320 --> 00:22:13,960 DIFFERENTIALLY EXPRESSED GENES 546 00:22:13,960 --> 00:22:15,600 AND CALCULATE THE DISTANCE 547 00:22:15,600 --> 00:22:17,520 FROM EACH GENE TO EACH MECHANISM STEP, 548 00:22:17,520 --> 00:22:18,480 AND THERE'S A SEQUENCE PENALTY. 549 00:22:18,480 --> 00:22:20,800 THE FIRST STEP SHOULD APPEAR 550 00:22:20,800 --> 00:22:22,320 BEFORE THE LAST STEP OF THE 551 00:22:22,320 --> 00:22:23,600 MECHANISM IN THE EXPERIMENTAL 552 00:22:23,600 --> 00:22:23,840 DATA. 553 00:22:23,840 --> 00:22:26,720 THOUGH THE MECHANISTIC STEPS 554 00:22:26,720 --> 00:22:31,160 DON'T NECESSARILY ALIGN WITH THE 555 00:22:31,160 --> 00:22:31,800 FOUR-HOUR DIFFERENCES IN THE 556 00:22:31,800 --> 00:22:32,960 DATA EXTRACTED FROM THE CELL 557 00:22:32,960 --> 00:22:33,320 LINE. 558 00:22:33,320 --> 00:22:35,400 YOU DON'T WANT TO MAKE IT A 559 00:22:35,400 --> 00:22:36,600 STRICT RELATIONSHIP. 560 00:22:36,600 --> 00:22:38,360 IT'S A PENALTY FOR BEING CLOSER 561 00:22:38,360 --> 00:22:40,320 OR FURTHER AWAY FROM THE ORDER 562 00:22:40,320 --> 00:22:40,760 WE OBSERVED.
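The scoring idea described above can be sketched in a few lines of Python, reading the gene-to-step comparison as an embedding distance and the sequence penalty as a soft cost for matches that appear out of temporal order. This is only an illustration under stated assumptions, not the system from the talk: cosine distance, nearest-gene matching, the inversion-count penalty, `penalty_weight`, and every function name here are hypothetical choices.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def mechanism_cost(step_vectors, genes, penalty_weight=0.1):
    """Score one mechanism against expression data (lower is better).

    step_vectors: ordered embeddings, one per mechanism step.
    genes: (time_index, embedding) pairs for differentially expressed genes.
    penalty_weight: assumed weight of the soft ordering penalty.
    """
    matched_times = []
    total_distance = 0.0
    for step in step_vectors:
        # Match each step to its nearest gene in embedding space.
        time_index, distance = min(
            ((t, cosine_distance(step, vec)) for t, vec in genes),
            key=lambda pair: pair[1],
        )
        matched_times.append(time_index)
        total_distance += distance
    # Soft sequence penalty: count step pairs whose matched genes
    # show up in the opposite temporal order (not a hard constraint).
    inversions = sum(
        1
        for i in range(len(matched_times))
        for j in range(i + 1, len(matched_times))
        if matched_times[i] > matched_times[j]
    )
    return total_distance / len(step_vectors) + penalty_weight * inversions


def rank_mechanisms(mechanisms, genes):
    """Return mechanism names sorted from best (lowest cost) to worst."""
    return sorted(mechanisms, key=lambda name: mechanism_cost(mechanisms[name], genes))
```

With two toy genes observed at time points 0 and 1, a mechanism whose steps match them in that order costs less than the same steps reversed, so it ranks first.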
563 00:22:40,760 --> 00:22:43,120 AND WHAT THAT DOES IS GENERATE A 564 00:22:43,120 --> 00:22:44,600 SCALAR SCORE FOR EACH 565 00:22:44,600 --> 00:22:48,040 MECHANISM AND ALLOWS US TO RANK THEM. 566 00:22:48,040 --> 00:22:50,240 SO ONCE YOU'VE GOT A MECHANISM 567 00:22:50,240 --> 00:22:51,600 ALIGNED TO A GRAPH YOU CAN SEE 568 00:22:51,600 --> 00:22:54,240 WHICH THE SIMILAR ONES ARE. WE 569 00:22:54,240 --> 00:22:59,200 CAN GENERATE THESE NICE OUTPUTS 570 00:22:59,200 --> 00:23:04,160 INTERPRETABLE BY THE MOLECULAR 571 00:23:04,160 --> 00:23:05,360 BIOLOGISTS. YOU SEE IN THE 572 00:23:05,360 --> 00:23:07,920 CARTOON THE MECHANISM OF THE 573 00:23:07,920 --> 00:23:14,080 TOXICITY. THE PURPLE BOXES ARE 574 00:23:14,080 --> 00:23:17,240 ALIGNED TO CERTAIN GENE PRODUCTS 575 00:23:17,240 --> 00:23:19,520 IN THE GRAY, AND WE CAN CREATE 576 00:23:19,520 --> 00:23:20,360 NARRATIVES THAT DESCRIBE THESE 577 00:23:20,360 --> 00:23:20,720 THINGS. 578 00:23:20,720 --> 00:23:22,320 THE GENE WE SAW IN THE 579 00:23:22,320 --> 00:23:24,440 EXPRESSION INTERACTS WITH SOME 580 00:23:24,440 --> 00:23:26,240 OTHER GENE PRODUCT WHERE WE SAW 581 00:23:26,240 --> 00:23:27,640 THE TRANSCRIPT IN EXPRESSION, 582 00:23:27,640 --> 00:23:29,520 AND THAT INTERACTS WITH ANOTHER GENE 583 00:23:29,520 --> 00:23:30,720 PRODUCT, AND THOSE PARTICIPATE IN 584 00:23:30,720 --> 00:23:36,000 THE STEP IN THE MECHANISM.
585 00:23:36,000 --> 00:23:39,600 ANOTHER NICE THING ABOUT THIS IS WE 586 00:23:39,600 --> 00:23:43,520 CAN GENERATE SUGGESTIONS FOR 587 00:23:43,520 --> 00:23:44,520 EXPERIMENTS. THE 588 00:23:44,520 --> 00:23:46,160 THING WE NEED IN THE DISCUSSION 589 00:23:46,160 --> 00:23:48,800 SECTION IS NOT JUST SOME 590 00:23:48,800 --> 00:23:51,600 HYPOTHESIS BUT SUPPORTING DATA 591 00:23:51,600 --> 00:23:52,360 AND, EVEN MORE IMPORTANT, 592 00:23:52,360 --> 00:23:54,200 POTENTIAL EXPERIMENTS THAT WOULD 593 00:23:54,200 --> 00:23:56,480 CONFIRM IF THIS HYPOTHESIS WERE 594 00:23:56,480 --> 00:23:59,120 TRUE OR NOT. WE CAN MAKE SUGGESTIONS 595 00:23:59,120 --> 00:24:01,080 ABOUT GENE KNOCKOUTS YOU'D SEE 596 00:24:01,080 --> 00:24:03,360 AFFECT THE PROCESS OR VARIANTS 597 00:24:03,360 --> 00:24:07,600 IN THE HUMAN GENOME WE MAY 598 00:24:07,600 --> 00:24:10,120 EXPECT WOULD AFFECT THE PROCESS. 599 00:24:10,120 --> 00:24:13,800 OVERALL IGNACIO'S WORK HAS 600 00:24:13,800 --> 00:24:16,560 GOTTEN UPTAKE, AND EPA, WHICH DOES 601 00:24:16,560 --> 00:24:20,600 REGULATION OF THE TOXICANTS IN 602 00:24:20,600 --> 00:24:22,320 OUR ENVIRONMENT, AS A CONTRACTOR 603 00:24:22,320 --> 00:24:24,400 HIRED HIM TO BRING THIS INTO THE 604 00:24:24,400 --> 00:24:25,320 REGULATORY PROCESS IN THE UNITED 605 00:24:25,320 --> 00:24:25,560 STATES. 606 00:24:25,560 --> 00:24:27,680 SO THIS IS I THINK REALLY HIGH 607 00:24:27,680 --> 00:24:29,440 IMPACT WORK. 608 00:24:29,440 --> 00:24:32,080 HERE'S THE PERFORMANCE. 609 00:24:32,080 --> 00:24:32,640 IT'S NOT TREMENDOUS.
610 00:24:32,640 --> 00:24:35,400 SO WHEN WE LOOKED AT ALL THE 611 00:24:35,400 --> 00:24:37,600 ASSAYS WHERE WE KNEW WHAT THE 612 00:24:37,600 --> 00:24:39,240 RESPONSES WERE AND WE ALSO DID 613 00:24:39,240 --> 00:24:40,320 SOME WHERE WE DIDN'T KNOW WHAT 614 00:24:40,320 --> 00:24:41,320 THE RESPONSES WERE AND THEN 615 00:24:41,320 --> 00:24:43,920 LOOKED TO SEE IF THE PREDICTIONS 616 00:24:43,920 --> 00:24:46,720 WERE CORRECT, IF WE LOOK AT 617 00:24:46,720 --> 00:24:47,480 PRECISION, IT'S IN THE 0.7 618 00:24:47,480 --> 00:24:49,280 NEIGHBORHOOD IF YOU LOOK AT THE 619 00:24:49,280 --> 00:24:49,960 TOP THREE. 620 00:24:49,960 --> 00:24:50,560 SO THAT'S OKAY. 621 00:24:50,560 --> 00:24:52,160 THERE'S PLENTY OF ROOM TO 622 00:24:52,160 --> 00:24:54,720 IMPROVE THIS BUT AS I SAID, IT'S 623 00:24:54,720 --> 00:24:55,720 A COMPLICATED TASK. 624 00:24:55,720 --> 00:24:57,480 MORE THAN ONE DIFFERENT 625 00:24:57,480 --> 00:25:00,000 MECHANISM MIGHT BE AT PLAY. 626 00:25:00,000 --> 00:25:02,760 IT'S POSSIBLE TO SEE TWO CORRECT ONES, 627 00:25:02,760 --> 00:25:04,760 AND IT DEPENDS ON 628 00:25:04,760 --> 00:25:06,560 THE DOSE AND TIME AND THE LIKE. 629 00:25:06,560 --> 00:25:09,280 BUT IT'S A COMPLICATED TASK. 630 00:25:09,280 --> 00:25:13,800 I THINK THIS IS PROMISING WORK. 631 00:25:13,800 --> 00:25:18,720 NOW, BEYOND WHAT IGNACIO DID, 632 00:25:18,720 --> 00:25:21,680 TIFFANY'S Ph.D. WORK WAS 633 00:25:21,680 --> 00:25:22,320 CONNECTING BIOMEDICAL RECORDS TO 634 00:25:22,320 --> 00:25:23,520 KNOWLEDGE. SHE BUILT THIS GRAPH 635 00:25:23,520 --> 00:25:24,640 IN THE FIRST PLACE, AND HER 636 00:25:24,640 --> 00:25:27,600 PURPOSE WAS MOSTLY LOOKING AT 637 00:25:27,600 --> 00:25:29,080 RARE DISEASE. SHE TRIES TO TAKE 638 00:25:29,080 --> 00:25:30,240 PATIENT DATA STRAIGHT FROM THE 639 00:25:30,240 --> 00:25:32,680 E.H.R.
AND TRY TO INFER 640 00:25:32,680 --> 00:25:34,400 MOLECULAR MECHANISMS UNDERLYING 641 00:25:34,400 --> 00:25:37,640 THE PATIENTS, AND HER ULTIMATE 642 00:25:37,640 --> 00:25:39,440 GOAL WAS TO SUBPHENOTYPE RARE 643 00:25:39,440 --> 00:25:41,040 DISEASE PATIENTS WITH THE HOPE 644 00:25:41,040 --> 00:25:43,280 OF HAVING DIFFERENT TREATMENT 645 00:25:43,280 --> 00:25:44,320 MODALITIES FOR PATIENTS WHO HAVE 646 00:25:44,320 --> 00:25:44,960 A DIFFERENT MOLECULAR BASIS FOR 647 00:25:44,960 --> 00:25:48,200 THEIR DISEASE. 648 00:25:48,200 --> 00:25:49,840 SO MEDICAL RECORDS THEMSELVES 649 00:25:49,840 --> 00:25:51,560 ARE CHALLENGING. 650 00:25:51,560 --> 00:25:53,520 ONE OF THE THINGS SHE DID WAS 651 00:25:53,520 --> 00:25:55,600 TRY TO FIGURE OUT HOW TO MAP FROM 652 00:25:55,600 --> 00:25:57,360 PATIENT RECORDS INTO OUR 653 00:25:57,360 --> 00:25:57,960 KNOWLEDGE GRAPH. 654 00:25:57,960 --> 00:26:00,160 THE KNOWLEDGE GRAPH IS 655 00:26:00,160 --> 00:26:02,200 REPRESENTED IN OPEN BIOMEDICAL 656 00:26:02,200 --> 00:26:04,120 ONTOLOGIES, THINGS LIKE GO AND 657 00:26:04,120 --> 00:26:05,640 THE HUMAN PHENOTYPE ONTOLOGY AND 658 00:26:05,640 --> 00:26:09,640 THINGS LIKE THAT, WHICH ARE NOT 659 00:26:09,640 --> 00:26:13,640 USED TO CHARACTERIZE PATIENTS IN 660 00:26:13,640 --> 00:26:17,600 A CLINICAL SETTING. SHE DID AN 661 00:26:17,600 --> 00:26:19,200 AMAZING MAPPING FROM A CLINICAL 662 00:26:19,200 --> 00:26:23,080 DATA MODEL THAT IS OPEN AND THAT 663 00:26:23,080 --> 00:26:24,920 MANY DIFFERENT CLINICAL 664 00:26:24,920 --> 00:26:28,320 VOCABULARIES HAVE BEEN MAPPED TO, 665 00:26:28,320 --> 00:26:33,720 AND HER PAPER ON OMOP2OBO 666 00:26:33,720 --> 00:26:36,200 WAS PUBLISHED FAIRLY RECENTLY IN 667 00:26:36,200 --> 00:26:38,520 NATURE DIGITAL MEDICINE IN A 668 00:26:38,520 --> 00:26:39,440 WIDE CONTEXT.
669 00:26:39,440 --> 00:26:43,960 FOR THOSE DOING WORK WITH 670 00:26:43,960 --> 00:26:45,240 ELECTRONIC HEALTH RECORDS, I 671 00:26:45,240 --> 00:26:47,320 ENCOURAGE YOU TO LOOK AT THE 672 00:26:47,320 --> 00:26:49,080 PAPER. I THINK YOU'LL FIND IT 673 00:26:49,080 --> 00:26:49,280 USEFUL. 674 00:26:49,280 --> 00:26:51,320 IT'S MAPPING MORE THAN 65,000 675 00:26:51,320 --> 00:26:52,840 TERMS, ALMOST ALL OF WHICH HAVE 676 00:26:52,840 --> 00:26:57,520 BEEN REVIEWED BY HUMAN EXPERTS, 677 00:26:57,520 --> 00:26:59,880 PHARMACISTS, DOCS, NURSES. IT'S 678 00:26:59,880 --> 00:27:01,640 AN AMAZING RESOURCE. 679 00:27:01,640 --> 00:27:08,560 SHE THEN COULD MAP FROM THE OBOs 680 00:27:08,560 --> 00:27:19,120 TO PHEKNOWLATOR. 681 00:27:20,040 --> 00:27:22,040 WE TAKE EACH VISIT RECORD FROM 682 00:27:22,040 --> 00:27:24,080 AN ELECTRONIC RECORD AND EACH 683 00:27:24,080 --> 00:27:27,640 VISIT HAS A LIST OF DIAGNOSES, OF 684 00:27:27,640 --> 00:27:29,840 DRUG TREATMENTS AND OF LABS AND 685 00:27:29,840 --> 00:27:35,320 LAB VALUES, ALSO POTENTIALLY 686 00:27:35,320 --> 00:27:39,200 VACCINES 687 00:27:39,200 --> 00:27:40,360 THAT WERE MAPPED, AND WE 688 00:27:40,360 --> 00:27:50,840 PUT THOSE IN PHEKNOWLATOR AND 689 00:28:02,040 --> 00:28:03,720 CAN COMBINE THE NODES LIT UP BY 690 00:28:03,720 --> 00:28:07,080 A PARTICULAR PATIENT VISIT AND 691 00:28:07,080 --> 00:28:08,480 COME UP WITH A REPRESENTATION IN 692 00:28:08,480 --> 00:28:10,360 THE SPACE FOR THAT PARTICULAR 693 00:28:10,360 --> 00:28:12,200 PATIENT VISIT. 694 00:28:12,200 --> 00:28:15,600 WE CAN ALSO AVERAGE ALL PATIENT 695 00:28:15,600 --> 00:28:18,000 VISITS AND COME UP WITH A 696 00:28:18,000 --> 00:28:19,960 REPRESENTATION FOR THAT PATIENT. 697 00:28:19,960 --> 00:28:23,480 THIS HAS ALL KINDS OF VALUE.
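A rough sketch of how visit-level and patient-level vectors might be built from node embeddings follows. The talk only says the nodes lit up by a visit are combined and that visits are averaged; taking a plain mean at both levels is an assumption here, and the function names and toy concept identifiers are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def visit_vector(concept_ids, node_embeddings):
    """Combine the embeddings of the graph nodes a single visit maps to.

    Concepts without an embedding are skipped; at least one must map.
    """
    mapped = [node_embeddings[c] for c in concept_ids if c in node_embeddings]
    return mean_vector(mapped)


def patient_vector(visits, node_embeddings):
    """Average the visit vectors to get one vector per patient."""
    return mean_vector([visit_vector(v, node_embeddings) for v in visits])
```

For example, with toy two-dimensional embeddings `{"dx": [1.0, 0.0], "rx": [0.0, 1.0], "lab": [1.0, 1.0]}`, a patient with visits `["dx", "rx"]` and `["lab"]` gets visit vectors `[0.5, 0.5]` and `[1.0, 1.0]`, and a patient vector of `[0.75, 0.75]`.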
698 00:28:23,480 --> 00:28:24,440 HAVING A VECTOR REPRESENTATION 699 00:28:24,440 --> 00:28:26,160 FOR A PATIENT, I'LL SHOW YOU 700 00:28:26,160 --> 00:28:27,600 WHAT WE DO WITH THOSE CLINICALLY, 701 00:28:27,600 --> 00:28:29,400 BUT I WANT TO POINT OUT I WAS 702 00:28:29,400 --> 00:28:32,320 SHOCKED THAT BOTH THE HOSPITALS AND 703 00:28:32,320 --> 00:28:34,240 OUR IRB HAVE DECIDED THE PATIENT 704 00:28:34,240 --> 00:28:35,640 REPRESENTATIONS, JUST VECTORS OF 705 00:28:35,640 --> 00:28:37,160 NUMBERS, ARE NOT PROTECTED 706 00:28:37,160 --> 00:28:37,800 HEALTH INFORMATION. 707 00:28:37,800 --> 00:28:39,320 WE CAN SHARE THEM PUBLICLY. 708 00:28:39,320 --> 00:28:43,600 SO WE MADE THAT AVAILABLE AND 709 00:28:43,600 --> 00:28:47,600 WHILE IT'S NOT A VERY DIRECT -- 710 00:28:47,600 --> 00:28:50,320 IT'S HARD TO DISENTANGLE WHAT 711 00:28:50,320 --> 00:28:51,880 THE VECTOR REPRESENTATIONS MEAN 712 00:28:51,880 --> 00:28:52,760 ABOUT THE PATIENTS. 713 00:28:52,760 --> 00:28:55,280 THEY DO GIVE US A WAY TO TRY TO 714 00:28:55,280 --> 00:28:57,520 GET SOME OF THE ADVANTAGES OF 715 00:28:57,520 --> 00:28:59,320 THE BROADER COMPUTER SCIENCE 716 00:28:59,320 --> 00:29:02,280 COMMUNITY TO LOOK AT THESE 717 00:29:02,280 --> 00:29:02,680 TASKS. 718 00:29:02,680 --> 00:29:05,680 AND WE'VE DONE A COUPLE OF 719 00:29:05,680 --> 00:29:08,800 THINGS WITH THESE EMBEDDINGS 720 00:29:08,800 --> 00:29:10,280 THAT ARE DIRECTLY USEFUL 721 00:29:10,280 --> 00:29:10,920 CLINICALLY. 722 00:29:10,920 --> 00:29:12,760 ONE IS WE TRIED TO DO 723 00:29:12,760 --> 00:29:13,240 CLUSTERING. 724 00:29:13,240 --> 00:29:16,160 WE LOOK AT THE PATIENT VECTORS 725 00:29:16,160 --> 00:29:20,600 BOTH BY VISIT AND INTEGRATED ACROSS 726 00:29:20,600 --> 00:29:22,440 VISITS AND DO CLUSTERING ON 727 00:29:22,440 --> 00:29:23,080 THE VECTORS.
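Clustering over patient vectors can be illustrated with a very small k-means loop. The talk does not name the clustering algorithm, so k-means, the deterministic first-k initialization, and the function names below are all assumptions made for the sake of a runnable sketch.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iterations=20):
    """Assign each point a cluster label in range(k).

    Centroids start at the first k points (an assumed, deterministic
    initialization) and are refined for a fixed number of iterations.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Patients (or visits) that end up with the same label form one candidate group; whether such a group is clinically meaningful still has to be checked against the records.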
728 00:29:23,080 --> 00:29:26,760 IF YOU DO THE SAME THING, 729 00:29:26,760 --> 00:29:28,640 CLUSTERING OVER THE CLINICAL 730 00:29:28,640 --> 00:29:30,480 DATA ALONE, AS YOU CAN SEE ON THE 731 00:29:30,480 --> 00:29:34,360 LEFT -- THIS IS FOR HALF A DOZEN 732 00:29:34,360 --> 00:29:42,160 RELATIVELY RARE DISEASES. 733 00:29:42,160 --> 00:29:45,560 DOWN AT THE BOTTOM THERE'S A 734 00:29:45,560 --> 00:29:47,000 LEGEND BUT SO THIS IS SICKLE 735 00:29:47,000 --> 00:29:50,480 CELL DISEASE AND CYSTIC FIBROSIS 736 00:29:50,480 --> 00:29:50,960 AND THE LIKE. 737 00:29:50,960 --> 00:29:53,720 IF YOU LOOK AT THE LEFT, WHEN WE 738 00:29:53,720 --> 00:29:56,080 DO CLUSTERING ON THE BASIS OF 739 00:29:56,080 --> 00:29:58,800 CLINICAL DATA ALONE THE CLUSTERS 740 00:29:58,800 --> 00:30:02,360 SHOW UP TO RECAPITULATE THE 741 00:30:02,360 --> 00:30:05,840 DIAGNOSIS, WHICH IS FINE BUT NOT 742 00:30:05,840 --> 00:30:10,160 INTERESTING, AND WE CAN GET 743 00:30:10,160 --> 00:30:11,040 PHENOTYPING AND SEE DIFFERENT 744 00:30:11,040 --> 00:30:16,360 GROUPS OF PATIENTS WITHIN A 745 00:30:16,360 --> 00:30:18,680 PARTICULAR DIAGNOSIS. 746 00:30:18,680 --> 00:30:22,480 SOME OVERLAP WITH KNOWN CLINICAL 747 00:30:22,480 --> 00:30:24,760 SUBPHENOTYPES. IN SICKLE CELL 748 00:30:24,760 --> 00:30:25,400 PATIENTS THERE ARE TWO BROAD 749 00:30:25,400 --> 00:30:35,280 CLASSES THAT SHOW UP IN THIS. 750 00:30:35,280 --> 00:30:36,520 WE WERE INTERESTED IN THESE 751 00:30:36,520 --> 00:30:38,840 BECAUSE THEY TEND TO REFLECT 752 00:30:38,840 --> 00:30:39,600 DIFFERENCES IN THE WAY THEY'RE 753 00:30:39,600 --> 00:30:42,440 TREATED, AND THIS IS AN EARLY WAY TO 754 00:30:42,440 --> 00:30:44,480 IDENTIFY SUBPHENOTYPES THAT GET 755 00:30:44,480 --> 00:30:46,320 DIFFERENT SORTS OF TREATMENT. 756 00:30:46,320 --> 00:30:48,040 AND SO THEY'RE USEFUL IN 757 00:30:48,040 --> 00:30:51,600 TREATING THE PATIENTS 758 00:30:51,600 --> 00:30:52,320 CLINICALLY.
759 00:30:52,320 --> 00:30:54,720 NOW I WANT TO TRANSITION A 760 00:30:54,720 --> 00:30:56,720 LITTLE BIT TO NEW WORK. 761 00:30:56,720 --> 00:31:03,600 THESE ARE PAPERS BY GRADUATE 762 00:31:03,600 --> 00:31:08,120 STUDENTS LUCAS STONEWATER AND 763 00:31:08,120 --> 00:31:13,880 BROOK SANTANGELO AND NOURAH 764 00:31:13,880 --> 00:31:21,520 SALEM, 765 00:31:21,520 --> 00:31:23,600 AND WE WANT TO USE KNOWLEDGE 766 00:31:23,600 --> 00:31:28,560 GRAPHS TO MAKE BETTER MOLECULAR 767 00:31:28,560 --> 00:31:30,360 CARTOONS. KNOWLEDGE GRAPHS 768 00:31:30,360 --> 00:31:31,240 DON'T DIRECTLY TRANSLATE INTO 769 00:31:31,240 --> 00:31:32,720 THE MOLECULAR CARTOONS. 770 00:31:32,720 --> 00:31:33,600 THERE ARE STEPS NOBODY SHOWS IN 771 00:31:33,600 --> 00:31:38,320 THE CARTOONS, LIKE THE CO-FACTORS 772 00:31:38,320 --> 00:31:41,720 AND THE ATP BEING CONSUMED, IF 773 00:31:41,720 --> 00:31:45,800 YOU LOOK AT THE KNOWLEDGE BASES 774 00:31:45,800 --> 00:31:51,440 LIKE REACTOME AND KEGG. AND IF WE'RE 775 00:31:51,440 --> 00:31:53,960 TRYING TO LINK UP WHAT'S 776 00:31:53,960 --> 00:32:00,560 DISTANT, BITS OF REGULATORY 777 00:32:00,560 --> 00:32:02,320 PATHWAYS, AND LOOKING AT DISEASE 778 00:32:02,320 --> 00:32:04,200 LEVEL ACTIVITIES, WE'LL ONLY 779 00:32:04,200 --> 00:32:09,400 SAMPLE A SMALL BIT OF THE THINGS 780 00:32:09,400 --> 00:32:14,320 WE COULD FIND. HOW DO WE DECIDE 781 00:32:14,320 --> 00:32:15,280 WHAT TO SHOW? 782 00:32:15,280 --> 00:32:22,680 WE SHOW SEMANTIC ACTIONS. 783 00:32:22,680 --> 00:32:25,400 IN LOOKING BACK AT DRUG-DRUG 784 00:32:25,400 --> 00:32:26,680 INTERACTIONS, WE WANTED TO FIND 785 00:32:26,680 --> 00:32:30,000 A DRUG CONNECTED TO A PROTEIN 786 00:32:30,000 --> 00:32:31,320 TARGET, MAYBE CONNECTED TO A 787 00:32:31,320 --> 00:32:31,600 PROCESS. 788 00:32:31,600 --> 00:32:34,240 THAT'S A SEMANTIC ACTION.
789 00:32:34,240 --> 00:32:38,400 WE CALL IT THAT BECAUSE THE 790 00:32:38,400 --> 00:32:39,600 CONSTRAINTS ON THE PATH INVOLVE 791 00:32:39,600 --> 00:32:41,320 THE KINDS OF NODES AND 792 00:32:41,320 --> 00:32:43,600 RELATIONSHIPS IN THE PATH. THOSE ARE 793 00:32:43,600 --> 00:32:44,800 THE SEMANTICS OF THE NODES IN THE 794 00:32:44,800 --> 00:32:46,440 ACTION, AND THAT'S WHY THEY'RE SEMANTIC 795 00:32:46,440 --> 00:32:46,920 ACTIONS. 796 00:32:46,920 --> 00:32:50,480 WE BELIEVE BY BEING ABLE TO 797 00:32:50,480 --> 00:32:52,280 DESCRIBE WHAT WE'RE LOOKING FOR 798 00:32:52,280 --> 00:32:53,760 IN BUILDING THE CARTOONS IN A 799 00:32:53,760 --> 00:32:58,360 SEMANTIC WAY, PERHAPS IN 800 00:32:58,360 --> 00:33:01,560 ADDITION TO LOOKING AT THINGS IN 801 00:33:01,560 --> 00:33:04,960 THE STRUCTURE OF THE NETWORK AND 802 00:33:04,960 --> 00:33:12,160 LOOKING FOR NODES, WE CAN FIND THEM 803 00:33:12,160 --> 00:33:14,680 FASTER. WE WANTED TO 804 00:33:14,680 --> 00:33:16,360 RECAPITULATE THREE EXISTING 805 00:33:16,360 --> 00:33:23,440 CARTOONS FROM EXISTING KNOWLEDGE 806 00:33:23,440 --> 00:33:23,640 BASES. 807 00:33:23,640 --> 00:33:29,040 WE INDEX AND THEN TAKING THE 808 00:33:29,040 --> 00:33:31,600 THINGS THAT WE FOUND IN THE 809 00:33:31,600 --> 00:33:35,600 KNOWLEDGE GRAPH AND CONSTRUCTING 810 00:33:35,600 --> 00:33:44,000 THE SUBGRAPH, IF YOU WILL, OF 811 00:33:44,000 --> 00:33:47,280 WHAT WE WANT. WE CONSIDER 812 00:33:47,280 --> 00:33:48,360 CREATING THE SUBGRAPH, THE NODES 813 00:33:48,360 --> 00:33:53,400 AND EDGES IN THE CARTOON, ONE TASK, AND 814 00:33:53,400 --> 00:33:57,360 LAYING THEM OUT AND MAKING THE 815 00:33:57,360 --> 00:34:00,440 GLYPHS IS ANOTHER TASK, 816 00:34:00,440 --> 00:34:09,760 SUPPORTED BY CYTOSCAPE 817 00:34:09,760 --> 00:34:11,600 OR BIORENDER, BUT NEITHER PROVIDES 818 00:34:11,600 --> 00:34:13,680 THE SEMANTIC SUPPORT FOR 819 00:34:13,680 --> 00:34:15,600 QUERYING THE KNOWLEDGE GRAPH AND 820 00:34:15,600 --> 00:34:21,640 THAT'S WHERE WE'RE FOCUSED.
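The drug to protein target to process pattern described above is, in effect, a path query constrained by node type. The sketch below shows one way such a typed path search could look on a toy graph; it is not the group's implementation, and the function name, the edge and type encodings, and the example nodes and relations are all illustrative assumptions.

```python
def find_typed_paths(edges, node_types, type_pattern):
    """Return every path whose node types match type_pattern in order.

    edges: (source, relation, target) triples of a directed graph.
    node_types: mapping from node name to its semantic type.
    type_pattern: sequence of types, e.g. ("drug", "protein", "process").
    """
    adjacency = {}
    for source, _relation, target in edges:
        adjacency.setdefault(source, []).append(target)

    # Start from every node of the first type, then extend one hop at a
    # time, keeping only neighbors of the next required type.
    paths = [[node] for node, kind in node_types.items() if kind == type_pattern[0]]
    for wanted in type_pattern[1:]:
        paths = [
            path + [target]
            for path in paths
            for target in adjacency.get(path[-1], [])
            if node_types.get(target) == wanted
        ]
    return paths
```

On a toy graph where `aspirin` (drug) points to `PTGS2` (protein) and to `headache` (disease), and `PTGS2` points to `inflammatory response` (process), the pattern `("drug", "protein", "process")` returns only the path through `PTGS2`; the disease edge is filtered out by its type. Constraining relation types as well would follow the same shape.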
821 00:34:21,640 --> 00:34:32,160 THIS IS AN EXAMPLE AND ONE OF 822 00:34:38,080 --> 00:34:39,880 THE INTERESTING THINGS ABOUT THE 823 00:34:39,880 --> 00:34:41,400 SUBGRAPH IS IT INCLUDES THINGS 824 00:34:41,400 --> 00:34:43,920 NOT IN THE ORIGINAL CARTOON BUT THAT 825 00:34:43,920 --> 00:34:44,760 MAY HAVE BEEN INTERESTING TO 826 00:34:44,760 --> 00:34:49,960 INCLUDE IN IT, SUCH AS DRUGS 827 00:34:49,960 --> 00:34:51,120 THAT AFFECT THE STEPS IN THE 828 00:34:51,120 --> 00:34:51,480 CARTOON. 829 00:34:51,480 --> 00:34:53,960 WE HOPE BY DOING THIS WE MIGHT 830 00:34:53,960 --> 00:34:55,960 BE ABLE TO IMPROVE THE QUALITY 831 00:34:55,960 --> 00:35:00,240 OF THESE CARTOONS OR DIAGRAMS BY 832 00:35:00,240 --> 00:35:02,680 SUGGESTING TO THE AUTHORS 833 00:35:02,680 --> 00:35:03,280 ADDITIONAL THINGS THAT THE 834 00:35:03,280 --> 00:35:05,520 KNOWLEDGE GRAPH SUGGESTS ARE 835 00:35:05,520 --> 00:35:06,640 RELATED BUT WERE NOT IN THE 836 00:35:06,640 --> 00:35:09,200 INITIAL QUERY THAT GENERATED IT. 837 00:35:09,200 --> 00:35:11,600 SO WE HOPE THAT NOT ONLY WILL IT 838 00:35:11,600 --> 00:35:13,200 MAKE IT EASIER AND FASTER TO 839 00:35:13,200 --> 00:35:15,440 GENERATE THE CARTOONS BUT MAY 840 00:35:15,440 --> 00:35:15,920 INCREASE THE QUALITY.
841 00:35:15,920 --> 00:35:21,720 ALL RIGHT, SO NOW I WANT TO 842 00:35:21,720 --> 00:35:26,360 TRANSITION TO SOMETHING 843 00:35:26,360 --> 00:35:27,600 THAT'S DIFFERENT THAN NATURAL 844 00:35:27,600 --> 00:35:29,680 LANGUAGE PROCESSING, INSPIRED 845 00:35:29,680 --> 00:35:33,000 BY STUART FIRESTEIN'S BOOK. HE WAS 846 00:35:33,000 --> 00:35:35,280 THE CHAIR OF MOLECULAR BIOLOGY 847 00:35:35,280 --> 00:35:37,560 AT COLUMBIA FOR DECADES, RETIRED 848 00:35:37,560 --> 00:35:41,840 ABOUT FIVE YEARS AGO AND HAD 849 00:35:41,840 --> 00:35:43,040 BEEN RESPONSIBLE FOR TEACHING 850 00:35:43,040 --> 00:35:45,080 MOLECULAR BIOLOGY TO GENERATIONS 851 00:35:45,080 --> 00:35:47,280 OF COLUMBIA GRADUATE AND 852 00:35:47,280 --> 00:35:48,080 UNDERGRADUATE STUDENTS, AND HIS 853 00:35:48,080 --> 00:35:49,360 METHOD OF TEACHING WAS TO FOCUS 854 00:35:49,360 --> 00:35:51,560 ON WHAT ARE THE INTERESTING 855 00:35:51,560 --> 00:35:52,200 QUESTIONS SCIENTISTS ARE GOING 856 00:35:52,200 --> 00:35:53,080 AFTER. 857 00:35:53,080 --> 00:35:54,280 RATHER THAN TRYING TO GET 858 00:35:54,280 --> 00:35:55,840 STUDENTS TO MEMORIZE FACTS, WHICH 859 00:35:55,840 --> 00:36:01,120 ARE IMPORTANT BUT LESS 860 00:36:01,120 --> 00:36:04,400 MOTIVATING, HE FOUND IT WAS 861 00:36:04,400 --> 00:36:06,760 MOTIVATING TO WORK ON WHAT WE 862 00:36:06,760 --> 00:36:08,280 DON'T KNOW AND WROTE A LOVELY 863 00:36:08,280 --> 00:36:09,600 BOOK CALLED IGNORANCE: HOW IT 864 00:36:09,600 --> 00:36:10,720 DRIVES SCIENCE. 865 00:36:10,720 --> 00:36:11,840 WE WERE INSPIRED BY THAT.
866 00:36:11,840 --> 00:36:13,760 IF YOU LOOK AT THE HISTORY OF 867 00:36:13,760 --> 00:36:14,440 NATURAL LANGUAGE PROCESSING 868 00:36:14,440 --> 00:36:15,680 THERE'S BEEN A FAIR AMOUNT OF 869 00:36:15,680 --> 00:36:18,400 WORK IN TRYING TO CHARACTERIZE 870 00:36:18,400 --> 00:36:20,080 STATEMENTS THAT WERE NOT FACTUAL 871 00:36:20,080 --> 00:36:21,480 IN THE LITERATURE USUALLY TO 872 00:36:21,480 --> 00:36:23,920 EXCLUDE THEM FROM INFORMATION 873 00:36:23,920 --> 00:36:24,720 EXTRACTION. 874 00:36:24,720 --> 00:36:27,480 WORK IN HEDGING OR SPECULATION, 875 00:36:27,480 --> 00:36:28,920 IDENTIFICATION, THINGS LIKE 876 00:36:28,920 --> 00:36:29,120 THAT. 877 00:36:29,120 --> 00:36:30,800 WE FLIPPED THAT ON ITS HEAD AND 878 00:36:30,800 --> 00:36:32,120 DECIDED TO FOCUS ON THE 879 00:36:32,120 --> 00:36:32,960 STATEMENTS IN THE LITERATURE 880 00:36:32,960 --> 00:36:34,360 THAT WERE STATEMENTS OF 881 00:36:34,360 --> 00:36:37,280 IGNORANCE. 882 00:36:37,280 --> 00:36:39,040 THAT WAS INTERESTING NOT BECAUSE 883 00:36:39,040 --> 00:36:40,680 THEY'RE STATEMENTS OF IGNORANCE 884 00:36:40,680 --> 00:36:41,680 BUT GOALS FOR SCIENTIFIC 885 00:36:41,680 --> 00:36:41,960 KNOWLEDGE. 886 00:36:41,960 --> 00:36:43,840 WHAT DO WE WANT TO KNOW AND 887 00:36:43,840 --> 00:36:45,680 THINK WE CAN FIGURE OUT AND SO 888 00:36:45,680 --> 00:36:47,600 THAT'S IMPORTANT BECAUSE IT DOES 889 00:36:47,600 --> 00:36:49,160 DRIVE SCIENCE FORWARD. 890 00:36:49,160 --> 00:36:53,000 AND SO A GRADUATE STUDENT WHO 891 00:36:53,000 --> 00:36:55,400 SHOULD FINISH IN JANUARY AND ON 892 00:36:55,400 --> 00:36:56,680 THE JOB MARKET, FEEL FREE TO 893 00:36:56,680 --> 00:36:59,680 REACH OUT TO HER, HER THESIS AND 894 00:36:59,680 --> 00:37:02,000 WE HAVE A PAPER THAT'S PART OF 895 00:37:02,000 --> 00:37:04,000 THIS UNDER REVIEW CALLED 896 00:37:04,000 --> 00:37:04,760 IDENTIFYING AND CLASSIFYING 897 00:37:04,760 --> 00:37:06,280 GOALS FOR SCIENTIFIC KNOWLEDGE. 
898 00:37:06,280 --> 00:37:07,840 WE'RE LOOKING IN TEXT TO TRY AND 899 00:37:07,840 --> 00:37:10,400 FIND SENTENCES THAT EXPRESS 900 00:37:10,400 --> 00:37:12,360 BASICALLY OPEN SCIENTIFIC 901 00:37:12,360 --> 00:37:13,080 QUESTIONS. 902 00:37:13,080 --> 00:37:15,440 AND WE'VE DONE THIS BY DOING 903 00:37:15,440 --> 00:37:19,240 CLASSIFIERS OVER SENTENCES, WE 904 00:37:19,240 --> 00:37:21,080 TRY AND CHARACTERIZE THE CONTENT 905 00:37:21,080 --> 00:37:24,240 OF THE QUERY OR STATEMENTS ABOUT 906 00:37:24,240 --> 00:37:25,800 GOALS OF SCIENTIFIC KNOWLEDGE 907 00:37:25,800 --> 00:37:27,040 USING ONTOLOGY TERMS AND USING 908 00:37:27,040 --> 00:37:29,560 CONCEPT RECOGNITION IN THE 909 00:37:29,560 --> 00:37:29,840 SENTENCES. 910 00:37:29,840 --> 00:37:31,920 AND THEN ONCE WE HAVE THIS WE 911 00:37:31,920 --> 00:37:35,600 CREATE A GRAPH OF ALL THIS 912 00:37:35,600 --> 00:37:37,000 IGNORANCE AND WE CAN EXPLORE 913 00:37:37,000 --> 00:37:38,640 THAT GRAPH TO TRY AND IDENTIFY 914 00:37:38,640 --> 00:37:40,040 WHAT ARE THE KINDS OF 915 00:37:40,040 --> 00:37:41,520 INTERESTING QUESTIONS BEING 916 00:37:41,520 --> 00:37:43,400 PURSUED IN A PARTICULAR AREA AND 917 00:37:43,400 --> 00:37:47,080 WE CAN ALSO DO THE SORT OF 918 00:37:47,080 --> 00:37:50,240 ENRICHMENT TASK AND ENRICH FOR 919 00:37:50,240 --> 00:37:50,920 IGNORANCE INSTEAD OF KNOWLEDGE 920 00:37:50,920 --> 00:37:53,480 AND TELLS US WHAT ARE THE OPEN 921 00:37:53,480 --> 00:37:55,440 QUESTION A DATA SET MIGHT APPLY 922 00:37:55,440 --> 00:37:56,880 TO RATHER THAN THE THINGS WE 923 00:37:56,880 --> 00:37:58,080 ALREADY KNOW APPLY TO THE DATA 924 00:37:58,080 --> 00:37:58,680 SET. 
925 00:37:58,680 --> 00:37:59,800 THIS IS PARTICULARLY INTERESTING 926 00:37:59,800 --> 00:38:03,000 WHEN THE QUESTIONS THAT WE 927 00:38:03,000 --> 00:38:05,200 IDENTIFY THAT MIGHT APPLY 928 00:38:05,200 --> 00:38:06,400 TO SOME EXPERIMENTAL 929 00:38:06,400 --> 00:38:12,040 RESULTS COME FROM OUTSIDE OF THE 930 00:38:12,040 --> 00:38:22,600 DOMAIN OF THE EXPERIMENT AND MAY 931 00:38:22,920 --> 00:38:24,560 INTRIGUE PEOPLE AND IT SEEMS IT MAY 932 00:38:24,560 --> 00:38:35,080 BEAR ON THE OTHER OPEN QUESTION. 933 00:38:40,480 --> 00:38:41,960 IGNORANCE STATEMENTS DON'T COME 934 00:38:41,960 --> 00:38:43,480 WITH QUESTION MARKS AT THE END 935 00:38:43,480 --> 00:38:44,360 OF THE SENTENCE. 936 00:38:44,360 --> 00:38:46,400 THEY'RE OFTEN IN THE FORM OF 937 00:38:46,400 --> 00:38:47,560 STATEMENTS ABOUT CONTROVERSIES 938 00:38:47,560 --> 00:38:51,040 OR STATEMENTS ABOUT UNKNOWNS. 939 00:38:51,040 --> 00:38:52,720 AND THEY SHOW UP THROUGHOUT 940 00:38:52,720 --> 00:38:54,200 PUBLISHED JOURNAL ARTICLES. 941 00:38:54,200 --> 00:38:56,000 THERE'S AN AWFUL LOT OF THEM IN 942 00:38:56,000 --> 00:38:57,680 THE INTRODUCTION AS WELL AS THE 943 00:38:57,680 --> 00:38:58,400 DISCUSSION SECTION OF THE PAPER 944 00:38:58,400 --> 00:39:00,080 AND USUALLY THE STATEMENTS OF 945 00:39:00,080 --> 00:39:02,160 IGNORANCE IN THE INTRODUCTION OF 946 00:39:02,160 --> 00:39:03,440 A PAPER ARE THINGS THE AUTHOR IS 947 00:39:03,440 --> 00:39:05,280 GOING TO CLAIM TO RESOLVE OR AT 948 00:39:05,280 --> 00:39:07,200 LEAST PUSH FORWARD WITH THE 949 00:39:07,200 --> 00:39:08,720 RESULTS FOUND. 950 00:39:08,720 --> 00:39:10,800 YOU START OFF WITH A SET OF 951 00:39:10,800 --> 00:39:12,800 STATEMENTS OF IGNORANCE OR 952 00:39:12,800 --> 00:39:14,600 QUESTIONS OR THE LIKE, GOALS FOR 953 00:39:14,600 --> 00:39:14,880 KNOWLEDGE.
954 00:39:14,880 --> 00:39:16,800 THE RESULTS BEAR ON THE GOALS 955 00:39:16,800 --> 00:39:18,160 AND THEN IN THE DISCUSSION WE 956 00:39:18,160 --> 00:39:20,400 TALK ABOUT NEW QUESTIONS THAT 957 00:39:20,400 --> 00:39:23,600 ARISE FROM THIS OR CONTROVERSIES 958 00:39:23,600 --> 00:39:25,920 OR NEW HYPOTHESES. 959 00:39:25,920 --> 00:39:27,920 SO THE STATEMENTS OF IGNORANCE 960 00:39:27,920 --> 00:39:30,720 SHOW UP THROUGHOUT DOCUMENTS. 961 00:39:30,720 --> 00:39:35,080 THEY SHOW UP OCCASIONALLY IN 962 00:39:35,080 --> 00:39:36,240 METHOD STATEMENTS AS WELL 963 00:39:36,240 --> 00:39:37,480 SOMETIMES, WHICH IS INTERESTING. 964 00:39:37,480 --> 00:39:39,920 AFTER THAT WE CAN DO THE ENRICHMENT, 965 00:39:39,920 --> 00:39:42,520 THE SAME IDEA: TAKE THE 966 00:39:42,520 --> 00:39:43,920 STATEMENTS OF IGNORANCE THAT 967 00:39:43,920 --> 00:39:46,360 HAVE GO ONTOLOGY TERMS OR OTHER 968 00:39:46,360 --> 00:39:49,280 ONTOLOGY TERMS IN THE SENTENCES, 969 00:39:49,280 --> 00:39:50,400 MAP FROM THE RESULTING 970 00:39:50,400 --> 00:39:52,000 EXPERIMENT TO THOSE ONTOLOGY 971 00:39:52,000 --> 00:39:53,840 TERMS AND THEN LOOK TO FIND 972 00:39:53,840 --> 00:39:58,800 STATEMENTS OF IGNORANCE THAT ARE 973 00:39:58,800 --> 00:40:01,640 OVERREPRESENTED, STATISTICALLY 974 00:40:01,640 --> 00:40:03,600 SPEAKING. SAME ANALYSIS AS FINDING THE 975 00:40:03,600 --> 00:40:06,520 STATEMENTS OF KNOWLEDGE THAT ARE 976 00:40:06,520 --> 00:40:07,600 OVERREPRESENTED. 977 00:40:07,600 --> 00:40:09,520 SAME TECHNIQUE BUT OVER 978 00:40:09,520 --> 00:40:10,720 REPRESENTED BITS. 979 00:40:10,720 --> 00:40:11,600 THIS IS QUITE INTERESTING. 980 00:40:11,600 --> 00:40:13,480 I CAN'T GO INTO DETAILS, IT'S IN 981 00:40:13,480 --> 00:40:14,680 THE PAPER, BUT THEY DON'T OVERLAP, 982 00:40:14,680 --> 00:40:16,320 AND YOU'D HOPE STATEMENTS OF 983 00:40:16,320 --> 00:40:18,920 KNOWLEDGE AND IGNORANCE WOULD BE 984 00:40:18,920 --> 00:40:19,200 DIFFERENT.
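Over-representation of this kind is commonly tested with a one-sided hypergeometric test, the same calculation used in standard GO enrichment. The talk does not say which test was used, so the choice of test, the function name, and the parameter names below are assumptions; the sketch only shows the arithmetic.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` items without replacement.

    population: all ontology terms (or genes) considered.
    annotated:  how many of those are linked to ignorance statements.
    sample:     how many terms the experiment's results map to.
    hits:       how many of the sampled terms are annotated.
    """
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favorable / comb(population, sample)
```

As a check on the arithmetic: with 10 terms, 3 of them annotated, and a sample of 3, drawing all 3 annotated terms has probability 1/120, so `hypergeom_pvalue(10, 3, 3, 3)` is about 0.0083. In practice a multiple-testing correction would be applied across all terms tested.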
985 00:40:19,200 --> 00:40:28,920 IT'S A DIFFERENT THING FOR 986 00:40:28,920 --> 00:40:31,600 AUTHORS TO THINK ABOUT AND 987 00:40:31,600 --> 00:40:33,280 THEY'RE DIFFERENT FROM 988 00:40:33,280 --> 00:40:34,880 STATEMENTS OF KNOWLEDGE. 989 00:40:34,880 --> 00:40:37,560 THIS EXAMPLE IS TAKEN FROM 990 00:40:37,560 --> 00:40:41,560 NUTRITION AND VITAMIN D WORK 991 00:40:41,560 --> 00:40:43,360 FOCUSED ON METABOLIC PROCESSES 992 00:40:43,360 --> 00:40:44,840 IN PREGNANCY, GOOD THING TO 993 00:40:44,840 --> 00:40:45,480 FIND, BUT THEY'RE QUITE 994 00:40:45,480 --> 00:40:47,520 DIFFERENT FROM THE SORT OF 995 00:40:47,520 --> 00:40:51,600 THINGS THE GO ENRICHMENT 996 00:40:51,600 --> 00:41:00,160 GENERATED. 997 00:41:00,160 --> 00:41:08,000 SO IN SUMMARY, AND I'M HOPING TO 998 00:41:08,000 --> 00:41:10,280 LEAVE TIME FOR QUESTIONS, WE'VE TALKED 999 00:41:10,280 --> 00:41:12,080 ABOUT GENERATING MECHANISTIC 1000 00:41:12,080 --> 00:41:14,760 EXPLANATIONS OF EXPERIMENTAL 1001 00:41:14,760 --> 00:41:15,600 RESULTS, TALKED A LITTLE BIT 1002 00:41:15,600 --> 00:41:18,520 ABOUT SEMANTIC CARTOONING AND 1003 00:41:18,520 --> 00:41:21,400 ABOUT THE REPRESENTATION AND USE 1004 00:41:21,400 --> 00:41:22,280 OF IGNORANCE STATEMENTS USING 1005 00:41:22,280 --> 00:41:24,440 NATURAL LANGUAGE PROCESSING. 1006 00:41:24,440 --> 00:41:26,640 SO THERE'S MORE TO BE DONE. 1007 00:41:26,640 --> 00:41:28,800 ONE THING THAT'S INTERESTING IS 1008 00:41:28,800 --> 00:41:31,240 USING TRANSFORMERS OR LARGE 1009 00:41:31,240 --> 00:41:32,720 LANGUAGE MODELS TO TRY TO 1010 00:41:32,720 --> 00:41:33,400 GENERATE DISCUSSION SECTIONS.
1011 00:41:33,400 --> 00:41:34,880 THERE'S BEEN A FAIR AMOUNT OF 1012 00:41:34,880 --> 00:41:37,560 INTEREST IN THAT THOUGH I HAVE 1013 00:41:37,560 --> 00:41:40,280 TO SAY THAT EVEN THE RECENT 1014 00:41:40,280 --> 00:41:43,600 RELEASE JUST THIS WEEK OF 1015 00:41:43,600 --> 00:41:46,760 META AI'S SCIENCE-TRAINED 1016 00:41:46,760 --> 00:41:50,040 GALACTICA TRANSFORMER IS 1017 00:41:50,040 --> 00:41:51,280 IMPRESSIVE AT THE SCIENTIFIC 1018 00:41:51,280 --> 00:41:53,680 TEXT IT CAN GENERATE BUT STILL 1019 00:41:53,680 --> 00:41:54,560 NOWHERE NEAR WHAT HUMAN BEINGS 1020 00:41:54,560 --> 00:41:55,600 CAN DO. 1021 00:41:55,600 --> 00:41:58,320 THE DISCUSSIONS ARE PRETTY 1022 00:41:58,320 --> 00:41:58,920 BASIC. 1023 00:41:58,920 --> 00:42:03,600 IT MAKES LOTS OF ERRORS, SOME OF 1024 00:42:03,600 --> 00:42:11,600 WHICH ARE QUITE SUBTLE, AND 1025 00:42:11,600 --> 00:42:13,800 HALLUCINATES WHAT DOESN'T EXIST, 1026 00:42:13,800 --> 00:42:16,880 AND WE HAVE MORE TO DISCUSS, BUT 1027 00:42:16,880 --> 00:42:18,240 IT SEEMS THE WRITING IS ON THE 1028 00:42:18,240 --> 00:42:20,080 WALL. IT IS COMING, AND FOR THOSE WHO 1029 00:42:20,080 --> 00:42:22,400 CARE ABOUT NATURAL LANGUAGE 1030 00:42:22,400 --> 00:42:25,920 PROCESSING APPLIED TO BIOMEDICAL 1031 00:42:25,920 --> 00:42:27,640 JOURNAL STYLE TEXT THE IDEA OF 1032 00:42:27,640 --> 00:42:29,600 REAL DISCUSSIONS WITH COMPUTER 1033 00:42:29,600 --> 00:42:30,920 PROGRAMS OF INTEREST AS WE 1034 00:42:30,920 --> 00:42:32,360 INTERPRET OUR EXPERIMENTAL 1035 00:42:32,360 --> 00:42:38,240 RESULTS IS REALLY A VERY 1036 00:42:38,240 --> 00:42:39,400 PROMISING NEW FRONTIER IN 1037 00:42:39,400 --> 00:42:40,760 BIOMEDICAL NATURAL LANGUAGE 1038 00:42:40,760 --> 00:42:42,000 PROCESSING, AND I HOPE YOU'RE AS 1039 00:42:42,000 --> 00:42:43,840 EXCITED ABOUT THE PROSPECT AS I 1040 00:42:43,840 --> 00:42:44,400 AM.
1041 00:42:44,400 --> 00:42:49,160 WITH THAT, I WANT TO THANK THE 1042 00:42:49,160 --> 00:42:50,040 WONDERFUL STUDENTS IN THE LAB 1043 00:42:50,040 --> 00:42:52,840 WHO ARE OUT IN VARIOUS PLACES 1044 00:42:52,840 --> 00:42:56,120 IN THE WORLD INCLUDING MAYLA, 1045 00:42:56,120 --> 00:42:58,680 WHO IS LOOKING FOR A POSTDOC OR OTHER 1046 00:42:58,680 --> 00:43:01,280 KIND OF JOB. AT THIS POINT I 1047 00:43:01,280 --> 00:43:03,600 WOULD LOVE TO STOP AND TAKE YOUR 1048 00:43:03,600 --> 00:43:11,320 QUESTIONS. 1049 00:43:11,320 --> 00:43:14,520 >>THANK YOU FOR THE AMAZING TALK. 1050 00:43:14,520 --> 00:43:17,600 CAN I ASK YOU TO BRING UP THE 1051 00:43:17,600 --> 00:43:18,800 SLIDE TO SEE THE E-MAIL FOR 1052 00:43:18,800 --> 00:43:26,400 PEOPLE TO SEND THE QUESTIONS? 1053 00:43:26,400 --> 00:43:27,400 THANK YOU SO MUCH. 1054 00:43:27,400 --> 00:43:31,600 SO IF I MAY, IT WAS SO INTRIGUING 1055 00:43:31,600 --> 00:43:35,600 TO REALIZE, AND I THINK THIS IS THE 1056 00:43:35,600 --> 00:43:37,680 FIRST TIME I HEARD IT, YOU'RE 1057 00:43:37,680 --> 00:43:39,600 GENERATING VECTORS OF PATIENTS. 1058 00:43:39,600 --> 00:43:41,240 YOU CAN REPRESENT EACH PATIENT 1059 00:43:41,240 --> 00:43:43,680 AS A WORD OR PATIENT VECTOR AND 1060 00:43:43,680 --> 00:43:45,680 IT'S SO AMAZING AND INTRIGUING. 1061 00:43:45,680 --> 00:43:47,440 THERE'S SO MUCH YOU CAN DO WITH 1062 00:43:47,440 --> 00:43:47,920 THIS. 1063 00:43:47,920 --> 00:43:49,760 IT'S ALMOST ENDLESS WHAT YOU CAN 1064 00:43:49,760 --> 00:43:51,800 DO WITH THAT INFORMATION IN 1065 00:43:51,800 --> 00:43:55,600 TERMS OF TREATING PATIENTS THAT 1066 00:43:55,600 --> 00:43:56,840 HAVE RARE DISEASES OR GROUPING 1067 00:43:56,840 --> 00:43:58,560 THEM BY WHATEVER METRIC THAT ONE 1068 00:43:58,560 --> 00:44:03,440 CAN THINK OF. 1069 00:44:03,440 --> 00:44:05,040 >>THE CLUSTERING WE DID IS ONLY 1070 00:44:05,040 --> 00:44:06,280 THE BEGINNING.
1071 00:44:06,280 --> 00:44:07,600 IT WILL BE INTERESTING TO USE 1072 00:44:07,600 --> 00:44:11,600 THE PATIENT VECTORS AS INPUTS 1073 00:44:11,600 --> 00:44:13,840 TO CLASSIFIERS TO TRY AND HELP 1074 00:44:13,840 --> 00:44:14,840 SUPPORT CLINICAL DECISION MAKING 1075 00:44:14,840 --> 00:44:17,160 AROUND TREATING SUCH PATIENTS. 1076 00:44:17,160 --> 00:44:17,960 YOU'RE RIGHT. 1077 00:44:17,960 --> 00:44:18,960 THE POTENTIAL IS HUGE. 1078 00:44:18,960 --> 00:44:20,960 THAT PART OF TIFFANY'S WORK IS 1079 00:44:20,960 --> 00:44:22,360 ALREADY PUBLISHED. 1080 00:44:22,360 --> 00:44:26,200 YOU CAN FIND THAT IF YOU -- I 1081 00:44:26,200 --> 00:44:27,360 DON'T HAVE THE PAPER CITATION 1082 00:44:27,360 --> 00:44:29,840 OFF THE TOP OF MY HEAD BUT IF 1083 00:44:29,840 --> 00:44:34,880 YOU LOOK AT THE -- OR THERE'S A 1084 00:44:34,880 --> 00:44:41,000 NODE ON PHEKNOWLATOR, 1085 00:44:41,000 --> 00:44:42,480 P-H-E-K-N-O-W-L-A-T-O-R, THAT 1086 00:44:42,480 --> 00:44:44,960 SHOULD GET YOU TO THE LINK WHICH 1087 00:44:44,960 --> 00:44:46,040 HAS A BUNCH OF CONFERENCE 1088 00:44:46,040 --> 00:44:48,560 PRESENTATIONS AND SHORT PAPERS. 1089 00:44:48,560 --> 00:44:56,880 THE LONGER PAPER IS COMING UP. 1090 00:44:56,880 --> 00:44:59,000 I HOPE THE PATIENT VECTOR 1091 00:44:59,000 --> 00:45:00,360 EMBEDDINGS WILL WIND UP USEFUL. 1092 00:45:00,360 --> 00:45:02,840 >>AND WE DON'T HAVE TO DEAL 1093 00:45:02,840 --> 00:45:03,960 WITH PRIVACY ISSUES. 1094 00:45:03,960 --> 00:45:05,760 >>THAT'S WHAT OUR IRB AND OUR 1095 00:45:05,760 --> 00:45:06,080 HOSPITALS SAY. 1096 00:45:06,080 --> 00:45:10,360 NOT EVERYBODY HAS SIGNED ON TO 1097 00:45:10,360 --> 00:45:15,960 THAT, BUT IT SEEMS IT'S AN EASIER 1098 00:45:15,960 --> 00:45:18,160 CASE TO MAKE THAT IT'S NOT PROTECTED HEALTH 1099 00:45:18,160 --> 00:45:18,720 INFORMATION, NOT INDIVIDUALLY 1100 00:45:18,720 --> 00:45:29,120 IDENTIFIABLE FOR SURE. 1101 00:45:30,560 --> 00:45:33,040 >>GREAT IDEA, VERY INSPIRING. 
1102 00:45:33,040 --> 00:45:37,640 >>THANK YOU FOR THE GREAT TALK. 1103 00:45:37,640 --> 00:45:39,000 THE QUESTION IS ABOUT EVALUATION 1104 00:45:39,000 --> 00:45:40,400 ON THE KNOWLEDGE. 1105 00:45:40,400 --> 00:45:42,920 I THINK IT'S PROBABLY YOUR FIRST 1106 00:45:42,920 --> 00:45:45,880 SECTION OF THE TALK, HOW YOU 1107 00:45:45,880 --> 00:45:47,600 GENERATE WITH KNOWLEDGE GRAPHS 1108 00:45:47,600 --> 00:45:53,920 AND THE AUDIENCE ASKED THE 1109 00:45:53,920 --> 00:45:58,080 RECOMMENDED WAY TO MAKE THE 1110 00:45:58,080 --> 00:46:01,160 GRAPHS AND THERE MAY BE 1111 00:46:01,160 --> 00:46:04,320 RELATIONSHIPS OR CLASSIFICATIONS 1112 00:46:04,320 --> 00:46:06,040 AND HAVE MANY EVALUATIONS 1113 00:46:06,040 --> 00:46:08,200 GENERATED ON THE HYPOTHESIS. 1114 00:46:08,200 --> 00:46:11,240 WHAT IS THE RECOMMENDED WAY FOR 1115 00:46:11,240 --> 00:46:13,760 EVALUATION ON THE KNOWLEDGE 1116 00:46:13,760 --> 00:46:13,960 GRAPH? 1117 00:46:13,960 --> 00:46:15,240 >>THAT'S AN OUTSTANDING 1118 00:46:15,240 --> 00:46:15,600 QUESTION. 1119 00:46:15,600 --> 00:46:25,920 THAT'S A HARD TASK. 1120 00:46:33,120 --> 00:46:34,960 I RECOGNIZE A LOT OF KNOWLEDGE 1121 00:46:34,960 --> 00:46:37,360 GRAPHS ARE RECOGNIZED USING THE 1122 00:46:37,360 --> 00:46:37,800 LANGUAGE. 1123 00:46:37,800 --> 00:46:41,760 THERE'S NO IDEAL WAY TO EVALUATE 1124 00:46:41,760 --> 00:46:43,440 KNOWLEDGE GRAPHS. 1125 00:46:43,440 --> 00:46:45,000 THERE ARE VERY 1126 00:46:45,000 --> 00:46:46,240 SIGNIFICANT DIFFERENCES BETWEEN 1127 00:46:46,240 --> 00:46:47,560 THEM AND THEIR UTILITY. 1128 00:46:47,560 --> 00:46:50,200 AN IMPORTANT AREA OF RESEARCH IS 1129 00:46:50,200 --> 00:46:51,840 FIGURING OUT HOW TO EVALUATE 1130 00:46:51,840 --> 00:46:52,400 THESE. 1131 00:46:52,400 --> 00:46:53,800 SO THERE ARE AUTOMATED METHODS 1132 00:46:53,800 --> 00:46:57,640 THAT PEOPLE HAVE COME UP WITH TO 1133 00:46:57,640 --> 00:47:00,520 TRY AND COMPARE KNOWLEDGE 1134 00:47:00,520 --> 00:47:00,760 GRAPHS. 
1135 00:47:00,760 --> 00:47:08,040 SO SOMETHING LIKE DOING LINK 1136 00:47:08,040 --> 00:47:10,400 PREDICTION: MASK CERTAIN 1137 00:47:10,400 --> 00:47:13,160 PARTS AND TRY TO RECAPITULATE 1138 00:47:13,160 --> 00:47:16,240 AND ASK WHAT PROPORTION, AT WHAT 1139 00:47:16,240 --> 00:47:18,640 LEVEL, CAN WE REPRODUCE THE 1140 00:47:18,640 --> 00:47:21,240 KNOWLEDGE GRAPH WE HAVE, CAN WE 1141 00:47:21,240 --> 00:47:22,360 REPRODUCE IT FROM BITS OF IT. 1142 00:47:22,360 --> 00:47:24,800 IT WORKS AT SCALE BUT IT'S NOT 1143 00:47:24,800 --> 00:47:26,040 THE MOST INTERESTING EVALUATION 1144 00:47:26,040 --> 00:47:26,240 OF A GRAPH. 1145 00:47:26,240 --> 00:47:27,840 IT'S A LITTLE BIT A MEASURE OF HOW 1146 00:47:27,840 --> 00:47:28,680 REDUNDANT THE GRAPH IS RATHER 1147 00:47:28,680 --> 00:47:32,040 THAN THE QUALITY OF THE GRAPH. 1148 00:47:32,040 --> 00:47:34,160 A MORE MANUAL APPROACH TO SUCH 1149 00:47:34,160 --> 00:47:37,520 THINGS IS USING COMPETENCY 1150 00:47:37,520 --> 00:47:37,920 QUESTIONS. 1151 00:47:37,920 --> 00:47:40,480 A GRAPH ABOUT MOLECULAR BIOLOGY 1152 00:47:40,480 --> 00:47:41,880 SHOULD BE ABLE TO ANSWER 1153 00:47:41,880 --> 00:47:45,480 QUESTIONS ABOUT RELATIONSHIPS 1154 00:47:45,480 --> 00:47:48,320 BETWEEN THINGS, WHAT'S THE 1155 00:47:48,320 --> 00:47:52,160 RELATIONSHIP BETWEEN GLUCOSE AND 1156 00:47:52,160 --> 00:47:54,760 GLUCOSE PHOSPHATE AND HEXOKINASE OR 1157 00:47:54,760 --> 00:47:57,000 SOMETHING LIKE THAT, AND YOU CAN 1158 00:47:57,000 --> 00:47:58,880 SET UP SETS OF COMPETENCY 1159 00:47:58,880 --> 00:48:00,880 QUESTIONS A GRAPH -- COMPETENCY 1160 00:48:00,880 --> 00:48:02,400 QUESTIONS A GRAPH SHOULD HAVE 1161 00:48:02,400 --> 00:48:04,320 ANSWERS TO AND THAT'S A GOOD WAY 1162 00:48:04,320 --> 00:48:06,480 OF LOOKING AT QUALITY THOUGH YOU 1163 00:48:06,480 --> 00:48:07,960 CAN'T DO IT EXHAUSTIVELY SINCE 1164 00:48:07,960 --> 00:48:10,400 THE COMPETENCY QUESTIONS ARE 1165 00:48:10,400 --> 00:48:11,960 DRIVEN BY UNDERGRADUATE OR 1166 00:48:11,960 --> 00:48:12,400 
MEDICAL INFORMATION. 1167 00:48:12,400 --> 00:48:15,240 HOW CAN YOU USE THE GRAPH TO 1168 00:48:15,240 --> 00:48:16,720 ANSWER USMLE STEP 1 QUESTIONS? 1169 00:48:16,720 --> 00:48:19,800 THAT WOULD BE A REAL CHALLENGE. 1170 00:48:19,800 --> 00:48:22,200 SORRY, THAT'S THE UNITED STATES 1171 00:48:22,200 --> 00:48:23,000 MEDICAL LICENSING EXAMINATION, 1172 00:48:23,000 --> 00:48:25,280 THE KINDS OF QUESTIONS WE GIVE 1173 00:48:25,280 --> 00:48:26,440 TO DOCTORS AFTER CLASSROOM 1174 00:48:26,440 --> 00:48:26,720 TRAINING. 1175 00:48:26,720 --> 00:48:29,160 WE CAN USE COMPETENCY QUESTIONS. 1176 00:48:29,160 --> 00:48:32,800 WE CAN USE SYSTEMATIC APPROACHES 1177 00:48:32,800 --> 00:48:34,880 LIKE LINK PREDICTION ALGORITHMS, 1178 00:48:34,880 --> 00:48:38,280 AND THEN WE CAN USE EXTRINSIC 1179 00:48:38,280 --> 00:48:38,520 MEASURES. 1180 00:48:38,520 --> 00:48:41,720 HOW GOOD ARE THEY AT THINGS LIKE 1181 00:48:41,720 --> 00:48:43,240 SUBCLASSIFYING PHENOTYPES AND 1182 00:48:43,240 --> 00:48:44,280 RARE DISEASE PATIENTS? RATHER 1183 00:48:44,280 --> 00:48:45,600 THAN TRYING TO MEASURE THE 1184 00:48:45,600 --> 00:48:46,600 QUALITY OF THE KNOWLEDGE GRAPH 1185 00:48:46,600 --> 00:48:48,080 ITSELF, WE MEASURE THE QUALITY OF 1186 00:48:48,080 --> 00:48:50,760 HOW WELL WE CAN USE THE 1187 00:48:50,760 --> 00:48:52,600 KNOWLEDGE GRAPH TO DO SOME OTHER 1188 00:48:52,600 --> 00:48:52,840 TASK. 1189 00:48:52,840 --> 00:48:54,560 AND SO PERFORMANCE AT THE OTHER 1190 00:48:54,560 --> 00:48:56,320 TASK SPEAKS TO THE QUALITY OF 1191 00:48:56,320 --> 00:48:58,960 THE KNOWLEDGE GRAPH AND THAT'S 1192 00:48:58,960 --> 00:49:00,480 EXTRINSIC EVALUATION RATHER THAN 1193 00:49:00,480 --> 00:49:03,200 INTRINSIC EVALUATION OF THE 1194 00:49:03,200 --> 00:49:03,520 GRAPH ITSELF. 1195 00:49:03,520 --> 00:49:04,720 NONE OF THOSE ARE PERFECT. 1196 00:49:04,720 --> 00:49:08,320 ALL OF THEM ARE HARD TO DO. 
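The masked-edge idea described above can be sketched in a few lines. This is only a minimal illustration, not the speaker's method: the toy edge list, the function names, and the common-neighbors scorer are all assumptions made for the example, and a real evaluation would use typed edges and a learned link predictor.

```python
import random
from itertools import combinations

# Hypothetical toy graph; node names are illustrative only.
toy_edges = [
    ("glucose", "hexokinase"), ("hexokinase", "glucose_phosphate"),
    ("glucose", "glucose_phosphate"), ("glucose", "insulin"),
    ("insulin", "diabetes"), ("glucose", "diabetes"),
    ("hexokinase", "atp"), ("glucose_phosphate", "atp"),
]

def common_neighbor_score(adj, u, v):
    """Score a candidate edge by the number of neighbors u and v share."""
    return len(adj[u] & adj[v])

def masked_edge_recovery(edges, mask_fraction=0.2, seed=0):
    """Hide a fraction of edges, rank all missing pairs, and return the
    share of hidden edges that land in the top-k ranked candidates."""
    rng = random.Random(seed)
    edges = sorted({tuple(sorted(e)) for e in edges})  # undirected, deduplicated
    k = max(1, int(len(edges) * mask_fraction))
    held_out = set(rng.sample(edges, k))
    kept = [e for e in edges if e not in held_out]

    nodes = sorted({n for e in edges for n in e})
    adj = {n: set() for n in nodes}
    for u, v in kept:
        adj[u].add(v)
        adj[v].add(u)

    kept_set = set(kept)
    candidates = [p for p in combinations(nodes, 2) if p not in kept_set]
    # Highest score first; ties broken alphabetically so runs are repeatable.
    ranked = sorted(candidates, key=lambda p: (-common_neighbor_score(adj, *p), p))
    return len(set(ranked[:k]) & held_out) / k

if __name__ == "__main__":
    print(masked_edge_recovery(toy_edges, mask_fraction=0.25, seed=0))
```

A high score here says the hidden edges were predictable from the remaining ones, which is the redundancy caveat raised in the answer: it does not show that the edges are correct or useful.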
1197 00:49:08,320 --> 00:49:10,400 AND I THINK A REALLY 1198 00:49:10,400 --> 00:49:12,320 IMPORTANT OPEN QUESTION IN OUR 1199 00:49:12,320 --> 00:49:15,280 FIELD IS BOTH HOW DO WE EVALUATE 1200 00:49:15,280 --> 00:49:17,720 KNOWLEDGE GRAPHS AND RELATEDLY, 1201 00:49:17,720 --> 00:49:19,320 ONCE WE CAN EVALUATE THEM, HOW 1202 00:49:19,320 --> 00:49:22,800 DO WE BUILD THE HIGHEST QUALITY 1203 00:49:22,800 --> 00:49:26,720 ONES WE CAN POSSIBLY BUILD. 1204 00:49:26,720 --> 00:49:28,800 I THINK I CAN SAY 1205 00:49:28,800 --> 00:49:33,440 WITHOUT DISCLOSING THINGS I 1206 00:49:33,440 --> 00:49:39,240 SHOULDN'T DISCLOSE, WITH 1207 00:49:39,240 --> 00:49:39,880 PHARMACEUTICALS, THEY'RE 1208 00:49:39,880 --> 00:49:40,840 CONCERNED ABOUT THE QUALITY OF 1209 00:49:40,840 --> 00:49:42,520 THE GRAPH AND HOW RELIABLE THE 1210 00:49:42,520 --> 00:49:43,520 INFERENCES COMING OUT OF THEM ARE. 1211 00:49:43,520 --> 00:49:47,720 IT'S A CRITICAL QUESTION TO THE 1212 00:49:47,720 --> 00:49:49,680 WIDESPREAD ADOPTION OF KNOWLEDGE 1213 00:49:49,680 --> 00:50:00,200 GRAPHS IN BIOMEDICAL RESEARCH. 1214 00:50:05,160 --> 00:50:07,280 >>LOOKING AT DIFFERENT 1215 00:50:07,280 --> 00:50:07,960 PERSPECTIVES. 1216 00:50:07,960 --> 00:50:11,320 >>ULTIMATELY IT WILL BE LIKE 1217 00:50:11,320 --> 00:50:13,280 NATURAL LANGUAGE WHERE WE LOOK 1218 00:50:13,280 --> 00:50:15,480 AT PERFORMANCE ACROSS MANY TASKS 1219 00:50:15,480 --> 00:50:19,320 VERSUS ONE, THE SAME WAY WE 1220 00:50:19,320 --> 00:50:19,720 EVALUATE KNOWLEDGE. 1221 00:50:19,720 --> 00:50:29,960 >>THANK YOU. 1222 00:50:33,520 --> 00:50:39,840 >>HOW WE IDENTIFY THE CONCEPTS 1223 00:50:39,840 --> 00:50:50,400 AND FOR EXAMPLE HOW DO THEY DO 1224 00:50:55,080 --> 00:50:55,600 REPRESENTATIONS. 1225 00:50:55,600 --> 00:50:56,360 >>WE DON'T IN THIS PARTICULAR 1226 00:50:56,360 --> 00:51:05,120 CASE. 1227 00:51:05,120 --> 00:51:08,880 THE PATH WITH INTERACTING WITH 1228 00:51:08,880 --> 00:51:09,160 RECORDS. 
1229 00:51:09,160 --> 00:51:12,080 IT'S A NATIONAL SYSTEM FOR 1230 00:51:12,080 --> 00:51:15,600 STANDARDIZED OR LAYER OF 1231 00:51:15,600 --> 00:51:19,760 STANDARDIZED INTERACTIONS. 1232 00:51:19,760 --> 00:51:22,480 OMOP IS A VOCABULARY OF COMMON 1233 00:51:22,480 --> 00:51:24,440 DATA ELEMENTS AND ALL THE 1234 00:51:24,440 --> 00:51:26,320 VOCABULARY WE USED IN OUR 1235 00:51:26,320 --> 00:51:28,400 APPLICATIONS CAME FROM FIELDED 1236 00:51:28,400 --> 00:51:30,480 ASPECTS OF THE ELECTRONIC 1237 00:51:30,480 --> 00:51:32,280 PATIENT RECORDS, 1238 00:51:32,280 --> 00:51:36,480 FROM THE DRUG LIST AND DIAGNOSIS 1239 00:51:36,480 --> 00:51:39,600 LIST, AND WE DID NOT DO NATURAL 1240 00:51:39,600 --> 00:51:40,480 LANGUAGE PROCESSING OVER 1241 00:51:40,480 --> 00:51:41,840 CLINICAL NOTES. 1242 00:51:41,840 --> 00:51:46,440 THE WORK I'M MOST HIGH ON IN THE 1243 00:51:46,440 --> 00:51:48,880 DOMAIN MAPS DIRECTLY TO OPEN 1244 00:51:48,880 --> 00:51:50,800 BIOMEDICAL ONTOLOGIES, PARTICULARLY 1245 00:51:50,800 --> 00:52:01,320 TO THE HUMAN PHENOTYPE ONTOLOGY. 1246 00:52:15,400 --> 00:52:17,120 THEY DON'T HAVE THE SAME AMOUNT 1247 00:52:17,120 --> 00:52:18,640 OF GRANULARITY BASIC SCIENTISTS 1248 00:52:18,640 --> 00:52:21,000 WOULD WANT TO SEE, AND THE HUMAN 1249 00:52:21,000 --> 00:52:26,760 PHENOTYPE ONTOLOGY IS A BETTER 1250 00:52:26,760 --> 00:52:37,280 TARGET FOR NATURAL LANGUAGE PROCESSING. 1251 00:52:41,440 --> 00:52:46,440 THERE'S DIFFERENT REPORTS AND 1252 00:52:46,440 --> 00:52:49,840 THE LIKE. 1253 00:52:49,840 --> 00:52:51,640 DOING EXTRACTION IS A BROAD AND 1254 00:52:51,640 --> 00:52:52,440 DIVERSE FIELD WHERE THERE'S BEEN 1255 00:52:52,440 --> 00:52:57,200 A LOT OF ACTIVITY. 1256 00:52:57,200 --> 00:52:59,440 I DON'T THINK WHAT WE'RE WORKING ON, 1257 00:52:59,440 --> 00:53:02,280 OR WHAT I DESCRIBED TODAY AT 1258 00:53:02,280 --> 00:53:08,120 LEAST, DIRECTLY BEARS ON THAT. 
1259 00:53:08,120 --> 00:53:18,680 DEEP LEARNING HAS ADVANCED THE 1260 00:53:19,000 --> 00:53:24,920 STATE OF THE ART AND MODELS 1261 00:53:24,920 --> 00:53:27,600 TRAINED EXPLICITLY ON CLINICAL DATA 1262 00:53:27,600 --> 00:53:29,840 PERFORM BETTER THAN THOSE 1263 00:53:29,840 --> 00:53:33,440 TRAINED ON GENERAL DATA 1264 00:53:33,440 --> 00:53:35,280 EVEN WHEN THEY'RE TUNED USING 1265 00:53:35,280 --> 00:53:41,640 CLINICAL OR SCIENTIFIC DATA. 1266 00:53:41,640 --> 00:53:46,560 >>I'LL ASK THE LAST QUESTION 1267 00:53:46,560 --> 00:53:53,600 FROM OUR AUDIENCE AND IT'S ABOUT 1268 00:53:53,600 --> 00:53:56,840 HOW PATIENT DATA IS CARRIED OUT 1269 00:53:56,840 --> 00:53:59,560 BEFORE PRODUCING THE PATIENT 1270 00:53:59,560 --> 00:53:59,800 VECTOR. 1271 00:53:59,800 --> 00:54:04,440 I GUESS THERE'S SOME 1272 00:54:04,440 --> 00:54:05,280 INDICATIONS. 1273 00:54:05,280 --> 00:54:10,200 >>WE HAVE -- WE BUILT THIS ON A 1274 00:54:10,200 --> 00:54:15,520 MINIMAL DATA SET BUT ON AN 1275 00:54:15,520 --> 00:54:16,120 IDENTIFIABLE DATA SET, 1276 00:54:16,120 --> 00:54:18,960 BECAUSE WE NEEDED TO GET 1277 00:54:18,960 --> 00:54:20,960 ACCESS TO THE DIAGNOSIS LIST, 1278 00:54:20,960 --> 00:54:25,920 THE DRUG LIST AND THE LABS. 1279 00:54:25,920 --> 00:54:27,360 THERE IS AN ELECTRONIC PATIENT 1280 00:54:27,360 --> 00:54:29,920 RECORD WAREHOUSE AT THE 1281 00:54:29,920 --> 00:54:31,240 UNIVERSITY OF COLORADO CALLED 1282 00:54:31,240 --> 00:54:34,240 COMPASS THAT INTEGRATES THE 1283 00:54:34,240 --> 00:54:34,520 INFORMATION. 1284 00:54:34,520 --> 00:54:38,200 IT MOSTLY CAME FROM THE GENERAL 1285 00:54:38,200 --> 00:54:39,480 HOSPITALS SINCE THAT'S WHERE THE 1286 00:54:39,480 --> 00:54:41,160 RARE DISEASE INFORMATION IS AND 1287 00:54:41,160 --> 00:54:42,880 WE HAD TO GO THROUGH HOOP JUMPING 1288 00:54:42,880 --> 00:54:45,520 AND THE LIKE TO GET ACCESS IN 1289 00:54:45,520 --> 00:54:47,320 THE FIRST PLACE AND THEN AS I 1290 00:54:47,320 --> 00:54:51,280 SAID WE USED ONLY FIELDED DATA. 
1291 00:54:51,280 --> 00:54:54,880 SO IT WAS RELATIVELY 1292 00:54:54,880 --> 00:54:55,240 STRAIGHTFORWARD. 1293 00:54:55,240 --> 00:55:04,440 THE OMOP DATA ARE 1294 00:55:04,440 --> 00:55:06,600 TRANSFORMATIONS AND WE GOT IN 1295 00:55:06,600 --> 00:55:07,960 THE BIOMEDICAL ONTOLOGY SPACE 1296 00:55:07,960 --> 00:55:09,320 AND DID THE GRAPH REPRESENTATION 1297 00:55:09,320 --> 00:55:11,600 LEARNING TO GENERATE THOSE 1298 00:55:11,600 --> 00:55:12,560 PATIENT VECTORS. 1299 00:55:12,560 --> 00:55:14,560 SO THOSE ARE THE STEPS IN OUR 1300 00:55:14,560 --> 00:55:14,760 WORLD. 1301 00:55:14,760 --> 00:55:16,960 WE HAD TO DO ALL THE HOOPS TO 1302 00:55:16,960 --> 00:55:21,560 GET ACCESS TO PROTECTED HEALTH 1303 00:55:21,560 --> 00:55:24,320 INFORMATION AND MAKE SURE WE 1304 00:55:24,320 --> 00:55:26,920 RIGHTED IT PROPERLY BUT THE 1305 00:55:26,920 --> 00:55:28,680 VECTORS, WHICH OUR IRB AND THE 1306 00:55:28,680 --> 00:55:30,600 HOSPITALS THAT PROVIDE THE DATA 1307 00:55:30,600 --> 00:55:41,000 AGREED WERE NOT PHI. 1308 00:55:44,440 --> 00:55:48,040 >>OVER TO LANA. 1309 00:55:48,040 --> 00:55:48,640 >>I WOULD LIKE TO THANK YOU 1310 00:55:48,640 --> 00:56:18,400 VERY MUCH. 1311 00:56:18,400 --> 00:56:20,800 >>THANK YOU. IT'S NICE TO BE 1312 00:56:20,800 --> 00:56:21,480 BACK AT NIH AND I ENJOYED MY TIME 1313 00:56:21,480 --> 00:56:22,160 IN THE INTRAMURAL SCIENCES. 1314 00:56:22,160 --> 00:00:00,000 GOOD-BYE.