1 00:00:05,284 --> 00:00:06,085 WELCOME, EVERYONE. GOOD 2 00:00:06,085 --> 00:00:08,087 AFTERNOON AND THANK YOU FOR 3 00:00:08,087 --> 00:00:08,387 JOINING US. 4 00:00:08,387 --> 00:00:12,324 I'M YOUR HOST FOR TODAY'S EVENT 5 00:00:12,324 --> 00:00:14,894 ORGANIZED BY THE NATURAL LANGUAGE 6 00:00:14,894 --> 00:00:18,464 PROCESSING AND TEXT MINING 7 00:00:18,464 --> 00:00:20,132 SPECIAL INTEREST GROUP AT NIH. 8 00:00:20,132 --> 00:00:22,034 DEEP LEARNING HAS TRANSFORMED THE 9 00:00:22,034 --> 00:00:25,371 FIELD BUT THE IMPACT EXTENDS FAR 10 00:00:25,371 --> 00:00:27,306 BEYOND, TO THE FIELD OF BIOLOGY AS 11 00:00:27,306 --> 00:00:27,540 WELL. 12 00:00:27,540 --> 00:00:29,708 TODAY WE'RE FORTUNATE TO HAVE 13 00:00:29,708 --> 00:00:32,111 TWO EXCEPTIONAL SPEAKERS WHO 14 00:00:32,111 --> 00:00:33,579 WILL DISCUSS HOW DEEP LEARNING 15 00:00:33,579 --> 00:00:36,715 IS BEING APPLIED TO THE FIELD OF 16 00:00:36,715 --> 00:00:37,216 COMPUTATIONAL BIOLOGY. 17 00:00:37,216 --> 00:00:40,219 THIS EVENT IS TIMELY GIVEN THE 18 00:00:40,219 --> 00:00:41,787 RECOGNITION OF ALPHAFOLD IN THE 19 00:00:41,787 --> 00:00:44,490 NOBEL PRIZES HIGHLIGHTING THE 20 00:00:44,490 --> 00:00:45,658 GROUNDBREAKING ROLE OF A.I. IN 21 00:00:45,658 --> 00:00:46,492 SCIENTIFIC DISCOVERY. 22 00:00:46,492 --> 00:00:50,529 OUR TWO SPEAKERS ARE DR. LAUREN 23 00:00:50,529 --> 00:00:53,432 PORTER AND DR. IVAN OVCHARENKO. 24 00:00:53,432 --> 00:00:57,503 THEY'RE BOTH TENURE TRACK 25 00:00:57,503 --> 00:01:01,006 INVESTIGATORS AT THE DIVISION OF 26 00:01:01,006 --> 00:01:02,241 INTRAMURAL RESEARCH. 27 00:01:02,241 --> 00:01:05,010 DR. PORTER WILL KICK OFF THE 28 00:01:05,010 --> 00:01:05,978 EVENT FOLLOWED BY 29 00:01:05,978 --> 00:01:06,712 DR. OVCHARENKO. 30 00:01:06,712 --> 00:01:09,815 IF YOU HAVE QUESTIONS FOR 31 00:01:09,815 --> 00:01:11,984 DR. 
PORTER USE THE LIVE FEEDBACK 32 00:01:11,984 --> 00:01:13,352 BUTTON AVAILABLE IN YOUR 33 00:01:13,352 --> 00:01:14,353 VIDEOCAST SCREEN AND WE'LL HAVE 34 00:01:14,353 --> 00:01:16,455 A QUESTION AND ANSWER SESSION 35 00:01:16,455 --> 00:01:22,428 WITH HER AROUND 2:25 AND 2:30 36 00:01:22,428 --> 00:01:23,662 BEFORE SWITCHING TO 37 00:01:23,662 --> 00:01:24,430 DR. OVCHARENKO. 38 00:01:24,430 --> 00:01:26,632 WITH THAT I'M EXCITED TO INVITE 39 00:01:26,632 --> 00:01:33,405 TO THE FLOOR, DR. LAUREN PORTER. 40 00:01:33,405 --> 00:01:34,907 >> THANK YOU. 41 00:01:34,907 --> 00:01:41,480 I WILL SHARE MY SCREEN. 42 00:01:41,480 --> 00:01:42,915 HOLD ON A SECOND. 43 00:01:42,915 --> 00:01:47,686 I HAVE A PROBLEM. 44 00:01:47,686 --> 00:01:49,555 >> IT LOOKS LIKE IT'S ONLY THREE 45 00:01:49,555 --> 00:01:52,258 OF US, YOU, ME AND LAUREN. 46 00:01:52,258 --> 00:01:54,727 DO YOU WANT TO GIVE OTHERS FIVE 47 00:01:54,727 --> 00:01:56,195 MORE MINUTES TO GIVE THEM A 48 00:01:56,195 --> 00:01:58,964 CHANCE TO JOIN OR I MIGHT BE 49 00:01:58,964 --> 00:01:59,531 MISSING THE LIST OF 50 00:01:59,531 --> 00:02:05,437 PARTICIPANTS. 51 00:02:05,437 --> 00:02:07,640 >> IVAN, WE'RE IN THE CLOSED 52 00:02:07,640 --> 00:02:08,774 GROUP IN THE ZOOM SESSION IT'S 53 00:02:08,774 --> 00:02:10,643 ONLY US AND EVERYBODY ELSE IS 54 00:02:10,643 --> 00:02:12,778 JOINING THROUGH THE VIDEOCAST. 55 00:02:12,778 --> 00:02:14,446 WE HAVE MANY MORE PEOPLE JUST 56 00:02:14,446 --> 00:02:16,215 WATCHING US THROUGH THE VIDEO. 57 00:02:16,215 --> 00:02:18,651 >> OKAY, MY APOLOGIES THEN. 58 00:02:18,651 --> 00:02:20,486 >> THEY'LL SEND QUESTIONS IN SO 59 00:02:20,486 --> 00:02:22,454 I'LL BE HAPPY TO READ OUT THE 60 00:02:22,454 --> 00:02:23,022 QUESTIONS AND COMMENTS FROM 61 00:02:23,022 --> 00:02:23,389 THEM. 62 00:02:23,389 --> 00:02:23,622 >> OKAY. 63 00:02:23,622 --> 00:02:26,825 THANK YOU. 64 00:02:26,825 --> 00:02:28,827 >> ALL RIGHT. 
65 00:02:28,827 --> 00:02:30,729 WELL, THANK YOU, LANA, FOR THE 66 00:02:30,729 --> 00:02:31,130 INVITATION. 67 00:02:31,130 --> 00:02:34,500 YOU CAN SEE I CHANGED MY TITLE. 68 00:02:34,500 --> 00:02:37,469 TODAY WE'RE GOING TO TALK 69 00:02:37,469 --> 00:02:38,971 ABOUT A CLASS OF PROTEINS CALLED 70 00:02:38,971 --> 00:02:40,472 FOLD-SWITCHING PROTEINS THAT 71 00:02:40,472 --> 00:02:42,908 SHOW SOMETHING ABOUT THE LIMITATIONS 72 00:02:42,908 --> 00:02:45,411 OF ALPHAFOLD, AS POWERFUL AS IT 73 00:02:45,411 --> 00:02:45,778 IS. 74 00:02:45,778 --> 00:02:49,415 IF WE HAVE TIME AT THE END WE'LL 75 00:02:49,415 --> 00:02:51,717 TALK A LITTLE BIT ABOUT HOW IT'S 76 00:02:51,717 --> 00:02:51,951 WORKING. 77 00:02:51,951 --> 00:03:01,160 SO IF YOU LOOK AT THE REFSEQ 78 00:03:01,160 --> 00:03:05,064 DATABASE IT HAS ABOUT 3.6 79 00:03:05,064 --> 00:03:06,865 NON-REDUNDANT SEQUENCES FROM MANY 82 00:03:14,606 --> 00:03:16,442 ORGANISMS. 83 00:03:16,442 --> 00:03:23,215 A NATURAL QUESTION THAT COMES UP 84 00:03:23,215 --> 00:03:25,250 IS HOW WELL DO THEY REPRESENT THE 85 00:03:25,250 --> 00:03:26,452 PROTEIN UNIVERSE. 86 00:03:26,452 --> 00:03:28,487 IF YOU TAKE ANYTHING FROM THE 87 00:03:28,487 --> 00:03:29,955 TALK I HOPE WHAT YOU'LL TAKE IS 88 00:03:29,955 --> 00:03:30,990 IT'S NOT A FULL REPRESENTATION 89 00:03:30,990 --> 00:03:32,658 AND WE STILL DON'T REALLY KNOW 90 00:03:32,658 --> 00:03:33,625 HOW COMPLETE OR INCOMPLETE THE 91 00:03:33,625 --> 00:03:37,062 REPRESENTATION IS. 92 00:03:37,062 --> 00:03:38,230 SO, WHAT DO I MEAN BY THAT? 93 00:03:38,230 --> 00:03:39,565 HERE'S AN EXAMPLE OF WHAT I MEAN: 94 00:03:39,565 --> 00:03:42,868 A PROTEIN THAT KIND OF DEFIES 95 00:03:42,868 --> 00:03:43,268 EXPECTATIONS. 96 00:03:43,268 --> 00:03:46,772 THIS PROTEIN IS CALLED RFAH. 
97 00:03:46,772 --> 00:03:49,241 IT HAS AN N-TERMINAL DOMAIN YOU 98 00:03:49,241 --> 00:03:53,679 CAN SEE IN GRAY, WHICH BINDS RNA 99 00:03:53,679 --> 00:03:55,114 POLYMERASE, AND A C-TERMINAL DOMAIN 100 00:03:55,114 --> 00:03:57,016 IN TEAL, WHICH IS IMPORTANT FOR 101 00:03:57,016 --> 00:03:59,385 AUTOINHIBITION. 102 00:03:59,385 --> 00:04:03,222 THIS PROTEIN ONLY WORKS AT SITES 103 00:04:03,222 --> 00:04:06,892 WITH AN OPS DNA SEQUENCE. 104 00:04:06,892 --> 00:04:10,396 BUT UPON BINDING THE SEQUENCE 105 00:04:10,396 --> 00:04:12,831 AND POLYMERASE THE C-TERMINAL 106 00:04:12,831 --> 00:04:14,666 DOMAIN DETACHES FROM THE 107 00:04:14,666 --> 00:04:17,069 N-TERMINAL DOMAIN AND REFOLDS TO A 108 00:04:17,069 --> 00:04:22,441 BETA BARREL CONFORMATION THAT 109 00:04:22,441 --> 00:04:25,110 BINDS THE S10 SUBUNIT OF 110 00:04:25,110 --> 00:04:26,912 THE RIBOSOME FOSTERING EFFICIENT 111 00:04:26,912 --> 00:04:27,212 TRANSLATION. 112 00:04:27,212 --> 00:04:30,249 WHEN I FIRST LEARNED PROTEINS 113 00:04:30,249 --> 00:04:31,450 COULD DO THIS, I THOUGHT WOW. 114 00:04:31,450 --> 00:04:34,853 I DIDN'T THINK PROTEINS COULD 115 00:04:34,853 --> 00:04:43,095 DRAMATICALLY AND REVERSIBLY 116 00:04:43,095 --> 00:04:44,963 SWITCH THEIR FOLD BUT IT'S 117 00:04:44,963 --> 00:04:46,165 IMPORTANT FOR REGULATORY 118 00:04:46,165 --> 00:04:48,801 MECHANISMS AND A NUMBER OF HUMAN 119 00:04:48,801 --> 00:04:49,068 DISEASES. 120 00:04:49,068 --> 00:04:53,539 HERE ARE A FEW EXAMPLES: 121 00:04:53,539 --> 00:04:55,607 TUBERCULOSIS, ALZHEIMER'S, 122 00:04:55,607 --> 00:05:01,847 GASTROENTERITIS. AND THE PROBLEM 123 00:05:01,847 --> 00:05:02,648 WITH FOLD-SWITCHING PROTEINS IS THERE'S 124 00:05:02,648 --> 00:05:04,483 SO LITTLE WE KNOW. 125 00:05:04,483 --> 00:05:07,319 IT'S A VERY DATA POOR AREA RIGHT 126 00:05:07,319 --> 00:05:07,653 NOW. 
127 00:05:07,653 --> 00:05:09,655 SO THERE ARE A LOT OF BARRIERS 128 00:05:09,655 --> 00:05:11,223 TO UNDERSTANDING HOW THE 129 00:05:11,223 --> 00:05:14,893 PROTEINS ARE WORKING AND WHAT 130 00:05:14,893 --> 00:05:15,794 THEIR ROLES ARE IN BIOLOGY. 131 00:05:15,794 --> 00:05:20,432 SO, I BELIEVE IF WE CAN 132 00:05:20,432 --> 00:05:22,101 UNDERSTAND HOW CHANGES IN THE 133 00:05:22,101 --> 00:05:23,001 ENVIRONMENT CAN CAUSE PROTEINS 134 00:05:23,001 --> 00:05:24,770 TO CHANGE THEIR FOLDS AND 135 00:05:24,770 --> 00:05:27,940 FUNCTIONS, IT COULD HAVE SIGNIFICANT 136 00:05:27,940 --> 00:05:28,273 IMPLICATIONS. 137 00:05:28,273 --> 00:05:31,376 IT COULD GIVE US A BETTER 138 00:05:31,376 --> 00:05:32,144 UNDERSTANDING OF PROTEIN FOLDING 139 00:05:32,144 --> 00:05:38,917 AND DYNAMICS AND HELP US DISCOVER NEW 140 00:05:38,917 --> 00:05:40,953 BIOLOGICAL PATHWAYS SINCE WE 141 00:05:40,953 --> 00:05:44,590 KNOW FOLD SWITCHING CAN LEAD TO 142 00:05:44,590 --> 00:05:46,291 ALTERNATIVE FOLDED STATES. 143 00:05:46,291 --> 00:05:47,893 FOR EXAMPLE THERE'S ONE MUTATION 144 00:05:47,893 --> 00:05:50,229 HIGHLY ASSOCIATED WITH CANCER IN 145 00:05:50,229 --> 00:05:54,266 THE MEF2B PROTEIN, A D83V 146 00:05:54,266 --> 00:05:55,534 MUTATION THAT IS BELIEVED TO 147 00:05:55,534 --> 00:05:58,237 SWITCH A PROTEIN SECONDARY 148 00:05:58,237 --> 00:05:59,838 STRUCTURE ELEMENT FROM ALPHA 149 00:05:59,838 --> 00:06:02,441 HELIX TO BETA SHEET AND 150 00:06:02,441 --> 00:06:05,644 INACTIVATE THE PROTEIN. ALSO, 151 00:06:05,644 --> 00:06:07,746 SPLICING, POLYMORPHISMS AND 152 00:06:07,746 --> 00:06:09,581 POST-TRANSLATIONAL MODIFICATIONS 153 00:06:09,581 --> 00:06:10,682 CAN CAUSE PROTEINS TO SWITCH 154 00:06:10,682 --> 00:06:11,783 THEIR FOLDS AND FUNCTIONS. 155 00:06:11,783 --> 00:06:16,455 I GAVE AN EXAMPLE OF THE SNP 156 00:06:16,455 --> 00:06:22,861 AND THERE ARE PROTEINS THAT 157 00:06:22,861 --> 00:06:23,795 SWITCH FOLDS. 158 00:06:23,795 --> 00:06:25,430 THERE ARE BARRIERS TO PROGRESS. 
159 00:06:25,430 --> 00:06:26,865 THERE ARE NOT VERY MANY EXAMPLES 160 00:06:26,865 --> 00:06:33,105 AND AS YOU KNOW IT CAN BE HARD TO 161 00:06:33,105 --> 00:06:35,507 GENERALIZE IN A DATA POOR AREA 162 00:06:35,507 --> 00:06:36,141 OF RESEARCH. 163 00:06:36,141 --> 00:06:40,445 FOLD SWITCHING IS HARD TO 164 00:06:40,445 --> 00:06:41,780 PREDICT, WHICH IS WHERE WE'LL FOCUS OUR 165 00:06:41,780 --> 00:06:43,649 ATTENTION, AND THEY'RE DIFFICULT 166 00:06:43,649 --> 00:06:45,184 TO HANDLE EXPERIMENTALLY. 167 00:06:45,184 --> 00:06:51,990 HERE'S A PICTURE OF THREE NMR 168 00:06:51,990 --> 00:06:53,759 TUBES AND THIS PROTEIN CRASHED 169 00:06:53,759 --> 00:06:55,794 OUT OF SOLUTION WITHIN MINUTES OF 170 00:06:55,794 --> 00:06:57,429 ME PUTTING IT IN THE TUBES AND 171 00:06:57,429 --> 00:07:00,899 IT MAY BE THIS IS A DATA POOR 172 00:07:00,899 --> 00:07:01,667 AREA BECAUSE THEY'RE UNSTABLE 173 00:07:01,667 --> 00:07:03,202 AND DIFFICULT TO WORK WITH. 174 00:07:03,202 --> 00:07:05,337 SO TODAY WE'RE GOING TO FOCUS 175 00:07:05,337 --> 00:07:07,372 ALL OF OUR ATTENTION, EVEN THOUGH 176 00:07:07,372 --> 00:07:09,441 MY LAB HAS A BUNCH OF 177 00:07:09,441 --> 00:07:10,442 INTERESTS, WE'LL FOCUS OUR 178 00:07:10,442 --> 00:07:12,911 ATTENTION ON PREDICTING FOLD 179 00:07:12,911 --> 00:07:13,946 SWITCHERS FROM SEQUENCE. 180 00:07:13,946 --> 00:07:14,913 THE NATURAL QUESTION EVERYBODY 181 00:07:14,913 --> 00:07:16,582 ASKS ME IS WHY DON'T YOU JUST 182 00:07:16,582 --> 00:07:18,917 USE ALPHAFOLD AND IT TURNS OUT 183 00:07:18,917 --> 00:07:19,918 FOLD SWITCHING IS DIFFICULT EVEN 184 00:07:19,918 --> 00:07:21,587 FOR ALPHAFOLD. 
185 00:07:21,587 --> 00:07:24,656 SO BACK WHEN ALPHAFOLD WAS 186 00:07:24,656 --> 00:07:26,925 FIRST PUBLISHED WE ASKED HOW IT 187 00:07:26,925 --> 00:07:29,561 DOES PREDICTING THE TWO 188 00:07:29,561 --> 00:07:31,964 CONFORMATIONS OF FOLD-SWITCHING 189 00:07:31,964 --> 00:07:32,998 PROTEINS, FOLD 1 AND FOLD 2, AND 190 00:07:32,998 --> 00:07:34,733 MEASURED ACCURACY BY TM-SCORE, AND YOU 191 00:07:34,733 --> 00:07:39,037 CAN GET A SIMILAR RESULT USING 192 00:07:39,037 --> 00:07:39,238 RMSD. 193 00:07:39,238 --> 00:07:41,907 THIS IS THE IDENTITY LINE AND YOU 194 00:07:41,907 --> 00:07:43,875 WOULD EXPECT THAT IF ALPHAFOLD 2 195 00:07:43,875 --> 00:07:45,277 PREDICTED BOTH CONFORMATIONS 196 00:07:45,277 --> 00:07:48,080 CONSISTENTLY AND RELIABLY WE'D 197 00:07:48,080 --> 00:07:50,048 SEE AN EQUAL DISTRIBUTION OF 198 00:07:50,048 --> 00:07:51,216 POINTS ABOVE AND BELOW THE 199 00:07:51,216 --> 00:07:53,185 DIAGONAL BUT THAT'S NOT WHAT WE 200 00:07:53,185 --> 00:07:53,518 SEE. 201 00:07:53,518 --> 00:07:56,421 ALPHAFOLD IS BIASED TOWARD PREDICTING 202 00:07:56,421 --> 00:07:57,389 ONE CONFORMATION AND MISSING THE 203 00:07:57,389 --> 00:07:57,889 OTHER COMPLETELY. 204 00:07:57,889 --> 00:08:00,158 THIS IS ALPHAFOLD WHEN IT 205 00:08:00,158 --> 00:08:02,027 WAS FIRST RELEASED. 206 00:08:02,027 --> 00:08:04,129 YOU COULD ASK WHETHER MORE COULD 207 00:08:04,129 --> 00:08:06,598 BE DONE: NEWER MODELS CAME OUT 208 00:08:06,598 --> 00:08:09,568 AND ENHANCED SAMPLING METHODS 209 00:08:09,568 --> 00:08:10,602 CAME OUT. 210 00:08:10,602 --> 00:08:11,503 WHY NOT TRY THOSE? 211 00:08:11,503 --> 00:08:12,104 SO WE DID. 212 00:08:12,104 --> 00:08:14,840 WE ENDED UP GENERATING NEARLY 213 00:08:14,840 --> 00:08:17,109 300,000 STRUCTURES RUNNING EVERY 214 00:08:17,109 --> 00:08:21,013 VERSION OF ALPHAFOLD AND EVERY 215 00:08:21,013 --> 00:08:22,614 ITERATION YOU CAN IMAGINE. 
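The diagonal argument above can be checked numerically. The following is a minimal sketch, assuming each protein contributes one pair of accuracy scores (for example TM-score against Fold 1 and against Fold 2). The function names and the toy scores are hypothetical and are not data from the talk. It counts points on either side of the identity line and applies a simple two-sided sign test.

```python
from math import comb


def split_by_diagonal(scores):
    """Count points above and below the identity line.

    `scores` holds (score_vs_fold1, score_vs_fold2) pairs, one per protein.
    A point lies above the diagonal when the prediction resembles Fold 2 more.
    Points exactly on the line are ignored.
    """
    above = sum(1 for f1, f2 in scores if f2 > f1)
    below = sum(1 for f1, f2 in scores if f1 > f2)
    return above, below


def sign_test_p(above, below):
    """Two-sided sign test: chance of a split at least this lopsided
    if every point were equally likely to land on either side."""
    n = above + below
    if n == 0:
        return 1.0
    k = min(above, below)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical scores for five proteins (illustration only).
toy_scores = [(0.91, 0.32), (0.88, 0.41), (0.79, 0.35), (0.85, 0.30), (0.40, 0.72)]
above, below = split_by_diagonal(toy_scores)
print(above, below, sign_test_p(above, below))
```

An unbiased predictor would give roughly equal counts and a large p-value; a strong bias toward one conformation shows up as a lopsided split.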
216 00:08:22,614 --> 00:08:26,318 WE RAN WITH ALL PROTEIN BINDING 217 00:08:26,318 --> 00:08:29,755 PARTNERS AND ON MONOMERIC 218 00:08:29,755 --> 00:08:34,459 SPECIES AND EVERY SET OF 219 00:08:34,459 --> 00:08:40,565 CONDITIONS WE COULD FOR THE 220 00:08:40,565 --> 00:08:42,367 STRUCTURES AND 70% OF ALL 221 00:08:42,367 --> 00:08:44,503 INTERACTIONS, AND THE METHODS 222 00:08:44,503 --> 00:08:47,272 COMBINED, GENERATING NEARLY 223 00:08:47,272 --> 00:08:48,907 300,000 DIFFERENT STRUCTURES, HAD 224 00:08:48,907 --> 00:08:50,909 A PRETTY POOR SUCCESS RATE. 225 00:08:50,909 --> 00:08:53,845 SO OUT OF 226 00:08:53,845 --> 00:08:55,847 92 FOLD-SWITCHING PROTEINS WE'RE ONLY 227 00:08:55,847 --> 00:08:57,983 GETTING ABOUT A 35% SUCCESS RATE 228 00:08:57,983 --> 00:08:59,618 AND THESE ARE FOR PROTEINS THAT 229 00:08:59,618 --> 00:09:02,454 WERE PROBABLY IN ALPHAFOLD'S 230 00:09:02,454 --> 00:09:06,358 TRAINING SET BECAUSE THEY WERE 231 00:09:06,358 --> 00:09:08,193 PUBLISHED BEFORE MAY OF 2018. 232 00:09:08,193 --> 00:09:10,896 SO I WOULD SAY IT SHOWS 233 00:09:10,896 --> 00:09:16,468 ALPHAFOLD IS A WEAK PREDICTOR OF 234 00:09:16,468 --> 00:09:16,902 FOLD-SWITCHING PROTEINS. 235 00:09:16,902 --> 00:09:18,837 CAN ALPHAFOLD DO BETTER? 236 00:09:18,837 --> 00:09:20,339 WE NEED TO KNOW HOW IT WORKS. 237 00:09:20,339 --> 00:09:22,607 WE TAKE AN INPUT SEQUENCE AND 238 00:09:22,607 --> 00:09:24,876 USE A GENETIC DATABASE TO MAKE A 239 00:09:24,876 --> 00:09:28,480 MULTIPLE SEQUENCE ALIGNMENT AND 240 00:09:28,480 --> 00:09:30,248 ALPHAFOLD TAKES THIS AND MAKES 241 00:09:30,248 --> 00:09:33,952 AN MSA REPRESENTATION AND USES 242 00:09:33,952 --> 00:09:35,587 AN IMPORTANT BLOCK CALLED THE 243 00:09:35,587 --> 00:09:39,825 EVOFORMER, A NEURAL NETWORK, TO GENERATE A 244 00:09:39,825 --> 00:09:40,959 PAIR REPRESENTATION FROM WHICH A 245 00:09:40,959 --> 00:09:43,395 STRUCTURE CAN BE MADE. 
246 00:09:43,395 --> 00:09:46,898 SO BASICALLY THIS PAIR 247 00:09:46,898 --> 00:09:48,600 REPRESENTATION PROVIDES 248 00:09:48,600 --> 00:09:49,368 RESTRAINTS THAT ALLOW ALPHAFOLD 249 00:09:49,368 --> 00:09:52,170 TO MAKE A STRUCTURE AND 250 00:09:52,170 --> 00:09:53,605 SOMETIMES PASSING IT BACK TO THE 251 00:09:53,605 --> 00:09:56,141 MODEL AND RERUNNING THE MODEL IN 252 00:09:56,141 --> 00:09:58,210 A PROCESS CALLED RECYCLING MAKES 253 00:09:58,210 --> 00:09:59,945 FOR BETTER PREDICTIONS. 254 00:09:59,945 --> 00:10:02,180 THE QUESTION IS HOW DOES ALPHAFOLD 255 00:10:02,180 --> 00:10:08,620 PREDICT FOLD-SWITCHED 256 00:10:08,620 --> 00:10:08,954 CONFORMATIONS? 257 00:10:08,954 --> 00:10:12,391 IT CAN PREDICT ONE CONFORMATION BUT 258 00:10:12,391 --> 00:10:14,426 WHAT ABOUT THE OTHER 259 00:10:14,426 --> 00:10:14,760 CONFORMATION? 260 00:10:14,760 --> 00:10:15,293 THAT'S THE CHALLENGE. 261 00:10:15,293 --> 00:10:16,962 THERE ARE TWO OPTIONS HERE. 262 00:10:16,962 --> 00:10:19,498 ONE IS THAT IT'S GENERATIVE, 263 00:10:19,498 --> 00:10:22,467 WHICH WOULD MEAN IT'S USING 264 00:10:22,467 --> 00:10:25,203 ALTERNATIVE CO-EVOLUTIONARY 265 00:10:25,203 --> 00:10:27,172 SIGNALS TO MAKE STRUCTURES IT 266 00:10:27,172 --> 00:10:28,473 HAS NOT SEEN IN THE TRAINING SET 267 00:10:28,473 --> 00:10:30,709 BEFORE, AND THE OTHER OPTION 268 00:10:30,709 --> 00:10:33,178 WOULD BE IT'S ASSOCIATIVE. THAT 269 00:10:33,178 --> 00:10:37,716 MEANS FOR PREDICTING ALTERNATIVE 270 00:10:37,716 --> 00:10:39,451 CONFORMATIONS ALPHAFOLD IS 271 00:10:39,451 --> 00:10:42,788 ACTUALLY NOT USING 272 00:10:42,788 --> 00:10:43,855 CO-EVOLUTIONARY INFORMATION AND, 273 00:10:43,855 --> 00:10:46,625 BASED ON ITS TRAINING FROM 274 00:10:46,625 --> 00:10:50,695 THE PDB, PROBABLY ASSOCIATES 275 00:10:50,695 --> 00:10:52,297 SEQUENCES WITH STRUCTURES SEEN DURING 276 00:10:52,297 --> 00:10:52,564 TRAINING. 
277 00:10:52,564 --> 00:10:56,435 I'M CONVINCED THAT WHEN IT COMES 278 00:10:56,435 --> 00:11:00,439 TO ALTERNATIVE CONFORMATIONS OF 279 00:11:00,439 --> 00:11:02,307 FOLD-SWITCHING PROTEINS ALPHAFOLD IS 280 00:11:02,307 --> 00:11:02,707 ASSOCIATIVE. 281 00:11:02,707 --> 00:11:04,409 HOW DO I KNOW THIS? 282 00:11:04,409 --> 00:11:08,046 AS YOU KNOW EXPERIMENTS LAG 283 00:11:08,046 --> 00:11:08,346 PREDICTION. 284 00:11:08,346 --> 00:11:10,482 SO, RECENTLY A NUMBER OF 285 00:11:10,482 --> 00:11:13,485 STRUCTURES HAVE BEEN PUBLISHED 286 00:11:13,485 --> 00:11:16,455 THAT ALPHAFOLD HAS NOT PREDICTED 287 00:11:16,455 --> 00:11:18,557 ACCURATELY. 288 00:11:18,557 --> 00:11:21,026 LEWIS KAY'S GROUP HAD A STRUCTURE 289 00:11:21,026 --> 00:11:23,795 OF PRO-INTERLEUKIN 18 WHICH DOES 290 00:11:23,795 --> 00:11:28,300 NOT MATCH WHAT ALPHAFOLD 291 00:11:28,300 --> 00:11:28,567 PREDICTED. 292 00:11:28,567 --> 00:11:30,402 BCCIP ALPHA, A SPLICE VARIANT OF A HUMAN 293 00:11:30,402 --> 00:11:34,072 CANCER PROTEIN, WAS RECENTLY 294 00:11:34,072 --> 00:11:34,306 SOLVED. 295 00:11:34,306 --> 00:11:38,677 ITS STRUCTURE WAS UNLIKE 296 00:11:38,677 --> 00:11:43,114 ANYTHING IN THE PDB. AND A LAB 297 00:11:43,114 --> 00:11:46,651 PUBLISHED THE STRUCTURE OF THE 298 00:11:46,651 --> 00:11:49,120 MP20. THOUGH ALPHAFOLD 2 AND 3 299 00:11:49,120 --> 00:11:52,290 PREDICT THE FOLDS OF EACH UNIT 300 00:11:52,290 --> 00:11:54,693 CORRECTLY THEY GET THE 301 00:11:54,693 --> 00:12:00,065 ORIENTATION WRONG, PREDICTING AN 302 00:12:00,065 --> 00:12:06,371 OCTAMER AS OPPOSED TO TWO TETRAMERS. 303 00:12:06,371 --> 00:12:10,575 AND AN ANCESTOR OF RIBOSOME 304 00:12:10,575 --> 00:12:12,511 BINDING PROTEINS WAS ALSO SOLVED 305 00:12:12,511 --> 00:12:17,148 BY CRYSTALLOGRAPHY AND PREDICTED 306 00:12:17,148 --> 00:12:18,517 INCORRECTLY BY ALPHAFOLD 2 AND 307 00:12:18,517 --> 00:12:18,750 3. 308 00:12:18,750 --> 00:12:20,018 WHY IS THIS HAPPENING? 
309 00:12:20,018 --> 00:12:23,588 WELL, IT'S PREDICTING STRUCTURES 310 00:12:23,588 --> 00:12:26,925 LIKELY IN ITS TRAINING SET 311 00:12:26,925 --> 00:12:28,093 RATHER THAN EXPERIMENTALLY 312 00:12:28,093 --> 00:12:28,493 CONSISTENT ONES. 313 00:12:28,493 --> 00:12:32,564 THERE WAS NO STRUCTURE LIKE BCCIP 314 00:12:32,564 --> 00:12:33,965 ALPHA IN THE PDB BEFORE IT WAS 315 00:12:33,965 --> 00:12:35,700 SOLVED, WHEN ALPHAFOLD WAS TRAINED. 316 00:12:35,700 --> 00:12:36,568 WHAT DID ALPHAFOLD DO? 317 00:12:36,568 --> 00:12:42,040 IT TOOK THE SEQUENCE AND 318 00:12:42,040 --> 00:12:42,807 PREDICTED SOMETHING ALREADY IN 319 00:12:42,807 --> 00:12:44,109 THE TRAINING SET. 320 00:12:44,109 --> 00:12:52,517 SIMILAR STORY WITH PRO-INTERLEUKIN 18: 321 00:12:52,517 --> 00:12:53,518 A STRUCTURE FROM THE TRAINING SET RATHER THAN WHAT 322 00:12:53,518 --> 00:12:55,186 IS DETERMINED BY EXPERIMENT IN 323 00:12:55,186 --> 00:12:56,521 THESE CASES. 324 00:12:56,521 --> 00:12:58,323 SO WHY IS IT FAILING TO PREDICT 325 00:12:58,323 --> 00:13:01,359 THE NEW FOLDS? 326 00:13:01,359 --> 00:13:04,629 WELL, RECENTLY PATRICK 327 00:13:04,629 --> 00:13:05,864 BRYANT AND FRANK NOÉ DID MY DREAM 328 00:13:05,864 --> 00:13:08,633 EXPERIMENT WHERE THEY RETRAINED 329 00:13:08,633 --> 00:13:10,635 THE ALPHAFOLD ARCHITECTURE ON 330 00:13:10,635 --> 00:13:12,971 ONE CONFORMATION OF PROTEINS BUT 331 00:13:12,971 --> 00:13:13,939 NOT THE OTHER. AND THEY DIDN'T 332 00:13:13,939 --> 00:13:16,775 MEAN TO DO THIS BUT IT TURNS OUT 333 00:13:16,775 --> 00:13:18,009 THEIR TRAINING SET HAD THESE 334 00:13:18,009 --> 00:13:20,312 DOMINANT FOLDS YOU CAN SEE IN 335 00:13:20,312 --> 00:13:21,646 THE COLUMN HERE BUT THEY ONLY 336 00:13:21,646 --> 00:13:23,348 HAVE ONE OF THESE ALTERNATIVE 337 00:13:23,348 --> 00:13:24,616 FOLDS YOU CAN SEE HERE BUT NONE 338 00:13:24,616 --> 00:13:27,552 OF THESE OTHERS. 
339 00:13:27,552 --> 00:13:31,189 AND WHEN YOU RUN CFOLD, THEIR 340 00:13:31,189 --> 00:13:34,292 VERSION OF ALPHAFOLD 2, ON OUR 341 00:13:34,292 --> 00:13:36,661 FOLD-SWITCHING PROTEINS, YOU CAN 342 00:13:36,661 --> 00:13:38,096 SEE EVERYTHING IN ITS TRAINING 343 00:13:38,096 --> 00:13:40,599 SET IS PREDICTED ACCURATELY AND 344 00:13:40,599 --> 00:13:43,268 WITH HIGH CONFIDENCE BUT IT 345 00:13:43,268 --> 00:13:45,770 COMPLETELY MISSES THESE 346 00:13:45,770 --> 00:13:46,438 ALTERNATIVE CONFORMATIONS. 347 00:13:46,438 --> 00:13:48,340 AND WHAT'S REALLY INTERESTING IS 348 00:13:48,340 --> 00:13:54,913 WITH THE RFAH PROTEIN, 349 00:13:54,913 --> 00:14:01,319 ALPHAFOLD 2 PREDICTS THE ALPHA-HELICAL 351 00:14:02,053 --> 00:14:02,387 CONFORMATION. 352 00:14:02,387 --> 00:14:04,356 IF YOU GIVE CFOLD THE SAME INPUT IT 353 00:14:04,356 --> 00:14:06,057 PREDICTS THE BETA-SHEET CONFORMATION 354 00:14:06,057 --> 00:14:09,094 AND THE INFORMATION IN THE FULL 355 00:14:09,094 --> 00:14:11,162 MSA MATCHES BETA SHEET NOT ALPHA 356 00:14:11,162 --> 00:14:11,596 HELIX. 357 00:14:11,596 --> 00:14:15,734 WE THINK IN THIS CASE ALPHAFOLD HAS 358 00:14:15,734 --> 00:14:18,937 MEMORIZED THE SEQUENCE AND IS NOT 359 00:14:18,937 --> 00:14:21,673 USING CO-EVOLUTIONARY 360 00:14:21,673 --> 00:14:23,274 INFORMATION TO PREDICT THIS. 361 00:14:23,274 --> 00:14:25,577 WE CAN DO FANCY SEQUENCE 362 00:14:25,577 --> 00:14:26,578 SAMPLING AND GET BETTER 363 00:14:26,578 --> 00:14:28,046 PREDICTIONS OF SOME OF THE 364 00:14:28,046 --> 00:14:29,614 PROTEINS BUT NONE OF THEM ARE 365 00:14:29,614 --> 00:14:32,884 VERY ACCURATE. 
366 00:14:32,884 --> 00:14:34,719 ALL ARE LOW CONFIDENCE AND, 367 00:14:34,719 --> 00:14:36,454 WHEN ANYTHING IS HIGH 368 00:14:36,454 --> 00:14:38,957 CONFIDENCE, ALPHAFOLD ALSO 369 00:14:38,957 --> 00:14:41,960 PREDICTS INCORRECT CONFORMATIONS 370 00:14:41,960 --> 00:14:44,696 THAT DON'T MATCH THE EXPERIMENTALLY 371 00:14:44,696 --> 00:14:45,664 DETERMINED STRUCTURE WITH HIGHER 372 00:14:45,664 --> 00:14:48,600 CONFIDENCE THAN THE ALTERNATIVE. 373 00:14:48,600 --> 00:14:50,168 THERE'S A LOT OF GROWTH TO BE 374 00:14:50,168 --> 00:14:51,069 DONE IN THE AREA BECAUSE IT 375 00:14:51,069 --> 00:14:54,039 DOESN'T SEEM LIKE ALPHAFOLD'S 376 00:14:54,039 --> 00:14:58,276 ARCHITECTURE, AS IT IS, IS WELL 377 00:14:58,276 --> 00:14:59,077 SUITED TO PREDICT 378 00:14:59,077 --> 00:15:04,082 ALTERNATIVE CONFORMATIONS 379 00:15:04,082 --> 00:15:08,386 OUTSIDE ITS TRAINING SET. 380 00:15:08,386 --> 00:15:10,655 IT'S MEMORIZING SOME STRUCTURES 381 00:15:10,655 --> 00:15:14,392 AND MAYBE WE CAN GET ALPHAFOLD 382 00:15:14,392 --> 00:15:17,595 TO REVEAL ITS MEMORIZED 383 00:15:17,595 --> 00:15:19,364 STRUCTURES THROUGH MINIMAL 384 00:15:19,364 --> 00:15:21,966 SEQUENCE INFORMATION. 385 00:15:21,966 --> 00:15:23,968 WE'RE BLINDFOLDING IT AND SAYING 386 00:15:23,968 --> 00:15:26,905 WE'RE NOT GOING TO GIVE YOU 387 00:15:26,905 --> 00:15:28,540 ENOUGH INFORMATION FOR ROBUST 388 00:15:28,540 --> 00:15:31,309 INFERENCE BUT WE'LL JOG YOUR MEMORY 389 00:15:31,309 --> 00:15:32,510 SO YOU REMEMBER YOUR TRAINING. 390 00:15:32,510 --> 00:15:34,646 WE HAVE OUR INPUT SEQUENCE AND 391 00:15:34,646 --> 00:15:37,248 FULL LENGTH MSA AND WE CAN 392 00:15:37,248 --> 00:15:41,386 GENERATE A RANDOM MSA, 393 00:15:41,386 --> 00:15:46,891 SUBSAMPLED AT VERY SHALLOW DEPTHS: 394 00:15:46,891 --> 00:15:49,160 THREE AND SIX SEQUENCES. 395 00:15:49,160 --> 00:15:49,894 NOT A LOT. 
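The random shallow subsampling described above can be illustrated with a few lines of Python. This is only a sketch of the idea, not the speaker's implementation. It assumes the MSA is a list of aligned strings with the query first, that the query is always kept, and that "depth" counts the total number of rows. The function name `subsample_msa` and the toy alignment are hypothetical.

```python
import random


def subsample_msa(msa, depth, seed=None):
    """Return a shallow random subsample of an MSA.

    Assumptions (not from the talk): row 0 is the query and is always kept;
    the remaining `depth - 1` rows are drawn uniformly without replacement.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    rng = random.Random(seed)
    query, others = msa[0], msa[1:]
    n_extra = min(depth - 1, len(others))
    return [query] + rng.sample(others, n_extra)


# Toy alignment (hypothetical sequences), sampled at the two depths mentioned.
toy_msa = ["MKTAYIAK", "MKTAYLAK", "MRTAYIAK", "MKSAYIAR",
           "MKTGYIAK", "MKTAYIVK", "LKTAYIAK"]
for depth in (3, 6):
    print(depth, subsample_msa(toy_msa, depth, seed=0))
```

Because the rows are picked at random rather than by sequence similarity, a subsample this shallow carries very little co-evolutionary signal, which is the point of the experiment.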
396 00:15:49,894 --> 00:15:54,899 WE CAN CHOOSE USING AN UNBIASED 397 00:15:54,899 --> 00:15:57,602 METHOD OUR OPTIMAL SAMPLING 398 00:15:57,602 --> 00:15:59,671 DEPTH AND CAPTURE THE 399 00:15:59,671 --> 00:16:00,004 CONFORMATIONS. 400 00:16:00,004 --> 00:16:01,206 WE CAN CONTRAST THIS WITH A 401 00:16:01,206 --> 00:16:03,975 METHOD THAT WAS RECENTLY 402 00:16:03,975 --> 00:16:04,809 PUBLISHED IN NATURE THAT CLAIMED 403 00:16:04,809 --> 00:16:10,915 ALPHAFOLD PREDICTIONS WERE 404 00:16:10,915 --> 00:16:11,616 GENERATIVE. THE GROUP CLAIMED 405 00:16:11,616 --> 00:16:14,586 IF YOU PUT IN SHALLOW SEQUENCE 406 00:16:14,586 --> 00:16:15,954 ALIGNMENTS FROM DIFFERENT PARTS OF 407 00:16:15,954 --> 00:16:18,890 SEQUENCE SPACE, ALPHAFOLD WOULD 408 00:16:18,890 --> 00:16:20,391 PREDICT ALTERNATIVE CONFORMATIONS 409 00:16:20,391 --> 00:16:23,228 AND THE CO-EVOLUTIONARY 410 00:16:23,228 --> 00:16:24,395 INFORMATION WAS SUFFICIENT TO 411 00:16:24,395 --> 00:16:27,966 PREDICT THESE ALTERNATIVE FOLDS. 412 00:16:27,966 --> 00:16:30,602 SO, IF THIS CLAIM IS CORRECT AND 413 00:16:30,602 --> 00:16:33,371 ALPHAFOLD IS INDEED GENERATIVE 414 00:16:33,371 --> 00:16:36,841 WE WOULD EXPECT THEIR METHOD FOR 415 00:16:36,841 --> 00:16:41,346 GETTING THIS CO-EVOLUTIONARY 416 00:16:41,346 --> 00:16:43,448 INFORMATION WOULD OUTPERFORM OUR 417 00:16:43,448 --> 00:16:45,316 METHOD OF RANDOM SEQUENCE 418 00:16:45,316 --> 00:16:47,318 SAMPLING WHERE WE'RE MAKING NO 419 00:16:47,318 --> 00:16:49,120 PRETENSES OF CO-EVOLUTIONARY 420 00:16:49,120 --> 00:16:49,387 INFERENCE. 421 00:16:49,387 --> 00:16:52,123 IN FACT THE OPPOSITE IS TRUE. 422 00:16:52,123 --> 00:16:54,893 CF-RANDOM OUTPERFORMS AF-CLUSTER 423 00:16:54,893 --> 00:16:56,561 ON ALL THE EXPERIMENTALLY 424 00:16:56,561 --> 00:16:57,328 DETERMINED TARGETS. 
425 00:16:57,328 --> 00:16:59,697 IT'S ALWAYS MORE ACCURATE AND 426 00:16:59,697 --> 00:17:00,331 HIGHER CONFIDENCE AND IN 427 00:17:00,331 --> 00:17:03,301 PERFORMANCE METRICS IT'S MORE 428 00:17:03,301 --> 00:17:05,270 ACCURATE BY MATTHEWS 429 00:17:05,270 --> 00:17:10,408 CORRELATION 430 00:17:10,408 --> 00:17:11,042 COEFFICIENT AND MORE EFFICIENT 431 00:17:11,042 --> 00:17:13,545 IN THE SAMPLING OF CONFORMATIONS: 432 00:17:13,545 --> 00:17:15,547 THE NUMBER OF PREDICTIONS IS 433 00:17:15,547 --> 00:17:17,282 LOWER THAN WHAT AF-CLUSTER 434 00:17:17,282 --> 00:17:18,383 NEEDS. 435 00:17:18,383 --> 00:17:21,286 FURTHERMORE, WE CAN SHOW THAT 436 00:17:21,286 --> 00:17:22,487 ACTUALLY CO-EVOLUTION IS 437 00:17:22,487 --> 00:17:24,856 NOT DRIVING THE PREDICTIONS OF 438 00:17:24,856 --> 00:17:26,624 ALTERNATIVE CONFORMATIONS 439 00:17:26,624 --> 00:17:28,693 BECAUSE WE CAN TAKE EVERY 440 00:17:28,693 --> 00:17:30,862 MULTIPLE SEQUENCE ALIGNMENT USED 441 00:17:30,862 --> 00:17:33,498 BY ALPHAFOLD AND DO 442 00:17:33,498 --> 00:17:34,666 CO-EVOLUTIONARY ANALYSIS ON THEM 443 00:17:34,666 --> 00:17:36,501 AND WHAT YOU CAN SEE IN THESE 444 00:17:36,501 --> 00:17:38,703 RED BOXES ARE THE CASES WHERE 445 00:17:38,703 --> 00:17:41,306 ALPHAFOLD IS PREDICTING 446 00:17:41,306 --> 00:17:42,640 ALTERNATIVE CONFORMATIONS. 
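For reference, the Matthews correlation coefficient mentioned above is computed from the four cells of a confusion matrix. The sketch below uses the standard formula. How true and false positives were defined for the benchmark is not stated in the talk, so the example counts are hypothetical.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]; by convention 0.0 is returned when any
    marginal total is zero and the ratio is undefined.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 6 alternative conformations found, 4 missed,
# 9 correct rejections, 1 false alarm.
print(mcc(tp=6, tn=9, fp=1, fn=4))
```

Unlike plain accuracy, this metric stays informative when the two classes are unbalanced, which is typical when alternative conformations are rare among the sampled structures.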
447 00:17:42,640 --> 00:17:44,475 THERE'S LESS INFORMATION ABOUT 448 00:17:44,475 --> 00:17:46,945 THE ALTERNATIVE CONFORMATION IN 449 00:17:46,945 --> 00:17:50,315 ALL THESE CASES COMPARED TO THE 450 00:17:50,315 --> 00:17:50,949 DOMINANT CONFORMATION AND YET 451 00:17:50,949 --> 00:17:53,685 ALPHAFOLD IS PREDICTING THE 452 00:17:53,685 --> 00:17:54,819 ALTERNATIVE CONFORMATION AND 453 00:17:54,819 --> 00:17:55,687 THERE'S ESSENTIALLY NO 454 00:17:55,687 --> 00:17:56,821 INFORMATION ABOUT THE 455 00:17:56,821 --> 00:17:58,890 ALTERNATIVE CONFORMATION IN ANY 456 00:17:58,890 --> 00:18:01,292 OF THE SEQUENCE ALIGNMENTS WE'RE 457 00:18:01,292 --> 00:18:01,926 TESTING. 458 00:18:01,926 --> 00:18:02,894 AGAIN, CO-EVOLUTION IS NOT 459 00:18:02,894 --> 00:18:04,329 DRIVING THESE PREDICTIONS AND WE 460 00:18:04,329 --> 00:18:06,898 THINK A MORE LIKELY EXPLANATION 461 00:18:06,898 --> 00:18:07,765 IS MEMORIZATION. 462 00:18:07,765 --> 00:18:09,367 SO NOW I'VE SHOWN YOU THREE 463 00:18:09,367 --> 00:18:11,736 CASES BUT DOES THIS METHOD 464 00:18:11,736 --> 00:18:12,170 GENERALIZE? 465 00:18:12,170 --> 00:18:14,472 IT TURNS OUT YES IT DOES. 466 00:18:14,472 --> 00:18:17,876 IF WE RUN IT ON ALL FOLD 467 00:18:17,876 --> 00:18:19,711 SWITCHING PROTEINS WE HAVE STILL 468 00:18:19,711 --> 00:18:22,413 A MODEST 34% SUCCESS RATE BUT 469 00:18:22,413 --> 00:18:25,450 WITH ONLY 20,000 PREDICTIONS. 470 00:18:25,450 --> 00:18:28,720 YOU CAN COMPARE THIS WITH ALL 471 00:18:28,720 --> 00:18:30,588 OTHER METHODS COMBINED, WHICH GIVE 472 00:18:30,588 --> 00:18:35,693 LIKE A 26%, 27% SUCCESS RATE BUT 473 00:18:35,693 --> 00:18:37,428 SAMPLE 10 TIMES MORE. 474 00:18:37,428 --> 00:18:40,698 I THINK CF-RANDOM IS A BETTER 475 00:18:40,698 --> 00:18:43,167 EXPLANATION THAN THE OTHER 476 00:18:43,167 --> 00:18:46,504 METHODS PROPOSED PREVIOUSLY. 
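The efficiency comparison above can be turned into a rough "predictions per successful target" figure. This is back-of-the-envelope arithmetic only. It assumes the same 92-protein benchmark applies to both numbers and reads "10 times more" as 200,000 predictions; neither assumption is confirmed in the talk, and the function name is hypothetical.

```python
def predictions_per_success(n_predictions, success_rate, n_targets):
    """Average number of sampled structures spent per successfully
    predicted target, given an overall success rate."""
    successes = success_rate * n_targets
    if successes <= 0:
        raise ValueError("need at least one success")
    return n_predictions / successes


N_TARGETS = 92  # assumption: same benchmark size for both methods

cf_random = predictions_per_success(20_000, 0.34, N_TARGETS)
others = predictions_per_success(10 * 20_000, 0.27, N_TARGETS)

print(round(cf_random), round(others), round(others / cf_random, 1))
```

Under these assumptions the ratio depends only on the sampling factor and the two success rates, so the benchmark size cancels out of the comparison.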
477 00:18:46,504 --> 00:18:52,677 CF-RANDOM RUNS NOT ONLY ON FOLD 478 00:18:52,677 --> 00:18:53,778 SWITCHING PROTEINS BUT ALSO ON OTHER 479 00:18:53,778 --> 00:18:57,682 KINDS AND BINDING PROTEINS, AND WE 480 00:18:57,682 --> 00:19:00,351 BENCHMARKED IT AGAINST SPEACH_AF, 481 00:19:00,351 --> 00:19:02,921 ANOTHER METHOD TO PREDICT 482 00:19:02,921 --> 00:19:06,224 ALTERNATIVE CONFORMATIONS. BY 483 00:19:06,224 --> 00:19:08,326 TM-SCORE THEIR PERFORMANCES ARE 484 00:19:08,326 --> 00:19:10,895 COMPARABLE BUT CF-RANDOM TAKES 485 00:19:10,895 --> 00:19:12,697 WAY LESS SAMPLING. 486 00:19:12,697 --> 00:19:13,932 THIS IS THE TOTAL NUMBER OF 487 00:19:13,932 --> 00:19:17,402 STRUCTURES USED TO GENERATE THE 488 00:19:17,402 --> 00:19:18,770 ENSEMBLES WHEREAS FOR SPEACH_AF IT IS 489 00:19:18,770 --> 00:19:19,737 THE NUMBER OF GOOD STRUCTURES 490 00:19:19,737 --> 00:19:22,473 THEY GENERATED, NOT THE TOTAL 491 00:19:22,473 --> 00:19:22,774 NUMBER. 492 00:19:22,774 --> 00:19:26,711 SO WE'RE SAMPLING THREE TIMES 493 00:19:26,711 --> 00:19:28,179 LESS, THREE TIMES, FIVE TIMES, 494 00:19:28,179 --> 00:19:30,848 MAYBE SIX OR SEVEN TIMES LESS 495 00:19:30,848 --> 00:19:33,384 THAN SPEACH_AF AND PERFORMING 496 00:19:33,384 --> 00:19:33,651 THE SAME. 497 00:19:33,651 --> 00:19:34,886 FURTHERMORE, WE CAN COMPARE THIS 498 00:19:34,886 --> 00:19:37,121 TO ANOTHER METHOD CALLED 499 00:19:37,121 --> 00:19:39,624 AFSAMPLE2 TESTED ON ANOTHER DATA 500 00:19:39,624 --> 00:19:40,625 SET CALLED OC23. 501 00:19:40,625 --> 00:19:42,894 YOU CAN SEE IT HAS A HIGH 502 00:19:42,894 --> 00:19:46,030 SUCCESS RATE ONCE AGAIN AND YET 503 00:19:46,030 --> 00:19:49,667 IT SAMPLES WAY, WAY LESS, AS 504 00:19:49,667 --> 00:19:51,970 OPPOSED TO 1,000 STRUCTURES, AND IS 505 00:19:51,970 --> 00:19:53,204 HITTING THE RIGHT ANSWER PRETTY 506 00:19:53,204 --> 00:19:54,872 MUCH EVERY TIME. 
507 00:19:54,872 --> 00:20:01,346 SO WE THINK THIS DATA ALL 508 00:20:01,346 --> 00:20:03,681 TOGETHER POINTS TO AN EXPLANATION 509 00:20:03,681 --> 00:20:05,450 HAVING NOTHING TO DO WITH 510 00:20:05,450 --> 00:20:06,451 CO-EVOLUTIONARY INFERENCE BUT 511 00:20:06,451 --> 00:20:08,219 RATHER HAVING TO DO WITH 512 00:20:08,219 --> 00:20:10,121 MEMORIZING FROM THE TRAINING SET OR 513 00:20:10,121 --> 00:20:13,091 IN SOME CASES GIVING A SMALL 514 00:20:13,091 --> 00:20:14,892 AMOUNT OF CO-EVOLUTIONARY 515 00:20:14,892 --> 00:20:18,896 INFORMATION THAT ALLOWS MORE MOVEMENT IN 516 00:20:18,896 --> 00:20:22,000 A GIVEN CONFORMATION. SO, IT'S 517 00:20:22,000 --> 00:20:23,601 NOT SWITCHING FOLDS BUT FEWER 518 00:20:23,601 --> 00:20:26,904 CONSTRAINTS ALLOW MORE MOVEMENT. 519 00:20:26,904 --> 00:20:27,772 HOPEFULLY AT THIS POINT YOU'RE 520 00:20:27,772 --> 00:20:30,241 CONVINCED ALPHAFOLD IS 521 00:20:30,241 --> 00:20:30,541 ASSOCIATIVE. 522 00:20:30,541 --> 00:20:32,443 THEN THE QUESTION BECOMES HOW 523 00:20:32,443 --> 00:20:35,380 IS IT MAKING THESE ASSOCIATIONS? 524 00:20:35,380 --> 00:20:37,915 WE'RE ABLE TO ANSWER BY LOOKING 525 00:20:37,915 --> 00:20:40,385 AT BEAUTIFUL WORK BY BRIAN 526 00:20:40,385 --> 00:20:45,490 VOLKMAN'S GROUP LOOKING AT XCL1 527 00:20:45,490 --> 00:20:47,959 WITH A MONOMERIC CONFORMATION AND 528 00:20:47,959 --> 00:20:50,728 A DIMERIC ONE WITH A DIFFERENT HYDROGEN BOND 529 00:20:50,728 --> 00:20:52,597 NETWORK AND HYDROPHOBIC PACKING 530 00:20:52,597 --> 00:20:54,832 YOU CAN SEE HERE. 531 00:20:54,832 --> 00:20:56,834 IF YOU PUT THE SEQUENCE INTO 532 00:20:56,834 --> 00:20:58,770 ALPHAFOLD 3 AND ASK IT TO PREDICT A 533 00:20:58,770 --> 00:21:01,873 DIMER IT GIVES THIS CRAZY THING 534 00:21:01,873 --> 00:21:02,573 NOTHING LIKE THE EXPERIMENT. 
535 00:21:02,573 --> 00:21:05,643 HOWEVER, IF YOU PUT IN THAT SAME 536 00:21:05,643 --> 00:21:07,845 SEQUENCE TO ALPHA FOLD WITH A 537 00:21:07,845 --> 00:21:09,647 SINGLE SEQUENCE AND ZERO 538 00:21:09,647 --> 00:21:12,517 RECYCLES WHICH IS THE BEST PROXY 539 00:21:12,517 --> 00:21:14,685 FOR MEMORIZATION BECAUSE IT'S NOT 540 00:21:14,685 --> 00:21:16,020 USING CO-EVOLUTION AND ONLY 541 00:21:16,020 --> 00:21:16,754 GOING THROUGH THE NETWORK ONCE 542 00:21:16,754 --> 00:21:19,390 YOU CAN SEE IT DOES WELL GETTING 543 00:21:19,390 --> 00:21:22,393 THE ALTERNATIVE CONFORMATION. 544 00:21:22,393 --> 00:21:23,494 WE THINK ALPHA FOLD MEMORIZED 545 00:21:23,494 --> 00:21:25,329 THE STRUCTURE DURING TRAINING. 546 00:21:25,329 --> 00:21:26,898 WHAT'S COOL ABOUT BEING ABLE TO 547 00:21:26,898 --> 00:21:28,533 USE THIS SINGLE SEQUENCE MODEL 548 00:21:28,533 --> 00:21:30,868 IS THEN WE CAN START MODELLING 549 00:21:30,868 --> 00:21:33,871 MUTATIONS AND IT TURNS OUT THAT 550 00:21:33,871 --> 00:21:35,907 BRIAN'S GROUP MADE A BUNCH OF 551 00:21:35,907 --> 00:21:36,107 THEM. 552 00:21:36,107 --> 00:21:37,875 HERE THEY ARE. 553 00:21:37,875 --> 00:21:41,612 NOT ONLY A BUNCH OF MUTANTS BUT 554 00:21:41,612 --> 00:21:43,848 TESTED WHICH ONES FOLD AND WE 555 00:21:43,848 --> 00:21:45,049 COULD RUN ALPHA FOLD ON EVERY 556 00:21:45,049 --> 00:21:46,684 ONE OF THESE MUTANTS AND SEE HOW 557 00:21:46,684 --> 00:21:48,286 IT DOES. 558 00:21:48,286 --> 00:21:50,221 SO HERE ARE THE RESULTS. 559 00:21:50,221 --> 00:21:55,059 SO IF IT'S ONLY PREDICTING THE 560 00:21:55,059 --> 00:21:57,095 MONOMERIC FOLD AND NOT FOLD 561 00:21:57,095 --> 00:21:58,629 SWITCHING WE'D EXPECT TO SEE 562 00:21:58,629 --> 00:22:00,198 ONLY THESE RED PLOTS ABOVE THE 563 00:22:00,198 --> 00:22:01,065 DOTTED LINE.
564 00:22:01,065 --> 00:22:03,067 SO YOU CAN SEE IN THE FOUR 565 00:22:03,067 --> 00:22:05,503 CASES, THAT'S EXACTLY WHAT WE'RE 566 00:22:05,503 --> 00:22:06,704 SEEING AND THESE ARE CORRECT 567 00:22:06,704 --> 00:22:09,207 PREDICTIONS. 568 00:22:09,207 --> 00:22:10,708 UNFORTUNATELY ALL THE RED BOXES 569 00:22:10,708 --> 00:22:13,478 ARE CASES WHERE IT'S 570 00:22:13,478 --> 00:22:14,412 MISPREDICTING THAT THESE PROTEINS 571 00:22:14,412 --> 00:22:16,614 SWITCH FOLDS WHEN ACTUALLY THEY 572 00:22:16,614 --> 00:22:17,248 DON'T. 573 00:22:17,248 --> 00:22:18,282 SO WHAT'S GOING ON? 574 00:22:18,282 --> 00:22:21,552 WHY IS IT MAKING THE 575 00:22:21,552 --> 00:22:21,853 MISPREDICTIONS? 576 00:22:21,853 --> 00:22:24,755 TWO TALENTED STUDENTS IN MY LAB 577 00:22:24,755 --> 00:22:28,659 WENT AND GOT ALL THE PDB 578 00:22:28,659 --> 00:22:31,395 STRUCTURES FOR BOTH FOLDS IN 579 00:22:31,395 --> 00:22:34,298 ALPHA FOLD'S TRAINING SET AND 580 00:22:34,298 --> 00:22:34,832 COMPARED THEIR SEQUENCES. 581 00:22:34,832 --> 00:22:37,068 AND THEY FOUND WHEN YOU MAKE 582 00:22:37,068 --> 00:22:39,036 THESE COMPARISONS THERE'S THREE 583 00:22:39,036 --> 00:22:41,272 POSITIONS THAT DIFFER FROM ONE 584 00:22:41,272 --> 00:22:45,042 ANOTHER, POSITION 11, 38 AND 33. 585 00:22:45,042 --> 00:22:48,012 -- 43, SORRY. 586 00:22:48,012 --> 00:22:50,781 AND WHEN YOU HAVE THE MUTATIONS 587 00:22:50,781 --> 00:22:52,183 UNIQUE TO THE ALTERNATE FOLD YOU 588 00:22:52,183 --> 00:22:54,385 GET A FOLD SWITCH PREDICTION AND 589 00:22:54,385 --> 00:22:58,089 IF NOT YOU GET A CHEMOKINE 590 00:22:58,089 --> 00:22:58,389 PREDICTION. 591 00:22:58,389 --> 00:23:00,091 SO THEY INVESTIGATED EVERY 592 00:23:00,091 --> 00:23:01,359 INDIVIDUAL MUTATION AND SAID 593 00:23:01,359 --> 00:23:03,427 OKAY, POSITION 11 IS IMPORTANT 594 00:23:03,427 --> 00:23:06,464 FOR PREDICTING THE FOLD.
595 00:23:06,464 --> 00:23:07,498 BUT IT TURNS OUT REGARDLESS OF 596 00:23:07,498 --> 00:23:10,034 WHICH AMINO ACID THEY HAVE IN 597 00:23:10,034 --> 00:23:12,203 POSITION 11, ALPHA FOLD IS 598 00:23:12,203 --> 00:23:15,006 PREDICTING THE SAME STRUCTURES. 599 00:23:15,006 --> 00:23:16,707 FURTHERMORE, THIS IS THE SAME 600 00:23:16,707 --> 00:23:20,144 THING TRUE FOR I38 FOR POSITION 601 00:23:20,144 --> 00:23:22,713 38 WHETHER A T OR I 602 00:23:22,713 --> 00:23:24,348 CORRESPONDING TO ONE FOLD OR 603 00:23:24,348 --> 00:23:25,283 ANOTHER PREDICTING THE SAME 604 00:23:25,283 --> 00:23:26,117 THING IN BOTH CASES. 605 00:23:26,117 --> 00:23:29,954 HOWEVER, IF WE LOOK AT POSITION 606 00:23:29,954 --> 00:23:32,356 43, WE GET A VERY DIFFERENT 607 00:23:32,356 --> 00:23:32,657 RESULT. 608 00:23:32,657 --> 00:23:35,459 SO JUST SWITCHING THIS ONE 609 00:23:35,459 --> 00:23:38,129 MUTATION FROM AN L TO AN R 610 00:23:38,129 --> 00:23:40,698 COMPLETELY ALMOST COMPLETELY 611 00:23:40,698 --> 00:23:44,502 CHANGES THE FOLD TO THE 612 00:23:44,502 --> 00:23:46,137 CHEMOKINE FOLD WHEREAS IF WE DO 613 00:23:46,137 --> 00:23:53,344 AN R43L IT CHANGES TO THE 614 00:23:53,344 --> 00:23:54,912 DIMERIC FOLD. 615 00:23:54,912 --> 00:23:57,148 WHAT'S INTERESTING ABOUT THIS IS 616 00:23:57,148 --> 00:23:58,316 ALPHA FOLD THIS MUTATION HAS 617 00:23:58,316 --> 00:24:01,652 NOTHING TO DO WITH STRUCTURE. 618 00:24:01,652 --> 00:24:02,853 IT'S FUNCTION. 619 00:24:02,853 --> 00:24:09,293 THIS IS A CONSERVED SALT BINDING 620 00:24:09,293 --> 00:24:12,563 SITE IN THE CHEMOKINE FOLD BUT 621 00:24:12,563 --> 00:24:13,397 BECAUSE ALPHA FOLD IS 622 00:24:13,397 --> 00:24:16,100 ASSOCIATING THE HIGH FUNCTIONAL 623 00:24:16,100 --> 00:24:19,203 CONSERVATION AS A STRUCTURAL 624 00:24:19,203 --> 00:24:20,171 DETERMINANT.
625 00:24:20,171 --> 00:24:22,907 THIS IS NOT BASED ON BIOPHYSICS 626 00:24:22,907 --> 00:24:25,676 BUT RATHER AMINO ACID PATTERNS 627 00:24:25,676 --> 00:24:30,514 ALPHA FOLD HAS SEEN DURING 628 00:24:30,514 --> 00:24:30,781 TRAINING. 629 00:24:30,781 --> 00:24:36,654 SO GOING TO PREDICTING SWITCHING 630 00:24:36,654 --> 00:24:38,422 FROM SEQUENCE IT STRUGGLED 631 00:24:38,422 --> 00:24:41,359 WITH 632 00:24:41,359 --> 00:24:44,362 ALTERNATIVE CONFORMATIONS AND 633 00:24:44,362 --> 00:24:46,163 THEY'RE ASSOCIATIVE MEANING 634 00:24:46,163 --> 00:24:48,499 PREDICTIONS OF ALTERNATIVE 635 00:24:48,499 --> 00:24:51,936 CONFORMATIONS ARE LIMITED BY ITS 636 00:24:51,936 --> 00:24:57,308 TRAINING SET AND SEQUENCE 637 00:24:57,308 --> 00:24:57,875 ALIGNMENTS HAVE ALTERNATIVE 638 00:24:57,875 --> 00:24:58,242 CONFORMATIONS. 639 00:24:58,242 --> 00:24:59,944 ALPHA FOLD'S MISSING IT RIGHT 640 00:24:59,944 --> 00:25:01,479 NOW AND HOPEFULLY THEY CAN 641 00:25:01,479 --> 00:25:02,947 DEVELOP A BETTER METHOD TO PICK 642 00:25:02,947 --> 00:25:03,514 IT UP. 643 00:25:03,514 --> 00:25:05,182 SO WITH THAT I WOULD LIKE TO 644 00:25:05,182 --> 00:25:07,852 THANK THE MEMBERS OF MY GROUP 645 00:25:07,852 --> 00:25:09,520 AND VARIOUS CONTRIBUTORS TO THE 646 00:25:09,520 --> 00:25:16,727 WORK AND I'D ESPECIALLY LIKE TO 647 00:25:16,727 --> 00:25:21,766 THANK DEVLINA AND JOE AND LESLIE 648 00:25:21,766 --> 00:25:23,567 WHO DID THE BULK OF THE WORK AND 649 00:25:23,567 --> 00:25:25,336 HAPPY TO TAKE QUESTIONS. 650 00:25:25,336 --> 00:25:26,904 >> THIS WAS SO INTERESTING AND 651 00:25:26,904 --> 00:25:28,706 EXCITING. 652 00:25:28,706 --> 00:25:34,345 YOU HAVE GIVEN US SO MUCH FOOD 653 00:25:34,345 --> 00:25:34,912 FOR THOUGHT. 654 00:25:34,912 --> 00:25:36,947 WHY IS ALPHA FOLD NOT PERFORMING 655 00:25:36,947 --> 00:25:37,148 WELL? 656 00:25:37,148 --> 00:25:39,550 IS THERE NOT ENOUGH DATA FOR THE 657 00:25:39,550 --> 00:25:40,217 TRAINING?
658 00:25:40,217 --> 00:25:43,054 WHY DO YOU THINK IT'S FAILING? 659 00:25:43,054 --> 00:25:46,557 >> AS I SAID, IT'S A DATA POOR 660 00:25:46,557 --> 00:25:46,857 AREA. 661 00:25:46,857 --> 00:25:52,430 FOLD SWITCHING IS DATA POOR. 662 00:25:52,430 --> 00:25:53,764 WHAT MAKES ALPHA FOLD IS THE 663 00:25:53,764 --> 00:25:55,599 LARGE TRAINING SET IN THE PDB. 664 00:25:55,599 --> 00:25:57,968 BECAUSE THERE'S SO FEW EXAMPLES 665 00:25:57,968 --> 00:26:00,071 OF FOLD SWITCHING, ALPHA FOLD 666 00:26:00,071 --> 00:26:01,505 DOES NOT HAVE ENOUGH DATA TO 667 00:26:01,505 --> 00:26:01,806 GENERALIZE. 668 00:26:01,806 --> 00:26:03,974 I ALSO THINK THERE'S ISSUES WITH 669 00:26:03,974 --> 00:26:10,081 HOW IT WAS TRAINED. 670 00:26:10,081 --> 00:26:13,884 IT KIND OF IMPLICITLY ASSUMES 671 00:26:13,884 --> 00:26:16,787 THE WAY IT'S TRAINED AND USED 672 00:26:16,787 --> 00:26:19,056 THAT A SINGLE AMINO ACID 673 00:26:19,056 --> 00:26:20,491 SEQUENCE ASSUMES ONE STRUCTURE. 674 00:26:20,491 --> 00:26:22,960 THAT MAKES IT HARD FOR IT TO BE 675 00:26:22,960 --> 00:26:26,063 ABLE TO PREDICT TWO STRUCTURES 676 00:26:26,063 --> 00:26:33,671 IF AN IMPLICIT ASSUMPTION IS 677 00:26:33,671 --> 00:26:34,672 GOING TO ONE FOLD. 678 00:26:34,672 --> 00:26:38,275 ALSO THE WAY IT'S TRAINED IT'S 679 00:26:38,275 --> 00:26:39,910 TRAINED ON DEEP MULTIPLE 680 00:26:39,910 --> 00:26:40,745 SEQUENCE ALIGNMENTS. 681 00:26:40,745 --> 00:26:44,281 ONE THING TO IMPROVE IT IS NOT 682 00:26:44,281 --> 00:26:48,919 ONLY TRAIN IT ON THE DOMINANT 683 00:26:48,919 --> 00:26:49,887 FOLD AND SEQUENCE AND SHALLOW 684 00:26:49,887 --> 00:26:53,290 SUB FAMILY ALIGNMENTS MY LAB 685 00:26:53,290 --> 00:26:54,592 HAVE SHOWN CONTAIN INFORMATION 686 00:26:54,592 --> 00:26:55,626 ABOUT THE ALTERNATIVE FOLD AND 687 00:26:55,626 --> 00:26:58,162 USE THAT TO ASSOCIATE WITH THE 688 00:26:58,162 --> 00:26:58,562 ALTERNATIVE FOLD. 
689 00:26:58,562 --> 00:27:01,132 SO THAT MAY BE A WAY TO IMPROVE 690 00:27:01,132 --> 00:27:02,600 ITS ACCURACY. 691 00:27:02,600 --> 00:27:04,468 >> SOUNDS LIKE SOME FINE 692 00:27:04,468 --> 00:27:05,970 TUNING IS NEEDED ON TOP OF THE 693 00:27:05,970 --> 00:27:06,337 MODEL LEARNING. 694 00:27:06,337 --> 00:27:08,439 >> YES. 695 00:27:08,439 --> 00:27:16,247 >> WE HAVE A QUESTION FROM THE 696 00:27:16,247 --> 00:27:16,747 AUDIENCE. 697 00:27:16,747 --> 00:27:19,450 ARE CHAPERONE PROTEINS INVOLVED 698 00:27:19,450 --> 00:27:21,218 IN THE FOLD SWITCHING PROTEINS? 699 00:27:21,218 --> 00:27:23,087 >> THAT'S A GREAT QUESTION. 700 00:27:23,087 --> 00:27:28,826 AS FAR AS I KNOW THE ANSWER IS 701 00:27:28,826 --> 00:27:29,326 NO. 702 00:27:29,326 --> 00:27:30,628 CERTAINLY THE EXAMPLES I 703 00:27:30,628 --> 00:27:33,764 FOCUSSED ON TODAY DO NOT REQUIRE 704 00:27:33,764 --> 00:27:34,064 SHAM ROANS. 705 00:27:34,064 --> 00:27:37,635 -- CHAPERONES. 706 00:27:37,635 --> 00:27:38,335 >> THANK YOU VERY MUCH FOR 707 00:27:38,335 --> 00:27:40,704 REPLYING TOE THAT QUESTION. 708 00:27:40,704 --> 00:27:41,672 -- TO THAT QUESTION. 709 00:27:41,672 --> 00:27:42,940 I DON'T SEE ANYMORE QUESTIONS 710 00:27:42,940 --> 00:27:43,808 FROM THE AUDIENCE. 711 00:27:43,808 --> 00:27:48,879 THANK YOU, LAUREN. 712 00:27:48,879 --> 00:27:52,016 THIS WAS INTERESTING. 713 00:27:52,016 --> 00:27:53,951 DOES ANYBODY ELSE HAVE QUESTIONS 714 00:27:53,951 --> 00:27:55,820 BEFORE WE SWITCH TO 715 00:27:55,820 --> 00:27:56,187 DR. OVCHARENKO? 716 00:27:56,187 --> 00:27:57,988 >> I HAVE TO PICK UP MY KIDS 717 00:27:57,988 --> 00:28:00,724 SOON SO THIS IS YOUR CHANCE. 718 00:28:00,724 --> 00:28:02,827 >> IF SOMEBODY HAS TO ASK MORE 719 00:28:02,827 --> 00:28:04,428 QUESTIONS, KEEP SENDING THEM AND 720 00:28:04,428 --> 00:28:06,197 HOPEFULLY I CAN FORWARD THEM TO 721 00:28:06,197 --> 00:28:07,231 DR. PORTER. 722 00:28:07,231 --> 00:28:10,434 I KNOW IT TAKES TIME.
723 00:28:10,434 --> 00:28:11,836 THE TALK IS SO RICH IN 724 00:28:11,836 --> 00:28:12,903 INFORMATION IT TAKES TIME TO 725 00:28:12,903 --> 00:28:14,405 COME UP WITH QUESTIONS SO IF WE 726 00:28:14,405 --> 00:28:15,873 GET MORE QUESTIONS I'LL FORWARD 727 00:28:15,873 --> 00:28:19,076 THEM TO LAUREN. 728 00:28:19,076 --> 00:28:20,711 THANK YOU SO MUCH FOR YOUR TIME. 729 00:28:20,711 --> 00:28:22,980 THIS IS EXTREMELY INTERESTING. 730 00:28:22,980 --> 00:28:24,448 >> THANK YOU. 731 00:28:24,448 --> 00:28:24,982 >> ALL RIGHT. 732 00:28:24,982 --> 00:28:29,520 SO NOW WE'RE SWITCHING GEARS AND 733 00:28:29,520 --> 00:28:32,490 READY FOR DR. OVCHARENKO. 734 00:28:32,490 --> 00:28:42,967 >> THANK YOU SO MUCH, LANA. 735 00:28:46,971 --> 00:28:48,873 CAN YOU SEE MY SCREEN? 736 00:28:48,873 --> 00:28:50,941 >> WE CAN. 737 00:28:50,941 --> 00:28:51,842 >> WONDERFUL. 738 00:28:51,842 --> 00:28:55,779 SO AFTER THIS FASCINATING TALK 739 00:28:55,779 --> 00:28:58,549 BY LAUREN ON PROTEIN FOLDS I 740 00:28:58,549 --> 00:29:02,119 WOULD LIKE TO SHIFT THE GEARS 741 00:29:02,119 --> 00:29:04,755 COMPLETELY AND MOVE AWAY FROM 742 00:29:04,755 --> 00:29:09,760 PROTEINS AND DIVE INTO THE DNA 743 00:29:09,760 --> 00:29:11,362 OF THE HUMAN GENOME AND TELL YOU 744 00:29:11,362 --> 00:29:13,063 ABOUT THE WORK OF MY LAB IN 745 00:29:13,063 --> 00:29:18,936 TRYING TO FIGURE OUT HOW WE CAN 746 00:29:18,936 --> 00:29:22,039 IDENTIFY REGULATORY VARIANTS 747 00:29:22,039 --> 00:29:23,507 USING DEEP LEARNING. 748 00:29:23,507 --> 00:29:28,746 OVER 20 YEARS AGO THE HUMAN 749 00:29:28,746 --> 00:29:34,785 GENOME WAS SEQUENCED WITH 750 00:29:34,785 --> 00:29:36,954 30,000 GENES AND WE CAN LOOK 751 00:29:36,954 --> 00:29:39,557 SPECIFICALLY AT THE DNA THAT'S 752 00:29:39,557 --> 00:29:41,592 NOT GENIC. 753 00:29:41,592 --> 00:29:44,929 AND THAT COMPRISES 98% OF THE 754 00:29:44,929 --> 00:29:46,597 SEQUENCE OF THE HUMAN GENOME.
755 00:29:46,597 --> 00:29:49,400 SINCE THE SEQUENCING OF THE 756 00:29:49,400 --> 00:29:52,202 HUMAN GENOME THERE HAVE BEEN A 757 00:29:52,202 --> 00:29:54,071 LOT OF SPECULATIONS THAT THESE 758 00:29:54,071 --> 00:29:56,574 ARE NOT PART OF THE GENOME COULD 759 00:29:56,574 --> 00:29:57,608 BE JUNK. 760 00:29:57,608 --> 00:30:02,913 YOU HAVE DNA, PSEUDO GENES, 761 00:30:02,913 --> 00:30:04,748 MICROSATELLITES, ETCETERA. 762 00:30:04,748 --> 00:30:08,152 AND 20 YEARS AGO MY GOOD FRIEND 763 00:30:08,152 --> 00:30:09,253 WHO TEACHES AT THE UNIVERSITY OF 764 00:30:09,253 --> 00:30:11,388 CHICAGO DECIDED TO DO A CRAZY 765 00:30:11,388 --> 00:30:11,689 EXPERIMENT. 766 00:30:11,689 --> 00:30:16,694 HE SAID WHY DON'T WE LOOK AT THE 767 00:30:16,694 --> 00:30:20,230 GENOME AND IDENTIFY THOSE MANY 768 00:30:20,230 --> 00:30:25,202 GENES NOT THE DNA AND CALL THEM 769 00:30:25,202 --> 00:30:28,405 GENE DESERTS AND TAKE THEM OUT 770 00:30:28,405 --> 00:30:29,340 COMPLETELY OF THE MOUSE GENOME 771 00:30:29,340 --> 00:30:33,277 AND DID THAT WITH TWO OF THE 772 00:30:33,277 --> 00:30:36,180 GENES AND CAME UP WITH 773 00:30:36,180 --> 00:30:40,751 ABSOLUTELY NORMAL VIABLE AND 774 00:30:40,751 --> 00:30:42,920 HEALTHY MICE SUGGESTING INDEED 775 00:30:42,920 --> 00:30:46,223 HUGE CHUNKS OF MOUSE WITH 776 00:30:46,223 --> 00:30:51,395 GENOMES ARE COMPLETELY 777 00:30:51,395 --> 00:31:00,304 DISPENSABLE AND 778 00:31:00,304 --> 00:31:01,205 [AUDIO DIGITIZING] 779 00:31:01,205 --> 00:31:03,841 AND LET'S LOOK AT THE DIFFERENT 780 00:31:03,841 --> 00:31:06,877 EXAMPLE AT LACTOSE INTOLERANCE. 781 00:31:06,877 --> 00:31:09,913 SO LACTOSE INTOLERANCE IS AN 782 00:31:09,913 --> 00:31:15,185 ACQUIRED TRAIT OF THE HUMAN 783 00:31:15,185 --> 00:31:17,254 POPULATION.
784 00:31:17,254 --> 00:31:21,392 LACTOSE TOLERANCE TRACES BACK 785 00:31:21,392 --> 00:31:25,596 INTO THE DOMESTICATION OF COWS 786 00:31:25,596 --> 00:31:26,897 AND MAINLY PRESENT IN THE 787 00:31:26,897 --> 00:31:29,166 EUROPEAN PART AND SOME PARTS OF 788 00:31:29,166 --> 00:31:33,837 AFRICA AND ASIA WHERE THE 789 00:31:33,837 --> 00:31:36,874 DOMESTICATION OF COWS TOOK 790 00:31:36,874 --> 00:31:37,307 PLACE. 791 00:31:37,307 --> 00:31:40,444 WE KNOW LACTOSE TOLERANCE AND 792 00:31:40,444 --> 00:31:42,246 INTOLERANCE, DEPENDING ON HOW 793 00:31:42,246 --> 00:31:44,381 YOU LOOK AT IT IS ASSOCIATED 794 00:31:44,381 --> 00:31:47,151 WITH A SINGLE NUCLEOTIDE 795 00:31:47,151 --> 00:31:55,225 MUTATION CLOSE TO THE LACTASE 796 00:31:55,225 --> 00:31:57,428 GENE AND HOW IS IT TINY CHANGES 797 00:31:57,428 --> 00:32:02,933 COULD HAVE PRONOUNCED IMPACT 798 00:32:02,933 --> 00:32:06,070 WHEN THEY'RE IN PARTS OF NON 799 00:32:06,070 --> 00:32:07,237 CODING PARTS OF THE GENOME. 800 00:32:07,237 --> 00:32:09,173 WE HAVE LEARNED RECENTLY THE 801 00:32:09,173 --> 00:32:11,175 MUTATION NOT ASSOCIATED WITH 802 00:32:11,175 --> 00:32:13,944 JUST LACTOSE INTOLERANCE, 803 00:32:13,944 --> 00:32:15,345 THEY'VE HAVE BEEN LINKED TO 804 00:32:15,345 --> 00:32:21,285 DOZENS AND DOZENS OF HUMAN 805 00:32:21,285 --> 00:32:22,853 DISEASES AND DISORDERS INCLUDING 806 00:32:22,853 --> 00:32:33,363 WHERE THERE'S A MUTATION OF A 807 00:32:34,832 --> 00:32:38,902 GAIN ALMOST IS PART OF ANOTHER 808 00:32:38,902 --> 00:32:41,038 GENE LEADS TO THESE DISEASES 809 00:32:41,038 --> 00:32:43,140 WITH QUITE DRASTIC PHENOTYPE. 810 00:32:43,140 --> 00:32:46,777 SINGLE NUCLEOTIDE CHANGES IN NON 811 00:32:46,777 --> 00:32:48,078 CODING DNA HAVE BEEN ASSOCIATED 812 00:32:48,078 --> 00:32:53,817 WITH PARKINSON'S, HEART 813 00:32:53,817 --> 00:32:57,755 DISEASES, TYPE II DIABETES AND A 814 00:32:57,755 --> 00:33:01,592 VARIETY OF CANCERS. 815 00:33:01,592 --> 00:33:02,593 ETCETERA. 
816 00:33:02,593 --> 00:33:06,230 AND IT COULD BE THAT PARTS OF 817 00:33:06,230 --> 00:33:09,533 THE NON CODING GENOME ARE 818 00:33:09,533 --> 00:33:10,834 DISPENSABLE YET THERE'S FRAGILE 819 00:33:10,834 --> 00:33:17,307 IN THE CODING PARTS OF OUR 820 00:33:17,307 --> 00:33:24,214 GENOME THAT ARE INDISPENSABLE 821 00:33:24,214 --> 00:33:26,917 AND LOOKING AT STUDIES YOU'LL 822 00:33:26,917 --> 00:33:30,354 SEE THAT ONLY 5% OF THESE 823 00:33:30,354 --> 00:33:33,557 VARIANTS ARE CODING AND 95% 824 00:33:33,557 --> 00:33:41,064 OF THEM ARE NON-CODING. 825 00:33:41,064 --> 00:33:46,770 SUGGESTING THERE'S QUITE A FEW 826 00:33:46,770 --> 00:33:52,242 OF VARIANTS ASSOCIATED WITH THE 827 00:33:52,242 --> 00:33:52,543 DISEASES. 828 00:33:52,543 --> 00:34:02,386 ONE IS A MUTATION IN A DISTAL 829 00:34:02,386 --> 00:34:07,457 ENHANCER OF GENE LINKED TO 830 00:34:07,457 --> 00:34:12,029 PARKINSON'S AND IT DISRUPTS THE 831 00:34:12,029 --> 00:34:18,936 COMPLEX TO THE DISTAL ENHANCER 832 00:34:18,936 --> 00:34:24,808 AND RESULTS IN THE HIGHER LEVEL 833 00:34:24,808 --> 00:34:34,685 OF THE SYNUCLEIN EXPRESSION. 834 00:34:34,685 --> 00:34:36,353 AND I'D LIKE TO TELL YOU NOT 835 00:34:36,353 --> 00:34:40,057 ABOUT THE NON-CODING VARIANTS 836 00:34:40,057 --> 00:34:41,992 BUT THE COMPUTATIONAL APPROACH 837 00:34:41,992 --> 00:34:45,362 WE USED TO PREDICT THE PART OF 838 00:34:45,362 --> 00:34:50,500 THE NON-CODING VARIANTS ON HUMAN 839 00:34:50,500 --> 00:34:50,767 GENOTYPES. 840 00:34:50,767 --> 00:34:52,302 IN PARTICULAR INTERESTED IN 841 00:34:52,302 --> 00:34:53,670 FIGURING WHAT IS THE CAUSAL 842 00:34:53,670 --> 00:34:58,275 VARIANTS IN THE WHOLE POOL OF 843 00:34:58,275 --> 00:35:00,077 ASSOCIATED GWAS VARIANTS.
844 00:35:00,077 --> 00:35:02,946 WE USED DEEP LEARNING AND 845 00:35:02,946 --> 00:35:06,683 REGULATORY GENOMICS THAT 846 00:35:06,683 --> 00:35:08,685 ENCOMPASSES OTHER DATA SETS AS 847 00:35:08,685 --> 00:35:14,524 WELL AS WE RELY ON EXPERIMENTAL 848 00:35:14,524 --> 00:35:16,360 VALIDATION DONE BY OTHER 849 00:35:16,360 --> 00:35:16,760 INSTITUTES. 850 00:35:16,760 --> 00:35:19,496 SO FAR WE HAVE BEEN ABLE TO 851 00:35:19,496 --> 00:35:21,665 DEMONSTRATE THAT THIS METHOD CAN 852 00:35:21,665 --> 00:35:23,333 PREDICT NON CODING VARIANTS THAT 853 00:35:23,333 --> 00:35:26,937 ARE LINKED TO AUTISM, TYPE II 854 00:35:26,937 --> 00:35:33,443 DIABETES AND LYMPHOMA AND I'LL 855 00:35:33,443 --> 00:35:35,445 TOUCH A LITTLE BIT ON TYPE II 856 00:35:35,445 --> 00:35:35,712 DIABETES. 857 00:35:35,712 --> 00:35:37,281 WHAT DOES THIS METHOD LOOK LIKE? 858 00:35:37,281 --> 00:35:40,384 I'M SURE THE GROUP IS VERY 859 00:35:40,384 --> 00:35:45,622 FAMILIAR WITH THE DEEP LEARNING 860 00:35:45,622 --> 00:35:50,928 METHODS AND THIS IS THE NEURAL 861 00:35:50,928 --> 00:35:51,495 NETWORK METHOD. 862 00:35:51,495 --> 00:35:53,297 WE USE TWO PHASES. 863 00:35:53,297 --> 00:35:54,564 FIRST PHASE WE TAKE THE SEQUENCE 864 00:35:54,564 --> 00:36:03,674 OF THE HUMAN GENOME REVIEW THE 865 00:36:03,674 --> 00:36:09,179 MODELS INTO MOTIFS AND TRY TO 866 00:36:09,179 --> 00:36:12,316 PREDICT THE SIGNALS IN THE HUMAN 867 00:36:12,316 --> 00:36:16,787 GENOME AS RECORDED BY THOUSANDS 868 00:36:16,787 --> 00:36:17,688 OF EXPERIMENTS. 869 00:36:17,688 --> 00:36:22,225 WE TRY TO PREDICT WHERE 870 00:36:22,225 --> 00:36:23,427 TRANSCRIPTION FACTORS ARE 871 00:36:23,427 --> 00:36:25,562 BINDING AND CELL LINES AND WHAT 872 00:36:25,562 --> 00:36:30,600 IS THE CHROMATIN STATE. 873 00:36:30,600 --> 00:36:37,140 THEN WE TAKE THAT INFORMATION 874 00:36:37,140 --> 00:36:40,644 AND VIEW THE SECOND MODEL TRICK 875 00:36:40,644 --> 00:36:42,179 TO PREDICT REGULATORY ELEMENTS.
876 00:36:42,179 --> 00:36:43,880 THOSE ARE FUNCTIONAL BLOCKS IN 877 00:36:43,880 --> 00:36:51,688 THE NON-CODING THAT CONTROL GENE 878 00:36:51,688 --> 00:36:53,991 EXPRESSION AND REGULATE HOW AND 879 00:36:53,991 --> 00:36:56,660 WHEN DIFFERENT GENES GET 880 00:36:56,660 --> 00:36:56,927 EXPRESSED. 881 00:36:56,927 --> 00:37:03,166 WE CALL IT A TWO-PHASE ENHANCER 882 00:37:03,166 --> 00:37:10,407 MODEL AND NAME IT. 883 00:37:10,407 --> 00:37:16,146 HOW DOES IT LOOK WHEN WE APPLY 884 00:37:16,146 --> 00:37:16,680 THE MODEL? 885 00:37:16,680 --> 00:37:20,484 WE GET VALUES THAT ARE CLOSE TO 886 00:37:20,484 --> 00:37:25,655 90%, 95% FOR REGIONS OF OPEN 887 00:37:25,655 --> 00:37:27,858 CHROMATIN AND HISTONE MARKS AND 888 00:37:27,858 --> 00:37:30,293 TRANSCRIPTION FACTORS AND THAT'S 889 00:37:30,293 --> 00:37:30,494 FINE. 890 00:37:30,494 --> 00:37:33,363 WE GET OVER ALL ON THE CURVE OF 891 00:37:33,363 --> 00:37:34,531 AROUND 85% AND THAT WAS THREE 892 00:37:34,531 --> 00:37:36,733 YEARS AGO AND NOW WE GET 95% AND 893 00:37:36,733 --> 00:37:38,935 IT DOESN'T MATTER WHAT THE 894 00:37:38,935 --> 00:37:46,910 NUMBER IS. 895 00:37:46,910 --> 00:37:49,880 DEEP LEARNING IS SO MUCH BETTER 896 00:37:49,880 --> 00:37:52,582 AND CLUSTERS AROUND 77% AND GOOD 897 00:37:52,582 --> 00:37:55,519 FOR OUR GROUP BECAUSE WE HAVE 898 00:37:55,519 --> 00:37:58,121 BEEN FIGHTING THE INACCURACY OF 899 00:37:58,121 --> 00:38:03,260 PREDICTIONS FOR DECADES AND WHEN 900 00:38:03,260 --> 00:38:05,929 WE GOT OUR HANDS ON DEEP 901 00:38:05,929 --> 00:38:07,030 LEARNING WE WERE QUITE SURPRISED 902 00:38:07,030 --> 00:38:09,533 BY THE ACCURACY OF THE 903 00:38:09,533 --> 00:38:10,200 PREDICTIONS THAT COULD BE MADE 904 00:38:10,200 --> 00:38:14,104 BY THE METHOD THAT IS SIMPLY A 905 00:38:14,104 --> 00:38:14,938 BLACK BOX. 
906 00:38:14,938 --> 00:38:17,007 YOU PUT THE GENOME IN AND 907 00:38:17,007 --> 00:38:18,542 REGULATORY INFORMATION IN AND IT 908 00:38:18,542 --> 00:38:19,109 STARTS MAKING ACCURATE 909 00:38:19,109 --> 00:38:23,914 PREDICTIONS. 910 00:38:23,914 --> 00:38:26,917 WE SAW IT TAKES TIME TO TRAIN 911 00:38:26,917 --> 00:38:29,786 THE MODEL AND WE TRIED TO LOOK 912 00:38:29,786 --> 00:38:32,022 AT DIFFERENT OPTIMIZERS AND WHAT 913 00:38:32,022 --> 00:38:34,458 WE SAW, YOU CAN SEE IT FROM THE 914 00:38:34,458 --> 00:38:40,797 ANIMATION ON THE RIGHT IS THAT 915 00:38:40,797 --> 00:38:41,665 DIFFERENT OPTIMIZERS BEHAVE 916 00:38:41,665 --> 00:38:42,933 QUITE DIFFERENTLY. 917 00:38:42,933 --> 00:38:46,303 IT HAS BEEN TAKING SO MUCH TIME 918 00:38:46,303 --> 00:38:50,941 FOR US TO TRAIN -- FOR US TO 919 00:38:50,941 --> 00:38:57,147 MODEL ON GOOGLE CLOUD AND 920 00:38:57,147 --> 00:39:00,917 BIOWULF AND THERE'S APPLICABLE 921 00:39:00,917 --> 00:39:02,853 TO THE ENHANCERS AND WORKS QUITE 922 00:39:02,853 --> 00:39:04,821 MAGICALLY AND QUITE FAST IN OUR 923 00:39:04,821 --> 00:39:05,021 HANDS. 924 00:39:05,021 --> 00:39:06,189 OKAY. 925 00:39:06,189 --> 00:39:08,625 SO WE CAN PREDICT THE REGULATORY 926 00:39:08,625 --> 00:39:08,959 ELEMENTS. 927 00:39:08,959 --> 00:39:15,699 WHAT WE CAN DO, WE CAN PERFORM 928 00:39:15,699 --> 00:39:18,401 IN SILICO MUTAGENESIS AND WE CAN 929 00:39:18,401 --> 00:39:19,903 LOOK AT ANY SEQUENCE IN THE 930 00:39:19,903 --> 00:39:21,138 HUMAN GENOME AND GET A SCORE OF 931 00:39:21,138 --> 00:39:24,307 THE SEQUENCE OF HOW LIKELY IT IS 932 00:39:24,307 --> 00:39:26,710 TO ACT AS AN ENHANCER AND START 933 00:39:26,710 --> 00:39:32,115 CHANGING EVERY NUCLEOTIDE AND 934 00:39:32,115 --> 00:39:34,417 SCORE IN EACH VARIANT CHANGE AND 935 00:39:34,417 --> 00:39:40,190 SEE IF THERE IS ONE THAT WOULD 936 00:39:40,190 --> 00:39:40,991 ENHANCE ACTIVITY.
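[EDITOR'S ILLUSTRATION, NOT PART OF THE SPOKEN TALK] The in silico mutagenesis procedure described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the speaker's implementation: `toy_enhancer_score` is a hypothetical stand-in (GC fraction) for the trained deep learning model, the function names are invented for illustration, and the sign convention (mutant score minus reference score) is an assumption that may differ from the one used in the talk.

```python
from typing import Callable, Dict, Tuple

NUCLEOTIDES = "ACGT"


def toy_enhancer_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: returns the G/C fraction.

    A real workflow would call the trained network here instead.
    """
    return sum(base in "GC" for base in seq) / len(seq)


def in_silico_mutagenesis(
    seq: str, score_fn: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Score every single-nucleotide substitution of `seq`.

    Returns a mapping (position, alternate base) -> score change,
    where the change is mutant score minus reference score (assumed convention).
    """
    reference_score = score_fn(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, ref_base in enumerate(seq):
        for alt_base in NUCLEOTIDES:
            if alt_base == ref_base:
                continue  # skip the unchanged base
            mutant = seq[:pos] + alt_base + seq[pos + 1:]
            effects[(pos, alt_base)] = score_fn(mutant) - reference_score
    return effects


if __name__ == "__main__":
    effects = in_silico_mutagenesis("ACGT", toy_enhancer_score)
    # Rank substitutions by the size of their predicted effect.
    for (pos, alt), delta in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
        print(pos, alt, round(delta, 3))
```

Plotting the per-position effects along a sequence would give the kind of peaks-and-valleys profile discussed in the talk; with a real model the scoring call dominates the cost, since a sequence of length N needs 3N mutant evaluations.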
937 00:39:40,991 --> 00:39:44,461 WHAT YOU SEE IS WE CANNOT APPLY 938 00:39:44,461 --> 00:39:45,862 EVERY SINGLE NUCLEOTIDE AND FOR 939 00:39:45,862 --> 00:39:50,433 SOME OF THEM WE MIGHT GET 940 00:39:50,433 --> 00:39:51,935 STRONGER EFFECT AND FOR SOME WE 941 00:39:51,935 --> 00:39:53,637 MAY GET LOWER EFFECT. 942 00:39:53,637 --> 00:39:55,405 WHAT I WOULD LIKE YOU TO KEEP IN 943 00:39:55,405 --> 00:39:58,408 MIND IS POSITIVE CHANGES IS A 944 00:39:58,408 --> 00:40:02,913 STRONGER NEGATIVE IMPACT ON THE 945 00:40:02,913 --> 00:40:03,313 ENHANCER ACTIVITY. 946 00:40:03,313 --> 00:40:05,382 WHEN WE APPLY THE MODEL TO 947 00:40:05,382 --> 00:40:06,750 BINDING OF TRANSCRIPTION 948 00:40:06,750 --> 00:40:12,489 FACTORS, WHAT WE SAW IS THAT WE 949 00:40:12,489 --> 00:40:16,826 CAN PREDICT BINDING OF ALLELE 950 00:40:16,826 --> 00:40:17,761 TRANSCRIPTION FACTORS WITH 951 00:40:17,761 --> 00:40:18,295 PRECISION. 952 00:40:18,295 --> 00:40:19,863 SOMETHING THAT HAS NOT BEEN 953 00:40:19,863 --> 00:40:22,866 POSSIBLE TO DO UNTIL THE 954 00:40:22,866 --> 00:40:26,036 IMPLEMENTATION OF DEEP LEARNING. 955 00:40:26,036 --> 00:40:28,271 THIS IS WHAT WE GET WHEN WE 956 00:40:28,271 --> 00:40:30,273 PROFILE INDIVIDUAL ENHANCERS. 957 00:40:30,273 --> 00:40:34,844 IT SUGGESTS RANDOM ENHANCER THAT 958 00:40:34,844 --> 00:40:37,614 YOU SEE OVER HERE AND YOU SEE 959 00:40:37,614 --> 00:40:39,249 PEAKS AND VALLEYS. 960 00:40:39,249 --> 00:40:42,385 THESE ARE THE MEASURES OF THE 961 00:40:42,385 --> 00:40:43,987 MUTATIONAL IMPACT ON THE 962 00:40:43,987 --> 00:40:46,823 ACTIVITY OF THIS PARTICULAR 963 00:40:46,823 --> 00:40:47,157 ENHANCER.
964 00:40:47,157 --> 00:40:48,491 WHAT'S INTERESTING IS THAT WHEN 965 00:40:48,491 --> 00:40:52,462 WE OVERLAY THAT WITH WHAT'S 966 00:40:52,462 --> 00:40:56,333 KNOWN ABOUT BINDING OF SPECIFIC 967 00:40:56,333 --> 00:40:57,033 TRANSCRIPTION FACTORS IT'S 968 00:40:57,033 --> 00:41:02,806 ALMOST IDENTICALLY PREDICTS TWO 969 00:41:02,806 --> 00:41:09,045 BINDING SITES OF THE HNF 4A AND 970 00:41:09,045 --> 00:41:12,882 WE CAN RUN ANOTHER DEEP LEARNING 971 00:41:12,882 --> 00:41:16,653 MODEL THAT ANALYZES THE ENHANCER 972 00:41:16,653 --> 00:41:20,123 PROFILES AND CAN PREDICT TWO 973 00:41:20,123 --> 00:41:21,391 ADDITIONAL TRANSCRIPTION 974 00:41:21,391 --> 00:41:21,625 FACTORS. 975 00:41:21,625 --> 00:41:25,996 WE PREDICT WHAT IS KNOWN AND 976 00:41:25,996 --> 00:41:26,296 WHAT'S NEW. 977 00:41:26,296 --> 00:41:31,768 THAT'S FINE. 978 00:41:31,768 --> 00:41:34,638 HOW DOES IT SCORE? 979 00:41:34,638 --> 00:41:36,606 SO WE DON'T CARE THAT MUCH ABOUT 980 00:41:36,606 --> 00:41:39,342 PREDICTION OF ENHANCERS. 981 00:41:39,342 --> 00:41:42,612 WE CARE ABOUT THE METHOD TO 982 00:41:42,612 --> 00:41:45,215 PREDICT THIS MUTATION WITHIN THE 983 00:41:45,215 --> 00:41:46,850 REGULATORY DNA. 
984 00:41:46,850 --> 00:41:48,752 THERE'S A HUGE EXPERIMENTAL DATA 985 00:41:48,752 --> 00:41:51,521 SET, TENS OF THOUSANDS OF 986 00:41:51,521 --> 00:41:52,822 VARIANTS THAT CHANGE CANCER 987 00:41:52,822 --> 00:41:58,194 ACTIVITY IN THE HUMAN GENOME AND 988 00:41:58,194 --> 00:42:08,505 WE COMPARED AND SAW THE DENSITY 989 00:42:08,505 --> 00:42:11,441 OF THE VALIDATE THE PREDICTS IN 990 00:42:11,441 --> 00:42:16,846 WHAT WE IDENTIFY IN BINDING 991 00:42:16,846 --> 00:42:23,586 SITES OF THE ENHANCERS IN THE 992 00:42:23,586 --> 00:42:26,923 FUNCTIONAL VARIANTS AND PREDICT 993 00:42:26,923 --> 00:42:27,891 THE BINDING SITES SO MUCH CANCER 994 00:42:27,891 --> 00:42:30,927 HOWEVER, WE ASKED THE QUESTION, 995 00:42:30,927 --> 00:42:34,364 WHAT ABOUT THE SCORE OF OUR 996 00:42:34,364 --> 00:42:34,898 METHOD? 997 00:42:34,898 --> 00:42:36,333 IF WE DELINEATE EVERYTHING BY 998 00:42:36,333 --> 00:42:41,805 THE SCORE AND SEE IF THE SCORE 999 00:42:41,805 --> 00:42:47,277 OF THE PREDICTIONS LEADS WE SAW 1000 00:42:47,277 --> 00:42:54,117 INDEED THE DAMAGING EFFECT THE 1001 00:42:54,117 --> 00:42:54,918 GREATER THE EXPERIMENTAL SIGNAL 1002 00:42:54,918 --> 00:42:58,154 NOT ONLY THE METHOD TO PREDICT 1003 00:42:58,154 --> 00:42:59,823 BUT THE SCORING CAPABILITIES CAN 1004 00:42:59,823 --> 00:43:02,926 BE VALIDATED USING EXPERIMENTAL 1005 00:43:02,926 --> 00:43:11,267 DATA. 1006 00:43:11,267 --> 00:43:14,637 WE CAN PREDICT EFFECT OF 1007 00:43:14,637 --> 00:43:16,706 ENHANCER GENE BUT WHAT MAY BE 1008 00:43:16,706 --> 00:43:21,845 MORE INTERESTING IS WE CAN 1009 00:43:21,845 --> 00:43:24,280 PREDICT AND POSITIVE SCORING, 1010 00:43:24,280 --> 00:43:25,382 NEGATIVE ACTIVITY AND EFFECTS ON 1011 00:43:25,382 --> 00:43:28,251 THE ENHANCER ACTIVITY AND 1012 00:43:28,251 --> 00:43:29,953 SCORES, POSITIVE EFFECTS ON THE 1013 00:43:29,953 --> 00:43:33,089 ENHANCER ACTIVITY. 
1014 00:43:33,089 --> 00:43:34,791 WE CAN PREDICT WHAT IS 1015 00:43:34,791 --> 00:43:37,694 SUBJECTIVE OF THE ENHANCER 1016 00:43:37,694 --> 00:43:43,700 ACTIVITY AND THAT WE'VE SEEN SO 1017 00:43:43,700 --> 00:43:46,836 OFTEN WE GET TONS AND IT HAS 1018 00:43:46,836 --> 00:43:53,977 BEEN CRAZY AND HAS BEEN 1019 00:43:53,977 --> 00:43:58,081 DIFFICULT UNTIL OUR COLLEAGUES 1020 00:43:58,081 --> 00:44:00,316 DEMONSTRATED OUR REGULATORY 1021 00:44:00,316 --> 00:44:01,985 AMOUNTS ARE OPEN TO INNOVATION 1022 00:44:01,985 --> 00:44:08,591 WHEN THEY TESTED ENHANCERS AND 1023 00:44:08,591 --> 00:44:16,099 THEY'VE DONE THIS IN MICE 1024 00:44:16,099 --> 00:44:17,600 THEY'VE BEEN ABLE TO SHOW THEY 1025 00:44:17,600 --> 00:44:19,235 DO NOT DESTROY ENHANCERS. 1026 00:44:19,235 --> 00:44:22,105 THEY STRENGTHEN OR CREATE NEW 1027 00:44:22,105 --> 00:44:26,276 ENHANCERS FROM THE PREVIOUS 1028 00:44:26,276 --> 00:44:27,277 ENHANCER TEMPLATES AND WE CAN 1029 00:44:27,277 --> 00:44:33,082 HAVE THE SAME EFFECT USING OUR 1030 00:44:33,082 --> 00:44:34,818 DEEP LEARNING MODEL. 1031 00:44:34,818 --> 00:44:39,989 WE JOINED WITH THE GENOME 1032 00:44:39,989 --> 00:44:41,658 INSTITUTE AND DECIDED TO LOOK AT 1033 00:44:41,658 --> 00:44:45,795 THE DISEASE CAUSAL MUTATIONS IN 1034 00:44:45,795 --> 00:44:48,231 TYPE II DIABETES. 1035 00:44:48,231 --> 00:44:51,668 THERE WAS A COLLECTION OF 1036 00:44:51,668 --> 00:44:55,438 ENHANCERS IN OUR METHOD AND 1037 00:44:55,438 --> 00:45:00,944 TRAIN THE MODEL AND START 1038 00:45:00,944 --> 00:45:01,444 PREDICTING 1039 00:45:01,444 --> 00:45:02,912 NOT ONLY ENHANCERS 1040 00:45:02,912 --> 00:45:04,848 BUT MUTATIONS LIKELY TO DISRUPT 1041 00:45:04,848 --> 00:45:09,319 THE ENHANCERS. 1042 00:45:09,319 --> 00:45:19,796 WE USED THAT DATA AND IT'S 1043 00:45:37,180 --> 00:45:39,015 LIKELY TO BE PREDICTED AS SUCH 1044 00:45:39,015 --> 00:45:46,022 FROM THE DATA.
1045 00:45:46,022 --> 00:45:48,458 I WOULD LIKE TO ASK YOU LOOK AT 1046 00:45:48,458 --> 00:45:49,926 THE PROFILES OF ENHANCERS AND 1047 00:45:49,926 --> 00:45:52,128 WHAT YOU SEE, YOU SEE THE SAME 1048 00:45:52,128 --> 00:45:56,566 PEAKS AND VALLEYS AND YOU SEE 1049 00:45:56,566 --> 00:45:57,700 TWO SPECIFIC NARROW PEAKS WITHIN 1050 00:45:57,700 --> 00:46:00,570 THE SEQUENCE OF THE ENHANCER 1051 00:46:00,570 --> 00:46:03,306 AND DEPICTED BY THE 1052 00:46:03,306 --> 00:46:03,673 EXPERIMENTATION. 1053 00:46:03,673 --> 00:46:06,709 THERE ARE TWO NUCLEOTIDES IF 1054 00:46:06,709 --> 00:46:08,978 TOUCHED WILL DESTROY THE 1055 00:46:08,978 --> 00:46:09,279 ENHANCER. 1056 00:46:09,279 --> 00:46:14,784 IF YOU LOOK UNDER, THEY'RE OUR 1057 00:46:14,784 --> 00:46:17,053 COMPUTATIONAL PREDICTIONS AND ALIGN 1058 00:46:17,053 --> 00:46:19,188 WITH EXPERIMENTAL DATA AND HOW 1059 00:46:19,188 --> 00:46:22,825 OUR METHOD PREDICTS THE CAUSAL AND 1060 00:46:22,825 --> 00:46:23,927 DISRUPTIVE VARIANTS. 1061 00:46:23,927 --> 00:46:25,962 OKAY, WE CAN COMPARE OUR METHOD 1062 00:46:25,962 --> 00:46:27,297 TO OTHER METHODS AVAILABLE FOR 1063 00:46:27,297 --> 00:46:30,934 SIMILAR TASK AND WHAT WE'VE SEEN 1064 00:46:30,934 --> 00:46:33,036 ACROSS DIFFERENT CELL LINES AND 1065 00:46:33,036 --> 00:46:36,839 CONDITIONS OUR METHOD PERFORMS 1066 00:46:36,839 --> 00:46:40,743 AS GOOD AS OTHER METHODS OR 1067 00:46:40,743 --> 00:46:42,512 BETTER THAN WHAT'S AVAILABLE. 1068 00:46:42,512 --> 00:46:47,850 HOW CAN WE APPLY TO TYPE II 1069 00:46:47,850 --> 00:46:49,652 DIABETES? 1070 00:46:49,652 --> 00:46:51,487 AND IN PARTICULAR, HOW CAN WE 1071 00:46:51,487 --> 00:46:58,828 HELP THE COMMUNITY TO RESOLVE 1072 00:46:58,828 --> 00:47:03,633 THE UNCERTAINTY OF THE GWAS 1073 00:47:03,633 --> 00:47:04,367 STUDIES?
1074 00:47:04,367 --> 00:47:08,271 IF YOU START WITH DIABETES GWAS 1075 00:47:08,271 --> 00:47:10,840 THERE'S 400 SOME SNPs AND THAT 1076 00:47:10,840 --> 00:47:14,043 WOULD BE FINE BUT YOU HAVE TO 1077 00:47:14,043 --> 00:47:18,848 HAVE THE STRUCTURE OF THE HUMAN 1078 00:47:18,848 --> 00:47:20,283 GENOME INTO THE SNPs AND GOOD 1079 00:47:20,283 --> 00:47:22,852 LUCK TRYING TO FIGURE OUT WHICH 1080 00:47:22,852 --> 00:47:33,196 ONES ARE NOT SIMPLE. 1081 00:47:37,600 --> 00:47:43,706 AND LOOK AT THE SNPs TO 50 AND 1082 00:47:43,706 --> 00:47:44,707 WE CAN MAKE TYPE II DIABETES 1083 00:47:44,707 --> 00:47:47,844 MORE FEASIBLE. 1084 00:47:47,844 --> 00:47:52,315 AND IN THE CASE OF THE RNF6 GENE 1085 00:47:52,315 --> 00:47:57,253 WHAT IS KNOWN IS UPSTREAM OF 1086 00:47:57,253 --> 00:48:01,090 THE GENE THERE'S AN LD BLOCK OF SNPs THAT 1087 00:48:01,090 --> 00:48:06,929 ALL LOOK IDENTICAL AND THEY ALL 1088 00:48:06,929 --> 00:48:07,997 LOOK AS ASSOCIATED WITH THE DISEASE 1089 00:48:07,997 --> 00:48:09,532 AS THEIR NEIGHBORS. 1090 00:48:09,532 --> 00:48:13,469 HOW CAN YOU LOOK AT THIS 1091 00:48:13,469 --> 00:48:14,837 DUPLICATE AND FIGURE OUT WHAT IS 1092 00:48:14,837 --> 00:48:19,676 THE CAUSE OF THE DISEASES? 1093 00:48:19,676 --> 00:48:21,678 WE APPLIED OUR METHOD ONLY ONCE 1094 00:48:21,678 --> 00:48:26,315 TO THAT AND LUCKILY THIS SNP 1095 00:48:26,315 --> 00:48:27,250 HAS BEEN EXPERIMENTALLY 1096 00:48:27,250 --> 00:48:31,688 DEMONSTRATED TO BE THE CAUSAL 1097 00:48:31,688 --> 00:48:36,159 VARIANT OF THIS AND IT'S NOT 1098 00:48:36,159 --> 00:48:40,863 THAT INTERESTING. HOWEVER, WHAT 1099 00:48:40,863 --> 00:48:49,305 WE CAN DO THE GENES LEAD TO 1100 00:48:49,305 --> 00:48:51,207 DISEASES AND WHEN WE LOOKED AT 1101 00:48:51,207 --> 00:48:52,642 SOME THAT'S ONE OF THE EXAMPLES. 1102 00:48:52,642 --> 00:48:58,481 IF YOU LOOK AT GWAS YOU SEE TONS. 1103 00:48:58,481 --> 00:49:00,316 THAT'S THE BOTTOM PAIR.
1104 00:49:00,316 --> 00:49:02,618 IF YOU LOOK AT THE GLUCOSE YOU 1105 00:49:02,618 --> 00:49:05,988 GET FEWER VARIANTS BUT YOU STILL 1106 00:49:05,988 --> 00:49:08,324 GET QUITE A FEW ONLY A DOZEN 1107 00:49:08,324 --> 00:49:09,425 ASSOCIATED WITH THE DISEASE. 1108 00:49:09,425 --> 00:49:11,794 HOWEVER, WHEN YOU APPLY THE 1109 00:49:11,794 --> 00:49:14,197 METHOD YOU ONLY GET TWO AND ONE 1110 00:49:14,197 --> 00:49:15,231 OF THEM SEEMS TO BE A MUCH 1111 00:49:15,231 --> 00:49:15,865 STRONGER CANDIDATE THAN THE 1112 00:49:15,865 --> 00:49:17,400 OTHER ONE. 1113 00:49:17,400 --> 00:49:22,839 AND THIS IS THE WORK OF OUR 1114 00:49:22,839 --> 00:49:23,973 COLLEAGUE FROM THE JACKSON LAB 1115 00:49:23,973 --> 00:49:26,309 AND THEY HAVE BEEN ABLE TO 1116 00:49:26,309 --> 00:49:27,977 LOOK AT OUR PREDICTIONS WHERE WE 1117 00:49:27,977 --> 00:49:31,881 SAID, OKAY, IT LOOKS LIKE THERE 1118 00:49:31,881 --> 00:49:34,217 ARE THREE STRONG ASSOCIATIONS 1119 00:49:34,217 --> 00:49:36,919 AND WE PREDICT THERE'S ONLY ONE 1120 00:49:36,919 --> 00:49:41,524 IN THE MIDDLE THAT'S THE DISEASE 1121 00:49:41,524 --> 00:49:42,492 CAUSAL OR PHENOTYPE CAUSAL 1122 00:49:42,492 --> 00:49:46,896 ALLELE AND THEY'VE BEEN ABLE TO 1123 00:49:46,896 --> 00:49:51,968 CONFIRM JUST THAT USING ASSAYS 1124 00:49:51,968 --> 00:49:55,004 AS WELL AS THE REPORTER AND 1125 00:49:55,004 --> 00:49:57,240 DEMONSTRATING THAT THE CHANGE WE 1126 00:49:57,240 --> 00:50:00,176 PREDICT AS CAUSAL HAS A 1127 00:50:00,176 --> 00:50:02,912 DRASTIC EFFECT ON THE GENE 1128 00:50:02,912 --> 00:50:05,948 EXPRESSION. 1129 00:50:05,948 --> 00:50:09,418 SO THAT'S GREAT, THAT'S FINE. 1130 00:50:09,418 --> 00:50:11,287 FOR US WE WERE HAPPY ABOUT THAT 1131 00:50:11,287 --> 00:50:13,856 AND THE PAPER WAS PUBLISHED 1132 00:50:13,856 --> 00:50:16,225 LAST YEAR BUT THIS IS THE GROUP 1133 00:50:16,225 --> 00:50:20,496 THAT MOST LIKELY USES MUCH MORE 1134 00:50:20,496 --> 00:50:22,064 ADVANCED METHODS THAN CNNs.
1135 00:50:22,064 --> 00:50:27,170 I HAVE BEEN TALKING ABOUT CNNs 1136 00:50:27,170 --> 00:50:31,808 AND WE ASK THE SAME QUESTION. 1137 00:50:31,808 --> 00:50:35,945 SO, IS IT REASONABLE TO ASSUME 1138 00:50:35,945 --> 00:50:41,017 THAT IN THIS ERA OF ADVANCED A.I., 1139 00:50:41,017 --> 00:50:46,122 CNNs STILL MAKE SENSE? 1140 00:50:46,122 --> 00:50:50,960 AND AT LEAST WE HAVE THE 1141 00:50:50,960 --> 00:50:52,862 TRANSFORMER MODELS, THEY SEEM TO 1142 00:50:52,862 --> 00:50:54,530 BE MAKING A BREAKTHROUGH AND WE 1143 00:50:54,530 --> 00:50:56,766 ASKED THE SAME QUESTION, CAN WE 1144 00:50:56,766 --> 00:51:01,771 TAKE THESE MORE ADVANCED MODELS 1145 00:51:01,771 --> 00:51:02,872 AND APPLY THEM TO GENES AND 1146 00:51:02,872 --> 00:51:04,907 THERE HAVE BEEN TWO AREAS OF 1147 00:51:04,907 --> 00:51:06,542 RESEARCH IN THE REGULATORY 1148 00:51:06,542 --> 00:51:12,915 GENOMICS OR GENOMICS FIELD THAT 1149 00:51:12,915 --> 00:51:13,683 HAVE EMERGED AND THEY'RE BASED 1150 00:51:13,683 --> 00:51:15,351 ON TRANSFORMER MODELS. 1151 00:51:15,351 --> 00:51:18,187 ONE IS THE INDUSTRY-DEVELOPED 1152 00:51:18,187 --> 00:51:24,293 MODEL CALLED NUCLEOTIDE 1153 00:51:24,293 --> 00:51:27,430 TRANSFORMER AND IT HAS USED 1154 00:51:27,430 --> 00:51:30,633 DOZENS OF GPUs, VERY EXPENSIVE 1155 00:51:30,633 --> 00:51:39,575 TO TRAIN AND THE OTHER IS DNA 1156 00:51:39,575 --> 00:51:41,177 AND THERE ARE MODELS THAT MIX 1157 00:51:41,177 --> 00:51:43,946 TRANSFORMERS WITH CNNs. 1158 00:51:43,946 --> 00:51:45,815 WE TOOK QUITE A FEW OF THESE 1159 00:51:45,815 --> 00:51:46,048 MODELS.
1160 00:51:46,048 --> 00:51:48,317 WE TOOK VARIANTS OF THE MODELS 1161 00:51:48,317 --> 00:51:52,321 AND WHAT WE DECIDED TO DO, WE 1162 00:51:52,321 --> 00:51:56,192 DECIDED TO LOOK AT THE 1163 00:51:56,192 --> 00:51:58,194 EXPERIMENTAL DATA ON THE PART OF 1164 00:51:58,194 --> 00:52:00,296 REGULATORY VARIANTS AND COMPARE 1165 00:52:00,296 --> 00:52:04,867 THE PREDICTIVE ABILITY WITH 1166 00:52:04,867 --> 00:52:15,111 OTHER MODELS. 1167 00:52:15,678 --> 00:52:18,281 YOU SEE THEY'RE COLOR CODED AND 1168 00:52:18,281 --> 00:52:21,584 THEY CORRESPOND TO HYBRID OR 1169 00:52:21,584 --> 00:52:23,986 TRANSFORMER MODELS IN BLUE AND, 1170 00:52:23,986 --> 00:52:27,556 SURPRISINGLY, CNNs WORK SO MUCH 1171 00:52:27,556 --> 00:52:27,790 BETTER. 1172 00:52:27,790 --> 00:52:33,629 WE HAVE NO EXPLANATION FOR THIS 1173 00:52:33,629 --> 00:52:44,106 BUT WE CAN USE CNNs. 1174 00:52:44,106 --> 00:52:53,349 I TOLD YOU 1175 00:52:53,349 --> 00:52:54,417 ABOUT TYPE II DIABETES AND WOULD 1176 00:52:54,417 --> 00:52:59,221 LIKE TO TALK TO YOU ABOUT OTHER 1177 00:52:59,221 --> 00:52:59,522 DISEASES. 1178 00:52:59,522 --> 00:53:00,957 COULD IT BE THAT OUR MODEL AND 1179 00:53:00,957 --> 00:53:04,860 RAMIFICATION OF IT IS LIMITED TO 1180 00:53:04,860 --> 00:53:07,596 A SPECIFIC DISEASE OR CELL LINE, 1181 00:53:07,596 --> 00:53:10,900 MAYBE PANCREATIC ISLETS ONLY, OR 1182 00:53:10,900 --> 00:53:14,437 COULD IT BE THAT WE ARE ABLE TO 1183 00:53:14,437 --> 00:53:15,004 ACCURATELY PREDICT DISEASE 1184 00:53:15,004 --> 00:53:20,376 CAUSAL VARIANTS FOR OTHER 1185 00:53:20,376 --> 00:53:20,776 DISEASES? 1186 00:53:20,776 --> 00:53:29,919 AND WE IDENTIFIED STUDIES 1187 00:53:29,919 --> 00:53:33,823 FOCUSING ON ALZHEIMER'S, 1188 00:53:33,823 --> 00:53:38,928 PARKINSON'S, SCHIZOPHRENIA, ALS 1189 00:53:38,928 --> 00:53:40,096 WHERE THE INFORMATION ON DISEASE 1190 00:53:40,096 --> 00:53:42,098 CAUSAL VARIANTS IS AVAILABLE.
1191 00:53:42,098 --> 00:53:44,100 SO WE KNOW AT LEAST A FEW OF 1192 00:53:44,100 --> 00:53:54,643 THEM WE CAN TRY TO REPLICATE AND 1193 00:53:55,411 --> 00:54:02,518 MAKE ACCURATE PREDICTIONS OR NOT 1194 00:54:02,518 --> 00:54:10,860 A 1195 00:54:10,860 --> 00:54:12,995 AND ALL, EVERY CAUSAL SNP 1196 00:54:12,995 --> 00:54:15,464 THAT'S A RED TRIANGLE IS 1197 00:54:15,464 --> 00:54:18,701 PREDICTED AS A HIGHLY SCORING 1198 00:54:18,701 --> 00:54:22,872 DAMAGING VARIANT OR ENHANCER 1199 00:54:22,872 --> 00:54:27,843 VARIANT ACCORDING TO OUR MODEL. 1200 00:54:27,843 --> 00:54:33,916 AND CONFIRM AS CAUSAL IS NOT AS 1201 00:54:33,916 --> 00:54:35,818 HIGHLY SCORING SUGGESTING WE NOT 1202 00:54:35,818 --> 00:54:46,328 ONLY PREDICT WHAT'S KNOWN BUT 1203 00:54:46,595 --> 00:54:48,964 FROM VARIANTS THAT ARE LIKELY TO 1204 00:54:48,964 --> 00:54:50,066 BE VARIANTS OF DIFFERENT 1205 00:54:50,066 --> 00:54:50,332 DISEASES. 1206 00:54:50,332 --> 00:55:00,876 WITH THAT, I WOULD LIKE TO THANK 1207 00:55:01,444 --> 00:55:08,851 YOU FOR ORGANIZING THE EVENT AND 1208 00:55:08,851 --> 00:55:14,457 ANG LOOK AT SHUN LEE WHO HAS 1209 00:55:14,457 --> 00:55:19,462 DONE A LOT AND SHE'S NOW AT NCI 1210 00:55:19,462 --> 00:55:30,005 AND TO THANK OUR COLLABORATORS. 1211 00:55:30,539 --> 00:55:33,375 WITH THAT I'LL BE HAPPY TO ANSWER 1212 00:55:33,375 --> 00:55:35,077 QUESTIONS. 1213 00:55:35,077 --> 00:55:38,514 >> THANK YOU FOR THE SUPER 1214 00:55:38,514 --> 00:55:39,648 EXCITING PRESENTATION. 1215 00:55:39,648 --> 00:55:42,918 WE DO APPLICATIONS OF DEEP 1216 00:55:42,918 --> 00:55:46,155 LEARNING TO TEXT BUT TAKING THE 1217 00:55:46,155 --> 00:55:49,391 MODELS AND APPLYING THEM TO BIOLOGY 1218 00:55:49,391 --> 00:55:51,560 AND PROTEINS AND DISEASES 1219 00:55:51,560 --> 00:55:53,963 IS A WHOLE NEW WORLD TO US, WHICH 1220 00:55:53,963 --> 00:55:54,897 IS SO EXCITING.
1221 00:55:54,897 --> 00:55:56,432 I WANTED TO ASK YOU IF YOU COME 1222 00:55:56,432 --> 00:55:59,101 UP WITH NEW VARIANT PREDICTIONS 1223 00:55:59,101 --> 00:56:04,907 USING THESE DEEP LEARNING MODELS 1224 00:56:04,907 --> 00:56:08,177 DO YOU HAVE A MECHANISM TO 1225 00:56:08,177 --> 00:56:13,482 VERIFY AND HOW IS IT FOR US AT 1226 00:56:13,482 --> 00:56:14,717 NIH TO USE THE PREDICTIONS? 1227 00:56:14,717 --> 00:56:16,886 >> YOU'RE LOOKING AT THE HEART 1228 00:56:16,886 --> 00:56:17,419 OF THE PROBLEM. 1229 00:56:17,419 --> 00:56:27,863 VALIDATION IS KEY FOR US. 1230 00:56:28,864 --> 00:56:29,932 THERE ARE EXPERIMENTAL DATA SETS 1231 00:56:29,932 --> 00:56:35,571 WE CAN USE TO VALIDATE OUR 1232 00:56:35,571 --> 00:56:44,813 MODEL. 1233 00:56:44,813 --> 00:56:47,283 WE CAN DO HIGH SCALE VALIDATIONS 1234 00:56:47,283 --> 00:56:50,920 BUT THEY'RE NOT TARGETED TO 1235 00:56:50,920 --> 00:56:55,224 SPECIFIC PREDICTIONS THAT WE 1236 00:56:55,224 --> 00:57:02,097 MAKE. 1237 00:57:02,097 --> 00:57:05,901 WE HAVE WONDERFUL COLLABORATORS 1238 00:57:05,901 --> 00:57:08,737 AT THE NIH INCLUDING THOSE THAT 1239 00:57:08,737 --> 00:57:14,910 TAKE OUR PREDICTIONS AND LOOK 1240 00:57:14,910 --> 00:57:25,387 AT REPORTER ASSAYS AND SHOW 1241 00:57:27,456 --> 00:57:29,458 COMPUTATION IN EXPERIMENTS AND 1242 00:57:29,458 --> 00:57:34,897 IT TAKES TIME AND MONEY. 1243 00:57:34,897 --> 00:57:45,441 AND I'M QUITE HAPPY TO BE PART 1244 00:57:45,641 --> 00:57:47,476 OF NIH WHERE WE HAVE A LARGE 1245 00:57:47,476 --> 00:57:53,282 NETWORK OF POSSIBLE 1246 00:57:53,282 --> 00:57:54,316 COLLABORATIONS. 1247 00:57:54,316 --> 00:57:56,919 AND SURE THE INTERESTED IN 1248 00:57:56,919 --> 00:58:01,957 ESTABLISHING BRIDGES BETWEEN 1249 00:58:01,957 --> 00:58:07,162 EXPERIMENTATION AND COMPUTATIONS 1250 00:58:07,162 --> 00:58:17,673 AND WHAT -- THANK YOU FOR THE 1251 00:58:20,075 --> 00:58:22,077 SCIENTIFIC ENVIRONMENT WHERE WE 1252 00:58:22,077 --> 00:58:23,746 HAVE PEOPLE AND GROUPS.
1253 00:58:23,746 --> 00:58:25,581 >> I HAVE ANOTHER QUESTION FROM THE 1254 00:58:25,581 --> 00:58:28,784 AUDIENCE AND HE'S ASKING DOES 1255 00:58:28,784 --> 00:58:32,054 THIS MODEL PREDICT WHETHER GENES 1256 00:58:32,054 --> 00:58:34,023 OR GENE EXPRESSION IS INFLUENCED 1257 00:58:34,023 --> 00:58:39,128 BY NON-CODING VARIANTS AND 1258 00:58:39,128 --> 00:58:41,230 DIRECTION OF EXPRESSION CHANGES 1259 00:58:41,230 --> 00:58:41,830 SUCH AS UPREGULATION OR 1260 00:58:41,830 --> 00:58:46,201 DOWNREGULATION? 1261 00:58:46,201 --> 00:58:47,736 >> IT'S A FANTASTIC QUESTION AND 1262 00:58:47,736 --> 00:58:51,540 THE ANSWER IS YES AND NO. 1263 00:58:51,540 --> 00:59:02,084 WE DO NOT PREDICT THE IMPACT OF 1264 00:59:02,985 --> 00:59:06,855 REGULATORY VARIANTS AND THAT'S 1265 00:59:06,855 --> 00:59:08,157 ANOTHER COMPLEX TASK AS I DIDN'T 1266 00:59:08,157 --> 00:59:13,195 GO INTO THE DETAILS BUT YOU MIGHT 1267 00:59:13,195 --> 00:59:14,930 KNOW THAT EVERY GENE HAS A LARGE 1268 00:59:14,930 --> 00:59:22,371 NUMBER OF REGULATORY ENHANCERS. 1269 00:59:22,371 --> 00:59:24,807 THERE'S HALF A DOZEN ENHANCERS 1270 00:59:24,807 --> 00:59:29,611 YOU HAVE CHROMATIN CONFORMATION 1271 00:59:29,611 --> 00:59:33,849 AND GENE REGULATION IS QUITE 1272 00:59:33,849 --> 00:59:34,249 COMPLEX. 1273 00:59:34,249 --> 00:59:40,389 WE PREDICT THE EFFECT ON OTHER 1274 00:59:40,389 --> 00:59:50,566 ELEMENTS. 1275 00:59:53,268 --> 00:59:55,871 AND FIGURING OUT THE ROLE OF 1276 00:59:55,871 --> 00:59:58,540 ENHANCERS AND HOW THEY 1277 00:59:58,540 --> 01:00:00,342 COLLABORATE AND UP AND DOWN 1278 01:00:00,342 --> 01:00:02,711 REGULATE THE ACTIVITY OF THE 1279 01:00:02,711 --> 01:00:02,911 GENES. 1280 01:00:02,911 --> 01:00:04,913 WE'RE TRYING TO BUILD THIS ALL- 1281 01:00:04,913 --> 01:00:07,616 INCLUSIVE MODEL BUT SO FAR WE 1282 01:00:07,616 --> 01:00:09,385 HAVE NOT BEEN ABLE TO CROSS THAT 1283 01:00:09,385 --> 01:00:11,053 BRIDGE YET.
1284 01:00:11,053 --> 01:00:13,856 SO OUR MODEL PREDICTS THE IMPACT 1285 01:00:13,856 --> 01:00:17,926 ON THE ACTIVITY OF ENHANCERS. AS A 1286 01:00:17,926 --> 01:00:19,895 DOWNSTREAM EFFECT IT DOES 1287 01:00:19,895 --> 01:00:22,931 AFFECT GENE EXPRESSION BUT WE DO 1288 01:00:22,931 --> 01:00:25,434 NOT QUANTIFY THAT. 1289 01:00:25,434 --> 01:00:25,934 >> THANK YOU VERY MUCH. 1290 01:00:25,934 --> 01:00:27,636 >> THANK YOU SO MUCH. 1291 01:00:27,636 --> 01:00:30,906 I WANT TO -- IT'S 3:00. 1292 01:00:30,906 --> 01:00:39,348 WE HAVE TO WRAP UP AND THANK 1293 01:00:39,348 --> 01:00:43,252 YOU FOR YOUR EXCITING, IMPORTANT 1294 01:00:43,252 --> 01:00:46,922 WORK AND WE HOPE TO SEE YOU IN 1295 01:00:46,922 --> 01:00:47,222 THE FUTURE. 1296 01:00:47,222 --> 01:00:57,466 THANK YOU.