>> Good afternoon. I'm Barbara Wold, from the NCI Center for Cancer Genomics, and it's a real delight for me to be able to introduce our speaker for the afternoon, Gad Getz, who came down from the Broad to talk to us about computational approaches to understanding the genome, and in particular the mutated cancer genome, as well as the genetic and genomic variations that predispose people to cancer. Gaddy has been sort of a star, a central figure in Broad computational genomics since he was a postdoc. He's been the head of computational analysis for cancer at the Broad, and I could go through some of the things he's done recently, but without further ado, I would rather have him do that for you. So, Gaddy, take it away. [APPLAUSE] >> Thank you very much, Barbara and Dr. Varmus, for having us to present this. I want to talk about the exciting analysis happening on cancer genomes, and the computational challenges and the approaches we are taking at the Broad and in general. First I want to acknowledge the people: this is not just my work. Many people are involved: an analysis team that includes biologists, physicians, and computational biologists coming from different backgrounds, from math, statistics, and physics, and of course biology and software engineering; the Firehose team, which I'll talk about; the GDAC team and project management; different platforms at the Broad, the Genetic Analysis Platform and the sample platform; the group dealing with germline analysis; and our Cancer Genome Steering Committee, including Stacey, Matthew, Todd, Levi, and Eric. And of course, thanks to the NCI and NHGRI for The Cancer Genome Atlas, TCGA, which is the flagship project in the field. So, when we think about cancer therapeutics, the first thing we want to do is discover the genes and pathways that are the key players in cancer.
In order to do that, we have two major tasks. The first I call characterization, which is identifying all the somatic and germline events in a particular patient: taking a piece of the tumor and blood, and comparing all the changes in terms of mutations, DNA changes, methylation, copy-number changes, translocations, expression of different genes, and proteomic changes. All these changes we need to characterize in a single patient. The next task is taking all these data on many different patients, interpreting the results, and asking which genes and pathways are significantly altered, more than you would expect by chance. This identifies the cancer genes, the players in the pathways in cancer, and also identifies subtypes of patients that may have different responses, or actually different diseases. So let's first talk about characterization. The goal of characterization is to reach a complete, base-level resolution of the tumor genome and to understand its evolution: not just what it is now, but how it got to the stage it's at now, and all the mechanisms that actually shaped it. Clearly, next-generation sequencing is revolutionizing our ability to do that, but it also poses some critical challenges: the flood of data, specificity, and sensitivity, and we'll go over them.
So cancer genome projects were enabled by this graph. We all know about it: this is the drop in the cost of sequencing over the last decade, going from $30,000 per megabase to less than 10 cents per megabase, due to different technology changes. That enables us to do a comprehensive analysis of cancer genomes. So what can we detect by sequencing? These are DNA fragments; we sequence both ends of them and align them to the genome. We can identify point mutations, bases that are different from the reference or normal genome; we can identify insertions and deletions; from the depth of sequencing we get copy-number changes; by mapping the two ends to two different locations in the genome we can detect translocations or rearrangements; and we can detect sequences in the sample that are not human, viral sequences or other pathogens. These are the problems we need to address, or the findings we need to make. So here's the map of cancer genome projects. TCGA, the national project, is now analyzing 35 tumor types, with 10 smaller, rare tumor types, but the main ones have 500 tumor-normal pairs each. Each tumor goes through whole-exome sequencing, and maybe 10% will soon go through whole-genome sequencing; mRNA and microRNA sequencing; genotyping, copy-number, and methylation arrays, which are moving towards sequencing; and RPPA, reverse-phase protein arrays. Following TCGA, a consortium formed called the International Cancer Genome Consortium, ICGC, which aims to study 50 tumor types with basically the same study types and 500 tumor-normal pairs each. That is ongoing, as an umbrella organization for different TCGA-like projects around the world. The Broad Institute has other sources of funding to do other experiments, and these are the other types of cancer genome projects we're doing. And it's expected (we're already in year 2) that next year there will be 30 petabytes of these data in the world.
What you can see is that every year, and the blue bar represents how much we produce that year, we produce more than was ever produced before in the history of genomics; it's twice as much as the National Library of Medicine. This is the exponential phase of growth of data, and that means we're living in a world where tackling the data you produce this year is essentially like tackling all the data you ever produced; it's the same thing. So there's a flood of data, and we need to get ready for that. So here's a map of the ICGC, with projects around the world, in Europe, Asia, Australia, and Canada. The Broad has a collaboration with the Carlos Slim Foundation in Mexico, providing a different ICGC project. Just to give you another example: at the Broad alone, up to April this year, we sequenced almost 5,000 tumor-normal exomes (and actually now it's much higher than that) and close to 250 tumor-normal whole genomes from different tumor types. This number is growing, as you saw in those graphs, at the Broad and at other centers as well. Now, when you compare this graph of the drop in sequencing price to Moore's law, which is the drop in basic compute costs, disks and computers, you see this difference, and that has real consequences; it's not just an anecdote to compare to Moore's law. First of all, it means there is a flood of data, because we generate more than we used to. But the fractions of the cost also change over time: sequencing becomes so cheap that eventually the overall cost is the compute. In fact, it's sample preparation and acquisition plus analysis that take over, more than the actual sequencing. So even if sequencing were free, you'd have to analyze the data and acquire the sample, and that's something we need to take into account.
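The claim that, in an exponential phase, tackling one year's data is like tackling all the data ever produced can be checked with a toy calculation. This sketch assumes, purely for illustration, that output exactly doubles every year; the talk does not give the actual growth rate, and the function names are hypothetical.

```python
def yearly_output(year: int, first_year_output: float = 1.0) -> float:
    """Data produced in `year` (year 0 is the first year), assuming doubling."""
    return first_year_output * 2 ** year


def cumulative_before(year: int, first_year_output: float = 1.0) -> float:
    """Total data produced in all years before `year`."""
    return sum(yearly_output(y, first_year_output) for y in range(year))


for year in range(1, 6):
    print(f"year {year}: this year = {yearly_output(year):g}, "
          f"all previous years combined = {cumulative_before(year):g}")
```

Under strict doubling, each year's output equals everything produced before it plus the first year's amount (2^n = (2^n - 1) + 1), so the two columns stay essentially equal as the totals grow.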
As a result of this flood of data, we also need to build automated pipelines that handle it, for standardization and reproducible results, and we built analysis pipelines for that. We also need tools to visualize the data and automatic reports that take all this data and produce reports that we, as humans and biologists, can read and understand. So we had to develop tools to find all these events: mutation detection, indel detection, segmentation of copy-number data, rearrangement and breakpoint finding, and PathSeq. Some of them are published now, and these tools are available. So we developed a pipeline, and the pipeline at the Broad has two pieces. One is done at the sequencing platform: we align the reads to the genome and we mark duplicate reads, because sometimes the same molecule is sequenced several times, and you don't want to count the same molecule more than once. Then there's the step of base-quality recalibration. Every base that comes off the machine also comes with a number representing the probability that it is a mistake. These probabilities are inaccurate: if they say it's 1 in a thousand, actually it's maybe 1 in a hundred. So we need to calibrate these qualities so that we can believe the quality scores, and there's a phase where we recalibrate them. Then we put all this into the BAM files that many of you interface with, which hold all the reads, their alignments to the genome, and the quality scores. What can we do with these? We can submit them to dbGaP, or we can load them into IGV, a commonly used tool for loading these files and zooming in, out, and panning. Here you see a tumor and a normal at P53, and you see these colored bases here; they represent the mutation, a single-base change at a splice site of P53. You don't see them in the normal, so that's a somatic mutation.
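The base-quality recalibration step described above can be illustrated with the talk's own example, where bases reported at 1 error in 1,000 are really wrong about 1 time in 100. This is a minimal sketch of the idea only: it bins bases by reported Phred quality and assumes the input comes from positions believed to be non-variant, so every mismatch counts as an error. Production recalibrators (for example GATK's BaseRecalibrator) also model covariates such as machine cycle and sequence context; this is not their implementation, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def error_to_phred(p: float) -> float:
    """Phred quality Q = -10 * log10(p_error)."""
    return -10 * math.log10(p)


def phred_to_error(q: float) -> float:
    """Inverse of error_to_phred."""
    return 10 ** (-q / 10)


def recalibrate(observations):
    """Map each reported quality to an empirical quality.

    `observations` yields (reported_quality, is_mismatch) pairs taken from
    positions assumed to be non-variant, so every mismatch to the reference
    is counted as a sequencing error.
    """
    counts = defaultdict(lambda: [0, 0])  # reported Q -> [mismatches, bases]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1
    table = {}
    for reported_q, (mismatches, bases) in counts.items():
        # Add-one smoothing keeps the estimate finite when a bin has no errors.
        empirical_error = (mismatches + 1) / (bases + 2)
        table[reported_q] = error_to_phred(empirical_error)
    return table


# Bases reported as Q30 (1 error in 1,000) that actually mismatch 1 time in 100.
observations = [(30, i < 100) for i in range(10_000)]
print(round(recalibrate(observations)[30], 1))  # 20.0, i.e. about 1 in 100
```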
Once we have these BAM files, they go to my group, the Cancer Genome Analysis group. But first they go through a battery of QC tools to make sure there are no problems, and believe me, all the problems you can think of happen: you have swaps of lanes, swaps of samples, mix-ups of samples. All these problems occur, and we need to figure them out. That's very important for clinical purposes, because you don't want to be treated for a rearrangement because your neighbor had the rearrangement; you want to be treated for the mutations in your own sample. So QC is very important. Once a sample passes QC, we run a battery of tools to detect these events, and then we annotate the events: once we find a mutation, what is its consequence for the protein, was it known before, and so on. We call this tool Oncotator; I know, they like this name. Then comes the interpretation phase, which I'll talk about later. We built all these pipelines in an infrastructure we call Firehose, to handle the flood of data; that's the reason we call it Firehose. This is the system that holds the data: it knows where the data are, it holds data about the data, the metadata, and the data themselves on disk, and it also has all the workflows, the modules and pipelines that you install in it. Then you can run them: it knows what to run, what runs on what, and what has been run before, and you can run it again. And you can keep versions of the data, algorithms, pipelines, and workflows, so you can reproduce your results. That's crucial. If people did analysis from their home directories, we would never be able to write papers or reproduce the results of papers, so that's extremely important. One very important concept of the system is called the workspace: you have a copy of the system where you can do things that don't interfere with each other.
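Firehose, as described here, knows what runs on what; its workflows are organized as directed acyclic graphs (DAGs) of tools. As a hedged illustration of why that structure matters, the sketch below uses Python's standard-library `graphlib` (Python 3.9+) to derive an execution order in which every task runs after its inputs, and to reject a workflow that contains a cycle. The task names are hypothetical, and this is not how Firehose itself is implemented.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical analysis workflow: each task maps to the tasks it depends on.
# The task names are illustrative; they are not Firehose module names.
WORKFLOW = {
    "align": set(),
    "mark_duplicates": {"align"},
    "recalibrate_quality": {"mark_duplicates"},
    "qc": {"recalibrate_quality"},
    "call_mutations": {"qc"},
    "call_copy_number": {"qc"},
    "annotate": {"call_mutations", "call_copy_number"},
    "report": {"annotate"},
}


def run_order(dag):
    """Return an order in which every task runs after all of its inputs.

    Raises graphlib.CycleError if the dependencies are not acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


order = run_order(WORKFLOW)
print(order)

try:
    run_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected: not a valid workflow")
```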
To give an example of workspaces: when a developer develops a new version of a tool, they install it in their own workspace and test it out, and this doesn't interfere with production or with other development. And once we publish a paper, we freeze that workspace, so we can reproduce the results with the analysis tools and data that were used for the paper; otherwise we wouldn't be able to reproduce the results again, and that's very important. These usage numbers are only from January this year, and they are much higher now: at the Broad, Firehose uses roughly a thousand compute nodes and 500 terabytes of disk, and 32 million jobs had been run by January; that has probably doubled by now. We use the same infrastructure to analyze TCGA data as part of the GDAC. The Broad has one of TCGA's GDACs, the Genome Data Analysis Centers, and we're the primary center that takes all the data from the Data Coordinating Center, analyzes it through canned analyses, and puts the results back at the DCC; we do this through the same Firehose infrastructure. Here's an example of a string of many tools, for example for copy number and mutations, some developed at the Broad and some in other places. We put them into these workflows, which we call DAGs because they're directed acyclic graphs, and we submit the results to the DCC and also report them on our website. So if you're interested in TCGA data, go to the Broad Institute's GDAC site: every 2 weeks we take a snapshot of all the TCGA data, every month we analyze the data, and you can see, for every tumor type, all the results, automatically generated, prepublication. That's very important for all of you who are analyzing TCGA tumor types. So let's talk about what cancer actually is. It starts from maybe an adult stem cell that divides normally in the body, and with every division it acquires mutations in its genome.
Some of these mutations are corrected by repair mechanisms, and we can see evidence for that. Some of them are what are called passengers, which don't do anything. At one point there's a mutation that increases the fitness of the cell, and those cells start to grow beyond the neighboring cells; they continue to acquire mutations, and then another driver event increases fitness, and another one, and another one. Eventually we take a piece of the tumor, which contains normal cells and tumor cells, but the tumor cells are heterogeneous: for example, only part of them contain this driver, and another part has different passengers. This is the last clonal driver, a driver that was present in all the cancer cells we've been analyzing. So this is the picture you need to keep in mind in cancer. Just to get the nomenclature right: a driver event is an event that increased the fitness of the cell when it occurred, and a cancer gene or cancer pathway is one that harbors driver events. So P53 harbors events that drive the fitness of the cell, but it can also harbor passenger events that basically don't increase fitness, like silent mutations perhaps, and this is something you need to be aware of. So now, the next challenge is specificity. We need to protect against two types of errors, because the signal we're looking for is very, very rare: point mutations, for example, occur at roughly 1 somatic mutation per megabase. So if we want the list of mutations we produce to be 95% valid, or true, we need an error rate much lower than 0.05 errors per megabase. The caller goes over every base in the genome and asks, is there a mutation here? So the tool has to be correct more than 99.99% of the time in order not to generate a large number of false positives. That's a very difficult task. So we have two types of false positives.
The first is when there's no event, no somatic mutation, but because of some sequencing error the site seems to have a base change, a variant, that's not in the normal, only in the tumor. So we think there's a somatic event, but actually it's a mistake. Every base in the genome can have such mistakes, some bases more than others, and this is one type of false positive. The other type of false positive is when it's actually a germline event, but because we didn't sequence the normal deeply enough, we didn't see the heterozygous site there: we see the variant in the tumor but not in the normal, so we call it a somatic event, and that's another type of mistake. This can only occur at variable sites. We know about 95% of the variation in the genome; we know those sites, depending on which version of dbSNP you take. But rare germline events that we don't know about still occur at a higher rate than somatic mutations, so this is a source of noise we need to be aware of, to make sure we don't make this mistake. In practice we don't make these errors, because we typically sequence the normal deeply enough: in theory, if we have more than 8 reads in the normal, we're pretty sure we don't make this type of mistake. So how do we call mutations? Here we give the example of point mutations, and we do the same thing for other events. There's a preprocessing step where we locally realign the reads: they have mismatches, but in fact there's an indel here, and if you realign them, you see there are no mismatches. So we do this local realignment.
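The two specificity arguments above reduce to short calculations. This is a back-of-envelope sketch under stated assumptions, not the Broad caller's model: the first function assumes every true mutation is called (so precision = TP / (TP + FP)), and the second assumes each normal read comes from either allele with probability 0.5 and ignores sequencing error. Both function names are hypothetical.

```python
def max_fp_rate_per_mb(somatic_rate_per_mb: float, target_precision: float) -> float:
    """Largest false-positive rate (per Mb) compatible with the target precision.

    Assumes every true mutation is called, so precision = TP / (TP + FP).
    Solving TP / (TP + FP) >= target for FP gives FP <= TP * (1 - target) / target.
    """
    return somatic_rate_per_mb * (1 - target_precision) / target_precision


def p_het_site_missed(normal_depth: int) -> float:
    """Chance a heterozygous germline site shows no variant reads in the normal.

    Each normal read is assumed to come from either allele with probability
    0.5, and sequencing errors are ignored.
    """
    return 0.5 ** normal_depth


fp = max_fp_rate_per_mb(1.0, 0.95)
print(f"allowed false positives: {fp:.3f} per Mb = {fp / 1e6:.1e} per base")
for depth in (4, 8, 12):
    print(depth, p_het_site_missed(depth))
```

With 1 true mutation per megabase, 95% precision leaves room for only about 0.05 false calls per megabase, roughly 5 errors per 100 million bases. With 8 normal reads, the chance of missing a given heterozygous site is 1/256, about 0.4%, under these assumptions.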
Next, we use a classifier to decide whether there's enough evidence that there's a variant in the tumor and no variant in the normal; in that case we call it a candidate somatic mutation. But then we need additional filters that remove false positives caused by misplaced reads or by correlated noise, and we do that with different filters. So how are we doing? For point mutations the validation rate is very high, roughly 95%; for indels it's not that high, and we're working to improve it; for rearrangements we are at 92%. One important point is that validation itself is a tricky question, because what technology do you use to validate a mutation? Older technologies like Sequenom and ABI capillary sequencing are actually not sensitive enough to detect the mutations we're finding by next-generation sequencing, so we need to validate the mutations using next-generation sequencing. Now, the third challenge of the characterization phase is sensitivity. The ability to detect mutations depends on the coverage, that is, the depth of sequencing, and on the mutation's allelic fraction. What are these parameters? Depth of sequencing is how many reads cover that base. The allelic fraction depends on two numbers, purity and ploidy. When you take a sample that has normal cells and tumor cells, you extract the DNA and it's mixed in the tube, so the amount of DNA coming from the tumor depends, of course, on purity: what fraction of the cells in the sample are cancer cells. The other parameter that determines the amount of DNA from the tumor cells is ploidy: the average DNA mass in a tumor cell compared to a normal cell. If the tumor cells have more DNA, then a larger share of the overall DNA comes from the tumor. Here's an example.
Here's a tumor sample that is two-thirds tumor and one-third normal. In this region of the genome the tumor has 4 copies, and here's a clonal mutation, present in all cancer cells, on one allele. So what's the allelic fraction in this case? It's 2 mutated alleles out of 10 alleles overall, so it's a 20% allelic fraction. So in cancer we see mutations at allelic fractions of 20% or lower, and we developed a tool called ABSOLUTE to find purity and ploidy, so that we can calculate these allelic fractions. This tool, ABSOLUTE, recently published, uses mutation data and copy-number data to find them. Here's the genome; you can see gains and losses. This is allele-specific copy number, and you can see the levels 1, 2, 3. Basically, the problem it needs to solve is finding an equally spaced grid that fits the peaks in the histogram of the different levels. Clearly there are multiple solutions that could fit this grid, some more likely than others. We use the fit to the data and also prior knowledge about which chromosomes are gained and lost in a particular tumor type to pick the answer, and it validates very well: the results were validated both by mixing experiments and by direct measurements of ploidy. So once we calculate purity and ploidy, we can calculate for every region of the genome what the allelic fraction of a point mutation there would have been. Now we can use these graphs, the sensitivity or power curves, to ask: if a mutation were at 0.1 allelic fraction, what is the probability we detect it with 30X sequencing, or with 50X sequencing? So we can calculate this sensitivity, and we can do it for point mutations as well as for rearrangements. So now we can answer how deep we need to sequence, based on the purity of the sample, and that's a very important parameter that we don't want to get wrong.
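The two-thirds-purity example and the power curves above can be reproduced with a short calculation. This is a hedged sketch, not ABSOLUTE's code: it assumes diploid normal cells and a known local copy number, models reads as binomial draws, and uses a fixed minimum number of mutant reads (`min_alt_reads`, an illustrative assumption) as a stand-in for a real caller's statistical decision rule.

```python
import math


def expected_allelic_fraction(purity, tumor_copies, multiplicity=1, normal_copies=2):
    """Expected fraction of reads carrying a clonal mutation.

    purity: fraction of cells in the sample that are cancer cells.
    tumor_copies: local copy number in cancer cells.
    multiplicity: mutated copies per cancer cell.
    Normal cells are assumed to carry `normal_copies` (diploid) unmutated copies.
    """
    mutated = purity * multiplicity
    total = purity * tumor_copies + (1 - purity) * normal_copies
    return mutated / total


def detection_power(depth, allelic_fraction, min_alt_reads=3):
    """P(observing at least `min_alt_reads` mutant reads at this depth).

    Reads are modelled as binomial draws; the fixed read threshold is an
    illustrative stand-in for a caller's real statistical decision rule.
    """
    p_fewer = sum(
        math.comb(depth, k) * allelic_fraction**k * (1 - allelic_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_fewer


af = expected_allelic_fraction(purity=2 / 3, tumor_copies=4)
print(f"allelic fraction: {af:.2f}")  # 0.20, matching the 2-in-10 example
for depth in (30, 50, 100):
    print(depth, round(detection_power(depth, 0.1), 3))
```

Raising the depth or the allelic fraction raises the power, which is why the required depth depends on purity and ploidy.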
One point: sensitivity for calling germline SNPs says nothing about somatic sensitivity. If somebody says, "Yeah, I called all the germline variants in the sample," that doesn't say anything about the somatic mutations: if there were no cancer cells in the sample, you would call the germline correctly but none of the somatic mutations. Then we took the ovarian mutations we found as part of the project, ran ABSOLUTE, and converted their allelic fractions into what we call multiplicity, the number of copies of the mutation per cancer cell. As you can see, many of them have one mutated copy per cell, but roughly half of the mutations are subclonal, present in a subset of the cells. So now we have a new metric, CCF, the cancer cell fraction: the fraction of cancer cells that carry the mutation. Looking at these, we are sure these are real mutations. Here's an example sample. This is the absolute copy number, where you see 0, 1, and so on, and we can overlay the point mutations. If you go here to the CCF, the fraction of cancer cells, we start to see a histogram, clusters of both copy-number changes and mutations at different cancer cell fractions. These are subclones: this sample had a subclone in 60% of the cells, another subclone at 30%, and another at 20%, and the fact that they sum to more than 1 means one has to be within another. So we can understand the history of the tumor based on its current state; it's like archaeology. So again, this is the plot of sensitivity for mutations at a clonal allelic fraction. But if we want to find subclonal mutations, we need to sequence much deeper. Here's this graph continued to much lower allelic fractions, assuming a clonal mutation would be at 20%. So you see, for a mutation that would otherwise be at 20% allelic fraction, given the purity and ploidy at the site, if it is in only 20% of the cells, we can't just do whole exomes at 30X or 50X coverage; we need to sequence deeply if we want to find it.
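Converting an observed allelic fraction into a cancer cell fraction, as described above, amounts to inverting the purity and copy-number relationship. A minimal sketch, assuming diploid normal cells, a known local copy number, and a known multiplicity; it is the standard arithmetic relationship, not ABSOLUTE's statistical model, and the function names are hypothetical. The subclone check only tests the arithmetic condition; reading an overlap as one subclone nested inside another additionally assumes a tree-shaped tumor phylogeny.

```python
def cancer_cell_fraction(allelic_fraction, purity, tumor_copies,
                         multiplicity=1, normal_copies=2):
    """Invert the purity/copy-number relation to get the cancer cell fraction.

    If a fraction `ccf` of cancer cells carries `multiplicity` mutated copies,
    the expected allelic fraction is
        purity * multiplicity * ccf / (purity * tumor_copies + (1 - purity) * normal_copies),
    so ccf follows by rearranging. Values a little above 1 can arise from noise.
    """
    total = purity * tumor_copies + (1 - purity) * normal_copies
    return allelic_fraction * total / (purity * multiplicity)


def clusters_must_overlap(cluster_ccfs):
    """If subclone fractions sum past 100%, some cells carry several of them."""
    return sum(cluster_ccfs) > 1


print(cancer_cell_fraction(0.2, purity=2 / 3, tumor_copies=4))  # ~1.0: clonal
print(cancer_cell_fraction(0.1, purity=2 / 3, tumor_copies=4))  # ~0.5: subclonal
print(clusters_must_overlap([0.6, 0.3, 0.2]))                  # True
```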
So now we've started to study these subclonal mutations over time, with samples before and after treatment or at different stages of development. This is part of a project with Cathy Wu on CLL, and other projects as well. You can think of two scenarios. Here's a tumor: it grows, it's quite uniform, then you treat it and it comes back, and you expect no evolution: all the mutations you found before, you see later at the same fraction. You can imagine another scenario, where you treat the tumor and a subclone grows, and what you then see is that some mutations that used to be at a low allelic fraction now have a higher allelic fraction; that's a kind of evolution. And we see both, for somatic copy-number alterations and for mutations: in the no-evolution case the points lie on the diagonal, and in the evolution case they increase at the second time point. We've actually seen enrichment of driver mutations, such as P53, among the increasing ones, the ones that show evolution, and a more sophisticated analysis showed this is significant. Very interestingly, and this is unpublished data, so please treat it as such, we see that if you have a subclonal driver mutation at time 1, the first time point, the time to second therapy, the interval between the first therapy and the second therapy, is significantly shorter. So that's a clinically useful parameter: if you see a patient who has a subclonal driver mutation, you should expect them to need the next therapy earlier rather than later. So what do we see across cancers after applying all these tools? Some people call this the hamburger plot, and what you see is many different tumor types, where every point is a sample.
You see the different mutation rates: across cancers there is more than 3 orders of magnitude difference between highly mutated cancers, like melanoma and lung, and the very low ones, like rhabdoid, thyroid, sarcoma, et cetera. It's a huge difference in rates, and within a tumor type there is also a very large range of mutation rates. When you look at the base changes in different tumors, you see that different tumors have different spectra: the smoking ones have a very clear pattern, and you see the effect of UV, basically the transitions caused by UV. Interestingly, you see some samples that don't have this pattern and also have a low mutation rate: these are actually melanomas on the bottom of the foot, which basically didn't see the sun, so that's a different type of melanoma. You can even expand this spectrum by looking at the base before and the base after the mutation. This is what we call the vanilla, typical pattern, here in ovarian cancer: you see a much higher mutation rate at CpGs, Cs followed by a G that change to Ts. This column, or row, here is spontaneous deamination of methylated CpGs. But other tumor types show different patterns, and you can organize them so that the mutation rate is the radius and the spectrum is the angle, and cluster them. You see that melanoma has its own pattern, lung has its own pattern, and kidney, and here you see a pattern that involves some of the head and neck and some cervical samples and correlates with HPV, and also [INDISCERNIBLE] with some thyroid. So that's something interesting. Here you see esophageal, colorectal, and gastric, or at least some of the esophageal samples. So you learn about the causes of cancer from these patterns of mutation. And here is what we call the Lego plots of these tumor types: you see lung and head and neck, and this row here is the HPV signature.
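The Lego plots above tally substitutions by their flanking bases. A common convention, assumed here rather than taken from the talk, reports each mutation on the strand where the reference base is a pyrimidine (C or T), giving 6 substitution types times 16 flanking-base pairs, or 96 categories. This minimal sketch counts those categories and the share of C>T changes at CpG sites; the input format of (reference trinucleotide, mutant base) pairs is hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def classify(context: str, alt: str) -> tuple[str, str]:
    """Return (trinucleotide context, substitution) on the pyrimidine strand.

    `context` is the reference base with its 5' and 3' neighbours (e.g. "ACG");
    `alt` is the mutant base at the middle position. Purine reference bases
    (A, G) are reported from the opposite strand, so there are 6 substitution
    types x 16 flanking-base pairs = 96 possible categories.
    """
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = alt.translate(COMPLEMENT)
    return context, f"{context[1]}>{alt}"


def spectrum(mutations):
    """Count mutations per (context, substitution) category."""
    return Counter(classify(ctx, alt) for ctx, alt in mutations)


def cpg_c_to_t_fraction(mutations) -> float:
    """Fraction of mutations that are C>T at a CpG (C followed by G)."""
    counts = spectrum(mutations)
    total = sum(counts.values())
    cpg = sum(n for (ctx, sub), n in counts.items() if sub == "C>T" and ctx[2] == "G")
    return cpg / total if total else 0.0


# Hypothetical input: (reference trinucleotide, mutant base) pairs.
muts = [("ACG", "T"), ("CGT", "A"), ("TCA", "G"), ("ATA", "C")]
print(spectrum(muts))
print(cpg_c_to_t_fraction(muts))  # 0.5
```

Note that the G>A change in "CGT" is counted as a C>T in "ACG": it is the same CpG deamination event seen from the other strand.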
We see a similar thing with rearrangements. In colon cancer, for example, there are samples with no rearrangements and samples with many rearrangements; it's a huge range, and we need to understand the causes of these different rearrangements. Between cancer types, some don't have a lot of rearrangements and some have many, and prostate, of course, has many rearrangements, so we need to understand that. So now, going to the second phase, interpretation: let's say we knew everything about each single tumor. Now we go across a population and ask, which are the driver genes and which are the passenger genes? How do we do that? We build a model of the background mutation processes, and then we ask, for every gene, region of the genome, or pathway: do we see more mutations than expected from this background statistical model? Those are candidate driver events, but they could also represent inaccuracies of the model. We are constantly in this loop of trying to understand whether these are real driver events rather than the product of a bad, wrong model, and fixing the model so that it becomes more and more accurate. We developed several tools for that: GISTIC for copy-number analysis; MutSig for point mutations, which finds driver events while taking into account the background rates of different sites and different genomes; tools like [INDISCERNIBLE]; and other places develop tools for network analysis, finding subnetworks that are mutated more than expected by chance. So I'll talk briefly about this one.
Basically, the idea is that in every sample you look at, for example, copy-number gains, and now you have a score across all the samples, which is basically the frequency times the amplitude of these events. Then you calculate the distribution of this score under the null hypothesis that there are no drivers and these events could have occurred anywhere in the genome with equal probability, so you shuffle them. Then you calculate the significance, correcting for multiple hypotheses because you're testing all these regions of the genome, and identify the significantly altered genes. This is basically the basis of GISTIC; we have now applied it to all tumor types, and it's a standard analysis. We noticed something interesting: events are either focal, with a length distribution of roughly 1 over the length, or whole chromosome arms, which means there are two different mechanisms causing these kinds of alterations. So now we can computationally dissect the data. These are the copy-number data: red are gains, blue are losses. We can dissect the data we observe into one part caused by focal events and one caused by a different process, and do the analysis separately for each of them. We did that in the ovarian paper and in many other papers: we find significant focal peaks of gains, peaks of [INDISCERNIBLE], and amplified chromosome arms. Now, going back to the GDAC: you can click here, pick the tumor type you like, and see the automatic GISTIC run. For example, this is breast cancer, and you see the peaks identified with the genes in them; you see cyclin D1, D2. You can find these results for the latest TCGA data, and you can do this now.
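A toy version of the permutation idea above: each position is scored as the summed gain amplitude across samples (frequency times mean amplitude), and the null is built by shuffling each sample's event positions independently. GISTIC itself reports false-discovery-rate q-values; this sketch swaps in a simpler max-statistic correction (each observed score is compared with the genome-wide maximum of each permuted data set), so it illustrates the principle and is not GISTIC's algorithm. The data and function names are hypothetical.

```python
import random


def region_scores(profiles):
    """Score each position as the summed amplitude of gains across samples.

    profiles[s][i] is the non-negative gain amplitude of sample s at position i.
    Summing amplitudes equals (frequency of the event) x (mean amplitude).
    """
    return [sum(sample[i] for sample in profiles) for i in range(len(profiles[0]))]


def permutation_pvalues(profiles, n_perm=1000, seed=0):
    """Permutation p-values corrected for testing every position.

    Null hypothesis: each sample's events could have fallen anywhere, so the
    positions are shuffled independently within each sample. Comparing each
    observed score with the genome-wide maximum of every permuted profile
    controls for testing all positions at once (a max-statistic correction).
    """
    rng = random.Random(seed)
    observed = region_scores(profiles)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = [rng.sample(sample, len(sample)) for sample in profiles]
        null_max = max(region_scores(shuffled))
        for i, score in enumerate(observed):
            if null_max >= score:
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Toy data: 10 samples x 50 positions; every sample gains position 5, plus one
# sample-specific background gain elsewhere.
profiles = []
for s in range(10):
    sample = [0.0] * 50
    sample[5] = 1.0
    sample[(3 * s + 10) % 50] = 1.0
    profiles.append(sample)

pvals = permutation_pvalues(profiles)
print(pvals[5], pvals[10])
```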
For point mutations we use a similar approach. Every sample has mutations in different genes of different sizes. We tally the mutations according to the type of mutation, taking into account the fact that some mutations are more likely to occur than others, and so on. Then we get a score and find the significance of each gene, and these are the significantly mutated genes. We applied that to GBM, the first pilot, the first tumor type in the TCGA project, to lung adenocarcinoma, and to head and neck, and you see P53 mutated in all of these cancers. All of them have this long tail where the list ends at some point. Now, it ends at some point not because there are no more significantly mutated or driver genes; it's because we didn't have the statistical power to find those genes. How do we know that? For example, in ovarian, this list ended at genes mutated in 2% of patients, but we know that further down the list there are mutations in BRAF, KRAS, and NRAS at the hotspots of those genes. So we know these are real drivers that just are not at high enough frequency to be detected by this method. So then we need to calculate the power. This is a calculation we did back before 2007: if we want to find all significantly mutated genes that are mutated in 3% of the population, assuming a background rate of 1.5 mutations per megabase and an average gene size of 1,500 coding bases, how many patients would we need? It's roughly 500, and this is the basis for the 500 tumor-normal pairs per tumor type. However, now that we've seen the spectrum across cancers, we see that different tumors have different background mutation rates, and this calculation doesn't hold. For lung and melanoma samples, to have the same power for genes mutated in 3% of patients, you need thousands of samples, not 500, and if you want to go down to 2% or 1%, you need tens of thousands of samples. So don't throw away your samples; we need them.
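The power calculation above can be sketched with a Poisson model. All modelling choices here are assumptions made for illustration, not the parameters of the original calculation: background mutations in a gene follow a Poisson distribution with mean N x L x rate, a driver present in a fraction f of patients adds about N x f expected mutations, and significance uses a Bonferroni threshold across 18,000 genes at alpha = 0.1. The printed numbers therefore will not necessarily reproduce the roughly-500-patient figure; what the sketch does show is the qualitative point that a higher background rate lowers power at a fixed sample size.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam). Fine for the small lam used here."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def critical_count(lam0: float, alpha: float) -> int:
    """Smallest mutation count whose background tail probability is <= alpha."""
    k = 0
    while poisson_sf(k, lam0) > alpha:
        k += 1
    return k


def power(n_patients, driver_freq=0.03, background_per_mb=1.5,
          gene_length=1500, n_genes=18000, alpha=0.1):
    """Approximate power to call one gene significantly mutated.

    Background mutations in the gene ~ Poisson(n * L * rate); a driver adds
    about n * driver_freq more expected mutations. Bonferroni over n_genes.
    """
    lam0 = n_patients * gene_length * background_per_mb / 1e6
    k = critical_count(lam0, alpha / n_genes)
    return poisson_sf(k, lam0 + n_patients * driver_freq)


for n in (100, 500, 1000):
    print(n, round(power(n), 2), round(power(n, background_per_mb=10), 2))
```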
So now, there are other tricks to increase the power of this test. One trick is to look at the positions of the mutations: if all the mutations are clustered in one part of the gene, that is unlikely by chance, so this adds additional power, and we saw that in many cancer projects; for example, this was multiple myeloma. If you don't have many mutations, but they fall in known hotspots in the gene, sites that were seen in other tumor types, that's another indication that this is not a random event, so we can use that to increase the power to find cancer genes. We can also go to gene sets or pathways, where no gene by itself is significant but the whole pathway is mutated, because different patients mutate different genes in that pathway. These are all tricks that we actually added into MutSig to find these types of signals in the data. But then we encountered a problem. As the data sets grew and the mutation rates of the tumors we analyzed became higher, we found something peculiar. Analyzing 178 lung samples, we applied the MutSig algorithm and found 428 significant genes. That doesn't seem right. Moreover, the list was peppered with 92 olfactory receptors, and I don't believe that 92 olfactory receptors actually cause cancer. Another indication: 20% of the 83 genes longer than 4,000 amino acids are in the list, like titin and MUC16, 17, 2, and 3, which is unlikely by chance, and so are genes with a very long genomic footprint, which is also unlikely by chance. TCGA reported that the second most commonly mutated gene in ovarian cancer is CSMD3; it's not a cancer gene. The problem is the background rate. And what causes this? What causes this is heterogeneity.
SO IF WE ASSUME A SINGLE BACKGROUND RATE OF, LET'S SAY, 3 MUTATIONS PER MEGABASE, WE CALL ALL THE GENES THAT PASS THIS THRESHOLD SIGNIFICANT. BUT IF IN FACT THERE ARE 2 TYPES OF GENES, 75% OF THEM MUTATED AT 2 MUTATIONS PER MEGABASE AND 25% OF THEM AT 6 MUTATIONS PER MEGABASE, THE AVERAGE IS STILL 3 MUTATIONS PER MEGABASE, BUT WE SEE MANY MORE FALSE POSITIVES. SO WE NEED TO UNDERSTAND THE HETEROGENEITY. WE ALREADY SAW THAT IT EXISTS BETWEEN TUMORS AND WITHIN TUMORS, WITH DIFFERENT SPECTRA, BUT WE ALSO KNOW THAT THERE IS HETEROGENEITY DUE TO THE EXPRESSION LEVEL OF DIFFERENT GENES: THE HIGHER THE EXPRESSION LEVEL, THE LOWER THE MUTATION RATE. THAT IS BECAUSE OF TRANSCRIPTION-COUPLED REPAIR, A MECHANISM THAT REPAIRS MUTATIONS AND ACTS MORE ON GENES THAT ARE HIGHLY EXPRESSED, SO WE NEED TO TAKE THAT INTO ACCOUNT. MOREOVER, WE LEARNED FROM THE GERM LINE PEOPLE, PARTICULARLY OUR COLLEAGUES, THAT THE RATE OF GERM LINE MUTATION VARIES ALONG THE GENOME AND CORRELATES WITH REPLICATION TIMING. SO WE TESTED IT ON SOMATIC MUTATIONS AS WELL, AND WE SEE THAT INDEED THE MUTATION RATE ALONG THE GENOME IS CORRELATED WITH REPLICATION TIMING: THE LATER THE REPLICATION, THE MORE MUTATIONS THERE ARE. THERE ARE MANY MODELS FOR IT; ONE OF THEM IS THE DEPLETION OF NUCLEOTIDES AT THE LATE STAGE, STALLING DNA REPLICATION. CSMD3, THE GENE REPORTED FROM OVARIAN CANCER, SITS AT ONE OF THE LATEST-REPLICATING SITES, AND THAT'S PROBABLY WHY IT HAS A HIGHER BACKGROUND RATE. IF WE LOOK AT ALL THESE GENES, THE LONG GENES AND THE OLFACTORY RECEPTORS, THEY ARE ALL SKEWED TOWARDS LATE REPLICATION AND LOW EXPRESSION, SO THEY HAVE HIGHER MUTATION RATES. WHEN WE CORRECT FOR THAT, WE GET A CLEAN LIST OF GENES: THE OLFACTORY RECEPTORS AND THE LONG GENES ALL GO AWAY, AND WE GET A LIST THAT MAKES SENSE, WHERE SOME ARE KNOWN AND SOME ARE NEW CANCER GENES.
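THE 2-VERSUS-6-PER-MEGABASE EXAMPLE CAN BE WORKED THROUGH NUMERICALLY. ONLY THE RATES AND THE 75%/25% SPLIT COME FROM THE TALK; THE NUMBER OF GENES, THE GENE LENGTH, THE COHORT OF 178 PATIENTS AND THE BONFERRONI THRESHOLD ARE ASSUMPTIONS FOR THIS TOY POISSON MODEL. NO GENE IN IT IS A REAL DRIVER, SO EVERY CALL IS A FALSE POSITIVE, AND THE THRESHOLD IS SET AS IF ALL GENES SHARED THE AVERAGE RATE.

```python
import math

# Toy model (assumptions: 20,000 genes of 1,500 coding bases, 178 patients,
# Bonferroni threshold). Only the 2/3/6 per Mb rates and the 75%/25% split are from the talk.

def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))

def critical_count(lam0, alpha):
    """Smallest k with P(X >= k) <= alpha when the rate really is lam0."""
    cdf, k = 0.0, 0
    while 1.0 - cdf > alpha:
        cdf += poisson_pmf(k, lam0)
        k += 1
    return k

N_GENES, GENE_LEN, N_PATIENTS = 20000, 1500, 178
ALPHA = 0.05 / N_GENES
MB = 1e-6  # one mutation per megabase, expressed per base

def expected_false_positives(classes, assumed_rate):
    """classes: list of (fraction_of_genes, true_rate); none of these genes are drivers."""
    exposure = N_PATIENTS * GENE_LEN
    k = critical_count(exposure * assumed_rate, ALPHA)  # threshold from the flat model
    return sum(N_GENES * frac * poisson_sf(k, exposure * rate) for frac, rate in classes)

flat = [(1.0, 3 * MB)]
mixture = [(0.75, 2 * MB), (0.25, 6 * MB)]
mean_rate = sum(frac * rate for frac, rate in mixture)  # still 3 per Mb on average
print(expected_false_positives(flat, 3 * MB), expected_false_positives(mixture, 3 * MB))
```

THE FLAT CASE STAYS AT OR BELOW THE NOMINAL 0.05 EXPECTED FALSE POSITIVES BY CONSTRUCTION, WHILE THE HIGH-RATE QUARTER OF GENES IN THE MIXTURE CROSSES THE SAME THRESHOLD FAR MORE OFTEN, EVEN THOUGH THE AVERAGE RATE IS UNCHANGED.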
SO THE MODEL USED TO BE FLAT, BUT NOW IT LOOKS MORE LIKE THIS: DIFFERENT SITES IN THE GENOME, BASED ON DIFFERENT PARAMETERS, HAVE DIFFERENT MUTATION RATES. WE SUMMARIZE THIS IN A HUGE EQUATION THAT TAKES INTO ACCOUNT THE REPLICATION TIMING AND THE PATTERNS OF MUTATION; OVERALL, IT'S A KIND OF MASTER EQUATION OF THE FACTORS THAT AFFECT THE MUTATION RATE. THE NEXT QUESTION IS: ARE WE FINDING NEW GENES? ALL THESE STUDIES FIND KNOWN GENES IN THE MAP KINASE AND CELL CYCLE PATHWAYS, AND WE FIND THESE GENES, BUT ARE WE FINDING NEW GENES THAT WE NEVER SUSPECTED BEFORE IN CANCER? THE ANSWER IS YES, AND I'LL GIVE YOU SOME EXAMPLES. WE STUDIED MULTIPLE MYELOMA, AND THE MUTATIONS IN [INDISCERNIBLE] ARE ALL CLUSTERED IN THIS BINDING POCKET. WE DIDN'T KNOW ANYTHING ABOUT FAM46C, BUT ITS EXPRESSION LEVEL CORRELATES WITH THAT OF RNA PROCESSING GENES, AND THEREFORE WE EXPECT THAT IT'S INVOLVED IN RNA PROCESSING. IN CLL, THE SECOND MOST MUTATED GENE IS A SPLICING FACTOR, SF3B1, WHICH IS PART OF THE U2 SNRNP, AND THESE ARE THE DOMAINS OF SF3B1. WE SHOWED, BASED ON A REPORTER GENE, THAT WHEN THIS GENE IS MUTATED, THE RATIO OF UNSPLICED TO SPLICED TRANSCRIPT IS HIGHER IN THE MUTATED FORM. BUT TO TELL YOU THE TRUTH, WE DON'T KNOW WHAT IT DOES. IT'S MUTATED IN 14% OF CLL, AND IT'S A SPLICING FACTOR THAT, A YEAR BEFORE THIS FINDING, NO ONE WOULD HAVE SUSPECTED OF BEING RELATED TO CANCER. IN FACT, 2011 WAS THE YEAR OF THE SPLICING FACTOR, BECAUSE THEY WERE ALSO FOUND IN MYELODYSPLASTIC SYNDROME, AND WE FOUND U2AF1, WHICH IS ALSO IN THIS FAMILY, IN LUNG ADENOCARCINOMA. SO THE SPLICING MACHINERY IS NOW SIGNIFICANTLY ALTERED IN DIFFERENT TUMOR TYPES, AND WE NEED TO UNDERSTAND WHAT'S GOING ON AND WHAT THESE GENES ARE DOING. WE DON'T KNOW.
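ONE WAY TO MAKE THE BACKGROUND RATE DEPEND ON COVARIATES SUCH AS REPLICATION TIMING AND EXPRESSION IS TO ESTIMATE EACH GENE'S BACKGROUND FROM THE GENES MOST SIMILAR TO IT IN COVARIATE SPACE. THIS IS A MINIMAL SKETCH OF THAT IDEA, NOT THE "MASTER EQUATION" FROM THE TALK; THE GENES, COVARIATE VALUES, COUNTS AND THE CHOICE OF 5 NEIGHBORS ARE ALL HYPOTHETICAL.

```python
import math

# Sketch (assumption): pool each gene's background rate from its nearest neighbours in
# covariate space (replication time, expression). All genes and counts are made up.

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0."""
    cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)

def make_gene(name, rep_time, expression, count, bases=1_000_000):
    # `bases` = coding bases x patients sequenced, i.e. the exposure for this gene
    return {"name": name, "cov": (rep_time, expression), "count": count, "bases": bases}

# Early-replicating, highly expressed genes (low background) vs late, lowly expressed ones.
genes = [make_gene(f"early{i}", 0.10 + 0.01 * i, 0.90 - 0.01 * i, 1) for i in range(10)]
genes += [make_gene(f"late{i}", 0.90 - 0.01 * i, 0.10 + 0.01 * i, 10) for i in range(10)]
genes[10]["count"] = 12  # a late-replicating gene, e.g. an olfactory-receptor-like case

def global_rate(genes):
    return sum(g["count"] for g in genes) / sum(g["bases"] for g in genes)

def local_rate(genes, idx, k=5):
    """Pooled rate over the k genes closest to genes[idx] in covariate space (itself excluded)."""
    target = genes[idx]
    others = [g for j, g in enumerate(genes) if j != idx]
    others.sort(key=lambda g: math.dist(g["cov"], target["cov"]))
    nearest = others[:k]
    return sum(g["count"] for g in nearest) / sum(g["bases"] for g in nearest)

target = genes[10]
p_flat = poisson_sf(target["count"], target["bases"] * global_rate(genes))
p_local = poisson_sf(target["count"], target["bases"] * local_rate(genes, 10))
print(f"flat-model p = {p_flat:.3g}, covariate-matched p = {p_local:.3g}")
```

WITH A SINGLE GENOME-WIDE RATE, THE LATE-REPLICATING, LOWLY EXPRESSED GENE LOOKS SIGNIFICANT; AGAINST THE RATE OF ITS COVARIATE NEIGHBORS IT DOES NOT, WHICH MIRRORS HOW THE OLFACTORY RECEPTORS DROPPED OUT OF THE LIST.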
ANOTHER EXAMPLE IS SPOP. SPOP IS THE SUBSTRATE-BINDING UNIT OF AN E3 UBIQUITIN LIGASE, AND IT HAS THE HIGHEST RATE OF MUTATION, 13%, IN AGGRESSIVE PRIMARY PROSTATE CANCER. THIS IS A NEW GENE, AND THOSE MUTATIONS ARE AGAIN COLOCALIZED IN THE SUBSTRATE-BINDING POCKET. INTERESTINGLY, THESE MUTATIONS ARE ALSO ANTICORRELATED WITH THE KNOWN ETS FUSION GENE REARRANGEMENTS, SO IT SEEMS LIKE THIS IS A DIFFERENT SUBTYPE OF PROSTATE CANCER THAT HAS THESE MUTATIONS, BUT IT'S STILL EARLY TO SAY. THAT'S AGAIN WORK LED BY LEVI GARRAWAY. ANOTHER INTERESTING FINDING IS HOW SIMPLE A CANCER CAN BE. TAKE RHABDOID CANCERS, THE AT/RT CANCERS: WHEN YOU LOOK AT THEIR GENOME, THESE ARE NORMAL CELLS, THESE ARE TUMOR CELLS, AND THE COPY NUMBER IS FLAT. THERE ARE NEARLY NO MUTATIONS, 0.2 MUTATIONS PER MEGABASE, VERY FEW MUTATIONS PER SAMPLE, AND ONLY SMARCB1, A GENE IN THE SWI/SNF COMPLEX, IS BIALLELICALLY LOST IN ALL THE CASES. SO IT'S BASICALLY A SINGLE-GENE CANCER: YOU HAVE TO LOSE THIS CHROMATIN REMODELING GENE IN ALL OF THESE CANCERS, AND THAT'S THE ONLY THING GOING ON. THERE ARE A FEW MUTATIONS IN DIFFERENT PLACES, BUT THEY'RE NOT RECURRENT. NOTHING ELSE IS GOING ON IN THIS GENOME BUT THIS GENE, AND IT'S AMAZING TO SEE THAT A SINGLE GENE, PROBABLY BECAUSE IT IS LOST AT SOME DEVELOPMENTAL STAGE, CAUSES THIS RARE CHILDHOOD CANCER. THE LAST OPEN QUESTION IS WHAT I CALL THE DARK MATTER OF CANCER GENOMICS. IF WE LOOK AT ALL OUR PATIENTS AND AT ALL THE DRIVER GENES, EITHER BY MUTATION OR BY COPY NUMBER CHANGE, AND WE TALLY THEM IN LUNG CANCER, WE SEE THAT IN 38% OF THE PATIENTS THERE IS NO EVENT IN ANY DRIVER GENE. SO WHAT'S CAUSING CANCER IN A THIRD OF LUNG CANCERS? WE DON'T KNOW.
SO EITHER THIS IS METHYLATION OR EPIGENETIC CHANGES THAT WE DIDN'T [INDISCERNIBLE], OR MUTATIONS IN REGULATORY REGIONS OF GENES THAT WE DIDN'T FIND BECAUSE THIS IS EXOME DATA. ANOTHER THEORY IS THAT THERE ARE MORE CANCER GENES, MUTATED AT VERY, VERY LOW PERCENTAGES, AND IF WE KNEW ALL OF THEM WE COULD FILL THE DARK MATTER. THERE IS A HOLE IN EVERY TUMOR TYPE; THE HOLE IS DIFFERENT IN EACH, BUT FOR A THIRD TO HALF OF THE CANCERS WE DON'T KNOW WHAT CAUSES THEM, AND THAT'S A BIG HOLE THAT WE NEED TO FILL. SO, JUST TO SUMMARIZE THE CHALLENGES SO FAR: WE HAD 2 BIG TASKS. FOR CHARACTERIZATION, NEXT-GENERATION SEQUENCING GIVES US THE ABILITY TO CHARACTERIZE THE GENOMES. WE HAD THE FLOOD OF DATA, FOR WHICH WE DEVELOPED THE FIREHOSE INFRASTRUCTURE AND PIPELINES; THERE'S SPECIFICITY FOR RARE, RARE EVENTS; AND SENSITIVITY DEPENDS ON THE COVERAGE AND THE ALLELIC FRACTION, WHICH MATTERS FOR FINDING SUBCLONAL MUTATIONS. THEN, FOR INTERPRETATION, WE NEED TO MODEL THE BACKGROUND MUTATION RATE, AMONG OTHER TOOLS. WE COULD INCREASE POWER BY USING PRIOR KNOWLEDGE ABOUT SITES OR PATHWAYS. THERE'S A VARIABLE MUTATION RATE ALONG THE GENOME THAT WE NEED TO MODEL, AND THERE'S THIS DARK MATTER OF CANCER GENOMICS. AND WE'RE FINDING NEW GENES, LIKE SF3B1, OR [INDISCERNIBLE] IN MYELOMA, OR IN THE SPLICING PROCESS, SO NOT JUST NEW GENES BUT NEW PATHWAYS THAT WERE NOT SUSPECTED BEFORE. SO WHAT ARE THE REMAINING CHALLENGES AND OPPORTUNITIES? LET'S SAY WE DO ALL THAT: WE RUN TCGA AND OTHER PROJECTS, AND WE HAVE A LIST OF KEY DRIVERS FOR EVERY TUMOR TYPE. THIS IS THE KIND OF ANALYSIS WE DO. BUT ARE THESE REALLY DRIVERS? HOW CAN WE SHOW EXPERIMENTALLY THAT THEY CAUSE CANCER? THEY CAME OUT OF THE ALGORITHMS, SO WE NEED TO VALIDATE THEM EXPERIMENTALLY AND UNDERSTAND WHICH HALLMARKS OF CANCER THEY ACTUALLY SERVE.
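THE POINT THAT SENSITIVITY DEPENDS ON COVERAGE AND ALLELIC FRACTION CAN BE MADE CONCRETE WITH A BINOMIAL READ MODEL. FOR ILLUSTRATION ONLY, THIS ASSUMES A MUTATION IS CALLED WHEN AT LEAST 3 READS SUPPORT IT (A HYPOTHETICAL THRESHOLD), AND THAT A HETEROZYGOUS MUTATION IN A PURE, DIPLOID TUMOR HAS AN ALLELIC FRACTION OF 0.5, OR ABOUT 0.1 IF ONLY ROUGHLY 20% OF CELLS CARRY IT. REAL CALLERS ALSO MODEL SEQUENCING ERRORS AND TUMOR PURITY.

```python
from math import comb

# Sketch (assumption): alt-supporting reads ~ Binomial(depth, allelic_fraction);
# a mutation is "detected" when at least `min_alt` reads support it.

def detection_sensitivity(depth: int, allelic_fraction: float, min_alt: int = 3) -> float:
    """P(at least min_alt alt reads) for X ~ Binomial(depth, allelic_fraction)."""
    p_below = sum(comb(depth, k) * allelic_fraction**k * (1 - allelic_fraction)**(depth - k)
                  for k in range(min_alt))
    return 1.0 - p_below

for depth in (30, 100, 150):
    # clonal heterozygous mutation (0.5) vs. a subclonal one in ~20% of cells (0.1)
    print(depth, round(detection_sensitivity(depth, 0.5), 3),
          round(detection_sensitivity(depth, 0.1), 3))
```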
SO ONE WAY TO DO IT IS COMPUTATIONALLY, LIKE WE DID FOR FAM46C: LOOK AT ITS EXPRESSION LEVEL ACROSS SAMPLES, COMPARED TO OTHER GENES, AND START TO UNDERSTAND THE ROLES OF THESE GENES. WE ALSO DO AS MANY EXPERIMENTS AS WE CAN, ALONG WITH OTHER EFFORTS THAT YOU ARE ALL AWARE OF, AND TRY TO UNDERSTAND WHAT THE CONSEQUENCES OF THESE MUTATIONS ARE AND WHICH HALLMARKS OF CANCER THEY SERVE. THEN THE NEXT CHALLENGE: EVEN IF WE KNOW THE CONTEXT AND WE KNOW WHAT THE GENES DO, HOW DO WE INTERVENE? WHAT IS THE VULNERABILITY OF A CELL THAT HAS A PARTICULAR MUTATION, AND CAN WE MODEL AND PREDICT THE EFFECT OF DIFFERENT ACTIONS? IF WE GIVE THIS DRUG TO THIS CELL, WILL IT KILL THE CELL? WHICH MUTATIONS DEVELOP, HOW DO TUMORS EVOLVE AS SUBCLONES EMERGE, AND HOW DO SUBCLONES INTERACT WITH EACH OTHER? THESE ARE ALL STILL QUESTIONS THAT WE NEED TO SOLVE. THIS IS ALL RESEARCH, BUT THE PARALLEL EFFORT IS TO MOVE ALL THIS KNOWLEDGE INTO THE CLINIC. SO HERE'S THE PIPELINE FOR RESEARCH. IN THE CLINIC, THE CHARACTERIZATION PHASE IS QUITE SIMILAR; IT NEEDS TO WORK FROM FFPE AND SMALLER AMOUNTS OF DNA, BUT WE HAVE SHOWN THAT WE COULD DO THAT, SO THAT KIND OF WORKS. THE NEXT STEP IS ANNOTATION. ONCE WE FIND A MUTATION, WE NEED TO SAY: THIS MUTATION ON CHROMOSOME 7 IS IN BRAF, IT'S SIGNIFICANT, IT'S MUTATED IN MELANOMA, IT'S AN ACTIVATING MUTATION, AND WE KNOW THAT IT'S SENSITIVE TO THE PLEXXIKON DRUG, SO WE ANNOTATE ALL OF THIS ONTO THE EVENT. AND THEN WE NEED TO MOVE TO THE CLINICAL DECISION PHASE: BASED ON THE PATIENT DATA, THE TUMOR CONTEXT AND ALL THE POSSIBLE ACTIONS, WHAT'S THE NEXT TREATMENT WE NEED TO GIVE THE PATIENT?
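THE ANNOTATION STEP, ATTACHING WHAT IS KNOWN ABOUT A MUTATION TO THE EVENT, CAN BE PICTURED AS A LOOKUP INTO A KNOWLEDGE BASE. THE SCHEMA, THE FIELD NAMES AND THE FUNCTION annotate ARE HYPOTHETICAL; THE SINGLE ENTRY MIRRORS THE BRAF EXAMPLE FROM THE TALK, WITH V600E CHOSEN AS THE ILLUSTRATIVE ACTIVATING CHANGE. A REAL KNOWLEDGE BASE WOULD HOLD CURATED, VERSIONED EVIDENCE.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy knowledge base keyed by (gene, protein change); illustrative only.

@dataclass(frozen=True)
class Annotation:
    gene: str
    chromosome: str
    significant_in: frozenset  # tumor types where the gene is significantly mutated
    effect: str                # e.g. "activating"
    sensitive_to: tuple        # drug classes with reported sensitivity

KNOWLEDGE_BASE = {
    ("BRAF", "V600E"): Annotation("BRAF", "7", frozenset({"melanoma"}),
                                  "activating", ("BRAF inhibitor",)),
}

def annotate(gene: str, protein_change: str, tumor_type: str) -> Optional[dict]:
    """Attach what is known about an event; None means it is not in the table."""
    ann = KNOWLEDGE_BASE.get((gene, protein_change))
    if ann is None:
        return None
    return {
        "gene": ann.gene,
        "chromosome": ann.chromosome,
        "significant_in_this_tumor_type": tumor_type in ann.significant_in,
        "effect": ann.effect,
        "candidate_actions": list(ann.sensitive_to),
    }

print(annotate("BRAF", "V600E", "melanoma"))
```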
WE NEED TO HAVE DYNAMIC CLINICAL TRIALS, BECAUSE WE LEARN THE RULES AS WE GO AND APPLY THEM TO THE NEXT PATIENTS. THERE ARE TOO MANY RULES TO LEARN, AND WE CAN'T WAIT UNTIL EACH TRIAL ENDS AND THEN START ANOTHER ONE; IT WOULD BE TOO SLOW. THERE'S COMBINATION THERAPY: WE NEED TO HIT MULTIPLE PLACES TO ACTUALLY BLOCK THE PATHWAYS THAT ARE ALTERED. SO, WE KNOW [INDISCERNIBLE]. WE ARE BUILDING A CANCER ACTION DATABASE, BUT WE NEED A NATIONAL EFFORT, AND THERE'S ONE GOING ON, TO MAKE A DATABASE OF ALL MUTATIONS AND THE POTENTIAL ACTIONS THAT WE COULD TAKE, FOR EXAMPLE IN THE CASE OF THIS [INDISCERNIBLE], ET CETERA. THE NEXT THING IS TO REALLY BUILD AN INTERACTIVE SYSTEM THAT HELPS CLINICIANS MAKE DECISIONS. SO, JUST TO REMIND YOU OF THE BIG PICTURE: MEDICINE IN GENERAL, NOT JUST CANCER, AND PRESCRIBING TREATMENT IN PARTICULAR, IS BECOMING AN INFORMATION-RICH DISCIPLINE. IT'S NOT JUST TAKING A FEW MEASUREMENTS; WE GET GIGABYTES OR TERABYTES OF PATIENT DATA, IMAGE DATA, PROTEOMIC DATA AND ELECTRONIC MEDICAL RECORDS, AND ALL THESE DATA NEED TO BE INTEGRATED TO MAKE THE NEXT DECISION. HERE'S AN EXAMPLE: THIS IS THE PREBIRTH CONDITION, AND AT SOME POINT, MAYBE PEOPLE WILL HAVE A WHOLE GENOME SEQUENCE AT BIRTH.
AND X-RAYS, AND AT SOME POINT THEY HAVE CANCER, SO THEY HAVE A BIOPSY, AND WE SEQUENCE THE DNA AND RNA AND MEASURE METHYLATION. THEN THEY'RE TREATED, AND YOU MONITOR THE TREATMENT WITH CIRCULATING TUMOR CELLS AND OTHER MEASUREMENTS. THEN THEY RELAPSE, SO YOU SEQUENCE THE RELAPSE, AND THEN YOU DECIDE WHAT TO DO. YOU INTEGRATE ALL THESE DATA TOGETHER TO MAKE THE DECISION FOR THAT PATIENT, AND THAT'S A CHALLENGE THAT WE NEED TO ADDRESS. BUT AS WE GET MORE DATA, WE SUFFER FROM THE CURSE OF DIMENSIONALITY, AND IT FRAGMENTS THE PATIENTS: EVERY PATIENT IS DIFFERENT, WITH ALL THESE DIFFERENT MUTATIONS, AND AN IMPORTANT PART WE NEED TO FIGURE OUT IS WHICH PARAMETERS ARE IMPORTANT AND RELEVANT. EACH PATIENT, WITH THEIR HISTORY, IS MODELED AS A POINT IN A VERY HIGH-DIMENSIONAL SPACE, AND WE'RE LOOKING FOR RARE CASES, SUCH AS COMPLETE RESPONSES. SO WE'RE LOOKING FOR RARE EVENTS IN A HIGH-DIMENSIONAL SPACE. COMPUTATIONALLY, THIS IS NOT AN EASY PROBLEM, BECAUSE YOU HAVE TO LOOK FOR RARE EVENTS AND YOU NEED MANY POINTS IN THE SPACE TO LEARN THOSE RULES. THAT'S WHY WE NEED A NATIONAL, AND MAYBE A GLOBAL, REPOSITORY OF DATA FROM WHICH WE COULD LEARN THESE RULES, SO WE NEED TO BUILD A SYSTEM THAT CAN HANDLE MILLIONS OF PATIENTS, OR DATA FROM MILLIONS OF SAMPLES. THIS IS ANOTHER SOURCE. WE NEED TO BUILD A GLOBAL AND SECURE SYSTEM TO SHARE, QUERY AND ANALYZE DATA AND TO USE IT IN THE CLINIC, SO WE CAN ACT ON THESE FINDINGS RAPIDLY. SO, FINALLY, I WANT TO AGAIN THANK ALL THE PEOPLE IN THESE PROJECTS AND MY COLLABORATORS, LIKE MATTHEW AND LEVI AND ERIC AND STACIE AND CATHY, AND MANY OTHER COLLABORATORS WHO ARE ALL INVOLVED, AND OF COURSE THE MANY SOFTWARE ENGINEERS WHO BUILD ALL THESE PIPELINES AND TOOLS THAT HANDLE THIS AMOUNT OF DATA. AND NOW I'M HAPPY TO TAKE QUESTIONS. [ APPLAUSE ] YES, FRANK? >> SO HOW DO YOU ASSESS ALL THESE GENOME SEQUENCE SAMPLES [INDISCERNIBLE] MICRO RNA, ALL THOSE, CAPTURED WITH A PARTICULAR VERSION OF THAT.
DO THE WHOLE THING OVER AGAIN WITH WHOLE GENOME SEQUENCING AND SEE WHAT'S MISSED? >> IT'S A COMPLICATED QUESTION. YES, I THINK WHOLE GENOME SEQUENCING WILL HELP. HOWEVER, AS YOU SAW, YOU NEED DEEP SEQUENCING TO FIND THE MUTATIONS, SO YOU REALLY NEED TO DO 100 OR 150X WHOLE GENOME SEQUENCING, AND THAT'S STILL AN EXPENSIVE EXPERIMENT. SO THAT IS ONE THING. AND I DO BELIEVE THAT WE WILL NEED TO GO BACK AND SEQUENCE THOSE SAMPLES. I DON'T THINK THAT AFTER TCGA AND SIMILAR PROJECTS, THAT'S IT, WE'RE DONE FOREVER AND GOING HOME. THERE ARE BETTER TECHNOLOGIES, AND EPIGENETICS, SEQUENCING TO UNDERSTAND [INDISCERNIBLE] AND OTHER EPIGENETIC MARKS; WE NEED TO REALLY COMPLETE THE PICTURE OF THE DARK MATTER. SO I DON'T THINK IT'S THE END OF THE STORY, BUT WE'RE GETTING CLOSER AND CLOSER. WITH SOME SAMPLES THAT HAVE WHOLE GENOME SEQUENCING, WE CAN LOOK FOR NONCODING REGIONS THAT ARE IMPORTANT. I KNOW OF MAYBE ONE OR TWO EXAMPLES, AND THEY MIGHT BE FUNCTIONAL, BUT IT'S NOT A COMMON THEME THAT WE SEE, AND WE COULD CAPTURE THEM WITH THE EXOME PLUS SOME REGULATORY REGIONS. SO THERE'S STILL A GAP BETWEEN THE EXOME AND THE WHOLE GENOME, AND THERE'S STILL A WAY TO EXTEND THE EXOME TO COVER THOSE REGIONS, BUT STUDIES WITH DEEP WHOLE GENOMES WOULD BE INTERESTING TO DO. STEPHEN? >> [INAUDIBLE QUESTION FROM AUDIENCE ] >> SO, FOR EXAMPLE, THE FIRST EXON OF RB1 IS A GC-RICH EXON, SO THERE ARE NO READS FROM IT. WE ARE BLIND TO THE FIRST EXON OF RB1, SO WE COULD BE MISSING MUTATIONS THERE. THIS COULD BE IMPROVED BY CHANGING THE BUFFER AND OTHER CONDITIONS AND GOING TO LESS PCR, BECAUSE THE PCR CONDITIONS BASICALLY LOSE GC-RICH REGIONS. THERE IS AN EFFORT TO COMPLETE THE GENOME, AND WE WANT TO COMPLETE THE PARTS THAT WE'RE INTERESTED IN. BUT WE MIGHT NEED MULTIPLE APPROACHES, SUCH AS DOING 2 CAPTURES OR SEQUENCING UNDER DIFFERENT CONDITIONS THAT WOULD BE MORE FAVORABLE FOR HIGH GC VERSUS LOW GC, AND THERE COULD BE OTHER WAYS TO COMPLETE THE GENOME.
SO THERE ARE EFFORTS IN DOING THAT, AND I ALREADY KNOW THAT FOR WHOLE GENOME SEQUENCING WE'RE DOING MUCH BETTER THAN WE USED TO. YOU SEE THESE GRAPHS OF COVERAGE AS A FUNCTION OF GC CONTENT: THEY USED TO DROP LIKE THIS, AND NOW THEY STILL DROP, BUT THEY DON'T REACH 0; THEY DROP MAYBE 3-FOLD OR SOMETHING LIKE THAT, BUT NOT 20-FOLD OR 50-FOLD, AND THAT'S A BIG DIFFERENCE. THERE'S WORK GOING ON IN THAT DIRECTION. REGARDING THE ENCODE DATA, WE ARE PLANNING TO DEVELOP AN EXTENDED EXOME CAPTURE REAGENT THAT CONTAINS NONCODING REGIONS, BASED ON REGULATORY REGIONS FROM THE ENCODE DATA. YOU SEE TRACKS THAT CONTAIN ALL THESE DIFFERENT SITES, AND THEN WE BASICALLY CAPTURE A LARGER FRACTION OF THE GENOME AND MIGHT SEAL THOSE GAPS WITH CAPTURE TECHNOLOGY. SO WE ARE WORKING TOWARDS ANSWERING BOTH THESE QUESTIONS, BUT MAYBE THE END POINT WOULD BE DEEP WHOLE GENOME SEQUENCING OF SAMPLES. THIS IS STILL NOT FEASIBLE IN GENERAL, THOUGH I THINK IN A FEW CASES IT IS POSSIBLE. YEAH. >> [INDISCERNIBLE] SO WHAT PERCENTAGE OF THE TOTAL READS DO NOT ALIGN TO THE REFERENCE GENOME? >> I FORGET THE DETAILS OF THAT, BUT IT'S ROUGHLY [INDISCERNIBLE] PERCENT. WE ARE EAGERLY WAITING FOR THE NEXT VERSION OF THE HUMAN GENOME. WE KNOW THAT MANY OF THE MISTAKES WE MENTIONED COME FROM THE BUILD: WE TRY TO PLACE EACH READ IN THE BEST PLACE WE CAN, BUT WE DON'T KNOW ABOUT SOME OTHER AREAS OF THE GENOME. WE ALREADY KNOW THAT, AS PART OF THE 1000 GENOMES PROJECT, THERE ARE DECOY SEQUENCES, AND IF WE WERE TO ALIGN TO SUCH AN EXTENDED GENOME WE WOULD NOT MAKE THESE ALIGNMENT ERRORS THAT CAUSE FALSE POSITIVES. SO WE ARE EAGERLY AWAITING THE GENOME THAT WE COULD ALIGN TO. TCGA IS ALL WORKING ON HG19, BUT I HEARD THAT NEXT WE WILL HAVE HG20, OR BASICALLY BUILD 38, AND I THINK THAT WOULD IMPROVE THE ALIGNMENT TO THE GENOME A LOT. SO IF WE COULD MAKE IT AVAILABLE EARLIER, THAT WOULD BE BETTER, AND FOR THE PEOPLE INVOLVED IN THAT, IT'S CRUCIAL TO DO.
>> YOU MENTIONED COPY NUMBER AND PATHWAYS, AND THAT WOULD [INDISCERNIBLE] 2 OR 3% [INDISCERNIBLE], AND YOU HAVE THIS IN THE GENE. BUT AT THE END OF THE DAY, THE NEW THINGS EASILY FIT INTO A MODEL IN WHICH YOU HAVE QUANTITATIVE CHANGES THAT DISRUPT NETWORK BEHAVIOR, AND THAT REALLY IS THE CRITICAL ACTIVITY. IN THE END, WHEN YOU GO TO YOUR CLINICAL ISSUES, THAT'S REALLY WHERE THE VALUE IS GOING TO BE: REBALANCING THOSE QUANTITATIVE EFFECTS, AND DOING IT IN A WAY THAT IS NOT EASY TO ESCAPE. SO COULD YOU TALK A LITTLE BIT MORE ABOUT HOW YOU'RE APPROACHING THOSE MORE QUANTITATIVE CHANGES, EVEN WHEN YOU DON'T FIND REGULATORY MUTATIONS? >> SO, YES. FIRST OF ALL, WE'RE WORKING ON TOOLS THAT INTEGRATE. TAKE COPY NUMBER: WE ARE NOW WORKING ON TOOLS THAT INTEGRATE ALL THE DATA TYPES, BECAUSE I DON'T REALLY CARE WHETHER THE GENE IS AMPLIFIED OR MUTATED. IT'S ALTERED, AND I WANT TO CAPTURE ALL ALTERATIONS IN THE GENE. SO WE'RE WORKING ON ONE TOOL THAT DOES THE SIGNIFICANCE CALCULATION FOR GENES AND PATHWAYS, INCLUDING ALL THE CHANGES. THAT IS ONE PART OF YOUR QUESTION. THE OTHER PART IS COMPLICATED. WE'RE USING PROTEIN-PROTEIN INTERACTION [INDISCERNIBLE] AND OTHER DATABASES TO BUILD THIS GRAPH, AND WE'RE LOOKING FOR REGIONS IN THE GRAPH THAT HAVE MORE MUTATIONS THAN YOU EXPECT BY CHANCE. BUT THESE ARE COMPLICATED PROBLEMS, BECAUSE YOU COULD CUT THE GRAPH IN MANY DIFFERENT WAYS, SO YOU NEED TO DO IT EFFICIENTLY, WITHOUT GENERATING TOO MANY HYPOTHESES, BECAUSE THEN YOU PAY FOR MULTIPLE HYPOTHESIS TESTING. I DIDN'T GO INTO THE DETAILS OF THE ANALYSIS, BUT EVERYTHING I SHOWED IS FULLY STATISTICALLY RIGOROUS, WITH P-VALUE AND Q-VALUE CALCULATIONS, AND NOW WE ALSO HAVE OTHER VERSIONS OF THIS APPROACH, SO ALL OF THIS IS STATISTICALLY SOUND. AS FOR WHAT YOU MENTIONED LAST, THE CLINICAL ASPECT: YEAH, WE DON'T KNOW YET.
I MEAN, ONCE WE FIND A MUTATION IN A CLINICAL SAMPLE AND DO THE ANNOTATION, SOMETIMES WE DON'T KNOW ANYTHING ABOUT THIS MUTATION. MAYBE WE COULD PLACE IT IN A PATHWAY, BUT MAYBE NOT, AND THIS IS A CHALLENGE THAT WE NEED TO ADDRESS, MAYBE BY LOOKING AT THE NUMBER OF HOPS AWAY IT IS IN THE GRAPH FROM WELL-KNOWN CANCER GENES. THESE ARE THE MATHEMATICAL TOOLS WE HAVE, BUT SINCE THOSE GRAPHS ARE NOT PERFECT, I THINK MORE RESEARCH IS STILL NEEDED INTO WHAT THE RELEVANT PATHWAYS AND THE RELEVANT GENES ARE. WE DON'T KNOW EVERYTHING, BUT WE'RE GETTING THERE. YOU COULD LOOK AT THE FULL HALF OF THE GLASS: THERE'S A PART WE DON'T KNOW HOW TO EXPLAIN, BUT WE DO KNOW HOW TO EXPLAIN 2/3 OF THE CANCERS, SO THAT'S A BIG FRACTION, AND IF WE HAD TARGETED THERAPIES AGAINST THE COMBINATIONS OF THESE EVENTS, MAYBE WE COULD TREAT A REASONABLE FRACTION OF THE POPULATION. SO THAT'S THE GOOD SIDE. BUT I DON'T THINK WE'LL BE CLOSE TO FULLY UNDERSTANDING CANCER WHEN TCGA ENDS. YEAH. >> AGAIN, FOLLOWING ON RON'S POINT, IT SEEMS TO ME THAT WHEN WE FIND A MUTATION, WHAT IT'S TELLING US MORE THAN ANYTHING ELSE IS THAT THE PATHWAY WAS IMPORTANT AT SOME POINT. AND THEN THERE CAN ACTUALLY BE TUMORS THAT DO NOT HAVE A MUTATION IN THAT PATHWAY BUT RELY ON IT. IT SEEMS THAT SOME OF THEM ACQUIRE A MUTATION THAT SORT OF HARDWIRES THAT PATHWAY AND LOSE DEPENDENCE ON THE MICROENVIRONMENT, AMONG OTHER ISSUES, BUT IT OFTEN STARTS OUT AS, LIKE, A CYTOKINE AUTOCRINE OR PARACRINE PATHWAY FROM THE MICROENVIRONMENT, AND THEN THAT'S THE PATHWAY THAT THE CANCER CHOOSES TO FIX IN PLACE. >> THAT'S EXACTLY RIGHT. I MAYBE WENT THROUGH IT TOO FAST, BUT THIS SLIDE SHOWS THE SIGNIFICANCE ANALYSIS FOR THAT, AND IT STARTS TO ADDRESS THIS QUESTION.
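COUNTING HOPS FROM WELL-KNOWN CANCER GENES IN AN INTERACTION GRAPH AMOUNTS TO A BREADTH-FIRST SEARCH STARTED FROM ALL KNOWN GENES AT ONCE. THE GRAPH AND GENE NAMES BELOW ARE PLACEHOLDERS, NOT REAL INTERACTIONS, AND BECAUSE REAL INTERACTION GRAPHS ARE INCOMPLETE, A HOP COUNT IS WEAK EVIDENCE AT BEST.

```python
from collections import deque

# Multi-source BFS: hops from each gene to the nearest known cancer gene.
# The toy graph and gene names are hypothetical placeholders.

def hops_to_known(graph, known):
    dist = {g: 0 for g in known}
    queue = deque(known)
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:           # first visit gives the shortest hop count
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist                           # genes missing from the result are unreachable

# Undirected toy graph stored as adjacency lists in both directions.
edges = [("DRIVER1", "GENE_A"), ("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"),
         ("GENE_D", "GENE_E")]
graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)

result = hops_to_known(graph, ["DRIVER1"])
print(result)
```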
WE'RE LOOKING FOR PATHWAYS, AND YOU COULD THINK OF IT AS A STAGED DESIGN. FIRST, WE FIND GENES THAT ARE SIGNIFICANT ON THEIR OWN; FOR EXAMPLE, MYD88 IS SIGNIFICANT ON ITS OWN. THEN WE TEST FOR SIGNIFICANCE ONLY THE PATHWAYS THAT INCLUDE THESE KNOWN DRIVER GENES, SO WE DON'T NEED TO TEST ALL THE PATHWAYS, JUST A FEW, AND THAT HELPS US BECAUSE IT'S FEWER HYPOTHESES. ONCE WE FIND SOME OF THEM, WE LOOK AT THE OTHER GENES IN THOSE PATHWAYS: DO THEY HAVE MORE MUTATIONS THAN YOU EXPECT BY CHANCE? THAT'S THE NEXT QUESTION. THEN WE COULD TAKE PATHWAYS IN WHICH NONE OF THE GENES ARE SIGNIFICANT AND SEE WHETHER THESE PATHWAYS ARE MUTATED MORE THAN EXPECTED BY CHANCE. THAT IS THE THIRD STAGE OF THIS TEST, AND THAT'S BASICALLY THE STAGED DESIGN. >> I WANT TO REITERATE THAT THE LAST STAGE SEEMS LIKE IT COULD BE TUMORS THAT ARE QUASI-NORMAL. [INDISCERNIBLE] SURELY A GENETIC DISEASE [INDISCERNIBLE]. >> RIGHT, I TOTALLY AGREE. MOTHER CELLS PASS INFORMATION TO DAUGHTER CELLS IN MULTIPLE WAYS, THROUGH EPIGENETIC CHANGES OR, UNLIKE IN EVOLUTION, THROUGH PROTEINS FROM THE MOTHER CELL THAT GO INTO THE DAUGHTER CELLS. YOU COULD HAVE A KIND OF PRION-LIKE TRANSFER OF INFORMATION FROM SOME CELLS TO OTHER CELLS. IT WOULD BE A VERY NICE FINDING TO FIND SUCH A THING, AND I BET YOU COULD GET A NOBEL PRIZE. YEAH. >> OKAY, BEFORE WE AWARD THE NOBEL PRIZE, ONE LAST QUESTION ABOUT THE DARK MATTER: AS A RESEARCH PROPOSITION, CAN YOU AT LEAST ASSESS HOW MUCH OF THAT DARK MATTER IS REGULATORY IN ONE WAY OR ANOTHER, BY MAKING THE CONJECTURE THAT IT WILL BE REFLECTED IN ALTERATIONS IN RNA? YOU WORK ON THE RNA SIDE AS WELL AS THE DNA SIDE RIGHT NOW.
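THE STAGED DESIGN CAN BE SKETCHED AS FOLLOWS. THE THREE STAGES FOLLOW THE DESCRIPTION ABOVE; THE POISSON MODEL, THE BONFERRONI CORRECTION WITHIN EACH STAGE, AND ALL GENE NAMES, PATHWAYS AND COUNTS ARE ASSUMPTIONS FOR ILLUSTRATION.

```python
import math

# Staged pathway testing sketch (assumptions: Poisson counts, Bonferroni within each
# stage; gene/pathway names and counts are hypothetical).

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam); lam may be 0 only when k == 0."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)

def staged_pathway_tests(counts, exposure, bg_rate, pathways, alpha=0.05):
    # Stage 1: single genes, corrected for the number of genes tested.
    m1 = len(counts)
    sig_genes = {g for g in counts
                 if poisson_sf(counts[g], exposure[g] * bg_rate) <= alpha / m1}
    # Stages 2 and 3: split pathways by whether they contain a stage-1 gene.
    stage2 = [p for p, members in pathways.items() if members & sig_genes]
    stage3 = [p for p in pathways if p not in stage2]
    results = {}
    for label, names in (("stage2", stage2), ("stage3", stage3)):
        m = max(len(names), 1)  # fewer hypotheses in a stage -> milder correction
        for p in names:
            rest = pathways[p] - sig_genes       # exclude genes already significant alone
            observed = sum(counts[g] for g in rest)
            expected = sum(exposure[g] for g in rest) * bg_rate
            results[p] = (label, poisson_sf(observed, expected) <= alpha / m)
    return sig_genes, results

counts = {"G1": 15, "G2": 4, "G3": 4, "G4": 1, "G5": 0, "G6": 1}
exposure = {g: 1_000_000 for g in counts}  # coding bases x patients per gene
pathways = {"P1": {"G1", "G2", "G3"}, "P2": {"G4", "G5", "G6"}}
sig_genes, results = staged_pathway_tests(counts, exposure, 1e-6, pathways)
print(sig_genes, results)
```

IN THIS TOY DATA, G1 IS SIGNIFICANT ON ITS OWN, SO P1 IS TESTED IN STAGE 2 ON G2 AND G3 ONLY, WHOSE COMBINED EXCESS PASSES EVEN THOUGH NEITHER GENE PASSES ALONE; P2 FALLS TO STAGE 3 AND DOES NOT PASS.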
SO ONE WOULD PREDICT THAT WHETHER IT'S METHYLATION, OR A MUTATION IN A REGULATORY MODULE, OR EVEN SOME EXOTIC REGULATORY RNA, IF IT'S GOING TO AFFECT THE OUTPUT FROM THE NUCLEUS, IT WILL PROBABLY BE REFLECTED AT SOME RNA LEVEL. >> ABSOLUTELY RIGHT. FIRST OF ALL, IT'S ABSOLUTELY RIGHT, AND HERE IS WHY IT'S EASIER TO START WITH DNA THAN WITH RNA. IF YOU SEE, FOR EXAMPLE, CDKN2A METHYLATED, SOMETIMES YOU SEE LOW EXPRESSION; YOU DON'T KNOW WHY IT HAS LOW EXPRESSION, BUT YOU COULD INTEGRATE THAT INTO THE MODEL. THE COMPLEXITY OF USING RNA, AND THE REASON IT COMES LATER IN THE GAME, IS THAT YOU DON'T HAVE A CONTROL AS YOU DO FOR DNA. FOR RNA, YOU DON'T KNOW THE CELL OF ORIGIN AND YOU DON'T KNOW WHAT'S NORMAL IN THAT STATE, SO IT'S DIFFICULT TO KNOW THAT THE EXPRESSION IS LOW. BUT BY COMPARING MULTIPLE TUMORS, YOU CAN SEE WHETHER THE EXPRESSION OF A GENE LOOKS LIKE THE CASES WHERE YOU DO KNOW THE CAUSE, WHERE YOU SEE STOP CODONS. IF ANOTHER SAMPLE HAS LOW EXPRESSION BUT YOU DON'T KNOW WHY, YOU CAN ANNOTATE IT AS POTENTIALLY HARBORING SOME EVENT, EVEN IF YOU DON'T KNOW WHAT IT IS, AND THEN DO THE STATISTICS ON THAT. IT'S A GOOD IDEA; WE SHOULD DO THAT. >> THAT'S WHAT I WANT TO SEE AND PLAY WITH; WE CAN TALK MORE ABOUT IT LATER. >> OKAY. >> ARE THERE ANY OTHER BURNING QUESTIONS? OKAY, NO OTHER BURNING QUESTIONS. THANK YOU SO MUCH. [ APPLAUSE ]
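THE IDEA OF FLAGGING A TUMOR WHOSE EXPRESSION OF A GENE IS FAR BELOW THAT OF THE OTHER TUMORS CAN BE SKETCHED AS AN OUTLIER SCORE. THE MEDIAN/MAD ROBUST Z-SCORE, THE -3.5 CUTOFF AND THE EXPRESSION VALUES ARE ASSUMPTIONS; A FLAGGED SAMPLE IS ONLY A CANDIDATE FOR "SOME EVENT" WHOSE CAUSE IS UNKNOWN.

```python
import statistics

# Sketch (assumption): flag tumors whose expression of one gene is far below the rest,
# using a median/MAD robust z-score. Threshold and values are illustrative.

def low_expression_outliers(values, threshold=-3.5):
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []                  # no spread: outliers cannot be scored this way
    scale = 1.4826 * mad           # makes MAD comparable to a standard deviation
    return [i for i, v in enumerate(values) if (v - med) / scale < threshold]

# Hypothetical expression of one gene (e.g. CDKN2A) across seven tumors.
expression = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 0.5]
print(low_expression_outliers(expression))  # prints [6]
```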