>> GOOD AFTERNOON. WELCOME TO THE NIH MEDICINE: MIND THE GAP SEMINAR SERIES. THIS SEMINAR SERIES EXPLORES RESEARCH DESIGN, MEASUREMENT, INTERVENTION, DATA ANALYSIS AND OTHER METHODS OF INTEREST TO PREVENTION SCIENCE. OUR GOAL IS TO ENGAGE THE PREVENTION RESEARCH COMMUNITY IN THOUGHT-PROVOKING DISCUSSIONS TO PROMOTE THE USE OF THE BEST AVAILABLE METHODS IN PREVENTION RESEARCH AND TO SUPPORT THE DEVELOPMENT OF BETTER METHODS. BEFORE WE BEGIN, I HAVE SOME HOUSEKEEPING ITEMS. TO PARTICIPATE BY TWITTER, FOLLOW US AT @NIHPREVENT, HASHTAG NIH -- PREVENTION@MAIL.NIH.GOV. THERE'S ALSO A LINK TO THE FEEDBACK FORM AT THE BOTTOM OF THE VIDEOCAST PAGE, WHERE YOU CAN SUBMIT QUESTIONS DURING THE TALK. AT THE CONCLUSION OF TODAY'S TALK, WE WILL OPEN THE FLOOR TO QUESTIONS THAT HAVE BEEN SUBMITTED VIA EMAIL AND TWITTER. LASTLY, WE WOULD APPRECIATE YOUR FEEDBACK ABOUT TODAY'S SEMINAR. FOLLOWING TODAY'S TALK, PLEASE VISIT THE ODP WEBSITE AT PREVENTION.NIH.GOV/MINDTHEGAP AND CLICK THE LINK TO THE SEMINAR EVALUATION UNDER THE RESOURCES SECTION. YOUR INSIGHTS WILL HELP US TO CONTINUE TO IMPROVE THIS SEMINAR SERIES. AT THIS TIME, I'D LIKE TO TURN THINGS OVER TO DR. DAVID MURRAY, ASSOCIATE DIRECTOR FOR PREVENTION AND DIRECTOR OF THE OFFICE OF DISEASE PREVENTION.
>> THANK YOU. WE'RE VERY PLEASED TODAY TO HAVE DR. PHIL BOURNE WITH US. HE IS THE ASSOCIATE DIRECTOR FOR DATA SCIENCE HERE AT THE NATIONAL INSTITUTES OF HEALTH AND IN THIS ROLE, HE SETS A VISION FOR AND OVERSEES DATA SCIENCE, INCLUDING THE SHARING AND SUSTAINABILITY OF DATA AND TOOLS. DR. BOURNE'S OWN RESEARCH FOCUSES ON BIOLOGICAL AND EDUCATIONAL OUTCOMES DERIVED FROM COMPUTATION AND SCHOLARLY COMMUNICATION THROUGH ALGORITHMS, TEXT MINING, MACHINE LEARNING, META-ANALYSES, BIOLOGICAL DATABASES AND VISUALIZATION. BEFORE COMING TO THE NIH, DR. BOURNE WAS AT THE UNIVERSITY OF CALIFORNIA SAN DIEGO AND, IN FACT, HE'S NOT VERY FAR FROM THAT CAMPUS TODAY SINCE HE'S COMING TO US FROM SAN DIEGO.
HE ALSO COFOUNDED FOUR COMPANIES, COFOUNDED AND WAS THE FIRST EDITOR-IN-CHIEF OF THE OPEN ACCESS JOURNAL PLOS COMPUTATIONAL BIOLOGY, AND SERVED AS PRESIDENT OF THE INTERNATIONAL SOCIETY FOR COMPUTATIONAL BIOLOGY. -- THE INTERNATIONAL SOCIETY FOR COMPUTATIONAL BIOLOGY, AND THE AMERICAN MEDICAL INFORMATICS ASSOCIATION. AT THIS TIME, I WOULD LIKE TO WELCOME DR. BOURNE AND TURN THE PRESENTATION OVER TO HIM.
>> THANK YOU, DAVID, AND GOOD MORNING, GOOD AFTERNOON, GOOD EVENING TO EVERYONE WHEREVER YOU HAPPEN TO BE. IT'S A PLEASURE TO GIVE THIS PRESENTATION TO YOU. I'M GOING TO TALK A LITTLE ABOUT THE NOTION OF BIG DATA AND HOW HOPEFULLY IT RELATES TO YOUR PARTICULAR INTEREST IN DISEASE PREVENTION AND PROMOTING BETTER HEALTH. SO THE AGENDA IS QUITE SIMPLE, REALLY. WE'LL LOOK AT WHAT'S BIG DATA ANYWAY, WHAT ARE THE IMPLICATIONS FOR HEALTHCARE GENERALLY, WHAT ARE THE IMPLICATIONS MORE SPECIFICALLY FOR NIH, AND THEN LET'S LOOK AT A FEW EXAMPLES THAT ARE ONGOING THROUGH SOME INITIATIVES WE'RE ALREADY FUNDING OF HOW THIS RELATES TO PREVENTION AND PROMOTING BETTER HEALTH. SO LET'S GET STARTED WITH WHAT IS BIG DATA ANYWAY. I'M FRANKLY NOT PARTICULARLY ENAMORED OF THE TERM. I LIKE IT BECAUSE IT ACTUALLY BRINGS ATTENTION TO AN IMPORTANT PROBLEM. I DON'T PARTICULARLY LIKE IT BECAUSE IT SORT OF -- IT'S NOT VERY SPECIFIC AND IT'S ONE OF THE THINGS -- BEING OVERHYPED IN THE WAY OTHER THINGS HAVE BEEN IN THE PAST. WHILE I THINK IT'S GOING TO HAVE A TREMENDOUS IMPACT ON BIOMEDICAL RESEARCH, I THINK THAT'S GOING TO COME OVER TIME. WHAT THAT TIME FRAME IS, IS AN OPEN QUESTION AND I'LL GET TO THAT IN A SECOND. LET ME JUST TELL YOU THAT -- SORT OF QUANTIFY THE PROBLEM. SO WE DON'T KNOW HOW MUCH DATA WE CURRENTLY HAVE IN THE NIH COMMUNITY THAT WE FUND. I MADE A GUESSTIMATE AT THAT NUMBER BY TAKING WHAT WE DO KNOW IN THE INTRAMURAL PROGRAM AT NIH AND KNOWING THAT THAT PROGRAM IS ABOUT 15% OF ALL OF NIH ACTIVITIES, 85% BEING IN THE EXTRAMURAL COMMUNITY.
I MADE A GUESSTIMATE THAT WE HAVE 650 PETABYTES OF INFORMATION. NOW, THAT'S QUITE MEANINGLESS UNLESS YOU REFER IT TO SOMETHING, SO JUST BY WAY OF REFERENCE, I FOUND OUT THAT IN 2012, THE LIBRARY OF CONGRESS HAD 3 PETABYTES. VERY COMPLEX INFORMATION IN SOME CASES AND OF DIFFERENT FORMS, SOME HIGHLY STRUCTURED DATA, SOME NOT SO STRUCTURED. CURRENTLY THE NCBI NLM HAS ABOUT 20 PETABYTES OF INFORMATION, WHICH IS ABOUT 3% OF THAT TOTAL, AND THAT COULD GROW BY AS MUCH AS 10 PETABYTES THIS YEAR, PARTLY BECAUSE OF A LARGE INFLUX OF GENOTYPING DATA FROM THE VARIOUS INSTITUTES AND ACTIVITIES. THE AREA IS GROWING VERY FAST. I THINK A VERY TELLING NUMBER IS THE SO-CALLED DARK DATA. RECENTLY A GROUP OF RESEARCHERS AT NLM TOOK A YEAR'S WORTH OF PUBMED CENTRAL FULL PAPERS AND ANALYZED THEM AND THE DATA THAT WAS REFERENCED IN THOSE PAPERS AND LOOKED AT WHAT PERCENTAGE OF THAT DATA WAS ACTUALLY FOUND IN RECOGNIZED ARCHIVES, VERSUS WHAT WASN'T. ONLY 12% WAS IN RECOGNIZED PLACES WHERE YOU CAN BE SURE TO BE ABLE TO RETRIEVE IT. THE OTHER 88% EITHER SITS ON A SERVER AT A LAB OR ON A WEBSITE, SO CLEARLY IT'S NOT TO SAY THAT ALL OF THAT DATA IS VALUABLE, BUT MUCH OF THAT DATA WOULD BE REQUIRED TO ACTUALLY REPRODUCE THE EXPERIMENTS IN THOSE PUBLISHED PAPERS. SO IF 88% OF IT IS BEING LOST, THAT'S A PRETTY SERIOUS SITUATION. THE COST THAT WE'VE ESTIMATED, WHICH IS ACTUALLY A REASONABLY ACCURATE NUMBER FOR DATA ARCHIVES THAT WE'RE MAINTAINING, WITHOUT NCBI AND NLM, IS A COST OF $1.2 BILLION OVER THE LAST SEVEN YEARS. SO WE'RE ALREADY SPENDING SIGNIFICANT AMOUNTS OF MONEY. WITH THE ADVENT OF EVEN MORE DATA, THAT COST IS LIKELY TO GO UP, SO WE NEED TO COME UP WITH NEW WAYS OF DEALING WITH THE SITUATION IF WE'RE GOING TO REMAIN COST-EFFECTIVE, AND, IN FACT, IMPROVE OUR ABILITY TO USE ALL THIS DATA. SO THAT LEADS US TO THE GENERAL NOTION THAT BIG DATA IN BIOMEDICINE IS SPEAKING TO SOMETHING MORE FUNDAMENTAL THAN JUST MORE DATA.
IT'S SPEAKING TO THE NEED FOR NEW METHODOLOGY, NEW SKILLS, NEW EMPHASIS, NEW MODES OF DISCOVERY. I'LL GIVE YOU SOME EXAMPLES OF THE KINDS OF THINGS THAT ARE HAPPENING WHICH I THINK ARE RELEVANT TO YOUR COMMUNITY. SO WHAT ARE THE IMPLICATIONS OF THIS KIND OF CHANGE? GIVEN THIS DATA GROWTH AT NIH, WHAT ARE THE GENERAL IMPLICATIONS? I WOULD POTENTIALLY ARGUE -- THEY GET NERVOUS WHEN I USE THE WORD DISRUPTION, I ACTUALLY REPORT DIRECTLY TO THE NIH DIRECTOR, FRANCIS COLLINS, BUT YOU KNOW, I THINK IT'S WORTH THINKING ABOUT IN THIS CONTEXT AT LEAST. SO I WANT TO JUST ILLUSTRATE THIS IN GENERAL TERMS FROM A COUPLE OF EXAMPLES IN BOOKS THAT I THINK ARE REALLY INTERESTING IF YOU'RE INTO THIS SORT OF THING. THE FIRST BOOK, THE SECOND MACHINE AGE, OF COURSE RELATES TO THE IDEA THAT WE HAD THE INDUSTRIAL REVOLUTION AND THEN WE HAVE A SECOND REVOLUTION THAT'S BASED AROUND DATA AND INFORMATION. THE THESIS OF THIS BOOK IS THAT ESSENTIALLY WHAT'S HAPPENING IS HAPPENING FASTER THAN ANYONE ANTICIPATED. WHILE EVERYBODY THOUGHT WE WOULD HAVE SELF-DRIVING CARS, THE FACT THAT WE ALREADY HAVE THEM, QUITE A BIT, IS SORT OF EVIDENCE OF SOMETHING AHEAD OF ITS TIME IN ONE SENSE. THIS KIND OF CAPABILITY IS POSSIBLE BECAUSE 750 MEGABYTES OF INFORMATION IS BEING DIGESTED EVERY SECOND TO DRIVE THAT CAR. SO IT'S REALLY HAVING THE SUPPORTING COMPUTE INFRASTRUCTURE THAT MAKES IT ALL POSSIBLE, AND THAT RELATES TO OTHER THINGS THAT I WON'T GET INTO. THE OTHER POINT I'D LIKE TO REFER TO IS DECEPTION, THAT WE MIGHT BE DECEIVED AT HOW FAST THINGS ARE HAPPENING, AND IT RELATES TO THIS BOOK CALLED "BOLD." I'M GOING TO EMPHASIZE THIS WITH SPECIFIC EXAMPLES. SO THE EXAMPLE I'LL USE IS PHOTOGRAPHY, AND IMMEDIATELY YOU'LL SAY WHAT ON EARTH HAS THAT GOT TO DO WITH MY BIOMEDICAL RESEARCH? BEAR WITH ME A SECOND, WE'LL GET THERE. ALL OF THIS STUFF STARTS WITH, IN THE BOTTOM LEFT-HAND CORNER, DIGITIZATION. SO AS CONTENT BECOMES DIGITIZED, AT AN EXPONENTIAL RATE, THINGS BEGIN TO HAPPEN.
IN THE CASE OF DIGITAL PHOTOGRAPHY, OF COURSE, THERE WAS THE ADVENT OF THE DIGITAL CAMERA, WHICH GOT INVENTED BUT THEN PUT ON THE SHELF IN FAVOR OF THE CHEMICAL BUSINESS. THAT WOULD PROVE TO BE A HUGE MISTAKE. IN THIS PERIOD OF RELATIVELY LOW GROWTH ON THE EXPONENTIAL CURVE, YOU ENTER THIS DECEPTIVE PHASE WHERE PEOPLE DON'T FULLY RECOGNIZE WHAT'S HAPPENING ON AN EXPONENTIAL SCALE. I THINK THAT'S LIKELY WHAT HAPPENED TO KODAK. THEY GOT BACK INTO THE MARKET BUT WAY TOO LATE, WHEN OTHERS HAD REALLY TAKEN THAT MARKET, AS THE NUMBER OF DIGITAL PHOTOGRAPHS AND THE CAPABILITIES OF CAMERAS WERE ESSENTIALLY GROWING EXPONENTIALLY. THIS LEADS TO A DISRUPTIVE INFLECTION POINT WHEN THE GROWTH BECOMES ENORMOUSLY APPARENT, AND THEN THAT LEADS TO THE NOTION OF DEMATERIALIZATION. THAT CLEARLY IS WHAT HAPPENED IN A NUMBER OF INDUSTRIES, INCLUDING PHOTOGRAPHY, TO THE POINT WHERE INSTAGRAM, REALLY AN EXAMPLE OF AN ORGANIZATION THAT SUPPORTS COMMUNICATION BY IMAGES, WHICH IS A COMPLETELY DIFFERENT BUSINESS MODEL THAN WHAT PHOTOGRAPHY ORIGINALLY WAS, HAS HAD A HUGE IMPACT. INSTAGRAM AT THIS MOMENT IS -- MORE THAN KODAK EVER WAS. SO WHAT DOES THAT LEAD TO WITH NIH SPECIFICALLY? YOU COULD POTENTIALLY ARGUE THAT WE ARE AT THIS DECEPTIVE POINT BECAUSE THE DIGITIZATION OF BIOMEDICAL CONTENT, PARTICULARLY IN THE FORM OF -- BUT ALSO IN THE FORM OF EVERYTHING INCLUDING -- PUBLICATION, ANALYSIS, TALKS AND SO ON, PARTICULARLY WITH THE ADVENT OF THE ELECTRONIC HEALTH RECORD, IS CLEARLY GROWING AT AN EXPONENTIAL RATE AND WE COULD BE AT A DECEPTIVE POINT. WHETHER THAT LEADS TO DISRUPTION IS HARD TO SAY. THE PATIENT-CENTERED HEALTHCARE SYSTEM, I'LL TALK A BIT MORE ABOUT IN A MINUTE. SO THAT'S POTENTIALLY, AT A LARGE SCALE, WHAT COULD HAPPEN. THERE ARE SOME PRETTY SERIOUS IMPLICATIONS. ONE RELATES TO THE SUSTAINABILITY OF THAT DATA.
SO ON THE LEFT-HAND SIDE, YOU CAN SEE ESSENTIALLY THE NIH BUDGET, WHICH IS FLAT OR, AT LEAST AS OF LAST YEAR, GROWING SLOWLY, WHICH IS GOOD, BUT CERTAINLY NOT AT THE RATE OF WHAT'S SHOWN ON THE RIGHT, PUBLICATION ON AN ANNUAL DATABASE -- CLEARLY THIS IS OUT OF WHACK BECAUSE WE CAN'T SUSTAIN THIS KIND OF GROWTH WITHOUT STATING THE GOAL OF BECOMING MORE EFFICIENT OR SOME COMBINATION OF BOTH. NOT SO LONG AGO, ERIC GREEN, DIRECTOR OF THE HUMAN GENOME INSTITUTE, AND JON LORSCH, DIRECTOR OF NIGMS, PROVIDED A PERSPECTIVE WITH THOUGHTS ON HOW TO ADDRESS THE SITUATION, SOME OF WHICH I'LL TOUCH ON HERE. ANOTHER IMPLICATION IS REPRODUCIBILITY. IF, AS I ALREADY MENTIONED -- REPRODUCING THOSE STUDIES IS GOING TO BE EXTREMELY DIFFICULT. THIS HAS BEEN IDENTIFIED NOT JUST FOR DATA REASONS BUT OTHER REASONS AS WELL IN VARIOUS STUDIES RECENTLY LOOKING AT THESE SORTS OF ISSUES. AT THE SAME TIME, THEY SORT OF CHANGED THAT WITH THE ONLINE DATA JOURNALS AND A WHOLE SERIES OF OTHER INTERESTING INITIATIVES. I WOULD SAY THE LAST IMPLICATION, WHICH BRINGS US TO THE NOTION OF THE PATIENT-CENTERED HEALTHCARE SYSTEM, IS THE PRECISION MEDICINE INITIATIVE THAT WAS ANNOUNCED IN JANUARY 2015, WHICH HAS ALREADY GOT A LOT OF TRACTION WITHIN NIH TO THE POINT OF A NUMBER OF LARGE -- BEING MADE AND REAL ACTIVITY AROUND THAT PROJECT. I WON'T GET INTO THE DETAILS OF THAT EXCEPT TO SAY THAT ONE OF THE GOALS HERE IS TO ESTABLISH A 1 MILLION OR MORE VOLUNTEER COHORT OF PARTICIPANTS WHO ARE PROVIDING A VARIETY OF HEALTH-RELATED INFORMATION, FROM GENOTYPING INFORMATION THROUGH TO PHENOTYPE INFORMATION IN THE FORM OF HEALTH RECORDS AND SO ON, INCLUDING OTHER KINDS OF DATA ASSOCIATED WITH LIFESTYLE, BIOSAMPLES AND SO ON. SO I THINK THIS IS A GREAT OPPORTUNITY WITHIN THIS CONTEXT THAT IN ITSELF PROVIDES SOME LEVEL OF DISRUPTION, AND MAYBE THE END POINT OF THAT DISRUPTION, DEPENDING ON HOW YOU LOOK AT IT.
SO WHAT ARE SOME OF THE IMPLICATIONS OF THIS KIND OF SET OF DEVELOPMENTS AND THIS FUTURE -- BEFORE I GET INTO MORE SPECIFICS THAT ARE PERHAPS MORE PERTINENT TO YOUR IMMEDIATE INTEREST? FIRST OF ALL, I THINK BECAUSE OF THE SHEER SCALE OF THE ACTIVITIES, OPEN COLLABORATIVE SCIENCE BECOMES INCREASINGLY IMPORTANT BOTH NATIONALLY AND INTERNATIONALLY. IT'S IRONIC TO ME THAT WE THINK OF DATA, SCIENTIFIC DATA, IN A GLOBAL WAY. WE GRAB DATA FROM ALL OVER THE WORLD AND USE IT, YET THE SUPPORT OF THAT DATA IS FUNDED NATIONALLY, WHICH CREATES A TENSION IN THE SYSTEM BECAUSE OF IT. AND INCREASINGLY, AND I WON'T GO INTO THE DETAILS OF THE INITIATIVES, WE'RE PARTNERING MUCH MORE CLOSELY WITH OUR INTERNATIONAL AND OTHER NATIONAL FUNDING PARTNERS. THE REASON BEING, I THINK IT'S GOING TO BE CRITICAL TO SUSTAINING THIS ENTERPRISE. I THINK ANOTHER IMPLICATION IS THAT DATA BECOMES OF INCREASING VALUE FOR SCHOLARSHIP. NOW YOU COULD DISPUTE THAT AND SAY, WELL, THE PAPERS ARE THE REAL -- THEY ALWAYS HAVE BEEN, BUT I THINK THAT -- OF COURSE I MAY BE DRINKING MY OWN KOOL-AID, BUT I THINK TO SOME EXTENT, THAT IS CHANGING. AN EXAMPLE IN MY OWN CAREER IS, I HAVE A PAPER THAT HAS OVER 20,000 CITATIONS. NO ONE HAS EVER READ THAT PAPER. THAT PAPER ACTUALLY DESCRIBED A BIOLOGICAL DATABASE WHICH PEOPLE USE ALL THE TIME, SO IF PEOPLE COULD ACTUALLY CITE THE DATA DIRECTLY WITHIN THAT DATABASE THEY USED, IT WOULD BE MUCH BETTER FOR THE PEOPLE WHO ACTUALLY PROVIDED THAT DATA, AND IT WOULD GIVE A MORE REALISTIC PICTURE OF REALLY WHAT'S GOING ON. WE DON'T VALUE DATA CITATION RIGHT NOW, AND AGAIN, I DON'T HAVE TIME TO GO INTO THAT TODAY, BUT WE'RE WORKING HARD ON TRYING TO CHANGE THAT WITHIN THE NIH AND THAT'S REALLY AN EXAMPLE OF A CULTURAL SHIFT. I'M GOING TO SAY A LITTLE ABOUT IMPROVING THE EFFICIENCY OF THE RESEARCH ENTERPRISE, AND I THINK THAT COULD CLEARLY HAPPEN IN QUITE A BIG WAY, AND THAT'S WHERE WE'RE PUTTING A LOT OF -- I'LL TALK MORE ABOUT IT IN A MINUTE.
THEN TRAINING. CLEARLY THE NUMBER OF PEOPLE TRAINED TO SUPPORT BIOMEDICAL DATA SCIENCE AND THE NUMBER OF PEOPLE NEEDED EVEN NOW IS TOTALLY OUT OF WHACK. IT'S ONLY GOING TO GET WORSE BEFORE IT GETS BETTER, I THINK. IN THE PROGRAMS THAT I'M RESPONSIBLE FOR, WHICH I'LL SAY A LITTLE ABOUT IN A SECOND, WE'RE PUTTING ABOUT 20% OF OUR BUDGET INTO TRAINING. THEN OF COURSE THERE'S ALWAYS THIS ISSUE ABOUT BALANCING ACCESSIBILITY VERSUS SECURITY, PARTICULARLY FOR HUMAN SUBJECTS DATA BUT OTHER DATA TOO, AND THAT HAS TO BE TAKEN INTO ACCOUNT. I WON'T SAY TOO MUCH ABOUT THAT TODAY. WHAT ARE THE IMPLICATIONS IF WE DON'T ACT? GIVEN THIS LARGE INFLUX OF DATA, SOME OF WHICH IS -- MAY NOT BE PARTICULARLY USABLE, WHAT ARE THE IMPLICATIONS OF THAT? SO I'M GOING TO GIVE YOU A LITTLE -- I HEARD ABOUT LAST WEEK ACTUALLY WHEN I WAS -- ESSENTIALLY A DAY FOLLOWING JOE BIDEN AT THE UNIVERSITY OF CHICAGO LOOKING AT THE GENOMIC DATA COMMONS THAT'S BEING DEVELOPED BY THE NATIONAL CANCER INSTITUTE. SO I THINK THE -- YOU COULD ALSO THINK ABOUT THIS AS YOU GET DOWN TO PERSONALIZED HEALTHCARE, WHERE YOU'RE LOOKING AT VERY SMALL GROUPS OF PEOPLE WITH VERY SIMILAR GENETIC PROFILES, EACH OF WHICH IN SOME SENSE IS DIFFERENT FROM THE OTHER GROUPS OF PROFILES, BUT YOU ARE GETTING MORE PRECISE AND YOU'RE ESSENTIALLY ALMOST ANALYZING THINGS LIKE A RARE DISEASE. SO IT'S A COMPLEX DISEASE WITH ALL THESE DIFFERENT TRAITS. SO I THINK IT'S A DIFFERENT WAY OF LOOKING AT IT, BUT ULTIMATELY I THINK THE SAME THOUGHTS AND THOUGHT PROCESSES ARE VERY USEFUL. SO LET'S LOOK AT A SPECIFIC EXAMPLE. DIFFUSE INTRINSIC PONTINE GLIOMA, DIPG, OCCURS IN ABOUT ONE IN 100,000 INDIVIDUALS, AND IT PEAKS AT AROUND 6 TO 8 YEARS OF AGE. AS YOU CAN SEE FROM THE SURVIVAL CURVE ON THE BOTTOM RIGHT, IT'S NOT VERY ENCOURAGING, WITH A MEDIAN SURVIVAL OF 9 TO 12 MONTHS. SURGERY FOR THIS KIND OF CONDITION IS REALLY NOT THAT POSSIBLE.
CHEMOTHERAPY IS INEFFECTIVE AND RADIOTHERAPY IS TYPICALLY ONLY TRANSITORY AND ADDS VERY LITTLE TIME TO LIFE EXPECTANCY. SO CLEARLY, A DISTRESSING, IMPORTANT PROBLEM. SO THAT WAS THE MESSAGE THERE. THERE HAVE BEEN A NUMBER OF GENOMIC STUDIES, AND I THINK ONE THAT REALLY KICKED THIS OFF, AND I'M GOING TO USE THIS TO ILLUSTRATE THE POINT I WANT TO MAKE, WAS ESSENTIALLY THE IDENTIFICATION -- AS RECURRENT -- THIS WAS DISCOVERED IN ABOUT 2012. UNFORTUNATELY THAT INFORMATION COULDN'T REALLY BE USED IN TERMS OF THERAPY. THERE WAS NO WAY OF TREATING IT RELATIVE TO THOSE PARTICULAR MUTATIONS. HOWEVER, WHAT HAPPENED, THREE YEARS LATER, IN AN EXPANDED DATASET THAT WAS ACTUALLY ACCESSIBLE TO TWO OTHER GROUPS AS WELL, IS THAT THOSE GROUPS, AND THEN THE ORIGINAL GROUP THEMSELVES, IDENTIFIED MUTATIONS IN ACVR1 AS A SECONDARY CO-OCCURRING MUTATION. IT'S A TARGETABLE KINASE, FOR WHICH A DRUG IS ALREADY UNDER DEVELOPMENT. SO THIS DISCOVERY ACTUALLY PROVIDED SOME POSSIBILITY TO BEGIN TO TREAT THIS CONDITION. SO WHAT THAT MEANS IS THERE ARE ABOUT 300 SUCH PATIENTS A YEAR, 60 HAVE THIS MUTATION, SO IF THIS LARGE SCALE DATASET HAD BEEN AVAILABLE AND INTEGRATED MORE WITH TCGA AND SO ON, TCGA BEING -- IT WOULD HAVE BEEN IDENTIFIED POTENTIALLY MUCH EARLIER. THAT'S THE POINT. IF THAT DATA HAD BEEN SHARED AND ACCESSIBLE TO MANY DIFFERENT RESEARCHERS, POTENTIALLY, AND I SAY POTENTIALLY BECAUSE IT'S STILL VERY EARLY DAYS IN HOW EFFECTIVE THE DRUGS ARE GOING TO BE, OR THE DRUG IS GOING TO BE, THERE COULD BE LIFE SAVING GOING ON HERE BECAUSE OF IT. SO I THINK IT'S REALLY IMPORTANT THAT WE CONFORM THE WAY WE SHARE DATA, THAT WE SHARE IN AS OPEN, ACCESSIBLE AND USABLE A FORM AS POSSIBLE. WE CALL THIS NOTION FAIR, WHICH IS THE NOTION OF FINDABLE, ACCESSIBLE, WHICH MEANS USABLE, INTEROPERABLE, AND REUSABLE. AND THERE'S BEEN A LOT OF WORK BY A COMMUNITY AROUND THE SO-CALLED FAIR PRINCIPLES, HIGHLIGHTED IN THAT PAPER ON THE BOTTOM OF THE SCREEN. SO THE COMMONS -- IS A MEANS TO BE FAIR.
WHAT I MEAN BY THAT IS, THINK OF A COMMONS AS -- THINK FIRST OF ALL OF THE INTERNET, AND HOW THE INTERNET WAS FORMED. ESSENTIALLY THE INTERNET DIDN'T START OFF AS ONE GREAT CONGLOMERATE NETWORK. IT STARTED OFF AS A SERIES OF SMALL NETWORKS WHICH AT A CERTAIN POINT IN TIME SAW THE MERIT IN MERGING TOGETHER AND BECOMING INTEROPERABLE, SO IT ULTIMATELY GREW INTO ONE LARGE INTERNET. I WOULD ARGUE THAT WE ARE NOW LOOKING AT THIS IDEA OF THE INTERNET OF DATA, WHERE WE HAVE THESE DISCRETE POCKETS OF DATA, AND THE VALUE OF SEEING THAT MERGE IN SOME WAY OR BECOME JOINTLY ACCESSIBLE IS COMPELLING AND WILL DRIVE THE SO-CALLED INTERNET OF DATA. WE'RE TRYING TO OUTSELL -- IN THE NIH THROUGH THE DEVELOPMENT OF WHAT WE CALL THE COMMONS, WHICH IS EFFECTIVELY -- WHILE IT'S PHYSICALLY DISCRETE, AS NETWORKS ARE, IT APPEARS TO BE A SINGLE ENTITY WHERE DATA, ANALYTICS -- AND OTHER COMPONENTS, INCLUDING FINAL PUBLICATION, CAN BE SHARED IN A COOPERATIVE ENVIRONMENT. SO THERE ARE EXAMPLES OF -- IF YOU THINK OF THESE SUBNETWORKS THAT ALREADY EXIST, WHICH ARE ESSENTIALLY THEIR OWN LITTLE COMMONS, SO THERE'S THE HUMAN MICROBIOME PROJECT, WHICH IS ABOUT 40 TERABYTES OF DATA, MOSTLY SITTING ON CLOUD RESOURCES, AND THEN THERE'S THE NCI GENOMIC DATA COMMONS, WHICH IS ABOUT 5 PETABYTES, TO BRING TOGETHER A LOT OF GENOMIC DATA FROM VARIOUS SOURCES UNDER THE NCI UMBRELLA. IT'S SO COMPELLING, AT LEAST FOR ME IN THE EXAMPLE I TOLD YOU, THAT WHILE THIS HAS ONLY JUST BEEN LAUNCHED, AND IT'S CURRENTLY 5 PETABYTES OF INFORMATION, IT'S LIKELY, I THINK, TO SEE VERY EXTENSIVE USE. IT'S SO COMPELLING THAT, IN FACT, THE PERSON WHO RUNS IT, BOB GROSSMAN, THE PERSON ON THE LEFT IN THAT PICTURE, IS BEING ENCOURAGED BY VICE PRESIDENT BIDEN, WHO VISITED LAST WEEK, TO GET ON WITH THE JOB AS PART OF THE CANCER MOONSHOT. SO I THINK THERE ARE A LOT OF COMPELLING REASONS FOR EXPLORING THESE TYPES OF ENVIRONMENTS.
NOW, YOU KNOW, SO TO THAT END, WE ARE SPENDING, WITHIN THE PROGRAM THAT I RUN, BD2K, THE BIG DATA TO KNOWLEDGE PROGRAM, A VERY SIGNIFICANT PART OF OUR BUDGET NOW, AND MORE IN FY 17, ON THE IDEA OF DEVELOPING THE COMMONS CONCEPT. SO WHAT THAT MEANS IS REALLY INDEXING RESOURCES SO THEY CAN BE FOUND, DEVELOPING STANDARDS THAT THE COMMUNITY CAN LEARN ABOUT AND ADOPT, OR COLLECTING STANDARDS THAT THEY'VE ALREADY DEVELOPED, TO STANDARDIZE DATA AS MUCH AS WE CAN. IT PROVIDES -- COMPUTING FACILITIES, AND IT ALSO AT LEAST HAS THE PROMISE OF A CERTAIN LEVEL OF SUSTAINABILITY, WHICH I WON'T SAY TOO MUCH ABOUT TODAY, BECAUSE I WANT TO GET ON TO USE CASES THAT I THINK ARE RELEVANT TO YOU, BUT THIS IS CLEARLY IMPORTANT. AT THE SAME TIME, WE ALSO SPEND A VERY SIGNIFICANT PART OF THE BUDGET, WHICH IS ON THE ORDER OF $60 MILLION, ON DATA SCIENCE RESEARCH EFFORTS. AND IT'S THOSE THAT I'M GOING TO REALLY FOCUS ON FOR THE REST OF THE TALK, WITH HOPEFULLY USE CASES DEVELOPING AROUND DATA SCIENCE RESEARCH THAT HAVE RELEVANCE TO YOUR PARTICULAR INTERESTS IN PREVENTION AND SO FORTH. SO THAT'S WHAT I'LL FOCUS ON FOR THE REMAINDER. SO LET ME GIVE YOU SOME OF THESE EXAMPLES THAT ARE BEING APPLIED TO PREVENTION AND PROMOTING BETTER HEALTH. THE FIRST ONE IS ACTUALLY NOT FROM US BUT I JUST USE IT AS A POINT OF REFERENCE. THIS IS ACTUALLY AN EXAMPLE FROM A NUMBER OF LABS, THERE'S A COLLEAGUE OF MINE IN DENMARK, AND THIS IS A CO-MORBIDITY NETWORK THAT'S BEEN DEVELOPED USING 15 YEARS OF ELECTRONIC HEALTH RECORDS FOR THE WHOLE DANISH POPULATION. SO I THINK IT'S JUST TELLING FOR ME, AT LEAST, WHO'S NOT ACTUALLY IN THIS BUSINESS, A PREDICTIVE TOOL JUST CONSTRUCTED IN A VERY SIMPLE WAY. SO ESSENTIALLY ALL THAT'S GONE ON HERE IS THAT THROUGH ICD-9 CODES AND SO ON, DISEASE CONDITIONS HAVE BEEN IDENTIFIED WHICH REPRESENT THE NODES IN THIS NETWORK.
THE EDGES IN THE NETWORK AND THE THICKNESS OF THOSE EDGES REPRESENT, JUST BASED ON WHAT ACTUALLY HAPPENS TO THE DANISH POPULATION, WHICH RELATIVELY SPEAKING IS QUITE HOMOGENEOUS, THE LIKELIHOOD OF GOING FROM ONE CONDITION TO ANOTHER. SO FOR EXAMPLE, IF YOU HAVE COPD, OVER TIME THERE'S A STRONG LIKELIHOOD THAT YOU'LL HAVE RESPIRATORY FAILURE. NOW OF COURSE AT THAT LEVEL, THERE'S NOTHING LISTED THAT'S PARTICULARLY SURPRISING. AS YOU START TO LOOK, YOU FIND INTERESTING CONNECTIONS, AND IT SEEMS TO ME, AT LEAST, THAT THEY OFFER SIGNIFICANT POTENTIAL FOR INTERVENTION. THIS WOULD BE GOOD TO HAVE IN THE DISCUSSION, BECAUSE I WANT TO LEARN ABOUT THIS: THE PART THAT I WORRY ABOUT IS THE TRANSLATION. IN OTHER WORDS, HOW YOU GET SIMPLE TOOLS, LIKE IN SOME SENSE THIS REPRESENTS, INTO THE HANDS OF CARE PROVIDERS AND HAVE THEM EFFECTIVELY OFFER INTERVENTION. SO THAT WAS JUST AN EXAMPLE, THERE'S NO -- JUST LOOKING AT EXACTLY WHAT'S IN THE RECORD. THE QUESTION IS, CAN YOU LEARN FROM WHAT IS IN THE RECORD, AND THAT IS -- PREDICTIVE COMPUTATIONAL PHENOTYPING, WHICH IS LED BY THE UNIVERSITY OF WISCONSIN. THERE'S A SERIES OF PROJECTS, DIFFERENT TYPES OF THINGS THAT YOU CAN SEE THERE ON THE LEFT, RANGING FROM -- I'LL FOCUS ON THE EHR PIECE HERE, OBVIOUSLY, ALL THE WAY DOWN. AND SO THOSE ARE DIFFERENT PHENOTYPING EFFORTS, AND THEN THERE ARE LABS THAT TAKE THAT DATA AND DO CERTAIN THINGS WITH IT. SOME OF IT IS, I WOULD NOT SAY MUNDANE, BUT ON THE OTHER HAND INCREDIBLY IMPORTANT: DATA MANAGEMENT, THE VALUE OF THE INFORMATION STATISTICALLY, CREATING REPRESENTATIONS THAT CROSS-CUT DATA TYPES, AND THE MODELING ASPECT OF IT, ALL OF WHICH IS REALLY IMPORTANT. SO LET'S JUST LOOK AT THE EHR-BASED PHENOTYPING.
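AS A ROUGH ILLUSTRATION OF HOW EDGE WEIGHTS IN A CO-MORBIDITY NETWORK LIKE THE ONE JUST DESCRIBED COULD BE ESTIMATED, HERE IS A MINIMAL PYTHON SKETCH. IT IS NOT THE DANISH GROUP'S METHOD. THE FUNCTION NAME comorbidity_edges, THE TOY PATIENT HISTORIES AND THE CONDITION LABELS ARE ALL ASSUMPTIONS MADE FOR ILLUSTRATION. EACH EDGE WEIGHT IS SIMPLY THE FRACTION OF PATIENTS WITH ONE CONDITION WHO ARE LATER RECORDED WITH ANOTHER.

```python
from collections import Counter
from itertools import permutations


def comorbidity_edges(histories):
    """Estimate directed edge weights between conditions.

    histories: list of per-patient diagnosis lists, in time order.
    Returns {(a, b): fraction of patients with a who later have b}.
    """
    has_condition = Counter()   # patients ever recorded with a condition
    followed_by = Counter()     # patients where a is first seen before b
    for history in histories:
        first_seen = {}
        for position, code in enumerate(history):
            first_seen.setdefault(code, position)
        for code in first_seen:
            has_condition[code] += 1
        for a, b in permutations(first_seen, 2):
            if first_seen[a] < first_seen[b]:
                followed_by[(a, b)] += 1
    return {
        (a, b): count / has_condition[a]
        for (a, b), count in followed_by.items()
    }


# Made-up histories, purely illustrative (not real patient data).
toy_histories = [
    ["COPD", "RESP_FAILURE"],
    ["COPD", "RESP_FAILURE"],
    ["COPD"],
    ["DIABETES", "COPD"],
]
toy_edges = comorbidity_edges(toy_histories)
```

IN THIS TOY DATA, TWO OF THE FOUR PATIENTS WITH COPD GO ON TO RESPIRATORY FAILURE, SO THAT EDGE WOULD BE DRAWN WITH A WEIGHT OF 0.5. A REAL ANALYSIS WOULD ALSO NEED TO CORRECT FOR AGE, SEX AND HOW COMMON EACH CONDITION IS, WHICH THIS SKETCH DOES NOT ATTEMPT.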
SO ESSENTIALLY WHAT HAPPENED IN THE PAST IS THE IDEA OF RETROSPECTIVE PHENOTYPING, WHICH IS REALLY WHAT I JUST SHOWED YOU ON THE OTHER SLIDE, BUT THEN THE QUESTION IS, CAN YOU DO PROSPECTIVE PHENOTYPING, CAN YOU PREDICT NOT JUST WHAT ACTUALLY HAPPENED BUT WHAT CAN HAPPEN, AND LEARN MORE THAN WHAT'S IMMEDIATELY OBVIOUS? IN OTHER WORDS, CAN YOU USE EXISTING EHR DATA AS A TRAINING SET AND USE MACHINE LEARNING ALGORITHMS TO ACTUALLY DO SOME REAL PROSPECTIVE PHENOTYPING AND MAKE PREDICTIONS ABOUT THE LIKELIHOOD OF -- EMERGING? THAT'S WHAT THIS GROUP HAS BEEN PLAYING AROUND WITH. THIS IS INDEED VERY FORMATIVE RESEARCH. I'M SHOWING IT TO YOU BECAUSE I THINK THE IDEA IS TO ILLUSTRATE WHAT'S POSSIBLE WHEN YOU HAVE LARGE AMOUNTS OF DATA. THIS IS BASED ON 1.5 MILLION SUBJECTS IN THE MARSHFIELD CLINIC ACROSS ALL ICD-9 CODES. WHAT YOU CAN SEE IN THIS KIND OF PRESENTATION IS THE AREA UNDER THE CURVE, SO THE FARTHER TO THE RIGHT, THE MORE PREDICTIVE -- THERE IS, SO AFTER ONE MONTH AND SIX MONTHS, YOU CAN SEE A CERTAIN POWER TO PREDICT PHENOTYPIC OUTCOMES BASED ON THIS LARGE COHORT OF PATIENTS. VERY PRELIMINARY BUT PROMISING IN -- AS USEFUL TOOLS GOING FORWARD, BASED ON USING PATIENT-CENTERED OUTCOMES FROM LARGE COHORTS OF INFORMATION. LET'S LOOK AT A DIFFERENT EXAMPLE, WHICH COMES FROM THE MOBILIZE CENTER, A DIFFERENT TYPE OF DATA, AGAIN, WITH THE ABILITY TO PREDICT -- PROVIDE THE ABILITY TO POTENTIALLY MAKE INTERVENTION. AGAIN, THIS IS STILL VERY FORMATIVE RESEARCH. SCOTT'S GROUP AT STANFORD. SO THIS IS JUST ONE OF THEIR ACTIVITIES, BUT THEY'RE ESSENTIALLY USING SMARTPHONE DATA, WHICH PROVIDES MOBILITY DATA AND BMI DATA, AND IT'S BASED ON 2 MILLION SUBJECTS WORLDWIDE, A GLOBAL STUDY REPRESENTING -- OVER 100 BILLION DATA POINTS, WHICH IS MUCH LARGER THAN THE NHANES STUDY OF MOBILITY THAT WAS DONE AND OTHER ASPECTS THAT WERE DONE BY THE CDC. SO THIS IS ESSENTIALLY A GIVEAWAY OF DATA THAT'S COMING OFF THIS WITH THE EXTREME THAT THE GROUP HAS ACCESS TO.
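THE AREA UNDER THE CURVE MENTIONED ABOVE FOR THE PHENOTYPE PREDICTIONS CAN BE SKETCHED IN A FEW LINES OF PYTHON, ASSUMING THE CURVE IN QUESTION IS THE USUAL ROC CURVE. THIS IS A GENERIC ILLUSTRATION, NOT THE WISCONSIN GROUP'S CODE. THE FUNCTION NAME auc AND THE TOY LABELS AND SCORES ARE ASSUMPTIONS. IT COMPUTES THE AREA AS THE PROBABILITY THAT A RANDOMLY CHOSEN PATIENT WHO DEVELOPED THE CONDITION RECEIVED A HIGHER PREDICTED SCORE THAN ONE WHO DID NOT, COUNTING TIES AS HALF.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 if the patient later developed the condition, else 0.
    scores: the model's predicted risk for each patient.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up example: four patients, two of whom developed the condition.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

A VALUE OF 0.5 MEANS THE PREDICTIONS ARE NO BETTER THAN CHANCE AND 1.0 MEANS EVERY AFFECTED PATIENT WAS RANKED ABOVE EVERY UNAFFECTED ONE. THE PAIRWISE LOOP IS FINE FOR A SKETCH BUT WOULD BE TOO SLOW FOR A COHORT OF 1.5 MILLION SUBJECTS, WHERE A RANK-BASED CALCULATION WOULD BE USED INSTEAD.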
SO FIRST OF ALL, ON THE TOP RIGHT-HAND SIDE, YOU CAN SEE THAT THERE ARE ABOUT 40,000 INDIVIDUALS AGE 60-PLUS, SO YOU CAN SEE THE AGE DISTRIBUTION. SO IT'S NOT JUST THE OLD FUDDY-DUDDIES LIKE MYSELF THAT USE CELL PHONES, THEY HAVE THE AGE DISTRIBUTION TO DO THE STUDY. LOOKING AT THE COUNTS OF BMI FROM THIS, YOU CAN SEE THAT THAT FOLLOWS A PRETTY NORMAL DISTRIBUTION, CLOSE TO 30% OF THE POPULATION WORLDWIDE -- SAD BUT TRUE. THEN WHEN YOU START ANALYZING THE ACTIVITY OF FOLKS USING THIS DATA, ONE OF THE THINGS THAT I THINK IS INTERESTING BUT NOT A NEW FINDING IS THAT FIRST OF ALL, FOR THOSE OF YOU WHO USE THESE DEVICES, YOU TEND TO SET YOURSELF A GOAL OF 10,000 STEPS PER DAY, AND, MYSELF INCLUDED, OFTEN OR MORE THAN OFTEN DON'T ACTUALLY REACH THOSE 10,000 STEPS. MEN TYPICALLY COVER MORE STEPS THAN WOMEN, AND YOU CAN SEE HOW IT RELATES TO THE BMI CATEGORY. OBVIOUSLY ACTIVITY TENDS TO DECREASE WITH INCREASING BMI, OR THE OTHER WAY AROUND, SO THERE'S NOTHING PARTICULARLY SURPRISING IN THIS EXCEPT IT'S CONFIRMATION OF A NOTION -- IN ONE EXAMPLE, WORK THAT WAS DONE ABOUT THE GENDER DIFFERENCE A NUMBER OF YEARS AGO. BUT THEN WHERE IT GETS MORE INTERESTING, WHICH IS SHOWN ON THE RIGHT, IS THE NOTION OF BOUT DURATION. SO IT TURNS OUT, AT LEAST ACCORDING TO THIS VERY PRELIMINARY DATA, THAT EVEN WITH ONE BOUT OF GREATER THAN FIVE MINUTES OF SIGNIFICANT ACTIVITY PER DAY, BMI TENDS TO BE LOWER, AND THEN IT SORT OF ACTUALLY FLATTENS OFF AND CERTAINLY AFTER A NUMBER OF BOUTS, ACTUALLY BEGINS TO GO UP. SO I THINK THE RELEVANCE OF THAT DATA IS STILL YET TO BE FULLY UNDERSTOOD, BUT I THINK THIS IS JUST COMING OFF -- OF THOSE DEVICES. I THINK IT CERTAINLY OFFERS A RICH FIELD OF STUDY TO BEGIN LOOKING AT THIS TO UNDERSTAND HOW YOU ARE GOING TO OFFER INTERVENTIONS THROUGH LARGE SCALE EXPERIMENTS LIKE THIS. AT PLANETARY SCALE, THEY USE THE GINI COEFFICIENT HERE, USED TO MEASURE ACTIVITY INEQUALITY.
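AS A ROUGH ILLUSTRATION OF THE GINI COEFFICIENT MENTIONED HERE, A MINIMAL PYTHON SKETCH FOLLOWS. IT IS THE STANDARD TEXTBOOK FORMULA APPLIED TO DAILY STEP COUNTS, NOT THE STANFORD GROUP'S OWN CODE. THE FUNCTION NAME gini AND THE TWO TOY POPULATIONS ARE ASSUMPTIONS MADE FOR ILLUSTRATION.

```python
def gini(values):
    """Gini coefficient of a list of non-negative numbers.

    0.0 means everyone has the same value; values closer to 1.0 mean
    the total is concentrated in a few individuals.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Made-up average daily step counts for two hypothetical populations
# with the same mean (6,000 steps) but very different spread.
equal_steps = [6000, 6000, 6000, 6000]
unequal_steps = [1000, 2000, 5000, 16000]

gini_equal = gini(equal_steps)      # no inequality
gini_unequal = gini(unequal_steps)  # substantial inequality
```

BOTH TOY POPULATIONS AVERAGE THE SAME NUMBER OF STEPS, BUT THE FIRST HAS A COEFFICIENT OF 0 AND THE SECOND OF 0.5, WHICH IS WHY A MEASURE OF SPREAD CAN SAY SOMETHING THAT AN AVERAGE ALONE CANNOT.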
SO BASICALLY WHAT THAT MEANS IS THE ACTIVITY INEQUALITY: THE LARGER THE NUMBER, THE MORE VARIANCE THERE IS IN THE POPULATION WITH RESPECT TO THEIR LEVEL OF ACTIVITY. AT THE SAME TIME, ON THE OTHER -- THERE ARE INTERESTING OUTLIERS LIKE VIETNAM THAT COULD BE BIG DATA, BUT BEYOND THAT, AGAIN, THE KIND OF THINGS THAT YOU COULD PERHAPS IMAGINE IN TERMS OF THE VARIANCE IN A GIVEN COMMUNITY. SO IN THE UNITED STATES, FOR EXAMPLE, THERE'S A LARGE ACTIVITY INEQUALITY AND ALSO HIGH LEVELS OF OBESITY. SO THAT'S QUITE CLEAR AND YOU CAN THEN BREAK THAT DOWN ONTO A MORE -- LEVEL. ON THE RIGHT SIDE, BASED ON THIS DATA THAT'S FOUND JUST IN THE UNITED STATES, THE Y AXIS IS THE AVERAGE LEVEL OF OBESITY IN A GIVEN CITY WITHIN THE U.S. AGAIN, CERTAINLY WORTHY OF CONSIDERATION, IT'S SORT OF ALONG THE LINES, CITY BY CITY, OF WHAT YOU WOULD EXPECT. INTERESTINGLY, ON THE X AXIS IS DATA THAT'S TAKEN DIRECTLY FROM YELP, WHICH IS PUBLIC AND DOWNLOADABLE. IT'S MEASURING THE NUMBER OF FAST FOOD RESTAURANTS WITHIN CERTAIN REGIONS -- WELL, WITHIN REGIONS OF A CITY. THE MORE FAST FOOD RESTAURANTS THERE ARE, THE HIGHER THE LEVELS OF OBESITY TEND TO BE. AGAIN, NOT PERHAPS SURPRISING, BUT CERTAINLY A CORRELATION THERE. THEN THE QUESTION BECOMES WHAT DO YOU DO WITH INTERVENTION? IT WOULD BE REALLY INTERESTING TO START -- BASED ON SOME OF THAT DATA, IN FACT, SOME OF IT IS GOING ON, BUT IN HEALTHY CITIES, YOU TRY AND KICK THIS BALANCE. IT WOULD BE VERY INTERESTING IN TIMES AHEAD TO LOOK AT WHAT THIS KIND OF DATA TELLS US ABOUT WHAT HAPPENS WHEN YOU MAKE AN INTERVENTION, EITHER IN THE KINDS OF RESTAURANTS YOU HAVE OR THE KINDS OF LAYOUTS OF CITIES TO ENCOURAGE MORE ACTIVITY AND SO ON. SO I THINK THAT'S ALL VERY PROMISING. ONE MORE EXAMPLE, WHICH BRINGS IN ANOTHER TYPE OF DATA FROM DIFFERENT TYPES OF SENSORS. WE'VE SEEN ONE KIND OF SENSOR THAT WAS USED IN THE SMARTPHONE TO GET MOBILITY -- THE MOBILE SENSOR TO KNOWLEDGE -- I'M BLANKING. IT WILL COME TO ME IN A SECOND.
THEY'RE NOT ONLY USING EXISTING SENSORS BUT THEY ARE ALSO DEVELOPING SENSORS THEMSELVES, FOR EXAMPLE, IN SMART GLASSES TO LOOK AT PUPIL DILATION AND SO ON, AND CHEST MONITORS AND SO ON. AGAIN, A LOT OF THESE KINDS OF DEVELOPMENTS ARE VERY FORMATIVE. AN EXAMPLE IS ESSENTIALLY USING CELL PHONES TO GET A CONSTANT EKG READOUT. THE PROBLEM WITH THAT RIGHT NOW IS JUST TRIVIAL THINGS, IT DRAINS THE BATTERY IN NO TIME AT ALL, SO THIS IS AN EXAMPLE OF EARLY STAGE KINDS OF DEVELOPMENTS WHICH I THINK WILL BECOME IMPORTANT OVER TIME. AND THEN ASSOCIATED WITH THOSE TYPES OF SENSORS, YOU'RE LOOKING AT DIFFERENT KINDS OF EXPOSURES, WHETHER IT'S EXPOSURE TO DIFFERENT TYPES OF FOODS, TOBACCO PRODUCTS, ALCOHOL AND SO ON. THEN WHAT ARE THE BEHAVIORS, USING SENSORS, RELATIVE TO THESE TYPES -- ESSENTIALLY THAT LEADS TO OUTCOMES, WHICH CAN BE MEASURED, AND POTENTIALLY AN INTERVENTION IS ULTIMATELY MADE. AGAIN, THIS IS ALL VERY FORMATIVE STUFF, BUT I THINK IT NEVERTHELESS HAS PROMISE. SO JUST LOOKING AT ONE EXAMPLE, THIS PARTICULAR GROUP IS WORKING ON TWO MAJOR AREAS. ONE IS CHRONIC HEART FAILURE AND THE OTHER IS SMOKING HABIT AND CESSATION. SO THEY HAVE DEVELOPED SENSORS TO DETERMINE WHEN SOMEONE ACTUALLY LAPSES IN SMOKING CESSATION. THAT TURNS OUT TO BE AN INTRIGUING PROBLEM IN ITSELF BECAUSE FIRST OF ALL THE -- IS USUALLY FAIRLY EPHEMERAL -- JUST TO GIVE YOU AN EXAMPLE, ON THE RIGHT-HAND SIDE THERE, YOU CAN SEE EXACTLY HOW DIFFERENT PEOPLE HOLD CIGARETTES, AND SO ALL OF THAT LEADS TO DIFFERENT TYPES OF MOTION WHICH NEED TO BE DISTINGUISHED FROM OTHER KINDS OF MOTION LIKE YAWNING, EATING AND SO ON. SO I THINK THEY'VE ACTUALLY MADE SOME PROGRESS ON THAT, IN CONTROLLED EXPERIMENTS, SO THAT THEY CAN, WITH THE DEVICE THEY'VE BEEN USING, DETECT THE TIMES AND DURATION OF THOSE LAPSES.
AND THAT THEN LEADS TO SOME OBSERVATIONS, SUCH AS THAT THE FIRST LAPSE USUALLY IS OF VERY SHORT DURATION AND THEN THEY GET LONGER AFTERWARDS, AND ALSO THAT HOW PATIENTS REPORT LAPSES DOESN'T REFLECT VERY WELL WHAT THEY ACTUALLY DETECT FROM THE SENSORS. SO AGAIN, THIS IS SORT OF VERY EARLY STAGE WORK THAT POINTS TO THE POSSIBILITY OF SOME FORM OF INTERVENTION. SO IN SUMMARY, I THINK FROM MY POINT OF VIEW, BIG DATA OFFERS UNPRECEDENTED OPPORTUNITIES FOR WHAT CAN HAPPEN GOING FORWARD. THEY REQUIRE A CULTURAL SHIFT, WHICH IS RELATIVELY SMALL IN SOME COMMUNITIES AND LARGE FOR OTHERS. FOR EXAMPLE, IN THE GENOMICS COMMUNITY, THERE'S THE CULTURE OF SHARING AND SHARED RESOURCES AND SO ON ALREADY, AND IN MORE CLINICAL AREAS, NOT SO MUCH. EITHER WAY, EVEN FURTHER CULTURAL SHIFTS ARE NEVER EASY. CENTERS ARE CRITICAL. I THINK -- THERE ARE DIFFERENT AREAS WE'RE WORKING ON. I MENTIONED THE COMMONS ENVIRONMENT AND THESE EFFORTS BOTH BY NCI, BY BIG DATA TO KNOWLEDGE AND A NUMBER OF OTHERS. NIEHS IS NOW DEVELOPING A COMMONS, HUMAN GENOME IS IN THE PROCESS OF DEVELOPING A COMMONS, ALL UNDER THE FAIR PRINCIPLES, WHERE WE HOPE TO FOSTER MUCH MORE INTERCHANGE AND SHARING OF INFORMATION. AND SO I'D CERTAINLY LIKE TO HEAR FROM YOU. WE TRY TO ENCOURAGE CHANGE, AND I'D VERY MUCH LIKE TO HEAR OPPORTUNITIES FROM YOU AS TO HOW THIS KIND OF DATA COULD BE USED EFFECTIVELY. WHEN I TALK ABOUT THIS TO FRANCIS COLLINS, HE SAYS THIS IS ALL GREAT BUT GIVE ME THE -- SHOW ME HOW, OVER TIME, THIS IS GOING TO HAVE A REAL IMPACT ON HEALTHCARE. AND THAT'S WHAT I NEED YOUR HELP WITH. THANKS VERY MUCH. LET ME SAY IT'S A TEAM EFFORT BY NO SHORT MEASURE. MY ROLE IS ACROSS THE NIH, A TRANS-NIH ROLE, AND THERE'S A GREAT TEAM OF PEOPLE THAT WORK ON THIS, BUT EQUALLY IMPORTANT AS THE GREAT TEAM ARE THE REPRESENTATIVES FROM ACROSS THE 27 INSTITUTES AND CENTERS OF NIH, WHO FORM THE EXECUTIVE COMMITTEE AND HELP MOVE THESE INITIATIVES FORWARD. SO THANK YOU FOR THAT.
AND I'LL LEAVE A WEBPAGE AND AN EMAIL ADDRESS IF YOU WISH TO CONTACT ME ABOUT THIS LATER, AND HOPEFULLY I'LL HAVE SOME TIME FOR QUESTIONS AND HOPEFULLY I'LL BE TALKING TO SOMEONE THAT'S STILL OUT THERE. SO THANKS VERY MUCH. >> THANK YOU, DR. BOURNE. YES, WE'RE STILL HERE. >> THAT'S A RELIEF. >> BUT WE DO HAVE QUESTIONS. >> SO I'LL START WITH ONE THAT'S KIND OF A TWO-PART QUESTION. FIRST THE STATEMENT IS, DOESN'T THE SCIENTIST CHANGE IN A WORLD WITH MASSIVE AMOUNTS OF DATA? AND THEY SAY -- >> SORRY, MISSED THAT, DOESN'T A SCIENTIST WHAT? >> DOESN'T THE QUOTE-UNQUOTE SCIENTIST CHANGE IN A WORLD WITH MASSIVE AMOUNTS OF DATA? AND THEY FOLLOW THAT BY SAYING, IT SEEMS A DIFFERENCE BETWEEN INSTAGRAM AND THE USE OF ELECTRONIC HEALTH RECORDS DATA IS THAT ANYONE CAN USE THE FORMER, WHEREAS VERY FEW SCIENTISTS CURRENTLY HAVE THE EXPERTISE TO USE THE AVAILABLE DATA, THOUGH MANY MAY HAVE QUESTIONS THEY WANT TO ANSWER. THE GAP BETWEEN TRAINING AND DEMAND SEEMS LIKE A RATE LIMITING STEP. >> I THINK THOSE ARE GOOD QUESTIONS -- WELL, COMMENTS MAINLY, WHICH I COULD ALSO COMMENT ON AS WELL. I MEAN, I THINK THAT SCIENTISTS -- THERE IS A NEED FOR CHANGE, AND I DON'T THINK THERE'S ANY QUESTION ACROSS THE BIOMEDICAL SCIENCES THAT THERE'S A GROWING NEED, A GROWING NECESSITY TO HAVE TRAINING IN ANALYTICAL TECHNIQUES THAT MAY NOT HAVE BEEN NECESSARY IN PREVIOUS GENERATIONS. WE'VE SEEN THIS HAPPEN -- I'M BASICALLY A BIOINFORMATICIST ORIGINALLY, WE SAW THIS 20 YEARS AGO WHEN WE WERE PUSHED INTO IT WITH THE HUMAN GENOME PROJECT, SO THAT DID LEAD TO A NEW -- AND I HAVE A WHOLE LECTURE ABOUT THIS, BUT I WON'T BORE YOU WITH THAT NOW, BUT HOW THAT GROUP OF PEOPLE SORT OF EVOLVED AT THE TIME AND HOW THEY BECAME INTEGRATED INTO THE SYSTEM AND ULTIMATELY BECAME EQUAL PARTNERS IN THE ENTERPRISE. SO THAT WAS A RESULT OF CHANGE, THIS IS CHANGE AS WELL.
IN TERMS OF THE USABILITY OF DATA BY DIFFERENT GROUPS, IT'S TRUE THAT ANYONE CAN USE INSTAGRAM, BUT YOU CAN'T NECESSARILY MAKE SENSE OF ELECTRONIC HEALTH RECORDS, BUT I THINK PART OF THAT HAS TO DO WITH HOW WE PRESENT, VISUALIZE -- AND CREATE USER INTERFACES TO ENABLE MEANINGFUL USE OF THIS KIND OF INFORMATION. AND YOU KNOW, OBVIOUSLY THERE ARE LOTS OF CAVEATS THERE. FOR EXAMPLE, IF YOU HAVE AN APP WHICH SUGGESTS SOME FORM OF INTERVENTION, AND YOU MAKE THAT INTERVENTION AND WHATEVER IT IS DOESN'T IMPROVE, CAN YOU SUE THE MAKER OF THE APP? ALL OF THESE KINDS OF ISSUES, WHICH I THINK ARE AT A VERY FORMATIVE STAGE, WE'RE STILL WORKING THROUGH, BUT I THINK THERE'S NO QUESTION THAT IT REALLY BEHOOVES THE PEOPLE WHO ARE ALREADY WORKING IN THIS FIELD TO ADDRESS THE USABILITY QUESTION, AND THEN, OF COURSE, THERE'S THE MEANINGFULNESS OF THE INTERPRETATION THAT PEOPLE MAKE OF DATA. SO NONE OF THESE -- AGAIN, I THINK BY TRAINING AS MANY PEOPLE AS WE CAN, WE'LL AT LEAST MAKE SOME PROGRESS. >> THERE'S A FOLLOW-UP: CAN NIH ICs DEVELOP COMMONS-BASED INITIATIVES, CAN THE COMMONS BE USED TO INTEGRATE ELECTRONIC HEALTH RECORDS IN NIH SUPPORTED RESEARCH ACROSS THE U.S. NOW? >> SO WE HAVE THESE INITIATIVES GOING ON, AND I MENTIONED THESE VARIOUS -- WE CALL THEM PILOTS ACTUALLY -- THAT ARE GOING ON. I MEAN, IT'S VERY EARLY IN THEIR DEVELOPMENT, SO I DIDN'T GO INTO A LOT OF THE DETAILS AND THE EVALUATION OF THE SUCCESS OF THOSE. I WOULD SAY THAT THE GOAL IS -- MY PERSONAL GOAL, IN FIVE YEARS, I'D LIKE TO BE ABLE TO SAY, WELL, LOOKING AT WHAT WE'RE DOING NOW, AND HOW WE'RE MANAGING DATA AND HOW MUCH IT'S COSTING AND HOW MUCH USAGE THERE IS AND SO ON, WE'VE REALLY INCREASED THAT EFFICIENCY BY, LET'S SAY I'M SHOOTING FOR 5%. 5% OF $32 BILLION A YEAR, THAT WOULD BE A LOT OF YOU DOING RESEARCH. SO THAT'S -- IT'S A LOFTY GOAL BUT THAT'S THE KIND OF WAY WE'RE THINKING ABOUT IT. BUT HOW DO YOU GET THERE?
AND SO THE ISSUE IS THAT THE PROGRAM I DESCRIBED, BD2K, IS PUTTING ON THE ORDER OF, THIS YEAR, PROBABLY $40 MILLION INTO FOSTERING THIS KIND OF ACTIVITY. SO THAT REALLY FRANKLY IS A DROP IN THE BUCKET OF WHAT'S ACTUALLY -- I'VE BEEN HAVING THESE DISCUSSIONS WITH FRANCIS COLLINS AND LARRY TABAK, AND WE'VE NOW FORMED A TASK FORCE TO DETERMINE HOW WE CAN -- WHAT ARE THE RIGHT THINGS TO DO AND HOW DO WE ACCELERATE THE PROCESS SO THAT WE GET TO BETTER END POINTS WITHOUT COMPLETELY DRAINING THE BANK. YOU KNOW, JUST TO GIVE YOU AN EXAMPLE WHICH GOT THEIR ATTENTION, I SHOWED YOU DATA -- NOT DATA, I SHOWED YOU A BULLET, I DIDN'T SHOW YOU THE DATA BEHIND IT, THAT WE SPENT $1.2 BILLION IN THE LAST SEVEN YEARS ON DATA RESOURCES. IF YOU ALSO RECALL, I SAID WE'RE SAVING 12% OF THE DATA CURRENTLY FROM PUBLICATION, SO IF YOU SAVED IT ALL, YOU'D BE SPENDING $1.2 BILLION A YEAR JUST ON THAT DATA. SO CLEARLY THE COSTS ARE GOING TO CONTINUE TO INCREASE AS THE DATA INCREASES, AND WE NEED TO FIGURE OUT WAYS OF BRINGING THOSE DOWN. SO THE COMMONS IS A WAY OF GETTING THERE, BUT, TO ANSWER THE QUESTION, HOW DO THEY GET ON TO THE ON-RAMP? THAT'S ONE OF THE ISSUES I'VE ASKED THE TASK FORCE TO ADDRESS, BECAUSE THIS IS A CRITICAL ISSUE. WE HAVE A VERY SMALL OFFICE, AND AS YOU SAW, THE PEOPLE IN THE ICs, THEY GIVE A PIECE OF THEIR TIME TO HELP MOVE THIS FORWARD, BUT THEY'RE ONLY WORKING A SMALL PERCENTAGE OF THE TIME. SO WE NEED LOTS OF ENGAGEMENT ACROSS THE ICs. CLEARLY THE CENTER FOR INFORMATION TECHNOLOGY ALL HAVE ROLES TO PLAY HERE. THAT'S ALL BEING CONSIDERED BY THE TASK FORCE. >> ALL RIGHT. THANK YOU. WE HAVE THREE MORE QUESTIONS, HOPEFULLY WE HAVE TIME TO GET TO AT LEAST SOME OF THEM. FOR THOSE ON THE PHONE AND WATCHING, IF YOU HAVE QUESTIONS THAT WE DON'T GET TO, PLEASE EMAIL US AT PREVENTION@MAIL.NIH.GOV, OR YOU CAN EMAIL DR. PHILIP BOURNE AT NIH. SO DR.
BOURNE, COULD YOU COMMENT ON THE ROLE, HELPFUL OR OBSTRUCTIVE, OF LARGE SCALE COMPANIES COLLECTING DATA TO COMMERCIALIZE IT? >> YES, INTERESTING QUESTION. SO I THINK CLEARLY THAT IS GOING ON IN DROVES IN LOTS OF INDUSTRIES. THE QUESTION OF THAT DATA WITHIN THE HEALTHCARE INDUSTRY, PARTICULARLY AS IT RELATES TO ELECTRONIC HEALTH RECORDS, IS I THINK -- CERTAINLY THE PROJECTS WITHIN BD2K HAVE ACCESS TO THAT DATA. IN FACT SOME OF THESE COMPANIES ARE ACTUALLY DEVELOPING PATIENT-CENTERED OUTCOMES OF THEIR OWN USING THIS DATA, BUT OTHERS DON'T SEEM TO BE IN QUITE THE SAME WAY, AND ARE WILLING TO AT LEAST PROVIDE THAT INFORMATION TO RESEARCHERS FOR PURPOSES OF SORT OF LEADING TO BETTER OUTCOMES. BUT IT'S PROBABLY NOT WHERE IT SHOULD BE, AND I THINK THE ABILITY TO OPERATE CERTAINLY IS NOT AS EASY NECESSARILY FOR RESEARCHERS AS IT IS IN OTHER COUNTRIES WHICH HAVE, FOR EXAMPLE, MORE SOCIALIZED HEALTHCARE SYSTEMS. ON THE OTHER HAND, THE GROUP AT HARVARD HAS ACCESS TO I THINK 55 MILLION AETNA RECORDS, FOR EXAMPLE, LOOKING AT PRESCRIPTIONS AND OTHER ASPECTS OF THOSE RECORDS, SO I THINK THERE'S -- IT'S SPOTTY IN HOW THAT DATA IS ACCESSIBLE. OF COURSE I'M NOT TOUCHING ON THE WHOLE CONSENTING ISSUE AND THE WHOLE PRIVACY ISSUES, AND CLEARLY THAT'S A BIG PART OF IT, BUT JUST IN GENERAL TERMS, I THINK WE'VE GOT QUITE A LONG WAY TO GO. I THINK THE INITIATIVES LIKE THE MOONSHOT AND PRECISION MEDICINE, PRECISION MEDICINE PARTICULARLY FOR THE CONSENTING ASPECT AND THE ACCESSIBILITY OF INFORMATION, I THINK WILL HELP DRIVE CHANGE. BUT IT'S GOING TO BE VERY IMPORTANT, AS THOSE PROJECTS GO FORWARD, THAT THE VALUE TO THE PARTICIPANT CAN BE SEEN AT A RELATIVELY EARLY STAGE TO KEEP THEM ENGAGED IN THE INCREASING PROVISION OF INFORMATION. AND THEN OF COURSE HAVING HEALTHCARE PROVIDERS ALSO PARTICIPATING IS GOING TO BE CRITICAL. >> THANK YOU. WE'RE REACHING THE END OF OUR TIME, AND THOUGH WE HAVE LOTS MORE QUESTIONS, WE PROMISED THAT WE WOULD WRAP AT 2:30 SO I THINK WE'RE GOING TO STOP AT THIS POINT.
WE'D ENCOURAGE PARTICIPANTS WHO DIDN'T GET TO POSE THEIR QUESTIONS OR WHO HAVE FOLLOW-UP QUESTIONS TO SEND AN EMAIL TO US AT THE OFFICE OF DISEASE PREVENTION OR CONTACT DR. BOURNE HERE AT NIH. DR. BOURNE, THANK YOU VERY MUCH FOR YOUR TIME TODAY. WE APPRECIATED YOUR PRESENTATION AND THE DISCUSSION THAT WE JUST HAD. THANK YOU. >> THANK YOU VERY MUCH. >> AND FOR THOSE ON THE PHONE, THANK YOU FOR PARTICIPATING. ON THE MIND THE GAP WEBSITE, WHICH IS AGAIN PREVENTION.NIH.GOV/MINDTHEGAP, YOU WILL FIND SEVERAL RESOURCES FOR THIS TALK, INCLUDING THE SLIDES, REFERENCES AND A LINK TO COMPLETE AN EVALUATION. YOUR FEEDBACK IS VERY IMPORTANT TO US AS WE PLAN FUTURE SESSIONS. THANK YOU AGAIN FOR YOUR TIME.