I'M KEVIN CAMPHAUSEN, HEAD OF RADIATION ONCOLOGY AT THE NIH. I'VE HAD A LONG INTEREST IN DATA. BACK THEN WE CALLED IT BIOINFORMATICS; NOW OTHER PEOPLE ARE USING OTHER TERMS: BIG DATA, BIG SCIENCE, LOTS OF OTHER THINGS MIXED IN HERE. I WANT TO WELCOME EVERYBODY TODAY. THERE ARE A LOT OF MEETINGS GOING ON, AND IT'S HARD TO DRAG EVERYBODY HERE IN THE SUMMER, BUT WE HAD ALMOST 120 PEOPLE REGISTERED FOR THE MEETING AND WE'RE EXPECTING 75 TO ACTUALLY COME. THERE ARE 10 MORE IN THE VISITORS CENTER GETTING BODY CAVITY SEARCHES. I JUST WANT TO SHOW A COUPLE OF QUICK SLIDES ON SOME OF THE PROBLEMS I HAVE TO DEAL WITH EVERY SINGLE DAY. THIS IS WHAT I TALK TO MY PHYSICISTS AND MY DATA SCIENTISTS ABOUT. THESE ARE TWO PATIENT IMAGES. EVERYBODY CAN SEE THAT'S THE SAME THING: THE PATIENT'S GOT A BRAIN TUMOR. IT'S NOT. THE LEFT IS A BRAIN METASTASIS, THE RIGHT IS A GLIOBLASTOMA: DIFFERENT DIAGNOSES, DIFFERENT TREATMENTS, EVERYTHING ABOUT THEM IS DIFFERENT, YET THE IMAGES LOOK IDENTICAL. THIS IS ONE OF THE THINGS THE IMAGING SCIENTISTS HAVE BEEN STRUGGLING WITH: WHAT KIND OF MRI IMAGES CAN WE GET TO DISTINGUISH THESE, WHAT PET SCANS, WHAT NOVEL TRACERS CAN WE THROW IN? THAT'S ALL STUFF I HOPE WE'RE EVENTUALLY GOING TO BE ABLE TO GET. AS A RADIATION ONCOLOGIST... HOW MANY RADIATION ONCOLOGISTS ARE IN THE ROOM? I LOVE THIS. THIS IS GREAT. AS YOU KNOW, WE'RE THE SMARTEST DOCTORS IN THE -- WELL, DON'T TELL THE NEUROSURGEONS. WE'VE BEEN IN THE BASEMENT FOR YEARS. WE FOLLOW LATE EFFECTS. WE SEE OUR PATIENTS A YEAR AFTER WE TREAT THEM. WE HAVE MORE DATA ON OUR PATIENTS THAN ANY GROUP COULD POSSIBLY HAVE, AND WE REALLY AREN'T UTILIZING IT THE WAY WE NEED TO. THAT'S WHY WE'RE ALL IN THIS ROOM: WE NEED TO FIGURE OUT WHAT DATA IT IS, HOW WE FORMULATE THE DATA, WHAT OTHER DATA IS OUT THERE THAT WE CAN BORROW, AND HOW WE CAN SHARE OUR DATA. ON A TYPICAL MRI, THERE ARE 52 MILLION VOXELS OF INFORMATION ON THIS ONE SCAN. HOW MANY SCANS DO WE GET ON A PATIENT?
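To make the scale concrete, here is a small arithmetic sketch. The talk gives only the ~52 million total, not the matrix size, so the 512 x 512 x 200 geometry below is a hypothetical assumption chosen to land near that figure; real protocols vary widely. The series counts (six, eight, 12) come from the talk.

```python
# A minimal sketch: the matrix size below is a hypothetical assumption chosen
# so that one volume lands near the ~52 million voxels quoted in the talk.
ROWS, COLS, SLICES = 512, 512, 200


def voxels_per_scan(rows: int, cols: int, slices: int) -> int:
    """Voxels in a single 3D image volume."""
    return rows * cols * slices


def voxels_per_session(per_scan: int, n_series: int) -> int:
    """Total voxels when several series are acquired in one MRI session."""
    return per_scan * n_series


if __name__ == "__main__":
    per_scan = voxels_per_scan(ROWS, COLS, SLICES)
    print(f"one volume: {per_scan:,} voxels")
    # Six, eight or 12 series per session, as described in the talk.
    for n_series in (6, 8, 12):
        total = voxels_per_session(per_scan, n_series)
        print(f"{n_series:>2} series: {total:,} voxels")
```

Under these assumed dimensions one volume is 52,428,800 voxels, and a 12-series session is about 629 million voxels at a single time point, before any genomic data is added.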
SIX, EIGHT, 12, DEPENDING ON TIME ON THE MRI, AND THAT'S ONE DATA POINT, ONE TIME POINT. THERE ARE 20,000 GENES IN THIS PATIENT'S TUMOR WHEN WE TAKE IT OUT AND SEND IT FOR SEQUENCING. I'M SURE YOU ALL LOVE TO SEE THOSE SEQUENCING REPORTS WHEN THEY START COMING BACK. THERE ARE ANYWHERE BETWEEN 1 AND 250 VARIANTS PER GENE; P53 HAS UP TO 353 VARIANTS. YOU GET THE REPORT BACK WITH A P53 VARIANT AT, WHO KNOWS, 82%. WHAT DO WE DO WITH THAT INFORMATION? WHAT'S IMPORTANT? THE SEQUENCING DOESN'T EVEN TAKE INTO ACCOUNT THE NON-CODING MOLECULES WE'RE WORKING WITH: siRNA, miRNA, LONG NON-CODING RNA. NONE OF THIS APPEARS ON THE SEQUENCING REPORT, BUT ALL THAT INFORMATION IS GATHERED ON ALL OF OUR PATIENTS. WE TREAT THEM, RIGHT? FOR GBM WE TREAT THEM VERY WELL HERE. WHAT'S THE RADIATION EFFECT? FOR THE FIRST MONTH'S SCAN, WE TELL OUR PATIENTS IT'S GOING TO LOOK TERRIBLE. WHY DOES IT LOOK TERRIBLE? WHERE DOES IT LOOK TERRIBLE? WHAT IN THEIR SEQUENCING PATTERN MADE ONE PERSON'S IMAGE LOOK WORSE THAN ANOTHER'S? WHAT DID PROTONS DO? WHAT DID THE BOOST FIELD LOOK LIKE? ALL THIS INFORMATION IS OUT THERE. THEY HAVE TEMOZOLOMIDE ON BOARD. AS FOR DR. JAIN, WHO IS GOING TO SPEAK NEXT: THREE YEARS AGO I WAS LUCKY ENOUGH THAT THE CENTER FOR CANCER RESEARCH, WHICH I WORK FOR, FUNDED A FELLOWSHIP FOR ME TO REACH OUT INTO RADIATION ONCOLOGY AND FIND SOMEBODY WHO WAS INTERESTED IN RADIATION ONCOLOGY AND INTERESTED IN BIG DATA. DR. JAIN IS OUR FIRST FELLOW, AND HE'S GETTING CLOSE TO FINISHING UP. HIS INTEREST IS DIFFERENT. IT'S NOT IN THE INDIVIDUAL CLINICAL PATIENT, BUT MORE IN BIG DATA SETS, BOTH AT THE FDA AND AT CMS, AND THAT'S A WHOLE DIFFERENT SORT OF THING WE CAN TALK ABOUT: PUBLIC POLICY AND THINGS LIKE THAT. SO TODAY I HOPE YOU'LL SEE A WIDE MIX OF INFORMATION.
WE HAVE PEOPLE HERE FROM THE FUNDING PART OF THE NCI TO HEAR ABOUT WHAT WE NEED, WHAT WE'RE INTERESTED IN, WHAT SORT OF MONEY IS OUT THERE AND WHAT SORT OF MONEY WE NEED IN ORDER TO FUND THIS WORK, SO I HOPE EVERYBODY INTERACTS. WE LEFT SOME TIME FOR EVERYBODY TO CHAT. HOPEFULLY THIS IS THE FIRST OF MANY MEETINGS WHERE WE TALK ABOUT DATA AND BIG SCIENCE. WITHOUT FURTHER ADO, I'D LIKE TO INTRODUCE DR. JAIN. >> THANK YOU, EVERYONE, FOR COMING TO OUR INAUGURAL WORKSHOP. I'M EXCITED ABOUT THIS WORKSHOP AND ALSO VERY IMPRESSED WITH THE TURNOUT WE'VE HAD. WHEN THEY FIRST ASKED ME HOW MANY ONCOLOGISTS WE COULD GET TO THE MEETING, I SAID WE SHOULD SHOOT FOR ABOUT 20 PEOPLE. I THOUGHT THAT WAS GOING TO BE IT. SO I THINK IT SPEAKS TO WHERE THE FIELD IS HEADED, AND TO HOW DATA AFFECTS OUR CLINIC AND HOW WE THINK ABOUT PATIENT CARE. I WANT TO BRIEFLY TALK ABOUT THE WORKSHOP GOALS AS I SEE THEM. ONE IS THAT WE WANTED TO SHOWCASE EXAMPLES. AND I REALLY UNDERSCORE THE WORD EXAMPLES, BECAUSE THIS IS IN NO WAY FULLY REPRESENTATIVE OF WHAT IS GOING ON OUT THERE IN THE FIELD. THERE ARE INCREDIBLY DIVERSE WORKSHOPS TAKING PLACE ACROSS THE COUNTRY: THE RADIATION RESEARCH PROGRAM RECENTLY HAD A WORKSHOP ON ARTIFICIAL INTELLIGENCE, AND MICHIGAN PUTS ON A TERRIFIC PROGRAM CALLED THE PRACTICAL BIG DATA WORKSHOP. I COME WITH THE PERSPECTIVE OF A COMMUNITY PHYSICIAN INTERESTED IN DATA INITIATIVES AND IN HOW THEY CAN PRAGMATICALLY AFFECT PATIENT CARE. THAT WAS THE PITCH I MADE TO THE NCI AND FDA: LET'S THINK ABOUT HOW WE CAN BRING THAT ADDITIONAL PERSPECTIVE TO BIG DATA INITIATIVES AS WE GO THROUGH THIS, AND HOW WE CAN IMPROVE PATIENT CARE TOGETHER. THIS WAS REALLY THE RESULT OF A COOPERATIVE EFFORT WITH THE CENTER FOR STRATEGIC SCIENTIFIC INITIATIVES, AND WE HAVE MICHELLE BERNY-LANG HERE TO SPEAK ON BEHALF OF WHAT THEY ARE DOING.
THE AIM WAS SPECIFICALLY TO GIVE A SENSE OF WHAT IS HAPPENING, FOCUSING ESPECIALLY ON CLINICIAN-LED INITIATIVES IN OUR FIELD AND ON CLINICIAN AND PROVIDER ENGAGEMENT. AS A RESULT OF THIS WORKSHOP, I HOPE WE CAN ALL THINK ABOUT HOW WE VIEW DATA STEWARDSHIP AT OUR OWN INSTITUTIONS, AND HOW WE FOCUS ON THAT DATA TO CREATE INITIATIVES THAT WILL PUSH PATIENT CARE FORWARD. THROUGH THE CONVERSATIONS I'VE HAD WITH MANY OF YOU WHO ARE HERE TODAY, A LOT OF IMPORTANT AND GREAT QUESTIONS CAME UP, PARTICULARLY: WHAT ARE THE COMPETENCIES FOR OUR FUTURE WORKFORCE, HOW WILL WE ACTUALLY GET THERE, AND WHAT CHANGES DO WE NEED TO MAKE FROM AN EDUCATIONAL PERSPECTIVE TO ACHIEVE THAT? I THINK THOSE ARE GOING TO BE GREAT CONVERSATIONS TO HAVE, NOT JUST DURING OUR BREAKS OR OUR PANELS BUT LONGITUDINALLY. MORE IMPORTANTLY, I THINK THE DISCUSSION WE HAVE TODAY SHOULD FOCUS AROUND A FEW CENTRAL THEMES. I SAY THIS BECAUSE WE OFTEN GET CRITICIZED FOR BEING SILOED, WHETHER IT'S OUR FIELD OF RADIATION ONCOLOGY, AS KEVIN ALLUDED TO, BEING TUCKED AWAY IN THE BASEMENT AS A PHYSICAL SILO, OR DIGITAL SILOS WHERE DATA IS KEPT SEPARATE IN A WAY THAT'S NOT MEANINGFULLY SHARED. SO WE SHOULD REALLY BE THINKING ABOUT WHAT STANDARDS WE ARE GOING TO APPLY TO THE DATA WE USE, HOW WE CAN BE TRANSPARENT ABOUT THE DATA WE USE AND THE CONCLUSIONS WE DRAW FROM IT, AND HOW WE CAN COOPERATIVELY SHARE THIS DATA. BECAUSE AT THE END OF THE DAY, I VIEW US AS A COMMUNITY. AND WHAT ARE WE A COMMUNITY FOR? WE'RE A COMMUNITY FOR PATIENT CARE. WE'RE HERE TO TAKE CARE OF PATIENTS AND SHOULD WORK TOGETHER TO ACHIEVE THAT. SO I WANT TO MAKE THAT POINT, AND HOPEFULLY IT'S A STICKING POINT THROUGHOUT THE DAY AS WE HAVE THESE GREAT TALKS AND DISCUSSIONS. SO, JUST A COUPLE OF DATA POINTS TO START.
WE HAD OVER 100 REGISTRATIONS FOR OUR FIRST CONFERENCE, WHICH WAS TERRIFIC. WHAT I WAS PARTICULARLY EXCITED ABOUT IS THAT 45% OF THOSE REGISTRATIONS CAME FROM THE CLINICIAN/PROVIDER SIDE. THAT'S A DIFFERENT MIX FROM WHAT WE WOULD TRADITIONALLY SEE AT OTHER DATA-ORIENTED MEETINGS, AND SOMETHING I'M EXCITED TO SEE. DATA SCIENTISTS REPRESENT 25%, MEDICAL PHYSICISTS 15%, AND THE REMAINING 15% IS PARTICIPATION ACROSS THE BOARD FROM VARIOUS DEPARTMENTS AND AGENCIES WITHIN THE NIH AND ACROSS GOVERNMENT. THERE ARE 56 DISTINCT INSTITUTIONS, ORGANIZATIONS AND COMPANIES; I'M NOT GOING TO NAME THEM ALL BECAUSE I LOVE YOU ALL EQUALLY, OKAY? [LAUGHTER] SO I THINK IT'S REALLY TERRIFIC THAT WE'RE ALL ABLE TO COME TOGETHER TODAY. AND MY PERSPECTIVE, AGAIN, IS THAT OF A PHYSICIAN WHO IS NOT A DATA SCIENTIST BUT HAS AN INTEREST IN DATA SCIENCE, BECAUSE I THINK THE ROLE IT CAN PLAY IN IMPROVING PATIENT CARE IS IMPORTANT. I WANTED TO GIVE A BRIEF HIERARCHY OF NEEDS FOR HOW I THINK ABOUT DATA, AND I THINK THIS IS A GREAT PICTURE. IT'S FROM A DATA SCIENTIST, MONICA ROGATI. WHEN WE TALK ABOUT A.I., DEEP LEARNING AND NATURAL LANGUAGE PROCESSING, YOUR TAKEAWAY FROM THIS SHOULD BE THAT THOSE ARE THE TIP OF THE ICEBERG, OKAY? THAT IS WHAT WE SEE VISIBLY IN TERMS OF INITIATIVES AND THINGS THAT ARE HAPPENING, BUT THERE'S SO MUCH MORE BELOW IT THAT WE NEED TO BE THINKING ABOUT. SPECIFICALLY, FOUNDATIONALLY, HOW SHOULD WE THINK ABOUT DATA? WHERE DOES THE DATA WE COLLECT COME FROM? IS THAT DATA STRUCTURED OR UNSTRUCTURED? HOW DOES THE DATA MOVE? SO WE START FROM THE BOTTOM: WHERE THAT DATA COMES FROM, HOW WE COLLECT IT, HOW IT MOVES, HOW WE ESTABLISH THE INFRASTRUCTURE FOR IT, AND THEN HOW WE CLEAN IT.
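The layering described here, with data collected and cleaned before any analytics touch it ("garbage in, garbage out"), can be sketched in a few lines. This is a minimal illustration only: the record type and its field names (`patient_id`, `dose_gy`, `fractions`) are hypothetical and not drawn from any system mentioned in the talk. It places a validation layer beneath the analytics, so the summary statistic is computed only over records that pass basic checks.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List, Optional


@dataclass
class TreatmentRecord:
    # Hypothetical fields, for illustration only.
    patient_id: str
    dose_gy: Optional[float]   # total prescribed dose in gray
    fractions: Optional[int]   # number of treatment fractions


def is_clean(rec: TreatmentRecord) -> bool:
    """Validity checks: required values present and physically plausible."""
    if not rec.patient_id or rec.dose_gy is None or rec.fractions is None:
        return False
    return rec.dose_gy > 0 and rec.fractions > 0


def clean(records: Iterable[TreatmentRecord]) -> List[TreatmentRecord]:
    """Cleaning layer: keep only records that pass validation."""
    return [r for r in records if is_clean(r)]


def mean_dose_per_fraction(records: Iterable[TreatmentRecord]) -> float:
    """Analytics layer: only ever runs on cleaned data."""
    cleaned = clean(records)
    if not cleaned:
        raise ValueError("no clean records to analyze")
    return mean(r.dose_gy / r.fractions for r in cleaned)


if __name__ == "__main__":
    raw = [
        TreatmentRecord("p1", 60.0, 30),
        TreatmentRecord("p2", 40.05, 15),
        TreatmentRecord("p3", None, 30),   # missing dose: dropped
        TreatmentRecord("p4", 60.0, 0),    # implausible fraction count: dropped
    ]
    print(len(clean(raw)), mean_dose_per_fraction(raw))
```

The design choice is simply that the analytic function calls the cleaning layer itself, so it cannot be run on raw input by accident.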
STARTING WITH CLEAN DATA IS GOING TO BE A THEME WE SEE. GARBAGE IN, GARBAGE OUT; WE'VE ALL HEARD THAT BEFORE. BUT CLEAN DATA WILL SERVE AS THE FOUNDATION FOR THE ANALYTICS, THE TESTING AND THE ALGORITHMS WE APPLY TO IT. A.I. AND DEEP LEARNING ARE ALGORITHMS THAT WE APPLY TO DATA; WE NEED TO BE THINKING ABOUT THE FOUNDATION BELOW THAT. SO I WANT YOU TO TAKE THIS DATA SCIENCE HIERARCHY OF NEEDS AND USE IT AS A FOUNDATION, A REFERENCE AND REALLY A FRAMEWORK FOR HOW YOU THINK ABOUT DATA AS WELL. I KNOW I DON'T HAVE TO TELL A LOT OF YOU IN THE AUDIENCE THAT, BUT IT WAS EYE-OPENING AND IMPORTANT TO THINK ABOUT IT THIS WAY, AND I HOPE IT'S VALUABLE FOR YOU TOO. AGAIN, I WANT TO THANK YOU FOR YOUR PARTICIPATION AND FOR BEING HERE. WE STRONGLY ENCOURAGE YOU TO SEND US ANY COMMENTS AND THOUGHTS, BOTH TODAY AND OFFLINE AFTER THE WORKSHOP. WE'LL BE LOOKING TO CONTINUE THESE CONVERSATIONS AS WE THINK ABOUT FUTURE WORKSHOP CONTENT AND COLLABORATIVE OPPORTUNITIES. AGAIN, THANK YOU ALL FOR COMING. I WANT TO INTRODUCE MICHELLE BERNY-LANG FROM THE CENTER FOR STRATEGIC SCIENTIFIC INITIATIVES. >> THANK YOU, ANSHU. GOOD MORNING TO YOU ALL. I WANT TO TAKE A MOMENT TO INTRODUCE HOW OUR CENTER GOT WRAPPED UP IN THIS OPPORTUNITY. IT'S BEEN GREAT TO PLAN THIS MEETING TOGETHER, AND I'LL GIVE YOU BACKGROUND ON SOME RELEVANT PROGRAMS AND WAYS WE MIGHT THINK ABOUT THE FUTURE. AS KEVIN MENTIONED, I'M REPRESENTING ONE OF THE EXTRAMURAL FUNDING PARTS OF NCI, SO WE REALLY LIKE TO USE THESE OPPORTUNITIES TO FIND GREAT IDEAS AND SEE WHAT WE MAY BE ABLE TO DEVELOP IN THE FUTURE. AS WAS MENTIONED, I'M COMING FROM THE NCI CENTER FOR STRATEGIC SCIENTIFIC INITIATIVES, WHICH IS LOCATED WITHIN THE OFFICE OF THE DIRECTOR. FOR THE PAST TEN YEARS WE HAVE BEEN CHARGED WITH DEVELOPING EXPLORATORY PROGRAMS AND PILOT PROJECTS, TESTING OUT NEW TECHNOLOGIES, AND ADVANCING DATA TOOLS AND STANDARDS TO SUPPORT THE SPECTRUM OF CANCER RESEARCH.
SOME PROGRAMS YOU MIGHT BE FAMILIAR WITH THAT HAVE COME OUT OF OUR CENTER ARE THE CANCER GENOME ATLAS AND OUR EFFORTS IN NANOTECHNOLOGY. SIMILAR TO THE CANCER GENOME ATLAS, WE HAVE A FOLLOW-ON PROGRAM IN PROTEOMICS, AND WE CURRENTLY HAVE THE PROVOCATIVE QUESTIONS INITIATIVE. THE ONES I WANTED TO HIT ON THAT MIGHT BE RELEVANT TO THIS PRESENTATION AND THE REST OF THE DAY ARE, FIRST, TWO PROGRAMS DEVELOPING LARGE DATASETS, THE CANCER GENOME ATLAS AND THE PROTEOMICS PROGRAM; SEVERAL SPEAKERS WILL TALK ABOUT HOW THOSE ARE LINKED WITH IMAGING DATA AND HOW THEY WORK WITH THAT DATA. SECOND, OUR DATA SCIENCE TRAINING PROGRAM. THROUGH IT WE CAME TO TALK WITH THE CENTER FOR CANCER RESEARCH AND THINK ABOUT HOW THIS MODEL, WHICH I'LL DESCRIBE IN A SECOND, COULD BE APPLIED TO OTHER DOMAINS AND WORKED ON TOGETHER WITH OTHER AGENCIES AND OTHER PARTS OF NIH. THE WAY WE CAME TOGETHER WAS LEARNING ABOUT THE BIG DATA SCIENTIST TRAINING ENHANCEMENT PROGRAM. IT'S BEEN IN OUR CENTER FOR THE PAST FOUR YEARS AND IS A COLLABORATION WITH THE DEPARTMENT OF VETERANS AFFAIRS. WHAT WE SAW IS THAT THE V.A., LIKE RADIATION ONCOLOGY, HAS TREMENDOUS DATA RESOURCES, FROM LONGITUDINAL ELECTRONIC HEALTH RECORD DATA, TO IMAGING DATA, TO MORE AND MORE GENETIC AND GENOMIC DATA. THEY WERE INTERESTED IN HOW THEY COULD USE THAT DATA TO ADVANCE BOTH CANCER RESEARCH AND CANCER CARE, AND TO CONTINUE EXPANDING THEIR EFFORTS IN THAT SPACE. AT THE SAME TIME, THROUGH A LOT OF OUR PROGRAMS, WE WERE STARTING TO DEVELOP DATA SCIENTISTS: COMPUTATIONALLY TRAINED INDIVIDUALS WITH GREAT MATH AND STATISTICS SKILL SETS WHO HAD THE OPPORTUNITY TO GET INVOLVED IN AND LEARN THE DOMAINS OF HEALTH CARE AND CANCER RESEARCH. SO WE'VE BROUGHT IN POSTDOCS TO GO TO THE V.A. AND SPEND A YEAR WORKING WITH THEIR CLINICIANS. TOGETHER THEY COME UP WITH CLINICALLY RELEVANT QUESTIONS IN ONCOLOGY.
AND THESE DATA SCIENTISTS TURN TOWARD A MORE HEALTH CARE AND CANCER RESEARCH FOCUS. THIS IS A MODEL WE THINK COULD BE APPLIED TO OTHER DOMAINS AND ACROSS OTHER AGENCIES. AND I WOULD BE REMISS IF I DIDN'T MENTION THAT WE'RE CURRENTLY RECRUITING FOUR FELLOWS, SO IF YOU KNOW OF POSTDOCS WITH THIS SKILL SET WHO MIGHT WANT TO PARTICIPATE, WE WOULD LOVE TO HAVE MORE JOIN OUR PROGRAM. IT'S BEEN A GREAT WAY TO SHOW WHAT DATA SCIENTISTS AND CLINICIANS CAN DO TOGETHER THROUGH COLLABORATION. A QUICK SNAPSHOT CAN GIVE AN IDEA OF THE BREADTH OF RESEARCH AND OF THE ADVANCES IN KNOWLEDGE ABOUT CARE THAT CAN BE DEVELOPED. OUR FELLOW AT STANFORD/PALO ALTO, WHO WAS TALENTED IN NATURAL LANGUAGE PROCESSING AND COULD WORK WITH CHART NOTES AND UNSTRUCTURED DATA, WORKED WITH A SURGEON TO GET A BETTER HANDLE ON PAIN AFTER SURGERY AND ON QUALITY METRICS USING THOSE CHART NOTES. IN BOSTON WE HAVE A FELLOW WORKING WITH A CLINICIAN WHO FOCUSES ON MULTIPLE MYELOMA. THEY WERE ABLE TO USE THE V.A. HEALTH RECORDS TO DEVELOP A COHORT OF BOTH MYELOMA PATIENTS AND MYELOMA PRECURSOR PATIENTS AND LOOK AT THEIR TRAJECTORY OF DISEASE IN THE V.A. HEALTH SYSTEM. MYELOMA HAS A DIFFERENCE IN INCIDENCE BETWEEN CAUCASIAN AMERICANS AND AFRICAN-AMERICANS, AND THERE CAN ALSO BE A DIFFERENCE IN OUTCOMES. BUT WHEN THIS WAS STUDIED IN THE V.A. HEALTHCARE SYSTEM, WHERE INDIVIDUALS WERE RECEIVING THE SAME CARE, AFRICAN-AMERICANS WITH MULTIPLE MYELOMA FARED BETTER. IN HOUSTON, A FELLOW COMPARED ASSESSMENTS OF FRAILTY ACROSS AGE, CONGESTIVE HEART FAILURE AND CANCER, SEEING HOW MACHINE LEARNING MODELS COMPARE WITH CLINICAL EXPERTS, AND FOUND SOME VALUE IN INCORPORATING MACHINE LEARNING INTO ASSESSMENTS OF FRAILTY. AND FINALLY, THE LAST EXAMPLE, AND THE MOST RELEVANT TO THIS AUDIENCE: A FELLOW WHO IS HERE TODAY. HE'S AT THE DURHAM V.A. AND DUKE.
HE WORKS ON RADIOMIC BIOMARKERS AND THEIR ASSOCIATION WITH RESPONSE TO RADIATION THERAPY, WITH A FOCUS ON LUNG CANCER. HE'S INVOLVED IN A VARIETY OF EFFORTS LINKING WORK IN MATH, PHYSICS AND RADIATION ONCOLOGY, WHICH SHOWS THE BENEFIT OF GROUPS WORKING TOGETHER TO ADDRESS QUESTIONS. THIS GIVES AN IDEA OF THE BREADTH OF EFFORTS UNDER WAY. WE WOULD BE INTERESTED IN SEEING HOW WE CAN EXPAND AND CONTINUE THESE EFFORTS, AND IN THINKING OF NEW DOMAINS WHERE THE SAME METHODS CAN BE APPLIED. IN TERMS OF OPPORTUNITIES, WE WOULD LIKE TO CONTINUE TO APPLY THIS MODEL IN OTHER SETTINGS, PERHAPS COME UP WITH KEY CLINICAL QUESTIONS IN RADIATION ONCOLOGY TO PRIORITIZE, AND SEE WHERE CURRENT DATA SCIENTISTS MAY BE ABLE TO DEPLOY THEIR SKILLS. MOVING FORWARD, WE KNOW WE'RE HITTING A LARGE NUMBER OF TOPICS TODAY, AND I THINK WE'LL BE ABLE TO COME UP WITH GREAT IDEAS FOR TOPICS WE WANT TO ENGAGE IN FURTHER, EXPLORE MORE AND THINK MORE DEEPLY ABOUT. I THINK THIS MEETING WILL GIVE US A GOOD STARTING POINT AND A GREAT PLACE TO MOVE FORWARD, BOTH ON WAYS TO COLLABORATE AND ON WAYS TO THINK ABOUT THESE AREAS. SO THANK YOU SO MUCH FOR BEING HERE. WE'RE EXCITED TO KICK THIS MEETING OFF. [APPLAUSE] >> I JUST WANT TO TELL EVERYONE WE DO HAVE A HASHTAG FOR THE MEETING, SO PLEASE TWEET IT OUT, OR INSTAGRAM IF THAT'S WHAT YOU DO; ANYTHING IS WELCOME. IT'S NCIDATARADONC. PLEASE DO THAT, AND NCI WILL HOPEFULLY RETWEET IT. I WANT TO GO INTO OUR FIRST SESSION, SO I'LL CALL UP DR. ANEJA FROM YALE TO LEAD THE FIRST SESSION ON DATA INFRASTRUCTURE. DR. ANEJA, WELCOME. >> CAN I INTERRUPT? THE MEN'S AND WOMEN'S ROOMS ARE ACROSS THE LOBBY, OKAY? THERE'S ANOTHER SET DOWN THERE. >> HEY, EVERYONE. I'M SANJAY ANEJA, ASSISTANT PROFESSOR AT YALE. THE FIRST SESSION IS ABOUT INFRASTRUCTURE, AN IMPORTANT TOPIC IN THE REALM OF DATA SCIENCE AND SOMETHING I DON'T THINK WE'RE OFTEN AS COGNIZANT OF AS WE SHOULD BE. I TELL THE POSTDOCS IN MY LAB THAT 90% OF YOUR TIME IS SPENT ON DATA CLEANING AND MANAGEMENT, AND MAYBE 10% IS PRETENDING YOU WORK AT GOOGLE. IT'S IMPORTANT.
THIS IS AN AREA WHERE I FIND THERE'S A LOT OF NEED, AND SOME OF THE WORK OUR LAB HAS DONE IN THIS AREA COULD BE MORE VALUABLE THAN THE ALGORITHMS WE DEVELOP, WHICH GET IMPROVED WITH EVERY ITERATION. THE FIRST SPEAKER WE'LL INVITE UP IS DR. ERIKA KIM. SHE HAS A BACKGROUND IN BIOINFORMATICS, SHE'S A PROGRAM MANAGER IN THE CANCER INFORMATICS BRANCH HERE, AND SHE PREVIOUSLY HAD 12 YEARS OF EXPERIENCE WITH THE NATIONAL HUMAN GENOME PROJECT. >> GOOD MORNING. I WOULD LIKE TO THANK THE ORGANIZERS FOR GIVING ME THE OPPORTUNITY TO SHARE WITH YOU THE DATA SCIENCE INFRASTRUCTURE THAT WE AT NCI ARE CURRENTLY DEVELOPING. HOW DO I PROCEED HERE? NEXT, OKAY. THE DATA SCIENCE INFRASTRUCTURE WE'RE DEVELOPING IS CALLED THE CANCER RESEARCH DATA COMMONS, WHICH WILL BE PART OF A LARGER CANCER DATA ECOSYSTEM. TO GIVE YOU A BIT OF BACKGROUND, IN 2015 THE PRECISION MEDICINE INITIATIVE WAS ANNOUNCED. IT'S A NATIONWIDE INITIATIVE TO MOVE AWAY FROM A ONE-SIZE-FITS-ALL TREATMENT APPROACH AND ESTABLISH TREATMENT AND PREVENTION STRATEGIES TAILORED TO PEOPLE'S UNIQUE CHARACTERISTICS, INCLUDING LIFESTYLE, ENVIRONMENT AND UNIQUE BIOLOGY. PRECISION MEDICINE FACES A GRAND CHALLENGE. IT REQUIRES A DEEP UNDERSTANDING OF OUR BIOLOGICAL SYSTEMS, AS WELL AS ADVANCES IN SCIENTIFIC METHODS, INSTRUMENTATION AND TECHNOLOGY, AND ADVANCES IN DATA MANAGEMENT AND COMPUTATION. AND, MOST IMPORTANTLY, IT REQUIRES THE ABILITY TO APPLY THESE ADVANCES TO DRIVE RESEARCH AND BETTER TREATMENT. TO DO THAT, YOU NEED TO BE ABLE TO SECURELY SHARE DATA ACROSS DOMAINS, INSTITUTIONS AND STAKEHOLDERS. SO THE KEY TENET OF THE PRECISION MEDICINE INITIATIVE IS SECURE, RESPONSIBLE ACCESS TO HIGH-QUALITY DATA, NOT JUST JUNK IN AND JUNK OUT.
THE FOLLOWING YEAR, 2016, THE CANCER MOONSHOT RESEARCH INITIATIVE WAS ANNOUNCED, WITH THE OVERARCHING GOALS OF ACCELERATING PROGRESS IN CANCER, INCLUDING PREVENTION AND SCREENING; ENCOURAGING GREATER AND BETTER COOPERATION AND COLLABORATION AMONG ACADEMIA, GOVERNMENT AND THE PRIVATE SECTOR; AND, MOST IMPORTANTLY, ENHANCING DATA SHARING. LATER IN 2016, THE BLUE RIBBON PANEL OUTLINED TEN RECOMMENDATIONS. ONE RECOMMENDATION KEY TO OUR CENTER WAS TO BUILD A NATIONAL CANCER DATA ECOSYSTEM, WHICH IS SUPPOSED TO ENHANCE CLOUD COMPUTING PLATFORMS AND PROVIDE SERVICES THAT CAN LINK DISPARATE DATA, INCLUDING IMAGING; TO PROVIDE THE ESSENTIAL UNDERLYING DATA SCIENCE INFRASTRUCTURE, STANDARDS, METHODS AND PORTALS FOR THE DATA ECOSYSTEM; AND TO ESTABLISH SUSTAINABLE DATA GOVERNANCE TO ENSURE THE LONG-TERM HEALTH OF THE ECOSYSTEM. WE'RE ALSO CHARGED WITH DEVELOPING STANDARDS AND TOOLS SO THAT DATA ARE INTEROPERABLE, WHICH I WILL TOUCH ON IN LATER SLIDES. THIS FIGURE ILLUSTRATES THE COMPONENTS OF THE PROPOSED NATIONAL CANCER DATA ECOSYSTEM. A NUMBER OF PROJECTS, PROGRAMS AND INITIATIVES ARE INVOLVED IN INTEGRATING DATA FROM BASIC RESEARCH, AT THE BOTTOM OF THE FIGURE, THROUGH CLINICAL CARE. NCI'S CANCER RESEARCH DATA COMMONS IS PART OF THIS LARGER ECOSYSTEM. SO FIRST, WHAT IS A DATA COMMONS? SOME OF YOU MIGHT BE FAMILIAR. DATA COMMONS CO-LOCATE DATA, STORAGE AND COMPUTING INFRASTRUCTURE WITH COMMONLY USED TOOLS AND ANALYTICAL PIPELINES FOR SHARING DATA, CREATING AN INTEROPERABLE RESOURCE FOR THE RESEARCH COMMUNITY. THE GOALS OF OUR NCI CANCER RESEARCH DATA COMMONS ARE TO ENABLE THE COMMUNITY TO SHARE DIVERSE DATA TYPES ACROSS PROGRAMS AND INSTITUTIONS, TO PROVIDE SECURE ACCESS TO DATA REGARDLESS OF WHERE IT IS STORED, AND TO PROVIDE A MECHANISM FOR DISCOVERING INNOVATIVE VISUALIZATION AND ANALYTICAL TOOLS USING COMPUTE AVAILABLE THROUGH COMMERCIAL CLOUD PLATFORMS. WE'RE ALSO CHARGED WITH HELPING NCI DATA COORDINATING CENTERS SUSTAIN AND SHARE THEIR DATA PUBLICLY.
TO GIVE YOU AN OVERVIEW OF THE NCI CRDC: DATA ARE STORED IN DOMAIN-SPECIFIC REPOSITORIES. DATA ACCESS IS CONTROLLED THROUGH A CENTRALIZED AUTHORIZATION SYSTEM THAT SECURES THE DATA, AND WE INVITE RESEARCHERS TO BRING THEIR OWN DATA AND TOOLS TO THE CLOUD AND COMBINE THEM WITH THE DATA AVAILABLE IN THE CRDC FOR INTEGRATIVE ANALYSIS. ON THE RIGHT IS A REPRESENTATION OF OUR VISION FOR THE CRDC. THE CYLINDERS ARE SHOWN IN DIFFERENT COLORS. THE TWO BLUE NODES THAT ARE CURRENTLY OPERATIONAL ARE THE GENOMIC DATA COMMONS AND THE PROTEOMIC DATA COMMONS. RED REPRESENTS NODES IN DEVELOPMENT, WHICH INCLUDE THE IMAGING DATA COMMONS AND AN EXCITING ONE, THE INTEGRATED CANINE DATA COMMONS, AND THE WHITE CYLINDERS REPRESENT NODES CURRENTLY IN THE CONCEPTUAL STAGE. AS I MENTIONED, THEY'RE CONNECTED BY THIS CENTRALIZED AUTHORIZATION, AND WE HAVE THREE DIFFERENT NCI CLOUD RESOURCES WHERE YOU CAN COME IN TO DO THE COMPUTE AND GET ACCESS TO DATA STORED IN THE CANCER RESEARCH DATA COMMONS. WHY ARE WE CREATING THESE REPOSITORIES? THERE ARE A NUMBER OF NCI PROGRAMS GENERATING MULTI-MODAL DATA. AS MICHELLE MENTIONED, THERE ARE PROGRAMS SUCH AS ICPC AND APOLLO, ALL GENERATING PROTEOGENOMIC DATA, WHICH INCLUDES A NUMBER OF DIFFERENT IMAGING MODALITIES. WE ALSO HAVE A NUMBER OF PROGRAMS THAT CAME OUT OF THE CANCER MOONSHOT RESEARCH INITIATIVE GENERATING ALL DIFFERENT TYPES OF DATA. AS AN EXAMPLE, THE HUMAN TUMOR ATLAS NETWORK, WHICH SOME OF YOU MIGHT KNOW, IS GENERATING A WEALTH OF DIVERSE DATA. THE DATA TYPES PLANNED, WITH ADDITIONAL NEW TYPES TO BE GENERATED IN THE FUTURE, INCLUDE CHROMATIN CONFORMATION, EM IMAGING, A NUMBER OF OMICS SEQUENCING DATA TYPES, HISTOLOGY AND MULTIPLEXED 2D AND 3D IMAGING DATA, AND MEDICAL IMAGING. AND ON TOP OF THESE DIVERSE DATA TYPES, MANY ARE NOW BEING GENERATED AT SINGLE-CELL RESOLUTION.
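The arrangement in this overview, where data stay in domain-specific repositories but a single central service decides who may read what, can be sketched as a toy model. Everything here is hypothetical: the class names, project identifiers and grant structure are invented for illustration and are not the API of Fence or any other CRDC component.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class AccessDenied(Exception):
    """Raised when the central service has not authorized a request."""


@dataclass
class CentralAuthz:
    """Hypothetical central authorization service: user -> authorized projects."""
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, user: str, project: str) -> None:
        self.grants.setdefault(user, set()).add(project)

    def is_authorized(self, user: str, project: str) -> bool:
        return project in self.grants.get(user, set())


@dataclass
class DomainRepository:
    """A domain-specific store (genomic, proteomic, imaging, ...) that keeps its
    own files but delegates every access decision to the shared service."""
    domain: str
    authz: CentralAuthz
    projects: Dict[str, str] = field(default_factory=dict)  # file_id -> project
    blobs: Dict[str, bytes] = field(default_factory=dict)   # file_id -> content

    def add(self, file_id: str, project: str, data: bytes) -> None:
        self.projects[file_id] = project
        self.blobs[file_id] = data

    def fetch(self, user: str, file_id: str) -> bytes:
        project = self.projects[file_id]
        if not self.authz.is_authorized(user, project):
            raise AccessDenied(f"{user} is not authorized for {project}")
        return self.blobs[file_id]


if __name__ == "__main__":
    authz = CentralAuthz()
    genomic = DomainRepository("genomic", authz)
    imaging = DomainRepository("imaging", authz)
    genomic.add("g-001", "project-a", b"ACGT")
    imaging.add("i-001", "project-a", b"\x00\x01")
    authz.grant("alice", "project-a")  # one grant covers both repositories
    print(genomic.fetch("alice", "g-001"), imaging.fetch("alice", "i-001"))
```

The point of the design is that granting or revoking access happens in one place, while each repository remains free to use whatever storage and data model suits its domain.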
FOR THE HUMAN TUMOR ATLAS NETWORK, HTAN, TO SUCCEED, WE NEED TO PROVIDE A WAY TO STORE ALL THESE DIVERSE DATA TYPES AND TO DO MULTI-SCALE MODELING SO THAT THEY CAN CREATE A HUMAN TUMOR ATLAS. TO DESCRIBE HOW OUR CRDC IS BEING DEVELOPED: WE HAVE A NUMBER OF PARALLEL EFFORTS ONGOING. ONE OF THE FOUNDATIONAL COMPONENTS OF THE CRDC IS THE DATA COMMONS FRAMEWORK. WE'RE CALLING THIS A FRAMEWORK BECAUSE IT'S INTENDED TO DEFINE CORE OPERATING PRINCIPLES AND PROVIDE REUSABLE MODULAR COMPONENTS AND SERVICES UTILIZED ACROSS DATA COMMONS. A NUMBER ARE AVAILABLE, INCLUDING FENCE, AN INDEX SERVICE, AND METADATA VALIDATION TOOLS. THERE ARE DATA MODELS AND DATA DICTIONARIES AVAILABLE FROM THE DATA COMMONS FRAMEWORK, APIs AND CONTAINER ENVIRONMENTS FOR TOOLS AND PIPELINES, AND ACCESS TO COMPUTATIONAL WORKSPACES FOR STORING DATA, TOOLS AND RESULTS. WE AT NCI ARE DEVELOPING THE FRAMEWORK AND CURRENTLY USING IT TO STAND UP SEVERAL DATA NODES, AND IT WILL BE AVAILABLE FOR THE COMMUNITY TO LEVERAGE OR USE AS A MODEL TO BUILD THEIR OWN DATA COMMONS. IN 2016 THE GDC CAME ONLINE. I'M NOT GOING TO GO IN DEPTH ABOUT THE GDC BECAUSE I COULD SPEND 20 OR 30 MINUTES TALKING ABOUT GDC DATA. THIS IS THE LANDING PAGE; YOU CAN SEARCH FOR YOUR FAVORITE GENE OF INTEREST, TUMOR TYPE OR PROGRAM. IT CURRENTLY INCLUDES FOUNDATION MEDICINE DATA AND GENOMIC DATA FROM THE CPTAC PROGRAM. THERE'S THE URL. THE SECOND NODE THAT BECAME OPERATIONAL IS THE PROTEOMIC DATA COMMONS, WHICH HOSTS AND WILL HOST HIGH-VALUE DATA GENERATED FROM MAJOR PROTEOGENOMIC PROGRAMS, BOTH WITHIN THE USA AND INTERNATIONALLY, INCLUDING APOLLO, ICPC AND KIDS FIRST. IT ALSO PROVIDES DATA HARMONIZATION PIPELINES, AND THE DATA PORTAL IS AN INTUITIVE WEB INTERFACE FOR USERS TO BROWSE, SEARCH AND ANALYZE DATA. IT PROVIDES AN ENVIRONMENT FOR PROTEOMIC DATA AND IN THE FUTURE WILL INTEROPERATE WITH THE OTHER NODES. A USER CAN SUBMIT DATA AND SHARE IT WITH OTHERS BEFORE IT'S MADE PUBLICLY AVAILABLE.
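The index service listed among the framework components can be thought of as a lookup from a stable digital identifier to everything needed to find and check a file. The sketch below is a hypothetical in-memory model (GUID, size, MD5 checksum, list of storage URLs); the class, its methods and the example URLs are assumptions for illustration, not the interface of the framework's actual indexing service.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexRecord:
    guid: str
    size: int
    md5: str
    urls: List[str] = field(default_factory=list)


class FileIndex:
    """Hypothetical in-memory index: a stable GUID resolves to the file's size,
    checksum and every storage location that holds a copy."""

    def __init__(self) -> None:
        self._records: Dict[str, IndexRecord] = {}

    def register(self, data: bytes, url: str) -> str:
        guid = str(uuid.uuid4())
        self._records[guid] = IndexRecord(
            guid=guid, size=len(data), md5=hashlib.md5(data).hexdigest(), urls=[url]
        )
        return guid

    def add_location(self, guid: str, url: str) -> None:
        """Record another copy (e.g. in a second cloud) under the same GUID."""
        self._records[guid].urls.append(url)

    def resolve(self, guid: str) -> IndexRecord:
        return self._records[guid]

    def verify(self, guid: str, data: bytes) -> bool:
        """Check that downloaded bytes match the indexed size and checksum."""
        rec = self._records[guid]
        return rec.size == len(data) and rec.md5 == hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    index = FileIndex()
    # Example URLs are placeholders, not real buckets.
    guid = index.register(b"example bytes", "s3://example-bucket/sample.bam")
    index.add_location(guid, "gs://example-bucket/sample.bam")
    print(index.resolve(guid))
```

Because users cite the GUID rather than a bucket path, copies can move between clouds without breaking references, and the stored checksum lets any client confirm it received the right bytes.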
THIS IS THE LANDING PAGE FOR THE PDC, SO COME AND LOOK AND EXPLORE. WE WELCOME ALL BETA TESTERS TO SEE HOW IT IS AND WHAT OTHER FEATURES YOU'RE INTERESTED IN HAVING DEVELOPED, SO PLEASE COME AND PLAY WITH THE PROTEOMIC DATA ON THE PDC. SINCE WE'RE AT A RADIATION ONCOLOGY WORKSHOP, I WANTED TO TOUCH ON THE IMAGING DATA COMMONS. THIS IS A NODE WE'RE CURRENTLY IN THE PROCESS OF AWARDING. THE MOTIVATION FOR STANDING UP THE IMAGING DATA COMMONS IS THAT, AND I DON'T NEED TO PREACH TO THIS AUDIENCE, IMAGING PLAYS A CRUCIAL ROLE IN CANCER. DIFFERENT IMAGE TYPES ARE OFTEN SILOED, WHICH PREVENTS INTEROPERABILITY, SO A COMMON PLATFORM IS NEEDED TO STORE, ACCESS AND ANALYZE THEM AND HELP LEAD TO DISCOVERY. THE IDC WILL USE THE CORE SOFTWARE COMPONENTS I DESCRIBED UNDER THE DATA COMMONS FRAMEWORK, ALSO USED BY OTHER NODES, TO FACILITATE INTEROPERABILITY AND A CONSISTENT USER EXPERIENCE. IMAGING IN CANCER IS COMPRISED OF MANY IMAGE TYPES; HERE I SHOW IMAGES ASSOCIATED WITH LUNG CANCER, RANGING FROM DIAGNOSTIC TO FUNDAMENTAL RESEARCH. THE CANCER IMAGING ARCHIVE, TCIA, WHICH YOU'LL HEAR MORE ABOUT IN THE NEXT PRESENTATION, IS CURRENTLY AN NCI REPOSITORY FOR RADIOLOGY IMAGES, AND NOW DIGITAL PATHOLOGY IMAGES ARE INCLUDED. MOST ARE IN THE DICOM STANDARD. IT HOSTS COLLECTIONS FROM 45,000 SUBJECTS FROM MAJOR CANCER RESEARCH INITIATIVES. AND AS I MENTIONED, A NUMBER OF OTHER NCI PROJECTS ARE GENERATING IMAGING DATA THAT'S NOT JUST RADIOLOGY IMAGES OR DIGITAL PATHOLOGY; FOR EXAMPLE, HTAN IS GENERATING NEW, DIVERSE TYPES OF IMAGING DATA TO BUILD ITS TUMOR ATLASES AT SINGLE-CELL RESOLUTION. OUR IDEA FOR HOW TO BEST DRIVE IDC DEVELOPMENT IS TO FIRST FOCUS ON ESTABLISHED TECHNOLOGY. BY INITIALLY FOCUSING ON RADIOLOGY, WE TAKE ADVANTAGE OF ESTABLISHED MODALITIES, PROTOCOLS AND STANDARDS, FACILITATING A RAPID ESTABLISHMENT OF THE IDC. OTHER TYPES WILL BE ADDED OVER TIME, SUCH AS WHOLE SLIDE IMAGING AND DIGITAL PATHOLOGY, WHICH HAS NO UNIVERSAL STANDARD, AND THE MULTIPLE METHODS USED IN THE HTAN PROGRAM.
NOW THAT YOU HAVE REPOSITORIES HOSTING DIFFERENT TYPES OF DATA, WHAT CAN YOU DO WITH THEM? HOW DO I COMPUTE? HOW DO I ACCESS AND BRING ANALYTICAL TOOLS? WE AT NCI HAVE FUNDED THREE CLOUD RESOURCES: THE ISB CANCER GENOMICS CLOUD FROM THE INSTITUTE FOR SYSTEMS BIOLOGY; THE SEVEN BRIDGES CANCER GENOMICS CLOUD, A COMMERCIAL PLATFORM; AND FIRECLOUD FROM THE BROAD INSTITUTE. THE CLOUD RESOURCES PROVIDE ACCESS TO LARGE CANCER DATASETS SUCH AS TCGA WITHOUT DOWNLOADING THEM, GIVE YOU THE ABILITY TO ACCESS POWERFUL ANALYTICAL TOOLS AND PIPELINES WITH A FEW MOUSE CLICKS, AND LET YOU BRING YOUR OWN DATA AND PIPELINES TO THE DATA THAT EXISTS WITHIN THE CLOUD RESOURCES. THEY ALSO PROVIDE WORKSPACES FOR RESEARCHERS TO SAVE AND SHARE DATA AND ANALYSIS RESULTS SECURELY. THEY ALLOW USERS TO PERFORM LARGE-SCALE ANALYSES USING THE POWER OF COMMERCIAL CLOUD PLATFORMS, AND FOR CONTROLLED-ACCESS DATA WE PROVIDE ACCESS TO dbGaP-AUTHORIZED USERS, AS OUR SYSTEMS MEET STRICT FEDERAL SECURITY GUIDELINES. TAKING A DEEPER DIVE INTO AN INDIVIDUAL NODE AND HOW IT'S CONNECTED TO THE CLOUD RESOURCES: YOU CAN SEE ONE OF THE NODES, REPRESENTED BY THE LEFT FIGURE, HAS AT THE BOTTOM A CLOUD-BASED DATA REPOSITORY THAT HOSTS DATA, WITH A DOMAIN-SPECIFIC DATA MODEL AND SERVICES PROVIDED BY THE DATA COMMONS FRAMEWORK, SUCH AS DIGITAL ID AND METADATA SERVICES, AUTHENTICATION AND A PORTAL. THE CLOUD RESOURCES ALSO USE FENCE FOR AUTHENTICATION AND AUTHORIZATION, AND PROVIDE WORKSPACES AND ANALYTICAL TOOLS, AS I MENTIONED. BUT HAVING THESE CLOUD REPOSITORIES ALONE DOES NOT ALLOW YOU TO INTEROPERATE ACROSS DIFFERENT DOMAIN-SPECIFIC DATA TYPES. SO IN THE PAST YEAR OUR TEAM HAS PLACED A LOT OF EMPHASIS ON UNDERSTANDING AND DEVELOPING THE METHODOLOGIES NEEDED TO DELIVER A MORE INTEGRATED APPROACH. ON THAT FOUNDATION, WE'RE CURRENTLY DEVELOPING SOFTWARE CALLED THE CANCER DATA AGGREGATOR, HIGHLIGHTED IN RED, THAT WILL ALLOW YOU TO LINK DIFFERENT DATA TYPES THROUGH A STRUCTURED DATA MODEL AND METADATA IN ORDER TO ACHIEVE INTEROPERABILITY.
USERS CAN COME FROM DIFFERENT AVENUES: FROM THE CLOUD RESOURCES, FROM PORTALS, OR FROM THIRD-PARTY APPLICATIONS SUCH AS RSTUDIO OR NOTEBOOKS, IN ORDER TO LINK DIFFERENT DATA STORED IN THE DOMAIN-SPECIFIC REPOSITORIES REPRESENTED AT THE BOTTOM. I NEED TO TELL YOU A LITTLE MORE ABOUT A NEW INITIATIVE WE HAVE GOING ON. IT'S CALLED THE CENTER FOR CANCER DATA HARMONIZATION, ANOTHER INTEGRAL COMPONENT OF THE CRDC. IT WILL CREATE A HARMONIZED DATA MODEL, PROVIDE CROSS-MAPPING, AND PROVIDE CONCIERGE SERVICES FOR METADATA AND TERMINOLOGY. IT WILL IMPLEMENT A WEB-BASED PORTAL FOR USERS TO SUBMIT HELP DESK TICKETS, FOR EXAMPLE, AND CREATE, ADOPT AND DISSEMINATE DATA HARMONIZATION TOOLS TO BE USED BY CRDC NODES AND NCI COORDINATING CENTERS, DEVELOPING THE MAPPINGS AND MODELS THAT SUPPORT DATA AGGREGATION THROUGH THE CRDC. THE AGGREGATOR, WHICH I HIGHLIGHTED A FEW SLIDES AGO, PROVIDES A REUSABLE INFORMATICS SERVICE TO CONNECT DIVERSE DATA TYPES THROUGH A MODEL PROVIDED BY THE CENTER FOR CANCER DATA HARMONIZATION, IN SUPPORT OF INTEGRATIVE CANCER RESEARCH. OUR VISION IS THAT THIS WILL BE AN API LAYER THAT ALLOWS USERS TO QUERY ACROSS THE NCI CRDC AND, NOT ONLY THAT, ALSO NCI DCC DATA AND OTHER REPOSITORIES NCI WISHES TO INTEROPERATE WITH, SUCH AS THE KIDS FIRST DRC, THROUGH THE CLOUD RESOURCES AND OTHER APPLICATIONS. WE'RE ACCEPTING PROPOSALS TO DEVELOP THE CANCER DATA AGGREGATOR, AND THE PROPOSAL DEADLINE IS AUGUST 12 AT NOON. THERE'S A LINK IF YOU'RE INTERESTED. TO FINISH UP, WE HAVE SEVERAL OTHER CONTRACT SOLICITATIONS THAT WERE PUBLISHED; YOU CAN SEE THE FULL LIST OF SBIR CONTRACT SOLICITATIONS ON THIS WEBSITE.
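The aggregator idea, one layer that queries several domain-specific repositories and links their records through harmonized metadata, can be illustrated with a small in-memory sketch. It assumes, hypothetically, that every repository tags records with a harmonized `subject_id` and a harmonized `disease` term; the record layouts, function names and example values are invented and are not the Cancer Data Aggregator's actual interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

Record = Mapping[str, str]


def aggregate(repositories: Mapping[str, Iterable[Record]],
              disease: str) -> Dict[str, Dict[str, List[Record]]]:
    """Link records from several repositories by a shared, harmonized
    subject_id, keeping only records whose harmonized disease term matches."""
    linked: Dict[str, Dict[str, List[Record]]] = defaultdict(lambda: defaultdict(list))
    for repo_name, records in repositories.items():
        for rec in records:
            if rec.get("disease") == disease:
                linked[rec["subject_id"]][repo_name].append(rec)
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {subject: dict(by_repo) for subject, by_repo in linked.items()}


def subjects_with_all(linked: Mapping[str, Mapping[str, List[Record]]],
                      required: Iterable[str]) -> List[str]:
    """Subjects that have at least one record in every required repository."""
    needed = set(required)
    return sorted(s for s, by_repo in linked.items() if needed.issubset(by_repo))


if __name__ == "__main__":
    repos = {
        "genomic": [
            {"subject_id": "S1", "disease": "lung adenocarcinoma", "file": "s1.maf"},
            {"subject_id": "S2", "disease": "glioblastoma", "file": "s2.maf"},
        ],
        "imaging": [
            {"subject_id": "S1", "disease": "lung adenocarcinoma", "file": "s1_ct.dcm"},
            {"subject_id": "S3", "disease": "lung adenocarcinoma", "file": "s3_ct.dcm"},
        ],
    }
    linked = aggregate(repos, "lung adenocarcinoma")
    print(subjects_with_all(linked, ["genomic", "imaging"]))
```

The join only works because both repositories agree on the identifier and the disease vocabulary, which is the job the talk assigns to the harmonization center; without that shared model the aggregator has nothing to link on.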
FOR SOME RELEVANT TOOLS, WE'RE INVITING ENTREPRENEURS AND SMALL BUSINESSES TO COME IN AND DEVELOP THEM FOR US: ARTIFICIAL INTELLIGENCE-AIDED IMAGING FOR CANCER PREVENTION, DIAGNOSIS AND MONITORING, AND DE-IDENTIFICATION SOFTWARE FOR RESEARCH. WE'RE ALSO INVITING DIFFERENT PARTIES TO COME AND DEVELOP CLOUD-BASED SOFTWARE AND TOOLS TO PLUG INTO THE CANCER RESEARCH DATA COMMONS WE'RE DEVELOPING. IF YOU'RE INTERESTED IN GETTING INVOLVED WITH THE COMPONENTS OF THE CRDC, COME AND CHECK OUT OUR GENERAL INFORMATION WEBSITE, VISIT OUR DATA NODES, AND STUDY MORE ABOUT THE DCF TO SEE IF YOU CAN APPLY IT TO YOUR OWN DATA COMMONS. THERE ARE THREE DIFFERENT CLOUD RESOURCES TO PLAY WITH. WE GIVE $300 IN FREE COMPUTE CREDITS TO PLAY WITH THE DATA STORED IN OUR CLOUD RESOURCES, AND YOU CAN CONTACT OUR TEAM. DR. KIM IS IN CHARGE OF DATA COMMONS; FOR OTHER QUESTIONS NOT RELATED TO THE IMAGING DATA COMMONS, FEEL FREE TO CONTACT ME. THANK YOU. [APPLAUSE] >> THANKS, ERIKA. UP NEXT IS JUSTIN KIRBY, THE TECHNICAL DIRECTOR FOR THE CANCER IMAGING ARCHIVE. FOR THE RADIATION ONCOLOGISTS HERE, JUSTIN WAS THE INFORMATICS INSTRUCTOR FOR A COURSE I TOOK WHEN I WAS A RESIDENT. >> GOOD MORNING. I WANT TO TALK ABOUT THE CANCER IMAGING ARCHIVE. I'LL START WITH A HIGH-LEVEL OVERVIEW AND THEN DIVE INTO MORE DETAIL ON SOME OF THE COMPONENTS I RUN THROUGH IN THE FIRST COUPLE OF SLIDES. THE CANCER IMAGING ARCHIVE STARTED IN 2011, AND IT WAS BORN OUT OF NECESSITY FOR THE CANCER IMAGING PROGRAM. THEY HAD A FEW DIFFERENT NCI GRANTS, AND THE INTENTION OF THE GRANTS WAS TO BUILD DATASETS THAT COULD SERVE AS BENCHMARKS FOR EVALUATING DIFFERENT TOOLS, SPECIFICALLY IN LUNG CANCER AT THE TIME. IN ORDER TO SHARE THOSE DATASETS, THEY REALIZED THEY HAD TO DE-IDENTIFY THEM FIRST, SO THEY BUILT SOME SOFTWARE, AND SLOWLY REALIZED THAT SOFTWARE BY ITSELF WAS NOT ENOUGH; YOU NEEDED A LAYER OF SERVICES ON TOP OF IT.
A COUPLE OF YEARS LATER WE CONTINUED ON THAT PATH. NCI'S DATA SHARING NEEDS CONTINUED TO GROW IN THE IMAGING SPACE, SO WE EXPANDED THE SCOPE TO TRY TO SUPPORT ALL OF THE IMAGING DATA SHARING NEEDS FOR NCI. IN ORDER TO DO THAT, THE SERVICES ARE BROKEN INTO A COUPLE OF PARTS: A MULTI-TIERED DE-IDENTIFICATION SERVICE, SUBMISSION SUPPORT FOR PEOPLE PROVIDING THE DATA, AND LONG-TERM PERSISTENT HOSTING. IN TERMS OF HOW THE SITE IS BROKEN UP, WE THINK OF IT IN THREE CATEGORIES. FIRST, WE HAVE DATA COLLECTION CENTERS. THESE ARE THE TEAMS THAT WORK WITH THE SUBMITTERS TO COLLECT THE DATA, CURATE IT, DE-IDENTIFY IT, AND CLEAN IT UP. SECOND, DATA ACCESS TOOLS ARE WHAT END USERS COME IN CONTACT WITH; THIS IS WHERE YOU BROWSE, SEARCH, AND USE DIFFERENT APIs TO ACCESS THE DATA. THE THIRD PIECE IS DATA ANALYSIS CENTERS. WE REALIZED EARLY ON IT WASN'T REALLY FEASIBLE FOR US TO BUILD ALL THE DIFFERENT TOOLS, INTERFACES, AND WAYS PEOPLE MIGHT WANT TO WORK WITH AND ACCESS THE DATA, SO WE ENCOURAGED THE COMMUNITY TO TAKE CARE OF THAT. THERE'S A REGISTRY OF PEOPLE AND TOOLS THAT EXTEND OUR FUNCTIONALITY. IN TERMS OF THE TECH STACK WE USE TO RUN THE ARCHIVE, I'M NOT GOING TO GET INTO DETAILS, BUT OUR PHILOSOPHY IS VERY MUCH OPEN SOURCE. WE'VE BEEN VERY FORTUNATE TO BE ABLE TO BUILD ON TOP OF THE WORK THAT A LOT OF OTHER GREAT PEOPLE HAVE DONE BEFORE US, AND WE WANT EVERYTHING WE DO TO BE THE SAME WAY. SO IF SOMEBODY COMES ALONG AND WANTS TO BUILD AN ARCHIVE FOR SOME DISEASE BESIDES CANCER THAT'S OUTSIDE OUR SCOPE, WE'RE TRYING TO MAKE THAT KNOWLEDGE AVAILABLE FOR OTHER PEOPLE TO LEVERAGE. AS FOR WHERE TCIA STANDS RIGHT NOW, NEW COLLECTION PROPOSALS COME IN ON A MONTHLY BASIS, WE REVIEW THOSE, AND THAT'S ONE OF THE WAYS WE ADD DATA. I'LL GET MORE INTO THAT LATER.
AT THIS POINT WE'RE UP TO 100 COLLECTIONS, WITH AROUND 46,000 SUBJECTS AVAILABLE FOR DOWNLOAD. THERE'S RADIOLOGY, RADIATION THERAPY STRUCTURES, DOSE, AND PLANNING INFORMATION, AND PATHOLOGY. THERE'S A WIDE VARIETY OF CANCERS AVAILABLE, AS WELL AS PHANTOMS. EARLY ON IT WAS A MIRACLE IF YOU COULD GET SOMEBODY JUST TO SHARE THEIR IMAGES WITH YOU, BUT AS TIME HAS GONE BY PEOPLE HAVE GOTTEN BETTER ABOUT SHARING DATA AND MORE INTERESTED IN IT, AND THAT HAS RAISED THE BAR OVER TIME FOR THE QUALITY OF THE DATASETS THAT COME THROUGH AND WHAT WE'RE ABLE TO ACCEPT. ALMOST ANYTHING THAT COMES DOWN THE LINE NOW HAS SOMETHING BESIDES THE IMAGES, WHETHER IT'S CLINICAL INFORMATION LIKE DEMOGRAPHICS, OUTCOMES, AND TREATMENT INFORMATION, OR ANOTHER HIGHLY VALUABLE THING: ANALYSES OF THE IMAGES, SUCH AS SEGMENTATIONS OR RADIOLOGIST OR PATHOLOGIST ANNOTATIONS AND MARKUP. THANKS TO A LOT OF LARGE-SCALE NIH DATA COLLECTION PROGRAMS LIKE TCGA AND CPTAC, WE NOW HAVE IMAGE DATASETS LINKED TO REALLY RICH GENOMIC AND PROTEOMIC INFORMATION. I WANT TO TAKE A SECOND TO TALK ABOUT THE WAY WE HANDLE DATA USE POLICY AND CITATIONS. THERE ARE A COUPLE OF WAYS YOU CAN TRY TO GET PEOPLE TO SHARE DATA: THE STICK APPROACH AND THE CARROT APPROACH. IN MOST CASES FOR US, WE'RE FORCED TO GO WITH THE CARROT. THE WAY WE DO THAT IS BY TRYING TO INCENTIVIZE PEOPLE WITH ACADEMIC CREDIT AS MUCH AS POSSIBLE, SO WE GIVE THEM A CITATION FOR THEIR DATASET. THERE ARE A NUMBER OF DATA DESCRIPTOR JOURNALS EMERGING TO CARRY THE IDEA FORWARD; NATURE'S SCIENTIFIC DATA IS THE MOST WELL KNOWN ONE AT THIS POINT. THEY GIVE PEOPLE THE OPPORTUNITY TO GET CREDIT FOR THE TIME AND EFFORT OF ALL THE DATA WRANGLING AND PREPARATION THAT'S NECESSARY TO SHARE A DATASET IN A WAY THAT'S USEFUL FOR OTHERS, SO WE'RE REALLY TRYING TO PLUG INTO THAT AS MUCH AS WE CAN AND FOLLOW BEST PRACTICES TO SUPPORT IT.
IN TERMS OF ACCESSING THE DATA, MOST OF WHAT WE HAVE IS 100% OPEN, SO YOU DON'T HAVE TO SIGN UP FOR AN ACCOUNT OR REGISTER OR ANYTHING. YOU CAN GO TO THE SITE AND WITHIN A FEW CLICKS YOU'RE DOWNLOADING IMAGES. THERE ARE A HANDFUL OF DATASETS THAT HAVE SPECIAL RESTRICTIONS, USUALLY BECAUSE OF THE WAY PATIENT CONSENT WAS DONE AT THE TIME THE DATA WERE COLLECTED; MAYBE 10% OF WHAT WE HAVE FALLS INTO THAT CATEGORY. ALMOST EVERYTHING IS SHARED UNDER A CREATIVE COMMONS LICENSE, SO ALL WE'RE ASKING YOU TO DO IS ACKNOWLEDGE WHERE THE DATA CAME FROM AND CITE IT. IN TERMS OF METRICS, WE'VE SEEN STRONG GROWTH OVER THE YEARS. WE'RE UP TO 15,000 ACTIVE USERS A MONTH, PUSHING OUT 100 TERABYTES TO USERS EACH MONTH, AND THERE'S A LARGE NUMBER OF INCOMING DATASETS THAT PEOPLE HAVE PROPOSED, THAT HAVE BEEN ACCEPTED, AND THAT WE'RE WORKING ON MAKING AVAILABLE. THAT'S RESULTED IN 700 PUBLICATIONS SINCE THE INCEPTION OF THE ARCHIVE. ANOTHER THING THAT'S HARD TO TRACK IS INDUSTRY USAGE, BUT BECAUSE WE HAVE THE CREATIVE COMMONS LICENSE WE GET OFFLINE FEEDBACK FROM COMPANIES THANKING US, SAYING, MAN, I'M REALLY GLAD YOU PUT THIS OUT THERE, WE DEVELOPED OUR INITIAL PROTOTYPES USING YOUR DATASETS. THAT'S BEEN GOOD TO HEAR AS WELL. A LITTLE BIT OF DETAIL ON WHY THIS IS SO IMPORTANT: WITH DICOM, THE WAY IMAGES ARE SET UP, YOU HAVE PIXEL DATA AND METADATA ABOUT THE IMAGE. THERE'S A BUNCH OF DIFFERENT METADATA BURIED IN THERE. SOME OF IT IS WELL DEFINED, LIKE DATES OR IDENTIFIERS, AND IT'S EASY TO AUTOMATE THE REMOVAL OF PHI FROM THOSE. BUT YOU ALSO HAVE FREE-TEXT FIELDS, AND THESE ARE DIFFICULT. PEOPLE AT GOOGLE OR FACEBOOK PROBABLY HAVE ENOUGH NATURAL LANGUAGE PROCESSING EXPERIENCE TO HELP IN THIS AREA, BUT WHERE WE'RE AT, THE CURRENT STATE OF THE ART REQUIRES SOME LEVEL OF HUMAN INTERVENTION TO REVIEW THE CRAZY THINGS THAT TECHNICIANS DECIDE TO WRITE INTO THESE FIELDS, AND TO DEAL WITH THE FACT THAT A LOT OF TIMES THEY DON'T PUT THINGS WHERE THEY ARE SUPPOSED TO.
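The split described here, where well-defined fields can be cleaned automatically while free-text fields need a human, can be sketched as a rule table plus a review queue. This is a minimal illustration, not TCIA's tooling: the header is modeled as a plain dict of keyword to (VR, value) so it runs without a DICOM library, the rules cover only a tiny illustrative subset, and the pseudonym argument is hypothetical. A real pipeline would use a DICOM toolkit and a full profile such as the attribute confidentiality profiles in DICOM PS3.15 Annex E.

```python
"""Toy rule-based DICOM header de-identification with a human-review queue.

The header is a plain dict of keyword -> (VR, value) so the sketch runs
without any DICOM library. The rules are a small illustrative subset, not a
complete de-identification profile.
"""
from __future__ import annotations

FREE_TEXT_VRS = {"LO", "LT", "ST", "UT"}  # free text: flag for human review
DIRECT_IDENTIFIERS = {"PatientID"}        # replaced with a study pseudonym


def deidentify(header: dict, pseudonym: str) -> tuple[dict, list[str]]:
    """Return (cleaned header, keywords that still need a human to look at them)."""
    cleaned, needs_review = {}, []
    for keyword, (vr, value) in header.items():
        if keyword in DIRECT_IDENTIFIERS:
            cleaned[keyword] = (vr, pseudonym)
        elif vr == "PN":   # person names: well defined, safe to automate
            cleaned[keyword] = (vr, "ANONYMIZED")
        elif vr == "DA":   # dates: blanked here; date shifting is a common alternative
            cleaned[keyword] = (vr, "")
        elif vr in FREE_TEXT_VRS and value:
            cleaned[keyword] = (vr, value)  # left untouched until a reviewer decides
            needs_review.append(keyword)
        else:
            cleaned[keyword] = (vr, value)
    return cleaned, needs_review


if __name__ == "__main__":
    header = {
        "PatientName": ("PN", "DOE^JANE"),
        "PatientID": ("LO", "MRN12345"),
        "StudyDate": ("DA", "20190612"),
        "StudyDescription": ("LO", "CT CHEST, call Dr Smith x4411"),
        "Modality": ("CS", "CT"),
    }
    cleaned, review = deidentify(header, pseudonym="SUBJECT-0001")
    print(cleaned)
    print("needs human review:", review)
```

Note that `PatientID` is itself a free-text (`LO`) element; checking direct identifiers first keeps it from landing in the review queue with its original value.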
THE OTHER SIDE OF THIS IS DEALING WITH PHI IN THE ACTUAL IMAGES. YOU CAN GET TEXT BURNED INTO THE IMAGES, SO YOU HAVE TO REVIEW THEM AND MAKE SURE THERE'S NO PHI THERE. AND VENDORS, TRYING TO MAKE LIFE EASIER FOR PEOPLE IN THE CLINIC, ARE TAKING OTHER TYPES OF DOCUMENTS AND STICKING THEM INTO THE PACS, SO THAT'S ANOTHER THING WE HAVE TO WATCH OUT FOR. WE TAKE A TWO-STEP APPROACH. A LOT OF INSTITUTIONS ARE VERY UNCOMFORTABLE LETTING ANY AMOUNT OF PHI LEAVE THEIR SITE, SO WE HAVE A TOOL THAT DOES AN INITIAL DE-IDENTIFICATION ON SITE BEFORE THE DATA LEAVE THEIR INSTITUTION. THEY CAN REVIEW IT, MAKE SURE THEY ARE HAPPY WITH THE WAY THE DATA LOOK, AND SEND IT TO US, WHERE WE DO A SECOND LEVEL OF CURATION AND QC REVIEW AND CHECK FOR A VARIETY OF THINGS TO MAKE SURE THAT IT'S VALID DICOM AND THAT NOTHING CAME THROUGH IN ANY FREE-TEXT FIELDS. THIS IS A MAJOR EFFORT. AT THIS POINT WE HAVE FIVE DIFFERENT CURATION TEAMS, ALL FOCUSED ON DIFFERENT TOPIC AREAS, AND I'LL GET INTO THAT A LITTLE BIT LATER. FOR END USERS, THIS IS WHAT YOU'LL SEE WHEN YOU FIRST LAND AT THE HOME PAGE, CANCERIMAGINGARCHIVE.NET. WE TRIED TO CUT TO THE CHASE AND PUT DATA ON THE HOME PAGE SO YOU DON'T HAVE TO DIG AROUND. THE TABLE HAS A FEW DIFFERENT FIELDS THAT GIVE YOU A ROUGH SENSE OF EACH COLLECTION: CANCER NAME, BODY PARTS, HOW MANY SUBJECTS, WHAT KINDS OF IMAGES, WHAT NON-IMAGE DATA SUPPORT THEM, ACCESS PERMISSIONS, WHETHER IT'S COMPLETE OR ONGOING, AND WHEN IT CAME IN. IF YOU CLICK ON ONE OF THESE, YOU GO TO A PAGE WITH A SUMMARY THAT TELLS YOU ABOUT THE DATASET IN MORE DETAIL, AND IMMEDIATELY YOU GET LINKS IN THE DATA ACCESS SECTION WHERE YOU CAN DOWNLOAD THE IMAGES OR, IN SOME CASES, FOLLOW LINKS TO OTHER SITES WHERE THE SUPPORTING DATA ARE.
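One small piece of the "is it valid DICOM" QC step can be shown concretely: files in the DICOM Part 10 format begin with a 128-byte preamble followed by the four bytes `DICM`. The sketch below checks only that structural marker; it is an assumed first-pass filter, not TCIA's QC process. Passing it does not prove a file is valid DICOM, and failing it only marks the file for a closer look, since some legacy files omit the preamble.

```python
"""First-pass structural check for DICOM Part 10 files.

A Part 10 file starts with a 128-byte preamble followed by the magic bytes
b"DICM". Passing this check does not prove a file is valid DICOM; failing it
just marks the file for a closer look (some legacy files omit the preamble).
"""
from __future__ import annotations

from pathlib import Path

PREAMBLE_LEN = 128
MAGIC = b"DICM"


def has_part10_header(head: bytes) -> bool:
    """True if the first 132 bytes look like a DICOM Part 10 preamble + magic."""
    return head[PREAMBLE_LEN:PREAMBLE_LEN + len(MAGIC)] == MAGIC


def files_failing_check(root: Path) -> list[Path]:
    """List files under root (recursively) that lack the Part 10 header."""
    failing = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            with path.open("rb") as fh:
                head = fh.read(PREAMBLE_LEN + len(MAGIC))
            if not has_part10_header(head):
                failing.append(path)
    return failing


if __name__ == "__main__":
    import sys

    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path in files_failing_check(root):
        print("needs review:", path)
```

Reading only the first 132 bytes keeps the scan cheap on large submissions; deeper checks (required attributes, parseable data elements) would come after this filter.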
ANOTHER OPTION IS A DATABASE-STYLE DATA PORTAL FOR WHEN YOU WANT TO SEARCH ACROSS A BUNCH OF DATASETS AND DIG IN MORE: YOU CAN FILTER ON A NUMBER OF DICOM TAGS, SUCH AS MODALITY AND ANATOMICAL SITE. AND YOU CAN ALSO -- OOPS. YOU CAN ALSO DO A FEW THINGS IN TERMS OF EXPORTING METADATA AND SHARING QUERIES WITH YOUR COLLEAGUES. FOR THOSE HARDCORE DATA SCIENCE FOLKS WHO LIKE TO WORK IN PYTHON OR ANY OTHER PROGRAMMING LANGUAGE, YOU CAN USE OUR API TO ACCESS DATA WITHOUT HAVING TO GO TO THE WEBSITE. THAT LEADS INTO THE DATA ANALYSIS CENTERS. THESE ARE EXAMPLES OF DIFFERENT GROUPS THAT HAVE BUILT TOOLS USING THOSE APIs. ONE REALLY POPULAR OPEN SOURCE TOOL THAT MANY OF YOU ARE PROBABLY FAMILIAR WITH IS 3D SLICER; A GROUP AT BRIGHAM AND WOMEN'S PUT TOGETHER A BROWSER THAT LETS YOU IMPORT DATA INTO SLICER. ANOTHER THING WE REALLY ENCOURAGE: IF YOU'VE GOT A TOOL THAT INTERACTS WITH OUR API IN SOME WAY, YOU CAN TAG YOUR PROJECT IN GITHUB AS A TCIA DATA ANALYSIS CENTER, AND IT WILL AUTOMATICALLY SHOW UP IN OUR COMMUNITY CODE SHARING SECTION. GOING BACK TO WHAT ERIKA WAS SPEAKING ABOUT EARLIER WITH THE DATA COMMONS, WE SEE THE IMAGING DATA COMMONS THAT'S COMING DOWN THE LINE AS A DATA ANALYSIS CENTER AS WELL, IN THAT WE'LL BE FEEDING DATA TO THE IDC, AND ONCE DATA ARE IN THE IDC YOU CAN USE A LOT OF CLOUD OPTIONS FOR PROCESSING THE DATA WITHOUT HAVING TO DOWNLOAD IT. WE'LL CONTINUE TO FOCUS ON A LOT OF THE PRELIMINARY DE-IDENTIFICATION AND CURATION ISSUES, AND AFTER WE'VE CLEANED THE DATA UP, SEND IT ALONG SO YOU CAN TAKE ADVANTAGE OF THE ADDITIONAL FUNCTIONALITY THEY ARE BUILDING. A VERY SIMILAR EXAMPLE: GOOGLE TOOK OUR PUBLIC DATASETS AND PUT THEM IN THEIR HEALTHCARE CLOUD. YOU CAN CHECK THAT OUT. NOW I'LL TALK ABOUT THE DIFFERENT DATA SOURCES THAT WE HAVE.
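For readers who want to try the API route mentioned here, the following is a minimal standard-library sketch. The base URL and the endpoint and parameter names (`getSeries`, `getImage`, `Collection`, `Modality`, `SeriesInstanceUID`) reflect TCIA's public NBIA REST API as we understand it; treat them as assumptions and check TCIA's current API documentation before relying on them. The collection name in the usage block is only an example.

```python
"""Minimal sketch of querying TCIA's REST API from Python with the stdlib.

BASE_URL and the endpoint/parameter names (getSeries, getImage, Collection,
Modality, SeriesInstanceUID) reflect TCIA's public NBIA API as we understand
it; treat them as assumptions and check TCIA's current API documentation.
"""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1"


def build_url(endpoint: str, **params) -> str:
    """Build an API URL, skipping parameters that are None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/{endpoint}" + (f"?{query}" if query else "")


def get_json(endpoint: str, **params):
    """Call a JSON-returning endpoint and decode the response."""
    with urlopen(build_url(endpoint, **params), timeout=60) as resp:
        return json.load(resp)


def series_uids(series_records: list) -> list:
    """Pull SeriesInstanceUIDs out of getSeries-style records."""
    return [r["SeriesInstanceUID"] for r in series_records if "SeriesInstanceUID" in r]


if __name__ == "__main__":
    series = get_json("getSeries", Collection="TCGA-GBM", Modality="MR")
    uids = series_uids(series)
    print(len(uids), "MR series")
    if uids:
        # getImage is expected to return a zip of the series' DICOM files.
        print("download:", build_url("getImage", SeriesInstanceUID=uids[0]))
```

Separating URL construction from the network call keeps the query logic testable offline; only `get_json` touches the network.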
BROADLY, THEY SPLIT INTO TWO MAJOR CATEGORIES: NCI DATA COLLECTION INITIATIVES AND COMMUNITY PROPOSALS, WHERE PEOPLE FILL OUT AN APPLICATION FORM PROPOSING A DATASET. THESE ARE REVIEWED MONTHLY BY AN ADVISORY COMMITTEE FOR QUALITY AND UNIQUENESS. THE SOURCES OF THOSE TEND TO FALL INTO A FEW CATEGORIES. A LOT OF IT IS DATA GENERATED AS PART OF A GRANT ACTIVITY, UNDER NIH DATA SHARING REQUIREMENTS. IT'S ALSO BECOME INCREASINGLY COMMON FOR PEOPLE TO RUN CHALLENGE COMPETITIONS. YOU SEE A LOT OF THESE AT PROFESSIONAL SOCIETY MEETINGS THESE DAYS, AND MOST OF THEM NEED HELP WITH CLEANING OR DE-IDENTIFYING DATA AND MAKING IT AVAILABLE IN THE LONG TERM. THE LAST CATEGORY IS PUBLICATION DATA SHARING REQUESTS. WE'RE VERY PLEASED TO SEE MORE AND MORE JOURNALS REQUIRING THAT YOU ACTUALLY PUT THE DATA OUT THERE, AS OPPOSED TO JUST LETTING PEOPLE ASK YOU FOR IT LATER. AS FOR SOME OF THE MAJOR PROJECTS, WHICH ERIKA AND MICHELLE BOTH TOUCHED ON BRIEFLY: THE CANCER GENOME ATLAS STARTED WITH THE GENOMIC PUSH, AND A GUY NAMED CARL JAFFE, WHO WORKED AT THE CANCER IMAGING PROGRAM, LOOKED AT WHAT WAS GOING ON AND SAID, WHY DON'T WE GET THE IMAGING FROM THE PLACES PROVIDING THE TISSUE? WE WEREN'T PART OF THE PROGRAM; WE CAME ALONG BEHIND, CONTACTED ALL THE SITES, AND ASKED, DO YOU HAVE CLINICAL IMAGES FOR ALL THESE PATIENTS? WE WERE FAIRLY SUCCESSFUL. MOST OF THE TCGA CANCER TYPES WERE LOOKING TO ACCRUE 500 CASES PER TYPE; WE GOT UP TO ABOUT 200, AND WE WERE SUCCESSFUL IN OBTAINING DATA FOR 21 OF THE APPROXIMATELY 30 CANCER TYPES, I THINK, THAT TCGA LOOKED AT. THAT GIVES YOU DATASETS WITH RADIOLOGICAL IMAGING. WE WEREN'T DOING PATHOLOGY; THE PATHOLOGY IMAGES AND GENOMICS WERE STORED IN THE DATA COMMONS. THAT GIVES THE OPPORTUNITY TO LOOK FOR CORRELATIONS, AND WE ESTABLISHED TEAMS TO GET PEOPLE THINKING ABOUT THIS IDEA. IT TURNED OUT TO BE QUITE SUCCESSFUL.
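Looking for imaging-genomics correlations starts with joining the two data types per patient. The sketch below assumes, without confirmation from the talk, that subjects in TCIA's TCGA collections are identified by the 12-character TCGA participant barcode (for example `TCGA-02-0003`) and that genomic samples carry longer barcodes beginning with that participant prefix. The sample barcodes in the test are made up for illustration.

```python
"""Toy linkage of TCIA imaging subjects to TCGA genomic samples.

Assumption: subjects in TCIA's TCGA collections are identified by the
12-character TCGA participant barcode (e.g. "TCGA-02-0003"), while genomic
samples carry longer barcodes that start with that participant barcode.
"""
from __future__ import annotations


def participant_barcode(barcode: str) -> str:
    """Truncate a TCGA barcode to its project-TSS-participant prefix."""
    parts = barcode.split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return "-".join(parts[:3])


def link(imaging_subjects: list[str], sample_barcodes: list[str]) -> dict[str, list[str]]:
    """Map each imaged participant to its genomic samples; skip those without any."""
    by_participant: dict[str, list[str]] = {}
    for sample in sample_barcodes:
        by_participant.setdefault(participant_barcode(sample), []).append(sample)
    return {
        p: by_participant[p]
        for p in dict.fromkeys(imaging_subjects)  # de-duplicate, keep order
        if p in by_participant
    }
```

Grouping samples by participant first means one patient with several tumor and normal samples still maps to a single imaging subject.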
IN THE END WE HAD FIVE OR SIX DIFFERENT CANCER-TYPE-SPECIFIC TEAMS GENERATING 60 PUBLICATIONS BETWEEN THE DIFFERENT GROUPS. SO THEN, WHERE WE ARE -- OOPS, TOO FAR. FOLLOWING ON THAT WAS THE CLINICAL PROTEOMIC TUMOR ANALYSIS CONSORTIUM. THIS IS VERY SIMILAR TO TCGA, WITH ALL THE SAME DATA TYPES, BUT IT ADDS PROTEOMICS DATA. TCGA WAS RETROSPECTIVE, PULLING TISSUES OUT OF FREEZERS; FOR CPTAC WE HAVE RADIOLOGY AND PATHOLOGY IN OUR SYSTEM. APOLLO IS LOOKING AT THE DoD AND V.A. PATIENT POPULATIONS. THERE ARE A COUPLE OF RETROSPECTIVE PILOTS WITH SMALL NUMBERS, BUT THERE'S GOING TO BE A REALLY LARGE PROSPECTIVE ONE, SO WE'LL BE GETTING RADIOLOGY AND PATHOLOGY IMAGING. THE NCI QUANTITATIVE IMAGING NETWORK IS AN NCI COOPERATIVE RESEARCH NETWORK OF GRANTEES; I THINK THERE HAVE BEEN 26 AWARDS AT THIS POINT, SO 26 CENTERS ACROSS THE COUNTRY ARE INVOLVED, FOCUSED ON DEVELOPING ANALYTIC TOOLS FOR THERAPY RESPONSE AND CLINICAL DECISION MAKING. THAT GROUP HAS RELIED HEAVILY ON TCIA FOR ITS DATA SHARING NEEDS IN THE MULTI-SITE COLLABORATIVE PROJECTS AND CHALLENGES IT HAS BEEN CONDUCTING, AND THAT HAS RESULTED IN 11 DATASETS FROM THAT GROUP BEING SHARED ON TCIA. THERE'S ALSO A RESOURCE GRANT THAT WAS RECENTLY AWARDED TO ECOG-ACRIN; THEY ARE GOING TO BE PROVIDING TEN CLINICAL TRIALS FROM THEIR GROUP, THREE OF WHICH ARE CURRENTLY AVAILABLE TO MEMBERS OF QIN. THE DATASETS OPEN UP ONE YEAR AFTER THEY COME IN, SO THE GENERAL PUBLIC WILL HAVE ACCESS. ON THE CLINICAL TRIAL SUBJECT, WE HAVE NINE DIFFERENT EXISTING TRIALS AVAILABLE IN THE ARCHIVE RIGHT NOW, AND WE JUST RECEIVED FUNDING FOR A NEW INITIATIVE THAT'S RAMPING UP IN TERMS OF DATASETS, AND ANOTHER SITE, THE NCTN DATA ARCHIVE. HOPEFULLY THERE ARE LOTS OF NEW TRIALS COMING DOWN THE PIKE SOON.
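The access pattern described for the ECOG-ACRIN trial data (QIN members first, the general public one year after the data come in) amounts to a simple date rule. The sketch below is hypothetical: the group label and the interpretation of "one year" as the same calendar date in the following year, with February 29 rolling to February 28, are our assumptions rather than TCIA's stated policy.

```python
"""Toy access rule for an embargoed collection.

Assumptions: QIN members can see a trial dataset as soon as it arrives, and
everyone else gets access one year later; "one year" is computed as the same
calendar date in the following year (Feb 29 rolls to Feb 28).
"""
from __future__ import annotations

from datetime import date


def public_release(received: date) -> date:
    """Same calendar date one year after the data arrive."""
    try:
        return received.replace(year=received.year + 1)
    except ValueError:  # Feb 29 in a leap year has no counterpart next year
        return received.replace(year=received.year + 1, day=28)


def can_access(received: date, today: date, groups: set[str]) -> bool:
    """Hypothetical rule: QIN members always; everyone else after the embargo."""
    if "QIN" in groups:
        return True
    return today >= public_release(received)
```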
THE LAST CATEGORY OF DATA IS THIRD-PARTY ANALYSIS RESULTS. A LOT OF TIMES WE GET A DATASET WITH HIGH-QUALITY IMAGES, BUT THE FIRST THING PEOPLE ASK IS, OH, WHAT ABOUT THE SEGMENTATIONS, OR WHAT ABOUT THE RADIOLOGIST ASSESSMENTS? IF YOU DOWNLOAD DATA AND DO AN ANALYSIS, WE ENCOURAGE YOU TO SHARE THE RESULTS BACK, ESPECIALLY SEGMENTATIONS OR RADIOLOGIST ASSESSMENTS, BECAUSE OTHER PEOPLE CAN THEN USE THOSE TO BUILD OTHER KINDS OF TOOLS THAT TRY TO PREDICT OR CLASSIFY BASED ON THE INFORMATION YOU'VE GIVEN BACK TO US. THIS IS ONE EXAMPLE: A CHALLENGE COMPETITION CALLED BRATS THAT'S BEEN RUNNING FOR SEVERAL YEARS NOW, LOOKING TO HELP AUTOMATE BRAIN TUMOR SEGMENTATION. THEY TOOK DATA FROM THE CANCER GENOME ATLAS, DID MANUAL SEGMENTATIONS, AND SHARED ALL OF THOSE BACK SO THE GROUPS TRYING THE AUTOMATED APPROACH COULD USE THEM AS TRAINING DATA. ANOTHER ACTIVITY WE STARTED A COUPLE OF YEARS AGO IS CROWDS CURE CANCER. MOST OF THE NEW DATASETS COMING IN NOW ARE OFTEN ANNOTATED BY THE PERSON WHO PROVIDED THEM, BUT WITH THINGS LIKE CPTAC OR TCGA, NOBODY ANNOTATES THE IMAGES BEFORE THEY COME INTO TCIA, SO THEY ARE ALL MISSING REGION OR LOCATION INFORMATION. YOU HAVE MILLIONS OF IMAGES AND EVERYONE'S SHORT ON TIME, SO HOW DO YOU GET ALL THAT STUFF MARKED UP? WE HAD THIS IDEA: LET'S CONVINCE RADIOLOGISTS TO TAKE FIVE MINUTES AND ANNOTATE A COUPLE OF SCANS WHILE THEY ARE SITTING AT A BOOTH. IT WORKED OUT FAIRLY WELL. WE WERE ABLE TO GET BETWEEN THREE AND FIVE ASSESSMENTS PER CASE WE PUT INTO THE SYSTEM THE FIRST YEAR, AND SOME OF THE PRELIMINARY ANALYSES SHOWED THE ANNOTATIONS WERE FAIRLY ACCURATE, SO YOU COULD RELY ON THEM TO A DEGREE. IT DOES RELY ON CURATION TO THROW OUT OUTLIERS, BUT OVERALL IT WAS FAIRLY SUCCESSFUL. I WANT TO THANK THE OTHER PEOPLE INVOLVED IN MAKING ALL THIS POSSIBLE; IT'S DEFINITELY A HUGE TEAM EFFORT. AND I'LL LEAVE YOU WITH THIS.
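With three to five reads per case and a need to throw out outliers, one common way to form a consensus is a median with a median-absolute-deviation (MAD) cutoff. This sketch is an illustration under assumptions, not the Crowds Cure Cancer procedure: each reader is assumed to report one number per case (here a hypothetical lesion long-axis length in mm), and the `k=3` cutoff with the 1.4826 MAD scale factor are standard statistical defaults.

```python
"""Toy consensus for crowd-sourced reads with median/MAD outlier rejection.

Assumption: each reader reports one number per case (here, a hypothetical
lesion long-axis length in mm). The k=3 cutoff and the 1.4826 MAD scale
factor are common statistical defaults, not Crowds Cure Cancer's procedure.
"""
from __future__ import annotations

from statistics import median

MAD_SCALE = 1.4826  # makes MAD comparable to a standard deviation for normal data


def consensus(reads: list[float], k: float = 3.0) -> tuple[float, list[float]]:
    """Return (median of retained reads, list of rejected outlier reads)."""
    if not reads:
        raise ValueError("no reads for this case")
    center = median(reads)
    spread = MAD_SCALE * median(abs(r - center) for r in reads)
    kept = [r for r in reads if abs(r - center) <= k * spread]
    outliers = [r for r in reads if abs(r - center) > k * spread]
    return median(kept), outliers


if __name__ == "__main__":
    print(consensus([31.0, 29.5, 30.0, 30.5, 58.0]))  # (30.25, [58.0])
```

The median and MAD are used instead of the mean and standard deviation because a single wild read among three to five would otherwise drag the threshold toward itself.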
WE KNOW THAT MOST PEOPLE DON'T WAKE UP, HAVE THEIR MORNING COFFEE, AND LOOK AT AN NIH DATABASE, SO IF YOU WANT TO KNOW WHEN NEW DATA ARE COMING OUT, OR ABOUT SPECIAL EVENTS OR OTHER THINGS GOING ON WITH THE ARCHIVE, FOLLOW US ON ONE OF THESE CHANNELS AND WE'LL PUSH THE INFORMATION TO YOU INSTEAD. THANKS. [APPLAUSE] >> GREAT. SO I'LL BE TALKING NEXT. I'M GOING TO SPEAK ABOUT BUILDING AN EFFECTIVE INSTITUTIONAL ARCHITECTURE, BASED ON OUR EXPERIENCE AT YALE, AND ABOUT SOMETHING MORE SCALABLE THAT WE BUILT FOR OUR SYSTEM. HERE ARE MY DISCLOSURES. PERFECT. AS I MENTIONED, I'M A PHYSICIAN-SCIENTIST. OUR GROUP WORKS IN THE AREAS OF ARTIFICIAL INTELLIGENCE, DATA SCIENCE, AND DEEP LEARNING, AND IS PRIMARILY INTERESTED IN CLINICAL PROBLEMS: WAYS WE CAN PREDICT OUTCOMES AND OPTIMIZE CARE. ALONG THE WAY WE HAVE BECOME PARTICULARLY INTERESTED IN MAKING MACHINE LEARNING ALGORITHMS MORE EFFICIENT, AND ALSO IN BUILDING INFRASTRUCTURE TO MAKE MACHINE LEARNING PROJECTS MORE FEASIBLE. I THINK THOSE GO HAND IN HAND WHEN YOU'RE TRYING TO COMPLETE A LARGE-SCALE PROJECT. WE HAVE COLLABORATORS WHO HELP US IN OUR EFFORTS. A LARGE PART OF OUR WORK IS DEEP LEARNING ON IMAGES, AND I'VE TALKED A LOT ABOUT THAT AT NIH AND ELSEWHERE, BUT TODAY WE'LL FOCUS ON INFRASTRUCTURE. IF ANYONE IS INTERESTED IN DEEP LEARNING OR MACHINE LEARNING, I'LL BE HAPPY TO TALK ABOUT THAT AFTERWARDS. FIRST I'LL COVER WHY THIS IS IMPORTANT; SECOND, THE INFRASTRUCTURE WE DEVELOPED AT YALE; AND THIRD, SOMETHING WE DEVELOPED THAT I THINK ALLOWS US TO SCALE OUR MACHINE LEARNING AND DATA SCIENCE EFFORTS TO OTHER INSTITUTIONS, INCLUDING COMMUNITY PRACTICES, NOT JUST ACADEMIC CENTERS: INSTITUTIONS OR CLINICIANS WHO MAY BE INTERESTED IN PROVIDING DATA BUT DON'T NECESSARILY HAVE THE TIME OR EXPERTISE TO PROVIDE DATA SCIENCE HELP. FIRST AND FOREMOST, WHY IS INFRASTRUCTURE IMPORTANT? THE MORE DATA WE HAVE IN AN ORGANIZED FASHION, THE BETTER OUR MODELS.
IT CAN ALSO BE THOUGHT OF AS PROMOTING ORGANIZATIONAL EFFICIENCY, HARMONIZING SO WE FUNCTION MORE EFFECTIVELY, AND IT MAKES US NIMBLE IN ADDING NEW DATA STREAMS: IF WE ALREADY HAVE A WELL-CURATED DATASET AND DECIDE A NEW STREAM IS IMPORTANT, IT'S EASY TO INTEGRATE IT. LASTLY, AND MOST IMPORTANTLY, IF WE HARNESS DATA AND DEVELOP MORE ACCURATE MODELS, WE CAN IMPROVE HEALTHCARE QUALITY FOR OUR PATIENTS. SO WHY IS BUILDING DATA SCIENCE INFRASTRUCTURE IN HEALTHCARE SO DIFFICULT? I THINK THERE ARE A COUPLE OF INSTITUTIONAL ISSUES. PART OF MY ROLE AT YALE IS RUNNING OUR DATA SCIENCE CENTER WITH COLLABORATORS FROM LABORATORY MEDICINE AND INFORMATION TECHNOLOGY, AND THE FIRST BIG ISSUE IS PRIVACY AND SECURITY. THERE'S LIABILITY AT THE INSTITUTIONAL LEVEL, AND THOSE CONCERNS ARE AMPLIFIED EVERY TIME WE HEAR NEWS OF PRIVACY BREACHES. EVERY TIME GOOGLE OR FACEBOOK DOES SOMETHING WRONG, I FEEL LIKE I HAVE TO PAY FOR IT. ALSO, INSTITUTIONALLY THERE'S NOT REALLY -- WHEN IT COMES TO PROVIDING OPEN DATA FOR USERS, I THINK THE IDEA BEHIND INFORMATION TECHNOLOGY IS THAT SAFE IS BETTER, SO OFTENTIMES IT'S EASIER NOT TO GIVE US ACCESS OR NOT TO MAKE DATA AVAILABLE TO RESEARCHERS. IN RADIATION ONCOLOGY WE HAVE DOSIMETRY, CLINICAL OUTCOME DATA, NOTES, AND IMAGES, AND WE HAVE DIFFERENT KINDS OF IMAGES, LIKE CT SCANS. IF WE'RE THINKING ABOUT THE HEALTHCARE SYSTEM, WE HAVE LABORATORY MEDICINE, CLINICAL CHARTS, AND ALSO PATIENT-REPORTED OUTCOMES. AS PATIENTS START USING WEARABLES, IT BECOMES EVEN LARGER. I DON'T THINK THERE ARE ANY EASY COMMON DATA MODELS. PEOPLE HAVE DIFFERENT MODELS, LIKE PCORnet, BUT BROADLY I WOULD SAY PEOPLE FIND THOSE MODELS AREN'T EFFECTIVE. WE HAD TWO AT YALE, AND WE DON'T FIND THEM PARTICULARLY EFFECTIVE FOR STORING DATA. THAT'S THE OTHER ISSUE ON THE INFRASTRUCTURE SIDE. AMONG THE REMAINING ISSUES, ONE IS A HUGE INITIAL COST, AND BECAUSE OF THAT YOU HAVE TO HAVE INSTITUTIONAL MOMENTUM TO BUILD AN INFRASTRUCTURE.
YOU ALSO HAVE TO BE ABLE TO EXPLAIN TO YOUR ORGANIZATION WHY IT'S USEFUL, WHY IT'S A GOOD INVESTMENT, AND HOW YOU'LL CAPITALIZE ON THAT INVESTMENT. THE LAST THING, WHICH IS OFTENTIMES NOT TALKED ABOUT, IS A LACK OF COORDINATION. SOME PEOPLE DON'T WANT TO SHARE DATA AT THE INSTITUTIONAL LEVEL. AT YALE WE'RE COLLABORATIVE, BUT THAT HURTS ACROSS INSTITUTIONS. WHEN WE TRIED TO PROVIDE GUIDANCE TO OTHER INSTITUTIONS, THAT WAS THE BIGGEST IMPEDIMENT: BIOSPECIMEN BANKS, OR PEOPLE WHO HAVE IMAGES, DON'T NECESSARILY WANT TO CONTRIBUTE THEM BECAUSE THEY ARE NOT SURE WHO IS GOING TO HAVE ACCESS TO THEM. I THINK ANOTHER COMMON INFRASTRUCTURE ISSUE IS THAT THERE ARE TOO MANY COOKS IN THE KITCHEN. AT YALE WE HAVE THE UNIVERSITY, THE SCHOOL OF MEDICINE, AND THE HOSPITAL SYSTEM. EVERYONE HAS A CENTER OR DEPARTMENT OR INSTITUTE; YOU PUT UP A SIGN AND YOU'VE GOT ONE. ALL OF THEM ARE IN DATA SCIENCE, STATISTICS, IMAGING OR ANALYSIS, OUTCOMES, SOMETHING LIKE THAT, AND THEY ALL WANT TO USE THE SAME DATA. THE QUESTION IS HOW WE DO THAT. IDEALLY, IF WE CAN COORDINATE OUR EFFORTS, WHAT WE'LL FIND IS THAT WE SPUR CLINICAL INNOVATION, SHIFT PARADIGMS, AND IMPROVE OUTCOMES. IN MY ROLE AS THAT QUESTION-MARK PERSON, WHAT I FOUND WHEN I LOOKED ACROSS A LOT OF THESE SYSTEMS, AND FULL DISCLOSURE, THIS ONE IS MINE, IS THAT MANY OF THESE GROUPS BOUGHT SOME OF THE SAME RESOURCES AND THE SAME COMPUTING SYSTEMS, CURATED THE SAME DATA, AND HIRED THE SAME RESEARCH ASSISTANTS TO CURATE THE DATA. SIMILARLY, WE'VE CONTRACTED OUT TO DIFFERENT PEOPLE FOR WORK THAT OUR OTHER INSTITUTES HAVE EXPERTISE IN. SO FIRST WE'LL TALK ABOUT THE YALE HOSPITAL DATA LAKE, WHICH ALLOWS US TO AGGREGATE SYSTEMS. IT'S A SOLUTION THAT WORKS FOR US AND GIVES US ACCESS TO ALL THE DATA; IT TOOK THREE YEARS TO BUILD. IT'S SOMETHING WE LEVERAGE VERY FREQUENTLY FOR LIVE DATA PROJECTS, AND A WAY WE'VE GOTTEN OUR INSTITUTION INTERESTED IN THE DATA SCIENCE EFFORTS WE'VE BEEN TRYING TO BRING TO YALE.
SECOND, WE'LL TALK ABOUT THE FEDERATED LEARNING PLATFORM THAT MY LAB SPECIFICALLY BUILT. WE FOUND THAT ALTHOUGH YOU CAN BUILD A GOOD INSTITUTIONAL DATASET AND DATA SCIENCE INFRASTRUCTURE, TRYING TO EXPAND THAT AND GENERALIZE YOUR MODELS REQUIRES YOU TO INTEGRATE WITH PLACES OUTSIDE YOUR HOME INSTITUTION. TO MAKE THAT EASY, WE'VE DEVELOPED FEDERATED LEARNING TECHNIQUES. WE HAVE A MULTIDISCIPLINARY TEAM INVOLVED IN THE DATA LAKE, WHICH IS SIMILAR TO A DATA WAREHOUSE. WHEN I WAS IN MEDICAL SCHOOL, WE CALLED THEM DATA WAREHOUSES; NOW THEY ARE CALLED DATA LAKES. THE DIFFERENCE IS THAT DATA LAKES DON'T REQUIRE EVERYTHING TO BE IN THE SAME EXCEL-SHEET FORMAT. THE IMPORTANT THING, AS WE DISCUSS LAKES, IS THAT EVERYTHING GOES INTO A COMMON DATA PORTAL. WE HAVE A TEAM OF CLINICIANS AND ENGINEERS WHO SUPPORT THE SYSTEM. WE CLONE OUR ENTIRE ELECTRONIC MEDICAL RECORD, INCLUDING RADIATION ONCOLOGY AND OUR DICOM SERVERS, ALMOST EVERY DAY; SOME SOURCES WE CLONE AT LONGER INTERVALS. WE KEEP A SEPARATE ANALYSIS PLAYGROUND FOR PEOPLE INTERESTED IN WORKING ON DATA SCIENCE PROJECTS. IT'S EASY TO CONVINCE PEOPLE TO LET YOU CLONE: WE SAY WE COPY IT BUT DON'T MESS AROUND WITH THE REAL STUFF, AND YOU ALSO BECOME THE BACKUP OPERATION FOR THE ENTIRE HOSPITAL. WE CLONE OUR EPIC EMR ANYWHERE FROM HOURLY TO DAILY, DEPENDING ON WHICH PART OF THE ELECTRONIC MEDICAL RECORD IT IS. IT'S DIFFICULT TO GET RADIATION ONCOLOGY DATA OUT OF THE VENDORS, AND OUR GENOMIC DATA, WHICH WE JUST STARTED INTEGRATING, WE CLONE MONTHLY; IT REQUIRES MORE WORK ON OUR NON-HEALTHCARE-OPERATIONS SIDE, RELATED TO THE LABS AND THINGS OF THAT NATURE. EVERYTHING IS INGESTED INTO HADOOP; WE USE SQOOP FOR MOST OF THAT, AND WE CAN PROCESS AND PULL DATA USING SPARK. ANALYSTS PULL THE DATA, AND AT THE END THERE ARE TWO BOXES ON WHICH WE ANALYZE THE DATA; WE CREATE DOCKER ENVIRONMENTS AND NOTEBOOKS, AND PEOPLE USE THEM. TWO THINGS MADE THIS POSSIBLE INSTITUTIONALLY. FIRST, MY LAB PURCHASED ONE OF THE DGX-1 COMPUTERS, A HUGE INVESTMENT, AND THEN THE HOSPITAL PURCHASED THE OTHER ONE.
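The mixed cloning cadence described here (EMR tables hourly to daily, DICOM roughly daily, genomics monthly) can be expressed as a small refresh schedule. This is a hypothetical sketch, not Yale's configuration: the source names are invented, a month is approximated as 30 days, and the actual ingestion (Sqoop into Hadoop, Spark for processing) is not modeled.

```python
"""Toy refresh scheduler for a data lake's clone jobs.

The source names and intervals are hypothetical, loosely following the talk
(EMR tables hourly to daily, DICOM roughly daily, genomics monthly, with a
month approximated as 30 days). The actual ingestion (Sqoop into Hadoop,
Spark for processing) is not modeled here.
"""
from __future__ import annotations

from datetime import datetime, timedelta

REFRESH_INTERVALS = {
    "emr_icu": timedelta(hours=1),
    "emr_encounters": timedelta(days=1),
    "dicom": timedelta(days=1),
    "genomics": timedelta(days=30),
}


def due_for_refresh(last_clone: dict[str, datetime], now: datetime) -> list[str]:
    """Sources never cloned, or whose last clone is at least one interval old."""
    due = []
    for source, interval in REFRESH_INTERVALS.items():
        last = last_clone.get(source)
        if last is None or now - last >= interval:
            due.append(source)
    return due


if __name__ == "__main__":
    now = datetime(2019, 7, 1, 12, 0)
    last = {"emr_icu": datetime(2019, 7, 1, 11, 30), "dicom": datetime(2019, 7, 1, 0, 0)}
    print(due_for_refresh(last, now))  # ['emr_encounters', 'genomics']
```

Keeping the cadence in one table makes it easy to tighten a single source (for example, the ICU feed) without touching the others.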
JUST BALLPARK, $300,000. I DON'T THINK THEY ARE NECESSARY FOR EVERYBODY, BUT IF YOU'RE TRYING TO BUILD A HUGE INFRASTRUCTURE IT'S NICE. SECOND, WE ALLOWED ANYONE TO USE THIS: DESPITE THE FACT THAT CERTAIN PEOPLE HELPED BUILD AND CURATE IT, WE ALLOW ANYONE INTERESTED IN USING THE COMPUTERS TO USE THEM. AS A RESULT, THERE'S MORE INTEREST FROM PEOPLE IN MAINTAINING AND SUPPORTING IT WHEN WE HAVE HICCUPS ALONG THE WAY. POLITICALLY, THOSE ARE TWO THINGS I'VE LEARNED. IF ANYONE WANTS TO TALK MORE, I'M HAPPY TO. BUT A LOT OF THIS TOOK SHAPE OVER TIME, AND THERE HAVE BEEN MULTIPLE YEARS OF MEETINGS AND THINGS OF THAT NATURE, WHICH WERE PAINSTAKING AND TOOK PARTS OF MY SOUL, I WOULD SAY. ON THE OTHER HAND, ONE THING THAT'S NICE IS THAT WE HAVE A PULSE ON THE HEALTHCARE SYSTEM. HERE ARE TWO KIBANA DASHBOARDS, COVERING THE EMERGENCY ROOM AND OUR VISITS OVER CERTAIN PERIODS OF TIME. WE CAN PULL THOSE UP ON A LAPTOP AT ANY GIVEN TIME. ON THE RIGHT WE HAVE DATA FROM THE INTENSIVE CARE UNIT STREAMING IN REAL TIME; ICU DATA ARE CLONED ALMOST EVERY HOUR. THE REASON THIS IS A REALLY GREAT SLIDE TO SHOW IS THAT IT'S A WAY TO ENGAGE OTHER HEALTHCARE PROFESSIONALS. ONE WAY TO BUILD INSTITUTIONAL MOMENTUM IS TO SHOW PEOPLE THAT THE DATA CAN BE LEVERAGED ACROSS SPECIALTIES. RADIATION ONCOLOGY IS TYPICALLY IN THE BASEMENT; IT'S QUIET DOWN THERE, AND WE LIKE IT THERE. BUT AT THE SAME TIME YOU NEED HETEROGENEOUS HEALTHCARE DATA; IT'S IMPORTANT TO GET INFORMATION FROM ALL TYPES OF STREAMS. IF YOU'RE TRYING TO GET YOUR ADMINISTRATORS TO INVEST IN DATA SCIENCE INFRASTRUCTURE, THE RADIATION ONCOLOGY DEPARTMENT ISN'T WHAT WILL DO IT. IT'S THE ICU, INPATIENT, AND EMERGENCY ROOM; SHOW THAT IT INTEGRATES ACROSS THE ENTIRE HEALTHCARE SYSTEM. OKAY, PERFECT. IN TERMS OF TAKEAWAYS FROM THE YALE DATA LAKE: IT'S A SIGNIFICANT INVESTMENT, YOU NEED CHAMPIONS TO PUSH THROUGH BARRIERS, IT'S TIME INTENSIVE, AND WITHOUT INSTITUTIONAL COMMITMENT IT'S IMPOSSIBLE.
THIS IS SOMETHING WE COULD, IN PRINCIPLE, SCALE UP AT EVERY SORT OF INSTITUTION IN THE COUNTRY, BUT WE'VE SEEN DIFFERENT BARRIERS, AND EVERYTHING IS AMORPHOUS. IN SOME WAYS IT'S HELPFUL FOR DEVELOPING YOUR OWN DATA SCIENCE INFRASTRUCTURE, BUT AT THE SAME TIME, AS WE TRY TO EXPAND, I THINK IT'S MORE IMPORTANT THAT WE INTERACT WITH LARGER DATA ORGANIZATIONS LIKE THE NIH AND TRY TO FIGURE OUT HOW TO BECOME INTEROPERABLE WITH THEM. THE NEXT THING WE'LL TALK ABOUT IS A FEDERATED LEARNING PLATFORM, OUR EFFORT TO EXTEND OUR DATA CAPABILITIES TO INSTITUTIONS OUTSIDE YALE. ONE THING WE FOUND: WE BUILD ALGORITHMS, THEY WORK, WE PUBLISH THEM, IT'S GREAT. BUT IT'S DIFFICULT TO IMPLEMENT THEM IN PRACTICE, BECAUSE WE NEED TO MAKE SURE THE MODELS GENERALIZE. SO WHAT WE OFTENTIMES DO IS WORK WITH DATA SHARING PARTNERS: WE FIND PEOPLE WHO ALLOW US TO TEST OUR ALGORITHM ON THEIR DATA, OR THEY HAVE AN ALGORITHM THEY WANT TO TEST ON OUR DATA. WITH THE IRB, AS SOON AS YOU SAY YOU'RE SENDING DATA OUT OR BRINGING IT IN, IT'S A HUGE HASSLE; IT MAKES THEM HIRE ANOTHER PERSON TO MAKE SURE THIS DOESN'T GET DONE. [LAUGHTER] SECOND, I FEEL LIKE THERE'S A FIRE ALARM THAT GOES OFF: WAIT, SOMEONE'S TRYING TO SEND DATA OUT. NO, NO, NO. I ALSO THINK THERE'S AN ISSUE WITH HIPAA CONCERNS. I HAVEN'T FOUND A GREAT ANSWER ABOUT HOW ANONYMIZATION WORKS. WE'VE LOOKED AT A LOT OF THE TECHNIQUES TCIA USES AND TALKED WITH SUMMERS' LAB AT THE NCI ABOUT WHAT THEY HAVE DONE, BUT THEY WANT YOU TO SAY, WELL, IF WE FIGURE OUT A WAY TO BREAK THIS, ARE YOU GOING TO BE LIABLE? OBVIOUSLY WE CAN'T DO THAT. SO IT'S DIFFICULT TO DEAL WITH ISSUES RELATED TO HIPAA OR ANY DATA SHARING. ON SECURITY, WHEN YOU'RE SHARING DATA OR OPENING YOUR HEALTHCARE SYSTEM TO AN EXTERNAL SOURCE, IT BECOMES INCREASINGLY DIFFICULT. SO DATA SHARING MODELS ARE VERY DIFFICULT, AND OUR HEALTHCARE SYSTEM IS CONCERNED ABOUT THAT.
MORE IMPORTANTLY, WE'RE CONCERNED ABOUT BRINGING DATA INTO OUR ORGANIZATION FROM OTHER PLACES THAT MAY HAVE SOMETHING NEFARIOUS HIDDEN IN THE DATA. THE LAST THING IS THAT THERE'S A LOT OF DIFFICULTY WITH CHOREOGRAPHY AND MAKING SURE EVERYONE IS ON BOARD. THE TRADITIONAL MODEL HAS BEEN THE CENTRALIZED MODEL: WE PUSH DATA TO A CENTRALIZED SERVER THAT HOUSES THE INFORMATION, WHICH IS OFTENTIMES EXPENSIVE. IN A DISTRIBUTED MODEL, BY CONTRAST, THE DATA STAY BEHIND THE FIREWALL AT YOUR INSTITUTION AND DON'T LEAVE YOUR HOSPITAL; ALL YOU SHARE ARE THE PARAMETERS OF YOUR MODEL. MY LAB DOES DEEP LEARNING, SO WE SHARE JUST THE WEIGHTS OF OUR NEURAL NETWORKS WITH COLLABORATORS. THE BENEFIT IS THAT WE ONLY SHARE PARAMETERS, NO IDENTIFIABLE INFORMATION. SECOND, NO PHI LEAVES OUR INSTITUTION. IT'S ALMOST IMPOSSIBLE TO RECONSTRUCT THE INFORMATION FROM THE MODEL PARAMETERS. SO WE DEVELOPED A DEEP LEARNING ALGORITHM THAT CLASSIFIED LUNG NODULES AND TESTED THIS ACROSS THREE SITES. OURS IS A LARGE ACADEMIC PRACTICE. THE SECOND IS A PRIVATE COMMUNITY PRACTICE, PEOPLE INTERESTED IN DEVELOPING DATA SCIENCE RESEARCH AT THEIR OWN INSTITUTION BUT NOT NECESSARILY PROVIDING RESOURCES. THE THIRD IS A SMALL PRIVATE PRACTICE THAT HAD DATA BUT COULDN'T CONTRIBUTE ANYTHING BESIDES A COMPUTER AND THE DATASET. WE LOOKED AT SINGLE-INSTITUTION TRAINING WITH EXTERNAL VALIDATION VERSUS FEDERATED MODELS. FIRST, WITH SINGLE-INSTITUTION TRAINING, WE TRAINED ON OUR DATASET AND VALIDATED EXTERNALLY, AND YOU CAN SEE OUR MODEL IS OVERFITTING; THAT'S THE GREEN ARROW. VALIDATION ACCURACY IS POOR, BECAUSE WE'RE NOT GETTING THE WHOLE RICHNESS OF THE DATA SOURCES. THAT SHOWS THAT WE DO NEED EXTERNAL VALIDATION ACROSS DIFFERENT INSTITUTIONS TO GET THE BENEFITS. THEN WE PUSHED ALL OF THE PARAMETERS TO A CENTRALIZED SERVER. THIS IS NOT LIKE THE OTHER CENTRALIZED MODEL, BECAUSE IT'S JUST PARAMETERS; YOU DON'T NEED HIPAA COMPLIANCE, SO IT'S EASIER.
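The parameter-server idea, where data stay inside each institution and only weights travel, can be illustrated with a toy linear model. This is a FedAvg-style sketch under stated assumptions, not the speaker's system, which shares neural-network weights: each site takes one full-batch gradient step on its own data, and the server averages the resulting weights in proportion to each site's sample count. With exactly one local step per round (often called FedSGD), this matches a single centralized gradient step on the pooled data; with several local steps it generally no longer does.

```python
"""Toy parameter-server averaging (FedAvg-style) on a linear model.

Each site computes an update on its own data and sends only the resulting
weights; the server averages them, weighted by each site's sample count.
With one full-batch gradient step per round (FedSGD) this equals one
centralized gradient step on the pooled data.
"""


def mse_gradient(w, data):
    """Gradient of mean squared error of y ~ w . x over one site's data."""
    grad = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(data)
    return grad


def local_update(w, data, lr):
    """One full-batch gradient step, run inside the site's firewall."""
    return [wi - lr * gi for wi, gi in zip(w, mse_gradient(w, data))]


def federated_average(site_weights, site_sizes):
    """Server side: sample-size-weighted average of the weight vectors."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one weight vector per site")
    total = sum(site_sizes)
    avg = [0.0] * len(site_weights[0])
    for w, n in zip(site_weights, site_sizes):
        for i, wi in enumerate(w):
            avg[i] += wi * n / total
    return avg


if __name__ == "__main__":
    sites = [
        [([1.0, 2.0], 5.0), ([1.0, 0.0], 1.0)],  # site A: two (features, target) pairs
        [([1.0, 1.0], 3.0)],                     # site B: one pair
    ]
    w0, lr = [0.0, 0.0], 0.1
    fed = federated_average([local_update(w0, s, lr) for s in sites], [len(s) for s in sites])
    pooled = local_update(w0, [pair for s in sites for pair in s], lr)
    print(fed, pooled)  # equal up to floating-point rounding
```

The equality holds because the pooled mean-squared-error gradient is exactly the sample-weighted average of the per-site gradients, so averaging after one step loses nothing relative to pooling the raw data.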
HERE WE HAVE A CENTRALIZED -- THE COMPUTER IS CENTRALIZED, WE DON'T HAVE TO DE-IDENTIFY -- THIS IS THE TRADITIONAL MODEL, WHERE EVERYTHING IS CENTRALLY MANAGED. VALIDATION ACCURACY IMPROVES, BUT WHAT WE FIND IS THAT THERE ARE SOME GENERALIZATION ISSUES. THIS IS THE PARAMETER SERVER: WE SUBMIT PARAMETERS, BUT THE DATA STAY AT EACH INSTITUTION, AND MODEL PARAMETERS ARE SHARED THROUGH A CENTRAL SERVER. WE USE AN AMAZON WEB SERVICES SERVER. WE FOUND THAT THE ACCURACY WAS THE SAME AS WITH THE CENTRALIZED APPROACH, AND OUR INSTITUTIONS WERE HAPPIER WITH THIS PRACTICE. THE LAST THING WE TRIED WAS NOT CONNECTING TO A CLOUD SYSTEM AT ALL: SOME PARTNERS WERE WORRIED ABOUT THAT AND DIDN'T WANT A CONSTANT CONNECTION TO A THIRD-PARTY CLOUD SYSTEM, SO WE DECIDED TO CREATE A FEDERATED LEARNING SYSTEM. IN IT, YOU HAVE ONE PARTNER YOU RECEIVE DATA FROM AND ONE YOU SEND YOUR DATA TO. IT'S BASED ON GOOGLE'S APPROACH TO AGGREGATING CELL PHONE USAGE DATA. INSTITUTIONALLY WE STILL HAVE THE SAME COMPUTE, AND THE DATA STAY BEHIND YOUR FIREWALL. YOU KNOW WHAT'S HAPPENING AND WHO YOUR DATA PARTNERS ARE. WE FOUND THAT, SIMILARLY, THE ACCURACY IS ABOUT THE SAME AS WITH CENTRAL COMPUTE; ONE THING I WOULD SAY, THOUGH, IS THAT TRAINING IS DIFFICULT. THIS GRAPHIC SHOWS ALL THREE SYSTEMS COMPARED TO ONE ANOTHER, AND YOU SEE THE ACCURACY IS THE SAME. THE ADVANTAGES OF A DISTRIBUTED LEARNING FRAMEWORK ARE THAT THERE'S NO DECREASE IN ACCURACY; IT'S MORE COST EFFECTIVE, SINCE YOU DON'T HAVE TO PAY SOMEONE TO HOUSE THE INFORMATION; THERE ARE FEWER SECURITY CONCERNS AT THE INSTITUTIONAL LEVEL; AND IT ALLOWS MORE NIMBLE DATA UPDATES, SINCE YOU CAN CONNECT INSTEAD OF HAVING TO CURATE, SEND TO A CENTRAL PLACE, AND PROCESS. THE DISADVANTAGES ARE INCREASED TRAINING TIME, WHICH I DIDN'T SHOW IN THE GRAPHIC, BECAUSE YOU'RE TRANSFERRING THE MODEL PARAMETERS ACROSS THE COUNTRY; THE NEED FOR MORE I.T. COORDINATION IN SOME WAYS, BECAUSE YOU HAVE TO COORDINATE WITH OTHER INSTITUTIONS; AND THE NEED FOR INSTITUTIONAL INVESTMENT TO HAVE YOUR OWN COMPUTE CENTRALLY.
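The cloud-free variant, in which each institution receives the model from one partner and sends it on to the next, can be read as a ring of sites passing weights around, in the spirit of cyclical weight transfer. The talk does not spell out the exact protocol, so this sketch is our interpretation: the linear model, learning rate, local epoch count, and number of rounds are arbitrary assumptions chosen so the toy converges.

```python
"""Toy ring-style federated training: each site receives the current weights
from the previous site, trains locally, and passes them on to the next.

This is one reading of the "receive from one partner, send to one partner"
setup, in the spirit of cyclical weight transfer. The linear model, learning
rate, and round counts are arbitrary choices.
"""


def train_locally(w, data, lr, epochs):
    """Full-batch gradient descent on mean squared error, on one site's data."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def ring_training(sites, w0, lr=0.05, local_epochs=5, rounds=50):
    """Pass the model around the ring; only weights ever leave a site."""
    w = list(w0)
    for _ in range(rounds):
        for data in sites:  # site k gets w from site k-1 and hands it to site k+1
            w = train_locally(w, data, lr, local_epochs)
    return w


if __name__ == "__main__":
    # Noise-free toy data from y = 1 + 2t, split across three "institutions".
    def make(ts):
        return [([1.0, t], 1.0 + 2.0 * t) for t in ts]

    sites = [make([0.0, 1.0, 2.0]), make([3.0, 4.0]), make([-1.0, 0.5])]
    print(ring_training(sites, [0.0, 0.0]))  # close to [1.0, 2.0]
```

Compared with the parameter-server sketch, no central aggregator is involved, which matches the partners' wish to avoid a standing cloud connection; the price is that training proceeds sequentially, site by site, which is consistent with the longer training time mentioned in the talk.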
IN CONCLUSION, INFRASTRUCTURE IS A CRITICAL COMPONENT, AND WHILE CENTRALIZED DATA APPROACHES ARE IDEAL, THEY FACE INSTITUTIONAL BARRIERS. I WANT TO THANK THE PEOPLE WHO HELP KEEP THE LIGHTS ON IN THE LAB, MY POSTDOCS AND THE PEOPLE IN MY LAB WHO DO A LOT OF WORK ON THIS. THANKS A LOT. [APPLAUSE] >> NEXT UP IS DR. HARRY QUON FROM HOPKINS, WHO SPECIALIZES IN HEAD AND NECK RADIATION ONCOLOGY. >> THIS TALK IS GOING TO BE MORE GRANULAR. WE STARTED -- TODD IS IN THE SECOND ROW -- I CAME TO HOPKINS IN 2011, 2012 I THINK IT WAS, AND WE STARTED IN THE BASEMENT. THIS WAS A PREEXISTING PROJECT, REALLY AN IN-SILO DEVELOPMENT WITHIN RADIATION ONCOLOGY. HOW DO I ADVANCE HERE? I HAVE TWO-FOLD OBJECTIVES. AS WE PARTNERED WITH TODD, IT BECAME EVIDENT, LOOKING BACK, THAT A NUMBER OF PRINCIPLES GOVERN THE SUCCESSFUL GENERATION OF A QUALITY DATASET, ONE WE'VE BEEN ABLE TO WORK WITH TO THE POINT THAT WE'VE RUN A NUMBER OF ALGORITHMIC MODELING APPROACHES TO GAIN INSIGHT AND MAKE DISCOVERIES IN A CLINICAL DATASET, SHOWING THE POTENTIAL OF APPLYING DATA SCIENCE IN CLINICAL MEDICINE. THAT'S WHAT I WANT TO SHARE WITH YOU: SOME PRINCIPLES. TO SOME DEGREE IT'S INFRASTRUCTURE, AND IT FITS IN THIS SESSION, BECAUSE IT'S HUMAN INFRASTRUCTURE. WE'VE FOCUSED LARGELY ON STRUCTURED DATA CAPTURE. THERE ARE CLEARLY NON-STRUCTURED DATA ELEMENTS ACROSS SYSTEMS AND ACROSS EHRs THAT ARE OBVIOUSLY ATTRACTIVE, BUT THERE ARE SOME DOWNSIDES, AS I'M GOING TO POINT OUT. WE'VE FOCUSED PARTICULARLY ON STRUCTURED DATA ELEMENTS THAT LEND THEMSELVES TO OIS SYSTEMS; WE USE MOSAIQ AND A BROWSER-BASED INTERFACE. THE FIRST PRINCIPLE IS THIS, AND WE LEARNED IT IN GRADE SCHOOL: YOU CANNOT MANAGE WHAT YOU CANNOT MEASURE. AS SCIENTISTS AND CLINICIANS, AND PARTICULARLY FOR YOUR CLINICIANS AS THEY EMBARK ON BEING MEMBERS OF A DATA SCIENCE TEAM, IT'S VERY IMPORTANT TO THINK ABOUT WHAT THEY ARE REALLY MEASURING.
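Structured data capture is easiest to see as a record whose allowed values are fixed at entry time, so that what is measured is decided up front. The sketch below is hypothetical and is not the Oncospace or OIS schema: the toxicity terms and the 0 to 5 grade range are assumptions modeled loosely on CTCAE-style grading, with 0 used here to mean "none recorded".

```python
"""Toy structured data element for toxicity capture, with validation at entry.

The term list and the 0-5 grade range are assumptions modeled loosely on
CTCAE-style grading (0 meaning "none recorded" here); they are not the
Hopkins Oncospace schema.
"""
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

ALLOWED_TERMS = {"Dysphagia", "Dry mouth", "Mucositis oral"}  # hypothetical subset


@dataclass(frozen=True)
class ToxicityAssessment:
    patient_id: str
    assessed_on: date
    term: str
    grade: int

    def __post_init__(self):
        # Reject free-text terms and out-of-range grades when the value is captured.
        if self.term not in ALLOWED_TERMS:
            raise ValueError(f"unknown toxicity term: {self.term!r}")
        if not 0 <= self.grade <= 5:
            raise ValueError(f"grade must be between 0 and 5, got {self.grade}")


def worst_grade(assessments, patient_id: str, term: str):
    """Highest grade recorded for one patient and term across follow-up, or None."""
    grades = [a.grade for a in assessments if a.patient_id == patient_id and a.term == term]
    return max(grades, default=None)
```

Validating at capture rather than at analysis time is the point of the design: a bad value is rejected while the clinician is still in the room, instead of surfacing months later as missing or unparseable data in a longitudinal query.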
WE HAVE LOTS OF MEASURES BUT DON'T THINK CRITICALLY ABOUT THEM. WE'VE SPENT A LOT OF TIME AS A GROUP THINKING ABOUT WHICH MEASURES WE WANT TO IMPLEMENT IN THE CLINIC; WE THREW OUT A LOT AND BROUGHT IN SOME. WE'RE CONSTANTLY THINKING ABOUT THIS. IT BECOMES AN IMPORTANT ISSUE NOT ONLY FOR ANSWERING YOUR QUESTION, BUT ALSO BECAUSE IT AFFECTS THE ALGORITHMS YOU'RE GOING TO USE TO TRY TO MAKE SENSE OF YOUR DATA. AND YODA WAS PARTICULARLY WISE ON THIS. THE NEXT PRINCIPLE IS THAT WE HAVE TO LOOK AT DATA AS A COMMODITY. WE TOOK A LESSON FROM OUR FRIEND THE SQUIRREL HERE: THEY GRAB THEIR NUTS, THEIR DATA, AND THEY STORE THEM, AND THEY LOOK AT THAT AS A LONG-TERM INVESTMENT. IT MAY SEEM REALLY SIMPLE, BUT THAT'S HOW I THINK WE HAVE TO START THINKING ABOUT DATA: AS A COMMODITY WHOSE VALUE IS DETERMINED BY HOW YOU WANT TO USE IT. IN RADIATION ONCOLOGY, WE'RE UNIQUELY POSITIONED BECAUSE SO MUCH OF OUR TREATMENT IS HIGHLY QUANTIFIABLE. IN OUR DATA SCIENCE EFFORTS, WE CAN BRING IN NOT ONLY OUTCOMES BUT ALSO OUR TREATMENT, AND PUT IT ALL TOGETHER ALONG WITH OTHER ASPECTS OF PATIENT CARE. THE LAST PRINCIPLE IS THAT, HONESTLY, YOUR CLINICIANS ARE NOT GOING TO WANT TO BE PART OF A DATA CAPTURE OR DATA COLLECTION EFFORT UNLESS THERE'S A REALLY GOOD REASON FOR IT. IT'S OBVIOUS: YOU HAVE TO HAVE PASSION FOR THE QUESTION. WE WERE PARTICULARLY INTERESTED IN ANSWERING LONGITUDINAL QUESTIONS THAT LENT THEMSELVES TO THE KIND OF INFORMATICS INFRASTRUCTURE THAT WAS PREEXISTING AND THAT WE BUILT UPON IN ONCOSPACE. FOR INSTANCE, THIS IS A MORTALITY FIGURE THAT HAS NOW EMERGED IN HEAD AND NECK RADIOTHERAPY: SEVEN OR EIGHT YEARS OUT, WE NEED TO GET A HANDLE ON THINGS LIKE THAT, AND THAT REALLY REQUIRES US TO CAPTURE THIS LONGITUDINALLY IN OUR FOLLOW-UP CARE.
AND SO WE FELT THAT THIS WAS A PERFECT PARTNERSHIP WITH TODD. THIS IS THE PYRAMID THAT I LIKE TO TEACH OUR RESIDENTS; THERE ARE MANY ASPECTS. IF WE BUILD IT RIGHT, DATA IS GOING TO LEAD TO INSIGHT, AND INSIGHT TO ACTION, AND IF YOU HAVE A CLOSED-LOOP SYSTEM, WHICH IS THE LEARNING HEALTH SYSTEM, YOU SHOULD BE ABLE TO MEASURE THAT. WE'RE STARTING TO SEE SIGNS OF THAT. THIS IS THE PLATFORM, WHICH TODD CAN CERTAINLY ELABORATE ON, BUT WHAT WE'VE REALLY LOOKED TO DO IS MARRY THE OIS WITH WEB BROWSER ACCESS, NOT ONLY BY CLINICIANS BUT BY ALL MEMBERS OF THE TEAM AND ACROSS THE DEPARTMENT. BECAUSE IT'S BROWSER BASED, OTHER MEMBERS OF THE HEAD AND NECK TEAM INVOLVED WITH PATIENT CARE, WHO ARE ALWAYS VERY INTERESTED IN DATA CAPTURE, CAN CONSOLIDATE THEIR DATA CURATION INTO OUR ONCOBROWSER. WE ALSO HAVE AN INTERFACE WITH OUR LAB DATA -- I MEAN, WITH EPIC -- AND THAT HAS YIELDED A TREASURE TROVE OF DATA, WHICH I'M GOING TO SHOW YOU, THAT WE'VE JUST STARTED TO EXPLORE. THE IDEA IS TO BRING IT ALL UNDER ONE ROOF, WHAT WE CALL ONCOSPACE. I WANT TO GO BACK TO ONE THING: OUR FIELD HAS BEEN FLAT IN TERMS OF HOW WE SUMMARIZE RADIATION DATA. THERE'S A SPATIAL COMPONENT, AND I THINK THIS IS WHERE TODD AND HIS GROUP TAPPED IN: BECAUSE EVERY DOSE POINT HAS A COORDINATE, WE CAN UNDERSTAND THE SPATIAL IMPACT OF THE DOSIMETRY, AND IT'S THE ABILITY TO TAKE ALL THAT TOGETHER THAT HAS YIELDED INSIGHT INTO HOW WE CAN BETTER MANAGE AND PREVENT TOXICITIES. WE ALSO REALIZED THAT PATIENTS ARE NOT ONLY ASSESSED IN CLINIC; THEY ARE AFFECTED BY THEIR ENVIRONMENT, WHICH IS AN ENTIRE UNTAPPED AREA. WE FINISHED RUNNING ONE TRIAL BRINGING IN THAT DATA, AND IT OPENS UP THE POSSIBILITY THAT OUR UNDERSTANDING OF PATIENTS SHOULD GO BEYOND THE WALLS OF THE CLINIC AND EXPAND INTO THEIR ENVIRONMENT, AS IT AFFECTS THEIR TOXICITIES. WE'RE TRYING TO CLOSE THE LOOP ON PATIENTS IN TERMS OF DATA.
THE MOST IMPORTANT PART IS THAT WE'RE LOOKING TO CAPTURE OUTCOMES AND ASPECTS OF CARE AT THE POINT OF CARE, AS PART OF ROUTINE DOCUMENTATION. I THINK THIS IS A VERY UNIQUE KIND OF DATASET, AND IT'S BEEN HARD TO GET OTHERS TO UNDERSTAND THAT THIS IS NOT RETROSPECTIVE DATA. RETROSPECTIVE DATA SUFFERS BECAUSE ITS VALUE IS DETERMINED RETROSPECTIVELY: SOMEONE GOES INTO A CHART AND DETERMINES THE VALUE, OFTENTIMES BY SURMISING, AND THAT CREATES UNCERTAINTY IN THE DATA. THIS DATA IS CAPTURED AT THE POINT OF CARE; IT'S NOT ONLY PROSPECTIVE, IT ALSO LACKS THE MEMORY BIAS THAT EXISTS EVEN WITH CRFs. FOR THAT REASON I THINK THIS KIND OF DATASET IS NOT ONLY GRANULAR BUT HIGH QUALITY. IT IS PRONE TO MISSINGNESS, AS WITH ANY DATA CAPTURE; THOSE ARE THE STRENGTHS AND WEAKNESSES. SO WE WANT TO TRY TO CLOSE THE LOOP. WE'RE BUILDING A LONGITUDINAL DATASET THAT WE THINK IS VERY VALUABLE. THE TOOLS ARE STRAIGHTFORWARD: BROWSER BASED, CUSTOMIZABLE, USING PULLDOWN MENUS AND BUTTONS FOR THE RIGHT DATA VALUE AND THE RIGHT DATA ELEMENT. WE'VE EVEN BUILT IN A SMARTPHONE LINK: AT THE END OF MY CLINIC DAY I GET AN E-MAIL, AND ALL I HAVE TO DO IS CLICK ON THAT LINK. IT TAKES ME BEHIND THE FIREWALL TO A BROWSER PAGE LISTING THAT DAY'S PATIENTS, WITH DISEASE STATUS DEFAULTED TO THE LAST RECORDED VALUE. IF AN ASSESSMENT IS COMPLETED, IT WILL BE MUTED OUT. IF IT'S NOT COMPLETED AND THE STATUS IS THE SAME, THE CLINICIAN JUST CLICKS VERIFY, VERIFY, VERIFY, AND IT'S DONE AND UPDATED. IF IT'S NOT THE SAME, YOU USE THE PULLDOWN TO CHANGE THE DISEASE STATUS TO WHATEVER VALUE YOU DETERMINE. THE IDEA IS TO CREATE TOOLS THAT WORK WITHIN OUR ROUTINE WORKFLOW, AND THIS IS THE PART WE'RE STILL TRYING TO IMPROVE.
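THE VERIFY-OR-MODIFY WORKFLOW DESCRIBED ABOVE CAN BE SKETCHED AS A SMALL DATA STRUCTURE: EACH DAY'S DRAFT ASSESSMENT CARRIES FORWARD THE LAST RECORDED DISEASE STATUS, THE CLINICIAN EITHER CONFIRMS IT OR PICKS A NEW PULLDOWN VALUE, AND VERIFIED PATIENTS DROP OFF THE DAY'S LIST. THIS IS A HYPOTHETICAL ILLUSTRATION, NOT THE ONCOSPACE OR MOSAIQ IMPLEMENTATION; THE STATUS VALUES AND ALL CLASS AND METHOD NAMES ARE ASSUMPTIONS.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Dict, List, Optional


class DiseaseStatus(Enum):
    """Hypothetical pulldown values; a real OIS would define its own controlled list."""
    NO_EVIDENCE_OF_DISEASE = "NED"
    STABLE = "Stable disease"
    PROGRESSION = "Progression"
    UNKNOWN = "Unknown"


@dataclass
class Assessment:
    patient_id: str
    visit_date: date
    status: DiseaseStatus
    verified: bool = False


@dataclass
class DailyWorklist:
    """Structured point-of-care disease-status capture with carry-forward defaults."""
    history: Dict[str, List[Assessment]] = field(default_factory=dict)

    def draft(self, patient_id: str, visit_date: date) -> Assessment:
        """Pre-fill today's assessment with the last recorded status (or UNKNOWN)."""
        past = self.history.get(patient_id, [])
        last = past[-1].status if past else DiseaseStatus.UNKNOWN
        return Assessment(patient_id, visit_date, last)

    def verify(self, draft: Assessment,
               new_status: Optional[DiseaseStatus] = None) -> Assessment:
        """Clinician clicks verify (keeps the default) or picks a new pulldown value."""
        if new_status is not None:
            draft.status = new_status
        draft.verified = True
        self.history.setdefault(draft.patient_id, []).append(draft)
        return draft

    def pending(self, patient_ids: List[str], visit_date: date) -> List[str]:
        """Today's patients still lacking a verified assessment; completed ones drop off."""
        return [
            pid for pid in patient_ids
            if not any(a.visit_date == visit_date and a.verified
                       for a in self.history.get(pid, []))
        ]
```

THE DESIGN POINT IS THAT AN UNCHANGED PATIENT COSTS ONE CLICK, BECAUSE THE DEFAULT IS ALREADY THE MOST LIKELY ANSWER, WHILE EVERY STORED VALUE STILL COMES FROM A CONTROLLED LIST RATHER THAN FREE TEXT.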
THE IDEA IS THAT IF WE'RE REALLY FOCUSED ON STRUCTURED ELEMENTS, WE NEED TO BUILD THEM INTO ROUTINE DOCUMENTATION, NOT ONLY WITHIN RADIATION ONCOLOGY BUT ALSO WITHIN THE CONTEXT OF EHR REQUIREMENTS; THESE ARE ONGOING PROJECTS. BUT THE IDEA IS STRAIGHTFORWARD: WE WANT TO TAKE THESE STRUCTURED ELEMENTS AND EMBED THEM WITHIN THE TEMPLATE OF THE PROSE DOCUMENT WE'RE USED TO FOR COMMUNICATION. THIS LAST SLIDE IS AN IMPORTANT ONE, BELIEVE IT OR NOT. THE FUNDAMENTAL PROBLEM WE HAVE IS NOT, IN SOME WAYS, A TECHNICAL ONE. IT'S A HUMAN BEHAVIOR PROBLEM. AS CLINICIANS WHO SEE PATIENTS AND DETERMINE THEIR OUTCOMES, WE'RE ALL CONSTRAINED BY DISTRACTIONS AND TIME, AND BEHAVIOR CHANGE IS LIMITED BY THAT. B.J. FOGG TALKS ABOUT THIS: KEEP IT SIMPLE, AND ALSO HAVE THE MOTIVATION. YOU HAVE TO THINK ABOUT THIS WHEN YOU'RE CREATING STRUCTURE WITHIN THE ENVIRONMENT YOU WORK IN, BECAUSE IF YOU CREATE AMBITIOUS DATA ELEMENTS THAT NEED TO BE CAPTURED BY SOMEONE WHO IS NOT MOTIVATED, OR WHO JUST SAYS OKAY TO A PROJECT, YOU'RE NOT GOING TO BE SUCCESSFUL. THERE'S AN INVERSE RELATIONSHIP BETWEEN MOTIVATION AND ABILITY, AND IT REQUIRES REALLY CAREFUL PLANNING AND THINKING ABOUT WHAT QUESTIONS YOU WANT TO START WITH. THIS SLIDE JUST HIGHLIGHTS THAT WE'VE SUCCESSFULLY CULTIVATED A MINDSET SHIFT, HELPING PEOPLE UNDERSTAND THAT DATA IS REALLY A CURRENCY NOW. ONCE YOU ARE ABLE TO DO THAT, IT BECOMES AN ENVIRONMENT THAT'S VERY HELPFUL. I'M SORRY I MISSED THE EARLIER TALKS; I'M NOT SURE WHETHER ANYONE TALKED ABOUT THE STRUCTURE OF DATA. THIS IS MY EARLIER SLIDE SHOWING THAT THERE ARE STRENGTHS AND WEAKNESSES TO BOTH.
BY NO MEANS DO I THINK STRUCTURED DATASETS ARE A COMPLETE SOLUTION, BUT I DO THINK THEY HELP MINIMIZE A LOT OF THE SEMANTIC VARIABILITY WE GET WITH NON-STRUCTURED DATASETS, WHERE WE'RE RUNNING NLP IN THE HOPE OF PULLING OUT ANALYZABLE DATA. WE STRUGGLE BECAUSE A LOT OF PEOPLE HAVE CHALLENGED US TO COME UP WITH A MINIMUM STANDARD. I DON'T KNOW IF THERE IS A MINIMUM STANDARD. I'VE MADE AN ATTEMPT TO LIST ONE HERE, BUT I THINK IT'S DEFINED BY THE PURPOSE BEHIND IT, AND THAT'S WHERE YOU HAVE TO ASK YOURSELF WHAT THE RIGHT DATASET TO BE LOOKING FOR IS. IN THE END, FOR DATA SCIENCE TO REALLY WORK, TO SEE DATA PATTERNS, YOU NEED LOTS AND LOTS OF DATA, AND THAT REQUIRES YEARS OF ONGOING EFFORT TO CURATE A DATASET LARGE ENOUGH TO TELL THE PATIENT EXPERIENCE. SO WE'VE BEEN WORKING AT THIS. THIS TABLE IS PROBABLY ABOUT TWO YEARS OLD NOW, AND WE WERE AT 2.6 ASSESSMENTS. BUT THE NICE THING IS THAT IT CONTINUES TO GROW. THIS NEXT SLIDE IS OBVIOUS TO THOSE IN THIS AUDIENCE: INSIGHT DISCOVERY IS BASED ON IDENTIFYING PATTERNS IN THE DATA, AND THERE ARE VARIOUS MACHINE LEARNING AND NOW DEEP LEARNING APPROACHES TO THIS. SO NOW I WANT TO TAKE YOU THROUGH A PRACTICAL EXAMPLE. BACK IN 2015, I THINK, AFTER A COUPLE OF YEARS OF GETTING THIS UP AND RUNNING, TODD AND -- I THINK IT WAS SCOTT ROBERTSON, RIGHT, TODD? -- STARTED WORKING ON THIS FIRST ANALYSIS. THIS CURVE IS THE NATURAL HISTORY OF OUR CTCAE TOXICITY AS OF 2015. WE'VE BEEN ABLE TO UPDATE IT ROUTINELY, AND IT'S SHOWN ALL THE TIME. IT'S REASSURING THAT WE'RE SEEING IMPROVEMENTS IN THE RATES. BUT THE BEAUTIFUL THING WAS THAT, WHEN WE LOOKED, WE SAW A NATURAL IMPROVEMENT IN XEROSTOMIA AT SIX MONTHS, WHICH STARTED TO PLATEAU AT 18 MONTHS.
YOU'RE SEEING SCOTT'S FIRST ANALYSIS, USING NORMALIZED DVH VOLUMES. IT TURNS OUT NOT TO BE SO MUCH THE MEAN OR THE MEDIAN DOSE; THE LOW-DOSE BATH, AROUND D90 AND D95, WAS MORE SIGNIFICANT FOR XEROSTOMIA, BASED ON PARTITIONING PATIENTS INTO THOSE WHO HAD GRADE 2 OR HIGHER. WE RAN WITH THIS: BECAUSE WE TREAT PATIENTS IN A CO-PLANAR MANNER, D95 BEING MORE SIGNIFICANT POINTED TO THE SUPERIOR PORTION OF THE PAROTID GLAND, AND IN RATS THE SUPERIOR SUBVOLUME OF THE PAROTID IS MUCH MORE SENSITIVE. SO XUAN DID A CART ANALYSIS WITH CROSS-VALIDATION. IT CAN BE CRITICIZED FOR OVERFITTING, BUT I SHOW THIS SLIDE FOR TWO REASONS: IT WAS THE FIRST ANALYSIS I WAS AWARE OF THAT, UNDER ONE ROOF, SHOWED THAT D95 WAS IMPORTANT FOR INJURY, AND IT ALSO SHOWED SECONDARY, CONDITIONAL PATIENT FACTORS THAT INCREASED THE RISK OF XEROSTOMIA. TODD, PERNAUB, AND WEI JING, WHO DID HIS PH.D. THESIS WITH TODD, TOOK IT A STEP FURTHER. THEY CREATED THE ABILITY TO PUT PATIENTS INTO A COMMON REFERENCE FRAME AND REINDEX THE CT VOXELS, AND THEN ANALYZED THEM USING CTCAE GRADING AT THREE MONTHS VERSUS 18 MONTHS AS A MEASURE OF WHETHER PATIENTS HAD INJURY OR HAD RECOVERY. USING A RANDOM FOREST ALGORITHM, THEY FOUND DIFFERENCES IN WHICH DOSE VOXELS WERE MOST IMPORTANT FOR INJURY AND, PARTICULARLY, FOR RECOVERY: THOSE WHO RECOVERED ALSO TENDED TO HAVE A LOWER DOSE. AGAIN, THE SAME STORY EMERGED: LOW DOSE, PARTICULARLY TO THE SUPERIOR PORTIONS OF THE PAROTID GLANDS, WAS VERY IMPORTANT. TO FURTHER VALIDATE THIS, ANOTHER MEMBER OF THE TEAM TOOK A DIFFERENT APPROACH, AND AGAIN I THINK TODD WILL DISCUSS MORE OF THE SPECIFICS, BUT WHAT SHE DID WAS GEOMETRICALLY SEGMENT THE PAROTID GLANDS. THIS IS AN INTERESTING BUT VERY COMPLICATED SLIDE THAT IS VERY ENRICHING WHEN YOU LOOK AT IT.
THIS IS A DIFFERENT WAY OF SHOWING TOXICITY. IT'S THREE-DIMENSIONAL AND USES A HEAT MAP APPROACH THAT SHOWS NOT ONLY THE DOSE TO DIFFERENT PARTS OF THE GLANDS WE FELT WERE IMPORTANT FOR XEROSTOMIA GRADING, BUT ALSO AN IMPORTANCE MAP. HIGHLIGHTED HERE IS THE IPSILATERAL GLAND, AND YOU CAN SEE THAT THE SUPERIOR PORTION WAS VERY IMPORTANT IN TERMS OF IMPORTANCE WEIGHTING. SO, AS AN ASIDE, WE'VE STARTED IMPLEMENTING A D95 CONSTRAINT. WE WOULD LOVE TO BE ABLE TO IMPLEMENT A GEOMETRIC CONSTRAINT AS WELL. THIS IS HOW IT HAS REALLY INFLUENCED HOW WE'RE APPROACHING THIS, AND, THOUGH TIME DOESN'T PERMIT ME TO SHOW IT, WE'VE STARTED TO SEE A REDUCTION IN XEROSTOMIA GRADING SINCE WE IMPLEMENTED THIS. I SHOW THIS SLIDE REALLY TO SHOW THE VALUE OF CT AND RADIOMICS: USING RADIOMICS, THERE ARE DIFFERENT PATTERNS, CLASSIFIABLE WITH A HIERARCHICAL APPROACH. MORE IMPORTANTLY, IN THE CONTEXT OF INJURY, HE DEMONSTRATED THAT D95 SHOWED UP AMONG THE DVH FEATURES, AND WHEN YOU ADD BASELINE MEASURES OF THE PAROTID GLANDS, THE MODEL IMPROVED IN TERMS OF PREDICTING XEROSTOMIA. THIS THEME IS CONSISTENT WITH THE MR WORK THAT KHADIJA SHEIKH HAS DONE AS WELL. THESE ARE GREAT EXAMPLES, IN MY MIND, OF HOW DATA SCIENCE HAS LED TO A BETTER UNDERSTANDING OF HOW WE'RE INJURING PATIENTS AND HOW WE COULD POTENTIALLY IMPROVE THINGS. LASTLY, I TOLD YOU WE STARTED TAKING IN LAB DATA. WE CAN PULL THIS IN, AND WE GET A RICH DATASET. WHAT WE'RE NOW SEEING IS THAT LYMPHOPENIA, CONSISTENT WITH THE IMMUNE SURVEILLANCE MODEL THAT'S IMPORTANT IN CANCER CARE, HAS STARTED TO INFLUENCE THE RISK OF DISTANT RELAPSE, SUCH THAT PATIENTS WITH SEVERE LYMPHOPENIA, LESS THAN .05, AND EVEN AT .2, ARE AT GREATER RISK OF DISTANT RELAPSE. LAST EXAMPLE: WE HAVE TAKEN THE SAME DOSE-VOXEL PATTERN APPROACH AND APPLIED IT TO DYSPHAGIA. THERE ARE DIFFERENT PATTERNS IN THE PHARYNX, AND, INTERESTINGLY, THE PAROTID GLAND IS ALSO IMPORTANT FOR DYSPHAGIA, AS YOU CAN SEE HERE.
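D95, WHICH RECURS THROUGHOUT THE XEROSTOMIA ANALYSES ABOVE, IS THE MINIMUM DOSE RECEIVED BY THE HOTTEST 95% OF A STRUCTURE'S VOLUME, SO IT DESCRIBES THE LOW-DOSE END OF THE DVH RATHER THAN THE MEAN. A MINIMAL SKETCH, ASSUMING EQUAL-VOLUME VOXELS GIVEN AS (SUPERIOR-INFERIOR COORDINATE, DOSE) PAIRS: IT COMPUTES D_X AND A CUMULATIVE DVH POINT, AND TAKES A CRUDE SUPERIOR SUBVOLUME BY SPLITTING THE GLAND AT ITS MEDIAN SUPERIOR-INFERIOR COORDINATE. THE SPLIT RULE AND ALL NAMES ARE HYPOTHETICAL; THIS IS NOT THE GEOMETRIC SEGMENTATION, VOXEL REINDEXING, OR RANDOM FOREST ANALYSIS DESCRIBED IN THE TALK.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

# (superior-inferior coordinate in mm, larger = more superior by assumption; dose in Gy)
Voxel = Tuple[float, float]


def dose_at_volume(doses: Sequence[float], volume_pct: float) -> float:
    """D_x: minimum dose received by the hottest volume_pct % of the structure.

    Assumes every voxel has the same volume, so volume fractions are voxel counts.
    """
    if not doses:
        raise ValueError("structure has no voxels")
    if not 0 < volume_pct <= 100:
        raise ValueError("volume_pct must be in (0, 100]")
    hottest_first = sorted(doses, reverse=True)
    n_voxels = math.ceil(len(hottest_first) * volume_pct / 100)
    return hottest_first[n_voxels - 1]


def volume_receiving(doses: Sequence[float], threshold: float) -> float:
    """Cumulative DVH point V_d: percent of the volume receiving at least `threshold` Gy."""
    return 100.0 * sum(d >= threshold for d in doses) / len(doses)


def superior_subvolume(voxels: Sequence[Voxel]) -> List[float]:
    """Doses of voxels above the median superior-inferior coordinate (hypothetical split)."""
    cut = median(z for z, _ in voxels)
    return [dose for z, dose in voxels if z > cut]


if __name__ == "__main__":
    doses = [10.0 * k for k in range(1, 11)]  # 10, 20, ..., 100 Gy
    print("mean:", sum(doses) / len(doses), "D95:", dose_at_volume(doses, 95))
    gland = [(float(z), 5.0 * z) for z in range(10)]  # toy gland, dose rises superiorly
    print("superior D95:", dose_at_volume(superior_subvolume(gland), 95))
```

FOR EXAMPLE, TEN VOXELS AT 10, 20, ..., 100 GY HAVE A MEAN OF 55 GY BUT A D95 OF 10 GY, WHICH IS WHY A D95 CONSTRAINT TARGETS A DIFFERENT PART OF THE DOSE DISTRIBUTION THAN A MEAN-DOSE CONSTRAINT.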
IN SUMMARY, YOU CAN'T MANAGE WHAT YOU DON'T MEASURE, SO YOU HAVE TO THINK ABOUT THIS CAREFULLY. YOU ALSO HAVE TO FIGURE OUT NOT ONLY WHAT YOU WANT TO MEASURE BUT HOW TO VALUE IT: VALUE IT LIKE A SQUIRREL, PUTTING IT AWAY AND STORING IT, AND CREATE THAT MINDSET IN YOUR TEAM SO EVERYONE SEES THE IMPORTANCE OF THIS DATA. TOOLS NEED TO BE DEVELOPED BEYOND ROUTINE WORKFLOW DOCUMENTATION; SINCE RADIATION ONCOLOGY IS FUNDAMENTALLY DATABASE DRIVEN, THERE ARE OPPORTUNITIES TO CREATE, AS I'VE SHOWN YOU, BROWSER-BASED APPROACHES THAT PROVIDE SOME OF THAT FLEXIBILITY. WE'VE DEMONSTRATED THE VALUE OF THE GRANULAR DATASET IN SHORT ORDER, IN A COUPLE OF YEARS. THIS DATA CONTINUES TO BE CURATED OVER TIME, AND WE'RE CERTAINLY EXCITED TO BROADEN THIS, TO UNDERSTAND AND IMPROVE THE CARE OF OUR PATIENTS. THIS IS CLEARLY A LARGE GROUP EFFORT: TODD AND A NUMBER OF INDIVIDUALS WHOM I HOPE I'VE BEEN ABLE TO HIGHLIGHT IN THE PRESENTATION. THANK YOU FOR YOUR PATIENCE. [APPLAUSE] >> CAN WE HAVE EVERYONE WHO JUST GAVE A TALK COME UP, AND THEN WE'LL OPEN UP FOR QUESTIONS FOR FIVE TO SEVEN MINUTES. >> LET'S GET TWO MICROPHONES SET UP. PLEASE, WE INVITE SOME QUESTIONS. >> MAYBE ERIKA AND JUSTIN, COULD YOU DESCRIBE SOME OF YOUR TOOLS ON THE AWS CLOUD OR GOOGLE CLOUD, AND HOW YOU WORK WITH SOME OF THE GOVERNMENT DATASETS? I KNOW YOU MENTIONED THAT GOOGLE KIND OF SUCKED IN ALL OF THAT INFORMATION, BUT WHEN YOU HAVE TOOLS ON ONE TYPE OF CLOUD AND YOUR RESOURCES ARE IN A DIFFERENT CLOUD, COULD YOU TALK ABOUT HOW YOU MAKE IT WORK? >> FOR THE CANCER RESEARCH DATA COMMONS, WE'RE CURRENTLY ACTIVELY DEVELOPING A CLOUD-AGNOSTIC PLATFORM WHERE YOU CAN USE TOOLS YOU DEVELOP ON ONE PLATFORM TO ACCESS DATA ON THE OTHER PLATFORM.
GDC DATA WE'RE CURRENTLY HOSTING ON GOOGLE AND AMAZON, SO YOU CAN BRING TOOLS FROM EITHER PLATFORM, BUT PDC DATA IS CURRENTLY HOSTED AND AVAILABLE ONLY THROUGH AWS, SO WE'RE DEVELOPING WAYS FOR PEOPLE TO BRING IN TOOLS FROM OTHER CLOUD PLATFORMS SO YOU CAN BE CLOUD AGNOSTIC. IT'S HOPEFULLY COMING SOON. >> SINCE IT'S RELATIVELY DE-IDENTIFIED, TO THE EXTENT THAT'S POSSIBLE, DO YOU REQUIRE INDIVIDUALS WHO WANT TO PUBLISH TO HAVE SOME TYPE OF IRB APPROVAL? OR IS THAT NOT REALLY RELEVANT FOR THIS TYPE OF ANALYSIS? >> FOR MOST OF THE HUMAN DATA, CONSENT AND IRB APPROVAL HAVE ALREADY BEEN HANDLED BY THE PROGRAM ITSELF, SO WHAT WE'RE OFFERING IS SECONDARY DATA SHARING. FOR CONTROLLED DATA YOU STILL NEED TO GO THROUGH THE DBGAP APPROVAL PROCESS, BUT OPEN DATA IS AVAILABLE. >> THANK YOU. >> AT MY INSTITUTION, YALE, WHEN WE USE SECONDARY DATASETS WE STILL WANT IRB APPROVAL, SO THAT'S A GOOD POINT. >> I HOPE THIS ISN'T TOO COMPLICATED. THE GOVERNMENT GROUPS AT THE FEDERAL LEVEL MOSTLY HOST RESEARCH TRIAL DATA, DATA THAT HAS A HIGH LEVEL OF CURATION DONE TO IT BEFORE IT CAN BE POSTED. MY QUESTION IS FOR YALE. WHEN WE START LOOKING AT OUR CLINICAL SYSTEMS, BUILDING DATA LAKES AND PULLING DATA OUT OF THOSE SYSTEMS, ONE OF THE CHALLENGES, ESPECIALLY IN RADIATION ONCOLOGY, IS THAT OUR DICOM RT IS TYPICALLY AN ARCHIVE OF TREATMENT PLANS, NOT A RECORD OF WHAT WE DELIVERED TO THE PATIENT. I THINK WE, AS A FIELD, DO A TERRIBLE JOB OF ARCHIVING WHAT WE ACTUALLY DELIVERED. IF A PATIENT MISSES A FRACTION OR WE CHANGE THE PLAN MID-COURSE, HOW DO WE FIX THAT? WE'VE TAKEN TO USING HIGHLY INTELLIGENT HIGH SCHOOL KIDS TO DO THAT DURING THE SUMMER, BUT IT'S NOT A SMALL TASK. SO I'M CURIOUS: AS A FIELD, SHOULDN'T WE BE MOVING TOWARD HAVING OUR DICOM RT RECORD REFLECT DELIVERED DOSE, AND WHAT MIGHT YOU GUYS BE DOING ABOUT THAT? >> SIMILARLY, WE EMPLOY COLLEGE STUDENTS, YEAH. >> THEY ARE PRETTY SMART. >> FOR THAT.
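THE GAP RAISED IN THIS QUESTION IS THE DIFFERENCE BETWEEN WHAT THE PLAN PRESCRIBED AND WHAT WAS ACTUALLY DELIVERED. A MINIMAL SKETCH, ASSUMING A SIMPLIFIED FRACTION LOG (ONE ENTRY PER SCHEDULED FRACTION, WITH THE ACTIVE PLAN AND WHETHER IT WAS GIVEN) AND A SINGLE PER-FRACTION DOSE PER PLAN AT ONE REFERENCE POINT: DELIVERED DOSE SUMS ONLY THE FRACTIONS ACTUALLY GIVEN, USING WHICHEVER PLAN WAS ACTIVE THAT DAY. THE LOG FORMAT AND NAMES ARE HYPOTHETICAL, NOT A DICOM RT STRUCTURE, AND REAL DOSE ACCUMULATION WOULD SUM 3D DOSE GRIDS RATHER THAN A SINGLE POINT.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class FractionEntry:
    """One scheduled fraction in a simplified (hypothetical) treatment log."""
    number: int
    plan_id: str     # plan active on that day (changes after a mid-course replan)
    delivered: bool  # False if the patient missed the fraction


def planned_total(dose_per_fraction: Dict[str, float], plan_id: str,
                  n_fractions: int) -> float:
    """What the original plan alone says the reference point will receive."""
    return dose_per_fraction[plan_id] * n_fractions


def delivered_total(dose_per_fraction: Dict[str, float],
                    log: List[FractionEntry]) -> float:
    """Sum over fractions actually given, using whichever plan was active that day."""
    return sum(dose_per_fraction[e.plan_id] for e in log if e.delivered)


def build_log(n_fractions: int, replan_at: int, missed: Set[int]) -> List[FractionEntry]:
    """Plan A before fraction `replan_at`, plan B from then on; `missed` were not given."""
    return [FractionEntry(k, "A" if k < replan_at else "B", k not in missed)
            for k in range(1, n_fractions + 1)]


if __name__ == "__main__":
    per_fraction = {"A": 2.0, "B": 1.8}  # hypothetical point doses, Gy per fraction
    log = build_log(n_fractions=35, replan_at=21, missed={10})
    print("planned:  ", planned_total(per_fraction, "A", 35))
    print("delivered:", delivered_total(per_fraction, log))
```

WITH A HYPOTHETICAL 35-FRACTION COURSE AT 2.0 GY PER FRACTION TO THE POINT, ONE MISSED FRACTION, AND A REPLAN FROM FRACTION 21 THAT DROPS THE POINT TO 1.8 GY PER FRACTION, THE PLAN SAYS 70 GY BUT THE POINT RECEIVED ABOUT 65 GY (19 X 2.0 + 15 X 1.8).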
IT'S SOMETHING -- I THINK THAT'S ALSO AN ISSUE RELATED TO VENDORS, BECAUSE THE TREATMENT DELIVERY SYSTEM IS DIFFERENT FROM THE TREATMENT PLANNING PLATFORM. I THOUGHT THAT IF THOSE WERE TALKING TO EACH OTHER, IT COULD POTENTIALLY WORK BETTER. YEAH, WE'VE HAD THE SAME ISSUE COME UP IN A FEW PROJECTS. WHAT ENDS UP HAPPENING IS YOU AVOID DOING THOSE PROJECTS UNTIL YOU REALLY HAVE TO. >> AGAIN, THE REASON I WANT TO MAP THAT BACK TO THE FEDERAL, GOVERNMENT INSTANTIATION IS THAT IF WE WANT TO GET TO BIG DATA, WE HAVE TO REALLY MARRY ROUTINE CLINICAL DATA TO THAT DATA, AND I THINK THERE'S A GAP. YOU CAN'T JUST PULL IT ELECTRONICALLY. THERE'S A LOT GOING ON TO MAKE IT SO WE CAN JUST ABSORB THE CLINICAL DATA THAT WE USE EVERY DAY. >> ROCHELLE? >> A QUESTION FOR SANJAY: FOR YOUR DISTRIBUTED LEARNING, WHAT DID YOU DO TO STANDARDIZE WHAT YOU'RE SHARING BETWEEN INSTITUTIONS? >> GOOD QUESTION. IT WAS PROBABLY NOT AS CLEAR AS IT COULD HAVE BEEN. WHAT WE DID WAS DECIDE THE MODEL ARCHITECTURE UP FRONT WITH ALL THE PARTNERS. WE SHARED THE CODE FOR THE MODEL STRUCTURE AND DEVELOPED A SCRIPT THAT EXPORTS THE WEIGHTS. IT'S THE EQUIVALENT OF RELOADING THE MODEL AND CONTINUING TRAINING AT A DIFFERENT INSTITUTION. YOU COULD DO THAT AT YOUR OWN INSTITUTION; WE STARTED DOING IT BECAUSE WE WOULD PULL OUR MODELS OFF THE SERVER WHEN IT WAS UPGRADED AND THEN RESTART THEM. RESTARTING SOMEWHERE ELSE IS THE SAME IDEA, SO WE SHARE THAT PIECE OF THE MODEL. SO YOU HAVE TO DECIDE THE MODEL ARCHITECTURE AT THE BEGINNING, AND THE WEIGHT PARAMETERS ARE EXPORTED AS MATRICES. WE DO ENCRYPTION ON BOTH SIDES TO BE EXTRA SAFE, BUT I DON'T THINK WE NEED IT, BECAUSE THERE'S NOTHING USEFUL IN THE WEIGHTS; IT'S JUST A BUNCH OF MATRICES OF NUMBERS. >> DO YOU REQUIRE THE RAW DATA TO BE IN A SIMILAR FORMAT IN ORDER TO RUN YOUR SCRIPT?
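THE SHARING STEP DESCRIBED IN THIS ANSWER (AGREE ON THE ARCHITECTURE UP FRONT, THEN EXCHANGE ONLY WEIGHT MATRICES THAT A PARTNER RELOADS TO CONTINUE TRAINING) CAN BE SKETCHED WITH THE PYTHON STANDARD LIBRARY. THE LAYER NAMES, SHAPES, JSON FORMAT, AND FUNCTION NAMES ARE HYPOTHETICAL; A REAL PIPELINE WOULD USE ITS DEEP LEARNING FRAMEWORK'S OWN CHECKPOINT FORMAT, AND THE ENCRYPTION MENTIONED IN THE TALK IS NOT SHOWN.

```python
import json
from typing import Dict, List, Tuple

Matrix = List[List[float]]

# Agreed by all partners before training starts (hypothetical layer names and shapes).
ARCHITECTURE: Dict[str, Tuple[int, int]] = {"dense_1": (4, 3), "dense_2": (3, 1)}


def _shape(m: Matrix) -> Tuple[int, int]:
    return (len(m), len(m[0]) if m else 0)


def check_architecture(weights: Dict[str, Matrix]) -> None:
    """Reject any weight set that does not match the agreed layer names and shapes."""
    if set(weights) != set(ARCHITECTURE):
        raise ValueError(f"layers {sorted(weights)} do not match the agreed architecture")
    for name, expected in ARCHITECTURE.items():
        m = weights[name]
        if any(len(row) != len(m[0]) for row in m):
            raise ValueError(f"{name} is not a rectangular matrix")
        if _shape(m) != expected:
            raise ValueError(f"{name} has shape {_shape(m)}, expected {expected}")


def export_weights(weights: Dict[str, Matrix]) -> bytes:
    """Serialize only the weight matrices; no patient data is included."""
    check_architecture(weights)
    return json.dumps(weights, sort_keys=True).encode("utf-8")


def import_weights(blob: bytes) -> Dict[str, Matrix]:
    """Reload weights received from a partner, refusing anything off-spec."""
    weights = json.loads(blob.decode("utf-8"))
    check_architecture(weights)
    return weights


if __name__ == "__main__":
    local = {"dense_1": [[0.1, 0.2, 0.3] for _ in range(4)],
             "dense_2": [[1.0], [2.0], [3.0]]}
    payload = export_weights(local)      # what would be sent to the next site
    reloaded = import_weights(payload)   # the partner resumes training from these
    print(reloaded == local)
```

REJECTING ANY PAYLOAD WHOSE LAYER NAMES OR SHAPES DIFFER FROM THE AGREED ARCHITECTURE IS WHAT LETS EACH SITE TREAT A RECEIVED FILE AS A DROP-IN REPLACEMENT FOR ITS OWN WEIGHTS.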
>> OH, YEAH. I WILL SAY WE DO USE DICOM DATA; WE'VE ONLY SHOWN THIS IN A MODEL WHERE WE HAVE A COMMON DATA SOURCE. I DO THINK IT WOULD BE DIFFICULT TO SCALE THIS TO SOMETHING LIKE EHR DATA, WHERE EVERYONE IS DOING EVERYTHING DIFFERENTLY, BUT WITH DICOM IT WORKS. >> ONE MORE QUESTION; THE PANELISTS WILL BE AVAILABLE AT THE BREAK. >> AS WE'VE CONSTRUCTED OUR SYSTEM FOR PULLING OUR DATA, LIKE YOU, WE FIND STANDARDIZATION IS REALLY IMPORTANT, SO WE HAVE TO GET BUY-IN FROM CLINICIANS TO CHANGE THEIR PROCESSES AND ENTER THE DATA IN A STANDARDIZED FASHION. EVEN AS WE START TO PRESENT THAT DATA, WE RUN INTO A BIT OF A CONUNDRUM. WE'RE ALL ENTHUSIASTIC IN THIS ROOM, BUT AMONG THE PEOPLE WE SHOW DATA TO, I FIND THE PERSPECTIVE THAT REAL SCIENCE, SCIENCE THAT GETS FUNDED, FOLLOWS A CLINICAL TRIAL MODEL, WHEREAS THE DATA WE'RE STARTING TO PULL FORWARD AND SEE VALUE IN IS MORE OBSERVATIONAL. HOW DO WE GET OVER THAT HUMP, EDUCATING THE PEOPLE AROUND US AND CONVINCING FUNDING SOURCES THAT THIS IS IN FACT VALUABLE, SO THAT WE'RE NOT THE ONLY ENTHUSIASTS FOR THIS KIND OF DATA AND IT IS REGARDED AS REAL SCIENCE? >> WELL, RANDOMIZED TRIALS AND THIS KIND OF DATA ARE COMPLEMENTARY: EXTERNAL VERSUS INTERNAL VALIDITY, RIGHT? RANDOMIZED EXPERIMENTAL DESIGN IS REALLY DEALING WITH INTERNAL VALIDITY. >> YEAH. >> IT'S BEEN VERY WELL SHOWN IN A NUMBER OF PAPERS THAT THE WHITE ELEPHANT OF RANDOMIZED TRIAL DESIGN IS THAT IT'S GREAT FOR THAT POPULATION, BUT IT'S NOT GENERALIZABLE. TO ME, IT'S NOT ONE OR THE OTHER; IT'S BOTH. REALLY, THE QUESTION IS WHETHER YOU HAVE A GENERALIZABLE DATASET TO SHOW THE VALUE OF THE INTERVENTION. I THINK WE'RE STILL EARLY IN THE CLINICAL UNDERSTANDING OF THE VALUE OF THIS APPROACH. REALLY, WE'RE VERY EARLY. WE'VE MET A LOT OF SKEPTICISM ABOUT THIS DATASET. I THINK THOSE IN THIS ROOM ARE CONVERTS; THOSE OUTSIDE ARE NOT YET. YOU'RE ABSOLUTELY RIGHT. IT TAKES TIME.
>> THE TIDE IS SHIFTING TOWARD MORE ACCEPTANCE OF THIS KIND OF DATA IN THE OBSERVATIONAL FORMAT, AS REAL-WORLD DATA, BUT I AGREE IT'S A STRUGGLE, FOR EXAMPLE WHEN YOU DEVELOP AN ALGORITHM WITH YOUR HOME INSTITUTION. THAT WILL CHANGE WITH THE FDA, AND IT'S ALREADY CHANGING. FLATIRON PRESENTED INTERESTING DATA SUGGESTING THAT, IN THE REAL WORLD, THINGS AREN'T AS EFFECTIVE, AND THAT MAY CHANGE THE CULTURE AS WE MOVE FORWARD. YEAH, I AGREE. >> THE WHITE PAPER SUGGESTS THERE'S OPENNESS TO THINKING ABOUT ALTERNATIVE WAYS OUTSIDE OF THE RCT, WHICH A LOT OF PEOPLE NOW ACCEPT IS REALLY NOT THE ONLY WAY TO ACCELERATE APPROVALS IN THE DRUG SPACE, AND WE SHOULD SIMILARLY BE THINKING ABOUT STRATEGIES TO DO THAT WITHIN OUR FIELD. >> ALSO, IF WE'RE TALKING PURELY ABOUT FUNDING, IT MIGHT BE WORTHWHILE TO EXPLORE AVENUES BESIDES THE TRADITIONAL NIH; OTHER FUNDERS ARE INTERESTED IN LEVERAGING REAL-WORLD EVIDENCE GENERATION. SOME OF WHAT WE'VE GOTTEN FUNDED IS THE IDEA THAT WE'LL BUILD THE PRODUCT: OUR DATA LAKE WAS FUNDED FROM AN ONC GRANT, AND YOU CAN USE THE MONEY HOWEVER YOU WANT. THE NIH IS THE NEXT STEP, BUT FOR BUILDING THE TOOLS, THEY'VE STARTED FUNDING THOSE ASPECTS. >> WE'LL TAKE A 10-MINUTE BREAK AND START THE SECOND SESSION. >> THE SECOND SESSION BUILDS UPON INFRASTRUCTURE TO TALK ABOUT REAL-WORLD DATA AND EVIDENCE GENERATION. WE'RE GENERATING ALL THIS UNSTRUCTURED AND STRUCTURED DATA, OFTENTIMES FROM PATIENTS WHO ARE NOT ENROLLED IN CLINICAL TRIALS OR OTHER PROTOCOLS, AND I THINK THAT PIECE, REAL-WORLD DATA AND DATA SOURCES, IS AN ALL-ENCOMPASSING TERM. THE TERMS RWD AND RWE GET THROWN AROUND A LOT, AND IT'S IMPORTANT TO UNDERSTAND THAT THE DATA AND THE EVIDENCE WE GENERATE FROM THE DATA ARE TWO DISTINCT PIECES, SO WE'LL BUILD UPON THAT THEME IN THIS SESSION. I'LL INVITE OUR FIRST SPEAKER, DR.
ADAM DICKER, DIRECTOR OF THE JEFFERSON CENTER FOR DIGITAL HEALTH AND DATA SCIENCE AT THE SIDNEY KIMMEL CANCER CENTER, THOMAS JEFFERSON UNIVERSITY. DR. DICKER. >> APPRECIATE IT. I'M GOING TO TRY TO BUILD ON EVERYTHING YOU HEARD FROM SANJAY, FROM HARRY, AND FROM OTHERS. MY SLIDES ARE AVAILABLE TO ANYONE WHO WANTS THEM; SEND ME AN E-MAIL OR GIVE ME YOUR CARD AND I'M HAPPY TO GIVE YOU A COPY. THESE ARE MY DISCLOSURES. I JUST WANT TO HAVE AN ATTITUDE OF GRATITUDE. I'VE BEEN PRACTICING FOR ABOUT 25 YEARS, AND I'M STILL PRACTICING, AND MANY PEOPLE ALONG THE WAY HAVE BEEN VERY HELPFUL: A LOT OF PARTNERSHIPS IN THE NOT-FOR-PROFIT SPACE, THE FOR-PROFIT SPACE, AND THE FEDERAL GOVERNMENT HAVE HELPED ME AND ENABLED THIS WORK. I'M NOT GOING TO SHOW EVERYTHING THAT RELATES TO THIS SLIDE. MY PERSPECTIVE IN NRG RELATES TO TRANSLATIONAL SCIENCE, WORKING ON THE TRANSCRIPTOMIC END OF THINGS. WE AND MANY OTHERS HAVE COLLABORATED AND POOLED OUR DATA, WORKED WITH A COMPANY, AND WORKED WITH THE DIFFERENT DATASETS WE HAD AT LOCAL INSTITUTIONS, AND NOW WE'RE USING THAT WORK TO DRIVE NRG TRIALS. THIS WORK GOT FDA APPROVAL AND CMS APPROVAL. I HAVE NO FINANCIAL INTERESTS IN THE COMPANY, BUT IT'S TAUGHT ME A LOT. I SERVE ON THE NCI CORRELATIVE SCIENTIST -- I CO-CHAIR THE CTAC COMMITTEE. I'M ON THE INVESTIGATIONAL DRUG STEERING COMMITTEE. SO I HAVE A COUPLE OF DIFFERENT PERSPECTIVES. THERE ARE DIFFERENT STAKEHOLDERS IN THE ROOM, AND I'M APPRECIATIVE OF THE IDEA THAT WE HAVE TO BUILD A COMMUNITY; I'M VERY SUPPORTIVE OF THAT CONCEPT. I'M GOING TO TALK ABOUT CURRENT-STATE ISSUES, SO EVERYONE CAN THINK ABOUT HOW TO NAVIGATE THE SYSTEM. I'LL TALK A LITTLE ABOUT EDUCATION, ACROSS WHAT I SEE IN THE EDUCATIONAL SPACE, AND ABOUT SOME UNMET NEEDS. YOU'VE SEEN THIS SLIDE BEFORE: THE COOPERATIVE GROUPS, OTHERWISE KNOWN AS THE NCTN. THERE WAS AN EFFORT TO CONSOLIDATE.
THERE WAS A BIT OF MUSICAL CHAIRS, FROM TEN GROUPS TO FOUR. I CAME FROM RTOG, WHERE I CHAIRED THE TRANSLATIONAL RESEARCH PROGRAM, AND BECAUSE OF THAT LEGACY ASSOCIATION I BECAME PART OF NRG. MIKE AND MATT ARE MY CO-CHAIRS FOR TRANSLATIONAL SCIENCE. TWO YEARS AGO, WHEN THE GRANT WAS GOING IN, I FELT WE NEEDED TO PIVOT TO THE DIGITAL HEALTH SPACE, AND, LIKE SANJAY MENTIONED, I SAID WE NEED TO CREATE THIS COMMITTEE. WALLY CURRAN WAS ALL FOR IT, I WROTE MATERIAL FOR THE GRANT, FORTUNATELY IT SCORED WELL, AND I'VE BEEN WORKING WITH DON AT SWOG ON DIGITAL HEALTH AND WITH SOME OF THE OTHER GROUPS. SO THIS IS THE BIG PICTURE, RIGHT? BUT THERE ARE A LOT OF WORKING PIECES IN THIS PICTURE. THERE ARE ALSO SEPARATE GRANTS: THE BIOREPOSITORY, AND DATA STREAMS, WHETHER TRANSCRIPTOMIC DATA STREAMS OR THINGS THAT WORK WITH THE CIMACS IN THE IMMUNO-ONCOLOGY SPACE. THIS IS THE DATA FLOW. IT LOOKS MESSY BECAUSE IT IS A LITTLE MESSY, RIGHT? SHARON HEARTSEN FROM NRG DID THIS FOR ME THE OTHER DAY. THERE'S STUFF THAT GOES INTO MEDIDATA RAVE, AND THERE'S THE STUDY TEAM WORKING. THE COOPERATIVE GROUPS ARE MACHINES TO RUN TRIALS, RIGHT? THAT'S THEIR MAJOR ROLE IN LIFE. THERE ARE THE BIOSPECIMENS AND BIOREPOSITORIES. I'M HAPPY TO GIVE THIS TO ANYBODY. THEN THERE'S IROC, WHICH I'LL TOUCH ON, AND AMY IS HERE, SO MAYBE WE'LL TALK MORE ABOUT IROC. THIS ALSO FLOWS TO OTHER DATA REPOSITORIES, SOME OF WHICH YOU HEARD ABOUT TODAY. I WANT TO GIVE A PLUG TO THE FOLKS AT PROJECT DATA SPHERE; THERE ARE A FEW HERE TODAY. IT'S AN INCREDIBLE RESOURCE. SOME OF THE NCTN DATA FLOWS IN, AND HOPEFULLY MORE WILL; THERE ARE SOME MONEY ISSUES WHOSE RESOLUTION WOULD ENABLE MORE TO FLOW IN. IN TERMS OF WHAT FLOWS IN, IT'S OUTCOMES AND PRO DATA; NOT AS MUCH IMAGING DATA IS PART OF THAT. I'LL LET THE DATA SPHERE FOLKS TALK ABOUT THAT. IT'S AN INCREDIBLE RESOURCE, WITH A GREAT TEAM BEHIND IT.
A LOT OF THE RTOG AND NRG DATA IS IN DATA SPHERE, BUT NOT AS MUCH AS WE WOULD LIKE. AND HOW ARE WE ORGANIZED? WELL, WE'RE THREE LEGACY GROUPS. WE WERE QUASI-DYSFUNCTIONAL BEFORE; NOW WE'RE MORE DYSFUNCTIONAL AS WE'VE COME TOGETHER. WE'RE STILL WORKING OUT A LOT OF THESE ISSUES BECAUSE WE HAVE A LOT OF DUPLICATIVE STUFF. WE'RE NOT HARMONIZED, AND THE TRANSLATIONAL SCIENCE THAT I LEAD IS PART OF THE NON-DISEASE-SITE COMMITTEE STRUCTURE. BUT IT'S HARD, RIGHT? BECAUSE OF ALL THESE LEGACY SYSTEMS THAT HAVE BEEN IN PLACE. BUT ONE THING WE DO, AND I APPRECIATE THIS, IS RECOGNIZE THAT WE NEED TO TREAT IMAGING AS A BIOMARKER. IT SOUNDS OBVIOUS, RIGHT? A LOT OF WHAT WE DO IS EDUCATIONAL. AT THE MOST RECENT MEETING, LESS THAN A WEEK AGO, I FOCUSED ON RADIOMICS AS PART OF OUR SESSION. SANJAY, WHO IS HERE TODAY, SPOKE, AS DID A FORMER TRAINEE FROM JEFFERSON WHO WORKS IN GLIOBLASTOMA, AND OTHERS WORKING IN THE BREAST CANCER AND LUNG CANCER SPACES. IT WAS WONDERFUL. WE NEED TO DO MORE. IN THE DIGITAL HEALTH SPACE, WE TOOK ADVANTAGE OF SANJAY AGAIN, WHO TALKED MORE ABOUT A.I. AND VOICE, WHICH IS ANOTHER DATA STREAM WE COULD POTENTIALLY TALK ABOUT LATER TODAY; OF JULIAN HONG; AND OF A PATCH THAT STREAMS TEMPERATURE REMOTELY OVER BLUETOOTH. WHEN YOU THINK ABOUT HOSPITAL IN THE HOME, AND HOW YOU MOVE CAR T-CELL THERAPY FROM INPATIENT TO OUTPATIENT, THESE TYPES OF TECHNOLOGIES CAN BE HELPFUL. WE'RE INTERESTED IN BRINGING THIS TYPE OF DATA SCIENCE AND INFORMATICS INTO THE COOPERATIVE GROUP; THIS IS ONE ATTEMPT OF MANY. I WANT TO GO BACK TO IROC. DAVID FOLLOWILL, THE P.I. OF THE GRANT, WAS GENEROUS WITH THESE SLIDES. SO THERE'S YET ANOTHER GRANT THAT SUPPORTS A LOT OF THIS WORK; IT'S DISTRIBUTED AROUND THE COUNTRY, AND YING IS HERE FROM PHILADELPHIA. IROC DOES A LOT OF THINGS.
THEY DO CREDENTIALING WORK, AND THEY SHIP ANTHROPOMORPHIC PHANTOMS SO YOU CAN FIND OUT WHETHER YOU'RE GIVING THE DOSE YOU THINK YOU'RE GIVING. THAT'S IMPORTANT IF YOU WANT TO CREDENTIAL SITES TO DELIVER SOPHISTICATED RADIATION THERAPY. EVEN FOR UNSOPHISTICATED RADIATION THERAPY, WE PUBLISHED A PAPER SHOWING THAT MAJOR AND MINOR DEVIATIONS IN CLINICAL TRIALS HAVE A SIGNIFICANT IMPACT ON SURVIVAL, AND THIS WAS IN THE 2D ERA, IN PEDIATRIC, LUNG CANCER, AND PANCREAS TRIALS. THERE WERE MAJOR AND MINOR DEVIATIONS SUCH AS NOT GIVING ADEQUATE DISTANCE FROM THE VERTEBRAL BODY WHEN TREATING KIDS. AS YOU USE NON-CO-PLANAR BEAMS AND OTHER SOPHISTICATED APPROACHES, IT'S ALL THE MORE IMPORTANT TO CREDENTIAL SITES. THIS IS THEIR FLOW, RIGHT? SO WHEN YOU THINK ABOUT WHERE THE DATA IS GOING, THERE'S ALL SORTS OF IMAGING DATA. IROC HAS ALL THE CT, DOSIMETRY, AND TREATMENT PLANNING DATA FROM BEFORE TREATMENT. WHAT THEY DON'T HAVE IS ALL THE POST-TREATMENT DATA, RIGHT? THE STANDARD OF CARE WHEN YOU TREAT A PATIENT WITH GBM IS IMAGING EVERY TWO MONTHS; WITH LUNG CANCER, MAYBE EVERY THREE TO FOUR MONTHS, DEPENDING ON THE SITUATION. SO THERE'S A VAST AMOUNT OF POST-TREATMENT DATA THAT WE DON'T HAVE. THEN THERE'S THE DATA WHEN A PATIENT IS GETTING CONE-BEAM CT: EACH SITE IS GENERATING HUGE AMOUNTS OF DATA AS PART OF THE PATIENT'S JOURNEY, JUST FROM GETTING RADIATION. SO THERE'S TONS OF DATA WE WOULD LIKE TO CAPTURE THAT WE'RE NOT CAPTURING RIGHT NOW. A COMMENT ON EDUCATION: I WAS THINKING ABOUT AN R25 GRANT, ON HOW YOU EDUCATE PEOPLE IN THIS SPACE. WHEN I SPOKE TO THE WOMAN IN CHARGE OF THAT PROGRAM, SHE ASKED WHAT THE UNMET NEED WAS. I SAID I THINK THERE IS AN UNMET NEED, AND THAT PROMPTED US TO WORK ON IT WITH HELP FROM PEOPLE AT ASTRO; TYLER BECK, JUDY KEENE, AND KENT FROM DANA-FARBER TOOK THE LEAD ON THIS. BASICALLY, WE WENT TO DEPARTMENT CHAIRS AND ASKED WHETHER TRAINING IN INFORMATICS AND GENOMICS, WHICH WE THREW INTO THE SAME BUCKET, WAS SUFFICIENT. THE ANSWER WAS NO.
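THE PHANTOM CHECK DESCRIBED ABOVE COMES DOWN TO COMPARING WHAT A SITE'S PLAN PREDICTS WITH WHAT DOSIMETERS INSIDE THE PHANTOM MEASURE. A MINIMAL SKETCH, ASSUMING POINT MEASUREMENTS AND A SIMPLE PERCENT-DIFFERENCE CRITERION; THE TOLERANCE IS A PARAMETER, THE 7% USED IN THE EXAMPLE IS A HYPOTHETICAL VALUE RATHER THAN IROC'S ACCEPTANCE CRITERIA, AND REAL CREDENTIALING MAY ADD CHECKS (FOR EXAMPLE, SPATIAL ONES) THAT ARE NOT SHOWN. THE FUNCTION AND LOCATION NAMES ARE ALSO HYPOTHETICAL.

```python
from typing import Dict, List


def percent_difference(measured: float, planned: float) -> float:
    """Signed deviation of a measured dose from the planned dose, in percent of planned."""
    if planned <= 0:
        raise ValueError("planned dose must be positive")
    return 100.0 * (measured - planned) / planned


def phantom_check(measured: Dict[str, float], planned: Dict[str, float],
                  tolerance_pct: float) -> dict:
    """Compare dosimeter readings with planned doses at the same phantom locations.

    tolerance_pct is a hypothetical acceptance threshold supplied by the caller.
    """
    missing = set(planned) - set(measured)
    if missing:
        raise ValueError(f"no measurement for: {sorted(missing)}")
    deviations = {loc: percent_difference(measured[loc], planned[loc]) for loc in planned}
    failures: List[str] = [loc for loc, dev in deviations.items() if abs(dev) > tolerance_pct]
    return {"deviations": deviations, "failures": failures, "passed": not failures}


if __name__ == "__main__":
    result = phantom_check(
        measured={"target_primary": 6.12, "target_secondary": 4.5},
        planned={"target_primary": 6.0, "target_secondary": 5.0},
        tolerance_pct=7.0,  # hypothetical tolerance for illustration only
    )
    print(result)
```

FOR EXAMPLE, READINGS OF 6.12 GY AND 4.5 GY AGAINST PLANNED VALUES OF 6.0 GY AND 5.0 GY GIVE DEVIATIONS OF ABOUT +2% AND -10%, SO WITH A 7% TOLERANCE ONLY THE SECOND LOCATION FAILS.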
WHEN YOU ASK THE RESIDENTS, DEPENDING ON WHERE THEY ARE IN THEIR TRAINING, MOST AGREE THAT IT'S EITHER INSUFFICIENT OR ABSOLUTELY INSUFFICIENT; WHO CARES HOW YOU LABEL IT, THERE'S A SIGNIFICANT UNMET NEED. WILL IT HELP ADVANCE YOUR CAREER? THESE ARE PEOPLE MORE FOCUSED ON RESEARCH, AND IT'S WEIGHTED TOWARD THOSE PEOPLE, BUT WE DIDN'T HAVE A SELECTION BIAS: IT WAS GIVEN TO ALL RADIATION ONCOLOGY RESIDENTS CURRENTLY IN TRAINING. THE OVERWHELMING MESSAGE IS THAT THERE'S AN UNMET NEED, AND WE'RE THIRSTY AND HUNGRY. A LOT OF IT HAS TO DO WITH FUTURE PROJECTS THEY'D LIKE TO TACKLE. ARE YOU INTERESTED IN TRAINING PROGRAMS AND COURSES? THE ANSWER IS YES, WE'RE INTERESTED. THEN WE ASKED THE CHAIRS, WOULD YOU SEND YOUR PEOPLE TO THIS? THE ANSWER WAS YES. I KNOW IT SOUNDS LIKE A SETUP, BUT SOMEONE HAD TO DOCUMENT THAT THERE'S A NEED OUT THERE, SO WE DID, AND IT WAS PUBLISHED IN JCO CLINICAL CANCER INFORMATICS. AND I'M JUST GOING TO GO WITH THAT. I THINK WHAT YOU'RE HEARING IS THAT RIGHT NOW THERE ARE SILOS, AND SILOS WITHIN THE SILOS, IN TERMS OF HOW PEOPLE VIEW LIFE. EVERYONE NEEDS AN IDENTITY, WHETHER YOU COME FROM A DESIGN BACKGROUND, A HEALTH CARE BACKGROUND, A TECHNOLOGY BACKGROUND, OR A MATH AND DATA SCIENCE BACKGROUND. THE GOAL FOR EVERYONE IN THIS ROOM, WITHOUT BEING TOO PRESUMPTUOUS, IS TO CREATE A CROSS-TRAINED INDIVIDUAL, RIGHT? LAST WEEK I HAD A CONVERSATION WITH DENNIS, AN M.D. AND EXECUTIVE AT MICROSOFT WHO IS RETIRING. HE SAID THAT, AS MICROSOFT SEES IT, SOME INSTITUTIONS ARE A.I. READY AND SOME AREN'T -- AND NOW THEY HAVE A BIT OF SKIN IN THE GAME. THE INSTITUTIONS THAT ARE A.I. READY HAVE AN ADVANTAGE OVER THOSE THAT AREN'T, RIGHT? I THINK WE WANT TO CREATE A FUTURE STATE WITH CROSS-TRAINING, AND THAT'S NOT SO TRIVIAL, RIGHT?
A FEW YEARS AGO AT JEFFERSON, WHILE I WAS NAVIGATING AND SELF-LEARNING ABOUT DIGITAL HEALTH, WE CREATED A DIGITAL HEALTH AND DATA SCIENCE CENTER AND STARTED A COURSE IN THE MEDICAL SCHOOL, RIGHT? WE GIVE MED STUDENTS FITBITS, AGGREGATE THE DATA ANONYMOUSLY, AND TEACH THEM HOW TO ANALYZE IT. HARRY TALKED ABOUT BEHAVIORAL ECONOMICS, RIGHT? BEHAVIOR CHANGE IS SO IMPORTANT IN WHAT WE DO, BUT HOW DO YOU DO IT? IT'S A SURVEY COURSE THAT TEACHES THEM ALL DIFFERENT APPROACHES; ANNA MARIE CHANG HELPED ME WITH IT, AND THE STUDENTS ARE INCREDIBLE. YOU GET MEDICAL STUDENTS WHO DO GREAT PROJECTS, SOME RELATED TO WHAT WE'RE DISCUSSING HERE, INCLUDING ALL SORTS OF NEURAL NETWORK CLASSIFICATION SCHEMES. I HAVEN'T BEEN ABLE TO CONVINCE ANYONE TO GO INTO RADIATION ONCOLOGY YET, BUT I'M REALLY SUBTLE. IT TELLS YOU THERE'S INTEREST AND HUNGER AMONG THE UNDERGRADUATE MEDICAL POPULATION. AT THE GRADUATE LEVEL WE RECOGNIZE THESE NEEDS TOO. WE CREATED OFFERINGS, ROLLING OUT THIS SEPTEMBER, FOR PEOPLE INTERESTED IN DIGITAL HEALTH ENTREPRENEURSHIP, DESIGN AND COMMUNICATION, AND BLOCKCHAIN, WHICH IS OUR MOST POPULAR COURSE RIGHT NOW. AND WE WANT TO HAVE ONE FOR A.I. IN FACT, WHAT WE'RE REALLY GEARING UP TO DO IS OFFER A MASTER'S DEGREE IN HEALTH CARE DELIVERY SCIENCE WITH A MINOR IN A.I., BECAUSE I THINK THAT'S WHAT A LOT OF THE PEOPLE IN THE ROOM ARE TOUCHING ON, RIGHT? WHEN I TALK ABOUT TRAINEES, I COULD BE A PHYSICIAN-EDUCATOR, A PHYSICIAN-SCIENTIST OR A PHYSICIAN-CLINICAL RESEARCHER, BUT NOW IT'S WIDE OPEN. YOU COULD BE A LOT OF THINGS, AND WE'RE FOCUSED HERE ON THIS ONE. HOW DO YOU PROMOTE THAT? SANJAY TALKED ABOUT FEDERATED MODELS. I THINK WE NEED HELP WITH FEDERATED MODELS IN THE NCTN, AND WITH THE GOVERNMENT'S HELP, FIGURING OUT HOW WE MAKE THAT DATA AVAILABLE, ESPECIALLY IN THE POST-TREATMENT SETTING, BECAUSE THERE'S A HUGE UNMET NEED THERE. SO A FEDERATED MODEL MIGHT BE AN APPROACH. I'M NOT PROPOSING IT'S THE APPROACH, RIGHT?
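A FEDERATED MODEL, AS RAISED ABOVE, LETS EACH SITE TRAIN LOCALLY AND SHARE ONLY MODEL PARAMETERS RATHER THAN PATIENT RECORDS. THE MINIMAL SKETCH BELOW SHOWS ONLY THE AGGREGATION STEP, USING THE FEDERATED AVERAGING (FEDAVG) RULE OF WEIGHTING EACH SITE BY ITS PATIENT COUNT. THE FUNCTION NAME, THE THREE SITES AND ALL THEIR NUMBERS ARE HYPOTHETICAL; NOTHING HERE DESCRIBES HOW THE NCTN OR ANY SPEAKER'S GROUP WOULD ACTUALLY IMPLEMENT IT.

```python
import numpy as np

def federated_average(site_params, site_counts):
    """Weighted average of locally trained parameter vectors (FedAvg aggregation step).

    Only parameter vectors and patient counts leave each site;
    no patient-level records are shared.
    """
    counts = np.asarray(site_counts, dtype=float)
    stacked = np.stack([np.asarray(p, dtype=float) for p in site_params])
    # Each site's parameters are weighted by its share of all patients.
    return (stacked * counts[:, None]).sum(axis=0) / counts.sum()

# Three hypothetical sites, each with a two-parameter model trained locally.
site_params = [[0.2, 1.0], [0.4, 0.0], [0.6, 2.0]]
site_counts = [100, 300, 100]
global_params = federated_average(site_params, site_counts)
print(global_params)
```

A REAL DEPLOYMENT WOULD REPEAT THIS OVER MANY ROUNDS, WITH EACH SITE RETRAINING FROM THE UPDATED SHARED PARAMETERS; WEIGHTING BY PATIENT COUNT KEEPS A SMALL SITE FROM PULLING THE SHARED MODEL AS HARD AS A LARGE ONE.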
I THINK WE HAVE DIFFERENT DATA STREAMS, AND NOW WE'RE DOING TRIALS WHERE WE'RE COLLECTING DATA STREAMS FROM WEARABLES AND OTHER SOURCES. IN THE PRE-TREATMENT SPACE, ALMOST ALL OUR GU TRIALS INCORPORATE SOME GENOMIC CLASSIFIER FOR CLASSIFICATION PURPOSES. WHERE WE'D LIKE TO GO, AND WE'RE WORKING ON A CLINICAL TRIAL ALONG THESE LINES, IS USING RADIOMICS TO GUIDE NEUROSURGEONS ON WHICH PATIENTS WITH GBM GO FOR RESECTION. WE'D LIKE TO SEE MORE CLINICAL TRIALS DRIVEN BY RADIOMICS FOR PATIENT STRATIFICATION, AS WE HAVE WITH TRANSCRIPTOMICS RIGHT NOW. SO IN CONCLUSION, I THINK WE NEED GRANTS FOR EDUCATION. I THINK WE NEED HANDS-ON WORKSHOPS. I THINK WE NEED TO BUILD THE COMMUNITY, BECAUSE YOU NEED A MULTIDISCIPLINARY TEAM TO DO THE WORK; NO ONE TYPE OF INDIVIDUAL CAN DO IT. AT THE UNDERGRADUATE LEVEL, THE GRADUATE LEVEL, THE PHYSICS LEVEL AND IN ACGME TRAINING, THIS TYPE OF TRAINING DOES NOT EXIST. I MENTIONED THAT IN CHINA THERE'S NO SUCH THING AS A DOSIMETRIST AND NO SUCH THING AS CERTIFICATION OF A PHYSICIST, AND AROUND THE WORLD, NRG AND THE OTHER GROUPS CONDUCT TRIALS WITH SITES IN IRELAND, KOREA, ISRAEL AND OTHER PARTS OF THE MIDDLE EAST. SO WE'RE NOT JUST A NATIONAL ORGANIZATION, BUT THE EDUCATION IS NOT KEEPING UP. WE NEED HELP. I TALKED ABOUT THE POST-TREATMENT DATA, AND I'M VERY APPRECIATIVE OF THE COMMUNITY. LASTLY, JUST TO RIFF ON ERIC TOPOL: THE GOAL IS NOT TO DECREASE EMPATHY. THE GOAL IS THAT A LOT OF THE REDUNDANT, TIME-CONSUMING WORK IS TAKEN OVER BY A MORE EFFICIENT WAY OF DOING THINGS, INCREASING EMPATHY, EFFICIENCY AND EQUITY IN DELIVERING HEALTH CARE TO ALL. WITH THAT, I THANK YOU. [APPLAUSE] >> I'D LIKE TO BRING UP THE NEXT SPEAKER, DR. CLARA LAM, WHO IS GOING TO TALK ABOUT THE NCI SEER PROGRAM AND INCORPORATING RADIATION ONCOLOGY DATA. I HAD THE PLEASURE OF BEING INVOLVED IN THIS PROJECT, AND IT'S DOING GREAT THINGS. CLARA? >> THANK YOU. GOOD MORNING. I STILL THINK IT'S MORNING.
I'M CLARA LAM, SPEAKING ON BEHALF OF SEVERAL PEOPLE IN THE SEER PROGRAM AND ON THE RADIATION ONCOLOGY TEAM WHO ARE PUTTING IN THE EFFORT TO GET SOME LINKAGES GOING. THE FUN OBJECTIVES SLIDE: A BRIEF BACKGROUND ON SEER, THEN OUR CAPACITY IN SEER AND HOW WE'RE TRYING TO EXPAND IT, SOME NEW INITIATIVES WE'RE WORKING ON, AND VERY PRELIMINARY RESULTS THAT SHOW SOME PROMISE FOR HOW WE WANT TO ENHANCE OUR DATA. THE SEER PROGRAM HAS BEEN FUNDED BY NCI SINCE 1973. WE PROVIDE FUNDING AND SUPPORT TO REGISTRIES THAT COLLECT DATA ON THE DIAGNOSIS AND TREATMENT OF CANCER, AND WE COVER 35% OF THE U.S. POPULATION. WITH THESE REGISTRIES WE BRING IN JUST OVER HALF A MILLION NEW INCIDENT CASES ANNUALLY. ABOUT 80% OF THESE CASES HAVE REAL-TIME E-PATH REPORTING, WHICH FACILITATES RAPID CASE IDENTIFICATION AND SUPPORTS RESEARCH. WE'RE WORKING TOWARD HAVING ALL REGISTRIES USE A COMMON DATA PLATFORM, SEER DMS, WHICH PROVIDES A LOT OF CENTRALIZATION AND HELPS EXTERNAL PARTNERS AND OTHERS WHO WANT TO LINK IN. AND OF COURSE IT FACILITATES SCALING NEW INITIATIVES ACROSS ALL REGISTRIES SIMULTANEOUSLY, SO WE CAN HAVE MORE NATIONAL IMPACT. DATA WE ROUTINELY INCLUDE NOW ARE PATIENT DEMOGRAPHIC INFORMATION, GEOSPATIAL DATA AND TREATMENT, WITH THE CAVEAT THAT THIS IS FIRST-COURSE TREATMENT ONLY AS OF RIGHT NOW, AND OF COURSE INFORMATION ON SURVIVAL AND CAUSE OF DEATH. THERE'S A LOT OF INFORMATION ON TUMOR CHARACTERISTICS, WHICH WE CONSOLIDATE FROM CLINICAL, IMAGING AND PATHOLOGY REPORTS. WE'RE TRYING TO COLLECT 32 BIOMARKERS RIGHT NOW; I'VE GIVEN A QUICK SAMPLING OF THE ONES WE GET ASKED ABOUT MOST, SUCH AS ER, PR AND HER-2 FOR BREAST, KRAS, AND CA 125 FOR OVARY. ONE REMINDER, WHICH MY ASSOCIATE DIRECTOR ASKED ME TO INCLUDE: REPORTING TO STATE CANCER REGISTRIES IS HIPAA EXEMPT, AND REGISTRIES ARE REQUIRED TO MAINTAIN PII FOR LINKAGE AND FOLLOW-UP.
THE REGISTRIES HAVE TO REPORT TO US, AND THEY ARE LEGALLY PERMITTED TO COLLECT DATA FROM HEALTH CARE PROVIDERS ON THE PATIENT'S CANCER, ITS TREATMENT AND OUTCOMES. NOW, THE TERM HEALTH CARE PROVIDER CAN MEAN A LOT OF THINGS: SOME STATES ARE MORE LIBERAL AND SOME ARE MORE STRICT ABOUT THE DEFINITION. IN ALL 50 STATES, ANYONE WHO IS A HEALTH CARE PROVIDER HAS TO REPORT TO THEIR STATE, AND THE DATA EVENTUALLY COME BACK TO US. AS FOR THE VALUE OF SEER DATA IN THE REAL WORLD, THERE ARE OVER 9,300 PUBLICATIONS BASED ON SEER DATA, AND OVER 50,000 CITING OR USING SEER AS A REFERENCE IN THE INTRODUCTION OR AS A REFERENCE BENCHMARK. IF YOU DO A QUICK PubMed SEARCH, IT'S FUN TO SEE HOW MANY TIMES SEER IS MENTIONED IN THE INTRODUCTION OF A PAPER. IN 2018 THERE WERE 49 GRANTS, WORTH $31 MILLION, BASED ON SEER DATA USE, AND 96 GRANTS, WORTH OVER $24 MILLION, USING SEER AS A REFERENCE BENCHMARK. THE PUBLIC USE DATA SET IS DOWNLOADED 5,000 TIMES EVERY YEAR, AND IT'S USED 365 DAYS A YEAR, 24 HOURS A DAY. THIS DATA IS FREE, PUBLICLY ACCESSIBLE AND OUT THERE. I WANT TO GIVE A LITTLE BIT OF CHEER FOR REGISTRIES AND THEIR DATA, WHICH ARE VALUABLE FOR A LOT OF REASONS. THEY REPRESENT DATA ON ALL CANCER PATIENTS IN DEFINED GEOGRAPHIC AREAS, NOT JUST FROM ONE CANCER CENTER OR HOSPITAL SYSTEM, SO WE GET INFORMATION FROM BOTH COMMUNITY AND ACADEMIC CANCER CENTERS. I CAN'T STRESS ENOUGH HOW IMPORTANT THIS IS: PATIENTS FROM A SINGLE CENTER OR EMR ARE A NONRANDOM SET, AND DATA FROM THEM DON'T NECESSARILY REFLECT WHAT'S REALLY HAPPENING FOR THE 85% OF PATIENTS WHO AREN'T TREATED AT NCI-DESIGNATED CANCER CENTERS. ON AVERAGE WE HAVE FOUR RECORDS FOR EVERY CANCER CASE WE COLLECT. THIS INFORMATION CAN COME FROM HOSPITAL ABSTRACTS, PHYSICIAN REPORTS, THE PATHOLOGY REPORTS I MENTIONED EARLIER, AND DEATH CERTIFICATES FOR FOLLOW-UP INFORMATION.
WE ALSO HAVE ADDITIONAL SOURCES, SUCH AS REAL-TIME DATA FEEDS FROM PHARMACIES AND ONCOLOGY PRACTICES. SO IT'S NOT JUST ONE RECORD PER CANCER CASE; WE COLLECT SEVERAL AND CONSOLIDATE THEM TO MAKE SURE WE IDENTIFY THAT ONE CANCER CASE. WE DO ACTIVE MONITORING OF PATIENTS FROM DIAGNOSIS UNTIL DEATH. A LOT OF DATA SOURCES LACK OUTCOME DATA. NOT TO PICK ON TCGA, THEY'RE DOING A GREAT JOB AND WE'RE WORKING WITH THEM, BUT TCGA AND CLINICAL TRIALS DON'T NECESSARILY HAVE ALL THAT OUTCOME DATA, AND WE DO. WE HAVE STRUCTURED DATA ABOUT EACH PATIENT, AND AS THIS AUDIENCE KNOWS, IT'S REALLY EXPENSIVE TO CREATE A STRUCTURED DATA SET, SO WE'RE USING WHAT WE HAVE TO ENHANCE WHAT WE WANT TO GET. AND OF COURSE THE DATA ARE CURATED AND ADJUDICATED BY TRAINED, EXPERT PERSONNEL. THE CAVEAT IS THAT IT'S NOT PERFECT. FOR CONSOLIDATION INTO ONE SOURCE AND CENTRALIZATION, WE'RE NOW WORKING WITH NATURAL LANGUAGE PROCESSING TECHNIQUES TO AUTOMATE, BUT SO FAR THE DATA HAVE BEEN SHOWN TO BE PRETTY ACCURATE AND COMPLETE. NOW THAT I'VE PAINTED A ROSY PICTURE, I WANT TO TOUCH ON THE CHALLENGES IN CANCER SURVEILLANCE. WE'RE IN IT FOR THE LONG TERM; SURVEILLANCE IS A LONG-TERM EFFORT. IN OUR CURRENT MANUAL EXTRACTION PROCESS, OUR TRAINED REGISTRARS DIRECTLY ABSTRACT OVER 215 VARIABLES PER CASE. IT REQUIRES REVIEWING MANY COMPONENTS OF THE EMR, AND THE DATA CAN BE REALLY COMPLICATED. STAGING IS A GREAT EXAMPLE. EVERYBODY NEEDS TO KNOW THE STAGE AND USES IT FOR TREATMENT, FOR PROGNOSIS, AND FOR PATIENTS TO UNDERSTAND WHERE THEY ARE IN THE SPECTRUM AND WHAT THEY WANT TO DO ABOUT THEIR TREATMENT, AND EVERYONE KNOWS WHAT CANCER STAGE MEANS AS A CONCEPT. BUT THERE ARE 118 DIFFERENT STAGING SCHEMAS WITH DIFFERENT CLASSIFICATIONS, SO REGISTRARS ARE TRAINED ON THIS, AND WE TRY TO CREATE CROSSWALKS BETWEEN THE DIFFERENT STAGING SYSTEMS AS WE GO ALONG.
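THE CONSOLIDATION DESCRIBED ABOVE, WHERE SEVERAL SOURCE RECORDS (HOSPITAL ABSTRACTS, PATHOLOGY REPORTS, DEATH CERTIFICATES AND SO ON) ARE MERGED INTO ONE CANCER CASE, CAN BE SKETCHED AS A PRIORITY-BASED MERGE. THIS IS A SIMPLIFIED ILLUSTRATION, NOT SEER'S ACTUAL PROCEDURE: IT ASSUMES THE RECORDS ARE ALREADY LINKED BY A PATIENT ID AND TUMOR SEQUENCE NUMBER, AND THE SOURCE PRIORITY ORDER, FIELD NAMES AND RECORDS ARE ALL HYPOTHETICAL. REAL REGISTRIES APPLY FIELD-SPECIFIC RULES AND TRAINED-REGISTRAR REVIEW.

```python
from collections import defaultdict

# Hypothetical source priority: earlier sources win when they supply a value.
SOURCE_PRIORITY = ["pathology", "hospital_abstract", "physician", "death_certificate"]
KEY_FIELDS = ("patient_id", "tumor_seq", "source")

def consolidate(records):
    """Merge several source records into one record per (patient, tumor).

    For each data field, keep the value from the highest-priority
    source that actually reports it (None counts as not reported).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["patient_id"], rec["tumor_seq"])].append(rec)

    cases = {}
    for key, recs in groups.items():
        recs.sort(key=lambda r: SOURCE_PRIORITY.index(r["source"]))
        merged = {"n_source_records": len(recs)}
        for rec in recs:
            for field, value in rec.items():
                if field in KEY_FIELDS or value is None or field in merged:
                    continue
                merged[field] = value
        cases[key] = merged
    return cases

records = [
    {"patient_id": "P1", "tumor_seq": 1, "source": "hospital_abstract",
     "histology": "adenocarcinoma", "radiation": "yes", "vital_status": None},
    {"patient_id": "P1", "tumor_seq": 1, "source": "pathology",
     "histology": "adenocarcinoma, acinar", "radiation": None},
    {"patient_id": "P1", "tumor_seq": 1, "source": "death_certificate",
     "vital_status": "deceased"},
]
print(consolidate(records))
```

THE PATHOLOGY REPORT SUPPLIES THE MORE SPECIFIC HISTOLOGY, THE HOSPITAL ABSTRACT SUPPLIES THE TREATMENT FLAG, AND THE DEATH CERTIFICATE SUPPLIES FOLLOW-UP, SO THREE PARTIAL RECORDS BECOME ONE CASE.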
CLINICIANS SPECIALIZE IN A SINGLE ORGAN SYSTEM, SO THERE'S LIMITED DIVERSITY IN WHOM THEY STAGE. OUR REGISTRARS, MANY OF WHOM HAVE A LOT OF MEDICAL BACKGROUND TRAINING, TAKE A LOT OF COURSES TO DO THIS, AND A LOT OF CONTINUING EDUCATION IS REQUIRED. THERE'S A LOT OF INFORMATION TO LEARN, AND WE PROVIDE THEM A LOT OF CODING GUIDANCE FOR THIS WORK, BUT IT'S A LOT OF WORK. ONE OF MY FAVORITE GRAPHICS IS NAVIGATING HEALTH CARE LAND, FROM MICHAEL HANKENS. IT EXEMPLIFIES HOW COMPLICATED THE HEALTH CARE SYSTEM IS, PARTICULARLY IN THIS COUNTRY, AND ALL THE PLACES PATIENTS MIGHT RECEIVE CARE WHERE REGISTRARS HAVE LIMITED OR NO ACCESS TO THE DATA. THAT'S SOMETHING WE NEED TO CONSIDER AS WE MOVE FORWARD. AS YOU ALL KNOW, CANCER DIAGNOSIS AND TREATMENT ARE CHANGING RAPIDLY. LIQUID BIOPSY IS CHANGING THE WAY WE DIAGNOSE CANCERS AND THE WAY WE FOLLOW OUR PATIENTS. THERE'S DIGITAL IMAGING FOR PATHOLOGY AND RADIOLOGY, AS YOU HEARD THIS MORNING AND WILL CONTINUE TO HEAR THE REST OF THE DAY, AND THE FEATURES FROM THESE IMAGES ARE NOT NECESSARILY WELL CAPTURED. THESE ARE THINGS WE NEED TO THINK ABOUT. THEN THERE'S THE INCREASING PACE OF NEW THERAPY APPROVALS. THIS IS A QUICK SNAPSHOT, BUT BASICALLY IN 2018, 75 NEW DRUGS WERE APPROVED BY THE FDA FOR USE, AND 90 I.V. MEDICATIONS, SO EVERY YEAR WE HAVE TO RETRAIN OUR REGISTRARS TO CAPTURE THIS INFORMATION: HERE IS A LIST OF THE 90 NEW DRUGS THAT CAME OUT; IF YOU SEE ANY OF THESE, PLEASE PUT THEM INTO THE ABSTRACT. SO WE'RE DEVELOPING NEW TRAINING MATERIALS AND WORKING TOWARD A COLLABORATION WITH THE FDA TO MAKE THIS EASIER FOR US.
A FUN PITCH: WE JUST LAUNCHED A NEW DATABASE CALLED CANMED, WHICH TAKES THE LIST OF ALL FDA-APPROVED DRUGS BASICALLY SINCE THE INCEPTION OF SEER, SO 1975, OR A LITTLE LATER I THINK. IF YOU HAVE AN NDC OR HCPCS CODE, YOU CAN LOOK UP DRUGS, THEIR PACKAGING, AND WHEN THEY WERE APPROVED OR DISCONTINUED. IT'S A ONE-STOP SHOP: IF YOU WANT TO KNOW ANYTHING ABOUT CANCER DRUGS, WE NOW HAVE A DATA SOURCE FOR THAT. DATA ARE ALSO NOT ALWAYS TIMELY. WITH THIS RAPID PACE OF CHANGE, DELAYS CAN CAUSE PROBLEMS AND IMPEDE THE VALUE OF THE DATA ITSELF. LIKE I SAID EARLIER, REGISTRARS DON'T ALWAYS HAVE ACCESS TO ALL OF THE APPROPRIATE INFORMATION. A LOT OF NEW TREATMENTS ARE DELIVERED IN THE OUTPATIENT SETTING, NOT NECESSARILY IN THE HOSPITAL, PARTICULARLY ORAL CHEMOTHERAPY AND TYROSINE KINASE INHIBITORS, AND THEY'RE HARD TO CATCH WHEN THEY'RE NOT THERE. TESTING FOR VARIOUS THINGS AND FOLLOW-UP MIGHT ALSO BE DELIVERED OUTSIDE THE HOSPITAL SETTING. WE'RE TRYING TO FIGURE OUT AVENUES TO REACH OUT TO THOSE FOLKS AND GET THAT DATA BACK IN. OF COURSE, PEOPLE MIGHT HAVE MULTIPLE COURSES OF THERAPY OVER THE YEARS. IT'S NOT A ONE-STOP THING WHERE YOU HAVE CHEMOTHERAPY FOR SIX CYCLES AND YOU'RE DONE. YOU MIGHT HAVE MORE, OR YOU MIGHT PAUSE TREATMENT, WHAT HAVE YOU. ASIDE FROM WHETHER OR NOT PATIENTS ARE SURVIVING, WE WANT TO CAPTURE RECURRENCE. CANCER IS CHRONIC AND REQUIRES LONG-TERM MEASURES OF OUTCOME, AND OUR FOCUS HAS BEEN ON RECURRENCE. WE ALSO WANT TO CAPTURE SUBSEQUENT COURSES OF THERAPY AND GET INFORMATION ON COMORBID CONDITIONS THAT MIGHT BE AFFECTING, OR RESULTING FROM, THE THERAPY. A PLUG FOR WHY WE WANT TO CAPTURE RECURRENCE: WE HAVE ABOUT 17 MILLION CANCER SURVIVORS IN THE UNITED STATES ALONE. THAT'S 5% OF OUR POPULATION. LACK OF RECURRENCE INFORMATION IS NO LONGER ACCEPTABLE.
MANY TRIALS ARE FOCUSED ON RECURRENT DISEASE. WE'RE FOCUSING ON THE CANCERS WITH THE HIGHEST MORTALITY THAT ARE LIKELY TO MANIFEST WITH RECURRENT OR METASTATIC DISEASE: PANCREAS, OVARIAN, MELANOMA AND GLIOBLASTOMA. THIS IS A NICE GRAPHIC BY DR. DESANTIS AND OTHERS PROJECTING WHERE THE NUMBER OF CANCER SURVIVORS IS GOING. WHILE IT'S WONDERFUL THAT WE'RE DOING A GREAT JOB PROVIDING THE TREATMENT CANCER PATIENTS NEED, WE'RE INCREASING THAT POPULATION, AND RIGHT NOW SURVIVORS ARE NOT WELL STUDIED, SO THIS IS ANOTHER AREA OF RESEARCH WE NEED TO START DIGGING INTO. WE HAVE A COUPLE OF THOUGHTS ON ENHANCING SEER DATA. WE WANT TO CREATE A SYSTEM REPRESENTING POPULATION-LEVEL REAL WORLD DATA TO SUPPLEMENT CLINICAL TRIAL INFORMATION, SO THAT WE CAN UNDERSTAND THE EFFECTIVENESS OF ONCOLOGY CARE FOR THE 95% OF PATIENTS WHO AREN'T IN A CLINICAL TRIAL. WE'RE TAKING AN INCREMENTAL APPROACH, RUNNING SEVERAL DEMONSTRATION PILOTS BEFORE WE SCALE TO THE WHOLE SEER PROGRAM, AND USING THIS TIME TO UNDERSTAND THE BARRIERS AS WELL AS WHAT WORKS. HOPEFULLY WE CAN THEN SCALE UP AND REALLY CREATE A LONGITUDINAL PICTURE OF THE PATIENT'S TRAJECTORY THROUGH CANCER CARE AND TREATMENT. AS FOR SOLUTIONS AND PROCESS, WE'RE DOING A LOT OF LINKAGE AND A LOT OF OTHER THINGS IN THE SEER PROGRAM AT NCI, BUT I WANT TO FOCUS ON THE THIRD POINT: LEVERAGING COLLABORATIONS WITH EXTERNAL PARTNERS, BOTH COMMERCIAL AND PUBLIC. JUST IN CASE YOU THOUGHT SEER MIGHT BE BORED, THIS SHOWS THE SEVERAL DATA SOURCES WE CURRENTLY USE FOR CONSOLIDATED CASES AND THE DATA SOURCES CURRENTLY BEING PILOTED. WE'RE STARTING TO COLLECT SOME INFORMATION THAT'S GOING TO AUGMENT WHAT WE CAPTURE FOR RADIATION TREATMENT.
WE WANT TO EVALUATE SOME OF THE STANDARDS OF CARE IN THIS PRACTICE OVER TIME, SO WE'RE TRYING TO LOOK AT THE TRENDS USING A MORE CLINICALLY RELEVANT CATEGORIZATION, AND MAYBE WE CAN AUTOMATE SOME OF THAT WORK AS WELL. LIKE I SAID, WE ONLY CAPTURE THE INITIAL COURSE OF TREATMENT RIGHT NOW, BUT WE'RE WORKING TO INCREASE THE DETAIL, INCLUDING DOSE AND MODALITY, AND HOPEFULLY TO AUTOMATICALLY CAPTURE INITIAL AND SUBSEQUENT COURSES OF TREATMENT, WHICH GIVES US THE OPPORTUNITY TO IDENTIFY TREATMENT OF RECURRENT DISEASE AS WELL. NOW THE BORING PART: WE'RE DEVELOPING A PROTOCOL, BECAUSE WE CAN'T DO ANYTHING WITHOUT ONE. IT DESCRIBES THE DATA ITEMS WE WANT TO LINK TO SEER; DEVELOPS METHODS FOR STANDARDIZATION, LINKAGE AND THE ANALYSES TO REPORT THIS OUT; PROVIDES A REFERENCE FOR DATA USERS, WITH RECOMMENDED USES AND GUIDANCE; AND DESCRIBES THE PROCESS FOR SCALING. PEOPLE TOUCHED ON THIS EARLIER: WHAT ARE THE CHALLENGES OF DOING ALL THAT? IT'S REALLY HARD TO GET ALL THE PEOPLE IN THE SAME ROOM AT THE SAME TIME, ON BOARD AND SPEAKING THE SAME LANGUAGE. IT'S A LITTLE BIT LIKE HERDING CATS, WITH MULTIPLE PHONE CALLS, E-MAILS, CONFERENCE CALLS AND MEETINGS LIKE THIS ONE. WE NEED EVERYBODY: CLINICIANS, RESEARCHERS, DATA SCIENTISTS, REGISTRY FOLKS, THE FOLKS WHO KNOW HOW TO PUT DATABASES TOGETHER, AND OF COURSE OUR COMMERCIAL PARTNERS. SO IT'S NOT A ONE-STOP SHOP; IT'S A MULTI-PROCESS, MULTI-MEETING AND, AT THIS POINT, MULTI-YEAR ENDEAVOR. AND WHEN SOMEONE SAYS DATA OR LINKAGE, IT MEANS SOMETHING TOTALLY DIFFERENT DEPENDING ON WHO YOU'RE TALKING TO, SO PART OF THE WORK IS CREATING A DATA DICTIONARY SO THAT EVERYONE IS SPEAKING THE SAME LANGUAGE. AND EVEN IF WE'RE ABLE TO DO ALL OF THIS, ARE WE SUPPORTING THE RIGHT KINDS OF ANALYSES TO ANSWER THE MOST PRESSING RESEARCH QUESTIONS FOR PATIENTS AND CLINICIANS? BUT THERE ARE OPPORTUNITIES FOR SUCCESS. WE'RE PILOTING RIGHT NOW WITH ASCO, VARIAN AND ELEKTA.
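A DATA DICTIONARY OF THE KIND DESCRIBED ABOVE CAN BE AS SIMPLE AS A MAPPING FROM EACH CANONICAL DATA ELEMENT TO THE SPELLINGS DIFFERENT SOURCES USE FOR IT. IN THIS MINIMAL SKETCH THE CANONICAL NAMES ARE LOOSELY MODELED ON ELEMENTS SUCH AS RADIATION MODALITY, FRACTIONS DELIVERED AND TOTAL DOSE DELIVERED, BUT EVERY FIELD NAME, ALIAS AND SOURCE RECORD IS HYPOTHETICAL; THEY ARE NOT THE ASTRO, VARIAN, ELEKTA OR SEER SCHEMAS.

```python
# Hypothetical data dictionary: canonical element -> spellings seen in sources.
DATA_DICTIONARY = {
    "radiation_modality": {"modality", "rt_modality", "beam_type"},
    "fractions_delivered": {"fx_delivered", "num_fractions", "fractions"},
    "total_dose_delivered_gy": {"total_dose_gy", "dose_delivered", "td_gy"},
}

# Invert once so each (lowercased) source spelling points at its canonical name.
ALIASES = {
    alias.lower(): canonical
    for canonical, aliases in DATA_DICTIONARY.items()
    for alias in aliases
}

def harmonize(record):
    """Rename source fields to canonical names; return (mapped, unmapped_fields)."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        canonical = ALIASES.get(field.strip().lower())
        if canonical is None:
            unmapped.append(field)  # left for a human to review
        else:
            mapped[canonical] = value
    return mapped, unmapped

# Two hypothetical exports describing the same kind of treatment data.
site_a = {"RT_Modality": "IMRT", "Fx_Delivered": 30, "Total_Dose_Gy": 60.0}
site_b = {"beam_type": "protons", "num_fractions": 28, "td_gy": 50.4, "PTV_note": "x"}
print(harmonize(site_a))
print(harmonize(site_b))
```

UNMAPPED FIELDS ARE RETURNED RATHER THAN SILENTLY DROPPED, SO A HUMAN CAN DECIDE WHETHER EACH ONE IS A NEW SPELLING OF AN EXISTING ELEMENT OR A NEW ELEMENT ALTOGETHER.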
WE'RE WORKING ON THE ANALYSIS RIGHT NOW. THERE'S ALSO THE KAISER PERMANENTE WASHINGTON GROUP, AND RUTGERS WITH THE NEW JERSEY REGISTRY, WHERE DATA ARE FORTHCOMING, AND WE'RE IN DISCUSSIONS WITH SEVERAL OTHER INSTITUTIONS, AS YOU CAN READ OFF THE SLIDE. ONE SUCCESSFUL LINKAGE CAN HELP US BROADEN AND SCALE UP THROUGHOUT SEER. JUST FOR FUN, I WANT TO POINT OUT A COUPLE OF THINGS. THIS IS ASTRO'S MINIMAL DATA ELEMENTS LIST; THERE ARE SEVERAL SLIDES OF IT, BUT IT SHOWS EACH DATA ELEMENT AND WHETHER ASTRO PUTS IT ON THE MINIMAL DATA LIST. AS YOU CAN SEE, WE'RE MEETING THE MINIMAL DATA SET FOR SOME OF THE MORE IMPORTANT ELEMENTS: RADIATION MODALITY, NUMBER OF FRACTIONS DELIVERED, TOTAL DOSE DELIVERED, TOTAL DOSE PLANNED, TREATMENT SITE, TREATMENT TECHNIQUE AND THINGS OF THAT SORT. ALSO JUST FOR FUN, HERE IS AN EXAMPLE OF THE DETAILED DATA WE'RE CAPTURING FROM OUR FIRST PILOT: YOU CAN SEE THE FREQUENCIES, THE ICD-O CODE, THE SPECIFIC CANCER SITE AND THE RADIATION SITE, AND SIMILARLY THE TYPE OF MODALITY USED. WE'RE TRYING TO DO AUTOMATED CAPTURE TO GET MORE DETAILED INFORMATION ON THIS FOR INITIAL AND SUBSEQUENT COURSES. TO FINISH WITH THE PUBLIC HEALTH IMPLICATIONS: STANDARDIZATION AND CREATING EFFICIENT WAYS TO REPORT DATA TO CANCER REGISTRIES WILL BE HELPFUL IN THE LONG RUN, BECAUSE THESE FEED INTO EACH OTHER. BETTER, MORE DETAILED DATA MEAN BETTER STUDIES, BETTER TREATMENT GUIDELINES AND BETTER TREATMENT FOR CANCER PATIENTS, AND WE CAN GET MORE INFORMATION ON OUTCOMES SUCH AS RECURRENCE, METASTASIS, PROGRESSION AND THINGS OF THAT SORT. THE POTENTIAL OF THIS COLLABORATION LIES IN THE COMBINED EFFORTS OF ALL OUR VOICES AND MANY PERSPECTIVES, WHICH IS SOMETHING WE'RE TRYING TO EMPHASIZE. SO THANKS, EVERYONE, ESPECIALLY THE ORGANIZERS, FOR ASKING US TO BE HERE. WE ARE REALLY BEHIND THIS EFFORT. I HAVE A VERY PASSIONATE AND ENTHUSIASTIC ASSOCIATE DIRECTOR SPEARHEADING EFFORTS ON SEER, SO WE WOULD LIKE TO SEE THIS MOVE FORWARD. THANK YOU VERY MUCH. [APPLAUSE] >> THANKS, CLARA.
THAT PRESENTATION REALLY SERVES AS A NICE SEGUE TO THE NEXT TALK. LAST WEEK I ATTENDED A COMBINED FDA/AACR PRESENTATION ON THE USE OF REAL WORLD EVIDENCE. ONE OF THE CHALLENGES IS DATA STANDARDS. IN PARTICULAR, WHEN DATA ARE ABSTRACTED FROM REAL WORLD SOURCES, THE WAY A PARTICULAR FIELD IS ENCODED CAN VARY WIDELY. FOR EXAMPLE, THEY MENTIONED THAT PLATELET COUNT WAS REPORTED 80 DIFFERENT WAYS IN A SINGLE ELECTRONIC MEDICAL RECORD THEY LOOKED AT. SO YOU CAN IMAGINE THAT AT COMPANIES LIKE THE FLATIRONS OF THE WORLD, WHERE YOU THINK EVERYTHING IS HAPPENING ELECTRONICALLY, THEY HAVE A SMALL ARMY OF TUMOR REGISTRARS MANUALLY STANDARDIZING EVERYTHING SO THAT CLEAN DATA COME IN AND CAN BE MADE SENSE OF. I THINK THAT'S OF GREAT INTEREST WHEN WE THINK ABOUT IMAGING AS WELL, AND OUR NEXT PRESENTATION WILL SPEAK TO THAT. DR. CAROLINE CHUNG IS DIRECTOR OF ADVANCED IMAGING AND DIRECTOR OF IMAGING TECHNOLOGY AT M.D. ANDERSON. >> IT'S BEEN A GREAT MEETING SO FAR. I'M HOPING TO GIVE INSPIRATION AS WELL AS SOME SOBERING THOUGHTS ON THIS TOPIC. THESE ARE MY DISCLOSURES. WE WANT TO USE THE DATA WE'RE COLLECTING, AND A TON IS BEING COLLECTED: FROM THE BASIC SCIENCES AT THE OUTSET, THROUGH PRE-CLINICAL WORK THAT'S TRANSLATED INTO EVIDENCE IN CLINICAL TRIALS, TO IMPLEMENTATION IN THE REAL WORLD. THE BIG PIECE IS THIS PUBLICATION FROM 2013 ON ALL THE MISSED OPPORTUNITIES, WASTE AND HARM, WHICH IS A BIG MOTIVATION FOR THE DATA SCIENCE PLATFORM. KEY PIECES ARE STANDARDIZED COLLECTION AND THE ABILITY TO HAVE AN OPEN SCIENCE PLATFORM, ALONG WITH THE SOCIAL CHALLENGE OF SHARING THIS DATA, WHICH MANY SPEAKERS HAVE ALREADY TALKED ABOUT. IN TERMS OF IMAGING DATA, THIS IS THE FUTURE OF IMAGE UTILIZATION. TODAY A LOT OF IMAGING DATA IS EVALUATED QUALITATIVELY AND IN RELATIVE TERMS: TUMORS ARE GROWING OR SHRINKING, THINGS ARE BIG OR SMALL.
FINDINGS ARE DESCRIBED CATEGORICALLY, FOR EXAMPLE AS A SUSPICIOUS LESION. THIS DEPENDS ON HUMAN EXPERTISE. THERE'S CLINICAL INPUT, BUT OFTEN THE INPUTS ARE UNIDENTIFIED FACTORS, WHICH MAKES IT DIFFICULT FOR US TO INTERPRET HOW THAT PERSON REACHED THIS CONCLUSION. WE WANT TO GET INTO THE QUANTITATIVE, ABSOLUTE SPACE, BECAUSE THIS WILL INCREASE THE POTENTIAL FOR AUTOMATION AND STANDARDIZATION. BUT IF YOU THINK ABOUT THE NUMBER OF DATA POINTS WE GAIN IN THIS TRANSITION FROM QUALITATIVE TO QUANTITATIVE, IT'S EXPONENTIAL. THINK OF THE NUMBER OF VOXELS IN THE RAW IMAGE; THAT'S VERY DIFFERENT FROM A QUALITATIVE RADIOLOGY READ SAYING SOMETHING IS RESPONDING, WITH RESPONSE CATEGORIZED IN DIFFERENT WAYS. SO THERE'S A LOT OF POTENTIAL IN MOVING TOWARD QUANTITATIVE, AND A LOT OF EFFORT INVOLVED. EVENTUALLY, AUTOMATION AND STANDARDIZATION EFFORTS WILL INCREASE CONSISTENCY, BUT THEY CURRENTLY LACK CLINICAL INPUT. I'LL WALK YOU THROUGH WHAT I THINK WE CAN DO TO MOVE THE CLINICAL INPUT INTO THE QUANTITATIVE SPACE. THIS IS THE PROMISE OF RADIOMICS: IT CAN PREDICT SURVIVAL PROBABILITY, WITH A SEPARATION OF SURVIVAL CURVES. THIS IS GREAT. THE CHALLENGE IS THAT WE CAN PUBLISH THESE PAPERS AND EVERYONE GETS EXCITED, BUT CAN ANYONE REPRODUCE THEM? WHAT IS THE REPRODUCIBILITY? AND WHEN WE'RE TALKING ABOUT REAL WORLD DATA, THAT'S AN EVEN GREATER CHALLENGE. LET'S TEASE OUT THE ACTUAL RADIOMICS WORKFLOW AND EACH OF THE STEPS INVOLVED. FIRST THERE'S IMAGE ACQUISITION, AND CURRENTLY IN THE RADIOLOGY WORLD IMAGES ARE ACQUIRED IN QUITE A HETEROGENEOUS WAY, EVEN IN A CLINICAL TRIAL. I'M ON THE IMAGING COMMITTEE OF THE ALLIANCE FOR CLINICAL TRIALS IN ONCOLOGY. ON ONE CLINICAL TRIAL, A SERIES OF IMAGES CAME IN WITH A PROTOCOL AND A STUDY GUIDE ON HOW TO ACQUIRE THE IMAGES, AND THE COMPLIANCE RATE WITH THAT TRIAL'S IMAGE ACQUISITION PROTOCOL WAS ABOUT 20%. AND THESE ARE PATIENTS ON CLINICAL TRIALS.
IF THAT'S THE CASE FOR PATIENTS WHO ARE SUPPOSED TO BE ON PROTOCOL, CAN YOU IMAGINE THE VARIABILITY IN IMAGE ACQUISITION IN THE REAL WORLD? THAT'S ONE CHALLENGE. THE NEXT IS DATA CONVERSION AND IMAGE POST-PROCESSING. THIS IS NOT MAGIC; YOU CAN'T FIX EVERYTHING AT THIS STEP. YOU HAVE TO APPLY STANDARDS AT THE FIRST MILE; YOU CAN'T FIX IT AT THE LAST MILE. THERE ARE DIFFERENCES IN THE ALGORITHMS USED, AND THEN THERE'S SEGMENTATION AND IMAGE INTERPOLATION BEFORE YOU EXTRACT THE RADIOMIC FEATURE. EVERY STEP ALONG THE WAY WILL AFFECT THE RADIOMIC FEATURE THAT'S EXTRACTED. WE THEN TAKE THE RADIOMIC FEATURE AND CORRELATE IT WITH CLINICAL OUTCOMES OR GENETIC AND MOLECULAR PROFILES. THERE HAVE BEEN STANDARDIZATION EFFORTS IN SOME AREAS; FOR INSTANCE, CLIA STANDARDS MADE A BIG LEAP FORWARD IN LABORATORY STANDARDS FOR WHAT WE'RE ACTUALLY MEASURING WITH BIOFLUID BIOMARKERS. WE ALSO NEED TO STANDARDIZE THE ENDPOINT. FOR TUMOR RESPONSE, WE USE CATEGORICAL 2D IMAGING MEASUREMENTS. TAKE THE RESPONSE-CRITERIA WHITE PAPERS THAT RECOMMEND WE MOVE TO 3D: THAT COMES DOWN TO SEGMENTATION, AND THERE ARE CHALLENGES IN SEGMENTATION THAT I'LL GO THROUGH. THE ROLE OF TUMOR SEGMENTATION IS INCREASING, BECAUSE SEGMENTATION CAN BECOME THE OUTCOME MEASURE FOR TUMOR RESPONSE. IF WE GET THE OUTCOME MEASURE WRONG, ALL OF WHAT I'VE BEEN TALKING ABOUT IS AN ACADEMIC EXERCISE, BECAUSE EVERYTHING IS RELATED BACK TO THE OUTCOME. SO WE REALLY NEED TO GET SEGMENTATION RIGHT IF WE WANT TO MOVE TOWARD 3D VOLUMETRIC RESPONSE ASSESSMENT. THE SECOND PIECE, AS THE MANY RADIATION ONCOLOGISTS IN THE ROOM ARE AWARE, IS THAT TUMOR SEGMENTATION DEFINES THE TARGET FOR THERAPY, WHICH IS INCREASINGLY TRUE AS WE MOVE TOWARD IMAGE-GUIDED THERAPIES, INCLUDING INTERVENTIONAL RADIOLOGY, IMAGE-GUIDED SURGERY AND OTHERS. MANY MEASUREMENTS, INCLUDING RADIOMICS, ARE DEFINED ON A REGION OF INTEREST, SO SEGMENTATION DRIVES A LOT OF THE BIOMARKERS WE SAY WE'RE EXTRACTING IN A QUANTITATIVE FASHION. STANDARDIZING SEGMENTATION IS A CRITICAL STEP FOR QUANTITATIVE EVALUATION OF TREATMENT AND BIOMARKERS.
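TO MAKE CONCRETE THE POINT THAT EVERY PROCESSING STEP CHANGES THE EXTRACTED FEATURE, THE SKETCH BELOW COMPUTES ONE FIRST-ORDER RADIOMIC FEATURE, HISTOGRAM ENTROPY, FROM THE SAME VOXEL INTENSITIES UNDER TWO INTENSITY-DISCRETIZATION SETTINGS. THE INTENSITIES ARE SYNTHETIC, THE BIN WIDTHS ARE ARBITRARY CHOICES FOR ILLUSTRATION, AND DISCRETIZATION STANDS IN HERE FOR ANY ONE OF THE POST-PROCESSING STEPS LISTED ABOVE; THIS IS NOT A FULL RADIOMICS PIPELINE.

```python
import numpy as np

def histogram_entropy(intensities, bin_width):
    """First-order entropy (bits) after fixed-bin-width intensity discretization."""
    x = np.asarray(intensities, dtype=float)
    # Edges start at the minimum and step by bin_width until they pass the
    # maximum, so every voxel falls into some bin.
    edges = np.arange(x.min(), x.max() + bin_width, bin_width)
    counts, _ = np.histogram(x, bins=edges)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Synthetic ROI: 100 voxels with intensities 0..99 (illustrative only).
roi = np.arange(100)
for width in (25, 5):
    print(f"bin width {width}: entropy = {histogram_entropy(roi, width):.3f} bits")
```

WITH THESE 100 EVENLY SPREAD INTENSITIES, A BIN WIDTH OF 25 GIVES 4 EQUALLY FILLED BINS AND AN ENTROPY OF 2 BITS, WHILE A BIN WIDTH OF 5 GIVES 20 BINS AND ABOUT 4.32 BITS: SAME VOXELS, DIFFERENT FEATURE VALUE. REPORTING THE DISCRETIZATION SETTING ALONGSIDE THE FEATURE IS ONE SMALL PIECE OF THE TRANSPARENCY THE TALK CALLS FOR.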
SO LET'S LOOK AT THIS. OOPS, SORRY. LET'S LOOK AT THE INPUT DATA. THE FORTUNATE THING FOR US IN RADIATION ONCOLOGY IS THAT WE WORK WITH THE SAME INPUT DATA THAT ARE USED FOR RADIOMICS. WE HAVE TO RECOGNIZE THAT VARIABILITY AT EACH STEP ADDS A LAYER OF CHALLENGE TO VALIDATING, INTERPRETING AND CLINICALLY IMPLEMENTING EACH OF THESE TOOLS, WHETHER IT'S TREATMENT PLANNING, ADAPTIVE TREATMENT PLANNING (WHAT ARE WE ADAPTING TO?), RADIOMIC FEATURE EXTRACTION OR CLINICAL BIOMARKER EXTRACTION. STANDARDIZATION IS KEY, BUT TRANSPARENCY IS ALSO KEY: ACKNOWLEDGING WHERE WE FALL SHORT AND DESCRIBING WHAT WE'VE DONE HELPS US WORK TOGETHER. SO HOW CAN WE EVALUATE HOW GOOD THE DATA GOING IN ARE, AND HOW DO WE MAKE DATA QUALITY BETTER? I'VE TALKED ABOUT THE ISSUES WITH IMAGE ACQUISITION. THERE ARE MANY EFFORTS TO MEASURE, AS WELL AS IMPROVE, REPEATABILITY AND REPRODUCIBILITY. THERE'S QIBA, THE ALLIANCE THAT WORKS WITH INDUSTRY AND DIFFERENT COLLABORATIVE PARTNERS TO STANDARDIZE AND CREATE METRICS FOR HOW TO DO THAT, WITH MORE DETAIL ON THE NEXT SLIDE. IF YOU'RE INTERESTED, TALK TO ME OR MY CO-CHAIR AFTER THE SESSION. FOR RADIATION DELIVERY, VARIOUS IMAGING MODALITIES AND RADIATION SIMULATION, THERE ARE PHANTOMS AVAILABLE, AND I'VE GIVEN YOU SOME EXAMPLES. THIS IS A QIBA PHANTOM, WITH DIFFERENT TISSUE CHARACTERISTICS IN EACH PANEL. THERE ARE ALSO PROTOCOLS IN THE CLINICAL TRIAL SPACE, AND SOME OF THESE GROUPS ARE WORKING WITH VENDORS TO IMPLEMENT AND EMBED THAT TECHNOLOGY TO HELP US STANDARDIZE ACQUISITION. THE QUANTITATIVE IMAGING BIOMARKERS ALLIANCE PROVIDES DATA-DRIVEN DOCUMENTS CALLED PROFILES, BASED ON PUBLISHED DATA. A PROFILE TELLS THE USER WHAT QUANTITATIVE RESULTS CAN BE ACHIEVED BY FOLLOWING IT. ONE PURPOSE OF A PROFILE IS TO ADVISE VENDORS WHAT THEY NEED TO IMPLEMENT IN THEIR PRODUCTS; ANOTHER IS TO COMMUNICATE THE PROCEDURES USERS NEED TO FOLLOW THE PROFILE AND UNDERSTAND WHAT THE QUANTITATIVE MEASURE MEANS.
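REPEATABILITY IS USUALLY ESTIMATED FROM TEST-RETEST SCANS, SUMMARIZED BY THE WITHIN-SUBJECT STANDARD DEVIATION (WSD), THE WITHIN-SUBJECT COEFFICIENT OF VARIATION (WCV) AND THE REPEATABILITY COEFFICIENT, RC = 1.96 × √2 × WSD ≈ 2.77 × WSD. THIS IS A MINIMAL SKETCH OF THOSE METRICS, ASSUMING EXACTLY TWO REPLICATE MEASUREMENTS PER SUBJECT; THE TUMOR VOLUMES ARE MADE UP, THE FUNCTION NAME IS HYPOTHETICAL, AND THE WCV USES THE GRAND MEAN AS A SIMPLIFICATION. IT IS NOT THE CALCULATION PRESCRIBED BY ANY PARTICULAR QIBA PROFILE.

```python
import math

def repeatability(test, retest):
    """Return (wSD, wCV, RC) from paired test-retest measurements."""
    if len(test) != len(retest) or not test:
        raise ValueError("test and retest must be non-empty and the same length")
    n = len(test)
    # With two replicates per subject, within-subject variance = sum(d^2) / (2n).
    wsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(test, retest)) / (2 * n))
    # Simplification: coefficient of variation relative to the grand mean.
    grand_mean = (sum(test) + sum(retest)) / (2 * n)
    wcv = wsd / grand_mean
    # About 95% of test-retest differences are expected to fall below RC.
    rc = 1.96 * math.sqrt(2) * wsd
    return wsd, wcv, rc

# Hypothetical tumor volumes (mL) from two same-day scans of three patients.
scan1 = [10.0, 20.0, 30.0]
scan2 = [11.0, 19.0, 30.0]
wsd, wcv, rc = repeatability(scan1, scan2)
print(f"wSD={wsd:.3f} mL, wCV={wcv:.1%}, RC={rc:.2f} mL")
```

IN THIS TOY EXAMPLE RC IS ABOUT 1.6 ML, SO A CHANGE OF LESS THAN THAT BETWEEN TWO SCANS OF THE SAME PATIENT COULD PLAUSIBLY BE MEASUREMENT NOISE ALONE RATHER THAN TRUE TUMOR CHANGE.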
ONE EXAMPLE IS CT VOLUMETRY: A WHOLE COMMITTEE IS WORKING ON MEASURING TUMORS AND QUANTIFYING TUMOR VOLUME ON CT, AND IF IT TAKES A WHOLE COMMITTEE, WE HAVEN'T GOTTEN FAR. NOW, MOVING INTO THE MORE DIFFICULT SPACE OF MRI, THERE ARE OPPORTUNITIES. MRI PROVIDES BETTER SOFT TISSUE CONTRAST FOR DELINEATING TARGETS AND REGIONS OF INTEREST. MANY PUBLICATIONS SUGGEST THAT WHEN YOU ADD MRI TO CT, THE CONSISTENCY OF MANUAL SEGMENTATION ACROSS OBSERVERS IMPROVES, AND MRI CAN PROVIDE INFORMATION THAT MIGHT TAKE BIOMARKER DISCOVERY AND RADIOMIC SIGNATURES A STEP FURTHER. THERE'S POTENTIAL TO DO THAT. THE CHALLENGES ARE THAT PATIENT POSITIONING FOR RADIOTHERAPY CAN BE DIFFICULT WITH MRI, AND COILS WERE NOT DEVELOPED FOR PATIENTS IN IMMOBILIZATION, BUT WE'RE OVERCOMING THAT IN MANY SETTINGS. WE'RE NOW SIMULATING IN THE TREATMENT POSITION WITH IMMOBILIZATION, AND WE'RE OPTIMIZING PULSE SEQUENCES TO ENSURE GEOMETRIC ACCURACY. THAT'S A NON-ISSUE FOR RADIOLOGISTS, WHO ARE LOOKING AT RELATIVE ANATOMY IN SPACE, SO THEY DON'T APPRECIATE THE NEED FOR GEOMETRIC ACCURACY. WE DO, BECAUSE WE NEED TO KNOW THE RADIATION WILL BE DELIVERED TO THAT SPACE, IN THAT TISSUE. THE BENEFIT IS THAT, BECAUSE WE ALREADY THINK THIS WAY, WE'RE WELL POISED AND WELL ALIGNED WITH THE QUANTITATIVE IMAGING SPACE, WHICH ALSO RELIES ON GEOMETRIC ACCURACY. THE RADIOLOGY WORLD IS CATCHING UP, AND WE CAN HELP PULL THEM ALONG AND HELP THEM ACTUALLY UNDERSTAND THIS BETTER. FINALLY, CLINICAL EDUCATION AND STANDARDIZATION: MRI IS WIDELY USED, BUT IN THE QUANTITATIVE SPACE WITHIN RADIATION ONCOLOGY IT'S STILL A RELATIVELY NEW FIELD. THIS IS AN OBVIOUS IMAGE, FROM GREAT WORK BY DAVE FULLER, WHO IS IN THE AUDIENCE, LOOKING AT HEAD AND NECK CANCERS. YOU CAN SEE THAT IN A STANDARD HEAD COIL THE PATIENT IS MOVING. IF YOU WANT TO DELINEATE AND GET QUANTITATIVE BIOMARKERS, YOU'D RATHER USE THE IMAGE ON THIS SIDE, WHICH IS AN EXAMPLE OF USING A FLEX COIL WITH RADIATION IMMOBILIZATION.
WE'VE TAKEN THE SAME APPROACH ACROSS THE DIFFERENT BODY SITES AT M.D. ANDERSON, WITH A LARGE TEAM OF PEOPLE I'LL SHOW AT THE END. THIS IS RELEVANT FOR RADIOMICS AND QUANTITATIVE IMAGING. THESE ARE JUST EXAMPLES OF DIFFERENT PHANTOMS THAT RADIOTHERAPY CENTERS COMMONLY USE AND THAT THE QUANTITATIVE IMAGING SPACE IS STARTING TO ADOPT. WE SHOULD MAKE SURE THEY ARE USING THESE, BECAUSE WE'VE ALREADY ALIGNED ON THEM FOR RADIATION TREATMENT PLANNING. IN TERMS OF CLINICAL TRIALS, LET ME GIVE YOU THE BACKSTORY BEHIND OUR MOTIVATION. THERE WERE MULTIPLE LANDMARK PHASE 3 TRIALS IN GLIOBLASTOMA, A TUMOR THAT IS NOT THOUGHT TO DISSEMINATE OR METASTASIZE, WHERE PROGRESSION-FREE SURVIVAL DID NOT PREDICT OVERALL SURVIVAL AT ALL. THE ONLY CONCLUSION IS THAT OUR PROGRESSION-FREE SURVIVAL MEASUREMENT IS NOT VERY RELIABLE. WE REALIZED THAT EVEN ON T1 AND T2 ANATOMICAL IMAGES, WHAT YOU SEE DEPENDS ON SEQUENCE PARAMETERS, AND THAT IMAGE QUALITY AFFECTED OUR ABILITY TO MEASURE TUMOR RESPONSE. AND THIS IS 2D TUMOR RESPONSE, NOT EVEN 3D SEGMENTATION. YOU CAN IMAGINE HOW MUCH IMPACT IT HAS ON 3D SEGMENTATION, WHICH COULD AFFECT RADIOTHERAPY AS WELL AS RADIOMICS. WE NEED TO START STANDARDIZING TO MAKE MEANINGFUL DIFFERENCES. AND EVEN IF WE ALL HAD THE SAME IMAGING DATA SETS, THERE'S VARIABILITY. IN WORK I'VE DONE WITH A GRAD STUDENT, WE DISSEMINATED THE SAME SET OF IMAGES FOR SEVERAL CASES, AND THIS IS ONE EXAMPLE: WE SENT T1- AND T2-WEIGHTED IMAGES AND ASKED PEOPLE TO DELINEATE THE TUMOR. THE PURPLE VOLUME HERE IS THE 50% AGREEMENT VOLUME, AND PINK IS THE 100% AGREEMENT VOLUME, WHICH YOU CAN COMPARE WITH THE ALL-ENCOMPASSING VOLUME. BETWEEN THE 100% AGREEMENT VOLUME AND THE ALL-ENCOMPASSING VOLUME THERE'S WIDE VARIABILITY. THESE ARE PEOPLE TREATING PATIENTS WITH GAMMA KNIFE RADIOSURGERY, ONE OF THE MOST PRECISE WAYS OF DELIVERING RADIATION, AND THE AVERAGE EXPERIENCE OF THE GROUP WAS NINE YEARS. THESE WEREN'T NOVICES.
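THE 50% AGREEMENT, 100% AGREEMENT AND ALL-ENCOMPASSING VOLUMES DESCRIBED ABOVE COME FROM STACKING EVERY OBSERVER'S CONTOUR AND COUNTING, VOXEL BY VOXEL, WHAT FRACTION OF OBSERVERS INCLUDED IT; MULTIPLYING A VOXEL COUNT BY THE VOXEL SIZE THEN GIVES A VOLUME, WHICH IS ALSO THE CORE OF CT OR MR VOLUMETRY. THIS MINIMAL SKETCH ASSUMES THE CONTOURS HAVE ALREADY BEEN RASTERIZED TO BINARY MASKS ON A COMMON GRID; THE FUNCTION NAME, THE TINY MASKS AND THE 1 × 1 × 3 MM VOXEL SIZE ARE MADE UP.

```python
import numpy as np

def agreement_volumes(masks, voxel_size_mm):
    """Agreement volumes (mL) from several observers' binary masks on one grid."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    fraction = stack.mean(axis=0)  # per-voxel fraction of observers who contoured it
    voxel_ml = float(np.prod(voxel_size_mm)) / 1000.0  # mm^3 -> mL
    return {
        "agree_50pct_ml": float((fraction >= 0.5).sum()) * voxel_ml,
        "agree_100pct_ml": float((fraction == 1.0).sum()) * voxel_ml,
        "encompassing_ml": float((fraction > 0).sum()) * voxel_ml,
    }

# Three hypothetical observers contouring the same 2 x 3 voxel slice.
observers = [
    [[1, 1, 1], [0, 0, 0]],
    [[0, 1, 1], [1, 0, 0]],
    [[0, 1, 1], [1, 1, 0]],
]
print(agreement_volumes(observers, voxel_size_mm=(1.0, 1.0, 3.0)))
```

HERE ONLY 2 OF THE 5 VOXELS THAT ANYONE CONTOURED ARE SHARED BY ALL THREE OBSERVERS; THE RATIO OF THE 100% AGREEMENT VOLUME TO THE ALL-ENCOMPASSING VOLUME IS ONE SIMPLE SUMMARY OF INTER-OBSERVER VARIABILITY.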
IT'S QUITE SCARY, AND IT MOTIVATES US TO INCREASE ATLAS-BASED EDUCATION AS WELL AS DEVELOP AUTOMATED SEGMENTATION TOOLS TO IMPROVE CONSISTENCY. THIS VARIABILITY CAN AFFECT RADIOMICS. IN THIS STUDY OF 46 PATIENTS WITH NON-SMALL CELL LUNG CANCER, THE INVESTIGATORS LOOKED AT 89 RADIOMIC FEATURES ACROSS DIFFERENT SEGMENTATIONS, AND THE NUMBER OF SIGNIFICANT FEATURES VARIED FROM 3 TO 5 TO 0 BASED ON THE CONTOURS ALONE. IT CAN MAKE A HUGE DIFFERENCE. NOW LET'S SAY WE DEAL WITH THE WHOLE GARBAGE-IN PIECE AND MANAGE TO ADDRESS IT OVER THE NEXT FIVE YEARS. EVEN WITH PERFECT DATA, BAD MODELS MAY GIVE US BAD RESULTS. HERE IS AN EXAMPLE: A PROJECT WHERE WE SCRATCHED OUR HEADS BECAUSE WE DIDN'T KNOW WHAT TO DO WITH THE RESULTS, SO WE PUBLISHED THEM TO SHARE THEM. HEAD AND NECK DCE IMAGES WERE CIRCULATED TO 11 DIFFERENT MODELS, FROM 9 INSTITUTIONS PLUS SOME COMMERCIAL MODELS. THEY ALL SAID THEY WERE USING THE SAME TYPE OF MODEL, BUT YOU CAN SEE THE MAPS WERE VARIABLE. WE THOUGHT MAYBE THEY DIDN'T INTERPRET THE CLINICAL DATA CORRECTLY, OR PEOPLE WEREN'T UNDERSTANDING OUR INSTRUCTIONS. SO WE TOOK THE QIBA DATA AND PROVIDED IT IN A NO-NOISE SCENARIO, AND WE STILL SEE VARIABILITY IN KTRANS VALUES. RED IS OVERESTIMATION, BLUE IS UNDERESTIMATION, AND THERE'S VERY LITTLE GREEN. WE WERE EXPECTING 20 TO 25% VARIABILITY; INSTEAD WE'RE TALKING ABOUT COMPLETELY DIFFERENT DIRECTIONS, FROM POSITIVE TO NEGATIVE: 100% OVERESTIMATION TO 100% UNDERESTIMATION. PRETTY SCARY. RADIOMICS IS THE SAME IDEA. THIS IS A GREAT PAPER FROM 2017 SUGGESTING THAT 88% OF RADIOMIC FEATURES WERE NOT REPRODUCIBLE BETWEEN TWO SOFTWARE IMPLEMENTATIONS. REPEATABILITY AND REPRODUCIBILITY ARE SO VARIABLE THAT WE NEED TO BE THOUGHTFUL ABOUT WHAT WE'RE DOING.
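ONE COMMON WAY TO QUANTIFY WHETHER A RADIOMIC FEATURE IS REPRODUCIBLE ACROSS TWO SOFTWARE IMPLEMENTATIONS IS LIN'S CONCORDANCE CORRELATION COEFFICIENT (CCC), WHICH, UNLIKE PEARSON CORRELATION, PENALIZES SYSTEMATIC OFFSETS AND SCALE DIFFERENCES BETWEEN THE TWO SETS OF VALUES. THIS MINIMAL SKETCH DOES NOT REPRODUCE THE METHOD OF THE 2017 PAPER MENTIONED ABOVE; THE FUNCTION NAMES AND FEATURE VALUES ARE INVENTED, AND THE 0.9 CUTOFF IS AN ASSUMED THRESHOLD.

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two measurement sets."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()  # population covariance
    return float(2 * cov / (x.var() + y.var() + (mx - my) ** 2))

def reproducible_features(impl_a, impl_b, threshold=0.9):
    """Features whose CCC across implementations meets the (assumed) threshold."""
    names = sorted(impl_a)
    keep = [n for n in names if lins_ccc(impl_a[n], impl_b[n]) >= threshold]
    return keep, len(keep) / len(names)

# Hypothetical per-patient feature values from two software packages.
software_a = {"volume": [10, 20, 30, 40], "entropy": [1, 2, 3, 4], "contrast": [1, 2, 3, 4]}
software_b = {"volume": [10, 20, 30, 40], "entropy": [3, 4, 5, 6], "contrast": [1.1, 2.0, 2.9, 4.0]}
keep, fraction = reproducible_features(software_a, software_b)
print(keep, f"{fraction:.0%} reproducible")
```

THE ENTROPY FEATURE HERE HAS A PERFECT PEARSON CORRELATION BETWEEN THE TWO PACKAGES BUT A CCC OF ABOUT 0.38, BECAUSE ONE PACKAGE IS CONSISTENTLY 2 UNITS HIGHER; THAT KIND OF SYSTEMATIC DISAGREEMENT IS EXACTLY WHAT BREAKS A MODEL MOVED FROM ONE IMPLEMENTATION TO ANOTHER.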
ONE THING I WOULD PROPOSE WE MAY NEED TO DO AS A FIELD IN TERMS OF RADIOMICS: IN THE BASIC SCIENCE WORLD WE SAY THAT WHEN YOU DO AN EXPERIMENT, YOU DO IT IN MULTIPLE CELL LINES AND MODELS, AND PERHAPS WE NEED THE SAME APPROACH IN THE DATA SCIENCE FIELD. WE TRIED TO REPRODUCE ONE. A GREAT GRADUATE STUDENT WAS LOOKING AT MODEL FITTING, TAKING IT BACKWARDS: SHE TOOK A MODEL ALREADY PUBLISHED ON LUNG AND HEAD AND NECK DATA, TOOK THE SAME FEATURES THAT WERE EXTRACTED, AND TRIED TO RECREATE THE FINDINGS. SHE DID THIS AS PART OF HER Ph.D. PROJECT, AND WHEN SHE FITTED THIS TO SURVIVAL, THIS IS THE RADIOMIC MODEL AND THIS IS TUMOR VOLUME. DO YOU SEE MUCH OF A DIFFERENCE? MAYBE AT THE TAIL. TUMOR VOLUME WAS DRIVING THIS RADIOMIC SIGNATURE. SO WE NEED TO BE THOUGHTFUL ABOUT EXTRACTING USEFUL RADIOMIC DATA VERSUS VERY ARDUOUS TUMOR VOLUME SEGMENTATION, AND NOT JUST USING A TOOL OR DATA BECAUSE WE HAVE IT. WHEN SHE NORMALIZES FOR TUMOR VOLUME, YOU CAN SEE THAT AFTER NORMALIZATION THE CONCORDANCE INDEX DROPS. IT WAS TUMOR VOLUME DRIVING THIS RADIOMIC SIGNATURE, AND WE HAVE TO BE THOUGHTFUL ABOUT THAT. TO WRAP UP, WE NEED TO MEASURE AND ADDRESS THE VARIABILITY AT EACH OF THESE STEPS TO HELP ENABLE US TO DEVELOP AND VALIDATE MODELS, APPROACHES AND TOOLS, EVALUATE THEM THROUGH EXERCISES, ELIMINATE COLLINEARITY, AND IDENTIFY BIOMARKERS TO IMPLEMENT IN THE CLINIC. FINALLY, THIS IS A LOT OF WORK. WHEN YOU LOOK AT MASS PRODUCTION, INDUSTRIALIZATION AND THE COMPLEXITY OF TECHNOLOGY, EVENTUALLY YOU CAN BURY THE COMPLEXITY OF THE TECHNOLOGY. 
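THE CONCORDANCE INDEX SHOWS WHY TUMOR VOLUME CAN DRIVE A SIGNATURE. IT ONLY COMPARES THE ORDERING OF RISK SCORES ACROSS PATIENT PAIRS, SO A SCORE THAT IS JUST A MONOTONE FUNCTION OF VOLUME GETS EXACTLY THE SAME VALUE AS VOLUME ITSELF. THIS IS A MINIMAL SKETCH OF HARRELL'S C-INDEX. THE TALK DOES NOT SAY WHICH CONCORDANCE INDEX WAS USED. THE DATA ARE MADE UP, AND signature IS A HYPOTHETICAL SCORE.

```python
import math
from typing import Sequence


def harrell_c(risk: Sequence[float], time: Sequence[float],
              event: Sequence[int]) -> float:
    """Harrell's C: fraction of comparable pairs where higher risk fails first.

    A pair (i, j) is comparable when patient i had the event (event[i] == 1)
    before patient j's follow-up time. Tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(risk)):
        for j in range(len(risk)):
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Made-up cohort: tumor volume (cc), survival time (months), event flag.
volume = [10.0, 20.0, 30.0, 40.0, 50.0]
months = [60.0, 50.0, 30.0, 35.0, 10.0]
died = [1, 1, 1, 0, 1]

# Hypothetical "radiomic signature" that is secretly a monotone function of volume.
signature = [1.0 + 0.3 * math.log(v) for v in volume]

print(harrell_c(volume, months, died))     # 0.875
print(harrell_c(signature, months, died))  # 0.875
```

IN THE SPIRIT OF THE NORMALIZATION DESCRIBED ABOVE, A CANDIDATE SIGNATURE'S C-INDEX SHOULD BE COMPARED WITH VOLUME ALONE BEFORE IT IS CREDITED WITH ANY ADDED VALUE.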
AS WE LEARN ABOUT SOMETHING, AND AS WE TALK ABOUT DATA SCIENCE, WE'RE AT A DIFFICULT PART, AT THE PEAK OF THE WORK, TALKING ABOUT STRUCTURED REPORTING AND STANDARDIZATION. THE LIGHT AT THE END OF THE TUNNEL IS THAT EVENTUALLY, IF WE POUR IN ENOUGH EFFORT AND ENOUGH ENABLING TECHNOLOGY TO ALLOW US TO DO GOOD WORK, WE WILL BE ABLE TO DRIVE THIS DOWN SO THE COMPLEXITY IS BURIED. THEN WE CAN DO THE REAL WORK OF THE SCIENCE, ASK THE RIGHT QUESTIONS, AND GET THE ANSWERS IN AN AUTOMATED WAY. ONE OF THE THINGS WE PROBABLY NEED TO DO IS START LOOKING AT THE RAW DATA. ALTHOUGH WE'RE MAKING A LOT OF EFFORTS IN STRUCTURED DATA AND TALKING ABOUT CATEGORICAL DATA AND QUALITATIVE PIECES, THAT ALWAYS HAS A HUMAN FINGERPRINT. ONE OF THE THINGS WE CAN DO IS KEEP THAT BUT RELATE IT TO THE RAW DATA, SO THAT WE START TO REALLY QUANTIFY WHAT WE ARE DOING IN CLINICAL PRACTICE. THAT WILL HELP US LEAP FORWARD IN MAKING A CLINICAL IMPACT. SO, TAKE-HOME POINTS: IMAGE QUALITY IMPACTS DOWNSTREAM MEASURES. WE DO NEED TO MAKE FIRST-MILE EFFORTS RATHER THAN LAST-MILE EFFORTS TO FIX EVERYTHING. MEASURING IMAGE QUALITY IS THE FIRST STEP, AND PERHAPS VARIABILITY IN IMAGE QUALITY IS OKAY IF WE MEASURE AND ACCOUNT FOR IT, TO TEST ROBUSTNESS. TUMOR SEGMENTATION IS CRITICAL TO GET RIGHT BECAUSE IT AFFECTS TUMOR TARGETING, RESPONSE ASSESSMENT, AND ALL IMAGE BIOMARKERS. I WANT TO ACKNOWLEDGE A GROWING TEAM IN ADVANCED IMAGING, MY LAB GROUP, AND OUR FUNDING SUPPORT. THANK YOU. [APPLAUSE] >> SO I WANT TO INTRODUCE DR. BILL LOUV. MARTY MURPHY, WHO WAS ORIGINALLY SCHEDULED, COULDN'T BE HERE TODAY. THE CONSUMMATE GENTLEMAN THAT HE IS, HE WISHES HE COULD BE HERE, BUT WE HAVE DR. LOUV TO TALK ABOUT THE IMAGES PROGRAM. >> I HOPE YOU'RE NOT ALL GOING TO BE TERRIBLY DISAPPOINTED. I CAN'T MATCH MARTY MURPHY, AND I WON'T TRY TO. I'LL DELIVER THIS IN MY OWN STYLE. IT'S GREAT TO BE INVITED HERE. I'M AMAZED AT HOW MUCH I'VE LEARNED JUST SITTING HERE. 
IN FACT, I WISH I COULD REWRITE MY WHOLE TALK BECAUSE OF THINGS I'VE LEARNED FROM THE LAST COUPLE OF TALKS. AND SINCE CAROLINE WENT BEFORE ME, I WAS NERVOUS ABOUT EXPLAINING RECIST TO A BUNCH OF RADIOLOGISTS, BUT THANK YOU, YOU TOOK CARE OF THAT FOR ME. OKAY. SO FIRST OF ALL, QUICKLY, WHAT IS PROJECT DATASPHERE? 19 YEARS AGO THE FIRST PRESIDENT BUSH LAUNCHED THE CEO ROUNDTABLE ON CANCER. HE TEAMED UP WITH BOB INGRAM, WHO WAS AT GLAXO AT THE TIME. THEY WENT AND TWISTED ARMS AND COLLECTED MONEY FROM A BROAD GROUP OF CORPORATIONS, AND PRESIDENT BUSH SAID, DO SOMETHING BOLD AND VENTURESOME. HE WAS INSPIRED BY, AMONG OTHER THINGS, THE DEATH OF HIS DAUGHTER ROBIN FROM CHILDHOOD LEUKEMIA. THE CURRENT CHAIRMAN IS BOB BRADWAY FROM AMGEN, AND OVER THE LAST 19 YEARS THIS HAS GROWN INTO QUITE A DIVERSE SET OF PARTICIPATING CORPORATIONS. I'M GOING TO PUT THEM UP THERE. SO THAT WAS 18, 19 YEARS AGO. FIVE YEARS AGO, THE CEO ROUNDTABLE LAUNCHED PROJECT DATASPHERE. THERE WAS A YEAR OF WORK BEFORE THE LAUNCH TO BUILD THE PLATFORM, GO THROUGH DATA SHARING AGREEMENTS, MEET WITH LOTS OF LAWYERS, AND FINALLY LAUNCH. IF YOU REMEMBER SOME OF THE HISTORY OF DATA SHARING, 2013 WAS PRETTY EARLY FOR DATA SHARING PLATFORMS. PROJECT DATASPHERE LAUNCHED FOCUSED EXCLUSIVELY ON ONCOLOGY DATA, AND UP TO THIS POINT IT HAS BEEN ALMOST EXCLUSIVELY RANDOMIZED CONTROLLED TRIAL DATA. AND IT WAS CONTRARIAN IN THE SENSE OF HAVING AN OPEN SHARING MODEL. MANY DATA SHARING PLATFORMS HAVE WHAT'S CALLED A GATEKEEPER MODEL, WHERE YOU ARE REQUIRED TO SUBMIT A RESEARCH PROTOCOL THAT GOES THROUGH A VETTING PROCESS, AND THEN YOU MAY OR MAY NOT GET PERMISSION TO ACCESS THE DATA. WITH PROJECT DATASPHERE, THE BARRIER TO BECOMING A REGISTERED USER OF THE DATA IS MUCH LOWER; IN ALMOST ALL CASES IT'S ACHIEVED IN 72 HOURS. SO IT'S DIFFERENT FROM QUITE A FEW OF THE OTHER DATA SHARING PLATFORMS. 
AND WE HAVE BEEN ABLE TO ACCRUE A LOT OF DATA SETS, REPRESENTING PATIENT-LEVEL DATA ON WELL OVER 100,000 PATIENTS. I'M CONVINCED THAT IT'S THE OPEN ACCESS MODEL THAT STIMULATES A LOT OF ACTIVITY, MEASURED HERE IN TERMS OF INVESTIGATORS WHO HAVE ACCESS TO THE DATA AND THE NUMBER OF DOWNLOADS. IT'S NOT JUST ACTIVITY; THERE'S ALSO SCIENTIFIC OUTPUT. THERE ARE 25 PEER-REVIEWED PUBLICATIONS AT THIS POINT, MOST IN THE LAST THREE YEARS. IN THE EARLY DAYS WE HAD TO STIMULATE PEOPLE TO USE THE PLATFORM BY SUGGESTING WHAT SORT OF QUESTIONS THEY COULD ANSWER FROM THE DATA. NOW WE USUALLY DON'T KNOW ABOUT A PUBLICATION UNTIL IT'S PUBLISHED. IT HAS TAKEN ON A LIFE OF ITS OWN. PEOPLE HAVE FOUND IT AND PEOPLE ARE USING IT. SO THAT'S BEEN A JOURNEY OF FIVE OR SIX YEARS, DEPENDING ON HOW YOU COUNT THE FIRST YEAR. ABOUT A YEAR AGO, MAYBE 14 MONTHS AGO, WE LAUNCHED FOUR RESEARCH PROGRAMS, AND I'LL JUST PUT THEM ALL UP THERE. EACH ONE OF THESE IS FUNDAMENTALLY ABOUT DATA: FINDING, CLEANING, CURATING AND CONSOLIDATING DATA ON THE PDS PLATFORM TO ADDRESS IMPORTANT QUESTIONS THAT WOULDN'T OTHERWISE BE ADDRESSABLE. SO IT'S ABOUT VOLUME, IT'S ABOUT CURATION, AND IT'S ABOUT ACCESSIBILITY, WHICH THE PROJECT DATASPHERE PLATFORM PROVIDES. I'LL SPEND MOST OF MY TIME ON THE FOURTH ONE, WHICH SEEMS MOST RELEVANT TO THE PEOPLE GATHERED HERE TODAY, BUT I THINK YOU WOULD PROBABLY HAVE INTEREST IN ALL OF THESE. FIRST, EXTERNAL CONTROL ARMS: WE'RE NOW AT VERSION 1.0 OF AN EXTERNAL CONTROL ARM FOR SMALL CELL LUNG CANCER. WE HAVE MORE WORK TO DO THERE TO GET IT INTO THE GLIDE PATH OF CLINICAL TRIALS BEING DONE, AND THEN WE'LL MOVE ON TO OTHER TUMOR TYPES WHERE AN EXTERNAL CONTROL ARM ARGUABLY MAKES SENSE. RARE TUMOR REGISTRIES: WE HAVE BUILT A MULTI-INSTITUTIONAL REGISTRY FOR MERKEL CELL CARCINOMA. 
WHEN MOST PEOPLE HEAR THAT, IT SOUNDS BLAND AND BORING, BUT IT'S QUITE AN ACHIEVEMENT TO GET THE MAJOR CENTERS THAT HAVE MERKEL CELL SERVICES TO AGREE TO SHARE THEIR DATA. IT'S A RARE ENOUGH DISEASE THAT IF THAT ISN'T DONE, A LOT OF IMPORTANT QUESTIONS ABOUT MERKEL CELL CARCINOMA JUST AREN'T GOING TO GET ANSWERED. AGAIN, THAT'S THE FIRST RARE TUMOR WE'VE BUILT A REGISTRY FOR, AND NOW WE'RE CONSIDERING WHAT THE NEXT ONE SHOULD BE, OR THE NEXT TWO OR THREE. IMMUNE-RELATED ADVERSE EVENTS: THERE WAS A SYMPOSIUM ON APRIL 17, CO-SPONSORED BY PROJECT DATASPHERE AND FDA, TO ADDRESS THE ADVERSE EVENTS OCCURRING WITH CHECKPOINT INHIBITORS, INCLUDING THE EXTRA CHALLENGE OF COMBINATION THERAPY. IN AN OPEN FORUM, WE SHARED SOME DATA THAT HAD NOT BEEN SHARED BEFORE, FROM THE irAE REGISTRY AT MASS GENERAL ON APPROVED CHECKPOINT INHIBITORS. IT WAS CONSOLIDATED, BUT IT WAS STILL A FIRST IN SHARING THAT KIND OF DATA. WE ALSO LOOKED AT THE FAERS DATA, THE ADVERSE EVENT DATA FROM MARKETED PRODUCTS. A BIG PART OF THAT SYMPOSIUM WAS A CALL FOR DATA, A CALL FOR ACTION. THERE ARE QUITE A FEW APPROVED IMMUNOTHERAPIES NOW, AND A LOT OF DATA HAS BEEN GENERATED FROM LOTS AND LOTS OF CLINICAL TRIALS, BUT VERY LITTLE OF IT HAS BEEN SHARED WITH THE SCIENTIFIC COMMUNITY. WE NEED TO DO BETTER THAN THAT, AND THAT WAS ONE OF THE OBJECTIVES OF THE SYMPOSIUM. ONE THING THAT WILL COME OUT OF IT IS A CROWDSOURCED CHALLENGE BASED ON THE FAERS DATA, THE ADVERSE EVENT DATA FOR MARKETED PRODUCTS. THAT ALREADY HAS A PUBLICLY AVAILABLE COMPONENT, THE FAERS DATA, BUT IT'S NOT USED VERY MUCH BECAUSE IT'S BARELY USABLE. SO WE WILL CURATE IT IN A WAY THAT MAKES IT MORE ACCESSIBLE AND RUN A CROWDSOURCED CHALLENGE. AND THEN IMAGING AND ALGORITHMS. 
ONE NOTE ON THE PHRASE LIBRARY LABORATORY: THE PROJECT DATASPHERE PLATFORM, WHICH WE DEVELOPED JOINTLY WITH THE SAS INSTITUTE, THE SOFTWARE COMPANY, IS A PLACE TO FIND THE DATA, AND IT HAS A WORKBENCH SO YOU CAN ANALYZE THE DATA. THE SAS ANALYTIC STACK IS AVAILABLE FOR FREE FOR ANYONE WHO DOES WORK ON THE PLATFORM. ALL OF THESE PROGRAMS ARE IN ONGOING AND FERTILE DISCUSSION WITH THE FDA. THE GOAL HERE, ESPECIALLY IN EXTERNAL CONTROL ARMS AND IN IMAGES AND ALGORITHMS, IS TO INFORM REGULATORY SCIENCE AND ULTIMATELY MAKE CHANGE FOR THE GOOD IN DEVELOPMENT GUIDELINES. OKAY. SO THE IMAGES AND ALGORITHMS RESEARCH PROGRAM: IF I COULD REBRAND IT NOW TO BE A BIT MORE ACCURATE AND FOCUSED, I WOULD PROBABLY CALL IT CAT SCANS AND AUTOMATING RECIST. THAT'S CHALLENGE NUMBER ONE. THIS STARTED FROM A CONVERSATION WITH FDA. THEY LOOKED AT THE CONCORDANCE BETWEEN THE INITIAL QUANTITATIVE RADIOLOGY READS AT THE VARIOUS CENTERS PARTICIPATING IN A CLINICAL TRIAL AND THE CONFIRMATORY READING AT THE END. ON AVERAGE, THE DISCORDANCE IS 30%, AND CAROLINE IS NOT SURPRISED TO SEE THAT. FOR SOME, THE DISCORDANCE IS EXTREMELY HIGH. THERE'S SOME VALUE CREATED BY ADJUDICATION BETWEEN THE FIRST AND SECOND READING, BUT HOW MUCH VALUE IS CREATED THAT WAY IS QUESTIONABLE. AND IT'S SUPER EXPENSIVE TO DO THE CONFIRMATORY READS. IT TAKES A LONG TIME, AND YOU ARE DELAYING REGULATORY DECISIONS ON MEDICINES THAT COULD BE IMPORTANT. SO THE PREMISE IS THAT YOU TAKE THE CAT SCANS AGAINST WHICH THE RECIST PROTOCOL IS APPLIED AND AUGMENT THE CONFIRMATORY READING, AUGMENT THE CONFIRMATORY RADIOLOGIST. IDEALLY WHAT WE'RE DOING IS AUGMENTING THE HUMAN RADIOLOGIST SO THEY ARE NOT DOING THE GRIND. THEY CAN THEN SUPERVISE THAT AND INTERVENE WHEN AN AUTOMATED APPROACH IS UNLIKELY TO BE CORRECT OR ACCURATE. AND REALLY, WE STARTED THE DAY TALKING ABOUT PATIENTS. THIS WOULD TAKE A MEANINGFUL AMOUNT OF TIME AND MONEY OUT OF THE DEVELOPMENT OF CANCER DRUGS, WHICH IS GOOD FOR PATIENTS. 
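AT ITS SIMPLEST, THE DISCORDANCE FIGURE QUOTED ABOVE IS THE FRACTION OF PATIENTS WHOSE RESPONSE CATEGORY DIFFERS BETWEEN THE SITE READ AND THE CONFIRMATORY READ. THOSE SAME CASES ARE THE NATURAL ONES TO ROUTE FOR ADJUDICATION. THIS IS A MINIMAL SKETCH USING HYPOTHETICAL READS IN RECIST CATEGORIES (CR, PR, SD, PD). THE FDA ANALYSIS MAY HAVE DEFINED DISCORDANCE DIFFERENTLY.

```python
from typing import List, Sequence, Tuple


def discordance(local: Sequence[str], central: Sequence[str]) -> Tuple[float, List[int]]:
    """Return the discordance rate and the indices of discordant patients."""
    if len(local) != len(central):
        raise ValueError("reads must be paired patient by patient")
    flagged = [i for i, (a, b) in enumerate(zip(local, central)) if a != b]
    return len(flagged) / len(local), flagged


# Hypothetical best-overall-response reads for ten patients.
site_reads = ["PR", "SD", "PD", "SD", "CR", "PR", "SD", "PD", "SD", "PR"]
central_reads = ["PR", "PD", "PD", "SD", "PR", "PR", "SD", "SD", "SD", "PR"]

rate, to_adjudicate = discordance(site_reads, central_reads)
print(rate, to_adjudicate)  # 0.3 [1, 4, 7]
```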
IT'S GOOD FOR THE WHOLE SYSTEM. NOW, KEY ELEMENTS. I THINK I'LL STAY ON THE FIRST TWO BULLETS FOR A FEW MOMENTS, BECAUSE PEOPLE IN THIS ROOM HAVE A LOT OF GREAT IDEAS ON HOW TO DO THIS. WE LACK THE DATA, BECAUSE WE NEED TO GET IT FROM THE PHARMACEUTICAL COMPANIES SO THAT WE CAN TRAIN A MACHINE LEARNING ALGORITHM THAT CAN BE USED IN THE REGULATORY PROCESS. NOW, WE'RE USED TO HOW CHALLENGING IT IS TO GET PEOPLE TO SHARE THEIR DATA. THAT'S OUR LIFE; THAT'S WHAT WE DO. AND GOING TO IMAGES ADDS ANOTHER LEVEL OF RESISTANCE AND COMPLEXITY COMPARED TO, SAY, CLINICAL OUTCOMES DATA. SO, JUSTIN, WHAT YOU'RE DOING IS AWESOME. I WOULD LOVE TO LEARN FROM YOU SOME IDEAS ON HOW TO MOTIVATE PEOPLE MORE EFFECTIVELY TO SHARE THEIR DATA, BECAUSE YOU ARE OBVIOUSLY BEING VERY SUCCESSFUL AT THAT. AND CAROLINE COVERED THIS BETTER THAN I CAN: RECIST IS GOING TO SOON BE OBSOLETE. IT'S STILL THE STANDARD, SO AUTOMATING IT WILL CREATE TREMENDOUS VALUE. BUT IT'S GOING TO BE OBSOLETE SOON, SO WE NEED TO START WORKING ON THE NEXT GENERATION IN PARALLEL WITH DOING THE BEST WE CAN WITH RECIST. A LITTLE ADVERTISEMENT FOR PROJECT DATASPHERE: IN ADDITION TO THE DATA SHARING PLATFORM AND THE RESEARCH PROGRAMS, ONE OF THE VERY IMPORTANT THINGS WE DO IS CONVENE PEOPLE. WE CAN DO THAT BECAUSE WE'RE NEUTRAL AND NOT FOR PROFIT. WE MAINTAIN ALL OF OUR RELATIONSHIPS ACROSS ACADEMIA, INDUSTRY, AND GOVERNMENT IN A NEUTRAL WAY, AND WE CAN CONVENE MEETINGS QUITE EFFECTIVELY AND GET EVERYBODY TO WORK ON THE PROBLEM AS A TEAM. THE CORNERSTONE OF THIS ROLE IS THE SERIES OF SYMPOSIA THAT WE CO-CONVENE WITH FDA. NUMBER 8 IS COMING UP ON NOVEMBER 11, AND IT WILL BE CENTERED AROUND OUR PROGRAM ON IMAGING AND ALGORITHMS. YEAH, NOVEMBER 11 AT STANFORD. I WANTED TO DO A LITTLE MARKETING TO EVERYBODY. IT'S A FACE-TO-FACE MEETING WITH A RELATIVELY SMALL GROUP. IT WILL BE 100 PEOPLE. 
AND THAT'S GOING TO FILL UP REALLY FAST, BUT WE'RE GOING TO WEBCAST IT, BOTH LIVE AND AFTER THE FACT. SO PLEASE SHARE YOUR INTEREST WITH US. I'M HERE WITH A COLLEAGUE FROM PROJECT DATASPHERE. SINCE THIS IS DAY 4 OF HER EMPLOYMENT WITH PROJECT DATASPHERE, I BROUGHT HER HERE TO SIP FROM A FIRE HOSE OF MEETING PEOPLE AND LEARNING ABOUT THE PROGRAM. SHE HAS A GREAT BACKGROUND IN GENOMICS AT NIH AND AT DUKE, SO WE'RE VERY HAPPY TO HAVE HER. PLEASE LET US KNOW YOUR INTEREST IN THE SYMPOSIUM. THAT MEANS NOT JUST ATTENDING BUT PERHAPS HELPING US COMPLETE THE AGENDA, OR EVEN SUGGESTING SOME SPEAKERS THAT WE CAN'T LIVE WITHOUT. I'LL STOP THERE. TRACK A.T. OR ME DOWN DURING LUNCH OR ANY BREAK. I WOULD LOVE TO TALK TO YOU. THANK YOU. [APPLAUSE] >> I'LL CALL THE SPEAKERS BACK UP FOR THE PANEL, AND WE WILL TAKE SOME QUESTIONS. WE'RE RIGHT ON TIME. THERE ARE TWO MICS SET UP. COME ON UP. >> HI. I HAVE A QUESTION FOR THE PRESENTATION ABOUT THE SEER REGISTRIES. WHERE I WORK, I HAD TO DO A PICTORIAL FOR OUR REGISTRY STAFF. THEY DIDN'T KNOW WHERE TO FIND THINGS IN THE DATABASE. THE OTHER THING THAT'S DIFFICULT IS THAT YOU CAN'T STANDARDIZE OUR CLINIC'S DOCUMENTS ACROSS OTHER PLACES, SO IT'S RELATIVELY IMPOSSIBLE TO TRAIN FOR. I HAD TO SHOW THEM WHERE IN THESE DOCUMENTS YOU COULD FIND THE STRUCTURED DATA. ARE THERE ANY EFFORTS TO TRY TO STREAMLINE WHAT HAPPENS THERE, OR ARE WE RELYING ON EACH CLINIC TO DO THAT TRAINING? >> THANKS FOR THE QUESTION. WE'RE DOING A COUPLE OF DIFFERENT THINGS. WE'VE BEEN REACHING OUT MORE INTO THE ACTUAL COMMUNITY, TO THE HOSPITALS AND CLINICS WHERE WE'RE GETTING THIS INFORMATION. WE OPENED A CHANNEL OF COMMUNICATION FOR REGISTRARS TO REPORT BACK AND LET US KNOW WHAT THE PROBLEM AREAS ARE, AND WE'RE DEVELOPING TRAININGS FOR THE REGISTRARS TO GO TO THE CLINICS TO DO THAT. SO THAT'S ONE THING WE'RE DOING. 
ANOTHER MAJOR THING WE'RE TRYING TO DO IS A LOT OF NATURAL LANGUAGE PROCESSING FOR AUTOMATED TECHNIQUES. WE'RE DOING THAT FOR TWO REASONS. ONE, IT'S REALLY EXPENSIVE TO TRAIN UP A REGISTRAR. TWO, REGISTRARS ARE HUMAN, AND THEY CAN MAKE MISTAKES. THE MORE TRAINED THEY ARE, THE BETTER, BUT THEY ALSO RETIRE. SO WE NEED TO THINK ABOUT WAYS TO AUTOMATE: IF THERE'S A WAY TO AUTOMATICALLY COLLECT SOMETHING THAT IS EASILY CAUGHT, WE SHOULD DO THAT AND HAVE THE REGISTRARS FOCUS ON THE MORE DIFFICULT THINGS. >> ALONG THOSE LINES, THE PROBLEM WITH THESE DOCUMENTS, OF COURSE, IS THAT THEY ARE GOING TO HAVE TO BE SUBJECT TO NATURAL LANGUAGE PROCESSING IF WE WANT TO AUTOMATE THEM. WHAT WOULD MAKE IT EASIER TO AUTOMATE THESE THINGS ACROSS SEVERAL CLINICS IS SOME TYPE OF STANDARD FORMAT FOR THESE DOCUMENTS. I KNOW YOU CAN'T STANDARDIZE EVERYTHING. BUT IF THERE WERE A WAY TO AT LEAST PUBLISH A SET OF RULES ABOUT HOW WE SHOULD STRUCTURE OUR DOCUMENTS, IT WOULD BE EASIER FOR NATURAL LANGUAGE PROCESSING METHODS TO EXTRACT INFORMATION. >> WE ARE TRYING TO DEVELOP THAT. AS YOU WOULD EXPECT, IT'S REALLY, REALLY DIFFICULT, AND TOTALLY DIFFERENT DEPENDING ON WHERE YOU GO, BUT WE ARE ACTUALLY TRYING TO DEVELOP SOME OF THOSE DOCUMENTS. IT'S IN VERY PRELIMINARY PHASES AT THIS POINT, BUT IT IS SOMETHING THAT'S IN THE WORKS. >> THE LAST QUESTION I HAVE IS ABOUT YOUR POINT THAT 95% OF THESE PATIENTS ARE NOT ON CLINICAL TRIALS. THE DIFFICULTY WITH NOT BEING ON CLINICAL TRIALS IS THAT EVEN WITHIN THE SAME INSTITUTION, YOU HAVE THE SAME FRACTION SIZE AND SCHEDULE, AND EVERYTHING LOOKS THE SAME, BUT THE PROCESS USED TO TREAT THAT PATIENT COULD BE DIFFERENT FROM ONE PHYSICIAN TO THE NEXT. SO I WORRY ABOUT THE CONSISTENCY OF THE DATA THAT'S THERE. IT LOOKS THE SAME, BUT THE UNDERLYING PROCESS MAY NOT BE. >> THAT WAS ONE OF THE HUNDREDS OF QUESTIONS WE HAD WHEN WE STARTED TO LOOK AT THE PRELIMINARY DATA FROM OUR FIRST PILOT SITE: EVEN WITH THE SAME MODALITY, THE PROCESS MIGHT BE DIFFERENT. 
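THIS ILLUSTRATES THE AUDIENCE MEMBER'S POINT ABOUT STANDARD DOCUMENT FORMATS. IF REPORTS FOLLOWED A PUBLISHED "FIELD: VALUE" TEMPLATE, EVEN A SIMPLE PATTERN MATCHER COULD PULL OUT REGISTRY ITEMS, WHILE FREE TEXT WOULD STILL NEED REAL NATURAL LANGUAGE PROCESSING. THIS IS A MINIMAL SKETCH. THE TEMPLATE, THE FIELD NAMES, AND extract_fields ARE HYPOTHETICAL AND ARE NOT PART OF ANY SEER STANDARD.

```python
import re
from typing import Dict, Iterable

# One "FIELD: VALUE" pair per line, field names in capitals (assumed template).
FIELD_LINE = re.compile(r"^([A-Z][A-Z /]*?)\s*:\s*(.+?)\s*$", re.MULTILINE)


def extract_fields(report: str, wanted: Iterable[str]) -> Dict[str, str]:
    """Return the first value found for each wanted field in a templated report."""
    wanted_set = set(wanted)
    found: Dict[str, str] = {}
    for name, value in FIELD_LINE.findall(report):
        if name in wanted_set and name not in found:
            found[name] = value
    return found


templated = """RADIATION TREATMENT SUMMARY
SITE: LEFT LUNG
TOTAL DOSE: 60 GY
FRACTIONS: 30
Patient tolerated treatment well."""

free_text = "Treated the left lung to 60 Gy in 30 fractions."

fields = ["SITE", "TOTAL DOSE", "FRACTIONS"]
print(extract_fields(templated, fields))
# {'SITE': 'LEFT LUNG', 'TOTAL DOSE': '60 GY', 'FRACTIONS': '30'}
print(extract_fields(free_text, fields))  # {}
```

THE SAME FACTS WRITTEN AS FREE TEXT YIELD NOTHING HERE, WHICH IS THE GAP A SHARED DOCUMENT STANDARD WOULD NARROW.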
WE HAVE AN ENORMOUS AMOUNT OF DATA TO GO THROUGH TO GET TO THE NUANCES. I DON'T HAVE A CLEAR ANSWER FOR YOU NOW, BUT IT IS ON THE LIST OF THINGS, AND WE REALIZE IT'S GOING TO BE VERY, VERY INDIVIDUAL. >> WE HAVE A PRESENTATION LATER TODAY THAT WILL COMMENT ON THAT EFFECT AS WELL, SO STAY TUNED. >> I HAVE A QUESTION, AND THERE'S A COMMENT FOR DR. CHUNG. FOR DR. DICKER: IT'S GREAT TO GET UNDERGRADS INVOLVED. BUT WHEN WE GO TO MEDICAL SCHOOL AND GET THROWN IN, DO YOU HAVE AN IDEA OF HOW TO KEEP UP THAT CURIOSITY AND THAT MINDSET? IF YOU LOSE FOUR YEARS, THE COMMUNITY WILL HAVE ADVANCED A LOT IN THAT SHORT PERIOD OF TIME. >> YEAH, IT'S A CHALLENGE. WE HAVE A PROGRAM WITH PRINCETON, AND WE'RE TRYING TO DO IT WITH YALE, TO ENCOURAGE STUDENTS COMING FROM A COMPUTATIONAL BACKGROUND. WE ACCEPT THEM EARLY, SO THEY DON'T HAVE TO TAKE THE MCAT, AND WE WANT THEM TO TAKE THE CHALLENGING COURSES THAT COULD AFFECT THEIR GPA. IT'S NOT A BINDING AGREEMENT, AND THEY DON'T HAVE TO COME TO JEFFERSON, BUT WE'VE ATTRACTED SOME STUDENTS THAT WAY. WE REVISITED OUR CURRICULUM TWO YEARS AGO; IT'S NOW CALLED JEFF M.D., WITH CASE-BASED LEARNING. NO ONE VOLUNTEERED TO SAY, THE STUFF I TAUGHT IN THE PAST, WE DON'T NEED TO INCLUDE IN THE CURRICULUM, RIGHT? NO ONE WANTED TO TAKE OUT ANYTHING. EVERYONE WANTED TO PUT SOMETHING IN, INCLUDING YOURS TRULY. I DON'T HAVE A GOOD SOLUTION. A QUASI-SOLUTION IS A GAP YEAR WITHIN MEDICAL SCHOOL: MEDICAL STUDENTS EITHER DO A Ph.D., OR A ONE-YEAR MASTER'S IN DATA SCIENCE OR INFORMATICS, OR NO DEGREE BUT JUST A BODY OF WORK. SOME SCHOOLS ARE MORE ENLIGHTENED THAN OURS, MAYBE YALE, WHERE ONE DOES A THESIS AND IT'S MORE PASS/FAIL. I DON'T KNOW THAT ANYONE HAS FIGURED THIS OUT, BUT I EMPATHIZE WITH YOU. PEOPLE ARE TOWARD THE PEAK OF WHATEVER THEY LEARNED AS UNDERGRADS, IN CODING, INFORMATICS, AND THE MATH PIECE. THEN THEY GO INTO THIS TIME WARP OF MEDICAL SCHOOL, AND WHEN THEY COME OUT IT'S WORSE BECAUSE THEY GO INTO CLINICAL TRAINING. 
THERE NEEDS TO BE AN OVERHAUL OF WHAT'S NECESSARY FOR MEDICAL EDUCATION, BUT THERE'S A LOT OF AGITA ABOUT HOW TO FIGURE THAT OUT. SORRY. >> A COMMENT FOR DR. CHUNG ON FEATURE SELECTION. THAT'S A PROBLEM THAT I PERSONALLY RUN INTO, AND OTHER PEOPLE RUN INTO IT TOO. RECENTLY I HEARD A TALK FROM A COMPUTER SCIENTIST AT GOOGLE WHO TALKED ABOUT STABILITY OF MODELS, AND HE BROKE IT DOWN INTO VARIANCE IN THE DATA AND VARIANCE IN THE MODEL. FOR DATA STABILITY, VARIANCE COULD COME FROM HOW YOU SPLIT YOUR DATA FOR CROSS-VALIDATION: RUN IT AGAIN, SPLIT IT DIFFERENTLY, AND THERE'S VARIANCE THERE. FOR MODEL VARIABILITY, WE THINK OF INTUITING A MECHANISM, TRYING TO FIND THE FEATURES THAT ARE MOST IMPORTANT. BUT FEATURE SELECTION COMPRESSES THE DATA, LIKE DIMENSIONALITY REDUCTION, AND THE REMAINING FEATURES HOLD IMPLICIT INFORMATION FROM THE FEATURES THAT WERE REMOVED. I THINK WHEN WE HAVE ALL THESE VERY COMPLEX DATASETS, YOU CAN HAVE DIFFERENT MODELS THAT GET YOU TO POTENTIALLY THE SAME ANSWER, DIFFERENT PATHS CONVERGING ON THE SAME SOLUTION. IT'S JUST SOMETHING TO THINK ABOUT: ARE YOU TRYING TO FIND MECHANISMS, OR ARE YOU TRYING TO DO PREDICTION? THE SELECTION MIGHT NOT BE AS IMPORTANT AS THE MECHANISTIC UNDERPINNINGS. >> SOME OF THE EXAMPLES WHERE VOLUME WAS REALLY DRIVING THE SIGNATURE, THAT WAS MY MAIN POINT. I WASN'T SUGGESTING FEATURE SELECTION WAS A BETTER APPROACH. I THINK YOU MAKE A GOOD POINT. HAVING SAID THAT, WHAT THE MODEL IS INTENDED TO BE USED FOR, ITS FINAL APPLICATION, IS SOMETHING WE NEED TO CONSIDER. SIMILARLY, AS I SAID AT THE VERY END, WHEN WE TALK ABOUT IMPLEMENTING A MODEL IN THE REAL WORLD, TESTING ITS ROBUSTNESS AGAINST VARIABLE DATA MIGHT BE THE WAY WE CAN FIND MODELS THAT ARE CLINICALLY USABLE. AGAIN, THAT'S THE USE CASE. IF YOU HAVE A USE CASE OF DRILLING DOWN INTO MECHANISM FOR RESEARCH PURPOSES, THAT'S ONE APPROACH, AND YOU MAY HAVE A DIFFERENT MODEL. 
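THE "DATA STABILITY" POINT ABOUT RE-SPLITTING FOR CROSS-VALIDATION CAN BE SEEN DIRECTLY. RUN THE SAME MODEL ON THE SAME DATA WITH DIFFERENT RANDOM FOLD ASSIGNMENTS, AND THE CROSS-VALIDATED SCORE MOVES AROUND. THIS IS A MINIMAL SKETCH USING ONLY THE STANDARD LIBRARY. THE SIMULATED FEATURE, THE NEAREST-CENTROID CLASSIFIER, AND ALL FUNCTION NAMES ARE STAND-INS CHOSEN FOR ILLUSTRATION, NOT ANY MODEL DISCUSSED AT THE MEETING.

```python
import random
import statistics
from typing import List, Sequence


def kfold(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle indices 0..n-1 with a given seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def centroid_accuracy(x: Sequence[float], y: Sequence[int],
                      train: Sequence[int], test: Sequence[int]) -> float:
    """Fit a 1-D nearest-centroid classifier on train, score accuracy on test."""
    c0 = statistics.mean(x[i] for i in train if y[i] == 0)
    c1 = statistics.mean(x[i] for i in train if y[i] == 1)
    hits = sum(1 for i in test
               if (abs(x[i] - c1) < abs(x[i] - c0)) == (y[i] == 1))
    return hits / len(test)


def cv_accuracy(x: Sequence[float], y: Sequence[int], k: int, seed: int) -> float:
    """Mean accuracy over k folds for one particular random split."""
    folds = kfold(len(x), k, seed)
    scores = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(centroid_accuracy(x, y, train, test))
    return statistics.mean(scores)


# Simulated single feature: class 1 is shifted by one standard deviation.
rng = random.Random(42)
y = [0] * 20 + [1] * 20
x = [rng.gauss(float(label), 1.0) for label in y]

runs = [cv_accuracy(x, y, k=5, seed=s) for s in range(10)]
print(f"mean {statistics.mean(runs):.3f}, spread {min(runs):.3f}-{max(runs):.3f}")
```

THE DATA AND THE MODEL ARE HELD FIXED, SO ANY SPREAD BETWEEN THE MINIMUM AND MAXIMUM ACROSS SEEDS COMES PURELY FROM HOW THE DATA WERE SPLIT.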
FOR CLINICAL IMPLEMENTATION OF RADIOMICS IN THE REAL WORLD, WE DEFINITELY NEED TO EVALUATE THE ROBUSTNESS OF THE MODEL ACROSS DATA OF VARIABLE QUALITY. >> TWO MORE QUESTIONS OVER HERE AND THEN WE'LL FINISH UP. >> HI, I'M FROM STONY BROOK. THANK YOU TO THE PANELISTS. MY QUESTION IS FOR DR. CHUNG. MY INSTITUTION, LIKE EVERYONE HERE, PROBABLY HAS ONGOING TRIALS INCORPORATING RADIOMICS, AND YOUR TALK ABOUT PITFALLS AND QUALITY ASSURANCE WAS PERTINENT. I'M CURIOUS WHAT YOU'RE DOING FOR YOUR CURRENT CROP OF PATIENTS ENROLLING ON STUDIES TO ENSURE THE IMAGE ACQUISITION IS OF A CERTAIN QUALITY. ARE THERE METRICS YOU'RE USING, OR ACTION ITEMS WE CAN USE NOW TO IMPROVE OUR ACQUISITION? >> WITHIN RADIATION ONCOLOGY AT OUR INSTITUTION, ONE OF THE THINGS WE'VE DONE IS WORK CLOSELY WITH DIAGNOSTIC RADIOLOGY. TOGETHER WE'VE STANDARDIZED AND IMPLEMENTED IMAGE ACQUISITION FOR CLINICAL PURPOSES, FOR ALL CLINICAL PATIENTS. FOR INSTANCE, WE'VE PUSHED TO HAVE THE CLINICAL TRIAL PROTOCOLS IMPLEMENTED AS PART OF CLINICAL PRACTICE SO THERE ISN'T VARIABILITY. OTHERWISE, YOU SAY FOLLOW THIS CLINICAL TRIAL PROTOCOL, SOME TECH HITS THE WRONG BUTTON, AND NOW THEY ARE DOING THE STANDARD CLINICAL ACQUISITION. SO THOSE TRANSITIONS WILL HELP. WE'VE TRIED TO STANDARDIZE, AND WE'RE WORKING TO STANDARDIZE ACROSS ALL THE INSTITUTIONS. >> GREAT. THANK YOU. >> DR. FULLER? >> PROJECT DATASPHERE'S AND NRG'S COMMITMENT TO GETTING THESE DATA SETS INTO THE PUBLIC IS A REAL INSPIRATION, ESPECIALLY THE COMMITMENT TO GETTING THEM TO THE PUBLIC IN A RAPID MANNER. MY QUESTION: BOTH PROJECT DATASPHERE AND THE NCI DATA SHARED ON THAT PLATFORM HISTORICALLY DON'T HAVE IMAGING OR RT DATA. IS THERE AN AVENUE WHERE THAT DATA COULD BE SCRAPED, EITHER FROM NRG SITES OR FROM THE INDUSTRY SPONSORS, BY SOME MECHANISM? THAT WAY REPOSITORIES WOULD BE AVAILABLE AS WE'RE ACCRUING THE NEW STUFF. >> LET'S START WITH BILL. 
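ONE CONCRETE FORM OF THE ACQUISITION QA DISCUSSED ABOVE IS TO CHECK THE HEADER VALUES OF EACH INCOMING SERIES AGAINST THE PROTOCOL BEFORE THE IMAGES ARE USED. THAT WOULD CATCH THE "WRONG BUTTON" CASE. THIS IS A MINIMAL SKETCH IN WHICH THE HEADERS ARE PLAIN DICTIONARIES. IN PRACTICE THEY MIGHT BE READ FROM DICOM ATTRIBUTES SUCH AS SliceThickness, RepetitionTime AND EchoTime. THE PROTOCOL TARGETS AND TOLERANCES ARE HYPOTHETICAL, NOT ANY INSTITUTION'S ACTUAL PROTOCOL.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical protocol: attribute -> (target value, allowed absolute deviation).
# Units: slice thickness in mm, TR and TE in ms.
PROTOCOL: Dict[str, Tuple[float, float]] = {
    "SliceThickness": (1.0, 0.1),
    "RepetitionTime": (2000.0, 50.0),
    "EchoTime": (3.0, 0.5),
}


def check_acquisition(header: Mapping[str, float],
                      protocol: Mapping[str, Tuple[float, float]] = PROTOCOL) -> List[str]:
    """List human-readable deviations of one series' header from the protocol."""
    deviations = []
    for key, (target, tol) in protocol.items():
        value = header.get(key)
        if value is None:
            deviations.append(f"{key}: missing")
        elif abs(value - target) > tol:
            deviations.append(f"{key}: {value} (protocol {target} +/- {tol})")
    return deviations


on_protocol = {"SliceThickness": 1.0, "RepetitionTime": 2000.0, "EchoTime": 3.2}
wrong_button = {"SliceThickness": 5.0, "RepetitionTime": 2000.0}

print(check_acquisition(on_protocol))   # []
print(check_acquisition(wrong_button))
# ['SliceThickness: 5.0 (protocol 1.0 +/- 0.1)', 'EchoTime: missing']
```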
>> SO WE HAVE UPGRADED THE PLATFORM WITH OUR PARTNERS AT SAS SO WE CAN NOW MANAGE THE VOLUMES THAT COME WITH IMAGES. WE'VE DONE THAT SORT OF INFRASTRUCTURE WORK, AND WE'RE JUST NOW GETTING SOME IMAGING ONTO THE PLATFORM. BUT I DID WANT TO FOLLOW UP WITH JUSTIN. WE HAVE A PLATFORM GIVING MORE EXPOSURE TO ITS NCI DATA, AND I GUESS I LEAPT TO AN ASSUMPTION THAT WE WERE GOING TO ACHIEVE THIS FOR TCIA AS WELL. DO I HAVE THE ACRONYM RIGHT? YEAH. SO THE QUESTION I HAVE IS TIMING. IN OTHER WORDS, SHOULD WE WAIT UNTIL YOU'VE FIGURED OUT THE LINKAGE BEFORE WE START PUTTING IN THE IMAGING DATA? OR SHOULD WE START MOVING IMAGING DATA NOW AND THEN RETROFIT THE LINKAGE? THAT PROBABLY REQUIRES AN OFFLINE CONVERSATION. THAT'S MY VIEW. WE'LL GET THERE; THERE WILL BE A FLOW OF IMAGING DATA AS WELL. >> YEAH, I THINK ECOG-ACRIN HAS A BETTER HANDLE ON THIS THAN WE DO CURRENTLY. ARE WE OPEN TO DATA SCRAPING? SHE SHAKES HER HEAD. I THINK THERE'S A WAY TO CREATE A PROCESS. PATIENTS DON'T APPRECIATE WHERE THE DATA IS. IT'S ONE THING TO DO GERMLINE TESTING. I HAVE A COLLEAGUE AT DANA-FARBER AND THE BROAD INTERESTED IN METASTATIC PROSTATE CANCER, WHERE YOU MAIL A KIT AND THE PATIENT SPITS IN A TUBE; THAT'S CITIZEN SCIENCE. IT'S HARD WITH IMAGING DATA. PATIENTS DON'T HAVE A TERABYTE THUMB DRIVE, OR SAY HERE'S MY FTP ADDRESS. IT DOESN'T WORK THAT WAY. THAT'S WHAT MAKES IMAGING DATA DIFFICULT. THE PATIENT OWNS THE DATA; THAT'S MY FUNDAMENTAL BELIEF. I DON'T THINK DATA SCRAPING IS THE SOLUTION, BUT THERE CAN BE AN APPROACH. WE'D LIKE TO LEARN FROM ECOG-ACRIN, WHICH IS AHEAD AND GETS MORE SUPPORT, AND WE CAN TALK OFFLINE ABOUT THE IROC PERSPECTIVE. >> I WAS THRILLED WITH DR. CHUNG'S PRESENTATION EMPHASIZING THE IMPORTANCE OF STANDARDIZATION. IT MATCHES WHAT WE'RE DISCOVERING AS WE LOOK AT HEAD AND NECK DATA ACROSS INSTITUTIONS AND HOW THAT FEEDS INTO SEGMENTATION. WE ALL KNOW THE RUBBER MEETS THE ROAD WITH THE CLINICIAN AND WHAT HAPPENS THERE. WHEN I LOOKED AT YOUR PRESENTATION, DR. 
LAM, PART OF THE PROBLEM IS THAT THE MANUFACTURERS' GOALS ARE DIFFERENT. TAKE TRYING TO UNDERSTAND TOTAL DOSE, PARTICULARLY WHEN YOU WANT TO SPAN A PLAN, A PLAN REVISION, AND A BOOST: THEY ARE NOT ORIENTED TO DO THAT. IT ENDS UP THAT THE END USERS SORT OUT WAYS TO USE THAT TECHNOLOGY. THAT MEANS THAT AS YOU'RE TALKING ABOUT STANDARDIZATION, ENGAGING THE END USERS HAS TO BE ADDRESSED. YOU'RE NOT JUST TAKING IT AT THE TOP LEVEL. HOW DO WE MAKE THAT MORE ROUTINE, SO THAT IT DOESN'T HAPPEN AS AN EXCEPTION? YOU ENGAGE WITH THE END USERS WHO KNOW HOW THAT'S GOING TO UNFOLD IN THE CLINIC, BUT YOU DO IT UP FRONT, LIKE YOU'RE DOING WITH SEGMENTATION. >> SO I ACTUALLY DID INVITE REPRESENTATIVES FROM VARIAN AND ELEKTA. DO WE HAVE ANYONE? OKAY, WE CAN SAY WHATEVER WE WANT ABOUT THEM. [LAUGHTER] I WANTED YOU TO TAKE AWAY FROM THE MEETING THAT ENGAGEMENT CAN GO A LONG WAY IN CONVERSATIONS WITH THE VENDORS. WE CAN TELL THEM THAT THESE TYPES OF EXTRACTION EFFORTS ARE IMPORTANT, AND SO IS PUTTING RESOURCES AT THE INDUSTRY LEVEL INTO DOING THEM IN A MORE GRANULAR WAY. WE MAY HAVE A MANDATE TO DO THAT. SO I THINK ONE THING EVERYBODY CAN WALK AWAY WITH TODAY IS THAT YOU DO HAVE THAT POWER, SO USE IT, BECAUSE THEY WILL LISTEN. >> RAISE YOUR HAND FROM ASTRO. SANYA IS INVOLVED IN THE RO-ILS PROGRAM, THE INCIDENT LEARNING SYSTEM. WE HAVE THESE WEBINARS WITH THE VENDORS BECAUSE WE'RE PULLING DATA FROM THOUSANDS OF SITES ON THE MISTAKES PEOPLE ARE MAKING IN DELIVERING RADIATION. A LOT IS PROCESS, A LOT IS HUMAN BEHAVIOR, AND COMMUNICATION IS PROBABLY ONE OF THE TOP ONES. WE GET SUPPORT FROM THE VENDORS, BUT THEY DON'T HAVE ACCESS, BECAUSE IT'S A FEDERALLY PROTECTED SPACE DEFINED BY AHRQ. I AM ROTATING OFF AS CHAIR THIS YEAR, BUT WE DO HAVE SOME HIGH-LEVEL PEOPLE INVOLVED, INCLUDING THE PRESIDENT OF VARIAN. LAST TIME WE SHOWED DE-IDENTIFIED DATA. SO TALK TO SANYA, FRAME IT WITHIN SAFETY, AND SANYA WILL USE THAT. 
IT'S JUST A DIFFERENT WAY TO FRAME THE DATA ELEMENT ISSUE, BUT IF YOU PUT IT IN THE BOX OF SAFETY AND QUALITY, IT MAY RESONATE BETTER, OR IT MAY NOT. >> YEAH, I THINK I'M ASKING THAT YOU BE A BIT MORE PROACTIVE. THERE ARE STANDARDIZATION GROUPS IN AAPM AND INSIDE ASTRO TRYING TO WORK ACROSS ORGANIZATIONS. RATHER THAN WAITING FOR THEM TO COME TO YOU, MAYBE WE SHOULD TALK TO THEM TOO. WE DO A LOT OF THAT AS WE'RE BUILDING UP DATA SCIENCE EFFORTS. I THINK TRYING TO FIND WHO THE STAKEHOLDERS ARE AND BEING MORE PROACTIVE IS A GOOD THING. >> I WANT TO THANK THE PANELISTS FOR A GREAT SESSION 2. WE'LL BREAK FOR LUNCH AND PLAN ON MEETING BACK AROUND 1:15. THANK YOU, EVERYBODY. >> WE'LL GET STARTED WITH OUR THIRD SESSION, AI APPLICATIONS IN RADIATION ONCOLOGY. I USE THE TERM LOOSELY. I PREFER "ALGORITHMS," AND AS WE HAVE THE AFTERNOON TALKS YOU'LL SEE WHY. I WANT TO FIRST INTRODUCE DAVE FULLER. DAVE IS AN ASSOCIATE PROFESSOR IN THE DEPARTMENT OF RADIATION ONCOLOGY AND ASSOCIATE DIRECTOR OF MR PROGRAMMATIC DEVELOPMENT AT M.D. ANDERSON CANCER CENTER. FOR THOSE WHO DON'T KNOW HIM, DAVE IS PASSIONATE ABOUT EDUCATIONAL INITIATIVES IN RADIATION ONCOLOGY, SPECIFICALLY FOCUSED AROUND DATA. SO I WANT TO KICK THINGS OFF WITH HIM, AND HE WILL MODERATE THE SESSION. DAVE, WELCOME. >> THANKS, ANSHU. THANKS, KEVIN AND TEAM, FOR PUTTING THIS TOGETHER. I'M EXCITED TO BE HERE. IT'S GREAT TO SEE A TON OF FACES. I FEEL LIKE I'M PREACHING TO THE CHOIR. I'LL COVER MY SLIDES QUICKLY. THERE'S A COPY OF THE PRESENTATION, SO IF THERE'S SOMETHING YOU MISSED, FEEL FREE TO DOWNLOAD IT AND USE IT AT WILL. I'D LIKE TO THANK OUR GENEROUS FUNDERS, INCLUDING NCI, WHO HAVE SUPPORTED SOME OF THE WORK YOU SEE HERE. ONE OF THE MOST FORMATIVE THINGS FOR ME WAS AN NCI-NSF INITIATIVE TO GET JUNIOR PHYSICIANS IN CONTACT WITH COMPUTATIONAL SCIENTISTS. I BENEFITED FROM IT THROUGH A SLEEPAWAY CAMP FOR GEEKS. IT FUNDED A BUNCH OF US WHO HAD EARLY CAREER AWARDS FROM NCI, ALONG WITH NSF FOLKS WHO HAD EARLY CAREER AWARDS BUT NO CANCER OR NIH EXPERIENCE. 
THAT LED TO THE FIRST TWO R01s I WAS FORTUNATE ENOUGH TO PARTICIPATE ON, WITH THIS TEAM, WHO I STILL WORK WITH TODAY. HOPEFULLY THAT SHOWS WHY, AT A PERSONAL LEVEL, I'M SO GRATEFUL FOR AND EXCITED ABOUT THESE KINDS OF EDUCATIONAL INITIATIVES. WE ALL KNOW A.I. IS SUPER HOT, BUT A.I. EXISTS WITHIN THE SAME HIERARCHY OF DATA SCIENCE NEEDS YOU SAW BEFORE. A GREAT PAPER CAME OUT IN JAMA FROM THE U.T. SCHOOL OF BIOMEDICAL INFORMATICS GROUP ON DATA OF VARYING DEGREES OF SCALE AND GRANULARITY THAT DRIVES BIOMEDICAL RESEARCH. LIKEWISE, THERE'S A MOVE TOWARD NEW MODELS. AS CHUCK ELOQUENTLY PUT IT IN THE QUESTION SERIES, THE FIELD IS MOVING TOWARD REAL-WORLD DATA THAT'S ALGORITHMIC AND POTENTIALLY NON-MECHANISTIC, VERSUS THE CLASSIC CLINICAL TRIAL APPROACH. THESE TWO PARADIGMS ARE COMPLEMENTARY BUT DISTINCT IN WHAT THEY ARE AIMING FOR. THE PROBLEM IS THAT, AS BIOMEDICAL SCIENTISTS AND PHYSICIANS, WE ALMOST NEVER RECEIVE ANY SIGNIFICANT TRAINING IN THESE NEWER ALGORITHMIC APPROACHES. IN WHAT LEO BREIMAN CALLED THE TWO CULTURES OF STATISTICS, WE'VE BEEN FIRMLY IN THE NULL HYPOTHESIS SIGNIFICANCE TESTING CAMP AND MOVED AWAY FROM THESE PROBABILISTIC MODELS. THAT'S SOMETHING WE HAVE TO RECTIFY, ESPECIALLY SINCE WE'RE ALREADY SEEING THAT A.I. IS GOING TO BE POTENTIALLY TRANSFORMATIVE AND DISRUPTIVE IN RADIATION ONCOLOGY. FOLKS LIKE REID THOMPSON FROM OHSU HAVE WRITTEN A SERIES OF REALLY GREAT VISION-SETTING ARTICLES. THEY MAKE CLEAR THIS IS BOTH A SIGNIFICANT POTENTIAL PITFALL AND A SIGNIFICANT OPPORTUNITY SPACE. NOVEL TECHNOLOGIES WILL BE PART OF THE FUTURE. DO WE HAVE CLINICIAN-SCIENTISTS TO DRIVE THE PROGRAMS? IT'S AN INTERDISCIPLINARY NEED, NOT JUST RADIATION ONCOLOGISTS DOING WHAT WE'VE ALWAYS DONE, SLIGHTLY BETTER. YOU CAN'T JUST TACK ON A LITTLE BIT OF A.I. COURSEWORK IN A ONE-HOUR PERIOD AND KNOW WHETHER SOMEONE'S MODEL IS GOOD ENOUGH WHEN YOU GET IT TO REVIEW FOR THE RED JOURNAL. 
THIS CREATES BIG PROBLEMS, BECAUSE WE DON'T HAVE SUFFICIENT REFEREES TO CURATE THE EXISTING MODELS COMING IN, MUCH LESS HONE THEM TO BE ROBUST IN REAL-WORLD DATA. A CALL TO ARMS IS WHERE WE ARE AT THE MOMENT. I'LL START AGAIN AT A BASIC LEVEL, BECAUSE IT'S IMPORTANT FOR US TO THINK OF THESE THINGS DEFINITIONALLY. INFORMATICS IS THE DISCIPLINE I'M GOING TO REFER TO: THE SCIENCE OF PROCESSING DATA. ARTIFICIAL INTELLIGENCE IS THE DEVELOPMENT OF COMPUTER SYSTEMS THAT PERFORM HUMAN-LIKE TASKS, OFTEN IN AUTOMATED WAYS. BIG DATA IS THE INPUT, THE RAW MATERIAL, IF YOU WILL, AND MACHINE LEARNING IS THE SET OF METHODOLOGIC APPROACHES. WHEN YOU HEAR ME SAY A.I., AT LEAST IN MY SPECIFIC TERMINOLOGY, I'M TALKING ABOUT THE RESEARCH DOMAIN. MACHINE LEARNING IS THE ACTUAL IMPLEMENTATION, THE SKILL SET AND PRACTICE, RIGHT? LOOK AT THE SENTENCE AT THE BOTTOM: MACHINE LEARNING APPROACHES USED FOR BIG DATA. IT BECOMES A CONCEIVABLY INFORMATIVE SENTENCE IF YOU USE THESE STRUCTURES. AS A DISCIPLINE, INFORMATICS IS COALESCING. FOR THOSE OF YOU JUST GETTING INTO THE SPACE AS A BIG DATA SCIENTIST OR A MEDICAL PHYSICIST, THERE'S A WHOLE WORLD OF INFORMATICS FOLKS. YOU'VE SEEN NCI REPRESENTATIVES AT THE APEX OF IT, WHERE THE FIELD IS BECOMING HIGHLY PROFESSIONALIZED AND AGGREGATED. IN 1995 THERE WAS A SYSTEM WITH RESOURCES FOR OPTIMIZING STORAGE OF BIOMEDICAL INFORMATION. THEN ELMER BERNSTAM, IN HOUSTON WITH US, DESCRIBED MOVING FROM SYMBOLIC SYSTEMS WITH A LACK OF UNIFORMITY, AND YOU'VE HEARD ABOUT THE NEED FOR THAT LEVEL OF STANDARDIZATION. THE MOVE IS TOWARD INFORMATION, I.E. DATA PLUS AN ASSIGNED MEANING, AN ONTOLOGY IN MOST CASES, AND ON TO ACTUAL KNOWLEDGE, WHICH IS WHAT WE'RE APPROACHING IN OUR RAD ONC SPACE. INTERESTINGLY, I FOUND THIS THROUGH IRA KALET AT THE UNIVERSITY OF WASHINGTON. HE BUILT THE FIRST VOLUMETRIC SYSTEM, PRISM, AND WROTE THE BOOK WE USED FOR BIOMEDICAL INFORMATICS. ALL THE EXAMPLES WERE RADIATION ONCOLOGY EXAMPLES, WITH TONS OF INFORMATION ABOUT DICOM. WE WERE THERE AT THE GENESIS, EVEN IF WE MAY NOT HAVE BEEN AWARE OF IT. 
WHAT'S REALLY HELPED IS THAT INFORMATICS IS NOW ALSO A BOARD-CERTIFIED CLINICAL SUBSPECIALTY. EVERYONE HERE WHO IS A PHYSICIAN COULD IN THEORY BECOME A CLINICAL INFORMATICIAN. THERE'S A GRANDFATHER PERIOD THAT LASTS FOR ANOTHER 2 1/2 OR 3 YEARS, SO IF YOU WANT TO DO THIS, I HIGHLY ENCOURAGE YOU TO DO IT NOW, BECAUSE AFTER THAT GRANDFATHER PERIOD YOU WILL HAVE TO GET FORMAL SUBSPECIALTY CERTIFICATION THROUGH A TRAINING PROGRAM. NOW IS THE WINDOW OF OPPORTUNITY TO GET IN ON THE GROUND FLOOR. SUPER COOL. NOW, IS EVERYONE GOING TO BECOME A CLINICAL INFORMATICIAN? NO, BUT IT WOULD BE NICE IF A FEW OF US DID. AS FOR THE FIELD OF INFORMATICS, IT'S IMPORTANT TO REALIZE IT'S A DIVERSE FIELD WITH OVERLAPPING AREAS. A GOOD FRIEND AT OHSU WORKS IN THE BIOINFORMATICS SPACE; I'M AN IMAGING INFORMATICIST. THERE ARE FIELDS THAT CROSS OVER, LIKE BIOMOLECULAR IMAGING AND PHARMACOGENETICS, AND THERE'S CONSUMER HEALTH OR HEALTH SERVICES RESEARCH, WHICH, AS DR. LAM SHOWED, IS VERY IMPORTANT IN TERMS OF THESE BOUNDARIES ACROSS SCALE FOR HEALTH SERVICES INFORMATION. THIS HEALTH INFORMATICS KIND OF APPROACH IS IMPORTANT BECAUSE THESE ARE DISTINCT STRANDS OF THE DISCIPLINE. IT'S NOT LIKE THERE'S A SINGLE HOUR OF INFORMATICS YOU CAN GET AS A TRAINEE THAT WILL CAPTURE EVERYTHING FROM GENOMIC SORTING ALL THE WAY THROUGH HEALTH SERVICES DATA CURATION WITH SEER. THEY ARE FAIRLY SPECIFIC SUBGENRES. WHERE IS THIS IMPORTANT? AS WE'VE SEEN FROM THIS CALL-TO-ARMS PROPOSAL ABOUT OUR NEED FOR TECHNOLOGY DEVELOPMENT: NUMBER ONE, WE'VE GOT TO INTEGRATE RADIATION ONCOLOGY DATABASES ACROSS SYSTEMS. NUMBER TWO, TOOLS HAVE TO BE AVAILABLE FOR PATIENT DECISION MAKING AND TREATMENT CHOICES; DECISION MAKING IS THE KEY END POINT OF MOST INFORMATICS PROCESSES. NUMBER THREE, EXPERTISE IN DEVELOPING PROFESSIONALS AND IN MODELING BECOMES CRITICAL. I'M GOING TO FOCUS ON NUMBER THREE, WHICH ADAM LED INTO. RADIATION ONCOLOGY PROFESSIONALS NEED TO BE DEVELOPED FULLY. RIGHT NOW, ESSENTIALLY, WE HAVE AD HOC TRAINING BY PEOPLE WHO LIKE COMPUTERS AND BY INTERESTED MEDICAL PHYSICISTS. 
THAT'S ABOUT IT. THAT'S THE STATE OF OUR FIELD AT THE MOMENT. AND EVEN THIS PROPOSAL KIND OF SAYS, HEY, THAT'S PROBABLY WHAT YOU'RE GOING TO START WITH: A FEW GUYS WHO LIKE COMPUTERS SNEAK INTO YOUR RESIDENCY PROGRAMS. YOU'VE SEEN THIS DATA FROM ADAM ABOUT SELF-IDENTIFIED UNMET NEED. IF YOU LOOK HERE AT THE BLUE BAR FOR INFORMATICS, TRAINING IS RATED ABSOLUTELY OR MODERATELY INSUFFICIENT IN THE VAST MAJORITY OF CASES. WE'RE REALLY NOT TEACHING THIS, RIGHT? NO ONE THINKS WE'RE DOING A GOOD JOB TEACHING INFORMATICS TO PHYSICIAN TRAINEES, YET EVERYONE THINKS IT'S IMPORTANT FOR ADVANCING THEIR CAREER. WE SAY THIS IS IMPORTANT, BUT WE'RE NOT DOING IT. KUDOS TO ADAM AND KENT FOR REALLY WAVING THE FLAG. THERE'S GREAT INTEREST IN FORMAL COURSEWORK; EVERYBODY IS INTERESTED IN IT AND TRYING TO SQUEEZE IT INTO THE RAD ONC CURRICULUM. IN OUR CURRICULUM WE HAVE ONE HOUR OF INFORMATICS, ONE WHOLE HOUR, WHICH I TEACH. WE DID AN INFORMAL POLL TO TEST INFORMATICS BACKGROUND. ZERO OF 21 COULD IDENTIFY THE FAIR GUIDING PRINCIPLES, WHICH ARE ESSENTIALLY NON-NEGOTIABLE NOW AS THE NEW STANDARD IN SCIENCE. OKAY? WE'LL TALK ABOUT THAT IN A MOMENT. ONLY 4 OF 21 COULD MAP CONCEPTS TO A STANDARD ONTOLOGY OR SYNTAX APPROPRIATELY. 4 OF 21 HAD HEARD OF SNOMED; THEY COULDN'T IDENTIFY IT, BUT THEY HAD HEARD THE TERM. 3 OF 21 COULD IDENTIFY HL7. EVERYBODY KNEW WHAT DICOM WAS, BUT NOT AS A STANDARD. SO WHAT YOU'VE GOT ARE END USERS, NOT PRACTITIONERS WHO UNDERSTAND THE MECHANICS OF WHAT'S GOING ON. SO, THE FAIR GUIDELINES: IF WE PURSUE THEM, AS NCI AND NIH ARE SUGGESTING, ALL OF OUR DATA IS SOON GOING TO HAVE TO BE FINDABLE, ACCESSIBLE, INTEROPERABLE AND REUSABLE, AND WE HAVE TO HAVE PRACTITIONERS WHO UNDERSTAND WHAT THAT MEANS OR CAN RECOGNIZE IT. WHEN YOU'RE PUTTING TOGETHER YOUR MANUSCRIPT OR YOUR PROTOCOL, HOW WILL YOU MAKE YOUR DATA MACHINE-FINDABLE? 
THIS IS NOT A TASK WHERE YOU CAN SAY AT THE END OF THE STUDY, I GUESS I'LL DITCH IT TO THE PHYSICIST. OTHERWISE WE END UP WITH WHAT WE TALKED ABOUT WITH PROJECT DATA SPHERE: GREAT DATA, BUT NO IMAGES, OR WE DIDN'T BUILD IN THE INFORMATICS STRUCTURES. THE OTHER THING IS THAT WE WERE INSPIRED BY THE FACT THAT NCI AND KEVIN CAME UP WITH THE DATA SCIENCE FELLOWSHIP IN 2017, A FORMAL EFFORT, AS YOU'VE SEEN FROM ANSHU, TO CREATE A PROGRAM FOCUSED ON INFORMATICS AND ON TRAINING THESE FOLKS. WE TRIED TO RECRUIT PEOPLE FROM M.D. ANDERSON AND DIDN'T HAVE ANY LUCK, SO WE DECIDED TO TRY TO HOME-GROW OUR OWN PROGRAM, WHICH WAS FORTUNATELY FUNDED BY NIBIB. WE GOT THE LAST OF THE R25s FOR A RESIDENT TRAINING PROGRAM IN IMAGING INFORMATICS. WE HAD OUR FIRST YEAR LAST YEAR. WE'VE HAD TWO FELLOWS, AND IT'S BEEN A POSITIVE EXPERIENCE. THE GOAL HERE IS TO TAKE THESE FELLOWS AND GIVE THEM FORMAL INFORMATICS AND FORMAL IMAGING EXPOSURE, AND WAYS TO POTENTIATE THAT. THEY WILL NOT GRADUATE AS AN INFORMATICIAN OR AN IMAGING SCIENTIST, BUT THEY WILL HAVE SUFFICIENT DEPTH TO UNDERSTAND THE CLINICAL PROBLEMS AND MAKE THEM INTELLIGIBLE IN A MEANINGFUL WAY. IT'S STRUCTURED IN THE CONTEXT OF OUR RESIDENCY TRAINING PROGRAM, WITH SEVERAL TRACKS ONE CAN TAKE, SO EARLY-STAGE RESIDENTS AS WELL AS FELLOWS HAVE THE CAPACITY TO PARTICIPATE. AFTER YOU GRADUATE, YOU'RE AT A HUGE DISADVANTAGE FOR CAREER DEVELOPMENT AWARDS. WE WANT TO MOVE THE PIPELINE BUT STILL GIVE DIAMONDS IN THE ROUGH THE CHANCE TO HAVE THE OPPORTUNITY. OUR RESIDENTS HAVE DONE GREAT. THERE'S A MANDATORY K AWARD APPLICATION COMPONENT, SO IF YOU GO INTO THE PROGRAM YOU HAVE TO PUT IN A K. WE'RE EXCITED ABOUT HOPEFULLY ALSO HELPING THAT PIPELINE MOVE FORWARD DOWN THE ROAD. IT'S IMPORTANT TO NOTE, THOUGH, THAT THIS DOESN'T OBVIATE OUR NEED TO BUILD THIS INTO NORMAL FORMAL TRAINING PROGRAMS. 
IN A RESPONSE WITH REID THOMPSON TO KENT AND ADAM'S PROVOCATIVE PAPER, WE ASKED WHETHER WE COULD STEAL A GOOD MODEL FROM THE PATHOLOGY COMMUNITY, WHICH DOES A LOT OF INFORMATICS, FOR HOW TO BUILD INFORMATICS EDUCATION WITHIN FORMAL RESIDENCY PROGRAMS. THEY USE A MODEL CALLED PIER FOR PATHOLOGY RESIDENTS, WHICH ALLOWS FOLKS TO GET IN-DEPTH TRAINING DURING RESIDENCY WITHOUT TACKING ON A LOT OF DIDACTIC CLASSES FOR EVERYBODY ELSE. WE CREATED A MODIFIED VERSION OF THIS AS A CONCEPTUAL UNDERPINNING. NOW AT OHSU THEY HAVE A CLINICAL INFORMATICS FELLOWSHIP IN THE RADIATION ONCOLOGY DEPARTMENT, SO WE'VE BEEN ABLE TO USE THEIR RESOURCES EFFECTIVELY TO HELP US BUILD THIS OUT. AGAIN, I'D ENCOURAGE YOU: IF YOU KNOW FOLKS INTERESTED IN FORMAL FELLOWSHIPS IN CLINICAL INFORMATICS, THAT'S A GREAT AVENUE FOR PEOPLE TO STAY IN THE RADIATION ONCOLOGY WORKSPACE. AS WE MOVE INTO OUR LAST FEW SLIDES AND THE IDEA OF WHAT THE UNMET EDUCATIONAL NEEDS ARE: THERE'S LIMITED TIME TO LEARN A FAIRLY COMPLEX SKILL SET, AND IT HAS TO SCALE ACROSS INSTITUTIONS AND TIMES. YOU CAN'T KNOCK OFF A YEAR AS A BIG BLOCK. AT MOST PLACES THERE MAY BE SOME CAPACITY TO INTEGRATE IT INTO THE FORMAL STATISTICAL TRAINING PROGRAMS, BUT TO BE FAIR, THOSE ARE ALREADY PRETTY ANEMIC. WE NEED GREATER SUPPORT FOR THIS PHYSICIAN-SCIENTIST PHENOTYPE TO GET TRAINING OFFLINE THROUGH THESE AVENUES. WE NEED TO ENCOURAGE THOSE IN THE SPACE WHO MAKE IT THROUGH THE FILTER TO HELP OPEN THE DOOR FOR THE FOLKS AFTER THEM, SO THEY HAVE FORMAL OPPORTUNITIES AS WE CREATE THEM. WE WANT A PIPELINE OF COMPUTATIONAL SCIENTIST RADIATION ONCOLOGISTS. AS ADAM SAID, TRADITIONAL PROGRAMS PUT PEOPLE IN THE LAB FOR TRANSLATIONAL WORK; YOU ALSO NEED PEOPLE IN THE DATA SCIENCE HE WAS TALKING ABOUT, AS AN AVENUE FOR GETTING THESE PEOPLE INTO PROGRAMS SPECIFIC TO THESE APPLICATIONS. FURTHERMORE, WE NEED THEM EARLIER IN THE GAME. AGAIN, I APPLAUD THE JEFF TEAM FOR MOVING IN THIS DIRECTION. 
IN MY OWN PERSONAL EXPERIENCE, THERE WAS A TIME WHEN THERE WAS A RECOGNITION THAT RADIOLOGY WAS FAILING. A BLUE RIBBON WORKSHOP WAS CREATED FOR RSNA THAT TOOK A BUNCH OF CHAIRMEN AND SAID, WE NEED TO REVITALIZE THE RADIOLOGY ENTERPRISE; LET'S FIND A WAY TO BUILD THESE NEW ASPECTS OF IMAGING INTO TRAINING IN AN IMPORTANT WAY. THE RESPONSE AT THE INSTITUTION I TRAINED AT IN TEXAS WAS TO CREATE A FORMAL RESIDENCY-Ph.D. PROGRAM. IT WAS A NOVEL PROGRAM THAT TOOK FOLKS WHO WE KNEW WERE GOING TO BE RADIOLOGISTS, SO WE KNEW WHERE THEIR CAREER PATH WAS GOING TO BE, AND GAVE THEM IN-DEPTH TRAINING IN HUMAN IMAGING. IT DID NOT FORCE US TO TAKE SOMEONE WHO HAD SPENT FIVE YEARS IN A LAB AND SHOEHORN THEM INTO A NEW FIELD. IT WAS HANDS-ON, CLINICALLY APPLICABLE, VERY HIGH-IMPACT STUFF. THIS WAS FUNDED BY A T32 PROGRAM AND COLLAPSED A FOUR-YEAR RESIDENCY THROUGH THE HOLMAN PATHWAY INTO A COMBINED 6-YEAR POSTGRADUATE COURSE, WITH AN EXTRA YEAR IF YOU NEEDED IT TO FINISH YOUR DISSERTATION. I WOULD LOVE TO SEE SOMETHING LIKE THIS: PERHAPS SOME MULTI-SITE PROGRAM WHERE COMPUTATIONAL SCIENTISTS HAVE THE OPPORTUNITY TO RECEIVE FORMAL ADVANCED TRAINING AS PART OF THEIR RESIDENCY EXPERIENCE IN A CURATED WAY. AS SMALL AS OUR PROGRAMS ARE, IT WOULD BE VERY DIFFICULT FOR EVERY INDIVIDUAL PROGRAM TO MAKE THAT AVAILABLE; IT NEEDS TO BE SOMETHING DONE CENTRALLY BY EITHER NRG OR AAPM OR AN ORGANIZATIONAL STRUCTURE WITH THAT BANDWIDTH. OUR NEXT STEPS, IN TERMS OF THIS IDEA OF A CORE CURRICULUM: JOHN KING, OVER HERE, AND AARON GILLESPIE, PARTNERS IN CRIME AT THE RADIATION ONCOLOGY EDUCATION GROUP LED BY DAN GOLDEN, ARE CREATING A MODIFIED ONLINE CURRICULUM, SO COURSES CAN BE DEVELOPED IN A SCALABLE WAY. WE WANT TO ENCOURAGE AAPM TASK GROUP PARTICIPATION BY JUNIOR FACULTY AND RESIDENTS AS RAPIDLY AS POSSIBLE, SINCE THAT'S WHERE THE WORK IS BEING DONE, RATHER THAN ON THE PHYSICIAN SIDE BY ASTRO. WE ALSO WANT TO IDENTIFY STAKEHOLDERS WHO CAN HELP DEFINE GUIDELINES. SO, I THINK THE FUTURE IS BRIGHT FOR US IN A.I. 
IN THESE AREAS, BUT IT'S IMPORTANT THAT WE HAVE A STRONG BASE AND BUILD THE EDUCATIONAL PROGRAMS, THE HUMAN PIPELINE, NECESSARY TO ADVANCE US TO THE NEXT LEVEL. THANKS FOR YOUR TIME. WE'LL MOVE TO THE NEXT SPEAKER AND TAKE QUESTIONS AT THE END. [APPLAUSE] I'M EXCITED TO INTRODUCE CHRIS AHERN FROM ONCORA. WE HAVE INTIMATE KNOWLEDGE OF THE COMPANY AT M.D. ANDERSON; THEY'RE DEVELOPING TOOL SETS IN MANY WAYS ANALOGOUS TO TODD'S WORK ON ONCOSPACE. CHRIS AND HIS GROUP HAVE EXCITING TOOL SETS, AND HE WILL SPEAK ABOUT THE ONCORA PLATFORM. >> THANKS TO THE ORGANIZERS. IT'S BEEN GREAT SO FAR. THANK YOU, DAVE. SO, AT THE BOTTOM IT SAYS CONFIDENTIAL. IGNORE THAT. YOU CAN TALK ABOUT IT AS MUCH AS YOU WANT. IN FACT, PLEASE DO. A QUICK BACKGROUND ABOUT ONCORA: WE BUILD TOOLS TO HELP COLLECT, EXTRACT AND USE REAL-WORLD CLINICAL AND IMAGING DATA TO IMPROVE OUTCOMES FOR CANCER PATIENTS. WE HAVE A CLOSE PARTNERSHIP WITH M.D. ANDERSON, USING A TOOL THEY BUILT CALLED BROCADE, WHICH WE'RE ADAPTING AND EXPANDING ON, AS WELL AS AN ANALYTICS PLATFORM FOR RADIATION ONCOLOGISTS. THE GOAL HERE IS TO GET THE DATA FROM THE DIFFERENT SYSTEMS AND PUT IT TOGETHER IN A REAL-TIME DATA STORE THAT IS USED FOR RESEARCH BUT ALSO FED BACK INTO THE CLINICAL WORKFLOW. I'LL COME BACK TO THIS THROUGHOUT THE TALK. A LAST POINT OF EMPHASIS: WE HAVE A STRONG INVESTMENT IN A.I. RESEARCH, AND, AGAIN, THE GOAL THERE IS FOR EVERYTHING WE'RE DOING TO COME BACK TO THE CLINICAL WORKFLOW. SO, THIS IS AN ESCALATOR, NOT A PYRAMID LIKE WE SAW EARLIER, BUT I THINK THE IDEA IS SIMILAR IN SPIRIT. ONE WAY OF THINKING ABOUT THIS: ANALYTICS, DATA SCIENCE AND MACHINE LEARNING ARE OFTEN USED INTERCHANGEABLY. DAVE'S DEFINITIONS ARE A GOOD WAY TO THINK ABOUT THIS, BUT I'LL PLAY FAST AND LOOSE WITH THESE TERMS. THE GOAL IS GETTING THE DATA IN ONE PLACE SO YOU CAN KNOW WHAT HAPPENED, FIGURE OUT WHY IT HAPPENED, PREDICT WHAT WILL HAPPEN, AND EVENTUALLY SHAPE WHAT WILL HAPPEN. 
SO THIS MISSES OUT ON SORT OF THE GRUNT WORK OF GETTING ALL THE DATA TOGETHER IN THE FIRST PLACE. I'M GOING TO TALK A LITTLE BIT ABOUT THE WORK WE DO TO GET THAT IN PLACE, BUT THEN COME BACK TO A PARTICULAR POINT IN THIS JOURNEY, LOOKING SPECIFICALLY AT PREDICTIVE ANALYTICS: PROSPECTIVE VALIDATION OF MACHINE LEARNING MODELS, AND HOW THOSE GET USED AND COME INTO THE CLINICAL WORKFLOW. EVERYBODY KNOWS THE PROBLEM: DATA IS NOT STRUCTURED, OR IS OFTEN NOT STRUCTURED. A LARGE AMOUNT OF IT IS, THANK GOD, IN EMRs, BUT STILL IN FREE TEXT, AND GETTING IT BACK OUT IS ACTUALLY A VERY HARD PROBLEM. INFORMATION IS LOST ON THE WAY IN AND HAS TO BE EXTRACTED USING NLP, WHICH IS DIFFICULT. DATA IS ALSO IN DIFFERENT SYSTEMS, SO, AGAIN, GETTING IT ALL IN ONE PLACE IS A CHALLENGE. IN COMBINATION, THESE MEAN THE DATA IS NOT NECESSARILY USABLE FOR ALL PURPOSES, OR IT'S USABLE BUT REQUIRES A FAIR AMOUNT OF TIME AND, APPARENTLY, HIGH SCHOOL AND COLLEGE STUDENT BLOOD, SWEAT AND TEARS. WE PREFER TO APPLY SOFTWARE ENGINEER TEARS; THEY'RE FAIRLY EFFECTIVE. THE WAY WE'VE APPROACHED THIS IS, FIRST, USING THAT ELECTRONIC DATA CAPTURE TOOL TO COLLECT DATA IN A STRUCTURED WAY GOING IN. A SECOND ASPECT OF SOLVING THIS PROBLEM IS BUILDING OUT TOOLS TO AUTOMATICALLY EXTRACT INFORMATION IN A RELIABLE WAY. THIS IS ALSO A VERY LARGE CHALLENGE, BUT CAROLYN WAS MENTIONING EARLIER THAT THERE'S A CURVE: LOTS OF EFFORT GOES INTO THE PROCESS UNTIL IT'S AUTOMATED, AND THEN IT BECOMES EASIER. WE'RE WORKING ON PUSHING TOWARD THE PART OF THAT CURVE WHERE THINGS ARE AUTOMATED, REPRODUCIBLE AND SCALABLE. THE LAST IS LEARNING: BRINGING ALL THAT INFORMATION BACK INTO THE CLINICAL WORKFLOW, SO DOCTORS AND OTHER CLINICIANS DON'T HAVE TO DO THINGS OUTSIDE WHAT THEY WOULD NORMALLY DO, WHICH REDUCES THE BURDEN OF GETTING THE DATA AND USING THE DATA. SHAMELESS PRODUCT PROMOTION: HERE'S A BRIEF OVERVIEW OF WHAT THESE COME DOWN TO. TWO THINGS. ONE WE CALL ONCORA PATIENT CARE. 
IT'S AN ELECTRONIC DATA CAPTURE SYSTEM WITH DISEASE-SPECIFIC FORMS TO CAPTURE IMPORTANT DATA ELEMENTS. CHUCK HAS USED THE TERM DATA FARMING, WHICH IS APT: HERE WE'RE SORT OF LAYING DOWN THE ROWS OF DATA THAT WE WANT TO BE CAPTURING. IT AUTOMATICALLY GENERATES THE NOTES THAT END UP GOING INTO THE EMR FOR BILLING PURPOSES, BUT RETAINS THE STRUCTURED DATA. THAT SAVES QUITE A BIT OF TIME, PER A STUDY IN JCR, AND DATA QUALITY IS ALSO GREATLY IMPROVED. THE LAST THING I'LL TALK ABOUT, AND COME BACK TO, IS USING THAT DATA LAKE ALONG WITH OTHER DATA TO MAKE PERSONALIZED OUTCOME PREDICTIONS. AGAIN, THE END RESULT OF THE PROCESS FEEDS BOTH THE CLINICAL WORKFLOW AND THE RESEARCH WORKFLOW. IF YOU COLLECT THE DATA YOU WANT, AND IT'S THE CLINICIAN RECORDING IT, IT'S HIGH QUALITY AND CAN BE USED FOR RESEARCH. WE'VE CREATED A WAY TO INTERACT WITH THAT DATA: AN INTUITIVE ANALYTICS DASHBOARD. YOU CAN QUICKLY IDENTIFY PATIENT COHORTS AND UNDERSTAND INTERACTIONS BETWEEN DIFFERENT TREATMENTS. AGAIN, THIS IS SORT OF AT THE BASE, AND I'LL COME TO THIS IN A SECOND: DERIVING MORE INSIGHT FROM THE DATA YOU HAVE. FINALLY, YOU CAN TRACK OUTCOMES BETWEEN SUBGROUPS IN REAL TIME, SO YOU GET INSIGHT INTO WHAT'S ACTUALLY HAPPENING IN THE DATA. SO, SLOWLY WE'RE CLIMBING UP, GOING BEYOND DESCRIPTIVE ANALYTICS. OBVIOUSLY, WE WANT TO UNDERSTAND WHY THINGS ARE HAPPENING, SO WE'VE DONE PLENTY OF RESEARCH. ONE THING I WANT TO HIGHLIGHT IS AN ABSTRACT PRESENTED AT ASTRO 2018 ON PREDICTING BREAST CANCER OUTCOMES BASED ON EXTRACTED CLINICAL INFORMATION, AT THE LEVEL OF DETAIL WHERE WE HAVE THE PRESCRIPTION BUT HAVEN'T YET GONE ON TO TREATMENT PLANNING. THE BASIC SETUP: TRAIN SEVERAL MODELS ON A TRAINING SET. THE OUTCOMES WE'RE TRYING TO PREDICT ARE BREAST OR CHEST WALL PAIN, FATIGUE AND RADIATION DERMATITIS, EACH AT GRADE 2 OR GREATER. WE TRAIN SEVERAL MODEL TYPES, TAKE THE ONE THAT PERFORMS BEST FOR EACH OUTCOME, AND THEN TEST IT PROSPECTIVELY ON THE NEXT SET OF PATIENTS. 
IN EVALUATING THIS, WE WANT TO MAKE SURE THAT THESE MODELS MEET SOME KIND OF PRE-SPECIFIED PERFORMANCE LEVEL ON THE VALIDATION SET. WE CAN TALK ABOUT THAT LATER IF YOU WANT, BUT THE BASIC IDEA IS: WHAT KIND OF MEASURE, OR WHAT LEVEL OF PERFORMANCE, WOULD CLINICIANS NEED BEFORE THEY'D BE COMFORTABLE USING THE MODEL? IN MORE DETAIL: SINCE THE DATA IS AUTOMATICALLY EXTRACTED, THERE'S A VERY LARGE RANGE OF FEATURES, OVER 700, INCLUDING DEMOGRAPHICS, TUMOR CHARACTERISTICS, PRIOR TREATMENTS AND ALSO DETAILED RADIOTHERAPY INFORMATION. THE TRAINING SET CONSISTS OF AROUND 1900 BREAST CANCER PATIENTS. WE USE K-FOLD CROSS-VALIDATION TO TUNE THESE DIFFERENT TYPES OF MACHINE LEARNING MODELS; I CAN TALK MORE ABOUT THE DETAILS LATER. THE VALIDATION SET IS THE SUBSEQUENT 360 PATIENTS WHO COMPLETED RADIATION TREATMENT. SO WE TRAIN THE MODELS, THE BEST-PERFORMING ONES ARE TESTED ON THE VALIDATION SET, AND AN AUC OF .7 OR HIGHER IS WHAT WE CONSIDER APPROPRIATE FOR DEPLOYING WITHIN A CLINICAL WORKFLOW. HERE ARE THE RESULTS FOR THE TRAINING SET MODELS. ON THE LEFT YOU HAVE ALL OF THE DIFFERENT MODEL TYPES. ON TOP YOU HAVE THE DIFFERENT OUTCOMES, ALONG WITH THEIR FREQUENCY IN THE TRAINING SET. IN EACH COLUMN YOU HAVE THE AUC, AND IN YELLOW YOU HAVE THE MODEL THAT PERFORMED BEST. WE TAKE THESE MODELS AND TEST THEM ON THE VALIDATION SET, AND WE FIND THAT IN SOME CASES THE MODELS MEET THE CRITERIA FOR BEING DEPLOYED TO A CLINIC. THE END RESULT IS A GOOD PROBABILISTIC PREDICTOR OF WHETHER THE OUTCOMES WILL OCCUR OR NOT. WHAT DOES THAT MEAN? IF WE COME BACK TO HOW WE'RE GOING TO USE THIS TO HELP PATIENTS, WE CAN THINK ABOUT A SPECIFIC USE CASE: A 35-YEAR-OLD WOMAN WITH A T2, NODE-POSITIVE, HER2-POSITIVE CANCER WHOSE PRIOR TREATMENT INCLUDED MASTECTOMY AND SENTINEL LYMPH NODE BIOPSY. ALL THESE FACTORS AND PRIOR TREATMENTS CAN BE FACTORED INTO THE PREDICTIONS WE MAKE WITH THE MODELS WE'VE TRAINED. 
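THE SELECTION-AND-VALIDATION STEP DESCRIBED ABOVE CAN BE SKETCHED AS FOLLOWS. THIS IS A MINIMAL ILLUSTRATION UNDER STATED ASSUMPTIONS, NOT ONCORA'S CODE. THE FUNCTION NAMES AND INPUT LAYOUT ARE HYPOTHETICAL. AUC IS COMPUTED DIRECTLY FROM ITS RANK DEFINITION, THE PROBABILITY THAT A RANDOMLY CHOSEN PATIENT WITH THE OUTCOME IS SCORED HIGHER THAN ONE WITHOUT IT. THE ONLY NUMBER TAKEN FROM THE TALK IS THE .7 DEPLOYMENT CUT-OFF; THE TOY LABELS AND SCORES ARE MADE UP.

```python
# Hypothetical sketch: pick the best model per outcome by cross-validated training AUC,
# then check it against a pre-specified validation AUC threshold.

DEPLOY_THRESHOLD = 0.7  # validation AUC cut-off quoted in the talk


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_and_validate(train_aucs, val_labels, val_scores_by_model):
    """train_aucs: {model_name: cross-validated training AUC}.
    val_scores_by_model: {model_name: predicted risks for the validation cohort}.
    Returns (best_model, validation_auc, deployable)."""
    best = max(train_aucs, key=train_aucs.get)
    val_auc = auc(val_labels, val_scores_by_model[best])
    return best, val_auc, val_auc >= DEPLOY_THRESHOLD


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 1]  # toy grade >= 2 toxicity labels, not study data
    scores = {"logistic": [0.3, 0.6, 0.4, 0.5, 0.2, 0.7],
              "gbm": [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]}
    print(select_and_validate({"logistic": 0.68, "gbm": 0.74}, labels, scores))
```

IN PRACTICE THE PER-MODEL TRAINING AUCS WOULD COME FROM CROSS-VALIDATION ON THE TRAINING COHORT, AND THE VALIDATION LABELS WOULD COME FROM THE SUBSEQUENT PATIENTS. THE PAIRWISE LOOP IS QUADRATIC, WHICH IS FINE FOR A FEW HUNDRED PATIENTS.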
WE CAN ALSO COMPARE DIFFERENT TREATMENTS: FOR EXAMPLE, TARGETING THE BREAST, AXILLA AND OTHER LYMPH NODES, WITH OR WITHOUT A BOOST, AND DIFFERENT DOSE AND FRACTIONATION, ALL COMPARED AS PART OF THE CLINICAL WORKFLOW. SO WHAT WE'VE BEEN WORKING ON IS SLOWLY CLIMBING UP THIS LADDER OF USING THE DATA TO IMPROVE PATIENT CARE, AND WE'RE STARTING TO GET FURTHER AND FURTHER UP, BUT I THINK THERE'S A LONG WAY TO GO. IN SOME CASES THE DIFFICULTY DOESN'T NECESSARILY GROW WITH EACH STEP: ONCE YOU GET GOOD AT ONE THING, IT'S NOT NECESSARILY THAT MUCH HARDER TO PROCEED TO THE NEXT. AS HAS BEEN DISCUSSED THROUGHOUT THE DIFFERENT TALKS SO FAR, ONE OF THE BIGGEST CHALLENGES IS ACTUALLY GETTING THE DATA INTO THE RIGHT FORMAT, NORMALIZED AND USABLE. AS FOR FUTURE DIRECTIONS, THERE'S CERTAINLY A LOT OF OPPORTUNITY FOR USING MORE FEATURES, NOT ONLY CLINICAL OR DOSIMETRIC FEATURES BUT OTHER MODALITIES AS INPUT. THERE'S ALSO OPPORTUNITY IN LOOKING AT MORE OUTCOMES: INSTEAD OF ONLY TOXICITIES OR DIFFERENT MEASURES OF SURVIVAL, LOOKING AT QUANTITATIVE MEASURES OF, LET'S SAY, TUMOR BURDEN IN FOLLOW-UP IMAGING STUDIES. SLIGHTLY MORE TECHNICAL: THERE ARE LOTS OF INTERESTING MOVES FROM PREDICTING FROM DATA AT PARTICULAR TIME POINTS TO USING TIME SERIES. SOME KINDS OF MODELS ARE PARTICULARLY USEFUL FOR THAT, SUCH AS RECURRENT NEURAL NETWORKS AND NEWER EXTENSIONS OF THOSE LIKE NEURAL ORDINARY DIFFERENTIAL EQUATIONS, WHICH ARE A FAIRLY RECENT DEVELOPMENT BUT VERY INTERESTING. ANOTHER IS THE MOVE FROM PREDICTIVE TO PRESCRIPTIVE ANALYTICS. ONE OF THE CHALLENGES THERE IS THAT YOU NEED A CLEAR DEFINITION OF WHAT YOU'RE TRYING TO OPTIMIZE, AS WELL AS A DETAILED UNDERSTANDING OF WHAT THE EFFECTS OF DIFFERENT POTENTIAL INTERVENTIONS WOULD BE. SO, I'LL END THERE. IF WE DON'T HAVE TIME FOR QUESTIONS, FEEL FREE TO E-MAIL ME TO TALK MORE. 
[APPLAUSE] >> NEXT UP, MOVING TO THE SUBLIME OF CLINICAL TRIALS: YING XIAO IS A REAL LEADER IN THE COOPERATIVE GROUP SETTING, AND WE'RE THRILLED TO HEAR HER TALK ABOUT QUALITY AND DATA ASSURANCE. >> IN DISCUSSION WITH ANSHU, HE ASKED ME TO TALK ABOUT BIG DATA INITIATIVES, BOTH FROM THE CLINICAL TRIAL PERSPECTIVE AND FROM THE INSTITUTIONAL PERSPECTIVE, SO WHAT I'LL DO IS GIVE USE CASES IN BOTH SETTINGS. MY DISCLOSURES. AS DAVE MENTIONED EARLIER, THERE'S A BIG PUSH TO APPLY ARTIFICIAL INTELLIGENCE IN RADIATION ONCOLOGY. THESE ARE THE MAJOR AREAS IDENTIFIED WHERE WE CAN USE A.I.: CLINICAL DECISION MAKING, IMAGE SEGMENTATION, DOSE OPTIMIZATION AND QUALITY ASSURANCE. AND AS EVERYBODY HAS BEEN DISCUSSING, THE BIG CHALLENGE IS DATA, BOTH DATA QUALITY AND DATA ACCESS. I'LL DESCRIBE SOME OF THE USE CASES FROM NCTN AND ALSO FROM THE U PENN PERSPECTIVE. ADAM MENTIONED EARLIER THAT WE HAVE ESTABLISHED IROC, THE IMAGING AND RADIATION ONCOLOGY CORE, WHICH IS RESPONSIBLE FOR QUALITY ASSURANCE FOR RADIATION THERAPY AS WELL AS IMAGING. ANOTHER COMPONENT ESTABLISHED WITHIN NCTN IS THE CENTER FOR INNOVATION IN RADIATION ONCOLOGY, WHICH SETS UP STANDARDS FOR RADIATION THERAPY. THE FIRST COMPONENT I WILL TALK ABOUT IS QUALITY AND STANDARDIZATION FROM THOSE TWO PERSPECTIVES. AS ADAM ALSO MENTIONED EARLIER, QUALITY IS ESSENTIAL FOR MULTI-INSTITUTIONAL CLINICAL TRIALS, FOR TWO MAJOR REASONS. ONE IS THAT VARIABILITY MAY CREATE SO MUCH UNCERTAINTY THAT IT OBSCURES TRIAL OUTCOMES. THE OTHER IS THAT TOO MUCH UNCERTAINTY REQUIRES US TO ENROLL MORE PATIENTS TO ANSWER THE SAME QUESTION, THEREBY REQUIRING MORE RESOURCES. I'LL GIVE A COUPLE OF EXAMPLES OF THAT. 
THIS IS A GREAT EXAMPLE OF RADIATION QUALITY IMPACTING OUTCOME: A TRIAL STUDYING TWO REGIMENS, TRYING TO OBSERVE A 10 PERCENT IMPROVEMENT. THEY DID SEE A DIFFERENCE, 70% VERSUS 50%, BUT NOT DUE TO THE REGIMEN; IT WAS DUE TO THE QUALITY OF THE RADIATION. THE BETTER SURVIVAL CURVE IS FOR THE PATIENTS WHOSE RADIATION WAS PROTOCOL-COMPLIANT, AND THE WORSE ONE IS FOR THOSE WHOSE RADIATION WAS NOT COMPLIANT. DR. DICKER IS ONE OF THE LEAD AUTHORS ON THIS ANALYSIS: THEY LISTED EIGHT TRIALS AND FOUND, BASED ON THE HAZARD RATIOS, THAT RADIATION QUALITY IS A SIGNIFICANT FACTOR AFFECTING OVERALL SURVIVAL AS WELL AS DISEASE CONTROL. THE SAME PHENOMENON HAPPENS IN IMAGING TRIALS. THIS IS ONE OF THOSE TRIALS. THEY SCORED IMAGES FROM 1 TO 5 IN TERMS OF ARTIFACTS AND ADC VALUES, 1 BEING THE WORST AND 5 THE BEST; A SCORE OF 3 IS USABLE. THEY FOUND THAT 68% SCORED 3 OR ABOVE, AND LESS THAN HALF HAD THE PERFECT SCORE OF 5. THE IMPLICATION IS WHETHER WE CAN SEE THE SIGNAL: IF YOU INCLUDE ONLY CASES WITH A QUALITY SCORE OF 5, IN THE SECOND ROW, YOU SEE A CLEAR SIGNAL FOR A DIFFERENCE IN SURVIVAL. BUT IF YOU LOOK AT THE FIRST ROW, WHERE WE INCLUDE SCORES OF 3 AND ABOVE, YOU DON'T SEE AS CLEAR A SIGNAL AS YOU DO WHEN YOU INCLUDE ONLY A SCORE OF 5. NOW IMAGINE THAT ONLY HALF OF THE CASES HAVE A SCORE OF 5; THAT AFFECTS THE SAMPLE SIZE NEEDED TO ANSWER THE SAME QUESTION. AS YOU CAN SEE GOING FROM A TO B TO C, AS QUALITY DEGRADES, THE NUMBER OF PATIENTS REQUIRED TO OBSERVE THE SAME DIFFERENCE GOES UP: YOU ACTUALLY REQUIRE TWICE AS MANY, OR EVEN FOUR TIMES AS MANY, PATIENTS TO ANSWER THE SAME QUESTION. THE SAME CAN BE OBSERVED FOR RADIATION THERAPY. IF YOU HAVE A DOSE UNCERTAINTY OF 10% AND YOU WANT TO OBSERVE A 5% DIFFERENCE IN CLINICAL OUTCOME, YOU WILL ACTUALLY REQUIRE AN ORDER OF MAGNITUDE MORE PATIENTS. 
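THE SAMPLE-SIZE POINT CAN BE MADE CONCRETE WITH THE STANDARD NORMAL-APPROXIMATION FORMULA FOR COMPARING TWO MEANS: n PER ARM = 2 (z(1-α/2) + z(1-β))² σ² / Δ². THE SKETCH BELOW ASSUMES, FOR ILLUSTRATION ONLY, THAT POOR IMAGE OR DOSE QUALITY BEHAVES LIKE EXTRA INDEPENDENT NOISE ADDED TO THE OUTCOME VARIANCE. IT DOES NOT REPRODUCE THE ANALYSES ON THE SLIDES, AND THE FUNCTION NAME IS HYPOTHETICAL. BECAUSE n SCALES WITH σ²/Δ², DOUBLING THE VARIANCE DOUBLES THE PATIENTS NEEDED, AND HALVING THE DETECTABLE DIFFERENCE QUADRUPLES THEM.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sigma_true, sigma_noise=0.0, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm to detect a mean difference `delta`
    with a two-sided test. Quality-related measurement noise is modeled (an assumption)
    as independent extra variance: sigma_total^2 = sigma_true^2 + sigma_noise^2."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    variance = sigma_true ** 2 + sigma_noise ** 2
    return ceil(2 * z_sum ** 2 * variance / delta ** 2)


if __name__ == "__main__":
    clean = n_per_arm(delta=1.0, sigma_true=1.0)
    noisy = n_per_arm(delta=1.0, sigma_true=1.0, sigma_noise=1.0)  # variance doubled
    smaller_effect = n_per_arm(delta=0.5, sigma_true=1.0)          # effect halved
    print(clean, noisy, smaller_effect)
```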
AS WE ALL KNOW, CONDUCTING MULTI-CENTER CLINICAL TRIALS IS EXPENSIVE, AND IF YOU REQUIRE THIS MANY PATIENTS YOU REQUIRE A LOT MORE RESOURCES. SO WE HAVE ESTABLISHED IROC AS AN ENTITY IN THE NCTN ORGANIZATION TO ENSURE THAT WE ACQUIRE THE HIGHEST QUALITY DATA FOR CLINICAL TRIALS. WITHIN NRG ONCOLOGY WE HAVE ALSO ESTABLISHED THE CENTER FOR INNOVATION, WHERE WE SUPPLY RADIATION SECTION PROTOCOL TEMPLATES SO THAT RADIATION GUIDANCE IS DEFINED CONSISTENTLY ACROSS ALL THE PROTOCOLS. WE GIVE THESE TEMPLATES TO THE P.I.s SO THEY CAN MODIFY THEM FOR THEIR PROTOCOL. WE ALSO DEFINE THE LIBRARY, IN COLLABORATION WITH CHUCK; I'LL TOUCH ON THIS STANDARD A LITTLE LATER. WE ALSO GIVE RADIATION THERAPY AND IMAGING QUESTIONNAIRES TO THE P.I.s SO THEY CAN INFORM IROC AHEAD OF TIME WHAT KIND OF RADIATION TECHNIQUE OR WHAT SORT OF IMAGING MODALITY THEY ARE GOING TO USE. THAT WAY IROC CAN PREPARE THE RESOURCES FOR THE TRIALS COMING DOWN THE PIPELINE. NRG ADOPTED TG-263 EVEN BEFORE THE TASK GROUP REPORT WAS PUBLISHED, AND EVERYTHING WE DEFINE COMPLIES WITH TG-263. FOR DIFFERENT DISEASE SITES WE ALSO OFFER CONTOURING ATLASES AND THE DISEASE-SITE TEMPLATE FOR THE RADIATION SECTION, AND WE ACTUALLY OFFER TOOLS TO EVALUATE PLAN QUALITY FOR THAT PARTICULAR PROTOCOL. THIS IS ONE OF THE SCRIPTS WE OFFER ON THE WEBSITE THAT SITES CAN DOWNLOAD TO CHECK WHETHER THEY ARE COMPLIANT, THAT IS, WHETHER THEY ARE MEETING THE CONSTRAINTS. RED MEANS DEVIATION UNACCEPTABLE, YELLOW MEANS VARIATION ACCEPTABLE, AND GREEN MEANS PER PROTOCOL. DR. DICKER SHOWED THIS IROC ENTERPRISE INFRASTRUCTURE. BASICALLY, FROM ALL THE DICOM SUBMISSIONS, WE EXTRACT DATA ELEMENTS THROUGH THE PROCESS, AND SOME OF THEM ARE AUTOMATICALLY SUBMITTED TO THE RAVE DATABASE. WE'RE ALSO TRYING TO MAKE IT CEDR-COMPLIANT FOR THE FDA. 
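THE RED/YELLOW/GREEN LOGIC OF THE COMPLIANCE SCRIPTS CAN BE SKETCHED AS BELOW. THIS IS NOT THE NRG OR IROC SCRIPT. THE CONSTRAINT SCHEMA, STRUCTURE NAMES AND LIMIT VALUES ARE HYPOTHETICAL PLACEHOLDERS, NOT PROTOCOL VALUES. ONLY UPPER-BOUND METRICS SUCH AS A MAXIMUM DOSE ARE HANDLED.

```python
from dataclasses import dataclass


@dataclass
class Constraint:
    """One upper-bound dose-volume constraint (hypothetical schema, illustrative limits)."""
    structure: str        # structure name, e.g. a TG-263 style name
    metric: str           # e.g. "Dmax_Gy" or "V20Gy_pct"
    per_protocol: float   # value <= this: per protocol
    acceptable: float     # per_protocol < value <= this: variation acceptable


def classify(value, c):
    """Map a plan metric to the red/yellow/green categories described in the talk."""
    if value <= c.per_protocol:
        return "GREEN: per protocol"
    if value <= c.acceptable:
        return "YELLOW: variation acceptable"
    return "RED: deviation unacceptable"


def check_plan(plan_metrics, constraints):
    """plan_metrics maps (structure, metric) -> value taken from the plan's DVH."""
    report = {}
    for c in constraints:
        key = (c.structure, c.metric)
        if key in plan_metrics:
            report[key] = classify(plan_metrics[key], c)
        else:
            report[key] = "MISSING: structure or metric not found"
    return report


if __name__ == "__main__":
    constraints = [Constraint("SpinalCord", "Dmax_Gy", 45.0, 50.0),
                   Constraint("Lungs", "V20Gy_pct", 30.0, 35.0)]
    plan = {("SpinalCord", "Dmax_Gy"): 47.2, ("Lungs", "V20Gy_pct"): 28.0}
    for key, status in check_plan(plan, constraints).items():
        print(key, status)
```

A FULLER CHECKER WOULD ALSO NEED LOWER-BOUND CONSTRAINTS SUCH AS TARGET COVERAGE, WHERE THE COMPARISONS RUN THE OTHER WAY.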
IF YOU SEE A RED DOT, THE STRUCTURE NAME NEEDS TO BE CORRECTED. NEXT I WANT TO SHOW WHAT WE IMPLEMENTED IN THE RADIATION ONCOLOGY DEPARTMENT AT THE UNIVERSITY OF PENNSYLVANIA. WE CREATED A SCRIPT THAT READS THE PRESCRIPTION PAGE, WHICH HAS ALL THE CONSTRAINTS FOR THE DIFFERENT TARGETS AND CRITICAL STRUCTURES, AND GENERATES A SCRIPT FOR THAT PARTICULAR PATIENT'S PRESCRIPTION. WE THEN RUN THAT SCRIPT TO SEE WHETHER THE PLAN MEETS ALL THE PRESCRIPTION CONSTRAINTS. THIS IS BUILT ON THE BASE SCRIPTING OFFERED BY THE VENDOR. IT IS FULLY IN PLACE IN RADIATION ONCOLOGY AT U PENN AND IS BEING USED IN THE (INDISCERNIBLE), SO PHYSICIANS CAN PAY MORE ATTENTION TO CONTOURING INSTEAD OF TO DOSE VOLUME HISTOGRAM COMPLIANCE. FOR DATA STANDARDIZATION, WE'RE TALKING ABOUT IMPLEMENTING THE RADIATION ONCOLOGY ONTOLOGY; THIS IS ONE OF THE POSITION PAPERS AUTHORED BY CHUCK. GOING FORWARD, WE'RE TRYING TO UPDATE THE RADIATION ONCOLOGY ONTOLOGY FOLLOWING THE ONTOLOGY GUIDING PRINCIPLES: TO HAVE A SINGLE HIERARCHY, TO DISTINGUISH (INDISCERNIBLE) FROM OCCURRENTS, TO RESPECT WHATEVER IS ALREADY BEING DEVELOPED, AND TO ESTABLISH ONLY ITEMS THAT HAVE REAL-WORLD EVIDENCE. THOSE ARE THE BASIC GUIDING PRINCIPLES. THEN THERE'S THE OBO FOUNDRY, WHICH ALREADY HAS ONTOLOGIES IN PLACE, SO WE'RE TRYING TO BE COMPLIANT AND COLLABORATE WITH THAT EFFORT. THIS IS THE PENN DATA MART THAT'S IN DEVELOPMENT. WHAT THE CENTER AT PENN IS TRYING TO DO IS COPY THE CURRENT EMR EVERY NIGHT WITH DE-IDENTIFICATION, BASICALLY REMOVING THE IDs AND SHIFTING THE DATES, AS WELL AS CONVERTING THE DATA ELEMENTS INTO THE OBO FOUNDRY ONTOLOGY. THAT'S THE CURRENT EFFORT. THEY HAVE ALREADY CONVERTED SOME OF THE DEMOGRAPHICS, CONTOUR ITEMS AND BIOBANK ENCOUNTER ITEMS INTO THE ONTOLOGY. 
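THE NIGHTLY DE-IDENTIFICATION STEP (REMOVE IDENTIFIERS, SHIFT DATES) CAN BE ILLUSTRATED WITH THE SKETCH BELOW. IT IS AN ASSUMPTION-LADEN ILLUSTRATION, NOT PENN'S PIPELINE, AND THE FIELD NAMES ARE HYPOTHETICAL. DERIVING BOTH THE PSEUDONYM AND A PER-PATIENT DATE OFFSET FROM A KEYED HASH (HMAC) IS ONE COMMON DESIGN CHOICE. IT IS PICKED HERE SO THAT EACH NIGHTLY COPY SHIFTS A GIVEN PATIENT'S DATES BY THE SAME AMOUNT, WHICH PRESERVES THE INTERVALS BETWEEN EVENTS. FULL DE-IDENTIFICATION OF EMR DATA INVOLVES MUCH MORE THAN THIS.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical direct-identifier field names; a real EMR extract has many more.
DIRECT_IDENTIFIERS = {"mrn", "name", "address", "phone"}


def _keyed_digest(key: bytes, mrn: str) -> bytes:
    """HMAC-SHA256 of the MRN under a secret key kept outside the research copy."""
    return hmac.new(key, mrn.encode("utf-8"), hashlib.sha256).digest()


def patient_offset_days(key: bytes, mrn: str, max_shift: int = 365) -> int:
    """Deterministic per-patient date shift in [-max_shift, max_shift] days."""
    value = int.from_bytes(_keyed_digest(key, mrn)[:4], "big")
    return value % (2 * max_shift + 1) - max_shift


def deidentify(record: dict, key: bytes) -> dict:
    """Drop direct identifiers, add a stable pseudonym, and shift every date field
    by the same per-patient offset so intervals between events are preserved."""
    mrn = record["mrn"]
    shift = timedelta(days=patient_offset_days(key, mrn))
    out = {"patient_key": _keyed_digest(key, mrn).hex()[:16]}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue
        out[field] = value + shift if isinstance(value, date) else value
    return out
```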
THE PLAN IS FOR PENN OMICS 2.0 TO FOLLOW THE OBO FOUNDRY ONTOLOGY FORMAT AND TO BE AVAILABLE PERHAPS BY THE END OF THIS YEAR OR EARLY NEXT YEAR. NOW I WANT TO SWITCH GEARS AND TALK ABOUT SOME OF THE A.I. AND MACHINE LEARNING IMPLEMENTATIONS IN THE CLINICAL TRIAL SETTING AS WELL AS IN THE CLINICAL SETTING AT PENN. ONE OF THE THINGS WE HAVE PUT IN PLACE IS A KNOWLEDGE-BASED PLANNING TOOL FOR CLINICAL TRIALS. WE TOOK THE QUALITY PLANS AND BUILT MACHINE MODELS, SO WHEN A NEW CASE COMES IN, THE MODEL PREDICTS WHETHER OR NOT IT FALLS WITHIN THE STATISTICAL SPAN. WE CAN THEN DETERMINE WHETHER IT'S A REALLY DIFFICULT CASE THAT CANNOT BE IMPROVED UPON OR ONE THAT CAN ACTUALLY BE IMPROVED. WE HAVE FOUND THAT, STATISTICALLY, THE MACHINE IS GENERALLY ABLE TO PRODUCE BETTER PLANS. ANOTHER EFFORT WE ARE UNDERTAKING IS USING DEEP LEARNING FOR CONTOURING, SINCE, AS WE ESTABLISHED BEFORE, CONTOURS ARE REALLY CRUCIAL FOR A LOT OF DIFFERENT REASONS, RADIOMICS BEING ONE OF THEM. WE HAVE ESTABLISHED ACCEPTANCE CRITERIA AT THE 95% CONFIDENCE LEVEL: FOR EXAMPLE, FOR THE HEART, A SCORE BETTER THAN .89 PASSES, AND FOR THE ESOPHAGUS, A SCORE BETTER THAN .44 PASSES, BECAUSE OF THE VARIABILITY AMONG THE CONTOURS. IN RADIOMICS EVALUATION WE ENCOUNTER SIMILAR ISSUES: WHEN YOU USE A DIFFERENT RADIOMICS PACKAGE, WE HAVE DIFFICULTY REPRODUCING THE RESULT. THE LAST PROJECT I WANTED TO MENTION FOR THE CLINICAL TRIAL DATA IS USING A CNN DEEP LEARNING MODEL FOR TOXICITY PREDICTION FOR 0522. WE BASICALLY JUST FED ALL THE DATA INTO THE DEEP LEARNING NETWORK: CT, DOSE DISTRIBUTION, AS WELL AS THE CONTOURS AS A LABEL. WE FOUND THAT IF WE INCLUDE EVERYTHING WE COULD ACHIEVE AN ACCURACY OF .76, BUT WITHOUT THE CONTOURS IT'S SOMEWHAT WORSE. SO WE THINK IT'S FEASIBLE TO DUMP EVERYTHING INTO THE MODEL AND GET A REASONABLE PREDICTION, BUT THE DRAWBACK IS THAT THE CNN IS A BLACK BOX; IT'S HARD TO EXPLAIN WHAT IS GOING ON. 
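THE TALK DOES NOT NAME THE OVERLAP METRIC BEHIND THE .89 (HEART) AND .44 (ESOPHAGUS) PASS LEVELS. THE SKETCH BELOW ASSUMES IT IS THE DICE SIMILARITY COEFFICIENT, A COMMON CHOICE FOR COMPARING AN AUTO-CONTOUR WITH A REFERENCE CONTOUR, AND SHOWS HOW THE ACCEPTANCE CHECK COULD LOOK UNDER THAT ASSUMPTION. CONTOURS ARE REPRESENTED AS SETS OF VOXEL INDICES, AND THE FUNCTION NAMES ARE HYPOTHETICAL.

```python
def dice(a, b):
    """Dice similarity coefficient between two contours given as sets of voxel indices:
    2 |A intersect B| / (|A| + |B|); defined as 1.0 when both contours are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Per-structure pass levels quoted in the talk; treating them as Dice values is an assumption.
PASS_THRESHOLDS = {"Heart": 0.89, "Esophagus": 0.44}


def auto_contour_passes(structure, auto_voxels, reference_voxels):
    """Return (score, passed) for an auto-contour against a reference contour."""
    score = dice(auto_voxels, reference_voxels)
    return score, score > PASS_THRESHOLDS[structure]


if __name__ == "__main__":
    auto = {(0, 0), (0, 1), (1, 0), (1, 1)}
    reference = {(0, 0), (0, 1), (1, 0), (2, 2)}
    print(auto_contour_passes("Heart", auto, reference))
    print(auto_contour_passes("Esophagus", auto, reference))
```

THE SAME OVERLAP SCORE CAN PASS FOR ONE ORGAN AND FAIL FOR ANOTHER, WHICH MATCHES THE POINT THAT THE THRESHOLDS REFLECT HOW VARIABLE EACH STRUCTURE'S CONTOURS ARE.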
I ALSO WANTED TO MENTION ONE OF THE U PENN MACHINE LEARNING APPLICATIONS: WE'RE TRYING TO STUDY DATA FROM THIS CAR T TRIAL FOR LYMPHOMA, AND OUR HYPOTHESIS IS THAT TUMOR BURDEN AND TUMOR VOLUME ARE CORRELATED WITH CAR T TOXICITY. WE'RE TRYING TO ESTABLISH THAT CORRELATION, AND MAYBE TO USE RADIATION TO REDUCE TUMOR BURDEN AND TUMOR VOLUME BEFORE CAR T THERAPY TO REDUCE TOXICITY FOR THOSE CAR T PATIENTS. AS FOR NEXT STEPS: WE HAVE FIVE TRIALS WITH A.I. AND MACHINE LEARNING AS EXPLORATORY AIMS. WE'RE GOING TO APPLY A.I. FOR QA AND USE MACHINE LEARNING TO DO PLAN QUALITY QA, AND AT PENN WE'RE TRYING TO INCORPORATE PREDICTIVE MODELING INTO CLINICAL DECISION MAKING. THANK YOU. [APPLAUSE] >> NEXT WE HAVE TODD McNUTT. YOU'VE SEEN SOME OF TODD'S WORK ON ONCOSPACE. AGAIN, HE'S A REAL LEADER IN THE FIELD WHO HAS BEEN DOING THIS FOR QUITE A WHILE, SO WE'RE VERY EXCITED TO HEAR ABOUT INFORMATICS AND PLAN EVALUATION AND SOME NEWER STUFF THEY HAVE BEEN WORKING ON. >> WELL, I MESSED UP THE TIME AND THOUGHT I HAD LESS TIME THAN I DO, SO I'LL TALK REALLY SLOWLY. [LAUGHTER] OKAY. WE'LL SEE WHAT HAPPENS ANYWAY. I DO HAVE A NEW CONFLICT OF INTEREST: ONCOSPACE, INC. IS NOW A COMPANY, SO I HAVE TO MANAGE THAT CONFLICT OF INTEREST, AND I'M DOING SO BY LETTING YOU KNOW. THIS IS ACTUALLY A FIGURE FROM A PAPER THAT I PUT TOGETHER AFTER ONE OF THESE WORKSHOPS BACK IN, I THINK, 2015 OR WHATEVER IT WAS, QUITE A WHILE AGO. THE THOUGHT HERE WAS THAT THERE ARE LEVELS OF DATA THAT WE'RE TRYING TO COLLECT. I DIDN'T ORIGINALLY HAVE THIS SLIDE IN HERE BUT THREW IT IN AT LUNCHTIME, BECAUSE IT WAS COOL TO SEE THE REGISTRY. THE REGISTRY NEEDS EVERY PATIENT TO BE INCLUDED SO IT CAN UNDERSTAND INCIDENCE RATES AND GEOGRAPHICAL PATTERNS, AND FROM THERE WE'RE TRYING TO MOVE DOWN THE CHAIN INTO GREATER DEPTH, HIGHER QUALITY AND MORE DATA. I ALSO SEE WITH THE GOVERNMENT-RUN DATABASES, TCIA AND THESE THINGS, THAT THEY ARE TAKING THIS RESEARCH DATA AND TRYING TO EXPAND ON IT TO MOVE UP INTO THIS SPACE THAT I CALLED THE BIG DATA OPPORTUNITY. 
I THINK IT REALLY IS A GAP. THE BIG DATA OPPORTUNITY IS THE CLINIC, RIGHT? THE BIG DATA OPPORTUNITY FOR QUALITY REPORTING IS YOUR CLINIC'S DATA. IF YOU WANT TO REPORT THE QUALITY OF OUTCOMES FOR YOUR PATIENTS, YOU'D BETTER COLLECT DATA ON EVERY ONE OF YOUR PATIENTS, OR YOU'RE CHERRY PICKING, RIGHT? IF YOU WANT TO USE YOUR DATA FOR DECISION SUPPORT, YOU NEED A SIGNIFICANT AMOUNT OF DATA, WITH ENOUGH DEPTH TO GIVE YOU THE INFORMATION THAT WILL HELP YOU WITH A DECISION, RIGHT? PHYSICIANS HAVE A CERTAIN AMOUNT OF DATA ABOUT THAT PATIENT IN THEIR BRAIN AND ARE MAKING A DECISION ON IT, SO YOU SHOULD HAVE AT LEAST THAT AMOUNT OF DATA BEFORE YOU EXPECT THE MACHINE TO HELP YOU MAKE A DECISION. IF YOU HAVE LESS DATA, THE PHYSICIAN WITH THE INFORMATION IS PROBABLY GOING TO BEAT YOU, RIGHT? IF WE WANT TO HELP, WE NEED DEEP ENOUGH DATA FOR THAT DECISION SUPPORT. WHAT I'LL FOCUS ON IS SORT OF THIS REALM. THAT'S THE LEVEL THAT I THINK IS A GAP, AND HOPEFULLY WE'RE TRYING TO FILL IT. WHAT DOES IT MEAN TO BE DATA DRIVEN? IN TERMS OF PLANNING (THIS TALK IS ABOUT PLANNING AND PLAN EVALUATION, SO I'M FOCUSING ON THAT), WE GENERALLY EVALUATE TREATMENT PLANS BASED ON PROTOCOLS. THESE PROTOCOLS ARE POPULATION BASED. THEY ARE VALID, JUST POPULATION BASED. THE TRUTH IS THAT EACH PATIENT IS DIFFERENT, AND EACH INDIVIDUAL PATIENT CAN HAVE A BETTER QUALITY PLAN THAN A POPULATION-BASED PROTOCOL WOULD WARRANT. WHAT THE DATA CAN ACTUALLY DO IS HELP US PERSONALIZE THESE PREDICTIONS AND THIS QUALITY, AND GIVE YOU A PERSONALIZED GUIDELINE WITHIN THAT POPULATION-BASED GUIDELINE AS YOUR ENVELOPE. PREDICTION MODELS HELP BY USING INDIVIDUAL-SPECIFIC FEATURES, BUT IT'S ALSO JUST A MATTER OF BEING ABLE TO REFINE YOUR COHORT. IF I HAVE A THOUSAND PATIENTS WHO ARE EXACTLY IDENTICAL EXCEPT FOR THE WAY I TREATED THEM, I KNOW A LOT ABOUT THEM, AND MAYBE I'LL BE ABLE TO TREAT THEM PROPERLY BECAUSE THEY WERE ALL THE SAME. THE MORE I CAN REFINE THE COHORT, AND THE MORE NUMBERS I HAVE, THE BETTER I CAN PERSONALIZE TO THE COHORT. JUST MENTAL THOUGHTS, ANYWAY. 
SO, IN GENERAL, A SYSTEM OUT THERE MIGHT LOOK SOMETHING LIKE THIS. YOU HAVE SOME KNOWLEDGE AND DATA IN YOUR DATABASE, AND WE HAVE FEATURE EXTRACTION. THE REASON FEATURE EXTRACTION IS IN RED IS THAT OVER THE LAST TWO YEARS I'VE SEEN A MOVEMENT AWAY FROM FEATURE GENERATION, WHICH I THINK IS A MOVEMENT AWAY FROM KNOWLEDGE, AND TOWARD GOOD PREDICTION WITHOUT REAL KNOWLEDGE. IF THERE'S ANYTHING YOU GET OUT OF THIS TALK, I HOPE YOU'LL BELIEVE ME ON THIS. IT'S MY SOAPBOX PITCH: IF WE THROW AWAY THE FEATURES, WE THROW AWAY OUR ABILITY TO GAIN KNOWLEDGE FROM DATA. LET'S NOT DO THAT. I HOPE I CONVINCE YOU BY THE END OF THIS. THEN THERE ARE DIFFERENT TYPES OF MODELS THAT WE CAN USE TO UTILIZE THOSE FEATURES: MAYBE WE'RE DOING ANOMALY DETECTION, LOOKING FOR THINGS THAT ARE OUTLIERS OR OUT OF PLACE; MAYBE WE'RE LOOKING AT QUALITY METRICS; OR MAYBE WE'RE DRIVING PATIENT OUTCOME PREDICTIONS. THESE THEN FEED INTO VARIOUS PARTS OF THE RADIATION ONCOLOGY WORKFLOW: MAYBE DEVIATIONS FROM A PARTICULAR GUIDELINE FOR A PARTICULAR TYPE OF PATIENT THAT MAY BE A SAFETY ISSUE; MAYBE CONTOUR INTEGRITY CHECKING, WHERE YOUR BLADDER IS TEN TIMES THE SIZE OF SOMEBODY ELSE'S, OR IT'S REALLY, REALLY SMALL. WE HAVE PLAN QUALITY, WHICH I'LL TALK MORE ABOUT, AND MAYBE UNCOMMON PRESCRIPTIONS AND FAULTS, AND ADVERSE OUTCOMES. I'LL FOCUS ON THE THREE RED BOXES TODAY. THE FIRST THING IS JUST TO TALK ABOUT FEATURES. I SWEAR I'M NOT GOING TO SHOW THESE SLIDES ANYMORE, BUT I STILL THINK IT'S IMPORTANT TO UNDERSTAND THIS. WHEN WE LOOK AT TREATMENT PLAN QUALITY, WE WANT TO CALCULATE SOME FEATURES THAT CONTROL THE QUALITY OF A TREATMENT PLAN. YEARS AGO WE CAME UP WITH THE CONCEPT OF THE OVERLAP VOLUME HISTOGRAM, OR OVH. HERE ARE TWO CRITICAL STRUCTURES, BLUE AND RED. THEY ARE THE SAME SIZE AND HAVE THE SAME OVERLAP VOLUME WITH THE TARGET VOLUME, AND THAT IS THE ZERO POINT.
IF I EXPAND THE GREEN TARGET BY 2 CENTIMETERS, IT CONSUMES 50% OF THE RED ONE BUT ALMOST 90% OF THE BLUE ONE, JUST BECAUSE OF THE ORIENTATION. SIMILARLY, IF I CONTRACT THE TARGET BY 2 CENTIMETERS, I COME OFF THE BLUE ONE BUT STILL OVERLAP 20% OF THE RED ONE. THESE ARE FEATURES. THE FEATURE IS HOW FAR AWAY A GIVEN PERCENTAGE OF THE VOLUME IS: FOR EXAMPLE, 50% OF THE RED VOLUME IS AT LEAST 1 1/2, ALMOST 2, CENTIMETERS AWAY FROM THE TARGET. THIS HAS IMPLICATIONS. IF IT'S A PARALLEL STRUCTURE, PERHAPS RED IS MORE EASILY SPARED BECAUSE A GOOD PORTION OF IT IS FAR AWAY. IF IT'S A SERIAL STRUCTURE LIKE THE SPINAL CORD, PERHAPS THE BLUE ONE IS MORE EASILY SPARED BECAUSE I CAN COMPROMISE LESS OF MY TARGET TO KEEP IT SPARED. OKAY? I DON'T KNOW IF THAT WAS GOOD ENOUGH TO EXPLAIN IT TO EVERYBODY; I DON'T KNOW MY AUDIENCE THAT WELL. THE POINT IS, IF I LOOK AT SOME DATA, HERE ARE THE MANDIBLE DOSE VOLUME HISTOGRAM AND THE MANDIBLE OVERLAP VOLUME HISTOGRAM. THIS PARTICULAR PATIENT GOT A PRETTY HIGH DOSE, AND THE OVERLAP IS FAR TO THE LEFT. THIS PATIENT GOT A LOWER DOSE AND MOVED TO THE RIGHT, AND AS YOU KEEP GOING, PATIENTS CONTINUE TO MOVE. IF YOU LOOK AT THE DATA, THAT KIND OF FEATURE TRACKS YOUR ABILITY TO ACHIEVE THE DOSE. NOW, THE TREATMENT PLANS IN THE DATABASE AREN'T ALL PERFECT. WE DON'T EXPECT THEM TO BE PERFECT, RIGHT? THEY WERE JUST THE TREATMENT PLANS WE USED TO TREAT PATIENTS; WE DON'T HAVE TIME TO MAKE EVERY SINGLE ONE PERFECT, AND WE DON'T HAVE TIME TO REPLAN TO MAKE THEM PERFECT. THE IDEA OF A FEATURE IS THAT IT GIVES YOU A DIRECTION TO GO. IF I DON'T HAVE THE FEATURE AND I JUST THROW MY TREATMENT PLANS AT SOME FANCY CNN ALGORITHM AND GET BACK A DOSE DISTRIBUTION, I'M GOING TO GET BACK WHAT I DID BEFORE: SOME KIND OF AVERAGE OF HOW GOOD MY PLANS WERE BEFORE. I DON'T WANT THAT. I WANT TO PUSH THINGS TO BE BETTER. BY GOING THROUGH A FEATURE, I CAN UNDERSTAND IT. THIS FEATURE SAYS THAT IF I'M FARTHER AWAY, I SHOULD DO BETTER.
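A MINIMAL SKETCH OF THE OVERLAP VOLUME HISTOGRAM (OVH) FEATURE DESCRIBED ABOVE. IT ASSUMES EACH ORGAN-AT-RISK VOXEL HAS ALREADY BEEN ASSIGNED A SIGNED DISTANCE TO THE TARGET SURFACE (NEGATIVE INSIDE THE TARGET, POSITIVE OUTSIDE); IN PRACTICE THOSE DISTANCES WOULD BE COMPUTED FROM THE PLANNING CONTOURS. THE FUNCTION NAMES `ovh` AND `distance_at_fraction` AND THE TOY RED AND BLUE DISTANCES ARE HYPOTHETICAL, CHOSEN ONLY TO REPRODUCE THE SAME-OVERLAP ZERO POINT, THE 50% VERSUS 90% (2 CM EXPANSION) AND THE 20% VERSUS 0% (2 CM CONTRACTION) EXAMPLE.

```python
import math

def ovh(signed_distances_cm, r_cm):
    """Fraction of OAR voxels consumed when the target is expanded by r_cm.

    A negative r_cm means the target is contracted. A voxel counts as
    consumed when its signed distance to the target surface is <= r_cm.
    """
    consumed = sum(1 for d in signed_distances_cm if d <= r_cm)
    return consumed / len(signed_distances_cm)

def distance_at_fraction(signed_distances_cm, fraction):
    """Smallest expansion (cm) that consumes at least `fraction` of the OAR."""
    ordered = sorted(signed_distances_cm)
    k = max(1, math.ceil(fraction * len(ordered)))
    return ordered[k - 1]

# Hypothetical toy structures: same size (10 voxels), different orientation.
red = [-2.5, -2.0, 0.5, 1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
blue = [-1.0, -0.5, 0.2, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 3.0]

print(ovh(red, 0.0), ovh(blue, 0.0))    # 0.2 0.2  (same overlap at the zero point)
print(ovh(red, 2.0), ovh(blue, 2.0))    # 0.5 0.9  (2 cm expansion)
print(ovh(red, -2.0), ovh(blue, -2.0))  # 0.2 0.0  (2 cm contraction)
print(distance_at_fraction(red, 0.5))   # 2.0
```

BECAUSE THE OVH DEPENDS ONLY ON ANATOMY, IT CAN BE COMPUTED BEFORE ANY PLAN EXISTS, WHICH IS WHAT LETS IT SERVE AS A PREDICTIVE FEATURE.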
NOW I CAN USE THAT. IN THIS CASE I CAN SAY, WELL, HERE'S A PATIENT; WHAT IF I LOOK AT ALL THE PATIENTS WHO WERE HARDER TO PLAN, THOSE TO THE LEFT, AND FIND THE BEST DOSE THAT I EVER ACHIEVED ON THOSE PATIENTS? NOW I'M PUSHING THE ENVELOPE. I'M PUSHING MY QUALITY TO BE BETTER. I'M NOT ACCEPTING WHAT I DID BEFORE, BECAUSE I KNOW I'M NOT PERFECT. DOES THAT MAKE SENSE? FEATURES GIVE YOU KNOWLEDGE AND TELL YOU HOW TO DO THINGS BETTER, NOT JUST HOW YOU DID THEM BEFORE. PET PEEVE NUMBER ONE. SO THIS CAN BE USED FOR PLAN QUALITY ASSESSMENT AND AUTOMATED PLANNING. WE HAVE A TOOL IN PLACE IN THE CLINIC. THE IDEA IS THAT YOU HAVE YOUR NORMAL DOSE VOLUME METRICS THAT MIGHT DRIVE OPTIMIZATION. WE HAVE THE PROTOCOL VALUES, AND WE PREDICT VALUES FOR THE INDIVIDUAL PATIENT BASED ON THAT SHAPE RELATIONSHIP. GREEN MEANS WE PREDICTED A BETTER DOSE THAN THE PROTOCOL, AND WE'LL KEEP IT. YELLOW MEANS THIS PATIENT COULDN'T DO BETTER THAN THE PROTOCOL, SO WE'RE GOING TO USE THE PROTOCOL DOSES. THESE NEW PREDICTIONS CAN DRIVE THE OPTIMIZATION FOR TREATMENT PLANNING. WE SHOWED YEARS AGO THAT WE CAN IMPROVE TREATMENT PLANNING AND THE QUALITY OF TREATMENT PLANS FOR PATIENTS, BECAUSE FRANKLY YOU KNOW HOW WELL YOU CAN DO BEFORE YOU START, AND YOU DRIVE YOUR PLANS TOWARD THAT. THIS IS ONE AREA WHERE DATA CAN HELP YOU IMPROVE THINGS, NOT JUST REPEAT WHAT YOU DID BEFORE. KEVIN MOORE GAVE ME THIS SLIDE; WE WERE WORKING ON A PUBLICATION TOGETHER. HE WENT BACK TO RTOG TRIALS, AND I'M HAPPY TO SEE DR. XIAO SHOWING THAT WE'RE DOING DATA-DRIVEN QUALITY ASSESSMENT. USING A SIMILAR APPROACH, HE CALCULATED THAT IN THAT CLINICAL TRIAL A SIGNIFICANT NUMBER OF PATIENTS COULD HAVE HAD A BETTER PLAN, BASED ON THE SAME WAY OF THINKING THAT I JUST SHOWED YOU. EVEN IN CLINICAL TRIALS WE'RE NOT PERFECT. EVEN IF YOU USE YOUR CLINICAL TRIAL DATA TO DO EXACTLY WHAT YOU DID BEFORE, YOU'RE STILL NOT AS GOOD AS YOU COULD BE. SO THAT'S FEATURES.
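A MINIMAL SKETCH OF THE "PUSH THE ENVELOPE" LOOKUP AND THE GREEN/YELLOW RULE DESCRIBED ABOVE. IT ASSUMES EACH PRIOR PLAN IS SUMMARIZED BY A SINGLE OVH DISTANCE (FOR EXAMPLE, THE EXPANSION NEEDED TO CONSUME 50% OF THE ORGAN) PLUS THE DOSE ACTUALLY ACHIEVED, WHERE A SMALLER DISTANCE MEANS A HARDER GEOMETRY. THE FUNCTION NAMES AND DATABASE VALUES ARE HYPOTHETICAL, AND THE CLINICAL TOOL MAY WORK DIFFERENTLY.

```python
def achievable_dose(new_distance_cm, prior_plans):
    """Lowest dose achieved among prior patients whose geometry was at least as hard.

    prior_plans: list of (ovh_distance_cm, achieved_dose_gy). A prior patient
    whose organ sat no farther from the target (distance <= new_distance_cm)
    was at least as hard to spare, so the best dose achieved for any of them
    is a reasonable goal for the new patient. Returns None if there is none.
    """
    harder = [dose for dist, dose in prior_plans if dist <= new_distance_cm]
    return min(harder) if harder else None

def planning_goal(predicted_gy, protocol_limit_gy):
    """Green: the prediction beats the protocol, so keep it. Yellow: fall back to the protocol."""
    if predicted_gy is not None and predicted_gy < protocol_limit_gy:
        return predicted_gy, "green"
    return protocol_limit_gy, "yellow"

# Hypothetical database of (OVH distance in cm, achieved dose in Gy).
prior = [(0.5, 40.0), (1.0, 35.0), (1.5, 30.0), (2.0, 45.0), (3.0, 20.0)]

print(planning_goal(achievable_dose(1.5, prior), 35.0))  # (30.0, 'green')
print(planning_goal(achievable_dose(0.6, prior), 35.0))  # (35.0, 'yellow')
```

THE DESIGN CHOICE IS TO TAKE THE MINIMUM OVER HARDER CASES RATHER THAN AN AVERAGE OVER SIMILAR CASES, SO THE GOAL REFLECTS THE BEST PLAN ACHIEVED RATHER THAN THE TYPICAL ONE.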
WHEN WE GET TO CLINICAL OUTCOMES, IT GETS HARDER TO UNDERSTAND WHAT WE'RE DOING. BUT OUR GROUP REALLY FELT IT WAS IMPORTANT TO BE INTERPRETABLE. WHEN WE LOOK AT MODELS, WE MEET WITH DR. QUON WEEKLY. WE DON'T DO BLACK BOXES. MAYBE I'M OLD SCHOOL, PUSHING FOR FEATURES WHEN EVERYBODY LIKES BLACK BOXES, BUT HOPEFULLY I'LL CONVINCE YOU AGAIN. I'M USING THIS AS AN EXAMPLE, NOT TO TALK ABOUT XEROSTOMIA, BUT TO SHOW OUR WAY OF THINKING WHEN WE WORK WITH DATA AND TRY TO UNDERSTAND WHAT WE'RE DOING. AS DR. QUON SHOWED BEFORE, WE DO STRUCTURED DATA COLLECTION IN THE CLINICAL WORKFLOW, AND IT UPDATES IN REAL TIME. THIS SHOWS OUR PREVALENCE. HERE IS MUCOSITIS LESS THAN GRADE 2, AS A PERCENTAGE OF PATIENTS. AT THE START OF TREATMENT, EVERYBODY IS LESS THAN GRADE 2. DURING TREATMENT THEY GET REALLY BAD, TO WHERE ONLY 20% ARE LESS THAN GRADE 2; THEN THEY HEAL RAPIDLY, AND WITHIN A YEAR THEY ARE ALL FINE AND DANDY AND MOVING ALONG. YOU SEE OTHER THINGS, LIKE TASTE: TASTE FALLS OFF QUITE RAPIDLY BUT HAS A SLOW RECOVERY PERIOD. DYSPHAGIA FALLS OFF TOO; NOT AS MANY PEOPLE HAVE AS MUCH TROUBLE, BUT IT DOESN'T RECOVER AS MUCH; IT'S MORE OF A PERMANENT CONDITION. YOU DO SEE SOME RECOVERY IN XEROSTOMIA. XEROSTOMIA, IN BLUE -- GREEN, I'M SORRY -- COMES DOWN AND HAS A SLOW RECOVERY PERIOD. YOU SEE GRADE 3, GRADE 2, AND GRADE 1 XEROSTOMIA. WE WANT TO LOOK AT INJURY AND RECOVERY, NOT JUST INJURY, AND NOT JUST AT ONE TIME POINT. THAT'S THE DATA. THESE PLOTS LOOK AT SINGLE-ORGAN DVHs, RIGHT AND LEFT, COMPARED INDEPENDENTLY OF ONE ANOTHER, AND THAT'S A LOT OF DATA. RED IS PATIENTS WITH GRADE 2, YELLOW IS GRADE 1, I THINK. YEAH. THE CURVES GO DOWN, AND THE MORE RED, THE WORSE IT IS. THESE ARE THE PAROTID AND SALIVARY GLANDS. A MEAN DOSE OF 26 GRAY, YEAH, THAT'S GREAT, BUT THERE'S MORE TO IT THAN THAT. SO, ON SCOTT ROBERTSON'S WORK, WHICH DR. QUON SHOWED EARLIER, THERE ARE A COUPLE OF POINTS TO MAKE.
FIRST, THIS IS THE COMBINED PAROTID, THREE TO SIX MONTHS POST-TREATMENT: RED IS GRADE 2, BLUE IS LESS THAN GRADE 2 IN THAT TIME PERIOD. WHAT YOU SEE HERE IS THAT EVERYBODY IS CLUMPED TOGETHER RIGHT HERE. THE REASON WHY IS THAT THAT'S WHAT WE TRY TO DO IN TREATMENT PLANNING. WE PUSH THE D50 BELOW 30 GRAY, AND WHEN A DOSIMETRIST GETS THE PLAN THERE, THEY'RE LIKE, GREAT, I MADE IT, I MADE MY PROTOCOL, I'M GOOD, AND THEY MOVE ON. THEY HAVE 20 OTHER STRUCTURES, AND THEY HAVE A DAY AND A HALF TO GET THE PLAN DONE BECAUSE DR. QUON GOT HIS CONTOURS IN LATE, BUT THAT'S A DIFFERENT ISSUE. [LAUGHTER] NO, JUST TRYING TO BE FUNNY. THE POINT HERE IS THAT THAT DOSE POINT IS NOW CONTROLLED FOR IN YOUR DATA. THE KNOWLEDGE CONTAINED IN THE DATA FOR THAT POINT IS BURIED. IF EVERYBODY HAD THE SAME D50, THERE'S NO KNOWLEDGE ABOUT IT IN YOUR DATA. THAT'S NOT TO SAY THAT POINT ISN'T IMPORTANT. WE KNEW THAT POINT WAS IMPORTANT AND USED IT IN OUR PLANNING, SO NOW IT'S CONTROLLED FOR IN OUR DATA, AND THE KNOWLEDGE CONTAINED IN THE DATA FOR THAT POINT IS SUPPRESSED. THEN SCOTT FOUND THESE HIGH GRADIENTS IN HERE, AND HE FOUND THE LOW-DOSE POINT THAT DR. QUON WAS ALLUDING TO: A LOW-DOSE BATH AT THIS 90% LEVEL, MAYBE 10 GRAY, WAS THE HIGHEST GRADIENT, THE MOST IMPORTANT POINT. THAT'S PROBABLY THE MOST IMPORTANT POINT GIVEN THE WAY WE TREAT PATIENTS TODAY, NOT IN GENERAL. BECAUSE OF THE WAY WE TREAT PATIENTS TODAY, THAT'S THE NEXT THING TO WORK ON, AND ONCE WE CONTROL FOR THAT, IT'S PROBABLY GOING TO BE A DIFFERENT POINT. OKAY? IT'S JUST SOMETHING TO THINK ABOUT. THEN WE BUILT A CART MODEL, WHICH WE CAN DISCUSS TO SEE IF ITS SPLITS MAKE SENSE. THE COMBINED PAROTID D95 ALSO SHOWED UP, AT THE 10 GRAY POINT, AS THE TOP PREDICTOR. THAT'S GOOD. WEIGHT LOSS IS A PREDICTOR. MEAN DOSE DID SHOW UP, AT A FUNNY LOW LEVEL. SO DID RACE AND AGE. WE SAW AGE-RELATED THINGS WITH RADIOMICS, THE CONCEPT THAT AN OLDER PERSON CAN'T RECOVER AS WELL FROM RADIATION DAMAGE.
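A TOY ILLUSTRATION OF THE "CONTROLLED FOR" POINT ABOVE: ONCE PLANNING PUSHES EVERY PATIENT TO THE SAME D50, THAT VARIABLE HAS NO SPREAD LEFT, SO THE DATA CAN NO LONGER SHOW ITS RELATIONSHIP TO TOXICITY EVEN IF THE RELATIONSHIP IS REAL. THE NUMBERS ARE INVENTED, AND THE IDENTICAL-D50 CASE IS THE EXTREME VERSION OF THE TIGHT CLUMPING SHOWN ON THE SLIDE.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation, or None when either variable has no spread."""
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    if sx == 0 or sy == 0:
        return None  # a constant variable carries no information about the outcome
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)

toxicity = [0, 0, 1, 1, 1]                # hypothetical grade >= 2 indicator
d50_unconstrained = [20, 25, 30, 35, 40]  # hypothetical D50 (Gy) with no planning goal
d50_controlled = [30, 30, 30, 30, 30]     # every plan pushed to the same D50

print(round(pearson(d50_unconstrained, toxicity), 3))  # 0.866
print(pearson(d50_controlled, toxicity))               # None
```

THE LOST SIGNAL IS NOT EVIDENCE THAT D50 STOPPED MATTERING; IT IS A CONSEQUENCE OF THE PLANNING GOAL REMOVING THE VARIATION THE DATA WOULD NEED TO SHOW IT.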
SOME THINGS MAKE SENSE; SMOKING MADE SENSE. WE CAN TALK ABOUT THEM. I WENT BACK TO THE PREVALENCE PLOT AND PLOTTED PATIENTS ABOVE AND BELOW AGE 51. WHEN I SHOWED THIS PLOT TO SOMEBODY, THEY LAUGHED: THAT DOESN'T SHOW ANY STATISTICAL SIGNIFICANCE. YOU'RE PROBABLY RIGHT. MAYBE IT DOESN'T SHOW STATISTICAL SIGNIFICANCE IF I LOOK AT AGE ONLY, BUT DOWN HERE IN THE TREE, IN THE COHORT THAT REMAINS AT THAT LEAF, MAYBE IT'S MUCH MORE SIGNIFICANT. I DIDN'T PLOT THAT. BUT IT DOES SHOW THAT YOUNGER PATIENTS HAVE A BETTER RECOVERY TRAJECTORY THAN OLDER PATIENTS. SO THEN WE GOT CRAZIER. I HOPE I -- OH, YEAH, I HAVE TIME. NOW, ON FEATURES: THAT WAS OUR DVH. DOSE VOLUME HISTOGRAMS ARE BAD: DVH FEATURES ASSUME EVERY VOXEL IS EQUALLY IMPORTANT TO FUNCTION AS WELL AS EQUALLY SENSITIVE TO DOSE. SO WE TREATED EVERY VOXEL AS A FEATURE. THIS IS THE MEAN DOSE OVER ALL PATIENTS IN THE DATABASE: THE SUBMANDIBULAR GLAND GETS A VERY HIGH DOSE; THE CONTRALATERAL PORTION OF THE PAROTID GETS A LOW DOSE; IPSILATERAL, LOW DOSE; CONTRALATERAL, CONSISTENTLY LOW DOSE. THIS IS THE MEAN DOSE FOR THE GROUP THAT DIDN'T GET XEROSTOMIA VERSUS THOSE THAT DID, AND THIS IS RECOVERED VERSUS NOT RECOVERED; THESE ARE RELATIVE DOSE DIFFERENCES. THE DIFFERENCE PATTERN BETWEEN THOSE THAT DID AND DIDN'T GET XEROSTOMIA IS NOT THE SAME AS THE PATTERN BETWEEN THOSE THAT DID AND DIDN'T RECOVER; YOU CAN SEE THE DOSE DIFFERENCE PATTERN. THEN WE DID SOME FANCY MACHINE LEARNING: WE THREW IN A LASSO REGRESSION MODEL. IT PICKED ONE POINT. SEE THE VOXEL RIGHT THERE? IF I LOWER THE DOSE TO THAT VOXEL, I'M NOT GOING TO HAVE XEROSTOMIA ANYMORE. [LAUGHTER] WHAM, VERY GOOD. THE POINT IS THAT YOU HAVE TO BE REALLY CAREFUL WITH THE ALGORITHMS, BECAUSE THESE VARIABLES ARE VERY HIGHLY CORRELATED. WE TREAT PATIENTS SIMILARLY. IF I KNOW THE DOSE TO ONE VOXEL, I KNOW THE DOSE EVERYWHERE ELSE, BECAUSE THE PATTERN IS SIMILAR ACROSS PATIENTS.
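A MINIMAL SKETCH OF THE VOXEL-WISE DOSE-DIFFERENCE MAP DESCRIBED ABOVE. IT ASSUMES EVERY PATIENT'S DOSE HAS ALREADY BEEN MAPPED ONTO A COMMON REFERENCE GLAND (FOR EXAMPLE BY DEFORMABLE REGISTRATION) SO THAT VOXEL i REFERS TO THE SAME ANATOMICAL LOCATION IN EVERY PATIENT. THE REGISTRATION STEP, THE FUNCTION NAMES, AND THE THREE-VOXEL TOY DOSES ARE ASSUMPTIONS, AND THIS SKETCH REPORTS ABSOLUTE RATHER THAN RELATIVE DIFFERENCES.

```python
def voxel_mean(dose_maps):
    """Mean dose per voxel across patients; each map is a list of voxel doses (Gy)."""
    n = len(dose_maps)
    return [sum(voxel_doses) / n for voxel_doses in zip(*dose_maps)]

def mean_dose_difference(with_toxicity, without_toxicity):
    """Per-voxel mean dose of the toxicity group minus that of the no-toxicity group (Gy)."""
    return [a - b for a, b in zip(voxel_mean(with_toxicity), voxel_mean(without_toxicity))]

# Hypothetical three-voxel reference gland, two patients per group.
xerostomia = [[30.0, 20.0, 10.0], [34.0, 22.0, 8.0]]
no_xerostomia = [[26.0, 20.0, 4.0], [28.0, 18.0, 6.0]]

print(mean_dose_difference(xerostomia, no_xerostomia))  # [5.0, 2.0, 4.0]
```

BECAUSE SIMILARLY TREATED PATIENTS RECEIVE SIMILAR DOSE PATTERNS, THE PER-VOXEL COLUMNS FED TO A MODEL ARE STRONGLY CORRELATED WITH ONE ANOTHER.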
LASSO TENDS TO AMPLIFY THE IMPORTANCE OF THE MOST IMPORTANT VARIABLE AND SUPPRESS THE OTHER, CORRELATED VARIABLES. SO WE TRIED RANDOM FOREST AND STARTED TO SEE A PATTERN OF INFLUENCE. IT'S NOISY, PROBABLY BECAUSE WE DON'T HAVE ENOUGH PATIENTS. WE TRIED RIDGE LOGISTIC REGRESSION AND SAW A PATTERN. THIS IS THE PATTERN FOR INJURY: WE SEE HIGH INFLUENCE IN THE LOW-DOSE REGION AND LOW INFLUENCE HERE. AND I'M GOING TO CUT TO THE CHASE. THIS IS THE INJURY IMPORTANCE PATTERN, THIS IS THE RECOVERY IMPORTANCE PATTERN, AND THEY ARE DIFFERENT. AND THAT'S COOL. THAT'S NEAT. IF WE WANT TO THINK ABOUT WHAT THESE THINGS MEAN, WE COULD HYPOTHESIZE, BUT NOT PROVE, THAT THIS REGION SHOWS UP AS IMPORTANT FOR INJURY BECAUSE IT'S THE LOW-DOSE REGION THAT VERY RARELY GETS A HIGH DOSE. PATIENTS WHO GET A HIGH DOSE THERE PROBABLY GOT A HIGH DOSE EVERYWHERE ELSE, AND ARE PROBABLY GOING TO GET XEROSTOMIA. THAT'S NOT TO SAY THE REGION IS THE MOST IMPORTANT, BUT IF YOU GOT A HIGH DOSE THERE, YOU PROBABLY GOT A HIGH DOSE EVERYWHERE ELSE. NOW, HERE WE SEE AN IMPORTANT REGION THAT MAY BE ALIGNED WITH THE DUCTAL REGION. IT'S NOT IMPORTANT IN THE IPSILATERAL SUPERIOR PORTION; MAYBE THERE'S AN OCCLUSION THING WITH THE DUCT. I DON'T KNOW; THAT'S A HYPOTHESIS. MORE INTERESTING, PERHAPS, IS THE RECOVERY. BOTH SUPERIOR PORTIONS ARE IMPORTANT FOR RECOVERY, AND THAT'S THE LOW-DOSE REGION, ON AVERAGE, ON BOTH SIDES. SO MAYBE THERE ARE STEM CELLS IN THE PAROTID GLANDS WITH A LOWER DOSE THRESHOLD FOR DEATH THAN THE CELLS RESPONSIBLE FOR NORMAL FUNCTION. SO RATHER THAN BEING A SPATIAL THING, MAYBE IF YOU KEEP SOME PORTION AT A VERY LOW DOSE, YOU KEEP ENOUGH CELLS FOR THE GLAND TO RECOVER. MAYBE. TOTAL HYPOTHESIS, BUT IT CAN HELP US THINK THINGS THROUGH AND LET DATA SCIENCE AND MACHINE LEARNING HELP TEACH US, NOT JUST PREDICT SOMETHING FROM A BLACK BOX. THIS GOES BACK TO THE PAPER: THIS IS THE HIGHLIGHTED REGION, MAYBE THE DUCTAL REGION, AND IT DOES ALIGN WITH THE SAME REGION FROM VAN LUIJK'S PAPER. I'LL GO FAST BECAUSE PEOPLE ARE MAD AT ME.
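A SMALL PURE-PYTHON ILLUSTRATION OF THE LASSO-VERSUS-RIDGE BEHAVIOR DESCRIBED ABOVE, USING COORDINATE DESCENT ON TWO CENTERED, STRONGLY CORRELATED TOY FEATURES (CORRELATION ABOUT 0.91) THAT STAND IN FOR TWO NEIGHBORING VOXELS. THE TOY OUTCOME IS CONTINUOUS, SO THIS IS PENALIZED LINEAR REGRESSION RATHER THAN THE LOGISTIC MODELS IN THE STUDY, AND THE PENALTY STRENGTH `alpha` IS ARBITRARY; IT SHOWS THE MECHANISM, NOT THE STUDY'S MODELS.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha, snapping to exactly zero inside [-alpha, alpha]."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def coordinate_descent(columns, y, alpha, penalty, sweeps=500):
    """Fit y ~ X b with an L1 ('lasso') or L2 ('ridge') penalty by coordinate descent.

    Minimizes (1/2n)||y - Xb||^2 + alpha*||b||_1 for lasso, or
    (1/2n)||y - Xb||^2 + (alpha/2)*||b||^2 for ridge. Columns and y are
    assumed centered, so no intercept is fitted.
    """
    n, p = len(y), len(columns)
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Partial residual: remove the contribution of every feature except j.
            resid = [y[i] - sum(columns[k][i] * b[k] for k in range(p) if k != j)
                     for i in range(n)]
            rho = sum(columns[j][i] * resid[i] for i in range(n)) / n
            z = sum(v * v for v in columns[j]) / n
            if penalty == "lasso":
                b[j] = soft_threshold(rho, alpha) / z
            elif penalty == "ridge":
                b[j] = rho / (z + alpha)
            else:
                raise ValueError("penalty must be 'lasso' or 'ridge'")
    return b

x1 = [1.2, 0.8, -0.8, -1.2]  # hypothetical centered dose to "voxel 1"
x2 = [1.4, 0.6, -1.4, -0.6]  # hypothetical "voxel 2", correlated with voxel 1
y = list(x1)                 # toy outcome

lasso = coordinate_descent([x1, x2], y, 0.1, "lasso")
ridge = coordinate_descent([x1, x2], y, 0.1, "ridge")
print(lasso)  # voxel 2 weight is exactly 0.0
print(ridge)  # both voxels keep weight
```

WITH THESE SETTINGS LASSO RETURNS ABOUT [0.904, 0.0] WHILE RIDGE RETURNS ABOUT [0.711, 0.229]: THE L1 PENALTY DROPS THE CORRELATED VOXEL ENTIRELY, WHEREAS THE L2 PENALTY SHARES WEIGHT ACROSS IT, WHICH IS WHY RIDGE GIVES A SMOOTHER INFLUENCE PATTERN.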
WE ALSO CUT UP THE PAROTID GLAND DIFFERENTLY, INTO PARTITIONS OR PIECES. THESE ARE MINI-DVHs, D10 TO D90, FOR EACH ZONE. THIS IS THE MEAN DOSE FOR NO XEROSTOMIA, THE MEAN DOSE FOR INJURY, THEN THE MEAN DOSE FOR INJURY WITH RECOVERY AND FOR INJURY WITH NO RECOVERY, AND THESE ARE THE DOSE DIFFERENCES AND STANDARD DEVIATIONS FOR THOSE GROUPS. HERE IS THAT HIGH VARIABILITY IN THE IPSILATERAL SUPERIOR PORTION THAT WE SAW IN THE VOXEL-BASED ANALYSIS. THESE ARE THE IMPORTANT VARIABLES FOR EACH ZONE FOR INJURY, AND IN BLUE IS THE IMPORTANCE FOR RECOVERY. THE COLORS DIFFER BECAUSE ONE MODEL PREDICTS INJURY AND THE OTHER PREDICTS RECOVERY, WHERE LOWER DOSE MEANS MORE RECOVERY. THEN WE USED THOSE LITTLE ZONES TO DRIVE OPTIMIZATION, WITH BOTH AN INJURY-WEIGHTED AND A RECOVERY-WEIGHTED METHODOLOGY. YOU SEE A DIFFERENT DOSE PATTERN FROM EACH, AND WHETHER THAT PATTERN OF DOSE INFLUENCES THE OUTCOME NEEDS A TRIAL OR MORE STUDY. WE COULD CHANGE DOSE PATTERNS AND IMPROVE OUR PREDICTED OUTCOMES: FOR SOME PATIENTS YOU COULDN'T PREVENT INJURY, BUT YOU COULD IMPROVE THE PROBABILITY OF RECOVERY. THE LAST THING I'LL SAY: WE HAVE BEEN LEARNING IN OUR LEARNING HEALTH SYSTEM, IN A WAY, MANUALLY, MAYBE NOT ELECTRONICALLY. THIS IS OUR COMBINED PAROTID DOSE OVER THE YEARS. YOU SEE DR. QUON CHANGED THE WAY HE LOOKED AT HIS TREATMENT PLANS. DOSE VALUES ARE DROPPING OVER TIME, SUBMANDIBULAR DOSES ARE DROPPING OVER TIME, AND ADVERSE OUTCOMES ARE DROPPING OVER TIME. THIS WAS OUR XEROSTOMIA RATE FROM 2007 TO 2010, AND THIS IS FROM 2014 TO 2015. YOU SEE A REDUCTION. ONE OF THE RESEARCHERS FROM CANON WHO JOINED US, FORMERLY TOSHIBA, LOOKED AT THESE MODELS AND HOW WE BUILD THEM OVER TIME. BASICALLY, IF I TOOK ALL THE DATA FROM 2007 ONWARD TO PREDICT THE 2015 DATA, I DID OKAY, RIGHT? BUT IF I TOOK JUST THE LAST TWO OR THREE YEARS OF DATA, I DID BETTER AT MY PREDICTION THAN IF I USED ALL THE DATA. DATA GETS STALE. DATA GETS OLD.
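A TOY ILLUSTRATION OF WHY A RECENT TRAINING WINDOW CAN BEAT ALL OF THE HISTORICAL DATA WHEN PRACTICE IS CHANGING. THE "MODEL" IS DELIBERATELY THE SIMPLEST POSSIBLE ONE, THE MEAN OUTCOME RATE OVER A TRAINING WINDOW, AND THE YEARLY EVENT COUNTS IN `EVENTS_PER_YEAR` ARE INVENTED; THEY ARE NOT THE XEROSTOMIA RATES OR THE CANON ANALYSIS FROM THE TALK.

```python
def mean_outcome(records, first_year, last_year):
    """Mean outcome (e.g., a toxicity indicator) over records in [first_year, last_year]."""
    values = [outcome for year, outcome in records if first_year <= year <= last_year]
    if not values:
        raise ValueError("no records in the requested window")
    return sum(values) / len(values)

# Invented cohort: 10 patients per year, with the number of events falling over time.
EVENTS_PER_YEAR = {2007: 6, 2008: 6, 2009: 5, 2010: 5, 2011: 4,
                   2012: 3, 2013: 3, 2014: 2, 2015: 2}
records = [(year, 1 if i < events else 0)
           for year, events in EVENTS_PER_YEAR.items() for i in range(10)]

observed_2015 = mean_outcome(records, 2015, 2015)  # 0.2, the held-out year
all_history = mean_outcome(records, 2007, 2014)    # 0.425, trained on everything
recent_window = mean_outcome(records, 2012, 2014)  # about 0.267, last three years only

# Absolute error of each "model" against the held-out year.
print(abs(all_history - observed_2015), abs(recent_window - observed_2015))
```

THE WINDOW LENGTH IS ITSELF A TUNING CHOICE: TOO SHORT AND THE ESTIMATE GETS NOISY, TOO LONG AND IT LAGS BEHIND CURRENT PRACTICE.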
SO, ON THIS CONCEPT OF MODEL UPDATES AND THE LEARNING HEALTH SYSTEM: WE MAY THINK WE CAN KEEP THROWING DATA AT IT, BUT I DON'T WANT TO BE TREATED THE WAY WE TREATED PATIENTS 20 YEARS AGO. I DON'T KNOW IF YOU DO. DATA GETS OLD. MY LAST SLIDE: A COUPLE OF THINGS TO REMEMBER. ONE, WE WANT TO USE THE DATA, BUT WE WANT TO STAY SAFE AND MAINTAIN QUALITY. THE DATA IN A DATABASE IS NOT ALWAYS OF THE HIGHEST QUALITY, SO WE NEED TO MAKE SURE OUR METHODS AND MODELS DON'T ASSUME IT IS. DATA DOES NOT CONTAIN ALL THE KNOWLEDGE; EXISTING KNOWLEDGE IS OFTEN ABSENT FROM IT. IF PATIENTS ALL MEET A DOSE GOAL, THERE'S NO KNOWLEDGE IN THE DATA OUTSIDE THAT DOSE GOAL. YOU HAVE TO BE CAREFUL OF THAT. BEWARE OF SITUATIONS WHERE YOU'RE OUTSIDE THE KNOWLEDGE BOUNDS OF YOUR DATA. AND MAKE SURE WE KEEP OUR DATA MODELS CURRENT. AND I'M DONE. [APPLAUSE] >> WE'LL HAVE QUESTIONS AT THE END. MOVING FROM THE HIGH-DIMENSIONAL SPATIAL DATA SPACE TO SEMANTIC SPACE: OUR NEXT SPEAKER IS NOT ONLY A GIFTED PHYSICIAN-SCIENTIST BUT ALSO RUNS THE MEDNET, ONE OF THE BEST INFORMAL EDUCATIONAL OPPORTUNITIES WE HAVE IN THE FIELD, AND IS SPEAKING ABOUT NATURAL LANGUAGE PROCESSING AND IMPROVING CLINICAL TRIALS THROUGH THE ONCOLOGIST NETWORK. >> THANK YOU VERY MUCH, DAVE AND ANSHU, FOR INVITING ME TO SPEAK TODAY. I'M THE CHIEF MEDICAL OFFICER AND FOUNDER, AND A PROFESSOR AT YALE. I'LL FOCUS SPECIFICALLY ON THE MEDNET, A SOCIAL PLATFORM WITH OVER 3,500 CLINICAL QUESTIONS WHOSE ANSWERS ARE NOT FOUND IN THE LITERATURE, TEXTBOOKS, OR GUIDELINES. IT HAS GROWN TO 9,000 MEMBERS SINCE 2014, AND MONTHLY ACTIVE USERS ARE 45 TO 50%. THE PREMISE OF THE MEDNET IS SIMPLE. YOU COME ON AS A CLINICIAN AND SEARCH FOR THE ANSWER TO YOUR QUESTION. IF YOU DON'T FIND THE ANSWER, YOU CAN ASK IT, AND WE REACH OUT TO EXPERTS, GET AN ANSWER TO YOUR QUESTION, AND SHARE IT WITH THE REST OF THE COMMUNITY IN A FEED SIMILAR TO TWITTER OR FACEBOOK. OUR COMMUNITY INCLUDES 9,000 ONCOLOGISTS: 90% OF U.S. RADIATION ONCOLOGISTS, OVER 35% OF U.S. MEDICAL ONCOLOGISTS, AND 30% OF GYNECOLOGIC ONCOLOGISTS. MORE THAN 500 ACADEMIC ONCOLOGISTS FROM EVERY MAJOR CANCER CENTER HAVE ANSWERED QUESTIONS ON THE MEDNET.
THE CONTENT HAS A VERY LONG SHELF LIFE. THERE ARE OVER 3,500 QUESTIONS AND ANSWERS, VIEWED OVER A MILLION TIMES, AND 83% ARE VIEWED EVERY MONTH, WHICH MEANS PEOPLE ARE COMING BACK AND VIEWING Q&A THAT MIGHT BE A MONTH, TWO MONTHS, A YEAR, OR TWO YEARS OLD. IT HAS LONG-LASTING VALUE. WHEN WE ENVISIONED THE MEDNET, WE THOUGHT EXPERTS WOULD COME ON AND ANSWER QUESTIONS USING THEIR CLINICAL EXPERTISE, BECAUSE THE QUESTIONS ARE ABOUT THINGS IN THE DATA-FREE ZONE. BUT 60% OF ANSWERS ALSO CITE PUBLISHED DATA. THERE'S A REALLY NICE OVERLAP IN THE ANSWERS BETWEEN PUBLISHED DATA AND CLINICAL EXPERIENCE: THIS IS WHAT WE KNOW, AND THIS IS WHAT I DO. THE MOST FREQUENT DATA SOURCE IS PHASE 3 RANDOMIZED TRIALS, FOLLOWED BY RETROSPECTIVE STUDIES. AN INTERESTING THING WE FOUND IS THAT 20% OF THE ANSWERS CITE CLINICAL TRIALS: CURRENT TRIALS AND FUTURE TRIALS. WE SAW THAT TREND EARLY ON AND STARTED TO THINK ABOUT HOW THE MEDNET CAN HELP DRIVE CLINICAL TRIALS FORWARD. WHY DO WE CARE ABOUT CLINICAL TRIALS? CANCER CLINICAL TRIALS ARE INCREDIBLY IMPORTANT; THEY TRANSLATE OUR SCIENTIFIC DISCOVERIES INTO STANDARD CANCER CARE. AND AS WE HAVE MORE TRIALS INVOLVING TARGETED THERAPY AND PERSONALIZED MEDICINE, WE'RE STRATIFYING PATIENTS BY MOLECULAR MARKERS AND NEED MORE PATIENTS THAN EVER BEFORE. THAT SOUNDS EXCITING, BUT IT'S VERY PROBLEMATIC. ONLY 2 TO 5% OF CANCER PATIENTS ENROLL IN CLINICAL TRIALS. 70% OF TRIALS ARE DELAYED, 25% NEVER REACH THEIR ENROLLMENT GOALS, AND MOST PATIENTS ARE UNAWARE OF CLINICAL TRIALS THEY MAY BE ELIGIBLE FOR. THOUSANDS OF QUESTIONS GO UNANSWERED BY CLINICAL TRIALS. SO THE MEDNET WANTS TO CHANGE HOW PHYSICIANS LEARN ABOUT CLINICAL TRIALS. AND WHY DO WE FOCUS ON THE PHYSICIAN PIECE? PHYSICIANS ARE THE MOST TRUSTED SOURCE FOR PATIENTS, AND IF YOU LOOK AT THE FACTORS, ONCOLOGIST KNOWLEDGE AND EASE BEAR THE STRONGEST RELATIONSHIP WITH ACCRUAL. SO LAST YEAR WE WORKED WITH SWOG ON A PILOT STUDY OF A TARGETED SOCIAL AND PERSONALIZED CAMPAIGN ON THE MEDNET TO INCREASE ACCRUAL TO CLINICAL TRIALS, FUNDED BY THE HOPE FOUNDATION.
THE OBJECTIVE WAS TO EVALUATE THE FEASIBILITY OF A TARGETED SOCIAL AND PERSONALIZED CAMPAIGN ON THE MEDNET TO RAISE ONCOLOGISTS' AWARENESS OF TWO SWOG CLINICAL TRIALS. THE ENDPOINTS WERE ENGAGEMENT: WE LOOKED AT REGISTRATION AND MONTHLY USE. WOULD SWOG MEMBERS SIGN UP AND ENGAGE WITH CLINICAL TRIAL CONTENT? WE ALSO LOOKED AT SOME DESCRIPTIVE ENDPOINTS, SUCH AS WHICH CONTENT WAS MOST ENGAGING, TO USE IN A FUTURE STUDY. THE FIRST STEP WAS TO INVITE SWOG MEMBERS AT NCORP SITES, FOCUSING ON COMMUNITY ONCOLOGISTS. WE SELECTED 10 NCORPS WITH A GOAL OF REGISTERING 20%. WE REGISTERED 49%, EXCEEDING OUR GOAL BY ROUGHLY 150%. WE DID A PRE-PILOT SURVEY ON KNOWLEDGE OF TWO SWOG CLINICAL TRIALS, S1418 AND S1501. WHEN WE ASKED ABOUT THE TRIPLE-NEGATIVE BREAST CANCER TRIAL, 53% WERE AWARE OF IT AND 47% WERE NOT. 47% OF THESE INDIVIDUALS DIDN'T KNOW ABOUT A TRIAL THAT WAS OPEN AT THEIR OWN INSTITUTION, A VERY LARGE PHASE 3 TRIAL RUNNING ALL OVER THE COUNTRY. FOR THE OTHER STUDY, WE ASKED WHETHER THEY WERE AWARE OF THE TRIAL ON CARDIAC TOXICITY IN METASTATIC HER2+ DISEASE. WE DELIVERED PERSONALIZED AND TARGETED MESSAGING OVER SIX MONTHS. THIS INCLUDED TRIAL-RELEVANT Q&A, CLINICAL TRIAL ALERTS, E-MAIL NOTIFICATIONS, AND LEADER BOARDS OF TOP-ENROLLING PHYSICIANS. THIS IS WHAT THE CLINICALLY RELEVANT Q&A LOOKS LIKE. WE WORKED WITH THE P.I. TO PUT TOGETHER A CLINICALLY FOCUSED QUESTION AND ANSWER THAT CITED THE CLINICAL TRIAL. MEMBERS WOULD SEE THESE IN WEEKLY NEWSLETTERS. WE CREATED A LEADER BOARD SO INDIVIDUALS COULD SEE HOW THEY AND THEIR INSTITUTION WERE ENROLLING, BOTH LOCALLY AND NATIONALLY. THIS IS AN EXAMPLE OF AN E-MAIL NOTIFICATION: YOU'VE ENROLLED THIS MANY PATIENTS, THIS IS HOW YOU RANK NATIONALLY, AND THIS IS HOW YOUR PEERS ARE DOING. WE ALSO CREATED PERSONALIZED MESSAGING FROM THE P.I., A PERSONAL APPEAL TO ENROLL MORE PATIENTS ON THE TRIAL: DEAR DR. SO-AND-SO, I WANT TO TELL YOU ABOUT THE TRIAL, VERY BRIEFLY. YOU CAN SEE THE ENROLLMENT DATA. WHY IS IT IMPORTANT? FEEL FREE TO CALL ME IF YOU HAVE ANY QUESTIONS.
WE DID A POST-PILOT SURVEY ON KNOWLEDGE OF THE TRIALS, AND YOU CAN SEE THAT FOR BOTH TRIALS, S1418 AND S1501, KNOWLEDGE INCREASED OVER TIME. KNOWLEDGE DOES TEND TO INCREASE OVER TIME ANYWAY, BUT WE'LL TAKE SOME CREDIT. WE REACHED BOTH ENDPOINTS: WE EXCEEDED THE REGISTRATION GOAL, AND USAGE INCREASED OVER TIME. OF THE MESSAGING WE CREATED, THE PERSONALIZED OUTREACH FROM THE P.I. WAS THE MOST POPULAR TYPE OF NOTIFICATION, AND SWOG INVESTIGATORS WANTED TO RECEIVE MORE OF THOSE MESSAGES. THE REAL-LIFE CLINICAL QUESTIONS AND ANSWERS DREW STRONG ENGAGEMENT FROM INDIVIDUALS WITHIN THOSE NCORPS AND, MORE BROADLY, NATIONALLY, CREATING A SENSE OF COMMUNITY AROUND CLINICAL TRIAL ENROLLMENT. SWOG MEMBERS DID REPORT LEARNING ABOUT TRIALS THEY DID NOT KNOW ABOUT, AND MEMBERS WHO REPORTED LEARNING ABOUT TRIALS WERE TWICE AS ACTIVE ON THE MEDNET AS THOSE WHO DID NOT. OUR WORK LED TO A NATIONAL SCIENCE FOUNDATION GRANT, AND WE WERE ALSO NAMED A COMPANY CHANGING THE FUTURE OF MEDICINE BY INC. MAGAZINE. THIS IS WHERE MACHINE LEARNING COMES INTO PLAY. SANJAY IS OUR COLLABORATOR ON THIS. THE GOAL IS TO USE MACHINE LEARNING TO SUGGEST CLINICAL TRIALS TO MEDNET MEMBERS BASED ON THEIR CLINICAL QUESTIONS. THE RATIONALE IS THAT CLINICAL TRIALS ARE MOST RELEVANT AT THE TIME INDIVIDUALS ARE SEEKING OUT INFORMATION ABOUT HOW TO TREAT THEIR PATIENTS, SO SUGGESTING CLINICAL TRIALS AT THE POINT OF CARE MAY IMPROVE THE LIKELIHOOD OF REFERRAL TO A TRIAL. PHASE 1 FOCUSES ON MATCHING CLINICIANS' BREAST CANCER QUESTIONS SPECIFICALLY TO NCI BREAST CANCER TRIALS. IN TERMS OF METHODOLOGY, AND SANJAY CAN GO INTO MORE DETAIL ABOUT THIS, THE FIRST PART WAS BREAST CANCER QUESTION IDENTIFICATION: HOW WELL COULD WE USE MACHINE LEARNING METHODS TO IDENTIFY QUESTIONS AS BREAST CANCER QUESTIONS? THE SECOND INVOLVED CLINICAL TRIAL MATCHING. TO CREATE A DATA SET, WE DEFINED BREAST CANCER TRIAL FEATURES, ABSTRACTED NCI CLINICAL TRIALS BASED ON THOSE FEATURES, AND DID THE SAME WITH BREAST CANCER QUESTIONS ON THE MEDNET.
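A MINIMAL SKETCH OF FEATURE-BASED MATCHING BETWEEN A CLINICAL QUESTION AND TRIAL DESCRIPTIONS. IT ASSUMES FOUR BINARY FEATURES OF THE KIND THE SPEAKER GOES ON TO REPORT AS DRIVING THE BEST RESULTS (NEOADJUVANT CHEMOTHERAPY, STAGE IV, TRIPLE-NEGATIVE, HER2+), DETECTED WITH HYPOTHETICAL KEYWORD RULES IN `FEATURE_PATTERNS`, AND RANKS TRIALS BY JACCARD OVERLAP OF THE FEATURE SETS. THIS IS NOT THE MEDNET PIPELINE OR ITS FREQUENCY-ANALYSIS METHOD; THE PATTERNS AND TRIAL TEXTS ARE INVENTED.

```python
import re

# Hypothetical keyword rules for four binary features.
FEATURE_PATTERNS = {
    "neoadjuvant": r"\bneoadjuvant\b",
    "stage_iv": r"\bstage\s*(iv|4)\b|\bmetastatic\b",
    "triple_negative": r"\btriple[-\s]?negative\b|\btnbc\b",
    "her2_positive": r"\bher-?2[\s-]*(\+|positive)",
}

def extract_features(text):
    """Return the set of feature names whose pattern occurs in the text."""
    lowered = text.lower()
    return {name for name, pattern in FEATURE_PATTERNS.items() if re.search(pattern, lowered)}

def rank_trials(question, trials):
    """Rank trial ids by Jaccard overlap between question and trial feature sets.

    trials maps trial id -> description; trials with zero overlap are dropped.
    """
    q = extract_features(question)
    scored = []
    for trial_id, description in trials.items():
        t = extract_features(description)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, trial_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [trial_id for _, trial_id in scored]

trials = {  # invented descriptions, not real NCI trial records
    "A": "Neoadjuvant chemotherapy for triple-negative breast cancer",
    "B": "HER2-positive metastatic breast cancer: preventing cardiac toxicity",
    "C": "Adjuvant endocrine therapy for hormone receptor-positive disease",
}
question = "Should a patient with triple negative disease get neoadjuvant chemotherapy?"
print(rank_trials(question, trials))  # ['A']
```

A REAL MATCHER WOULD ALSO NEED TO HANDLE NEGATION (FOR EXAMPLE "NOT TRIPLE-NEGATIVE") AND SYNONYMS, WHICH THESE KEYWORD RULES IGNORE.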
THE RESULT: OUR APPROACHES ACHIEVED 95% ACCURACY IN IDENTIFYING BREAST CANCER QUESTIONS, BEST ACHIEVED BY OLDER MACHINE LEARNING METHODS AS OPPOSED TO DEEP LEARNING. WE WERE ABLE TO MATCH 72.2% OF THE QUESTIONS TO CLINICAL TRIALS. MATCHING WAS MOST ACCURATE WHEN USING FREQUENCY ANALYSIS, DRIVEN BY FOUR FEATURES: NEOADJUVANT CHEMOTHERAPY, STAGE IV, TRIPLE-NEGATIVE, AND HER2+. IT WAS LEAST ACCURATE WHEN USING DEEP LEARNING TECHNIQUES. WHAT'S NEXT? WE'LL BE WORKING ON THE TECHNOLOGY TO SUGGEST CLINICAL TRIALS FOR QUESTIONS ACROSS DISEASE SITES, NOT JUST BREAST CANCER. WE WANT TO START COMBINING MACHINE LEARNING WITH SOCIAL ENGAGEMENT TOOLS LIKE THE ONES I MENTIONED TO RAISE AWARENESS OF CLINICAL TRIALS ACROSS THE NCTN. THE RATIONALE IS THAT WE BELIEVE SOCIAL ENGAGEMENT INTERVENTIONS AND MACHINE LEARNING ARE SYNERGISTIC. WE WANT TO LOOK AT HOW OFTEN REAL-WORLD CLINICAL QUESTIONS CAN BE MATCHED TO CLINICAL TRIALS, WHETHER THIS IMPROVES OVER TIME, AND HOW IT IMPACTS CLINICIANS: HOW OFTEN ARE THEY REFERRING PATIENTS TO TRIALS? THIS IS AN EXAMPLE OF WHAT IT WILL LOOK LIKE. A QUESTION ON THE MEDNET: IS THERE EVER A ROLE FOR ADJUVANT CHEMOTHERAPY OR IMMUNOTHERAPY FOR EARLY-STAGE LUNG CANCER TREATED WITH SBRT ALONE? AN EXPERT ANSWERS: NO, ONLY IN THE SETTING OF A CLINICAL TRIAL. AND LOOK, HERE IS THE TRIAL THAT MATCHED THIS QUESTION. BECAUSE YOU'RE LOGGED IN AND HAVE ENTERED YOUR INSTITUTION INFORMATION, WE CAN SHOW YOU WHICH NEARBY SITES HAVE THE TRIAL OPEN, ALONG WITH A FULL DESCRIPTION AND A WAY TO CONTACT THE P.I., AND HOPEFULLY A LEADER BOARD SHOWING WHERE PEOPLE ARE ENROLLING AND HOW WELL. AND THAT'S IT. HAPPY TO TAKE ANY QUESTIONS. I THINK SOON, RIGHT? [APPLAUSE] >> IF WE COULD GET THE LAST PANEL UP HERE, WE'LL TAKE QUESTIONS AND HAVE A LITTLE BIT OF DISCUSSION. ARE WE DOING OKAY ON TIME? FIRE AWAY WHILE WE'RE GETTING SETTLED. NADINE, DO YOU SEE OTHER TYPES OF HEALTH CARE PROVIDERS, LIKE NURSE NAVIGATORS, USING THE SITE?
EVEN BEFORE WE SEE THE PATIENT AT JEFFERSON, LIKE MANY OTHER PLACES, WE'VE ALREADY DISCUSSED THEM AND FIGURED OUT WHICH CLINICAL TRIALS THEY ARE ELIGIBLE FOR, RIGHT? SO EVERYONE IS ON THE SAME PAGE. IT TAKES A LOT OF TIME AND EFFORT, AND IT'S NOT THE MOST EFFICIENT PROCESS, BUT DO YOU THINK A NURSE NAVIGATOR AT A COMMUNITY SITE MAY BE ABLE TO USE YOUR SYSTEM? >> NURSE COORDINATORS ARE AMONG THE MOST IMPORTANT PEOPLE; WE'RE TRYING TO DEVELOP THE PILOT STUDY WITH SWOG. THERE'S A ROLE FOR THEM TO CONVERGE ONLINE ON THE MEDNET. LEARNING FROM PEOPLE WHO ARE GOOD AT ENROLLING IS PROBABLY AN AREA WE CAN ACTUALLY FOSTER ON THE MEDNET, AND THAT'S THE IDEA BEHIND THE LEADER BOARD: YOU SEE THAT SO-AND-SO RIGHT DOWN THE STREET IS DOING A REALLY GOOD JOB, AND YOU SAY, HEY, WHAT ARE YOU DOING DIFFERENTLY? HOW ARE YOU IDENTIFYING PATIENTS AND TALKING TO THEM ABOUT THE TRIAL? >> THERE'S A LOT OF BEHAVIORAL ECONOMICS IN THAT TECHNIQUE. IMPRESSIVE. DR. FULLER, MANY TRAINEES ARE STUCK: I WANT TO GET INVOLVED WITH RADIOMICS, BUT HOW DO I START? WHAT ADVICE DO YOU GIVE IN HOUSTON? >> THROUGH JOINT EFFORTS WITH AAPM, WE'RE USING CLINICIAN TRAINING THROUGH OUR PHYSICS TEAMS AS A CRUTCH. CHUCK WAS CO-CHAIR OF THE PRACTICAL BIG DATA WORKSHOP, AN AVENUE WHERE WE'VE GOTTEN YOUNGER FOLKS OFFSITE INTERACTING WITH COMMON PLAYERS IN THE FIELD. TO BE HONEST, IT'S BEEN FAIRLY DIFFICULT, BECAUSE EVEN WITH THESE EFFORTS WE DON'T REALLY HAVE A STANDARDIZED CURRICULUM OR PIPELINE FOR TRAINEES. MACHINE LEARNING AND MODEL BUILDING ARE NON-INTUITIVE. PEOPLE THINK RADIOMICS IS EASY STUFF, BUT AS CAROLINE SHOWED, IT TAKES FINESSE AND IS HARD TO DO IN A SHORT PERIOD OF TIME. WE'RE STRUGGLING WITH THE BEST MECHANISM TO COACH THOSE FOLKS. I DON'T KNOW IF I HAVE AN ANSWER, BUT WE'RE TRYING TO WORK THROUGH IT BY BUILDING THESE COMMUNITIES AND GETTING THESE PEOPLE ACROSS THE FINISH LINE.
HERE IS ANDRE DEKKER, A GOOD MODEL; HERE IS NADINE. WE'RE TRYING TO POINT TO PEOPLE WHO HAVE HAD SUCCESS IN THESE SPACES AND GET THEM ENGAGED. >> WE'VE BEEN EMBEDDING PEOPLE IN RADIOMICS GROUPS, FOR EXAMPLE IN A VERY LARGE RADIOMICS GROUP AT PENN. YOU NEED TO BE SURROUNDED BY THESE FOLKS. THERE ARE ALWAYS QUESTIONS: HOW DO I DO THIS, HOW DO I CLEAN THIS UP; THERE'S A TON OF THINGS. >> TODD CAN TELL YOU. THINK ABOUT MODEL-BUILDING APPROACHES: THINGS LIKE TODD'S CLASSIFICATION TREES AND RANDOM FORESTS WERE INTERPRETABLE STRATEGIES THAT WERE STABLE FOR MANY YEARS. NOW THE TURNOVER TIME IS QUICK, MONTHS. AN EXCITING DEEP LEARNING MODEL CAN BE OBSOLETE BY THE TIME THE INSTRUCTOR HAS PUBLISHED ON IT. HOW ARE YOU DEALING WITH THIS FLOOD OF NEW DOMAIN EXPERTISE THAT'S FLYING AT PEOPLE ALL THE TIME? >> FROM THE RADIOMICS PERSPECTIVE? >> EVEN FROM THE MACHINE LEARNING PERSPECTIVE. >> AT THIS POINT I DON'T THINK WE ACCEPT THE BLACK BOX THAT WELL, RIGHT? WE'RE NOT ACCEPTING THE WAVE, BECAUSE WE WANT TO TALK ABOUT IT AND UNDERSTAND IT. MAYBE WE'RE NOW OLD SCHOOL AND RESISTING A LITTLE BIT, TO BE HONEST. AFTER LISTENING TO CAROLINE'S TALK, WHERE VOLUME WAS THE MOST IMPORTANT THING, I'M HAPPY WE'RE HOLDING OFF A LITTLE BIT. >> YOU'RE HAPPY BEING A LAGGARD, AND NEXT DOOR IS NADINE THROWING IN THE KITCHEN SINK. I SHOULD HAVE YOU ARM WRESTLE IN FRONT OF ME. >> YEAH, SO I LOOK AT IT FROM TWO PERSPECTIVES. ONE, WHEN WE IMPLEMENTED CNNs, PERFORMANCE IMPROVED SUBSTANTIALLY, ESPECIALLY IF YOU THROW EVERYTHING AT THEM. HOWEVER, NOT SO MUCH WITH RADIOMICS FEATURES. WE HAVE THOUSANDS OF FEATURES THAT WE EXTRACT FROM RADIOMICS, AND IF WE THROW A THOUSAND FEATURES AT MACHINE LEARNING, IT'S WORSE THAN IF WE EXTRACT THREE KEY FEATURES; VOLUME ALONE IS BETTER THAN ALL THE FEATURES COMBINED. I THINK WE HAVE TO LOOK AT THE PROBLEM AT HAND TO CHOOSE THE BEST APPROACH. >> RIGHT. I THINK THAT'S THE KEY THING. FOR MOST OF WHAT WE'RE LOOKING AT, DECISION SUPPORT, IT'S NOT JUST PREDICTING WHAT'S GOING TO HAPPEN. IT'S WHAT YOU CAN DO ABOUT IT.
SO IF YOU DON'T HAVE AN INTERPRETABLE FEATURE, YOU HAVE NO MEANS OF CHANGING SOMETHING TO IMPROVE IT. IF YOU'RE DOING AUTOSEGMENTATION AND WILL LOOK AT THE CONTOURS ANYWAY, BY ALL MEANS, A CNN IS PERFECTLY FINE. THERE ARE PROBLEMS WHERE THAT'S PERFECT, AND OTHER PROBLEMS WHERE YOU NEED TO KNOW HOW TO CHANGE SOMETHING, SO YOU NEED TO KNOW WHAT CAUSED IT. >> THE SAME APPROACH AT LARGE SCALE: WHAT ARE YOUR THOUGHTS, CHRIS? >> IT'S NOT A CHOICE YOU NECESSARILY HAVE TO MAKE. ANY KIND OF REGRESSION MODEL HAS COEFFICIENTS YOU CAN UNDERSTAND. A DECISION TREE WILL HAVE SPLITS. EVEN A RANDOM FOREST, WHICH IS TRICKIER, TELLS YOU HOW MANY TIMES AND HOW HIGH UP THINGS GET SPLIT ON. THOSE HAVE A CLEAR INTERPRETATION. FOR MORE RECENT MODELS, ENSEMBLE METHODS AND DEEP LEARNING MODELS, THERE ARE METHODS YOU CAN WRAP AROUND THEM THAT BASICALLY PERTURB THE DATA IN A LOCAL SPACE AND FIT A REGRESSION MODEL, SO YOU CAN PEER INSIDE THE BLACK BOX LOCALLY AND SEE, FOR AN INDIVIDUAL PREDICTION, WHAT'S PUSHING THINGS IN DIFFERENT DIRECTIONS. SO I DON'T THINK IT'S A CHOICE YOU HAVE TO MAKE. THERE ARE OBVIOUSLY LOTS OF BENEFITS TO QUICKLY FITTING DECISION TREES OR OTHER VERY EASILY INTERPRETABLE MODELS, IN TERMS OF BOTH FAMILIARITY AND SPEED, BUT ALSO IMPROVEMENTS IN PERFORMANCE FROM MORE COMPLEX MODELS, ESPECIALLY WHEN THEY SUIT THE INPUT DATA. SO I THINK THERE'S GOOD CAUSE TO USE BOTH, WITH TOOLS TO MEDIATE BETWEEN THEM AND INTERPRET THE OUTPUT. >> SO, TODD, I'M AT MAYO CLINIC. I'M A STATISTICIAN BY EDUCATION AND BY TRADE, EMBEDDED IN A RADIATION ONCOLOGY DEPARTMENT, SO I'M A FEATURE PERSON AT HEART, BUT I ALSO LIKE THESE ENSEMBLES AND USING EVERYTHING AT MY DISPOSAL. ESPECIALLY WITH FEATURES, AS A STATISTICIAN, I'M NOT GOING TO PUT IN DVH POINTS 1 THROUGH 100. BUT WHEN I WAS LOOKING AT YOUR DVH PLOT, YOU'RE THROWING IN EVERYTHING. I'M WONDERING: HOW ARE YOU SELECTING, WHAT ARE YOU CHOOSING, OR ARE YOU KEEPING EVERYTHING? >> YEAH, EXACTLY. SO THAT'S THE THING.
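A SIMPLIFIED SKETCH OF THE LOCAL PERTURBATION IDEA CHRIS DESCRIBED ABOVE, IN THE SPIRIT OF LIME (LOCAL INTERPRETABLE MODEL-AGNOSTIC EXPLANATIONS). REAL LIME SAMPLES PERTURBATIONS RANDOMLY AND WEIGHTS THEM BY PROXIMITY; THIS VERSION INSTEAD USES A SMALL DETERMINISTIC GRID OF PERTURBATIONS SO THE LEAST-SQUARES FIT HAS A CLOSED FORM, WHICH IS PRACTICAL ONLY FOR A HANDFUL OF FEATURES. `black_box` IS A HYPOTHETICAL STAND-IN FOR A DEEP LEARNING OR ENSEMBLE MODEL, AND ALL NAMES ARE ILLUSTRATIVE.

```python
from itertools import product

def local_linear_explanation(model, x0, step=0.1):
    """Local linear slopes of a black-box `model` around the point x0.

    Each feature is perturbed by -step, 0 or +step (a 3**p full-factorial
    design) and the changes in model output are regressed on the
    perturbations by least squares. Because that design is balanced and its
    columns are mutually orthogonal, each least-squares slope reduces to
    sum(dx_j * dy) / sum(dx_j ** 2).
    """
    base = model(x0)
    design = list(product((-step, 0.0, step), repeat=len(x0)))
    dys = [model([xi + di for xi, di in zip(x0, d)]) - base for d in design]
    slopes = []
    for j in range(len(x0)):
        num = sum(d[j] * dy for d, dy in zip(design, dys))
        den = sum(d[j] ** 2 for d in design)
        slopes.append(num / den)
    return slopes

def black_box(x):
    """Hypothetical stand-in for an opaque model: nonlinear in x[0], linear in x[1]."""
    return x[0] ** 2 + 3.0 * x[1]

print(local_linear_explanation(black_box, [2.0, 1.0]))  # approximately [4.0, 3.0]
```

NEAR x = (2, 1) THE SURROGATE REPORTS THAT THE FIRST INPUT PUSHES THE PREDICTION UP ABOUT 4 UNITS PER UNIT CHANGE AND THE SECOND ABOUT 3, WITHOUT INSPECTING THE MODEL'S INTERNALS. AT A DIFFERENT POINT THE FIRST SLOPE WOULD CHANGE, WHICH IS WHY SUCH EXPLANATIONS ARE LOCAL TO AN INDIVIDUAL PREDICTION.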
IN THAT STUDY WE JUST WANTED TO LOOK AT THE INFLUENCE PATTERN, NOT TO PICK THE MOST IMPORTANT DVH ATTRIBUTE FOR PREDICTION. WE JUST LOOKED AT INFLUENCE. WE'RE NOT USING THAT PREDICTION MODEL CLINICALLY. WE'RE USING -- >> BUT WHEN YOU DO, YOU'RE BUILDING VARIOUS LEVELS OF MODELS TO TRY TO FIND PREDICTORS. >> I THINK, TO THAT POINT, IT BECOMES DIFFICULT. ONE THING I LOVE ABOUT YOUR WORK, TODD, IS THAT YOU HAVE GREAT, INTUITIVE VISUALIZATIONS OF WHAT'S GOING ON. AS WE MOVE TO THESE MORE ESOTERIC APPROACHES, WHOLE-DVH MODELS, BAYESIAN QUADRATIC EQUATIONS, YOU STILL NEED TO GET IT INTO A FORMAT THAT A GOOFBALL M.D. CAN LOOK AT AND TELL WHETHER IT'S IN THE RIGHT BALLPARK. SOMETIMES THE SIMPLICITY MAY NOT BE ON THE ANALYTIC SIDE; INSTEAD, VISUALIZATION BECOMES AN ENDPOINT. CAROLINE? >> THE XEROSTOMIA DATA AND ANALYTICS YOU'VE SHOWN ARE INCREDIBLE. XEROSTOMIA IS SOMEWHAT OF A SUBJECTIVE MEASURE, RIGHT? HAS THERE BEEN A STUDY LOOKING AT TEST-RETEST OF GRADE 1 TO 5 XEROSTOMIA? EMILY? >> RIGHT. >> (INAUDIBLE). >> RIGHT. WE'VE DONE THAT A LITTLE BIT ACROSS THE BOARD. IN OUR THORACIC AND HEAD AND NECK GROUPS WE HAVE CLINICIAN ASSESSMENT AND PATIENT-REPORTED OUTCOMES, AND THERE ARE PUBLICATIONS COMPARING THE TWO, SO WE HAVE REDUNDANCY. THAT WAS JUST DONE ON CTCAE. >> IT WOULD BE INTERESTING TO SEE, FOR INSTANCE, TRAINEE VERSUS PHYSICIAN ASSISTANT VERSUS PHYSICIAN ON THE SAME DAY, PLUS THE PRO, AND TO ASK: DO THESE CORRELATE? AND WHEN THERE'S A DISCORDANCE, DO YOU THROW THAT DATA AWAY BECAUSE SOMETHING DOESN'T QUITE LOOK RIGHT? YOU MAY NOT THROW IT AWAY, BUT YOU MAY ACTUALLY SCALE IT: THERE MIGHT BE A DIFFERENT VALUE SCORE FOR THAT PARTICULAR DATA POINT VERSUS CONCORDANT ONES, IN TERMS OF FINDING GOOD DATA VERSUS NOT-SO -- >> (INAUDIBLE). >> MICHIGAN HAS DEVELOPED QUESTIONNAIRES FOR THAT, CORRELATED WITH COLLECTED SALIVA TO QUANTIFY IT. >> EXACTLY.
THAT GOES BACK TO MY POINT ABOUT HAVING THE RAW DATA COLLECTED IN THE BACKGROUND TO TEACH US HOW WE SHOULD BE QUALIFYING THESE SCORES, BECAUSE SOME MAY OR MAY NOT BE AS RELIABLE AS OTHERS. IT MAY HELP PRIORITIZE WHICH PIECES WE WANT TO GO AFTER WITH RADIOMICS AND MACHINE LEARNING AND EVERYTHING ELSE; IF CERTAIN CLINICAL MEASURES ARE UNRELIABLE, WE'RE WASTING TIME TRYING TO BUILD AROUND AN ENDPOINT THAT'S NOT REALLY -- >> I THINK, GOING BACK TO THE BIG TRIANGLE, MEASURING SALIVARY FUNCTION HAS A COST; IT'S NOT POSSIBLE TO DO ON EVERY PATIENT. THAT'S IN THE RESEARCH BUCKET. >> EXACTLY. >> BUT WE CAN USE THAT TO VALIDATE THE OTHER MEASURES TO MAKE SURE WE -- >> TO YOUR POINT, THE MICHIGAN DATA SUGGEST THERE'S QUITE A BIT OF NOISE IN THE DIFFERENCE BETWEEN MEASURED SALIVARY FUNCTION, THE BIOLOGIC PROCESS, AND WHAT THE PATIENT EXPERIENCES. THESE PATIENTS WITH DYSPHAGIA FEEL LIKE THEY HAVE DRY MOUTH WHEN THEY JUST CAN'T SWALLOW. HIGH-DIMENSIONAL TOXICITY IS A RICH AREA FOR THIS KIND OF MINING. >> THAT'S ACTUALLY AN INTERESTING POINT. WE'RE STUDYING DYSPHAGIA, AND WHEN WE DO VOXEL PATTERN ANALYSIS THE SALIVARY GLANDS LIGHT UP; THERE'S A SALIVARY FUNCTION COMPONENT IN THE DYSPHAGIA SCORE, SO WE SEE THAT. >> THIS QUESTION IS FOR DAVE. WHEN YOU'RE RECRUITING FELLOWS, WHAT'S YOUR TAKE ON CODING? DO THEY NEED A CODING BACKGROUND? IN MY GROUP WE HAVE STUDENTS WHO COME TO ME WHO MAYBE DON'T HAVE ANY CODING EXPERIENCE, AND I'M TRYING TO FIGURE OUT THE BEST WAY TO MANAGE THAT: HELPING THEM LEARN, OR PAIRING THEM WITH AN ANALYST, WHICH ISN'T AS HELPFUL IN THAT SENSE. >> IT'S ACTUALLY QUITE VARIABLE. IN SOME SENSE, PART OF THE PROBLEM IS THE FOLKS WE GOT: THE FIRST WAVE OF APPLICANTS WERE FELLOWS, AND ALMOST ALL OF THEM HAD CODING EXPERIENCE THAT WAS COMPARATIVELY STALE. MAYBE C++, MAYBE SOME JAVA, BUT NOT A LOT OF PYTHON.
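A voxel-based analysis like the one described, where "the salivary glands light up", can be sketched as a per-voxel comparison of dose between patients with and without the toxicity. A minimal NumPy sketch using a Welch t-statistic, assuming every patient's dose grid has already been deformably registered to one common template (the hard part, not shown); the function name, grid size, and synthetic data are hypothetical:

```python
import numpy as np

def voxel_t_map(doses, has_toxicity):
    """Welch t-statistic per voxel: dose in patients with vs. without a toxicity.

    doses: array (n_patients, *grid), ASSUMED already mapped to a common template.
    has_toxicity: boolean array (n_patients,)
    """
    doses = np.asarray(doses, dtype=float)
    tox = np.asarray(has_toxicity, dtype=bool)
    a, b = doses[tox], doses[~tox]
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two patients in each group")
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    # Voxels with zero variance in both groups get t = 0 instead of a division error
    return np.divide(mean_diff, se, out=np.zeros_like(mean_diff), where=se > 0)

# Synthetic example: 40 patients on an 8x8x4 grid; the first 20 have the toxicity
rng = np.random.default_rng(1)
doses = rng.normal(30.0, 5.0, size=(40, 8, 8, 4))
tox = np.arange(40) < 20
doses[tox, 2:4, 2:4, :] += 10.0  # hypothetical "gland" region receiving extra dose
t = voxel_t_map(doses, tox)
hot = np.unravel_index(np.argmax(t), t.shape)  # location of the strongest signal
```

With thousands of voxels tested at once, a raw t-map will contain chance hot spots; a common remedy is a permutation test on the maximum statistic, which this sketch does not include.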
WE'RE INCREASINGLY SEEING MORE UBIQUITY OF R, WHICH IS MORE TRANSLATABLE, BUT WITH THE EXPLOSION WE'VE SEEN IN PYTHON-BASED APPROACHES IN THE LAST FEW YEARS, WE'RE STILL CATCHING FOLKS WHOSE TRAINING IS SEVERAL WAVES PAST THAT. SO THE BIG POINT: HOW GOOD AT CODING DO YOU NEED TO BE TO BE EFFECTIVE? YOU NEED SOME GENERAL UNDERSTANDING OF CODE. TO BE FAIR, FOR A LOT OF US, YOU'RE NOT GOING TO BE THERE WRITING COMMAND-LINE STUFF MOST OF THE TIME. AS A PHYSICIAN-SCIENTIST YOU HAVE TO BE INTERDISCIPLINARY AND KNOWLEDGEABLE ENOUGH TO OVERSEE THE PROCESSES, BUT NOT NECESSARILY PROFICIENT AT A MASSIVELY HIGH LEVEL. THE OPPORTUNITY COSTS ARE TOO HIGH FOR SOMEBODY TO GO BACK AND SPEND TWO YEARS GETTING PROFICIENT, BUT SOME LEVEL IS STILL IMPORTANT. >> MY CONCERN IS MORE THAT IF THEY DON'T HAVE ANY CODING EXPERIENCE WHEN THEY ARE GIVEN THAT PHYSICIAN-SCIENTIST JOB OR STARTING JOB, AND DON'T HAVE THE ABILITY TO HIRE SOMEONE, WE'VE ESSENTIALLY LOST THEM. SPEAKING FROM EXPERIENCE, I DO MOST OF THE CODING MYSELF. >> WE UNDERSTAND SCIENCE PROJECTS ARE NOW DONE ON A LAPTOP. I'M WONDERING IF YOU THINK WE SHOULD -- THIS IS A QUESTION FOR EVERYBODY: SHOULD WE INTEGRATE AN ACTUAL CODING EXPERIENCE? I FIND THAT AT LEAST SOME OF THE MOST SUCCESSFUL PEOPLE WHO HAVE WORKED WITH ME CODED FROM THE GROUND UP AND GOT A MORE FORMATIVE EXPERIENCE. >> THIS GOES TO A POINT WE'VE FOUND DIFFICULT. PLICA, A GOOD FRIEND AT MCW, IS HARDCORE. BUT TRANSLATING THOSE SKILLS INTO SOMETHING OTHER ACADEMICIANS RECOGNIZE AS A SKILL SET IS HARD: SHOWING THAT BEING ADEPT HAS CAREER VALUE, AS OPPOSED TO BEING A HOBBY. I REALLY WELCOME THIS IDEA FROM THE GROUP, BECAUSE WE HAVE TALENTED FOLKS, FULL RADIATION ONCOLOGISTS IN THE U.S. WHO CAN CODE, AND THE DEPARTMENT CHAIR SAYS, THIS IS ON A COMPUTER, RIGHT? I WONDER HOW WE CAN BRING IT TO THE FORE.
>> SO WE DON'T HAVE MIXED MESSAGING HERE: THERE ARE PLENTY OF PEOPLE WHO GO TO A BOOT CAMP, A DATAROBOT BOOT CAMP, WHATEVER IT IS, VARIOUS BOOT CAMPS, AND IN FOUR MONTHS THEY GO FROM ZERO TO 60 IN DATA SCIENCE AND CAN SCRAPE AND DO THIS AND THAT. BUT THEY HAVE ZERO CONTEXT. I'LL ALWAYS TAKE SOMEONE WHO HAS CONTEXT, YOU KNOW, BECAUSE I MEET PLENTY OF DATA SCIENTISTS, AND THEY WANT DATA, DATA, DATA. SO WHAT'S THE QUESTION, RIGHT? WE SEE THIS ALL THE TIME. I THINK THERE IS SOME BALANCE. I THINK THE ISSUE WHEN YOU TALK TO A DEPARTMENT CHAIR, AND I CAN RELATE A LITTLE BIT, IS HOW YOU FRAME IT. I'VE LEARNED OVER THE PAST TEN YEARS THAT WHEN I'M TALKING TO A HOSPITAL PRESIDENT, EVERYTHING IS FRAMED IN TERMS OF VOLUME, QUALITY, PATIENT SATISFACTION, THROUGHPUT, ET CETERA, RIGHT? WHEN I TALK TO THE CMO IT'S ONE THING. IF I TALK TO THE DEAN, HIS OR HER HOT BUTTONS ARE SOMETHING ELSE. I HAVE TO REMIND MYSELF: WHO AM I TALKING TO, WHAT ARE THE HOT BUTTONS, HOW DO I FRAME THE SUBJECT? THAT'S MY LIFE. IT'S NOT ABOUT THE CODING SKILL; IT'S WHAT DOES IT DO FOR THE DEPARTMENT? IN MANY DEPARTMENTS, IN THE PHYSICS AREA, I THINK DOSIMETRY WILL EITHER DISAPPEAR OR HAVE TO BE REIMAGINED, BECAUSE I JUST DON'T UNDERSTAND HOW IT COULD EXIST IN ITS CURRENT FORM A COUPLE OF YEARS FROM NOW WITH THE NEW SOFTWARE PLATFORMS, RIGHT? WHY DO YOU NEED SO MANY? MAYBE THEY WILL BE REPURPOSED FOR QUALITY ASSURANCE. I DON'T KNOW. BUT THESE ARE FUNDAMENTAL QUESTIONS. >> I'D ALSO SAY, TO BE FAIR, WE'VE SEEN A MODEL, AND NADINE CAN SPEAK TO THIS, AND CHRIS AS WELL: THESE HYBRIDIZED PROJECTS, MEDNET OR ERIN GILLESPIE'S ECONTOUR, ARE FOR ALL INTENTS AND PURPOSES LARGE-SCALE INFORMATICS PROJECTS THAT ARE NOW KIND OF A START-UP INDUSTRY OUTSIDE TRADITIONAL ACADEMIC AREAS, BUILDING UP A CADRE OF PHYSICIANS WHO ARE WORKING IN THESE AREAS.
HOW HAVE YOU SEEN THAT WORK, BUILDING AN INFORMATICS CAREER RUNNING IN PARALLEL WITH YOUR CMO WORK? HOW DOES THAT WORK AS A MODEL FOR PHYSICIAN-SCIENTIST DEVELOPMENT? >> YOU MEAN, HOW TO WORK ON MEDNET AND BE A CLINICIAN? >> YEAH. HOW DO YOU GET CREDIT FOR THE WORK YOU'RE DOING ON THAT IN THE ACADEMIC SETTING, AS A CAREER MODEL? FOR THE NEXT PERSON WHO WANTS TO BE A NADINE, HOW DO WE SET THEM UP? >> YOU HAVE TO BE IN AN INSTITUTION THAT LOOKS FAVORABLY ON IT. OUR CHAIRMAN, PETER GLAZER, HAS STARTED BIOTECH COMPANIES IN THE PAST, SO HE UNDERSTANDS IT AND TAKES PRIDE IN HAVING HIS FACULTY MEMBERS AT A PLACE LIKE THIS. SO THE FIRST THING IS THE CULTURE, AND I DON'T THINK EVERY DEPARTMENT IS LIKE THAT. THEN, IF YOU'RE WRITING GRANTS AND PAPERS, IT'S GETTING ACADEMIC CREDIT FOR THAT. AGAIN, I DON'T THINK EVERY DEPARTMENT WOULD TAKE THAT INTO ACCOUNT WHEN YOU COME UP FOR PROMOTION, SO IT'S REALLY ABOUT FINDING AN INSTITUTION THAT ACTUALLY FOSTERS THAT ENVIRONMENT. >> GRASP THAT POTENTIAL SPACE. >> YEAH. >> AND AS FOR DOSIMETRISTS BEING REPLACED: FOR NEW MODALITIES, PROTON OR CARBON, THERE ARE STILL A LOT OF UNFINISHED PIECES THAT NEED SOMEONE TO TAKE CARE OF THEM, AND WE DON'T REALLY HAVE A GOOD MODEL FOR PROTON PLANNING YET. SO I SORT OF FORESEE THAT IN THE FUTURE, WHEN A NEW TECHNOLOGY OR MODALITY COMES INTO PLAY, WE'LL STILL NEED A LOT OF HUMAN INTERVENTION, SLOWLY BEING REPLACED BY MACHINES, BUT WITH CONTINUOUS INNOVATION WE WILL ALWAYS BE AT THE FRONT END. >> I THINK, AS TODD WAS MENTIONING EARLIER, I HOPE WE'LL BE DOING ADAPTIVE PLANS THAT KEEP THEM BUSY DOING FIVE TIMES THE WORK. WE'LL TAKE CHUCK AND JOHN AND CLOSE OUT. CHUCK? >> AN EDUCATION QUESTION: HOW DO WE APPROACH TRAINING CRITICAL THINKING? IT'S VERY EXCITING WHAT'S HAPPENING WITH MACHINE LEARNING NOW, AND PEOPLE ARE ENTRANCED BY THE TOOLS. IN A NUMBER OF PRESENTATIONS, PEOPLE COME UP WITH AN AUC FOR A DOSE-RESPONSE THRESHOLD THAT IS A MARGINAL NUMBER, BUT THEY HAVE NO INSTINCT FOR WHAT THAT MEANS.
I WAS THRILLED WITH CAROLINE'S RADIOMICS EXAMPLE: HOW FAR DID THAT DISCUSSION GO BEFORE ANYBODY CALLED B.S. ON THAT RESULT? [LAUGHTER] AND PEOPLE TEND TO -- >> I HOPE SHE'S NOT WATCHING THE LIVE STREAM. >> THEY ARE RELUCTANT TO SAY, I DON'T KNOW WHAT THAT METRIC MEANS; HOW DO I TRANSLATE THAT? THAT COMES WITH A LOT OF EXPERIENCE TYING IT IN CLINICALLY. IF WE'RE LOOKING TO TRAIN THE NEXT GENERATION, I THINK IF WE JUST STOP AT TEACHING THEM HOW TO RUN PYTHON AND TENSORFLOW, WE'RE PROBABLY MISSING KEY ELEMENTS NEEDED TO MAKE IT TRANSLATE MEANINGFULLY INTO CLINICAL PRACTICE. HOW DO YOU INJECT THAT? IT'S NOT JUST THE YOUNG ONES DOING IT; IT WILL NEED PEOPLE WITH EXPERIENCE. >> I'LL GO ONE STEP FURTHER AND SAY IT'S NOT REALLY ABOUT THE AUC. YOU HAVE TO LOOK AT THE OUTPUT OF THE MODEL. FOR EXAMPLE, WE HAD A WEIGHT LOSS PREDICTION MODEL, AND IT DID A TERRIBLE JOB AT PREDICTING WHICH PATIENTS WOULD HAVE TROUBLE: IF IT PREDICTED THEY WOULD HAVE A WEIGHT LOSS PROBLEM, IT WAS NOT VERY GOOD, 50-50 THAT THEY WOULD LOSE WEIGHT. BUT IF IT PREDICTED THEY WOULDN'T LOSE WEIGHT, IT WAS 95% ACCURATE. GOOD, RIGHT? IF I KNOW THEY ARE NOT GOING TO LOSE WEIGHT, THEY DON'T NEED A FEEDING TUBE. FOR THE OTHERS, WELL, IT'S 50-50 ON A FEEDING TUBE. THE AUC WASN'T THAT GOOD, BUT IS THE MODEL USEFUL FOR THE DECISION YOU'RE TRYING TO MAKE? >> I THINK, AS YOU HEARD FROM CAROLINE AND ADAM AS WELL, YOU NEED DOMAIN EXPERTISE AND A SENSE OF WHAT THE PROBLEM IS, BUT THESE PROBLEMS ALSO LEND THEMSELVES TO THE KIND OF COUNTERFACTUAL ROBUSTNESS TESTING THAT WE DON'T NORMALLY DO IN THINGS LIKE CLINICAL TRIALS. AS YOU SAW WITH TODD, I CAN LOOK AT A LASSO AND REALIZE THAT MY PENALIZATION FACTOR, PERHAPS NOT THE ACTUAL DATA, IS DRIVING THIS, SO I'LL MOVE TO RIDGE REGRESSION. THOSE ARE TESTABLE HYPOTHESES: YOU CAN INJECT NOISE AND DO FRAGILITY TESTING. TEACHING RIGOR IN EVALUATION BECOMES JUST AS IMPORTANT AS TEACHING METHODOLOGY AND MODEL CONSTRUCTION, RIGHT? BUT AGAIN, THAT TAKES TIME.
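"Inject noise, do fragility testing" can be made concrete: jitter each input by a plausible measurement error and count how often the predicted class flips. A minimal NumPy sketch; `toy_model` is a hypothetical one-feature logistic model, and the noise levels are assumptions that would have to be set per feature for a real problem:

```python
import numpy as np

def toy_model(X):
    """Hypothetical fitted risk model: logistic in one feature (say, mean dose in Gy)."""
    return 1.0 / (1.0 + np.exp(-(0.1 * X[:, 0] - 3.0)))

def flip_rate(predict_proba, X, noise_sd, threshold=0.5, n_repeats=50, seed=0):
    """Average fraction of patients whose predicted class changes under input noise."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    noise_sd = np.asarray(noise_sd, dtype=float)  # one standard deviation per feature
    baseline = predict_proba(X) >= threshold
    total = 0.0
    for _ in range(n_repeats):
        noisy = X + rng.normal(size=X.shape) * noise_sd
        total += np.mean((predict_proba(noisy) >= threshold) != baseline)
    return total / n_repeats

# 41 hypothetical patients spread across the decision threshold
X = np.linspace(10.0, 50.0, 41).reshape(-1, 1)
rates = {sd: flip_rate(toy_model, X, [sd]) for sd in (0.0, 1.0, 10.0)}
```

The flip rate is zero with no noise and grows as `noise_sd` grows; a model whose decisions flip under realistic measurement error is fragile regardless of its AUC. The lasso-versus-ridge check mentioned in the discussion is a separate test and is not shown here.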
THAT'S NOT SOMETHING YOU JUMP INTO; A SIX-WEEK CODING CAMP WON'T -- I THINK IT'S EASIER TO LEARN PYTHON THAN TO LEARN THAT KIND OF CRITICAL THINKING. JOHN? LAST QUESTION HERE. >> YES. I THINK WE STARTED TALKING ABOUT WHAT I WAS GOING TO BRING UP. MAYBE NOT EVERY PHYSICIAN NEEDS TO BE A CODER, BUT THEY DO NEED THIS CRITICAL THINKING ASPECT: HOW DO I APPLY MY DOMAIN EXPERTISE TO TRANSLATE AN AUC INTO SOMETHING CLINICALLY MEANINGFUL, KIND OF LIKE WHAT WAS SAID, USING POSITIVE OR NEGATIVE PREDICTIVE VALUE TO ACTUALLY USE A LOW-AUC MODEL IN THE RIGHT CONTEXT. I WAS GOING TO TALK ABOUT MOOCS, ONLINE COURSES AT BASIC LEVELS. I DON'T THINK THERE ARE MOOCS FOR THIS KIND OF CRITICAL THINKING, BECAUSE IT'S POTENTIALLY A NEW AREA, AND I'M INTERESTED IN MAKING THESE. >> AT THIS POINT, OUR BEST RESOURCE AS A COMMUNITY IS STILL ASKING FOR ADVICE ON TOUGH CASES, LIKE HOW DO I TREAT A T3 WITH THESE FEATURES. I THINK SOMEDAY PARTICIPATION IN THINGS LIKE THAT BECOMES A MINEABLE RESOURCE FOR US TO SAY WHERE THE EDGE CASES ARE. AS YOU'RE COLLECTING THIS HUGE BODY OF QUESTIONS, IS THAT LEADING TO SOMETHING WHERE YOU CAN POTENTIALLY IDENTIFY USE CASES WHERE A DECISION MODEL IS A HIGH PRIORITY? DO YOU SEE THAT AS SOMETHING THAT COULD BE DONE WITHIN THE CONTEXT OF WHAT'S GOING ON? >> YEAH, WE COLLECT A LOT OF DATA FOR THAT PURPOSE. THAT'S THE VISION OF MEDNET: USING MORE TECHNOLOGY TO PUT IT OUT THERE. THAT'S PROBABLY SOMETHING THAT'S COMING VERY SOON, AND I THINK WE'RE JUST AT THE BEGINNING OF IT. BUT AS A DATA SOURCE, WE HAVE BEEN COLLECTING MORE DATA THAN WE EVEN NEED FOR WHAT WE DO RIGHT NOW, FOR THAT PURPOSE. >> OKAY. I WANT TO THANK THE PANELISTS FOR AN INCREDIBLE AND LIVELY DISCUSSION.
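Translating a model into predictive values, as in the weight-loss example earlier in the discussion, is arithmetic on the confusion matrix. A minimal Python sketch; the counts below are hypothetical, chosen only so that they reproduce the "50-50" and "95%" figures quoted, and are not the speaker's data:

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    """Counts from a binary model, e.g. 'will this patient have significant weight loss?'"""
    tp: int  # predicted weight loss, patient lost weight
    fp: int  # predicted weight loss, patient did not
    tn: int  # predicted no weight loss, patient did not
    fn: int  # predicted no weight loss, patient lost weight

    @property
    def ppv(self) -> float:
        # Of the patients the model flags, how many actually have the event?
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Of the patients the model clears, how many truly stay event-free?
        return self.tn / (self.tn + self.fn)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: the PPV the same model would have in a population with another event rate."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


counts = Confusion(tp=50, fp=50, tn=190, fn=10)  # hypothetical counts
# counts.ppv is 0.5 ("50-50") and counts.npv is 0.95 ("95% accurate")
```

A model like this is useless for deciding who needs a feeding tube but useful for deciding who does not. `ppv_at_prevalence` shows why predictive values must be rechecked when a model moves to a population with a different event rate; at the prevalence implied by these counts (60 of 300 patients, 0.2) it returns 0.5 again.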
I JUST WANT TO MENTION ONE THING. AMONG THE THEMES WE'VE SEEN COME UP, IN PARTICULAR HOW WE'VE IDENTIFIED AND FOCUSED ON TOXICITY IN SOME TALKS: IN THE NEXT ITERATION OF THE WORKSHOP WE'LL BE INTERESTED IN ALSO FOCUSING ON HOW WE'RE USING DATA, SPECIFICALLY ON COLLECTING AND INTEGRATING PATIENT-REPORTED OUTCOMES. WE'VE SEEN INDUSTRY INTEREST IN ACQUIRING MODELS OR COMPANIES THAT ARE FOCUSED ON THAT PART, TOO, AND AN EXPLOSION IN DEVICES AND WEARABLES. IT'S NOT A MATTER OF WHAT TOXICITY PHYSICIANS SEE WHEN THEY SEE THE PATIENTS, BUT WHAT TOXICITIES PATIENTS ARE EXPERIENCING ALL THE TIME. BEING ABLE TO MEASURE THAT WHEN THEY ARE AT HOME OR IN THE MIDDLE OF THE NIGHT IS GOING TO BE ANOTHER LAYER THAT COULD HAVE AN IMPACT ON HOW WE FEED THAT DATA BACK INTO OUR MODELS AND IMPROVE PATIENT CARE. STAY TUNED. IT WILL BE VERY EXCITING. THANK YOU FOR A GREAT PRESENTATION. I'LL WORK ON GETTING EVERYBODY OUT A FEW MINUTES EARLY, SO IF IT'S OKAY WITH EVERYBODY, I WANT TO GO AHEAD -- DOES ANYBODY WANT TO TAKE A BREAK, OR SHOULD WE GO AHEAD AND MOVE ON? OKAY. EVERYBODY IS GOOD. I WANT TO INTRODUCE DR. RON ENNIS, MEMBER OF THE ASTRO BOARD OF DIRECTORS, ALSO PROFESSOR AT RUTGERS CANCER INSTITUTE AND DIRECTOR OF THE CLINICAL NETWORK THERE. IT'S A PLEASURE HAVING YOU, DR. ENNIS. >> GOOD AFTERNOON. GOOD NEWS: MY TALK IS NOT EVEN THE FULL TALK, SO WE'LL GET OUT EVEN EARLIER. THANK YOU, ANSHU, FOR INVITING ME. I'M HONORED TO BE HERE. FOR ME THIS IS COMING FULL CIRCLE. I PROBABLY TOOK WHAT WAS THE FIRST A.I. COURSE AT COLUMBIA COLLEGE A GENERATION AGO; IT WAS MORE ASPIRATIONAL THAN REAL AT THAT TIME. SO IT'S GREAT THAT WE'VE COME FULL CIRCLE TO SOMETHING REAL. ANSHU ASKED ME, PARTICULARLY IN THE TITLE HERE, FOR THE SOCIETY PERSPECTIVE FROM ASTRO: THE PERSPECTIVE OF ASTRO AND HOW ASTRO COULD RELATE TO THE WORK THAT'S BEEN GOING ON HERE.
I'VE TAKEN THE LIBERTY OF THINKING ABOUT SOCIETY IN A VARIETY OF OTHER WAYS, INCLUDING THE PATIENTS AND SOCIETY AT LARGE, AND THE SOCIETY OF RADIATION ONCOLOGISTS: WHAT A.I. MIGHT BE ABLE TO OFFER PRACTICING RADIATION ONCOLOGISTS. BUT EVEN BEFORE I DELVE INTO THAT, SITTING HERE TODAY, I WANT TO POINT OUT TO THE COMMUNITY HERE, IF IT'S NOT AWARE, THAT WE, IF I CAN INCLUDE MYSELF, OR YOU, ARE AN EMERGING SOCIETY WITHIN RADIATION ONCOLOGY. I THINK IT WOULD BE PRODUCTIVE FOR THE GROUP TO START THINKING THAT WAY, AND THE BIGGEST THING THAT CAN ACCOMPLISH IS STARTING TO THINK CONSTRUCTIVELY AND PROACTIVELY ABOUT THE NEEDS OF THE COMMUNITY. WE'VE HEARD A BUNCH OF THINGS, AND THIS COMMUNITY NEEDS TO INTERACT WITH, GET THINGS FROM, OR ASK FOR THINGS FROM NCI, VENDORS, ASTRO, CLINICIANS, AND A VARIETY OF OTHER STAKEHOLDERS. AND ON EDUCATION, WE HEARD A FAIR AMOUNT, INCLUDING ABOUT MEDICAL SCHOOL. SO WHETHER THIS IS A FORMAL THING OR AN INFORMAL THING, I WOULD ENCOURAGE THINKING ABOUT IT. YOU MAY KNOW THERE ARE SUBSPECIALTIES WITHIN RADIATION ONCOLOGY THAT HAVE THEIR OWN SOCIETY. IS THIS COMMUNITY READY FOR THAT, OR DOES IT NEED THAT? I DON'T KNOW. IT'S WORTH THINKING ABOUT. IN TERMS OF TOPICS, AGAIN FROM A SOCIETAL PERSPECTIVE, I'LL COVER WHAT I SEE AT A HIGH LEVEL; THIS IS STUFF THAT OTHER PEOPLE HAVE SPOKEN ABOUT AND ALLUDED TO HERE AS WELL, PERHAPS WITH A COUPLE OF DIFFERENT PERSPECTIVES. THE AREAS THAT ARE MOST EXCITING, FOR ME AT LEAST, ABOUT WHAT THIS SPACE CAN ACCOMPLISH ARE LEADING US TOWARDS A GENUINE MEASURE OF QUALITY, WHICH I'LL EXPLAIN IN A MINUTE; USE IN ADAPTIVE PLANNING, TO MAKE THAT A REALITY; DECISION SUPPORT; AND GLOBAL HEALTH. THE INSTITUTE OF MEDICINE, WHICH PEOPLE REFERRED TO A COUPLE OF TIMES TODAY, ENVISIONED THE LEARNING HEALTHCARE SYSTEM A WHILE AGO AND ARTICULATED IT WELL. FOR ME IT STRUCK A CHORD WITH WHAT I THOUGHT THIS ENDEAVOR COULD ACCOMPLISH AND SEEMS TO BE MOVING TOWARDS.
AND OBVIOUSLY THAT MEANS GENUINELY GENERATING DATA ON AN ONGOING BASIS AND BEING ABLE TO LEARN IN REAL TIME, TO GENUINELY IMPACT HEALTH CARE. WE'RE NOT THERE. NOT REALLY. BUT ONE CAN ENVISION GETTING THERE WITH THE TOOLS THAT WE'RE WORKING ON HERE AND ELSEWHERE. I MUST SAY IT MAKES ME UNCOMFORTABLE WHEN I THINK ABOUT IT: I DON'T KNOW OF ANOTHER BUSINESS THAT DOESN'T REALLY MEASURE ITS SUCCESS. WE REALLY DON'T MEASURE OUR SUCCESS, IF OUR SUCCESS IS HOW OUR PATIENTS ARE REALLY DOING. WE DON'T REALLY KNOW THAT. THE VAST MAJORITY OF PRACTITIONERS HAVE NO IDEA HOW WELL THEY ARE DOING. THEY THINK THEY ARE GOOD. EVERYONE THINKS THEY ARE GOOD. A LOT OF THEM ARE, BUT SOME MAY NOT BE, AND NO ONE CAN REALLY LEARN BECAUSE THEY DON'T KNOW WHAT THE GAPS ARE. THE ONLY EXCEPTIONS ARE A FEW PLACES THAT TRY AGGRESSIVELY TO CAPTURE DATA ON AN ONGOING BASIS, AND AS SOMEONE WHO HAS DONE THAT, IT'S QUITE LIMITED. YOU LOSE PATIENTS TO FOLLOW-UP. PATIENTS COME IN AND FORGET TO TELL YOU ABOUT STUFF THAT HAPPENED, OR THEY DON'T BRING THE RECORD. DO YOU MAKE THE EFFORT TO GET THE RECORD TO UPDATE YOUR DATA? THERE ARE A LOT OF GAPS AND A LOT OF PROBLEMS IN THAT. BUT I THINK WE CAN GET THERE, AND THIS REALLY CAN IMPACT PATIENT CARE IN A SUBSTANTIAL WAY, IN MY OPINION EVEN MORE THAN GREAT DRUGS. IF WE COULD DEVELOP A LEARNING HEALTHCARE SYSTEM THAT ACTIVELY AND ACCURATELY KNEW WHAT WAS GOING ON, IT WOULD HAVE A TREMENDOUS IMPACT. WHAT ARE THE BARRIERS? WE'VE TALKED ABOUT MANY HERE ALREADY. THERE'S THE PROBLEM OF INFRASTRUCTURE TO COLLECT DATA. WE HAVE EFFORTS OUT THERE TRYING TO DO THAT, WITH ONCOSPACE AND ONCORA, FOR EXAMPLE, WHICH IS FABULOUS. WE'LL SEE HOW THOSE DEVELOP AND WHETHER THEY CAN WORK IN THE CLINIC. PHYSICIANS ARE VERY BUSY, AND A LOT ARE NOT NECESSARILY MOTIVATED, SO IT WILL HAVE TO BE USER-FRIENDLY AND SEAMLESS. OBVIOUSLY DATA IS IN DIFFERENT PLACES; WE NEED TO FIND A WAY TO BRING IT TOGETHER SO THAT IT'S NOT FRAGMENTED, TO HAVE AN IMPACT.
REALISTICALLY, SPEAKING AS AN ADVOCATE FOR THIS, WE NEED MONEY, RIGHT? WE NEED MONEY. HOW DO YOU FIND THE MONEY? AT AN INSTITUTION, WHICH IS LIKELY TO BE AT LEAST THE FIRST STEP, SOME WILL DECIDE THEY SEE THE VALUE, BELIEVE THERE WILL BE A RETURN ON INVESTMENT, AND WILL LEAD FORWARD. BUT THAT'S NOT AN EASY CASE TO MAKE. AGAIN, THIS GROUP MIGHT BE ABLE TO HELP PEOPLE MAKE IT. BUT WITHOUT MONEY, WITHOUT FINANCIAL INCENTIVES, IDEALISM ONLY GETS YOU SO FAR. AND A VERY IMPORTANT POINT: RANDOMIZED CLINICAL TRIALS AND REAL-WORLD DATA OUGHT TO BE COMPLEMENTARY. EVERYONE HERE BELIEVES THAT, INCLUDING MYSELF, BUT A LOT OF PEOPLE DO NOT, AND I HOPE YOU'RE AWARE OF THAT. PROBABLY THE VAST MAJORITY OF RADIATION ONCOLOGISTS, AND OF PEOPLE ON THE BOARD, REALLY THINK THE RANDOMIZED TRIAL IS GOLD, AND THERE'S NOTHING BESIDES GOLD. WE HAVE OUR WORK TO DO. AND I THINK ONE OF THE ISSUES IS REPRODUCIBILITY OF THE DATA. NOT I THINK: I KNOW THERE'S SKEPTICISM, AS IN, OH, YOU COULD MANIPULATE STATISTICS TO SHOW WHATEVER YOU WANT. IT'S NOT AS PRECISE AS A TEST TUBE AND A PARTICULAR METHODOLOGY. EVEN MORE SO IN THIS AREA, THERE'S A PERCEPTION, AND TO SOME EXTENT IT'S REAL, THAT IF YOU RUN ENOUGH ANALYSES AND DON'T NECESSARILY REVEAL HOW MANY TESTS YOU DID BEFORE YOU GOT YOUR GOOD P-VALUE, FOR EXAMPLE, THE RESULT ISN'T REAL: IT'S NOT REAL SCIENCE, NOT REALLY VALID. WE NEED TO FIND A WAY AROUND THAT. ONE OF THE CRUCIAL ISSUES, I BELIEVE, IS ESTABLISHING A METHODOLOGY, THE METHODS YOU HAVE TO FOLLOW TO BE ACCEPTED, AND BEING ABLE TO DEMONSTRATE THAT A RESULT IS REPRODUCIBLE BY ANOTHER GROUP. THAT STARTS TO CONVINCE PEOPLE THAT IT'S SCIENTIFIC IN JUST THE SAME WAY AS ANYTHING ELSE, THAT IT REALLY IS VALID DATA. WHAT'S THE GAIN? THE GAIN IS OBVIOUS: DOCTORS THEMSELVES, AND HEALTH SYSTEMS, WILL KNOW WHAT THE HELL IS GOING ON AND HOW WELL WE'RE DOING, NOT JUST ASSUMING I'M DOING WHAT THE TRIAL DID, OR THAT I'M JUST LIKE PATRICK WALSH, JUST AS GOOD.
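The "how many tests before you got your good p-value" concern has simple arithmetic behind it. If each of n independent tests of a true null hypothesis has a 5% false-positive rate, the chance of at least one spurious hit is 1 - 0.95^n, about 64% for n = 20. A minimal Python sketch of that calculation, plus Holm's step-down correction as one example of a pre-specified method; Holm is picked here only as an illustration, not as the method the speaker proposes:

```python
def family_wise_error(alpha: float, n_tests: int) -> float:
    """P(at least one p < alpha) across n independent tests when every null is true."""
    return 1.0 - (1.0 - alpha) ** n_tests

def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p-value first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is compared against alpha / (m - rank)
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; all larger p-values are retained
    return reject

fwer_20 = family_wise_error(0.05, 20)          # about 0.64
holm = holm_reject([0.01, 0.04, 0.03, 0.005])  # [True, False, False, True]
```

Reporting every test that was run, and correcting for them, is what lets another group check and reproduce the result.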
ANYONE WHO HAS DONE -- EVERY SURGEON WILL SAY, I DO THE SAME SURGERY AS DR. WALSH; WE DO THE SAME THING. THERE'S A HUGE GAIN IF WE COULD DO THIS. IN THE END IT'S ABOUT BETTER PATIENT CARE. I SEE ADAPTIVE PLANNING AS A HUGE OPPORTUNITY, AND AS AN AREA WHERE A.I. WILL, I THINK, FRANKLY MAKE OR BREAK IT, AND COULD DO A LOT TO BRING THIS TO FRUITION AND MAKE IT HOW WE TREAT EVERYONE. WITH THE RIGHT TOOLS, IT ALMOST SEEMS LIKE, WHY NOT? I DON'T THINK IT'S HARD TO ENVISION: SET THE PATIENT UP AND IMAGE THE PATIENT; AN A.I. TOOL DECIDES WHETHER THE PATIENT IS ALIGNED PROPERLY OR NOT, AUTOSEGMENTS THAT IMAGE, AND CREATES A PLAN RIGHT ON THE SPOT BASED ON THE ORGANS AND CONSTRAINTS IT'S BEEN GIVEN, AND POTENTIALLY ON WHAT'S HAPPENED TO THE PATIENT SO FAR, SUMMING UP THE DOSIMETRY TO THIS POINT, AND THEN YOU TREAT IN THAT SAME SESSION. I SEE THAT AS VERY DOABLE IF YOU PIECE ALL THESE A.I. PIECES TOGETHER, AND I DON'T THINK IT CAN HAPPEN IN A PRACTICAL WAY ACROSS THE NATION WITHOUT A LOT OF A.I. TOOLS. ANOTHER AREA WHERE I THINK THERE'S POTENTIAL GAIN IS IN HELPING PHYSICIANS. THIS IS THE SOCIETAL PERSPECTIVE. THERE'S TOO MUCH DATA, TOO MANY STUDIES. WHEN I STARTED, I KNEW EVERY STUDY; LEE WILSON AND I KNEW ALL THE DATA. IT'S IMPOSSIBLE NOW, JUST IMPOSSIBLE TO KEEP UP. SO WE NEED HELP. DOCTORS DO THE BEST THEY CAN, BUT TOOLS TO HELP DOCTORS COULD REALLY GO A LONG WAY IN HELPING THEM MAKE DECISIONS; THIS GOES UNDER WHAT'S CALLED DECISION SUPPORT. YOU COULD ENVISION THIS IN A FEW WAYS. THIS POTENTIALLY DOVETAILS WITH MEDNET, OR MAYBE ANOTHER PERSON WILL COME UP WITH AN IDEA. IT COULD BE THAT DOCTORS SET PARAMETERS FOR WHAT THEY WANT: MAYBE THEY WANT IT TO EXTRACT DATA FROM STUDIES WITH CERTAIN PARAMETERS, MAYBE BASED ON PATIENT INFORMATION, MAYBE ON GUIDELINES, OR SOME COMBINATION. IT COULD BE BUILT IN A FLEXIBLE WAY SO THAT PEOPLE CAN TAILOR IT, MAYBE, MAYBE NOT; MAYBE A HEALTH SYSTEM WILL DETERMINE CARE.
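The "summing up dosimetry to this point" step of the envisioned adaptive workflow can be sketched as voxel-wise dose accumulation followed by a check against an organ constraint. A minimal NumPy sketch under strong assumptions: each fraction's dose has already been mapped onto one reference anatomy (deformable registration is not shown), doses are summed as plain physical dose with no EQD2 conversion, and the function names and the 45 Gy limit in the usage lines are hypothetical, not a clinical recommendation:

```python
import numpy as np

def accumulate_dose(fraction_doses):
    """Voxel-wise running total of delivered dose (Gy).

    fraction_doses: sequence of per-fraction dose arrays ASSUMED to be already
    mapped onto one reference anatomy.
    """
    return np.sum(np.asarray(fraction_doses, dtype=float), axis=0)

def per_fraction_allowance(delivered, max_dose_gy, fractions_left):
    """Largest per-fraction dose to the hottest organ voxel that keeps it under max_dose_gy.

    Plain physical-dose arithmetic; ignores fractionation effects (no EQD2 conversion).
    """
    if fractions_left <= 0:
        return 0.0
    headroom = max_dose_gy - float(np.max(delivered))
    return max(headroom, 0.0) / fractions_left

organ_fraction = [2.0, 2.2, 1.8]  # hypothetical daily dose (Gy) to three organ voxels
delivered = accumulate_dose([organ_fraction] * 10)  # ten identical fractions so far
allowance = per_fraction_allowance(delivered, max_dose_gy=45.0, fractions_left=15)
```

Here the hottest voxel has received about 22 Gy, leaving roughly (45 - 22) / 15, about 1.53 Gy, per remaining fraction; the re-planning step would then optimize within that budget.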
THAT WOULD BE A DARKER USE OF IT, FROM OUR PERSPECTIVE. I THINK IT'S AN UNMET NEED, AND A.I. COULD HAVE A HUGE IMPACT ON PHYSICIAN QUALITY OF LIFE AND ON PATIENT CARE, BECAUSE YOU'LL MAKE BETTER RECOMMENDATIONS FOR THE PATIENTS. AND LASTLY, I THINK THIS COULD HELP US WITH OUR RADIATION ONCOLOGY GLOBAL HEALTH PROBLEM. I THINK MANY ARE AWARE THAT OUTSIDE THE WESTERN WORLD, THE AVAILABILITY OF RADIATION ONCOLOGY CARE, AS WELL AS A LOT OF OTHER HEALTH CARE, BUT WE'RE RADIATION ONCOLOGISTS, IS POOR. A LOT OF THAT IS RELATED TO MONEY. BUT IF WE HAVE ADAPTIVE PLANNING TOOLS AND DECISION SUPPORT TOOLS, ONE CAN ENVISION A PRETTY BASIC, COST-EFFECTIVE CLINIC THAT CAN DO CLASS SOLUTIONS FOR QUICK THINGS, AND THAT COULD HAVE A MAJOR IMPACT ON GLOBAL HEALTH. FROM THE SOCIETAL PERSPECTIVE OF GLOBAL HEALTH, A.I. AND RADIATION ONCOLOGY COULD HAVE A TREMENDOUS IMPACT ON WORLDWIDE HEALTH. WHAT COULD ASTRO DO? I'M ON THE ASTRO BOARD OF DIRECTORS; MY AREA IS GOVERNMENT RELATIONS, BUT I HAVE A PASSION FOR THIS AREA AS WELL, FROM MY BACKGROUND. ASTRO IS NOT A SUGAR DADDY THAT CAN SOLVE EVERY PROBLEM, BUT ASTRO VERY MUCH WANTS TO PROMOTE ALL AREAS WITHIN THE SPECIALTY TO THE DEGREE THAT IT CAN. AGAIN, I THINK TO SOME DEGREE ASTRO AS A SOCIETY NEEDS TO HEAR FROM ITS STAKEHOLDERS WHAT THE NEEDS ARE. FOR EXAMPLE, AS FOR THINGS ONE COULD ENVISION ASTRO DOING: ASTRO DOES SMALL PILOT GRANTS AND STARTING GRANTS, AND SOME PARTNERSHIP GRANTS WITH PHARMA, AND IT'S NOT HARD TO IMAGINE WE COULD REACH OUT TO SOME I.T. OR A.I. COMPANIES AND MAYBE DEVELOP SHARED GRANTS TO HELP GET THINGS GOING. ADVOCACY WORK, WHICH IS MAINLY WHAT I DO, COULD BE VERY IMPORTANT IN INTERACTING WITH MEDICARE OR NCI IF WE IDENTIFY NEEDS.
ONE THING THAT CROSSED MY MIND DURING THE DAY: NCI IS BUILDING A GREAT INFRASTRUCTURE, BUT THE PIECE THAT'S MISSING IS THE MEDICARE PIECE. WITH THE SEER-MEDICARE PIECE, MAYBE WE COULD FIGURE OUT A WAY FOR THAT TO BE BROUGHT IN; CURRENTLY IT'S VERY CUMBERSOME. THAT'S REALLY AN ASIDE. PAYER ISSUES SOMETIMES COME INTO PLAY, AND ASTRO HAS A ROBUST AREA WORKING ON THAT. SPECIALTY MEETINGS ARE SOMETHING ASTRO IS OPEN TO AND HAS DONE FOR OTHER AREAS, AND AGAIN, THESE THINGS LARGELY RELY ON COMMUNITIES COMING FORWARD TO ASTRO LEADERSHIP AND SAYING, WE NEED THIS, WE NEED THAT: SOME KIND OF DIALOGUE. SO, SPEAKING ON BEHALF OF ASTRO AND THE BOARD, WE WELCOME ANY OPPORTUNITY TO HELP FACILITATE THIS AREA OF OUR SPECIALTY. THANK YOU FOR YOUR ATTENTION. I'M HAPPY TO TAKE ANY QUESTIONS. [APPLAUSE] >> PREACHING TO THE CHOIR. [LAUGHTER] >> THANKS, DR. ENNIS. I WANT TO THANK EVERYONE AGAIN FOR COMING OUT AND ATTENDING THIS FIRST WORKSHOP. I HOPE YOU WALK AWAY TODAY ENTHUSIASTIC ABOUT MEETING OTHER LIKE-MINDED PEOPLE AND SEEING THAT THIS IS REALLY AN AREA THE FIELD IS FOCUSING ON. I WANT TO AGAIN EMPHASIZE ENGAGING WITH NCI, WITH ASTRO, AND WITH YOUR COLLEAGUES. PLEASE DO REACH OUT TO MYSELF OR MICHELLE. I WANT TO THANK THE CENTER FOR STRATEGIC SCIENTIFIC INITIATIVES FOR SPONSORING, AND I THANK KEVIN CAMPHAUSEN FOR SPONSORING. WE WILL CERTAINLY BE INVITING YOUR COMMENTS AND THOUGHTS AFTERWARDS AS WELL. THANK YOU SO MUCH, AND HAVE A SAFE JOURNEY HOME. THANKS. [APPLAUSE]