>> WELCOME TO THE LANGUAGE PROCESSING SPECIAL INTEREST GROUP TODAY, FEATURING 2 DISTINGUISHED SPEAKERS, DR. KARIN VERSPOOR AND DR. HONGFANG LIU. DR. VERSPOOR IS INCOMING DEAN OF THE SCHOOL OF COMPUTING TECHNOLOGIES IN MELBOURNE, AUSTRALIA, AND SHE JOINS US FROM MELBOURNE, AUSTRALIA. DR. LIU IS PROFESSOR OF BIOMEDICAL INFORMATICS AT THE MAYO CLINIC AND SHE WILL BE JOINING US FROM ROCHESTER, USA. BOTH ARE SPEAKING ABOUT THE LANGUAGE PROCESSING COMMUNITY'S RESPONSE TO THE PANDEMIC. WITHOUT FURTHER DELAY, I WOULD LIKE TO INVITE DR. VERSPOOR. COULD YOU GO TO THE SECOND SLIDE? SO THIS IS HOW YOU CAN COMMUNICATE QUESTIONS. OUR Q&A SESSION WILL BE MODERATED: QUESTIONS SENT TO QINGYU.CHEN@NIH.GOV WILL BE PRESENTED TO THE SPEAKERS. WE WILL HOLD QUESTIONS UNTIL THE LAST 10 MINUTES OF THE SESSION. AND NOW I WOULD LIKE TO WELCOME DR. VERSPOOR. >> THANK YOU SO MUCH FOR HAVING ME, I'M REALLY HAPPY TO BE HERE. AS LANA MENTIONED, THE NATURAL LANGUAGE PROCESSING COMMUNITY HAS HAD QUITE A STRONG AND ACTIVE RESPONSE TO THE ISSUE OF COVID, AND MY GROUP IS NO EXCEPTION. SO WHEN COVID STARTED HAPPENING AND, LIKE MANY OF YOU, WE WERE SENT HOME AND WERE TRYING TO WORK OUT OUR NEW REALITY OF WORKING FROM HOME, WE FELT COMPELLED TO REALLY EXAMINE HOW WE COULD LEVERAGE OUR SKILLS IN ORDER TO HELP SUPPORT THE COVID RESPONSE. AND WE WERE LUCKY THAT AT THE TIME THERE WERE DATA SETS BEING COLLECTED THAT ALLOWED US TO DO THAT. THIS WORK WAS DONE IN THE CONTEXT OF A TRAINING CENTER WE HAVE, WHERE WE HAD A NUMBER OF Ph.D. STUDENTS AND POSTDOCS WHO ARE ALL BEING TRAINED IN THE AREA OF AI FOR MEDICAL TECHNOLOGIES. SO COVID SEEMED LIKE A PERFECT PLACE FOR US TO BE APPLYING OUR SKILLS. LET ME GIVE YOU A BIT OF CONTEXT FOR WHAT WE ARE TRYING TO DO. 
MANY OF YOU WILL BE AWARE, AND I KNOW WE HAVE THE DEVELOPERS OF THIS TOOL AND THIS RESOURCE ONLINE, SO THANK YOU QINGYU, THAT THERE IS A GROWING BODY OF LITERATURE AROUND COVID-19, AND IN FACT THE LITCOVID TOOL WHICH IS REPRESENTED HERE HAS BEEN CAPTURING A TREMENDOUS AMOUNT OF RESEARCH. YOU CAN SEE HERE THAT IN FEBRUARY, ALREADY, THERE WERE PUBLICATIONS COMING OUT RELATED TO COVID-19 AND THAT HAS ONLY GROWN, AND THERE ARE SOME REMARKABLE NUMBERS OF PUBLICATIONS HERE: 4000 PUBLICATIONS IN 1 WEEK IN AUGUST. IT'S JUST AMAZING. I HAVEN'T UPDATED THIS IN A COUPLE OF MONTHS AND OBVIOUSLY THE TREND HAS CONTINUED, SO THERE'S BEEN A MASSIVE AMOUNT OF PUBLICATIONS COMING OUT, AND THAT OF COURSE IS PART OF A LARGER TREND. AGAIN I HAVEN'T UPDATED THIS GRAPH BUT THE TREND CONTINUES. NOW WE HAVE AROUND 30 MILLION PUBLICATIONS INDEXED IN PUBMED AND IT'S JUST A REMARKABLE AMOUNT. AND THE REAL QUESTION THAT ALL OF THIS INFORMATION AND THIS RESEARCH LITERATURE RAISES IS: HOW DO WE ACCESS IT, HOW DO WE FIND THE ARTICLES THAT ARE RELEVANT TO OUR RESEARCH QUESTIONS, HOW DO WE SIFT THROUGH ALL OF THE INFORMATION IN ORDER TO REALLY HONE IN ON THE INFORMATION THAT'S MOST RELEVANT TO US? FOR COVID THIS WAS NO EXCEPTION. WE HAD A LOT OF NEED TO DIG THROUGH ALL OF THIS LITERATURE, AND THE TYPICAL WAY, OF COURSE, THAT WE ACCESS INFORMATION IN THE LITERATURE IS THROUGH SEARCH. 
SO WE HAVE TARGETED QUERIES WHERE OUR USER FORMULATES A QUESTION AND TYPES IT IN IN NATURAL LANGUAGE, SOMETIMES USING STRUCTURED QUERIES THROUGH THE MESH VOCABULARIES FOR INSTANCE, AND THE PROCESS IS: YOU ENTER YOUR QUERY, YOU GET A SET OF RESULTS BACK, THE USER SCANS THOSE RESULTS AND YOU MIGHT HAVE ACCESS TO SOME LITTLE SNIPPETS OR SOME PREVIEW OF THE CONTENT, MAYBE THE ABSTRACT, BUT THE USER IS DOING A LOT OF WORK OF BASICALLY GOING THROUGH THE LIST AND SELECTING THINGS THAT ARE RELEVANT. WHEN YOU'RE FACED WITH A TARGETED RESEARCH QUESTION, THIS IS FINE BECAUSE YOU'RE DOING THIS ON A REGULAR BASIS AND YOU CAN UPDATE YOURSELF ON THE LATEST RESEARCH BY LOOKING AT THE DATE ORDER AND JUST UPDATING FROM TIME TO TIME. IN THE CONTEXT OF COVID, WHERE WE HAD THIS MASSIVE QUANTITY OF LITERATURE AND A FAR LESS CLEAR KIND OF INFORMATION NEED, WE FELT THAT SEARCH IN THE TRADITIONAL SENSE WOULD PROBABLY BE NOT QUITE ADEQUATE, AND IN FACT I FOUND MYSELF TALKING TO A JOURNALIST NAMED JEFFREY BRAINARD AT SCIENCE MAGAZINE WHO WROTE THIS ARTICLE ABOUT SCIENTISTS DROWNING IN COVID-19 PAPERS AND REALLY RAISED THE ISSUE OF HOW TEXT MINING COULD SUPPORT SCIENTISTS TO DO A BETTER JOB OF NAVIGATING THAT. 1 OF THE POINTS I MADE WHEN I WAS TALKING TO HIM WAS THAT WE NEEDED TO THINK ABOUT THE USERS IN THIS CONTEXT AND REALLY THINK ABOUT WHETHER THE TOOLS THAT WE'VE BEEN GIVING THE USERS WERE SUFFICIENT TO MEET THEIR NEEDS. AND THE TEXT MINING COMMUNITY HAS OF COURSE BEEN WORKING VERY HARD FOR OVER A DECADE TO TRY TO ADDRESS THE NEEDS OF ORGANIZING THE INFORMATION IN THE LITERATURE, BUT WE HAVEN'T ALWAYS HAD THE USER NEEDS RIGHT AT THE FRONT OF MIND. 
OUR WORK WAS FACILITATED, AS I SAID, BY THE AVAILABILITY OF THIS LITERATURE, AND IN PARTICULAR THROUGH A KAGGLE CHALLENGE. THE PEOPLE AT THE ALLEN INSTITUTE FOR AI RELEASED A LARGE AND GROWING DATA SET OF COVID-19 LITERATURE WHICH WAS MADE AVAILABLE TO THE COMMUNITY, AND THE CHALLENGE WAS REALLY AROUND HOW WE CAN LEVERAGE NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL STRATEGIES TO HELP SCIENTISTS NAVIGATE THIS LITERATURE AND, IMPORTANTLY, TO FIND NEW INSIGHTS. WE OF COURSE HAD THE TREC ACTIVITY SPONSORED BY NIST WHICH ALSO LOOKED AT COVID, AND SO THEY RAN A SERIES OF EVALUATIONS AROUND INFORMATION RETRIEVAL IN ORDER TO EXAMINE HOW WE COULD LEVERAGE AI AND NATURAL LANGUAGE PROCESSING IN THE CONTEXT OF INFORMATION RETRIEVAL IN ORDER TO ADDRESS THE QUESTIONS THAT PEOPLE HAVE AROUND COVID. BUT THIS WAS FOLLOWING THE TRADITIONAL SEARCH MODEL IN THAT CONTEXT. I STARTED LOOKING AT THE QUESTIONS PEOPLE WERE ASKING ABOUT COVID AND REALIZED THAT THEY WERE QUITE DIFFERENT FROM THE TYPICAL QUESTIONS PEOPLE HAVE WHEN THEY SEARCH THE LITERATURE. THESE ARE SOME OF THE QUESTIONS THAT WERE POSED BY THE WORLD HEALTH ORGANIZATION AND THAT WERE THEN ADOPTED IN THE KAGGLE CHALLENGE, AND IF YOU READ THEM, YOU GET A SENSE THAT THEY'RE VERY OPEN-ENDED QUESTIONS. SO 1 OF THE QUESTIONS WAS, WHAT HAS BEEN PUBLISHED ABOUT MEDICAL CARE, PRESUMABLY IN THE CONTEXT OF COVID, SO HOW DO WE CARE FOR PATIENTS WITH COVID? WHAT DO WE KNOW ABOUT VACCINES AND THERAPEUTICS? WHAT DO WE KNOW ABOUT COVID-19 RISK FACTORS? THESE ARE ALL EXTREMELY OPEN-ENDED QUESTIONS. WE'RE NOT ASKING ABOUT THE EFFICACY OF A PARTICULAR THERAPEUTIC OR WHETHER A PARTICULAR SYMPTOM IS RELATED TO COVID. WE'RE ASKING THESE OPEN-ENDED QUESTIONS WHICH REALLY ARE ABOUT GATHERING INFORMATION WHERE WE DON'T QUITE KNOW WHAT EXACTLY WE'RE LOOKING FOR, AND THAT IS NOT WHAT THE MORE TARGETED MATCHING OF TERMS AND CONCEPTS DONE IN TRADITIONAL SEARCH AND CONCEPT RECOGNITION SUPPORTS. 
SO WE WERE ASKING, OKAY, WHAT'S BEYOND SEARCH, RIGHT? WHAT IS BEYOND THE KINDS OF TRADITIONAL INFORMATION RETRIEVAL MECHANISMS? AND WE FELT THAT NATURAL LANGUAGE PROCESSING ACTUALLY WAS REALLY PART OF THE ANSWER, SO GOING BEYOND THE WORDS IN THE ARTICLES AND THE WORDS FROM THE QUERY TO ACTUALLY GENERALIZE AND HAVE CONCEPTS REPRESENTED RATHER THAN WORDS, BUT ALSO, IMPORTANTLY, HAVING RELATIONSHIPS AND CATEGORIES OF INFORMATION REPRESENTED. THIS ALIGNS WITH THE NOTION WE SAW IN THE QUESTIONS ON THE PREVIOUS SLIDE THAT WE'RE INTERESTED IN THERAPEUTICS, AND WE'RE INTERESTED IN VACCINES, AS A CATEGORY, WITHOUT NECESSARILY TARGETING A SPECIFIC THERAPEUTIC OR VACCINE. AND THAT REALLY REQUIRES US TO HOP UP A LEVEL AND GENERALIZE TO THE CATEGORICAL INFORMATION. SO, 1 OF THE ASPECTS OF WHAT'S BEYOND SEARCH IS REALLY LEVERAGING NATURAL LANGUAGE PROCESSING TO TRANSFORM DOCUMENTS INTO THIS MORE GENERALIZED REPRESENTATION. THAT HAS BEEN EXPLORED IN THE CONTEXT OF COVID LITERATURE, AND HERE'S 1 EXAMPLE OF A SYSTEM CALLED COVID SCHOLAR THAT AUGMENTED THE ABSTRACTS IN THE LITERATURE RELATED TO COVID IN THE CORD-19 DATA SET WITH PARTICULAR CONTENT-SPECIFIC CATEGORIES. SO THEY CATEGORIZED THE ARTICLES INTO ARTICLES RELATED TO TREATMENT, ARTICLES RELATED TO MECHANISM, AND ARTICLES RELATED TO PREVENTION OR DIAGNOSIS, AND THIS PROVIDED 1 WAY TO HELP USERS ORGANIZE AND NAVIGATE AND IDENTIFY ARTICLES THAT WERE RELATED TO MORE SPECIFIC KINDS OF CATEGORIES BUT STILL AT THAT GENERALIZED LEVEL. SO THIS IS PART OF THE ANSWER TO HOW WE CAN SUPPORT USERS BEYOND TRADITIONAL IR. 
THERE ARE ALSO A NUMBER OF EXAMPLES OF SYSTEMS THAT BUILD ON ENTITIES AND RELATIONSHIPS, WHICH WE WOULD TYPICALLY FIND USING NATURAL LANGUAGE PROCESSING. A NUMBER OF SUCH SYSTEMS WERE DEVELOPED, INCLUDING 1 AT THE ALLEN INSTITUTE WHICH ORGANIZES PREIDENTIFIED CONCEPTS THAT WERE TYPICALLY IDENTIFIED USING CONCEPT RECOGNITION TECHNIQUES OR NAMED ENTITY RECOGNITION TECHNIQUES, AND AT THE NLM WE HAVE A SIMILAR KIND OF TOOL CALLED PUB [INDISCERNIBLE] WHICH GIVES US PARTICULAR ENTITIES. THIS PROVIDES US, AGAIN, WITH A VIEW OF WHAT IS BEYOND THE WORDS IN THE DOCUMENT AND TRIES TO HELP US CHARACTERIZE THE SEMANTIC CONTENT. SO THIS IS A VERY IMPORTANT WAY OF TRYING TO STRUCTURE AND ORGANIZE AND HELP FACILITATE FINDING CONCEPTS RATHER THAN PARTICULAR DOCUMENTS IN RESPONSE TO THE QUERY. AS AN ASIDE, THIS PROCESS OF RECOGNIZING CONCEPTS FROM AN ONTOLOGY OR A FOCUSED VOCABULARY IS QUITE TRICKY AND WE'VE DONE SOME WORK ON THAT. THIS IS A LITTLE EXAMPLE FROM CRAFT, THE COLORADO RICHLY ANNOTATED FULL TEXT CORPUS, WHERE THE LITERATURE WAS MANUALLY ANNOTATED WITH CONCEPTS FROM THE SEQUENCE ONTOLOGY, THE GENE ONTOLOGY AND A NUMBER OF OTHER ONTOLOGIES. AND IT TURNS OUT THAT THIS PROCESS OF ABSTRACTING AWAY FROM NATURAL LANGUAGE AND TRYING TO CAPTURE THE CONCEPTS IS QUITE CHALLENGING BECAUSE OF LEXICAL VARIATION IN HOW CONCEPTS ARE EXPRESSED. SO EVEN THAT PROCESS OF GENERALIZING TO CONCEPTS FROM WORDS REQUIRES A PRETTY SOPHISTICATED SET OF TOOLS AND TECHNOLOGIES. WE OF COURSE HAVE THE UNIFIED MEDICAL LANGUAGE SYSTEM, AGAIN A TOOL FROM THE NATIONAL LIBRARY OF MEDICINE AT THE NIH, WHICH ALLOWS US TO RECOGNIZE MORE MEDICALLY ORIENTED CONCEPTS, SO YOU CAN PREPROCESS THE LITERATURE AND TRANSFORM IT INTO THIS REPRESENTATION, AND THAT WAS 1 OF THE TOOLS THAT WE BUILT ON IN OUR APPROACH. 
ONE WAY THAT THOSE CONCEPTS HAVE BEEN LEVERAGED IS IN TOOLS THAT ALLOW YOU TO SEARCH BY CONCEPT. THE IBM WATSON COVID-19 NAVIGATOR IS AN EXAMPLE OF A TOOL THAT PREPROCESSED THE LITERATURE IN TERMS OF RECOGNIZING THE CONCEPTS THAT WERE THERE AND THEN ALLOWING USERS TO GO BEYOND THE TYPICAL SEARCH. WHAT WE DECIDED TO LOOK AT WAS THE STRUCTURE OF RANDOMIZED CONTROLLED TRIALS, WITH THE IDEA THAT RESEARCHERS ARE OFTEN MOST INTERESTED IN IDENTIFYING EVIDENCE AROUND TREATMENTS, AND SO WE LOOKED AT WHAT'S KNOWN AS THE PICO STRUCTURE OF THE ARTICLES THAT WERE IN THE COLLECTION WE WERE WORKING WITH. PICO STANDS FOR POPULATION, INTERVENTION, COMPARISON AND OUTCOME, AND IT CORRESPONDS TO THE TYPE OF A RESEARCH QUESTION. SO A SEARCH THAT IS DONE IN THE CONTEXT OF SYSTEMATIC REVIEWS FOR EVIDENCE RETRIEVAL REALLY REVOLVES AROUND THIS STRUCTURE: TRYING TO ASSESS THE EFFECTS OF A PARTICULAR INTERVENTION COMPARED TO A CONTROL FOR A CONDITION OR POPULATION IN SOME CONTEXT, AND THEN UNDERSTANDING THE OUTCOMES THAT ARE ASSOCIATED WITH THAT. NOW WE WERE CLEAR THAT, BECAUSE OF THE OPEN-ENDED NATURE OF THE QUESTIONS, WE PROBABLY WEREN'T GOING TO FIND ALL 4 OF THOSE ELEMENTS IN ANY OF THE ARTICLES, BECAUSE THE SCIENCE AROUND COVID WASN'T THAT MATURE, BUT WE FELT THAT IF WE COULD UNPACK THE RESULTS OF STUDIES THAT WERE BEING PUBLISHED ALONG THESE DIMENSIONS, THAT WOULD HELP PEOPLE TO UNDERSTAND WHAT INTERVENTIONS ARE BEING STUDIED IN THE CONTEXT OF COVID AND WHAT POPULATIONS ARE BEING EXAMINED. AND SO WE BUILT ON A TOOL, ACTUALLY FROM BYRON WALLACE'S GROUP, THAT WAS DESIGNED TO HELP US IDENTIFY THOSE PICO ELEMENTS. THAT GROUP BUILT AN EVIDENCE-BASED MEDICINE CORPUS WHICH INVOLVES ANNOTATING THOSE ELEMENTS OF THE PICO STRUCTURE: THE INTERVENTION, THE PARTICIPANTS OR POPULATION, AND THE OUTCOME. AND SO WE LEVERAGED THAT RESOURCE AND HAD A MODEL THAT ALLOWED US TO RECOGNIZE THOSE PICO ELEMENTS. 
AND SO, IN THE SYSTEM THAT WE BUILT, WE USED THAT, AS I SAID, AS OUR FOCUSED APPROACH TO ORGANIZING THE INFORMATION IN THE DOCUMENT COLLECTION, BUT THEN COUPLED IT WITH THE CONCEPT RECOGNITION I INTRODUCED EARLIER. SO WE HAD THIS MULTILAYER ANNOTATION OF DOCUMENTS WHICH INVOLVED RECOGNIZING THE PICO ELEMENTS AS WELL AS THE CONCEPTS FROM THE UMLS, MOSTLY FOCUSED ON THE SUBSET OF THE UMLS WHICH CORRESPONDS TO MESH TERMS, MEDICAL SUBJECT HEADINGS TERMS. THE IDEA IS THAT IF YOU HAVE A DOCUMENT AND YOU RUN IT THROUGH THIS CONCEPT RECOGNITION AND THE ANNOTATION OF PICO PHRASES IN THE TEXT, THEN YOU CAN IDENTIFY PARTICULAR CONCEPTS THAT ARE ASSOCIATED WITH PARTICULAR PICO CATEGORIES. SO IN A PHRASE LIKE THIS 1, CUMULATIVE COVID-19 HOSPITALIZATION AND DEATH RATES, YOU CAN RECOGNIZE THE CONCEPTS OF HOSPITALIZATION AND MORTALITY, AND BECAUSE WE IDENTIFIED THIS AS AN OUTCOME STATEMENT, WE CAN RELATE THOSE CONCEPTS TO THE PICO DIMENSION OF OUTCOME. WE CALL THESE PICO CONCEPTS, WHERE WE'RE ASSOCIATING PARTICULAR CONCEPTS WITH PARTICULAR DIMENSIONS OF THAT PICO FRAMEWORK. WE RAN THIS ANALYSIS ACROSS THE WHOLE COVID-19 COLLECTION, AND WE BUILT A GRAPH OF CONNECTIONS BASED ON CO-OCCURRENCE OF THOSE PARTICULAR PICO CONCEPTS. SO, IF YOU HAVE AN ARTICLE WHICH HAS, FOR INSTANCE, A POPULATION CONCEPT OF CHILD AND AN INTERVENTION CONCEPT OF QUARANTINE, MEANING THAT THE STUDY EXAMINES THE INTERVENTION OF QUARANTINE IN THE CONTEXT OF THE POPULATION OF CHILDREN, THEN WE BUILD A LINK BETWEEN THEM. SO IT IS JUST SIMPLE CO-OCCURRENCE IN THIS CASE, BUT AIMED AT TRYING TO ORGANIZE THE INFORMATION IN THE LITERATURE ACCORDING TO THESE PARTICULAR CONSTRUCTS. AND SO WE END UP WITH BASICALLY PAIRS OF PICO CONCEPTS THAT ARE ORGANIZED ACCORDING TO THE PICO CATEGORY THAT THEY RELATE TO. 
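THE CO-OCCURRENCE LINKING DESCRIBED ABOVE CAN BE SKETCHED IN A FEW LINES OF PYTHON. THIS IS A MINIMAL ILLUSTRATION ONLY, NOT THE COVID-SEE IMPLEMENTATION: THE ARTICLE RECORDS, THE CONCEPT LABELS, AND THE NAMES `ARTICLES`, `LINKED_CATEGORIES` AND `build_pico_graph` ARE ALL HYPOTHETICAL, AND THE INPUT IS ASSUMED TO HAVE ALREADY BEEN THROUGH PICO TAGGING AND CONCEPT RECOGNITION.

```python
from collections import Counter
from itertools import product

# Hypothetical pre-annotated input: for each article, the concepts found
# inside each PICO-labelled span. Real input would come from a PICO tagger
# plus a UMLS/MeSH concept recognizer; these records are invented.
ARTICLES = {
    "doc1": {"population": {"child"}, "intervention": {"quarantine"},
             "outcome": {"anxiety"}},
    "doc2": {"population": {"child", "adult"}, "intervention": {"quarantine"},
             "outcome": {"mortality"}},
    "doc3": {"population": {"adult"}, "intervention": {"mechanical ventilation"},
             "outcome": {"mortality"}},
}

# Category pairs to link: population-intervention and intervention-outcome.
LINKED_CATEGORIES = [("population", "intervention"), ("intervention", "outcome")]


def build_pico_graph(articles):
    """Return (edge -> co-occurrence count, edge -> supporting article ids)."""
    weights = Counter()
    support = {}
    for doc_id, pico in sorted(articles.items()):
        for cat_a, cat_b in LINKED_CATEGORIES:
            concepts_a = sorted(pico.get(cat_a, ()))
            concepts_b = sorted(pico.get(cat_b, ()))
            # Simple co-occurrence: every pair of concepts seen in the same
            # article, one from each category, gets a link.
            for a, b in product(concepts_a, concepts_b):
                edge = ((cat_a, a), (cat_b, b))
                weights[edge] += 1
                support.setdefault(edge, []).append(doc_id)
    return weights, support


weights, support = build_pico_graph(ARTICLES)
edge = (("population", "child"), ("intervention", "quarantine"))
print(weights[edge], support[edge])  # 2 ['doc1', 'doc2']
```

KEEPING THE SUPPORTING ARTICLE IDS ON EACH LINK IS WHAT WOULD LET A USER SELECT A LINK AND SEE THE ARTICLES BEHIND IT.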
AFTER ALL OF THIS PREPROCESSING, WE WANTED TO COME UP WITH A WAY TO SUPPORT THE RESEARCHERS TO FIND LITERATURE ALONG THE LINES OF THOSE CONCEPTS, GOING BEYOND THE TYPICAL SEARCH AND TRYING TO ENABLE EXPLORATION. IF WE THINK ABOUT THE STANDARD WAY THAT PEOPLE SEARCH, TYPICALLY WE FOLLOW AN ITERATIVE SEARCH PROCESS WHERE WE REFINE. WE FIND AND WE NARROW, AND EVENTUALLY WE MIGHT SHIFT TO A SLIGHTLY DIFFERENT SET OF LITERATURE, AND WE HONE IN AND GO SMALLER AND SMALLER AND SMALLER UNTIL WE FIND THE SET OF LITERATURE THAT REALLY CAPTURES THE QUESTION. AS I SAID, IN THE CONTEXT OF COVID THE QUESTIONS WERE QUITE DIFFERENT, AND WE FELT THAT THIS NOTION OF EXPLORATORY SEARCH, WHICH HAS BEEN DISCUSSED IN THE LITERATURE FOR QUITE SOME TIME AS A WAY THAT PEOPLE OFTEN APPROACH SEARCH, WAS MORE APPROPRIATE. IT OFTEN INVOLVES JUMPING AROUND: YOU START WITH 1 QUERY, YOU START WITH 1 SET OF LITERATURE, IT'S NOT QUITE TELLING YOU WHAT YOU NEED, SO YOU ACTUALLY JUMP TO AN ENTIRELY DIFFERENT PART OF THE INFORMATION SPACE AND START SEARCHING AROUND THERE, AND YOU DO THIS PROCESS OVER AND OVER. THROUGH THAT, THE USER STARTS TO THINK ABOUT, OKAY, WHAT AM I ACTUALLY INTERESTED IN FINDING, AND THAT THEN HELPS THEM EFFECTIVELY DISCOVER INFORMATION THAT THEY DID NOT ANTICIPATE. THE PROCESS HERE ALLOWS USERS TO NAVIGATE THE INFORMATION SPACE IN ORDER TO DEFINE WHAT THEY'RE LOOKING FOR, RATHER THAN STARTING WITH A DEFINITION OF THEIR INFORMATION NEED THAT WAS VERY CLEAR AT THE OUTSET. AND SO WE WANTED TO DEVELOP A SYSTEM THAT WOULD EFFECTIVELY ALLOW PEOPLE TO FOLLOW THIS EXPLORATION PROCESS WITHOUT A PRECONCEIVED NOTION OF EXACTLY WHAT THEY WERE LOOKING FOR. THERE IS A BODY OF LITERATURE YOU CAN LOOK AT THAT EXAMINES HOW USERS ALTERNATE BETWEEN THESE MECHANISMS OF TARGETED SEARCH AND MORE EXPLORATORY EXAMINATION OF THE LITERATURE, AND THAT'S WHAT WE WERE TRYING TO SUPPORT. 
SO WE DEVELOPED THE SYSTEM CALLED COVID-SEE, BUILDING ON THESE IDEAS OF EVIDENCE AND THE PROCESS OF EXPLORING THE LITERATURE. IT COUPLES BOTH OF THESE THINGS, SO IT COUPLES SEARCH WITH EXPLORATION, AND THEN IT INCORPORATES THE REPRESENTATION OF CONCEPTS AND RELATIONSHIPS AND THOSE PICO CONCEPTS THAT I MENTIONED EARLIER. IMPORTANTLY, BECAUSE WE'RE TRYING TO BUILD SOMETHING FOR USERS, WE WANTED TO HAVE A WEBSITE, A TOOL THAT PEOPLE COULD USE, THAT GAVE THEM A VISUAL SUMMARY OF THE LITERATURE THEY WERE EXPLORING, BEYOND THE WORDS AND THE ABSTRACT. THIS IS A VERY MESSY SLIDE, JUST TO GIVE YOU A SENSE OF WHAT THE WHOLE SYSTEM LOOKS LIKE TOGETHER, BUT BASICALLY WE COUPLE, AS I SAID, THE SEARCH, WITH A STANDARD SEARCH ENGINE AND A NORMAL RETRIEVAL PROCESS, BUT THEN WE AUGMENT IT WITH THE VISUAL SUMMARIES OF INFORMATION HINGING ON THE CONCEPTUAL REPRESENTATION. SO THE SEARCH LOOKS VERY FAMILIAR. YOU CAN TYPE IN A FAIRLY OPEN-ENDED STATEMENT LIKE THE INCUBATION PERIOD OF COVID-19 AND GET BACK A LIST OF ARTICLES THAT RELATE ACCORDING TO THE NORMAL SEARCH PROCESS, AND THEN YOU CAN COLLECT THE ARTICLES THAT YOU'RE INTERESTED IN INTO SOMETHING WE CALL A COLLECTION OR A BRIEFCASE, SO YOU CAN THINK OF IT LIKE FINDING PAPERS AND STICKING THEM IN MY BRIEFCASE. BUT IMPORTANTLY, WE HAVE THIS VISUALIZER COMPONENT WHICH THEN REALLY OPENS UP THE LITERATURE AND IS MUCH MORE ABOUT SUPPORTING THAT EXPLORATION OF THE LITERATURE. WE GIVE THE USERS A VISUAL SUMMARY OF THE COLLECTION THAT'S BEEN RETRIEVED, SO WHEN YOU TYPE IN THAT OPEN-ENDED QUESTION, YOU MIGHT GET BACK TENS OR EVEN HUNDREDS OR POSSIBLY THOUSANDS OF ARTICLES THAT MATCH AT THE TERM LEVEL. 
BUT THEN, IN TRYING TO UNDERSTAND WHAT'S IN THERE, WE USE OUR PICO CONCEPTS AS THE ORGANIZING PRINCIPLE OF THE VISUALIZATION, SO WE COLLECT AND DISPLAY ALL OF THE POPULATION CONCEPTS THAT ARE IN THAT SUBSET OF THE LITERATURE THAT CAME BACK FROM THE QUERY AND CAPTURE THEIR RELATIONSHIPS TO THE INTERVENTION CONCEPTS. THE LINKS HERE CORRESPOND TO THOSE CO-OCCURRENCES OF POPULATION CONCEPTS AND INTERVENTION CONCEPTS, AND SIMILARLY WE HAVE LINKS BETWEEN THE INTERVENTION CONCEPTS AND OUTCOME CONCEPTS, SO YOU SEE THE TYPES OF INTERVENTIONS THAT HAVE BEEN EXAMINED IN THE CONTEXT OF THAT QUERY. YOU CAN LOOK AT THIS AND SAY, OKAY, PEOPLE HAVE LOOKED AT PUBLIC QUARANTINE AND PUBLIC PRACTICE. AND BY THE WAY, IT'S ALL MESSY BECAUSE IT'S AUTOMATICALLY PREPROCESSED. YOU CAN ALSO LOOK AT THAT AND GO, THAT'S NOT AN INTERVENTION, THAT'S NOT A POPULATION, IT SEEMS A BIT STRANGE, BUT WE'RE NOT NECESSARILY INTERESTED IN BEING SUPER PRECISE BECAUSE WE'RE EXPLORING, AND THERE IS NOISE IN THIS PROCESS. SO THEN, THROUGH THIS, LET'S SAY THAT THE USER DECIDES THAT THEY'RE INTERESTED IN THIS RELATIONSHIP BETWEEN THE GASTROINTESTINAL TRACT, SO MAYBE A POPULATION THAT HAS A PARTICULAR GI ISSUE, AND QUARANTINE. THEY MIGHT CLICK ON THAT LINK AND THEY WILL FIND THERE'S 1 ARTICLE IN THAT, AND THIS HAPPENS TO BE THE ARTICLE. THEN THEY CAN ADD THIS ARTICLE TO THEIR COLLECTION IF THEY FIND IT INTERESTING, AND THEY CAN THEN GO BACK AND CONTINUE TO EXAMINE AND EXPLORE THE COLLECTION THROUGH THIS. WE ALSO PROVIDE A TOPIC-MODELING-BASED SUMMARY OF THE LITERATURE WHICH ALLOWS US TO BASICALLY STATISTICALLY RELATE THE CONCEPTS THAT ARE FOUND THERE. WE ACTUALLY USED A TOPIC MODEL BASED ON CONCEPTS RATHER THAN WORDS IN ORDER TO ORGANIZE, AGAIN, THE DOCUMENTS ACCORDING TO THE MORE THEMATIC INFORMATION THAT WAS IN THEM. 
SO THIS VIEW OF THE COLLECTION HIGHLIGHTS FOR PEOPLE WHAT IS IN THERE AND HOW THOSE DOCUMENTS ARE CONNECTED, AND POTENTIALLY CAN THEN ALLOW THEM TO GO BACK AND HAVE A MORE TARGETED KIND OF SEARCH QUESTION, JUST BASED ON UNDERSTANDING WHAT THEMES ARE AVAILABLE IN THE COLLECTION. FINALLY, AT THE LEVEL OF EACH ARTICLE WE GIVE A CONCEPT CLOUD VIEW OF THE ARTICLE. THE IDEA BEHIND THAT IS THAT, RATHER THAN HAVING A WHOLE ABSTRACT YOU HAVE TO READ THROUGH, PERHAPS SURFACING THE KEY CONCEPTS THAT ARE IN THE ARTICLE CAN HELP THE USER TO JUDGE RELATIVELY QUICKLY WHETHER OR NOT THAT ARTICLE CONTAINS SOMETHING THAT THEY'RE INTERESTED IN. AGAIN WE'RE USING CONCEPTS HERE AND NOT WORDS, AND SO YOU GET PHRASES LIKE E. COLI OR INFECTIOUS COMPLICATIONS AND SO FORTH, WHICH ARE KEPT TOGETHER AS PHRASES. SO REALLY WE ARE JUST 1 OF A COLLECTION OF TOOLS THAT ARE AVAILABLE, AND HERE ARE SOME OF THE ONES THAT ARE AVAILABLE, BUT WHAT WE WERE TRYING TO DO WITH COVID-SEE WAS INTEGRATE A NUMBER OF THE TECHNOLOGIES THAT ARE RELEVANT IN ORDER TO SUPPORT THAT MORE EXPLORATORY KIND OF SEARCH. SO WE NOT ONLY DO INFORMATION RETRIEVAL AND CONCEPT PREPROCESSING AND PICO ELEMENT IDENTIFICATION, BUT WE ALSO HAVE THIS USER-FACING TOOL WHICH REALLY LEVERAGES THOSE CONCEPTS AND RELATIONS IN ORDER TO TRY TO SUPPORT THE USERS IN THAT MORE EXPLORATORY SEARCH. SO THAT BRINGS ME TO THE END, AND JUST TO SUMMARIZE, THE OBJECTIVE OF COVID-SEE REALLY WAS TO HELP SCIENTISTS NAVIGATE THIS MASSIVE AMOUNT OF LITERATURE ACCUMULATING AROUND COVID-19 BY GOING DEEPER, BEYOND THE CONTEXT OF THE TEXT, AND WE REALLY JUST TOOK THE TOOLS IN OUR TOOLBOX, RIGHT? 
WE TOOK WHAT WE UNDERSTOOD ABOUT CONCEPTS, PICO AND TOPIC MODELING AND PUT THEM TOGETHER TO ENABLE THIS EXPLORATION OF THE LITERATURE, AND IMPORTANTLY, BECAUSE IT IS USER FACING, WE WANTED TO HAVE A SYSTEM THAT ALLOWED FOR VISUAL OVERVIEWS OF THE DOCUMENT COLLECTION AND HAD A MECHANISM, THROUGH THE BRIEFCASE, FOR COLLECTING DOCUMENTS OF INTEREST SO THAT THEY COULD BE EXPORTED AND READ MORE CLOSELY IN THE NEXT STEP. SO I WILL LEAVE IT THERE AND THANK MY TEAM, BECAUSE OF COURSE NOTHING LIKE THIS CAN BE BUILT IN THE KIND OF TIME FRAME WE HAD WITHOUT A LOT OF HANDS ON DECK. SPECIFICALLY I'D LIKE TO THANK SIMON SUSTER, A RESEARCH FELLOW WORKING WITH US, AND YULIA OTMAKHOVA, AND SHEVON MENDIS, THE DEVELOPER WHO MADE IT HAPPEN AND PUT IT ON A WEBSITE, COVID-SEE.COM, AND THE OTHER PEOPLE MENTIONED THERE ALSO CONTRIBUTED. SO THANK YOU VERY MUCH FOR YOUR ATTENTION. >> THANK YOU SO MUCH, EXCITING TALK. THANK YOU FOR THE INVITATION TO KICK OFF THIS 2021 SIG EVENT. I AM FROM THE MAYO CLINIC, AND I AM HERE PRESENTING OPEN HEALTH NATURAL LANGUAGE PROCESSING FOR CLINICAL AND TRANSLATIONAL RESEARCH, WHICH WE WILL DEMONSTRATE THROUGH COVID-19. BUT BEFORE I GO AHEAD TO TALK ABOUT THE OVERALL THINKING BEHIND IT, JUST A WARNING: THIS IS NOT ABOUT STATE-OF-THE-ART NLP IN THE CLINICAL SPACE, IT IS MORE ABOUT THE REAL WORLD IMPLEMENTATION OF NLP FOR CLINICAL AND TRANSLATIONAL RESEARCH. HOW MANY OF YOU HAVE ACCESS TO CLINICAL TEXT DOCUMENTS? PROBABLY NOT MANY, BECAUSE THERE IS A PRIVACY CONCERN WHICH LIMITS ACCESS IN THE PUBLIC DOMAIN, WHICH IN TURN LIMITS THE NUMBER OF PEOPLE WHO CAN DEVELOP NATURAL LANGUAGE PROCESSING SYSTEMS FOR CLINICAL AND TRANSLATIONAL RESEARCH. 
WE ALSO FACE ANOTHER SITUATION WHICH IS CALLED RESEARCH REPRODUCIBILITY, AND WE ALSO FACE SOMETHING WHICH WE CALL HEALTH DISPARITY, AND ALL THOSE THINGS ARE LINKED TOGETHER. THE REASON I MENTION THOSE TWO IS THAT RESEARCH REPRODUCIBILITY IMPLIES YOUR DATA NEEDS TO BE REPRODUCIBLE, AND THIS ALSO IMPLIES THAT THE PROCESS BY WHICH YOU GET THE DATA FROM YOUR SOURCE DATA NEEDS TO BE CLEAR AND TRANSPARENT AND ALSO NEEDS TO BE REPRODUCIBLE. SECONDLY, REGARDING HEALTH DISPARITY, WE KNOW THAT NOT ALL HEALTHCARE SYSTEMS HAVE THE RESOURCES TO HOST AN NLP TEAM IN THEIR SPECIFIC HEALTHCARE INSTITUTION TO DO NLP SYSTEM DEVELOPMENT, AND THE ISSUE IS THAT THOSE RESOURCE-CONSTRAINED HEALTHCARE SYSTEMS ACTUALLY DON'T HAVE THE ABILITY TO DIG INTO THEIR CLINICAL EHR SYSTEMS AND TRY TO LEVERAGE PRACTICE DATA TO HELP ADDRESS IMPROVING HEALTH QUALITY. SO WHAT I'M GOING TO TALK ABOUT IS NOT A NEW CONCEPT. BACK IN 2009 THE OPEN HEALTH NATURAL LANGUAGE PROCESSING CONSORTIUM WAS ESTABLISHED THROUGH THE COLLABORATION OF MAYO CLINIC AND IBM. THROUGH THAT COLLABORATION WE HAD 2 SYSTEMS DEPLOYED: 1 IS THE CTAKES SYSTEM, WHICH IS FROM MAYO CLINIC, AND THE OTHER IS THE MEDKAT/P SYSTEM, WHICH IS FROM IBM. OVER THOSE YEARS, CTAKES MOVED TO APACHE, AND WE NOTICED A LOT OF ACTIVITY IN OPEN NLP SYSTEM DEVELOPMENT, BUT NOT ALL OF THOSE SYSTEMS CAN BE MIGRATED TO APACHE CTAKES, SO WE CREATED THIS CONSORTIUM, WHICH AIMS TO HOST ANY SYSTEM WHOSE DEVELOPERS ARE SHARING SOMETHING WITH THE OPEN SPACE. AND BACK IN 2017, WE WERE FUNDED BY NCATS TO DEVELOP AN OPEN HEALTH NATURAL LANGUAGE PROCESSING COLLABORATORY. THIS RELATES TO THE FACT THAT THERE ARE A LOT OF CLINICAL DATA NETWORKS THAT TRY TO LEVERAGE PRACTICE DATA, BASICALLY EHR DATA, FOR CLINICAL AND TRANSLATIONAL RESEARCH, BUT MEANWHILE THERE ARE NOT MANY SYSTEMS DEVELOPED IN THE COMMUNITY TO SUPPORT THAT. 
SO THIS EFFORT, WHICH IS A COLLABORATORY EFFORT, IS A PARTNERSHIP TEAM APPROACH WITH 4 GOALS. THE FIRST GOAL IS TO FIND SOME WAY, SINCE WE CANNOT SHARE THE PATIENT DATA, TO SHARE CONCEPT-LEVEL STATISTICS AND NLP ARTIFACTS WITH THE PUBLIC DOMAIN, SO AS TO IMPROVE NLP DEVELOPMENT IN THE CLINICAL DOMAIN. SECONDLY, CAN WE GENERATE SOME SORT OF TEXT DATA TO HELP PEOPLE UNDERSTAND WHAT CLINICAL NARRATIVES LOOK LIKE? THIRDLY, HOW DO WE LEVERAGE SOMETHING CALLED A PRIVACY-PRESERVING COMPUTATIONAL APPROACH TO SEE IF WE CAN ENHANCE PHENOTYPING WITH NLP RESULTS? LASTLY, IT'S ABOUT PARTNERSHIP: HOW DO WE PARTNER WITH A DIVERSE COMMUNITY FOR TRANSLATIONAL SCIENCE TO LEVERAGE CLINICAL NATURAL LANGUAGE PROCESSING? SO AIM 4 OF THIS ACTIVITY IS THE PARTNERSHIP. AS FOR THE OTHER AIMS, AIM 1 AND AIM 2 TRY TO CREATE SOME RESOURCES FOR THE COMMUNITY SO WE CAN TRAIN THE NEXT GENERATION, AND AIM 3 IS MORE ABOUT SHOWING THE UTILITY OF CLINICAL NLP FOR CLINICAL AND TRANSLATIONAL RESEARCH. THIS IS THE SLIDE TO SHOW THAT, IN THE CLINICAL AND TRANSLATIONAL RESEARCH SPACE, 1 OF THE POPULAR USE CASES IS CLINICAL CONCEPT EXTRACTION. SPECIFICALLY, CLINICAL CONCEPT EXTRACTION HAS BEEN WIDELY USED FOR ALL KINDS OF DIFFERENT USE CASES: SOME ARE IN CLINICAL WORKFLOW OPTIMIZATION, SOME ARE RELATED TO DRUG STUDIES, SOME ARE RELATED TO PATIENT MANAGEMENT, SO THERE ARE MANY DIFFERENT TYPES OF USE CASES. THE FIRST COLUMN SHOWS THE METHODS BY WHICH THOSE CLINICAL CONCEPT EXTRACTION APPLICATIONS WERE DEVELOPED. JUST TO GIVE SOME CONTEXT, THIS IS FROM A METHODOLOGY REVIEW ARTICLE. THE TOTAL NUMBER OF ARTICLES WE RETRIEVED WHICH LEVERAGE CLINICAL DOCUMENTS, CLINICAL TEXT, IS ABOUT 4 [INDISCERNIBLE] ARTICLES OVERALL, BUT AMONG THEM, OVER 200 HAVE DETAILED METHODS DESCRIBING HOW EXACTLY THEY DEVELOPED THE CLINICAL CONCEPT EXTRACTION SYSTEM. 
SO YOU CAN SEE, THE MAJORITY OF THOSE ARE STILL USING THE [INDISCERNIBLE] APPROACH, SOME USE HYBRID, SOME USE MACHINE LEARNING AND SOME USE DEEP LEARNING. SO YOU MAY WONDER WHY THIS IS STILL THE CASE CURRENTLY. THESE ARE LAST-CENTURY METHODS. IT COMES DOWN TO NLP AS TOOLS TO ENABLE CLINICAL AND TRANSLATIONAL RESEARCH. IF YOU VIEW THAT RESEARCH AS A REAL WORLD TYPE OF USE CASE, THE NLP SYSTEM WE'RE TALKING ABOUT HERE IS A REAL WORLD IMPLEMENTATION OF AN AI APPROACH. AND WHEN YOU ARE TALKING ABOUT THE REAL WORLD, YOU ACTUALLY NEED TO CONSIDER HUMAN TRUST IN THE AI APPLICATION. WE AS NLP RESEARCHERS, OF COURSE, TRUST OUR NLP ALGORITHM PERFORMANCE, WE TRUST OUR METRICS, BUT WHEN YOU TALK TO CLINICAL RESEARCHERS AND EPIDEMIOLOGISTS, HOW DO YOU BUILD THEIR TRUST IN YOUR NLP APPLICATION? THERE ARE 4 CHARACTERISTICS RELATED TO REAL WORLD AI APPLICATIONS. THE FIRST 1 IS THAT IT BEHAVES AS IT SHOULD. SECONDLY, IT NEEDS TO PROVIDE EVIDENCE FOR WHY YOU CONSIDER A CONCEPT TO BE POSITIVE OR NEGATIVE OR PRESENT IN THAT PATIENT'S RECORDS, WHICH MEANS EXPLAINABILITY. THIRD IS REPRODUCIBILITY: IT MEANS THAT FOR THE SAME CLINICAL PHRASES OR DOCUMENTS, THE RESULTS MUST BE REPRODUCIBLE. LASTLY, OF COURSE, YOU NEED TO DEAL WITH SECURITY. WHEN WE TALK ABOUT SECURITY IN THE CLINICAL NLP SPACE, WE ARE TALKING ABOUT SEMANTIC DRIFT AND ALL THOSE THINGS, FOR WHICH YOU NEED TO KEEP UPDATING AND KEEP MONITORING POTENTIAL SEMANTIC DRIFT AND POTENTIAL DIFFERENCES IN DATA DISTRIBUTION. SO THOSE TRANSLATE TO 5 CHARACTERISTICS. 
WHEN WE TRY TO DEPLOY CLINICAL NLP FOR REAL WORLD CLINICAL AND TRANSLATIONAL RESEARCH, THERE ARE 5 CHARACTERISTICS. 1 IS RELATED TO REPRODUCIBILITY: BASICALLY THE DATA NEEDS TO BE REPRODUCIBLE AND THE PROCESS NEEDS TO BE REPRODUCIBLE. THE SECOND RELATES TO TRANSPARENCY, EXACTLY HOW WE MAKE THOSE DECISIONS. THE THIRD IS THE ABILITY TO EXPLAIN THE RESULTS AND PROVIDE THE EVIDENCE FOR WHY WE SAY THAT. THEN THERE IS IMPLEMENTABILITY, AS WELL AS INTEROPERABILITY WITH THE STRUCTURE OF STANDARDS AND THE TERMINOLOGY. SO THOSE ARE THE 5 CHARACTERISTICS, WHICH BASICALLY IMPLY WE ARE GOING TO DO THINGS IN A VERY [INDISCERNIBLE] WAY. SO LET'S DIG A BIT DEEPER INTO THE ALGORITHM DEVELOPMENT. BASICALLY, WHEN WE COME TO CLINICAL DOMAINS WE END UP NEEDING TO DO THE TRADITIONAL NLP DEVELOPMENT CYCLE RELATED TO ANNOTATION: YOU DO DATA COLLECTION AND COHORT SCREENING, DEVELOP AN ANNOTATION GUIDELINE, TRAIN THE ANNOTATORS, AND TRY TO PUT THAT INTO PRODUCTION AND GENERATE THE COHORTS. THIS IS WHEN YOU DO A SINGLE-SITE STUDY, BUT IT BECOMES VERY CHALLENGING WHEN YOU TALK ABOUT A MULTI-SITE STUDY OR COMMUNITY-BASED STUDY. BASICALLY THE DATA COLLECTION AND COHORT SCREENING REQUIRE A VERY RIGOROUS PROCESS TO ENSURE FAIRNESS, TO ENSURE REPRESENTATION, AND TO ENSURE DATA WILL NOT BE [INDISCERNIBLE] THAT WILL BE REPRESENTED WHEN YOU DEVELOP THE NLP SYSTEM FOR THAT. SO WE ACTUALLY TRANSLATED THOSE PRINCIPLES INTO A REAL WORLD SYSTEM, WHICH WE HAVE AS THE MAYO CLINIC NLP-AS-A-SERVICE INFRASTRUCTURE, AND IT HIGHLIGHTS WHAT WE NEED TO CONSIDER IF WE WANT REAL WORLD IMPLEMENTATION OR REAL WORLD USE OF NLP FOR CLINICAL AND TRANSLATIONAL RESEARCH. AS FAR AS REAL WORLD IMPLEMENTATION GOES, FIRST, THE EXPERTISE MUST BE COLLECTED AND PRESERVED. THIS IS RELATED TO THE QUESTION OF WHY WE NEED TO TAP INTO THE CLINICAL DOCUMENTS FOR INFORMATION. 
IT IS BECAUSE SOME OF THAT INFORMATION IS EMERGING INFORMATION. IN THE COVID SPACE, FOR EXAMPLE, LOSS OF APPETITE WAS NOT ON THE INITIAL SYMPTOM LIST, BUT YOU NEED TO BE AWARE OF THAT, AND THE DOMAIN EXPERTS, WHO ARE PHYSICIANS, MUST BE PART OF THE PROCESS, SO IF WE WANT TO EXTRACT COVID-RELATED SIGNS AND SYMPTOMS THEY CAN BE PART OF THE TEAM. SECONDLY, FOR THE TOOLS WE BRING TO [INDISCERNIBLE], THEIR PURPOSE IS NOT TO REPLACE THE EXPERTS BUT TO ENGAGE AND EMPOWER THEM MORE. THIS IS WHAT WE CALL AUGMENTED ANNOTATION OR AUGMENTED CURATION. LASTLY, IT MUST BE ABLE TO PROCESS LARGE AMOUNTS OF DATA IN A RESPONSIVE AND SCALABLE WAY. SO IF YOU HAVE THOSE, THAT KIND OF USER BASE AND THOSE USER FEATURES, THEN YOU CAN ACTUALLY BUILD MORE ADVANCED TECHNOLOGY ON TOP OF IT. SO WE IMPLEMENTED, INSIDE OUR CLINICAL INFRASTRUCTURE, A WAY TO DEAL WITH THIS WHOLE PROCESS. I WILL NOT GO INTO DETAIL BECAUSE IT'S PUBLISHED IN THE PAPER MENTIONED HERE. SO WHEN COVID HIT, LIKE ANY OF US, AND SINCE WE ARE AT THE MAYO CLINIC AND ARE VIEWED AS DATA SCIENTISTS, WE WERE ACTUALLY, AT THE BEGINNING OF MARCH, BROUGHT IN BY MAYO CLINIC TO BE PART OF THEIR STRATEGY DATA TEAM, SO WE ENDED UP DEALING WITH A LOT OF THINGS RELATED TO COVID. WE CREATED A DATA DASHBOARD FOR THE MAYO CLINIC, WE ALSO DID PREDICTIVE MODELING FOR THE MAYO CLINIC, AND WE ENDED UP ALSO DOING SYNDROMIC SURVEILLANCE FOR THE CLINIC. I'M NOT GOING TO TALK ABOUT THE DATA DASHBOARD AND THE PREDICTIVE MODELING PART, BUT MORE ABOUT THE WAY IN WHICH WE USED NLP FOR SURVEILLANCE, SO THIS IS ABOUT DOING SYNDROMIC SURVEILLANCE FOR COVID-19 AND OTHER NOVEL INFLUENZA-LIKE DISEASES. 
SO THIS SPECIFIC APPLICATION WAS DEPLOYED USING OUR INFRASTRUCTURE, WHICH WE ESTABLISHED FOR THE CLINIC BEFORE, SO GOING FROM THE IDEA OF DOING SYNDROMIC SURVEILLANCE TO ACTUALLY GETTING IT IMPLEMENTED TOOK JUST A WEEK. ACTUALLY WE USED THE SAME WEB-BASED APPROACH, DOING ANNOTATION, DOING THINGS, BECAUSE WE BELIEVE THE KNOWLEDGE BASE AS WELL AS THE EXPERTS WE ENGAGE ALREADY PROVIDE US ENOUGH INFORMATION WITH WHICH WE CAN DEVELOP A SYMBOLIC AI NLP SYSTEM TO STUDY SYMPTOM PREVALENCE THERE, YOU KNOW, FOR ALL THE DIFFERENT DISEASES, TO SEPARATE COVID FROM THE OTHER FLU-TYPE ILLNESSES. AND WE ARE ACTUALLY USING THE SAME FRAMEWORK NOW TO DO ADVERSE EVENT DETECTION ASSOCIATED WITH THE VACCINATION AS WELL AS THE, YOU KNOW, SHORT-TERM AND LONG-TERM COMPLICATIONS. SO THE INFRASTRUCTURE AT THE BACK END IS [INDISCERNIBLE]; THE ONLY THING DIFFERENT IS THE UNDERLYING SYMBOLIC SYSTEM WE DEPLOYED THERE, AND WE TAP INTO EXPERTS TO DEVELOP THOSE SYSTEMS, AND THERE IS A SPECIFIC REASON BEHIND IT. IF THE KNOWLEDGE IS NOT KNOWN, WE WILL NOT OBSERVE IT IN THE CLINICAL DOCUMENTS, BECAUSE THE ONLY TIME PHYSICIANS START TO DOCUMENT THOSE THINGS IN THEIR NOTES IS WHEN THOSE PIECES OF KNOWLEDGE ARE ALREADY KNOWN, [INDISCERNIBLE] THROUGH THE-- THEN THEY START TO DOCUMENT IT. GIVEN THAT KIND OF CONDITION, THE EXPERTS HAVE THE KNOWLEDGE, THEY READ THE LITERATURE INTENSIVELY, SO WE COUNT ON THEM TO BRING THE KNOWLEDGE INTO THE SYSTEM WE DEVELOP WHEN WE'RE SCREENING THE CLINICAL EHR SYSTEM. SO THIS IS JUST LOCAL, RIGHT? AND THEN WE SAID, OKAY, WE HAVE THIS SYNDROMIC SURVEILLANCE SYSTEM WHERE WE ARE, YOU KNOW, SCREENING ALL THE SIGNS AND SYMPTOMS ASSOCIATED WITH COVID IN OUR CLINICAL EHR SYSTEM, AND WE ARE ALSO SCREENING ALL THE OTHER RELEVANT CONCEPTS THERE; HOW DO WE BRING THIS TO THE NATIONAL LEVEL? AND FORTUNATELY THERE IS A NATIONAL EFFORT RELATED TO COVID COHORTS. THIS IS THE NATIONAL COVID COHORT COLLABORATIVE EFFORT. 
SO IN THIS EFFORT, WE GO TO THE NEXT, WE ARE PART OF A MEETING WITH A COUPLE OF OTHER COLLEAGUES, WITH THE NLP EFFORT THERE, AND THROUGH THAT EFFORT WE MAKE ALL THE PROCESS AND THE INFRASTRUCTURE, AS WELL AS EVERYTHING WE'VE DONE AT MAYO LOCALLY, TRANSLATE INTO THE PUBLIC SPACE. AND THE ALGORITHM, THE APPROACH WE ADOPT HERE, IS ONLY FOR COVID-19 CONCEPTS, BUT THE INFRASTRUCTURE, THE APPROACH WE'RE TAKING, IS APPLICABLE TO ANY, YOU KNOW, CLINICAL CONDITION IF THERE IS A COHORT AROUND IT. SO WE'RE HOPING THROUGH THIS EFFORT THERE IS TREMENDOUS ADOPTION OF NLP FOR CLINICAL TRANSLATION RESEARCH. GETTING INTO SOME DETAILS, WE ADOPT A MODULARIZED IMPLEMENTATION, SO WE MINIMIZE THIS, PROVIDING A BACKBONE FOR ANY SITES THAT ARE INTERESTED IN EXTRACTING CLINICAL CONCEPTS WITH NLP FROM THEIR CLINICAL DOCUMENTS, TO BE ABLE TO GO THROUGH THE INITIAL STEPS OF GETTING THAT FROM THEIR EHR, STORING IT IN THE COMMON DATA MODEL, AND DEPLOYING THIS. WE HAVE EVERYTHING IN THE BACKBONE INFRASTRUCTURE, SO FOR THE MAJORITY OF THE SITES, IF THEY WANT TO ADOPT IT, IT TAKES 2-4 HOURS TO GET THEIR DATA WAREHOUSE NLP INFRASTRUCTURE INITIATED. SO WHY DO WE USE A MODULARIZED APPROACH? WE ALL KNOW THAT FOR EHR CLINICAL SYSTEMS AND DATA WAREHOUSES, THE REGULATORY, SECURITY AND LEVEL OF IT SUPPORT CAN BE QUITE DIFFERENT, AND THERE ARE DIFFERENT MATURITIES OF CLINICAL DATA INFRASTRUCTURE SERVICES. AND ALSO WE ALL KNOW CLINICAL DOCUMENTATION LACKS DOCUMENT STRUCTURE, AND ALSO NOT ALL DOCUMENTS ARE CREATED EQUAL. SO WE HAVE A CROWDSOURCED ENVIRONMENT SET UP FOR PEOPLE TO ENGAGE-- FOR DOMAIN EXPERTS TO ENGAGE WITH THE NLP SYSTEM. THIS IS RIGHT IN THE NIH CLOUD. PEOPLE CAN TEST THE PROCESS WE HAVE, WHAT IT IS ABLE TO EXTRACT ONCE THEY INPUT THE TEXT, AND THIS CAN BE FEEDBACK FOR THE RULE-BASED SYSTEM WE DERIVED. 
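To make the idea of an expert-testable, rule-based extractor concrete, here is a minimal Python sketch. It is not the system described in the talk: the rule table, the concept names, the negation cues and the `extract` function are all hypothetical, chosen only to show how each extracted symptom can be traced back to the pattern that produced it.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical expert-curated rules: concept name -> surface patterns.
# Illustrative only; these are not the rules used by the speakers.
RULES = {
    "LOSS_OF_APPETITE": [r"loss of appetite", r"decreased appetite", r"not eating"],
    "FEVER": [r"fevers?", r"febrile"],
    "COUGH": [r"cough(?:ing)?"],
}

# Crude negation check (an assumption, not a validated algorithm): a cue word
# earlier in the same sentence, with no period between the cue and the match.
NEGATION = re.compile(r"\b(?:no|denies|without|negative for)\b[^.]*$", re.IGNORECASE)


@dataclass
class Match:
    concept: str   # normalized concept the rule maps to
    pattern: str   # the exact rule that fired, kept for traceability
    start: int     # character offsets into the note
    end: int
    text: str      # the matched surface form
    negated: bool


def extract(note: str) -> List[Match]:
    """Apply every rule to the note and return matches in document order."""
    matches = []
    for concept, patterns in RULES.items():
        for pattern in patterns:
            for m in re.finditer(rf"\b{pattern}\b", note, re.IGNORECASE):
                negated = bool(NEGATION.search(note[: m.start()]))
                matches.append(
                    Match(concept, pattern, m.start(), m.end(), m.group(0), negated)
                )
    return sorted(matches, key=lambda x: x.start)


if __name__ == "__main__":
    for hit in extract("Patient denies fever. Reports cough and loss of appetite."):
        print(hit)
```

Because every match records the pattern and character span that produced it, a domain expert reviewing the output can see why a concept was flagged and suggest a new or corrected pattern, which is the kind of feedback loop described above.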
SO AS FOR WHY WE USE AN EXPERT-BASED SYSTEM, THERE ARE UNDERLYING REASONS. THE MAJOR REASON, ALREADY ILLUSTRATED, IS THAT THE ONLY TIME THE CLINICAL CONCEPTS WILL APPEAR IN CLINICAL DOCUMENTATION IS WHEN THAT KNOWLEDGE IS ALREADY KNOWN IN THE DOMAIN, SO THOSE EXPERTS HAVE THE KNOWLEDGE NEEDED TO GET THE SYSTEM RUNNING. WE ACTUALLY GOT THE EVIDENCE FROM MULTIPLE STUDIES: WE FOUND WE CAN INVESTIGATE AND DOCUMENT VARIATION THROUGH ANALYSIS, THE DATA SEMANTICS CAN BE LEVERAGED IN A PORTABLE WAY WITH MINOR REFINEMENT, AND WE ALSO ILLUSTRATED THAT, DEPENDING ON THE TRAINING DATA SETS AND THE CASE DISTRIBUTION, ESPECIALLY WITH EMERGING CONCEPTS, THE EXPERT-BASED SYSTEM CAN DO QUITE WELL. THIS IS THE LAST SLIDE, WHICH IS TO SAY THAT, DIFFERENT FROM WHAT WE SEE IN, YOU KNOW, THE LITERATURE DOMAIN, WHEN WE GO TO EHR AND CLINICAL DOCUMENT COLLECTIONS THERE IS A HIGH LEVEL OF HETEROGENEITY THERE. THE EHR SYSTEMS EACH INSTITUTION ADOPTS HAVE DIFFERENT FORMATS, AND, DEPENDING ON THEIR IMPLEMENTATION, THEY HAVE DIFFERENT STANDARDIZATION. 
SO WE FOUND THAT FOR THE PORTABILITY OF SYSTEMS WE DEVELOP, FOR EXAMPLE USING, YOU KNOW, PRIOR TO DEEP LEARNING APPROACHES, SOMETIMES THEY HAVE HIGH DEPENDENCY ON THE DISTRIBUTION, THE STATISTICS OF THAT, FOR LOCAL [INDISCERNIBLE]. THERE WAS SIGNIFICANT VARIATION BETWEEN THE INSTITUTIONS AND IN THE DOCUMENTATION ACROSS DIFFERENT INSTITUTIONS. AND WE ACTUALLY VIEW THAT, IF WE TRAIN AND EVALUATE A STATISTICAL NLP SYSTEM, IT CAN BE QUITE EXPENSIVE, WHICH MAKES THOSE RESOURCE-CONSTRAINED HEALTHCARE SYSTEMS LIMITED IN ADOPTING THE TECHNOLOGY. SO, IN SUMMARY, BASICALLY IT'S NOT A-- IN THE IMPLEMENTATION OF NLP FOR CLINICAL TRANSLATION RESEARCH, FOR RESEARCHERS, MOST OF THE TIME WE HAVE AN NLP SYSTEM, AND IF WE TUCK THAT AWAY-- WE ACTUALLY NEED TO MAKE SURE THEY UNDERSTAND EACH STEP OF OUR DEVELOPMENT PROCESS; THAT THEY'RE ENGAGED AND THE USABILITY ARE THE KEYS IN THE CLINICAL RESEARCH SPACE. AT THE BEGINNING OF COVID THERE WAS SOMETHING RELATED TO RETRACTED ARTICLES, WHERE THE DATA CAN'T BE REPRODUCED; IT IS THE SAME THING HERE FOR CLINICAL NLP, AND WE NEED TO ENSURE REPRODUCIBILITY. SO IN GENERAL, FOR EVIDENCE GENERATION FROM REAL-WORLD DATA, THE NLP ITSELF IS A [INDISCERNIBLE]; IT NEEDS TO BE IMPLEMENTABLE, IT NEEDS TO BE REPRODUCIBLE AND FAIR AND INTEROPERABLE, AND IT IS A TEAM SCIENCE COLLABORATION. MOST IMPORTANTLY WE TRY TO AVOID UNINTENDED CONSEQUENCES. I THINK THAT'S IT. THANK YOU. TWO MINUTES FOR QUESTIONS. >> THANK YOU. I WANT TO TAKE THIS OPPORTUNITY TO THANK BOTH SPEAKERS FOR THEIR FANTASTIC PRESENTATIONS. SO IF YOU HAVE ANY QUESTIONS PLEASE SEND THEM TO MY E-MAIL ADDRESS. AND ALSO, I JUST WANT TO BRIEFLY MENTION THAT WITH BOTH KARIN AND HONGFANG, WE ARE REALLY FORTUNATE TO HAVE THEM FOR COVID-19; THEY HAVE BEEN THE DOMAIN EXPERTS AND THEY HAVE REALLY MADE GREAT EFFORTS NOT ONLY FOR THE COVID-19 OUTBREAK BUT FOR MANY OTHER THINGS IN BIOMEDICAL DISCOVERY AND RESEARCH. KARIN WAS 1 OF MY Ph.D. 
MENTORS, AND ALWAYS ENCOURAGED ME TO [INDISCERNIBLE] ANY OTHER MACHINE LEARNING MODELS NOT ONLY FOR PERFORMANCE AND ASSESSMENT BUT FOR HOW THEY CAN BE USED BY THE END USERS, TO MAKE IT REALLY FOR THEM, FOR THE BIOCURATORS. AND ALSO FOR HONGFANG, SHE HAS MADE TREMENDOUS EFFORTS IN ANNOTATING AND CURATING AND DEIDENTIFYING RECORDS TO MAKE IT TO [INDISCERNIBLE], FOR THE SHARED TASKS AND FOR THE NLP COMMUNITY, TO MAKE THE DATA AVAILABLE FOR THEM AND FOR US TO USE. SO I REALLY JUST WANT TO USE THIS OPPORTUNITY TO THANK BOTH OF THEM. I HAVE 1 QUESTION FOR KARIN. SO IT'S A SERIES OF 3 QUESTIONS, ALL ABOUT [INDISCERNIBLE]. SO FIRST OF ALL, DID YOU DO [INDISCERNIBLE] ON ALL TEXT OR ONLY YOUR CONCEPTS? AND THEN ANOTHER SUBQUESTION IS, IF ALL TEXT IS [INDISCERNIBLE], WERE THE CONCEPTS [INDISCERNIBLE]? HOW DOES THE [INDISCERNIBLE] COMPARE? SO IT'S ALL ABOUT TOPIC MODELING. >> WELL, LOOK, SO THE SUBSTITUTED PHRASES, THAT WAS DONE BY MY Ph.D. STUDENT [INDISCERNIBLE], AND SHE REPLACED-- SO RATHER THAN USING THE SORT OF CANONICAL TERM IN THE RECORDS, SHE REPLACED THE CONCEPTS WITH THE MOST FREQUENT INSTANCE OF THE PHRASE THAT WE FOUND IN THE DOCUMENT COLLECTION, SO IT WAS A DATA-DRIVEN SELECTION OF THE TERM. THAT'S BECAUSE THE CANONICAL TERM IN THE UMLS IS OFTEN VERY FORMAL AND NOT REALLY WHAT PEOPLE SAY. BUT YEAH, THE MODELING WAS DONE, I BELIEVE, AND I MAY BE MISREMEMBERING, IT WAS A WHILE AGO THAT WE DID THAT WORK, I BELIEVE IT WAS ON A MIX OF THE WORDS AND THE PHRASES THAT CORRESPONDED TO THE CONCEPTS. WHAT WAS THE SECOND QUESTION? >> THE SECOND QUESTION WAS, IF ALL THE TEXT IS IN [INDISCERNIBLE], WAS THERE USING CONCEPTS ALONE? 
SO I ASSUME IT'S JUST-- >> YEAH, YOU KNOW, THE 1 THING THAT WE DISCOVERED IS THAT IN THE CONTEXT OF THE COVID-19 COLLECTION, WE FOUND THAT THE TOPICS WEREN'T SPECIFIC ENOUGH, SO THEY OFTEN CONTAINED KIND OF-- YOU KNOW, WE ENDED UP WITH SOME TOPICS THAT WERE BASICALLY LIKE THE SCIENCE TOPIC, YOU KNOW, WORDS LIKE INVESTIGATE AND STUDY AND THESE SORTS OF THINGS, WHICH HUNG TOGETHER. SO YULIA DID FOLLOW-UP WORK TO EFFECTIVELY TRY TO SEPARATE, LET'S CALL THEM GENERIC TOPICS FROM KIND OF MORE SPECIFIC TOPICS. WE HAVEN'T FOLDED THAT BACK INTO COVID-SEE YET, BUT THE IDEA WAS TO EFFECTIVELY MAKE THIS DISTINCTION BETWEEN TOPICS THAT ARE ACTUALLY CONTENT RELATED AND TOPICS WHICH ARE JUST SORT OF GENERIC, SORT OF GLUE KINDS OF TOPICS. SO I MEAN THAT WAS 1 INTERESTING THING THAT EMERGED, BUT YOU KNOW, OBVIOUSLY THE LDA WAS REALLY THERE JUST FOR-- MOSTLY FOR THE VISUALIZATION, AND BEYOND THAT QUESTION OF THE GENERIC VERSUS SPECIFIC TOPICS, WE HAVEN'T DONE ANY SPECIFIC ANALYSIS OR RESEARCH ON THAT COMPONENT. IT'S JUST MORE FOR THE VISUALIZATION AND IT WAS AN EASY THING FOR US TO PLUG IN, YOU KNOW, BEARING IN MIND THAT WE PULLED ALL THESE ELEMENTS TOGETHER AND GOT A SYSTEM GOING FROM NOTHING BASICALLY WITHIN A FEW MONTHS, SO WE COULD JUST LEVERAGE EXISTING TOOLS. >> THANK YOU, KARIN. I ASSUME THAT'S WHAT WAS PRESENTED AND ALSO SUMMARIZED IN THE NLP COVID-19 WORKSHOP PAPER, RIGHT? >> RIGHT, EXACTLY. WE HAD AN NLP COVID-19 WORKSHOP PAPER ON THE TOPIC MODELING. >> JUST FOR THE AUDIENCE, IF YOU ARE INTERESTED PLEASE SEE THE DETAILS IN THAT PAPER. THANK YOU. >> IT'S A SHORT PAPER. >> AND I HAVE 1 QUESTION FOR HONGFANG: COULD YOU PLEASE PROVIDE MORE INSIGHTS ON THE EXPLAINABILITY OF NLP, SPECIFICALLY FOR COVID-19? >> SO EXPLAINABILITY: CURRENTLY, AS YOU PROBABLY SAW, WE ARE USING A SYMBOLIC APPROACH, SO THE DECISIONS, ALL THOSE THINGS, THE PATTERNS ARE OPEN, EVERYONE CAN SEE THEM, SO IT'S TRACEABLE AND EXPLAINABLE, YEAH. >> OKAY, THANK YOU. THAT'S ALL THE QUESTIONS I HAVE GOT. 
SO, IT'S PERFECT AND I THINK WE HAVE GOOD TIMING. >> THANK YOU SO MUCH FOR HAVING US. >> THANK YOU. >> THANK YOU. >> THANK YOU SO MUCH FOR JOINING US. WE RAN OUT OF TIME SO QUICKLY. WE DIDN'T NOTICE HOW THIS HOUR PASSED, AND IT WAS A PLEASURE SEEING YOU BOTH, EVEN THROUGH THE SCREEN. >> THANK YOU. >> HOPEFULLY WE CAN SEE EACH OTHER IN PERSON SOON.