WELCOME TO THE 2019 ROBERT S. GORDON LECTURE IN EPIDEMIOLOGY. MY NAME IS STEPHANIE GEORGE, AND I'M PLEASED TO REPRESENT DR. DAVID MURRAY, OUR DIRECTOR, AND INTRODUCE OUR SPEAKER. BEFORE I DO THAT, I WANT TO INVITE EVERYONE TO JOIN US AFTER THE LECTURE FOR A RECEPTION IN THE NIH LIBRARY DIRECTLY TO MY LEFT. THE ROBERT S. GORDON LECTURE IS AWARDED EACH YEAR TO A SCIENTIST WHO HAS MADE MAJOR CONTRIBUTIONS IN THE AREA OF RESEARCH OR TRAINING IN THE FIELD OF EPIDEMIOLOGY OR THE CONDUCT OF CLINICAL TRIALS. THE AWARD WAS ESTABLISHED IN TRIBUTE TO ROBERT S. GORDON JR., FOR HIS DEDICATION AND CONTRIBUTIONS TO THE FIELD OF EPIDEMIOLOGY AND HIS DISTINGUISHED SERVICE OF OVER 30 YEARS TO NIH, DURING WHICH TIME HE SERVED IN NUMEROUS SENIOR LEADERSHIP POSITIONS. DR. JOHN IOANNIDIS COMES TO US FROM STANFORD, WHERE HE IS PROFESSOR OF MEDICINE AND OF HEALTH RESEARCH AND POLICY, AND, BY COURTESY, OF BIOMEDICAL DATA SCIENCE AND OF STATISTICS. HE'S ALSO DIRECTOR OF THE META-RESEARCH INNOVATION CENTER AND DIRECTOR OF THE Ph.D. PROGRAM IN EPIDEMIOLOGY AND CLINICAL RESEARCH. DR. IOANNIDIS RECEIVED HIS M.D. FROM THE NATIONAL UNIVERSITY OF ATHENS IN 1990 AND A DOCTORATE OF SCIENCE IN BIOPATHOLOGY FROM THE SAME INSTITUTION, AND TRAINED AT HARVARD AND TUFTS IN INTERNAL MEDICINE AND INFECTIOUS DISEASES. HIS AWARDS ARE NUMEROUS, AS ARE THE DISCIPLINES HIS WORK SPANS. CHECK OUT HIS BIOGRAPHY ONLINE, AND I WILL NOW INTRODUCE OUR SPEAKER. DR. IOANNIDIS' PRESENTATION IS "IN SCIENTIFIC METHOD WE DON'T JUST TRUST, OR WHY REPLICATION HAS MORE VALUE THAN DISCOVERY." PLEASE JOIN ME IN WELCOMING OUR 2019 ROBERT S. GORDON JR. AWARD RECIPIENT, DR. JOHN IOANNIDIS. >> THANK YOU FOR THIS TREMENDOUS HONOR, AND FOR ASKING ME TO GIVE THIS LECTURE TODAY ON THIS TOPIC. IT'S ALWAYS A PLEASURE TO BE BACK AT THE NIH, AND TO MEET WITH COLLEAGUES AND TO MEET MORE COLLEAGUES, VERY BRIGHT PEOPLE. AND THIS IS A STELLAR INSTITUTION, VERY UNIQUE IN THE WORLD. SO, IN SCIENTIFIC METHOD WE DON'T JUST TRUST. TRUST IS GREAT TO HAVE, BUT WE NEED TO HAVE MORE THAN JUST TRUST. SCIENCE IS A FANTASTIC ENTERPRISE, VERY SUCCESSFUL. ABOUT 180 MILLION PAPERS ARE FLOATING AROUND. THIS IS JUST 20 MILLION OF THEM FROM THE LAST 16 YEARS. THEY CREATE A UNIVERSE, ALONG WITH TWO MILLION PATENTS AND 200,000 DISCIPLINES OF SCIENCE. I'M POINTING WITH AN ARROW HERE, AND IT'S VERY TINY FONT, BUT LET ME MAGNIFY THAT. HI THERE, MY BEST PAPER IS A SPECK OF DUST IN A SPECK OF DUST IN A SPECK OF DUST SOMEWHERE AROUND HERE. [LAUGHTER] NO SINGLE PAPER CAN EVER COMPETE WITH ITS SURROUNDINGS OR WITH SCIENCE AT LARGE. SCIENCE IS A COMMUNAL EFFORT. IT'S THAT WHOLE GALAXY THAT MATTERS. IT'S A LIVING GALAXY, AN EVOLVING GALAXY, WITH PLENTY OF DATA, WITH PLENTY OF HYPOTHESES, WITH PLENTY OF INFERENCES. AT THE SAME TIME, THAT GALAXY ALSO HAS PLENTY OF EMPTY SPACE, PLENTY OF DARK MATTER. AND WHAT WOULD THAT DARK MATTER BE? IT WOULD BE ANALYSES THAT HAVE NEVER BEEN PUBLISHED, DONE BUT NOT PUBLISHED. IT WOULD BE DATA THAT ARE AVAILABLE BUT NOT REALLY AVAILABLE; THEY WERE ACCESSED MAYBE ONCE BY SOMEONE BUT THEN WERE LOST. IT WOULD ALSO BE SCIENCE THAT WAS NOT DONE. IT WOULD BE REPLICATIONS THAT WERE NOT DONE BECAUSE THEY WERE THOUGHT TO BE "ME TOO" RESEARCH, AND LOST OPPORTUNITIES, RESEARCH THAT NEVER FLOURISHED BECAUSE IT WAS NOT FUNDED, BECAUSE THERE WERE NOT ENOUGH RESOURCES TO DO IT. HOW CAN WE LOOK AT THE EXISTING MATTER AND TRY TO UNDERSTAND HOW WE CAN EXPAND THAT UNIVERSE FURTHER? MUCH OF THE EMPHASIS HAS BEEN ON MAKING DISCOVERIES. BUT TO BE HONEST, I THINK THAT DISCOVERY IS A BORING NUISANCE.
ALMOST EVERY SINGLE PAPER THAT YOU WILL READ IN THE LITERATURE WILL CLAIM THAT IT HAS SOME NOVEL RESULTS. IN EVERY GRANT I SUBMIT TO NIH I PROMISE TO FIND SOMETHING NOVEL, AND I KNOW THAT PROBABLY I'M LYING, STATISTICALLY SPEAKING. [LAUGHTER] THIS IS A TEXT MINING EXERCISE PROBING THE ENTIRE PubMed FROM 1990 TO 2015. 96% OF THE BIOMEDICAL LITERATURE CLAIMS SIGNIFICANT, STATISTICALLY SIGNIFICANT RESULTS WHEN P-VALUES ARE USED IN THE ABSTRACT OF A PAPER OR IN THE FULL-TEXT PAPERS AVAILABLE IN MEDLINE. TRANSLATION PROCEEDS AT A GLACIAL PACE DESPITE HAVING MILLIONS OF STATISTICALLY SIGNIFICANT AND SEEMINGLY NOVEL RESULTS. WE DO MAKE PROGRESS, BUT IT'S NOT MILLIONS OF DISCOVERIES THAT MOVE ON TO HAVE TREMENDOUS IMPACT ON MEDICAL CARE, ON PATIENT OUTCOMES, ON HEALTH, OR ON LIVING BETTER AND LONGER LIVES. SEVERAL YEARS AGO WE LOOKED AT THE CRÈME DE LA CRÈME, HIGHLY CITED CLINICAL RESEARCH, THE LANDMARKS, THE MILESTONES, AND WE ASKED HOW LONG IT TOOK TO GET THERE, TO HAVE ONE OF THE MOST HIGHLY CITED PAPERS IN THE HISTORY OF MEDICINE. ON AVERAGE IT TOOK ABOUT 25 TO 30 YEARS; IN THE CASE OF NITRIC OXIDE IT TOOK 200-PLUS YEARS. THERE ARE A FEW EXCEPTIONS WHERE THINGS MATERIALIZE FASTER. FOR EXAMPLE, I CONSIDER IT ONE OF THE MOST EXCITING EXPERIENCES IN MY CAREER WHEN, EARLY ON AS A SCIENTIST, I WAS AT NIH AND INVOLVED IN ACTG 320, A RANDOMIZED TRIAL THAT SHOWED THAT WITH THERAPY WE COULD DRAMATICALLY DECREASE THE RISK OF DEATH AND DISEASE PROGRESSION FOR HIV-INFECTED PATIENTS. THAT TRIAL WAS PUBLISHED WITHIN LESS THAN FOUR YEARS FROM THE TIME THAT PROTEASE INHIBITORS HAD BEEN DEVELOPED, BASED ON A CONCENTRATED AND COMMUNAL EFFORT TO TRY TO DO SOMETHING ABOUT HIV. AND IT DID WORK. SO, HOW CAN WE SHIFT FROM STORIES WHERE IT TAKES 200 YEARS OR 30 YEARS TO STORIES WHERE IT TAKES FOUR YEARS? AND ALSO, HOW CAN WE SHIFT NOT ONLY THE TIMING BUT ALSO THE CREDIBILITY OF THESE EFFORTS? BECAUSE IN THAT SLIDE I ALSO HAVE SOME BLACK MILESTONES, WHICH MARK WHEN LARGER, BETTER-CONTROLLED STUDIES WERE DONE THAT SHOWED THAT THE CRÈME DE LA CRÈME PAPER WAS EXAGGERATED IN TERMS OF WHAT IT WAS PROMISING TO HAVE FOUND. I THINK THAT WE'RE INCENTIVIZING A FAKE NARRATIVE. THE NARRATIVE THAT IS DOMINANT IS THAT WE HAVE AN OVERSUPPLY OF MAJOR TRUE DISCOVERIES, AND I THINK THAT THIS IS CURRENTLY UNTENABLE. I DON'T NEED TO REMIND YOU HOW FEW NEW DRUGS GET LICENSED, EVEN THOUGH WE'RE TRYING TO PUSH THAT AGENDA AND WE HAVE THE BRIGHTEST MINDS IN THE WORLD WORKING ON SCIENCE. THE EFFORT THAT WE PUT IN IS TREMENDOUS, AND THE REAL PROGRESS, WHICH YOU CAN MEASURE ANY WAY YOU WANT, IS FAR MORE LIMITED. NOW, THIS NARRATIVE COEXISTS WITH SOME ADDITIONAL URGES. ONE OF THEM IS MAKE HASTE. YOU KNOW, RUSH. PATIENTS ARE DYING. LICENSE NEW DRUGS IMMEDIATELY. AND I DO FEEL THAT PRESSURE. AND I THINK THAT, YES, WE NEED TO MAKE HASTE, BUT THIS DOESN'T MEAN THAT WE SHOULD CUT CORNERS IN TERMS OF THE EVIDENCE THAT WE NEED TO ACCRUE. THE SECOND URGE IS TO BE METHODOLOGICALLY SLOPPY. OH, JUST GET AN ANSWER RIGHT AWAY. WE HAVE ALL THESE HUGE COLLECTED DATASETS. NO NEED TO DO RANDOMIZED CONTROLLED TRIALS, JUST RUN SOMETHING THROUGH THE MILL OF YOUR COMPUTER AND THAT'S GOING TO BE ACCURATE. AND VERY OFTEN, THIS IS NOT ACCURATE. AND THE THIRD URGE IS TO AVOID REPLICATION: DON'T WASTE TIME WITH REPLICATION, THIS IS A "ME TOO" EFFORT; WE NEED TO MOVE FORTH, TO MOVE ON TO MORE DISCOVERIES. NOW, THE VALUE OF DISCOVERY CAN BE MODELED. LET'S SAY R IS THE PRE-STUDY ODDS OF A RESEARCH FINDING BEING TRUE, AND BF IS THE BAYES FACTOR CONFERRED BY THE DISCOVERY DATA.
H IS THE RATIO OF THE NEGATIVE CONSEQUENCES FROM A FALSE POSITIVE DISCOVERY CLAIM VERSUS THE POSITIVE CONSEQUENCES FROM A TRUE POSITIVE DISCOVERY. THEN THE VALUE OF THE DISCOVERY PROCESS IS PROPORTIONAL TO TRUE POSITIVES MINUS H TIMES FALSE POSITIVES, OR, IF YOU WORK THROUGH THAT, IT IS PROPORTIONAL TO R TIMES THE BAYES FACTOR MINUS H. AND OBVIOUSLY YOU CAN ADD A CONSTANT: IF YOU HAVE TWO SCIENTISTS, TWO THOUSAND SCIENTISTS, TWO MILLION, YOU CAN MULTIPLY BY HOW MUCH EFFORT IS GOING INTO THIS. R AND H ARE RATHER FIELD SPECIFIC. IF I HAVE WASTED MY LIFE IN A FIELD WHERE THERE'S NOTHING TO DISCOVER, THAT'S VERY SAD. BUT THAT FIELD JUST HAS NOTHING REALLY USEFUL TO DISCOVER; IF I WANT TO MAKE PROGRESS, I JUST NEED TO CHANGE FIELDS. SOMETIMES MAYBE THERE ARE THINGS TO BE DISCOVERED, BUT THEY WOULD REQUIRE A COMPLETELY DISRUPTIVE APPROACH, ABANDONING WHAT WE DO AND TAKING A COMPLETELY NEW APPROACH TO TRYING TO ATTACK THE SAME QUESTION. H IS ALSO PRETTY MUCH FIELD SPECIFIC. SO, THE FOCUS SHOULD BE -- MUST BE -- ON INCREASING THE BAYES FACTOR. THE OPTIONS FOR INCREASING THE BAYES FACTOR ARE NUMEROUS, AND THEY DEPEND ON WHAT FIELD WE'RE WORKING IN, BUT TYPICALLY RUNNING LARGER STUDIES, GETTING MORE EVIDENCE, WOULD HELP. AND ALSO ENSURING GREATER PROTECTION FROM BIAS WOULD HELP, WHICH MEANS WE OPTIMIZE DESIGN, THINKING ABOUT HOW WE SHOULD GO ABOUT ANSWERING A QUESTION. IN THE VERY SAME EQUATION IT'S EASY TO GET NEGATIVE VALUES. IF YOU WANT TO AVOID HAVING NEGATIVE VALUE FROM DISCOVERY, THEN ONE NEEDS THE BAYES FACTOR TO BE MORE THAN THE RATIO OF H OVER R, AND OFTEN THIS IS VERY DIFFICULT. WHY IS THAT? MOST ORIGINAL DISCOVERY CLAIMS COME FROM SMALL STUDIES, WHERE BIASES ARE VERY COMMON, AND THEREFORE THE BAYES FACTOR OFTEN IS EVEN LESS THAN 5, WHICH IN THE BAYESIAN WORLD MEANS VERY, VERY LITTLE. AND ALSO, MOST FIELDS CURRENTLY ARE WORKING IN AREAS WHERE THE PRE-STUDY ODDS ARE PRETTY LOW. WE'RE ATTACKING MASSIVE SPACES OF EXPLORATION WHERE THERE ARE SIGNALS TO BE DISCOVERED, BUT THE DENOMINATOR OF ALL THE POTENTIAL THINGS THAT WE CAN MEASURE AND ALL THE THINGS WE CAN ANALYZE IS PROBABLY VERY LARGE. IF WE HAVE THAT COMBINATION, MOST DISCOVERIES ARE OPERATING IN A SPACE WHERE THEY HAVE NEGATIVE SCIENTIFIC VALUE. IT'S FAR MORE LIKELY THAT THEY WILL CONFUSE US, THAT THEY WILL GENERATE FALSE POSITIVES THAT LEAD PEOPLE ASTRAY, WITH MORE RESOURCES WASTED DOWNSTREAM BUILDING ON SOMETHING THAT IS A FALSE POSITIVE CLAIM, OR AN EXAGGERATED CLAIM, RATHER THAN THAT THEY WILL SAVE THE WORLD. HOW DO WE SORT OUT WHERE TO GO NEXT? WE NEED REPLICATION. WE NEED TO TAKE ALL OF THESE TENTATIVE DISCOVERIES AND TRY TO REPLICATE THEM AND SEE WHAT STILL SURVIVES DIFFERENT EFFORTS TO REPRODUCE THESE RESULTS, EITHER EXACTLY THE SAME WAY OR WITH DIFFERENT ANGLES OF TRIANGULATION. IS THAT NEW? NOT NECESSARILY. THIS IS PRETTY MUCH HOW SCIENCE STARTED IN THE WESTERN WORLD. YOU HAD TO REPLICATE AND REPRODUCE FINDINGS IN FRONT OF THE ROYAL ACADEMY, AND PEOPLE WERE WATCHING TO SEE WHETHER THE APPARATUS WOULD WORK. CURRENTLY, WE MOSTLY TRUST THAT WHAT WAS DONE BEHIND CLOSED DOORS, OR BEHIND A CLOSED COMPUTER, IS SOMETHING THAT WE CAN PUT TRUST IN. WE DO HAVE EMPIRICAL STUDIES ON FIELDS WHERE REPLICATION IS NOT JUST CONSIDERED TO BE A "ME TOO" TYPE OF EFFORT, AND ACTUALLY REPLICATION PRACTICES ARE COMMON. AND THESE EMPIRICAL STUDIES SUGGEST MOST OF THE INITIALLY CLAIMED STATISTICALLY SIGNIFICANT EFFECTS ARE EITHER FALSE POSITIVES OR SUBSTANTIALLY EXAGGERATED. ONE SUCH FIELD IS GENETICS.
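[THE FOLLOWING IS AN EDITORIAL SKETCH, NOT PART OF THE SPOKEN REMARKS: THE DISCOVERY-VALUE MODEL DESCRIBED ABOVE, RESTATED IN SYMBOLS, WITH R, BF, AND H AS DEFINED BY THE SPEAKER AND THE CONSTANT FOR THE AMOUNT OF EFFORT ABSORBED INTO THE PROPORTIONALITY.]

\[
V \;\propto\; \mathrm{TP} - h\cdot\mathrm{FP} \;\propto\; R\cdot\mathrm{BF} - h,
\qquad\text{so}\qquad V > 0 \iff \mathrm{BF} > \frac{h}{R}.
\]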
GENETIC EPIDEMIOLOGY WENT THROUGH AN IMMENSE TRANSFORMATION OVER THE LAST 10 YEARS, MOVING FROM CANDIDATE GENE STUDIES, WHERE PEOPLE HAD TO COME UP WITH SINGLE HYPOTHESES TO TEST, TO GENOME-WIDE STUDIES THAT TEST IN AGNOSTIC FASHION FOR ASSOCIATION WITH PHENOTYPES. ONE COULD GO BACK, BY DOING THIS, AND ASSESS HOW OFTEN THE PAPERS THAT WERE PUBLISHED IN THE CANDIDATE GENE ERA WERE SUCCESSFULLY REPLICATED; THE REPLICATION RATE ON AVERAGE WAS 1.2%. THIS MAY BE AN UNDERESTIMATE -- I'M WILLING TO TAKE THAT TO 5 TO 10% AT MOST -- BUT MORE THAN 90% OF THESE PAPERS THAT WERE PUBLISHED FOR MANY, MANY YEARS, OVER 20 YEARS, IN THE BEST OF OUR JOURNALS WERE PROBABLY NOT SAYING MUCH AND WERE PROBABLY JUST FALSE POSITIVES. HERE IS ANOTHER EVALUATION: ANIMAL STUDIES. THERE ARE TENS OF THOUSANDS OF ANIMAL STUDIES BEING DONE, AND I THINK THEY ARE TREMENDOUSLY IMPORTANT BECAUSE ANIMAL RESEARCH IS AN ESSENTIAL GATEWAY TO HUMAN CLINICAL RESEARCH. IF WE DECIDE TO MOVE DIRECTLY FROM EARLY BENCH DISCOVERY TO HUMANS, I THINK WE WILL BE TESTING LOTS OF NOISE INADVERTENTLY ON HUMANS, AND WE DON'T WANT TO DO THAT. HOWEVER, YOU'RE ALL AWARE THAT FOR MANY FIELDS WHERE WE HAVE HUNDREDS IF NOT THOUSANDS OF POTENTIAL LEADS FROM ANIMAL STUDIES, LIKE NEUROLOGICAL DISEASE, WE HARDLY HAVE ANY SUCCESS WHEN IT COMES TO HUMANS. FOR EXAMPLE, FOR DEMENTIA OR TREATMENT OF STROKE, WE HAVE HUNDREDS OF INTERVENTIONS THAT SEEM TO WORK IN ANIMAL MODELS, AND HARDLY ANY THAT WORK AT THE HUMAN LEVEL. ONE MIGHT SAY THAT THE EXPLANATION FOR THAT IS THAT ANIMALS ARE VERY DIFFERENT FROM HUMANS. I THINK THAT THERE ARE DIFFERENCES -- I HOPE I'M A LITTLE BIT DIFFERENT THAN A MOUSE, BUT MY GENOME IS NOT THAT DIFFERENT. I THINK MOST OF THE PROBLEM IS NOT NECESSARILY THE DISSIMILARITY OF THE EXPERIMENTAL SYSTEM SO MUCH AS THE WAY RESEARCH IS DONE, IN WAYS THAT LET BIAS CREEP IN. THIS IS A SLIDE FROM A PAPER WHERE WE LOOKED ACROSS THE ENTIRE LITERATURE OF ANIMAL STUDIES ON NEUROLOGICAL DISEASES AND FOUND PROMINENT SIGNALS OF EXCESS SIGNIFICANCE BIAS -- YOU KNOW, PRESSURE TO DELIVER STATISTICALLY SIGNIFICANT RESULTS COULD BE DETECTED ACROSS THAT FIELD -- AND ONCE WE STARTED SORTING OUT DIFFERENT PATTERNS OF BIAS, THERE WERE VERY FEW PIECES OF EVIDENCE THAT REMAINED PRETTY STRONG. IN PRE-CLINICAL RESEARCH, IN THE LAST 8 YEARS WE HAVE SEEN RAPID CHANGE IN OUR UNDERSTANDING ABOUT REPLICATION AND REPRODUCIBILITY. THIS TIME, MOST OF THE LEADS CAME FROM THE INDUSTRY. AND THE INDUSTRY HAD EVERY RIGHT TO FEEL UNCERTAIN ABOUT NOT BEING ABLE TO REPRODUCE PAPERS PUBLISHED IN TOP JOURNALS BY ACADEMIC TEAMS. FOR ACADEMIC INVESTIGATORS, MAYBE THIS IS CURIOSITY, MAYBE THIS IS A PAPER IN "NATURE," MAYBE IT WAS A WAY TO GAIN TENURE. FOR THE INDUSTRY, IT WAS AN ISSUE OF SPENDING HALF A BILLION DOLLARS ON SOMETHING THAT WOULD LEAD NOWHERE. SO WE HAD SEVERAL COMPANIES THAT LAUNCHED REPRODUCIBILITY CHECKS ON HIGH-PROFILE, HIGHLY CITED PAPERS FROM ACADEMIC TEAMS, PRETTY MUCH SUMMARIZING WHAT THEY HAD ALREADY BEEN SEEING AT LARGE SCALE. MOST OF THE RESULTS COULD NOT BE REPRODUCED IN THEIR HANDS; REPRODUCIBILITY RATES RANGED FROM 0 TO 25%. IN ONE SUCH FAMOUS EXAMPLE, WITH AMGEN, GLENN BEGLEY CONCLUDED THAT THE FAILURE TO WIN THE WAR ON CANCER HAS BEEN BLAMED ON MANY FACTORS, BUT RECENTLY A NEW CULPRIT EMERGED: TOO MANY BASIC SCIENTIFIC DISCOVERIES ARE WRONG. WE'VE SEEN SIMILAR CHECKS ACROSS VERY DIFFERENT SCIENTIFIC DISCIPLINES. PSYCHOLOGICAL SCIENCE HAS ALSO GONE THROUGH A MAJOR TRANSFORMATION.
THIS IS A PAPER THAT WAS PUBLISHED THREE YEARS AGO IN "SCIENCE," A COLLABORATION OF 273 PSYCHOLOGISTS AND THEIR TEAMS TRYING TO REPRODUCE 100 OF THE CRÈME DE LA CRÈME PAPERS FROM TOP JOURNALS. YOU COULD READ THE RESULTS IN DIFFERENT WAYS, BUT THE SUMMARY, NO MATTER HOW YOU LOOK AT IT, IS THAT ABOUT 2/3 OF THE TIME THE ORIGINAL RESULT COULD NOT BE REPRODUCED, AND EFFECT SIZES WERE MUCH SMALLER COMPARED TO THE ORIGINAL. WHAT IF YOU ONLY READ "NATURE" AND "SCIENCE"? THIS IS ANOTHER REPRODUCIBILITY CHECK, OF 21 PAPERS; ON AVERAGE THE REPRODUCIBILITY EFFORT REVEALED AN EFFECT SIZE ABOUT 50% OF THE ORIGINAL. AND IN MANY CASES, THERE WAS NOTHING THERE; THE EFFECT WAS COMPLETELY IN THE VICINITY OF THE NULL, WITH TIGHT CONFIDENCE INTERVALS. DOES IT MEAN THAT THE REPRODUCTION WAS CORRECT AND THE ORIGINAL WAS WRONG? IN THESE CASES PEOPLE SYSTEMATICALLY TRIED TO FOLLOW THE EXACT RECIPE OF THE ORIGINAL STUDY AND EVEN COMMUNICATED WITH THE INVESTIGATORS, EVEN TRIED TO MAKE SURE THEY HAD ALL THE CRITICAL DETAILS. THERE COULD BE MANY EXPLANATIONS. IT COULD BE THAT BOTH ARE CORRECT BUT, FOR SOME REASON WE DON'T KNOW, THEY DISAGREE. IT COULD BE THAT BOTH ARE WRONG BECAUSE, AGAIN, THERE ARE SOME BIASES CREEPING IN THAT WE'RE NOT FAMILIAR WITH. OR IT COULD BE THAT ONE OF THEM IS NOT CORRECT. BUT CLEARLY, IF YOU HAVE A SITUATION WHERE UNDER THE VERY BEST EFFORTS TO REPRODUCE YOU CANNOT GET SOMETHING TO WORK AGAIN, ONE HAS TO WONDER, WOULD IT WORK IF WE WERE TO USE THAT WIDELY IN REAL LIFE, IN PATIENTS, IN COMMUNITIES, IN THE REAL WORLD? REPRODUCIBILITY EFFORTS CAN BE TRICKY, AND THEY CAN ALSO BE EMOTIONAL. THEY CAN LEAD TO WHAT THEY CALL THE REPRODUCIBILITY WARS. THE REPRODUCIBILITY PROJECT ON CANCER BIOLOGY, FOR EXAMPLE, STARTED PUBLISHING A NUMBER OF PAPERS WHERE HIGH-PROFILE CANCER BIOLOGY PAPERS WERE METICULOUSLY ATTEMPTED TO BE REPRODUCED; EVERYTHING HAD BEEN PRE-SPECIFIED IN PRE-REGISTERED REPORTS, AND THE RESULTS WERE VERY CLEAR. BUT THEN, MOST OF THE TIME, IF THE ORIGINAL WAS NOT REPRODUCED, THE ORIGINAL INVESTIGATORS WOULD FIGHT BACK, AND YOU END UP IN A SITUATION WHERE YOU FEEL THAT REPUTATION IS AT STAKE, THERE ARE VERY FIERCE EMOTIONS ABOUT WHO IS RIGHT AND WHO IS WRONG, CAREERS ARE THOUGHT TO BE AT STAKE, AND INTERPRETATIONS CAN DIFFER ON WHAT IS SUCCESSFUL AND WHAT IS UNSUCCESSFUL. WE BELIEVE PEOPLE DO CARE ABOUT REPRODUCIBILITY. AND THEY SHOULD CARE. IT'S REALLY A CENTRAL PIECE OF THE SCIENTIFIC METHOD. HOWEVER, WHAT EXACTLY DO WE MEAN BY REPRODUCIBILITY? WHAT IS RESEARCH REPRODUCIBILITY? IF YOU LOOK ACROSS ALL 22 MAJOR DISCIPLINES OF SCIENCE, THERE'S A RAPIDLY INCREASING USE OF THAT TERMINOLOGY, IF YOU DO A TEXT MINING EXERCISE LIKE THE ONE I'M SHOWING HERE, BUT PEOPLE MEAN VERY DIFFERENT THINGS. BASICALLY, YOU CAN SEPARATE REPRODUCIBILITY INTO THREE MAIN CLUSTERS. ONE IS REPRODUCIBILITY OF METHODS, WHICH IS THE ABILITY TO UNDERSTAND OR TO REPEAT AS EXACTLY AS POSSIBLE THE EXPERIMENTAL AND COMPUTATIONAL PROCEDURES -- SO AVAILABILITY OF SOFTWARE, OF SCRIPTS, OF DATA, AND THE ABILITY TO PUT THEM TOGETHER AND GET THE SAME RESULT FROM THE VERY SAME DATA. REPRODUCIBILITY OF RESULTS MEANS THAT WE'RE DOING YET ANOTHER STUDY ON NEW PARTICIPANTS, NEW SAMPLES, NEW OBSERVATIONS, AND WE HOPE TO GET A RESULT THAT IS CONSISTENT, COMPATIBLE, IDEALLY AS CLOSE AS POSSIBLE TO THE ORIGINAL.
AND REPRODUCIBILITY OF INFERENCES MEANS THAT WE HAVE ONE STUDY, A REPLICATION, OR MULTIPLE STUDIES OR A BODY OF EVIDENCE, AND I ASK PEOPLE IN THE AUDIENCE, WHAT DO YOU CONCLUDE OUT OF THIS, AND WE MAY OR MAY NOT AGREE ABOUT WHAT THESE DATA AND WHAT THESE RESULTS MEAN. REPRODUCIBILITY CAN BE AFFECTED BY THE RECIPE OF RESEARCH PRACTICES THAT WE APPLY. AND YOU CAN THINK OF TWO EXTREMES OF RESEARCH PRACTICES, TWO STEREOTYPES. OF COURSE THIS IS A STEREOTYPE -- IT DOESN'T MEAN I HAVE SOME PARTICULAR RESEARCHER IN MIND DOING THINGS WRONG -- BUT IF YOU THINK OF SMALL AND BIG DATA, THIS IS WHAT IT OFTEN LOOKS LIKE. SO WITH SMALL DATA, WHICH IS STILL THE MAJORITY IN MOST SCIENTIFIC FIELDS, WE HAVE THE PROTOTYPE OF THE SOLO, SILOED INVESTIGATOR: SMALL TEAM, VERY LIMITED RESOURCES, LIMITED FUNDING. ONE NEEDS TO BE SUCCESSFUL. YOU KNOW, THESE THREE OR FOUR YEARS OF FUNDING END, AND YOU NEED TO SAY, I HAVE SOMETHING MAJOR TO SAY, AND SOMETHING MAJOR THAT WOULD LEAD ME TO MY NEXT GRANT. ACTUALLY, WE NEED TO GET GOING PROBABLY NOT AFTER THREE YEARS BUT AFTER THREE DAYS TO WRITE THE NEXT GRANT. AND HOW DO YOU DO THAT? MUCH OF THE TIME THE RESULTS WILL BE QUOTE/UNQUOTE NEGATIVE OR UNIMPRESSIVE. YOU NEED TO START EXPLORING, SEARCHING WHATEVER SPACE YOU HAVE AVAILABLE; THERE WILL BE A LOT OF CHERRY PICKING OF THE BEST-LOOKING RESULTS, A LOT OF POST HOC INTERPRETATION. P-VALUES OF LESS THAN 0.05 ARE CONSIDERED A WIN, AND WITH P-HACKING IT'S POSSIBLE TO GET THERE. THERE'S NO REGISTRATION BECAUSE THAT DECREASES THE DEGREES OF FREEDOM, NO DATA SHARING BECAUSE IT OFFERS AMMUNITION TO COMPETITORS, AND NO REPLICATION BECAUSE IF YOU TRY AND FAIL YOU'VE INVESTED TWICE THE EFFORT AND YOU'RE BACK TO SQUARE ZERO, AND PEOPLE SAY YOU FOUND NOTHING, SO THAT'S IT FOR YOU. SOME OF THE WAYS TO IMPROVE THE VALIDITY OF SMALL SAMPLE SIZE RESEARCH ARE VERY EASY TO IMPLEMENT. THEY COST NOTHING. THEY WOULD SAVE US FROM A LOT OF TROUBLE. HOWEVER, THEY ARE NOT IMPLEMENTED. FOR EXAMPLE, WE HAVE KNOWN ABOUT EXPERIMENTER BIAS FOR OVER HALF A CENTURY NOW; ROSENTHAL PUBLISHED PAPERS MORE THAN HALF A CENTURY AGO. AND WE KNOW THAT IN ANIMAL EXPERIMENTS OR IN OTHER IN VIVO STUDIES, BUT ALSO IN VITRO, IT'S GREAT TO HAVE THE PEOPLE WHO READ THE RESULTS BE BLINDED TO THE EXPERIMENTAL CONDITIONS; HOWEVER, THAT HAPPENS LESS THAN 10% OF THE TIME. RANDOMIZATION IS NOT ABOUT HUMANS HERE, IT'S ABOUT ANIMAL WORK OR OTHER EXPERIMENTS. IT COSTS NOTHING. IT'S VERY EASY TO DO. IT WOULD SAVE US FROM LOTS OF TROUBLE FROM IMBALANCES THAT WOULD BE CONSCIOUSLY OR SUBCONSCIOUSLY CREATED. AGAIN, LESS THAN 30% OF THE TIME, EVEN IN THE MOST RECENT STUDIES, IS THIS IMPLEMENTED. WITH BIG DATA WE'RE CHALLENGING THE STATUS QUO OF SMALL-STUDY RESEARCH, BUT IT DOESN'T MEAN THAT OUR ODDS OF SUCCESS ARE NECESSARILY BETTER. IN THAT SITUATION, WHICH IS BECOMING MORE COMMON WITH ELECTRONIC HEALTH RECORDS, WITH LARGE OMICS DATABASES, AND OTHER SUCH EFFORTS, WE HAVE EXTREMELY LARGE SAMPLE SIZES. WE HAVE OVERPOWERED STUDIES THAT ARE LIKELY TO GIVE SIGNALS NO MATTER WHAT; EVEN IF NO SIGNALS ARE WORTH DETECTING, YOU WILL DETECT TONS OF THEM. THIS MEANS, AGAIN, THERE NEEDS TO BE A CHERRY PICKING PROCESS, AND, AGAIN, A LOT IS DONE POST HOC. MANY -- MOST OF THESE DATABASES ARE NOT EVEN ASSEMBLED FOR RESEARCH PURPOSES. THEY HAPPEN TO ACCRUE OVERNIGHT. I'M SLEEPING, AND I WAKE UP IN THE MORNING, AND THERE ARE SO MANY MORE PATIENTS THAT HAVE BEEN ADDED TO THE ELECTRONIC HEALTH RECORDS THAT ONE OR MORE RESEARCHERS COULD TAP INTO. STATISTICAL INFERENCE TOOLS ARE A BIT DIFFERENT.
PEOPLE RECOGNIZED VERY QUICKLY THAT IF THEY JUST USE A P-VALUE OF LESS THAN 0.05, EVERYTHING WILL BE STATISTICALLY SIGNIFICANT, SO THERE'S A LITTLE BIT MORE SOPHISTICATION MUCH OF THE TIME IN THAT SPACE, BUT VERY OFTEN YOU SEE IDIOSYNCRATIC TOOLS WITHOUT CONSENSUS: PEOPLE WORKING IN THE SAME OR SIMILAR FIELDS USE DIFFERENT APPROACHES AND DIFFERENT THRESHOLDS FOR CLAIMING SUCCESS. REGISTRATION REMAINS THE EXCEPTION, ALSO IN THAT SPACE, AND DATA SHARING DOES HAPPEN, PROBABLY MORE OFTEN THAN IN FIELDS THAT USE SMALL DATASETS, BUT UNFORTUNATELY MUCH OF THE TIME THERE'S NO UNDERSTANDING OF WHAT IS BEING SHARED. MOST OF THE TIME DATASETS ARE DUMPED SOMEWHERE WITH SOME ACCESS, WHERE SOMEONE COULD MORE OR LESS EASILY GET ACCESS, AND YOU ASK, DOES ANYONE KNOW WHAT THE DATA ARE AND WHAT THEY SHOW? LITERALLY, EVEN THE INVESTIGATOR WHO GENERATED THE DATA PROBABLY DOES NOT KNOW, BECAUSE IT'S JUST A BIG BLACK BOX WHERE IT IS VERY, VERY HARD TO INTERPRET WHAT EXACTLY HAS BEEN GENERATED, LET ALONE HOW VALID IT MIGHT BE. REPLICATION DOES HAPPEN IN SOME FIELDS, AND I SHOWED YOU SOME EXAMPLES. AND EVEN WHEN IT DOES NOT HAPPEN, WE MAY HAVE A SITUATION WHERE PEOPLE SAY, I'M DOING A STUDY THAT IS SOMEHOW DIFFERENT -- ACTUALLY THERE'S AN INCENTIVE TO JUSTIFY THAT I'M DOING A DIFFERENT STUDY -- BUT YOU LOOK AT THE TWO STUDIES AND SAY, YOU'RE ASKING EXACTLY THE SAME QUESTION; BUT TO GET IT PUBLISHED, SOMEONE HAS TO SAY IT IS DIFFERENT. META-ANALYSIS CAN TAKE SIMILAR STUDIES AND EXAMINE THEM IN THEIR TOTALITY: STUDIES ON THE SAME QUESTION, WITH PRETTY MUCH THE SAME COMPARISONS, WITH PRETTY MUCH THE SAME INTERVENTIONS OR THE SAME RISK FACTORS, OR THE SAME HYPOTHESES BEING ADDRESSED, AND GIVE US A SENSE OF THE HETEROGENEITY BETWEEN THE DIFFERENT STUDIES THAT ADDRESS FAIRLY SIMILAR QUESTIONS. HETEROGENEITY MAY BE GENUINE; MUCH OF THE HETEROGENEITY MAY REFLECT GENUINE DIFFERENCES ACROSS THE STUDIES. HOWEVER, IT CAN ALSO POINT OUT DIFFERENCES THAT STEM FROM BIASES THAT ARE DIFFERENTIALLY EXPRESSED, OR THAT AFFECT DIFFERENT INVESTIGATIONS ON THE SAME QUESTION DIFFERENTLY. IF THAT'S THE CASE, WE SHOULD BE ABLE TO PICK UP SOME HINTS OF THE PRESENCE OF A BIAS, BECAUSE IT LEAVES A PATTERN OF A PARTICULAR TYPE OF DIVERSITY, A PARTICULAR TYPE OF HETEROGENEITY, ACROSS THE RESULTS THAT ARE BEING COMBINED IN THE META-ANALYSIS. SO THIS IS PRETTY MUCH WHAT WE DID HERE. WE PRE-SPECIFIED 17 PATTERNS OF BIAS AND ASKED, WHAT WOULD THE LITERATURE LOOK LIKE IF THESE BIASES WERE PRESENT, AND FOR ALL THE META-ANALYSES WE TRIED TO ASK, DO WE SEE THESE PATTERNS? MOST BIASES COULD BE SEEN IN MOST FIELDS. HOWEVER, SOME BIASES SEEMED TO HAVE MORE COMMON HINTS OF PRESENCE IN SOME FIELDS COMPARED TO OTHERS. FOR EXAMPLE, SMALL-STUDY EFFECTS, WHICH IS A PATTERN WHERE SMALL STUDIES GIVE MORE PROMINENT, PERHAPS EXAGGERATED RESULTS COMPARED TO LARGER STUDIES, WERE SEEN VERY PROMINENTLY IN THE SOCIAL SCIENCES, SEEN PROMINENTLY IN THE BIOMEDICAL SCIENCES, AND NOT REALLY SEEN, OR ONLY VERY SOFT SIGNALS WERE SEEN, IN THE PHYSICAL SCIENCES. THE PHYSICAL SCIENCES ARE MUCH MORE USED TO WORKING WITH LARGE DATASETS, WITH COMMUNAL SCIENCE, WITH CERN OR BIG TELESCOPES SHARED AMONG PHYSICISTS OR ASTROPHYSICISTS WORKING ON A PARTICULAR DOMAIN OF SCIENCE. THE SAME APPLIED TO OTHER DISCIPLINES: SOMETIMES ONE OR ANOTHER FIELD HAD MORE OF THE SIGNAL OF ONE BIAS OR ANOTHER, BUT MOST BIASES WERE SEEN ACROSS SCIENTIFIC FIELDS.
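[THE FOLLOWING IS AN EDITORIAL SKETCH, NOT PART OF THE SPOKEN REMARKS AND NOT THE EXACT METHOD OF THE PAPERS CITED: ONE WAY A BIAS PATTERN SUCH AS EXCESS SIGNIFICANCE CAN BE PROBED ACROSS THE STUDIES IN A META-ANALYSIS, BY COMPARING THE OBSERVED NUMBER OF NOMINALLY SIGNIFICANT STUDIES WITH THE NUMBER EXPECTED FROM EACH STUDY'S POWER. ALL NUMBERS BELOW ARE MADE UP FOR ILLUSTRATION.]

# Sketch of an excess-significance check, in the spirit of the bias diagnostics in the talk.
# effects/ses are made-up summary statistics for the studies of one meta-analysis.
import numpy as np
from scipy.stats import norm, binomtest

effects = np.array([0.80, 0.65, 0.90, 0.70, 0.55, 0.20])   # e.g. log odds ratios
ses     = np.array([0.35, 0.30, 0.40, 0.32, 0.28, 0.10])   # their standard errors
alpha   = 0.05
z_crit  = norm.ppf(1 - alpha / 2)

# Take the most precise (largest) study's estimate as the assumed true effect.
theta = effects[np.argmin(ses)]

# Power of each study to detect theta in a two-sided z-test at level alpha.
power = norm.cdf(theta / ses - z_crit) + norm.cdf(-theta / ses - z_crit)

observed = int(np.sum(np.abs(effects / ses) > z_crit))   # O: nominally significant studies
expected = float(power.sum())                            # E: expected if there were no bias

# Approximate test of whether O exceeds what the mean per-study power would predict.
p_excess = binomtest(observed, n=len(effects), p=float(power.mean()),
                     alternative="greater").pvalue
print(f"O = {observed}, E = {expected:.2f}, excess-significance p = {p_excess:.4f}")

[IN THIS TOY EXAMPLE, THE SMALL, IMPRECISE STUDIES ARE ALL NOMINALLY SIGNIFICANT WHILE THE MOST PRECISE STUDY SUGGESTS A MUCH SMALLER EFFECT -- THE KIND OF SMALL-STUDY PATTERN DESCRIBED ABOVE.]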
SOLUTIONS ARE INTEGRAL TO THE PROCESS OF HOW WE DO SCIENCE, AND YOU CAN THINK OF REPRODUCIBILITY PROBLEMS AS A CENTRAL CONCERN IN EVERYDAY EXPERIMENTATION, IN EVERYDAY STUDY DESIGN, IN EVERYDAY WAYS OF RUNNING SCIENCE. LOTS OF PEOPLE ARE THINKING ABOUT THEM, AND SOME VERY SMART SOLUTIONS HAVE APPEARED, BUT ALSO MANY OF THEM ARE VERY SPECULATIVE. THEY HAVE NO EMPIRICAL SUPPORT TO TELL US THAT THEY WILL MAKE THINGS BETTER RATHER THAN MAKE THINGS WORSE BECAUSE OF COLLATERAL DAMAGE THAT THEY MAY CAUSE. HERE ARE 12 FAMILIES OF SOLUTIONS: LARGE-SCALE COLLABORATION, ADOPTION OF A REPLICATION CULTURE AS A SINE QUA NON, STANDARDIZATION OF ANALYSES AND OF THRESHOLDS FOR CLAIMING DISCOVERY OR SUCCESS, IMPROVEMENT OF STUDY DESIGN STANDARDS, IMPROVEMENT IN THE DISSEMINATION OF RESEARCH, AND BETTER TRAINING OF THE SCIENTIFIC WORKFORCE IN METHODS AND HOW THEY SHOULD BE APPLIED, AND IN STATISTICAL LITERACY AND NUMERACY. I HOPE I'VE CONVINCED YOU BY NOW THAT SOME OF THE MOST SUCCESSFUL FIELDS IN TERMS OF CREDIBILITY ARE THOSE THAT ADOPTED LARGE-SCALE COLLABORATION AND A REPLICATION CULTURE. THIS IS ONE EXAMPLE OF MANHATTAN PLOTS: GENETIC EPIDEMIOLOGY TRANSFORMED INTO A FIELD WHERE WE CAN SAFELY REPRODUCE SIGNALS. I'M NOT DELVING INTO WHETHER THE SIGNALS ARE USEFUL, WHETHER THAT'S SOMETHING THAT YOU CAN TAKE TO PATIENTS AND CHANGE THEIR LIVES, BUT AT LEAST THEY ARE THERE. SOMETIMES THE SIGNALS MAY BE TRUE BUT MAYBE NOT USEFUL. THIS IS PERFECTLY FINE. YOU KNOW, WE KNOW WHAT WE'RE DEALING WITH. WE KNOW WHAT THE TRUE COMPLEXITY IS OF THE RESEARCH QUESTIONS THAT WE'RE FACING. REGISTRATION CAN BE TRICKY IN MANY SITUATIONS. THE LAST THING I WOULD LIKE TO SEE IS PEOPLE WHO ARE DOING VERY INTERESTING EXPLORATORY RESEARCH FEELING THAT THEY NEED TO SAY, I HAVE REGISTERED MY STUDIES. IF YOU WAKE UP AT 3:00 AFTER MIDNIGHT WITH A FANCY IDEA, AND YOU CANNOT GO BACK TO SLEEP AND NEED TO RUN TO THE LAB TO TRY IT OUT, AND YOU ARE JUST MESSING AROUND RIGHT AND LEFT WITH DIFFERENT POSSIBILITIES, PROBABLY YOU DIDN'T WRITE A PROTOCOL BEFORE YOU STARTED DOING THAT, BUT SOMETHING INTERESTING MAY COME OUT OF IT, AND MAYBE YOU'RE THE NEXT ALEXANDER FLEMING WITH PENICILLIN. HOW LIKELY IS THAT TO BE THE CASE? NOT VERY LIKELY, BUT THERE ARE 20 MILLION SCIENTISTS AUTHORING SCIENTIFIC PAPERS, SO A FEW AMONG THOSE 20 MILLION WILL BE ALEXANDER FLEMINGS, HOPEFULLY. WHAT WE NEED TO DO IN THAT CASE IS JUST SAY THAT THAT WAS EXPLORATION -- IT WAS MAD, WILD, CRAZY, EXCITING, FASCINATING EXPLORATION. THAT'S WHAT CAME OUT. NOW SOMEONE NEEDS TO REPRODUCE IT IN A PROSPECTIVE REPRODUCIBILITY EFFORT. LEVEL 1 WOULD BE REGISTRATION OF A DATASET. I FEEL UNEASY WHEN I'M WORKING IN A FIELD WHERE I DON'T KNOW HOW MANY DATASETS ARE OUT THERE THAT COULD BE PROBED AND HOW MANY DIFFERENT WAYS THEY CAN BE PROBED. REGISTERING A DATASET IS LIKE REGISTERING A NUCLEAR ARSENAL. SO I'M TELLING YOU THAT I HAVE THIS BIG DATASET IN MY COMPUTER, IT INCLUDES OBSERVATIONS ON TWO MILLION PEOPLE, AND I HAVE 2,000 VARIABLES ON EACH ONE, WHICH MEANS THAT TONIGHT, IF I FEEL DEPRESSED, I CAN PRESS A BUTTON ON SOME STATISTICAL SOFTWARE AND RUN SO MANY BILLIONS OR TRILLIONS OF CHI-SQUARE P-VALUES AGAINST YOU. SO IT'S ONE WAY TO CONVEY THE BREADTH OF POSSIBILITIES OF ANALYSIS THAT CAN BE DONE. LEVEL 2 WOULD BE REGISTRATION OF A PROTOCOL, IF A PROTOCOL EXISTS; THERE'S LOTS OF RESEARCH WHERE THERE'S NO PROTOCOL, AND SOMETIMES IT'S JUST EXPLORATION, BUT MOST OF THE TIME A PROTOCOL IS FEASIBLE. TIME SPENT COMING UP WITH A PROTOCOL IS ALWAYS WELL SPENT. LEVEL 3 WOULD BE REGISTRATION OF AN ANALYSIS PLAN, IF THERE IS ONE.
SOMETIMES YOU HAVE TO SAY, THIS IS THE ANALYSIS PLAN THAT I COULD THINK OF AHEAD OF TIME, AND THESE ARE SOME MODIFICATIONS BECAUSE OF SOME PECULIARITIES I HAD NOT ANTICIPATED. LEVEL 4 WOULD BE REGISTRATION OF BOTH THE ANALYSIS PLAN AND THE RAW DATA. AND LEVEL 5 WOULD BE OPEN LIVE STREAMING, WHERE YOU ITERATIVELY COMMUNICATE WITH THE SCIENTIFIC COMMUNITY ABOUT WHAT EXPERIMENTS I'M PLANNING TO DO; ONE RECEIVES FEEDBACK, THEY ARE REVISED, SHARED, DONE AGAIN -- AN ITERATIVE, OPEN PROCESS WITH THE WHOLE COMMUNITY. THIS IS PRETTY MUCH HOW THE CLAIM BY NASA THAT THEY HAD IDENTIFIED BACTERIA USING ARSENIC INSTEAD OF PHOSPHORUS FOR THEIR DNA BACKBONE WAS REFUTED -- A PAPER IN "SCIENCE." IN THE ABSENCE OF HAVING SOME RULES IN THE SCIENCE GAME, IN MANY FIELDS ANY RESULT CAN BE OBTAINED. WHAT YOU GET IS WHAT I CALL EXTREME VIBRATION OF EFFECTS, AND AT THE EXTREME YOU GET THE JANUS PHENOMENON, AFTER THE ROMAN GOD WHO COULD SEE IN TWO OPPOSITE DIRECTIONS. THESE ARE DATA FROM A NATIONAL SURVEY, METICULOUSLY COLLECTED. FOCUSING ON THE RIGHT PANEL -- THE HAZARD RATIO ON THE HORIZONTAL AXIS, THE MINUS LOG P-VALUE ON THE VERTICAL -- THIS IS THE ASSOCIATION OF ALPHA-TOCOPHEROL, OR VITAMIN E, LEVELS WITH THE RISK OF DEATH, WITH ONE MILLION POINTS ON THE PLOT: ONE MILLION RESULTS ONE COULD GET FROM THE SAME DATASET, ON THE VERY SAME QUESTION, JUST BY ANALYZING THE DATA SLIGHTLY DIFFERENTLY. FOR EXAMPLE, DEATH CAN BE AFFECTED BY ZILLIONS OF FACTORS, SO WHETHER YOU ACCOUNT FOR A FACTOR OR NOT GIVES YOU TWO CHOICES. IF YOU HAVE 19 SUCH CHOICES TO MAKE, THIS IS 2 TO THE 19TH POWER -- ON THE ORDER OF A MILLION DIFFERENT POSSIBILITIES OF ANALYZING THE SAME DATA, ON THE SAME DATASET, ON THE SAME QUESTION. 70% OF THE TIME VITAMIN E DECREASES THE RISK OF DEATH; 30% OF THE TIME, VITAMIN E INCREASES THE RISK OF DEATH. IF I HAVE A STRONG BELIEF ABOUT WHAT VITAMIN E SHOULD DO, I CAN GET THAT RESULT, NO MATTER WHAT. THIS IS REALLY HAPPENING ON A DAILY BASIS. IT IS HAPPENING IN SOME OF THE MOST INFLUENTIAL PAPERS THAT YOU WILL SEE IN THE LITERATURE. THIS IS A PAPER THAT LAST YEAR WAS ONE OF THE 20 HIGHEST-IMPACT PAPERS ACROSS ALL OF SCIENCE, PRACTICALLY CONCLUDING THAT WITH THREE CUPS OF COFFEE PER DAY YOUR RISK OF DEATH DECREASES BY 17%. IF THEY GIVE ME THE DATASET, I COULD MAKE IT INCREASE BY 50% -- IT'S AN OPEN PLEDGE. TRANSPARENCY: HOW CAN WE FIND OUT THAT WE CAN EVEN TRUST THE DATA, THAT THERE'S NOT SOMETHING MISSING, FOR EXAMPLE? WE NEED TO FIND WAYS TO MAKE SHARING EASIER. THIS IS A PIVOTAL STUDY, STUDY 329 -- SOUNDS LIKE A SUBMARINE, BUT IT'S A RANDOMIZED TRIAL. WHEN IT WAS DONE AND PUBLISHED IN 2001 BY SMITHKLINE BEECHAM, IT SHOWED THAT THE DRUG WAS EFFECTIVE AND SAFE FOR MAJOR DEPRESSION IN ADOLESCENTS. 15 YEARS LATER, A REANALYSIS OF THE SAME TRIAL CONCLUDED THAT IT WAS NOT EFFECTIVE AND NOT SAFE FOR MAJOR DEPRESSION IN ADOLESCENTS. HOW OFTEN DOES THAT HAPPEN? A FEW YEARS AGO WE LOOKED AT REANALYSES THAT HAD BEEN DONE ON THE SAME CLINICAL QUESTION FROM THE SAME CLINICAL TRIAL DATASET. THESE ARE REANALYSES RATHER THAN REPLICATIONS, BUT IF WE CANNOT REANALYZE AND HAVE SOME CONFIDENCE, WHY SHOULD WE EVEN TAKE THE SECOND STEP OF DOING A SEPARATE, INDEPENDENT REPLICATION IN A NEW STUDY? WE FOUND 37 SUCH REANALYSES, AND 35% OF THE TIME THE MAIN CONCLUSION OF THE REANALYSIS WAS DIFFERENT COMPARED TO THE CONCLUSION OF THE ORIGINAL PAPER. THIS TREATMENT SHOULD BE USED; NO, THIS TREATMENT SHOULD NOT BE USED; IN THIS SUBGROUP; NO, IN THAT SUBGROUP. WAS IT RESEARCH PARASITES WHO HAD RUN THESE REANALYSES? WAS IT ROGUE ANALYSTS TRYING TO MAKE A CAREER FOR THEMSELVES OR PUT THE GREAT ORIGINAL INVESTIGATORS TO SHAME?
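[THE FOLLOWING IS AN EDITORIAL SKETCH, NOT PART OF THE SPOKEN REMARKS AND NOT THE SURVEY ANALYSIS DESCRIBED ABOVE: A TOY ILLUSTRATION OF VIBRATION OF EFFECTS, WHERE K BINARY INCLUDE/EXCLUDE CHOICES FOR COVARIATE ADJUSTMENT YIELD 2^K DIFFERENT ESTIMATES FOR THE SAME EXPOSURE-OUTCOME QUESTION. THE SIMULATED DATA, THE COVARIATES, AND K=8 ARE ARBITRARY ASSUMPTIONS FOR ILLUSTRATION.]

# Toy vibration-of-effects sketch: enumerate every covariate-adjustment choice and
# record the exposure coefficient that each specification produces (simulated data only).
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 8
covars = rng.normal(size=(n, k))                       # candidate adjustment variables
exposure = 0.3 * covars[:, 0] + rng.normal(size=n)     # exposure, partly confounded
outcome = 0.05 * exposure + covars @ rng.normal(0, 0.2, size=k) + rng.normal(size=n)

estimates = []
for r in range(k + 1):
    for subset in combinations(range(k), r):           # one of the 2**k analytic choices
        X = np.column_stack([np.ones(n), exposure, covars[:, list(subset)]])
        beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
        estimates.append(beta[1])                       # coefficient on the exposure

estimates = np.array(estimates)
print(f"{len(estimates)} specifications (2**{k}) for the same question")
print(f"exposure effect ranges from {estimates.min():.3f} to {estimates.max():.3f}")
print(f"{(estimates > 0).mean():.0%} of estimates are positive, "
      f"{(estimates <= 0).mean():.0%} are negative or zero")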
ALMOST ALWAYS IT WAS THE SAME INVESTIGATORS WHO REPUBLISHED, BUT IN A PUBLICATION ENVIRONMENT WHERE, IF THEY WERE TO SAY, I TRIED TO REANALYZE MY DATA AND FOUND EXACTLY THE SAME THING AND REACHED THE SAME CONCLUSION, SOMEONE WOULD TELL THEM, WHY DID YOU WASTE YOUR TIME? THIS IS DUPLICATE PUBLICATION. SO, OUR INCENTIVE SYSTEM SELECTS FOR CONFUSION, SELECTS FOR DISCORDANT RESULTS, EVEN FOR THINGS THAT ARE DONE BY THE SAME INVESTIGATORS. RECENTLY WE REVISITED THAT PATTERN, LOOKING AT ANOTHER EXTREME WHERE REANALYSIS ACTUALLY SHOULD HAVE BEEN THE DEFAULT. PLOS MEDICINE AND BMJ HAVE POLICIES IN PLACE THAT IF YOU PUBLISH A RANDOMIZED TRIAL, YOU NEED TO MAKE NOT ONLY THE FULL PROTOCOL BUT ALL THE DATA AVAILABLE TO ANYONE WHO WOULD ASK FOR THEM. ALONG WITH COLLEAGUES FROM MY METRICS TEAM, WE INVITED INVESTIGATORS WHO PUBLISHED UNDER THIS POLICY TO SHARE THE DATA WITH US, AND WE PROMISED WE WOULD REANALYZE THEM AT NO COST, FOR FREE. IT'S LIKE GETTING AN INVITATION FROM THE IRS. [LAUGHTER] BUT I'M REALLY GLAD THAT ALMOST 50% OF THESE INVESTIGATORS ACTUALLY DID SEND US THEIR DATA, AND THEY ALSO WERE VERY HELPFUL WITH THAT, AND WE REANALYZED AND FOUND A FEW ERRORS BUT NOTHING MAJOR. THE CONCLUSIONS WOULD STILL REMAIN THE SAME. ONE EXTREME IS A HIGHLY SELECTIVE ENVIRONMENT WHERE PEOPLE ARE JUST TRYING TO IMPRESS AND SHARE VERY LITTLE, AND THE OTHER IS AN ENVIRONMENT, ALSO VERY SELECTIVE, WHERE PEOPLE ARE WILLING TO SHARE -- 50% WERE WILLING TO SHARE -- AND THEY ARE VERY OPEN TO HAVING THEIR DATA REANALYZED, AND EVERYTHING LOOKS FINE. ONE MIGHT ARGUE THAT MAYBE THESE PEOPLE WHO DID CONTRIBUTE DATA TOOK AN EXTRA LOOK, AND IF THEY FOUND MAJOR ERRORS, MADE SURE THEY SENT US A VERSION THAT WOULD BE COMPATIBLE. I DON'T WANT TO BECOME PARANOID. I BELIEVE THAT IF WE CREATE A CULTURE OF SHARING, MOST OF THE RESULTS THAT WE WILL GET, IF PEOPLE ARE WELL TRAINED, HOPEFULLY WILL BE REANALYZABLE AND REPRODUCIBLE AT THAT LEVEL. SO HOW DO WE IMPROVE SHARING? SHARING IS A CHALLENGE IN ITSELF, AND WHEN WE TRIED TO RETRIEVE THE DATASETS OF THE MOST HIGHLY CITED PAPERS -- THE DATA BEHIND THE MOST CITED PAPERS IN PSYCHOLOGY AND PSYCHIATRY -- WE MET WITH QUITE A LOT OF RESISTANCE. YOU KNOW, THESE TRIALISTS AND THESE STUDIES WERE NOT BOUND BY THE PLOS MEDICINE AND BMJ RULES AND COULD OR COULD NOT SHARE, MIGHT OR MIGHT NOT SHARE THEIR DATA WITH US. WE REALIZED THAT IN SOME CASES THEY HAD MADE THEIR DATA AVAILABLE ALREADY, AND IN A FEW MORE THEY WERE WILLING TO SHARE THAT INFORMATION. BUT VERY OFTEN THEY COULD NOT OR DID NOT. WHAT ARE THE REASONS FOR THAT? THE MOST COMMON REASON WAS THAT IT WAS OUTSIDE OF THE RESEARCHERS' CONTROL. I HAVE COME ACROSS MANY, MANY SITUATIONS WHERE RESEARCHERS DID NOT CONTROL THEIR OWN DATA. SOMETIMES RESEARCHERS PUBLISHED AS FIRST OR LAST AUTHORS OR BOTH, AND THEY HAD NEVER SEEN THE DATA THEY PUT THEIR NAMES ON. IT'S REALLY SCARY. LEGAL AND ETHICAL CONCERNS: THE CONSENT DID NOT ALLOW IT, IT WAS DONE IN THE '90s OR SO. THE DATA NO LONGER EXIST -- THE CLASSIC EXAMPLE, THE DATA WERE EATEN BY TERMITES, A FAMOUS QUOTE. RESEARCHERS ARE STILL USING THE DATA -- FOR HOW LONG? SIX MONTHS, TWO YEARS, TWENTY YEARS? ARE WE MAKING PROGRESS IN SHARING? WE ARE. A FEW YEARS AGO, ALONG WITH SHERI SCHULLY FROM NIH, WE LOOKED AT REPRODUCIBLE RESEARCH AND TRANSPARENCY PRACTICES AND FOUND HARDLY ANYTHING WAS SHARED BETWEEN 2000 AND 2014. SOME FIELDS LIKE GENOMICS ARE DOING THIS, BUT IN THE BIG VIEW, ZOOMING OUT, IT WAS VERY, VERY UNCOMMON. CONVERSELY, WHEN WE LOOKED AT 2015 TO 2017, THERE WAS REAL PROGRESS.
AND IN SOME CASES THAT PROGRESS CAN BE EXPLAINED BECAUSE JOURNALS DID CHANGE POLICIES -- FOR EXAMPLE IN PSYCHOLOGY, SOME JOURNALS SWITCHED TO ENCOURAGING ROUTINE SHARING, AND YOU SEE A RAPIDLY INCREASING RATE OF SHARING OF DATASETS IN THE PAPERS PUBLISHED IN THOSE JOURNALS -- BUT IN BIOMEDICINE I THINK IT'S BEEN MORE OF A DIVERSE MOVEMENT, WHERE MULTIPLE JOURNALS, FIELDS, AND INSTITUTIONS ARE INCENTIVIZING OR FACILITATING SHARING, GOING FROM 0% TO SOMETHING LIKE 25% BY 2017. WE ALSO SEE SOME OTHER CONCOMITANT CHANGES IN TRANSPARENCY INDICATORS. DISCLOSURE OF FUNDING, FOR EXAMPLE, OR DISCLOSURE OF CONFLICTS OF INTEREST HAS GONE UP OVER THE YEARS. MOST PEOPLE STILL CLAIM THEY HAVE SOMETHING NOVEL TO SAY -- EVEN IF YOU LOOK AT THE ABSTRACT IT'S VERY PROMINENT -- BUT THERE ARE MORE PEOPLE WHO SAY, I'M TRYING TO REPLICATE SOMETHING, OR AT LEAST, I'M TRYING TO DO SOMETHING BUT AT THE SAME TIME REPLICATE EXISTING KNOWLEDGE. COMPUTATIONAL METHODS CAN FACILITATE MUCH OF OUR REPRODUCIBILITY QUEST. A COUPLE OF YEARS AGO WE PUBLISHED GUIDELINES TRYING TO ENHANCE THE WAYS THAT JOURNALS, INVESTIGATORS, AND INSTITUTIONS COULD MOVE FORWARD IN IMPROVING REPRODUCIBILITY FOR COMPUTATIONAL METHODS, INCLUDING SOFTWARE, SCRIPTS, AND LINKS TO DATA. VERY OFTEN YOU SEE A LINK, AND YOU CLICK ON THAT LINK AND THERE'S NOTHING THERE; YOU GET AN ERROR SIGNAL. SO THERE ARE SOME VERY EASY INTERVENTIONS THAT CAN IMPROVE THE AVAILABILITY OF THESE FUNCTIONAL LINKS, BUT THERE ARE ALSO MORE SOPHISTICATED ONES THAT CAN TAKE US SUBSTANTIALLY FURTHER. BETTER STATISTICS AND METHODS: WE HAVE SEEN A TRANSFORMATION OF RESEARCH OVER THE LAST SEVERAL DECADES, AND DATA SCIENCE HAS TAKEN A CENTRAL ROLE ACROSS MULTIPLE SCIENTIFIC FIELDS. IS THAT NEW? WELL, SCIENCE HAS ALWAYS BEEN ABOUT DATA. BUT I THINK THAT WHILE IN THE PAST IT MIGHT HAVE BEEN EASY TO WORK WITH SMALL DATASETS FOR DESCRIPTIVE PURPOSES, CURRENTLY YOU REALLY NEED A LICENSE TO KILL -- A LICENSE TO ANALYZE. AND MOST OF THE TIME, MOST OF THE PAPERS THAT I SEE ARE PROBABLY DONE BY INVESTIGATORS WHO DON'T HAVE THAT LICENSE TO ANALYZE. IT'S VERY UNCOMMON TO SEE TRANSPARENT STATISTICAL ANALYSIS PLANS. WE LAMENT THAT WE DON'T INVEST MUCH IN TRAINING OUR INVESTIGATORS IN STATISTICS AND ISSUES OF DESIGN, AND THAT GOOD STUDY DESIGNS ARE UNDERUTILIZED. I MENTIONED HOW THE VERY SIMPLE PRACTICES OF RANDOMIZATION AND BLINDING OF INVESTIGATORS ARE SO UNDERUSED. HOW DO WE DO THAT? JUST USE A CHECKLIST, ADD A LEVEL OF BUREAUCRACY, AND SAY THAT FILLING IN THAT CHECKLIST IS ENOUGH? IF SOMEONE HAS DONE SOMETHING IN A WAY THAT IS VERY SUBSTANDARD, IF IT'S REALLY SILLY, HOW LIKELY IS IT THAT THIS WILL BE ACKNOWLEDGED, RATHER THAN SOMEONE JUST CHECKING THE BOX THAT SAYS, I'M OKAY, I WAS USING THE RIGHT STATISTICAL METHODS? IT'S A CONTINUOUS TENSION. THERE ARE MANY PLEAS TO TRY TO CHANGE THE WAY WE MAKE STATISTICAL INFERENCES, AND I'M NOT GOING TO SPEND MUCH TIME ON THEM BECAUSE EACH ONE OF THEM HAS A DIFFERENT PHILOSOPHY BEHIND IT. ONE SUCH PLEA IS TO BECOME MORE STRINGENT. MANY FIELDS THAT HAVE IMPROVED THEIR TRACK RECORD DID BECOME MORE STRINGENT; GENETIC EPIDEMIOLOGY MOVED FROM 0.05 TO 10 TO THE MINUS 9, AND THINGS SEEM TO BE WORKING BETTER IN TERMS OF REPRODUCIBILITY. FOR FIELDS WITH P-VALUES OF 0.05, YOU COULD MOVE TO 0.005 FOR STATISTICAL SIGNIFICANCE; THAT WOULD IMMEDIATELY ELIMINATE PROBABLY A LARGE SEGMENT OF NOISE, BUT IT WOULD ALSO TAKE AWAY SOME GENUINE SIGNAL. SO THIS NEEDS TO BE BALANCED IN EACH FIELD IN TERMS OF WHETHER IT'S A GOOD IDEA OR NOT.
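[THE FOLLOWING IS AN EDITORIAL SKETCH, NOT PART OF THE SPOKEN REMARKS: ONE STANDARD WAY TO SEE THE TRADE-OFF JUST DESCRIBED. THE CHANCE THAT A NOMINALLY SIGNIFICANT FINDING IS A TRUE POSITIVE DEPENDS ON THE PRE-STUDY ODDS R, THE POWER, AND THE ALPHA THRESHOLD; THE NUMBERS BELOW ARE ILLUSTRATIVE ASSUMPTIONS, AND AT A FIXED SAMPLE SIZE A STRICTER ALPHA ALSO LOWERS POWER, WHICH IS THE GENUINE SIGNAL THAT CAN BE LOST.]

# Post-study probability that a "significant" claim is true, given pre-study odds R,
# power (1 - beta), and the significance threshold alpha:
#   PPV = power * R / (power * R + alpha)
# Power is held fixed here purely for simplicity of the illustration.
def ppv(pre_study_odds, power, alpha):
    return (power * pre_study_odds) / (power * pre_study_odds + alpha)

for alpha in (0.05, 0.005, 1e-8):        # conventional, stricter, roughly genome-wide scale
    for R in (1.0, 0.1, 0.001):          # strong prior, exploratory, agnostic search
        print(f"alpha={alpha:<8g} R={R:<6g} PPV={ppv(R, 0.8, alpha):.3f}")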
MOST ARE STILL USING NULL HYPOTHESIS SIGNIFICANCE TESTING, WHICH IS NOT A GOOD CHOICE, FOR EXAMPLE, FOR EVALUATING A THERAPY OR FOR MINING ELECTRONIC HEALTH RECORDS OR MINING BIG DATA. WE NEED TO FIND STATISTICAL METHODS THAT ARE FIT FOR PURPOSE, AND VERY OFTEN THESE MAY BE BAYESIAN, MAY BE FALSE DISCOVERY RATE BASED, SOMETIMES THEY MAY BE SOMETHING ELSE -- WE DON'T INVEST IN TRAINING INVESTIGATORS AND RETRAINING THEM WITH CONTINUING EDUCATION ON STATISTICAL METHODS AND DESIGN ISSUES. WE JUST TRY TO CATCH UP WITH THE NEXT TOOL, THE NEXT TECHNICAL TOOL THAT MAY BE AVAILABLE, BUT NOT THE CORE OF THE SCIENTIFIC METHOD. CONFLICTS OF INTEREST: I THINK THAT THERE IS IMPROVEMENT IN THE TRANSPARENCY OF REPORTING CONFLICTS OF INTEREST, BUT VERY OFTEN I WONDER, IS TRANSPARENCY ENOUGH? CAN WE LEAVE THE GENERATION OF ORIGINAL KNOWLEDGE AND REPLICATION TO CONFLICTED STAKEHOLDERS? WHO ARE THE STAKEHOLDERS WHO SHOULD BE RUNNING SENSITIVE STUDIES LIKE RANDOMIZED TRIALS, META-ANALYSES, COST-EFFECTIVENESS ANALYSES, GUIDELINES? NIH HAS BEEN SHIFTING AWAY FROM MOST OF THOSE. AND THERE'S EVIDENCE, IF YOU LOOK AT TRIALS SUPPORTED BY NIH, THAT BEFORE REGISTRATION A GOOD PROPORTION HAD STATISTICALLY SIGNIFICANT RESULTS; AFTER REGISTRATION, THIS HAPPENS UNCOMMONLY. SHOULD PUBLIC ENTITIES LIKE NIH RESUME AND EXPAND THEIR ROLE IN SUPPORTING SENSITIVE RESEARCH WHERE CONFLICTS REALLY NEED TO BE AVOIDED THOROUGHLY IF WE WANT FULL TRUST IN THEM? SHOULD WE WAIT FOR PERFECTION? NO. SCIENCE WILL NEVER BE PERFECT. IT'S THE BEST THING THAT HAS HAPPENED TO HUMAN BEINGS, TO HOMO SAPIENS, BUT IT'S A PROCESS IN EVOLUTION. WE NEED TO USE THE BEST SCIENCE WE HAVE. I HEAR, OKAY, LET'S GET RID OF ALL THAT JUNK, THAT HORRIBLE RESEARCH, AND THEN WE KNOW NOTHING. THAT'S NOT TRUE. WE HAVE LIGHTS. WE HAVE WONDERFUL AMPHITHEATERS. THE PROJECTOR IS WORKING. I CAN MOVE MY SLIDES. LOTS OF THINGS HAVE HAPPENED, AND I THINK WE NEED TO DEFEND SCIENCE, AND THERE WILL BE MANY ANTI-SCIENCE VOICES TRYING TO DISMANTLE THE SCIENTIFIC EFFORT. THERE ARE TWO WAYS WE CAN GO ABOUT IT. ONE IS TO SAY SCIENCE IS PERFECT, AND PROBABLY THAT'S NOT GOING TO LEAVE US VERY MUCH ROOM TO PLAY, BECAUSE VERY QUICKLY WE WILL COLLIDE WITH THE SCIENTIFIC METHOD ITSELF, WHICH SAYS THAT VERY OFTEN WE MAY BE WRONG. OR WE CAN SAY THAT SCIENCE IS OUR BEST SHOT, AND THIS IS WHAT WE NEED TO DEFEND. SOMETIMES OUR BEST SHOT WILL HAVE LESS CREDIBILITY COMPARED TO OTHER TIMES. WE KNOW WITH HIGH CERTAINTY ABOUT CLIMATE CHANGE; WE KNOW WITH HIGH CERTAINTY THAT TOBACCO IS GOING TO KILL PEOPLE. WE KNOW FAR LESS ABOUT WHETHER BROCCOLI IS GOING TO MAKE ME LIVE LONGER. SO, WE NEED TO BE TRANSPARENT ABOUT WHAT WE KNOW AND DO NOT KNOW. WE NEED MORE RESEARCH ON RESEARCH. I'M CLEARLY BIASED BECAUSE THIS IS WHERE I AM INVESTING MY EFFORT, SO PROBABLY I'M JUST ASKING FOR MORE FUNDING, BUT I THINK WE NEED TO STUDY HOW EXACTLY TO EVALUATE OUR RESEARCH PRACTICES. WE NEED TO FIND WAYS THAT YOU CAN REFUTE EVERYTHING THAT I TOLD YOU TODAY WITH MORE EMPIRICAL DATA AND BETTER SCIENCE. WE NEED TO FIND OUT HOW WE CAN BEST PERFORM RESEARCH, COMMUNICATE RESEARCH, VERIFY RESEARCH, EVALUATE RESEARCH, AND REWARD RESEARCH. WHAT SCIENTIFIC WORKFORCE AND WHAT WORLD OF SCIENCE ARE WE ENVISIONING? WE CAN MODEL THAT; WE CAN MODEL SCIENCE IN 2030 OR 2040. IS IT GOING TO BE ACCURATE? WELL, I THINK THAT WE MAY WELL BE WRONG. BUT THIS IS ONE SUCH MODEL OF SCIENCE IN THE FUTURE. WE USED 11 EQUATIONS TO TRY TO DESCRIBE A SIMPLIFIED UNIVERSE OF SCIENCE WHERE YOU HAVE THREE TYPES OF SCIENTISTS. THE DILIGENT ARE THE MAJORITY.
YOU HAVE THE CARELESS COHORT OF SCIENTISTS, WHO MIGHT BE CUTTING A FEW CORNERS, MAYBE NOT VERY WELL TRAINED, MAYBE FOLLOWING SUBOPTIMAL RESEARCH PRACTICES. HOW MANY ARE THEY? THERE HAVE BEEN SURVEYS ABOUT THAT: WHEN YOU ASK PEOPLE, ARE YOU CUTTING CORNERS, THE ANSWER IS USUALLY NO. WHEN YOU ASK PEOPLE, DO YOU KNOW OTHER SCIENTISTS IN YOUR ENVIRONMENT WHO ARE CUTTING CORNERS, THE ANSWER IS ALMOST ALWAYS YES. [LAUGHTER] BUT LET'S SAY THESE ARE THE MINORITY. AND THEN WE HAVE THE UNETHICAL COHORT, CLEAR FRAUDS, YOU KNOW, CREATING DATA THAT DON'T EXIST. FRAUD IS VERY UNCOMMON; WE'RE TALKING LESS THAN 1%. IF YOU INCENTIVIZE THESE COHORTS WITH THE SAME INCENTIVES -- IF YOU DISCOVER SOMETHING, IF YOU MAKE A CLAIM, PUBLISH A PAPER IN "NATURE," YOU WILL BE FINE -- AND YOU DON'T HAVE DIFFERENTIAL INCENTIVES BASED ON REPRODUCIBILITY, WHAT YOU'LL GET IS THAT THE UNETHICAL AND CARELESS COHORTS WILL TAKE OVER. THE REASON FOR THAT IS THAT THEY CAN GET THERE FASTER, WITH FEWER RESOURCES, BY CUTTING CORNERS. SO WE NEED TO REENGINEER THE REWARD SYSTEM. I'LL LEAVE YOU WITH A COUPLE OF SLIDES ON HOW WE DO THAT. WE TRY TO INCENTIVIZE PRODUCTIVITY, AND PRODUCTIVITY IS WONDERFUL, BUT WE NEED TO THINK ABOUT THE WHOLE ELECTROCARDIOGRAM: PRODUCTIVITY, QUALITY, REPRODUCIBILITY, SHARING, AND TRANSLATIONAL IMPACT. PQRST. WE NEED TO FIND OPPORTUNITIES TO CHANGE THE WAY WE DO SCIENCE IN OUR EVERYDAY ENVIRONMENT. IT NEEDS TO BE A GRASSROOTS MOVEMENT. WE CANNOT JUST WAIT FOR SOME KING OR QUEEN OF SCIENCE TO IMPOSE, WITH HIS OR HER AUTHORITY, WHAT EXACTLY SHOULD BE DONE. I THINK IT'S SCIENTISTS WHO REALIZE WHAT MAKES THEIR SCIENTIFIC WORK MORE CREDIBLE, MORE REPRODUCIBLE, MORE APPLICABLE. WE NEED TO CONVINCE OTHER STAKEHOLDERS. SCIENTISTS WANT TO PUBLISH A LOT AND WANT FUNDING, BUT THERE ARE ALSO THE PRIVATE INVESTORS, THE PRIVATE NOT-FOR-PROFIT FUNDERS, EDITORS, PUBLISHERS, SOCIETIES, UNIVERSITIES, RESEARCH INSTITUTIONS, NON-SCIENTIFIC STAFF, HOSPITALS, INSURANCE COMPANIES, GOVERNMENTS, FEDERAL AUTHORITIES, PEOPLE. SOME OF THEM WANT TO SEE PAPERS, OTHERS WANT TO SEE FUNDING. SOME WANT TO SEE THINGS THAT WORK, OTHERS WANT TO MAKE A PROFIT. IT'S ALL FINE. BUT IT NEEDS TO BE INTEGRATED TO TRY TO GET THE BEST POSSIBLE SCIENCE. TO CONCLUDE, I THINK THAT THE PRESUMED DOMINANCE OF ORIGINAL DISCOVERY OVER REPLICATION IS AN ANOMALY. IT'S REALLY THE EXCEPTION. ORIGINAL DISCOVERY CLAIMS TYPICALLY HAVE SMALL OR NEGATIVE VALUE, AND SCIENCE BECOMES WORTHY MOSTLY BECAUSE OF REPLICATION. THERE IS SUBSTANTIAL ROOM FOR IMPROVEMENT. THAT DOESN'T MEAN THAT SCIENCE IS NOT GOOD -- IT'S THE BEST THING THAT CAN HAPPEN TO HUMANS -- BUT WE CAN MAKE IT BETTER. THERE ARE MANY POSSIBLE INTERVENTIONS THAT MAY IMPROVE EFFICIENCY, AND TRANSPARENCY, OPENNESS, AND SHARING ARE LIKELY TO HELP, BUT THE DETAILS OF HOW TO DO IT CAN BE IMPORTANT, AND WE NEED TO FIND OUT HOW THE DETAILS PLAY OUT IN DIFFERENT SETTINGS. A MILLION THANKS TO YOU FOR LISTENING, AND SPECIAL THANKS TO A NUMBER OF COLLABORATORS WHO HAVE JOINED FORCES WITH ME OVER THE YEARS TO GENERATE SOME OF THE EMPIRICAL EVIDENCE I SHARED WITH YOU TODAY. THANK YOU. [APPLAUSE] >> I'M ONE OF THE BIGGEST FANS YOU HAVE. MY LAB IS DEFINITELY READING YOUR PAPERS. I THINK IT'S ABSOLUTELY TRUE THAT THE INCENTIVE SYSTEM IS COMPLETELY WRONG. SCIENCE, AS YOU SHOWED IT, IS REALLY A SYSTEM, RIGHT? AND IT WORKS BASED ON THE INCENTIVES THAT ARE PUT INTO IT. I WAS RECENTLY TENURED HERE. IN TEN YEARS OF MY TENURE TRACK, NOBODY EVER GAVE ME ANY CREDIT FOR REPRODUCIBILITY.
ALMOST ALWAYS WHEN WE PRESENT DATA AT CONFERENCES, I THINK WE ARE ENEMIES OF THE MAJORITY. WHEN WE PUBLISH -- I MEAN, MY LAB PROBABLY HASN'T PUBLISHED A STUDY WHERE WE DIDN'T INCLUDE AN INDEPENDENT VALIDATION COHORT IN THE PAST FIVE YEARS -- WE USUALLY GET FROM REVIEWERS, OH, THEY SHOULD HAVE A SECOND INDEPENDENT VALIDATION COHORT, RIGHT? WHEN, YOU KNOW, 90% OF EVERYTHING THAT IS PUBLISHED OR PRESENTED DOESN'T HAVE EVEN THE FIRST ONE, RIGHT? SO UNLESS WE CHANGE THE INCENTIVES -- UNLESS WE USE, FOR EXAMPLE, H-FACTORS THAT FACTOR IN WHAT KIND OF METHODOLOGICAL, YOU KNOW, ADVANCES WE HAVE USED, OR CHECKLISTS OF WHAT WE DID, WHETHER OUR WORK IS REPRODUCIBLE, IRRESPECTIVE OF WHETHER IT'S POSITIVE OR NEGATIVE -- SCIENCE WILL CONTINUE THE WAY IT'S CONTINUING. >> I SYMPATHIZE WITH WHAT YOU DESCRIBE, AND I THINK THIS IS A FEELING MANY SCIENTISTS IN DIFFERENT FIELDS CONVEY TO ME ALL THE TIME, BUT I THINK THERE IS PROGRESS. YOU HAVE FIELDS WITH A DOMINANT VIEW THAT THIS IS IMPORTANT TO DO, AND I THINK THAT THESE FIELDS ARE LIKELY TO BE MORE SUCCESSFUL IN THE LONG TERM, AND IT'S NOT GOING TO HAPPEN OVERNIGHT. I THINK THAT IF WE FOCUS ON TRAINING YOUNGER SCIENTISTS, AND HOPEFULLY RETRAINING SOME OLDER ONES, ABOUT WHAT REALLY MATTERS AND WHY WE'RE DOING THIS AND WHY IT IS IMPORTANT TO DO GOOD SCIENCE RATHER THAN SLOPPY SCIENCE -- I THINK THIS IS LIKE AN EVERYDAY EFFORT, AN EVERYDAY STRUGGLE, NOT ONE COURSE ONE WOULD TAKE. I HEAR THE QUESTION, IS THERE ONE COURSE I CAN TAKE? NO, THIS IS SCIENCE IN CONTINUITY, YOUR EVERYDAY METHODS AND THEIR APPLICATION, AND THERE'S NO SINGLE COURSE THAT CAN REPLACE YOUR EXPERIENCE AS A SCIENTIST. >> THANK YOU FOR THAT TALK. I AGREE IT'S GOING TO TAKE A CULTURAL CHANGE TO MOVE FORWARD. CONSIDERING WE'RE AT NIH AND THERE MIGHT BE PROGRAM STAFF IN THE AUDIENCE, DO YOU HAVE SUGGESTIONS ON THE FUNDING SIDE, FOR A MAJOR FUNDER OF BIOMEDICAL RESEARCH? >> AS FOR SUGGESTIONS APPLIED TO NIH: NIH HAS TREMENDOUS POWER TO FACILITATE THESE PRINCIPLES AND HAS MOVED IN THAT DIRECTION ON MANY FRONTS. PROBABLY NOT ALL THE MOVES HAVE BEEN EQUALLY EVIDENCE-BASED, BUT THERE ARE PEOPLE WHO ARE RECEPTIVE AND WANT TO CHANGE THE WAY THINGS ARE DONE. CLEARLY, IF YOU HAVE NIH GIVING PRIORITY TO INCENTIVE STRUCTURES, TO OPENNESS AND SHARING, AND FOCUSING ON GOOD WORK AS OPPOSED TO JUST QUICK AND SUCCESSFUL WORK, IT CAN HAVE A TREMENDOUS IMPACT. >> YES, THERE WAS THE HIGH-PROFILE CASE OF THERANOS, THE BIOTECH COMPANY; STATISTICIANS CALLED THEM OUT AND SAID IT WAS NOT GOING TO WORK, AND THIS WAS DONE WITH INCOMPLETE KNOWLEDGE OF THE UNDERLYING DATA FROM THE COMPANY. SO THE QUESTION IS, WHAT DOES GUT INSTINCT TELL YOU WITH RESPECT TO HOW SOMETHING IS GOING TO TURN OUT? IT WOULD BE INTERESTING IF YOU TOOK THE GUT INSTINCT AND SAID, I THINK THIS IS GOING TO BE PROFOUND, VERSUS, THIS WILL TURN OUT TO BE FAKE, REGARDLESS OF THE UNDERLYING STATISTICAL VALUE. COULD YOU COMMENT ON THAT? I THINK THAT WOULD BE REALLY IMPORTANT IN TERMS OF BIOTECH I.P.O.s. >> TRANSPARENCY APPLIES JUST AS WELL IN THE CASE OF BIOTECH AND STARTUPS AND UNICORNS. ACTUALLY, I THINK THAT I WAS THE FIRST TO WRITE A CRITICAL PAPER ON THERANOS, IN JAMA, A YEAR BEFORE JOHN CARREYROU STARTED PUBLISHING HIS "WALL STREET JOURNAL" INVESTIGATIONS. I SENT THAT TO JAMA IN 2014; IT WENT THROUGH REVIEW AND LEGAL REVIEW, BECAUSE I WAS CHALLENGING THE HIGHEST-VALUATION START-UP IN THE COUNTRY, VISITED BY VICE PRESIDENTS, WITH ALL THE INFLUENTIAL PEOPLE ON THE ADVISORY BOARD, AND EVERYBODY WAS SO EXCITED. I WAS SAYING THEY HAVE NO EVIDENCE TO SUPPORT WHAT THEY CLAIM. I DON'T SEE ANY PAPER THAT THEY HAVE PUBLISHED.
I THINK THEIR VALUATION IS $9 BILLION, BUT IT COULD BE $9, ACTUALLY. THE PAPER WAS EVENTUALLY PUBLISHED IN JAMA. I GOT A LOT OF PUSHBACK AT THAT TIME. I HEARD FROM THEIR GENERAL COUNSEL ASKING ME TO RECANT, AND THAT THEY WERE GETTING FDA APPROVAL. WHEN THEY DID GET THEIR FIRST AND SINGLE AND ONLY FDA APPROVAL -- ALL OF THAT WAS WELL BEFORE THE "WALL STREET JOURNAL" STORIES -- I WAS TOLD AGAIN TO RECANT AND TO WRITE AN EDITORIAL WITH THEIR CEO SAYING THAT I WAS WRONG, AND I REMEMBER THE WASHINGTON POST WRITING A STORY ALONG THE LINES OF, THE INSANELY INFLUENTIAL STANFORD PROFESSOR IS ASKING TOO MUCH OF THERANOS, AND WHAT A SHAME, YOU DON'T UNDERSTAND HOW INNOVATION HAPPENS. PEOPLE NOW THINK THERANOS WAS AN EXCEPTION -- IT WAS A FRAUD, AND EVERYBODY ELSE IS FINE. I DON'T BELIEVE THAT. A COUPLE OF MONTHS AGO WE PUBLISHED A PAPER LOOKING AT EVERY UNICORN IN THE HEALTH CARE SPACE; THE MAJORITY OF THEM LOOKED PRETTY MUCH LIKE THERANOS IN TERMS OF THEIR TRANSPARENCY AND THE AVAILABILITY OF PUBLISHED PEER-REVIEWED SCIENCE. SO I'M NOT SAYING IT'S FRAUD. BUT UNLESS WE IMPROVE TRANSPARENCY IN SCIENCE FOR THESE ENTITIES, I THINK THAT WE'RE RUNNING THE RISK OF HAVING DEJA VU -- THERANOS TIMES TWO, TIMES THREE, TIMES FOUR -- IN THE NEAR FUTURE. >> THANK YOU FOR THE TALK. I'M IAN HUTCHINS, A DATA SCIENTIST AT THE NIH, AND I SPEND A LOT OF TIME DEVELOPING AND USING RESEARCH ASSESSMENT METRICS FOR PORTFOLIO ANALYSIS. ONE OF THE THINGS I'VE OBSERVED IS THAT THERE SEEM TO BE A LOT OF CULTURAL AND POLICY BARRIERS TO REPRODUCIBILITY EFFORTS. JOURNALS WILL OFTEN HAVE POLICIES THAT THEY WON'T PUBLISH REPLICATION STUDIES; EVEN PLOS ONE UNTIL RECENTLY HAD THAT POLICY. I HAVEN'T LOOKED SPECIFICALLY, BUT ONE IMAGINES APPLICATIONS FOR FUNDING THAT FOCUS ON REPLICATING A PREVIOUS STUDY DON'T FARE THAT WELL IN THE SYSTEM. HOW DO YOU THINK THE LOGJAM CAN BEST BE BROKEN UP? >> SO, I THINK WE NEED MORE TRAINING AND MORE PEOPLE TO UNDERSTAND WHAT IS AT STAKE HERE. I BELIEVE MANY JOURNALS HAVE CHANGED THEIR STANCE OVER TIME AND MANY ARE SYMPATHETIC; MANY FIELDS THAT HAVE ADOPTED REPLICATION MASSIVELY, LIKE GENETICS, WOULD NOT ALLOW YOU TO PUBLISH UNLESS YOU HAVE REPLICATED -- IT'S A SINE QUA NON. WE NEED TRAINING. IN ONE INVESTIGATION WHERE EDITORS WERE ASKED, ABOUT A THIRD OF THEM COULD NOT EVEN TELL THAT A STUDY WAS A RANDOMIZED TRIAL. METHODOLOGIC EXPERTISE IS LACKING; WE NEED TO PUSH AND EDUCATE PEOPLE TO BE MORE KNOWLEDGEABLE ABOUT HOW THINGS WORK IN THE WAY THAT WE DO SCIENCE. THE BEST JOURNALS WILL BE THE ONES THAT PUBLISH THE BEST SCIENCE EVENTUALLY, AND THESE ARE THE ONES THAT WILL SURVIVE, THINKING IN AN EVOLUTIONARY MODE. I DON'T WANT TO THINK THE BEST JOURNALS ARE GOING TO DISAPPEAR AND WE'LL JUST GET MORE AND MORE NOISE. >> THANK YOU VERY MUCH. >> WE NEED TO EXIT THE AUDITORIUM FOR ANOTHER GROUP COMING IN. WE HAVE PEOPLE ONLINE TO ASK QUESTIONS. JOIN US IN THE NIH LIBRARY TO ASK QUESTIONS THERE. THANK YOU. [APPLAUSE]