>> GOOD MORNING EVERYONE. I AM BETSY HUMPHREYS. I'M THE DEPUTY DIRECTOR FOR THE NATIONAL LIBRARY OF MEDICINE AND IT'S MY GREAT PLEASURE TO WELCOME YOU HERE TODAY. AS I THINK IT'S NOT A SECRET, 2011 IS NLM'S 175th ANNIVERSARY AND THROUGHOUT THE WHOLE YEAR WE HAVE BEEN TAKING ADVANTAGE OF THAT MILESTONE TO REALLY INCREASE AWARENESS AND USE OF OUR PRODUCTS AND SERVICES AND ALSO TO DO A LITTLE CELEBRATION ABOUT WHAT, IN MY DEFINITELY BIASED OPINION, IS A VERY IMPRESSIVE RECORD OF INNOVATION AND POSITIVE IMPACT ON ACCESS TO INFORMATION WORLDWIDE. TODAY, HOWEVER, THE GOAL IS TO LOOK AHEAD AND CONSIDER THE OPPORTUNITIES AND CHALLENGES THAT WILL SHAPE NLM'S ABILITY AND THE ABILITY OF OTHER BIOMEDICAL LIBRARIES TO CONTINUE TO INNOVATE AND TO MAKE POSITIVE CONTRIBUTIONS TO DELIVERING THE RIGHT SCIENTIFIC AND HEALTH INFORMATION TO THE RIGHT PLACE AT THE RIGHT TIME. OUR SPEAKER, DR. CLIFFORD LYNCH, IS EXECUTIVE DIRECTOR OF THE COALITION FOR NETWORKED INFORMATION, AND I THINK HE'S PERHAPS UNIQUELY QUALIFIED TO HELP US CONSIDER SOME OF THE FORCES THAT WILL AFFECT NLM AND OTHER BIOMEDICAL LIBRARIES FOR THE NEXT LITTLE WHILE. FOR THOSE OF YOU WHO ARE UNFAMILIAR WITH IT, THE COALITION FOR NETWORKED INFORMATION INCLUDES ABOUT 2,000 MEMBER ORGANIZATIONS, AND NLM IS ONE OF THEM, CONCERNED WITH THE INTELLIGENT USES OF INFORMATION TECHNOLOGY AND NETWORKED INFORMATION TO ENHANCE SCHOLARSHIP, AND THE AGENDA INCLUDES WORK IN DIGITAL PRESERVATION, DATA-INTENSIVE SCHOLARSHIP, TEACHING, LEARNING, AND TECHNOLOGY, INFRASTRUCTURE, AND STANDARDS DEVELOPMENT. AND THOSE ARE ALL TOPICS THAT ARE NEAR AND DEAR TO NLM. BEFORE CLIFFORD JOINED CNI IN 1997, HE SPENT 18 YEARS AT THE UNIVERSITY OF CALIFORNIA OFFICE OF THE PRESIDENT, THE LAST TEN AS DIRECTOR OF LIBRARY AUTOMATION, AND HE AND I WERE REMINISCING ABOUT OUR INTERACTIONS WHEN THE UC SYSTEM MOUNTED THE MEDLINE TAPES FOR UC-WIDE ACCESS.
THIS WAS QUITE AN ACHIEVEMENT AND VERY USEFUL ACROSS THE UC SYSTEM, AND OF COURSE, THIS PREDATED READY ACCESS TO THE INTERNET, WHICH OVERTOOK IT. CLIFFORD HAS A PH.D. IN COMPUTER SCIENCE FROM UC BERKELEY, WHERE HE IS AN ADJUNCT PROFESSOR IN THE SCHOOL OF INFORMATION. HE IS THE PAST PRESIDENT OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, AND NUMEROUS OTHER BODIES. HE'S ACTUALLY JUST RECENTLY BECOME THE CO-CHAIR OF THE BOARD ON RESEARCH DATA AND INFORMATION AT THE NATIONAL ACADEMIES. I'M NOT GOING TO TAKE THE TIME TO COVER ALL THE ADVISORY BOARDS AND ALL THE AWARDS THAT HE HAS RECEIVED, BUT IT'S A VERY LONG LIST, INDEED. I THINK MOST IMPORTANT FOR US TODAY IS THAT CLIFFORD HAS A VERY BROAD PICTURE OF DEVELOPMENTS THAT WILL AFFECT OUR FUTURE, AND HE THINKS VERY DEEPLY ABOUT THESE THINGS. SO I'M DELIGHTED THAT HE'S HERE TODAY AND HE'S GOING TO SPEAK TO US ABOUT BIOMEDICAL LIBRARIES IN THE NEXT DECADES: OPEN, DIFFUSE AND VERY PERSONAL. CLIFFORD? [APPLAUSE] >> WELL, THANKS FOR THAT VERY KIND INTRODUCTION. IT'S ALWAYS A PRIVILEGE TO BE ABLE TO COME OUT AND VISIT WITH NLM AND IT'S PARTICULARLY A DELIGHT FOR ME TO BE A PART OF THIS 175th-YEAR CELEBRATION EVEN THOUGH WE'RE ONLY MAKING IT IN AT THE VERY, VERY TAIL END OF THE YEAR. BUT WE'VE MADE IT NONETHELESS. I REALLY DO WANT TO START BY JUST TAKING A MINUTE TO REALLY CONGRATULATE NLM ON THOSE 175 YEARS AND PARTICULARLY ON THE LAST 20 OR 30 OF THEM, AS I'VE BEEN FAMILIAR WITH AT LEAST SOME OF THE WONDERFUL WORK THEY'VE DONE. IT'S REALLY VERY STRIKING HOW OFTEN NLM HAS GOTTEN IT RIGHT AND NOT ONLY GOTTEN IT RIGHT ABOUT THE FUTURE BUT BEEN ABLE TO DO IT EARLY ENOUGH TO LEAD AND TO INFLUENCE DEVELOPMENTS. AND THERE ARE NOT A LOT OF ORGANIZATIONS YOU CAN SAY THAT ABOUT. YOU CAN CERTAINLY POINT TO ORGANIZATIONS THAT HAD A SINGLE KIND OF PIVOTAL INSIGHT THAT CAUSED THEM TO RESET THEIR DIRECTION AT AN IMPORTANT MOMENT IN THEIR LIFE.
BUT TO BE ABLE TO DO THIS WITH THE KIND OF CONSISTENCY DECADE AFTER DECADE THAT NLM SEEMS TO HAVE DONE IS REALLY VERY, VERY UNUSUAL, IN MY EXPERIENCE. AND IT WOULD BE VERY HELPFUL IF SOMEBODY DID A GOOD STUDY OF JUST HOW THEY'VE MANAGED TO PULL THAT OFF, BECAUSE I THINK THAT IT WOULD POTENTIALLY HELP A LOT OF OTHER ORGANIZATIONS WHO, YOU KNOW, ARE DEALING WITH VERY TURBULENT AND CHANGING TIMES AS THEY TRY AND NOT ONLY EVOLVE BUT LEAD THROUGH THOSE TIMES. CERTAINLY, AS PART OF MY THINKING ABOUT THIS TALK, WHICH IS GOING TO DEAL WITH SORT OF THE PRESENT AND THE NEXT MAYBE 10 TO 20 YEARS, I'VE TRIED TO THINK A BIT ABOUT WHAT THIS MEANS SPECIFICALLY FOR NLM AND WHERE IT MIGHT TAKE NLM AND, YOU KNOW, IT'S A LITTLE HARD BECAUSE WHEN YOU LOOK AT MANY OF THE TRENDS IN MOTION, NLM IS ACTUALLY ALREADY THERE AND WELL-POSITIONED, SO YOU HAVE TO REACH A BIT OUT INTO THE FUTURE, AND DO THAT IN A SPECULATIVE WAY, TO FIND PLACES WHERE IT MIGHT REALLY CHANGE THE TRAJECTORY FOR NLM'S STRATEGY. SO I SHOULD ALSO MAKE A FEW DISCLAIMERS HERE. AS BETSY INDICATED, I'M NOT A PHYSICIAN. I'M NOT A BIOLOGIST. I AM A COMPUTER AND INFORMATION SCIENTIST BY TRAINING. I HAVE HAD VERY LIMITED INTERACTION WITH THE WORLDS OF MEDICINE, THE WORLDS OF BIOLOGY, ALTHOUGH I HAVE SPENT A LOT OF TIME LOOKING AT SCHOLARLY COMMUNICATION AND INFORMATION ORGANIZATION AND AT INTERACTIONS BETWEEN THE BROADER PUBLIC AND VARIOUS FORMS OF SCHOLARLY INFORMATION. AND THAT KIND OF COLLECTION OF IGNORANCE, GENERALITY, AND EXPERIENCE, YOU KNOW, PROBABLY QUALIFIES ME TO DO JUST THE KIND OF IRRESPONSIBLE SPECULATION THAT YOU'RE IN FOR TODAY. SO WHERE ARE WE NOW? I THINK THAT YOU CAN READILY SEE THAT BIOMEDICAL LIBRARIES ARE A PART OF THIS WHOLE MOVE TO DATA-INTENSIVE SCIENCE, THAT WE'VE SEEN ENORMOUS EVOLUTION IN THE PORTFOLIO OF RESPONSIBILITIES OF THOSE LIBRARIES AND THE STRATEGIES FOR DELIVERING IT.
YOU CAN FIT THESE PRETTY COMFORTABLY IN SOME WAYS INTO THE BROADER CONTEXT OF WHAT'S HAPPENED TO RESEARCH LIBRARIES THAT SUPPORT SCHOLARSHIP ACROSS ALL FIELDS, PARTICULARLY IN THE HERE AND NOW OF DATA- AND COMPUTATIONALLY INTENSIVE SCHOLARSHIP. THERE ARE SOME THINGS THAT ARE A LITTLE DIFFERENT IN THE BIOMEDICAL FIELD. ONE OF THEM, INTERESTINGLY, IS THE PRESENCE OF THE NATIONAL LIBRARY OF MEDICINE. IN THIS COUNTRY WE REALLY DON'T HAVE ANYTHING ANALOGOUS FOR OTHER AREAS OF SCHOLARSHIP WITH THE EXCEPTION OF AGRICULTURE, AND THERE ARE SOME SPECIAL THINGS ABOUT THAT, TOO. SO IT'S QUITE FASCINATING TO ME TO WATCH, FOR EXAMPLE, THE PORTFOLIO OF SCIENCES THAT ARE PRIMARILY SUPPORTED BY THE NATIONAL SCIENCE FOUNDATION AND THE DEPARTMENT OF ENERGY STRUGGLE WITH HOW TO ORGANIZE VARIOUS KINDS OF INSTITUTIONAL PRACTICES AND INFRASTRUCTURE TO DEAL WITH DATA-INTENSIVE SCIENCE FROM A STARTING POINT WHERE THE INSTITUTIONS IN PLACE ARE VERY SCARCE AND VERY FEW, AND TO COMPARE THAT TO WHAT'S HAPPENED IN THE LIFE SCIENCES, WHERE YOU HAVE NLM AND NCBI AS VERY STRONG PRESENCES AND KIND OF NATURAL FOCAL POINTS FOR AGGREGATING AND CURATING A VAST RANGE OF INFORMATION. SO IT'S REALLY QUITE INTERESTING TO LOOK AT THE DISTINCTIONS THERE AS WE WATCH THE RESPONSE BUILD TO THE REQUIREMENTS OF DATA-INTENSIVE SCIENCE. SO IF YOU LOOK RIGHT NOW, YOU SEE IN MANY FIELDS, AS I SAY, A LOT OF DISCUSSION ABOUT WHO SHOULD BE RESPONSIBLE FOR DATA. HOW ARE DATA ARCHIVES FUNDED? WHAT ARE THE ASSUMPTIONS ABOUT SHARING DATA? WHEN CAN IT BE SHARED? HOW IS IT SHARED? HOW IS CREDIT GIVEN FOR DATA WHEN DATA IS REUSED BY SECONDARY STUDIES? THESE ISSUES ARE GOING ON IN EVERY DISCIPLINE I CAN THINK OF, EVEN BEYOND THE SCIENCES, AS WE SEE THE EMERGENCE OF NOT JUST SOCIAL SCIENCES THAT ARE DATA-INTENSIVE BUT SOME REALLY CREATIVE WORK IN THE DIGITAL HUMANITIES.
WE'RE ACTUALLY, I THINK, FACING SOME DIFFERENT QUESTIONS IN THE LIFE SCIENCES ABOUT THINGS LIKE DEGREE OF CENTRALIZATION, ABOUT WHETHER WE NEED VERY, VERY WIDELY DISTRIBUTED SORT OF SAFETY NETS FOR DATA OR WHETHER THAT CAN BE DONE THROUGH A RELATIVELY SMALL NUMBER OF SPECIALIZED INSTITUTIONS THAT ARE SUPPORTING THE ENTIRE COMMUNITY. AND I THINK THAT'S A VERY INTERESTING DIFFERENCE. IN FACT, WE SEE TENSIONS AROUND CENTRALIZATION, AND I WOULD SAY PUBMED IS ONE OF THESE KINDS OF CENTRALIZING ACTIVITIES, IN THE LIFE SCIENCES THAT ARE REALLY COMPLETELY DIFFERENT FROM WHAT'S GOING ON IN A LOT OF THE OTHER DISCIPLINES. I'LL SIMPLY NOTE THAT, BECAUSE I THINK THAT THE WAY RESPONSIBILITIES END UP DISTRIBUTED IN THE LIFE SCIENCES AMONGST DATA ARCHIVES, RESEARCH LIBRARIES, LIBRARIES OR INFORMATICS OPERATIONS THAT ARE ASSOCIATED WITH HEALTHCARE DELIVERY, AND NATIONAL INSTITUTIONS LIKE THE NATIONAL LIBRARY OF MEDICINE -- I THINK THAT THAT DISTRIBUTION MAY BE QUITE DIFFERENT, GOING OUT TEN YEARS IN THE LIFE SCIENCES, THAN IT ENDS UP BEING IN, LET'S SAY, THE GEOLOGICAL OR ENVIRONMENTAL SCIENCES OR SOME OF THE PHYSICAL SCIENCES OR ENGINEERING. I THINK YOU REALLY MAY SEE SOME DIFFERENCES THERE. THE OTHER THING THAT I WILL SUGGEST, AND I'M GOING TO GET INTO THIS IN A LOT MORE DETAIL SHORTLY, BUT I JUST WANT TO KIND OF PUT IT OUT, IS THAT WE'RE GOING TO SEE A GREATER DISTINCTION BETWEEN INFORMATION DELIVERY SYSTEMS AND INFORMATION MANAGEMENT INSTITUTIONS THAT DEAL WITH BIOLOGY BROADLY AND THOSE THAT ARE SPECIFIC TO MEDICINE AND HEALTHCARE. RIGHT NOW I THINK THERE'S A FAIR AMOUNT OF COMMONALITY THERE. PEOPLE REFER KIND OF CASUALLY TO BIOMEDICAL LIBRARIES, AND IN FACT, I THINK THAT THERE IS GOING TO BE A GREATER DIVERGENCE HERE.
WHAT'S GOING TO HAPPEN IS YOU ARE GOING TO HAVE THIS SORT OF BASIS OF BIOLOGY, AND YOU ARE GOING TO HAVE ENORMOUS NUMBERS OF PEOPLE STUDYING VARIOUS KINDS OF BIOLOGY IN ITS OWN RIGHT, AND THAT ENTIRE BASIS, BIOLOGY, OF COURSE, INFORMS AND SUPPORTS RESEARCH INTO MEDICINE, BUT RESEARCH INTO MEDICINE IS GOING TO EVOLVE A SET OF ISSUES THAT ARE QUITE SEPARATE, QUITE UNIQUE, AND QUITE DISTINCT FROM THE FUNDAMENTAL ISSUES OF THAT BASE OF BIOLOGICAL KNOWLEDGE, AND I THINK THAT'S GOING TO BE A DEVELOPMENT THAT'S GOING TO GET MUCH MORE OBVIOUS OVER THE NEXT DECADE OR SO. AS I SAY, I'LL COME BACK TO THAT. I WANT TO LOOK FIRST AT THE SORT OF GENERAL CASE OF BIOLOGY, THOUGH, AND JUST NOTE A FEW THINGS THAT ARE HAPPENING. SO I'VE ALREADY TALKED ABOUT THE RISE OF DATA-INTENSIVE BIOLOGY, AND I CAN'T EMPHASIZE THAT ENOUGH. THE VARIOUS KINDS OF SENSORS, THE IMAGING SYSTEMS, THE SEQUENCING SYSTEMS THAT ARE COMING INTO LARGE-SCALE USE TODAY, THE MASSIVE COMBINATORIAL CHEMISTRY AND PARALLEL TESTING SYSTEMS: THE AMOUNT OF DATA THAT THESE THINGS GENERATE IS JUST STAGGERING, AND WE DON'T DO A VERY GOOD JOB OF MANAGING ALL OF IT. IN PARTICULAR, THERE IS A TREMENDOUS AMOUNT OF NEGATIVE DATA THAT COMES OUT OF THESE THAT'S JUST THROWN AWAY, WHICH COULD BE USED A LOT MORE STRATEGICALLY, I THINK. WE ARE STARTING TO RUN INTO SITUATIONS NOW, INTERESTINGLY, WHERE IT'S REALLY GETTING TO BE CHEAPER TO REGENERATE DATA THAN IT IS TO SAVE IT. CERTAINLY WE'RE JUST ON THE EDGE OF THAT IN TERMS OF SEQUENCING, I WOULD SAY. AT THE SAME TIME WE ARE SEEING CONSIDERATIONS THAT ARE NOT JUST ECONOMIC BUT ALSO EXTEND INTO ETHICAL DIMENSIONS ABOUT KEEPING DATA RATHER THAN RECREATING IT. FOR EXAMPLE, I THINK THAT WE'RE BECOMING MUCH MORE DISCIPLINED ABOUT CLINICAL TRIALS AND ABOUT THE NEED TO MAKE THAT DATA PUBLIC AND TO SAVE IT AND NOT TO REPEAT THESE KINDS OF ACTIVITIES OR, YOU KNOW, SLIGHTLY TWEAKED VERSIONS OF THEM UNNECESSARILY, NOT JUST BECAUSE OF FINANCIAL COSTS BUT BECAUSE OF HUMAN COST.
IT'S VERY INTERESTING TO ME TO READ SOME OF THE DISCUSSIONS, WHICH I'VE CERTAINLY BEEN EXPECTING TO APPEAR, THAT SHOWED UP IN THE PRESS VERY RECENTLY ABOUT NOT USING GREAT APES AS EXPERIMENTAL SUBJECTS IN MOST CASES. I THINK THAT YOU'RE GOING TO SEE MORE AND MORE INTEREST IN HOW DO WE MINIMIZE THE USE OF ANIMALS BROADLY IN SCIENCE? AND HOW CAN WE GET THE MAXIMUM USE OUT OF THE DATA THAT'S PRODUCED THOSE WAYS? SO YOU'RE CLEARLY GOING TO SEE THOSE KINDS OF TRENDS CONTINUING. I THINK THAT ONE OF THE SORT OF BIG MACRO DEVELOPMENTS THAT'S VERY INTERESTING NOW, AND IT'S CERTAINLY ONE THAT NLM HAS BEEN TRACKING CLOSELY FOR A LONG TIME, IS THE RETHINKING OF SCIENTIFIC ARTICLES. AND IT'S VERY INTERESTING TO LOOK AT WHAT HAPPENED HERE. WE WENT THROUGH A PERIOD FROM ROUGHLY, LET'S SAY, 1990 TO ABOUT 2005, WHERE WE TRANSITIONED THE SCIENTIFIC JOURNAL FROM PRINT TO ELECTRONIC FORM, WHERE THE VERSIONS OF RECORD BECAME THE ELECTRONIC VERSIONS, WHERE PRINT WAS LARGELY ELIMINATED, AND USERS NOW ACCESS THE ARTICLES AND THESE JOURNALS ON PUBLISHER-BASED SERVERS, AND WE'VE BUILT UP A SET OF STRATEGIES COLLECTIVELY TO TRY AND ENSURE THE PRESERVATION OF THESE JOURNALS BEYOND THE LIFE OF THE PUBLISHERS THAT HOST THEM. BUT IF YOU ACTUALLY LOOK AT THE ARTICLES AND THE JOURNALS, A SCHOLAR FROM 1920 WOULD BE PERFECTLY FAMILIAR WITH THE FORMAT. SOME OF THE RESULTS MIGHT BE NEWS, BUT THEY'D UNDERSTAND THE ARTIFACT AND THE GENRE THEY ARE DEALING WITH. THEY MIGHT FIND A SLIGHTLY MORE GENEROUS USE OF COLOR GRAPHICS IN SOME JOURNALS, BUT THAT'S ABOUT IT. THE ONE REALLY COOL THING WOULD BE THE ABILITY TO CLICK ON CITATIONS AND NAVIGATE OFF TO CITED PAPERS. BUT REALLY THE ESSENCE OF THAT JOURNAL ARTICLE HASN'T CHANGED, AND WHAT I SEE HAPPENING NOW IS A LOT OF FORCES SORT OF LINING UP TO RENEGOTIATE THE CONNECTIONS BETWEEN JOURNAL ARTICLES AND THE DATA THEY REPORT, SO THAT YOU'LL BE ABLE IN SOME WAYS TO USE THE JOURNAL ARTICLE AS A PORTAL INTO THAT DATA.
YOU WILL CERTAINLY BE ABLE TO FOLLOW CONNECTIONS BETWEEN JOURNAL ARTICLES AND THEIR UNDERLYING DATA, AND WE'RE SEEING A LOT OF PROTOTYPES IN THAT AREA. STANDARDS HAVEN'T QUITE SETTLED DOWN, BUT I THINK THAT IF YOU COULD GO OUT ANOTHER 10 TO 15 YEARS, JOURNAL ARTICLES WILL NO LONGER BE FAMILIAR AS SIMPLY ELECTRONIC PAPER, IF WE COULD TRANSPORT A SCIENTIST FROM 1920 INTO THE WORLD OF, SAY, 2030. THESE WOULD ACTUALLY BE NEW AND CONSIDERABLY MORE INTERACTIVE THINGS THAT WOULD REALLY BE QUITE STRIKING. I THINK THAT ANOTHER THING THAT IS CLEARLY EMERGING IS THE RECOGNITION THAT LITERATURE, SCIENTIFIC LITERATURE, OR, IF YOU WILL ALLOW ME THE TERM, THIS SORT OF SCIENTIFIC KNOWLEDGE BASE, WHICH ENCOMPASSES NOT JUST THE LITERATURE BUT THE UNDERLYING DATA SETS, THAT THIS IS NOT JUST THERE FOR HUMAN READERS, THAT IN FACT, THIS IS AN OBJECT OF ONGOING COMPUTATION, THAT WE'RE GOING TO MINE AND COMPUTE UPON THESE KINDS OF COLLECTIONS OF TEXT AND DATA AND THAT THEY WILL SERVE AUDIENCES THAT INCLUDE COMPUTER PROGRAMS, AS WELL AS HUMAN BEINGS. THAT'S REALLY CRITICAL TO THINK ABOUT, BECAUSE WE NEED TO THINK ABOUT HOW WE RESTRUCTURE THE ARTICLE, ABOUT HOW BEST TO ACCOMMODATE THAT SORT OF COMPUTATION. WE ALSO, I THINK, NEED TO RECOGNIZE THAT THERE IS GOING TO BE A GREAT SORT OF FLEXIBILITY OF INTEGRATION NEEDED. ONE CAN IMAGINE, FOR EXAMPLE, A VERY TYPICAL USE SCENARIO BEING THE COMPUTATION OF SOMETHING THAT COMBINES THE GENERAL PUBLIC KNOWLEDGE BASE WITH A SET OF PROVISIONAL OR AS YET UNRELEASED INFORMATION THAT HAS BEEN COLLECTED UP BY A LAB OR RESEARCH GROUP AS THEY TRY AND UNDERSTAND, WELL, WHERE DO OUR FINDINGS LEAD? HOW DOES SOMETHING THAT WE'VE JUST DISCOVERED AFFECT THE BIGGER PICTURE? THEY MAY WANT TO DO THAT BEFORE RELEASING THOSE FINDINGS. CERTAINLY WE WILL SEE ORGANIZATIONS THAT SIT ON THE BOUNDARY BETWEEN PUBLIC KNOWLEDGE BASES AND COLLECTIONS OF PROPRIETARY KNOWLEDGE, SUCH AS PHARMACEUTICAL COMPANIES, WANTING TO DO EXACTLY THOSE SORTS OF COMPUTATIONS.
SO WE'VE GOT A LOT OF ISSUES ABOUT SORT OF DIFFERENTIAL COMPUTATION AND UNDERSTANDING HOW THE ADDITION OF INCREMENTAL KNOWLEDGE AFFECTS COMPUTATIONAL RESULTS. WE ALSO HAVE SOMETHING ELSE GOING ON, WHICH WE DON'T LIKE TO TALK ABOUT VERY MUCH. WE HAVE BUILT UP THIS ARSENAL OF EXPERIMENTAL APPARATUS AND OBSERVATIONAL APPARATUS THAT IS NOW FEEDING OUR DATABASES FASTER THAN ANYONE CAN FOLLOW IT, AND THIS IS HAPPENING IN MEDICINE THE SAME WAY IT'S HAPPENING IN ASTRONOMY. IN ASTRONOMY NOW THEY'VE MOVED TO THESE SYSTEMATIC SKY SURVEYS. YOU JUST RUN A TELESCOPE ACROSS THE SKY AT VARIOUS WAVELENGTHS AND DATABASE IT. YOU RUN THE INFORMATION THROUGH VARIOUS KINDS OF FILTERS, SO YOU ARE INTERESTED MAYBE IN THINGS THAT WEREN'T THERE LAST NIGHT THAT ARE THERE TONIGHT THAT YOU DIDN'T PREDICT. BUT BY AND LARGE THOSE OBSERVATIONS GO UNSEEN BY HUMANS, AND THEY ARE RELYING ON A MIXTURE OF FILTERING ALGORITHMS AND AMATEUR ASTRONOMERS TO SPOT MOST OF THE INTERESTING NEW STUFF. AS THE OBSERVATIONAL APPARATUS GROWS IN BIOLOGY, WE WILL SEE MORE AND MORE OF EXACTLY THIS KIND OF PHENOMENON, AND WE HAVE TO, BECAUSE WE CAN'T KEEP UP WITH IT. WE CAN'T KEEP UP WITH THE JOURNALS EITHER. I HAVE HEARD NUMBERS QUOTED, AND ACTUALLY PROBABLY SOMEBODY HERE HAS A BETTER NUMBER, BUT I'M NOT SURE THE PRECISE NUMBER MATTERS, WHETHER IT'S ONE JOURNAL ARTICLE IN THE LIFE SCIENCES EVERY MINUTE OR EVERY TWO MINUTES, EVERY HOUR, EVERY DAY, 24 HOURS A DAY, 365 DAYS A YEAR. THE POINT IS IT'S WAY TOO MANY FOR HUMAN BEINGS TO STAY ON TOP OF. SO IT SEEMS ABUNDANTLY CLEAR THAT THE WHOLE WAY WE RELATE TO THIS KNOWLEDGE BASE IS GOING TO BE CHANGING SIGNIFICANTLY AND IS GOING TO INVOLVE A CONSIDERABLE AMOUNT OF ADDITIONAL COMPUTER MEDIATION. SO THAT'S KIND OF WHERE I SEE THE BASIC BIOLOGY SITUATION GOING. WHAT ABOUT MEDICINE? WHAT ABOUT MEDICAL LIBRARIES? WHAT'S GOING TO HAPPEN HERE?
WELL, I THINK WHAT HAPPENS HERE, WHAT REALLY IS THE SORT OF LANDSCAPE TRANSFORMER HERE, IS THAT ALONG WITH LITERATURE AND ALL THE BIOLOGY DATA WE HAVE DATA RECORDS ABOUT INDIVIDUAL HUMAN BEINGS, AND YOU HAVE THAT FROM AT LEAST TWO PERSPECTIVES. YOU HAVE RECORDS THAT DOCUMENT THEIR GENETICS, AND YOU HAVE RECORDS THAT DOCUMENT THEIR PHYSICAL CONDITION, AS THAT EVOLVES OVER TIME OR DETERIORATES OVER TIME, AND THE VARIOUS INTERVENTIONS AND PERHAPS THE ENVIRONMENTAL INFLUENCES AND LIFE CHOICES THAT SHAPE THAT. SO REALLY WHAT YOU DO, AS YOU MOVE FROM BIOLOGY TO MEDICINE, IS YOU EXPAND THAT KNOWLEDGE BASE WITH THIS INCREDIBLE, INCREDIBLY LARGE POTENTIAL COLLECTION OF INFORMATION ON INDIVIDUALS, AND YOU START LINKING THAT TO EVERYTHING. NOW, WE HEAR A LOT OF TALK ABOUT THIS AND WE CERTAINLY HEAR ABOUT THE COMING AGE OF PERSONALIZED MEDICINE. WE HEAR, I THINK, WITH SOME TRUTH ABOUT THE PROSPECT THAT IT'S GOING TO BE CHEAP ENOUGH TO REALLY START SEQUENCING EVERYBODY'S GENOME AS OPPOSED TO JUST DOING A LITTLE BIT OF ISOLATED GENOTYPING OF SPECIFIC VARIATIONS. THIS ISN'T JUST SCIENTIFIC DETERMINISM, THOUGH. I THINK WE NEED TO REALIZE THAT THERE IS A WHOLE SET OF ECONOMIC AND REGULATORY AND SOCIAL ISSUES WHIRLING AROUND THERE, AND WE OVERLOOK WHAT'S GOING ON THERE VERY MUCH AT OUR PERIL. SO LET'S IMAGINE WE SEE THIS KNOWLEDGE BASE STARTING TO TAKE SHAPE, AND PROBABLY IT STARTS TO TAKE SHAPE IN A SERIOUS WAY THIS DECADE, AS WE INTRODUCE AND LINK ALL OF THESE MEDICAL RECORDS, ALL OF THIS GENOMIC DATA AND ALL OF WHAT IS HAPPENING IN BIOLOGY.
ONE OF THE THINGS WE SEE IS SOME OF THE LARGEST MACHINE-LEARNING AND STATISTICAL ANALYSIS ACTIVITIES EVER CONCEIVED OF HAPPENING AGAINST THESE DATABASES, AND WE ACTUALLY SEE IDEAS EMERGE IN A SERIOUS WAY ABOUT THE RESOLVING POWER OF BIG DATA, THAT IF YOU CAN GET A LARGE ENOUGH UNIVERSE OF DATA, YOU CAN DO STATISTICAL ANALYSIS, PATTERN RECOGNITION, CLASSIFICATION, MACHINE LEARNING ALGORITHMS THAT BEGIN TO HELP YOU SORT OUT VERY, VERY COMPLEX, MULTIDIMENSIONAL INTERDEPENDENCIES BETWEEN, FOR EXAMPLE, LARGE ASSORTMENTS OF GENES, ENVIRONMENTAL FACTORS, MEDICAL CONDITIONS -- THOSE KINDS OF THINGS. SO SCALE BECOMES A REALLY BIG ISSUE. ALL OF A SUDDEN THE QUESTION OF HOW WE CAN FIND WAYS TO POOL AND AMASS DATA, PARTICULARLY MEDICAL DATA, BECOMES PART OF THE CONSTRUCTION OF OUR INSTRUMENTS FOR ADVANCING MEDICINE, IN ESSENCE. THIS IS A VERY STRONG MOTIVATION THAT I THINK PEOPLE ARE ONLY NOW REALLY UNDERSTANDING. AND FOLKS ARE GOING TO START LOOKING FOR WAYS TO GROW THAT POOL, LOTS OF DIFFERENT WAYS TO GROW THAT POOL. THERE IS SOME VERY CREATIVE STUFF GOING ON ALREADY IN REACHING OUT ACROSS PRIVATE AND PUBLIC SECTOR BOUNDARIES TO POOL CERTAIN KINDS OF DATA. I'VE BEEN WATCHING THE WORK THAT THE SAGE BIOINFORMATICS COMMONS HAS DONE, WHICH I THINK HAS BEEN VERY CREATIVE IN THINKING ABOUT HOW ONE MIGHT, FOR EXAMPLE, RECYCLE THE CONTROL ARMS OF CLINICAL TRIALS TO ENRICH OUR RECORDS BASE HERE. WHO HOLDS THESE RECORDS? HOW DO THEY AMASS THEM? HOW DO THEY GET SHARED AND COMPUTED UPON, AND WHAT HAPPENS TO THE RESULTS? THESE CAN BE VERY CENTRAL ISSUES, AND IT'S VERY INTERESTING BECAUSE THESE RECORDS DON'T COME OUT OF PLACES THAT ARE FAMILIAR TO LIBRARIES, IN SOME SENSE, BY AND LARGE. THEY COME OUT OF IT OPERATIONS. THEY COME OUT OF RECORDS MANAGEMENT. THEY COME OUT OF THE PROCESSES OF DELIVERING MEDICAL CARE OR THE PROCESSES OF DIAGNOSIS AND CHARACTERIZATION OF HUMANS.
YET I THINK, I REALLY THINK THAT MEDICAL LIBRARIES MAY WIND UP WITH A CONSIDERABLY MORE SUBSTANTIAL ROLE IN MANAGING THESE RECORDS AND HELPING TO INTEGRATE THEM WITH THE REST OF THE KNOWLEDGE BASE THAN PEOPLE MIGHT EXPECT RIGHT NOW. I WANT TO MAKE A FEW OBSERVATIONS ABOUT THE KIND OF PRAGMATICS OF DEALING WITH MEDICAL RECORDS RIGHT NOW FROM A POLICY POINT OF VIEW. I WAS IN THE UK TWO WEEKS AGO. DAVID CAMERON, THE PRIME MINISTER, WAS MAKING ANNOUNCEMENTS ON THE TELEVISION WHEN I WAS THERE THAT EVERYBODY IN THE UK NATIONAL HEALTH SERVICE WAS GOING TO BE A RESEARCH SUBJECT, UNLESS THEY OPTED OUT, THAT IT WAS HIS INTENTION TO BASICALLY TAKE THE MASS OF RECORDS THAT WERE HELD BY THE NATIONAL HEALTH SERVICE, AND REMEMBER, IN THE UK THINGS ARE MUCH MORE CENTRALIZED. THERE IS A NATIONAL HEALTH SERVICE AND THEREFORE THERE IS A REALLY BIG POOL OF RECORDS THAT'S KIND OF VIEWED AS A NATIONAL ASSET. HE WAS GOING TO MAKE THOSE RECORDS AVAILABLE, UNLESS INDIVIDUALS OPTED OUT, TO VARIOUS SORTS OF RESEARCH ACTIVITIES IN MEDICINE AND BIOMEDICAL SCIENCES. VERY INTERESTING. IT WAS HARD TO GAUGE THE PUBLIC REACTION THERE. I THINK THAT, YOU KNOW, THERE IS A GENERAL SORT OF SENSE IN THE UK THAT THEY ARE LOOKING FOR ALMOST ANYTHING TO CUT COSTS AND GENERATE REVENUE, AND CERTAINLY, AS WITH EVERYWHERE ELSE, HEALTHCARE IS A SUBSTANTIAL COST CENTER. DOES THIS SOUND AT ALL FAMILIAR? I THINK WE'RE DOING A LOT OF THIS HERE. NOW, HERE IN THE STATES, OF COURSE, WE DON'T HAVE MEDICAL RECORDS ALL NEATLY STASHED IN THE NATIONAL HEALTH SERVICE. WE HAVE THEM SCATTERED AROUND THROUGH VARIOUS KINDS OF HOSPITAL SYSTEMS, SOME OF WHICH ARE MINING THEM INTERNALLY FOR VARIOUS REASONS, AND WE ACTUALLY HAVE A CONSIDERABLE LEVEL, I WOULD SAY, OF DISTRUST ABOUT THE WAY IN WHICH MEDICAL RECORDS CAN BE USED AGAINST YOU.
I WOULD SUSPECT, AND I WOULD SAY THIS WITHOUT HAVING TRIED TO REALLY TRACK THIS DOWN IN PUBLIC OPINION POLLS AND UNDERSTANDING FULLY THAT THERE ARE LAWS AGAINST SOME OF THESE ACTIVITIES, THAT MOST PEOPLE IN THIS COUNTRY PROBABLY ASSUME THAT MEDICAL RECORDS ARE AS MUCH OF A THREAT AS ANYTHING ELSE BECAUSE THEY ARE A METHOD BY WHICH INSURERS CAN REFUSE TO COVER YOU, EMPLOYERS CAN DISCRIMINATE AGAINST YOU AND THINGS OF THAT NATURE, THAT THAT INFORMATION IS VERY MUCH SUBJECT TO MISUSE IN WAYS THAT CAN BE VERY DAMAGING. I THINK PEOPLE HERE, AT LEAST IN THIS COUNTRY, WORRY ABOUT THAT A TREMENDOUS AMOUNT. THEY DON'T HAVE A PROBLEM WITH MEDICAL RESEARCH, BUT THEY HAVE A PROBLEM WITH THE OTHER PURPOSES TO WHICH THIS DATA CAN BE PUT. AND WE'RE GOING TO FIND THAT RESOLVING THAT SET OF CONCERNS IS GOING TO TURN OUT TO BE VERY ESSENTIAL TO PROGRESS IN MEDICINE, AS WE LOOK TO THE NEXT COUPLE OF DECADES. I'D NOTE THE DEVELOPMENT OF SOMETHING ELSE THAT'S REALLY INTERESTING THAT IS TAKING PLACE IN OTHER COUNTRIES AS WELL AS IN THE STATES BUT SEEMS TO HAVE A PARTICULARLY STRONG FOOTHOLD IN THE STATES, AND THAT'S THE NOTION OF CONSUMER GENETICS, IF YOU WILL. COMPANIES LIKE 23ANDME, AND THERE ARE A BUNCH OF THEM, WILL BE MORE THAN HAPPY RIGHT NOW, FOR A COUPLE OF HUNDRED BUCKS CHARGED TO YOUR CREDIT CARD, TO DO SOME REASONABLY EXTENSIVE GENOTYPING ON YOU. THEY WILL SEQUENCE YOU AS TECHNOLOGY GETS CHEAPER, AND THERE ARE THREE OR FOUR THINGS THAT ARE REALLY INTERESTING TO NOTE ABOUT THESE DEVELOPMENTS. FIRST OFF, THIS IS SOMETHING THAT HAPPENS SORT OF OUTSIDE THE MEDICAL SYSTEM. IT'S BETWEEN AN INDIVIDUAL AND ONE OF THESE COMPANIES. THE STUFF DOESN'T END UP IN YOUR MEDICAL RECORDS UNLESS YOU PUT IT THERE, AND WE'LL COME BACK TO THAT IN A MINUTE. BUT THE FACT THAT IT DOESN'T END UP THERE BY DEFAULT IS VERY POWERFUL.
THERE ARE A NUMBER OF PEOPLE WHO ARE VERY UNHAPPY WITH THESE SERVICES AND HAVE TRIED TO SHUT THEM DOWN, NOTABLY IN CALIFORNIA, UNDER THE PRESUMPTION THAT PEOPLE SHOULDN'T BE ENTITLED TO THIS KIND OF INFORMATION, ESPECIALLY NOT OUTSIDE THE MEDICAL SYSTEM. I THINK THAT THAT IS A POSITION THAT WILL END UP HAVING ZERO TRACTION, BUT I THINK THAT NONETHELESS WE MAY HAVE TO HAVE A POLITICAL FIGHT ABOUT IT BECAUSE THERE ARE SOME LOBBIES AND VESTED INTERESTS THAT MAY THINK OTHERWISE. BUT I THINK AGAIN, IF YOU SORT OF LOOK AT PUBLIC OPINION THERE, IT IS SO DRAMATICALLY SKEWED THAT WAY THAT IN THE END IT'S GOING TO SETTLE OUT. NOW, WHILE MANY PEOPLE WHO DO THIS ARE PROBABLY GLAD TO HAVE THIS INFORMATION FOR THEIR OWN PURPOSES, ONE OF THE THINGS THAT WE'RE SEEING THAT'S REALLY, REALLY INTERESTING IS THAT THERE ARE GROUPS OF PEOPLE WHO ARE SHARING THIS DATA. THEY'RE EITHER SHARING IT IN CLOSED COMMUNITIES OR IN SOME CASES THEY'RE SHARING IT IN VERY PUBLIC WAYS. THERE ARE ACTUALLY PEOPLE WHO, SORT OF IN THE GREAT SPIRIT OF FACEBOOK AND TWITTER, ARE SHARING THEIR GENOMICS, THEIR VITAL SIGNS READINGS, THEIR EATING HABITS, AND EVERYTHING ELSE ON THE NETWORK IN VAST QUANTITIES. SO YOU ACTUALLY ARE SEEING SOME MOVE TOWARDS SHARING OR PERHAPS OVERSHARING, DEPENDING ON YOUR POINT OF VIEW, THIS KIND OF INFORMATION TO BE HARVESTED. AND I THINK THESE KINDS OF THINGS ARE ALSO GOING TO COME INTO PLAY IN THIS DISCUSSION ABOUT MEDICAL RECORDS AND WHERE THEY FIT. WE'RE ALREADY SEEING SOME VERY, VERY INTERESTING DEVELOPMENTS HERE.
ONE EXAMPLE IS WHERE YOU GET COMMUNITIES OF PEOPLE WHO ARE DEALING WITH A SPECIFIC, OFTEN FAIRLY RARE DISEASE, AND YOU START CONNECTING THEM UP WITH THESE KINDS OF SEQUENCING ACTIVITIES OR GENOTYPING ACTIVITIES AND START BUILDING UP THESE DATA COLLECTIONS THAT ARE DRIVEN IN VARIOUS MIXES BY PATIENTS AND RESEARCHERS, AND THE MIX VARIES A GREAT DEAL FROM SITUATION TO SITUATION, THAT BECOME PART OF THESE KNOWLEDGE BASES, AND I THINK WE'RE GOING TO SEE A TREMENDOUS AMOUNT MORE OF THAT KIND OF THING GOING ON. RIGHT NOW THESE THINGS OFTEN HAVE VERY UNSTABLE HOSTING. THEY ARE, YOU KNOW, ON PLACES LIKE GOOGLE GROUPS OR YAHOO MAILING LISTS, OR THIS SORT OF THING, THAT DON'T NECESSARILY HAVE COMMITMENTS TO LONG-TERM PRESERVATION. I THINK THAT THESE ARE GOING TO NEED TO BE ADOPTED, IN MANY CASES, AND THERE IS GOING TO NEED TO BE THOUGHT GIVEN TO HOW THEY INTEGRATE INTO THE BIG REUSABLE KNOWLEDGE BASE OF MEDICINE. SO I JUST WANT TO NOTE A FEW MORE IDEAS CONNECTED WITH THIS BEFORE WINDING THIS UP. WHEN WE THINK ABOUT THESE QUESTIONS ABOUT GENOMIC SEQUENCING AND MEDICAL RECORDS AND HOW TO MAKE MORE OF THOSE AVAILABLE AS A PUBLIC GOOD, IF YOU WILL, AS PART OF THIS COLLECTIVE KNOWLEDGE BASE, I THINK THERE ARE A NUMBER OF THINGS THAT COME UP RATHER QUICKLY. ONE IS THIS QUESTION ABOUT LIVING PEOPLE MAKING THEIR DATA AVAILABLE AS A CONTRIBUTION TO THE PUBLIC GOOD. THAT'S A VERY PROMISING AREA. THERE ARE PEOPLE WHO I THINK WILL DO THAT. THERE ARE A LOT MORE PEOPLE, I THINK, WHO WILL DO THAT IF THEY CAN GAIN CONFIDENCE THAT THIS DATA WON'T BE USED AGAINST THEM. THERE IS A QUESTION OF WHAT HAPPENS TO THE RECORDS OF DEAD PEOPLE. UNTIL NOW THIS REALLY HASN'T BEEN MUCH OF AN ISSUE. I MEAN, YES, THERE ARE SOME, YOU KNOW, SORT OF LARGE-SCALE STUDIES THAT LOOK AT COHORTS OF PEOPLE THROUGHOUT THEIR LIFE AND CONSIDER PEOPLE IN THOSE COHORTS WHO HAVE DIED.
BUT ONE CAN ACTUALLY IMAGINE IN AN ERA OF GOOD ELECTRONIC MEDICAL RECORDS THAT THIS IS A DATABASE THAT GROWS AND GROWS AND GROWS AND THAT YOU CARE NOT JUST ABOUT LIVING PEOPLE FOR THE PURPOSES OF COMPUTATION FOR DISCOVERY BUT ALSO DEAD PEOPLE. SO WE HAVE THIS QUESTION OF WHO OWNS THE MEDICAL RECORDS OF DEAD PEOPLE AND UNDER WHAT CIRCUMSTANCES CAN THEY BE SHARED? UNDER WHAT CIRCUMSTANCES ARE THEY DESTROYED? WHERE DO THEY GO? WHO COLLECTS THESE? WHAT PART OF THE SORT OF KNOWLEDGE BASE OF MEDICINE ARE THESE GOING TO COMPRISE? AND ONE CAN IMAGINE THAT EVEN IF THERE IS TREMENDOUS HESITATION ABOUT MAKING MEDICAL RECORDS PUBLIC WHILE YOU'RE ALIVE, WHEN YOU'RE DEAD, IT DOESN'T MATTER THAT MUCH TO A LOT OF PEOPLE. COULD THIS BECOME ANOTHER BOX ON AN ORGAN DONOR CARD THAT SAYS, YES, YOU KNOW, IF MY ORGANS CAN BE OF ANY USE, USE THEM, AND BY THE WAY, IF MY MEDICAL RECORDS CAN BE OF ANY USE, ADD THEM TO THE DATABASE? THESE RECORDS REPRESENT A CONTRIBUTION TO COLLECTIVE KNOWLEDGE AND WELL-BEING IN A VERY SIMILAR KIND OF A WAY, AND ONE CAN ABSOLUTELY IMAGINE THOSE KINDS OF IDEAS GAINING TRACTION IN THE PUBLIC SPHERE. WE HAVE SOME INTERESTING QUESTIONS ABOUT HOW THESE CONNECT UP TO OTHER AREAS, GENEALOGY BEING ONE. IT IS AMAZING THE NUMBER OF PEOPLE WHO ARE SERIOUSLY INTERESTED IN GENEALOGY AND THE GROWTH OF GENEALOGICAL DATABASES AND GENEALOGICAL RESEARCH. THAT IS NOT ENTIRELY DISCONNECTED FROM SOME OF THESE STUDIES, ESPECIALLY AS WE THINK ABOUT GOING FORWARD. THE WHOLE WAY WE THINK ABOUT RECORDS OF PEOPLE, BE IT IN A BIOGRAPHY CONTEXT, A GENEALOGY CONTEXT, A MEDICAL CONTEXT -- I BELIEVE THAT THIS IS GOING TO BE FAIRLY SERIOUSLY REASSESSED AND RENEGOTIATED OVER THE NEXT 20 OR 30 YEARS, THE NEXT COUPLE OF GENERATIONS, AND THAT WE'RE ALREADY SEEING LOTS OF LITTLE SIGNS AND HARBINGERS OF THIS. BUT THE STAKES ON THIS ARE GOING TO BE VERY SIGNIFICANT FOR MEDICAL RESEARCH. NOW, WE HAVE A FEW OTHER LITTLE PROBLEMS TO SORT OUT HERE.
FOR EXAMPLE, WE HAVE AN ELABORATE SYSTEM WHICH GOVERNS THE ETHICAL PURSUIT OF MEDICAL EXPERIMENTATION THROUGH INSTITUTIONAL REVIEW BOARDS AND THINGS OF THAT NATURE. THEIR TOUCHSTONES, FOR REASONS THAT ARE FAIRLY EASY TO UNDERSTAND, HAVE BEEN ABOUT THINGS LIKE INFORMED CONSENT, WHERE YOU ARE FAIRLY CLEAR ABOUT WHY IS THIS DATA BEING COLLECTED AND FOR WHAT PURPOSE AND HOW WILL IT BE REUSED? UNFORTUNATELY, IN AN ERA OF MASSIVE DATABASES, TEXT MINING, COMPUTATIONAL DEDUCTION, YOU CAN'T ANSWER THESE QUESTIONS VERY EASILY. WHAT YOU'D REALLY LIKE TO SAY IS, WELL, WE'RE COLLECTING THIS DATA FOR THIS REASON AND BY THE WAY, WE'D LIKE TO DATABASE IT AND USE IT FOR ANYTHING ELSE WE CAN THINK OF BECAUSE WE'RE JUST LEARNING ALL THE THINGS WE CAN DO WITH REALLY BIG DATABASES. AND THERE IS A REAL TENSION OR CONTRADICTION THERE THAT I THINK DESPERATELY NEEDS SOME CAREFUL REEXAMINATION. WE HAVE AN ENORMOUS FRICTION EMERGING BETWEEN ALL OF THE IDEAS COMING OUT OF DATA-INTENSIVE SCIENCE AND THE SORT OF TRADITIONS THAT UNDERLIE THE PROTECTION OF HUMANS IN EXPERIMENTAL WORK AND THE ETHICS OF HUMAN EXPERIMENTATION. I DON'T KNOW HOW OR WHERE WE HAVE THAT CONVERSATION EXACTLY. I DO KNOW THAT SOME OF THE REGULATORY STUFF IN THIS AREA IS BEING REVIEWED AT THIS POINT, BUT MY SENSE IS THAT AT LEAST LARGE PARTS OF THIS ARE REALLY ABOUT CONTROLLING THE SPREAD OF THIS INTO SOCIAL SCIENCES RATHER THAN, YOU KNOW, THE SORT OF CORE FOCUS HERE AND HOW THIS CONFLICTS WITH THE PROSPECTS OF DATA-INTENSIVE SCIENCE. BUT IT'S CLEAR THAT WE'RE GOING TO NEED TO HAVE THESE CONVERSATIONS SOMEHOW IF THE OPPORTUNITIES HERE ARE GOING TO BE REALIZED. AND PART OF THIS WILL BE, IN A SENSE, DECIDING WHAT OPPORTUNITIES WE WANT TO REALIZE AND WHICH ONES WE'RE JUST UNCOMFORTABLE WITH. ANOTHER PIECE OF THIS THAT I THINK WE CHEERFULLY FORGET IS THAT THE UNITED STATES ISN'T GOING TO DO THIS BY ITSELF.
THERE IS A WHOLE WORLD OUT THERE, AND OTHER COUNTRIES -- WE TALKED ABOUT THE UK -- ARE ALSO AMASSING THESE KINDS OF KNOWLEDGE BASES. NOW, WHAT'S REALLY INTERESTING HERE IS THAT FUNDAMENTAL BIOLOGICAL KNOWLEDGE, I THINK WITH THE EXCEPTION OF A FEW BITS ABOUT NASTY VIRUSES, IS REALLY VIEWED AS A GLOBAL RESOURCE. AND THE NOTION OF SCIENCE NOT HAVING, OR FREELY CROSSING, INTERNATIONAL BOUNDARIES IS VERY REAL AND HAS BEEN RESPECTED IN THE LITERATURE AND THE WAY SCIENCE IS COMMUNICATED. NOW WE SEE THE PROSPECT OF SCIENTISTS IN DIFFERENT NATIONS SEEING DIFFERENT KNOWLEDGE BASES THAT AREN'T CONSISTENT. WE SEE THE KIND OF INTERESTING POSSIBILITIES OF POOLING THIS ACROSS NATIONS, WHICH, AGAIN, INCREASES THE RESOLUTION OF OUR COMPUTATIONAL INSTRUMENTS. I DON'T KNOW HOW WE SORT THIS OUT, BUT IT'S VERY CLEAR TO ME THAT THERE IS AN INTERNATIONAL ISSUE HERE, AND IT'S ONE THAT PROBABLY BEARS CONSIDERATION SOONER RATHER THAN LATER. SO, I WANT TO JUST CONCLUDE THESE COMMENTS AND SPECULATIONS -- AND THERE IS LOTS OF TIME FOR QUESTIONS -- WITH A COUPLE OF KIND OF RHETORICAL POINTS. SO I THINK I'VE ARGUED THAT THE SHAPE OF THE FUTURE MEDICAL KNOWLEDGE BASE IS GOING TO BE DETERMINED IN SIGNIFICANT PART NOT JUST BY ADVANCES IN BIOLOGICAL SCIENCE, NOT JUST BY COMPUTATIONAL ADVANCES, BUT BY THE INTEGRATION OF THESE RECORDS, AND THAT THE ISSUES AROUND THE INTEGRATION OF THESE RECORDS ARE AT LEAST AS MUCH SOCIAL, POLITICAL, REGULATORY, PUBLIC POLICY-DRIVEN AS THEY ARE TECHNICALLY DRIVEN OR SCIENTIFICALLY DRIVEN. SO ONE OF THE QUESTIONS THAT YOU RUN INTO HERE IS ONE OF PUBLIC TRUST. WHAT ORGANIZATIONS, WHAT INSTITUTIONS, WHAT AGENCIES DOES THE PUBLIC TRUST TO MAINTAIN DATA, TO USE IT WISELY, TO USE IT IN WAYS THAT ARE CONSISTENT WITH THE WAYS THEY SAY IT'S GOING TO BE USED, TO BE TRANSPARENT ABOUT WHAT THEY'RE DOING? WELL, ODDLY ENOUGH, LIBRARIES ARE STILL PRETTY HIGH ON THAT LIST.
THERE ARE A LOT OF OTHER INSTITUTIONS THAT I'D SAY ARE PRETTY LOW ON THAT LIST, BUT LIBRARIES ARE PRETTY HIGH ON THAT LIST, AND I THINK THAT IS ANOTHER CONNECTION THAT WE REALLY SHOULD NOT OVERLOOK, AS WE THINK ABOUT THE ROLE OF LIBRARIES AS STEWARDS FOR THESE KNOWLEDGE BASES OF THE FUTURE, WHICH INTEGRATE DATA, LITERATURE, AND RECORDS AND PROVIDE A COMPUTATIONAL PLATFORM FOR ALL OF THEM. I THINK THAT, WHEN WE TALK SPECIFICALLY ABOUT THE FUTURE OF MEDICAL LIBRARIES AND THE KNOWLEDGE BASES THAT DRIVE MEDICINE, WE CAN SEE IT STARTING TO SPLIT OFF FROM THE UNDERLYING BIOLOGY KNOWLEDGE BASES BECAUSE OF THAT INCLUSION OF RECORDS. I THINK THAT THIS RAISES SOME VERY INTERESTING QUESTIONS ABOUT THE ROLE OF LIBRARIES WITHIN SPECIFICALLY MEDICAL ORGANIZATIONS AND HOW THIS CONNECTS TO RECORDS MANAGEMENT AND TO I.T. WITHIN THESE ORGANIZATIONS. I THINK IT BEGS VERY COMPELLING QUESTIONS ABOUT WHAT OUGHT TO BE DONE ON A NATIONAL BASIS AND WHAT OUGHT TO BE DONE ON A MUCH MORE DISTRIBUTED BASIS AND, WHEN THINGS ARE DONE ON A DISTRIBUTED BASIS, HOW THIS IS ROLLED UP TO FACILITATE LARGE-SCALE ANALYSIS. I DON'T HAVE THE ANSWERS TO THOSE SPECIFICALLY, BUT I DO THINK THAT WE ARE VERY FORTUNATE IN HAVING A NATIONAL LIBRARY OF MEDICINE THAT IS AVAILABLE AS A POTENTIAL FOCAL POINT AND POTENTIAL STEWARD FOR MANY OF THESE DEVELOPMENTS AS WE GO FORWARD. THANK YOU. [APPLAUSE] >> THANK YOU VERY MUCH, CLIFFORD, FOR THAT GREAT TALK. I KNOW THAT CLIFFORD'S ALWAYS READY TO ENGAGE WITH THE AUDIENCE AND TAKE QUESTIONS. THOSE OF YOU WHO ARE SITTING THERE, IF YOU HAVE A QUESTION, REMEMBER TO TURN YOUR MICROPHONE ON SO WE'LL PICK IT UP. ANYONE HAVE ANY COMMENTS THEY WOULD LIKE TO MAKE? YES. >> HI, CLIFF. THANK YOU FOR A GREAT TALK. SO ONE OF THE BIG DRIVERS OF BIG DATA ANALYSIS, OF COURSE, IS COMMERCIAL USES, AND ONE OF THE LARGEST COMMERCIAL USES OF ANALYSIS IS BUYING DATA AND BEHAVIOR ON THE INTERNET. 
WHAT ABOUT THOSE TECHNIQUES BEING USED FOR BIOMEDICAL DATA ANALYSIS AS WELL AS THE INTEGRATION OF THOSE RECORDS? ANY THOUGHTS ON THAT? >> INDEED, BEING DRIVEN BY COMMERCIAL WORK. IT'S REALLY KIND OF INTERESTING THAT SOME OF THE LARGEST COMPUTATIONS IN THIS AREA AREN'T BEING DONE BY SCIENTISTS ANYMORE IN LABS, LOOKING AT PURELY SCIENTIFIC RESULTS. THEY'RE BEING DONE BY PEOPLE TRYING TO SELL THINGS OR TO ANALYZE VARIOUS KINDS OF HUMAN BEHAVIOR FOR, I THINK, SOMETIMES COMMERCIAL REASONS, SOMETIMES GOVERNMENTAL REASONS. ONE OF THE THINGS THAT HAS TAKEN ON JUST UNBELIEVABLE SCALE IN THE LAST DECADE IS COMPUTATION ABOUT SOCIAL BEHAVIOR, WHICH IS JUST PHENOMENAL. I THINK IT'S IMPORTANT THAT THE FOLKS IN THE MEDICAL WORLD STAY CONNECTED TO THESE DEVELOPMENTS, AND I THINK YOU'RE GOING TO SEE NEW SPECIALTIES SHOW UP. THERE ARE ALREADY LOTS OF PEOPLE IN THE BIOINFORMATICS WORLD WHO ARE HEAVILY INVOLVED IN THIS, BUT YOU ARE GOING TO, I THINK, SEE THESE TECHNIQUES MOVE INTO PUBLIC HEALTH AREAS, INTO EPIDEMIOLOGY -- THOSE KINDS OF THINGS. AND FURTHER DOWNSTREAM, TOO. THIS GETS BOTH INTERESTING AND CREEPY TO THINK ABOUT. IF YOU JUST START THINKING ABOUT WHAT'S OUT THERE AND WHAT'S DOABLE AND DON'T WORRY ABOUT DATA PRIVACY OR WHOSE ARM YOU'D HAVE TO TWIST TO INTEGRATE THE DATA SETS, THERE IS ACTUALLY, YOU KNOW, ALL KINDS OF DATA THAT COULD BE INTEGRATED TO ANSWER VARIOUS PROBLEMS. HERE ARE A COUPLE JUST OFF THE TOP OF MY HEAD. SO THERE HAS BEEN THIS ONGOING ARGUMENT ABOUT WHETHER PEOPLE WHO TALK ON CELL PHONES A LOT ARE MORE LIKELY TO GET VARIOUS KINDS OF BRAIN CANCER. SO YOU'VE ACTUALLY GOT DATABASES FOR BIG CELL PHONE COMPANIES THAT WILL GIVE YOU A REAL SENSE OF HOW MUCH TIME PEOPLE SPEND ON THE PHONE. IT WON'T TELL YOU WHETHER THEY'RE USING AN EARPIECE OR NOT, ALTHOUGH IF THEY ARE USING AN EARPIECE, THEY MIGHT HAVE BOUGHT IT AND YOU MIGHT BE ABLE TO FIND THAT OUT. YOU REALLY COULD DO SORT OF A SOCIAL-LEVEL STUDY ON THAT FAIRLY EASILY. 
NOW, YOU KNOW, YES, WHAT IT WOULD TELL YOU IS EQUIVOCAL, BUT THESE ARE TWO SILOS OF DATA SITTING AROUND. YOU ACTUALLY CAN GET PRETTY GOOD DATA ON THE GENERAL EATING HABITS OF A SUBSTANTIAL AMOUNT OF THE POPULATION. THE MARKETING PEOPLE HAVE THAT. YOU KNOW, ACROSS TIME. AND YOU COULD IN THEORY CORRELATE THAT, RATHER THAN RELYING ON SELF-REPORTING OR OTHER SORTS OF THINGS, TO DO LARGE-SCALE STUDIES ABOUT THE INTERACTION OF DIET AND THIS AND THAT. AS I SAY, THESE THINGS GET KIND OF CREEPY AS YOU THINK ABOUT THEM, BUT WE ARE IN A WORLD NOW WHERE SOCIAL BEHAVIOR AND DATA SETS REPRESENTING IT ARE JUST ABUNDANT. ONE OF THE QUESTIONS THAT I CAN'T FIND ANYBODY WHO IS WILLING TO TALK ABOUT VERY MUCH IS WHAT HAPPENS TO OLD MARKETING DATA? IT'S PROBABLY GOT ONLY A USEFUL MAYBE FIVE-YEAR LIFE SPAN FOR MINING FOR IMMEDIATE KINDS OF PURPOSES BECAUSE PEOPLE'S TASTES CHANGE AND BEHAVIORS CHANGE. BUT I HAVE THIS FEELING THEY ARE NOT THROWING IT OUT, THAT IT JUST KIND OF GOES INTO SOME SILO IN THE BASEMENT OF THE CORPORATE DATA CENTER. AND ONE WONDERS ABOUT WHAT SHOULD HAPPEN TO THAT IN THE LONG TERM? SHOULD THAT SOMEHOW MOVE PUBLIC IN THE SAME WAY THAT WE OPEN UP THE DETAILS OF CENSUSES AFTER 70 YEARS? SO THAT WE CAN AT LEAST, YOU KNOW, DO HISTORICAL ANALYSIS OF THESE KINDS OF THINGS? THESE ARE THE SORT OF POLICY QUESTIONS THAT THE BIG DATA ENVIRONMENT RAISES AND THAT SEEM TO ME TO BE RELATIVELY UNTOUCHED. MANY OF THEM HAVE AT LEAST PERIPHERAL RELEVANCE TO ALL KINDS OF MEDICAL AND PUBLIC HEALTH RESEARCH. >> YES. >> THANK YOU FOR THE TALK. I THINK ONE OF THE FOCUSES IS ABOUT THE GROWING AMOUNT OF INFORMATION AND THE COMPLEXITY OF THE INFORMATION AND IT SEEMS THAT THERE IS AN EVEN BIGGER CHALLENGE NOW IN TRANSLATING THAT INFORMATION FOR THE PUBLIC'S CONSUMPTION, PARTICULARLY PEOPLE WHO AREN'T TRAINED IN THOSE AREAS OF SPECIALTY. I WAS JUST WONDERING IF YOU CAN COMMENT ON THAT. >> OH, THAT'S SUCH AN INTERESTING QUESTION. 
AND THIS IS ONE THAT SHOWS UP WITH A GREAT VENGEANCE IN THESE DEBATES ABOUT THESE SERVICES THAT WILL DO GENOTYPING ON YOU BECAUSE IT TURNS OUT THAT THERE WERE SOME STUDIES DONE OF THESE, AND IN TERMS OF THE LABWORK, IF YOU WILL, THEY ARE PRETTY GOOD AND THEY ARE PRETTY CONSISTENT. IF THEY TELL YOU YOU'VE GOT SOMETHING, IT'S GENERALLY QUITE ACCURATE. THE PLACE WHERE EVERYBODY STARTED GETTING WORRIED IS NOT SO MUCH YOUR BEING TOLD THAT YOU HAVE A PARTICULAR VARIATION, BUT WHEN THEY START SUGGESTING WHAT THAT MIGHT MEAN, YOU KNOW, THAT THIS GIVES YOU -- THIS PREDISPOSES YOU TOWARD SOMETHING, AND THEN THERE IS A QUESTION OF, WELL, CAN YOU QUANTIFY THAT PREDISPOSITION? AND THAT'S WHERE -- THAT'S THE EXCUSE THAT WAS SUGGESTED IN TERMS OF SHUTTING THESE SERVICES DOWN BECAUSE ONCE THEY STARTED HELPING YOU TO INTERPRET IT, IT STARTED SOUNDING KIND OF MEDICAL. AND I'M MORE SENSITIVE TO THAT AND ACCEPTING OF THAT ARGUMENT THAN THE BUSINESS ABOUT THE RAW DATA. I THINK WE ACTUALLY HAVE TWO PHENOMENA GOING ON. WE HAVE A LOT MORE OF THE PUBLIC THAN EVER BEFORE FINDING THEIR WAY INTO RESEARCH-LEVEL DATA AND STARTING TO INTERPRET IT FOR THEMSELVES. AND I THINK WHAT WE ARE FINDING, ALTHOUGH WE DON'T ALWAYS LIKE TO ADMIT IT, IS THAT THERE IS A CONSIDERABLE SLICE OF THE PUBLIC THAT, IF THEY ARE WILLING TO PUT IN THE TIME AND ENERGY, CAN DO PRETTY WELL AT THAT. IT PRESUPPOSES THAT THEY HAVE A BASIC COMMAND OF SCIENCE AND STATISTICS AND A FEW OTHER THINGS AND THAT THEY ARE WILLING TO PUT SOME TIME IN ON IT. BUT THERE ACTUALLY IS A SECTOR OF THE PUBLIC THAT I THINK IS DOING VERY WELL THERE. BUT THEN THERE IS A BIG SECTOR OF THE PUBLIC THAT HAS GOT A TERRIBLE PROBLEM DEALING WITH THIS STUFF BECAUSE THEY ARE RELATIVELY INNUMERATE AND AREN'T APT WITH STATISTICS, AND BASIC STUFF LIKE THE VOCABULARY AND WRITING LEVEL THAT'S TYPICAL OF SCIENTIFIC PAPERS IS A PROBLEM. AND SO WE'VE SEEN THE GROWTH OF THIS WHOLE SORT OF TRANSLATIONAL ROLE. NLM DOES SOME OF THIS. 
A LOT OF THE BIG MEDICAL CENTERS DO SOME OF THIS. THERE ARE COMMERCIAL PLAYERS IN HERE. FOR BETTER OR WORSE THE POPULAR PRESS DOES SOME OF THIS. AND I THINK THAT'S FILLED A REAL NEED, TOO. BUT THE -- I THINK THE STAKES ARE GOING TO GO UP A LOT IN THE NEXT 20 YEARS BECAUSE THE DATA IS GOING TO GET MORE AND MORE COMPLEX. IT'S GOING TO BECOME MORE AND MORE PERSONAL, AND TRYING TO SORT OUT YOUR OWN PERSONAL RISKS AND THE CHOICES YOU WANT TO MAKE ABOUT THEM IS GOING TO GET TO BE A VERY COMPLEX BUSINESS. I DON'T KNOW HOW WE FIX THAT. I SUSPECT THAT WE PROBABLY CAN DEVELOP SOME COMPUTER-BASED TOOLS THAT HELP PEOPLE FIND THEIR WAY THROUGH SOME OF THAT, THAT, RATHER THAN JUST REINTERPRETING THE LITERATURE, CAN HELP THEM TO INTERPRET DATA ABOUT THEMSELVES. BUT I ALSO AM CAUTIOUS ABOUT HOW FAR THOSE TOOLS CAN REALLY GO EFFECTIVELY. I THINK IT'S FAIRLY CLEAR YOU CAN DO SOME THINGS. I ALSO WORRY THAT WE'RE GOING TO GET INTO A LOT OF SITUATIONS WHERE IN AN IDEAL WORLD THE ANSWER IS ONE THING. IN A WORLD OF FAIRLY SEVERE ECONOMIC CONSTRAINTS WE'RE CHOOSING TO DO SOMETHING ELSE, WHICH IS SORT OF CLEARLY PRETTY SUBOPTIMAL BUT THAT'S WHAT WE'RE DOING BECAUSE THAT'S WHAT WE FEEL WE CAN AFFORD. I WISH I HAD A BETTER ANSWER FOR THAT. I DO THINK, THOUGH, KIND OF REFLECTING ON THIS, THAT ONE OF THE THINGS I REALLY DO THINK IS JUST A WONDERFUL, WONDERFUL BYPRODUCT OF THE MOVE TO OPEN ACCESS, THE DEVELOPMENT OF THE INTERNET, THE KIND OF PUBLIC RESOURCES THAT INSTITUTIONS LIKE NLM HAVE BEEN CREATING, IS THE WAY WE'VE OPENED UP THIS WEALTH OF INFORMATION TO PARTS OF THE GENERAL PUBLIC THAT ARE PREPARED TO STEP UP TO THE CHALLENGES OF USING IT. AT THE SORT OF HIGHER END OF THE PUBLIC, THAT'S THE MORE EDUCATED PUBLIC THAT'S WILLING TO INVEST TIME IN LEARNING MORE ABOUT THINGS THAT INTEREST THEM, WHETHER IT'S ABOUT, YOU KNOW, MEDICINE OR WHETHER IT'S ABOUT ANY NUMBER OF OTHER SCHOLARLY PURSUITS. IT REALLY IS JUST AN AGE OF ABUNDANCE WE'VE CREATED. 
AND IT'S ONE THAT I THINK, FOR EXAMPLE, WILL SERVE FUTURE GENERATIONS OF GIFTED KIDS IN A PHENOMENAL WAY THAT WE'VE NEVER BEEN ABLE TO DO BEFORE. OKAY. WE'VE GOT ONE HERE AND THEN I'LL GO BACK THERE. OKAY. >> THANKS, CLIFF. I APPRECIATE HEARING ALL THE EMPHASIS YOU PUT ON NEW SOURCES OF DATA THAT ARE GOING TO INFORM MEDICINE AND LIFE SCIENCES AND I THINK MY QUESTION IS REALLY, AS THE PROVISION OF DATA CONTINUES AND IT'S AVAILABLE FROM DIFFERENT PLACES AND IT'S FROM TWITTER FEEDS AS WELL AS FROM ELECTRONIC HEALTH RECORDS, WHAT DO YOU SEE AS THE ROLE THEN OF THE LIBRARY COMMUNITY MORE BROADLY IN HELPING ORGANIZE THAT INFORMATION OR DIRECT PEOPLE TO IT, OR IS IT MORE A SENSE OF TRYING TO HELP DEVELOP FORMATS AND STANDARDS AND SO FORTH THAT ENSURE THAT THERE IS SOME KIND OF SORTING THROUGH THE CACOPHONY OF DATA THAT MIGHT OTHERWISE ARISE? >> I THINK WE GET INTO SOME SORT OF ANNOYING SEMANTIC PROBLEMS HERE BECAUSE WE TALK ABOUT THE ROLE OF THE LIBRARY COMMUNITY AND, YOU KNOW, REALLY WHAT YOU ARE GOING TO GET IS A MIXTURE OF PEOPLE WHO ARE DATA STEWARDS, PEOPLE WHO HELP WITH ACCESS AND DISCOVERY OF DATA, PEOPLE WHO DEVELOP ORGANIZATIONAL TOOLS AND SCHEMES AND ONTOLOGIES AND VOCABULARIES AND THIS SORT OF THING. PEOPLE WHO ARE TRANSLATORS. YOU ARE GOING TO FIND A WHOLE BUNCH OF DIFFERENT ROLES AND RIGHT NOW THESE ARE -- THE SKILLS FOR THIS ARE KIND OF SCATTERED AROUND BETWEEN INFORMATICS PEOPLE IN VARIOUS FIELDS -- LIBRARIANS, INFORMATION SCIENTISTS. I DON'T PARTICULARLY WANT TO TRY AND CAREFULLY FIGURE OUT ON MY FEET WHAT'S THE DIFFERENCE BETWEEN AN INFORMATICS PERSON AND AN INFORMATION SCIENTIST, BUT THEY DO SEEM TO BE DIFFERENT TRIBES AT LEAST, WITH KIND OF DIFFERENT PRACTICES AND VIEWS OF THE WORLD. I THINK THAT YOU ARE GOING TO SEE ALL OF THESE PROFESSIONS AND MORE COMING TOGETHER AROUND THE MAINTENANCE AND CREATION AND EXPLOITATION OF THESE KINDS OF DISCIPLINARY KNOWLEDGE BASES. 
SO I WORRY A LITTLE WHEN THE QUESTION GETS FRAMED AS THE ROLE OF LIBRARIES OR LIBRARIANS SPECIFICALLY. I THINK WE NEED TO -- WE PROBABLY NEED TO BE A LITTLE MORE FLEXIBLE ABOUT OUR CHARACTERIZATION OF PROFESSIONS AND ORGANIZATIONS. IS NCBI A LIBRARY? IS IT PART OF A LIBRARY? THAT'S KIND OF AN OBVIOUS CASE IN POINT. AND I THINK WE'RE HAVING TERRIBLE TROUBLE NOW, AS WE LOOK MORE BROADLY ACROSS DATA-INTENSIVE SCHOLARSHIP, WITH THESE QUESTIONS ABOUT SHOULD LIBRARIES BE -- SHOULD RESEARCH LIBRARIES BE OR INCLUDE RESEARCH DATA REPOSITORIES OR ARE THESE REALLY SOMETHING DIFFERENT? TO WHAT EXTENT CAN LIBRARIES CREDIBLY FILL THAT ROLE? THESE ARE HARD QUESTIONS, BUT THEY'RE ALSO QUESTIONS THAT ARE SHAPED BY SOME VERY SORT OF TRADITIONAL HABITS ABOUT HOW WE CHARACTERIZE PROFESSIONS AND INSTITUTIONS THAT MAY NOT BE AS USEFUL IN THE FUTURE AS THEY WERE IN THE PAST. >> HI. I HAVE A QUESTION ABOUT DATA CITATION. I'M WONDERING IF YOU HAVE ANY FEELING AS TO WHETHER THE PUBLISHERS ARE READY TO SORT OF EMBRACE AN EDITORIAL POLICY SAYING DATA SHOULD BE IN A SECTION, IT SHOULD BE ACKNOWLEDGED AND LOCATED SOMEWHERE WHERE WE CAN ALWAYS FIND IT, BECAUSE WE'VE GONE THROUGH SOMETHING RECENTLY LOOKING AT OUR OWN -- LOOKING FOR WHETHER PEOPLE ARE CITING OUR DATA PROPERLY, FINDING THAT AT LEAST, YES, MOST OF THE TIME THEY'RE CITING IT, WHICH IS VERY NICE, BUT IT CAN BE ALL OVER THE PLACE. SOMETIMES BURIED IN A FOOTNOTE. SOMETIMES BURIED IN SUPPLEMENTAL DATA. AND I'M JUST CURIOUS IF THERE SEEMS TO BE AN OBVIOUS MOVEMENT IN ANY DIRECTION. >> LET ME SAY A COUPLE THINGS ABOUT THAT. I THINK THE FIRST THING THAT'S WORTH SAYING IS THAT I THINK MOST PUBLISHERS IN MOST DISCIPLINES NOW HAVE FIGURED OUT THAT THEY DON'T WANT TO BE IN THE DATA BUSINESS, THAT THEY DON'T WANT TO USE THE SORT OF SUPPLEMENTARY INFORMATION PROVISIONS OF ARTICLES AS A PLACE FOR PUTTING IMPORTANT REFERENCE DATA. 
I MEAN, YOU MIGHT PUT SOME SPREADSHEETS IN THERE FOR SOMEONE TO WORK WITH OR SOME COMPUTER PROGRAMS OR SOME EXTRA GRAPHS OR SOMETHING, BUT THEY ARE NOT KEEN ABOUT HAVING SORT OF LONG-TERM ARCHIVAL RESPONSIBILITY FOR UNDERLYING DATA. THEY ARE TAKING THE VIEW THAT THAT BELONGS IN LIBRARIES OR DISCIPLINARY REPOSITORIES OR SOMEWHERE. SO THAT MEANS THAT YOU NEED A CITATION MECHANISM FOR IT. RIGHT NOW, AS I THINK YOU SAY, MY SENSE ALSO IS THAT THOSE PRACTICES ARE ALL OVER THE PLACE FROM JOURNAL TO JOURNAL AND INDEED AUTHOR TO AUTHOR. SOMETIMES IT'S JUST BURIED IN THE ACKNOWLEDGEMENTS. SOMETIMES, YOU KNOW, FOOTNOTES. SOMETIMES A FORMAL CITATION. YOU NAME IT. I THINK THAT THERE IS GREAT INTEREST IN DEVELOPING AND SEEING THE WIDESPREAD ADOPTION OF A SORT OF A STANDARD AND FORMAL CITATION MECHANISM, AND OF HAVING THAT DONE SPECIFICALLY SO THAT CITATIONS TO DATA CAN BE COUNTED AND COMPUTED ON IN MUCH THE SAME WAY CITATIONS TO OTHER PAPERS CAN BE COUNTED OR COMPUTED ON AND SO THAT ONE COULD RELIABLY HARVEST THOSE CITATIONS THROUGH A COMPUTATIONAL SCAN OF ARTICLES IN THE SAME WAY ONE CAN HARVEST CITATIONS TO PAPERS. THERE ARE A NUMBER OF STANDARDS EFFORTS GOING ON RIGHT NOW. MOST NOTABLY, I WOULD SAY, THE DATACITE WORK TO TRY AND COME UP WITH SUCH A STANDARD. THE -- THERE IS A SORT OF A MESSY SUBPIECE IN THAT YOU NEED AN IDENTIFIER SYSTEM UNDERNEATH THIS AS WELL TO REALLY MAKE IT WORK. BUT I THINK YOU WILL SEE THAT HAPPENING. THE OTHER KIND OF ALTERNATIVE DEVELOPMENT, AND IT'S ONE THAT I THINK YOU WILL SEE ONLY FOR VERY IMPORTANT KINDS OF DATA AND THAT HAS SOME SCALING PROBLEMS, IS THE SORT OF WHAT I WOULD CALL THE TOMBSTONE PAPER OR DATA PAPER, WHERE YOU DO A SHORT PAPER THAT BASICALLY SAYS I'VE CREATED THIS DATA SET, AND HERE ARE A FEW BASIC STATISTICS AND OBSERVATIONS ABOUT IT. SO FOR EXAMPLE, THAT'S WHAT THEY DID WHEN THEY PUBLISHED THE HUMAN GENOME. A FAIRLY SIZABLE BREAKTHROUGH. 
AND IN FACT, YOU WILL SEE, IF YOU LOOK AROUND THROUGH SCIENCE AND NATURE IN THE FOLLOWING YEARS, AS THEY DID A NUMBER OF OTHER MAJOR GENOMES, THEY'D HAVE A COUPLE-PAGE PAPER IN THERE THAT WOULD TYPICALLY DESCRIBE SORT OF THE MACRO CHARACTERISTICS OF IT AND ESTABLISH CREDIT THAT THEY DID IT. AND IN SOME CASES YOU'LL SEE PEOPLE CITING TO THAT, AS BEFORE THEY HAD A WAY OF CITING TO THE UNDERLYING DATA. SO THAT WAS ANOTHER ALTERNATIVE. I DO THINK, THOUGH, THAT WE ALSO NEED TO REALIZE THAT THERE ARE SOME SORT OF LOGICAL ISSUES ABOUT CITATION, TOO, THAT WE NEED TO THINK ABOUT THE PRACTICALITIES OF. JUST AS A FOR INSTANCE, IT'S QUITE COMMON TO SEE SOMEONE CITE A SPECIFIC SEQUENCE THAT'S BEEN DEPOSITED IN GENBANK SO IT'S GOT AN ACCESSION NUMBER ON IT. BUT THEN YOU GET THESE OTHER PEOPLE WHO ARE DOING LIKE SYSTEMIC BIOLOGY, WHERE THEY ACTUALLY WANT TO COMPUTE OVER ALL THE SEQUENCES IN GENBANK BECAUSE THEY ARE INTERESTED IN HOW SOMETHING IS CONSERVED OR VARIES ACROSS SPECIES. AND IT'S HARD TO REALLY THINK ABOUT WHAT'S A MEANINGFUL DATA CITATION THERE, OTHER THAN WE COMPUTED OVER GENBANK AND GOT THAT. SO THE NOTION OF YOU AS A CONTRIBUTOR TO GENBANK, BECAUSE YOU PUT A SEQUENCE IN THERE, GETTING INDIVIDUAL CREDIT BECAUSE YOUR WORK WAS USED ALONG WITH THE CONTRIBUTIONS OF TENS OF THOUSANDS OF OTHER PEOPLE IN STUDIES ABOUT HOW THE TREE OF LIFE IS STRUCTURED IS KIND OF UNREALISTIC. SO THERE ARE SOME -- THERE ARE SOME REAL LIMITATIONS, I THINK, TO DATA CITATION AS WELL. DID WE HAVE ONE MORE IN THE BACK? GEORGE? >> THANK YOU FOR A GREAT TALK, CLIFF. I WAS WONDERING IF YOU COULD EXTEND THIS INFORMED SPECULATION TO A NARROWER AREA LIKE MEDICAL IMAGING. SOMEBODY POINTED OUT TO ME THE OTHER DAY, WHAT'S THE BIG DIFFERENCE BETWEEN 1881 AND 1981, WHEN PRESIDENT GARFIELD WAS SHOT IN 1881 AND PRESIDENT REAGAN'S EVENT IN 1981? IT WAS THE DEVELOPMENT OF THE X-RAY, BECAUSE IN 1881, PRESIDENT GARFIELD SURVIVED ABOUT TEN WEEKS. 
IN 1981 PRESIDENT REAGAN SURVIVED A QUARTER OF A CENTURY. AND THE BIG DIFFERENCE WAS THAT WITH THE X-RAY OR MEDICAL IMAGING DEVICES, YOU COULD PINPOINT THE LOCATION OF THE BULLET AND THEN PROCEED WITH TREATMENT. SO I'M JUST WONDERING IF, IN YOUR THINKING, YOU HAVE FOCUSED ON IMAGING IN PARTICULAR AS A HARBINGER OF BETTER TREATMENTS AND BETTER THINGS, AS INFORMATION-BEARING OBJECTS ESSENTIALLY, AND WHERE A LIBRARY COULD, IN FACT, HAVE A ROLE? >> THAT'S A VERY INTERESTING QUESTION, IN PART BECAUSE MY PRESUMPTION, AT LEAST, IS THAT AS WE SEE ELECTRONIC MEDICAL RECORDS DEVELOP, THEY CERTAINLY WOULD INCLUDE IMAGERY THAT'S COLLECTED OF PEOPLE OVER THE COURSE OF THEIR LIVES. NOW, ONE OF THE THINGS THAT I THINK IS STILL RELATIVELY IN ITS INFANCY AS A FIELD, ALTHOUGH I KNOW THERE HAS BEEN WORK DONE IN IT, IS IMAGE MINING, RATHER THAN STARTING WITH TEXT MINING OR DATA MINING OF SPECIFICALLY CHARACTERIZED STRUCTURED DATA. CAN WE GO IN AND DEDUCE THINGS OFF OF THE IMAGES THAT ARE ATTACHED TO THESE RECORDS? THAT'S A VERY INTERESTING KIND OF PROSPECT THAT I THINK HAS GREAT POTENTIAL. AND ONE THAT IS GOING TO BE COMPUTATIONALLY HORRENDOUS AS WELL AS BANDWIDTH-INTENSIVE, I THINK. BEYOND THAT I HAVEN'T REALLY THOUGHT ABOUT IT TOO MUCH. I SUSPECT THAT, YOU KNOW, MUCH OF THE WORK THAT'S BEEN DONE WITH IMAGES UP TILL NOW HAS BEEN TRYING TO DIAGNOSE OR RECOGNIZE THINGS RATHER THAN TO DERIVE THINGS OUT OF HUGE SETS OF THEM BECAUSE WE HAVEN'T HAD HUGE SETS OF THEM AS AVAILABLE AS WE'D LIKE TO. AND I THINK THERE ARE SOME REAL PROSPECTS THERE. THE OTHER THING THAT'S VERY, VERY HARD ABOUT IMAGES, OF COURSE, IS SORT OF CROSS-IMAGE COMPARISON BECAUSE OF ALL THE VARIATION OF WHAT'S BEING IMAGED, THE IMAGING DEVICE, EXACTLY WHERE IT'S POINTING, THE REGISTRATION OF THE SUBJECT -- THINGS OF THAT NATURE. I THINK THE OTHER THING THAT IS VERY MUCH IN PROSPECT AS WELL IS MOVING IMAGE STUFF. 
WE ARE INCREASINGLY SEEING VARIOUS KINDS OF INSTRUMENTATION THAT PRODUCES NOT SIMPLY STILL IMAGES BUT VERY COMPLEX, MOVING IMAGE MATERIALS AND I THINK THAT THOSE GET EVEN MORE COMPLICATED TO REASON FROM. THAT'S AN AREA THAT I SUSPECT IS VERY POORLY EXPLORED. >> I DON'T SEE ANY MORE PEOPLE JUMPING FOR QUESTIONS. [LAUGHTER] IT LOOKS LIKE WE'RE ABOUT AT TIME. THANKS SO MUCH. >> THANK YOU VERY MUCH. [APPLAUSE]