GOOD AFTERNOON. MY NAME IS ERIC GREEN, DIRECTOR OF THE NATIONAL HUMAN GENOME RESEARCH INSTITUTE. IT'S A PLEASURE TO INTRODUCE TODAY'S SPEAKER, DR. DAVID GOLDSTEIN. DAVID'S BIOGRAPHY INCLUDES EARNING HIS Ph.D. FROM STANFORD UNIVERSITY IN 1994 AND, FOLLOWING POSTDOCTORAL TRAINING, TAKING A FACULTY POSITION AT UNIVERSITY COLLEGE LONDON AS PROFESSOR OF GENETICS, A POSITION HE HELD FROM 1999 TO 2005. HE WAS THEN RECRUITED TO DUKE UNIVERSITY AS THE DIRECTOR OF THE CENTER FOR HUMAN GENOME VARIATION, AND LAST YEAR HE BECAME THE RICHARD AND PAT JOHNSON DISTINGUISHED UNIVERSITY PROFESSOR AT DUKE. DAVID HAS MANY AWARDS, INCLUDING ONE OF 7 ROYAL SOCIETY MERIT AWARDS IN THE UNITED KINGDOM FOR HIS WORK IN HUMAN POPULATION GENETICS, AND MOST RECENTLY HE WAS ELECTED CO-CHAIR FOR 2011 AND CHAIR FOR 2012 OF THE GORDON RESEARCH CONFERENCE ON HUMAN GENETICS AND GENOMICS. DAVID IS A VERY TALENTED, INSIGHTFUL RESEARCHER, WHICH IS WHY WE BROUGHT HIM HERE. HE IS WIDELY REGARDED AS A LEADER, SPECIFICALLY IN ADVANCING HOW WHOLE GENOME SEQUENCING APPROACHES ARE USED TO IDENTIFY THE GENETIC BASES OF COMMON DISEASES AND DRUG RESPONSES. AS YOU KNOW, GENOME WIDE ASSOCIATION STUDIES, KNOWN AS GWAS, ALLOW RESEARCHERS TO ZERO IN ON REGIONS, BUT MANY QUESTIONS REMAIN ABOUT WHETHER THESE STUDIES FIND ALL RISK CONFERRING VARIANTS, AND HOW THOSE VARIANTS ARE GOING TO BE IDENTIFIED. DAVID HAS BEEN AT THE FOREFRONT OF THE DEBATE AND THE INVESTIGATIONS, AND HAS REALLY EVOLVED HIS OWN WORK TO TRY TO DISSECT THIS PROBLEM USING NEXT GENERATION DNA SEQUENCING TECHNOLOGIES AND NEW ANALYTICAL APPROACHES FOR STUDYING IMPORTANT DISEASES. HE IS GOING TO DESCRIBE SOME OF THIS WORK TODAY IN HIS TALK ENTITLED HUMAN GENETICS, THE NEXT GENERATION: RARE VARIANTS AND COMMON CURES. DAVID. [APPLAUSE] >> WELL, THANK YOU VERY MUCH FOR THAT INTRODUCTION. IT REALLY IS A PRIVILEGE FOR ME TO BE HERE TODAY. I WILL BE TALKING ABOUT THE ROLE OF BOTH RARE VARIANTS AND COMMON VARIANTS IN TRAITS OF MEDICAL INTEREST, AND MAKING SOME COMMENTARY ABOUT WHERE THE RESEARCH IS GOING IN THE FUTURE.
BUT BEFORE LAUNCHING INTO TOO MUCH DETAIL ON THE RARE VARIANT SIDE, I WANT TO KICK OFF WITH SOME DISCUSSION OF THE WORK THAT WE HAVE BEEN INVOLVED IN LOOKING AT THE ROLE OF COMMON VARIANTS IN CONDITIONS OF CLINICAL IMPORTANCE. FOR ANYONE HERE TO SEE THE LATEST CHAPTER IN THE GWAS WARS, I'M AFRAID YOU MIGHT BE DISAPPOINTED. I HAVE RATHER POSITIVE THINGS TO SAY ABOUT GENOME WIDE ASSOCIATION STUDIES AND RATHER CONCERNING THINGS TO SAY ABOUT SEQUENCING IN SOME REGARDS. BUT LET'S JUST LAUNCH INTO IT. SO FIRST, I'D LIKE TO TELL YOU ABOUT STUDIES THAT WE HAVE BEEN DOING TO LOOK AT THE GENETIC CONTRIBUTIONS TO VARIABLE RESPONSE TO TREATMENT FOR CHRONIC INFECTION WITH HEPATITIS C. THE BACKGROUND TO THIS WORK IS REALLY CLEAR, I THINK, TO ANYBODY INTERESTED IN THE GENETIC CONTROL OF CLINICALLY IMPORTANT TRAITS. IF YOU LOOK AT WHAT HAPPENS WHEN STANDARD OF CARE TREATMENT IS USED FOR CHRONIC HEPATITIS C INFECTION, YOU SEE THAT THE STANDARD OF CARE TREATMENT IS REALLY VERY FAR FROM OPTIMAL. FIRST, FUNDAMENTALLY, IT WORKS ABOUT HALF THE TIME. ABOUT HALF OF THE TIME, WHEN PATIENTS GO THROUGH THE FULL COURSE OF TREATMENT, THE VIRUS IS GONE FOR GOOD. IT ONLY WORKS ABOUT HALF THE TIME. ON TOP OF THAT, THE FULL COURSE OF TREATMENT IS QUITE ARDUOUS, ALWAYS UNPLEASANT AND SOMETIMES DANGEROUS. FINALLY, THERE HAS BEEN A STRONG INDICATOR THAT THERE IS A GENETIC CONTRIBUTION TO THIS VARIABLE OUTCOME TO STANDARD OF CARE, AND THAT IS THAT INDIVIDUALS OF EUROPEAN ANCESTRY ARE KNOWN TO BE MUCH MORE LIKELY TO BE CURED BY STANDARD OF CARE THAN INDIVIDUALS OF AFRICAN ANCESTRY. AS I WILL COMMENT MORE IN A LITTLE BIT, IF YOU'RE INTERESTED IN THE GENETICS OF A TRAIT LIKE THIS, THEN IT IS THE CASE, EVEN FOR SOMEBODY LIKE ME, THAT THE FIRST THING YOU DO FOR SURE IS CARRY OUT A GENOME WIDE ASSOCIATION STUDY, WHICH IS WHAT WE DID. AND WHEN WE CARRIED OUT A GENOME WIDE ASSOCIATION STUDY WE FOUND A VARIANT NEAR THE GENE IL28B.
IT ENCODES INTERFERON LAMBDA, A PROTEIN RELATED TO INTERFERON ALPHA, WHICH IS USED IN THE STANDARD OF CARE TREATMENT. WE FOUND A VARIANT THAT HAS A DRAMATIC IMPACT ON WHAT HAPPENS WHEN PATIENTS ARE TREATED. SO HERE YOU SEE, IN THE DIFFERENT RACIAL AND ETHNIC GROUPS, INDIVIDUALS THAT ARE GROUPED BY GENOTYPE. HERE IS THE GOOD RESPONSE GENOTYPE, C/C, AND THE POOR RESPONSE GENOTYPE, T/T. WHAT YOU SEE ON THE Y AXIS IS THE PROPORTION OF INDIVIDUALS IN EACH OF THOSE GENOTYPIC GROUPS WHO ARE CURED. THE LINGO IS SUSTAINED VIROLOGICAL RESPONSE. WHAT YOU SEE HERE IS THAT FOR INDIVIDUALS THAT HAVE THIS HOMOZYGOUS GENOTYPE, MORE THAN 80% OF THEM WILL BE CURED BY THIS TREATMENT, WHEREAS FOR INDIVIDUALS WITH THIS ONE, ONLY JUST OVER 30% OF THEM WILL BE CURED. WHEN YOU TALK TO CLINICIANS THAT ACTUALLY SEE THESE PATIENTS, MOST OF THEM AGREE THIS IS A DIFFERENCE THEY WOULD WANT TO TALK TO THEIR PATIENTS ABOUT. THE OTHER THING THAT'S STRIKING ABOUT THIS VARIANT IS ITS CONSISTENCY. YOU SEE THAT THE GOOD RESPONSE HOMOZYGOTES DO MUCH BETTER IN EVERY GROUP, AND THAT ON AVERAGE THOSE OF AFRICAN ANCESTRY DO MUCH WORSE. WHAT HAPPENS IF YOU LOOK AT INDIVIDUALS OF AFRICAN ANCESTRY WHO HAVE THE GOOD RESPONSE, HOMOZYGOUS GENOTYPE AND COMPARE THEM TO INDIVIDUALS OF EUROPEAN ANCESTRY THAT HAVE THE POOR RESPONSE GENOTYPE? THESE INDIVIDUALS DO ON AVERAGE MUCH BETTER THAN THESE INDIVIDUALS. SO THIS REALLY STRONGLY EMPHASIZES THE IMPORTANCE OF ZEROING IN ON THE UNDERLYING GENETIC CAUSE OF THESE AVERAGE DIFFERENCES BETWEEN THE RACIAL AND ETHNIC GROUPS. IT IS MUCH MORE INFORMATIVE. AND AS PEOPLE WHO DON'T KNOW THE STORY MAY BE GUESSING, THE GOOD RESPONSE ALLELE IS THE MAJORITY ALLELE IN INDIVIDUALS OF EUROPEAN ANCESTRY AND THE MINORITY ALLELE IN INDIVIDUALS OF AFRICAN ANCESTRY. THAT DIFFERENCE, IN FACT, EXPLAINS MORE THAN HALF OF THE DIFFERENCE IN AVERAGE RESPONSE RATES BETWEEN INDIVIDUALS OF AFRICAN AND EUROPEAN ANCESTRY. SO THIS CAME OUT OF A GENOME WIDE ASSOCIATION STUDY, CLEARLY THE RIGHT TOOL FOR THAT PROBLEM, IDENTIFYING A VARIANT THAT HAS UNAMBIGUOUS CLINICAL RELEVANCE.
I WOULD LIKE TO EMPHASIZE THAT THERE IS NOT ALGORITHMIC CLARITY IN WHAT YOU OUGHT TO DO. I THINK ALL YOU CAN REALLY SAY IS THAT THIS INFORMATION GETS INCORPORATED INTO A KIND OF OVERALL ASSESSMENT. IF YOU HAVE A POOR RESPONSE GENOTYPE, YOU MIGHT DELAY TREATMENT AND WAIT FOR NEWER OPTIONS TO COME ONLINE. I DON'T KNOW WHAT THAT WARNING MEANS, BUT IT MEANS I CAN'T ADVANCE. I HAVE A DISCLOSURE: DUKE AND THE INVESTIGATORS ARE BENEFICIARIES OF A PATENT ASSOCIATED WITH THE DISCOVERY. I WANT TO MAKE THAT DISCLOSURE. BUT I DO WANT TO POINT OUT THAT IT'S NOT JUST MY PERCEPTION THAT THERE IS CLINICAL RELEVANCE. THE TEST IS OFFERED BY LABCORP, AND THERE WERE 1,000 OR 1,200 TESTS BEING ORDERED A MONTH, SO THIS IS ACTUALLY BEING USED IN THE CLINIC. I HOPE IN THE END IT WILL PROVE TO HAVE BEEN BENEFICIAL IN THIS SETTING. WE CARRIED OUT ANOTHER GENOME WIDE ASSOCIATION STUDY USING THE SAME DATA. THE DATASET CAME FROM A TRIAL THAT WAS RUN BY SCHERING, CALLED THE IDEAL TRIAL. THIS STUDY WAS FOCUSED ON ONE OF THE IMPORTANT ADVERSE EVENTS ASSOCIATED WITH THIS TREATMENT, THAT IS, ANEMIA. A MAJORITY OF PATIENTS THAT UNDERGO THIS THERAPY HAVE A SIGNIFICANT DECLINE IN HEMOGLOBIN LEVELS, DUE PRINCIPALLY TO THE RIBAVIRIN COMPONENT. SO WE ASKED WHETHER THERE WERE GENE VARIANTS THAT MEDIATED THE EXTENT TO WHICH INDIVIDUALS SUFFERED HEMOGLOBIN DECLINE. WE SIMPLY TOOK THE DECLINE IN HEMOGLOBIN OVER THE FIRST 4 WEEKS OF THERAPY, ASSIGNED THAT AS A QUANTITATIVE SCORE TO INDIVIDUALS AND CARRIED OUT A GENOME WIDE ASSOCIATION STUDY. WE FOUND, AGAIN, A GENOME WIDE SIGNIFICANT ASSOCIATION, THIS TIME ON CHROMOSOME 20. HERE THERE WAS A SET OF VARIANTS WITH P VALUES THAT WENT UP TOWARD 10 TO THE NEGATIVE 50. NO QUESTION AT ALL THAT WE HAVE A REAL ASSOCIATION WITH WHO SUFFERS ANEMIA. HERE IS A COMMENT ABOUT INTERPRETING GWAS. ALL THE VARIANTS ON THE CHIP THAT SHOWED ASSOCIATION WERE SITTING IN THIS OPEN READING FRAME. HERE ARE THE VARIANTS THAT HAVE THESE STRONG ASSOCIATIONS.
YOU DON'T SEE ASSOCIATIONS NEARLY SO STRONG ANYWHERE ELSE. WE KNOW FOR SURE THAT THE OPEN READING FRAME IS INNOCENT OF ANY EFFECT WHATSOEVER ON ANEMIA. WHAT'S HAPPENING, AS WE WERE ABLE TO WORK OUT, IS THAT THESE VARIANTS ARE MARKERS FOR 2 FUNCTIONAL VARIANTS IN A NEIGHBORING GENE, ITPA, WHICH ENCODES [INDISCERNIBLE], AND THERE ARE 2 FUNCTIONAL VARIANTS IN THE GENE THAT ARE LOW ACTIVITY VARIANTS. IT TURNS OUT THAT THOSE 2 FUNCTIONAL LOW ACTIVITY VARIANTS ARE ASSOCIATED WITH THE MORE COMMON DISCOVERY VARIANTS. THEY'RE NOT ON THE GENE CHIP. SO ALL THAT'S HAPPENING IS THAT THESE MORE COMMON VARIANTS ARE REFLECTING THE EFFECT OF THOSE RARER FUNCTIONAL VARIANTS. WE WERE ABLE TO PROVE THAT STATISTICALLY BY DOING THE OBVIOUS THING: TAKING THOSE 2 FUNCTIONAL VARIANTS, GENOTYPING THEM, AND THEN ASKING, ONCE YOU ACCOUNT FOR THOSE 2 FUNCTIONAL VARIANTS, DOES THAT MAKE THE INITIAL GENOME WIDE ASSOCIATION SIGNAL GO AWAY COMPLETELY? AND THE ANSWER IS YES, IN EVERY RACIAL AND ETHNIC GROUP. SO FROM A STATISTICAL GENETIC PERSPECTIVE, THOSE VARIANTS ARE SHOWN TO BE RESPONSIBLE. WE HAVE BEEN ABLE TO WORK OUT THE BIOLOGY BEHIND HOW THIS IS HAPPENING. I'M NOT GOING TO GO INTO THAT AT ALL, IT WOULD BE LENGTHY. IT HAS TO DO WITH HOW LOW ACTIVITY OF THIS ENZYME ALLOWS THE MAINTENANCE OF ATP LEVELS IN RED BLOOD CELLS DESPITE THE EFFECT OF RIBAVIRIN IN KNOCKING DOWN ATP LEVELS. BASICALLY, THE ITP BUILDUP THAT IS CAUSED BY A REDUCTION IN THIS ENZYME ALLOWS THE BIOSYNTHESIS OF ATP TO CONTINUE DESPITE THE PRESENCE OF RIBAVIRIN. ANYONE INTERESTED CAN SEE THE WORK JUST COMING OUT BY A POSTDOC, [INDISCERNIBLE], WHOSE NAME IS INDICATED THERE. I THINK THIS IS A REALLY, TO ME, ENCOURAGING EXAMPLE OF HOW YOU CAN GET A POINTER OUT OF A GENETIC ASSOCIATION STUDY THAT REALLY LEADS YOU TO A BIOLOGICAL CHARACTERIZATION OF WHAT'S HAPPENING. SO I THINK THAT REALLY MAKES IT CLEAR, AND I DON'T MEAN TO SAY THE EXAMPLES ONLY COME FROM OUR WORK.
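The conditional test described above (genotype the functional variant, account for it, then ask whether the original marker signal goes away) can be sketched roughly as follows. This is only an illustration on made-up toy data, not the analysis from the study: the function names, the 0/1 genotype coding, and every number are assumptions, and a single functional variant stands in for the two mentioned in the talk.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residualize(y, x):
    """Residuals of y after a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Hypothetical toy data: 8 subjects.
functional = [0, 0, 0, 0, 1, 1, 1, 1]   # carrier status for a low-activity variant
marker     = [0, 0, 0, 1, 1, 1, 1, 0]   # common chip variant that imperfectly tags it
decline    = [3.1, 2.9, 3.0, 3.0, 1.1, 0.9, 1.0, 1.0]  # hemoglobin decline (made up)

# Marginal association: the marker looks associated with the trait.
marginal = pearson(marker, decline)

# Conditional association: remove the functional variant from both the
# marker and the trait, then correlate what is left (a partial correlation).
conditional = pearson(residualize(marker, functional),
                      residualize(decline, functional))

print(f"marginal r = {marginal:.3f}, conditional r = {conditional:.3f}")
```

In this toy example the marker carries no information beyond the functional variant, so the conditional correlation collapses to essentially zero, which is the pattern the talk describes.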
THERE ARE MANY EXAMPLES WHERE GENOME WIDE ASSOCIATION STUDIES HAVE FOUND CLINICALLY USEFUL FINDINGS AND ALSO INSIGHTS INTO BIOLOGY. I THINK IT'S FAIR TO SAY, AND AS I HAVE ARGUED BEFORE, THAT THERE ARE MANY TRAITS WHERE THE VARIANTS RESPONSIBLE ARE NOT SUITABLE TO BE DETECTED BY GENOME WIDE ASSOCIATION, BECAUSE THEY'RE TOO RARE TO BE EASILY DETECTED. NOW, FOR SOME TRAITS, AND I THINK THEY HAVE A PATTERN TO THEM, THE TRAITS THAT ARE MORE SUBJECT TO COMMON VARIATION, GENOME WIDE ASSOCIATION APPROACHES DESCRIBE A LOT OF WHAT'S HAPPENING. FOR A LOT OF TRAITS, THEY WON'T. THAT HAS LED TO A TREMENDOUS AMOUNT OF INTEREST IN DEVELOPING THE BASIC MACHINERY THAT WILL ENABLE US TO USE SEQUENCING STRATEGIES TO IDENTIFY RARER VARIANTS THAT CONTRIBUTE TO TRAITS OF INTEREST. I WOULD LIKE TO SPEND THE BULK OF THE TALK TALKING ABOUT HOW WE'RE APPROACHING THAT RIGHT NOW AND WHERE I SEE THAT WORK GOING IN THE COMING YEARS. THE WORK THAT I'M GOING TO TALK ABOUT DEPENDS UPON THE OUTPUT OF A GENOME FACILITY, SMALL COMPARED TO GENOME CENTERS, THAT WE OPERATE AT DUKE. THE GENOME FACILITY IS RUN BY KEVIN AND CURRENTLY CONSISTS OF 11 GENOME ANALYZER 2s, WHICH ARE BEING PHASED OUT, AND 6 [INDISCERNIBLE] JUST BEING BROUGHT INTO OPERATION. SO I WOULD SAY THAT'S RELATIVELY LARGE COMPARED TO PLACES THAT AREN'T GENOME CENTERS. I WOULD SAY MY INSTITUTION HAS BEEN VERY SURPRISED AT THE COST OF TRYING TO STORE THE DATA THAT ARE GENERATED. THERE ARE ONGOING DISCUSSIONS ABOUT HOW WE DO THAT. THE DATA COME OUT OF THIS FACILITY AND WE HAVE PUT A LOT OF ENERGY, SPECIFICALLY, INTO 2 BIOINFORMATIC AREAS IN ORDER TO WORK WITH THESE DATA. ONE THING THAT [INDISCERNIBLE] HAS DONE OVER THE PAST FEW YEARS IS TRY TO DEVELOP WHAT YOU MIGHT CONSIDER AN APPLE MACINTOSH TYPE APPROACH TO LOOKING AT THE VARIANTS THAT EMERGE FROM SEQUENCING DATA. SO I THINK THERE ARE A LOT OF WAYS THAT PEOPLE ARE THINKING ABOUT ANNOTATING THE VARIANTS THAT ARE PRETTY CONSISTENT.
WE HAVE FOUND IT'S REALLY IMPORTANT FOR USERS TO BE ABLE TO INTERACT WITH THEIR DATA, TO BE ABLE TO LOOK UP THINGS IN GENES OF INTEREST AND PATHWAYS OF INTEREST, AND TO INTERACT IN A REASONABLE TIMEFRAME. [INDISCERNIBLE] HAS SET UP WHAT I THINK IS AN EXTREMELY EASY TO USE USER INTERFACE THAT HAS A KIND OF BROWSER LIKE STRUCTURE TO IT FOR LOOKING AT THE VARIANTS THAT ARE IDENTIFIED IN EACH OF THE GENOMES THAT YOU'VE SEQUENCED AND MAKING COMPARISONS BETWEEN TYPES OF VARIANTS. IF ANYONE IS INTERESTED, IT'S AVAILABLE AT THE WEBSITE THAT'S INDICATED. THE OTHER AREA WHERE WE HAVE BEEN PUSHING DEVELOPMENT A LITTLE BIT, AS OPPOSED TO DEPLOYING THE METHODS THAT OTHERS HAVE DEVELOPED, IS IN CALLING COPY NUMBER VARIANTS USING THE DATA EMERGING FROM NEXT GENERATION SEQUENCING. WHAT WE HAVE FOUND, AS OTHERS HAVE ALSO FOUND USING QUITE DIFFERENT APPROACHES, IS THAT WHOLE GENOME SEQUENCE DATA DO AFFORD A TRULY STUNNING RESOLUTION FOR DEFINING THE COPY NUMBER VARIANT LANDSCAPE. IF YOU'RE TALKING ABOUT DUPLICATIONS OR DELETIONS OF GREATER THAN, SAY, 2 KILOBASES IN SIZE, THOSE CAN BE HARD, DOWN AT THAT KIND OF SIZE, TO PICK UP WITH ALTERNATIVE METHODS, BUT THE READ DEPTH DATA GIVE YOU REALLY GOOD RESOLUTION ON THOSE COPY NUMBER VARIANTS. THE WAY WE HAVE SET THIS UP, WE MAKE USE OF THE ALIGNMENTS THAT COME FROM BWA THAT YOU USE FOR OTHER VARIANT CALLING. [INDISCERNIBLE] APPROACHES USE AN ALTERNATIVE SET OF ALIGNMENTS. WE HAVE AN APPROACH THAT USES THE BWA ALIGNMENTS, AFFORDS VERY GOOD RESOLUTION, AND INCORPORATES A NOVELTY, WHICH IS THAT IT MAKES USE OF INFORMATION ABOUT WHERE HETEROZYGOUS SITES ARE CALLED IN THE GENOME AND PRECLUDES THOSE REGIONS BEING CALLED DELETED, WHICH MAKES IT PERFORM, WE THINK, BETTER. BUT IT'S QUITE CLEAR THAT YOU CAN GET VERY GOOD RESOLUTION ABOUT COPY NUMBER VARIATION IN THE GENOMES, AND THAT REALLY, I THINK, LONG TERM IS GOING TO CONSTITUTE A VERY STRONG MOTIVATION FOR MOST OF THE SEQUENCING BEING DONE BY WHOLE GENOME AS OPPOSED TO WHOLE EXOME.
WITH THE EXOME, IT'S NOT GOING TO BE AS HIGH RESOLUTION, SO AS THE SEQUENCING [INDISCERNIBLE], WE'LL BE TURNING IN THAT DIRECTION. THE FIRST EFFORT WE MADE IN TRYING TO SEE WHAT KINDS OF THINGS COME OUT OF LARGE SCALE SEQUENCING STUDIES WAS REPORTED RELATIVELY RECENTLY IN PLOS GENETICS. I WANT TO EMPHASIZE ONE THING FROM THIS STUDY. THAT IS JUST THE SHEER MAGNITUDE OF GENETIC VARIATION THAT EMERGES FROM THESE STUDIES. I'VE GOT 2 PLOTS HERE THAT ILLUSTRATE THAT. WHAT I'M SHOWING HERE IS THE NUMBER OF GENOMES WE CONSIDER AND, ON THE Y AXIS, THE NUMBER OF NOVEL SINGLE NUCLEOTIDE VARIANTS. FOR THE FIRST GENOME, A NOVEL VARIANT IS A VARIANT THAT ISN'T IN DATABASES. THIS IS AT THE TIME WE DID THE ANALYSIS; IT WOULD BE DIFFERENT NOW. SO, A VARIANT THAT'S NOT IN ANY OF THE DATABASES. FOR THE NEXT ONE, IT IS A VARIANT THAT IS IN THIS NEXT GENOME, NOT IN ANY OF THE DATABASES, AND NOT IN THE PREVIOUS FELLOW THAT YOU SEQUENCED, AND SO ON. THE REALLY STRIKING THING IS THAT AFTER YOU'VE ALREADY SEQUENCED 20 GENOMES, EVERY TIME YOU SEQUENCE A GENOME, YOU'RE SEEING MORE THAN 100,000 SINGLE NUCLEOTIDE VARIANTS THAT YOU HAVEN'T SEEN BEFORE. SOME OF THESE ARE NOT REAL BUT THE VAST MAJORITY ARE, SO IT'S A TREMENDOUS AMOUNT OF VARIATION EVERY TIME. BUT MORE THAN THAT, THERE IS A TREMENDOUS AMOUNT OF FUNCTIONAL VARIATION. WE TOOK ALL THE VARIANTS THAT WERE PREDICTED TO BE PROTEIN TRUNCATING AND HOMOZYGOUS, AND WE ASKED HOW MANY NEW GENES YOU LEARN ABOUT WHERE YOU PREDICT THE PROTEIN IS NOT MADE. WHAT YOU SEE IS THAT EVEN AFTER YOU'VE DONE 20 GENOMES, YOU STILL GET 3, 4, 5 SUCH GENES IDENTIFIED. I WANT TO EMPHASIZE THAT WE FOLLOWED UP SOME OF THESE. A PROPORTION OF THESE ARE NOT REAL. THE PROPORTION OF THESE VARIANTS THAT IS NOT REAL IS HIGHER THAN FOR A RANDOM VARIANT; BECAUSE THEY ARE UNLIKELY, THEY ARE EVEN MORE ENRICHED FOR ERRORS. STILL, THIS SHOWS THAT THERE IS A TREMENDOUS AMOUNT OF FUNCTIONAL VARIATION IN EVERY GENOME THAT YOU SEQUENCE. THAT IS GOING TO BE ONE OF OUR CHALLENGES IN TRYING TO IDENTIFY PATHOGENIC VARIANTS.
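The counting behind the novel variant plot (for each genome in turn, count variants that are in neither the databases nor any previously sequenced genome) can be sketched as below. This is a minimal sketch under assumptions: variants are represented as simple string identifiers, and the function name and toy data are hypothetical.

```python
def novel_variant_counts(genomes, known_variants):
    """For each genome, in order, count variants not yet seen.

    genomes: list of sets of variant identifiers, one set per sequenced genome.
    known_variants: set of identifiers already present in public databases.
    """
    seen = set(known_variants)
    counts = []
    for variants in genomes:
        new = set(variants) - seen      # novel relative to databases and earlier genomes
        counts.append(len(new))
        seen |= new                     # later genomes are compared against these too
    return counts

# Hypothetical toy data: three genomes and one database variant.
toy_genomes = [{"a", "b", "c"}, {"b", "d"}, {"a", "d", "e", "f"}]
toy_counts = novel_variant_counts(toy_genomes, known_variants={"a"})
print(toy_counts)
```

Plotting these counts against genome order gives the kind of curve described in the talk; the point made there is that the curve stays high even after 20 genomes.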
YOU CAN'T JUST LOOK AND SAY THIS DOES SOMETHING QUITE DRAMATIC. THAT IS NOT GOING TO WORK. WILL SEQUENCING WORK? I WANT TO GIVE A COUPLE OF EXAMPLES THAT I HOPE WILL BE ON THE ENCOURAGING SIDE, MAKE SOME CAUTIONARY COMMENTS, AND THEN GET BACK TO, I HOPE, BEING ENCOURAGING. SO OUR OWN MOST RECENT STUDY EVALUATING WHETHER SEQUENCING CAN WORK, IN THE SENSE OF FINDING PATHOGENIC MUTATIONS THAT YOU HAD NO INFORMATION ABOUT BEFORE, COMES FROM A STUDY WE DID OF MICROCEPHALY. THIS IS WORK THAT WAS DONE IN COLLABORATION WITH COLLEAGUES FROM ISRAEL, [INDISCERNIBLE] IN PARTICULAR, AND RUN BY ELIZABETH IN MY GROUP. THE START OF THIS WAS THAT IN TEL AVIV, THEY HAD 2 FAMILIES THAT BOTH PRESENTED WITH A MICROCEPHALY SYNDROME THAT THE CLINICIAN FELT HAD NOT BEEN DESCRIBED BEFORE AND WAS THE SAME IN BOTH. SO NATURALLY WE WANTED TO KNOW, IN THAT SITUATION, CAN YOU SEQUENCE AND FIND THE MUTATION THAT'S RESPONSIBLE FOR THE PRESUMABLY SAME MICROCEPHALY SYNDROME? HERE IS THE FIRST FAMILY. YOU CAN SEE THAT IT LOOKS LIKE IT'S AUTOSOMAL RECESSIVE. THERE ARE 2 AFFECTED THAT WE SEQUENCED. THAT'S A FETUS THAT WAS ABORTED ON THE PRESUMPTION THAT THE FETUS WOULD HAVE THE MICROCEPHALY. WHEN WE SEQUENCED THESE 2 INDIVIDUALS BY WHOLE EXOME SEQUENCING, AND THEN LOOKED AT ALL THE VARIANTS WE FOUND AND COMPARED THEM TO CONTROL GENOMES, WE ONLY FOUND 3 VARIANTS THAT WERE FUNCTIONAL AND HOMOZYGOUS IN BOTH OF THE AFFECTED, ONLY 3. SO IN THAT FAMILY, WE HAD IT DOWN FROM THE EXOME SEQUENCING TO 3 CANDIDATES. THEN WE LOOKED IN THE OTHER FAMILY. AND SURE ENOUGH, ONE OF THOSE 3 VARIANTS IS HOMOZYGOUS THERE. WE THEN WENT ON TO TYPE ANOTHER 1,000 INDIVIDUALS FOR THAT VARIANT AND DID NOT SEE THE VARIANT. WHAT I FIND PARTICULARLY ENCOURAGING ABOUT THIS EXAMPLE IS THAT WE HAD IT DOWN TO 3 CANDIDATES IN THE FIRST FAMILY. AND I THINK THIS MAKES A REALLY STRONG CASE THAT EVEN IN ISOLATED GENETIC CONDITIONS, YOU HAVE SOME HOPE OF FINDING THE GENETIC CAUSE.
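The filtering logic used for the first family (keep variants that are functional, homozygous in both affected individuals, and absent from control genomes) can be sketched as follows. This is a hedged illustration only: the data layout (a dictionary from variant identifier to alternate allele count), the function name, and the toy variants are all assumptions, not the study's pipeline.

```python
def recessive_candidates(affected_genotypes, functional_variants, control_variants):
    """Return variants homozygous in every affected individual,
    annotated as functional, and never seen in controls.

    affected_genotypes: list of dicts, variant id -> alternate allele count (0, 1 or 2).
    functional_variants: set of variant ids predicted to alter the protein.
    control_variants: set of variant ids observed in any control genome.
    """
    # Start from the homozygous calls of the first affected individual ...
    shared = {v for v, n in affected_genotypes[0].items() if n == 2}
    # ... and keep only those homozygous in every other affected individual.
    for genotypes in affected_genotypes[1:]:
        shared &= {v for v, n in genotypes.items() if n == 2}
    return sorted(v for v in shared
                  if v in functional_variants and v not in control_variants)

# Hypothetical toy data for two affected siblings.
sibling_1 = {"v1": 2, "v2": 2, "v3": 1, "v4": 2}
sibling_2 = {"v1": 2, "v2": 2, "v4": 2, "v5": 2}
candidates = recessive_candidates(
    [sibling_1, sibling_2],
    functional_variants={"v1", "v2", "v5"},
    control_variants={"v2"},
)
print(candidates)
```

Here only one toy variant survives all three filters; in the study described above the same kind of filtering left 3 candidates, which a second family then narrowed to one.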
I THINK THAT'S A REALLY IMPORTANT OBSERVATION. NOW, OF COURSE, WE HAD TO HAVE THIS SECOND FAMILY TO PROVE IT. BUT IT IS DOWN TO 3 CANDIDATES IN THIS FIRST FAMILY. AND BEING ABLE TO DO THAT REALLY DID DEPEND CRITICALLY ON THE FILTER OF ABSENCE IN CONTROL GENOMES. THAT'S ONE OF THE REASONS THAT THIS FRAMEWORK CAN WORK FOR HIGH PENETRANCE MUTATIONS: WHEN YOU CAN FILTER THAT WAY, IT'S EASIER TO FIND WHAT'S PATHOGENIC. THE GENE THAT HAS THE MUTATION HAS NOT BEEN IMPLICATED IN ANY MICROCEPHALY BEFORE. IT'S A NEW INSIGHT. IT'S IN THE ASPARAGINE SYNTHETASE GENE. OUR FIRST THOUGHT WAS THAT THE MUTATION WOULD AFFECT ENZYMATIC ACTIVITY. WE SET UP AN EXPERIMENTAL ASSAY TO TEST THAT. BUT THE POSTDOC FOUND QUICKLY THAT THE MUTATION CAUSES A PROFOUND INSTABILITY OF THE PROTEIN, AND SO WE COULDN'T GET ENOUGH PROTEIN TO BE ABLE TO CARRY OUT THE ASSAY. SO WE THINK WHAT CLEARLY IS HAPPENING IS THAT YOU JUST DON'T MAKE VERY MUCH OF THIS PROTEIN WHEN YOU CARRY THE MUTATION. THAT'S WHAT CAUSES THE MICROCEPHALY SYNDROME. NOW IT WILL BE VERY IMPORTANT TO LOOK AND SEE WHAT THE EFFECT OF THIS IS IN A MOUSE KNOCKOUT. I HAVE ANOTHER PERSPECTIVE ON ASKING THE QUESTION OF WHETHER SEQUENCING OUGHT TO WORK. THIS IS NOW NOT A REAL STUDY, BUT A THOUGHT EXPERIMENT. LET'S RETURN FOR A MINUTE TO THE ANEMIA EXAMPLE, WHERE WE LOOKED AT THE QUANTITATIVE CHANGE IN HEMOGLOBIN FOR EACH OF THE SUBJECTS. EACH HAS A SCORE FOR THE CHANGE IN HEMOGLOBIN. WE CAN ASK, WOULD WE HAVE BEEN ABLE TO EASILY FIND THOSE MUTATIONS IF THEY HAD NOT SHOWN UP IN THE GWAS? WE GOT VERY LUCKY. THEY'RE NOT ON THE CHIPS. WE ONLY FOUND THE MARKER BECAUSE 2 VARIANTS HAPPENED TO ASSOCIATE JUST RIGHT. IF WE HADN'T GOTTEN SO LUCKY, WE COULD HAVE TRIED BY SEQUENCING. WOULD WE FIND IT? LET'S SET A SIGNIFICANCE THRESHOLD. LET'S SAY THAT WE'RE GOING TO TAKE ALL THE NONSYNONYMOUS VARIANTS, TESTING ALL OF THOSE VARIANTS. WE HAVE TO CORRECT FOR THAT.
CALL THE SIGNIFICANCE LEVEL 10 TO THE NEGATIVE 6, 10 TO THE NEGATIVE 7. INSTEAD OF SEQUENCING EVERYBODY, YOU'RE GOING TO SEQUENCE THE ENDS OF THE DISTRIBUTION. HERE IS HEMOGLOBIN CHANGE FOR ALL THE SUBJECTS IN THE STUDY THAT WE DID, THE GWAS STUDY. LET'S SAY YOU TAKE THE UPPER AND LOWER 5 PERCENT, THOSE WITH THE MOST CHANGE AND THOSE WITH THE LEAST CHANGE. SEQUENCE THESE, SEQUENCE THESE. THE COST OF DOING THAT IS JUST OVER $100,000. SO THAT IS NOT AN EXPENSIVE EXPERIMENT. NOW, THERE IS THE LABOR TO GENERATE THE VARIANTS, AND YOU HAVE TO HAVE PEOPLE THAT CAN DO SOMETHING WITH THE VARIANTS, BUT THAT IS THE DIRECT REAGENT COST. WHAT HAPPENS WHEN YOU DO THAT? THE MOST IMPORTANT OF THOSE FUNCTIONAL VARIANTS THAT I DESCRIBED HAS A P VALUE OF 10 TO THE NEGATIVE 13 IN A SIMPLE CASE CONTROL COMPARISON BETWEEN THESE AND THESE. VERY EASILY FINDABLE BY SEQUENCING, NO QUESTION ABOUT IT, BY WHOLE EXOME SEQUENCING. HERE IS AN INTERESTING COMMENT. HAD WE NOT DONE A GENOME WIDE ASSOCIATION STUDY AND JUMPED STRAIGHT TO HITTING IT WITH SEQUENCING, THIS WOULD HAVE BEEN A COMPLEX TRAIT THAT YOU CAN EASILY FIND THE GENETIC CAUSE FOR BY SEQUENCING. THAT REALLY IS AN ENCOURAGING EXAMPLE. IT MAKES YOU WISH IT WASN'T DISCOVERABLE BY GWAS. BUT IT IS CERTAINLY POSSIBLE THAT SOME TRAITS WILL BE AMENABLE TO DISCOVERY. AS FAR AS I KNOW THERE ISN'T AN EXAMPLE OUT THERE. OKAY. SO I'M GOING TO TURN TO THE BASIC APPROACH WE'RE USING FOR COMPLEX TRAIT DISCOVERY. I WANT TO GIVE A LITTLE BIT OF BACKGROUND TO OUR THINKING AND THEN LAUNCH INTO THE STUDIES WE'RE DOING. THE FIRST THING TO REALLY EMPHASIZE, WHICH IS A BIT DIFFERENT FROM GENOME WIDE ASSOCIATION, IS JUST HOW IMPORTANT IT IS TO BE CAREFUL WITH WHO YOU DECIDE TO SEQUENCE. IT REMAINS EXPENSIVE PER INDIVIDUAL COMPARED TO GENOME WIDE ASSOCIATION. YOU'RE GOING TO BE MAKING FINE DISTINCTIONS, SO YOU WANT TO MAKE REALLY SURE OF THE PHENOTYPES IN THE INDIVIDUALS YOU SEQUENCE. HERE ARE THE KINDS OF DESIGNS WE'RE THINKING ABOUT.
ONE OF THE DESIGNS IS TO GET FAMILIES AND SEQUENCE A NUMBER OF INDIVIDUALS FROM THE FAMILIES. DEPENDING ON THE INFORMATION YOU HAVE IN THE FAMILY, A GOOD DESIGN MIGHT BE, FOR EXAMPLE, TAKING DISTANTLY RELATED CO-AFFECTEDS, OR MAYBE JUST ONE INDIVIDUAL. BUT SEQUENCING ALLOWS YOU TO LEVERAGE COSEGREGATION DATA IN THOSE FAMILIES IN THE ANALYSIS. THE OTHER DESIGN THAT WE ARE THINKING ABOUT IS TAKING EXTREMES OF SOME KIND FROM THE POPULATION, SO THAT THAT ENRICHES FOR CAUSAL VARIANTS, AS YOU CAN SEE HERE. YOU THEN SEQUENCE THOSE EXTREMES, LOOK FOR THAT ENRICHMENT OF CAUSAL VARIANTS, BUT TAKE THOSE VARIANTS BACK OUT INTO THE LARGER POPULATION FOR CONFIRMATION OF THEIR ASSOCIATION WITH THE TRAIT. THOSE ARE THE 2 DESIGNS WE'RE TRYING TO MAKE THE MOST USE OF. BEFORE I MOVE INTO THAT, LET ME MAKE SOME GENERAL COMMENTS ABOUT WHAT I SEE AS THE RELATIONSHIP BETWEEN GENOME WIDE ASSOCIATION GOING FORWARD AND SEQUENCING. THE FIRST COMMENT IS, IF YOU HAVE A NEW TRAIT, I DO TAKE THE VIEW THAT IT'S VERY SENSIBLE TO CARRY OUT A GENOME WIDE ASSOCIATION STUDY FIRST. FOR SOME TRAITS IT CAN BE A QUICKER ROUTE TO UNCOVERING THE GENETIC CONTROL. AND I SHOULD JUST CLARIFY FOR THE RECORD, FOR ANY TRAIT THAT YOU'RE INTERESTED IN, YOU WOULD LIKE THERE TO BE GWAS DATA ALREADY. IT WAS VERY VALUABLE TO GO THROUGH AND GENERATE THE DATA. IT'S A SEPARATE QUESTION HOW MUCH GWAS DATA YOU NEED. BUT WHETHER YOU DO A CREDIBLE GWAS FOR A GIVEN TRAIT, I THINK THAT'S NOT SOMETHING THAT REASONABLE PEOPLE CAN DISAGREE ABOUT. BUT FOR MOST CONDITIONS WHERE THERE HAS ALREADY BEEN A GWAS DONE OF REASONABLE SIZE, I DON'T THINK THAT GWAS PRACTICED ALONE IS AN IMPORTANT DRIVER OF DISCOVERY ANYMORE. THE REASON IS THAT IF YOU HAVE A CONDITION WHERE YOU HAVE LOTS OF SIGNALS AND YOU'RE HAVING TROUBLE TRACKING DOWN THE CAUSES, ADDING MORE SIGNALS, I THINK, IS NOT REALLY WHERE WE WANT TO CONCENTRATE OUR ATTENTION. WE WANT TO CONCENTRATE MORE ON FINDING THE CAUSES OF THE SIGNALS.
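The extremes design described above (sequence the tails of the trait distribution, then compare how often a variant is carried in one tail versus the other) can be sketched roughly as follows. This is a sketch under assumptions: the 5 percent tails come from the anemia thought experiment earlier in the talk, Fisher's exact test is a stand-in for whatever case control test was actually used, and all function names and toy data are hypothetical.

```python
from math import comb

def select_extremes(trait_by_subject, fraction=0.05):
    """Return (lowest, highest) subject lists from the tails of the trait distribution."""
    ranked = sorted(trait_by_subject, key=trait_by_subject.get)
    n = max(1, round(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical toy cohort: 100 subjects whose trait value equals their index.
trait = {f"s{i}": float(i) for i in range(100)}
low_tail, high_tail = select_extremes(trait, fraction=0.05)

# Pretend sequencing found a variant carried by everyone in the high tail only.
carriers = set(high_tail)
a = sum(s in carriers for s in high_tail)   # carriers in the high tail
b = len(high_tail) - a                      # non-carriers in the high tail
c = sum(s in carriers for s in low_tail)    # carriers in the low tail
d = len(low_tail) - c                       # non-carriers in the low tail
p_value = fisher_exact_two_sided(a, b, c, d)
print(low_tail, high_tail, p_value)
```

Any variant that looks enriched at this stage would still need to be taken back into the larger population for confirmation, as the design calls for.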
AND SO THAT MEANS THAT I THINK WE'LL BE INTERPRETING THE GWAS SIGNALS ALONG WITH THE SEQUENCE DATA TO TRY TO FIND THE CAUSES OF THE GWAS SIGNALS. I'D ALSO LIKE TO EMPHASIZE THAT THAT COMBINED USE OF GWAS AND SEQUENCE DATA IS PARTICULARLY IMPORTANT, GIVEN THAT GWAS IS A VERY ACCURATE AND WELL UNDERSTOOD EXPERIMENT AND SEQUENCING IS NOT. ONE HAS TO BE ABSOLUTELY CLEAR ABOUT THAT. WE'RE REALLY GOOD AT DOING GENOME WIDE ASSOCIATION STUDIES, REALLY, REALLY GOOD AT IT. SO WE HAVE THE GENOTYPING ACCURACY RIGHT THERE WHEN IT'S DONE CAREFULLY. WE'VE GOT IT RIGHT WHERE WE WANT IT TO BE. WE KNOW WHAT THE ARTIFACTS ARE. WE KNOW HOW TO CONTROL FOR POPULATION STRATIFICATION. WE KNOW HOW TO CORRECT FOR THE NUMBER OF HYPOTHESES WE'RE TESTING. THAT'S WHY WE WANT TO THINK ABOUT DOING THEM IN TANDEM. AND THINKING ABOUT RARER VARIANTS LOADING UP ON THE COMMON ONES, WITHOUT TAKING A VIEW ON THE PROPORTION, I THINK IT'S CLEAR THAT SOME PROPORTION OF THE GWAS SIGNALS MUST BE DUE TO THIS EFFECT, AND IF SO, THEY PROVIDE A GOOD POINTER FOR LOOKING FOR SOME OF THE HIGH IMPACT RARER VARIANTS. THAT'S WHERE I SEE THE VALUE OF USING THESE 2 TYPES OF APPROACHES TOGETHER. I'D ALSO LIKE TO EMPHASIZE A LITTLE BIT, FROM OUR OWN EXPERIENCE, JUST HOW, I GUESS DANGEROUS IS THE RIGHT WORD, SEQUENCING IS FROM THE PERSPECTIVE OF FALSE POSITIVES. I REALLY WANT TO MAKE THAT POINT, IN THAT I THINK ONE OF THE UNAPPRECIATED CONTRIBUTIONS OF GENOME WIDE ASSOCIATION WAS TO REALLY CLEAN UP THE LITERATURE AND REDUCE ATTENTION TO A WHOLE BUNCH OF FALSE POSITIVES THAT REALLY DID PLAGUE A LOT OF FIELDS. THERE WERE SOME DISCIPLINES WHERE THE VAST MAJORITY OF GENETIC DISCOVERIES WERE SIMPLY UNTRUE. WHEN YOU LOOKED AT THE GENOME WIDE ASSOCIATION DATA CAREFULLY, THAT BECAME CLEAR. THAT REPRESENTS A TREMENDOUS WASTE OF EFFORT. AND GWAS REALLY DID, I THINK, HELP TO CLARIFY WHAT WAS REAL ASSOCIATION AND WHAT WASN'T.
IT REPRESENTS A BETTER ASSOCIATION STUDY THAN CANDIDATE GENE STUDIES IN EVERY SENSE. AND SO I THINK IT'S REALLY CRITICAL WE DON'T LET SEQUENCING GO DOWN THAT OLD ROAD OF BEING FLOODED WITH FALSE POSITIVES. BUT IT WOULD BE VERY EASY TO DO. HERE IS AN EXAMPLE WHERE WE'RE SEQUENCING SCHIZOPHRENIA GENOMES AND WE BUMP INTO A MUTATION THAT INTRODUCES A PREMATURE STOP IN A DOPAMINE RECEPTOR. WE HAVE A FUNCTIONAL VARIANT IN A GENE THAT IS OF CLEAR RELEVANCE TO SCHIZOPHRENIA BY REALLY JUST ABOUT ANYBODY'S DEFINITION. AND COULD WE HAVE MANAGED TO PUBLISH THAT IF WE WANTED TO? YOU BET YOU COULD PUBLISH THAT. BUT YOU ALSO KNOW THESE THINGS HAPPEN. SO WE DECIDED TO SEQUENCE IT, AND THEN WE SHOW WE CAN ACTUALLY SEE THE VARIANT IN SOME MORE CASES, BUT WE ALSO SEE IT CROPPING UP IN CONTROLS. BUT MORE THAN THAT, WHEN WE LOOK AT THE NEXT GENERATION SEQUENCE DATA AND THE SANGER SEQUENCE DATA, AT THE END OF THE DAY WE CAN'T BE ABSOLUTELY 100% POSITIVE THAT THERE IS A VARIANT THERE. SOMETIMES IT CAN BE DIFFICULT TO DETERMINE. SO AT EVERY STAGE YOU'VE GOT PROBLEMS: GENOTYPING ACCURACY AND STORYTELLING. SO I THINK IT'S ALL-IMPORTANT THAT WE ESTABLISH A REALLY POWERFUL FRAMEWORK TO ESTABLISH WHAT IS CONVINCING ASSOCIATION AND ALSO TO PREVENT TOO MANY FALSE POSITIVES COMING OUT AGAIN IN THE LITERATURE. AND SO I GUESS WHAT I'M REALLY SAYING IS, I ACTUALLY PERCEIVE THE POSSIBILITY, MAYBE THE LIKELIHOOD, THAT THERE IS GOING TO BE A LITTLE BIT OF A WILD WEST MENTALITY, OF PEOPLE REALLY FORCING ANSWERS OUT OF SEQUENCE DATA ANY OLD WAY THEY CAN AND MAKING UP A LOT OF STORIES. LIKE HAPPENED FOR GWAS, I WOULD LIKE TO SEE THE SEQUENCING COMMUNITY COME TOGETHER ON HOW TO DO THIS RIGHT. THIS IS OUR THINKING ON HOW TO DO IT RIGHT. FIRST COMMENT: YOU DO NOT DECLARE A DISCOVERY BEFORE YOU SEPARATELY GENOTYPE THE VARIANTS AND SHOW THAT THE RESULT IS ACCURATE AND CONVINCING WITH A SEPARATE TECHNOLOGY. SO THE SEQUENCING IS DISCOVERY. YOU THEN TAKE THE VARIANTS AND PROVE THAT YOU'VE ACCURATELY GENOTYPED THEM.
MAYBE AT A FUTURE DATE WE CAN SKIP THAT. I THINK THAT'S A WAYS DOWN THE ROAD. HERE ARE THE OPTIONS FOR FOLLOW UP RIGHT NOW. THEY ARE ACTUALLY PRETTY GOOD. IF YOU WANT TO DO A FEW VARIANTS THAT YOU'RE INTERESTED IN, AND I'LL TELL YOU ABOUT SOME WE HAVE DONE, YOU CAN [INDISCERNIBLE]. THAT'S A SEPARATE TECHNOLOGY THAT IS OFTEN COMPLETELY CONVINCING, SOMETIMES NOT. IF YOU WANT TO DO SOME NUMBER OF HUNDREDS OF VARIANTS, THEN THE TECHNOLOGY IS BEADXPRESS. IF YOU WANT TO DO THOUSANDS, THE APPROPRIATE TECHNOLOGY IS [INDISCERNIBLE]. I SHOULD ALSO SAY THAT I DO NOT OWN SHARES. I JUST HAPPEN TO THINK THIS IS A GOOD APPROACH FOR THE FOLLOW UP. BUT THERE ARE PROBABLY OTHER COMPANIES OUT THERE THAT COULD PROVIDE [INDISCERNIBLE] FOR SOME OF THESE. IF YOUR FOCUS IS ON A GENE WHERE THERE ARE MULTIPLE VARIANTS THAT APPEAR ASSOCIATED, AS OPPOSED TO A SINGLE VARIANT, THEN THERE IS A NEW TECHNOLOGY COMING OUT THAT IS REALLY EXACTLY PERFECT FOR THAT, WHERE YOU CAN TAKE ON THE EXONIC REGIONS OF 30 OR 40 GENES, SO 400 EXONS, IN SOME NUMBER OF HUNDREDS OR THOUSANDS OF SAMPLES. AND THAT COST MIGHT BE $50 OR $100 A SAMPLE. THIS IS THE FOLLOW-UP FRAMEWORK. I'M GOING TO TELL YOU ABOUT A FEW STUDIES WE HAVE DONE TO CARRY THIS OUT. THE FIRST ONE IS TAKING A PARTICULAR CLASS OF GENETIC VARIATION AND SYSTEMATICALLY TAKING ALL VARIANTS THAT ARE IDENTIFIED FROM THAT CLASS OF VARIATION FOR FOLLOW UP. SO WE FOCUSED FIRST ON VARIANTS THAT INTRODUCE PREMATURE PROTEIN TRUNCATION. THE REASON WE DID THAT WAS TWOFOLD. ONE IS THAT THESE ARE FUNCTIONAL VARIANTS. THE OTHER REASON IS THAT WE KNOW FOR PSYCHIATRIC DISEASE THAT COPY NUMBER VARIANTS, WHICH DO THE SAME THING BUT TO MULTIPLE GENES, AND IN HETEROZYGOUS FORM REDUCE THE EXPRESSION, ARE ASSOCIATED WITH MANY PSYCHIATRIC DISEASES. THAT MAYBE SUGGESTS THAT POINT MUTATIONS THAT CAN KNOCK DOWN THE EXPRESSION OF A GENE MIGHT BE RISK FACTORS. SO WHAT WE DID WAS COMBINE SAMPLES THAT WE WERE SEQUENCING IN SCHIZOPHRENIA AND EPILEPSY.
AND WE LOOKED AT THE SEQUENCE DATA THAT WE GENERATED AND LOOKED AT ALL THE STOP VARIANTS THERE, FOCUSING ON 114 GENOMES WE HAD SEQUENCED WITH SCHIZOPHRENIA AND 96 WITH EPILEPSY, COMPARED TO 150 CONTROLS. ALL OF THE VARIANTS THAT INTRODUCED PREMATURE STOPS THAT WERE PRESENT IN HOMOZYGOUS FORM WE TOOK FOR FOLLOW UP. AND THEN WE ALSO LOOKED AT SPECIAL REGIONS OF THE GENOME, FOR EXAMPLE, REGIONS WHERE THERE WERE COPY NUMBER VARIANTS IMPLICATED ALREADY. WE TOOK THEM AND PUT THEM INTO A BEADXPRESS GENOTYPING ASSAY IN A LARGER NUMBER OF SAMPLES. SO THAT'S A KIND OF INTERMEDIATE SCREEN TO SEE WHICH OF THE VARIANTS LOOK INTERESTING. THERE WERE 384 STOP GAINS THAT WE WERE FOLLOWING UP. THESE WERE GENOTYPED IN MORE CONTROLS AND MORE CASES. AND THEN WE FOUND IN THAT FOLLOW UP THAT 16 WERE PRESENT IN MORE THAN 2 CASES AND NO CONTROLS. THE REST OF THEM WENT AWAY BECAUSE WE ACTUALLY FOUND THAT THEY APPEARED IN CONTROLS. AND SO NOW WE'RE DOWN FROM THE 384 TO 16. WE FOLLOWED UP THOSE 16 WITH THE TAQMAN APPROACH, AND 6 STILL REMAIN. I DON'T KNOW THAT WE'RE ZEROING IN HERE. WE DON'T YET HAVE A P VALUE THAT IS CONVINCING. YOU HAVE TO SAY WHAT YOUR ORIGINAL OPPORTUNITY SPACE WAS. AND THE ORIGINAL OPPORTUNITY SPACE, DEPENDING HOW YOU DEFINE IT, COULD BE ONLY THE STOP VARIANTS THAT ARE SEEN IN THE GENOMES, OR IT COULD BE ALL THE, SAY, UNEQUIVOCAL FUNCTIONAL CATEGORIES, LIKE ESSENTIAL SPLICE. YOUR P VALUE THRESHOLD MIGHT BE AROUND 10 TO THE NEGATIVE 5 OR 6. AND 24 IS WHAT WE'RE GETTING SO FAR. BUT NOW, OF COURSE, WE WILL CARRY THESE INTO FURTHER GENOTYPING, AND THE HOPE IS THAT THIS CHAIN WILL EVENTUALLY IDENTIFY SUFFICIENT EVIDENCE FOR ASSOCIATION. I THINK THAT REALLY IS GOING TO BE ONE OF THE PRIMARY DESIGNS FOR DISCOVERY UNTIL WE CAN GET TO THE POINT OF REALLY DOING TENS OF THOUSANDS OF GENOMES. I THINK THIS WILL BE A PRIMARY ENGINE OF DISCOVERY.
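The screening step above (keep stop variants seen in more than 2 cases and in no controls) and the point about the opportunity space setting the p value threshold can be sketched together. This is an illustration under assumptions: a plain Bonferroni correction is used as one simple way to turn an opportunity space into a threshold, and the function names, variant labels, and carrier counts are hypothetical.

```python
def case_only_variants(carrier_counts, min_cases=3):
    """Keep variants carried by at least min_cases cases and by no controls.

    carrier_counts: dict of variant id -> (carriers among cases, carriers among controls).
    min_cases=3 corresponds to "more than 2 cases".
    """
    return sorted(v for v, (cases, controls) in carrier_counts.items()
                  if cases >= min_cases and controls == 0)

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-variant significance threshold when n_tests variants are in the opportunity space."""
    return alpha / n_tests

# Hypothetical follow-up genotyping counts for four stop variants.
follow_up = {
    "stopA": (4, 0),   # survives: 4 cases, no controls
    "stopB": (5, 2),   # dropped: also appears in controls
    "stopC": (2, 0),   # dropped: not more than 2 cases
    "stopD": (3, 0),   # survives
}
survivors = case_only_variants(follow_up)

# The threshold depends on how the opportunity space is defined.
threshold_stops_only = bonferroni_threshold(384)      # only the 384 stop variants
threshold_wider = bonferroni_threshold(50_000)        # a wider, assumed functional set
print(survivors, threshold_stops_only, threshold_wider)
```

With a wider opportunity space the required p value moves toward the range of 10 to the negative 5 or 6 mentioned above, which is why the definition of that space has to be stated up front.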
THE OTHER COMMENT I'D LIKE TO MAKE IS THAT ONE OF THE BEAUTIFUL THINGS, I THINK, ABOUT HUMAN GENETICS IS THAT IT REALLY IS SUPPOSED TO TELL YOU ABOUT THINGS THAT YOU DIDN'T ALREADY KNOW ABOUT. FOR THAT REASON, WE HAVE TAKEN A VERY STRONG VIEW THAT YOU REALLY DON'T PICK GENES. YOU LET THE GENETICS SPEAK FOR ITSELF. YOU TAKE WHAT GENES COME OUT, WHATEVER THEY ARE. SO THAT IS VERY MUCH OUR PHILOSOPHY. BUT EVEN SO, I HAVE TO SAY THAT EVEN FROM THAT COMMITTED HUMAN GENETICS PERSPECTIVE, SOME OF THE GENES THAT YOU END UP WITH ARE REALLY PRETTY CRUMMY FROM THE PERSPECTIVE OF NEUROPSYCHIATRIC DISEASE. THIS ONE IN PARTICULAR, I DON'T KNOW IF ANYONE CAN READ THAT, IT TURNS OUT THAT THAT GENE IS ONE OF THE PRINCIPAL COMPONENTS OF HAIR. AND SO IT DOESN'T MAKE IT HIGH ON ANYBODY'S LIST OF EPILEPSY OR SCHIZOPHRENIA RISK FACTORS. HOWEVER, IF YOU LOOK AT THE MOUSE BRAIN ATLAS, IT TURNS OUT THAT THIS IS, INDEED, EXPRESSED IN THE MOUSE BRAIN. I HAVEN'T A CLUE WHAT THAT MEANS, BUT ACTUALLY, NOT BEING FACETIOUS, I WILL TAKE THE VIEW THAT YOU CAN'T PICK AND CHOOSE. YOU REALLY DO THE GENETICS AND SEE WHERE IT TAKES YOU. SOMETIMES IT SHOWS YOU SOMETHING YOU WERE NOT EXPECTING. SO THAT'S AN EXAMPLE OF FOLLOWING UP CLASSES OF GENETIC VARIANTS. WE HAVE ALSO BEEN THINKING ABOUT THE IDEA OF ZEROING IN ON PARTICULAR EXTREMES. I'LL TELL YOU ABOUT A FEW. WE'RE PARTICULARLY INTERESTED IN THE IDEA THAT YOU CAN USE DRUG RESPONSE TO ZERO IN ON A MORE HOMOGENEOUS SUBSET OF PATIENTS. SO IN BOTH EPILEPSY AND SCHIZOPHRENIA WE'RE LOOKING AT PARTICULAR SUBGROUPS BASED UPON RESPONSIVENESS. IN EPILEPSY, THERE IS A DISTINCTION AMONG PATIENTS WHEREBY THE MAJORITY OF THEM RESPOND VERY, VERY WELL TO [INDISCERNIBLE] ACID. A MINORITY DO NOT HAVE THEIR SEIZURES CONTROLLED. SO WE HAVE TAPPED INTO OUR CLINICAL COHORTS IN EPILEPSY TO TRY TO BUILD A COHORT OF REFRACTORY EPILEPSY. SO FIRST ON THE JME SIDE.
AS I SAID, A FRACTION OF INDIVIDUALS DO NOT HAVE THEIR SEIZURES WELL CONTROLLED. WE COMBINED SELECTION OF THOSE INDIVIDUALS FROM SPORADIC FAMILIES WITH, AGAIN, A FAMILY STUDY BEING DONE THROUGH NINDS FUNDING, COMBINING WHAT WE SEE IN FAMILIES WITH WHAT WE SEE FOR SPORADIC CASES. WE HAVE BEEN SEQUENCING JME PATIENTS. AND THEN, AGAIN, ALL THE VARIANTS THAT SHOW EITHER PRESENCE OR ENRICHMENT IN THE REFRACTORY PATIENTS, WE'RE CARRYING INTO FOLLOW UP GENOTYPING IN LARGER COHORTS. HERE IS AN EXAMPLE OF SOME OF THE VARIANTS THAT HAVE BEEN TAKEN INTO FOLLOW UP GENOTYPING. THIS IS BASED UPON ENRICHMENT IN THE CASES, EITHER ON A RECESSIVE MODEL, OR IN REGIONS THAT SHOW LINKAGE FOR JME, OR IN FAMILIES. AND THAT FOLLOW-UP GENOTYPING FOR 9 VARIANTS OF INTEREST SHOWED THAT 8 OF THEM WERE NOT INVOLVED AT ALL. SO WE COULD RULE THEM OUT. AND ONE OF THEM STRENGTHENED QUITE SUBSTANTIALLY. IT'S POSSIBLE THAT THAT GENE MAY BE A RISK FACTOR FOR JME. I DO NOT KNOW YET WHETHER THAT'S TRUE. WE'LL NEED YET MORE FOLLOW-UP GENOTYPING, BUT WE'RE PROBABLY ON A PATHWAY THAT WILL EVENTUALLY WORK. AND I GUESS THE ONE THING THAT I REALLY, REALLY LIKE ABOUT THIS CANDIDATE GENE IS THAT IT HAS NOTHING TO DO WITH HAIR. THIS IS A TRANSPORTER WHICH TRANSPORTS GLUTAMATE, PROBABLY. SO IT'S AN OKAY CANDIDATE GENE. BUT YOU FOLLOW WHERE THE GENETIC ASSOCIATION TAKES YOU. WHERE ARE WE IN TERMS OF P VALUES? IF YOU THROW ALL THE CASES TOGETHER, THE P VALUE IS ACTUALLY FINE, 10 TO THE NEGATIVE 7. THAT IS NOT OKAY TO DO. WE KNOW VERY WELL THAT POPULATION STRUCTURE CAN INFLATE THESE ASSOCIATION STATISTICS. SO YOU DON'T GET TO IGNORE THAT JUST BECAUSE YOU'RE LOOKING AT SEQUENCE DATA. IF YOU, THEN, TAKE THESE VARIANTS AND DO A PROPER CONTROL FOR STRATIFICATION IN THE SAME KIND OF WAY WE USED TO DO FOR GWAS, SO IN OUR CASE WE DID IT FOR EUROPEAN ANCESTRY, THE P VALUE IS LESS SIGNIFICANT. AND WE'RE NOT WHERE WE NEED TO BE. I THINK ACTUALLY FOR THAT ONE WE MAY BE MOVING IN THAT DIRECTION.
AND WE'RE DOING A VERY SIMILAR STUDY FOR TREATMENT RESISTANT SCHIZOPHRENIA. THIS IS TAPPING INTO A VERY LONG TERM SET OF COHORTS THAT HERB MELTZER HAS BEEN DEVELOPING WITH VERY WELL CHARACTERIZED TREATMENT RESISTANT SCHIZOPHRENIA, AND TAPPING INTO COHORTS BEING DEVELOPED WITH R01 FUNDING AND ALSO BY A COLLABORATOR OF OURS AT McLEAN HOSPITAL, DEBORAH. SO THESE SOURCES ARE DESCRIBED HERE. THE DESIGN IS THAT WE'LL BE SEQUENCING AS MANY OF THE TREATMENT RESISTANT PATIENTS AS WE CAN. IT'S A COMBINATION OF WHOLE EXOME AND WHOLE GENOME, THEN TAKING THE VARIANTS OF INTEREST INTO THE SAME DESIGN AS BEFORE. SO FAR WE HAVE ABOUT 100 OF THOSE COMPLETE. THE LAST 2 RELATIVELY EXTREME TRAITS OR VARIANCE TRAITS ARE DRUG INDUCED LIVER INJURY AND RESISTANCE TO INFECTION BY HIV. THE DRUG INDUCED LIVER INJURY EXAMPLE IS INTERESTING IN THAT THIS REPRESENTS A VERY DRAMATIC EXTREME FOR THE POPULATION. FOR SOME DRUGS IT MAY BE ONE IN 10,000, 50,000 OR FEWER EXPOSURES THAT LEADS TO SERIOUS LIVER INJURY. SO WE'RE TALKING ABOUT A REAL MINORITY OF THE POPULATION. AND IT'S ENTIRELY POSSIBLE THAT THERE IS IN THOSE CASES EXTREMELY STRONG GENETIC CONTROL THAT WOULD BE PICKED UP IN SMALL SAMPLE SIZES. IF YOU, THEN, FOUND THE GENETIC CONTROL IT WOULD MAKE A REAL IMMEDIATE DIFFERENCE. YOU COULD, THEN, AVOID THE DRUG IN THOSE INDIVIDUALS THAT ARE SUSCEPTIBLE. THERE ARE INDICATIONS OF EXACTLY THAT KIND OF GENETIC CONTROL FOR DRUG INDUCED LIVER INJURY, FOR EXAMPLE, WITH [INDISCERNIBLE], HAVING SUCH A STRONG DEPENDENCE ON [INDISCERNIBLE] 5701. THIS HAS REALLY ENCOURAGED US TO PURSUE A SEQUENCING EFFORT IN DRUG INDUCED LIVER INJURY, LED BY TOM URBAN IN THE GROUP. AND WE HAVE BEEN ABLE TO ESTABLISH A COLLABORATION WITH A REALLY EXCEPTIONAL RESOURCE PUT TOGETHER BY THE DILIN CONSORTIUM, AND THESE ARE THE PARTICIPATING SITES. THIS IS A CONSORTIUM HEADED BY PAUL WATKINS. AND WE ARE NOW PURSUING A VARIETY OF SEQUENCING STUDIES IN PARTNERSHIP WITH THEM.
THEY HAVE A TOTAL OF 1,000 PATIENTS WITH CAREFULLY CONFIRMED DRUG INDUCED LIVER INJURY, WHERE THEY REALLY HAVE A HIGH DEGREE OF CONFIDENCE THAT THEY KNOW THE DRUG THAT'S RESPONSIBLE. THERE ARE A MODEST NUMBER OF DRUGS WHERE THERE ARE A LARGE NUMBER OF PATIENTS AFFECTED AND A BUNCH OF DRUGS WITH ONLY A FEW INDIVIDUALS. SO THE DESIGN THAT WE HAVE BEEN FOLLOWING IS TO SEQUENCE ALL THE CASES WHERE THERE ARE MORE THAN 5 CASES FOR A DRUG, AND THEN CARRY OUT THOSE ANALYSES FIRST. THEN WE MAY GO BACK AND EVEN SEQUENCE WHERE THERE ARE FEWER THAN 5 FOR EACH DRUG, TO LOOK FOR COMMON RISK FACTORS ACROSS THE DRUGS. SO FAR WE HAVE DONE SEQUENCING FOR ACID AND [INDISCERNIBLE]. VERY SMALL NUMBERS. AND SO WE DON'T HAVE ANYTHING DEFINITIVE YET, ALTHOUGH I SAW A VERY GOOD POSSIBILITY WITH [INDISCERNIBLE] ACID. FINALLY, THERE IS THE AREA WHERE THERE IS THE MOST PROMISE FOR NEAR TERM DISCOVERY, AND MAYBE ALREADY A DISCOVERY. THAT IS LOOKING AT THE GENETIC CONTROL OF WHO CAN AND WHO CANNOT BE INFECTED WITH HIV. THIS WORK IS LED BY KIM, AND IT CAPITALIZES ON A COHORT OF INDIVIDUALS THAT WERE EXPOSED BETWEEN 1979 AND 1982 TO EITHER KNOWN OR SUSPECTED CONTAMINATED INFUSION FACTOR AS PART OF THEIR TREATMENT FOR HEMOPHILIA. THIS COHORT IS BEING ASSEMBLED THROUGH AN NIH FUNDED NETWORK TO STUDY THE IMMUNOLOGY. THERE ARE 500 THAT HAVE BEEN ASSEMBLED THAT HAD INFUSIONS AND DID NOT BECOME HIV POSITIVE, ALTHOUGH THE MAJORITY DID. WE ARE MAKING USE IN THIS EFFORT OF ANOTHER COHORT THAT JIM GOEDERT ASSEMBLED AT THE NCI MANY YEARS BEFORE. AND THAT COHORT PROBABLY HAS SLIGHTLY HIGHER OVERALL EXPOSURE RATES, BECAUSE JIM PUT A GREAT DEAL OF EFFORT INTO SELECTING THE VERY MOST EXPOSED INDIVIDUALS. THOSE INDIVIDUALS ARE AVAILABLE TO MINE THE GENOME FOR VARIANTS. WHAT I'D LIKE TO EMPHASIZE HERE IS THAT WE KNOW OF ONE GENETIC FACTOR THAT CERTAINLY CONFERS PROTECTION AGAINST HIV INFECTION. THAT'S HOMOZYGOSITY FOR DELETION. SO IF YOU DON'T MAKE THE APPROPRIATE FORM, THEN MOST FORMS OF HIV CAN'T INFECT.
BUT WE ALSO KNOW THAT THERE ARE OTHER PEOPLE, PROBABLY A MAJORITY OF PEOPLE, THAT DON'T HAVE THAT AND STILL CAN'T BE INFECTED. THEY ALMOST CERTAINLY HAVE SOMETHING YET TO BE FOUND GENETICALLY. THAT'S WHAT WE'RE TRYING TO DO. THAT'S THE GOAL, TO FIND THESE PROTECTIVE MECHANISMS. ONE OF THE REASONS WE'RE PARTICULARLY INTERESTED IN DOING THAT IS THE IDEA THAT IF THE STRATEGY FOR CONTROLLING HIV WERE TO TURN MORE TOWARD PROPHYLAXIS, NOT SAYING IT WILL, THEN IT NEEDS MORE DRUGS AND OPTIONS FOR PROPHYLACTIC TREATMENT, AND ONE ROUTE IS TO FIND DRUGS THAT WORK ON THE HOST. WE NEED TO FIND MORE NATURAL VARIANTS GIVING CLUES TO DRUG DEVELOPMENT. SO FAR WE HAVE SEQUENCED 50 INDIVIDUALS THAT WERE EXPOSED FROM 1979-82 AND ARE NOT HIV POSITIVE. IT WAS 48 AT THE TIME; WE HAVE NOW DONE 50. WE'RE COMPARING THESE EITHER TO WHOLE GENOME SEQUENCES THAT WE GENERATED IN THE LAB FROM INDIVIDUALS THAT WERE NOT EXPOSED, OR TO WHOLE EXOME SEQUENCES, AND LOOKING FOR VARIANTS THAT ARE ENRICHED. WHEN WE DID THAT, WE ACTUALLY FOUND SOME QUITE STRIKING ASSOCIATIONS IMMEDIATELY, JUST IN THE SEQUENCING DATA. AND SO HERE ARE A FEW OF THE MOST INTERESTING ASSOCIATIONS THAT EMERGED FROM THE SEQUENCE DATA. SO WE HAVE VARIANTS THAT, BASED ON THE DATA, ACHIEVE SIGNIFICANCE QUITE READILY. IN 2 CASES THESE VARIANTS ARE IN GENES THAT CAME UP IN AN RNAi SCREEN FOR GENE PRODUCTS THAT INFLUENCE REPLICATION OF HIV. YOU DON'T KNOW ANYTHING UNTIL YOU DO FOLLOW UP GENOTYPING. I'M NOT CLAIMING IT'S REAL. WE NEED TO FOLLOW IT UP, BUT THIS LOOKS VERY ENCOURAGING AT THIS STAGE. WE'RE NOW IN THE PROCESS OF FOLLOWING UP THESE VARIANTS AND OTHER VARIANTS. ONE POINT I REALLY DO WANT TO EMPHASIZE: IN DOING ANY OF THIS KIND OF WORK IT'S ABSOLUTELY CRITICAL TO HAVE REALLY GOOD FOLLOW UP RESOURCES TO CONFIRM THE EFFECT OF THE VARIANTS. AND IN OUR CASE FOR HIV, WE'RE IN A TERRIFIC POSITION BECAUSE OF THE COHORT BUILDING GOING ON.
THE WAY TO FOLLOW THIS UP IS TO GENOTYPE VARIANTS OF INTEREST IN FURTHER SAMPLES. WE HAVE AROUND 500 OF THE EXPOSED INDIVIDUALS, ALL INDIVIDUALS EXPOSED THROUGH SEXUAL CONTACT. WE CAN LOOK AT THE VARIANTS IN THE GENERAL POPULATION. AND THE EXPECTATION, OF COURSE, IS THAT THERE WILL BE AN ENRICHMENT IN THE EXPOSED BUT NOT INFECTED INDIVIDUALS RELATIVE TO THE GENERAL POPULATION. FINALLY, WE CAN LOOK AT VARIANTS IN HIV POSITIVE COHORTS. THERE WE EXPECT THE VARIANTS TO BE DEPLETED. AND SO THESE ARE ALREADY, IN OUR OWN GROUP, THE NUMBERS THAT WE HAVE AVAILABLE FOR THIS FOLLOW UP EFFORT. THIS IS OVER 500. SO WE REALLY ARE HERE IN A POSITION TO BE ABLE TO CONFIRM ANYTHING WE FIND. THE SEQUENCING HAS TO GIVE US THOSE OPTIONS. ONE COMMENT HERE: I THINK THIS REALLY ILLUSTRATES THE IMPORTANCE IN DISCOVERY GENETICS OF GROUPS REALLY WORKING TOGETHER. I THINK WHAT WE DON'T WANT TO SEE IS ONE GROUP TRYING TO GET EVERYTHING THAT IT POSSIBLY CAN OUT OF SEQUENCING 150 INDIVIDUALS BUT NOT HAVING MANY FOR FOLLOW-UP, ANOTHER WITH 200, AND SO ON. AS FAR AS IS POSSIBLE AND REASONABLE, THIS WORK DOES WORK BEST WHEN IT'S PURSUED TOGETHER AND YOU CAN ACTUALLY HAVE THIS KIND OF SIZE OF FOLLOW-UP COHORT FOR CONFIRMATION. THAT WILL ALSO REDUCE THE NUMBER OF FALSE POSITIVES THAT ARE CONTRIBUTED TO THE LITERATURE. SO THAT'S REALLY WHAT I WANTED TO TALK ABOUT. I'D LIKE TO END WITH SOME COMMENTS ABOUT WHERE I SEE THINGS GOING. ONE OF THE THINGS, I THINK, THAT'S REALLY CRITICAL TO APPRECIATE IN THE WAY GENETICS IS MOVING IS THAT WHEN WE FIND VARIANTS THAT HAVE RELATIVELY HIGH IMPACT ON THE TRAIT, IT'S REALLY IMPORTANT TO BE ABLE TO LOOK AT THE SUBJECT THAT CARRIES IT AND THE SUBJECT'S FAMILY, TO WORK OUT WHAT THAT VARIANT DOES. SO FOLLOW UP PHENOTYPING BECOMES MORE IMPORTANT. THIS IS VERY COMMON IN MENDELIAN GENETICS, LESS SO IN THE AREA OF GENOME WIDE ASSOCIATION STUDIES. NOW THAT WE WANT TO BE SEQUENCING FOR DISCOVERY FOR COMMON DISEASES, WE NEED TO GO BACK IN THAT DIRECTION.
HAVING INDIVIDUALS THAT ARE STILL AVAILABLE FOR FURTHER EVALUATION IS REALLY IMPORTANT. WE HAVE RETURNED TO CLASSICAL HUMAN GENETICS. ANOTHER CRITICAL QUESTION WE HAVE TO FACE AS EARLY AS POSSIBLE IS THE EXTENT TO WHICH THE CAUSAL MUTATIONS THAT WE IDENTIFY CLUSTER INTO COMMON PATHWAYS. ONE OF THE CONCERNS ABOUT HOW TO PERSONALIZE MEDICINE IS THAT, IF A LOT OF THE ACTION IS RARE VARIANTS, EVERYBODY HAS A DIFFERENT CAUSE OF DISEASE. TO A CERTAIN DEGREE THAT WILL HAPPEN. BUT IT'S ENTIRELY POSSIBLE THAT THE DIFFERENT MUTATIONS THAT INDIVIDUALS HAVE WILL BE CLUSTERABLE INTO COMMON PATHWAYS. THAT WILL BE A REALLY IMPORTANT AREA OF FOCUS. THE FINAL COMMENT: TO THE EXTENT WE START IDENTIFYING RELATIVELY HIGH IMPACT RARE VARIANTS, I THINK IT WILL BE POSSIBLE TO MAKE MORE MEANINGFUL DISTINCTIONS AMONGST INDIVIDUALS IN TERMS OF THE RISKS THEY FACE FOR DISEASE. THAT MIGHT START TO HAVE IMPLICATIONS FOR PROPHYLAXIS. YOU CAN IMAGINE IF YOU COULD PREDICT WHICH INDIVIDUALS WILL DEVELOP SCHIZOPHRENIA YOU MIGHT START TREATMENT OR EVALUATE TREATMENT OPTIONS. SO I THINK WE'RE IN A VERY EXCITING TIME. THERE WILL BE A TREMENDOUS AMOUNT OF DISCOVERY WITH SEQUENCING, BUT WE HAVE TO BE EXTREMELY CAREFUL ABOUT HOW WE DEPLOY IT TO MAKE SURE WE GET IT AS RIGHT AS WE CAN MOVING FORWARD. THAT'S WHERE I WANTED TO END, AND I WANT TO THANK THE PEOPLE THAT HAVE BEEN WORKING WITH ME ON THE VARIOUS PROJECTS. I THINK THAT I MENTIONED, I HOPE, MOST OF THEM. THE OTHERS ARE INDICATED ON THE SLIDE. THANK YOU VERY MUCH FOR YOUR ATTENTION. >> WE HAVE TIME FOR 5 MINUTES OF QUESTIONS AND YOU CAN JOIN US FOR A RECEPTION AT 4:00. THERE IS CERTAINLY TIME FOR QUESTIONS FROM THE AUDIENCE. I'LL START WITH THE FIRST ONE. SO ONE THING I WASN'T SURE I FOLLOWED WHEN YOU WERE TALKING ABOUT ONE OF THE STORIES ABOUT SCHIZOPHRENIA: YOU SAID YOU HAD DATA FROM NEXT GENERATION PLATFORMS AND FROM SANGER BASED SEQUENCING. YOU SEEMED UNCLEAR ABOUT WHAT THE TRUTH WAS.
>> FOR THIS ONE VARIANT IN DRD5 IT TURNS OUT THAT THE NEXT GENERATION SEQUENCE DATA IS MESSY, AND SANGER IS A BIT MESSY. >> IS IT JUST PART OF THE -- >> IT'S JUST HARD TO SEQUENCE CLEANLY. I THINK WE'LL EVENTUALLY GET CLEAN ENOUGH SEQUENCE. RIGHT NOW, WE DON'T. AND IT REALLY DOES EMPHASIZE THAT IN THE GWAS WE WERE FOCUSING ON VARIANTS THAT WE KNOW HOW TO TYPE WELL. NOW WE'RE DEALING WITH VARIANTS THAT WE DON'T KNOW HOW TO TYPE WELL. >> DID THE GENOTYPING CHIP GIVE YOU AN ACCURATE CALL AT THAT POSITION? >> BECAUSE THAT VARIANT IS NOT KNOWN TO ANYBODY, IT'S NOT ON ANY CHIP AT ALL. >> GOT IT. OKAY. >> QUESTIONS? NO QUESTIONS FROM THE AUDIENCE? IF NOT, WE CAN START THE RECEPTION EARLY. THE RECEPTION IS JUST A FEW YARDS DOWN THE HALL. >> THANK YOU VERY MUCH. [APPLAUSE]