>> Paul Wakim: Hello,
I'm Paul Wakim.
I'm chief of Biostatistics
and Clinical Epidemiology
Service
at the NIH Clinical Center,
and this is the second segment
on hypothesis testing.
And in this segment,
we're going to talk
about the p-value
and the Bayesian approach.
And we're going to start
with sort of a little bit
of fun,
an experiment.
So here's what we're going
to do.
I've got two coins.
This coin actually
has two heads.
It actually does
have two heads.
You can get those online
for something like $7.
But it has two heads,
and this coin is a regular coin.
You can't see them
because they're too far away,
but this is a regular coin,
has head on this side,
tail on this side.
This one has two heads.
So, two coins. I'm going
to put them in my hands
and I'm going to let you
pick one, one hand.
Okay?
So, you pick one hand. Okay?
So, I put the other one aside,
and this is the coin
I'm going to be tossing.
Okay?
You don't know --
I'm not going to tell you
which coin you picked,
but that's the coin
I'm going to be tossing.
So, what we're going to do is,
I'm going to do an experiment
where I toss the coin,
and I come back and tell you
the result of the tossing.
Of course, you don't know
the true state,
and this is trying to mimic
scientific research.
You don't know the true state,
but you see data.
And based on data that you see,
you make inference about
the coin that you are tossing.
So, I'm going to toss this coin,
and the question is,
given the data,
which coin am I tossing?
The null hypothesis, sort of the
default, it's the regular coin.
It has head on one side
and tail on the other side.
And the alternative hypothesis,
it's that it's the other coin,
it's the two-headed coin, okay?
And so, we have equipoise.
What does that mean?
It means that the coin is
as likely to be the regular coin
as the two-headed coin,
because you picked one hand,
and you had
a 50/50 chance
of selecting one or the other.
Okay, so, again, the experiment,
I'm just putting
sort of a summary
of what we're doing here.
We have an experiment,
we have two hypotheses.
The null hypothesis is that
it's the regular coin,
the alternative is that
it's the two-headed coin.
But before I start --
so I'm going to do
my experiment in my lab.
I'm going to come back to you
and say, this is the result.
And you're going to
have to decide.
Are you tossing the regular coin
or are you tossing
the two-headed coin?
But, think of it this way,
your decision
is extremely important,
and it has major, major
public health ramifications.
So, you know,
either mistake is a huge mistake
with big public health
ramifications. That's one.
Two, the sooner
a decision is made, the better.
People are dying,
and so, you know,
we need answers
as quickly as possible.
And three, every toss,
every experiment
costs a lot of money
and takes a lot of time, okay?
I'm trying to basically parallel
real scientific research
with this tossing of a coin.
So, important that you make
the right decision,
you need to make it
as fast as possible,
and every experiment
basically costs a lot of money
and takes a lot of time.
So, I go to the lab, I toss it
once, I come back to you,
and I say I tossed it once,
and I got one head.
And the question is,
are you ready, at this point,
to make a decision?
Are you -- would you reject H0?
Would you conclude
that it's the two-headed coin
because I got head
out of one toss?
Chances are that you wouldn't,
it's just way too early.
So forget the first experiment.
This is important,
forget the first experiment.
Let's say,
when I went to my lab,
I tossed it twice,
and I came back and I told you
I tossed the coin twice,
and I got two heads. Okay?
So, would you now reject H0?
Would you conclude now
that it is the two-headed coin?
Again, forget the first two
experiments,
completely forget them,
they didn't happen.
I went to the lab, I tossed it
three times, and got three heads.
Are you ready to now conclude
that I'm tossing
the two-headed coin?
And same thing with
four tosses, four heads,
five tosses five heads,
six tosses six heads,
and seven tosses seven heads.
So, you can pause the video
right now,
and ask yourself,
how many tosses and how
many heads would it take
for me to say, okay,
now I'm ready to conclude
that I am
tossing the two-headed coin?
How many heads would it take?
Ask yourself that question.
And then, when you
come back from the pause,
look at this table and ask,
well, which one did you pick
and what is
the corresponding p-value?
What is the p-value?
The p-value is basically saying
if I were tossing
the regular coin,
what is the chance of getting
one head out of one toss?
It's 0.5. Now we're talking
about two tosses.
If I were tossing
the regular coin,
what is the chance
of getting two heads?
0.25. If I were tossing
the regular coin,
what is the chance of getting
seven heads out of seven tosses?
0.008.
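Those probabilities are straightforward to compute: under the null hypothesis of a regular coin, each toss lands heads with probability 0.5, so the chance of n heads in n tosses is 0.5 to the power n. A minimal Python sketch of the table (variable names are my own, not from the talk):

```python
# P-value for getting n heads out of n tosses under H0 (regular coin):
# each toss is heads with probability 0.5, so the chance is 0.5 ** n.
for n in range(1, 8):
    p_value = 0.5 ** n
    print(f"{n} tosses, {n} heads: p-value = {p_value:.3f}")
```

For seven tosses this gives 0.5 to the 7th, about 0.008, the value in the table.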
Okay, well, the data is the data.
We can't say,
oh, but you didn't really get
seven heads.
I did. I did get seven heads.
So, if the data is the data,
and the probability
of getting the data
under an assumption is very low,
what are you going to question?
You're going to question
the assumption.
So, the whole thinking is
if the probability
of getting seven heads
out of seven tosses is 0.008
if I were tossing
the regular coin,
probably I'm not tossing
the regular coin.
It seems like I'm not
tossing the regular coin.
So, you question the assumption
of tossing the regular coin.
And that's what the p-value is.
And I'm going to go over the --
sort of the more
formal definition.
But the interesting part,
for you to ask yourself,
is this corresponding p-value,
the one at which you decided,
this is when I conclude
that I'm tossing
the two-headed coin --
how does it compare
to the commonly used p-value
of 0.05?
Okay, so let's kind of think
about the whole thing.
So, we have two hypotheses.
The null hypothesis
is the regular coin,
the alternative
is the two-headed coin.
We did seven tosses,
got seven heads.
P-value is 0.008.
Conclusion, we reject H0.
Okay, now, have we proved
that it's the two-headed coin?
No, it could still be that
I'm tossing the regular coin,
and it just,
by chance, happened --
seven heads occurred.
We haven't proved
that it's two-headed coin.
Is 0.008 the likelihood that
the results are due to chance?
No, it is the chance.
It's not the likelihood that
the results are due to chance,
it is the chance
of getting what we got.
Is 0.008 the probability
that it's the regular coin?
No, it is not.
Is one minus
0.008 the probability
that it's the two-headed coin?
No.
The reason I'm putting these up
is because these are all
common misinterpretations
of p-values.
All of these
interpretations are incorrect.
So, what is the correct one?
0.008 is the probability
of getting seven heads
if it is the regular coin,
not more, not less.
That's all it is,
and all the other
interpretations are incorrect.
So, what's the formal definition
of a p-value?
The p-value
is the probability
of obtaining
a result as extreme --
by extreme, we mean away
from the null hypothesis --
or more extreme than the one
obtained, if H0 is actually true.
Now, let's talk about
the Bayesian approach.
Bayesian is --
the Bayesian approach
is actually very intuitive.
It's really how we think about
things, how we go through life.
We basically know very little,
and then we observe.
And then as we observe,
we update our beliefs.
And then we observe some more,
and we update our beliefs again.
And so, we move through life
like that, we just learn,
learn, learn,
and update our thinking.
The Bayesian approach
has the same logic.
You start with a prior belief,
what you think is happening.
You don't know,
that's what you think.
It could be very loose, vague,
but that's what you think.
And then you get data,
and you use the data
and the prior belief,
you put them together
to get a posterior belief.
In other words,
your updated belief.
So, you have an old belief,
combine it with data,
and now you have
an updated belief
called the posterior belief.
But then the posterior belief
becomes prior belief.
So, as you're moving,
you're learning and updating,
and learning and updating.
So the posterior belief
becomes the prior belief.
And then you get new data,
and then you update,
and you get a posterior belief,
which becomes a prior belief.
More data, posterior belief.
And so on. In life,
you never stop,
but in clinical trials
and biomedical research,
you do stop when there's enough
posterior belief
to conclude efficacy
or futility.
And this can be
at a patient level.
So, every step that
I'm showing you on the slide
could be at the patient level.
So, every patient comes,
and you update.
Or, it could be
with groups of participants
in a clinical trial.
So, the first 10 you update,
and after that, another 10,
and another 10.
Or, it could be clinical trials.
The first clinical trial,
you get data, it becomes prior.
You have another clinical trial,
and you get data and you update.
So, the difference
between the Bayesian
and the Frequentist approach
is that the Bayesians think
of the probability
of getting a head,
p, as a random variable.
So, to say it's a regular coin,
what does it mean?
It's the same thing as saying
that the probability of getting
a head is one half, right?
I mean, if I tell you I have,
in my hand, the regular coin,
it's the same thing
as if I told you the probability
of getting a head is one half,
p is one half.
Regular coin and p is one half,
are the same thing.
And the two-headed coin
and p is equal to one,
is the same thing.
So, we're talking
about the same thing.
So, Bayesian approach sees p
as a random variable.
The Frequentists say, no, no.
Once you decide --
once you pick the coin,
and that's the one you're
tossing, there's no probability,
there's no distribution,
there's no chance.
You either have the regular coin
or the two-headed coin,
there is no probability.
The Bayesians say, well,
but yes, I picked it,
and it's determined,
but I don't know what it is,
and therefore I'm going to
attach a probability to it.
So, saying what is the chance
that it's the regular coin,
is the same as saying, what is
the chance that p is one half.
And saying, what is the chance
that it's the two-headed coin,
is the same as saying, what is
the chance that p is one.
It's the same thing.
So, if we start
with the prior belief
that both coins
are equally likely --
remember, we selected
one of the two coins randomly,
so we start with the prior belief
that it's basically 50/50.
And you may not be able
to see this,
but you'll get the sense of it
when I complete the slide.
But remember, p is one half
is the same as saying
it's the regular coin,
p is equal to one is the same
as the two-headed coin.
And at the beginning,
there's a 0.5 chance
of having the regular coin,
and a 0.5 chance
of tossing the two-headed coin.
So p, here, is a variable
with a distribution,
but it only can take two values,
one half or one,
because it's either the regular
coin or the two-headed coin.
So, what happens
in the Bayesian approach,
we're going to update
the distribution
of p as we're getting data,
and that's really the difference
between the two ways of thinking.
So now, I'm tossing once
and I get head,
and I'm going to update
that distribution.
So, now, the chance that it is
the two-headed coin
has just increased a little bit.
And the chance that
it is the regular coin
has decreased a little bit,
with one toss.
And then I toss again,
and I get head,
and I update, and it goes up.
And again and again,
seven times.
So, you can see, after seven
tosses and seven heads,
the distribution of p now
is clearly
with the two-headed coin,
in favor of the two-headed coin.
So, you can see,
with every toss, we've updated.
Now, notice the difference
between the two situations.
I said in the first scenario
I was going back to the lab,
forgetting
the other experiments,
and tossing it five times.
Then going back,
tossing it six times.
Then going back,
tossing it seven times.
Here, I'm doing it one
after the other, sequentially.
So, when you update
the probability,
these are the numbers.
So, with the first toss,
the probability,
the Bayesian
posterior probability
of the regular coin is 0.333.
And after the second toss,
it's 0.2.
After the third, et cetera.
And after the seventh toss,
the probability of the regular
coin is 0.008.
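These posterior probabilities follow directly from Bayes' theorem. With the 50/50 prior, after k heads in k tosses the posterior probability of the regular coin works out to (0.5 x 0.5^k) / (0.5 x 0.5^k + 0.5 x 1), which simplifies to 1 / (2^k + 1) -- my algebra, not shown in the talk. A quick check in Python reproduces the table:

```python
# Posterior P(regular coin | k heads in k tosses), with a 50/50 prior:
# (0.5 * 0.5**k) / (0.5 * 0.5**k + 0.5 * 1) = 1 / (2**k + 1)
for k in (1, 2, 7):
    posterior = 1 / (2 ** k + 1)
    print(f"after toss {k}: P(regular coin) = {posterior:.3f}")
# prints 0.333, 0.200, 0.008 -- matching the values in the table
```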
And I want to make it
very clear --
and I may repeat it again
in a minute --
very clear, in here,
we are talking about
the probability
that it is the regular coin.
Remember, when we were talking
about p-value,
we said specifically,
explicitly,
it is not the probability
that it is the regular coin.
So, very different concepts.
So, what was actually a wrong
interpretation for the p-value
is actually a correct
interpretation here.
They're different concepts.
They are calculated differently,
they have different
statistical definitions,
mathematical definitions.
So, they're completely
different concepts.
It's not the same formula
that's giving you two answers.
No, different concepts.
It's important
to distinguish the two.
So, the Bayesian Approach,
again,
the regular coin
is p is equal to one half,
two-headed coin
is p is equal to one.
If we start with
the prior belief that both coins
are equally likely,
after seven tosses
and seven heads,
seven tosses the experiment,
seven heads the data, we do say,
now, we're 0.8 percent sure
that it is the regular coin,
and 99.2 percent sure
that it is the two-headed coin.
Something we did not say,
we said it was wrong to say
this when we were talking
about the p-value.
So, I'm doing this
with hesitation,
because I'm going to put
the two tables side by side,
and there is a little bit
of a danger with that
because these are
two tables
you've already seen,
one when we were talking
about the p-value,
one when we were talking about
the posterior probability.
They are very different tables.
The first thing to notice
in the --
on the left, it says,
number of tosses, on the right,
it says, toss number.
On the left, it is --
you remember,
we go to the lab, we do it once.
Then, we forget about the first.
We toss it twice,
we get two heads.
We toss it three times,
we get three heads.
And on the right,
the table on the right,
the Bayesian approach,
it's the toss number.
We do one, we update.
We do the second, we update.
We do the third, we update.
So, right there,
they're different.
And also, they're of course
different in the p-value.
The p-value is not
the probability
that it is the regular coin,
as we discussed.
In the Bayesian approach, it is.
So, you say, well, why am I
putting them side by side?
Because I think it's interesting
to see
that the numbers
are not that different,
even though they're different
concepts,
calculated differently.
But you can see that --
it's interesting to see that
the numbers are fairly close.
Again, remember, the p-value
is not the probability
that it's the regular coin.
I just want to emphasize this
as much as I can
because that's exactly
the misinterpretation
of the p-value.
So, these are the references.
So, in summary, the p-value
is the probability
of obtaining a result
as extreme or more extreme
than the one obtained
if there is no difference,
if, in fact, the reality
is that there's no difference.
And the Bayesian approach
is gaining popularity
as being more intuitive
and is definitely
worth considering.
My questions to you:
true or false?
Failing to reject
the null hypothesis
means that
the null hypothesis is true.
The p-value represents
the likelihood that the results
are due to chance.
The p-value is the probability
that the null hypothesis
is true.
True or false?
Thanks for watching.