>> Paul Wakim: Hello
I'm Paul Wakim.
I'm chief of Biostatistics
and Clinical Epidemiology
Service
at the NIH Clinical Center
and this is the fourth segment
on hypothesis testing
and in this segment,
we're going to talk
about subgroup analysis
and interaction --
very important topics
under hypothesis testing.
So, let's start
with what is a subgroup?
A subgroup is defined
based on baseline,
pre-randomization
characteristics.
This is important.
Baseline pre-randomization --
meaning pre-randomization
characteristic --
in other words,
it is highly unrecommended --
recommended against
creating subgroups
based on
post-randomization --
like adherence to a treatment
or retention.
It's really,
highly not recommended
to use those kinds of subgroups.
So, subgroup defined on baseline
pre-randomization
characteristics.
Examples of subgroups: sex,
race, ethnic group,
age group, socioeconomic status,
geographical location,
severity of disease or disorder
before randomization,
or comorbidity with other
diseases or disorders --
again, at baseline,
before randomization.
These are all examples
of subgroups.
So, what are subgroup analyses?
They focus on differences
in treatment effect
among subgroups
of trial participants.
So, now we're interested
in the differences
in treatment effect
between the subgroups.
So, if -- we're going to talk
about sex -- males and females.
Is the treatment effect
different for females
and for males?
That's really the question.
So, subgroup analysis can be
pre-specified
or not pre-specified.
Pre-specified means that you put
that question about the subgroup
in the protocol
or in the proposal.
And not pre-specified --
it means, you know,
it was an afterthought.
It was ad hoc.
You thought about it,
"Oh, why don't I do this
since I have the data?"
-- kind of thing.
There's a huge difference,
in my mind,
between pre-specified
and not pre-specified.
A huge difference
because one tells me --
or tells the reader --
that this is something
that you suspected --
that you were wondering
about upfront -- rather than,
"Oh, well we have all this data,
let's try this, let's try that."
So, it's an important
distinction in my own mind.
Typically, the trial sample size
is not based
on subgroup analysis.
Some key points about subgroup
analysis --
they are very important,
very important.
And I'm going to talk about
this --
but basically what it says is,
you know, you could have
an overall treatment effect,
but as soon as you go down
to subgroups you see
that there's a different story
depending on which subgroup
you're talking about.
So, it is important
to try to see
if there are differences.
They are valuable
for generating new hypotheses.
So, again, because the sample
size and the experiment
is not designed
for subgroup analysis -- if --
most of the time it is not --
then they are not definitive,
but they are valuable
for generating hypotheses.
They should be based on
clinically plausible hypotheses.
So, you should ask yourself,
"Okay, let's try to see
if there are differences
in these subgroups?"
Well -- you have to have
some kind of a logic --
a reason to do that,
not just create subgroups
just to try things.
They should be pre-specified --
highly recommended.
They should be limited
to a handful
because you could say,
"Oh, I'm going to pre-specify
200 subgroup analyses."
Well, you should try to limit
the number of subgroup analyses
you're doing.
They should not be interpreted
as definitive results.
Their limitations should be
clearly reported in manuscripts.
So, again, this is a little bit
of the modesty
and the uncertainty.
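One way to see why the number of subgroup analyses should stay small is to count how quickly chance findings pile up. The sketch below is only an illustration under a simplifying assumption that is not from the lecture: every test is independent and run at a significance level of 0.05, with no true effect anywhere. The function name is made up for this example.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n_tests comes out 'significant'
    when no subgroup truly differs.

    Assumes independent tests, each at level alpha (a simplification;
    real subgroup analyses on the same trial are usually correlated).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Compare a handful of analyses with the 200 mentioned above.
for n_tests in (1, 5, 20, 200):
    chance = prob_at_least_one_false_positive(n_tests)
    print(f"{n_tests:>3} subgroup analyses -> {chance:.3f}")
```

With a handful of analyses the chance of a spurious finding is already noticeable, and with 200 it is close to certain, which is why such results are best treated as hypothesis-generating.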
So, here's a table --
I hope you can see it clearly,
even though it's --
the characters are small.
I like this table
because it's very compact.
It has a lot of information
in a -- in one table.
And -- so, what you have on
the left is the subgroups.
So, it's doing
a subgroup analysis
but on several subgroups:
the sex, the age, whether
they have diabetes or not.
What this trial is about --
it's a multicenter
trial in Spain
and the participants
with high cardiovascular risk
but no cardiovascular disease
at enrollment,
were randomly assigned
to one of three diets.
One is a Mediterranean diet
supplemented with extra-virgin
olive oil.
Another is a Mediterranean diet
supplemented with mixed nuts.
And the third diet
was just a control diet --
advice to reduce dietary fat.
And the primary endpoint --
the --
was a composite
of myocardial infarction,
stroke, and death
from cardiovascular causes.
So --
and the overall hazard ratio
for Mediterranean diets
combined --
so, in this table you see
the combined Mediterranean diet.
So, they put the two
Mediterranean diets
together
versus the control --
and the hazard ratio
between the two was .71 --
the overall --
and the p-value was .005.
So, that's, kind of,
the context of the trial.
But what -- the reason
I'm showing this slide
is, of course,
to talk about subgroup analysis.
Let's take the top one -- sex --
males and females.
And the way not to do it --
note that this is about
the hazard ratio --
so you see this dashed line,
this vertical line that's set at one.
Basically, it says that
if the interval crosses
one there's no difference.
A hazard ratio of one means
there is no difference
between the Mediterranean --
the two Mediterranean diets
and the control.
And, like it says,
on the bottom of the table,
if you're on the left,
the Mediterranean diet
is better,
if you're on the right
the control diet is better.
Okay, so one means
there's no difference.
So, the way not to do it --
the way not to do it
is to look at the two intervals
that you see
for males and females --
you see those two intervals.
The female interval crosses one
and the male does not.
So, the way not to do it --
not to --
not the way to interpret it
is to say,
"Oh, well,
it is better for males --
the Mediterranean diets
are better for males
but are not better --
or are just the same
as the control for females."
This is an incorrect way
of doing it.
So, what's the correct way?
The correct way is you look
at the p-value for interaction.
And the p-value
for interaction is .62.
So, there doesn't seem
to be an interaction.
What you're really asking yourself
is: is the hazard ratio different
for males and females?
So, you are comparing
the two intervals
with each other,
not each one separately
against the line at one.
And you can see that the
two intervals are not different.
So, males and females
are not different
when it comes
to the effect of --
the overall effect of
the Mediterranean diet.
Okay, just for fun or for
exercise, let's look at the BMI.
It says the p-value
for interaction is .05.
Now, let's look at what's going
on with this BMI interaction.
Well, you can see that
for BMI greater than 30,
it seems that this is, you know,
kind of different than --
certainly than the one
for BMI between 25 and 30.
So, it is -- you could say,
based on this table --
again with modesty
and uncertainty --
it appears that for participants
or for individuals
with BMI greater than 30,
the Mediterranean diet is better
but not much
for the ones below 30 BMI.
And that's, kind of,
reinforced --
confirmed with that p-value.
Again, what we're trying to --
putting pieces together
to come up with a conclusion
rather than just focusing
on one number.
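The idea of comparing the two subgroups with each other, rather than each one against the line at one, can be sketched in a few lines. This is not how the p-values in the table were computed (those would normally come from a model with an interaction term); it is a rough approximation that assumes two non-overlapping subgroups, 95% intervals, and a normal approximation on the log hazard ratio scale. The numbers are hypothetical, not read from the table.

```python
import math

Z_95 = 1.959964  # normal quantile behind a two-sided 95% interval


def interaction_p_value(hr_a, lo_a, hi_a, hr_b, lo_b, hi_b):
    """Approximate two-sided p-value for 'is the hazard ratio in
    subgroup A different from the hazard ratio in subgroup B?'.

    Each subgroup is described by its hazard ratio and 95% interval.
    """
    def log_hr_and_se(hr, lo, hi):
        # Recover the standard error of log(HR) from the interval width.
        se = (math.log(hi) - math.log(lo)) / (2 * Z_95)
        return math.log(hr), se

    est_a, se_a = log_hr_and_se(hr_a, lo_a, hi_a)
    est_b, se_b = log_hr_and_se(hr_b, lo_b, hi_b)
    # Compare the two estimates with each other, not each one with 1.
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical subgroups: A's interval excludes 1, B's interval crosses 1.
p = interaction_p_value(0.70, 0.50, 0.98, 0.80, 0.50, 1.28)
print(f"approximate p-value for interaction: {p:.2f}")
```

In this made-up case one interval crosses one and the other does not, yet the comparison between the two gives no evidence that the treatment effect differs, which is the same trap as the male and female rows.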
A better way even to do it is
instead of just,
you know -- here --
I'm not going to go
over the trial in detail --
but here, notice the dashed line
on -- over one.
But what they did --
the authors did here --
is also put a solid line
over the overall hazard ratio,
which was not
in the previous table.
So, not only does it say,
"This is the line
at one that represents
no difference,"
but it also says, "This is the line
that represents
the overall hazard ratio."
And if you take
the first subgroup --
the positive and the negative
nodal status --
you can see that even though
one covers one
and the other one does not,
they both cover the overall.
And the two are not --
the two subgroups
are not different
in terms
of the treatment effect.
So, that's for
subgroup analysis.
So, now I'm going to talk
about interaction
in statistical models.
It's a very important concept.
We just talked
about interaction --
that table had
the interaction p-value.
So, let's talk about that.
What is an interaction?
So, in a model you have,
let's say, an end point.
So, we're talking about
cigarette smoking.
We talked about the nicotine
patch and the placebo patch.
We talked about
the number of cigarettes smoked
during the first
three months of treatment.
So, we set up the statistical model
and we have an endpoint
where we say,
"Okay, there's
the treatment effect."
This is whether the medication
is better than placebo.
That's what we test.
Then we have sex effect.
We add a sex effect in the model
and that answers the question,
is there a difference
between females
and males overall,
over both treatments?
And then we add treatment
by sex interaction.
So, what does treatment
by sex interaction really mean?
It means,
is the treatment effect
different for females and males?
Is the difference
between placebo and medication
different between males
and females?
I'm going to show you
some illustrations of that.
So, it says, does the effect
of the treatment depend on sex?
Is the treatment effect
different for males and females?
If yes, there is a treatment
by sex interaction.
If no, there is no interaction.
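The three questions (treatment effect, sex effect, treatment by sex interaction) can be written as three simple contrasts of the four group means. The sketch below assumes equal group sizes and uses made-up mean numbers of cigarettes; the function name and the numbers are illustrative only, and a real analysis would test these contrasts in a statistical model rather than just compute them.

```python
def effects_from_cell_means(f_placebo, f_med, m_placebo, m_med):
    """Return (treatment, sex, interaction) contrasts from four means.

    Arguments are the mean outcome (for example, cigarettes smoked) for
    females on placebo, females on medication, males on placebo and
    males on medication. Equal group sizes are assumed.
    """
    effect_females = f_placebo - f_med   # treatment effect in females
    effect_males = m_placebo - m_med     # treatment effect in males

    # Overall treatment effect: average of the two subgroup effects.
    treatment = (effect_females + effect_males) / 2
    # Sex effect: females versus males, averaged over both treatments.
    sex = (f_placebo + f_med) / 2 - (m_placebo + m_med) / 2
    # Interaction: is the treatment effect different for females and males?
    interaction = effect_females - effect_males
    return treatment, sex, interaction


# Hypothetical means: medication helps females by 10 and males by 4.
treatment, sex, interaction = effects_from_cell_means(20, 10, 20, 16)
print(treatment, sex, interaction)
```

The interaction contrast is a difference of differences, so it is zero exactly when the distance between placebo and medication is the same for females and males.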
Okay, so, questions for you.
Can there be no overall
treatment effect
and yet a treatment
by sex interaction?
Can you have a -- you don't have
an overall treatment effect
and yet there is an interaction?
The answer is yes.
And I'll show you.
Can there be
no overall sex effect
and yet a treatment
by sex interaction?
And the answer is yes.
Can there be
no overall treatment effect
and no overall sex effect
and yet there is a treatment
by sex interaction?
And the answer is yes.
In fact,
all eight combinations --
eight combinations come from
treatment effect
being significant or not --
yes no --
sex effect being significant
or not -- yes no --
so two by two.
And a treatment
by sex interaction
being significant or not --
another two.
So, two by two by two, you have
eight possible combinations
and all eight are
theoretically possible,
although, some are
less likely than others.
And I'm going to show you
an illustration of each one
of those eight combinations
and, hopefully,
that's going to help you,
kind of, see visually
what interaction means
and how it is different from
treatment effect and sex effect.
So, let's start.
So, here's the graph.
So, you have the outcome --
so high is bad,
number of cigarettes smoked --
high is bad.
The green --
the diamond green is placebo.
The purple circle
is the active medication. Okay?
And then we have male --
females and males. Okay?
So, this --
what I'm showing you
is the result
of the clinical trial.
So, my question is: is there
a treatment effect?
No, because basically
the placebo and the medication
are on top of each other.
There's no -- there's no effect.
There's no distance
between the purple
and the green for females
or for males.
Is there a sex effect?
Are females --
do females smoke more
or less than males?
And the answer is no. Looks like
they smoke about the same.
Is there a treatment
by sex interaction?
Is the difference
between placebo and medication
for females
different from the difference
between placebo
and medication for males?
And the answer is no.
It's the same distance.
So, that's one out of eight.
Let's go over another one.
Is there a treatment effect?
Yes, now the placebo is higher
than the medication.
So, there is more
cigarette smoking
so there is a treatment effect.
Is there a sex effect?
Well the way to look
for sex effect --
look at the middle
of the two points for females
and the middle
of the two points for males.
Are they different?
So, over both treatments,
do females smoke
more than males?
And the answer is no. Is there
a treatment by sex interaction?
For the treatment
by sex interaction,
you look at the distance
between the two treatments
for females -- that distance.
Is it really the same
as the distance for males?
And the answer is yes.
The distance is the same
and there is no interaction.
How about this?
Is there a treatment effect?
Yes. The active medication is
lowering cigarette consumption.
Is there a sex effect?
Well, yes because
now you can see --
if I take the middle of
the two points for females
it's lower than the middle
of the two points for males.
So, overall,
over both treatments,
females smoke less than males.
That's a sex effect.
But is there a treatment
by sex interaction?
No, because the distance --
the effect, the difference is
the same for females and males.
How about this?
Is there a treatment effect?
Yes.
Is there a sex effect?
No, because if you take
the middle, they are the same.
If you take the middle
of the two points,
and the middle of the two points
they are the same.
Is there a treatment
by sex interaction? Yes.
Why? Because the distance --
the effect for females
was a lot bigger
than the effect for males.
So, there's a difference
in treatment effect.
Is there a treatment effect?
No, because
they cancel each other.
In this case, the two -- the two
treatments cancel each other.
It's better for females
but worse for males --
the medication.
Is there a sex effect?
No, because the middle of
the two points is the same.
Is there a treatment
by sex interaction? Definitely.
This is called
a qualitative difference.
It is good for females
and harmful for males.
So, the ones for males --
medication actually
makes them smoke more.
Of course, all of these
are hypothetical examples.
Is there a treatment
effect here? No.
The distance is the same.
Is there a sex effect?
Yes, because females here
appear to smoke less than males.
Is there a treatment by sex
interaction?
No. Let's go to the next one.
Is there a treatment effect?
Yes.
The green line is above --
clearly above the purple line.
Is there a sex effect? Yes.
Females smoke
more than female [sic].
Is there a treatment
by sex interaction?
Yes, this is called
a quantitative interaction.
It goes in the same direction
for both females and males --
the medication does good,
but it does less good
for males than for females.
The distance is bigger
for females than for males.
Is there a treatment effect?
No, because they -- again,
they cancel each other.
Is there a sex effect?
Yes, because females here smoke
less than males.
And is there a treatment
by sex interaction? Yes.
Again, it flips, so it's
a little bit good for females --
the medication
is a little bit good for females
and a little bit bad for males.
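The eight pictures can also be checked with numbers. The sketch below uses eight sets of made-up group means (not the values on the slides), in the same order as the illustrations, and labels each one by whether the treatment, sex and interaction contrasts are zero or not. It assumes equal group sizes and noise-free means; in a real trial each "yes" or "no" would come from a statistical test, not from an exact comparison with zero.

```python
from itertools import product


def classify(f_placebo, f_med, m_placebo, m_med):
    """Return (treatment?, sex?, interaction?) as booleans for four means."""
    effect_females = f_placebo - f_med
    effect_males = m_placebo - m_med
    treatment = (effect_females + effect_males) / 2
    sex = (f_placebo + f_med) / 2 - (m_placebo + m_med) / 2
    interaction = effect_females - effect_males
    return (treatment != 0, sex != 0, interaction != 0)


# Hypothetical means: (female placebo, female medication,
#                      male placebo, male medication)
scenarios = [
    (20, 20, 20, 20),  # no treatment, no sex, no interaction
    (20, 10, 20, 10),  # treatment only
    (18, 8, 22, 12),   # treatment and sex, no interaction
    (22, 8, 17, 13),   # treatment and interaction, no sex
    (20, 10, 10, 20),  # interaction only (qualitative)
    (10, 10, 20, 20),  # sex only
    (20, 8, 20, 16),   # all three (quantitative interaction)
    (12, 8, 18, 22),   # sex and interaction, no overall treatment
]

patterns = set()
for means in scenarios:
    labels = classify(*means)
    patterns.add(labels)
    print(means, dict(zip(("treatment", "sex", "interaction"), labels)))

# All two by two by two combinations show up.
assert patterns == set(product([False, True], repeat=3))
```

The fifth and eighth rows are the ones where the subgroup effects cancel, so the overall treatment contrast is zero even though the medication does something in each sex.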
So, here are the references.
I hope you got a sense
of subgroup analysis
and interaction.
In summary, results from
subgroup analysis
should not be interpreted
as definitive
and their limitations should be
clearly reported in manuscripts.
And a statistical test
for interaction determines
whether the treatment effect
is different
for different subgroups --
e.g. males and females.
And my questions to you
are what are subgroup analyses?
Why are subgroup
analyses important?
And why is a statistical test
for interaction important?
Thank you for watching.