MODERATOR: Welcome and thank you for joining us today for the November 2018 Precision Medicine
and Population Health Webinar.
Before beginning our presentation, I'd like to provide a few tips for using WebEx so that
you get the most out of today's webinar.
All attendees have been muted and will remain muted for the duration of the webinar.
Please feel free to submit your questions throughout the presentation in the Q&A panel and select
"all panelists" from the drop-down.
We will ask the question on your behalf during the Q&A portion of the webinar.
If you need to view live closed captioning, please refer to the media viewer panel.
I also wanted to note that today's webinar is being recorded and will be posted online
at a later date.
And with that I'd like to pass it over to Debbie Winn who will introduce our speaker.
DEBBIE WINN: Thank you very much.
We're delighted to have today Professor George Davey Smith from the University of
Bristol in the UK.
He's a clinical epidemiologist whose research has pioneered understanding of the causes
and alleviation of health inequalities, life course epidemiology, systematic reviewing
of evidence of effectiveness of health care and health care policy interventions and population
health contributions to the new genetics.
He's published over a thousand peer-reviewed journal articles and 15 books and edited collections.
He is co-editor of the International Journal of Epidemiology and during his tenure the impact factor
has increased from less than two to over nine, which is terrific.
He has contributed substantially to the running of a large number of epidemiologic
cohort studies involving detailed clinical and biomarker assessments.
Currently he's Director of the Avon Longitudinal Study of Parents and Children and he is the
Director of the Medical Research Council Centre for Causal Analyses in Translational Epidemiology
as well as the Integrative Epidemiology Unit.
He is Director of the Wellcome Trust Four-Year PhD Program in Life Course and Genetic Epidemiology
at the School of Social and Community Medicine, University of Bristol.
So, we are really pleased to have him today.
He will be talking about what Mendelian randomization is and how it can be better used as
a tool for medicine and public health: opportunities and challenges.
He'll be using examples from cancer, cardiovascular disease, and other fields to illustrate this
approach.
So, thank you very much Dr. Smith for being here and please go ahead.
GEORGE DAVEY SMITH: That's great.
Thank you indeed for the invitation to present this webinar and thanks for the kind introduction.
I'm sort of clearly nearing retirement phase because many of those things you read out
I've ceased to do.
So, I stopped a year or two ago being the co-editor of the International Journal of
Epidemiology and Nick Timpson is now the Director of the Avon Longitudinal Study of Parents
and Children.
And we changed the name of our center.
It was CAITE, C-A-I-T-E was the abbreviation, but when we actually Googled that, it turned
out to mean worn out and no good in Gaelic.
So, we are now the MRC Integrative Epidemiology Unit.
So, I'll just say thanks again for this invitation.
I'm going to talk briefly.
Well, not that briefly.
About 40 minutes or so on Mendelian randomization.
And then it will be questions and answers hopefully.
So, this is just the outline of the talk I'm going to give.
I'm going to give a brief introduction to MR and then describe the instrumental variables
assumptions that are necessary for interpreting Mendelian randomization studies.
Then a series of sensitivity analyses and ways of checking the assumptions, or at least doing
sensitivity analyses around those assumptions.
And finish on describing or discussing Mendelian randomization when you're looking at disease
liability.
So, I'm starting with a schema of conventional observational epidemiology, where you measure
your modifiable exposure, for example cholesterol levels or C-reactive protein, or a behavioral
factor such as smoking, alcohol consumption, or physical activity.
You measure that in a large group of people and then follow them up and relate the exposure
to the outcome, for example, coronary heart disease.
But when examining this relationship there are of course many confounders that could
come into play: factors that influence both the exposure and the outcome and lead to
an association between the two which is non-causal.
So, for example, smoking increases levels of C-reactive protein and also increases the
risk of coronary heart disease, and would confound the association of C-reactive protein
with coronary heart disease.
Or secondly you could have reverse causation, where the disease process, for example
atherosclerosis, which leads to coronary heart disease, first influences the exposure.
Atherosclerosis is an inflammatory condition and would increase C-reactive protein.
That would then lead to an association between the two, including a prospective association,
which is actually due to early stages of the outcome influencing the exposure.
That would be reverse causation.
So, the trick in Mendelian randomization is to take what is called an instrumental variable.
This is something which reliably associates with your exposure, but will not suffer from
the confounding or the reverse causation that you see in a conventional observational study.
So genetic variants can serve as such instrumental variables.
Consider genetic variants which are reliably associated with LDL cholesterol levels.
The genetic variants will not in general be confounded by the socioeconomic, lifestyle,
or physiological factors which would confound the conventional association.
And second, obviously the disease process cannot influence your germline genetic variants,
the variants you received at conception.
So, there's no reverse causation and generally there is no confounding.
Thus, the instrumental variable, the genetic variant, can serve as a proxy.
In the observational association it is often impossible to exclude confounding or reverse
causation, but the genetic variant will not suffer from the confounding or reverse causation
that the measured risk factor, the measured exposure, cholesterol level here for example,
would suffer from.
So, in this set-up, if you examine the association of the instrumental variable, the genetic
variant, with your outcome, coronary heart disease, and the association of the instrumental
variable with cholesterol levels, and you divide the first, the association with the outcome,
by the association with cholesterol levels, this allows you to estimate the causal effect
of the exposure, in this case LDL cholesterol, on coronary heart disease.
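To make that ratio concrete, here is a minimal sketch in Python of the Wald ratio estimate just described, using purely hypothetical per-allele summary statistics (the numbers are illustrative, not from any study mentioned in this talk):

```python
import numpy as np

# Hypothetical per-allele summary statistics (illustrative only):
beta_gx, se_gx = 0.35, 0.02   # variant -> LDL cholesterol (mmol/L per allele)
beta_gy, se_gy = 0.10, 0.015  # variant -> CHD (log odds ratio per allele)

# Wald ratio: the variant-outcome association divided by the
# variant-exposure association estimates the causal effect of LDL on CHD.
beta_iv = beta_gy / beta_gx

# First-order delta-method standard error for the ratio.
se_iv = np.sqrt(se_gy**2 / beta_gx**2 + beta_gy**2 * se_gx**2 / beta_gx**4)

print(f"Causal log-OR per mmol/L LDL: {beta_iv:.3f} (SE {se_iv:.3f})")
print(f"Odds ratio per mmol/L: {np.exp(beta_iv):.2f}")
```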
Now for the Mendelian randomization aspect of this set-up: this is a straightforward sort
of instrumental variable analysis, but there is a Mendelian randomization aspect, and if you
don't make this interpretation then in my view you're not performing a Mendelian randomization
study, although people use the terminology when they are not aiming to infer cause for a
modifiable exposure.
The Mendelian randomization aspect is that you are considering that the effects of the
modifiable exposure on the outcome will be the same whether the modifiable exposure is
influenced by environmentally changeable factors, such as diet, use of cholesterol-lowering
medication, etc., as when it is influenced by genotype.
So that is the absolutely key assumption: that the effect of the exposure on the outcome
will be the same whether the exposure is influenced by modifiable factors that you can do
something about or by genetic variation.
So, this can be considered to reflect the phenocopy/genocopy dialectic.
So, in the late 1930s Goldschmidt introduced the concept of phenocopy, which is where something
in the environment, for example very high temperatures, could lead to a modification
which could also occur through genetic difference.
So that's the notion of phenocopy, i.e., the environment is copying something that
can be produced by genotype.
And Schmalhausen around the same time introduced the notion of genocopy, which is
simply the mirror image of phenocopy.
Genocopy is that a genetic variant can mimic the effects of the environment.
So, you can see the two are basically saying the same thing, it's just whether you're
looking at it from the phenotypic end or the genotypic end.
So, we can consider sort of a classic example of this, which is Hartnup's syndrome, which
is induced by genotype, but the disease looks like pellagra and pellagra is induced by niacin
deficiency.
Now the mutation in Hartnup's syndrome is in a gene which is related to niacin.
And so, when Hartnup's syndrome was identified in the 1950s and seen to look like pellagra,
if it hadn't already been realized that niacin deficiency was the cause of pellagra,
that genotypic association would have given the same information.
So Hartnup's syndrome is a genetic syndrome, can be considered a genocopy of pellagra or
pellagra can be considered a phenocopy of Hartnup's syndrome.
So that's the important interpretive factor.
And I'll go back to this slide; if there were only one thing you remembered, this would
be it: you must be making the assumption that the modifiable exposure, however it is
influenced, is having the same effect on the outcome.
And that's how you can use the genotype to infer something about how modification
of exposures will change disease risk.
So, this is commonplace in developmental genetics, the notion of gene-environment equivalence,
of gene-environment interchangeability, that Mary Jane West-Eberhard, for example, discusses
at length: the notion that if gene expression is changed, it will have the same impact
whether it's changed by a genetic difference or by an environmental influence.
There's a nice quote from Zuckerkandl and Villee that no doubt all environmental effects can
be mimicked by one or several mutations.
I think that's maybe a slight overstatement.
I mean, being hit by a bus probably can't be mimicked by mutations, but many of the things
that we're interested in in preventive medicine, for example, can be mimicked by genetic
variation.
So just one example here of Mendelian randomization.
And this is in the examination of selenium and prostate cancer risk.
And here we see where the Mendelian randomization occurs, the dice being thrown: the genetic
variants are inherited independently of each other (Mendel's law of independent assortment)
and independently of environmental factors (which relates to the law of segregation), at
least of environmental factors acting before conception and then before birth.
So, the genetic variants serve as instrumental variables for the modifiable exposure, which
is selenium; in the observational studies those selenium levels or selenium intake will
be confounded, and there may be reverse causation, where perhaps the early stages of
prostate cancer influence your selenium levels.
So, there was a substantial amount of epidemiological and other evidence that suggested that selenium
protected against prostate cancer risk.
Indeed, this evidence was considered substantial enough to launch a large-scale randomized
controlled trial, called the SELECT trial.
So, on the left we see the RCT, where the randomization allocated people either to a placebo
or to selenium supplementation, which increased plasma selenium levels.
On the right-hand side, we see the Mendelian randomization equivalent.
Of course, the randomization there is at conception, so it is likely that from soon after
conception on, the fetus and then the adult would be exposed to different levels
of selenium.
So, this is an important thing to bear in mind when I discuss the interpretation of
the instrumental variables estimates.
As in the randomized controlled trial, if you randomize by genotype you're carrying
out the equivalent of an intention-to-treat analysis.
You don't take into account what the actual selenium levels are in folk; you analyze by
genotype, just as in a randomized controlled trial your intention-to-treat analysis is by
what people were originally randomized to, not what their selenium levels are.
So, in the SELECT trial, despite the substantial evidence from observational studies,
the randomization to selenium supplementation did not reduce prostate cancer risk in
the trial.
And in the Mendelian randomization study, using 22,000 cases and 22,000 controls in
the PRACTICAL consortium, scaling up the effect of subtle genetic variants that relate to
selenium levels to the difference in selenium levels that was induced by randomization
to the supplement, one gets a closely similar estimate, with closely similar precision,
to the randomized controlled trial.
The SELECT trial was said to cost around 100 million dollars to carry out, and it would
be nice to know the answer to the question: would the SELECT trial be launched today,
with these Mendelian randomization data suggesting that sustained differences in selenium
levels do not lead to a reduction in prostate cancer risk?
And indeed, the basic principle of Mendelian randomization was considered to
be simple enough to be understood by pregnant women attending antenatal clinics in the
late 1960s and early 1970s.
Here's the front of a leaflet advising about the inheritance of hemophilia, and the pregnant
woman there looks scarily like Margaret Thatcher.
She has here two possible daughters and two possible sons, and beautifully you
actually see the dice being rolled, just as we saw in the previous slide.
This is Mendelian randomization in action.
The carrier mother, with one hemophilia-carrying X chromosome, will not be suffering from
the condition, but one in two of her daughters would be carriers, also carrying the
chromosome, and one in two of her sons would suffer from the condition, hemophilia.
So, there's Mendelian randomization in action.
Now if you followed up these two daughters, and thousands more such daughters, you
would be able to compare those with the phenotype influenced by the hemophilia-carrying
chromosome versus the controls with the non-hemophilia-carrying chromosome.
Now women who carry hemophilia are relatively anticoagulated; I mean, they don't suffer
from the clinical condition, but on average they are relatively anticoagulated.
And if you did follow up these girls, then what you would see is that the ones who are
carriers have a quite substantially lower risk of coronary heart disease.
And we now know from trials of drugs which lead to relative anticoagulation that this
reduces the risk of coronary heart disease.
If we didn't have that evidence then this would provide evidence to support that notion,
because of course the girls don't differ in any systematic way by anything other than
their carriage of that particular chromosome.
They don't differ in other ways.
Indeed, they look pretty identical.
So, they look as though they don't have any confounding differences between them, and
indeed they wouldn't.
So, a couple of studies have done precisely that.
And as I say have looked at the risk of coronary heart disease in carriers versus non-carriers
and found that difference.
And Mendelian randomization today is often carried out using the two-sample
approach.
In the two-sample approach you obtain genetic variants that relate to the exposure from
a GWAS, and then you look at a GWAS for the outcome in what is meant to be a similar
population in terms of ancestry, etc., where you have genetic data on how the genetic
variants relate to the outcome, and from that you can infer or calculate the supposed
causal effect of the exposure.
Two-sample Mendelian randomization allows Mendelian randomization to be carried out
at home, as it were, using platforms like MR-Base, and does render it quite a simple way
of obtaining some evidence.
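As a rough sketch of what the two-sample calculation does under the hood (a real analysis would use a dedicated platform such as MR-Base, with harmonization of alleles and so on), here is a minimal inverse-variance weighted (IVW) combination of per-variant ratios over hypothetical summary statistics:

```python
import numpy as np

# Hypothetical summary statistics for five independent variants:
# bx from an exposure GWAS; by and its SE from a separate outcome GWAS.
bx = np.array([0.12, 0.30, 0.21, 0.08, 0.15])       # variant-exposure
by = np.array([0.024, 0.066, 0.040, 0.014, 0.033])  # variant-outcome
se_by = np.array([0.010, 0.012, 0.011, 0.009, 0.010])

# IVW estimate: a weighted average of the per-variant Wald ratios,
# weighted by the precision of the outcome associations.
w = bx**2 / se_by**2
beta_ivw = np.sum(w * (by / bx)) / np.sum(w)
se_ivw = 1 / np.sqrt(np.sum(w))
print(f"IVW causal estimate: {beta_ivw:.3f} (SE {se_ivw:.3f})")
```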
Now the assumptions that you need to make in a Mendelian randomization study are
the same as the instrumental variables assumptions, because that is the analysis you're
carrying out.
The first assumption, the relevance assumption, is that the genetic variant is reliably
associated with the exposure.
This is the only one of the assumptions you can actually test: you can do a statistical
test of whether the instrumental variable does indeed relate to your exposure.
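As an illustration, the usual check of the relevance assumption is the first-stage regression of the exposure on the instrument; the F-statistic (with the conventional F > 10 rule of thumb for weak instruments) summarizes instrument strength. The sketch below uses simulated data, not data from any study discussed here:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
g = rng.binomial(2, 0.3, n)          # genotype coded 0/1/2
x = 0.25 * g + rng.normal(0, 1, n)   # exposure partly determined by genotype

# First-stage regression of exposure on genotype: the model F-statistic
# is the standard instrument-strength check.
first_stage = sm.OLS(x, sm.add_constant(g)).fit()
print(f"First-stage F: {first_stage.fvalue:.1f}, R^2: {first_stage.rsquared:.4f}")
```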
The second assumption is that the genetic variants are independent of confounders, or
in epidemiological terminology exchangeability, so that the groups defined by genotype do
not differ by confounding factors.
And population stratification would be a major potential cause of confounding here, which
can be dealt with in the ways that population stratification is dealt with in genome-wide
studies, etc.
The third assumption is that the genetic variant only influences the outcome through the exposure.
So, the genetic variant does not have an effect on the outcome other than one that
is mediated through the exposure.
And that's called the exclusion restriction.
Now IV assumption two and IV assumption three cannot be tested.
We quite often see people who say that they've developed tests for these,
tests for IV2 and IV3, and either they deserve a Nobel Prize or they're wrong.
So far, the latter has sadly been the case.
You can examine how the genetic variants relate to measures of confounders, and one would
suggest that that should be done with as many measures of confounders as you have.
But, of course, by definition you can never demonstrate the absence of associations
with unknown confounders.
For IV assumption three, we need to see this in the (unint.)
field where instrumental variables came from.
You often see it stated, as here, that the genetic variant is independent of the
outcome conditional on the exposure X and the confounders U.
Well, that's something of a heart sink moment because the whole point of Mendelian randomization
is that you don't have measures of the unmeasured confounders by definition.
So how can you examine this if you need to have measures of them?
And, in fact, this statement might suggest that you should actually condition on the
exposure X; i.e., you might think that if you do a genotype-to-outcome association
adjusted for your exposure, say a genotype-to-coronary heart disease association adjusted
for cholesterol, that should abolish the association of genotype with coronary heart
disease.
Now there are two reasons why that won't happen.
The first reason is that you will not measure lifetime, long-term cholesterol
in any feasible observational study.
So, if the etiological factor is long-term cholesterol level, then you will not be taking
this into account by adjusting for the measures you have in an observational study.
There will be residual effects of the genotype on the outcome.
And the second, even sadder, thing is that if you do adjust for the exposure, for the
X variable in this diagram, that is equivalent to adjusting for a collider.
Adjusting for a collider is when you adjust for something which is influenced by more
than one factor, here by the exposure of interest and the unmeasured confounders in this
schematic.
When you adjust for that factor you induce an association between the genotype and the
confounders.
Where there is no such confounding in the source population, once you do the adjustment
you induce such confounding.
So not only does adjusting for the mediating factor not work, it actually makes the
situation worse.
And indeed, these supposed tests for the exclusion restriction often boil down to
performing such an adjustment.
So, if there is one thing to remember from this webinar it's do not condition on the
intermediate phenotype.
This is probably one of the major issues with Mendelian randomization studies that are
published, that this is attempted … And actually, I care a lot about Mendelian randomization,
so I'll say: please do not condition on the intermediate phenotype.
Those of you who are used to mediation analyses, for example, you would instantly recognize
that the schematic I drew is simply a schematic for complete mediation.
It's like the same thing.
You're saying that the effects of the genotype on the outcome are completely mediated by
X.
So, the analyses are the same as if you were trying to just do a conventional mediation
analysis.
And this is why adjusting for a mediator doesn't work as a way of doing mediation analysis:
you induce collider bias and you leave residual effects.
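A small simulation can make the collider point concrete. In the sketch below (all parameter values hypothetical), genotype and the unmeasured confounder are independent by construction, and all of the genotype's effect on the outcome runs through the exposure; yet adjusting for the exposure induces a spurious residual genotype-outcome effect:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200_000
g = rng.binomial(2, 0.3, n)                    # genotype, independent of U
u = rng.normal(0, 1, n)                        # unmeasured confounder
x = 0.3 * g + 0.5 * u + rng.normal(0, 1, n)    # exposure: a collider (G -> X <- U)
y = 0.4 * x + 0.5 * u + rng.normal(0, 1, n)    # outcome: all G effect via X

# Marginally, genotype and confounder are independent:
print(np.corrcoef(g, u)[0, 1])                 # ~0

# Conditioning on the exposure (the collider) induces a G-U association,
# so the genotype coefficient adjusted for X is biased away from its
# true direct effect of zero (here it comes out negative):
adj = sm.OLS(y, sm.add_constant(np.column_stack([g, x]))).fit()
print(adj.params[1])
```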
So those are the main three IV assumptions.
You sometimes see people discuss a fourth IV assumption.
And this is an assumption which is required if you're going to generate IV effect estimates.
I said earlier that you could divide the genotype-coronary heart disease association by the
genotype-cholesterol association.
And that will give you an estimate of the causal effect of cholesterol on coronary heart
disease.
But the issue is who does that causal effect apply to?
So, we tend to think that it would apply as an average population effect.
Now that would be the case if you accept IV assumption four, which is that there is homogeneity
of the exposure effect on the outcome.
So, the cholesterol-to-outcome effect is the same in everyone.
Now at the limit, of course, it's implausible that that's exactly the case; there is likely
to be some heterogeneity in actuality.
The issue is how much, how unreasonable is that assumption?
That assumption would allow you to estimate the exposure effect in everyone.
A different assumption would be the monotonicity assumption, which is that the genetic
variant leads to a higher level of cholesterol in everyone, or no change in cholesterol level.
If in a counterfactual situation you could actually change the variant (you could CRISPR
people, or whatever), there would be no people whose cholesterol level would go up if they
changed from the high- to the low-cholesterol variant.
If you assume monotonicity, that the effects all range in one direction from the null,
then you can compute what in randomized controlled trials and other instrumental
variable settings is called a complier average causal effect, i.e., the effect in the
people in whom the genetic variant has indeed influenced the exposure.
Or a third assumption can be made: that there's no interaction between your genotype and the
confounders with respect to influencing your exposure.
These allow you to make some form of IV estimate that can be applied to some part
of the population.
So, the worrying thing is that these fourth IV assumptions cannot be tested.
And this has led some, I think, rather over-enthusiastic critics of Mendelian randomization
to state that effect estimation in Mendelian randomization is extremely problematic,
horribly invalid.
Even though you can't test it, you can interrogate the degree to which it appears to be a problem.
And you can do this by looking at the variance of your outcome in relation to the different
levels of your genetic variant, the different levels of your instrument, because violation
of any of those three components (homogeneity, monotonicity, or no interaction) would lead
to a variance increase, an increase in the variance at one level of the instrument compared
to another level of the instrument.
There would be a difference in variance if this was the case, whatever you consider to
be the treatment allele, unless there's a complete cross-over situation, which is utterly
implausible.
So, you'd expect to see a variance difference.
So that's good news, because we can of course interrogate genotype associations with the
outcome or with the exposure; we can look at the variance when those things are continuous,
and then you can get some notion of the degree to which this assumption is violated.
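One way such a variance check might be sketched is a Brown-Forsythe test (Levene's test with median centering) comparing the outcome variance across genotype groups; the data below are simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical continuous outcome by genotype group (0/1/2 effect alleles),
# with a small variance difference built in.
y0 = rng.normal(0.0, 1.00, 5000)
y1 = rng.normal(0.2, 1.05, 5000)
y2 = rng.normal(0.4, 1.10, 5000)

# Equal variances across genotype groups are consistent with the
# homogeneity/monotonicity/no-interaction assumptions discussed above.
stat, p = stats.levene(y0, y1, y2, center='median')
print(f"Brown-Forsythe statistic: {stat:.2f}, p = {p:.3g}")
```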
And this was realized, or discussed, many years ago by R.A. Fisher.
So, R.A. Fisher of course is celebrated for introducing randomized controlled trials.
In fact, his daughter Joan Fisher Box wrote, as have other people, that Fisher modeled
the randomized controlled trial on Mendel's law of independent assortment.
He was absolutely explicit about that in a 1951 lecture.
So Mendelian randomization actually came before randomized controlled trials.
Randomized controlled trials are the phenocopy of Mendelian randomization, if you like.
But anyway, Fisher's randomized trials, which were largely in the agricultural sphere,
were being criticized by people making the same criticism: that heterogeneity of response
doesn't allow you to make any sensible interpretation of who would benefit.
And in a letter to H.E. Daniels he said this rather beautiful thing: that this
point has, I think, received a rather large amount of theoretical attention, which has
arisen chiefly through lack of contact with the practical experimental situation.
I.e., the people who were raising this problem were ones who'd never got close to a randomized
controlled trial and didn't actually have any sense of how plausible it was as a serious
violation.
And I think in the current situation of Mendelian randomization, the folks who are raising
this issue are generally rather distant from actually doing studies.
And certainly, unlike Fisher they haven't realized that you can look at variance differences
as a way of getting some evidence of how extreme the violation is.
And, of course, we have massive genome-wide association studies, which allow us now to have
the power to really characterize the variance very well.
This is a nice paper by Alex Young, which shows that for FTO there is a small difference
in variance across genotype, nothing massive, but there is evidence of some difference;
for many, many other potential exposures or outcomes this isn't seen.
So, it is possible to interrogate that assumption.
So anyway, once you get as far as believing you can make IV estimates, how do you
interpret them?
Well, firstly you need to think of all the assumptions, assumptions one to four.
Then you need to think that the Mendelian randomization estimate will be an estimate
of the lifetime exposure effect: people who get genotypes related to higher cholesterol
will probably have higher cholesterol from birth, and atherosclerosis develops across
the lifetime.
So, you'd be looking at something like 60 years of higher cholesterol.
In a randomized trial you just look at lowered cholesterol levels over five or six years
or so.
So, if you have a biological model of a cumulative effect of cholesterol, which
makes sense because, you know, when Korean and Vietnam War soldiers were autopsied at
around age 18 they already had atherosclerosis, and it could be shown in young autopsy
cases that it related to cholesterol level, then what you'd expect is a greater effect
in Mendelian randomization studies.
And what you actually see is about a two-and-a-half-fold greater effect in an MR study,
for a given change in cholesterol, than in a trial of five years.
So that's really, really helpful, because it helps us know that lowering cholesterol
in relatively early life will produce greater benefit than lowering it later.
The second thing is that the exposure might not have a cumulative effect; it might
just relate to a particular critical period.
For reasons I won't go into at length here, there is reason to believe that the robust
Mendelian randomization finding that higher vitamin D levels protect against multiple
sclerosis may only apply to the period in which people receive an EBV infection, which is
essentially a prerequisite for developing multiple sclerosis.
And it's the response to EBV infection which is modulated by Vitamin D.
So, if you give people vitamin D after they've received the infection, which is often in
childhood or adolescence, giving it later than that isn't going to be beneficial.
So, the effect may only relate to a critical period, whereas your randomization, being from
conception, goes right the way across life.
The interpretation relates to the phenotype that your genotype relates to.
So, if your genotype relates to an enzyme, say a fatty acid desaturase, then
your interpretation has to be to all the factors influenced by that enzyme being modified.
You see many Mendelian randomization studies that will use FADS (fatty acid desaturase)
variants as an instrument for linolenic acid, because that's what you're interested in.
But the genotype is related to many other things.
You can't just pick the factor you're interested in and say it's an instrument for that
when it relates to differences in many other things.
So, the interpretation is to the most proximal phenotype that the genotype influences;
the enzyme, for example, in those cases.
It may only be interpretable in terms of liability to the exposure, not the exposure itself.
And I'll finish on that point, so I'll come back to that.
And then finally, most current studies, in particular two-sample Mendelian randomization
studies, generally relate to disease occurrence, not outcome, because the genome-wide
association studies one is using are of disease occurrence: heart attack cases versus
controls, breast cancer cases versus controls, prostate cancer cases versus controls, etc.
So, the Mendelian randomization studies are telling you about how to prevent the disease
not how to treat the disease.
So, take lung cancer, for example: in the genome-wide studies of lung cancer, the top
variant in the first three studies was a nicotinic receptor variant which related to
heaviness of smoking.
So, there you go: Mendelian randomization, at the cost of the millions of dollars that
those studies cost, shows that smoking is indeed a cause of lung cancer.
But of course, once you develop lung cancer, stopping smoking is hardly an effective therapy.
And that might relate to other conditions.
We simply don't know how many conditions are such that factors influencing onset influence
progression and secondary events.
For coronary heart disease, for example, it seems that for most of the factors that influence
onset (high cholesterol, smoking, high blood pressure, etc.), if you modify them after people
have had a first heart attack, it lowers the risk of a second event.
So, in some cases the MR studies are telling you about outcome.
But in many cases, we simply don't know if that's the case.
There are certainly cases where, very clearly, Mendelian randomization studies of occurrence
do not speak to disease progression.
And so, if you want to explore treatment of disease through Mendelian randomization you
need to have Mendelian randomization studies of disease progression, starting with cases
and then following them up, which has additional complexities that I'm not going to talk
about here but which have been quite extensively written about.
So, we need more Mendelian randomization studies of disease progression if we want to
speak to treatment.
So, with all those problems why generate IV estimates?
Well, there are many reasons, but a few of them are as follows.
Most of the sensitivity analyses or extensions of MR analyses depend upon the ratio of
the genetic variant-outcome association to the genetic variant-exposure association,
because that is what you would expect to be stable across the different genetic variants
that relate to the exposure and the outcome.
If the estimates are not biased, if the genetic variants are all producing an unbiased
effect, they will produce the same IV estimate.
So, you need IV estimates for the sensitivity analyses.
Second, comparing the IV estimates with what is observed in trials can help inform about
the relative immediacy of treatment effects.
With cholesterol, the comparison suggests effects are not very immediate; with blood
pressure and cardiovascular events, it suggests the long-term effects might not be so
many times greater than the shorter-term treatment effects you see.
So, it can be very useful for helping us understand that.
And in some cases, of course, for example using genetic variants in pregnant women that
relate to their smoking behavior in relation to birth outcomes such as birth weight, you
have a reasonable guess about the period in which the exposure is acting.
So, what are the limitations of Mendelian randomization?
Well, first is to introduce confounding through horizontal pleiotropy, which is where our
genetic variance influences the disease outcome through a pathway which is not through your
exposure.
If it's through your exposure (the genetic variant influences BMI, which is your exposure,
and that influences blood pressure), that is called vertical pleiotropy, and is indeed
the essence of Mendelian randomization, not a violation.
Horizontal pleiotropy, by contrast, acts through a separate biological pathway.
You can interrogate this by using multiple genetic variants, because as you have more
and more genetic variants that relate to the exposure and the outcome, and they predict
the same causal effect, it becomes decreasingly likely that they could all be having
horizontally pleiotropic effects which perfectly balance out to generate the same effect
estimates.
This just shows that the degree to which variants related to LDL cholesterol relate
to coronary heart disease is proportional to their effects on LDL cholesterol.
You can now redo this with many more variants, so it becomes rather implausible that they
are all affected by horizontal pleiotropy.
Our friend in this situation is often just a simple heterogeneity statistic assessing
the heterogeneity between the effects of the different genetic variants, and you
can interrogate the heterogeneity in various ways.
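For illustration, a simple Cochran's Q statistic against the IVW estimate can be computed as below, continuing the hypothetical summary statistics from the earlier two-sample sketch:

```python
import numpy as np
from scipy import stats

bx = np.array([0.12, 0.30, 0.21, 0.08, 0.15])
by = np.array([0.024, 0.066, 0.040, 0.014, 0.033])
se_by = np.array([0.010, 0.012, 0.011, 0.009, 0.010])

ratios = by / bx                  # per-variant Wald ratios
w = bx**2 / se_by**2              # first-order IVW weights
beta_ivw = np.sum(w * ratios) / np.sum(w)

# Cochran's Q: a large Q relative to k - 1 degrees of freedom signals
# heterogeneity among variant-specific estimates, one symptom of
# horizontal pleiotropy.
k = len(bx)
q = np.sum(w * (ratios - beta_ivw) ** 2)
p = stats.chi2.sf(q, df=k - 1)
print(f"Q = {q:.2f} on {k - 1} df, p = {p:.3g}")
```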
Secondly, you can do a series of sensitivity analyses.
Here is one, what's known as MR-Egger, which allows you to relax assumptions and obtain
estimates under less stringent assumptions.
So, in the MR-Egger case, from the previous slide, the slide I go back to here, you see
that you can get an effect estimate simply by a regression of the genetic variant effects,
with the effect on the exposure on the X axis and the effect on the outcome on the Y axis.
You force that line through the origin, and its slope is your causal effect estimate.
In the MR-Egger setting you simply don't force the line through the origin, and that
means the intercept is an estimate of unbalanced pleiotropic effect, the degree
to which the pleiotropy fails to balance and so distorts the estimate in one direction
over the other.
And if you don't force the line through the origin, the slope remains a valid causal
estimate.
You relax the assumption of no pleiotropy here to the assumption that the strength of
your genetic variants on the exposure, i.e., on cholesterol, does not correlate
with any horizontally pleiotropic effects of those variants, which is certainly less
stringent but could have caused the (unint.)
agent.
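To illustrate, here is a minimal MR-Egger sketch on the same hypothetical summary statistics: a weighted regression of variant-outcome on variant-exposure associations with the intercept left free (with only five variants this is purely for exposition; real analyses use many more):

```python
import numpy as np
import statsmodels.api as sm

bx = np.array([0.12, 0.30, 0.21, 0.08, 0.15])
by = np.array([0.024, 0.066, 0.040, 0.014, 0.033])
se_by = np.array([0.010, 0.012, 0.011, 0.009, 0.010])

# MR-Egger: do NOT force the regression through the origin. The slope is
# the causal estimate; the intercept estimates directional pleiotropy.
egger = sm.WLS(by, sm.add_constant(bx), weights=1 / se_by**2).fit()
intercept, slope = egger.params
print(f"Egger slope (causal estimate): {slope:.3f}")
print(f"Egger intercept (directional pleiotropy): {intercept:.4f}")
```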
But there are many such approaches, and more are arriving every day, which I've highlighted
here.
It isn't that any one of these approaches is the correct approach.
People sometimes say that there's a correct approach; they tend to say that their approach
is the correct approach.
They all have different sets of assumptions and they all allow relaxation of assumptions
in different ways.
The approach would be to run multiple sensitivity analyses and things which remain robust under
multiple sensitivity analyses are the most believable.
You can also interact your genetic variant with another, external factor, and relate
how the genetic variant's effect on the exposure differs between groups to how the
genetic variant's effect on the outcome differs between groups.
And that gives you another sensitivity analysis for an MR study.
It's easiest to show this with an example, which here is alcohol intake and the risk of
cardiovascular disease.
So, as you know, whenever studies are reported saying alcohol is good for you, that it
reduces cardiovascular risk, they get much attention.
Here is a recent study in Britain that said that drinking alcohol reduced cardiovascular
risk.
In the newspapers it got very widely covered.
"Moderate drinking can lower risk of heart attacks": this study was in the Guardian, a
sort of high-end paper in Britain.
Going slightly down-market: "Drinking a pint of beer a day linked to reduced risk of heart
attacks."
Further down-market: "Cheers! Drinkers who have one glass of wine a night are at less risk
of heart failure than teetotalers."
Right down to the gutter of our press, the Daily Star, where you see "Drinking alcohol
slashes risk of heart problems – if you drink this much per week."
In the U.S., of course, Time was very sober: "Alcohol is good for your heart, most of
the time."
And in the Irish Times, a good headline, moderate drinking may cut risk of heart disease.
But I liked this thing here, that moderate drinking may be okay for heart disease but
even heavy drinking … sorry … even heavy drinking may be good for your health.
That was in the Irish Times.
Anyway, a Mendelian randomization study, I've only got a couple of minutes left so I'll
go through this quite quickly.
In cartoon form: when you drink alcohol it's metabolized by alcohol dehydrogenase to
acetaldehyde and cleared by acetaldehyde dehydrogenase to acetic acid, and acetaldehyde
gives you the unpleasant effects of alcohol: flushing, headache, palpitations, etc.
And in East Asian populations (China, Japan, Korea, Vietnam, etc.) many people carry a
knockout of this gene, and for homozygotes with the knockout, drinking alcohol becomes
extremely unpleasant due to flushing and palpitations, etc.
The heterozygotes drink an intermediate amount and the homozygous wild type drink the most.
That's amongst the men, but you'll see that when these studies were carried out (this
is an old analysis, but huge studies now show the same thing), in these populations
women weren't drinking, whatever their genotype.
So, this gives us a rather nice no-relevance condition, a negative control: in the
women, genotype is not related to alcohol intake, but the horizontally pleiotropic effects
of the genotype would be expected to remain.
So, if there were pleiotropy, and it's not alcohol which has the effect, you would
see the same effects of genotype in men and women.
But if alcohol is generating the association, and therefore alcohol is identified as a
causal modifiable factor, you'd see effects in men but not women.
And this is what you see; huge studies have now shown precisely the same thing, including
for stroke risk and other disease outcomes: the men who are homozygous knockouts and drink
no alcohol have considerably lower blood pressure than the men who are heterozygous or
homozygous wild type and drink more alcohol.
But in women, who would demonstrate the same pleiotropic effects, nothing is seen.
So, this provides evidence against pleiotropy.
And indeed, if you have multiple groups where there is a different level of exposure
association with the genotype (a much smaller effect or a much larger effect), you can
estimate the causal effect by a regression which is not forced through the origin.
The intercept is again an estimate of unbalanced pleiotropy, but the slope remains an
estimate of the causal effect, and using this demonstrates the causal effect of alcohol
on blood pressure.
And alcohol raises HDL cholesterol but lowers LDL cholesterol, which has been shown in other
designs, including trials in many cases.
So, this is a way of using genotype-by-environment interactions, preferably with exogenous
factors such as in this example, whose influence can differ by genotype.
A genotype-by-effect-modifier interaction can be used for effect estimation, and is
another sensitivity analysis, as in the sketch below.
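As a sketch of the logic of this negative-control design, the simulation below (entirely hypothetical effect sizes) builds in a genotype effect on alcohol intake in men only, with women drinking essentially nothing; the genotype-blood pressure association then appears in men but not in women, as in the studies described:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 50_000
g = rng.binomial(2, 0.25, n)        # e.g. count of loss-of-function alleles
male = rng.binomial(1, 0.5, n)

# Genotype reduces alcohol intake, but only men drink in this simulation.
alcohol = male * np.clip(20 - 8 * g + rng.normal(0, 5, n), 0, None)
sbp = 120 + 0.3 * alcohol + rng.normal(0, 10, n)   # alcohol raises SBP

for label, mask in [("men", male == 1), ("women", male == 0)]:
    fit = sm.OLS(sbp[mask], sm.add_constant(g[mask])).fit()
    print(f"{label}: genotype effect on SBP = {fit.params[1]:.2f} mmHg/allele")
# A genotype effect in men but not women points to alcohol, not
# pleiotropy, as the explanation.
```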
So, I'm going to finish very quickly by discussing what we often see, which are
Mendelian randomization studies of a disease, where the disease is the exposure and then
you try to say what the effect of the disease is.
And here's a recent paper, the most recent one that I saw, so I'll show it, which is
claiming to demonstrate a causal effect of schizophrenia, which is the exposure, on
cannabis use.
So, the interpretation here is that the disease is causing cannabis use.
But actually, when you're doing Mendelian randomization, especially in a two-sample
setting, but also in a one-sample setting, unless you've actually measured when the
disease occurs and events after it's occurred, which isn't the usual situation, you
aren't actually looking at the effects of the disease; you are looking at the effects
of the genetic liability to the disease.
So, the interpretation here is actually that there's a genetic effect on disease
liability, but you're also thinking that modifiable exposures, environmental effects,
influence disease liability, and that that liability will influence the outcome.
But in a Mendelian randomization study in this situation, the liability might have
phenotypic expression even when the disease hasn't occurred.
This is certainly the case for schizophrenia, where at ages before schizophrenia can occur
you see effects of the genetic liability on many traits, including, incidentally,
participation in studies.
And in populations where no one actually has schizophrenia you see phenotypic effects of
the liability.
So, your interpretation here actually is to the disease liability.
You're again hoping that in the study you're saying something about modifiable effects
on liability, as well as genetic effects on liability, and that they may have an effect
on the outcome that you could intervene on.
But your interpretation can't be to the disease itself, as in the previous case.
So that's interpretation in terms of liability.
I would say that Mendelian randomization studies should always be considered in the context
of triangulation of evidence, where you aim to utilize different study designs, all of which
may have biases, but different biases; you aim to utilize different methods and bring the
evidence together.
Mendelian randomization would be a good method that should be put into the stew.
But your overall interpretation is still an evaluation, a triangulation, of evidence
coming from different study designs, each of which of course may be biased, but where
the biases will be orthogonal: the mechanisms generating a bias in one study design
would not apply to your other study designs.
So, I'd just like to finish by saying that I'm sure all of you want more on Mendelian
randomization and what better than three days in sunny Bristol in July 2019 when there will
be the Fourth International Mendelian Randomization conference.
The website is www.mendelianrandomization.org.uk for those of you who are interested.
I'll just leave up a PowerPoint with some further reading if you're interested in
looking at more detail in some of the points.
That's that.
Thanks very much.
MUIN KHOURY: Okay.
Thank you, George, very much.
This is Muin Khoury and I'm from the CDC Office of Public Health Genomics.
I'm looking at the clock here.
We have about ten minutes for discussion and so those of you in the room and on the web,
please send your questions.
I see that we have one question already.
Just to start off the discussion: I've been watching the field of Mendelian randomization
grow so much over the last decade, in large part due to your efforts, George, and it's
amazing to see that now we have conferences dedicated to Mendelian randomization where
you have 200 to 300 people at a time trying to do studies.
I do remember distinctly, more than ten years ago in the early days of Mendelian
randomization, when we invited George to CDC to give a talk.
He flew in one day, gave a talk and then flew back out to the UK the next day.
And it was like a whirlwind of activities.
So, this is just sort of a global question and I like the idea of triangulation that
you put at the end.
I want to push you a little bit and see if others have any specific questions.
You've used the example of selenium and prostate cancer: potentially, if we had had a
Mendelian randomization study, we may or may not have done an expensive randomized
clinical trial, even though the observational studies around selenium's protective effect
on prostate cancer were quite suggestive.
So how do we actually use MR studies to either avoid or accelerate RCTs, or even do away
with them?
So, for example, in this case we wouldn't have done an RCT, or maybe in another case we
would have accelerated it and said, oh, we need an RCT here, because MR is pointing in
that direction.
Or could there be situations where an RCT is not even needed and you can jump to causality?
I mean, given all the limitations of the instrumental variable approach what would be kind of your
take on the ultimate utility of Mendelian randomization to either avoid or accelerate
the process of randomization and causal inference?
GEORGE DAVEY SMITH: Thanks.
So, I think that Mendelian randomization should help prioritize trials.
So, I think it provides some evidence when you can't do a trial on a cardiovascular risk
factor.
I think even 20-something years ago someone had counted 256 cardiovascular risk factors
that had been proposed.
You can't do 256 RCTs.
Now it's vastly more.
But if you're actually trying to prioritize what you would do a randomized trial on, I
would say, for example, the negative MR studies on HDL cholesterol I think would reduce enthusiasm
for having carried out randomized trials of HDL cholesterol beyond the first one or two
perhaps.
So, I think it's the prioritization.
Certainly, I would not see Mendelian randomization ever replacing randomized controlled trials.
I wrote the first article, the first extended exposition of Mendelian randomization,
15 years ago.
We finished it, (unint.) and I, by saying that we saw Mendelian randomization as a way
of helping put up the best candidates for randomized controlled trials, which in the end
were still necessary; they were definitely necessary to actually evaluate therapies
before they came into therapeutic use.
And again, I'd also say that a negative Mendelian randomization study on its own wouldn't
say let's not do a trial of selenium … it would just feed into the evidence, the evaluation
of whether that's your best current target, you know; whether that's where you have the
best evidence, whether, given that you can't do trials of all 256 risk factors, these
really are the best candidates.
And so, I definitely see Mendelian randomization as never doing away with randomized
controlled trials, or any randomized controlled trial, and always feeding into the
prioritization of which trials to do.
MUIN KHOURY: So, we have a question from the web about the SELECT study, continuing
with selenium and prostate cancer.
Was it possible to check for genetic variants in the SELECT RCT participants?
That would be a nice thing if those data were available.
Do you know, George, if such genotyping was done?
Or could that be done now, with some subgroup analysis?
GEORGE DAVEY SMITH: I agree very much.
I don't think it's been done, and I don't know if they collected DNA.
I mean, large-scale trials now would generally at least retain samples; people just
sort of used to throw away the cells.
I mean, I remember very well doing field work in epidemiological studies 30 years ago when
we would just discard the cells after getting the serum or the plasma.
Now, I know such material is definitely kept.
And it's definitely extremely attractive to obtain DNA and genetic data within trials
to allow replication of what's been found in MR studies.
I don't know of any direct head to head that's been done.
That would be a fantastic study design.
MUIN KHOURY: So, we have one discussion point around the use of genetic risk scores that
are generated from GWAS data.
So actually, one of our seminars earlier this year was on the use of genetic risk scores
in medicine.
The process of building genetic risk scores usually emanates from GWAS data and ends up
with not one genetic marker or variant but dozens or thousands of them.
Can you give us, sort of in a nutshell, your idea of the use and limitations of genetic
risk scores built from multiple SNPs rather than one gene at a time?
I mean, what would be the problems with that?
GEORGE DAVEY SMITH: Yeah.
So, the advantage of those genetic risk scores is that they obviously have very considerably
greater power than any individual single genotype.
That's the advantage: you have high power.
The disadvantage, in the balance, is that you've just got one instrument, and so you
can't do any of the sensitivity analyses, except the interaction sensitivity analysis.
That's really the only one of the sensitivity analyses that you could do.
The others, the other commonly used ones, currently require having multiple genetic
variants.
So, with a score you have high power.
In fact, the straightforward, what's called, inverse-variance weighted estimator, which
is basically the slope forced through the origin in the cholesterol-coronary heart
disease slide I showed, doesn't have much lower power than putting them all together
in a single polygenic score.
But of course, it's a lot more effort to do the whole stack of sensitivity
analyses.
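For illustration, here is a minimal sketch of the trade-off just described: many variants combined with external weights into a single weighted allele score, used as one high-powered instrument (all genotypes, weights, and effects simulated):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, k = 20_000, 50
maf = rng.uniform(0.1, 0.5, k)
G = rng.binomial(2, maf, size=(n, k))    # genotype matrix (0/1/2)
w = rng.normal(0.05, 0.02, k)            # external weights, e.g. GWAS betas

score = G @ w                            # weighted allele score: one instrument
x = score + rng.normal(0, 1, n)          # exposure
y = 0.3 * x + rng.normal(0, 1, n)        # outcome; true causal effect 0.3

# Using the score as a single instrument: ratio of the two regressions.
b_gy = sm.OLS(y, sm.add_constant(score)).fit().params[1]
b_gx = sm.OLS(x, sm.add_constant(score)).fit().params[1]
print(f"IV estimate from the score: {b_gy / b_gx:.3f}")   # ~0.3
```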
I think there are also uses of polygenic scores for other things, like prediction,
prognosis, you know, not for causal inference.
So, this isn't saying that this is the only use of polygenic scores in any way.
But in causal terms, I see a particular use of polygenic scores being that you have high
power, you can generate them in data sets, and you can use them for hypothesis generation,
if you like, for looking at things that might be the factor influencing some downstream
phenotype.
In fact, if you're interested in that area, Tom Richardson from our group has very recently
put a paper on bioRxiv describing a navigable web-based tool which has generated, I think,
about 150 polygenic scores in UK Biobank for 150 different traits.
And then you can relate those to many hundreds of outcomes in UK Biobank and you can
… it's available on the web.
And so, you can look something up.
Then it's the hard work of following that up if you do find such a … which gives
you some hint of some causal relationship.
You need to follow it up with formal Mendelian randomization studies with all the sensitivity
analyses and you need to follow it up with triangulation of evidence from other data
sets.
But there is a sort of tool available which gives you a very rapid ability to interrogate
data from UK Biobank.
MUIN KHOURY: Alright, so two more questions before we let you go.
I know it's getting late in Europe here.
So, we have two questions from the web.
One of them is about whether or not we can use Mendelian randomization for mitochondrial
DNA.
That's something maybe we need to scratch our heads a little bit on.
The second question was about whether or not Mendelian randomization can be used
to study causal mechanisms inside the cell or body, using a SNP as an instrumental
variable to find the causal effect of gene A expression on gene B, like A regulates B.
GEORGE DAVEY SMITH: Yeah.
Okay.
So, for the first one, I have to say that I have no experience of that, or of what the
phenotype would be for mitochondrial DNA, because I think the key thing
is that, in my view, Mendelian randomization is about making some interpretation about
a modifiable exposure.
So, the question there is: what is your modifiable exposure?
And then for the second question: definitely, gene expression and methylation and these
other phenotypes are indeed in principle modifiable, except where the methylation (audio)
of a base switch.
So absolutely you can do Mendelian randomization of regulation: does expression influence
methylation, or does methylation influence expression?
And there are quite a few papers doing that or attempting that.
The problem is you very often have essentially just one instrument, or variants tagging
probably the same causal variant.
So, you can't separate pleiotropy from causation with just one instrument.
We might start producing more instruments, which would allow that to be done.
And, in fact, many of you, I'm sure, will have come across the omnigenic model and
will have been interested in that notion.
In the latest paper from that group, which has been posted on bioRxiv, they don't actually
formalize that model, in which peripheral genes, as they call them, have their downstream
influence by regulating core genes; that is a mediation model, multivariable Mendelian
randomization, which I didn't talk about.
But there's a paper by Sanderson in the further reading on multivariable Mendelian randomization.
It would in fact be an ideal method for testing those sorts of associations, for looking
at the regulation of one gene by another.
So, this is an area which is in its extreme infancy, but is certainly one where some of
the methods developed in Mendelian randomization would have relevance.
MUIN KHOURY: Okay.
So, I think we're about out of time here.
So, I would like to thank you, George, for a very stimulating conversation, and just tell
the audience that the slides will be online; actually, the whole hour will be online,
maybe in the next couple of weeks.
And with that we conclude ten very successful seminars and webinars for the whole year,
2018, which were done in collaboration between NIH and CDC.
Three of them were done at NIH and seven of them were done at CDC.
So, if you want to check out the slides and the titles and presentations, you can go to
the website. And so, happy holidays to all of you.
And if you have ideas for topics for 2019, we're all ears and would like to have more
discussion.
So, thank you all and we'll see you all next year.
GEORGE DAVEY SMITH: Thanks very much.




