
Mendelian Randomization: getting genes to run randomized trials for you

One of the core elements of my day job is dealing with causal relationships: we try to understand what caused an effect. An area where much work has been done on understanding causal relationships is medicine, where randomized controlled trials are used to understand the relationship between taking a medicine and some outcome.

But some things are hard to perform a trial on. It's all very well if you have a medicine to try out, but what if you want to know if, for example, having low serum cholesterol is associated with an increased risk of cancer?

That's not an idle question: the Framingham Heart Study was thought to have shown a relationship between serum cholesterol and cancer (e.g. The serum cholesterol-cancer relationship: an analysis of time trends in the Framingham Study). But the question is: given that there appears to be a relationship, is it causal? Does low serum cholesterol cause cancer?

It could be that it's the other way around (called reverse causation: Low Cholesterol May Be Marker of Undiagnosed Cancer): if you are likely to get cancer you are likely to have low serum cholesterol. Or it could be that there's a confounding factor: something causes both low serum cholesterol and cancer.

It turns out that genetics can help, specifically the fact that genes are randomly assigned during meiosis (in humans, for example, half the genes come from the mother and half from the father). Gregor Mendel's law of independent assortment says that the alleles of genes are chosen randomly from the possible alleles when a baby is being formed from the genetic material of mother and father.

This Mendelian Randomization means that it's possible to have nature perform a randomized trial for you. If you can find an allele that affects the trait you are trying to understand you can use it to sample the population to look for a cause and effect relationship.

In the case of low serum cholesterol there's a specific allele associated with Apolipoprotein E. The variant Apo E2 is associated with low serum cholesterol. And because of Mendel's law of independent assortment it will be assigned randomly in the population.

In 1986 Martijn B. Katan published a letter in The Lancet pointing out that Apo E2 causes a rare disease where patients have almost zero serum cholesterol.

Since Apo E2 is randomly assigned by Mendel's laws it's enough to look at the population and examine cancer rates and their relationship to the presence of the Apo E2 allele. So a 'trial' can be run by selecting a control group from the population and examining the rate of Apo E2 in that control group. Then a group with cancer is tested for Apo E2.

If there's really a connection between low serum cholesterol and cancer then the cancer group should have a higher prevalence of Apo E2 than the control. You can think of the presence of Apo E2 as being random across the population; if its frequency in the cancer group differs from the frequency in the control (i.e. there's more or less than expected) then a causal relationship can be inferred. One way to see that is to look at a causal diagram of the relationships.


The arrows in the diagram represent causal relationships.

1. There's an arrow from Apo E2 to serum cholesterol because it is known that this allele causes low serum cholesterol.

2. The hypothesis is expressed in the arrow from low serum cholesterol to cancer. It's that arrow that's being determined.

3. There are other factors (age, diet, location, illnesses) which could affect both serum cholesterol and cancer.

4. There's no arrow leading to Apo E2 because it is completely determined by Mendel's laws. There's also no arrow from Apo E2 directly to the other factors because they are not affected by Apo E2.

5. There's no arrow directly from Apo E2 to cancer because there's no known direct relationship between the two.

(Note that these assumptions have to be justified. For example, #1 needs biological justification, as does #5).
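
The arrows described above can also be written down as a small simulation. What follows is only a sketch in Python: the allele frequency, the effect sizes, the cut-off for "low" cholesterol and the single `other` number standing in for age, diet and so on are all invented for illustration and come from none of the studies mentioned here. It builds one population where the arrow from low serum cholesterol to cancer is absent and one where it is present, and compares two gaps in cancer rate: low versus normal cholesterol, and Apo E2 carriers versus non-carriers.

```python
import random


def simulate(n=200_000, cholesterol_causes_cancer=False, seed=1):
    """Return a list of (has_apo_e2, low_cholesterol, has_cancer) tuples.

    All numbers below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        # No arrow into Apo E2: it is handed out at random.
        has_apo_e2 = rng.random() < 0.15
        # "Other factors" (age, diet, ...) lumped into one made-up number.
        other = rng.gauss(0, 1)
        # Arrows into serum cholesterol: from Apo E2 and from the other factors.
        cholesterol = 5.5 - 0.8 * has_apo_e2 - 0.6 * other + rng.gauss(0, 0.5)
        low_cholesterol = cholesterol < 5.0
        # Arrows into cancer: from the other factors and, only if the
        # hypothesis is switched on, from low cholesterol.
        risk = 0.05
        if other > 1.0:
            risk += 0.10
        if cholesterol_causes_cancer and low_cholesterol:
            risk += 0.10
        has_cancer = rng.random() < risk
        people.append((has_apo_e2, low_cholesterol, has_cancer))
    return people


def cancer_rate(people, keep):
    """Fraction with cancer among the people for whom keep(person) is true."""
    group = [p for p in people if keep(p)]
    return sum(p[2] for p in group) / len(group)


def naive_gap(people):
    """Cancer rate with low cholesterol minus cancer rate without it."""
    return cancer_rate(people, lambda p: p[1]) - cancer_rate(people, lambda p: not p[1])


def genotype_gap(people):
    """Cancer rate in Apo E2 carriers minus cancer rate in non-carriers."""
    return cancer_rate(people, lambda p: p[0]) - cancer_rate(people, lambda p: not p[0])


for causal in (False, True):
    population = simulate(cholesterol_causes_cancer=causal)
    print(f"arrow from low cholesterol to cancer: {causal}")
    print(f"  low vs. normal cholesterol gap:   {naive_gap(population):+.3f}")
    print(f"  Apo E2 carriers vs. non-carriers: {genotype_gap(population):+.3f}")
```

In this toy population the low-cholesterol group has more cancer even when the arrow is switched off, because the other factors push on both. The carrier versus non-carrier gap only opens up when the arrow is really there, which is the whole point of using the allele.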

With those relationships in place it's just a matter of performing the statistical test on the control group and cancer group to see if more Apo E2 is present in the cancer group (there's more on that in Mendelian randomization as an instrumental variable approach to causal inference).
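
As a rough sketch of what that test could look like, here is a comparison of Apo E2 carrier counts in a cancer group and a control group using an odds ratio and a Pearson chi-square test on the 2x2 table. The function name `compare_carrier_rates` and the counts passed to it are made up for illustration; they are not data from any study. The p-value uses the fact that a chi-square statistic with one degree of freedom is the square of a standard normal.

```python
import math


def compare_carrier_rates(case_carriers, case_total, control_carriers, control_total):
    """Compare allele carrier rates between cases and controls.

    Returns (odds_ratio, chi_square, p_value) for the 2x2 table,
    using Pearson's chi-square without continuity correction.
    """
    a = case_carriers                       # cancer group, carrier
    b = case_total - case_carriers          # cancer group, non-carrier
    c = control_carriers                    # control group, carrier
    d = control_total - control_carriers    # control group, non-carrier
    n = a + b + c + d

    odds_ratio = (a * d) / (b * c)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, P(X > chi_square) = erfc(sqrt(chi_square / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 1,000 people with cancer and 1,000 controls.
odds_ratio, chi_square, p_value = compare_carrier_rates(152, 1000, 148, 1000)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```

An odds ratio close to 1 with a large p-value would mean Apo E2 is no more common in the cancer group than in the control, which is what you'd expect if low serum cholesterol doesn't cause cancer. Real Mendelian randomization studies typically go further than this, for instance by estimating the size of the effect.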

This technique has been used to show a causal relationship between alcohol intake and blood pressure (see Alcohol Intake and Blood Pressure: A Systematic Review Implementing a Mendelian Randomization Approach) and to show no causal relationship between a mother's BMI and the fatness of her offspring (see Exploring the Developmental Overnutrition Hypothesis Using Parental–Offspring Associations and FTO as an Instrumental Variable).

And what of low serum cholesterol and cancer? A study (Apolipoprotein E Genotype, Plasma Cholesterol, and Cancer: A Mendelian Randomization Study) from 2009 concludes: "These findings suggest that low cholesterol levels are not causally related to increased cancer risk."

Thanks, Mendel!

Comments

Trevor Burnham said…
If I understand you correctly, you only have a population with random assignment if every person in it has one parent with allele A and one parent without it. Do you only recruit such people for your Apo E2 work?
Jonathan Histed said…
Fascinating: thank you for that :)
