A Simple Guide to Odds-Ratio
In this post, let’s have a look at the odds-ratio. The odds-ratio plays an important part in the social sciences, in particular in the sociological study of social mobility. It is used to understand the strength of the association between parents’ social class and the social class of their children.
The odds-ratio is a measure of association between two categorical variables. A categorical variable is a variable made of discrete categories such as married, divorced, single, …
Odds
The idea of odds is very close to the more familiar notion of probability.
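If I roll a die, I have 1 chance in 6 of getting any given number. My probability of getting a particular number is $1/6 \approx 0.167$.
However, the odds of getting that number are 1 to 5, or $1/5 = 0.2$: one outcome in favour for every five outcomes against.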

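Let’s use the symbol $p$ for the probability and $o$ for the odds.
We can retrieve a probability from odds by using the following equations.
$$o = \frac{p}{1 - p} \qquad \Longleftrightarrow \qquad p = \frac{o}{1 + o}$$
For the die, $o = \frac{1/6}{5/6} = 0.2$ and $p = \frac{0.2}{1.2} = 1/6$.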
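The conversion between probability and odds can be sketched in a few lines of R. This is only an illustration: the function names `prob_to_odds` and `odds_to_prob` are made up for this example and are not part of base R.

```r
# Convert a probability into odds: p / (1 - p)
prob_to_odds <- function(p) p / (1 - p)

# Convert odds back into a probability: odds / (1 + odds)
odds_to_prob <- function(odds) odds / (1 + odds)

p_face <- 1 / 6                       # probability of one face of a fair die
odds_face <- prob_to_odds(p_face)     # 1 to 5, i.e. 0.2
p_back <- odds_to_prob(odds_face)     # recovers 1/6

print(c(probability = p_face, odds = odds_face, recovered = p_back))
```

Going back and forth returns the value we started with, which is a quick check that the two formulas are inverses of each other.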
Empirical Example
Let’s illustrate the use of Odds-Ratio with an example.
We have a population of 1000, of whom 700 (70%) are poor and 300 (30%) are rich.
In this population, 200 have been to university (20%) and 800 have not (80%).

          University   No university   Total
Poor              50             650     700
Rich             150             150     300
Total            200             800    1000

We see in this table that poor people have a much smaller chance of entering university than rich people. Although poor people make up 70% of the population (700 people), only 50 of them go to university.
Odds
We can use odds to describe in more detail the patterns of the table.
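Let’s start with the poor. We have 50 poor people who went to university and 650 who did not, a ratio of $50/650 = 1/13$.
This means that for every 1 poor person who goes to university, 13 poor people do not go to university.
The odds of poor people going to university are $50/650 \approx 0.077$.
For the rich, we have 150 who went to university and 150 who did not, a ratio of $150/150 = 1/1$.
This means that for every 1 rich person who goes to university, 1 rich person does not go to university.
The odds of rich people going to university are $150/150 = 1$.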
Odds-Ratio
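The odds-ratio puts these two odds together by dividing the odds of the poor by the odds of the rich.
$$\text{odds-ratio} = \frac{50/650}{150/150} = \frac{0.077}{1} \approx 0.077$$
Is this odds-ratio (0.077) large or small?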
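The odds of each group and their ratio can be computed directly from the observed counts. The following is a minimal sketch; the object names (`observed`, `odds`, `odds_ratio`) are illustrative choices, and the counts are the ones used throughout this example.

```r
# Observed counts: rows = economic status, columns = university attendance
observed <- matrix(c(50, 650,
                     150, 150),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(status = c("poor", "rich"),
                                   university = c("yes", "no")))

# Odds of going to university within each status group
odds <- observed[, "yes"] / observed[, "no"]

# Odds-ratio of the poor relative to the rich
odds_ratio <- unname(odds["poor"] / odds["rich"])

print(odds)
print(odds_ratio)
```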
Independence Table
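Let’s put this odds-ratio (0.077) into perspective by comparing it with the table we would observe if economic status and university attendance were independent.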
How do we calculate an independence table?
- Poor people make up 70% of this population, so they should make up 70% of the people who went to university if being poor did not matter for access to university.
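Let’s call $H_0$ the hypothesis that economic status does not matter for access to university.
This is the so-called NULL hypothesis.
Let’s calculate the number of people who went to university according to the population proportion of the poor and the rich. In other words, the number we would observe if $H_0$ were true.
$$200 \times 0.7 = 140 \text{ poor} \qquad 200 \times 0.3 = 60 \text{ rich}$$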
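These values would fill the two cells of the university column: 140 for the poor and 60 for the rich.
We can also calculate the numbers for those who did not go to university (which is 800 people) according to the population proportion of economic status: $800 \times 0.7 = 560$ poor and $800 \times 0.3 = 240$ rich.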
Now we have a table that reflects the proportion of the population of the poor and the rich in university attendance. The access to university simply reflects the size of each population.
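This is the table we should observe if access to university were independent of economic status. In other words, if the NULL hypothesis were true.
Another simple way to calculate this is to multiply the margins and divide by the total. For instance, for cell number 1, taken here to be the poor who went to university, we have $\frac{700 \times 200}{1000} = 140$.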
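The "multiply the margins and divide by the total" rule can be applied to all four cells at once. This is a sketch using base R's `outer()`; the object names are illustrative, and the row and column layout of `observed` is an arbitrary choice for this example.

```r
# Observed counts: rows = economic status, columns = university attendance
observed <- matrix(c(50, 650,
                     150, 150),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(status = c("poor", "rich"),
                                   university = c("yes", "no")))

row_totals <- rowSums(observed)   # poor = 700, rich = 300
col_totals <- colSums(observed)   # yes = 200, no = 800
n <- sum(observed)                # 1000

# Expected count in each cell = row margin * column margin / total
expected <- outer(row_totals, col_totals) / n

print(expected)
```

The margins of `expected` are the same as the margins of `observed`; only the way people are spread across the cells changes.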
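Expected Table

          University   No university   Total
Poor             140             560     700
Rich              60             240     300
Total            200             800    1000

Observed Table

          University   No university   Total
Poor              50             650     700
Rich             150             150     300
Total            200             800    1000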
Odds-Ratio of the independence table
Let’s investigate the odds-ratio we would have if economic status had no bearing on university access. In other words, if the two variables were not associated.
We first have our independence table, also called the “expected” table (expected given no association or expected if the NULL hypothesis was true).
If we repeated the calculation of odds for the poor and the rich for the independence/expected table, we would have the following result
$$\text{odds}_{\text{poor}} = \frac{140}{560} = 0.25 \qquad \text{odds}_{\text{rich}} = \frac{60}{240} = 0.25$$
Now if we take again the ratio of these two odds we have
$$\frac{0.25}{0.25} = 1$$
We have discovered that when two variables are independent (not associated, uncorrelated, etc.), we have an odds-ratio of 1 (the odds-ratio of independence).
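To check this numerically, here is a small sketch that computes the odds-ratio of any 2x2 table and applies it to both the expected and the observed counts. The helper `odds_ratio_2x2` is a hypothetical function written for this illustration; it assumes rows are groups and columns are (yes, no).

```r
# Odds-ratio of a 2x2 table: (odds in row 1) / (odds in row 2)
odds_ratio_2x2 <- function(tab) {
  (tab[1, 1] / tab[1, 2]) / (tab[2, 1] / tab[2, 2])
}

labels <- list(status = c("poor", "rich"), university = c("yes", "no"))

# Counts expected under independence and counts actually observed
expected <- matrix(c(140, 560, 60, 240), nrow = 2, byrow = TRUE, dimnames = labels)
observed <- matrix(c(50, 650, 150, 150), nrow = 2, byrow = TRUE, dimnames = labels)

or_expected <- odds_ratio_2x2(expected)   # odds-ratio under independence
or_observed <- odds_ratio_2x2(observed)   # odds-ratio in the data

print(c(expected = or_expected, observed = or_observed))
```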
How does our observed odds-ratio of 0.077 compare with the odds-ratio of independence, 1?
In other words, how far is 0.077 from 1?
How likely is it to get this odds-ratio by chance alone?
Statistical inference of Odds-Ratio
This part is a bit more technical because it involves hypothesis testing, which is another topic.
A common way of testing this is to use a log transformation, because we can then work with a normal distribution.
$$\log(0.077) \approx -2.56$$
Note that the log value of the odds-ratio under independence is $\log(1) = 0$.
So, we now ask the same question: how far is -2.56 from 0?
Is -2.56 statistically significantly different from 0?
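To answer this question we need to calculate the standard error, $\text{SE}$, which is simply the square root of the sum of the reciprocals of the four cell counts
$$\text{SE} = \sqrt{\frac{1}{50} + \frac{1}{650} + \frac{1}{150} + \frac{1}{150}} \approx 0.187$$
We can derive a z-value by dividing the log of the odds-ratio by its standard error
$$z = \frac{\log(\text{odds-ratio})}{\text{SE}}$$
In our case the z-value is
$$z = \frac{-2.56}{0.187} \approx -13.7$$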
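The same steps can be sketched by hand in R. Two assumptions are worth flagging: the standard error uses the usual large-sample formula for a log odds-ratio (square root of the summed reciprocals of the four cell counts), and the p-value is taken as a two-sided test against the normal distribution. Variable names are illustrative.

```r
# Observed cell counts from the example
n_poor_yes <- 50
n_poor_no  <- 650
n_rich_yes <- 150
n_rich_no  <- 150

# Log of the odds-ratio (poor relative to rich)
log_or <- log((n_poor_yes / n_poor_no) / (n_rich_yes / n_rich_no))

# Large-sample standard error of the log odds-ratio (assumed formula)
se <- sqrt(1 / n_poor_yes + 1 / n_poor_no + 1 / n_rich_yes + 1 / n_rich_no)

# z-value: distance of the log odds-ratio from 0, in standard errors
z <- log_or / se

# Two-sided p-value under a standard normal distribution (assumed test)
p_value <- 2 * pnorm(-abs(z))

print(c(log_or = log_or, se = se, z = z, p_value = p_value))
```

A z-value this far from 0 gives a p-value that is practically zero, so an odds-ratio of 0.077 is very unlikely to arise by chance alone if the two variables were independent.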
A simple way to compute the p-value for the odds-ratio is to fit a Poisson regression.
In R this is straightforward.
# the data
ex = expand.grid(status = c('poor','rich'), university = c('no', 'yes'))
ex$obs = c(650, 150, 50, 150)
ex$status = relevel(ex$status, 'rich')
We use the function glm and run a regression interacting university and economic status.
glm(obs ~ university*status, data=ex, family = 'poisson')
We find the same value for the coefficient of the interaction term (universityyes:statuspoor): $-2.56$, the log of the odds-ratio.
Remember that exponentiating the log returns the odds, and in this case the odds-ratio.
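As a sketch of that last step, the interaction coefficient can be pulled out of the fitted model and exponentiated. The coefficient name `universityyes:statuspoor` follows from the factor levels set up above (with 'rich' as the reference level); the object names `log_or`, `se` and `odds_ratio` are illustrative.

```r
# Rebuild the data exactly as above
ex <- expand.grid(status = c("poor", "rich"), university = c("no", "yes"))
ex$obs <- c(650, 150, 50, 150)
ex$status <- relevel(ex$status, "rich")

# Poisson regression with the university x status interaction
g1 <- glm(obs ~ university * status, data = ex, family = "poisson")

# Estimate, standard error, z-value and p-value of the interaction term
interaction_row <- coef(summary(g1))["universityyes:statuspoor", ]
log_or <- interaction_row[["Estimate"]]
se <- interaction_row[["Std. Error"]]

# Exponentiating the log odds-ratio returns the odds-ratio
odds_ratio <- exp(log_or)

print(interaction_row)
print(odds_ratio)
```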
In R you can retrieve your table by predicting the values from the Poisson regression.
g1 = glm(obs ~ university*status, data=ex, family = 'poisson')
predict(g1, type = "response") # same as below
exp(predict(g1)) # same as above
Neatly, you can generate the values under independence by simply running a regression without the interaction term.
# Independence model
g_indep = glm(obs ~ university + status, data=ex, family = 'poisson')
predict(g_indep, type = "response") # these are the values under independence