In this post, let’s have a look at the odds-ratio. The odds-ratio plays an important part in the social sciences, in particular in the sociological study of social mobility. It is used to understand the strength of the association between parents’ social class and the social class of their children.

The odds-ratio is a measure of association between two categorical variables. A categorical variable is a variable made of discrete categories such as married, divorced, single, …


Odds

The idea of odds is very close to the more familiar notion of probability.

If I roll a die, I have 1 chance in 6 of getting any given number. My probability of getting a 2 is simply 1/6.

However, the odds of getting a 2 are 1:5: one way to succeed against five ways to fail. In other words “the odds are against me”.


Let’s use the symbol π (pi) to denote probabilities. We can define the odds simply as the probability of “success” against the probability of “failure”.

$$ \text{odds} = \frac{\pi}{1 - \pi} \tag{1} $$

We can retrieve a probability from odds by using the following equation.

$$ \pi = \frac{\text{odds}}{1 + \text{odds}} \tag{2} $$
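The two conversions above can be sketched in R. The function names `prob_to_odds` and `odds_to_prob` are illustrative only and do not come from any package.

```r
# Convert a probability to odds, equation (1).
prob_to_odds <- function(p) p / (1 - p)

# Convert odds back to a probability, equation (2).
odds_to_prob <- function(odds) odds / (1 + odds)

p_two    <- 1 / 6                   # probability of rolling a 2
odds_two <- prob_to_odds(p_two)     # 0.2, i.e. odds of 1:5
p_back   <- odds_to_prob(odds_two)  # recovers 1/6
```

Applying one function after the other returns the starting value, which is a quick way to check that the two equations are inverses of each other.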


Empirical Example

Let’s illustrate the use of Odds-Ratio with an example.

We have a population of 1000, of whom 700 (70%) are poor and 300 (30%) are rich.

In this population, 200 have been to university (D=1 is “yes”, D=0 is “no”).

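| university | poor | rich | total |
|------------|------|------|-------|
| no         | 650  | 150  | 800   |
| yes        | 50   | 150  | 200   |
| total      | 700  | 300  | 1000  |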

We see in this table that poor people have a much lower chance of entering university than rich people. Although poor people make up 70% of the population (700 people), only 50 of them go to university.


Odds

We can use odds to describe in more detail the patterns of the table.

Let’s start with the poor. We have 50 people with a university degree and 650 without. The probability is 50/700 = 0.07143, and the odds are 50/650 = 0.0769. To get a fractional form, take the reciprocal: 1/0.0769 = 13, so the odds are 1/13.

This means that for every 1 poor person who goes to university, 13 poor people do not go to university.

The odds of poor people going to university are

$$ \frac{1}{13} \tag{3} $$


For the rich, we have 150 people with a university degree and 150 without. The probability is 150/300 = 0.5, and the odds are 150/150 = 1/1.

This means that for every 1 rich person who goes to university, 1 rich person does not go to university.

The odds of rich people going to university are

$$ \frac{1}{1} \tag{4} $$


Odds-Ratio

The odds-ratio puts these two odds together:

$$ \text{OR} = \frac{50/650}{150/150} = \frac{1/13}{1/1} = \frac{1}{13} \tag{5} $$
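As a minimal sketch, the same odds and odds-ratio can be computed in R from the observed table. The object names (`obs`, `odds_poor`, `odds_rich`, `or_obs`) are chosen here for illustration.

```r
# Observed counts: rows are university attendance, columns are economic status.
# matrix() fills column by column, so the first two numbers are the poor column.
obs <- matrix(c(650, 50, 150, 150), nrow = 2,
              dimnames = list(university = c("no", "yes"),
                              status = c("poor", "rich")))

odds_poor <- obs["yes", "poor"] / obs["no", "poor"]  # 50 / 650
odds_rich <- obs["yes", "rich"] / obs["no", "rich"]  # 150 / 150

or_obs <- odds_poor / odds_rich                      # 1/13, about 0.077
```

Keeping the counts in a named matrix makes it harder to mix up which cell goes in the numerator and which in the denominator.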

Does this odds-ratio (OR) reflect a strong association? We can think about the strength of the association by calculating the odds-ratio we would obtain if economic status and education were not related (i.e. independent).


Independence Table

Let’s put this odds-ratio (OR) in context by calculating the table we would observe if being rich or poor did not have any association with university access. We say that the variables are independent if they are not associated: learning about one variable is not informative or predictive about the other variable.

How do we calculate an independent table?

  • Poor people make up 70% of this population, so they should make up 70% of the people who went to university if being poor did not matter for access to university.

Let’s call δ (delta) the “access to university”, with $\delta_{poor} = \delta_{rich}$ denoting that access is the same whether poor or rich. If the access is the same, then economic status does not play a role in predicting university access, therefore the variables are independent.

This is the so-called null hypothesis $H_0: \delta_{poor} = \delta_{rich}$. The alternative hypothesis is $H_A: \delta_{poor} \neq \delta_{rich}$, i.e. access is conditional on economic status: economic status is associated with access to university.


Let’s calculate the number of people who went to university according to the population proportion of the poor and the rich. In other words, the number if H0 was true (i.e. if access was independent of economic status).

$$ 0.7 \times 200 = 140 \tag{6} $$

$$ 0.3 \times 200 = 60 \tag{7} $$

These values would fill these two cells

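| university | poor | rich | total |
|------------|------|------|-------|
| no         |      |      | 800   |
| yes        | 140  | 60   | 200   |
| total      | 700  | 300  | 1000  |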

We can also calculate the numbers for those who did not go to university (which is 800 people) according to the population proportion of economic status.

$$ 0.7 \times 800 = 560 \tag{8} $$

$$ 0.3 \times 800 = 240 \tag{9} $$


Now we have a table that reflects the proportion of the population of the poor and the rich in university attendance. The access to university simply reflects the size of each population.

This is the table we should observe if access to university was independent of economic status. In other words, if H0 was true.

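| university | poor | rich | total |
|------------|------|------|-------|
| no         | 560  | 240  | 800   |
| yes        | 140  | 60   | 200   |
| total      | 700  | 300  | 1000  |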

Another simple way to calculate this is to multiply the row and column margins of a cell and divide by the total. For instance, for the first cell (poor, no university), (700 × 800) / 1000 = 560.
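The margin rule can be applied to all four cells at once. The sketch below uses base R's `outer()` on the two margins; the object names are illustrative.

```r
# Margins of the table: row totals (university) and column totals (status).
row_tot <- c(no = 800, yes = 200)
col_tot <- c(poor = 700, rich = 300)
n <- 1000

# Expected count in each cell = row total * column total / grand total.
expected <- outer(row_tot, col_tot) / n

expected["no", "poor"]   # 560
expected["yes", "rich"]  # 60
```

`outer()` keeps the names of both margins, so the result can be indexed by label just like the table in the text.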


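Expected Table

| university | poor | rich | total |
|------------|------|------|-------|
| no         | 560  | 240  | 800   |
| yes        | 140  | 60   | 200   |
| total      | 700  | 300  | 1000  |

Observed Table

| university | poor | rich | total |
|------------|------|------|-------|
| no         | 650  | 150  | 800   |
| yes        | 50   | 150  | 200   |
| total      | 700  | 300  | 1000  |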


Odds-Ratio of the independence table

Let’s investigate the odds-ratio we would have given that economic status had no bearing on university access. In other words, if the two variables were not associated.

We first have our independence table, also called the “expected” table (expected given no association or expected if the NULL hypothesis was true).

If we repeated the calculation of odds for the poor and the rich for the independence/expected table we would have the following result

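$$ \text{odds}_{\text{poor}} = \frac{140}{560} = \frac{1}{4} = 0.25 \tag{10} $$

$$ \text{odds}_{\text{rich}} = \frac{60}{240} = \frac{1}{4} = 0.25 \tag{11} $$

Now if we take again the ratio of these two odds we have

$$ \text{OR}_{\text{ind}} = \frac{140/560}{60/240} = \frac{0.25}{0.25} = 1 \tag{12} $$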

We have discovered that when two variables are independent (not associated, uncorrelated, etc.) we have an odds-ratio of 1: the odds-ratio of independence is $\text{OR}_{\text{ind}} = 1$.
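A short check in R, using the expected counts from the independence table (object names are illustrative):

```r
# Expected counts under independence, filled column by column (poor, then rich).
expected <- matrix(c(560, 140, 240, 60), nrow = 2,
                   dimnames = list(university = c("no", "yes"),
                                   status = c("poor", "rich")))

odds_poor_ind <- expected["yes", "poor"] / expected["no", "poor"]  # 140 / 560
odds_rich_ind <- expected["yes", "rich"] / expected["no", "rich"]  # 60 / 240

or_ind <- odds_poor_ind / odds_rich_ind  # equal odds, so the ratio is 1
```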


How does our observed odds-ratio of 1/13 compare with the odds-ratio we would have if the two variables were independent, $\text{OR}_{\text{ind}} = 1$?

In other words, how far is 1/13 from the expected value of 1?

How likely is it to get this odds-ratio by chance alone?


Statistical inference of Odds-Ratio

This part is a bit more technical because it involves hypothesis testing, which is another topic.

A common way of testing this is to use a log transformation, because the log of the odds-ratio is approximately normally distributed and we can then work with a normal distribution.

$$ L = \ln(\text{OR}) \tag{13} $$

$$ L = \ln\left(\frac{1}{13}\right) = -2.56 \tag{14} $$

Note that the log value of the odds-ratio under independence is

$$ L = \ln\left(\frac{1}{1}\right) = 0 \tag{15} $$


So, we now ask the same question: how far is -2.56 from 0?

Is -2.56 statistically significantly different from 0?

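To answer this question we need to calculate the standard error, SE, which is simply the square root of the sum of the reciprocals of the four observed cell counts

$$ \text{SE} = \sqrt{\frac{1}{650} + \frac{1}{50} + \frac{1}{150} + \frac{1}{150}} = 0.187 \tag{16} $$

We can derive a z-value by $z = L / \text{SE}$ and from it derive a p-value.

In our case the z-value is

$$ z = \frac{-2.56}{0.187} \approx -13.7 \tag{17} $$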
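The steps above can be sketched in base R. The variable names are illustrative, and the p-value line assumes a two-sided test against the standard normal distribution, which the text does not spell out.

```r
# Observed cell counts.
n_poor_no <- 650; n_poor_yes <- 50
n_rich_no <- 150; n_rich_yes <- 150

# Log odds-ratio, equations (13) and (14).
or_obs <- (n_poor_yes / n_poor_no) / (n_rich_yes / n_rich_no)
log_or <- log(or_obs)                       # about -2.56

# Standard error of the log odds-ratio, equation (16).
se <- sqrt(1 / n_poor_no + 1 / n_poor_yes +
           1 / n_rich_no + 1 / n_rich_yes)  # about 0.187

# z-value, equation (17), and a two-sided p-value (assumption: two-sided test).
z <- log_or / se                            # about -13.7
p_value <- 2 * pnorm(-abs(z))
```

Working with unrounded intermediate values gives a z-value slightly different from the one obtained by dividing the rounded numbers by hand.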


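A simple way to compute the p-value for the odds-ratio is to fit a Poisson regression to the cell counts

|   | status | university | obs    |
|---|--------|------------|--------|
| 1 | poor   | no         | 650.00 |
| 2 | rich   | no         | 150.00 |
| 3 | poor   | yes        | 50.00  |
| 4 | rich   | yes        | 150.00 |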

In R this is straightforward

# the data
ex = expand.grid(status = c('poor','rich'), university = c('no', 'yes'))
ex$obs = c(650, 150, 50, 150)
ex$status = relevel(ex$status, 'rich')

We use the function glm and run a regression interacting university and economic status

glm(obs ~ university*status, data=ex, family = 'poisson')
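
|   | term                         | estimate | std.error | statistic | p.value |
|---|------------------------------|----------|-----------|-----------|---------|
| 1 | (Intercept)                  | 5.01     | 0.08      | 61.37     | 0.00    |
| 2 | university : yes             | 0.00     | 0.12      | 0.00      | 1.00    |
| 3 | status : poor                | 1.47     | 0.09      | 16.19     | 0.00    |
| 4 | university-yes x status-poor | -2.56    | 0.19      | -13.74    | 0.00    |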

We find the same value for the interaction coefficient, -2.56, which is the log of the odds-ratio, and a z-value of about -13.7. The p-value rounds to 0.00, which indicates a highly significant result.


Remember that exponentiating the log returns the odds, and in this case the odds-ratio: exp(-2.56) ≈ 1/13.
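As a minimal sketch, the odds-ratio can be read off the fitted model directly. The snippet refits the same Poisson model so that it runs on its own; `fit` and `or_hat` are illustrative names, and the coefficient label `universityyes:statuspoor` is the one R builds from the factor levels used here.

```r
# Rebuild the data: one row per cell of the table.
ex <- expand.grid(status = c("poor", "rich"),
                  university = c("no", "yes"),
                  stringsAsFactors = TRUE)
ex$obs <- c(650, 150, 50, 150)
ex$status <- relevel(ex$status, "rich")  # rich is the reference category

# Saturated Poisson model with the university x status interaction.
fit <- glm(obs ~ university * status, data = ex, family = "poisson")

# The interaction coefficient is the log odds-ratio; exponentiate to get the OR.
or_hat <- exp(coef(fit)[["universityyes:statuspoor"]])  # about 1/13
```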

In R you can retrieve your table by predicting the values from the Poisson regression

g1 = glm(obs ~ university*status, data=ex, family = 'poisson')
predict(g1, type = "response") # fitted counts, which reproduce the observed table
exp(predict(g1)) # same values: the default prediction is on the log scale

Neatly, you can generate the values under independence by simply running a regression without the interaction term.

# Independence model
g_indep = glm(obs ~ university + status, data=ex, family = 'poisson')
predict(g_indep, type = "response") # these are the expected counts under independence