Introducing the Individual Synthetic Control

In this post I will introduce the Individual Synthetic Control (ISC) method proposed in my paper with Richard Breen “Earnings and Income Penalties for Motherhood: Estimates for British Women using the Individual Synthetic Control Method” published in the European Sociological Review. I will not explain the technical details of the Synthetic Control method here, instead I will focus on introducing the (neglected) topic of “individual causal effects”.

Introduction

Did you ever wonder what would have happened if you had chosen to go down this road instead of that road? How different would your life be? This type of question is at the heart of causal investigations. Causality, in a nutshell, is trying to evaluate counterfactuals. Counterfactuals are concerned with alternative states of the world. This is exemplified in popular culture in films like Sliding Doors or Melinda Melinda. In the film Sliding Doors we follow what would have happened if Gwyneth Paltrow had not missed her train that day. This film is about the counterfactual story of that one person—a counterfactual individual story.

However, the idea of individual causal effect is generally at odds with the causal inference literature. This is because in the real world we never observe both what happened and what would have happened. Morgan and Winship write “causal effects cannot be observed or directly calculated at the individual level” (2001, p.5). This has lead the causality literature to focus almost exclusively on average effects, as if we could not attempt to calculate individual counterfactuals.

In our paper, we show how to calculate causal effects at the individual level using the Synthetic Control method.

Counterfactual Case Studies

The Synthetic Control (SC) method proposed by Abadie and colleagues has become very popular in political science and economics. The method from its inception is tailored for single case studies. The first study using the SC estimated the effect of terrorism on GDP in the Basque Country (Abadie and Gardeazabal 2003). In this original study, there is only one treated case (the Basque Country) and a few control cases (the other Spanish regions) used to construct the counterfactual Basque Country. In this method, the counterfactual is called “synthetic”.

Applications of the SC method include synthetic California, Russia, Cuba, and synthetic Britain. In most applications, the Synthetic Control is used to create a single case counterfactual (one treated unit). The strength of the Synthetic Control is the way its algorithm constructs the weights used to create the counterfactual. Because in most applications, the pre-treatment counterfactual line (outcome and other covariates) is nearly identical to the observed treated line, the method is thought to capture not only observed characteristics but also time-varying unobserved characteristics. The parallel trend assumption in the SC can easily be inspected visually. Furthermore, the parallel trend between the treated and controls can take any shape.

Even though the Synthetic Control uses a particularly powerful algorithm to build weights, the principle of counterfactual inference of the method is the same as more traditional methods such as matching and even regressions: conditioned on a set of pre-treatment covariates/trends/etc, the treatment can be considered as good as random (for relevant confounders) and can therefore be used to compute causal effects. This is simply the ignorability condition. Because the synthetic control uses many pre-treatment variables to compute weights, and generally because the pre-treatment trend in the control outcome variable is identical to the treated outcome variable, we can have faith that the ignorability assumption holds. Ignorability is more challenging in a cross-sectional setting where the matching is done on “static” characteristics. However, in theory matching should be as good as the synthetic control method if matching is performed on trends or previous time periods.

The Idea of Individual Causal Effects

Most applications of the Synthetic Control are based on countries or states; large aggregate units. In our paper, we applied this method to actual individuals (thus the Individual Synthetic Control or ISC). More precisely, our study focused on the motherhood earning penalty in the UK. The outcome is monthly earnings and the “treatment” variable is the birth of a first child. To our knowledge, this is the first time that the SC method has been applied to actual individuals. Using the ISC, we were able to create the counterfactual earning trajectory for each mother in our sample had they not had a child. In other words, for each mother we calculated the causal effect of first birth on their earning growth.

One straightforward advantage of the ISC is how easy it is to compute heterogenous effects. Because the effects are first calculated at the level of the individual, the aggregation into different groups (education, class, etc) is unproblematic.

One of the strengths of our application is that mothers were drawn from a representative random sample of the population. Therefore, the causal effect is generalisable, which is not often the case with Synthetic Control applications based on case studies. It can be challenging to assess the validity of inference made from one case study. Perhaps even more challenging in case studies of a region or a country is the issue of “general equilibrium”. A region (like the Basque Country) is not independent from the larger economy of a country, and a country (like Britain) is not independent from the European economy. It is difficult to imagine what “holding everything else constant” really means in a context where units are so deeply interrelated. Furthermore, some studies make counterfactual inference decades after the treatment, as if the social world did not continuously adapt and find new equilibrium. Does talking about “partial” equilibrium (ceteris paribus) make sense in a context like this? In our application, the issue of general equilibrium is less problematic because the units (the individuals) are independent both in a statistical term (drawn randomly from a population) and substantively (one birth does not reshape the labour market of a country).

Conclusion

The Synthetic Control framework is very powerful for assessing the counterfactual of single units (either aggregate or individual units). Its application to actual individuals from longitudinal surveys is very promising.

The main issue with this method is that it requires a large number of pre-treatment periods. Few panel surveys meet this requirement. Another important issue is how computationally intensive this method is. It would be virtually impossible at this stage to run an Individual Synthetic Control (ISC) on 10,000 individuals. It would just take too long. There is important and interesting work to be done on the computational aspects of this method. Another exciting area of research is the comparison of the ISC with other panel data methods. The ISC theoretically should capture unobserved time-varying trends but this requires much more empirical research.

One area that comes to mind where the ISC could play an important role is medicine. One issue with medicine is the difficulty of tailoring treatment at an individual level. A lot of medical research relies on average effects based on large groupings (sex, age groups, etc.). Being able to tailor a medical treatment to the uniqueness of the individual will greatly improve its effectiveness.