Causal inference has many tangible applications in a wide variety of scenarios, but in my experience, it is a subject that is rarely talked about among data scientists.
In this article, we define causal inference and motivate its use. Then, we apply some basic algorithms in Python to measure the effect of a certain phenomenon.
Causal inference is a field of study concerned with measuring the effect of a treatment, meaning an action or intervention, on an outcome.
Another way to think about causal inference is that it answers what-if questions. The goal is always to measure the impact of a certain action.
Examples of questions answered with causal inference are:
- What is the impact of running an ad campaign on product sales?
- What is the effect of a price increase on sales?
- Does this drug make patients heal faster?
We can see that these questions are relevant for decision-makers, but they cannot be addressed with traditional machine learning methods alone.
Causal inference vs traditional machine learning
With traditional machine learning techniques, we generate predictions or forecasts given a set of features.
For example, we can forecast how many sales we will make next month.
In other words, machine learning models uncover correlations between features and a target to better predict that target. In that sense, any correlation between some feature and the target is useful if it allows the model to make better predictions.
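To make this concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable names (`temperature`, `sunscreen_sales`, `ice_cream_sales`) and the coefficients are hypothetical. Temperature drives both sales figures, so sunscreen sales predict ice cream sales well even though one does not cause the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical data: temperature drives both quantities.
temperature = rng.normal(25, 5, n)
sunscreen_sales = 2.0 * temperature + rng.normal(0, 2, n)
ice_cream_sales = 3.0 * temperature + rng.normal(0, 3, n)

# Fit ice_cream_sales ~ intercept + slope * sunscreen_sales with least squares.
X = np.column_stack([np.ones(n), sunscreen_sales])
coef, *_ = np.linalg.lstsq(X, ice_cream_sales, rcond=None)
predictions = X @ coef

# R^2 measures how much of the target's variance the feature explains.
ss_res = np.sum((ice_cream_sales - predictions) ** 2)
ss_tot = np.sum((ice_cream_sales - ice_cream_sales.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"R^2 using only sunscreen sales: {r_squared:.2f}")
```

The feature is a good predictor, which is all a forecasting model needs. It would be a mistake, however, to conclude that selling more sunscreen would increase ice cream sales.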
When it comes to causal inference, we wish to measure the impact of a treatment.
For example, we can determine how increasing a product’s price will impact sales.
Thus, with causal inference, we seek to uncover causal pathways.
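The difference can be sketched with another small simulation, this time of the ad campaign question. The setup is entirely hypothetical: we assume campaigns are run more often during holidays, that holidays raise sales on their own, and that the true effect of a campaign is 10 units. Regression adjustment is used here only as one simple way to account for an observed confounder, and it works in this sketch because the data is linear by construction.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
true_effect = 10.0  # assumed effect of the campaign on sales

# Hypothetical confounder: holidays affect both the treatment and the outcome.
holiday = rng.binomial(1, 0.3, n)
ad_campaign = rng.binomial(1, np.where(holiday == 1, 0.8, 0.2))
sales = 100 + true_effect * ad_campaign + 30 * holiday + rng.normal(0, 5, n)

# Naive estimate: difference in mean sales with and without a campaign.
naive = sales[ad_campaign == 1].mean() - sales[ad_campaign == 0].mean()

# Adjusted estimate: regress sales on the treatment and the confounder.
X = np.column_stack([np.ones(n), ad_campaign, holiday])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
adjusted = coef[1]  # coefficient on ad_campaign

print(f"Naive difference in means: {naive:.1f}")
print(f"Adjusted for holidays:     {adjusted:.1f}")
```

The naive comparison overstates the effect because it mixes the impact of the campaign with the impact of the holidays. Adjusting for the confounder brings the estimate close to the true value of 10.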