Causal Inference: Introduction

© 2020 Anton Lebedevich

Anton Lebedevich

  • data science engineer, independent contractor
  • background in backend performance optimization
  • loves time series and anomalies
  • worked on spam filtering, marketing campaign optimization, demand forecasting
  • blogs at mabrek.github.io

Outline

  • when supervised learning is not enough
  • causation vs. association
  • causal diagrams
  • inference methods
  • warnings
  • where to go next

Story: dog's tail

Photo by Chris Arthur-Collins on Unsplash

ML problem: make a dog friendly

Photo by Jana Sabeth on Unsplash

Dogs dataset

breed      size   tail wagging  friendly
pug        small  yes           yes
doberman   big    yes           yes
chihuahua  small  no            no

tail wagging → friendliness

so you decide to wag the dog's tail yourself

By Roberto Ferrari from Campogalliano (Modena), Italy - Dog, CC BY-SA 2.0, Link

YOU DIED

Story: sales before weekend

Photo by nrd on Unsplash

Product weekly sales

ML problem: optimize replenishment

Mj-bird / CC BY-SA

Usually the truck arrives on Monday at noon.
The model predicts low sales on Fri-Sun even though some items are still in stock.
You decide to send only one truck on Monday evening.

NO PROFIT: Monday sales dropped.
The product expires in one week, and
customers don't want to stockpile a soon-to-expire product for the weekend.

It would have been better to send two trucks.

typical ML introduction courses be like:

Problem → ML → Solution

source knowyourmeme.com/memes/futurama-fry-not-sure-if

Causation

There is something that you can't figure out from historical data alone.

You need to have domain knowledge or run experiments.

Kids learn this the hard way (have you ever tried to wag a dog's tail?)

Causal questions:

  • Estimate the treatment effect on a population.
  • Which treatment to choose for a patient?
  • How many items will be sold if we set price to X?
  • How much money could your model deployment bring?
  • Does X cause Y?
  • What if …?

And the most important question:

Will this dog be happy if we pet it?

Photo by Toshi on Unsplash

It's not a typical supervised ML problem

Correlation does not imply causation

inability to legitimately deduce a cause-and-effect relationship
between two variables solely on the basis of
an observed association or correlation between them

en.wikipedia.org/wiki/Correlation_does_not_imply_causation

supervised ML models learn association (correlation), not causation (the effect of an intervention)

source completelyseriouscomics.com/?p=16
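A tiny simulation (all variables hypothetical) makes the gap concrete: a confounder z drives both a feature x and the outcome y, x has no effect on y at all, yet a model fit on observational data happily learns a strong association.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: a confounder z drives both
# the feature x and the outcome y; x has NO direct effect on y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 2 * z + rng.normal(size=n)

# A supervised model fit on observational data picks up the association.
slope = np.polyfit(x, y, 1)[0]
print(round(slope, 2))  # ~1.0, even though the causal effect of x on y is 0
```

Predicting y from x is perfectly fine; predicting what happens to y if we *intervene* on x is not.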

Causation vs Association

source Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

AB-tests (Randomised Controlled Trials)

  • "gold standard"
  • real intervention to test causation
  • randomization breaks unwanted associations

Observational Studies

  • causal structure is assumed, not tested by an intervention
  • unmeasured confounding introduces bias
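The contrast between the two designs can be sketched with a toy simulation (all numbers hypothetical): sicker patients are both more likely to receive a drug and more likely to have a worse outcome, so the naive observational contrast hides the drug's benefit, while coin-flip assignment recovers it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical confounded world: sicker patients are both more likely
# to get the drug and more likely to have a worse outcome.
severity = rng.uniform(size=n)
true_effect = 1.0  # the drug improves the outcome by 1 unit

# Observational assignment: treatment probability rises with severity.
treated_obs = rng.uniform(size=n) < severity
y_obs = true_effect * treated_obs - 3 * severity + rng.normal(size=n)
naive = y_obs[treated_obs].mean() - y_obs[~treated_obs].mean()

# Randomized assignment: a fair coin, independent of severity.
treated_rct = rng.uniform(size=n) < 0.5
y_rct = true_effect * treated_rct - 3 * severity + rng.normal(size=n)
rct = y_rct[treated_rct].mean() - y_rct[~treated_rct].mean()

print(round(naive, 2), round(rct, 2))  # naive ≈ 0 (drug looks useless), rct ≈ 1
```

Randomization makes treatment independent of severity, which is exactly what "breaks unwanted associations" means.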

AB-tests (Randomised Controlled Trials)

  • difficult to make some trials double-blind (e.g. surgery)
  • non-adherence in clinical trials
  • small sample sizes, limited geography
  • too long or too expensive

Observational Studies

  • you can use historical data

Observational studies

Use them only when you can't run a randomized experiment:

  • you have to make assumptions
  • requires exchangeability, positivity, consistency
  • breaks under unmeasured confounding

Exchangeability

The conditional probability of receiving every value of treatment, though not decided by the investigators, depends only on measured covariates.

Positivity

The probability of receiving every value of treatment conditional on covariates is positive.
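Positivity is one assumption you can at least partially check in data. A minimal sketch, assuming a pandas DataFrame with a hypothetical `treated` column and one covariate stratum: flag strata where the observed treatment rate is 0 or 1, i.e. where some treatment value never occurs.

```python
import pandas as pd

# Toy observational dataset (all values hypothetical).
df = pd.DataFrame({
    "severity": ["low", "low", "high", "high", "high"],
    "treated":  [0,     1,     1,      1,      1],
})

# Positivity check: within every covariate stratum, both treatment
# values must actually occur (treatment rate strictly between 0 and 1).
rates = df.groupby("severity")["treated"].mean()
violations = rates[(rates == 0) | (rates == 1)]
print(violations)  # the "high" stratum is all treated -> positivity violated
```

With many or continuous covariates this check gets harder, which is one reason propensity score diagnostics exist.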

Consistency

The values of treatment under comparison correspond to well-defined interventions that, in turn, correspond to the versions of treatment in the data.

story: being fit by genes and exercising vs. being obese and forced to diet and exercise

Causal diagrams

story: critically ill patients

Conditioned confounding

story: critically ill patients

Selection bias

story: folic acid and cardiac malformation

Complicated bias (M-bias)

source Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Complicated bias with conditioning

source Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Inference methods

  • Stratification
  • Propensity scores (many ways to use)
  • Outcome regression (is my profession)
  • … to be continued, active research
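Stratification, the simplest of these, can be sketched in a few lines (toy data, hypothetical numbers): estimate the treatment effect within each confounder stratum, then average the strata by their population share. Under exchangeability given z, this recovers the causal effect that the naive contrast misses.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical setup: a binary confounder z drives both treatment and outcome.
z = rng.uniform(size=n) < 0.5
t = rng.uniform(size=n) < np.where(z, 0.8, 0.2)  # z makes treatment likely
y = 1.0 * t + 2.0 * z + rng.normal(size=n)       # true effect of t is 1.0

# Naive contrast is biased: treated units have more z = True.
naive = y[t].mean() - y[~t].mean()

# Stratification: effect within each z stratum, averaged by stratum size.
effect = sum(
    (z == s).mean() * (y[(z == s) & t].mean() - y[(z == s) & ~t].mean())
    for s in (False, True)
)

print(round(naive, 2), round(effect, 2))  # naive ≈ 2.2, stratified ≈ 1.0
```

Propensity score and outcome regression methods generalize the same idea to many covariates, where explicit strata become too sparse.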

Moar Layers and Moar Data myth

  • Adding features can add more bias.
  • More data doesn't mean the extra data is unbiased.
  • An ML model can throw away your treatment variable.
  • Confounders are correlated with causes, so feature importance is misleading.
  • Regression coefficients are unstable for correlated features.
  • You still get only association, not causation…
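The last point can be illustrated directly (toy data, hypothetical scale): with two nearly identical features the matrix X'X is almost singular, so the sampling variance of the individual coefficients, which is proportional to the diagonal of its inverse, explodes.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000

# Two almost perfectly correlated features (hypothetical data).
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)

X = np.column_stack([x1, x2])
# OLS coefficient variance scales with the diagonal of (X'X)^-1.
var_scale = np.diag(np.linalg.inv(X.T @ X))
print(var_scale.round(2))  # ~10 each, vs ~0.001 for uncorrelated features
```

The fit itself can still predict well; it is the individual coefficients, the thing you'd read a "causal effect" from, that become meaningless.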

Do observational studies work?

Success:

"Do observational studies using propensity score methods agree with randomized trials? A systematic comparison of studies on acute coronary syndromes" Dahabreh et al. 2012
"Can We Trust Observational Studies Using Propensity Scores in the Critical Care Literature? A Systematic Comparison With Randomized Clinical Trials" Kitsios et al. 2015
"Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials" Anglemeyer et al. 2014

Failure:

"A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook" Gordon et al., 2018

Links

In conclusion:

  • You can't simply run fit/predict to answer a "What if …?" question.
  • There are many ways to get incorrect answers.
  • You can get better answers if you take causality into account.

Contacts

Anton Lebedevich

mabrek@gmail.com

@mabrek

@widdoc

slides mabrek.github.io/causal-inference-2020/