© 2020 Anton Lebedevich
Photo by Chris Arthur-Collins on Unsplash
Photo by Jana Sabeth on Unsplash
breed | size | tail wagging | … | friendly |
---|---|---|---|---|
pug | small | yes | yes | |
doberman | big | yes | yes | |
chihuahua | small | no | no | |
… |
By Roberto Ferrari from Campogalliano (Modena), Italy - Dog, CC BY-SA 2.0
Usually the truck arrives on Monday at noon.
The model predicts low sales on Fri-Sun even though some items are still in stock.
You decide to send one truck on Monday evening.
…
NO PROFIT: Monday sales dropped.
The product expires in one week, so customers don't want to stock up on soon-to-expire product for the weekend.
typical ML introduction courses be like:
There is something that you can't figure out from historical data alone.
You need to have domain knowledge or run experiments.
Kids learn that the hard way (have you ever tried wagging a dog's tail?)
Will this dog be happy if we pet it?
Correlation does not imply causation: the inability to legitimately deduce a cause-and-effect relationship between two variables solely on the basis of an observed association or correlation between them
en.wikipedia.org/wiki/Correlation_does_not_imply_causation
Supervised ML models learn association (correlation), not causation (the effect of an intervention).
source Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
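To make the difference between association and the effect of an intervention concrete, here is a minimal simulation under a hypothetical toy model (not from the talk): a dog's mood drives both tail wagging and friendliness, while wagging itself has no effect. The names `simulate`, `friendly_rate` and all probabilities are illustrative assumptions. Passing `force_wag` plays the role of an intervention (we wag the tail ourselves); reusing the same seed replays the same dogs.

```python
import random

def simulate(n, seed, force_wag=None):
    # Hypothetical toy model: mood drives both tail wagging and friendliness;
    # wagging itself has NO effect on friendliness.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        good_mood = rng.random() < 0.5
        wag = rng.random() < (0.9 if good_mood else 0.1)  # mood -> wagging
        if force_wag is not None:
            wag = force_wag  # intervention: we set the wagging ourselves
        friendly = rng.random() < (0.9 if good_mood else 0.2)  # mood -> friendliness
        rows.append((wag, friendly))
    return rows

def friendly_rate(rows, wag):
    group = [friendly for w, friendly in rows if w == wag]
    return sum(group) / len(group)

# What a supervised model sees: historical (observational) data.
observed = simulate(100_000, seed=0)
association = friendly_rate(observed, True) - friendly_rate(observed, False)

# What we want to know: the same dogs with the tail forced to wag vs. not.
wagged = simulate(100_000, seed=0, force_wag=True)
not_wagged = simulate(100_000, seed=0, force_wag=False)
effect = friendly_rate(wagged, True) - friendly_rate(not_wagged, False)

print(f"association:   {association:.2f}")  # about 0.56 in expectation
print(f"causal effect: {effect:.2f}")       # 0.00
```

A model trained on `observed` would happily use wagging to predict friendliness, and it would predict well; it just can't tell you what happens if you wag the tail yourself.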
A/B tests (Randomized Controlled Trials)
Observational Studies
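A rough sketch of why randomization matters, under hypothetical assumptions (a hidden "high intent" trait, a true uplift of 0.10; the function name and all numbers are illustrative). With a coin-flip assignment the difference in means recovers the uplift; when users choose treatment themselves, intent leaks into the estimate.

```python
import random

TRUE_EFFECT = 0.10  # hypothetical uplift in purchase probability

def estimate_effect(n, seed, randomized):
    """Difference in purchase rates between treated and control users."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        high_intent = rng.random() < 0.3  # hidden confounder, never recorded
        if randomized:
            gets_treatment = rng.random() < 0.5  # coin flip ignores intent
        else:
            # observational data: high-intent users opt in more often
            gets_treatment = rng.random() < (0.8 if high_intent else 0.2)
        p_buy = (0.5 if high_intent else 0.1) + (TRUE_EFFECT if gets_treatment else 0.0)
        bought = rng.random() < p_buy
        (treated if gets_treatment else control).append(bought)
    return sum(treated) / len(treated) - sum(control) / len(control)

ab_test = estimate_effect(100_000, seed=1, randomized=True)
observational = estimate_effect(100_000, seed=1, randomized=False)
print(f"A/B test estimate:      {ab_test:.3f}")        # close to 0.10
print(f"observational estimate: {observational:.3f}")  # close to 0.31, biased by intent
```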
Use observational studies only if you can't run a randomized experiment; they rely on three identifiability conditions:
Exchangeability: the conditional probability of receiving every value of treatment, though not decided by the investigators, depends only on measured covariates.
Positivity: the probability of receiving every value of treatment conditional on covariates is positive.
Consistency: the values of treatment under comparison correspond to well-defined interventions that, in turn, correspond to the versions of treatment in the data.
story: being fit by genes and exercising vs. being obese and forced to diet and exercise
story: critically ill patients
story: critically ill patients
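A minimal sketch of how these conditions are used in practice, on a hypothetical table of 18 patients (L = 1 critically ill, A = 1 treated, Y = 1 died; made-up numbers, not the book's data). The code checks positivity and then standardizes the risk over L. The result is a causal risk difference only if exchangeability given L and consistency also hold, and no code can verify those two from the data.

```python
# Hypothetical records (L, A, Y): L=1 critically ill, A=1 treated, Y=1 died.
data = (
    [(0, 0, 1)] * 1 + [(0, 0, 0)] * 3    # L=0, untreated: 1 of 4 died
    + [(0, 1, 1)] * 1 + [(0, 1, 0)] * 3  # L=0, treated:   1 of 4 died
    + [(1, 0, 1)] * 1 + [(1, 0, 0)] * 1  # L=1, untreated: 1 of 2 died
    + [(1, 1, 1)] * 4 + [(1, 1, 0)] * 4  # L=1, treated:   4 of 8 died
)

def risk(rows):
    """Share of rows with Y = 1."""
    return sum(y for _, _, y in rows) / len(rows)

def check_positivity(rows):
    """Positivity: every stratum of L contains both treated and untreated people."""
    arms = {}
    for l, a, _ in rows:
        arms.setdefault(l, set()).add(a)
    return all(seen == {0, 1} for seen in arms.values())

def standardized_risk(rows, a):
    """Standardization: sum over l of Pr[Y=1 | A=a, L=l] * Pr[L=l]."""
    total = 0.0
    for l in sorted({l for l, _, _ in rows}):
        stratum = [r for r in rows if r[0] == l]
        arm = [r for r in stratum if r[1] == a]
        total += risk(arm) * len(stratum) / len(rows)
    return total

naive = risk([r for r in data if r[1] == 1]) - risk([r for r in data if r[1] == 0])
assert check_positivity(data), "some stratum lacks treated or untreated patients"
adjusted = standardized_risk(data, 1) - standardized_risk(data, 0)
print(f"naive risk difference:        {naive:.3f}")     # 0.083, treatment looks harmful
print(f"standardized risk difference: {adjusted:.3f}")  # 0.000
```

The naive comparison blames the treatment because critically ill patients were treated more often. Dropping the untreated critically ill patients from `data` makes `check_positivity` return `False`: there is then nobody to compare against in that stratum.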
story: folic acid and cardiac malformation
source Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
Adding features can add more bias.
More data doesn't mean the extra data is unbiased.
An ML model can throw away your treatment variable.
Confounders are correlated with causes, so feature importance is misleading.
Regression coefficients are unstable for correlated features.
You still can't find causation, only association…
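One way an extra feature adds bias is when it is a collider: something caused by both the treatment and the outcome. The hypothetical sketch below (made-up variables, true effect exactly zero) compares outcome rates with the collider ignored and within one value of it, which is roughly what a model does when it conditions on that feature.

```python
import random

rng = random.Random(42)
rows = []
for _ in range(100_000):
    treatment = rng.random() < 0.5
    outcome = rng.random() < 0.5     # independent of treatment: true effect is zero
    collider = treatment or outcome  # hypothetical feature caused by both
    rows.append((treatment, outcome, collider))

def outcome_rate(rows, treatment, collider=None):
    """Share of positive outcomes among rows with the given treatment
    (and, optionally, the given collider value)."""
    sel = [o for t, o, c in rows
           if t == treatment and (collider is None or c == collider)]
    return sum(sel) / len(sel)

without_feature = outcome_rate(rows, True) - outcome_rate(rows, False)
with_feature = outcome_rate(rows, True, True) - outcome_rate(rows, False, True)
print(f"effect, collider ignored:        {without_feature:+.2f}")  # about 0.00
print(f"effect, within collider == True: {with_feature:+.2f}")     # about -0.50
```

Among rows where the collider is true, untreated rows must have a positive outcome, so the treatment suddenly looks harmful even though it does nothing.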
Success:
"Do observational studies using propensity score methods agree with randomized trials? A systematic comparison of studies on acute coronary syndromes" Dahabreh et al. 2012
"Can We Trust Observational Studies Using Propensity Scores in the Critical Care Literature? A Systematic Comparison With Randomized Clinical Trials" Kitsios et al. 2015
"Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials" Anglemeyer et al. 2014
Failure:
Anton Lebedevich
mabrek@gmail.com
@mabrek