Randomizing every non-essential variable in a scientific experiment in order to isolate specific relations is the simple but powerful idea behind Randomized Controlled Trials (RCTs). In practical applications, however, RCTs have proven arduous to set up, and, because of stringent inclusion/exclusion criteria, their results are often plagued by bias and limited statistical value. Observational studies, on the other hand, are easier to implement, but the statistical results inferred from them are also heavily biased, owing to missing knowledge or understanding of the underlying relations between the relevant variables of an experiment, and, even worse, skewed by spurious correlations. Machine-learning and deep-learning algorithms are not exempt from such grave issues: they favor the quantity of data over its quality, which introduces bias, and they fundamentally lack explainability, as expected from their black-box nature. In the following, we introduce the framework of causal inference, which allows one to disentangle the interactions between the variables of an experiment in terms of their cause-effect relations and thereby produce a glass box of explainable results. As long as these interactions are known, even if not necessarily observed or quantifiable, the formalism allows one to answer interventional questions typical of RCTs, e.g. “what happens if I increase the dose of the drug?”, using purely observational data. Under stronger assumptions, questions aiming at uncovering the “why?” behind a specific mechanism can also be rigorously answered within the framework. Causal inference is thus not only a logical, human-centered approach to statistical problems but also the next frontier of intelligent data analysis in clinical and medical settings.
Clinical trials are typically designed so that the effectiveness of a treatment on a specific outcome, such as the improvement of symptoms, can be assessed. Although this sounds deceptively simple, obtaining unbiased and statistically significant results is far from straightforward. In fact, while correlations between treatment and outcome can be easily computed, there exist in principle many factors that can influence the outcome and the treatment at the same time, (heavily) affecting the result. These factors may be known, whether observed or not (i.e. data was or was not collected for that specific factor/feature), or even unknown to the experimenter, and they inevitably lead to biased estimates. To bypass this problem, an experimenter would keep all these factors either fixed or completely random, so that any influence they do have is averaged out. Randomized controlled trials (RCTs) of this kind are, however, very difficult to set up: inclusion/exclusion criteria tend to be too strict, and patients drop out, discontinue or mix treatments; at other times, the cohort of patients cannot be selected by the experimenter (e.g. during the Covid-19 pandemic), leading to biased samples. In these situations, observational studies are performed instead: raw data, with no further information about their mutual interactions, are statistically correlated and, for instance, the effectiveness of a treatment on a symptom is evaluated.
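The effect of a factor that influences both treatment and outcome can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not an analysis of real data: the confounder `severity`, the logistic treatment-assignment rule, the linear outcome model, and the value of `true_effect` are all hypothetical choices made for illustration. The same synthetic patients are analyzed once with treatment assigned according to severity (an observational setting) and once with treatment assigned by a coin flip (a randomized setting).

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Compare naive treated-vs-untreated differences with and without randomization.

    All modeling choices here are illustrative assumptions, not taken from any study.
    """
    rng = np.random.default_rng(seed)
    true_effect = 1.0  # assumed causal effect of the treatment on the outcome

    # Hypothetical confounder (e.g. disease severity) and outcome noise.
    severity = rng.normal(size=n)
    noise = rng.normal(size=n)

    # Observational setting: sicker patients are more likely to be treated,
    # so severity influences both the treatment and the outcome.
    p_treat = 1.0 / (1.0 + np.exp(-2.0 * severity))
    t_obs = rng.random(n) < p_treat
    y_obs = true_effect * t_obs - 2.0 * severity + noise

    # Randomized setting: treatment is a fair coin flip, independent of severity.
    t_rct = rng.random(n) < 0.5
    y_rct = true_effect * t_rct - 2.0 * severity + noise

    # Naive estimate: difference of mean outcomes between treated and untreated.
    naive_obs = y_obs[t_obs].mean() - y_obs[~t_obs].mean()
    naive_rct = y_rct[t_rct].mean() - y_rct[~t_rct].mean()
    return true_effect, naive_obs, naive_rct


if __name__ == "__main__":
    true_effect, naive_obs, naive_rct = simulate()
    print(f"true effect:            {true_effect:.2f}")
    print(f"observational estimate: {naive_obs:.2f}")
    print(f"randomized estimate:    {naive_rct:.2f}")
```

In this toy setting the naive observational comparison makes a beneficial treatment appear harmful, because treated patients are on average more severely ill, whereas the randomized comparison recovers a value close to the assumed effect, since the influence of severity is averaged out across the two arms.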