Complementarity of experimental and observational data

Randomized controlled trials (RCTs) are considered the gold standard approach for assessing the causal effect of an intervention or a treatment on an outcome of interest. However, RCTs can come with drawbacks. They can be expensive, take a long time to set up, and be compromised by insufficient sample size due to either recruitment difficulties or restrictive eligibility criteria. The latter criteria often narrow the eligible population so that it differs markedly from the population that will be potentially eligible for the treatment. Therefore, the findings from RCTs can lack generalizability (external validity) to the real target population of interest.

In contrast, there is an abundance of observational data, collected without systematically-designed interventions. Such data can come from different sources: they can be collected from research sources such as disease registries, cohorts, biobanks, epidemiological studies, or they can be routinely collected through electronic health records, insurance claims, administrative databases. In that sense, observational data can be readily available, and include large samples that are representative of the target populations, while being less costly than RCTs.

To leverage observational data for causal effect analysis in health domains, several laws building on the work of the Food and Drug Administration (FDA) encourage the use of real-world data (RWD), defined as data derived from sources other than randomized clinical trials, for regulatory decision making. Clinical evidence regarding the usage and potential benefits or risks of a medical product derived from the analysis of RWD is called real-world evidence (RWE). The European Medicines Agency (EMA) is also very active in working with RWD to facilitate the development of and access to medicines. However, there are often concerns about the quality of these “big data”, given that the lack of a controlled experimental intervention opens the door to confounding bias.

Promising potential of machine learning for integrative analysis

Machine learning has recently made considerable progress in evaluating the causal effect of an intervention. Well-anchored biostatistical methodologies, which often rely on linear models, have been challenged by flexible machine learning approaches that can handle high-dimensional variables and complex relationships between variables and health outcomes. The extent to which more flexible approaches provide better evaluation of causal effects is a matter of discussion, which will be addressed during the workshop.

Observational data and clinical trial data can provide different perspectives when evaluating an intervention or a medical treatment. Combining the information gathered from experimental and observational data is a promising avenue for medical research, because the knowledge that can be acquired from integrative analyses would not be possible from any single-source analysis alone.

The objective of this workshop is to ...

  • ... bring together the main actors working with observational or real-world data, and developing biostatistics and machine learning models for causal inference
  • ... present the state of the art in dynamic treatment regimes (DTR)
  • ... highlight the complementarity of using both the Structural Causal Model and Potential Outcome points of view
  • ... present the possibilities offered by real-world data and by the integration of real-world data and RCTs
    • Validate observational studies
    • Generalize the treatment effect in a target patient population
    • Better estimate the (heterogeneous) effects
    • Handle unobserved confounders
    • Help design RCTs with better external validity
  • ... discuss which real-world data should be collected
  • ... promote collaboration between the different actors


The workshop is intended for researchers in ML and in health, clinicians, and companies interested in these topics.


Elias Bareinboim
Associate Professor at Columbia University.
AI, Causal Reinforcement Learning, Structural Causal Models.

Patrick M. Bossuyt
Professor of Clinical Epidemiology at the University of Amsterdam & Chair of the division of Public Health & Clinical Methods, Academic Medical Center.
Clinical trials, biomarker evaluation.

Bill Crown
Distinguished Research Scientist at the Heller School for Social Policy & Management at Brandeis University. Former Chief Scientific Officer at Optum, working on the OPERAND project.

Yohann Foucher
Associate Professor at Nantes University.
Biostatistics, longitudinal data, time-to-event data, competing risk, etc.

Els Goetghebeur
Professor at Ghent University.
Causal Inference, Survival, Health

Devin Incerti
Scientist at Genentech.
Causal Inference, Survival data, Combining RCTs and Observational data.

Clémence Leyrat
Assistant Professor in Medical Statistics at the London School of Hygiene & Tropical Medicine.
Trials emulation, Missing values, Cancer Survival, etc.

Nicolas Loiseau
Data Scientist at Owkin.
AI, causal inference, clinical trials, health.

Judea Pearl
Professor at the Computer Science Department, UCLA.
(tentatively accepted)

Mihaela van der Schaar
John Humphrey Plummer Professor at the University of Cambridge & Fellow at the Alan Turing Institute.
Machine Learning, Healthcare, Causal effect inference, AutoML

Nigam H. Shah
Professor of Medicine (Biomedical Informatics) at Stanford University & Associate CIO for Data Science at Stanford Healthcare.
Safe, ethical and cost-effective use of machine learning in health care.

Rui Song
Professor at North Carolina State University.
Causal Inference, ML, DTR

Elizabeth A. Stuart
Professor at Johns Hopkins Bloomberg School of Public Health.
Causal Inference, Generalization, Mediation, Mental Health, etc.


The program starts on Tue Jun 22 at 2:00 pm and ends on Wed Jun 23 at 7:00 pm (UTC+2).
Tuesday (Jun 22) Wednesday (Jun 23)
14:00-14:15 Opening by the workshop organizers
Mihaela van der Schaar

Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge. In addition to leading the van der Schaar Lab, Mihaela is founder and director of the Cambridge Centre for AI in Medicine (CCAIM). Her ground-breaking work on machine learning for healthcare includes the development of improved methods for forecasting individual risks and for identifying covariates that are most important for forecasting risk. Her work has identified better treatment options for patients with heart failure, cystic fibrosis, breast cancer, and Alzheimer’s disease.

Individualized treatment effect inference: theoretical insights, algorithms and applications in healthcare
Chair: Julie Josse (Inria, IDESP)
Els Goetghebeur

Els Goetghebeur is Full Professor in the Department of Applied Mathematics, Computer Science and Statistics. She chairs the UGent expert Center for Statistics. She is chief editor of ‘Statistics in Medicine’ and chair of the causal inference group of STRATOS ‘Strengthening Analytical Thinking for Observational Studies’. Her research is devoted to the development and application of data analytic methods, primarily for biomedical applications, with a focus on causal inference, survival analysis and quality of care.

Transportable effects in evolving patient populations: learning from RCTs and trial emulation in observational studies
Chair: Raphaël Porcher (Université de Paris)
Clémence Leyrat

After her studies of cognitive sciences and biostatistics at the University of Bordeaux, Clémence did her PhD in Paris on the use of propensity score methods for the analysis of cluster randomised trials. Since 2015, she has worked at the Department of Medical Statistics at the LSHTM. She does methodological research on missing data methods in propensity score analysis and is also interested in the analysis of observational data, in collaboration with researchers from the Electronic Health Records group at LSHTM. Since 2018, she has also been part of the Inequalities in Cancer Outcomes Network and is interested in the application of causal inference methods, including emulated trials, to understand cancer inequalities in the UK.

Using large observational data to emulate target trials: What can be done to address immortal-time bias?
Abstract: Real world evidence from observational studies is becoming increasingly important to influence health care policies. However, unlike randomised trials, observational studies are prone to bias, such as confounding or immortal-time bias. A general framework to emulate randomised trials from observational data has been proposed to estimate causal effects while addressing these two issues. Briefly, the approach involves cloning, censoring and then re-weighting observations. In this talk, I will present the methodology and its underlying assumptions, and discuss the benefits of this procedure over more standard modelling approaches as well as the role double robust methods and machine learning could play in this context. This will be illustrated with an application to the estimation of the causal effect of early surgery on 1-year survival among older lung cancer patients using data from the England Cancer Registry.
Chair: Imke Mayer (EHESS, Inria)
Rui Song

Rui Song is Professor of Statistics at North Carolina State University. Her current research interests include Machine Learning, Causal Inference, Precision Health, and Financial Econometrics. More specifically, she works on high dimensional statistical learning, reinforcement learning, semiparametric inference and dynamic treatment regimes. She is currently a fellow of the American Statistical Association (ASA) and the Institute of Mathematical Statistics (IMS).

On Optimal Treatment Decision Making with Multiple Data Resources
Chair: Julie Josse (Inria, IDESP)
15:25-15:45 Nicolas Loiseau
Inferring treatment effect with external control arms
Chair: Michael Blum (Owkin)
15:45-16:00 Coffee break 15:40-16:00 Coffee break
Yohann Foucher

Yohann Foucher is Professor at Nantes University in France. His research focuses on longitudinal data analysis with censoring, precision and stratified medicine, and propensity score weighting methods.

Simulation-based studies related to the G-computation for causal inference: an overview of recent results
Abstract: Different methods are available to estimate a marginal effect in the presence of confounders. In the literature, G-computation (GC) is applied less often than methods based on the propensity score. We conduct several simulation-based studies to explore its performance in distinct circumstances. We focus on a binary treatment, a binary outcome and baseline confounders.
Firstly, we investigate the impact of covariate selection on performance. We use different sets of covariates in the GC as well as in three other methods: inverse probability of treatment weighting (IPW), full matching and the targeted maximum likelihood estimator (TMLE). The results suggest that considering all the covariates causing the outcome leads to the lowest variance without increasing the bias, particularly for the GC. (Chatton et al., Sci Rep 2020)
Secondly, we combine machine learning and G-computation. We evaluate the performance of several methods, including penalized logistic regressions, neural networks, support vector machines, boosted classification and regression trees, and the super learner. We report that the super learner leads to a bias close to zero without variance or convergence issues, even in small samples. The support vector machine also performs well, but its mean bias is slightly higher than that of the super learner. (Le Borgne et al., Sci Rep 2021)
Thirdly, we study the robustness of GC to a near-violation of the positivity assumption associated with different levels of extrapolation. We compare the performance of GC, IPW, truncated IPW, TMLE, and truncated TMLE. The GC seems less prone to bias than IPW and truncated IPW, except in the presence of an improbably high level of extrapolation. The TMLE has similar properties but at the cost of severe under-coverage in all studied scenarios. (Léger et al., In revision)
Lastly, we define the theoretical background of the GC implementation for time-to-event analyses, and we compare the performance of GC and IPW in such a context. We report higher efficiency for GC when the causes of the outcome are considered, even in the presence of a high censoring rate. (Chatton et al., In revision)
In light of these results, we propose novel functions in the R package “RISCA” to encourage the use of the GC. We are working to further propose a super learner for causal inference in the context of time-to-event outcomes.
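For readers unfamiliar with G-computation, the setting described in this abstract (binary treatment, binary outcome, baseline confounders) can be illustrated with a minimal sketch. It is not the speaker's implementation and does not use the RISCA package: it assumes a single discrete confounder and replaces the outcome model with stratum-specific event proportions (nonparametric standardization). The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def g_computation(records):
    """Nonparametric G-computation (standardization).

    records: iterable of (x, a, y) with x a discrete baseline confounder,
    a the binary treatment and y the binary outcome.
    Returns (risk under treatment, risk under control, risk difference).
    """
    records = list(records)
    n = len(records)
    cell_n = defaultdict(int)       # subjects per (x, a) cell
    cell_events = defaultdict(int)  # events per (x, a) cell
    stratum_n = defaultdict(int)    # subjects per level of x
    for x, a, y in records:
        cell_n[(x, a)] += 1
        cell_events[(x, a)] += y
        stratum_n[x] += 1

    risks = {}
    for a in (0, 1):
        risk = 0.0
        for x, n_x in stratum_n.items():
            if cell_n[(x, a)] == 0:
                # Each confounder level must contain both treatment arms.
                raise ValueError(f"positivity violated for x={x!r}, a={a}")
            # Stratum-specific risk, weighted by the marginal share of x.
            risk += (cell_events[(x, a)] / cell_n[(x, a)]) * (n_x / n)
        risks[a] = risk
    return risks[1], risks[0], risks[1] - risks[0]


def naive_risk_difference(records):
    """Unadjusted comparison of treated and untreated subjects."""
    records = list(records)
    treated = [y for _, a, y in records if a == 1]
    control = [y for _, a, y in records if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def make_cell(x, a, n, events):
    """Build n records for one (x, a) cell, `events` of them with y = 1."""
    return [(x, a, 1)] * events + [(x, a, 0)] * (n - events)


# Hypothetical toy data: high-risk subjects (x = 1) are treated more often.
toy = (make_cell(0, 1, 20, 4) + make_cell(0, 0, 80, 8)
       + make_cell(1, 1, 80, 40) + make_cell(1, 0, 20, 8))

risk_treated, risk_control, risk_diff = g_computation(toy)
naive_diff = naive_risk_difference(toy)
```

On this toy data set the naive risk difference is 0.28, while the standardized one is 0.10, because the confounder is unevenly distributed between the treatment arms. With several or continuous confounders, the stratum proportions would be replaced by predictions from a fitted outcome model.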
Chair: Raphaël Porcher (Université de Paris)
16:00-16:20 Devin Incerti
A meta-analytic framework for decision making and error control in clinical trials with external control arms
Chair: Michael Blum (Owkin)
Bill Crown

Bill Crown is a Distinguished Research Scientist in the Heller School. He is an expert in real world data analysis, focusing upon research designs and statistical methods for drawing causal inferences from transactional health care datasets such as medical claims and electronic health records. He previously led health economics consultancies at Truven and Optum and, until recently, was Chief Scientific Officer at OptumLabs. He currently co-chairs the ISPOR Task Force on Machine Learning and is particularly interested in the intersection of machine learning and causal inference methods, as well as transparency in the conduct and reporting of empirical health care research.

Causal Estimates of Treatment Effects Using Routinely Collected Healthcare Data
Chair: Michael Blum (Owkin)
Elizabeth A. Stuart

Elizabeth A. Stuart is Vice Dean for Education and Bloomberg Professor of American Health in the Departments of Mental Health, Biostatistics, and Health Policy and Management at the Johns Hopkins Bloomberg School of Public Health. She received her Ph.D. in Statistics from Harvard University in 2004. She is particularly interested in the trade-offs in different designs for estimating causal effects, especially in terms of improving internal validity of non-experimental studies and external validity of randomized studies. She is a Fellow of the American Statistical Association and the American Association for the Advancement of Science.

Integrating experimental and population data to estimate population average treatment effects
Abstract: With increasing attention being paid to the relevance of studies for real-world practice (such as in education, international development, and comparative effectiveness research), there is also growing interest in external validity and assessing whether the results seen in randomized trials would hold in target populations. While randomized trials yield unbiased estimates of the effects of interventions in the sample of individuals (or physician practices or hospitals) in the trial, they do not necessarily inform about what the effects would be in some other, potentially somewhat different, population. While there has been increasing discussion of this limitation of traditional trials, until relatively recently there was little statistical work developing methods to assess or enhance the external validity of randomized trial results. This talk will discuss design and analysis methods for combining experimental and population data to assess and increase external validity, including the potential value of machine learning and other flexible modeling approaches.
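One simple way to see why a trial estimate may not carry over to a target population is post-stratification: stratum-specific effects estimated in the trial are re-weighted by the stratum shares of the target population. The sketch below is only an illustration of that idea under strong assumptions (a single discrete effect modifier, known target shares) and is not necessarily among the methods covered in the talk. All names and numbers are hypothetical.

```python
def poststratified_effect(trial, target_shares):
    """Re-weight stratum-specific trial effects to a target population.

    trial: iterable of (stratum, a, y) with a the randomized binary
    treatment and y the outcome.
    target_shares: dict mapping stratum -> proportion in the target
    population (must sum to 1).
    """
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")

    cells = {}  # (stratum, a) -> [number of subjects, sum of outcomes]
    for stratum, a, y in trial:
        cell = cells.setdefault((stratum, a), [0, 0.0])
        cell[0] += 1
        cell[1] += y

    effect = 0.0
    for stratum, share in target_shares.items():
        if (stratum, 1) not in cells or (stratum, 0) not in cells:
            # The trial must cover every stratum of the target population.
            raise ValueError(f"stratum {stratum!r} lacks a trial arm")
        n1, y1 = cells[(stratum, 1)]
        n0, y0 = cells[(stratum, 0)]
        effect += share * (y1 / n1 - y0 / n0)
    return effect


def make_arm(stratum, a, n, events):
    """Build n trial records for one arm, `events` of them with y = 1."""
    return [(stratum, a, 1)] * events + [(stratum, a, 0)] * (n - events)


# Hypothetical trial: 80% "young" participants, where the effect is larger.
trial = (make_arm("young", 1, 40, 20) + make_arm("young", 0, 40, 12)
         + make_arm("old", 1, 10, 4) + make_arm("old", 0, 10, 4))

effect_in_trial_mix = poststratified_effect(trial, {"young": 0.8, "old": 0.2})
effect_in_target = poststratified_effect(trial, {"young": 0.3, "old": 0.7})
```

With the trial's own case mix the estimated risk difference is 0.16, but it shrinks to 0.06 for a target population that is 70% "old", because the treatment has no effect in that stratum of the toy data.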
Chair: Julie Josse (Inria, IDESP)
Elias Bareinboim

Elias Bareinboim is an associate professor in the Department of Computer Science and the director of the Causal Artificial Intelligence Lab at Columbia University. He obtained his Ph.D. in Computer Science at the University of California, Los Angeles, advised by Judea Pearl. His research topics are Artificial Intelligence, Machine Learning, Statistics, Robotics, Cognitive Science, and Philosophy of Science. When it comes to causal inference, Elias’ work covers applications to data-driven fields, a.k.a. data science, in the health and social sciences as well as artificial intelligence and machine learning, with a focus on how to make robust and generalizable causal and counterfactual claims in the context of heterogeneous and biased data collections, including issues of confounding bias, selection bias, and external validity.

Causal Inference and Fusion
Chair: Gaël Varoquaux (Inria)
17:10-18:00 CANCELLED
Judea Pearl

Chair: Michael Blum (Owkin)
Panel discussion: Is there a place for innovative machine learning methods (regulatory-approved) in RWE analysis?

Panelists: Patrick M. Bossuyt, Bill Crown, Rui Song, Elizabeth A. Stuart
Moderators: Raphaël Porcher (Université de Paris), Gaël Varoquaux (Inria)


Due to the current pandemic, the workshop will be held 100% remotely. To register, please fill in the registration form.

Please note that for technical reasons, you will be asked to register for multiple sessions (4 sessions) in order to participate in the entire workshop.

Call for questions

We will collect questions from the audience for the panel discussion, before and during the workshop. You can submit your questions either while registering for the workshop or at any time using the Dory Q&A tool.


The conference is jointly organized by:

With the help of the missing values and causality group at Inria with Bénédicte Colnet, Imke Mayer and Paul Roussel.


If you have questions regarding registration to the workshop or other questions related to the workshop, please use the button below to reach out to us.