Data Science Interdisciplinary Research Cluster

Missing data are a common problem in health science research. The default option in standard statistical software is to delete observations with missing values, which can bias the results. In this two-day workshop, we will learn how to handle missing data in a responsible way. On day one, Professor Aya Mitani will go over the challenges posed by distinct types of missing data and how each of the three missing data mechanisms (MCAR, MAR, and MNAR) affects the analysis results. Attendees will learn several ways to visually inspect the amount and patterns of missing data. Multiple imputation (MI) methods for missing data in cross-sectional studies will be presented. On day two, Professor Mitani will introduce missing data methods for longitudinal and clustered data, including inverse-probability weighting for dropout and MI for multilevel data, with further extensions to interactions and derived variables.
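As a rough illustration of why deleting incomplete observations can bias results, the short simulation below compares the complete-case mean of an outcome under MCAR and under MAR. It is a sketch only, written in plain Python rather than the R used in the workshop, and it is not part of the workshop materials; the variable names, sample size, and missingness probabilities are all made up for the example.

```python
import random
import statistics

random.seed(2024)  # fixed seed so the run is reproducible

n = 20000
# Hypothetical data: x is always observed, y is the outcome that may be missing.
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]  # true mean of y is 0

# MCAR: every y value has the same 50% chance of being observed.
y_mcar = [yi for yi in y if random.random() < 0.5]

# MAR: y is far less likely to be observed when the (observed) x is positive.
y_mar = [
    yi
    for xi, yi in zip(x, y)
    if random.random() < (0.9 if xi < 0 else 0.1)
]

# Complete-case analysis: simply average whatever y values remain.
mean_full = statistics.fmean(y)
mean_mcar = statistics.fmean(y_mcar)
mean_mar = statistics.fmean(y_mar)

print(f"full data mean:          {mean_full:.3f}")
print(f"complete cases (MCAR):   {mean_mcar:.3f}")
print(f"complete cases (MAR):    {mean_mar:.3f}")
```

In this setup the MCAR complete-case mean should stay close to the full-data mean, while the MAR complete-case mean is pulled clearly below it, because the cases that remain over-represent low values of x.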

This workshop is only open to current DLSPH students, staff and faculty. To reserve a spot in this workshop, please register here.

Learning Objectives:

  • Understand the missing data mechanisms and their implications in analysis
  • Use R to visually inspect patterns of missing data
  • Select proper methods to deal with missing data
  • Apply multiple imputation using R in cross-sectional and multilevel data
  • Apply multiple imputation using the Blimp software in cross-sectional and multilevel data
  • Apply inverse probability weighting for longitudinal data with dropout
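To give a flavour of the last objective, here is a minimal sketch of inverse probability weighting for dropout in a simulated two-wave study. It is an illustration under simplifying assumptions, not the method as it will be taught: it uses plain Python instead of R, a single binary baseline covariate (the hypothetical `high_risk`), and stay-in-study probabilities estimated as simple group proportions rather than from a fitted dropout model.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is reproducible

n = 20000
records = []
for _ in range(n):
    # Baseline covariate, observed for everyone (hypothetical).
    high_risk = random.random() < 0.5
    # Follow-up outcome; the true population mean is 1.0.
    y2 = random.gauss(2.0 if high_risk else 0.0, 1.0)
    # Dropout depends only on the observed baseline covariate (MAR).
    stays = random.random() < (0.3 if high_risk else 0.9)
    records.append((high_risk, y2 if stays else None))

# Step 1: estimate the probability of remaining in the study per baseline group.
p_stay = {}
for g in (False, True):
    group = [y for h, y in records if h == g]
    p_stay[g] = sum(y is not None for y in group) / len(group)

# Step 2: weight each completer by the inverse of that probability.
completers = [(h, y) for h, y in records if y is not None]
weights = [1.0 / p_stay[h] for h, _ in completers]

naive_mean = statistics.fmean(y for _, y in completers)
ipw_mean = sum(w * y for w, (_, y) in zip(weights, completers)) / sum(weights)

print(f"unweighted completer mean: {naive_mean:.3f}")
print(f"IPW-weighted mean:         {ipw_mean:.3f}")
```

Here the unweighted completer mean under-represents the high-risk group, which drops out more often, while the weighted mean should land near the true value of 1.0 because each completer also stands in for similar participants who dropped out.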

Workshop Notes:

  • Most of the analyses will be performed in R
  • We recommend downloading the Blimp software (https://www.appliedmissingdata.com/blimp) before the workshop
  • Basic knowledge of R and experience in fitting standard regression models (linear and logistic regression) are expected
  • We will supply the R code and data sets as part of the workshop package

For questions, please contact Chelsea Mantin at