Data Science Interdisciplinary Research Cluster

Missing data are common in health science research. The default option in standard statistical software is to delete observations with missing values (complete-case analysis), which can bias results. In this two-day workshop, we will learn how to handle missing data responsibly.

On day one, Professor Aya Mitani will go over the challenges posed by different types of missing data and how each of the three missing data mechanisms (missing completely at random [MCAR], missing at random [MAR], and missing not at random [MNAR]) affects analysis results. Attendees will learn how to visually inspect patterns of missing data, test assumptions about the missing data mechanism, and select an appropriate analytical method. Missing data methods for cross-sectional data will be presented, including likelihood-based methods, multiple imputation (MI), and the expectation-maximization (EM) algorithm.

On day two, Professor Mitani will introduce missing data methods for longitudinal and clustered data, including inverse probability weighting for dropout and MI for multilevel data, with extensions to interactions and derived variables. The day will close with a brief review of methods for nonignorable missing data and a discussion of the connection between fairness and missing data.

Learning Objectives:

  • Understand the missing data mechanisms and their implications for analysis
  • Use R to visually inspect patterns of missing data
  • Select proper methods to deal with missing data
  • Apply multiple imputation in R to cross-sectional and multilevel data
  • Apply inverse probability weighting for longitudinal data with dropout
  • Become familiar with approaches for nonignorable missingness

Workshop Notes:

  • All analyses will be performed in R
  • Attendees are expected to have basic knowledge of R and experience fitting standard regression models (linear and logistic regression)
  • We will supply the R code and data sets as part of the workshop package