Intelligible Predictive Health Models


Figure: Feature importance bi-clustering across diseases and predictors [1].

EHR data offer a promising opportunity to advance our understanding of how clinical decisions and patient conditions interact over time to influence patient health. However, EHR data are difficult to use for predictive modeling: they mix data types (continuous, categorical, text, etc.), are longitudinal in nature, and exhibit high rates of non-random missingness for certain measurements, among other concerns. Furthermore, patient outcomes often have heterogeneous causes and require synthesizing information from several clinical lab measures and patient visits. Researchers often resort to complex, black-box predictive models to overcome these challenges, thereby introducing additional concerns of accountability, transparency, and intelligibility.

This project seeks to analyze and improve the intelligibility and explainability of ML methods deployed in the health system. We empirically analyze how intelligible current ML tools are [1] and develop novel approaches to interpretable ML [2].
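One common model-agnostic way to probe how a black-box predictor uses its inputs is permutation feature importance: shuffle one feature column and measure how much the model's error degrades. The sketch below is illustrative only, not the method of [1] or [2]; the toy `model`, the data, and all variable names are assumptions for the example.

```python
import random

# Toy "black-box" model: depends on feature 0 only; feature 1 is
# irrelevant by construction, so its importance should be zero.
def model(row):
    return 3.0 * row[0]

def mse(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, feature_idx, seed=0):
    """Increase in MSE after shuffling one feature column in place."""
    rng = random.Random(seed)
    baseline = mse(rows, targets)
    col = [r[feature_idx] for r in rows]
    rng.shuffle(col)
    permuted = [list(r) for r in rows]
    for r, v in zip(permuted, col):
        r[feature_idx] = v
    return mse(permuted, targets) - baseline

rows = [[float(i), float(i % 5)] for i in range(20)]
targets = [3.0 * r[0] for r in rows]

imp0 = permutation_importance(rows, targets, 0)  # informative feature
imp1 = permutation_importance(rows, targets, 1)  # irrelevant feature
# Shuffling the informative feature degrades fit; the irrelevant one does not.
```

Repeating this per feature and per outcome yields a feature-by-outcome importance matrix, the kind of object that bi-clustering (as in the figure above) can then organize.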

  1. La Cava, W., Bauer, C. R., Moore, J. H., & Pendergrass, S. A. (2019). Interpretation of machine learning predictions for patient outcomes in electronic health records. AMIA 2019 Annual Symposium. arXiv.

  2. La Cava, W., Lee, P. C., Ajmal, I., Ding, X., Cohen, J. B., Solanki, P., Moore, J. H., & Herman, D. S. (2021). Application of concise machine learning to construct accurate and interpretable EHR computable phenotypes. In review. medRxiv.