A Multi-Instance Support Vector Machine with Incomplete Data for Clinical Outcome Prediction of COVID-19

Lodewijk Brand, Lauren Zoe Baker, Hua Wang

BCB - 2021

In order to manage the public health crisis associated with COVID-19 it is critically important that healthcare workers quickly identify high-risk patients in order to provide effective treatment with limited resources. Statistical learning tools have the potential to help predict serious infection early-on in the progression of the disease. However, many of these techniques are unable to take full advantage of temporal data on a per-patient basis as they handle the problem as a single-instance classification. Furthermore, these algorithms rely on complete data to make their predictions. In this work, we present a novel approach to handle the temporal and missing data problems, simultaneously; our proposed Simultaneous Imputation-Multi Instance Support Vector Machine method illustrates how multiple instance learning techniques and low-rank data imputation can be utilized to accurately predict clinical outcomes of COVID-19 patients. We compare our approach against recent methods used to predict outcomes in a cohort of 361 COVID-19 positive patients from a hospital in Wuhan, China. In addition to improved prediction performance early on in the progression of the disease, our method identifies a collection of biomarkers associated with the liver, immune system, and blood, that deserve additional study and may provide additional insight into causes of patient mortality due to COVID-19. We publish the source code for our method online.