As of mid-April 2020, two million people worldwide are infected with the novel coronavirus. The USA is now the epicenter of this pandemic, and the virus has already killed 20,000 people there. Approaches to slow the progression are urgently needed. Developing them requires a better fundamental understanding of the factors affecting not only virus spread, but also who develops complications and ultimately dies from the infection. It is becoming clear that many factors are at play, including molecular, physiological, lifestyle, behavioral, demographic and socio-economic ones. In particular, co-morbidities such as diabetes and high blood pressure are known risk factors for COVID-19 complications and death but are likely only the tip of the iceberg. Molecular data indicate that as many as 100 co-morbidities exist. Given this complexity, statistical approaches are needed to integrate and account for all of these factors when predicting and assessing the health risks arising from coronavirus spread and infection. This project will create computational tools that help individuals and healthcare professionals make decisions related to coronavirus, helping target human and material resources where they are most needed. These tools are needed urgently to decrease the number of people suffering from this pandemic.
Integrating large numbers of risk factors through machine-learning approaches makes it possible to build statistical models that take all evidence into account. COVID-19 infections will be predicted at the individual and population levels. At the individual level, two binary (yes/no) classifiers will be built to predict (1) whether an individual is likely infected with coronavirus and, if so, (2) whether the patient will develop complications. As with all predictions, they cannot replace real data, but they can help prioritize who gets tested, who gets quarantined, who gets more closely monitored for signs of complications, and who gets personalized recommendations. Existing approaches include symptom-tracker apps, such as the coronavirus self-checker apps offered by the CDC, many healthcare providers and local government authorities, as well as the National Early Warning Score (NEWS) and Modified Early Warning Score (MEWS), which determine the degree of illness of a patient. None of these approaches accounts for co-morbidities, and none uses machine learning for the data integration needed to predict individual outcomes. At the population level, possible routes of infection will be analyzed using graph analysis of proximity, social interactions, and materials transport, taking the individual-level information into account where available. The project will be highly interdisciplinary, integrating biochemistry and computer science with ongoing input and feedback from healthcare professionals. This will ensure that the work is relevant to the current crisis and easier for healthcare providers to adopt. Students and postdocs who participate in this research will be trained in interdisciplinary research and will be exposed directly to frontline workers in the pandemic. A free, publicly available app and a web interface will disseminate the predictions made in this project broadly, in the hope that they will find many users.
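The two-step individual-level prediction described above, in which the complications classifier is consulted only when the infection classifier answers yes, could be wired together roughly as in the following minimal sketch. It is an illustration under stated assumptions and not the project's implementation: the names `cascade`, `Prediction`, `toy_infection_model` and `toy_complication_model`, the feature names, and the thresholds are all hypothetical, and the two toy models are placeholder rules standing in for trained classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Hypothetical types: a feature record and a binary (yes/no) classifier.
Features = Mapping[str, float]
Classifier = Callable[[Features], bool]


@dataclass(frozen=True)
class Prediction:
    likely_infected: bool
    # None means classifier (2) was not run because classifier (1) said "no".
    likely_complications: Optional[bool]


def cascade(infection_model: Classifier,
            complication_model: Classifier,
            features: Features) -> Prediction:
    """Run classifier (2) only when classifier (1) answers yes."""
    if not infection_model(features):
        return Prediction(False, None)
    return Prediction(True, complication_model(features))


# Placeholder stand-ins for trained models. The rules, feature names and
# thresholds are invented for illustration and carry no clinical meaning.
def toy_infection_model(f: Features) -> bool:
    return f.get("symptom_score", 0.0) >= 0.5


def toy_complication_model(f: Features) -> bool:
    return f.get("comorbidity_count", 0.0) >= 2


if __name__ == "__main__":
    person = {"symptom_score": 0.8, "comorbidity_count": 3}
    print(cascade(toy_infection_model, toy_complication_model, person))
```

Passing the two classifiers in as plain callables keeps the cascade logic independent of whichever machine-learning models are eventually trained, so the placeholder rules could be swapped for real models without changing `cascade`.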