A Linear Primal-Dual Multi-Instance SVM for Big Data Classifications
Lodewijk Brand, Lauren Zoe Baker, Carla Ellefsen, Jackson Sargent, Hua Wang
ICDM - 2021
Multi-instance learning (MIL) is an area of machine learning that handles data organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting to classify bags, each of which can contain any number of instances. This property allows MIL to be applied naturally to a wide variety of real-world problems, from computer vision to healthcare. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper, we present a novel Primal-Dual Multi-Instance Support Vector Machine (pdMISVM) derivation and implementation that operates efficiently on large-scale data. Our method relies on an algorithm derived using a multi-block variation of the alternating direction method of multipliers (ADMM). The approach scales to large datasets because it avoids iteratively solving the quadratic programming problems that are generally used to optimize SVM-based MIL algorithms. In addition, we modify our derivation to avoid solving a least-squares problem within the algorithm; this optimization extends the utility of our approach to data with a large number of features as well as bags. Finally, we apply our approach to synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of the proposed method. We end our discussion with an extension of our approach to handle non-linear decision boundaries. Code and data for our methods are available online at: https://github.com/minds-mines/pdMISVM.jl.
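To make the bag setting concrete, the following is a minimal Python sketch of how a linear model might label a bag of instances. It assumes a common multi-instance SVM convention in which a bag's score is the maximum of its instance scores; this convention, the names `instance_score`, `bag_score`, and `predict_bag`, and the toy weights are illustrative assumptions, not the pdMISVM implementation from the repository above, and no training step (ADMM or otherwise) is shown.

```python
from typing import List, Sequence

Instance = Sequence[float]   # one feature vector
Bag = List[Instance]         # a bag holds any number of instances


def instance_score(w: Sequence[float], b: float, x: Instance) -> float:
    """Linear score w.x + b for a single instance."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bag_score(w: Sequence[float], b: float, bag: Bag) -> float:
    """Score a bag by its highest-scoring instance (assumed convention)."""
    if not bag:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_score(w, b, x) for x in bag)


def predict_bag(w: Sequence[float], b: float, bag: Bag) -> int:
    """Return +1 if the bag score is non-negative, otherwise -1."""
    return 1 if bag_score(w, b, bag) >= 0 else -1


# Hypothetical weights and bags, chosen only for illustration.
w = [1.0, -1.0]
b = -0.5
bag_a = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]  # three instances
bag_b = [[0.0, 0.0], [0.0, 2.0]]              # two instances

label_a = predict_bag(w, b, bag_a)
label_b = predict_bag(w, b, bag_b)
```

Under this convention a single high-scoring instance is enough to make a bag positive, which is why bags of different sizes can share one linear model.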
Links
- View publications from Carla Ellefsen
- View publications from Zoe Baker
- View publications from Lodewijk Brand
- View publications from Hua Wang
- View publications presented in ICDM
- View publications in the project, An Intelligence-Driven Patient Care Approach to Reduce Medical Errors
- View publications in the project, Intelligent Prediction of Traffic Conditions via Integrated Data-Driven Crowdsourcing and Learning
- View publications in the project, Mining Brain Imaging Genomics Data for Improved Cognitive Health
- View publications in the project, Prediction of coronavirus infections and complications at the individual and the population levels from genomic, proteomic, clinical and behavioral data sources
- View publications researching Multiple-Instance Learning
- View publications applied to Computer Vision