A Linear Primal-Dual Multi-Instance SVM for Big Data Classifications

Lodewijk Brand, Lauren Zoe Baker, Carla Ellefsen, Jackson Sargent, Hua Wang

ICDM - 2021

Multi-instance learning (MIL) is an area of machine learning that handles data organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting to classify bags, each of which can contain any number of instances. This property allows MIL to be applied naturally to a wide variety of real-world problems, from computer vision to healthcare. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper we present a novel Primal-Dual Multi-Instance Support Vector Machine (pdMISVM) derivation and implementation that can operate efficiently on large-scale data. Our method relies on an algorithm derived using a multiblock variation of the alternating direction method of multipliers (ADMM). The approach scales to large datasets because it avoids iteratively solving the quadratic programming problems that are generally used to optimize SVM-based MIL algorithms. In addition, we modify our derivation to avoid solving a least-squares problem during optimization; this modification extends the utility of our approach to data with a large number of features as well as a large number of bags. Finally, we apply our approach to synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of our proposed method. We end our discussion with an extension of our approach to handle non-linear decision boundaries. Code and data for our methods are available online at: https://github.com/minds-mines/pdMISVM.jl.
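
To make the bag terminology concrete, the following is a minimal Python sketch of how a linear multi-instance classifier might score a bag. It assumes the max-over-instances rule common to MI-SVM-style methods; the abstract does not state the exact decision rule used by pdMISVM, and this sketch does not reproduce the ADMM-based training procedure. The names `instance_score`, `bag_score`, and `predict_bag`, along with the weights and example bags, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# A bag is a variable-length collection of fixed-length feature vectors.
Instance = Sequence[float]
Bag = List[Instance]


def instance_score(w: Sequence[float], b: float, x: Instance) -> float:
    """Linear score w . x + b for a single instance."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bag_score(w: Sequence[float], b: float, bag: Bag) -> float:
    """Score a bag by its highest-scoring instance.

    Assumption: the max-over-instances convention, not necessarily
    the rule used in the paper.
    """
    if not bag:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_score(w, b, x) for x in bag)


def predict_bag(w: Sequence[float], b: float, bag: Bag) -> int:
    """Return +1 if the bag score is non-negative, otherwise -1."""
    return 1 if bag_score(w, b, bag) >= 0 else -1


# Hypothetical weights and bags; note the bags differ in size.
w = [1.0, -1.0]
b = -0.5
positive_bag = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
negative_bag = [[0.0, 1.0], [0.2, 0.3]]

print(predict_bag(w, b, positive_bag))
print(predict_bag(w, b, negative_bag))
```

Under this rule a single high-scoring instance is enough to label the whole bag positive, which is why bags of different sizes can be classified with one shared weight vector.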

Links

View publications from Carla Ellefsen
View publications from Zoe Baker
View publications from Lodewijk Brand
View publications from Hua Wang
View publications presented in ICDM
View publications in the project, An Intelligence-Driven Patient Care Approach to Reduce Medical Errors
View publications in the project, Intelligent Prediction of Traffic Conditions via Integrated Data-Driven Crowdsourcing and Learning
View publications in the project, Mining Brain Imaging Genomics Data for Improved Cognitive Health
View publications in the project, Prediction of coronavirus infections and complications at the individual and the population levels from genomic, proteomic, clinical and behavioral data sources
View publications researching Multiple-Instance Learning
View publications applied to Computer Vision