Scaling Multi-Instance Support Vector Machine to Breast Cancer Detection on the BreaKHis Dataset

Hoon Seo, Lodewijk Brand, Lucia Saldana Barco, Hua Wang

Bioinformatics - ISMB - 2022

Breast cancer develops in breast tissue and, after skin cancer, is the most commonly diagnosed cancer in women in the United States. Because early diagnosis is imperative to prevent breast cancer progression, many machine learning models have been developed to automate the histopathological classification of the different types of carcinomas. However, many of them do not scale to large datasets. In this study, we propose a novel Primal-Dual Multi-Instance Support Vector Machine (pdMISVM) to determine which tissue segments in an image exhibit an indication of an abnormality. We also derive an efficient optimization approach for the proposed method that bypasses the quadratic programming and least-squares problems commonly employed to optimize Support Vector Machine (SVM) models in multi-instance learning. The proposed method is computationally efficient and scales to large datasets. We applied our method to the public BreaKHis dataset and achieved promising prediction performance and scalability for histopathological classification. Software is publicly available at:!AiFpD21bgf2wgRLbQq08ixD0SgRD?e=OpqEmY