Adaptive Principal Component Analysis

Xiangyu Li, Hua Wang

SDM - 2022

Principal Component Analysis (PCA) is a powerful approach that can reduce the dimensionality of input data and increase the ease of data interpretation. One major obstacle for using the traditional PCA is its dependence on the squared ℓ2-norm distance that is highly sensitive to outliers and noisy data. This significantly limits the applicability of the PCA when it is applied to large or high-dimensional datasets. In this paper, we review previous efforts of how to overcome this problem and further propose a novel adaptive PCA model. One broadly used hypothesis is that small losses of most data follow the Gaussian distribution and large losses of a few data follow the Laplacian distribution. With this recognition, we choose to use the adaptive loss function that integrates both ℓ1-norm and squared ℓ2-norm distances into a single loss function. To solve the resulted non-convex optimization problem, we derive an efficient iterative algorithm that guarantees that both the objective and solution sequences converge to global optimal solutions at a sub linear convergence rate. The theoretical convergence analysis of our model is not trivial and comprehensively presented in our work. In comparison to other commonly used dimensionality reduction methods, our experiments on seven real-world datasets showed a great advantage of our model regardless of whether the data was noisy or not. We thus anticipate that our method will have broad usage for a wide range of real-world applications.

Links

View publications from Xiangyu Li
View publications from Hua Wang
View publications presented in SDM
View publications in the project, An Intelligence-Driven Patient Care Approach to Reduce Medical Errors
View publications in the project, Intelligent Prediction of Traffic Conditions via Integrated Data-Driven Crowdsourcing and Learning
View publications in the project, Mining Brain Imaging Genomics Data for Improved Cognitive Health
View publications in the project, Prediction of coronavirus infections and complications at the individual and the population levels from genomic, proteomic, clinical and behavioral data sources
View publications researching Optimization
View publications researching Robust Learning Models
View publications applied to Computer Vision

Copyright Notice

The materials presented in this page are to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.