A robustification of the Parafac model for high dimensional outliers
DI PALMA MA;GALLO M
2018-01-01
Abstract
Three-way data are measurements related to three entities (modes), in which repeated observations are collected for the same variables on several occasions (conditions, times, locations). For the exploratory analysis of three-way data sets, the Parafac model, independently proposed by \cite{Carroll70} and \cite{Harshman70}, is particularly suitable. The model yields a best low-rank approximation of the original data, fitted by the alternating least squares (ALS) algorithm; its results are interpretable, its solution is unique under mild conditions, and it keeps the variability of each mode separate. The widespread interest in three-mode techniques has led to further developments aimed at limiting the problems that occur when outliers are included in the estimation. In fact, the Parafac model is extremely sensitive to anomalous observations: such points can artificially inflate the variance or shift the mean, distorting the analysis and producing flawed results. The chance of encountering atypical observations increases when the amount of data to process is large. For this reason, a robustification of the Parafac model, called COMedian-Parafac, is proposed here; it builds on the study by \cite{falk1997mad} of the properties of the \textit{Co-median} estimator. The robustification procedure is simple, identifies extreme points after a few iterations, is fully informative in parameter estimation, remains robust as the fraction of outliers increases, and is fast on high-dimensional data. The proposed algorithm is compared with the well-known Parafac via ROBPCA algorithm (ROB-Parafac) \citep{engelen2009fully, engelen2011detecting} in a simulation study, in which its estimates are less affected by outliers and highly accurate. It is also applied to several real case studies.