Dealing with outliers in high dimensional data: a COMALS procedure
Di Palma M. A.; Gallo M.
2019-01-01
Abstract
The growing interest in high-dimensional data has driven the development of new statistical techniques for reducing dimensionality when the data are influenced by deviating points. An extreme observation, or outlier, deviates from the model assumptions and severely affects the estimates; because data quality plays an important role in obtaining reliable results, it is preferable to downweight extreme observations. The Candecomp/Parafac model, a decomposition technique for high-dimensional arrays, is not exempt from sensitivity to extreme observations. The algorithm underlying the model, Alternating Least Squares (ALS), is extremely sensitive to such observations and produces flawed results in the analysis. To address this, a robust COMedian algorithm (COMALS) is proposed. The algorithm is based on a very fast and accurate procedure that can handle the high dimensionality of the data and reports efficient results at any contamination level.

File | Access | Type | License | Size | Format |
---|---|---|---|---|---|
IES 2019_DG.pdf | Open access | Post-print document | PUBLIC - Public with copyright | 1.64 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.