Dealing with outliers in high dimensional data: a COMALS procedure

Di Palma M. A.;Gallo M.
2019-01-01

Abstract

The growing interest in high-dimensional data has driven the development of new statistical techniques for reducing dimensionality when the data are influenced by deviating points. An extreme observation, or outlier, deviates from the model assumptions and severely affects the estimates; because data quality plays an important role in obtaining reliable results, it is preferable to downweight extreme observations. The Candecomp/Parafac model, a decomposition technique for high-dimensional arrays, is not exempt from sensitivity to extreme observations. The algorithm underlying the model, Alternating Least Squares (ALS), is extremely sensitive to outliers, which lead to flawed results in the analysis. In this context, a robust algorithm based on the comedian (COMALS) is proposed. The algorithm relies on a very fast and accurate procedure that can handle the high dimensionality of the data and reports efficient results at any contamination level.
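As a rough illustration of the statistic that gives COMALS its name, the sketch below computes the comedian, a median-based analogue of the covariance: the median of the products of the deviations from the medians. It is not the authors' COMALS procedure, which the abstract does not spell out; the function names and the toy data are hypothetical and chosen only for this example.

```python
from statistics import median


def comedian(x, y):
    """Comedian of two equal-length samples: med((x - med x) * (y - med y))."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = median(x), median(y)
    return median((a - mx) * (b - my) for a, b in zip(x, y))


def comedian_matrix(columns):
    """Pairwise comedians for a list of variables (one list per variable)."""
    return [[comedian(ci, cj) for cj in columns] for ci in columns]


# Hypothetical toy data: y = 2x, except for one gross outlier in the last pair.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, -500.0]

print(comedian(x, y))  # 4.0, still positive despite the outlier
print(comedian_matrix([x, y]))
```

On this toy sample the single corrupted pair does not flip the sign of the comedian, whereas it would dominate a mean-based covariance. The diagonal of the comedian matrix holds each variable's squared median absolute deviation.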
2019
978-88-86638-65-4
Files in this item:

IES 2019_DG.pdf (open access)

Type: Post-print document
License: PUBLIC - Public with Copyright
Size: 1.64 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11574/188598