An integrated algorithm for three-way compositional data

Gallo M; Simonacci V; Di Palma MA
2018-01-01

Abstract

Compositional data with a three-way structure are not uncommon in the social sciences. The CANDECOMP/PARAFAC model is one of the most suitable techniques for modeling these arrays without confounding the variability of the different modes. Estimating its parameters can be particularly difficult in this setting, because compositional data are multicollinear by definition and because the exact number of latent variables is generally hard to determine for socio-economic data. The most widely used fitting procedure in the literature is the PARAFAC-ALS algorithm, which is sensitive to both difficulties: multicollinearity and a misspecified number of factors. This work proposes an integrated PARAFAC-ALS algorithm initialized with SWATLD steps as an effective remedy for these deficiencies. Tested against standard ALS on simulated multicollinear data, the approach is more robust to over-factoring and temporary degeneracies, converges faster even in the presence of collinearity, and still provides a least-squares solution.
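For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of a plain PARAFAC-ALS loop in NumPy. It is not the authors' integrated algorithm: the SWATLD initialization steps are not reproduced, and a random start is used in their place as an assumption. The function names (`khatri_rao`, `unfold`, `parafac_als`) and the stopping parameters are hypothetical choices made for illustration only.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product: (I x F) and (J x F) give (I*J x F)."""
    n_factors = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, n_factors)


def unfold(X, mode):
    """Matricize a three-way array along `mode`, flattening the other two modes in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def parafac_als(X, n_factors, n_iter=500, tol=1e-10, seed=0):
    """Fit a CANDECOMP/PARAFAC model by alternating least squares.

    Assumption: factors start from random values (not from SWATLD steps).
    Returns the three factor matrices and the loss after each sweep.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, n_factors)) for dim in X.shape]
    history = []
    for _ in range(n_iter):
        for mode in range(3):
            # Fix the other two factor matrices and solve a linear
            # least-squares problem for the current one.
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            factors[mode] = unfold(X, mode) @ np.linalg.pinv(kr).T
        # Residual sum of squares of the current reconstruction.
        X_hat = (factors[0] @ khatri_rao(factors[1], factors[2]).T).reshape(X.shape)
        history.append(float(np.sum((X - X_hat) ** 2)))
        # Stop when the relative improvement of the loss falls below tol.
        if len(history) > 1 and history[-2] - history[-1] <= tol * history[-2]:
            break
    return factors, history
```

Each conditional update is an exact least-squares solve, so the loss cannot increase from one sweep to the next. The pseudoinverse of the Khatri-Rao product becomes ill-conditioned when factor columns are nearly collinear, which is one way to see why plain ALS can slow down on multicollinear data.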

Use this identifier to cite or link to this document: https://hdl.handle.net/11574/180889