An integrated algorithm for three-way compositional data
GALLO M;SIMONACCI V;DI PALMA MA
2018-01-01
Abstract
Compositional data with a tridimensional structure are not uncommon in the social sciences. The CANDECOMP/PARAFAC model is one of the most suitable techniques for modeling such arrays without confounding the variability of the different modes. Estimating parameters in this setting can be particularly difficult because compositional data are multicollinear by definition and because, for socio-economic data in general, the exact number of latent variables is hard to determine. The most widely used fitting procedure in the literature is the PARAFAC-ALS algorithm, which is, however, sensitive both to multicollinearity and to the use of the wrong number of factors. In this work, an integrated PARAFAC-ALS algorithm initialized with SWATLD steps is proposed as an effective solution to these deficiencies. The approach is tested on simulated multicollinear data in comparison with standard ALS. It proves more robust against over-factoring and temporary degeneracies, converges faster even in the presence of collinearity, and still provides a least-squares solution.

File | Size | Format
---|---|---
11135_2018_745_OnlinePDF.pdf (authorized users only; type: post-print document; license: not public, private/restricted access) | 1.1 MB | Adobe PDF
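
The alternating least-squares fitting that the abstract refers to can be sketched roughly as follows. This is a minimal NumPy illustration of plain PARAFAC-ALS for a three-way array, not the authors' integrated algorithm. The function names (`khatri_rao`, `unfold`, `parafac_als`), the `init` parameter, and the stopping rule are assumptions made for this example. The SWATLD initialization itself is not implemented; `init` only marks where starting values obtained from such steps could be supplied.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    R = U.shape[1]
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, R)


def unfold(X, mode):
    """Matricize X along `mode`; the other two axes are flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def parafac_als(X, rank, init=None, n_iter=500, tol=1e-10, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] by alternating least squares.

    `init` (hypothetical parameter) may hold starting matrices (A0, B0, C0),
    for example ones produced by a few steps of another algorithm.
    """
    rng = np.random.default_rng(seed)
    if init is None:
        A, B, C = (rng.standard_normal((n, rank)) for n in X.shape)
    else:
        A, B, C = (np.array(F, dtype=float) for F in init)

    prev_loss = np.inf
    loss = np.inf
    for _ in range(n_iter):
        # Each update is the least-squares solution for one mode,
        # holding the other two loading matrices fixed.
        A = unfold(X, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(X, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(X, 2) @ np.linalg.pinv(khatri_rao(A, B).T)

        residual = X - np.einsum("ir,jr,kr->ijk", A, B, C)
        loss = float(np.sum(residual ** 2))
        # Stop when the relative change in the loss becomes negligible.
        if abs(prev_loss - loss) <= tol * max(loss, 1e-30):
            break
        prev_loss = loss
    return A, B, C, loss


if __name__ == "__main__":
    # Synthetic rank-2 array built from known loadings (illustrative data only).
    rng = np.random.default_rng(1)
    A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4))
    X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
    A, B, C, loss = parafac_als(X, rank=2)
    print("residual sum of squares:", loss)
```

Using the pseudo-inverse in each update keeps the sketch short. With strongly collinear loadings, which the abstract identifies as the difficult case, these updates become ill-conditioned and plain ALS tends to converge slowly.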
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.