A PLS method for seeking canonical correlations in case of perfect multicollinearity
Gallo Michele; Simonacci Violetta
2019-01-01
Abstract
Canonical correlation analysis (CCA) is a useful tool for investigating the relationships between two sets of variables. When the dispersion matrices are invertible, the canonical variates with maximal correlation are generally identified by means of singular value decomposition. However, when one or both sets of variables are compositional, this classical approach cannot be followed. Compositional data are positive values that carry relative information describing the parts of a whole. As a consequence, they have a perfectly multicollinear structure and are characterized by singular dispersion matrices. To address this issue, an alternative way of computing canonical variates is proposed: the data are first transformed into log-ratio coordinates, and then the Partial Least Squares (PLS) approach is applied. This method provides a fast and simple way to deal with non-invertible dispersion matrices and, in addition, yields results that are easy to interpret. The proposed methodology is assessed in an experimental study, which also compares alternative PLS algorithms, namely NIPALS, SIMPLS and Kernel.

File | Type | License | Size | Format |
---|---|---|---|---|
BoACFECMStatistics2019.pdf (access from the internal network only) | Post-print document | Not public: private/restricted access | 2.82 MB | Adobe PDF |
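The two-step idea in the abstract (log-ratio transformation, then PLS) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption: the abstract does not say which log-ratio coordinates are used, so the centered log-ratio (clr) is chosen here for simplicity; the data are synthetic; and the helper names (`close`, `clr`, `first_pls_pair`) are hypothetical. The iteration is a basic NIPALS-style scheme that extracts only the first pair of weight vectors. It maximizes the covariance between the two score vectors and uses only matrix-vector products, so no dispersion matrix is ever inverted.

```python
import math
import random

def close(parts):
    """Rescale positive parts so they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(row):
    """Centered log-ratio: log of each part minus the row's mean log."""
    logs = [math.log(v) for v in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def center(M):
    """Subtract the column means from a list-of-rows matrix."""
    means = [sum(col) / len(M) for col in zip(*M)]
    return [[v - m for v, m in zip(row, means)] for row in M]

def matvec(M, v):
    """Compute M v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def t_matvec(M, v):
    """Compute M^T v."""
    return [sum(a * b for a, b in zip(col, v)) for col in zip(*M)]

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def corr(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [x - ma for x in a], [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    return num / math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))

def first_pls_pair(X, Y, n_iter=500, tol=1e-12):
    """NIPALS-style iteration for the first weight vectors (w, c) and scores (t, u)."""
    u = [row[0] for row in Y]          # start from the first column of Y
    w_old = None
    for _ in range(n_iter):
        w = unit(t_matvec(X, u))       # X-block weights
        t = matvec(X, w)               # X-block scores
        c = unit(t_matvec(Y, t))       # Y-block weights
        u = matvec(Y, c)               # Y-block scores
        if w_old is not None and max(abs(a - b) for a, b in zip(w, w_old)) < tol:
            break
        w_old = w
    return w, c, t, u

# Synthetic compositions (an assumption for illustration): a 3-part and a
# 4-part composition that share a hypothetical latent factor z.
random.seed(0)
X_raw, Y_raw = [], []
for _ in range(40):
    z = random.gauss(0, 1)
    X_raw.append(close([math.exp(z + random.gauss(0, 0.3)),
                        math.exp(random.gauss(0, 0.3)),
                        math.exp(-z + random.gauss(0, 0.3))]))
    Y_raw.append(close([math.exp(0.8 * z + random.gauss(0, 0.3)),
                        math.exp(random.gauss(0, 0.3)),
                        math.exp(random.gauss(0, 0.3)),
                        math.exp(random.gauss(0, 0.3))]))

X = center([clr(row) for row in X_raw])
Y = center([clr(row) for row in Y_raw])

# Every clr row sums to zero, so the columns are perfectly collinear and the
# dispersion matrix of each block is singular.
max_row_sum = max(abs(sum(row)) for row in X + Y)

w, c, t, u = first_pls_pair(X, Y)
r = corr(t, u)
print("largest absolute clr row sum:", max_row_sum)
print("correlation of the first score pair:", r)
```

Because the clr rows sum to zero, the weight vectors `w` and `c` also sum to zero, so each score is a log-contrast of the original parts. This is one possible reading of the interpretability mentioned in the abstract. Further components would need a deflation step, which is omitted here.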
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.