A PLS method for seeking canonical correlations in case of perfect multicollinearity

Gallo Michele; Simonacci Violetta
2019-01-01

Abstract

Canonical correlation analysis (CCA) is a useful tool for investigating the relationships between two sets of variables. When the dispersion matrices can be inverted, the canonical variates with maximal correlation are generally identified by means of singular value decomposition. However, when one or both sets of variables are compositional, this classical approach cannot be followed. Compositional data are positive values that carry relative information describing the parts of a whole. Consequently, they are perfectly multicollinear and have singular dispersion matrices. Because this rules out the standard approach, an alternative way of computing canonical variates is proposed: the data are first transformed into log-ratio coordinates, and then the Partial Least Squares (PLS) approach is applied. This method provides a fast and easy way to deal with non-invertible dispersion matrices and, in addition, yields results that are easy to interpret. The proposed methodology is assessed in an experimental study, which also compares three alternative PLS algorithms: NIPALS, SIMPLS and Kernel.
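The two steps named in the abstract (log-ratio transformation, then PLS) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the centred log-ratio (clr) is used as the log-ratio transform, the data are synthetic, the helper names `clr` and `first_pls_pair` are hypothetical, and only the first pair of PLS weight vectors is extracted, taken as the leading singular vectors of the cross-product matrix rather than through NIPALS, SIMPLS or Kernel iterations.

```python
import numpy as np

def clr(parts):
    """Centred log-ratio transform of strictly positive rows (assumed choice)."""
    logs = np.log(parts)
    return logs - logs.mean(axis=1, keepdims=True)

def first_pls_pair(X, Y):
    """First pair of PLS weights: leading singular vectors of X'Y.

    No dispersion matrix is inverted, so singular covariances are harmless.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    a, b = U[:, 0], Vt[0]
    return a, b, Xc @ a, Yc @ b  # weights and score vectors

# Synthetic example: two compositional blocks driven by one latent factor.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
raw_x = np.exp(np.outer(latent, [1.0, -0.5, 0.0, 0.3]) + 0.3 * rng.normal(size=(n, 4)))
raw_y = np.exp(np.outer(latent, [0.8, 0.0, -0.8]) + 0.3 * rng.normal(size=(n, 3)))

# Closure to unit sum makes each row a composition; clr maps it to real space.
X = clr(raw_x / raw_x.sum(axis=1, keepdims=True))
Y = clr(raw_y / raw_y.sum(axis=1, keepdims=True))

# clr rows sum to zero, so the 4 x 4 covariance matrix is rank deficient.
rank_x = np.linalg.matrix_rank(np.cov(X, rowvar=False))

a, b, t, u = first_pls_pair(X, Y)
corr = np.corrcoef(t, u)[0, 1]
print("rank of clr covariance:", rank_x)
print("correlation of first score pair:", corr)
```

Under the clr assumption, each weight vector sums to zero, so each score is a log-contrast of the original parts, which is one way to read the claim that the results are easy to interpret. Note that PLS maximizes covariance rather than correlation, so the score correlation reported here need not equal the first canonical correlation.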
2019
978-9963-2227-8-0
Use this identifier to cite or link to this document: https://hdl.handle.net/11574/190857