Principal Component Analysis

Principal component analysis (PCA) is one of the most widely used methods for exploratory multivariate analysis. It can be used to display the major features of shape variation in a dataset and also as an ordination method to discover patterns in the relations among observations.

PCA of Procrustes coordinates or Procrustes residuals is the same as a 'relative warp analysis' without weighting for bending energy and including the uniform component of shape variation (e.g. Rohlf 1993).

Background

PCA is based on the spectral decomposition of a covariance matrix (outside of geometric morphometrics, PCA is often based on a correlation matrix instead). For this reason, PCA is intimately tied to this covariance matrix, and in MorphoJ, PCA must be invoked from a covariance matrix.
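As an illustration of the spectral decomposition mentioned above, the following minimal sketch uses NumPy rather than MorphoJ's own code. The function name pca_from_covariance and the small 2 x 2 matrix are hypothetical and serve only as an example.

```python
import numpy as np

def pca_from_covariance(cov):
    """Return eigenvalues (largest first) and eigenvectors (as columns)
    of a symmetric covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    # eigh is intended for symmetric matrices; it returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]  # reorder so that PC1 has the largest variance
    return eigenvalues[order], eigenvectors[:, order]

# Hypothetical covariance matrix, for illustration only
cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])
eigenvalues, eigenvectors = pca_from_covariance(cov)
print(eigenvalues)   # variances along the PCs
print(eigenvectors)  # PC coefficients, one PC per column
```

The eigenvalues are the variances along the PCs and the columns of the eigenvector matrix are the PC coefficients; multiplying the pieces back together reproduces the covariance matrix.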

The nature of this covariance matrix can vary. For instance, it can be derived from the symmetric or asymmetry components of shape variation, or it can be a genetic or phenotypic covariance matrix, among other possibilities. For all of these, the PCA procedure is essentially the same.

Details on methods

Number of PCs. The number of PCs to display is determined by using a threshold of 10^-14 for the eigenvalues. This should normally result in the correct number of dimensions (the threshold exceeds the rounding error in most analyses), but there may be some eigenvalues that are reported as 0.00000000 in the output. Users should decide for themselves how many PCs to report; normally, this will be fewer than those provided in the output.
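The effect of the threshold can be sketched as follows. This is an assumption-laden illustration, not MorphoJ's implementation: the function count_pcs is hypothetical, the eigenvalues are invented, and whether the comparison is strict is an assumption.

```python
THRESHOLD = 1e-14  # threshold for eigenvalues as stated in the text

def count_pcs(eigenvalues, threshold=THRESHOLD):
    """Count the eigenvalues that exceed the threshold (strict comparison assumed)."""
    return sum(1 for ev in eigenvalues if ev > threshold)

# Invented eigenvalues: two clearly positive, one tiny but above the threshold,
# and two that are indistinguishable from rounding error
eigenvalues = [3.2e-3, 1.1e-3, 4.0e-12, 2.0e-17, -1.0e-18]

n_pcs = count_pcs(eigenvalues)
print(n_pcs)                      # 3
print(f"{eigenvalues[2]:.8f}")    # 0.00000000
```

The third eigenvalue passes the threshold and is therefore counted, yet it prints as 0.00000000 at eight decimal places, which is the situation described above.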

PC scores are only computed if the covariance matrix is directly derived from a dataset. PC scores are computed as the vectors of deviations of the observations from the sample mean, multiplied by the vectors of PC coefficients (eigenvectors).
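The computation of PC scores described above can be sketched with NumPy. The function pc_scores and the simulated dataset are hypothetical; the sketch only assumes that observations are rows and that eigenvectors are stored as columns.

```python
import numpy as np

def pc_scores(data, eigenvectors):
    """Project deviations from the sample mean onto the PC coefficients."""
    data = np.asarray(data, dtype=float)
    deviations = data - data.mean(axis=0)  # one row of deviations per observation
    return deviations @ eigenvectors       # one column of scores per PC

# Simulated data standing in for a real dataset (50 observations, 4 variables)
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))

cov = np.cov(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

scores = pc_scores(data, eigenvectors)
```

When the covariance matrix comes directly from the same dataset, as here, the scores are mutually uncorrelated and their variances equal the eigenvalues.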

Note: PC scores from PCAs of pooled within-group covariance matrices. The PC scores are computed with deviations from the grand mean even if the covariance matrix is a pooled within-group covariance matrix. Because the covariance structure among groups may not agree with the within-group covariance structure, the PC scores are not necessarily uncorrelated with each other. They are uncorrelated, however, at the level of within-group variation.
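The point of this note can be checked numerically. The following sketch uses simulated data with two hypothetical groups that share the same within-group structure but differ in their means; the divisor n minus the number of groups for the pooled matrix is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical groups of 100 observations with different means
within = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.5])
groups = np.repeat([0, 1], 100)
offsets = np.array([[0.0, 0.0, 0.0],
                    [4.0, 4.0, 0.0]])
data = within + offsets[groups]

# Pooled within-group covariance matrix: deviations from the group means
group_means = np.vstack([data[groups == g].mean(axis=0) for g in (0, 1)])
within_dev = data - group_means[groups]
pooled = within_dev.T @ within_dev / (len(data) - 2)

eigenvalues, eigenvectors = np.linalg.eigh(pooled)
eigenvectors = eigenvectors[:, ::-1]  # largest eigenvalue first

# PC scores computed from deviations from the grand mean
scores = (data - data.mean(axis=0)) @ eigenvectors
overall_cov = np.cov(scores, rowvar=False)

# The same projection applied to the within-group deviations
within_scores = within_dev @ eigenvectors
within_cov = within_scores.T @ within_scores / (len(data) - 2)

print(np.round(overall_cov, 3))  # off-diagonal entries are clearly non-zero
print(np.round(within_cov, 3))   # diagonal, up to rounding error
```

In this example the difference between group means is oblique to the within-group PCs, so the scores for PC1 and PC2 are correlated overall even though they are uncorrelated within groups.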

PCA-related indices for quantifying morphological integration

PCA is closely related to methods for quantifying morphological integration using the distribution of eigenvalues. The variance of eigenvalues is a measure of integration that quantifies how much the variation is concentrated in just a few dimensions or spread across many directions of shape space. If there is no integration at all, that is, if there are equal amounts of variation in all directions of shape space, this variance has its minimum value of 0. The index has its maximum if all variation is contained in a single dimension (complete integration).

In traditional morphometrics, the variance of eigenvalues is usually computed from correlation matrices of measurements (e.g. Pavlicev et al. 2009). For geometric morphometric data, the use of correlation matrices is not appropriate because it would destroy the consistent scaling of all axes in units of Procrustes distance (e.g. Klingenberg and Zaklan 2000).

Because the covariance matrix is scaled in units of Procrustes distance, variances of eigenvalues are difficult to compare between different analyses. It is therefore sensible to scale the eigenvalues by the total variance before computing the variance (Young 2006). The result is a dimension-free index that can range from 0 to (p – 1)/p², where p is the dimensionality of the shape space.

Scaling by (p – 1)/p² therefore yields an index of integration that ranges from 0 to 1 (in analogy to the reasoning for correlation matrices, see Pavlicev et al. 2009). This index can be used in various contexts (for an excellent example, see Gómez et al. 2014).
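The three indices discussed in this section can be sketched in plain Python. The function integration_indices is hypothetical. The sketch assumes that the variance is taken over all p dimensions with divisor p, which is the convention consistent with the stated maximum of (p – 1)/p², and that dimensions without a reported eigenvalue count as zeros.

```python
def integration_indices(eigenvalues, p=None):
    """Return (raw, scaled, standardized) eigenvalue variances.

    p is the dimensionality of the shape space; if fewer than p eigenvalues
    are supplied, the missing ones are treated as zeros (an assumption).
    """
    p = len(eigenvalues) if p is None else p
    if len(eigenvalues) > p:
        raise ValueError("more eigenvalues than dimensions")
    values = list(eigenvalues) + [0.0] * (p - len(eigenvalues))

    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)  # divisor p

    total = sum(values)
    raw = variance(values)                            # original units
    scaled = variance([v / total for v in values])    # 0 to (p - 1)/p^2
    standardized = scaled / ((p - 1) / p ** 2)        # 0 to 1
    return raw, scaled, standardized

# Complete integration: all variation in one of four dimensions
print(integration_indices([1.0, 0.0, 0.0, 0.0]))
# No integration: equal variation in all four dimensions
print(integration_indices([0.25, 0.25, 0.25, 0.25]))
# An intermediate case with invented eigenvalues
print(integration_indices([0.5, 0.3, 0.2]))
```

The first call reaches the upper limits of the scaled and standardized indices, and the second call gives 0 for all three.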

Note: These indices are computed using the dimensionality of the shape space derived from the number of landmarks, whether the data are 2D or 3D, and whether there is object symmetry (and if so, whether the PCA uses the symmetric or asymmetry component of the shape space). This is the appropriate dimensionality in most instances, but users need to consider whether this is so for each particular case (it might not be appropriate, e.g., for imported shape change vectors with symmetry/asymmetry components for analyses of complex symmetry).

Note: Estimates of the variance of eigenvalues and derived statistics are expected to be sensitive to sample size. Estimating eigenvalues is statistically a fairly demanding task. With small sample sizes, it is expected that these statistics will overestimate the true degree of integration.
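The sensitivity to sample size can be illustrated with a small simulation. The sketch below draws data from a distribution with equal variance in all directions, so the true standardized index is 0; the sample sizes, the dimensionality of 10 and the function name are arbitrary choices for illustration.

```python
import numpy as np

def scaled_eigenvalue_variance(data):
    """Eigenvalue variance scaled by total variance and dimensionality (0 to 1)."""
    cov = np.cov(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(cov)
    p = len(eigenvalues)
    relative = eigenvalues / eigenvalues.sum()
    # np.var uses divisor p by default, matching the maximum of (p - 1)/p^2
    return relative.var() / ((p - 1) / p**2)

rng = np.random.default_rng(42)
p = 10  # arbitrary number of dimensions

# No integration in the population: the true index is 0
small = scaled_eigenvalue_variance(rng.normal(size=(15, p)))
large = scaled_eigenvalue_variance(rng.normal(size=(2000, p)))

print(f"n = 15:   {small:.4f}")
print(f"n = 2000: {large:.4f}")
```

The estimate from the small sample is noticeably further above the true value of 0 than the estimate from the large sample, which is the overestimation described in the note.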

Requesting a PCA

Click on a covariance matrix in the Project Tree window (covariance matrices are marked there by their own icon). Then select Principal Component Analysis from the Variation menu.

This will start the PCA, and the graphical and printed output will appear. If multiple covariance matrices are selected, a PCA is computed for every one of them.

Graphical output

The PCA produces three types of graphs in a tab in the Graphics window:

PC shape changes: A diagram showing the shape changes associated with the PCs (eigenvectors). This graph is only produced if the covariance matrix is for data that can be represented as shape changes. The scale factor for this graph is directly the magnitude of the shape change as a Procrustes distance; the default is 0.1, which corresponds to a change of the PC score by 0.1 units in the positive direction.
Eigenvalues: A diagram showing the percentages of total variance for which the PCs account.
PC scores: A scatter plot of PC scores (or a histogram if there is just a single dimension that contains variation). This graph is only produced if the covariance matrix on which the PCA is based is derived directly from a dataset.
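The meaning of the scale factor for PC shape changes can be sketched as follows. This is an illustration under assumptions, not MorphoJ's drawing code: the function shape_change and the coordinates are hypothetical, shapes are stored as flattened coordinate vectors, and Procrustes distance is approximated by Euclidean distance in those coordinates.

```python
import numpy as np

def shape_change(mean_shape, pc_vector, scale=0.1):
    """Return the shape reached by moving 'scale' units along a PC from the mean."""
    mean_shape = np.asarray(mean_shape, dtype=float)
    pc_vector = np.asarray(pc_vector, dtype=float)
    unit = pc_vector / np.linalg.norm(pc_vector)  # eigenvectors have unit length
    return mean_shape + scale * unit

# Invented mean shape with three 2D landmarks, flattened as x1, y1, x2, y2, x3, y3
mean_shape = np.array([-0.5, -0.3, 0.5, -0.3, 0.0, 0.6])
# Invented PC coefficients for the same six coordinates
pc1 = np.array([0.1, -0.2, -0.1, -0.2, 0.0, 0.4])

target = shape_change(mean_shape, pc1, scale=0.1)
print(np.linalg.norm(target - mean_shape))  # distance from the mean shape
```

Because the PC vector has unit length, a change of the PC score by 0.1 units moves the shape by a distance of 0.1 from the mean.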

Text output

The text output in the Results window contains the following information:

Eigenvalues: For each PC, the corresponding eigenvalue is given in the original units (e.g. units of Procrustes variance), as a percentage of the total variance and as the cumulative percentage of total variance.
Total variance: The total variance is the sum of the variances across all coordinates in shape space (the trace of the covariance matrix) or, equivalently, the sum of all eigenvalues in the PCA.
Indices of morphological integration: These include the variance of the eigenvalues in the original units (of squared Procrustes distance), which can range from 0 (for no integration at all) to a maximum that is specific to each dataset. The second index is the variance of eigenvalues scaled by the total variance, which can range from 0 to (p – 1)/p². The third index is the variance of eigenvalues scaled by the total variance and number of variables, which can range from 0 to a maximum of 1. (These statistics are only displayed for PCAs done in version 1.06c or higher.)
PC coefficients: The PC coefficients (eigenvectors) are given in tabular form (if the data permit, a graphical representation is given by the graph of PC shape changes in the Graphics window).
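The eigenvalue columns and the total variance in the text output are related by simple arithmetic, sketched below in plain Python. The function eigenvalue_table, the invented eigenvalues and the column layout are hypothetical and do not reproduce MorphoJ's actual output format.

```python
def eigenvalue_table(eigenvalues):
    """Return the total variance and rows of (PC, eigenvalue, % variance, cumulative %)."""
    total = sum(eigenvalues)  # equals the trace of the covariance matrix
    rows = []
    cumulative = 0.0
    for i, ev in enumerate(eigenvalues, start=1):
        percent = 100.0 * ev / total
        cumulative += percent
        rows.append((i, ev, percent, cumulative))
    return total, rows

# Invented eigenvalues in units of Procrustes variance
total, rows = eigenvalue_table([0.006, 0.003, 0.001])
for pc, ev, pct, cum in rows:
    print(f"PC{pc}  {ev:.8f}  {pct:7.3f}  {cum:7.3f}")
print(f"Total variance: {total:.8f}")
```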

Output dataset

If the covariance matrix used for the PCA is directly derived from a dataset, a new output dataset is generated that contains the PC scores. The identifiers and classifier variables are copied from the original dataset, and the new dataset is also linked automatically to the original dataset and all other datasets to which it is linked in turn.

References

Gómez, J. M., F. Perfectti, and C. P. Klingenberg. 2014. The role of pollinators in the evolution of corolla shape integration in a pollination-generalist plant clade. Philosophical Transactions of the Royal Society of London B Biological Sciences 369:20130257.

Klingenberg, C. P., and S. D. Zaklan. 2000. Morphological integration between developmental compartments in the Drosophila wing. Evolution 54:1273–1285.

Pavlicev, M., J. M. Cheverud, and G. P. Wagner. 2009. Measuring morphological integration using eigenvalue variance. Evolutionary Biology 36:157–170.

Rohlf, F. J. 1993. Relative warp analysis and an example of its application to mosquito wings. Pages 131–159 in L. F. Marcus, E. Bello, and A. García-Valdecasas, eds. Contributions to morphometrics. Madrid, Museo Nacional de Ciencias Naturales.

Young, N. M. 2006. Function, ontogeny and canalization of shape variance in the primate scapula. Journal of Anatomy 209:623–636.