Map Onto Phylogeny

Mapping shapes onto an existing phylogenetic tree is an important part of comparative morphological studies. This can serve to reconstruct the history of change in the traits of interest or to account for the effects of phylogeny in a study of the processes of evolution.

Background

MorphoJ uses squared-change parsimony (Maddison 1991) for mapping data onto phylogenies. As a result, the locations of internal nodes of the phylogeny can be reconstructed. This information can be used to interpret the diversification of related species in direct relation to their phylogenetic history. MorphoJ provides various plots for visualizing reconstructed ancestral shapes, reconstructed evolutionary trajectories through shape space, and the trees themselves.
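As an illustration of the idea, the following minimal sketch reconstructs internal-node values for a single continuous variable by squared-change parsimony. It is not MorphoJ's own code: the function names, the edge-list representation of the tree and the example values are assumptions made for this illustration, and the solution is found by simple repeated averaging rather than by the algorithm described by Maddison (1991).

```python
def squared_change_parsimony(edges, tip_values, weighted=False, sweeps=500):
    """Reconstruct internal-node values for one continuous trait.

    edges      -- list of (node_a, node_b, branch_length) tuples
    tip_values -- dict mapping each terminal taxon to its observed value
    weighted   -- if True, squared changes are divided by branch length
    """
    # Each branch gets weight 1 (unweighted) or 1 / branch length (weighted).
    neighbours = {}
    for a, b, length in edges:
        w = 1.0 / length if weighted else 1.0
        neighbours.setdefault(a, []).append((b, w))
        neighbours.setdefault(b, []).append((a, w))

    values = dict(tip_values)
    internal = [n for n in neighbours if n not in tip_values]
    start = sum(tip_values.values()) / len(tip_values)
    for n in internal:
        values[n] = start

    # At the minimum of the summed squared change, every internal node equals
    # the weighted mean of its neighbours; repeated sweeps converge to it.
    for _ in range(sweeps):
        for n in internal:
            total = sum(w for _, w in neighbours[n])
            values[n] = sum(w * values[m] for m, w in neighbours[n]) / total
    return values


def tree_length(edges, values, weighted=False):
    """Total squared change summed over all branches."""
    return sum((values[a] - values[b]) ** 2 / (length if weighted else 1.0)
               for a, b, length in edges)


# Hypothetical rooted tree ((A,B)n1,C)root with made-up trait values.
edges = [("root", "n1", 1.0), ("n1", "A", 1.0),
         ("n1", "B", 1.0), ("root", "C", 1.0)]
tips = {"A": 0.0, "B": 2.0, "C": 4.0}
nodes = squared_change_parsimony(edges, tips)
print(round(nodes["n1"], 6), round(nodes["root"], 6),
      round(tree_length(edges, nodes), 6))  # 1.6 2.8 5.6
```

For shape data, the same computation would be applied to every coordinate, and the squared changes would be summed over coordinates.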

As part of the output from the procedure, MorphoJ also produces two new datasets: (1) a dataset with the differences along all the branches of the phylogeny, which can be used for analyses such as evolutionary principal component analysis (Schlick-Steiner et al. 2006; also implemented in the Rhetenor module of Mesquite, Maddison and Maddison 2007), and (2) a dataset with independent contrasts (Felsenstein 1985) that has a wide variety of uses for further comparative analyses of evolutionary change (e.g. Klingenberg and Marugán-Lobón 2013).

The user can request a permutation test for the phylogenetic signal in the data (Klingenberg and Gidaszewski 2010). This test simulates the null hypothesis of the complete absence of a phylogenetic signal by randomly permuting the phenotypic data (shape, centroid size, etc.) among the terminal taxa in the analysis. The test statistic is the total amount of squared change, summed over all branches of the tree. Rejection of the null hypothesis indicates that there is some phylogenetic signal in the data and, therefore, that comparative methods that take this phylogenetic structure into account should be used.
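The logic of such a permutation test can be sketched as follows for a single variable and unweighted squared-change parsimony. This is an illustrative sketch only: the function names, the tree and the data are hypothetical, and counting the observed arrangement as one of the permutations when computing the P-value is an assumption of this example, not a documented detail of MorphoJ.

```python
import random


def scp_tree_length(edges, tip_values, sweeps=200):
    """Tree length under unweighted squared-change parsimony (one trait)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    values = dict(tip_values)
    internal = [n for n in neighbours if n not in tip_values]
    mean = sum(tip_values.values()) / len(tip_values)
    for n in internal:
        values[n] = mean
    # Each internal node converges to the mean of its neighbours.
    for _ in range(sweeps):
        for n in internal:
            values[n] = sum(values[m] for m in neighbours[n]) / len(neighbours[n])
    return sum((values[a] - values[b]) ** 2 for a, b in edges)


def permutation_test(edges, tip_values, n_perm=999, seed=0):
    """Return (observed tree length, P-value) for the null of no signal."""
    rng = random.Random(seed)
    taxa = sorted(tip_values)
    data = [tip_values[t] for t in taxa]
    observed = scp_tree_length(edges, tip_values)
    as_short = 0
    for _ in range(n_perm):
        shuffled = data[:]
        rng.shuffle(shuffled)  # reassign the data to taxa at random
        length = scp_tree_length(edges, dict(zip(taxa, shuffled)))
        if length <= observed + 1e-12:  # tolerance for rounding error
            as_short += 1
    # Assumption: the observed arrangement counts as one permutation.
    return observed, (as_short + 1) / (n_perm + 1)


# Hypothetical tree (((A,B)n2,(C,D)n3)n1,(E,F)n4)root and made-up values.
edges = [("root", "n1"), ("root", "n4"), ("n1", "n2"), ("n1", "n3"),
         ("n2", "A"), ("n2", "B"), ("n3", "C"), ("n3", "D"),
         ("n4", "E"), ("n4", "F")]
tips = {"A": 0.0, "B": 0.1, "C": 5.0, "D": 5.1, "E": 10.0, "F": 10.1}
observed, p_value = permutation_test(edges, tips, n_perm=199)
print(observed, p_value)
```

A short observed tree length relative to the permuted ones means that closely related taxa tend to be similar, which is why the P-value is the proportion of permutations with a tree length equal to or shorter than the observed one.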

The phylogenies are assumed to be known, and the methods used in MorphoJ do not take into account any uncertainty about the tree topology or branch lengths. Among other things, this means that any polytomies in the trees are treated as hard polytomies (simultaneous split of the ancestral lineage into more than two branches) and not as soft polytomies (expressing uncertainty about the sequence of branching events).

The user can choose whether the phylogenetic tree is to be considered rooted or unrooted for the analysis. Under squared-change parsimony, it does make a difference (usually a small one) whether the trees are rooted or unrooted (Maddison 1991; further discussion and illustration in Klingenberg and Gidaszewski 2010). For most applications, considering trees as rooted is the most straightforward option.

Requesting the analysis

To map morphometric data onto a phylogeny, MorphoJ needs a phylogenetic tree. This can be imported from a Nexus file and needs to be present in the project before the mapping can be done.

To start mapping morphometric data onto a phylogeny, select Map Onto Phylogeny from the Comparison menu. The following dialog box will appear.

At the top, there is a drop-down menu for choosing the phylogeny to be used. Below this, there is a text field for the name of the analysis (by default, the name of the phylogeny).

Below this, there is a series of three drop-down menus for selecting the dataset, the data matrix, and the classifier that designates the taxa. The values of this classifier must agree completely with the labels used for the taxa in the phylogeny (spelling, upper- and lowercase letters, spaces).
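Because the comparison is exact, it can help to check the labels before starting the analysis. The following is a small sketch of such a check done outside MorphoJ; the function name and the labels are made up for illustration.

```python
def compare_labels(classifier_values, tree_labels):
    """Return (only_in_data, only_in_tree) using exact string comparison."""
    data, tree = set(classifier_values), set(tree_labels)
    return sorted(data - tree), sorted(tree - data)


# Made-up labels: the difference in case and the trailing space
# both count as mismatches.
only_in_data, only_in_tree = compare_labels(
    ["Melan", "simulans", "Yakuba "],
    ["Melan", "Simulans", "Yakuba"],
)
print(only_in_data)  # ['Yakuba ', 'simulans']
print(only_in_tree)  # ['Simulans', 'Yakuba']
```

The sketch only lists labels that occur on one side but not the other; whether a label that occurs only in the tree is a problem depends on which taxa are meant to be included in the mapping.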

For all the variables derived from coordinate data (e.g. Procrustes coordinates, centroid size, PC or CV scores, etc.), all the observations included in the dataset are used automatically, because observations with missing values have already been excluded during the Procrustes fit. For covariates, however, observations that are included in the dataset may have missing values. A covariate is mapped onto the phylogeny if every taxon has at least one observation with a value for that covariate. If there is at least one taxon without any value for a given covariate, the covariate is excluded from the computations and omitted from the output datasets. If none of the covariates has values for all of the taxa and covariates were selected as the data type for the analysis, a warning message appears; if another data type was selected, the output datasets simply will not contain any covariates.

The button Weighting by branch lengths controls whether weighted or unweighted squared-change parsimony is used for the mapping (Maddison 1991). This option is only available if there is complete information on the branch lengths in the tree. If branch lengths are not available, the button for this option is inactivated, as in the screen shot above.

The button Mapping with rooted tree controls whether the tree is considered rooted or unrooted. This option has a small effect on the mapping, because the presence or absence of a root node affects the way trait changes are allocated to the branches of the tree (for further details, see Klingenberg and Gidaszewski 2010). For most applications, the tree is considered to be rooted, as investigators are interested in reconstructing the history of evolutionary change in a clade of organisms.

The button Permutation test for phylogenetic signal starts a permutation test for the hypothesis of no phylogenetic structure in the data (Klingenberg and Gidaszewski 2010).

Clicking on the Execute button starts the analysis.

Graphical output

The graphical output depends on the contents of the dataset that are mapped onto the phylogeny. All phylogeny items can produce graphs showing the respective phylogenetic tree. Other graphs depend on whether data have been mapped onto the phylogeny and, if so, what kind of data was used.

Phylogenetic trees

MorphoJ also provides graphs showing the phylogenetic tree used for an analysis, including the numbering of the internal nodes.

When importing phylogenies from Nexus files, MorphoJ assigns a number to all internal nodes that don't have labels (nodes that are labelled in the input file keep those names). Note that the numbering of internal nodes changes when a phylogeny is matched to a dataset.

The screen shot above is from a phylogeny that is rooted. For the same tree topology, the unrooted tree looks like this:

Note that the root node has been removed from the branch between internal nodes 1 and 5. Node 5 is shown at the bottom of the diagram as a trichotomy. This corresponds to the situation in the rooted tree, where there are also three branches that are linked at node 5: one branch from node 1 (via the root node), one branch to the terminal node "Melan" and one branch to node 6.

If no data have been mapped onto a phylogeny (e.g. a phylogeny that has been imported from a Nexus file but has not been attached to a dataset and used to map its contents), all the taxa in the phylogeny are shown. If data have been mapped onto the phylogeny, the tree diagram shows only the taxa that were used in the mapping.

Shape coordinates

If the dataset contains Procrustes coordinates or the symmetric or asymmetry components, MorphoJ will display a graph where the shapes at different nodes of the phylogeny can be viewed and compared.

For instance, the diagram above shows the shape difference from the reconstructed shape of the root node in the rooted version of the tree above (light blue outline drawing) to the shape reconstructed for node 7 in the phylogeny (dark blue), amplified by a factor of 2.5.

The start and target nodes can be selected with the items Choose the Start Node and Choose the Target Node in the popup menu for this graph. The amplification was selected by using Set the Scale Factor in the popup menu; changing the scale factor does not alter the starting shape, but alters only the target shape.

Scores from multivariate analyses

If the dataset contains the scores from multivariate analyses, MorphoJ displays scatter plots of the taxon means of those scores, with the phylogeny superimposed according to the reconstructed ancestral values.

In these diagrams, the identities of the internal nodes can be revealed by shift-clicking on the nodes themselves. For rooted trees (as in the screen shot above), the root node is indicated by an orange circle and the label "Root".

If the dataset used for mapping the data contains classifiers in addition to the one that designates the terminal taxa and if some or all of these are constant within terminal taxa, then it is possible to use those classifiers to determine the colors of the points designating the terminal nodes. For instance, the colors might designate ecological variables or distribution areas. To do this, use Color the Terminal Taxa Points from the popup menu of this graph.

The user can determine whether the labels of the individual taxa should appear in the graph by selecting or un-selecting the check box Display Labels of Taxa in the popup menu. Note that you can see the name of any particular taxon by shift-clicking the corresponding point in the diagram.

Individual variables

Variables that are not part of a multivariate analysis, for instance centroid size or covariates, are plotted separately as follows:

The horizontal axis provides the values of the variable (here centroid size), whereas the vertical direction corresponds to the cumulative branch length from the root of the tree (if the mapping is done by weighted squared-change parsimony). If the branch lengths can be interpreted as evolutionary time, then the graph represents the movement of evolutionary lineages along the axis of the variable.
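The vertical position of each node in such a plot is simply the sum of the branch lengths on the path from the root. A minimal sketch, with a hypothetical tree structure and made-up branch lengths:

```python
def cumulative_heights(children, root):
    """children: {parent: [(child, branch_length), ...]}.

    Returns {node: cumulative branch length from the root}.
    """
    heights = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, length in children.get(node, []):
            heights[child] = heights[node] + length
            stack.append(child)
    return heights


# Hypothetical rooted tree ((A:2.0,B:1.5)n1:1.0,C:3.0)root.
children = {"root": [("n1", 1.0), ("C", 3.0)],
            "n1": [("A", 2.0), ("B", 1.5)]}
heights = cumulative_heights(children, "root")
print(heights["n1"], heights["A"], heights["B"], heights["C"])  # 1.0 3.0 2.5 3.0
```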

If the mapping is done with unweighted squared-change parsimony, the heights at which the nodes are drawn use regular intervals, as for the diagram of the phylogenetic tree.

The variable for this graph can be chosen by using Choose the Variable to Display in the popup menu. The identity of internal nodes can be found by shift-clicking on the respective branch points in the graph.

If the dataset used for mapping the data contains classifiers in addition to the one that designates the terminal taxa and if some or all of these are constant within terminal taxa, then it is possible to use those classifiers to determine the colors of the points designating the terminal nodes. For instance, the colors might designate ecological variables or distribution areas. To do this, use Color the Terminal Taxa Points from the popup menu of this graph.

The user can determine whether the labels of the individual taxa should appear in the graph by selecting or un-selecting the check box Display Labels of Taxa in the popup menu. Note that you can see the name of any particular taxon by shift-clicking the corresponding point in the diagram.

Text output

The output contains a summary of the variables used in the analysis, information on the tree lengths and, if a permutation test for a phylogenetic signal was done, the P-value and number of permutations.

For shape variables (e.g. Procrustes coordinates) or multivariate statistics derived from them (e.g. principal component scores), the tree length and permutation test consider all variables jointly, as a multivariate analysis. In contrast, for covariates and for centroid size (untransformed and log-transformed centroid size), the tree lengths and permutation tests are computed for each variable separately.
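In symbols, the tree length can be written as follows. The notation is introduced here for illustration only and does not appear in the MorphoJ output: b indexes the branches, a(b) and d(b) are the ancestral and descendant nodes of branch b, x is the observed or reconstructed value of variable j at a node, p is the number of variables considered jointly, and l_b is the length of branch b.

```latex
T \;=\; \sum_{b} \frac{1}{w_b} \sum_{j=1}^{p}
        \left( x_{a(b),j} - x_{d(b),j} \right)^{2},
\qquad
w_b =
\begin{cases}
1      & \text{unweighted squared-change parsimony} \\
\ell_b & \text{weighted squared-change parsimony}
\end{cases}
```

For a multivariate analysis, p is the number of shape variables or scores; for centroid size or a single covariate, p = 1, so that the sum over j reduces to a single squared difference per branch.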

Output datasets

Each analysis produces two output datasets: one with the values of the changes along the branches of the phylogeny for the values of shape variables, centroid sizes and covariates in the original dataset, and the other with scores for phylogenetically independent contrasts (Felsenstein 1985) for the same variables.

Changes along branches

One of the output datasets contains changes along all the branches in the phylogeny. The observations in this dataset are the branches of the phylogeny. All the data types from the original dataset are transferred to the new one.

Depending on whether weighted or unweighted parsimony is used, the name of the new dataset starts with "BranchDiffs, weighted:" or "BranchDiffs, unweighted:" and ends with the name of the original dataset.

One way to use the changes along the branches is to conduct an evolutionary principal component analysis (EPCA; Schlick-Steiner et al. 2006; also implemented in the Rhetenor module of Mesquite, Maddison and Maddison 2007). To conduct an EPCA in MorphoJ, select the dataset with the changes along branches in the Project Tree window, and then choose Generate Covariance Matrix from the Preliminaries menu to produce a covariance matrix for the type of data that is of interest. Then select the covariance matrix and invoke Principal Component Analysis from the Variation menu. The results of this EPCA provide information about the shape changes that account for most of the evolutionary differentiation along the branches of the tree.

Note that the observations in this dataset usually are not independent of each other. After all, the changes along the branches were computed from a set of taxa that is smaller than the number of branches in the tree (unless the tree is a completely unresolved bush). For a completely resolved dichotomous tree with k terminal taxa, there are 2k – 2 branches, which therefore clearly cannot be independent of each other. For instance, there can be at most k – 1 non-zero eigenvalues in an EPCA. Because most statistical tests assume that the observations are independent, data on changes along the branches of a phylogeny should normally not be used in statistical tests.
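The limit of k – 1 non-zero eigenvalues can be seen from a short argument, given here as a sketch in notation that is not part of MorphoJ. Let X be the k × p matrix of taxon values and D the matrix of changes along the branches. The values reconstructed by squared-change parsimony are linear functions of the taxon values, so D = MX for some matrix M that depends only on the tree. Adding the same constant to all taxa shifts every node by that constant and leaves all the changes along branches unchanged, so M maps the vector of ones to zero.

```latex
\mathbf{D} = \mathbf{M}\,\mathbf{X}, \qquad
\mathbf{M}\,\mathbf{1}_k = \mathbf{0}
\;\Longrightarrow\;
\operatorname{rank}(\mathbf{D}) \le \operatorname{rank}(\mathbf{M}) \le k - 1
\;\Longrightarrow\;
\operatorname{rank}\!\left(\mathbf{D}^{\mathsf{T}}\mathbf{D}\right) \le k - 1 .
```

The same bound holds if the changes are centered before the covariance matrix is computed, because centering cannot increase the rank.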

Phylogenetically independent contrasts

Independent contrasts (Felsenstein 1985) were developed to address the problem of interdependence in data with a phylogenetic structure. The method is based on contrasts between sister nodes (nodes that share the same direct ancestor) in the phylogeny.
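The computation can be sketched for one variable on a fully dichotomous tree as follows, using the standard algorithm of Felsenstein (1985). The nested-tuple representation of the tree, the function name and the example values are assumptions made for this illustration. The contrasts are standardized by the square root of the summed branch lengths; whether the scores in the MorphoJ output dataset are scaled in exactly this way is not stated here and should not be inferred from this sketch.

```python
import math


def independent_contrasts(tree, tip_values):
    """Contrasts between sister nodes for one variable.

    tree -- nested tuples: a tip is (name, branch_length), an internal
            node is (left_subtree, right_subtree, branch_length)
    """
    contrasts = []

    def visit(node):
        # Returns (value, adjusted branch length) for this node.
        if len(node) == 2:
            name, length = node
            return tip_values[name], length
        left, right, length = node
        x1, v1 = visit(left)
        x2, v2 = visit(right)
        # Contrast between the two sister nodes, standardized by the
        # square root of the sum of their branch lengths.
        contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
        # Ancestral value: mean weighted by inverse branch lengths.
        value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
        # The branch below the ancestor is lengthened to reflect the
        # uncertainty of the estimated ancestral value.
        return value, length + v1 * v2 / (v1 + v2)

    visit(tree)
    return contrasts


# Hypothetical tree ((A:1,B:1):1,C:2) with made-up values.
tree = ((("A", 1.0), ("B", 1.0), 1.0), ("C", 2.0), 0.0)
values = {"A": 1.0, "B": 3.0, "C": 6.0}
contrasts = independent_contrasts(tree, values)
print([round(c, 4) for c in contrasts])  # [-1.4142, -2.1381]
```

A tree with k terminal taxa yields k – 1 contrasts. The sign of each contrast depends on the arbitrary order of the two sister nodes.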

MorphoJ produces an output dataset with the scores for the independent contrasts for the data in the original dataset. The name of this dataset consists of "IndContrasts:" followed by the name of the original dataset. The observations in this dataset are the contrasts between sister nodes.

In the construction of contrasts, polytomies are resolved by inserting additional nodes connected by zero-length branches (Felsenstein 1985; Rohlf 2001, p. 2154). For setting up the identifiers for the output dataset, these new nodes are not differentiated, but the naming or numbering of the original tree is maintained (numbers of nodes corresponding to polytomies will appear for multiple contrasts).

Phylogenetically independent contrasts can be used in a wide variety of further analyses to address questions on different aspects of evolutionary change (e.g. Klingenberg and Marugán-Lobón 2013). For instance, Drake and Klingenberg (2010) used matrix correlations between covariance matrices for independent contrasts and other levels of variation (within taxa, fluctuating asymmetry) to compare the patterns of integration at the different levels, and they also used independent contrasts to examine whether a hypothesis of modularity holds for evolutionary divergence of shape. Another study used independent contrasts to examine cranial integration, evolutionary allometry and modularity across birds (Klingenberg and Marugán-Lobón 2013).

References

Drake, A. G., and C. P. Klingenberg. 2010. Large-scale diversification of skull shape in domestic dogs: disparity and modularity. American Naturalist 175:289–301.

Felsenstein, J. 1985. Phylogenies and the comparative method. American Naturalist 125:1–15.

Klingenberg, C. P., and N. A. Gidaszewski. 2010. Testing and quantifying phylogenetic signals and homoplasy in morphometric data. Systematic Biology:245–261.

Klingenberg, C. P., and J. Marugán-Lobón. 2013. Evolutionary covariation in geometric morphometric data: analyzing integration, modularity and allometry in a phylogenetic context. Systematic Biology: advance online, DOI: 10.1093/sysbio/syt025.

Maddison, W. P. 1991. Squared-change parsimony reconstructions of ancestral states for continuous-valued characters on a phylogenetic tree. Systematic Zoology 40:304–314.

Maddison, W. P., and D. R. Maddison. 2007. Mesquite: a modular system for evolutionary analysis. Version 2.01. http://mesquiteproject.org

Rohlf, F. J. 2001. Comparative methods for the analysis of continuous variables: geometric interpretations. Evolution 55:2143–2160.

Schlick-Steiner, B. C., F. M. Steiner, K. Moder, B. Seifert, M. Sanetra, E. Dyreson, C. Stauffer, and E. Christian. 2006. A multidisciplinary approach reveals cryptic diversity in Western Palearctic Tetramorium ants (Hymenoptera: Formicidae). Molecular Phylogenetics and Evolution 40:259–273.