Modularity: Evaluate Hypothesis

Modularity is an important principle of organization in biological systems, which is also manifest at the morphological level (Klingenberg 2008). An important task for morphometric studies is therefore to evaluate hypotheses of modularity. MorphoJ implements a method for doing this, which is described in a paper (Klingenberg 2009) that should be consulted by those intending to use the analysis.

Modules are units within which there is a high degree of integration from many and/or strong interactions, but which are relatively independent of other such units. The nature of the interactions can be, for instance, developmental, functional, or genetic, depending on the context of the study. For a morphometric analysis, these interactions will be manifest as strong covariation among parts within modules and weak covariation between modules.

Evaluating hypotheses of modularity therefore addresses the question of whether the hypothetical modules correspond to units with a low degree of covariation between them. To assess the hypothesis, the degree of covariation between the hypothesized modules can be compared with that for alternative partitions of the total structure into parts.

For morphometric studies of modularity, a configuration of landmarks can be subdivided into subsets corresponding to the hypothesized modules. If the partition corresponds to the true boundary between modules (blue line in the diagram), covariation between subsets is expected to be weak because it reflects only the weak between-module covariation. In contrast, if the partition into subsets cuts across modules (red line), covariation between subsets is expected to be stronger, because the subsets are linked by the strong within-module integration. If the hypothesis of modularity is true, the covariation between subsets of landmarks corresponding to the hypothesis should be lower than for different subdivisions of the landmarks (Klingenberg 2009).

This reasoning provides an approach for evaluating a-priori hypotheses about modularity by comparing the degree of covariation for the hypothesis to the range of covariation for alternative partitions. If the hypothesized subsets of landmarks correspond to the true modules, a lower covariation is expected for this partition than for any other subdivision of landmarks. Low covariation, by itself, does not imply modularity, but it is a prediction of the modularity hypothesis. Therefore, if covariation between the subsets for the hypothesis of modularity is not weaker than for most or all of the alternative partitions, the hypothesis of modularity can be rejected (Klingenberg 2009).

Background

To evaluate a hypothesis of modularity in a configuration of landmarks, the user needs to specify a partition of the landmarks into two or more subsets that correspond to the hypothesized modules. MorphoJ then compares the strength of covariation for this subdivision with that for either all or a large number of the possible alternative partitions that have the same numbers of landmarks as the hypothesized modules (Klingenberg 2009).

The RV coefficient

The method uses the RV coefficient (Escoufier 1973) as a measure of covariation between two sets of landmarks.

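The covariance matrix of Procrustes coordinates can be rearranged so that the landmark coordinates of the two subsets are in separate blocks:

\[
\mathbf{S} = \begin{bmatrix} \mathbf{S}_{1} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{2} \end{bmatrix}
\]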

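In this matrix, the blocks S₁ and S₂ are the covariance matrices within the first and second subsets of landmarks. The off-diagonal blocks S₁₂ and S₂₁ are the matrices of covariances between the two subsets (S₂₁ is the transpose of S₁₂). The RV coefficient can then be written as

\[
\mathrm{RV} = \frac{\mathrm{trace}(\mathbf{S}_{12}\mathbf{S}_{21})}{\sqrt{\mathrm{trace}(\mathbf{S}_{1}\mathbf{S}_{1})\,\mathrm{trace}(\mathbf{S}_{2}\mathbf{S}_{2})}}
\]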

The trace of a square matrix is the sum of its diagonal elements. The expression trace(S₁₂S₂₁) is the sum of the squared covariances between the variables in the two blocks, and is thus a measure of the total (squared) covariation. Likewise, trace(S₁S₁) and trace(S₂S₂) are the sums of the squared variances and covariances within the two blocks, and are thus measures of the (squared) variation within the blocks. In other words, the RV coefficient is roughly analogous to a squared correlation coefficient.
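As an illustration of the formula described above, here is a minimal Python sketch that computes the RV coefficient from a covariance matrix. It is not MorphoJ's code. The function names `coordinate_indices` and `rv_coefficient`, the nested-list representation of the matrix, and the assumption that coordinates are stored landmark by landmark (x1, y1, x2, y2, ...) are all choices made for this example.

```python
def coordinate_indices(landmarks, dim=2):
    """Return the 0-based coordinate indices for the given landmarks.

    Assumes (for this sketch) that the variables are ordered landmark by
    landmark: x1, y1, x2, y2, ... in 2D, or x1, y1, z1, ... in 3D.
    """
    return [lm * dim + axis for lm in landmarks for axis in range(dim)]


def rv_coefficient(cov, idx1, idx2):
    """RV coefficient between two blocks of variables.

    cov  : symmetric covariance matrix as a list of lists
    idx1 : indices of the variables in the first subset
    idx2 : indices of the variables in the second subset
    """
    def sum_of_squares(rows, cols):
        # Because cov is symmetric, trace(A A') for a block A equals the
        # sum of the squared elements of A.
        return sum(cov[i][j] ** 2 for i in rows for j in cols)

    between = sum_of_squares(idx1, idx2)  # trace(S12 S21)
    within1 = sum_of_squares(idx1, idx1)  # trace(S1 S1)
    within2 = sum_of_squares(idx2, idx2)  # trace(S2 S2)
    return between / (within1 * within2) ** 0.5


# Two single variables with unit variance and correlation 0.5:
# the RV coefficient equals the squared correlation.
two_vars = [[1.0, 0.5],
            [0.5, 1.0]]
print(rv_coefficient(two_vars, [0], [1]))  # 0.25
```

With one variable in each block, the result reduces to the squared correlation between the two variables, which is the sense in which the RV coefficient is analogous to a squared correlation coefficient.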

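If there are more than two sets of landmarks, the multi-set RV coefficient (Klingenberg 2009) can be used, which is computed as the average of the RV coefficients between all pairs of landmark sets:

\[
\mathrm{RV}_{M} = \frac{2}{k(k-1)} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \mathrm{RV}(i, j)
\]

Here k is the number of landmark sets and RV(i, j) is the RV coefficient between sets i and j.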
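The averaging over all pairs of landmark sets can be sketched in Python as follows. This is an illustrative sketch, not MorphoJ's implementation. The names `rv_coefficient` and `multiset_rv` and the nested-list covariance matrix are assumptions made for this example.

```python
from itertools import combinations


def rv_coefficient(cov, idx1, idx2):
    """RV coefficient between two blocks of a symmetric covariance matrix."""
    def sum_of_squares(rows, cols):
        return sum(cov[i][j] ** 2 for i in rows for j in cols)

    between = sum_of_squares(idx1, idx2)
    return between / (sum_of_squares(idx1, idx1) * sum_of_squares(idx2, idx2)) ** 0.5


def multiset_rv(cov, subsets):
    """Average of the pairwise RV coefficients over all pairs of subsets.

    subsets : list of lists of variable indices, one list per subset
    """
    pairs = list(combinations(subsets, 2))
    return sum(rv_coefficient(cov, a, b) for a, b in pairs) / len(pairs)


# Hypothetical example: three variables, only the first two covary.
cov = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
# Pairwise RV values are 0.25, 0 and 0, so the average is 0.25 / 3.
print(multiset_rv(cov, [[0], [1], [2]]))
```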

MorphoJ computes the RV coefficient or multi-set RV coefficient for the partition of the landmarks specified as the hypothesis by the user and either for all possible alternative partitions or for a large number of random alternative partitions.

Spatial contiguity

In some biological contexts, the integration of landmarks within modules is assumed to rely on tissue-bound developmental interactions, and modules are therefore expected to be spatially contiguous units, e.g. structures arising from a particular developmental field (for discussion, see Klingenberg 2009).

To determine whether a set of landmarks is spatially contiguous, MorphoJ uses an adjacency graph where neighboring landmarks are connected by the edges of the graph (Klingenberg 2009). A set of landmarks is contiguous if all the landmarks are connected, and a partition of the entire landmark configuration into subsets is contiguous if all subsets are contiguous (Klingenberg 2009).
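The contiguity criterion can be illustrated with a short breadth-first search. This is a sketch under assumptions, not MorphoJ's code: the function names are hypothetical, the adjacency graph is represented as a list of landmark pairs, and a subset is treated as connected only through edges whose two endpoints both lie inside that subset.

```python
from collections import deque


def is_contiguous(subset, edges):
    """True if all landmarks in `subset` are connected to each other
    using only edges between landmarks of that subset (an assumption)."""
    subset = set(subset)
    if not subset:
        return False
    # Build the neighbor lists restricted to the subset.
    neighbors = {lm: set() for lm in subset}
    for a, b in edges:
        if a in subset and b in subset:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Breadth-first search from an arbitrary landmark of the subset.
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == subset


def partition_is_contiguous(partition, edges):
    """A partition is contiguous if every one of its subsets is contiguous."""
    return all(is_contiguous(subset, edges) for subset in partition)


# Hypothetical example: four landmarks linked in a chain 0-1-2-3.
chain = [(0, 1), (1, 2), (2, 3)]
print(partition_is_contiguous([[0, 1], [2, 3]], chain))  # True
print(partition_is_contiguous([[0, 2], [1, 3]], chain))  # False
```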

As a starting point for the adjacency graph, MorphoJ produces a Delaunay triangulation in two or three dimensions. In most biological applications, however, this is not a realistic characterization of the relations between landmarks, and therefore needs to be modified by the user. MorphoJ offers a user interface to add or remove edges of the adjacency graph or to copy the adjacency graph from one analysis to another.

The adjacency graph can influence the performance considerably. If there are relatively few edges in the graph, the proportion of partitions that are spatially contiguous can fall quite dramatically. For instance, if landmarks are distributed along an elongate structure such as a mandible and the edges of the adjacency graph mainly link landmarks along the long axis, there may be very few partitions that are contiguous. It may therefore be a good idea to be somewhat generous with the links, including edges in the adjacency graph for all pairs of landmarks that can plausibly interact directly.
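The effect of a sparse adjacency graph can be seen in a toy example. The sketch below counts how many partitions of six landmarks into two subsets of three are contiguous, first for a simple chain and then for a graph with some extra links. Everything here is hypothetical and for illustration only: the function names, the two graphs, the restriction to an even number of landmarks split into equal halves, and the rule that a subset is connected only through edges between its own landmarks.

```python
from itertools import combinations


def is_contiguous(subset, edges):
    """True if the subset is connected using only edges within the subset."""
    subset = set(subset)
    reached, frontier = set(), [min(subset)]
    while frontier:
        current = frontier.pop()
        if current in reached:
            continue
        reached.add(current)
        for a, b in edges:
            if a == current and b in subset:
                frontier.append(b)
            elif b == current and a in subset:
                frontier.append(a)
    return reached == subset


def count_contiguous_halves(n_landmarks, edges):
    """Count partitions into two equal halves, and how many are contiguous.

    Landmark 0 is always placed in the first half so that each partition
    is counted once. Assumes n_landmarks is even.
    """
    half = n_landmarks // 2
    total = contiguous = 0
    for rest in combinations(range(1, n_landmarks), half - 1):
        first = {0, *rest}
        second = set(range(n_landmarks)) - first
        total += 1
        if is_contiguous(first, edges) and is_contiguous(second, edges):
            contiguous += 1
    return contiguous, total


# Landmarks along an elongate structure, linked only along the long axis.
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
# The same landmarks with additional links to next-but-one neighbors.
denser = chain + [(0, 2), (1, 3), (2, 4), (3, 5)]

print(count_contiguous_halves(6, chain))   # (1, 10)
print(count_contiguous_halves(6, denser))
```

For the chain, only one of the ten possible partitions is contiguous, so a comparison restricted to contiguous partitions would have almost nothing to compare against. Adding links increases the number of contiguous partitions.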

Comparisons of alternative subsets

The analysis compares the RV coefficient or multi-set RV coefficient for the partition of the landmark configuration into the hypothesized modules with alternative partitions into subsets of the same numbers of landmarks.

The user can choose whether all possible partitions are enumerated (limited to those that are spatially contiguous, if so requested) or whether a number of random partitions of the configuration is to be used instead (the default number for this option is 10,000 partitions). The number of possible partitions increases rapidly with the total number of landmarks and the number of subsets. Therefore, the time for computing RV coefficients for all possible partitions can be substantial. With large numbers of landmarks and multiple subsets, it may be a good idea to try with a fixed number of random partitions before running an analysis with the full enumeration.
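To get a feeling for how quickly the number of partitions grows, and for what a random partition with fixed subset sizes looks like, consider the following sketch. It is an illustration only and makes no claim about how MorphoJ counts or samples partitions. The function names are hypothetical, and subsets of equal size are treated as interchangeable, which is an assumption of this example.

```python
import math
import random
from collections import Counter


def count_partitions(sizes):
    """Number of ways to split sum(sizes) landmarks into unlabeled subsets
    of the given sizes (no contiguity restriction)."""
    count = math.factorial(sum(sizes))
    for size in sizes:
        count //= math.factorial(size)
    # Subsets of equal size are interchangeable, so divide out their orderings.
    for multiplicity in Counter(sizes).values():
        count //= math.factorial(multiplicity)
    return count


def random_partition(sizes, rng):
    """Draw one random partition of landmarks 0..n-1 with the given sizes."""
    landmarks = list(range(sum(sizes)))
    rng.shuffle(landmarks)
    partition, start = [], 0
    for size in sizes:
        partition.append(sorted(landmarks[start:start + size]))
        start += size
    return partition


print(count_partitions([3, 3]))      # 10
print(count_partitions([8, 8]))      # 6435
print(count_partitions([5, 5, 10]))  # 23279256

rng = random.Random(1)  # fixed seed so the example is repeatable
print(random_partition([5, 5, 10], rng))
```

Going from 16 landmarks in two subsets to 20 landmarks in three subsets raises the count from a few thousand to more than 23 million, which is why a fixed number of random partitions is a sensible first step.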

Requesting the analysis

To start the analysis, select a covariance matrix (recognizable by its icon in the Project Tree) in the Project Tree tab. Then select Modularity: Evaluate Hypothesis from the Covariation menu.

A user interface like the following will appear, which has three tabs labeled Start analysis, Select subsets of landmarks, and Modify adjacency graph. (The example shown here is in three dimensions and has object symmetry; details of the user interface may vary in different datasets.)

The tab Start analysis contains the main controls for the analysis.

The first element is a text field for specifying a name for the analysis, which will be displayed in the Project Tree.

Below it, there is a check box to specify whether only spatially contiguous partitions of landmarks or all partitions are to be included in the comparisons. Whether or not this option should be selected depends on the biological context of the study (Klingenberg 2009).

The next choice is whether the analysis should use a full enumeration of all possible partitions (limited to spatially contiguous ones, if that option was chosen) or a number of random partitions. If the latter option is selected, the text field for entering the number of random partitions is activated.

Below these elements, there is a diagram of the landmark configuration showing both the subsets of landmarks specified as the hypothesis of modularity (colored dots) and the adjacency graph for establishing spatially contiguous partitions (used only if that option was selected). In the screen shot above, no subsets are indicated because no partition has been selected yet (accordingly, there is a warning notice to the right of the graph).

After the user has specified a partition of the landmarks and modified the adjacency graph (see below), the user interface may appear as follows:

Clicking the Accept button will start the analysis or, if some information is missing, may redirect the user to the interface for entering that information (if no partition of the landmarks has been specified). Clicking Cancel will abort the analysis and activate the Project Tree tab.

Before this can be done, however, a hypothesis of modularity needs to be specified.

The tab Select subsets of landmarks is for specifying which subsets of the landmarks belong to the hypothesized modules.

The user should start by specifying the number of subsets, using the drop-down menu in the upper right corner of the user interface.

The landmarks can then be assigned to the subsets by dragging landmarks from the diagram to the colored bar below. Alternatively, the user can select one or more landmarks in the list on the right side of the user interface and assign them to a subset using the drop-down menu and the Assign button below the list.

For landmark configurations with object symmetry, paired landmarks are always treated equally (if one landmark of a pair is included in a subset, the other landmark of the pair is automatically assigned to the same subset as well).

If there is a set with fewer than two landmarks, a warning message is shown and the Accept button is deactivated. Each subset must contain at least two landmarks (preferably more).

Clicking the Accept button will record the partition of landmarks and redirect the user to the tab Start analysis, which contains the main control elements for the analysis (see above). The Cancel button will abort the analysis and activate the Project Tree tab.

The tab Modify adjacency graph is only relevant if the user requests an analysis where only spatially contiguous partitions of landmarks are used. In this case, the interface presented in this tab is used for specifying the adjacency graph that is the criterion for defining which sets of landmarks are spatially contiguous.

This interface creates the adjacency graph in a manner very similar to editing a wireframe (see Create or Edit Wireframe in the Preliminaries menu). There are only a few differences in the user interface.

New links can be added to the adjacency graph by dragging from one landmark to another in the diagrams to the left or by using the drop-down menus and the button Link landmarks to the right of the interface. Existing links can be deleted by dragging between the two landmarks in one of the diagrams or by selecting the link in the list, and then clicking the button Delete link. (For more detail, see the documentation page on Create or Edit Wireframe in the Preliminaries menu.)

If there is object symmetry in the covariance matrix under study, the symmetry of the adjacency graph will always be enforced.

If the user already has an existing adjacency graph in a different analysis of a modularity hypothesis, it can be copied and pasted into the user interface.

Clicking the Accept button will make the changes in the adjacency graph permanent, whereas the Cancel button is to leave the adjacency graph unchanged. After clicking either button, the tab Select subsets of landmarks is activated.

Copying and pasting landmark partitions and adjacency graphs

If there is already an existing analysis of a modularity hypothesis, the information on the subsets of landmarks and on the adjacency graph can be copied to the new analysis instead of entering it anew.

The graphs in the Hypothesis and Minimum covariation tabs of the graphical output from the analysis have an extra item Copy Information in their pop-up menus:

Selecting Copy Information will copy the information on the partition of landmarks and the adjacency graph onto your computer's system clipboard. This makes it possible to paste the relevant information in the Select subsets of landmarks tab (select Paste Partition in the pop-up menu of the diagram) or in the Modify adjacency graph tab (select Paste Connectivity in the pop-up menu of the diagram).

Note that copying and pasting information in this way is possible only if the numbers of landmarks are the same and corresponding landmarks are listed in the same order in both configurations. Violating these requirements may produce nonsensical results (if the numbers of landmarks differ, MorphoJ won't paste the information).

Graphical output

The analysis will produce graphical output in three tabs inside the Graphics tab:

The Hypothesis tab shows a diagram of the partition of the data corresponding to the hypothesized modules (including the adjacency graph).

The RV coefficient tab (or Multi-set RV coefficient, for partitions into three or more subsets) shows a histogram of the distribution of the RV coefficients for the alternative partitions that were evaluated and a red arrow indicating the RV coefficient for the hypothesis (for partitions with three or more subsets, the values are multi-set RV coefficients).
In this graph, the size of the arrow can be adjusted by using the command "Choose the Size of the Arrow" in the pop-up menu of the graph. The default size is 1.0, and the number entered is used as a scale factor (e.g. a value of 0.75 results in an arrow 75% of the default size, a value of 2.0 produces an arrow double the size of the original).

Finally, the Minimum covariation tab shows the partition that resulted in the smallest RV coefficient or multi-set RV coefficient (note: this partition cannot necessarily be taken as a guide to delimit modules).

Text output

The following text output will be appended to the information in the Results tab:

First, there is some information on the hypothesized modules (number of subsets, landmarks in the subsets). Then, the RV coefficient or multi-set RV coefficient for the partition into the hypothesized modules is listed.

Below that, the results from the comparisons with alternative partitions are presented. First, there is a description of the alternative subsets (whether a full enumeration or a fixed number of random partitions was used) and the number of partitions that were examined. Then, the number and the proportion of alternative partitions with RV coefficients or multi-set RV coefficients that are smaller than or equal to the value for the hypothesis are shown.
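The count and proportion reported here are simple to reproduce once the RV coefficients are available. The sketch below is illustrative only. The function name `proportion_at_or_below` is hypothetical and the numbers are made up for the example; they are not output from MorphoJ.

```python
def proportion_at_or_below(rv_hypothesis, rv_alternatives):
    """Count and proportion of alternative partitions whose RV coefficient
    is smaller than or equal to the value for the hypothesis."""
    count = sum(1 for rv in rv_alternatives if rv <= rv_hypothesis)
    return count, count / len(rv_alternatives)


# Made-up values for illustration only.
rv_hypothesis = 0.2
rv_alternatives = [0.1, 0.2, 0.3, 0.4]
count, proportion = proportion_at_or_below(rv_hypothesis, rv_alternatives)
print(count, proportion)  # 2 0.5
```

A small proportion means that few alternative partitions show covariation as weak as the hypothesized one, which is what the hypothesis of modularity predicts. A large proportion means that the hypothesized partition is not distinguished by weak covariation.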

Finally, there is information about the partition that corresponds to the lowest RV coefficient or multi-set RV coefficient.

References

Escoufier, Y. 1973. Le traitement des variables vectorielles. Biometrics 29:751–760.

Klingenberg, C. P. 2008. Morphological integration and developmental modularity. Annual Review of Ecology, Evolution and Systematics 39:115–132.

Klingenberg, C. P. 2009. Morphometric integration and modularity in configurations of landmarks: Tools for evaluating a-priori hypotheses. Evolution & Development 11:405–421.