Popup menu in scatter plots

Scatter plots of scores from various analyses have a range of specific options that can be controlled using the popup menu of the graph. These are explained below.

For explanations of the generic menu items Print, Export to SVG File and Close Panel, follow this link.

[Figure: Popup menu in a scatter plot]

Choose ... for the Horizontal Axis

The variables to be displayed in the scatter plot can be selected with the menu items Choose ... for the Horizontal Axis and Choose ... for the Vertical Axis. Selecting either one of them will bring up a dialog box like the following (from a CVA in this example):

[Figure: Dialog box for choosing variable for horizontal axis of the plot]

The drop-down menu contains all the available variables. Select one of them and click OK to change the graph or click Cancel to leave the scatter plot unchanged.

Use Same Scaling for Both Axes

If this option is selected, MorphoJ scales the horizontal and vertical axes of the plot so that they have equal scaling for the respective variables (the same distance on the screen per unit of each variable). This is desirable if the axes are in the same units (e.g. scores in a PCA or CVA). It is less relevant if the axes are in different units (e.g. the scores for different blocks in a PLS analysis, or the dependent and independent variables in a regression).
If this option is not selected, MorphoJ scales each axis of the plot separately to fit the data into the available space of the screen. As a consequence, the scaling of the horizontal and vertical axes is generally not the same.

By selecting Use Same Scaling for Both Axes, the user can specify that both axes of the plot are scaled equally, so that distances in the plot are preserved and the relative amounts of variation in the two directions are directly visible.
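The difference between the two settings can be illustrated with a short sketch. The following Python function is not taken from MorphoJ: the name `axis_scales` and its parameters are hypothetical, and it assumes that the plot area is given in pixels and that each axis is scaled from the range of its scores.

```python
def axis_scales(x_values, y_values, width_px, height_px, same_scaling):
    """Return (pixels per unit on x, pixels per unit on y).

    Hypothetical illustration only, not MorphoJ's actual code.
    """
    x_range = max(x_values) - min(x_values)
    y_range = max(y_values) - min(y_values)
    if x_range <= 0 or y_range <= 0:
        raise ValueError("each variable needs at least two distinct values")

    # Without the option: each axis is stretched to fill the available space.
    scale_x = width_px / x_range
    scale_y = height_px / y_range

    # With the option: both axes use the smaller factor, so that one unit
    # covers the same distance on the screen in both directions.
    if same_scaling:
        scale_x = scale_y = min(scale_x, scale_y)
    return scale_x, scale_y


# Scores spanning 10 units horizontally and 2 units vertically in a
# 500 x 400 pixel plot area.
print(axis_scales([0, 10], [0, 2], 500, 400, same_scaling=False))
print(axis_scales([0, 10], [0, 2], 500, 400, same_scaling=True))
```

With equal scaling, the vertical axis in this example uses only a fraction of the available height, which is the price for preserving distances in the plot.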

Use Same Scaling for ALL Axes

This option ensures that the scaling of the axes of all possible scatter plots is equal. This is useful if the investigator wants to combine multiple plots for comparison. As a consequence, however, some plots may not fill the space available in the graphics window. Note that it is assumed that the first axis (in most analyses, this is the one with the greatest range of values) is used as the horizontal axis, as is conventional. Changing the size of the MorphoJ frame (and therefore the graphics window) will change the scaling of all plots. To obtain multiple plots that are scaled consistently, the user should therefore not resize the MorphoJ window.
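The dependence of the common scaling on the window size can be sketched as follows. This is a guess at one plausible rule, not MorphoJ's documented behaviour: the function `shared_scale` is hypothetical, and it assumes that the first variable is fitted to the width of the plot area and that every other variable must fit into its height.

```python
def shared_scale(score_columns, width_px, height_px):
    """Return one pixels-per-unit factor usable for every pair of variables.

    Hypothetical illustration only. Assumes the first column is plotted
    horizontally and any of the remaining columns vertically.
    """
    ranges = [max(column) - min(column) for column in score_columns]
    if len(ranges) < 2:
        raise ValueError("need at least two variables")
    widest_vertical = max(ranges[1:])
    if ranges[0] <= 0 or widest_vertical <= 0:
        raise ValueError("variables need a positive range")

    horizontal_fit = width_px / ranges[0]
    vertical_fit = height_px / widest_vertical
    return min(horizontal_fit, vertical_fit)


scores = [[0, 10], [0, 4], [0, 2]]      # three variables, made-up values
print(shared_scale(scores, 500, 400))   # scale for one window size
print(shared_scale(scores, 1000, 400))  # a wider window gives another scale
```

Because the factor is derived from the size of the plot area, plots produced before and after resizing the window would not share the same scale under this rule.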

Selecting Use Same Scaling for ALL Axes automatically also selects Use Same Scaling for Both Axes. Likewise, unselecting Use Same Scaling for Both Axes automatically unselects Use Same Scaling for ALL Axes.

Color the Data Points

MorphoJ can color the data points of a scatter plot according to the values of a classifier variable. For instance, such a classifier might indicate species, geographic origin or sex. To set up the association of the colors and the groups indicated by a classifier, select Color the Data Points from the popup menu. This command also changes the coloring for confidence ellipses or convex hulls, if the option for coloring according to groups defined by a classifier variable was chosen.

[Figure: Dialog box for choosing colors of data points etc.]

If the check box labeled 'Use a classifier variable ...' is selected, a drop-down menu is activated, in which the user can select the variable that is to be used as the grouping criterion (this classifier is 'Species' in the screen shot above). A new choice of this variable eliminates any existing choice of colors.

If a classifier variable has been chosen, the list labeled 'Classes' contains all the values that occur for this classifier (in the example, these are 's1/f', 's1/m' etc.). Initially, each of these values is shown in a different color. This color will be used for the data points of the respective group in the scatter plots.
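As an illustration of how each class can initially receive a different color, the following sketch spreads the classes evenly around the hue circle. How MorphoJ actually chooses its initial colors is not stated here; the function `default_class_colors` and the saturation and brightness values are assumptions made for this example.

```python
import colorsys


def default_class_colors(labels):
    """Map every distinct classifier value to its own hex color.

    Hypothetical illustration only, not MorphoJ's actual color scheme.
    """
    classes = sorted(set(labels))
    colors = {}
    for index, name in enumerate(classes):
        hue = index / len(classes)  # evenly spaced hues in [0, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 0.85, 0.85)
        colors[name] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


# One classifier value per observation, as in the 'Species' example.
print(default_class_colors(["s1/f", "s1/m", "s2/f", "s1/f", "s2/m"]))
```

Choosing a different classifier would rebuild this mapping from scratch, which corresponds to the loss of any existing choice of colors described above.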

To change the colors assigned to specific groups, select one or more of the values in the list, choose a new color in the interface to the right, and then click the button Use Color.

Finally, click OK to use the changed colors, or Cancel to stop.

If the check box labeled 'Use a classifier variable ...' is not selected, the dialog box can be used to select a color for all data points. Select the new color, click Use Color, and then click OK.

Resize Data Points

To change the size of the dots that indicate the data points, use the menu item Resize Data Points. A dialog box like the following will appear:

[Figure: The dialog box for resizing the data points in a scatter plot]

The text field indicates the current diameter of the dots in pixels. This value can be changed by the user. Click OK to update the graph with the new size of dots or click Cancel to leave it unchanged.

Label Data Points

If the check box for Label Data Points is selected, the identifier strings for all the observations are shown in the graph. This is useful if there are only a few observations, but the graph becomes cluttered with bigger datasets. If you have many observations, do not select this option, but check the identity of individual data points by shift-clicking on them (this will invoke a dialog box with the identifier string).

Confidence Ellipses

It is possible to add confidence or equal frequency ellipses to a scatter graph. To do so, select Confidence Ellipses from the popup menu. A dialog box like the following will appear:

[Figure: Dialog box for requesting confidence ellipses]

The checkbox at the top, Draw ellipse(s), determines whether or not ellipses are going to be drawn in the graph.

The two radio buttons Equal frequency ellipse(s) and Confidence ellipse(s) for mean(s) determine what type of ellipses are to be drawn.

For a given probability level, the equal frequency ellipse is the ellipse that contains a randomly drawn data point from the sampled distribution with that probability. In other words, it is the ellipse that encloses a proportion of data points in the sample that corresponds to the probability (e.g., the 90% equal frequency ellipse contains about 90% of the data points, and each data point has a probability of 0.9 of falling within the ellipse).
The confidence ellipse for the mean at a given probability is the ellipse that, if the sampling process were repeated over and over, would contain the true population mean with that probability (note that this is not the same as saying that the true mean has that probability of lying within one particular confidence ellipse).
The calculations of both the equal frequency and confidence ellipses assume that the data (or more specifically, the scores in the graph) follow a multivariate normal distribution. This is usually a reasonable approximation (because the scores are computed as linear combinations of Procrustes coordinates, the central limit theorem can be invoked). Still, confidence and equal frequency ellipses should be interpreted cautiously.
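The two kinds of ellipses can be compared in a short sketch based on standard results for the bivariate normal distribution. The exact formulas used by MorphoJ are not given here, so the following is an assumption: the equal frequency ellipse uses the chi-square quantile with two degrees of freedom (a large-sample approximation), and the confidence ellipse for the mean uses Hotelling's T-squared. All function names are hypothetical.

```python
import math


def mean_and_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def equal_frequency_radius_sq(p):
    """Squared Mahalanobis radius: chi-square quantile with 2 df."""
    return -2.0 * math.log(1.0 - p)


def confidence_radius_sq(p, n):
    """Squared Mahalanobis radius for the mean (Hotelling's T-squared).

    Uses the closed-form quantile of the F distribution with 2 and
    n - 2 degrees of freedom, so it needs at least three observations.
    """
    if n < 3:
        raise ValueError("need at least three observations")
    return (n - 1) / n * ((1.0 - p) ** (-2.0 / (n - 2)) - 1.0)


def ellipse_axes(cov, radius_sq):
    """Semi-major axis, semi-minor axis and angle of the major axis."""
    sxx, sxy, syy = cov
    half_trace = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = half_trace + root             # larger eigenvalue
    lam_minor = max(half_trace - root, 0.0)   # smaller eigenvalue
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.sqrt(radius_sq * lam_major),
            math.sqrt(radius_sq * lam_minor),
            angle)


# Made-up scores for illustration.
scores = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0), (1.0, 0.5)]
center, cov = mean_and_cov(scores)
print(center)
print(ellipse_axes(cov, equal_frequency_radius_sq(0.9)))
print(ellipse_axes(cov, confidence_radius_sq(0.9, len(scores))))
```

Both ellipses share the same center and orientation. With larger samples the confidence ellipse for the mean shrinks, whereas the equal frequency ellipse continues to reflect the spread of the data.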

The check box Use a classifier as a criterion for grouping observations determines if a single ellipse is drawn for the entire dataset or if separate ellipses are to be drawn for several groups. If the check box is selected, use the drop-down menu below it to select the classifier to be used as the criterion for forming groups.

To use the same classifier for choosing colors for distinguishing groups, select the option Use this classifier to determine the colors of the ellipses and data points. Note that this selection may override colors that were already chosen for the same scatter plot (it will do so if the user chooses a classifier different from the one previously used as a coloring criterion). For adjusting the colors per se, use Color the Data Points.

Because the ellipses may extend outside the area of the graph, there is an option Clip the ellipse(s) at the margins of the graph. If this option is selected, the ellipses are drawn only inside the rectangular boundaries of the scatter plot.

Finally, the option Show the data points can be de-selected if a scatter plot contains very many data points and many groups, so that it would be confusing to have all the data points in the plot. In this situation, it may be easier to show the ellipses only (click the check box to select or unselect this option).

Click OK to update the graph with the new selections for ellipses or click Cancel to leave it unchanged.

Note that ellipses are only drawn for groups with three or more observations.

Convex Hulls

Convex hulls are an alternative to equal frequency ellipses, especially when sample sizes are relatively small. Convex hulls are convex polygons (areas without any 'dents' where the contour is concave toward the outside) enclosing all the data points in a scatter plot, usually drawn for groups of items in the plot (e.g. taxa, populations, etc.). If large sample sizes are available, equal frequency ellipses tend to give a better impression of the distribution of points because they take into account all data points in a sample, whereas convex hulls, by definition, focus only on the extreme points in every direction.

To add convex hulls to a scatter plot, select Convex Hulls in the popup menu.

[Figure: Dialog box for adding convex hulls to a scatter plot]

The checkbox at the top, Draw convex hull(s), determines whether or not convex hulls are going to be drawn in the graph.

The check box Use a classifier as a criterion for grouping observations determines if a single convex hull is drawn for the entire dataset or if separate convex hulls are to be drawn for several groups. If the check box is selected, use the drop-down menu below it to select the classifier to be used as the criterion for forming groups.

To use the same classifier for choosing colors for distinguishing groups, select the option Use this classifier to determine the colors of the convex hulls and data points. Note that this selection may override colors that were already chosen for the same scatter plot (it will do so if the user chooses a classifier different from the one previously used as a coloring criterion). For adjusting the colors per se, use Color the Data Points.

Finally, the option Show the data points can be de-selected if a scatter plot contains very many data points and many overlapping groups, so that it would be confusing to have all the data points in the plot. In this situation, it may be easier to show the convex hulls only (click the check box to select or unselect this option).

Click OK to update the graph with the new selections for convex hulls or click Cancel to leave it unchanged.

For groups containing only two data points, convex hulls are lines between the corresponding pairs of points. Convex hulls do not exist for single data points.
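For readers who want to see how a convex hull can be computed, the following sketch uses Andrew's monotone chain algorithm. Which algorithm MorphoJ uses is not stated here, so this is an illustration under that assumption, and the function names are hypothetical. It also shows the special cases described above: two points yield a line segment, and a single point yields nothing that could be drawn as a hull.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (monotone chain).

    With one or two distinct points, those points are returned as they
    are; a caller would draw a line for two points and nothing for one.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        # Drop the last vertex while it would create a 'dent' (non-left turn).
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# A square with one interior point: only the four corners form the hull.
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# Two points: the 'hull' is the line between them.
print(convex_hull([(0, 0), (1, 1)]))
```

The interior point in the first example does not affect the result, which reflects the point made above that convex hulls depend only on the extreme points in every direction.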