Link Datasets

To relate observations in different datasets, the datasets must be linked. For instance, if a regression analysis uses dependent and independent variables from two different datasets, or if a partial least-squares analysis examines the covariation between the shapes of two different structures, the link between the datasets establishes the one-to-one correspondence between the observations in the datasets.

When an analysis produces a new dataset from the observations of another one, MorphoJ automatically links these datasets, because they are based on the same observations. For instance, after a principal component analysis, you can use the PC scores in the new dataset and the covariates from the original dataset in a regression.

Automatic linking is not possible, however, for datasets that the user creates from imported data. For instance, if different datasets contain data from different bones of the same sample of individuals, you need to request a link between them explicitly.

To do this, select Link datasets from the Preliminaries menu. The following dialog box will appear:

The two drop-down menus list the datasets in the active project (the dataset selected in the first menu is omitted from the second). In the screenshot above, there is a dataset named 'Keratohyale' and another named 'Hyomandibulare'. Use the menus to select the two datasets you want to link.

Then click the Link button to proceed or the Cancel button to stop.

Note that the identifiers for the observations must match exactly, because the links between datasets are established using the identifiers.
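Because links rely on identifiers matching exactly, small differences such as capitalisation or a stray space can prevent observations from being paired (assuming "exactly" means plain string equality, as this sketch does). The following Python sketch is not part of MorphoJ and `unmatched_identifiers` is a hypothetical name. It assumes you have the identifiers of each dataset as a list of strings and reports those that have no exact counterpart in the other dataset.

```python
def unmatched_identifiers(ids1, ids2):
    """Return identifiers that occur in only one of the two datasets.

    Comparison is plain string equality, so case, spacing and punctuation
    all matter. This is an illustration of the "must match exactly" rule,
    not MorphoJ's own code.
    """
    set1, set2 = set(ids1), set(ids2)
    only_in_1 = sorted(set1 - set2)  # identifiers with no partner in dataset 2
    only_in_2 = sorted(set2 - set1)  # identifiers with no partner in dataset 1
    return only_in_1, only_in_2


# Example: a capitalisation difference and a leading space break two links
ds1 = ["specimen 1", "specimen 2", "Specimen 3"]
ds2 = ["specimen 1", " specimen 2", "specimen 3"]
only1, only2 = unmatched_identifiers(ds1, ds2)
print(only1)  # ['Specimen 3', 'specimen 2']
print(only2)  # [' specimen 2', 'specimen 3']
```

Only "specimen 1" would be linked in this example; the other identifiers need to be corrected in the input files before linking.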

If the same identifier string appears multiple times in either dataset (e.g. two observations with the identifier "specimen 1" in the first dataset), the observations sharing that identifier are used in the order in which they appear in the dataset, that is, the order in which they were read from the input file with Create New Project or Create New Dataset. If both datasets contain the same number of occurrences of an identifier, the pairs are formed in that order: the first observation with identifier "specimen 1" in dataset 1 is paired with the first observation with identifier "specimen 1" in dataset 2, the second with the second, and so on. If one dataset has fewer occurrences of an identifier than the other, pairing stops once the smaller number of occurrences has been used up. For example, if dataset 1 has two observations with identifier "specimen 1" but dataset 2 has only one, the first "specimen 1" from dataset 1 is paired with the single "specimen 1" from dataset 2, while the second "specimen 1" from dataset 1 is not paired with any observation in dataset 2. Such unpaired observations are left out of analyses that use the link between datasets, such as PLS or regression.
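The pairing rules above amount to this: the k-th occurrence of an identifier in one dataset is paired with the k-th occurrence of the same identifier in the other, and surplus occurrences stay unpaired. The Python sketch below illustrates that rule as described in this section; it is not MorphoJ's source code, and `link_by_identifier` is a hypothetical name. It assumes each dataset's identifiers are given as a list of strings in file order.

```python
from collections import defaultdict


def link_by_identifier(ids1, ids2):
    """Pair observations of two datasets by identifier.

    The k-th occurrence of an identifier in dataset 1 is paired with the
    k-th occurrence of the same identifier in dataset 2. Surplus occurrences
    in either dataset remain unpaired.
    Returns a list of (index_in_dataset1, index_in_dataset2) tuples.
    """
    # Positions of each identifier in dataset 2, in file order
    positions2 = defaultdict(list)
    for j, ident in enumerate(ids2):
        positions2[ident].append(j)

    seen = defaultdict(int)  # occurrences of each identifier so far in dataset 1
    pairs = []
    for i, ident in enumerate(ids1):
        k = seen[ident]
        seen[ident] += 1
        # Pair only while dataset 2 still has a k-th occurrence of this identifier
        if k < len(positions2[ident]):
            pairs.append((i, positions2[ident][k]))
    return pairs


# The example from the text: "specimen 1" occurs twice in dataset 1, once in dataset 2
ds1 = ["specimen 1", "specimen 2", "specimen 1"]
ds2 = ["specimen 2", "specimen 1"]
print(link_by_identifier(ds1, ds2))  # [(0, 1), (1, 0)]
```

The observation at index 2 of dataset 1 (the second "specimen 1") does not appear in any pair, which is why it would drop out of a linked analysis.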
These rules are rather complicated. The practical lesson is to make sure that the identifiers are unique within each dataset.
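Before importing the data, it can be worth checking each dataset's identifiers for repeats. A minimal Python sketch, again assuming the identifiers are available as a list of strings (`duplicated_identifiers` is a hypothetical helper, not a MorphoJ function):

```python
from collections import Counter


def duplicated_identifiers(ids):
    """Return the identifiers that occur more than once, with their counts."""
    counts = Counter(ids)
    return {ident: n for ident, n in counts.items() if n > 1}


print(duplicated_identifiers(["specimen 1", "specimen 2", "specimen 1"]))
# {'specimen 1': 2}
```

An empty result means every identifier in that dataset is unique, so links will be one-to-one wherever the identifiers match.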