Extract New Classifier From ID String

Investigators often encode information in the identifiers they use for the specimens. For instance, the identifier 'MelanM51' contains information on the species ('Melan' for melanogaster), the sex ('M' for male) and a number for the specimen (51). It is therefore important to extract this information from the identifier in a form that can be used for further analyses.

MorphoJ can extract new classifier variables from the identifiers in a dataset. (Note: covariates normally have to be imported from a text file.)

The values for the classifier are parts of the identifier string. For instance, 'Melan' is the part from the first to the fifth character, 'M' is the third-last (or the sixth) character, and '51' consists of the last two characters.

To start, select Extract New Classifier from ID String from the Preliminaries menu. A dialog box will appear.

The list at the upper left shows the names of the existing classifiers in the dataset ('Species' and 'Sex' in this example). The list at the upper right shows the identifiers in the dataset. Both lists are there only to provide information that helps the user define the new classifier.

Below the lists, there is a text field for entering the name of the new classifier.

Below that, there are two further text fields for entering the positions of the first and last characters to be included in the new classifier. Positive numbers indicate that characters are counted from left to right. For instance, the first and last characters designating the species in the example above are 1 and 5 (first and fifth from the left, giving values such as 'Mauri' and 'Melan'). Negative numbers indicate that characters are counted from right to left. The first and last characters for sex are both -3 because the letter encoding sex is the third-last character (values 'M' and 'F').
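The counting convention described above can be sketched in Python. This is only an illustration of the position rules, not MorphoJ's own code: the function names resolve_position and extract_classifier are hypothetical, and the handling of 0 or out-of-range positions (raising an error) is an assumption, since the behaviour of MorphoJ in those cases is not described here.

```python
def resolve_position(position: int, length: int) -> int:
    """Convert a 1-based position to a 0-based index.

    Positive positions count from the left (1 = first character);
    negative positions count from the right (-1 = last character).
    """
    if position > 0:
        index = position - 1
    elif position < 0:
        index = length + position
    else:
        # Assumption: position 0 has no meaning in this sketch.
        raise ValueError("position 0 is not defined")
    if not 0 <= index < length:
        # Assumption: positions outside the identifier are rejected.
        raise ValueError(f"position {position} is outside the identifier")
    return index


def extract_classifier(identifier: str, first: int, last: int) -> str:
    """Return the characters from 'first' to 'last', both inclusive."""
    start = resolve_position(first, len(identifier))
    end = resolve_position(last, len(identifier))
    if start > end:
        raise ValueError("first character lies after last character")
    return identifier[start:end + 1]


print(extract_classifier("MelanM51", 1, 5))    # species part
print(extract_classifier("MelanM51", -3, -3))  # sex part
print(extract_classifier("MelanM51", -2, -1))  # specimen number
```

For the identifier 'MelanM51', the three calls correspond to the entries 1 and 5 for species, -3 and -3 for sex, and -2 and -1 for the specimen number.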

To generate the new classifier, click the Execute button. Alternatively, click Cancel to abort the procedure.

To check the results after extracting classifiers, it is a good idea to inspect them with Edit Classifiers.

Identifiers of variable length

Identifiers in a dataset often have variable numbers of characters, so that 'Gulo gulo m 23' and 'Macrogalidia musschenbroekii f 06' might occur together in the same dataset.

Because MorphoJ can count characters from the left or from the right, it is easy to extract classifiers even from these identifiers. To direct MorphoJ to count from the right, use negative numbers: '-1' designates the last character in the string, '-2' the second-last, etc.

For instance, to define a classifier designating the species, use the string running from the first character to the sixth-last character, for which the entries for first and last would be '1' and '-6'. The resulting strings will be 'Gulo gulo' and 'Macrogalidia musschenbroekii'.
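The effect of mixing a positive first position with a negative last position can be checked with a short Python sketch. The helper substring_by_position is hypothetical and only mimics the counting rule described above; it does no validation of the positions and is not taken from MorphoJ. The second extraction (entries -2 and -1 for the specimen number) is an added illustration, not part of the example above.

```python
def substring_by_position(identifier: str, first: int, last: int) -> str:
    """Return characters first..last (1-based, inclusive).

    Negative positions count from the right (-1 = last character).
    Assumption: positions are non-zero and lie within the identifier.
    """
    length = len(identifier)

    def to_index(position: int) -> int:
        # Map a 1-based or negative position to a 0-based index.
        return position - 1 if position > 0 else length + position

    return identifier[to_index(first):to_index(last) + 1]


identifiers = ["Gulo gulo m 23", "Macrogalidia musschenbroekii f 06"]

# Species: from the first character to the sixth-last character.
species = [substring_by_position(i, 1, -6) for i in identifiers]

# Specimen number: the last two characters.
numbers = [substring_by_position(i, -2, -1) for i in identifiers]

print(species)
print(numbers)
```

Because the last position is counted from the right, the same pair of entries works for both identifiers even though their lengths differ.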