Input file types

MorphoJ can import coordinate data from the following file types: row/column, NTSYSpc, TPS and Morphologika.

MorphoJ ignores the extension of the file name, so files with extensions other than ".txt", ".tps" or ".nts" will be read without problems. All file types are text files; they differ in their internal structure.

Row/column format

This file format is a special kind of text file (it used to be called "text file" in earlier versions of MorphoJ, but that caused persistent confusion for users).

In the text file of this format, each line corresponds to one specimen and each column to a variable. Each line starts with a label for the specimen, followed by the x, y, and possibly z coordinates of the landmarks.

The row/column files for importing landmark coordinates into MorphoJ can be text files delimited by commas, semicolons or tab stops, as generated by many spreadsheet programs (e.g. the formats called 'tab-delimited text' or 'comma-separated values'). Note that spaces are not sufficient as delimiters.

Spaces are permissible. Spaces before and after the commas, semicolons or tab stops used as delimiters (i.e., the leading and trailing spaces within each column) are ignored. Spaces embedded within the text of the first entry of each line are treated as part of the identifier (e.g., "Homo sapiens f 025"). There must be no spaces within the number strings containing the landmark coordinates.

The first line can contain labels for the columns in the file (e.g. "ID", "x1", "y1", etc.). In this case, the whole first line will be ignored.

To test whether the first line contains column labels or is the first line of data, MorphoJ checks whether any of the second and following entries cannot be interpreted as numbers. If all of these entries can be interpreted as numbers, the line is used as the first line of coordinate data; otherwise, it is ignored. This decision rule will fail if you use numbers as the labels for the columns in your file. Therefore, do not use "1", "2" or "3.147" as the column labels, but something like "x1", "coordinate7", or "col13".
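The decision rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not MorphoJ's actual code; the function names is_number and first_line_is_header are hypothetical, and Python's float() may accept a few strings (such as "nan") that MorphoJ would treat differently.

```python
def is_number(text):
    """Return True if text can be read as an integer, decimal or exponential number."""
    try:
        float(text.strip())
        return True
    except ValueError:
        return False


def first_line_is_header(fields):
    """Apply the rule described above to the entries of the first line.

    The first entry (the identifier column) is skipped. The line counts as a
    header if any of the remaining entries cannot be read as a number.
    """
    return any(not is_number(entry) for entry in fields[1:])


# Text labels are detected as a header line.
print(first_line_is_header(["ID", "x1", "y1"]))
# Numeric column labels defeat the rule: the line is taken as data.
print(first_line_is_header(["ID", "1", "2"]))
```

The second call shows why numeric column labels should be avoided: the line passes as coordinate data.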

The data lines must contain a label identifying the specimen as the first entry, and then the landmark coordinates in the order x, y, x, y,..., x, y for two-dimensional data or in the order x, y, z, x, y, z,..., x, y, z for three-dimensional data. The label can be any combination of text, numbers, or special characters, except for the comma, semicolon or tab stop (those are interpreted as delimiters). The coordinates must be numbers, but can be written as integers (e.g. "23"), decimals (e.g. "0.123") or in exponential notation (e.g. "1.23E-1"). The following is a possible data line:

Mus musculus 123%f_1, 0.276, 0.268, 0.458, 0.126, 2, 0.0789, 2.345, 0.564, 1.856, 0.642

In this example, MorphoJ would use the string "Mus musculus 123%f_1" as the identifier for the observation.

Spaces at the beginning or end of the identifier string are ignored (actually, they are removed): " Mouse " is treated the same as "Mouse". Note, however, that spaces inside the identifier string do matter: in an identifier string like "House mouse", it does make a difference whether there are one or two spaces between the words.
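As an illustration of the data-line layout and the handling of spaces, the following Python sketch splits one line into an identifier and a list of landmarks. It is a simplified reading of the rules in this section, not MorphoJ's implementation; the function name parse_data_line and its dimensions parameter are assumptions made for the example.

```python
import re


def parse_data_line(line, dimensions=2):
    """Split one row/column data line into (identifier, landmarks).

    Commas, semicolons and tab stops are delimiters. Spaces around each
    entry are removed; spaces inside the identifier are kept.
    """
    fields = [field.strip() for field in re.split(r"[,;\t]", line.strip("\r\n"))]
    identifier = fields[0]
    # Every entry after the first must be a number, otherwise float() fails.
    values = [float(entry) for entry in fields[1:]]
    if not values or len(values) % dimensions != 0:
        raise ValueError("number of coordinates is not a multiple of %d" % dimensions)
    landmarks = [tuple(values[i:i + dimensions])
                 for i in range(0, len(values), dimensions)]
    return identifier, landmarks


identifier, landmarks = parse_data_line("Mus musculus 123%f_1, 0.276, 0.268, 1.23E-1, 2")
print(identifier)
print(landmarks)
```

For three-dimensional data, the same function would be called with dimensions=3.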

Only a single label, which MorphoJ uses as the identifier, can be included in the data file for importing the coordinates. All entries after the first are considered to be landmark coordinates (if they are not numbers, reading of the file will fail). If other information (e.g. taxonomic group, sex, habitat, etc.) is to be included, put it into a separate file and use Import Classifier Variables in the File menu to add that information to your dataset.

For missing values, enter "-9999" in the data file. If there is a possibility that -9999 is a legitimate value of a coordinate, consider transforming your data (e.g. if your measurement device returns coordinates in micrometers for an object that is several millimeters across, consider dividing by 1000 to obtain the coordinates in millimeters).
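The following Python sketch shows how the "-9999" code might be handled when preparing data outside MorphoJ. The names MISSING_CODE, mark_missing and rescale are hypothetical, and the sketch assumes that the rescaling is done on the raw measurements before any missing-value codes are written into the file.

```python
MISSING_CODE = -9999.0


def mark_missing(values, missing_code=MISSING_CODE):
    """Replace the missing-value code by None, leaving real coordinates as they are."""
    return [None if value == missing_code else value for value in values]


def rescale(values, divisor):
    """Divide raw measurements by a constant, e.g. 1000 for micrometers to millimeters."""
    return [value / divisor for value in values]


# A genuine measurement of -9999 micrometers no longer collides with the
# missing-value code once it is expressed in millimeters.
in_millimeters = rescale([-9999.0, 1500.0], 1000)
print(in_millimeters)
print(mark_missing([0.5, -9999, 2.0]))
```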

Possible problems. It can happen that MorphoJ cannot read a file even though it looks fine when viewed in a text editor or word processor program. A possible reason is a problem with character encoding (e.g. 7-bit versus 8-bit encoding). Normally, this problem can be solved by opening the file in a program such as Microsoft Excel and saving it as a tab-delimited text file using the default encoding for the operating system (in Excel for Windows, this is "Text (Tab delimited) (*.txt)" and not "Unicode Text (*.txt)").

Another problem may occur for users whose computers are set to use a language other than English or for text files that were produced with such a computer. In this case, the comma is often used instead of the decimal point ("3,141" instead of "3.141"). Because MorphoJ uses the comma as a delimiter, however, the string "3,141" is read as two numbers (3 and 141), which will yield nonsensical results. If you are using an operating system that uses commas instead of decimal points and experience problems with reading text files, it may help to change the commas to dots (".") using the find-and-replace function of a text editor.
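Instead of a manual find-and-replace, the conversion can be scripted. The Python sketch below assumes that the file is delimited by semicolons or tab stops, so that every comma inside a coordinate entry is a decimal comma; it cannot work for comma-delimited files, where decimal commas and delimiters are indistinguishable. The function name decimal_commas_to_points is hypothetical.

```python
def decimal_commas_to_points(line, delimiter=";"):
    """Replace decimal commas by points in the coordinate entries of one line.

    The first entry (the identifier) is left untouched. The delimiter must
    be a semicolon or a tab stop, not a comma.
    """
    if delimiter == ",":
        raise ValueError("decimal commas cannot be separated from comma delimiters")
    fields = line.rstrip("\r\n").split(delimiter)
    fixed = [fields[0]] + [entry.replace(",", ".") for entry in fields[1:]]
    return delimiter.join(fixed)


print(decimal_commas_to_points("Mouse 1;3,141;2,5"))
```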

NTSYSpc files

MorphoJ can read files in the NTS file format written by Jim Rohlf's program NTSYSpc, the programs from Jim Rohlf's tps... series, or various others. There are a few restrictions on the contents of the file, and some editing may be needed if a file contains more than just a single data matrix.

MorphoJ only reads rectangular data matrices from NTS files. Other data formats used by NTSYSpc are not recognized by MorphoJ. Therefore, the parameter line must start with the code "1".

The data matrix to be read by MorphoJ must be the first item in the file. Other items, for example covariance or tree matrices, may also be in the file after the data matrix, but will be ignored.

If the NTS file contains labels for the specimens, note that NTSYSpc has a convention that spaces are to be interpreted as delimiters, so that a label cannot contain spaces.

The NTS file format permits files without labels for the configurations contained in the file (the rows of the matrix). Because MorphoJ requires an identifier for each configuration, it uses "config1", "config2" etc. as the identifiers (the numbers are allocated in the sequence of occurrence in the file) when no labels are included in an NTS file.

The handling of missing values follows the convention of NTS files. If any missing values are present in the data, the fourth code in the parameter line must be "1" and an arbitrary code for the missing values must follow (e.g. "-9999"). If the fourth code of the parameter line is "0", it is assumed that there are no missing data, and all values are interpreted as coordinate measurements.
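The checks on the parameter line described in this section can be sketched in Python as follows. Only the rules stated here are covered (first code "1", fourth code "0" or "1", missing-value code after a "1"). The second and third codes are assumed to describe the dimensions of the matrix and are returned as raw text without interpretation. The function name parse_nts_parameter_line is hypothetical, and this is not MorphoJ's own parser.

```python
def parse_nts_parameter_line(line):
    """Check an NTS parameter line against the rules given in this section."""
    codes = line.split()
    if len(codes) < 4:
        raise ValueError("parameter line needs at least four codes")
    if codes[0] != "1":
        raise ValueError("only rectangular data matrices (code 1) can be read")
    if codes[3] == "0":
        missing_code = None          # no missing data: all values are coordinates
    elif codes[3] == "1":
        if len(codes) < 5:
            raise ValueError("missing-value flag is set but no code follows")
        missing_code = float(codes[4])
    else:
        raise ValueError("fourth code must be 0 or 1")
    # Second and third codes are kept as text (assumed to be the matrix dimensions).
    return {"second_code": codes[1], "third_code": codes[2], "missing_code": missing_code}


print(parse_nts_parameter_line("1 10L 8 1 -9999"))
```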

TPS files

The TPS file format is specific to Jim Rohlf's tps... series of programs, and MorphoJ can read coordinate data from such files. MorphoJ will read the landmark coordinates (following the LM= or LM3= keyword) and the identifier (from the line starting with the ID= keyword). If no ID= keyword is available, MorphoJ will use information from the IMAGE= keyword for the identifier, and if that is not available either, it will make up an identifier.

It is often the case that the entry for the IMAGE= keyword in TPS files contains more useful information for making the identifier than that of the ID= keyword (by default, tpsDig2 enters the number of the record, counting from zero). To use the IMAGE= keyword, it is therefore helpful to disable the ID= keyword. This can be done easily with a text editor by changing every occurrence of "ID=" into something else, for instance "@ID=" (in this way, the change is reversible if you want to use the same TPS file for something else). If this is done, MorphoJ will use the IMAGE= keyword instead.
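For large files, the same change can be made with a short script instead of a text editor. The Python sketch below only rewrites "ID=" where it starts a line, which is slightly more cautious than a plain find-and-replace; the function names disable_id_keyword and restore_id_keyword are hypothetical.

```python
import re


def disable_id_keyword(tps_text):
    """Turn every line-initial 'ID=' into '@ID=' so that IMAGE= is used instead."""
    return re.sub(r"(?m)^ID=", "@ID=", tps_text)


def restore_id_keyword(tps_text):
    """Undo disable_id_keyword by turning '@ID=' back into 'ID='."""
    return re.sub(r"(?m)^@ID=", "ID=", tps_text)


record = "LM=1\n1.0 2.0\nIMAGE=fly01.jpg\nID=0\n"
print(disable_id_keyword(record))
```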

For records that contain a SCALE= keyword, the corresponding scale factor is applied to the landmark coordinates to convert them into real-world units. If there is no SCALE= keyword for a record in the TPS file, a scale factor of 1.0 is used (equivalent to SCALE=1.0).
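The rules for identifiers and scale factors can be summarized in a small Python sketch. It is a simplified reader written for illustration only, not MorphoJ's parser: it assumes one keyword per line and that each LM= or LM3= line is followed directly by the coordinate lines. The function name read_tps and the made-up identifiers of the form "record1" are hypothetical.

```python
def read_tps(text):
    """Return a list of (identifier, scaled_landmarks) from the text of a TPS file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    records = []
    i = 0
    while i < len(lines):
        key, sep, value = lines[i].partition("=")
        if sep and key in ("LM", "LM3"):
            count = int(value)
            # The next 'count' lines hold the coordinates of one landmark each.
            points = [tuple(float(v) for v in lines[i + 1 + k].split())
                      for k in range(count)]
            records.append({"points": points, "keywords": {}})
            i += 1 + count
        else:
            if sep and records:
                records[-1]["keywords"][key] = value
            i += 1

    result = []
    for number, record in enumerate(records, start=1):
        keywords = record["keywords"]
        scale = float(keywords.get("SCALE", 1.0))      # default: SCALE=1.0
        # ID= takes precedence, then IMAGE=, then a made-up identifier.
        identifier = keywords.get("ID") or keywords.get("IMAGE") or "record%d" % number
        scaled = [tuple(c * scale for c in point) for point in record["points"]]
        result.append((identifier, scaled))
    return result


example = ("LM=2\n1.0 2.0\n3.0 4.0\nIMAGE=fly01.jpg\nID=0\nSCALE=0.5\n"
           "LM=2\n1 1\n2 2\nIMAGE=fly02.jpg\n")
for identifier, landmarks in read_tps(example):
    print(identifier, landmarks)
```

In the example, the first record takes its identifier from ID= and is scaled by 0.5, whereas the second record falls back on IMAGE= and the default scale factor.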

The number of landmarks, as specified by the LM= keyword, must be the same in all records in a TPS file. If it is not, an error will occur and no MorphoJ dataset can be created from the file. The only exceptions, since MorphoJ version 1.08.0, are records with no landmarks at all (LM=0), which are not included in the new dataset (but are listed in the Reports window).

No classifiers or covariates can be read from this file format; they have to be imported from separate files.

Since the update of tpsDig2 to version 2.18, missing landmarks are indicated by coordinate values of -1.0 in TPS files (see Morphmet posting by Jim Rohlf from 1 April 2015). MorphoJ reads the observations and initially excludes any observations with missing landmarks. The affected observations can be restored to the analyses by excluding the landmarks that are missing in some specimens (Select Landmarks in the Preliminaries menu).
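To decide which landmarks to exclude, it can help to list the landmarks that are missing in at least one specimen before importing the file. The Python sketch below assumes that a landmark counts as missing when all of its coordinates equal -1.0; this is an interpretation made for the example, and the function name landmarks_with_missing is hypothetical. Landmarks are numbered from 1 in the output.

```python
def landmarks_with_missing(configurations, missing_value=-1.0):
    """Return the numbers (starting at 1) of landmarks missing in any configuration.

    Each configuration is a list of coordinate tuples, one per landmark.
    """
    missing = set()
    for configuration in configurations:
        for number, point in enumerate(configuration, start=1):
            if all(coordinate == missing_value for coordinate in point):
                missing.add(number)
    return sorted(missing)


configurations = [
    [(1.0, 2.0), (-1.0, -1.0), (4.0, 5.0)],
    [(3.0, 1.0), (2.0, 2.0), (6.0, 7.0)],
]
print(landmarks_with_missing(configurations))
```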

Morphologika

Morphologika is a program for morphometric analyses written by Paul O'Higgins and Nicholas Jones. It supports a file type that can contain a substantial amount of extra information in addition to landmark coordinates. MorphoJ reads some of that information and stores it with the dataset in the appropriate way.

MorphoJ will read the identifiers for the observations from the [names] attribute in the Morphologika file. If no [names] attribute is present in the file, MorphoJ will generate new identifiers for the specimens: "config1", "config2", etc. To avoid difficulties with the identities of specimens, it is strongly recommended to include the [names] attribute in a Morphologika file before importing the data into MorphoJ.

If the [groups] attribute is set in the file, MorphoJ will use it to generate a classifier variable called "groups" that has the corresponding group labels as its values.

The [labels] and associated [labelvalues] attributes are used to produce either covariates (if all the values for a label are numeric) or classifiers (otherwise).

There can be problems because Morphologika files have only one kind of [labels] and [labelvalues] for both categorical variables (e.g. male vs female) and continuous variables (e.g. body weight or the percentage of meat in the diet). MorphoJ interprets every label as a covariate for which every corresponding labelvalue can be read as a number. Every label for which at least one of the corresponding labelvalues cannot be read as a number is interpreted as a classifier. That means numerical codes should not be used for categorical variables (e.g. use 'm' and 'f' and not '1' and '2'). If categories are denoted by numbers, use something like "type 1", "type 2" and "type 3", but not "1", "2" and "3".
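The rule for telling covariates from classifiers can be written down as a short Python sketch. It illustrates the rule as stated above and is not taken from MorphoJ; the function names is_number and classify_labels are hypothetical, and the labelvalues are assumed to be given as one list of text entries per specimen.

```python
def is_number(text):
    """Return True if text can be read as a number."""
    try:
        float(text.strip())
        return True
    except ValueError:
        return False


def classify_labels(labels, labelvalues):
    """Decide for each label whether it becomes a covariate or a classifier.

    labels is a list of label names; labelvalues holds one row per specimen,
    with one text entry per label.
    """
    kinds = {}
    for column, label in enumerate(labels):
        values = [row[column] for row in labelvalues]
        # A single non-numeric value is enough to make the label a classifier.
        kinds[label] = "covariate" if all(is_number(v) for v in values) else "classifier"
    return kinds


labels = ["sex", "weight", "diet"]
labelvalues = [["m", "12.5", "1"], ["f", "10", "2"]]
print(classify_labels(labels, labelvalues))
```

In this example, "diet" is coded as "1" and "2" and is therefore treated as a covariate, which is the pitfall described above; coding it as "type 1" and "type 2" would make it a classifier.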

The [wireframe] attribute, if present, is automatically imported and available as a wireframe associated with the new dataset for visualization of shape changes.

The landmark data are read from the [rawpoints] attribute in the Morphologika file.

There is no way to designate missing values in Morphologika files. All values will be read as coordinates.