Mirage -- for interactive or offline data analysis and visualization.
Mirage descends from Proximal, software originally designed for the interactive exploration and classification of large datasets produced by photonics simulations. It has since evolved into a research tool for experimenting with algorithmic methods for pattern recognition. Apart from machine capacity, there are no built-in limits on dimensionality, number of entries, or types of measurements. A group of related attributes on comparable scales can be defined as a feature vector. Mirage assumes that a meaningful metric, Euclidean distance, exists in the space of such feature vectors and uses it to compute clusters.
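A feature vector groups comparable attributes, and the distance between two entries is computed only over those components. The sketch below assumes that entries are plain Python dictionaries and uses made-up attribute names (`s1`, `s2`, `s3`, `temp`). It illustrates Euclidean distance over a feature vector and does not reflect Mirage's internal data structures.

```python
import math
from typing import List, Mapping, Sequence


def feature_vector(entry: Mapping[str, float], components: Sequence[str]) -> List[float]:
    """Extract the components of one feature vector from a data entry."""
    return [float(entry[name]) for name in components]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical entries: "spec" groups three comparable attributes; "temp" is on
# a different scale, so it is left out of the feature vector.
spec = ["s1", "s2", "s3"]
a = {"s1": 0.0, "s2": 3.0, "s3": 4.0, "temp": 300.0}
b = {"s1": 0.0, "s2": 0.0, "s3": 0.0, "temp": 12.0}
d = euclidean(feature_vector(a, spec), feature_vector(b, spec))
print(d)  # 5.0
```

Because `temp` is excluded, its large range cannot dominate the distance. This is why the document groups only attributes on comparable scales into a feature vector.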
In addition to supporting basic exploratory data analysis, Mirage places special focus on studying the correlation among multiple proximity structures computed from the same data. Two simple clustering methods are included to compute proximity structures: k-means for partitional clustering and a complete-linkage agglomerative procedure for hierarchical clustering. Clustering results from external algorithms can be imported, provided they are first converted to the same format as Mirage's own clustering output. The data format, command syntax, data operations, and exploration algorithms are still under development.
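For readers unfamiliar with the two built-in methods, the following is a minimal textbook sketch of k-means (Lloyd's iteration) and complete-linkage agglomerative clustering in plain Python. It is not Mirage's implementation of "run kmeans" or "run hclus". The initialization, stopping rule, and tie-breaking shown here are assumptions, and the naive linkage loop is only suitable for small inputs.

```python
import math
import random
from typing import FrozenSet, List, Sequence, Tuple

Vector = Sequence[float]
Cluster = FrozenSet[int]


def dist(u: Vector, v: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means: returns a cluster label in 0..k-1 for every point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct input points as seeds
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def complete_linkage(points: List[Vector]) -> List[Tuple[Cluster, Cluster, float]]:
    """Agglomerative clustering with complete linkage; returns the merge history."""
    clusters: List[Cluster] = [frozenset([i]) for i in range(len(points))]
    merges: List[Tuple[Cluster, Cluster, float]] = []

    def linkage(a: Cluster, b: Cluster) -> float:
        # Complete linkage: distance between the farthest pair of members.
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] | clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = kmeans(points, k=2)
merges = complete_linkage(points)
print(labels)  # e.g. [0, 0, 1, 1]; which group gets label 0 depends on the seed
print([round(h, 3) for _, _, h in merges])  # [1.0, 1.0, 14.866]
```

Complete linkage measures the distance between two clusters by their farthest pair of members, which tends to favor compact clusters. The merge heights in `merges` are the values a dendrogram would plot.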
| option | meaning |
|---|---|
| -data myData.dat | start with a specific data file |
| -log myLog.txt | direct run-time output to a specific file |
| -path myPath.txt | create temporary job directories under a specific path |
| -off | turn off all graphics and run the interpreter in command-line mode |
| -cmd myScript | start with a specific command script |
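The options above fall into two groups: four that take an argument (`-data`, `-log`, `-path`, `-cmd`) and one switch (`-off`). As an illustration only, the option set could be mirrored with Python's `argparse` as shown below. The program name `mirage` and the parser itself are hypothetical and do not represent Mirage's own command-line handling.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the documented start-up options (not Mirage's parser)."""
    parser = argparse.ArgumentParser(prog="mirage")
    parser.add_argument("-data", metavar="FILE", help="start with a specific data file")
    parser.add_argument("-log", metavar="FILE", help="direct run-time output to a specific file")
    parser.add_argument("-path", metavar="PATH", help="create temporary job directories under PATH")
    parser.add_argument("-off", action="store_true", help="no graphics; command-line interpreter only")
    parser.add_argument("-cmd", metavar="SCRIPT", help="start with a specific command script")
    return parser


args = build_parser().parse_args(["-data", "myData.dat", "-off", "-cmd", "myScript"])
print(args.data, args.off, args.cmd, args.log)  # myData.dat True myScript None
```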
Data format can be specified in three ways:
(1) by format statements stored in a separate format file, name.fmt, that accompanies the data file name.dat;
(2) by format statements preceding all data lines in the data file; or
(3) by the default options, when neither (1) nor (2) is in place.
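The three cases can be read as a lookup order. The sketch below makes several assumptions: the files are whitespace-delimited text, the `.fmt` file takes precedence when both (1) and (2) are present (the text above does not say), and blank lines are ignored in the same way as `#` lines. The name `format_source` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def format_source(data_path: str, skip: str = "#") -> str:
    """Return "fmt-file", "inline", or "default" for a data file (hypothetical helper)."""
    data = Path(data_path)
    # Case (1): a name.fmt file next to name.dat. Checking it first is an assumption.
    if data.with_suffix(".fmt").exists():
        return "fmt-file"
    with data.open() as fh:
        for line in fh:
            stripped = line.strip()
            if not stripped or stripped.startswith(skip):
                continue  # skipped lines are neither format nor data lines
            # Case (2) if the first meaningful line is a format statement, else case (3).
            return "inline" if stripped.split()[0] == "format" else "default"
    return "default"  # empty or comment-only file: fall back to the defaults


with tempfile.TemporaryDirectory() as tmp:
    inline_dat = Path(tmp) / "inline.dat"
    inline_dat.write_text("# comment\nformat var a b\n1 2\n")
    plain_dat = Path(tmp) / "plain.dat"
    plain_dat.write_text("1 2\n3 4\n")
    paired_dat = Path(tmp) / "paired.dat"
    paired_dat.write_text("1 2\n")
    (Path(tmp) / "paired.fmt").write_text("format var a b\n")
    results = [format_source(str(p)) for p in (inline_dat, plain_dat, paired_dat)]

print(results)  # ['inline', 'default', 'fmt-file']
```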
Specifying a data format is essential when the attributes are not all on the same scale. A vector statement ("format vec") in a data format defines which attributes are to be interpreted as a group. By default, any line beginning with "#" is skipped, i.e., treated as neither a format line nor a data line. Other skip strings can be used by loading the data set with special options; see Commands for details.
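The skipping and ordering rules can be sketched as a single pass over a file's lines. The `skip` parameter is a hypothetical stand-in for the special loading options mentioned above. The check that format statements precede all data lines follows case (2). Dropping blank lines is an additional assumption of this sketch.

```python
from typing import Iterable, List, Sequence, Tuple


def split_lines(lines: Iterable[str], skip: Sequence[str] = ("#",)) -> Tuple[List[str], List[str]]:
    """Split a file's lines into format statements and data lines.

    `skip` is a hypothetical stand-in for Mirage's skip-string load options.
    """
    fmt: List[str] = []
    data: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(tuple(skip)):
            continue  # skipped: neither a format line nor a data line
        if line.split()[0] == "format":
            if data:
                raise ValueError("format statements must precede all data lines")
            fmt.append(line)
        else:
            data.append(line)
    return fmt, data


text = """# produced by a simulation run
% a second comment convention
format var x y
format vec pos x y
1.0 2.0
3.0 4.0
"""
fmt, data = split_lines(text.splitlines(), skip=("#", "%"))
print(fmt)   # ['format var x y', 'format vec pos x y']
print(data)  # ['1.0 2.0', '3.0 4.0']
```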
The default data format uses the first line of the data file to determine how many variables there are, and names the variables X1, X2, and so on. There is no designated ID field; entries are identified by their sequential numbers in input order.
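Here is a minimal sketch of the default format. It assumes whitespace-separated fields and 1-based serial numbers, since the text says only "sequential numbers in the input order". Fields that cannot be parsed as numbers are kept as text, matching the default listed for "format text" below. Rejecting rows whose width differs from the first line is this sketch's own choice.

```python
from typing import Dict, List


def parse_default(lines: List[str], skip: str = "#") -> List[Dict[str, object]]:
    """Read entries under the default format: names X1, X2, ... and serial IDs."""
    rows = [ln.split() for ln in lines if ln.strip() and not ln.lstrip().startswith(skip)]
    if not rows:
        return []
    # The first data line fixes the number of variables.
    names = [f"X{i + 1}" for i in range(len(rows[0]))]
    entries: List[Dict[str, object]] = []
    for serial, fields in enumerate(rows, start=1):  # 1-based serial: an assumption
        if len(fields) != len(names):
            raise ValueError(f"entry {serial}: expected {len(names)} fields, got {len(fields)}")
        entry: Dict[str, object] = {"serial": serial}
        for name, value in zip(names, fields):
            try:
                entry[name] = float(value)
            except ValueError:
                entry[name] = value  # not parsable as a number, so kept as text
        entries.append(entry)
    return entries


entries = parse_default(["# header comment", "1 2.5 red", "3 4 blue"])
print(entries[0])  # {'serial': 1, 'X1': 1.0, 'X2': 2.5, 'X3': 'red'}
```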
| statement | meaning | defaults |
|---|---|---|
| format var v1 v2 ... | specifies the complete list of variable names; this line must precede all other format lines | X1 X2 ... |
| format id v1 | designates variable v1 to be the entry identifier | entries identified by serial number in input order |
| format text v2 | forces variable v2 to be interpreted as a text field | a field is text only if it cannot be parsed as a number |
| format vec vecName v1 v2 ... | defines a feature vector "vecName" with components v1, v2, ... | no feature vectors defined |
| format extent vecName xlabel X xorig 8.0 xmin 8.5 xmax 22.5 ylabel Y yorig 0.0 ymin 0.0 ymax 100.0 | specifies the axis labels, origins, and extents of a feature-vector plot. The x axis indexes the components by name, and the y axis shows each component's value. These settings make component values easier to read and give some control over the displayed range. | xlabel = "x"; xorig = 0.0; xmin = 0.0; xmax = 1.0; ylabel = "y"; yorig = 0.0; ymin = 0.0; ymax = 1.0 |
| format image myImage.jpg row varY col varX | binds the data set to a JPEG image in an external file; variables varY and varX give the row and column coordinates of each data point in the image. varY and varX must be declared in a preceding "format var" statement | no image is bound to the data set |
| format pstruct fileName | reads a partitional structure from a file. In Mirage, this file is produced by the "run kmeans" command | no such file is read |
| format hstruct fileName | reads a hierarchical structure from a file. In Mirage, this file is produced by the "run hclus" command | no such file is read |
| format cstruct fileName | reads a composite structure from a file. A composite structure concatenates a partitional and a hierarchical structure, where the hierarchy is built on top of the partitional cluster centers. In Mirage, this file is produced by the "run phclus" command | no such file is read |
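To show how the first four statements fit together, here is a hypothetical parser for `format var`, `id`, `text`, and `vec` lines, applied to a made-up format file. It enforces only the documented rule that `format var` precedes all other format lines. The `extent`, `image`, and structure statements are ignored. The class and function names are illustrative and are not part of Mirage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Format:
    """Hypothetical container for parsed var/id/text/vec statements."""
    variables: List[str] = field(default_factory=list)  # empty means defaults X1, X2, ...
    id_var: Optional[str] = None                         # None means serial numbers
    text_vars: List[str] = field(default_factory=list)
    vectors: Dict[str, List[str]] = field(default_factory=dict)


def parse_format(lines: List[str]) -> Format:
    fmt = Format()
    seen_other = False
    for line in lines:
        words = line.split()
        if len(words) < 3 or words[0] != "format":
            continue
        kind, args = words[1], words[2:]
        if kind == "var":
            if seen_other:
                raise ValueError("'format var' must precede all other format lines")
            fmt.variables = args
            continue
        seen_other = True
        if kind == "id":
            fmt.id_var = args[0]
        elif kind == "text":
            fmt.text_vars.extend(args)
        elif kind == "vec":
            fmt.vectors[args[0]] = args[1:]
        # extent, image, pstruct, hstruct and cstruct are not handled in this sketch
    return fmt


sample = """format var run material T1 T2 T3
format id run
format text material
format vec trans T1 T2 T3
"""
parsed = parse_format(sample.splitlines())
print(parsed.vectors)  # {'trans': ['T1', 'T2', 'T3']}
```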
For the structure file formats, please refer to the demo examples and to the output of the built-in clustering commands.