Overview

Mirage -- for interactive or offline data analysis and visualization.

Background

Mirage descends from Proximal, a software tool originally designed for interactive exploration and classification of large datasets produced in photonics simulations. It has since evolved into a research tool for experimenting with algorithmic methods for pattern recognition. Apart from machine capacity, there are no built-in limits on dimensionality, number of entries, or types of measurements. Groups of related attributes that are on comparable scales can be defined as a feature vector. It is assumed that a meaningful metric (Euclidean distance) exists in the space of such feature vectors for computing clusters.
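
As an illustration of the metric assumed above, here is a minimal Python sketch of a Euclidean distance restricted to the attributes grouped into one feature vector. It is not Mirage's own code; the function name, the dictionary representation of an entry, and the variable names f1, f2, and other are all assumptions made for the example.

```python
import math

def feature_distance(entry_a, entry_b, vector_vars):
    """Euclidean distance between two entries over one feature vector.

    entry_a, entry_b: dicts mapping variable name -> numeric value.
    vector_vars: names of the variables grouped into the feature vector.
    Variables outside vector_vars are ignored.
    """
    return math.sqrt(sum((entry_a[v] - entry_b[v]) ** 2 for v in vector_vars))

# Hypothetical entries; only f1 and f2 belong to the feature vector.
a = {"f1": 0.0, "f2": 0.0, "other": 100.0}
b = {"f1": 3.0, "f2": 4.0, "other": -5.0}
d = feature_distance(a, b, ["f1", "f2"])
```

Because only the grouped attributes enter the sum, an attribute on a very different scale (here "other") does not dominate the distance, which is the reason for grouping attributes on comparable scales.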

In addition to supporting basic exploratory data analysis, Mirage places special emphasis on studying the correlation of multiple proximity structures computed from the same data. Two simple clustering methods are included for computing proximity structures: kmeans for partitional clustering, and a complete-linkage agglomerative procedure for hierarchical clustering. Clustering results from external algorithms can be imported, provided that they are converted to the same format. The data format, command syntax, data operations, and exploration algorithms are still under development.

Usage

java -jar Mirage$VERSION.jar [options]

Options

-data myData.dat	start with a specific data file
-log myLog.txt	direct run-time output to a specific file
-path myPath	create temporary job directories under a specific path
-off	turn off all graphics and run the interpreter in command line mode
-cmd myScript	start with a specific command script

Example Runs

java -jar Mirage$VERSION.jar
java -jar Mirage$VERSION.jar -data myData.dat
java -jar Mirage$VERSION.jar -off
java -jar Mirage$VERSION.jar -log myLog.txt
java -jar Mirage$VERSION.jar -path myPath
java -jar Mirage$VERSION.jar -path myPath -log myPath/myLog.txt
java -jar Mirage$VERSION.jar -off -cmd myScript.dat
java -jar Mirage$VERSION.jar -off -cmd myScript.dat -log myLog.txt
java -jar Mirage$VERSION.jar -off -cmd myScript.dat -log myLog.txt >& /dev/null &

Data Format

The data format can be specified in three ways:
(1) by format statements stored in a separate format file name.fmt that accompanies the data file name.dat;
(2) by format statements preceding all data lines in the data file; or
(3) by using the default options when neither (1) nor (2) is in place.
Specifying a data format is essential when the attributes are not all on the same scale. A vector statement in a data format defines which attributes are to be interpreted as a group. By default, any line beginning with a "#" is skipped, i.e., assumed to be neither a format line nor a data line. Other skip strings can be used by loading the data set with special options. See Commands for details.
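
The distinction between skipped lines, format lines, and data lines can be sketched in Python as follows. This is only an illustration, not Mirage's parser: the function name is hypothetical, and treating blank lines as skipped and recognizing a format line by its first token "format" are assumptions based on the statement examples in this document.

```python
def classify_line(line, skip_prefix="#"):
    """Return 'skip', 'format', or 'data' for one line of a data file."""
    if line.startswith(skip_prefix):
        return "skip"          # comment line, e.g. "# ..."
    tokens = line.split()
    if not tokens:
        return "skip"          # assumption: blank lines carry no data
    if tokens[0] == "format":
        return "format"        # e.g. "format var v1 v2 ..."
    return "data"

# Hypothetical file contents with format statements preceding the data lines.
lines = [
    "# sample header",
    "format var name x y",
    "format id name",
    "a1 0.5 1.5",
]
kinds = [classify_line(line) for line in lines]
```

The skip_prefix parameter stands in for the alternative skip strings mentioned above; how Mirage itself accepts them is described in the Commands material, not here.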

The Default Format

The default data format uses the first line of the data file to determine how many variables there are, and names the variables X1, X2, and so on. There is no designated ID field, and the data entries are identified by their sequential numbers in input order.
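
A minimal Python sketch of this default naming is shown below. It assumes whitespace-separated fields and serial numbers starting at 1; neither detail is stated in this document, and both function names are hypothetical.

```python
def default_variable_names(first_data_line):
    """Name the variables X1, X2, ... from the field count of the first line."""
    n_fields = len(first_data_line.split())  # assumption: whitespace-separated
    return ["X%d" % i for i in range(1, n_fields + 1)]

def default_ids(n_entries):
    """Identify entries by serial number in input order (assumed to start at 1)."""
    return list(range(1, n_entries + 1))

names = default_variable_names("0.5 1.5 2.5 3.5")
ids = default_ids(3)
```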

Format Examples

(each format statement must be on a new, single line)

statement	meaning	defaults
format var v1 v2 ...	specifies the complete list of variable names; this line must precede all other format lines	X1 X2 ...
format id v1	designates variable v1 to be the entry identifier	entries identified by serial number in input order
format text v2	forces variable v2 to be interpreted as a text field	a field is text only if not parsable as a number
format vec vecName v1 v2 ...	defines a feature vector "vecName" with variables v1, v2, ...	no feature vectors defined
format extent vecName xlabel X xorig 8.0 xmin 8.5 xmax 22.5 ylabel Y yorig 0.0 ymin 0.0 ymax 100.0	specifies the labels, origins, and extents of the x and y axes in a feature vector plot. the x axis refers to the name of each component, and the y axis refers to the value of the component. this makes the component values easier to read and gives some control over the scope of the display.	xlabel = "x"; xorig = 0.0; xmin = 0.0; xmax = 1.0; ylabel = "y"; yorig = 0.0; ymin = 0.0; ymax = 1.0;
format image myImage.jpg row varY col varX	binds the dataset to a JPEG image in an external file, using variables varY and varX to specify the coordinates of a data point in this image. varY and varX must be included in a previous "format var" statement	no image is bound to a data set
format pstruct fileName	reads a partitional structure from a file. in Mirage, this file is produced by a "run kmeans" command	no such file is read
format hstruct fileName	reads a hierarchical structure from a file. in Mirage, this file is produced by a "run hclus" command	no such file is read
format cstruct fileName	reads a composite structure from a file. a composite structure is a concatenation of a partitional and a hierarchical structure. the hierarchical structure is built on top of the partitional cluster centers. in Mirage, this file is produced by a "run phclus" command	no such file is read
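
To show how the var, id, text, and vec statements in the table fit together, the following Python sketch collects them into a simple dictionary. It is an illustration under stated assumptions, not Mirage's implementation: the function name, the dictionary layout, and the variable names in the sample statements are hypothetical, and the remaining keywords (extent, image, pstruct, hstruct, cstruct) are deliberately ignored.

```python
def parse_format(statements):
    """Collect var/id/text/vec format statements into a dictionary."""
    fmt = {"var": [], "id": None, "text": [], "vec": {}}
    for statement in statements:
        tokens = statement.split()
        if len(tokens) < 3 or tokens[0] != "format":
            raise ValueError("not a complete format statement: %r" % statement)
        keyword, args = tokens[1], tokens[2:]
        if keyword == "var":
            fmt["var"] = args               # complete list of variable names
        elif keyword == "id":
            fmt["id"] = args[0]             # entry identifier
        elif keyword == "text":
            fmt["text"].append(args[0])     # forced text field
        elif keyword == "vec":
            fmt["vec"][args[0]] = args[1:]  # vector name -> member variables
        # Other keywords are skipped in this sketch.
    return fmt

# Hypothetical statements; the variable names are made up for illustration.
fmt = parse_format([
    "format var name ra dec flux1 flux2 flux3",
    "format id name",
    "format vec spectrum flux1 flux2 flux3",
])
```

With these sample statements, fmt["vec"]["spectrum"] lists the three flux variables that would be treated as one group on a comparable scale.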

Format of Clustering Results

Please refer to the demo examples and the output of built-in clustering commands.