IDC
A program for the calculation of independent contrasts.
by Liam J. Revell

Contents


1.  Getting started

2.  Running IDC

3.  Running multiple data sets and trees

4.  Reading the output

5.  References and further reading

6.  Contact information

7.  Appendix - new features - possible bugs

Getting Started

Introduction

IDC is a program designed to calculate independent contrasts from a multivariate quantitative data set and a fully bifurcating phylogeny in Newick format. The calculation of contrasts follows Felsenstein (1985).
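The pruning procedure of Felsenstein (1985) can be sketched in a few lines of Python. This is a minimal illustration, not IDC's source. The Node class and the contrasts function are hypothetical names, and the tree is built by hand rather than read from a Newick file. Each contrast is divided only by the square root of its expected variance; IDC's additional rescaling to the total tree length is not reproduced. Each internal node yields one contrast. Its ancestral value is the branch-length-weighted average of its two children, and the branch above it is lengthened by v_i*v_j/(v_i + v_j), which is presumably what IDC reports as the corrected branch length.

```python
"""Sketch of Felsenstein's (1985) independent contrasts on a bifurcating tree."""
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    length: float                      # length of the branch leading to this node
    name: str | None = None            # tip label (None for internal nodes)
    children: list[Node] = field(default_factory=list)


def contrasts(node: Node, data: dict[str, list[float]], out: list[list[float]]):
    """Post-order pass. Appends one standardized contrast per internal node to
    `out` and returns (node's trait values, node's corrected branch length)."""
    if not node.children:
        return data[node.name], node.length
    # Exactly two children are required (fully bifurcating tree).
    (xi, vi), (xj, vj) = [contrasts(c, data, out) for c in node.children]
    s = vi + vj                        # expected variance of the raw contrast
    out.append([(a - b) / math.sqrt(s) for a, b in zip(xi, xj)])
    # Weighted-average ancestral value and branch lengthening for the parent.
    x = [(vj * a + vi * b) / s for a, b in zip(xi, xj)]
    return x, node.length + vi * vj / s


# Hypothetical tree ((A:1,B:1):1,C:2); with one made-up trait value per taxon.
tree = Node(0.0, children=[
    Node(1.0, children=[Node(1.0, "A"), Node(1.0, "B")]),
    Node(2.0, "C"),
])
result: list[list[float]] = []
contrasts(tree, {"A": [1.0], "B": [3.0], "C": [5.0]}, result)
print(result)  # two contrasts, one per internal node
```

Contrasts are appended in post-order: first the (A,B) node, then the root.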

IDC also calculates the variance-covariance matrix and correlation matrix of independent contrasts. These are calculated following Garland et al. (1992).

Installation - Linux/UNIX

To install in Linux/UNIX, download the gzipped tarball idc_program.tar.gz to your programs directory or another appropriate directory, e.g., /home/your_user_name/programs/idc_program.tar.gz. It is not necessary to create a directory structure for the program. To decompress, execute the following command:



The following directory structure should appear:



Before compiling IDC, first check the status of your gcc compiler:



You should see a message reporting your gcc version. If not, you need to install a gcc compiler. See the following link for help.

To compile, navigate to linux/ and type the following command:



If your terminal window looks like that after compiling IDC, you're probably in good shape.

Installation - Windows

Unfortunately, IDC is not a native Windows program. However, it has been compiled for Windows and can be run from the Command Prompt or by double-clicking the executable.

Most Windows installations do not come packaged with a compiler, so I'm distributing the executable and hopefully that will work on your system (it should). The link to that executable is here. If you'd like to download a compiler to compile IDC yourself in Windows, I recommend MinGW. You should also be able to unzip the archive file idc_program.tar.gz using a program such as WinZip or WinAce.


Running IDC

Input file format

Any run of IDC requires two input files. Both must be plain text, which you can create in Linux with a text editor such as gedit, or in Windows with WordPad by choosing Save as type: Text Document.

The first input file is your quantitative trait data file, formatted as follows:


In this input file, the two integers in the header are the number of traits (2 in this example) and number of taxa (5), respectively. The taxa are then listed by number, and the traits are in columns adjacent to the appropriate taxon number. Below the data array is a conversion table, in which taxon numbers are converted to taxon names. Actually, this conversion table is only included for the edification of the user and is not even read by IDC. I use one to avoid getting taxa mixed up between the data and tree files.
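Because the original example file is not reproduced here, the reader below is only a sketch of the layout just described. It assumes whitespace-separated fields, a header line giving the number of traits and then the number of taxa, and one row per taxon starting with its taxon number. Like IDC, it ignores everything after the data rows (the conversion table). The function name read_trait_file and the sample values are hypothetical.

```python
def read_trait_file(text: str) -> dict[int, list[float]]:
    """Parse a single-data-set trait file (hypothetical reader, not IDC's)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n_traits, n_taxa = int(rows[0][0]), int(rows[0][1])
    data: dict[int, list[float]] = {}
    for fields in rows[1:1 + n_taxa]:          # lines after these are not read
        values = [float(v) for v in fields[1:]]
        if len(values) != n_traits:
            raise ValueError(f"taxon {fields[0]}: expected {n_traits} traits")
        data[int(fields[0])] = values
    if len(data) != n_taxa:
        raise ValueError(f"expected {n_taxa} taxa, found {len(data)}")
    return data


sample = """2 3
1 0.51 1.20
2 0.73 1.94
3 1.10 0.42

1 taxon_a
2 taxon_b
3 taxon_c
"""
print(read_trait_file(sample))
```

A row with too few or too many trait values raises an error instead of being silently misread.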

The second input file is your tree file, formatted as follows:


Again, the taxon conversion table is included only for the user's benefit and is not read by IDC.

For ease of use, the input files should be in the same directory as the executable!

NOTE ON FILE FORMAT: This program is likely to be finicky about file format, so be careful to follow the format presented here and in the example data files distributed with IDC.

Running IDC

Running IDC is easy. In Windows XP, you should be able to run the executable idc.exe simply by double-clicking it. If you prefer, you can also run it from the Command Prompt. The easiest way to open a Command Prompt in XP is to open the Run window and enter:



At the command prompt, navigate to the appropriate directory and type:


To run in Linux/UNIX, navigate to the appropriate directory and type:


From here on, execution in Linux/UNIX and Windows is identical, so I will describe execution in Linux/UNIX.

The entry screen should appear more or less as follows:



Press a key (other than space) to continue, and the following screen should appear:



Entering S, T, or P (in upper or lower case) followed by Enter toggles the corresponding option; the currently selected state of each option is shown in parentheses. When running a single data set and tree, the relevant option is P, which toggles whether the variance-covariance and correlation matrices of independent contrasts [calculated following Garland et al. (1992)] are written to the output file along with the contrasts. When you've selected your options, press D and then Enter to continue. You should then be prompted for the data and tree filenames and for a desired output filename. Any file in the working directory with the same name as your chosen output file will be overwritten without notice! The prompting screen should appear as follows.



Press Enter and an output file with your contrasts should be created. Please note that the program pauses for effect for about 2 seconds before exiting. This is normal.

The output file will appear as follows:


More on reading the output file later.


Running multiple data sets and trees

Why run multiple data sets or trees?

Many simulation studies generate large numbers of trees and data sets. Also, confidence intervals can be placed on a result from the comparative method (for example, a correlation between contrasts) by calculating independent contrasts from many resampled data sets, many bootstrap phylogenies, or many phylogenies sampled from the posterior distribution of a Bayesian analysis. Luckily, IDC makes it very easy to run many data sets and/or many phylogenies at once.
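Once each data set or tree has been run, the resulting estimates can be summarized as a confidence interval. The sketch below computes a 95% percentile interval, one common choice, from a list of estimates, for example the contrast correlation between two traits taken from each run's correlation matrix. Extracting those values from IDC's output is not shown. percentile_95 is a hypothetical helper built on the standard-library statistics.quantiles function, and the estimates are placeholders.

```python
import statistics


def percentile_95(estimates: list[float]) -> tuple[float, float]:
    """95% percentile interval: the 2.5th and 97.5th percentiles."""
    # n=40 gives 39 cut points spaced 2.5% apart; take the first and last.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Placeholder estimates standing in for one correlation per run.
estimates = [i / 100 for i in range(101)]
print(percentile_95(estimates))
```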

File format for multiple data sets or trees

The file format for multiple datasets and trees is extremely similar to that used for a single data set/tree. The format for multiple data sets is as follows:


Note that the conversion table is absent.

The format for the tree file is even simpler and is as follows:


Note that data sets and trees need not all have the same number of taxa, or even of characters. However, whenever there are multiple data sets and multiple trees, there must be a one-to-one correspondence between them. (Obviously, with a single data set and many trees, or vice versa, all phylogenies or data sets must contain the same number of taxa.)
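The correspondence rule above can be written as a small pre-flight check to run before starting a long batch. This is a hypothetical helper, not part of IDC. It assumes that data sets and trees are paired by their rank (order) in the input files, which is consistent with IDC labelling multi-run results by data set/phylogeny rank, and that a single data set or tree is reused for every run.

```python
def pair_runs(n_datasets: int, n_trees: int) -> list[tuple[int, int]]:
    """Return 1-based (data set, tree) rank pairs, one per run."""
    if n_datasets == n_trees:                 # one-to-one correspondence
        return [(i, i) for i in range(1, n_datasets + 1)]
    if n_datasets == 1:                       # one data set, many trees
        return [(1, t) for t in range(1, n_trees + 1)]
    if n_trees == 1:                          # many data sets, one tree
        return [(d, 1) for d in range(1, n_datasets + 1)]
    raise ValueError(
        f"{n_datasets} data sets and {n_trees} trees cannot be paired one-to-one"
    )


print(pair_runs(1, 3))
```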

Running multiple data sets and trees

First, make appropriate selections from the options screen:



After toggling to the desired selection, you will be prompted to enter the number of data sets or trees in your input file(s):



Press Enter and select D as before and you will be prompted to enter input file names. With multiple data sets and/or trees, the contrasts, variance-covariance matrices, and correlation matrices are printed to separate files for which you will be prompted to assign filenames:



Example output files from this analysis are shown below.

Example contrasts file:


Example variance-covariance matrix file:


The correlation matrix file is extremely similar in format.


Reading the output from IDC

The general output from IDC looks as follows:


In section 1, column heads are:

number=>contrast number;
length=>branch length of contrast;
corrected=>corrected branch length;
z_i=>contrast for trait i, standardized to have the expected variance of a contrast whose branch length equals the total tree length. Contrasts can also be standardized to unit variance by dividing each by the square root of the mean square (which is not the same as the mean); this is now available as a new feature (see Appendix);
contrast=>phylogenetic contrast in which A vs. B denotes the contrast between tips A and B, and (A,B) vs. C denotes the contrast between the common ancestor of A and B, and C.
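For a single trait, the standardization to unit variance described under z_i amounts to the following. This is a minimal sketch. unit_variance is a hypothetical name, and it assumes the mean square is the sum of squared contrasts divided by the number of contrasts (the divisor IDC uses is not stated here).

```python
import math


def unit_variance(contrasts: list[float]) -> list[float]:
    """Divide each contrast by the root mean square of all contrasts."""
    mean_square = sum(c * c for c in contrasts) / len(contrasts)
    root = math.sqrt(mean_square)
    return [c / root for c in contrasts]


z = [3.0, -4.0]                # made-up contrasts for one trait
print(sum(z) / len(z))         # the mean: -0.5
print(unit_variance(z))        # mean square of the result is 1
```

For these two contrasts the mean is -0.5 but the mean square is 12.5, which is why the two must not be confused.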

In the next section is the variance-covariance matrix of independent contrasts. Diagonal elements are mean-squares and off-diagonal elements are mean cross-products.

Finally, in the last section is the correlation matrix of independent contrasts, in which the correlation is calculated through the origin following Garland et al. (1992).
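Both matrices can be computed directly from a table of contrasts, with rows as contrasts and columns as traits. The sketch below assumes each mean is taken over the number of contrasts, and the function names are hypothetical. Whatever the divisor, it cancels in the correlation, which is the through-origin correlation of Garland et al. (1992): the sum of cross-products divided by the square root of the product of the two sums of squares.

```python
import math


def mean_cross_products(z: list[list[float]]) -> list[list[float]]:
    """Variance-covariance matrix of contrasts: mean squares on the diagonal,
    mean cross-products off the diagonal (no mean is subtracted)."""
    n, k = len(z), len(z[0])
    return [[sum(row[a] * row[b] for row in z) / n for b in range(k)]
            for a in range(k)]


def correlation_through_origin(c: list[list[float]]) -> list[list[float]]:
    """Scale each mean cross-product by the two root mean squares."""
    k = len(c)
    return [[c[a][b] / math.sqrt(c[a][a] * c[b][b]) for b in range(k)]
            for a in range(k)]


z = [[1.0, 2.0], [-1.0, -1.0], [2.0, 1.0]]   # made-up contrasts, 2 traits
vcv = mean_cross_products(z)
print(vcv)                                    # diagonal 2.0, off-diagonal 5/3
print(correlation_through_origin(vcv))        # off-diagonal 5/6 (about 0.833)
```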

When multiple data sets or phylogenies are used, by default IDC creates separate output files for the variance-covariance and correlation matrices, e.g.,


In this file, the integer above the matrix indicates the data set/phylogeny rank in the input file.


References

1. Felsenstein, J. 1985. Phylogenies and the comparative method. Am. Nat. 125:1-15.

2. Garland, T. Jr., P. H. Harvey, and A. R. Ives. 1992. Procedures for the analysis of quantitative data using independent contrasts. Syst. Biol. 41:18-32.


Contact information

Please contact me by email with any questions, or if you find the program useful. My email is lrevell@fas.harvard.edu, and my other contact information is listed below.

Although I have thoroughly tested the program, I encourage users to do the same and I would be happy to hear about any bugs you might find.

Liam J. Revell
Department of Organismic and Evolutionary Biology
Harvard University
Cambridge MA 02138
(314) 935-7256


Appendix

New features

Standardization to unit variance - An updated version of IDC can print contrasts standardized to unit variance following Felsenstein (1985), instead of contrasts standardized to an expected variance equal to the tree length. To do this, select U from the modified options menu:



The default is not to standardize the contrasts to unit variance, since standardized contrasts can readily be obtained by dividing each contrast by the square root of that trait's mean square. The variance-covariance matrix is still calculated from the unstandardized contrasts (otherwise it would be identical to the correlation matrix). Conversely, the unstandardized contrasts can be recovered by multiplying each standardized contrast by the square root of the corresponding diagonal element of the variance-covariance matrix.
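Both claims in the previous paragraph are easy to check numerically. The mean cross-products of unit-variance contrasts are exactly the through-origin correlations, and multiplying by the square root of each diagonal element undoes the standardization. This is a minimal sketch with made-up contrasts. mean_cross_products is a hypothetical helper that divides by the number of contrasts.

```python
import math


def mean_cross_products(z: list[list[float]]) -> list[list[float]]:
    """Mean squares (diagonal) and mean cross-products (off-diagonal)."""
    n, k = len(z), len(z[0])
    return [[sum(row[a] * row[b] for row in z) / n for b in range(k)]
            for a in range(k)]


z = [[1.0, 2.0], [-1.0, -1.0], [2.0, 1.0]]   # unstandardized contrasts
vcv = mean_cross_products(z)
scale = [math.sqrt(vcv[t][t]) for t in range(len(vcv))]

# Standardize each trait to unit variance (divide by its root mean square).
std = [[x / s for x, s in zip(row, scale)] for row in z]
# Their "variance-covariance matrix" is just the correlation matrix.
print(mean_cross_products(std))              # diagonal ~1, off-diagonal ~5/6
# Multiplying by sqrt of the diagonal of vcv recovers the original contrasts.
back = [[x * s for x, s in zip(row, scale)] for row in std]
```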

The output for the same analysis as shown above would now look like the following:


Minimalist output - An updated version of IDC can also return a minimalist output. This might be useful for batch processing (which is what the program is designed to do in the first place). This is performed by selecting M from the modified options menu:


Minimalist output looks as follows:



in which the three numbers preceding each set of contrasts indicate (respectively) the rank of the data set in the input file, the number of contrasts, and the number of traits.
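For batch processing, minimalist output can be read back by treating the file as a stream of whitespace-separated numbers. The sketch assumes that each block consists of the three header numbers followed by exactly (number of contrasts) x (number of traits) trait values, with no contrast numbers or labels. Check this against a real output file before relying on it. parse_minimal and the sample text are hypothetical.

```python
def parse_minimal(text: str) -> dict[int, list[list[float]]]:
    """Map data set rank -> list of contrasts (one list of trait values each)."""
    tokens = text.split()
    runs: dict[int, list[list[float]]] = {}
    pos = 0
    while pos < len(tokens):
        # Header: rank in the input file, number of contrasts, number of traits.
        rank, n_contrasts, n_traits = (int(t) for t in tokens[pos:pos + 3])
        pos += 3
        block = tokens[pos:pos + n_contrasts * n_traits]
        if len(block) != n_contrasts * n_traits:
            raise ValueError(f"data set {rank}: output is truncated")
        values = [float(t) for t in block]
        runs[rank] = [values[i:i + n_traits]
                      for i in range(0, len(values), n_traits)]
        pos += len(block)
    return runs


sample = """1 2 2
0.5 -1.0
0.25 0.75
2 2 2
1.0 1.0
-0.5 0.0
"""
print(parse_minimal(sample))
```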


Possible bugs

Sept. 13, 2006 - It's possible that contrasts will be calculated incorrectly if the taxa order in the data file does not correspond to numerical order. Until I confirm and fix this problem, users should just be careful to order the entries in the data file numerically.
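Until this is resolved, the data rows can be put into numerical order with a short script. This is a sketch for the single-data-set format only. It assumes the header is the first line, that the next (number of taxa) lines are the data rows with no blank lines in between, and that anything after them (such as the conversion table) should be left untouched. sort_data_rows is a hypothetical name. It sorts on the integer taxon number rather than the text, so that taxon 10 follows taxon 9.

```python
def sort_data_rows(text: str) -> str:
    """Return the file text with data rows sorted by integer taxon number."""
    lines = text.splitlines()
    n_taxa = int(lines[0].split()[1])          # header: n_traits n_taxa
    rows = sorted(lines[1:1 + n_taxa], key=lambda ln: int(ln.split()[0]))
    return "\n".join([lines[0], *rows, *lines[1 + n_taxa:]]) + "\n"


unsorted = "2 3\n3 1.0 2.0\n1 0.5 0.5\n2 0.1 0.2\n"
print(sort_data_rows(unsorted), end="")
```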

Sept. 18, 2006 - A user reported today that if extra tabs are included in the input file (as MS Excel tends to do), the file will be read improperly and an incorrect result will be produced. This problem is particularly devious because the file can be misread without any program error, so the user might believe that the analysis was completed correctly. Be careful to delete all extraneous tabs from the input file (those at the ends of lines, particularly after the first line).
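Removing trailing tabs by hand is error-prone in large files. The sketch below strips trailing tabs and spaces from every line and writes the result to a new file. The function names and file names are hypothetical. It deliberately leaves tabs between fields alone.

```python
from pathlib import Path


def strip_trailing_tabs(text: str) -> str:
    """Remove tabs and spaces at the end of every line."""
    return "\n".join(line.rstrip(" \t") for line in text.splitlines()) + "\n"


def clean_file(src: str, dst: str) -> None:
    """Write a cleaned copy of src to dst (src itself is not modified)."""
    Path(dst).write_text(strip_trailing_tabs(Path(src).read_text()))


# Usage with hypothetical file names:
# clean_file("data_from_excel.txt", "data_clean.txt")
print(repr(strip_trailing_tabs("2\t5\t\t\n1\t0.5\t1.2\t\n")))
```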





Content copyright. Last updated 11 Oct. 2006.