Tuesday, August 5, 2008

PS 29-138: Making it easy and intuitive to do informative multivariate analyses with too many species

M. Henry H. Stevens, Miami University

Background/Question/Methods

Scientists in many disciplines, including ecology, are generating massive amounts of data on a relatively small number of samples, yet we have great difficulty making sense of it all. Here I describe a statistical method that extracts information from such data by partitioning the multivariate variation among samples and applying hypothesis tests. In community ecology, it can partition variation in species composition and relate experimental treatments or descriptive predictors to this variation. This relatively new method goes by various names, including nonparametric multivariate analysis of variance, distance-based redundancy analysis, and multiple regression analysis on distance matrices. It is most easily understood as multivariate analysis of variance (MANOVA) in which the sums of squares are estimated by a little-known technique that uses the n × n outer product matrix YY′ rather than the usual inner product matrix Y′Y. This alternative computation has long been known but has seldom been used.
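The link between the outer product matrix and distances can be illustrated with a minimal sketch. It assumes a small made-up sample-by-species matrix Y and hypothetical function names; it is not the author's implementation. It shows only that the total sum of squares taken from the trace of the n × n matrix YY′ (with Y column-centered) equals the sum of squared pairwise Euclidean distances divided by n.

```python
def center_columns(Y):
    """Subtract each column (species) mean from a samples-by-species matrix."""
    n, p = len(Y), len(Y[0])
    means = [sum(row[j] for row in Y) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in Y]


def outer_product(Yc):
    """Return the n x n matrix YY' for an already centered matrix."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in Yc] for ri in Yc]


def total_ss_from_outer_product(Y):
    """Total sum of squares as the trace of the centered outer product matrix."""
    G = outer_product(center_columns(Y))
    return sum(G[i][i] for i in range(len(G)))


def total_ss_from_distances(Y):
    """Total sum of squares as (1/n) * sum of squared pairwise Euclidean distances."""
    n = len(Y)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
    return total / n


# Made-up data: 4 samples (rows) by 3 species (columns).
Y = [[1, 0, 3],
     [2, 1, 0],
     [0, 4, 1],
     [3, 2, 2]]

ss_outer = total_ss_from_outer_product(Y)
ss_dist = total_ss_from_distances(Y)
print(ss_outer, ss_dist)  # both routes give 18.75 for this toy matrix
```

Because the second route needs only pairwise distances, the Euclidean distance can be swapped for another dissimilarity measure without changing the rest of the calculation.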

Results/Conclusions

Computing sums of squares in this way allows the use of any metric or semimetric measure of dissimilarity or distance between samples. The method can therefore partition variance using distance measures that describe relevant differences among samples more faithfully than do Euclidean distances, which are implicit in the classical method. Permutation tests based on F-statistics provide powerful and robust estimates of frequentist test criteria (i.e., P-values). I also describe methods that identify which response variables are most important in generating the patterns seen in the full response matrix. In short, this method allows investigators to quickly assess the relative importance of categorical and continuous predictors of multivariate response data within the same conceptual framework as a simple ANOVA.
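As a rough illustration of the distance-based partitioning and permutation test described above, the following sketch handles the simplest case: a single categorical predictor. The data, the group labels, the choice of Bray-Curtis dissimilarity, and all function names are assumptions made for this example; a full analysis would also cover continuous predictors and multi-factor designs.

```python
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (a semimetric)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def distance_matrix(Y, dist):
    """Full n x n matrix of pairwise dissimilarities between samples."""
    n = len(Y)
    return [[dist(Y[i], Y[j]) for j in range(n)] for i in range(n)]


def pseudo_f(D, groups):
    """Pseudo-F for a one-way design, computed from squared dissimilarities."""
    n = len(D)
    labels = sorted(set(groups))
    a = len(labels)
    # Total SS: sum of squared dissimilarities over all pairs, divided by n.
    ss_total = sum(D[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group SS: the same quantity inside each group, divided by group size.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(D[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutation_test(D, groups, n_perm=999, seed=0):
    """Permute group labels to obtain a P-value for the observed pseudo-F."""
    rng = random.Random(seed)
    f_obs = pseudo_f(D, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(D, shuffled) >= f_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return f_obs, (hits + 1) / (n_perm + 1)


# Made-up abundances: 6 samples (rows) by 4 species (columns), two treatments.
Y = [[10, 8, 0, 1], [12, 9, 1, 0], [11, 7, 0, 2],
     [1, 0, 9, 12], [0, 2, 10, 11], [2, 1, 8, 13]]
groups = ["control"] * 3 + ["treated"] * 3

D = distance_matrix(Y, bray_curtis)
f_obs, p_value = permutation_test(D, groups)
print(f_obs, p_value)
```

With only six samples there are few distinct label arrangements, so the smallest attainable P-value is limited; real data sets with more samples allow finer resolution.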