Discussiones Mathematicae Probability and Statistics 25(2) (2005) 217-239

EFFECT OF CHOICE OF DISSIMILARITY MEASURE ON CLASSIFICATION EFFICIENCY WITH NEAREST NEIGHBOR METHOD

Tomasz Górecki

Faculty of Mathematics and Computer Science,
Adam Mickiewicz University,
Umultowska 87, 61-614 Poznań

e-mail: drizzt@amu.edu.pl

Abstract

In this paper we analyze in detail the nearest neighbor method for different dissimilarity measures, both classical and weighted, for which discrimination methods have been worked out. We propose looking for the weights in the space of discriminant coordinates. Experimental results based on a number of real data sets are presented and analyzed to illustrate the benefits of the proposed methods. As classical dissimilarity measures we use the Euclidean metric, the Manhattan metric and the post office metric. Once weights are attached to the first two metrics, the resulting measures are no longer metrics, because the triangle inequality does not hold. However, this does not make them useless for the nearest neighbor classification method. Additionally, we analyze different methods of tie-breaking.
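The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes class-dependent weights in the spirit of reference [9], where each training point is compared using the weight vector of its own class; this is one way a weighted measure can lose the triangle inequality. The function names, the weight values and the tie-breaking rule (keep the earliest training point) are all hypothetical choices made for illustration, and the weights here are set by hand rather than found in the space of discriminant coordinates.

```python
import math


def weighted_euclidean(x, y, w):
    """Euclidean dissimilarity with per-feature weights w (assumed form)."""
    return math.sqrt(sum((wj * (a - b)) ** 2 for a, b, wj in zip(x, y, w)))


def weighted_manhattan(x, y, w):
    """Manhattan dissimilarity with per-feature weights w (assumed form)."""
    return sum(wj * abs(a - b) for a, b, wj in zip(x, y, w))


def post_office(x, y):
    """Post office metric: 0 if x == y, otherwise ||x|| + ||y||."""
    if tuple(x) == tuple(y):
        return 0.0
    norm = lambda v: math.sqrt(sum(a * a for a in v))
    return norm(x) + norm(y)


def nearest_neighbor(x, train, labels, class_weights, dissim):
    """1-NN rule: each training point y is compared to x using the
    weights of y's own class (assumption, following the idea in [9]).
    Ties are broken by keeping the earliest training point (assumption)."""
    best_label, best_d = None, math.inf
    for y, c in zip(train, labels):
        d = dissim(x, y, class_weights[c])
        if d < best_d:  # strict inequality: the first minimum wins a tie
            best_label, best_d = c, d
    return best_label, best_d


# Hypothetical toy data: two training points, one per class.
train = [(0.0, 0.0), (4.0, 0.0)]
labels = ["a", "b"]
unit = {"a": (1.0, 1.0), "b": (1.0, 1.0)}
weighted = {"a": (1.0, 1.0), "b": (0.25, 1.0)}

x = (1.5, 0.0)
print(nearest_neighbor(x, train, labels, unit, weighted_euclidean))
print(nearest_neighbor(x, train, labels, weighted, weighted_euclidean))
```

With unit weights the query is assigned to class "a"; shrinking the first-feature weight of class "b" moves it to class "b", which shows how the choice of weights alone can change the decision of the nearest neighbor rule.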

Keywords and Phrases: nearest neighbor method, discriminant coordinates, dissimilarity measures, estimators of classification error.

2000 Mathematics Subject Classification: 62H30, 62J05.

References

[1] C. Blake, E. Keogh and C. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, University of California, Irvine, Department of Information and Computer Sciences.
[2] T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Trans. Information Theory 13 (1) (1967), 21-27.
[3] L. Devroye, L. Györfi and G. Lugosi, Probabilistic Theory of Pattern Recognition, Springer New York 1996.
[4] R. Gnanadesikan, Methods for Statistical Data Analysis of Multivariate Observations, Second Edition, John Wiley & Sons, New York 1997.
[5] R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis, Prentice-Hall, New Jersey 1982.
[6] W.J. Krzanowski and F.H.C. Marriott, Multivariate Analysis, 1 Distributions, Ordination and Inference, Edward Arnold London 1994.
[7] W.J. Krzanowski and F.H.C. Marriott, Multivariate Analysis, 2 Classification, Covariance Structures and Repeated Measurements, London 1995.
[8] D.F. Morrison, Multivariate statistical analysis, PWN, Warszawa 1990.
[9] R. Paredes and E. Vidal, A class-dependent weighted dissimilarity measure for nearest neighbor classification problems, Pattern Recognition Letters 21 (2000) 1027-1036.
[10] G.A.F. Seber, Multivariate Observations, John Wiley & Sons, New York 1984.

Received 18 June 2004
Revised 4 August 2005