Decision Boundary
A decision boundary is a partition in n-dimensional space that divides the space into two or more response regions. A decision boundary can take any functional form, but it is often useful to derive the optimal decision boundary, i.e., the one that maximizes long-run accuracy.
The use of decision boundaries is widespread and forms the basis of a branch of statistics known as Discriminant Analysis, which usually assumes a linear decision bound and has been applied in many settings. For example, a clinical psychiatrist might be interested in identifying the set of factors that best predicts whether an individual is likely to evidence some clinical disorder. To achieve this goal the researcher identifies a set of predictor variables measured at time 1 (e.g., symptoms, neuropsychological test scores, etc.) and constructs a linear function of these predictors that best separates depressed from non-depressed (or schizophrenic from non-schizophrenic) patients diagnosed at time 2. The resulting decision bound can then be applied to symptom and neuropsychological test data collected from new patients to determine whether they are at risk for that clinical disorder later in life. Similar applications can be found in machine learning (e.g., automated speech recognition) and several other domains.
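To make this workflow concrete, here is a minimal Python sketch of the time-1/time-2 scenario using scikit-learn's LinearDiscriminantAnalysis; the synthetic predictor values, sample sizes, and test cases are illustrative assumptions, not data from any study:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    n = 200
    # Two hypothetical time-1 predictors (e.g., symptom and test scores)
    # for patients later diagnosed at time 2.
    X_dep = rng.normal(loc=[100.0, 200.0], scale=50.0, size=(n, 2))
    X_non = rng.normal(loc=[200.0, 100.0], scale=50.0, size=(n, 2))
    X = np.vstack([X_dep, X_non])
    y = ["depressed"] * n + ["non-depressed"] * n   # time-2 diagnoses

    # Estimate the linear decision bound from the labeled sample.
    lda = LinearDiscriminantAnalysis().fit(X, y)

    # Apply the bound to time-1 scores from new patients.
    print(lda.predict([[120.0, 180.0], [210.0, 90.0]]))

Because LDA assumes equal class covariances, the fitted bound is linear, which is exactly the case derived below.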
To make this definition more rigorous, suppose we have two categories of clinical disorders, such as depressed and non-depressed individuals, with predictor variables in n-dimensional space. Denote the two multivariate probability density functions f_D(x) and f_ND(x) and the two diagnoses R_D and R_ND. To maximize accuracy it is optimal to use the following decision rule:

If f_D(x)/f_ND(x) > 1, respond R_D; otherwise respond R_ND.    (1)
Notice that the optimal decision bound is the set of points x that satisfy

f_D(x)/f_ND(x) = 1.
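Stated in code, the rule of Equation 1 is a one-line comparison. The sketch below is a minimal Python rendering; the function name optimal_response is mine, and scipy's multivariate_normal merely supplies a pair of illustrative densities (their means and covariance anticipate the worked example later in this entry):

    import numpy as np
    from scipy.stats import multivariate_normal

    def optimal_response(x, f_D, f_ND):
        # Equation 1: respond R_D iff f_D(x)/f_ND(x) > 1. Comparing the
        # densities directly is equivalent and avoids dividing by a
        # near-zero density far from f_ND's mean.
        return "R_D" if f_D(x) > f_ND(x) else "R_ND"

    # Illustrative densities; any pair of densities would work here.
    f_D = multivariate_normal(mean=[100, 200], cov=50 * np.eye(2)).pdf
    f_ND = multivariate_normal(mean=[200, 100], cov=50 * np.eye(2)).pdf
    print(optimal_response([120, 190], f_D, f_ND))  # prints R_D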
It is common to assume that f_D(x) and f_ND(x) are multivariate normal. Suppose that μ_D and μ_ND denote the depressed and non-depressed means, respectively, and that Σ_D and Σ_ND denote the multivariate normal covariance matrices. In addition, suppose that Σ_D = Σ_ND = Σ. Under the latter condition the optimal decision bound is linear.
Expanding Equation 1 yields
1 = f_D(x)/f_ND(x)

  = [(2π)^(−n/2) |Σ|^(−1/2) exp(−½(x − μ_D)′ Σ⁻¹ (x − μ_D))] / [(2π)^(−n/2) |Σ|^(−1/2) exp(−½(x − μ_ND)′ Σ⁻¹ (x − μ_ND))]

  = exp[−½(x − μ_D)′ Σ⁻¹ (x − μ_D) + ½(x − μ_ND)′ Σ⁻¹ (x − μ_ND)].    (2)
Taking the natural log of both sides of Equation 2 yields
h(x) = ln[f_D(x)/f_ND(x)] = (μ_D − μ_ND)′ Σ⁻¹ x + ½(μ_ND′ Σ⁻¹ μ_ND − μ_D′ Σ⁻¹ μ_D),    (3)

which is linear in x; the optimal decision bound is the set of points satisfying h(x) = 0.
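As a sketch, Equation 3 reduces to a dot product plus a constant in numpy; the helper name h and its argument order are assumptions of this example:

    import numpy as np

    def h(x, mu_D, mu_ND, Sigma):
        # Equation 3: log likelihood ratio for equal-covariance normals.
        Sinv = np.linalg.inv(Sigma)
        slope = Sinv @ (mu_D - mu_ND)       # coefficient vector of x
        intercept = 0.5 * (mu_ND @ Sinv @ mu_ND - mu_D @ Sinv @ mu_D)
        return x @ slope + intercept        # linear in x; bound is h(x) = 0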
As a concrete example, suppose that the objects are two-dimensional with μ_D = [100 200]′, μ_ND = [200 100]′, and Σ_D = Σ_ND = Σ = 50I (where I is the identity matrix). Applying Equation 3 yields

−2x_1 + 2x_2 = 0,

so the optimal decision bound is the line x_2 = x_1.
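A quick numerical cross-check, assuming scipy for the normal densities: at an arbitrary test point, the log of the likelihood ratio in Equation 1 should equal the linear form of Equation 3.

    import numpy as np
    from scipy.stats import multivariate_normal

    mu_D, mu_ND = np.array([100.0, 200.0]), np.array([200.0, 100.0])
    Sigma = 50 * np.eye(2)

    x = np.array([150.0, 160.0])                     # arbitrary test point
    log_ratio = (multivariate_normal(mu_D, Sigma).logpdf(x)
                 - multivariate_normal(mu_ND, Sigma).logpdf(x))
    h_x = -2 * x[0] + 2 * x[1]                       # Equation 3, this example
    print(log_ratio, h_x)                            # both print 20.0

Since h(x) = 20 > 0 at this point, the optimal rule responds R_D there.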
Further Readings
Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology, 37, 372-400.

Fukunaga, K. (1972). Introduction to statistical pattern recognition. New York: Academic Press.

Morrison, D. F. (1967). Multivariate statistical methods. New York: McGraw-Hill.