Image Segmentation with OpenCV and cvMahalanobis

=Introduction=
When I was researching this technique for use in the Minigrand Project, I had a hard time finding solid information. Because this is the website with the most views I have access to, here is some info to get you started if you want to use this method for any of your projects.
 * I graduated!! But that also means that I'm not really a member of the Robotics Club anymore. If you find this article useful, bug me at Adam dot Brockett at gmail to finish it. Adam.brockett 18:33, 20 August 2010 (UTC)

The general problem we are trying to solve here is to have our computer make a judgement call as to whether a given element belongs to a predefined set or not. In machine learning terms, we want to build a Classifier. This problem will break down into two parts:
 * 1) Generate a model of this set.
 * 2) Compute how "close" a new piece of data is to this model.

I think the best way to show this is to take a look at a very simple classifier for pixels.

=Euclidean Distance=
To start off with some quick review, remember from geometry that if we want to find the distance between two points in 2 dimensions, we simply compute $$d=\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$. As you would intuitively expect, this extends to 3 dimensions: $$d=\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$.

Every pixel can be thought of as a point in a 3 dimensional space (with the dimensions being Red, Green, and Blue). Let that sink in a bit, I'll wait. What this means is that the equation above is a perfect candidate for "Part 2" of our Classifier: the distance between a pixel and our model tells us how "close" the pixel is to the set. Now we just need to find a way to model the set of pixels that we are interested in. In the code examples below, I'm going to use the average of the Red, Green, and Blue channels from a user-selected region. But you can use anything you come up with. For example, if you are interested in only red pixels, use (255, 0, 0).
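To make the two parts concrete, here is a minimal sketch in plain Python (no OpenCV), assuming an image is just a list of rows of (R, G, B) tuples. The function names, the sample pixels, and the threshold of 60 are all made up for illustration; they are not taken from the Minigrand code.

```python
import math

def mean_color(pixels):
    """Part 1: model the set as the per-channel average of the sample pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def euclidean_distance(pixel, model):
    """Part 2: straight-line distance between two points in RGB space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pixel, model)))

def segment(image, model, threshold):
    """Return a mask of booleans: True where a pixel is within threshold of the model."""
    return [[euclidean_distance(px, model) <= threshold for px in row]
            for row in image]

# Hypothetical "user-selected region": a few reddish pixels.
sample = [(250, 10, 5), (240, 20, 15), (255, 0, 10)]
model = mean_color(sample)

# Hypothetical 2x2 image: reddish, green, blue, reddish.
image = [[(245, 12, 8), (30, 200, 40)],
         [(10, 10, 240), (235, 30, 20)]]

# The threshold is an arbitrary choice; in practice you would tune it.
mask = segment(image, model, threshold=60.0)
```

With these made-up values, only the two reddish pixels land inside the threshold. Picking the threshold is the judgement call here: too small and you miss shaded parts of the object, too large and you start picking up the background.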

=Mahalanobis Distance=
In general, the above technique works pretty well, especially considering how simple it is. But we can do better. To explain how, allow me to take an aside to talk about statistics.

Classifiers aren't just used for image segmentation. In this example, we're going to talk about people's body weight and height. Let's say we are trying to find people who have a similar body type to a target. However, because we're pulling in data from an outside survey, we don't get to choose the units. In this extreme example, the weight is in units of ounces, and height is in meters. Obviously, if we use the Euclidean Distance method we would not get good results, because even a small change in weight (a number in the thousands) would overpower any change in height (a number between 1 and 2). We need to find a way to exploit the fact that we know more about our target set than just the average. We also know the range and distribution of each of the features. (In fact, we know even more than that. We also know the correlation between the features. For instance, tall people are usually heavier.) Using Mahalanobis Distance instead of Euclidean Distance will allow us to use this information.
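For reference, the Mahalanobis distance from a point $$x$$ to a set with mean $$\mu$$ and covariance matrix $$\Sigma$$ is $$d=\sqrt{(x-\mu)^T \Sigma^{-1} (x-\mu)}$$. The sketch below works through the weight and height example in plain Python for the 2 dimensional case. The survey numbers are invented for illustration, and the function names are my own, not OpenCV's.

```python
import math

def mean_and_covariance(samples):
    """Model the set: mean vector plus 2x2 sample covariance matrix."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def euclidean(point, mean):
    return math.hypot(point[0] - mean[0], point[1] - mean[1])

def mahalanobis(point, mean, cov):
    """sqrt((x - mu)^T * inverse(cov) * (x - mu)) for the 2x2 case."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # inverse of [[a, b], [b, d]] is (1 / det) * [[d, -b], [-b, a]]
    return math.sqrt((d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)

# Invented survey rows: (weight in ounces, height in meters).
survey = [(2240, 1.60), (2560, 1.70), (2880, 1.80), (2400, 1.75), (2720, 1.65)]
mean, cov = mean_and_covariance(survey)

person_a = (2592, 1.70)  # two pounds heavier than average, average height
person_b = (2560, 2.00)  # average weight, but 30 cm taller than average

print("Euclidean:  ", euclidean(person_a, mean), euclidean(person_b, mean))
print("Mahalanobis:", mahalanobis(person_a, mean, cov), mahalanobis(person_b, mean, cov))
```

With this data the Euclidean distance ranks person B as far closer to the average than person A, simply because meters are small numbers. The Mahalanobis distance reverses that ranking: two pounds is well within the normal spread of weights, while 30 cm is far outside the spread of heights in this sample.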