In probabilistic classification, the class of a new instance is calculated from the sum of the distance-weighted contributions
of each training instance for each class. The class with the highest sum is then selected as the class of the new instance.
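As a minimal sketch of this rule, the Python snippet below sums one weight per training instance into its class and returns the class with the largest sum. It is an illustration only, not the Perl script discussed in this post: the Gaussian weight, the default value of sigma, the function name classify and the class labels "above" and "below" are all assumptions.

```python
import math

def classify(point, training, sigma=0.1):
    """Return the class with the highest sum of distance-weighted contributions."""
    sums = {}
    for features, label in training:
        # Squared Euclidean distance between the new point and a training instance.
        d2 = sum((a - b) ** 2 for a, b in zip(point, features))
        # Assumed weighting: a Gaussian of the distance (the script may use another form).
        sums[label] = sums.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
    return max(sums, key=sums.get)

# Hypothetical training instances: (features, class label).
training = [
    ((0.2, 0.8), "above"),
    ((0.3, 0.9), "above"),
    ((0.8, 0.2), "below"),
    ((0.9, 0.4), "below"),
]

print(classify((0.25, 0.75), training))  # prints "above"
```

Nearby training instances dominate the sum, so the new point takes the class of the instances closest to it.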
As an example, consider the following problem:
Classify the points within a square area (side length = 1).
Let the diagonal from (0,0) to (1,1) divide all points into 2 classes.
The training instances are randomly generated, and the class is assigned based on which part of the square the instance belongs to.
So for each training instance (x,y), the class depends on which side of the diagonal the point lies, that is, on whether y is greater than x.
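The generation of the training data can be sketched in Python as follows. This is a hedged illustration, not the original Perl code: the function name make_training_set, the seed argument and the numbering of the classes (1 above the diagonal, 2 otherwise) are assumptions.

```python
import random

def make_training_set(n, seed=0):
    """Generate n random points in the unit square, labeled by side of the diagonal."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        # Assumed labeling: class 1 above the diagonal y = x, class 2 otherwise.
        label = 1 if y > x else 2
        data.append(((x, y), label))
    return data

for (x, y), label in make_training_set(5):
    print(f"({x:.3f}, {y:.3f}) -> class {label}")
```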
The Perl code below was created for this algorithm. The training data is used to classify
a number of new instances. At the end, the confusion matrix is printed to the screen to show how many
instances are classified correctly or incorrectly.
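A confusion matrix of this kind can be built by counting (actual class, predicted class) pairs. The Python sketch below shows one way to do it. The function name confusion_matrix and the sample labels are hypothetical, and the layout printed by the Perl script may differ.

```python
def confusion_matrix(actual, predicted, classes):
    """Count how often each actual class was assigned each predicted class."""
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Hypothetical results for five new instances.
classes = [1, 2]
actual = [1, 1, 2, 2, 2]
predicted = [1, 2, 2, 2, 1]
matrix = confusion_matrix(actual, predicted, classes)

# Rows are actual classes, columns are predicted classes.
print("actual\\predicted", *classes)
for a in classes:
    print(a, *(matrix[a][p] for p in classes))
```

The diagonal cells hold the correctly classified instances, and the off-diagonal cells hold the errors.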
This example uses 2 classes and 2-dimensional instances. However, these are just initial parameters and can easily be changed
at the beginning of the program if a different number of classes or dimensions is needed.
The program calculates the distance-weighted contribution of each training instance from its distance to the new instance and a parameter sigma.
Sigma should be adjusted until the classification error is sufficiently small.
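One way to adjust sigma is to try several values and compare the error rate on instances that were not used for training. The Python sketch below illustrates the idea on the diagonal problem. It assumes a Gaussian weight exp(-d^2 / (2 * sigma^2)), which may not be the exact form used by the Perl script. The candidate sigma values, the sample sizes and all function names are assumptions as well.

```python
import math
import random

def sample(n, rng):
    """Random points in the unit square; class 1 above the diagonal, else class 2 (assumed)."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [((x, y), 1 if y > x else 2) for x, y in points]

def classify(point, training, sigma):
    """Pick the class with the largest sum of assumed Gaussian weights."""
    sums = {}
    for features, label in training:
        d2 = sum((a - b) ** 2 for a, b in zip(point, features))
        sums[label] = sums.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
    return max(sums, key=sums.get)

def error_rate(training, test, sigma):
    """Fraction of test instances that are classified incorrectly."""
    wrong = sum(1 for point, label in test if classify(point, training, sigma) != label)
    return wrong / len(test)

rng = random.Random(42)
training = sample(200, rng)
test = sample(200, rng)

# Try a few candidate values and report the error rate for each.
errors = {s: error_rate(training, test, s) for s in (0.01, 0.05, 0.1, 0.5, 1.0)}
for sigma, err in errors.items():
    print(f"sigma={sigma:<5} error rate={err:.3f}")
```

With a very small sigma only the closest training instances contribute to the sums. With a very large sigma every training instance contributes almost equally, and the result tends toward the class that has more training instances.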
The Perl code for the script can be downloaded from the link below. The script can be used to create a classifier for different data mining purposes.