Here we implement a Naive Bayes classifier for text files in Perl.
The classification problem is as follows:
We have a set of labeled text files in a folder. Each file name consists of the class label, an underscore ("_"), an index number, and the extension, for example: classname_0.txt, classname_1.txt, classname_2.txt, ...
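As a minimal sketch of this naming convention, the class label can be recovered from a file name with a regular expression. The helper name `label_from_filename` is hypothetical, and the sketch assumes the label is everything before the last underscore that is followed by digits and an extension.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: extract the class label from a name such as
# "classname_0.txt". Returns undef when the name does not follow the
# assumed pattern <label>_<index>.<extension>.
sub label_from_filename {
    my ($path) = @_;
    my $name = basename($path);
    return $name =~ /^(.+)_\d+\.[^.]+$/ ? $1 : undef;
}

print label_from_filename("sport_12.txt"), "\n";   # prints "sport"
```

Because the first group is greedy, a label that itself contains underscores (for example my_class_3.txt) is kept whole.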
We need to classify a new text file.
The Perl script opens each file, counts the frequency of each word, and builds a matrix of probabilities for each word and class. This matrix is later used to find the class label that best fits the document to be classified.
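The counting and scoring steps described above might look roughly like the sketch below. It is not the script itself: it works on in-memory strings instead of files so that it runs on its own, and the function names (`tokenize`, `train`, `classify`), the add-one (Laplace) smoothing, and the use of log probabilities are all assumptions made for illustration. It needs Perl 5.10 or later for the `//` operator.

```perl
use strict;
use warnings;

# Assumed simple tokenizer: lowercase the text and split on non-alphanumerics.
sub tokenize {
    my ($text) = @_;
    return grep { length } split /[^a-z0-9]+/, lc $text;
}

# Build per-class word counts from a list of [label, text] pairs.
# Assumes the training set is non-empty and contains at least one word.
sub train {
    my (@docs) = @_;
    my (%word_count, %total_words, %doc_count, %vocab);
    for my $doc (@docs) {
        my ($label, $text) = @$doc;
        $doc_count{$label}++;
        for my $w (tokenize($text)) {
            $word_count{$label}{$w}++;
            $total_words{$label}++;
            $vocab{$w} = 1;
        }
    }
    return {
        word_count  => \%word_count,
        total_words => \%total_words,
        doc_count   => \%doc_count,
        vocab_size  => scalar(keys %vocab),
        n_docs      => scalar(@docs),
    };
}

# Return the label with the highest log score:
# log P(class) + sum over words of log P(word | class),
# where P(word | class) uses add-one smoothing (an assumption).
sub classify {
    my ($model, $text) = @_;
    my @words = tokenize($text);
    my ($best_label, $best_score);
    for my $label (sort keys %{ $model->{doc_count} }) {
        my $score = log($model->{doc_count}{$label} / $model->{n_docs});
        my $denom = ($model->{total_words}{$label} // 0) + $model->{vocab_size};
        for my $w (@words) {
            my $count = $model->{word_count}{$label}{$w} // 0;
            $score += log(($count + 1) / $denom);
        }
        if (!defined $best_score || $score > $best_score) {
            ($best_label, $best_score) = ($label, $score);
        }
    }
    return $best_label;
}

# Made-up toy data, standing in for the contents of the labeled files.
my $model = train(
    ['sport', 'the team won the match'],
    ['sport', 'a great goal in the game'],
    ['tech',  'new laptop with fast processor'],
    ['tech',  'software update for the phone'],
);
print classify($model, 'the team scored a goal'), "\n";   # prints "sport"
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow on long documents, and the smoothing keeps a word unseen in one class from forcing that class's score to zero.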