Applied Math & Computer Science Lab
Data Analysis, Optimization & Mathematical Modeling, Artificial Intelligence, Neural Net For Everyday Life Applications

Naive Bayes Classification with Perl



    In what follows we implement a Naive Bayes classifier in Perl. The classification problem is the following: given the data below, classify the new data pair ('perl language').



The first two columns of the data are features (or attributes), and the third column (CS or FV) is the class label assigned on the basis of the feature values. The Naive Bayes classifier should therefore be able to assign a label to new, not yet classified data. The theory of Naive Bayes classification is well described in many books on machine learning and data mining, so what follows are only some notes on the Perl script for Naive Bayes classification.
     The script reads the input data. The number of features can be arbitrary, but the last column must be the class label. The first step is to count the frequency of each class; these counts are needed for calculating probabilities.
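The counting step can be sketched as follows. This is a minimal illustration and not the article's script: the rows in @toy_rows are invented for the example (only the labels 'CS' and 'FV' come from the text, and they are treated as opaque strings), and the helper name count_classes is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical toy rows, NOT the article's data set.
# Each row: feature values first, class label in the last column.
my @toy_rows = (
    [qw(perl   script    CS)],
    [qw(java   language  CS)],
    [qw(python language  CS)],
    [qw(apple  fruit     FV)],
    [qw(carrot vegetable FV)],
);

# Count how often each class label occurs (label = last column of a row).
# count_classes is an illustrative helper name, not taken from the script.
sub count_classes {
    my ($rows) = @_;
    my %count;
    $count{ $_->[-1] }++ for @$rows;
    return \%count;
}

my $class_count = count_classes(\@toy_rows);
my $total_rows  = @toy_rows;

# The prior estimate is P(class) = count(class) / number of rows.
for my $class (sort keys %$class_count) {
    printf "%s: count=%d prior=%.2f\n",
        $class, $class_count->{$class}, $class_count->{$class} / $total_rows;
}
```

Because the label is always read from the last column, the same helper works for any number of feature columns.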
     The second step is to calculate the 3-dimensional matrix P, which is used for estimating probabilities. The entry P[i][j][k] is the estimate of the probability that the i-th feature (attribute) takes the value with index j when the item's class has index k. In mathematical notation, this is P(feature_i = value_j | class = class_k).
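A possible way to estimate these conditional probabilities is sketched below. It is an assumption-laden illustration, not the article's code: instead of a matrix indexed by integers j and k, it uses nested hashes keyed by feature index, feature value and class label, and it uses plain relative frequencies with no smoothing. The rows and the helper name estimate_conditionals are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical toy rows, NOT the article's data set (label in last column).
my @toy_rows = (
    [qw(perl   script    CS)],
    [qw(java   language  CS)],
    [qw(python language  CS)],
    [qw(apple  fruit     FV)],
    [qw(carrot vegetable FV)],
);

# Returns a nested hash $p->{i}{value}{class} holding the estimate of
# P(feature_i = value | class) as count(feature_i = value, class) / count(class).
# estimate_conditionals is an illustrative helper name.
sub estimate_conditionals {
    my ($rows) = @_;
    my (%class_count, %joint);
    for my $row (@$rows) {
        my @features = @$row;          # copy, so the input row is not modified
        my $class    = pop @features;  # last column is the class label
        $class_count{$class}++;
        $joint{$_}{ $features[$_] }{$class}++ for 0 .. $#features;
    }
    my %p;
    for my $i (keys %joint) {
        for my $value (keys %{ $joint{$i} }) {
            for my $class (keys %{ $joint{$i}{$value} }) {
                $p{$i}{$value}{$class} =
                    $joint{$i}{$value}{$class} / $class_count{$class};
            }
        }
    }
    return \%p;
}

my $p = estimate_conditionals(\@toy_rows);

# Two of the three toy CS rows have 'language' as their second feature.
printf "P(feature 1 = language | CS) = %.3f\n", $p->{1}{language}{CS};  # 0.667
```

Combinations that never occur in the data are simply absent from the hash, which corresponds to an estimate of zero.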
     The last step is classification: the script selects the class with the maximum estimated probability for the given input data. The output in our example is 'CS', which means that the new pair belongs to the class 'CS'.
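The selection of the most probable class can be sketched as below. This is a self-contained illustration under stated assumptions rather than the article's script: the training rows are invented, the helper names train and classify are hypothetical, the score is the class prior multiplied by the per-feature relative frequencies, and an unseen feature value contributes a factor of zero (no smoothing is applied).

```perl
use strict;
use warnings;

# Hypothetical toy rows, NOT the article's data set (label in last column).
my @toy_rows = (
    [qw(perl   script    CS)],
    [qw(java   language  CS)],
    [qw(python language  CS)],
    [qw(apple  fruit     FV)],
    [qw(carrot vegetable FV)],
);

# Collect class counts and per-feature value counts (illustrative helper).
sub train {
    my ($rows) = @_;
    my %model = ( total => scalar @$rows, class_count => {}, joint => {} );
    for my $row (@$rows) {
        my @features = @$row;
        my $class    = pop @features;
        $model{class_count}{$class}++;
        $model{joint}{$_}{ $features[$_] }{$class}++ for 0 .. $#features;
    }
    return \%model;
}

# Return the class with the largest score
#   P(class) * product over i of P(feature_i = value_i | class).
# Ties go to the alphabetically first class because of the sort.
sub classify {
    my ($model, @features) = @_;
    my ($best_class, $best_score) = (undef, -1);
    for my $class (sort keys %{ $model->{class_count} }) {
        my $n     = $model->{class_count}{$class};
        my $score = $n / $model->{total};              # prior estimate
        for my $i (0 .. $#features) {
            # '//' (defined-or) needs Perl 5.10 or later
            my $hits = $model->{joint}{$i}{ $features[$i] }{$class} // 0;
            $score *= $hits / $n;
        }
        ($best_class, $best_score) = ($class, $score) if $score > $best_score;
    }
    return $best_class;
}

my $toy_model = train(\@toy_rows);
print classify($toy_model, 'perl', 'language'), "\n";   # prints CS for these toy rows
```

With many features the product of small probabilities can underflow; summing logarithms of the factors is a common alternative, though whether the original script does this is not stated.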
     Thus we have seen how Naive Bayes classification can be implemented in Perl for classifying text data. It is easy to implement and easy to use.



Source Code

1. Naive Bayes Classification Perl Script