Applied Math & Computer Science Lab
Data Analysis, Optimization & Mathematical Modeling, Artificial Intelligence, Neural Net For Everyday Life Applications

Search Keywords Analysis Using K-means Clustering Algorithm

The keywords that web users type into search engines can provide useful information. This section uses a data mining algorithm to divide search phrases into groups. A group, or a group summary, is a more convenient and quicker way to find the most used keywords and the words associated with them.

I downloaded the search history from Google Analytics for a period of about 2 years. After that I manually changed some words to eliminate different spellings of words with the same meaning. For example, the word 'k-means' is written in several ways: it can also be k_means, k means, kmeans, or kmean. Next I took 2 columns (the search phrase and, next to it, the quantity count) and copied them to a text file. This text file is the input file for the program that does the clustering.
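The manual spelling cleanup described above could also be scripted. The following is a minimal sketch in Python (not the original Perl program), covering only the 'k-means' variants listed in the text; the function name normalize_phrase and the regular expression are illustrative assumptions.

```python
import re

# Matches the spellings named in the text: k-means, k_means, k means,
# kmeans, kmean. The separator and the trailing "s" are both optional.
KMEANS_VARIANTS = re.compile(r"\bk[ _-]?means?\b", re.IGNORECASE)

def normalize_phrase(phrase: str) -> str:
    """Lowercase a search phrase and unify the known 'k-means' spellings."""
    phrase = phrase.strip().lower()
    phrase = KMEANS_VARIANTS.sub("k-means", phrase)
    # Collapse repeated whitespace left over from copy and paste.
    return re.sub(r"\s+", " ", phrase)

for raw in ["K_means  example", "kmean perl", "k means clustering"]:
    print(normalize_phrase(raw))
```

Other word families would need their own patterns or a lookup table; this sketch handles one family only.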
The Perl program itself uses the k-means data mining algorithm that was developed earlier. The program has 3 steps:
1. Load data from the input file into memory and prepare the data array for the k-means algorithm.
2. Run the k-means algorithm, which clusters search phrases based on the similarity between them.
3. Print the result to the output file.
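The three steps above can be sketched roughly as follows. This is a Python illustration, not the original Perl script. The tab-separated input, the 0/1 word-presence encoding of phrases, the random initialization, and all function names are assumptions made for the sketch; the text does not say how the program prepares its data array.

```python
import random

def load_phrases(lines, sep="\t"):
    """Step 1: parse 'phrase<sep>count' lines into (phrase, count) pairs."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        phrase, _, count = line.rpartition(sep)
        rows.append((phrase.strip(), int(count)))
    return rows

def vectorize(phrases):
    """Encode each phrase as a 0/1 vector over the vocabulary (assumed)."""
    vocab = sorted({w for p in phrases for w in p.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for p in phrases:
        v = [0.0] * len(vocab)
        for w in p.split():
            v[index[w]] = 1.0
        vectors.append(v)
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance; gives the same nearest centroid."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20, seed=0):
    """Step 2: basic k-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

def format_results(rows, labels):
    """Step 3: one line per phrase: index, phrase, cluster, count."""
    return [f"{i} {phrase} => {lab} {count}"
            for i, ((phrase, count), lab) in enumerate(zip(rows, labels))]

# Hypothetical toy input, made up for illustration only.
rows = load_phrases(["adaboost code\t5", "code example\t3", "neural net\t7"])
_, vectors = vectorize([p for p, _ in rows])
for line in format_results(rows, kmeans(vectors, k=2)):
    print(line)
```

As in the program described here, the count column is carried through to the output but plays no part in the clustering.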
To calculate similarity the program uses Euclidean distance. The current version of the program does not use the quantity (second column).
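To make the distance concrete, here is a small Python sketch of Euclidean distance between two search phrases. It assumes each phrase is encoded as a 0/1 vector marking which words it contains; that encoding and the name phrase_distance are assumptions for illustration, since the text does not describe the exact data array.

```python
import math

def phrase_distance(a: str, b: str) -> float:
    """Euclidean distance between two phrases under a 0/1 word encoding."""
    words_a, words_b = set(a.split()), set(b.split())
    vocab = sorted(words_a | words_b)
    va = [1.0 if w in words_a else 0.0 for w in vocab]
    vb = [1.0 if w in words_b else 0.0 for w in vocab]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))

# Phrases sharing the word "code" are closer than phrases sharing nothing.
print(phrase_distance("adaboost code", "code example"))
print(phrase_distance("adaboost code", "neural net"))
```

Under this encoding the distance is the square root of the number of words that appear in one phrase but not the other, so phrases that share words end up in the same cluster more easily.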
Here is an example of the output, for cluster 3 only.

The first number is the index of the search phrase, the number 3 after => is the cluster number, and the number at the end is the quantity. The cluster was built around the words 'adaboost code', and it also shows other words that are used with the word 'code' (source, how to code, code example, algorithm code, ...).
Thus the example script shows how data mining (the k-means algorithm) can be applied to group search phrases. Improvements that could be made to this program include the following:
1. Looking for trends between the previous time period and the current period.
2. Automatic conversion of words with different spellings but the same meaning.
3. Automatic handling of plural forms and suffixes.
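Two of these improvements can be sketched briefly in Python. This is only a rough illustration under stated assumptions: the suffix rules are deliberately naive (a proper stemmer would do better), and the names strip_plural, normalize_plurals, count_changes, and the PROTECTED set are hypothetical, not part of the existing program.

```python
# Words that end in "s" but must not be singularized.
PROTECTED = {"k-means"}

def strip_plural(word: str, keep=PROTECTED) -> str:
    """Very rough singularization, e.g. 'libraries' -> 'library'."""
    if word in keep:
        return word
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if (len(word) > 3 and word.endswith("s")
            and not word.endswith(("ss", "us", "is"))):
        return word[:-1]
    return word

def normalize_plurals(phrase: str) -> str:
    """Apply strip_plural to every word of a search phrase."""
    return " ".join(strip_plural(w) for w in phrase.split())

def count_changes(previous: dict, current: dict) -> dict:
    """Per-phrase change in count between two time periods."""
    phrases = set(previous) | set(current)
    return {p: current.get(p, 0) - previous.get(p, 0) for p in phrases}

print(normalize_plurals("clustering algorithms examples"))
```

The naive rules will mishandle some words (for example, plurals ending in "es" such as "classes"), so the protected list or the rules would need tuning against the real search phrases.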


References


1. Search keywords analysis using k-means clustering algorithm - perl script
2. Module for K-means clustering
3. K-means Clustering
4. Forum about clustering
5. Search keywords analysis using k-means clustering algorithm - online demo