Applied Math & Computer Science Lab
Data Analysis, Optimization & Mathematical Modeling, Artificial Intelligence, Neural Net For Everyday Life Applications

Hierarchical Clustering

Hierarchical clustering is a clustering method that groups objects into clusters and builds a hierarchy of those clusters. In the "bottom up" approach, the program starts by assigning each object to its own cluster, and at each step it merges two clusters. The decision about which clusters to merge is based on the distance (or similarity) between clusters and can be implemented in different ways [2].
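To illustrate what "different ways" can mean, here is a minimal sketch of two common choices for the distance between clusters: single linkage (the closest pair of objects) and complete linkage (the farthest pair). The function names, the 'single'/'complete' argument and the sample points are assumptions made for this illustration; the article does not say which rule the linked script uses.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Euclidean distance between two points given as array references.
sub point_distance {
    my ($p, $q) = @_;
    my $sum = 0;
    $sum += ($p->[$_] - $q->[$_]) ** 2 for 0 .. $#$p;
    return sqrt($sum);
}

# Distance between two clusters, each an array reference of points.
# $linkage is 'single' (closest pair) or 'complete' (farthest pair);
# both the name and the values are hypothetical, chosen for this sketch.
sub cluster_distance {
    my ($c1, $c2, $linkage) = @_;
    my @d;
    for my $p (@$c1) {
        for my $q (@$c2) {
            push @d, point_distance($p, $q);
        }
    }
    return $linkage eq 'complete' ? max(@d) : min(@d);
}

# Made-up sample clusters, not the article's input data.
my @left  = ([0, 0], [0, 1]);
my @right = ([3, 0], [0, 5]);
printf "%.1f\n", cluster_distance(\@left, \@right, 'single');    # prints 3.0
printf "%.1f\n", cluster_distance(\@left, \@right, 'complete');  # prints 5.0
```

Single linkage tends to produce long, chained clusters, while complete linkage favours compact ones, so the choice affects which clusters get merged at each step.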



In this discussion we will use Perl to create a program for hierarchical clustering with the "bottom up" approach. The input data for clustering in our example is a two-dimensional array; however, the program is not limited to a size of two.
At each step we keep a one-dimensional array. The i-th element of this array keeps track of the objects that are included in the i-th cluster at the given step. The program saves the object ids as a string, with the ids separated by a special separator character "_". For example, if cluster 3 contains objects 2, 6 and 9, then the 3rd element of this array will have the string value "2_6_9".
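The string representation described above could be handled with Perl's built-in join and split. This is only a sketch: the helper names ids_to_cluster and cluster_to_ids are made up for illustration and are not taken from the linked script.

```perl
use strict;
use warnings;

my $SEP = '_';    # separator character described in the text

# Build the string stored for one cluster from a list of object ids.
sub ids_to_cluster {
    return join($SEP, @_);
}

# Recover the list of object ids from a cluster string.
sub cluster_to_ids {
    my ($cluster) = @_;
    return split /\Q$SEP\E/, $cluster;
}

my $cluster = ids_to_cluster(2, 6, 9);
print "$cluster\n";                  # prints 2_6_9

my @ids = cluster_to_ids($cluster);
print scalar(@ids), "\n";            # prints 3
```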
After each step the size of the array decreases by one, because two clusters are merged into one. So at the last step there is only one cluster containing all objects. Of course, the program can be made to stop earlier if needed.
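A single merge step on such an array might look like the sketch below. The function name merge_clusters and the 0-based indices are assumptions for this example; the linked script may organise this differently.

```perl
use strict;
use warnings;

# Merge the clusters at array indices $i and $j of @$clusters in place.
# Each element is a string of object ids joined by "_".
sub merge_clusters {
    my ($clusters, $i, $j) = @_;
    die "cannot merge a cluster with itself\n" if $i == $j;
    ($i, $j) = ($j, $i) if $i > $j;             # make sure $i < $j
    $clusters->[$i] .= '_' . $clusters->[$j];   # append ids of cluster $j to cluster $i
    splice(@$clusters, $j, 1);                  # drop cluster $j; the array shrinks by one
    return;
}

my @clusters = ('0', '1', '2', '3');
merge_clusters(\@clusters, 1, 3);
print join(' ', @clusters), "\n";   # prints 0 1_3 2
```

Removing the higher index after appending to the lower one keeps the remaining indices valid for the next step.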
At the end of each step the program outputs the clusters and the objects included in them, so we can see the hierarchy of clusters.
The program uses the Euclidean distance between objects.
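For reference, the Euclidean distance between two objects p and q is the square root of the sum of squared coordinate differences. A minimal sketch follows, assuming each object is stored as an array reference of coordinates; the function name is hypothetical.

```perl
use strict;
use warnings;

# Euclidean distance between two points given as array references.
# Works for any number of coordinates, provided both points have the same count.
sub euclidean_distance {
    my ($p, $q) = @_;
    die "points must have the same dimension\n" unless @$p == @$q;
    my $sum = 0;
    for my $k (0 .. $#$p) {
        $sum += ($p->[$k] - $q->[$k]) ** 2;
    }
    return sqrt($sum);
}

printf "%.1f\n", euclidean_distance([0, 0], [3, 4]);   # prints 5.0
```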
For the input data

the program produces the following output:
At the end the program has only one cluster, which contains all objects. One step before the end, the program merged objects 3 and 4 into a separate cluster. Thus, based on the history of how the clusters were created, the hierarchy of clusters can be built from this output.
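The whole procedure described in this section can be sketched in a few dozen lines. This is not the linked script: the sample data, the 0-based object ids, the function names and the single-linkage rule (distance between the closest members of two clusters) are all assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: five objects with two coordinates each.
my @data = ([1, 1], [1.5, 1], [5, 5], [5, 6], [9, 1]);

# Euclidean distance between two points given as array references.
sub distance {
    my ($p, $q) = @_;
    my $sum = 0;
    $sum += ($p->[$_] - $q->[$_]) ** 2 for 0 .. $#$p;
    return sqrt($sum);
}

# Single-linkage distance between two clusters stored as "id_id_..." strings
# (assumed rule; the article does not state which one the script uses).
sub cluster_distance {
    my ($data, $c1, $c2) = @_;
    my $best;
    for my $p (split /_/, $c1) {
        for my $q (split /_/, $c2) {
            my $d = distance($data->[$p], $data->[$q]);
            $best = $d if !defined $best || $d < $best;
        }
    }
    return $best;
}

# Repeatedly merge the two closest clusters until one remains.
# Returns the list of cluster arrays after each step.
sub agglomerate {
    my ($data) = @_;
    my @clusters = (0 .. $#$data);    # start: one cluster per object
    my @history;
    while (@clusters > 1) {
        my ($best_i, $best_j, $best_d);
        for my $i (0 .. $#clusters - 1) {
            for my $j ($i + 1 .. $#clusters) {
                my $d = cluster_distance($data, $clusters[$i], $clusters[$j]);
                ($best_i, $best_j, $best_d) = ($i, $j, $d)
                    if !defined $best_d || $d < $best_d;
            }
        }
        $clusters[$best_i] .= '_' . $clusters[$best_j];   # merge j into i
        splice(@clusters, $best_j, 1);                    # array shrinks by one
        push @history, [@clusters];
    }
    return @history;
}

my $step = 0;
for my $state (agglomerate(\@data)) {
    $step++;
    print "step $step: @$state\n";
}
# prints:
# step 1: 0_1 2 3 4
# step 2: 0_1 2_3 4
# step 3: 0_1_2_3 4
# step 4: 0_1_2_3_4
```

Reading the printed steps from bottom to top gives the hierarchy: the final cluster splits into 0_1_2_3 and 4, then 0_1_2_3 splits into 0_1 and 2_3, and so on. To stop early, the loop condition could compare the number of clusters against a desired count instead of 1.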
A link to the Perl source code is provided below [1]. With some minor adjustments it can be used for hierarchical clustering in practical data mining tasks.


References



1. Hierarchical clustering - 'bottom up' approach - perl script
2. Hierarchical clustering - 'bottom up' approach - online clustering demo
3. Wikipedia: Hierarchical clustering