Hierarchical clustering is a clustering method that groups objects
into clusters and builds a hierarchy of those clusters. In the "top down" (divisive) approach the program
starts by assigning all objects to one cluster and then, at each step, splits a selected cluster.
The decision about which cluster to split is based on the distance (or similarity) between
the objects in the cluster and can be implemented in different ways [2].
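One possible way to make the split decision can be sketched as follows. This is only an illustration under stated assumptions, not the criterion used by the program discussed here: it picks the cluster with the largest diameter (the largest distance between any two of its members). The helper names dist, diameter and pick_cluster_to_split and the sample points are made up for the example.

```perl
use strict;
use warnings;

# Euclidean distance between two points given as array references.
sub dist {
    my ($p, $q) = @_;
    my $sum = 0;
    $sum += ($p->[$_] - $q->[$_]) ** 2 for 0 .. $#{$p};
    return sqrt($sum);
}

# Diameter of a cluster: the largest distance between any two members.
# @ids holds row indices into the data array.
sub diameter {
    my ($data, @ids) = @_;
    my $max = 0;
    for my $i (0 .. $#ids) {
        for my $j ($i + 1 .. $#ids) {
            my $d = dist($data->[ $ids[$i] ], $data->[ $ids[$j] ]);
            $max = $d if $d > $max;
        }
    }
    return $max;
}

# Assumed criterion: split the cluster with the largest diameter.
# $clusters is an array reference of array references of object ids.
sub pick_cluster_to_split {
    my ($data, $clusters) = @_;
    my ($best, $best_diam) = (-1, -1);
    for my $c (0 .. $#{$clusters}) {
        my $d = diameter($data, @{ $clusters->[$c] });
        ($best, $best_diam) = ($c, $d) if $d > $best_diam;
    }
    return $best;
}

my @data     = ([0, 0], [0, 1], [5, 5], [9, 9]);   # made-up sample points
my @clusters = ([0, 1], [2, 3]);
print pick_cluster_to_split(\@data, \@clusters), "\n";   # prints 1
```

Other criteria, such as the largest average distance inside a cluster, would fit the same structure by replacing diameter.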
In this discussion we will use Perl to write a program for hierarchical clustering with the "top down" approach.
The input data for clustering in our example is a two-dimensional array, although the data is not limited to two dimensions.
At each step we keep a one-dimensional array. The ith element of this array keeps track of the objects that are included in the ith cluster at that step.
The program saves the object ids in a string, with each object id followed by the separator character
"_". For example, if cluster 3 contains objects 2, 6 and 9, then the third element of this array holds the string "2_6_9_".
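The string representation described above can be sketched in a few lines of Perl. The helper names encode_cluster and decode_cluster are hypothetical and are not taken from the linked source; only the "_" separator and the "2_6_9_" example come from the text.

```perl
use strict;
use warnings;

my $SEP = "_";   # separator character used in the text

# Turn a list of object ids into the string form, e.g. (2, 6, 9) -> "2_6_9_".
sub encode_cluster {
    my @ids = @_;
    return join("", map { $_ . $SEP } @ids);
}

# Recover the object ids from the string form.
# split drops the empty field after the trailing separator.
sub decode_cluster {
    my ($str) = @_;
    my @ids = split /\Q$SEP\E/, $str;
    return @ids;
}

# Cluster 3 is the third element, i.e. index 2 in a 0-based Perl array.
my @clusters = ("0_1_", "3_", encode_cluster(2, 6, 9));
print "cluster 3: $clusters[2]\n";
print "objects: ", join(",", decode_cluster($clusters[2])), "\n";
```

Because every id carries its own trailing separator, two such strings can be joined by plain concatenation without any extra bookkeeping.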
After each step the size of the array decreases by one as we merge clusters.
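The merge described in the sentence above could look roughly like this, assuming the string representation with a trailing "_" after every id. The function name merge_clusters and the sample array are made up for illustration and are not taken from the linked source.

```perl
use strict;
use warnings;

# Merge cluster $j into cluster $i (0-based indices, $i != $j).
# Every id already ends with "_", so concatenation is enough.
sub merge_clusters {
    my ($clusters, $i, $j) = @_;
    $clusters->[$i] .= $clusters->[$j];
    splice(@$clusters, $j, 1);   # the array is now one element shorter
    return;
}

my @clusters = ("0_", "1_", "2_6_9_", "3_");
merge_clusters(\@clusters, 1, 3);
print scalar(@clusters), " clusters: @clusters\n";   # 3 clusters: 0_ 1_3_ 2_6_9_
```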
At the end of each step the program outputs the clusters and the objects included in each of them,
so we can see the hierarchy of clusters.
The program uses the Euclidean distance between objects.
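For reference, the Euclidean distance between two objects can be computed as in the sketch below. It assumes each object is a row of the input array, passed as an array reference; the function name euclidean_distance and the sample points are illustrative only.

```perl
use strict;
use warnings;

# Euclidean distance between two objects given as array references
# of equal length; works for any number of dimensions.
sub euclidean_distance {
    my ($p, $q) = @_;
    die "dimension mismatch" unless @$p == @$q;
    my $sum = 0;
    for my $k (0 .. $#{$p}) {
        $sum += ($p->[$k] - $q->[$k]) ** 2;
    }
    return sqrt($sum);
}

my @data = ([0, 0], [3, 4], [6, 8]);   # made-up sample points
print euclidean_distance($data[0], $data[1]), "\n";   # prints 5
```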
For the input data
the program produces the following output:
A link to the Perl source code is provided below [1]. With some minor adjustments it can be used in practical data mining
tasks for hierarchical clustering.