Collaborative Filtering for Website Traffic Data - Perl Source Code
While browsing Amazon.com, I always wondered how they implement the feature "Customers Who Bought Items in Your Recent History Also Bought ..."
After reading the book [1] I decided to apply collaborative filtering to website traffic data.
Weblog data can answer questions such as: users who visited page A also
visited pages ..., or users who were interested in link A also clicked on links ....
So I wrote a Perl script. The program finds similar pages for every page in the
weblog. In our context, similar pages
are pages that were visited by similar users.
The script opens the weblog file, reads it line by line, extracts the IP address
and URL from each line, and keeps the data in memory.
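As a rough illustration of the extraction step, here is a minimal sketch. It assumes the weblog is in Common Log Format, which the text does not state; the helper name parse_line and the sample lines are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the client IP and the requested URL out of one
# log line. Assumes Common Log Format, e.g.
#   1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /a.html HTTP/1.0" 200 2326
sub parse_line {
    my ($line) = @_;
    # The first field is the IP; the quoted request holds method, URL, protocol.
    if ($line =~ /^(\S+) .*?"(?:GET|POST|HEAD) (\S+)/) {
        return ($1, $2);
    }
    return;    # empty list for lines that do not match
}

# Made-up sample input: one valid line and one that should be skipped.
my @sample = (
    '1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /a.html HTTP/1.0" 200 2326',
    'not a log line',
);

for my $line (@sample) {
    my ($ip, $url) = parse_line($line) or next;    # skip unparseable lines
    print "$ip $url\n";
}
```

A real log will probably need a looser or stricter pattern, depending on the server configuration.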
The final output of this step is a data table with one row per page; each cell in a row indicates
how many visits to that page were made by a specific IP.
This is a two-dimensional array, data[url][ip], and it is the input for the next step: collaborative filtering.
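One natural way to hold such a table in Perl is a hash of hashes keyed by URL and then by IP. The sketch below assumes that representation; the (ip, url) pairs are invented sample data, not taken from a real log.

```perl
use strict;
use warnings;

# Hypothetical (ip, url) pairs, as they might come out of the parsing step.
my @hits = (
    ['1.1.1.1', '/a.html'],
    ['1.1.1.1', '/a.html'],
    ['1.1.1.1', '/b.html'],
    ['2.2.2.2', '/b.html'],
);

# %data plays the role of data[url][ip]: page => { ip => visit count }.
my %data;
for my $hit (@hits) {
    my ($ip, $url) = @$hit;
    $data{$url}{$ip}++;    # autovivification creates missing rows and cells
}

# Print the table, one line per non-empty cell.
for my $url (sort keys %data) {
    for my $ip (sort keys %{ $data{$url} }) {
        print "$url\t$ip\t$data{$url}{$ip}\n";
    }
}
```

Only the (page, IP) combinations that actually occur are stored, so the table stays small even when most IPs never visit most pages.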
In this step, for each page we use the data array to calculate
a similarity measure between this page and
all other pages. At the end, the script prints the results to a file.
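The text above does not say which similarity measure is used, so the sketch below assumes cosine similarity between the visit-count rows of two pages. That is one common choice, not necessarily the one in the script. The table contents are made up, and the results go to standard output rather than to a file.

```perl
use strict;
use warnings;

# Toy table in the data{url}{ip} shape; the counts are invented.
my %data = (
    '/a.html' => { '1.1.1.1' => 2, '2.2.2.2' => 1 },
    '/b.html' => { '1.1.1.1' => 4, '2.2.2.2' => 2 },
    '/c.html' => { '3.3.3.3' => 5 },
);

# Cosine similarity between two sparse rows (hash refs of ip => count).
# This measure is an assumption made for the example.
sub cosine {
    my ($x, $y) = @_;
    my ($dot, $nx, $ny) = (0, 0, 0);
    $nx += $_ ** 2 for values %$x;
    $ny += $_ ** 2 for values %$y;
    for my $ip (keys %$x) {
        $dot += $x->{$ip} * $y->{$ip} if exists $y->{$ip};
    }
    return 0 if $nx == 0 || $ny == 0;    # an empty row is similar to nothing
    return $dot / sqrt($nx * $ny);
}

# For every page, list the other pages from most to least similar.
for my $page (sort keys %data) {
    my %score;
    for my $other (keys %data) {
        next if $other eq $page;
        $score{$other} = cosine($data{$page}, $data{$other});
    }
    for my $other (sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score) {
        printf "%s\t%s\t%.3f\n", $page, $other, $score{$other};
    }
}
```

With cosine similarity, two pages score high when the same IPs visit them in similar proportions, regardless of the absolute number of visits. Pages with no visitors in common score 0.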
With this feature on a website, users can find interesting
links, pages or web resources more easily and quickly. And as we just saw, it's very easy to implement
in Perl.