Feature Subset Selection with Perl

Introduction

Feature selection identifies the features in a dataset that are relevant to a target variable. Many algorithms exist for this task. This article works through a numerical example and a Perl program for feature subset selection using the sequential forward algorithm. The algorithm searches the feature space for good features, and its name indicates how the search proceeds: it starts from an empty set and adds features one at a time. There are several ways to measure the goodness of a selected feature subset; here we use a correlation measure. Because subsets are scored from correlations in the data rather than by training a model, this is a filter-type algorithm.

Numerical Example

An artificial dataset was created for this example, with the last column y defined by:

y = 2 IF x0 + x2 > 5
y = 1 IF x0 + x2 <= 5

The indexes start from 0 because the program also counts from 0; index 0 refers to the first column from the left. The dataset is saved in a text file with a space as the column separator; the program opens this file and loads the data into memory for processing. Suppose we do not know how the last column was obtained and want to detect the features that are relevant to it.

Perl Script for Feature Subset Selection

The script first makes several passes through the dataset and calculates correlation coefficients for all pairs of columns, including the last column. Then it starts the search for features. At the beginning the feature subset is empty. In the first step we look for the single feature that maximizes the goodness criterion and add it to the subset. The program tracks the selected features as a string of column indexes separated by "_".

After the first pass comes the second step. In this pass we again try, one at a time, every feature except the one already chosen in the previous step, calculate the goodness of each resulting 2-feature set, and keep a record of the best feature. At the end of the pass we select the best feature and update the record of the feature subset. The following steps proceed in the same way.
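
The search described above can be sketched in Perl roughly as follows. This is a minimal illustration, not the article's script. The article only says that a correlation measure is used, so the sketch assumes a CFS-style merit, k*r_cf / sqrt(k + k*(k-1)*r_ff), where r_cf is the mean absolute feature-to-target correlation and r_ff is the mean absolute feature-to-feature correlation. The stopping rule (stop when no candidate improves the merit), the helper names pearson, merit and forward_select, and the eight toy rows are also assumptions made for the example; the rows follow the rule y = 2 IF x0 + x2 > 5 but are not the article's dataset.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Pearson correlation between two equal-length numeric columns (array refs).
sub pearson {
    my ($u, $v) = @_;
    my $n  = @$u;
    my $mu = sum(@$u) / $n;
    my $mv = sum(@$v) / $n;
    my ($suv, $suu, $svv) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my ($du, $dv) = ($u->[$i] - $mu, $v->[$i] - $mv);
        $suv += $du * $dv;
        $suu += $du * $du;
        $svv += $dv * $dv;
    }
    return 0 if $suu == 0 || $svv == 0;    # constant column: treat as uncorrelated
    return $suv / sqrt($suu * $svv);
}

# ASSUMPTION: CFS-style merit; the article does not state its exact formula.
sub merit {
    my ($corr, $target, @subset) = @_;
    my $k   = @subset;
    my $rcf = sum(map { abs($corr->[$_][$target]) } @subset) / $k;
    my $rff = 0;
    if ($k > 1) {
        my ($total, $pairs) = (0, 0);
        for my $i (0 .. $k - 2) {
            for my $j ($i + 1 .. $k - 1) {
                $total += abs($corr->[ $subset[$i] ][ $subset[$j] ]);
                $pairs++;
            }
        }
        $rff = $total / $pairs;
    }
    return $k * $rcf / sqrt($k + $k * ($k - 1) * $rff);
}

# Sequential forward selection: start empty, add the best feature each pass.
# ASSUMPTION: stop as soon as no remaining feature improves the merit.
sub forward_select {
    my ($corr, $target, $nfeat) = @_;
    my (@selected, %used);
    my $best_merit = 0;
    while (@selected < $nfeat) {
        my ($cand, $cand_merit);
        for my $f (grep { !$used{$_} } 0 .. $nfeat - 1) {
            my $m = merit($corr, $target, @selected, $f);
            ($cand, $cand_merit) = ($f, $m)
                if !defined $cand_merit || $m > $cand_merit;
        }
        last if $cand_merit <= $best_merit;
        push @selected, $cand;
        $used{$cand} = 1;
        $best_merit = $cand_merit;
    }
    # Selected column indexes joined with "_", as in the article.
    return (join('_', @selected), $best_merit);
}

# Hypothetical toy rows "x0 x1 x2 x3 y", space separated like the article's file.
my @rows = map { [split ' '] } (
    '1 1 1 4 1', '1 1 4 4 1', '1 4 1 1 1', '1 4 4 1 1',
    '4 1 1 1 1', '4 1 4 1 2', '4 4 1 4 1', '4 4 4 4 2',
);
my $ncol = @{ $rows[0] };
my @cols = map { my $c = $_; [ map { $_->[$c] } @rows ] } 0 .. $ncol - 1;

# Correlation coefficients for all pairs of columns, including the last one.
my @corr;
for my $i (0 .. $ncol - 1) {
    for my $j (0 .. $ncol - 1) {
        $corr[$i][$j] = pearson($cols[$i], $cols[$j]);
    }
}

my ($subset, $score) = forward_select(\@corr, $ncol - 1, $ncol - 1);
printf "selected: %s  merit: %.4f\n", $subset, $score;
```

With these toy rows the sketch should print selected: 0_2 with a merit of about 0.8165. It picks columns 0 and 2 and then stops, because adding either remaining column lowers the merit.
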
For example, in the third step each remaining feature is added in turn to the selected 2-feature subset. This gives 3-feature subsets; we calculate the goodness of each and again select the best set at the end.

Program Output

The program also prints some intermediate results. Running it on the artificial example gives the following output.