Preference Learning Toolbox
Automatic Feature Selection
This toolbox supports a number of feature selection algorithms to automatically find the most relevant subset of input features for the model.
Different search methods can be selected to traverse the space of possible feature combinations. The performance of each subset of features considered is computed as the prediction accuracy of a model trained using that subset of features as input. All of the preference learning algorithms implemented in the tool can be used to train this model.
Search algorithm
The user can choose between two hill-climbing algorithms: n-best individuals selection [1] and sequential forward feature selection [1].
- n-best individuals selection (n-Best): each of the features is evaluated individually and the n features with the highest performance are selected.
- Sequential forward feature selection (SFS): bottom-up algorithm where one feature is added at a time to the current feature set. The feature to be added is selected from the subset of the remaining features such that the new feature set generates the maximum value of the performance function over all candidate features for addition. The selection procedure terminates when an added feature yields performance equal to or lower than that obtained without it.
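The SFS procedure described above can be sketched as follows. This is an illustrative sketch, not the toolbox's code; `evaluate` stands for a user-supplied function that trains a model on the given feature subset and returns its accuracy (a hypothetical helper).

```python
def sfs(all_features, evaluate):
    """Sequential forward feature selection (sketch).

    evaluate(features) -> accuracy is a hypothetical helper that trains a
    model on the given subset and returns its performance.
    """
    selected, best_score = [], float("-inf")
    remaining = list(all_features)
    while remaining:
        # Try adding each remaining feature and keep the best candidate.
        score, feature = max((evaluate(selected + [f]), f) for f in remaining)
        # Stop when the added feature yields equal or lower performance.
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score
```

With a toy `evaluate` that rewards two informative features and slightly penalizes subset size, the loop adds the informative features and then stops.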
Evaluation method
The user can select the training algorithm and the method to evaluate its performance.
Training algorithms
Two training algorithms for multi-layer perceptrons [2] and one for support vector machines [3] are supported, namely backpropagation [4], neuroevolution [5] and Ranking SVM [6].
- Multi-layer perceptron: this is the most common artificial neural network topology. It features a set of neurons organized in a sequence of layers. The output of each neuron is an input to every neuron in the next layer. The input features are connected to all the neurons in the first layer and the output of the neurons in the last layer form the outputs of the network. The user can choose the number of layers, the number of neurons per layer (except for the output layer which contains a single neuron) and the activation function of each neuron, affecting the complexity of the model trained.
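A minimal forward pass through such a network can be sketched as below. This is illustrative only, not the toolbox's implementation; the logistic activation and random weight initialization are assumptions, and the final layer has a single neuron as described above.

```python
import math
import random

def make_mlp(sizes, seed=0):
    """Random weights for a fully connected MLP; sizes[-1] should be 1.

    Each neuron holds one weight per input plus a trailing bias term.
    """
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(n_in + 1)]  # +1 for bias
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]

def forward(net, x):
    """Propagate input x layer by layer using a logistic activation."""
    for layer in net:
        x = [1.0 / (1.0 + math.exp(-(w[-1] + sum(wi * xi for wi, xi in zip(w, x)))))
             for w in layer]
    return x[0]  # single output neuron
```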
- Backpropagation: this is a gradient-descent algorithm that iteratively (over a given number of epochs) optimizes an error function by adjusting the weights of the network proportionally to the gradient of the error with respect to the current weight values and data samples. The proportion, and therefore the strength of each update, is regulated by the given learning rate. The error function used is the Rank Margin function, which for a given pair of data samples (A and B, with A preferred over B) yields 0 if the network output for A (fA) is more than one unit larger than the network output for B (fB), and 1.0 - (fA - fB) otherwise. The total error is averaged over the complete set of pairs in the training set. If the error falls below a given threshold, training stops before reaching the specified number of epochs and the current weights are returned as the final model.
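The Rank Margin error described above can be written compactly as follows. This is a sketch of the error function only, not of the full backpropagation loop; `f` stands for the network's output function.

```python
def rank_margin_error(pairs, f):
    """Average Rank Margin error over preference pairs (sketch).

    pairs: list of (preferred, non_preferred) samples.
    f: model output function mapping a sample to a real value.
    A pair contributes 0 if f(preferred) exceeds f(non_preferred) by more
    than one unit, and 1 - (f(preferred) - f(non_preferred)) otherwise.
    """
    errors = [max(0.0, 1.0 - (f(a) - f(b))) for a, b in pairs]
    return sum(errors) / len(errors)
```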
- Neuro-evolution: this is a genetic-search based algorithm that iteratively optimizes a set (population) of neural networks (individuals) to maximize a fitness function. The fittest neural network after a given number of iterations (generations) is returned as the trained model. The set of individuals in each generation is created by copying a given fraction of the fittest individuals in the previous generation (elitism size) and filling the rest of the population (population size) with offspring resulting from combining pairs of individuals in the previous population (parents). Parents are selected using a roulette wheel algorithm that selects two individuals proportionally to their fitness from a given fraction of the fittest individuals (parents). Once a pair of individuals is selected, they are combined applying a cross-over operator (cross-over type) with a given probability (cross-over probability) and a Gaussian mutation (mean equal to 0.0 and standard deviation equal to 0.1) to each gene (network weight) with a given probability (mutation probability). For each network and pair in the dataset, the fitness function calculates the logistic sigmoid function of the difference between the network's output for the preferred and the network's output for the non-preferred objects in the pair. The parameter of the logistic function (which multiplies the output difference) is equal to 30 if the difference is positive and 5.0 otherwise. The fitness of a network is calculated as the average of this value over the complete set of training pairs.
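The fitness function described above can be sketched as below. Only the fitness computation is shown, not the genetic loop; `f` stands for a network's output function.

```python
import math

def fitness(pairs, f):
    """Average asymmetric-sigmoid fitness over preference pairs (sketch).

    pairs: list of (preferred, non_preferred) samples.
    f: network output function mapping a sample to a real value.
    The sigmoid slope is 30 when the output difference is positive
    (correctly ordered pair) and 5.0 otherwise, as described above.
    """
    values = []
    for preferred, other in pairs:
        d = f(preferred) - f(other)
        k = 30.0 if d > 0 else 5.0
        values.append(1.0 / (1.0 + math.exp(-k * d)))
    return sum(values) / len(values)
```

The steeper slope on the positive side saturates quickly, so correctly ordered pairs with any clear margin contribute close to 1, while misordered pairs are penalized more gradually.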
- Support vector machine: this is a binary classifier that separates the input samples linearly in a projected space. The decision boundary of the classifier is given by a linear combination of training samples (called support vectors) in the projected space. The projection is provided by the kernel function that the user must select. The support vectors and weights are selected to satisfy a set of constraints derived from the input samples and a cost parameter (C) defined by the user, which regulates the penalization of misclassified training samples. In this implementation, the quadratic programming solver contained in LIBSVM is used.
- Ranking SVM: this training algorithm uses the same solver as standard training algorithms for binary SVMs; the only difference lies in the set of constraints, which are defined in terms of pairwise preferences between training samples.
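One common way to express pairwise preferences as binary SVM constraints, in the linear case, is to turn each pair into a labeled difference vector; this sketch illustrates that reduction and is an assumption about the general technique, not a description of the toolbox's internals.

```python
def pairs_to_constraints(pairs):
    """Reduce preference pairs to a binary classification problem (sketch).

    pairs: list of (preferred, non_preferred) feature vectors.
    Each pair (A preferred over B) yields the difference vector A - B with
    label +1; the mirrored vector B - A with label -1 keeps classes balanced.
    """
    X, y = [], []
    for a, b in pairs:
        diff = [ai - bi for ai, bi in zip(a, b)]
        X.append(diff)
        y.append(+1)
        X.append([-d for d in diff])
        y.append(-1)
    return X, y
```

The resulting (X, y) can be handed to any binary SVM solver; a weight vector that classifies the difference vectors correctly ranks the original samples correctly.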
Evaluation method
The user can choose to either train the models using the complete dataset (no validation), assessing the performance as the percentage of correctly classified training pairs, or test the generalizability of the results using K-fold cross-validation, which trains k models on different partitions of the data and returns the percentage of correctly classified pairs not used for training (validation accuracy).
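The K-fold scheme above can be sketched as follows. `train` and `accuracy` are hypothetical helpers standing in for any of the training algorithms and the pairwise accuracy measure described earlier.

```python
def kfold_accuracy(pairs, k, train, accuracy):
    """Average validation accuracy over k folds (sketch).

    train(pairs) -> model and accuracy(model, pairs) -> float are
    hypothetical helpers; each fold is held out once while the remaining
    folds are used for training.
    """
    folds = [pairs[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [p for j, fold in enumerate(folds) if j != i for p in fold]
        model = train(training)
        scores.append(accuracy(model, held_out))
    return sum(scores) / k
```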
References

- [1] G. N. Yannakakis, H. P. Martínez, and A. Jhala. Towards affective camera control in games. User Modeling and User-Adapted Interaction, 20(4):313–340, 2010.
- [2] C. M. Bishop. Neural networks for pattern recognition. Oxford University Press, 1995.
- [3] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
- [4] B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, O. Chapelle, and K. Weinberger. Learning to rank with (a lot of) word features. Information Retrieval, 13(3):291–314, 2010.
- [5] G. N. Yannakakis, M. Maragoudakis, and J. Hallam. Preference learning for cognitive modeling: a case study on entertainment preferences. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 39(6):1165–1175, 2009.
- [6] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002.