Description
This program is an implementation of the RankBoost algorithm [Freund et al., 2003] also described in [Amini, 2015; p.177-184].
Download and Installation
The program is free for scientific use only. It is developed on Linux, and the source code is available from:
http://ama.liglab.fr/~amini/RankBoost/RankBoost.tar.bz2
After downloading the file, unpack it:
> bzip2 -cd RankBoost.tar.bz2 | tar xvf -
then compile the program in the new directory RankBoost/:
> make
After compilation, two executables are created:
- RankBoost-Train (for training the model)
- RankBoost-Test (for testing it)
Training and Testing
Both the train and test modules operate on a feature:value representation of examples:
- Rel feature:value feature:value
where Rel (in {-1,1}) is the relevance judgement of an example: -1 if the example is judged irrelevant to the given topic, and 1 if it is judged relevant. The directory Example/ in RankBoost/ contains two files, training_set and test_set, which are given for test purposes: training_set contains 10 training examples and test_set contains 8,000 test examples. (These datasets are built over a subset of the RCV1 Reuters collection.)
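For illustration, here is a minimal Python sketch of a reader for this format (the function name and the sparse-dictionary representation are illustrative choices, not part of this release):

  # Parse lines of the form "Rel feature:value feature:value ..." into
  # (label, {feature_index: value}) pairs.
  def read_examples(path):
      examples = []
      with open(path) as f:
          for line in f:
              fields = line.split()
              if not fields:
                  continue
              label = int(fields[0])      # relevance judgement in {-1, 1}
              features = {}
              for pair in fields[1:]:
                  idx, val = pair.split(":")
                  features[int(idx)] = float(val)
              examples.append((label, features))
      return examples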
Train the model:
> RankBoost-Train [options] input_file parameter_file
Options are:
-t (integer)  The number of boosting iterations (default 10)
-p (integer)  The number of candidate thresholds over features - stumps (default 10)
-d (integer)  Display (default 1)
-?            Help
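For instance, to train with 100 boosting iterations and 20 candidate thresholds per feature (the values here are arbitrary, for illustration only):
> ./RankBoost-Train -t 100 -p 20 Example/training_set Params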
Test the model:
> RankBoost-Test input_file parameter_file
Example
Running RankBoost [Freund et al., 2003] on training_set with the default parameters (i.e. 10 candidate thresholds and 10 boosting iterations) gives
> ./RankBoost-Train Example/training_set Params
The training set contains 10 examples, in dimension 20080
Learning ...
1 --> alpha= 0.9729550745276566 j*= 28 theta*=1.00000000
2 --> alpha= 0.7679765526894443 j*= 3 theta*=1.00000000
3 --> alpha= 0.8061756522837317 j*= 28 theta*=1.00000000
4 --> alpha= 0.9058990607798512 j*= 53 theta*=1.00000000
5 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
6 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
7 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
8 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
9 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
10 --> alpha= 0.8843577731700001 j*= 657 theta*=3.00000000
Here, j* and theta* are the chosen feature and the corresponding threshold that determine the base ranking function at a given iteration ([Freund et al., 2003], eq. 9), and alpha is the corresponding boosting weight ([Freund et al., 2003], eq. 6).
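To make these quantities concrete, here is a minimal Python sketch of one boosting round with such threshold (stump) base rankers, following the generic pair-based scheme of [Freund et al., 2003]. All names are illustrative, and the actual release may use the more efficient bipartite formulation of the paper:

  import math

  def stump(x, j, theta):
      # Base ranking function h(x) in {0, 1}: fires when feature j exceeds theta.
      return 1.0 if x.get(j, 0.0) > theta else 0.0

  def rankboost_round(pos, neg, D, candidates):
      # pos, neg   : relevant / irrelevant examples (dicts feature -> value)
      # D          : weights over pairs (i0, i1), i0 indexing neg, i1 indexing pos
      # candidates : list of (j, theta) stump candidates
      def margin(jt):
          # r = sum over pairs of D * (h(x1) - h(x0)), the selection
          # criterion behind the choice of (j*, theta*).
          j, theta = jt
          return sum(w * (stump(pos[i1], j, theta) - stump(neg[i0], j, theta))
                     for (i0, i1), w in D.items())

      j_star, theta_star = max(candidates, key=lambda jt: abs(margin(jt)))
      r = margin((j_star, theta_star))

      # Boosting weight (eq. 6): alpha = 0.5 ln((1+r)/(1-r));
      # in practice r is smoothed away from +/-1 to keep alpha finite.
      alpha = 0.5 * math.log((1.0 + r) / (1.0 - r))

      # Reweight: pairs the stump misorders gain weight, then normalize.
      newD = {(i0, i1): w * math.exp(alpha * (stump(neg[i0], j_star, theta_star)
                                              - stump(pos[i1], j_star, theta_star)))
              for (i0, i1), w in D.items()}
      Z = sum(newD.values())
      return j_star, theta_star, alpha, {k: w / Z for k, w in newD.items()}

After T such rounds, examples are scored by the combined ranking function H(x) = sum_t alpha_t h_t(x).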
The testing of the previous model gives
> ./RankBoost-Test Example/test_set Params
The test set contains 8000 examples in dimension 21530
AUC=0.518593 AvP=0.299255
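For reference, here is a minimal Python sketch of these two measures computed from model scores: AUC as the fraction of (relevant, irrelevant) pairs ranked in the correct order, and AvP as the mean of the precision values at the ranks of the relevant examples. This is a generic illustration, not the code used by RankBoost-Test:

  def auc(scores, labels):
      # Fraction of (relevant, irrelevant) pairs in the right order; ties count 1/2.
      pos = [s for s, y in zip(scores, labels) if y == 1]
      neg = [s for s, y in zip(scores, labels) if y == -1]
      ordered = sum(1.0 if p > n else 0.5 if p == n else 0.0
                    for p in pos for n in neg)
      return ordered / (len(pos) * len(neg))

  def average_precision(scores, labels):
      # Mean precision at the ranks of relevant examples
      # (assumes at least one relevant example).
      ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
      hits, precisions = 0, []
      for rank, (_, y) in enumerate(ranked, start=1):
          if y == 1:
              hits += 1
              precisions.append(hits / rank)
      return sum(precisions) / len(precisions)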
This program is publicly available for research use only. It should not be distributed for commercial use, and the author is not responsible for any (mis)use of this algorithm.
Acknowledgements
The author is thankful to Reuters for making the RCV1/RCV2 data available and for granting permission to distribute processed versions of it; the examples used in this release come from a part of the RCV1 collection.
Bibliography
[Amini, 2015] Massih-Reza Amini. Apprentissage Machine: de la théorie à la pratique. Eyrolles, 2015.
[Freund et al., 2003] Yoav Freund, Raj Iyer, Robert E. Schapire, Yoram Singer. An Efficient Boosting Algorithm for Combining Preferences. Journal of Machine Learning Research, 4:933-969, 2003.