This patch adds:
* Gini index and information gain ratio as
DecisionTree split options;
* handling for numeric Attributes (split point
chosen naïvely on the basis of maximum entropy);
* a couple of additional utility functions in base/;
* a new dataset (see sources.txt) for testing.
Performance on Iris improves markedly without discretisation.
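For reviewers unfamiliar with the split criteria named above, here is a minimal, self-contained sketch of Gini impurity, information gain ratio, and a naive entropy-based threshold search for a numeric attribute. It is not the code in this patch: every function name here (`entropy`, `gini`, `gainRatio`, `bestThreshold`) is hypothetical, and the threshold rule (try the midpoint between adjacent distinct values, keep the one with the highest gain ratio) is only one plausible reading of "split point chosen on the basis of entropy".

```go
package main

import (
	"math"
	"sort"
)

// countLabels builds a class-count histogram from a slice of class labels.
func countLabels(labels []string) map[string]int {
	counts := make(map[string]int)
	for _, l := range labels {
		counts[l]++
	}
	return counts
}

// entropy returns the Shannon entropy (in bits) of a class-count histogram.
func entropy(counts map[string]int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	h := 0.0
	for _, c := range counts {
		if c > 0 {
			p := float64(c) / float64(total)
			h -= p * math.Log2(p)
		}
	}
	return h
}

// gini returns the Gini impurity, 1 - sum(p_i^2), of a class-count histogram.
func gini(counts map[string]int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	g := 1.0
	for _, c := range counts {
		p := float64(c) / float64(total)
		g -= p * p
	}
	return g
}

// gainRatio scores a partition of parent into parts as
// information gain divided by split information.
func gainRatio(parent []string, parts ...[]string) float64 {
	n := float64(len(parent))
	gain := entropy(countLabels(parent))
	splitInfo := 0.0
	for _, p := range parts {
		if len(p) == 0 {
			continue
		}
		w := float64(len(p)) / n
		gain -= w * entropy(countLabels(p))
		splitInfo -= w * math.Log2(w)
	}
	if splitInfo == 0 {
		return 0
	}
	return gain / splitInfo
}

// bestThreshold tries the midpoint between each pair of adjacent distinct
// values and returns the threshold with the highest gain ratio, plus that
// score. It returns (NaN, -1) when all values are equal.
func bestThreshold(values []float64, labels []string) (float64, float64) {
	idx := make([]int, len(values))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return values[idx[a]] < values[idx[b]] })
	sorted := make([]string, len(idx))
	for i, j := range idx {
		sorted[i] = labels[j]
	}
	best, bestScore := math.NaN(), -1.0
	for i := 1; i < len(idx); i++ {
		lo, hi := values[idx[i-1]], values[idx[i]]
		if lo == hi {
			continue // no threshold can separate equal values
		}
		if score := gainRatio(sorted, sorted[:i], sorted[i:]); score > bestScore {
			best, bestScore = (lo+hi)/2, score
		}
	}
	return best, bestScore
}
```

Calling `bestThreshold` on one numeric column together with the class labels gives a candidate binary split; a tree builder would repeat this per attribute and keep the best-scoring one.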
This patch:
* Adds a one-vs-all meta classifier into meta/
* Adds a LinearSVC (essentially the same as LogisticRegression
but with different libsvm parameters) to linear_models/
* Adds a MultiLinearSVC into ensemble/ for predicting
CategoricalAttribute classes with the LinearSVC
* Adds a new example dataset based on classifying article headlines.
The example dataset is drawn from WikiNews and consists of the average,
minimum and maximum Word2Vec representations of article headlines from
three categories. The Word2Vec model was computed offline using gensim.
* Refactors KNNClassifier to use them
* Moves CSV handling back into base/ because of a circular dependency
* Also adds the datasets used to test CSV handling
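
As background on the one-vs-all scheme mentioned above, here is a minimal sketch of the idea: train one binary classifier per class and predict the class whose classifier scores highest. It does not reflect the interfaces in this patch. `BinaryClassifier`, `OneVsAll` and `centroidScorer` are hypothetical names, and the toy centroid-based scorer merely stands in for a real base learner such as the LinearSVC.

```go
package main

import "sort"

// BinaryClassifier is a hypothetical interface assumed for this sketch:
// Fit learns "positive vs. rest", Score returns a confidence for a row.
type BinaryClassifier interface {
	Fit(X [][]float64, positive []bool)
	Score(x []float64) float64
}

// OneVsAll trains one binary classifier per class label.
type OneVsAll struct {
	NewBase func() BinaryClassifier // factory for fresh base learners
	classes []string
	models  map[string]BinaryClassifier
}

// Fit trains a separate "this class vs. everything else" model per label.
func (o *OneVsAll) Fit(X [][]float64, y []string) {
	o.models = make(map[string]BinaryClassifier)
	o.classes = nil
	for _, label := range y {
		if _, seen := o.models[label]; seen {
			continue
		}
		positive := make([]bool, len(y))
		for i := range y {
			positive[i] = y[i] == label
		}
		m := o.NewBase()
		m.Fit(X, positive)
		o.models[label] = m
		o.classes = append(o.classes, label)
	}
	sort.Strings(o.classes) // deterministic tie-breaking order
}

// Predict returns the class whose binary model gives the highest score.
func (o *OneVsAll) Predict(x []float64) string {
	best, bestScore := "", 0.0
	for i, c := range o.classes {
		if s := o.models[c].Score(x); i == 0 || s > bestScore {
			best, bestScore = c, s
		}
	}
	return best
}

// centroidScorer is a toy stand-in base learner (an assumption of this
// sketch): it scores a row by the negative squared distance to the mean
// of the positive training rows.
type centroidScorer struct{ mean []float64 }

func (c *centroidScorer) Fit(X [][]float64, positive []bool) {
	c.mean = nil
	n := 0
	for i, row := range X {
		if !positive[i] {
			continue
		}
		if c.mean == nil {
			c.mean = make([]float64, len(row))
		}
		for j, v := range row {
			c.mean[j] += v
		}
		n++
	}
	for j := range c.mean {
		c.mean[j] /= float64(n)
	}
}

func (c *centroidScorer) Score(x []float64) float64 {
	d := 0.0
	for j, v := range x {
		diff := v - c.mean[j]
		d += diff * diff
	}
	return -d
}
```

Keeping the base learner behind a factory function means the same wrapper can be reused with any binary model that exposes a real-valued score.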