Rough out logic flow for an average perceptron
Fleshed out example data. Working on FixedDataGrid
support.
Don't use Binary; use Float
Update processData to use base helpers to read csv
Move class to end of feature list
Add test for processData
process data to instances
Create path fixed
Add test around Fit. First steps
Modified example, added tests, small fixes
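
For reference, the average perceptron named in the first entry above (often called the averaged perceptron) can be sketched as below. This is a minimal illustration, not the code from these commits: it works on plain slices instead of FixedDataGrid, assumes labels are +1 or -1, and the names `trainAveraged` and `predict` are hypothetical.

```go
package main

import "fmt"

// predict returns +1 if the linear score is positive, otherwise -1.
func predict(w []float64, b float64, x []float64) float64 {
	score := b
	for j := range w {
		score += w[j] * x[j]
	}
	if score > 0 {
		return 1
	}
	return -1
}

// trainAveraged applies the perceptron update rule for the given number of
// epochs and returns the average of the weights and bias observed after
// every example. Labels in ys are assumed to be +1 or -1.
func trainAveraged(xs [][]float64, ys []float64, epochs int) ([]float64, float64) {
	if len(xs) == 0 || epochs <= 0 {
		return nil, 0
	}
	dim := len(xs[0])
	w := make([]float64, dim)
	sumW := make([]float64, dim)
	var b, sumB float64
	steps := 0
	for e := 0; e < epochs; e++ {
		for i, x := range xs {
			// Update only on a mistake.
			if predict(w, b, x) != ys[i] {
				for j := range w {
					w[j] += ys[i] * x[j]
				}
				b += ys[i]
			}
			// Accumulate the current model into the running sum.
			for j := range w {
				sumW[j] += w[j]
			}
			sumB += b
			steps++
		}
	}
	for j := range sumW {
		sumW[j] /= float64(steps)
	}
	return sumW, sumB / float64(steps)
}

func main() {
	xs := [][]float64{{2, 1}, {1, 2}, {-1, -2}, {-2, -1}}
	ys := []float64{1, 1, -1, -1}
	w, b := trainAveraged(xs, ys, 5)
	for i, x := range xs {
		fmt.Println(predict(w, b, x) == ys[i])
	}
}
```

Averaging the weights over every step, rather than keeping only the final vector, tends to make the result less sensitive to the order of the training examples.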
* Only numeric and categorical ARFF attributes are currently supported.
* Only the dense version of the ARFF format is supported.
* The compressed format is a .tar.gz file, which should allow extensibility.
* Attributes are stored using JSON representations.
* Also offers smarter estimation of the precision of numeric Attributes.
* Also adds support for writing instances to CSV
This patch adds:
* Gini index and information gain ratio as
DecisionTree split options;
* handling for numeric Attributes (split point
chosen naïvely on the basis of maximum entropy);
* A couple of additional utility functions in base/
* A new dataset (see sources.txt) for testing.
Performance on Iris changes markedly without discretisation.
This patch
* Adds a one-vs-all meta classifier into meta/
* Adds a LinearSVC (essentially the same as LogisticRegression
but with different libsvm parameters) to linear_models/
* Adds a MultiLinearSVC into ensemble/ for predicting
CategoricalAttribute classes with the LinearSVC
* Adds a new example dataset based on classifying article headlines.
The example dataset is drawn from WikiNews, and consists of an average,
min and max Word2Vec representation of article headlines from three
categories. The Word2Vec model was computed offline using gensim.
* Refactors KNNClassifier to use them
* csv handling moved back into base due to a circular dependency
* Also adds the datasets used to test CSV handling
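
The one-vs-all scheme mentioned in the meta/ entry above can be sketched as follows. This is an illustration under stated assumptions, not the code from that patch: the `scorer` and `trainer` types are hypothetical, and a toy centroid-based learner stands in for the LinearSVC so that the example is self-contained.

```go
package main

import "fmt"

// scorer returns a confidence that x belongs to the positive class.
type scorer func(x []float64) float64

// trainer is a hypothetical hook that fits a binary model on +1/-1 labels.
type trainer func(xs [][]float64, ys []float64) scorer

// oneVsAll holds one binary scorer per class.
type oneVsAll struct {
	classes []string
	scorers []scorer
}

// fitOneVsAll trains one binary model per class, relabelling the examples of
// that class as +1 and all others as -1.
func fitOneVsAll(xs [][]float64, labels, classes []string, fit trainer) oneVsAll {
	model := oneVsAll{classes: classes}
	for _, c := range classes {
		ys := make([]float64, len(labels))
		for i, l := range labels {
			if l == c {
				ys[i] = 1
			} else {
				ys[i] = -1
			}
		}
		model.scorers = append(model.scorers, fit(xs, ys))
	}
	return model
}

// predict returns the class whose binary model gives the highest score.
func (m oneVsAll) predict(x []float64) string {
	best, bestScore := "", 0.0
	for i, s := range m.scorers {
		if v := s(x); i == 0 || v > bestScore {
			best, bestScore = m.classes[i], v
		}
	}
	return best
}

// centroidTrainer is a toy stand-in learner: the score is the negative
// squared distance to the mean of the positive examples.
func centroidTrainer(xs [][]float64, ys []float64) scorer {
	mean := make([]float64, len(xs[0]))
	n := 0.0
	for i, x := range xs {
		if ys[i] > 0 {
			for j := range x {
				mean[j] += x[j]
			}
			n++
		}
	}
	if n > 0 {
		for j := range mean {
			mean[j] /= n
		}
	}
	return func(x []float64) float64 {
		d := 0.0
		for j := range x {
			d += (x[j] - mean[j]) * (x[j] - mean[j])
		}
		return -d
	}
}

func main() {
	xs := [][]float64{{0, 0}, {0, 1}, {5, 5}, {5, 6}, {10, 0}, {10, 1}}
	labels := []string{"a", "a", "b", "b", "c", "c"}
	m := fitOneVsAll(xs, labels, []string{"a", "b", "c"}, centroidTrainer)
	fmt.Println(m.predict([]float64{5, 5.5}), m.predict([]float64{9, 0}))
}
```

Any binary learner that produces a comparable score can be plugged in through the `trainer` hook.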