This patch adds:
* Gini index and information gain ratio as
  DecisionTree split options
* Handling for numeric Attributes (split point
  chosen naïvely on the basis of maximum entropy)
* A couple of additional utility functions in base/
* A new dataset (see sources.txt) for testing
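
For reviewers less familiar with the two new split criteria, here is a minimal, self-contained sketch of how they are usually computed from class counts. It is not the code in this patch: the function names (`entropy`, `gini`, `gainRatio`) and the count-slice representation are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// entropy returns the Shannon entropy (in bits) of a count distribution.
func entropy(counts []int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		if c > 0 {
			p := float64(c) / float64(total)
			h -= p * math.Log2(p)
		}
	}
	return h
}

// gini returns the Gini impurity 1 - sum(p_i^2) of a count distribution.
func gini(counts []int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	sumSq := 0.0
	for _, c := range counts {
		p := float64(c) / float64(total)
		sumSq += p * p
	}
	return 1 - sumSq
}

// gainRatio returns information gain divided by split information.
// parent holds class counts before the split; children[i] holds the
// class counts that fall into branch i (hypothetical representation).
func gainRatio(parent []int, children [][]int) float64 {
	total := 0
	for _, c := range parent {
		total += c
	}
	if total == 0 {
		return 0
	}
	remainder := 0.0
	sizes := make([]int, len(children))
	for i, child := range children {
		n := 0
		for _, c := range child {
			n += c
		}
		sizes[i] = n
		remainder += float64(n) / float64(total) * entropy(child)
	}
	gain := entropy(parent) - remainder
	splitInfo := entropy(sizes) // entropy of the branch sizes
	if splitInfo == 0 {
		return 0 // every row landed in one branch, so the split is useless
	}
	return gain / splitInfo
}

func main() {
	parent := []int{5, 5}
	children := [][]int{{4, 1}, {1, 4}}
	fmt.Printf("gini(parent)=%.3f gainRatio=%.3f\n", gini(parent), gainRatio(parent, children))
}
```

The gain ratio normalises information gain by the entropy of the branch sizes, which penalises splits that fan out into many small branches.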
Performance on Iris improves markedly without discretisation.
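
As a rough illustration of what a naïve entropy-based split point for a numeric Attribute can look like, the sketch below tries the midpoint between each pair of adjacent sorted values and keeps the one with the highest information gain. This is one plausible reading of the approach, not the patch's actual code: the `sample` type, `bestThreshold`, and the choice of midpoints as candidates are all assumptions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sample is a hypothetical (value, class) pair; class must lie in [0, numClasses).
type sample struct {
	value float64
	class int
}

// entropy returns the Shannon entropy (in bits) of a class-count distribution.
func entropy(counts []int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		if c > 0 {
			p := float64(c) / float64(total)
			h -= p * math.Log2(p)
		}
	}
	return h
}

// bestThreshold scans midpoints between adjacent distinct values and returns
// the one with the highest information gain. Rows with value <= threshold go
// left. ok is false when no split is possible (fewer than two distinct values).
func bestThreshold(samples []sample, numClasses int) (threshold, gain float64, ok bool) {
	if len(samples) < 2 {
		return 0, 0, false
	}
	sorted := make([]sample, len(samples))
	copy(sorted, samples) // leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].value < sorted[j].value })

	left := make([]int, numClasses)
	right := make([]int, numClasses)
	for _, s := range sorted {
		right[s.class]++
	}
	base := entropy(right)
	n := float64(len(sorted))

	for i := 0; i < len(sorted)-1; i++ {
		// Move row i from the right partition to the left one.
		left[sorted[i].class]++
		right[sorted[i].class]--
		if sorted[i].value == sorted[i+1].value {
			continue // no threshold can separate equal values
		}
		nl := float64(i + 1)
		g := base - (nl/n)*entropy(left) - ((n-nl)/n)*entropy(right)
		if !ok || g > gain {
			threshold = (sorted[i].value + sorted[i+1].value) / 2
			gain = g
			ok = true
		}
	}
	return threshold, gain, ok
}

func main() {
	data := []sample{{1.0, 0}, {2.0, 0}, {3.0, 1}, {4.0, 1}}
	if t, g, ok := bestThreshold(data, 2); ok {
		fmt.Printf("threshold=%.2f gain=%.3f\n", t, g)
	}
}
```

Because every candidate threshold is evaluated, the scan costs one sort plus a linear pass per Attribute.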
This patch also:
* Completes removal of the edf/ package
* Corrects an erroneous print statement
* Introduces two new CSV functions:
  * ParseCSVToInstancesTemplated makes sure that
    reading a second CSV file maintains strict Attribute
    compatibility with an existing DenseInstances
  * ParseCSVToInstancesWithAttributeGroups gives more control
    over where Attributes end up in memory, which is important for
    gaining predictable control over the KNN optimisation
* Decouples BinaryAttributeGroup from FixedAttributeGroup for
  better casting support
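
To give a feel for why a templated CSV read is useful, the sketch below checks that a second CSV file has the same columns, in the same order, as the first. It uses only the Go standard library and does not call ParseCSVToInstancesTemplated or reproduce its behaviour: `readHeader`, `checkCompatible`, and the column names are hypothetical, and a header-name comparison is only a loose stand-in for full Attribute compatibility.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// readHeader returns the first record of a CSV stream.
func readHeader(r io.Reader) ([]string, error) {
	rec, err := csv.NewReader(r).Read()
	if err != nil {
		return nil, fmt.Errorf("reading header: %w", err)
	}
	return rec, nil
}

// checkCompatible returns an error unless candidate has the same column
// names, in the same order, as template.
func checkCompatible(template, candidate []string) error {
	if len(template) != len(candidate) {
		return fmt.Errorf("column count mismatch: template has %d, candidate has %d",
			len(template), len(candidate))
	}
	for i := range template {
		want := strings.TrimSpace(template[i])
		got := strings.TrimSpace(candidate[i])
		if want != got {
			return fmt.Errorf("column %d: template has %q, candidate has %q", i, want, got)
		}
	}
	return nil
}

func main() {
	// Hypothetical data: the second file swaps two columns.
	existing := "sepal_length,sepal_width,species\n5.1,3.5,setosa\n"
	second := "sepal_length,species,sepal_width\n4.9,setosa,3.0\n"

	template, err := readHeader(strings.NewReader(existing))
	if err != nil {
		fmt.Println(err)
		return
	}
	candidate, err := readHeader(strings.NewReader(second))
	if err != nil {
		fmt.Println(err)
		return
	}
	if err := checkCompatible(template, candidate); err != nil {
		fmt.Println("incompatible:", err)
		return
	}
	fmt.Println("compatible")
}
```

Failing early on a mismatch like this avoids silently training on one column layout and predicting on another.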