mirror of https://github.com/sjwhitworth/golearn.git synced 2025-04-26 13:49:14 +08:00

7 Commits

Author SHA1 Message Date
Ilya Tocar
676f69a426 trees: speed-up training
Avoid the quadratic loop in getNumericAttributeEntropy.
We don't need to recalculate the whole distribution for each split;
we can just move the changed values. Also use an array of slices instead
of a map of maps of strings to avoid map overhead.

For our case I see training time drop from 100+ hours to 50 minutes.
I've added a benchmark with synthetic data (iris.csv repeated 100 times),
and it also shows a nice improvement:

name               old time/op  new time/op  delta
RandomForestFit-8    117s ± 4%      0s ± 1%  -99.61%  (p=0.001 n=5+10)

The 0s is a rounding quirk of benchstat; the new time is closer to 0.5s:

name               time/op
RandomForestFit-8  460ms ± 1%
2018-05-08 14:59:41 -05:00
Richard Townsend
7ba57fe6df trees: Handling FloatAttributes.
This patch adds:

	* Gini index and information gain ratio as DecisionTree split options;
	* handling for numeric Attributes (split point chosen naïvely on the
	  basis of maximum entropy);
	* a couple of additional utility functions in base/;
	* a new dataset (see sources.txt) for testing.

Performance on Iris improves markedly without discretisation.
2014-10-26 17:40:38 +00:00
Amit Kumar Gupta
21bb2fc9fa Remove redundant import renames 2014-08-22 07:21:24 +00:00
Richard Townsend
c2d040af30 trees: merge from v2-instances 2014-08-03 15:17:13 +01:00
Niclas Jern
627a5537d3 Comments should be of the form "<Struct> ..." or "<MethodName> ..." 2014-07-18 13:48:28 +03:00
Richard Townsend
12ace9def5 Identified source of the low accuracy 2014-05-17 20:37:19 +01:00
Richard Townsend
db3ac3c695 ID3 algorithm working 2014-05-17 17:28:51 +01:00