golearn

mirror of https://github.com/sjwhitworth/golearn.git synced 2025-04-26 13:49:14 +08:00

Author	SHA1	Message	Date
Ilya Tocar	676f69a426	trees: speed-up training Avoid quadratic loop in getNumericAttributeEntropy. We don't need to recalculate whole distribution for each split, just move changed values. Also use array of slices instead of map of maps of strings to avoid map overhead. For our case I see time reductions from 100+ hours to 50 minutes. I've added benchmark with synthetic data (iris.csv repeated 100 times) and it also shows a nice improvement: name old time/op new time/op delta RandomForestFit-8 117s ± 4% 0s ± 1% -99.61% (p=0.001 n=5+10) 0 is a rounding quirk of benchstat, it should be closer to 0.5s: name time/op RandomForestFit-8 460ms ± 1%	2018-05-08 14:59:41 -05:00
Richard Townsend	7ba57fe6df	trees: Handling FloatAttributes. This patch adds: * Gini index and information gain ratio as DecisionTree split options; * handling for numeric Attributes (split point chosen naïvely on the basis of maximum entropy); * A couple of additional utility functions in base/ * A new dataset (see sources.txt) for testing. Performance on Iris improves markedly without discretisation.	2014-10-26 17:40:38 +00:00
Amit Kumar Gupta	21bb2fc9fa	Remove redundant import renames	2014-08-22 07:21:24 +00:00
Richard Townsend	c2d040af30	trees: merge from v2-instances	2014-08-03 15:17:13 +01:00
Niclas Jern	627a5537d3	Comments should be of the form "<Struct> ..." or "<MethodName> ..."	2014-07-18 13:48:28 +03:00
Richard Townsend	12ace9def5	Identified source of the low accuracy	2014-05-17 20:37:19 +01:00
Richard Townsend	db3ac3c695	ID3 algorithm working	2014-05-17 17:28:51 +01:00

7 Commits