golearn

mirror of https://github.com/sjwhitworth/golearn.git synced 2025-04-28 13:48:56 +08:00

Ilya Tocar 676f69a426 trees: speed-up training

Avoid the quadratic loop in getNumericAttributeEntropy.
We don't need to recalculate the whole distribution for each split;
just move the changed values. Also use an array of slices instead of
a map of maps of strings to avoid map overhead.

For our case I see training time drop from 100+ hours to 50 minutes.
I've added a benchmark with synthetic data (iris.csv repeated 100 times),
and it also shows a nice improvement:

name               old time/op  new time/op  delta
RandomForestFit-8    117s ± 4%      0s ± 1%  -99.61%  (p=0.001 n=5+10)

The 0s is a rounding quirk of benchstat; the real value is closer to 0.5s:

name               time/op
RandomForestFit-8  460ms ± 1%

2018-05-08 14:59:41 -05:00

file                last commit                                          date
benchdata.csv       trees: speed-up training                             2018-05-08 14:59:41 -05:00
entropy.go          trees: speed-up training                             2018-05-08 14:59:41 -05:00
gini.go             trees: Handling FloatAttributes.                     2014-10-26 17:40:38 +00:00
gr.go               trees: Handling FloatAttributes.                     2014-10-26 17:40:38 +00:00
id3.go              trees: Try to fix premature write-after-Close issue  2018-01-28 16:35:55 +00:00
random.go           Fix bad import, reformat                             2017-09-10 20:35:34 +01:00
tree_bench_test.go  trees: speed-up training                             2018-05-08 14:59:41 -05:00
tree_test.go        Fixing all tests                                     2018-01-28 16:22:33 +00:00
trees.go            neural: stop-gap support for neural networks         2014-08-09 19:27:20 +01:00