golearn

OrgLion-ML/golearn


Mirror of https://github.com/sjwhitworth/golearn.git, synced 2025-04-26 13:49:14 +08:00

Commit Graph


Author: Ilya Tocar
SHA1: 676f69a426

trees: speed up training

Avoid the quadratic loop in getNumericAttributeEntropy.
We don't need to recalculate the whole distribution for each split;
just move the values that changed. Also use an array of slices
instead of a map of maps of strings to avoid map overhead.

In our use case, training time dropped from 100+ hours to 50 minutes.
I've added a benchmark with synthetic data (iris.csv repeated 100 times),
and it also shows a nice improvement:

name               old time/op  new time/op  delta
RandomForestFit-8    117s ± 4%      0s ± 1%  -99.61%  (p=0.001 n=5+10)

The 0s is a rounding quirk of benchstat; the real value is closer to 0.5s:

name               time/op
RandomForestFit-8  460ms ± 1%
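As a quick consistency check, the new time can be recovered from benchstat's old time and percentage delta as new = old × (1 + delta/100). The sketch below applies that formula to the 117s and -99.61% figures in the first benchstat table; `newTime` is a hypothetical helper for illustration. The result (about 0.456s) agrees with the separately measured 460ms, within the rounding of the two-decimal delta.

```go
package main

import "fmt"

// newTime (hypothetical helper) recovers the new per-op time from the old
// time and benchstat's percentage delta: new = old * (1 + delta/100).
func newTime(oldSeconds, deltaPercent float64) float64 {
	return oldSeconds * (1 + deltaPercent/100)
}

func main() {
	// 117s old time, -99.61% delta, as reported by benchstat.
	fmt.Printf("%.3fs\n", newTime(117, -99.61)) // prints "0.456s"
}
```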

Date: 2018-05-08 14:59:41 -05:00
