mirror of https://github.com/sjwhitworth/golearn.git synced 2025-04-26 13:49:14 +08:00

52 Commits

Author SHA1 Message Date
ss8651twtw
1e1b5f11fb Format code 2018-06-16 22:14:18 +08:00
yenck
bf907556f5 testcase 2018-06-16 22:11:59 +08:00
yenck
80bc1ac6f8 some test for C0 2018-06-16 22:11:59 +08:00
yenck
30071eb8a4 some test for C9 2018-06-16 22:11:59 +08:00
Ilya Tocar
676f69a426 trees: speed-up training
Avoid quadratic loop in getNumericAttributeEntropy.
We don't need to recalculate whole distribution for each split,
just move changed values. Also use array of slices instead of
map of maps of strings to avoid map overhead.

For our use case I see training time drop from 100+ hours to 50 minutes.
I've added a benchmark with synthetic data (iris.csv repeated 100 times),
and it also shows a nice improvement:

name               old time/op  new time/op  delta
RandomForestFit-8    117s ± 4%      0s ± 1%  -99.61%  (p=0.001 n=5+10)

The 0s is a rounding quirk of benchstat; the actual time is closer to 0.5s:

name               time/op
RandomForestFit-8  460ms ± 1%
2018-05-08 14:59:41 -05:00
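The commit above replaces a full recount of the class distribution at every candidate split with an incremental update. Below is a minimal sketch of that idea under stated assumptions. It is not golearn's getNumericAttributeEntropy, and the names bestNumericSplit and entropy are hypothetical. The sketch assumes class labels are already integer indices in [0, numClasses). It sorts the instances by attribute value once. It then sweeps the boundary left to right, moving a single instance from the right partition's counts to the left partition's counts at each step, and keeps the counts in slices indexed by class rather than in maps.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// entropy returns the Shannon entropy (in bits) of a class-count vector
// whose counts sum to total.
func entropy(counts []int, total int) float64 {
	if total == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

// bestNumericSplit (hypothetical helper) returns the threshold minimising the
// weighted entropy of the two partitions, together with that entropy.
// values[i] belongs to class classes[i], with classes in [0, numClasses).
func bestNumericSplit(values []float64, classes []int, numClasses int) (float64, float64) {
	n := len(values)
	idx := make([]int, n)
	for i := range idx {
		idx[i] = i
	}
	// Sort once: O(n log n).
	sort.Slice(idx, func(a, b int) bool { return values[idx[a]] < values[idx[b]] })

	// Per-class counts live in slices indexed by class, not in maps.
	left := make([]int, numClasses)
	right := make([]int, numClasses)
	for _, c := range classes {
		right[c]++
	}

	bestThreshold, bestEntropy := math.NaN(), math.Inf(1)
	for i := 0; i < n-1; i++ {
		// Move one instance from the right partition to the left one:
		// an O(1) update instead of recounting both sides per split.
		c := classes[idx[i]]
		left[c]++
		right[c]--

		v, next := values[idx[i]], values[idx[i+1]]
		if v == next {
			continue // no boundary can fall between equal values
		}
		nl, nr := i+1, n-i-1
		h := (float64(nl)*entropy(left, nl) + float64(nr)*entropy(right, nr)) / float64(n)
		if h < bestEntropy {
			bestEntropy = h
			bestThreshold = (v + next) / 2
		}
	}
	return bestThreshold, bestEntropy
}

func main() {
	values := []float64{1.0, 1.5, 2.0, 5.0, 5.5, 6.0}
	classes := []int{0, 0, 0, 1, 1, 1}
	t, h := bestNumericSplit(values, classes, 2)
	fmt.Printf("threshold=%.2f entropy=%.3f\n", t, h) // prints threshold=3.50 entropy=0.000
}
```

With k classes, the sweep costs O(n·k) after the sort. A per-split recount would cost O(n²) overall.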
Richard Townsend
58ae6f4d1b trees: Try to fix premature write-after-Close issue 2018-01-28 16:35:55 +00:00
Richard Townsend
e2279995c1 Fixing all tests 2018-01-28 16:22:33 +00:00
Richard Townsend
ce78cd0406 Passes the tests 2018-01-27 18:56:01 +00:00
Richard Townsend
f722f2e59d trees: implement serialization 2018-01-27 18:00:52 +00:00
Richard Townsend
e7fee0a2d1 Reformat, fix tests 2017-09-10 21:10:54 +01:00
Richard Townsend
fc110aab48 Fix bad import, reformat 2017-09-10 20:35:34 +01:00
Richard Townsend
aee475ca14 Fix the trees tests 2017-09-10 20:13:41 +01:00
Richard Townsend
e27215052b ensemble: tests pass 2017-09-10 19:30:02 +01:00
Richard Townsend
768d2cd19f meta: tests are almost passing 2017-09-10 16:59:05 +01:00
Richard Townsend
57e6054404 base: fix unmarshalling attributes, add JSON 2017-08-26 14:56:31 +01:00
Richard Townsend
e68361c162 Genericize for ensemble use 2017-08-08 12:37:57 +01:00
Richard Townsend
a90ef09781 Remove excessive logging 2017-08-08 12:29:00 +01:00
Richard Townsend
d23619eac2 OK, but with a lot of extra printing 2017-08-07 17:26:11 +01:00
meirwahnon
674de9cae3 change Probability order 2017-07-17 16:01:49 +03:00
meirwahnon
518c0d84c4 extren fields of ClassProba 2017-07-17 15:35:35 +03:00
meirwahnon
2b478a0513 fix to float precise 2017-07-17 15:01:08 +03:00
meirwahnon
f56fce1a43 support PredictProba 2017-07-17 14:48:38 +03:00
Ryan Schmukler
cf6192c81c fix(id3): fix panic on SplitAttribute being nil 2016-06-28 14:36:48 -04:00
Richard Townsend
7ba57fe6df trees: Handling FloatAttributes.
This patch adds:

	* Gini index and information gain ratio as
           DecisionTree split options;
	* handling for numeric Attributes (split point
           chosen naïvely on the basis of maximum entropy);
	* A couple of additional utility functions in base/
	* A new dataset (see sources.txt) for testing.

Performance on Iris is markedly better without discretisation.
2014-10-26 17:40:38 +00:00
Amit Kumar Gupta
4d93b9de89 Convert remaining tests to goconvey 2014-08-23 05:22:16 +00:00
Amit Kumar Gupta
1809a8b358 RandomForest returns error when fitting data with fewer features than the RandomForest plans to use
- BaseClassifier Predict and Fit methods return errors
- go fmt ./...

Conflicts:
	ensemble/randomforest.go
	ensemble/randomforest_test.go
	trees/tree_test.go
2014-08-22 13:39:29 +00:00
Amit Kumar Gupta
529b3bcaa5 Avoid renaming packages on import 2014-08-22 13:39:29 +00:00
Amit Kumar Gupta
947ee8380e Return error instead of panicking when unable to get confusion matrix 2014-08-22 13:39:29 +00:00
Amit Kumar Gupta
14aad31821 Consistently use (t *testing.T) instead of T or testEnv 2014-08-22 08:44:41 +00:00
Amit Kumar Gupta
695aec6eb6 Favor idiomatic t.Fatalf over panic for test failures 2014-08-22 08:07:55 +00:00
Amit Kumar Gupta
45545d6ebd Remove Println's from automated test suite since they aren't assertions 2014-08-22 07:58:01 +00:00
Amit Kumar Gupta
21bb2fc9fa Remove redundant import renames 2014-08-22 07:21:24 +00:00
Richard Townsend
f9c1e24e5b neural: stop-gap support for neural networks 2014-08-09 19:27:20 +01:00
Richard Townsend
47341b2869 base: Cleaned up duplicate Attribute resolution functions 2014-08-03 15:17:20 +01:00
Richard Townsend
c2d040af30 trees: merge from v2-instances 2014-08-03 15:17:13 +01:00
albrow
132e3f4527 Create a new default logger and change some print statements to use the logger instead of fmt.Println. 2014-07-20 15:26:13 -04:00
Niclas Jern
627a5537d3 Comments should be of the form "<Struct> ..." or "<MethodName> ..." 2014-07-18 13:48:28 +03:00
Niclas Jern
32f36f28c3 if block ends with a return statement -> drop this else and outdent its block 2014-07-18 13:20:46 +03:00
Remo Hertig
f77c1dcde0 use multiple return values instead of an array in InstancesTrainTestSplit 2014-06-06 21:33:17 +02:00
Richard Townsend
a6072ac9de Package documentation 2014-05-19 12:59:11 +01:00
Richard Townsend
889fec4419 Examples for RandomForest, ID3 and Random trees 2014-05-19 12:42:03 +01:00
Richard Townsend
45ca6063f1 Not sure if this bagging version is better or not
More similar to "Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets" (Bryll et al.)
2014-05-18 11:49:35 +01:00
Richard Townsend
26660e1470 Corrected a problem with pruning, actual ID3 decision tree type
Going to modify Bagging to select attributes on its own
2014-05-17 21:45:26 +01:00
Richard Townsend
12ace9def5 Identified source of the low accuracy 2014-05-17 20:37:19 +01:00
Richard Townsend
13c0dc3eba Reduced-error pruning 2014-05-17 18:06:01 +01:00
Richard Townsend
c516907b13 Passes all the tests 2014-05-17 17:35:10 +01:00
Richard Townsend
db3ac3c695 ID3 algorithm working 2014-05-17 17:28:51 +01:00
Richard Townsend
cf165695c8 ChiMerge seems to improve accuracy 2014-05-17 16:20:56 +01:00
Richard Townsend
fdb67a4355 Initial work on decision trees
Random Forest has occasional disastrous accuracy:
	 never seen that happen in WEKA
2014-05-14 14:00:22 +01:00
Stephen Whitworth
1ade0afca6 Refactored KNN to implement the estimator interface 2014-05-05 22:41:55 +01:00