ss8651twtw
1e1b5f11fb
Format code
2018-06-16 22:14:18 +08:00
yenck
bf907556f5
testcase
2018-06-16 22:11:59 +08:00
yenck
80bc1ac6f8
some test for C0
2018-06-16 22:11:59 +08:00
yenck
30071eb8a4
some test for C9
2018-06-16 22:11:59 +08:00
Ilya Tocar
676f69a426
trees: speed-up training
...
Avoid the quadratic loop in getNumericAttributeEntropy.
We don't need to recalculate the whole distribution for each split,
just move the changed values. Also use an array of slices instead of
a map of maps of strings to avoid map overhead.
For our case I see training time drop from 100+ hours to 50 minutes.
I've added a benchmark with synthetic data (iris.csv repeated 100 times)
and it also shows a nice improvement:
name               old time/op  new time/op  delta
RandomForestFit-8  117s ± 4%    0s ± 1%      -99.61%  (p=0.001 n=5+10)
The 0s is a rounding quirk of benchstat; it should be closer to 0.5s:
name               time/op
RandomForestFit-8  460ms ± 1%
2018-05-08 14:59:41 -05:00
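The idea described in the commit above (move only the changed values instead of recomputing the whole class distribution for every candidate split) can be sketched roughly as follows. This is an illustrative sketch, not the code from the commit: the `sample` type and the `entropy` and `bestSplit` functions are hypothetical names, and class labels are assumed to be already encoded as small integers.

```go
package main

import (
	"math"
	"sort"
)

// sample is a hypothetical (attribute value, class index) pair.
// It is not a golearn type.
type sample struct {
	value float64
	class int
}

// entropy returns the Shannon entropy, in bits, of a slice of class counts
// whose sum is total.
func entropy(counts []int, total int) float64 {
	if total == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

// bestSplit returns the threshold that minimises the weighted entropy of the
// two sides of a numeric split, together with that entropy. If every value is
// identical there is no valid split and the entropy is +Inf.
func bestSplit(data []sample, numClasses int) (threshold, best float64) {
	// Sort a copy once, so each candidate split differs from the previous
	// one by exactly one sample.
	sorted := append([]sample(nil), data...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].value < sorted[j].value })

	// Start with everything on the right-hand side.
	left := make([]int, numClasses)
	right := make([]int, numClasses)
	for _, s := range sorted {
		right[s.class]++
	}

	n := len(sorted)
	best = math.Inf(1)
	for i := 0; i < n-1; i++ {
		// Move one sample from right to left instead of recounting.
		c := sorted[i].class
		left[c]++
		right[c]--

		// Equal neighbouring values cannot be separated by a threshold.
		if sorted[i].value == sorted[i+1].value {
			continue
		}
		nl, nr := i+1, n-i-1
		h := (float64(nl)*entropy(left, nl) + float64(nr)*entropy(right, nr)) / float64(n)
		if h < best {
			best = h
			threshold = (sorted[i].value + sorted[i+1].value) / 2
		}
	}
	return threshold, best
}
```

Each candidate split costs time proportional to the number of classes rather than the number of rows, so the scan is dominated by the initial sort. Using plain slices indexed by class also avoids map lookups, in the spirit of the "array of slices instead of map of maps" remark.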
Richard Townsend
58ae6f4d1b
trees: Try to fix premature write-after-Close issue
2018-01-28 16:35:55 +00:00
Richard Townsend
e2279995c1
Fixing all tests
2018-01-28 16:22:33 +00:00
Richard Townsend
ce78cd0406
Passes the tests
2018-01-27 18:56:01 +00:00
Richard Townsend
f722f2e59d
trees: implement serialization
2018-01-27 18:00:52 +00:00
Richard Townsend
e7fee0a2d1
Reformat, fix tests
2017-09-10 21:10:54 +01:00
Richard Townsend
fc110aab48
Fix bad import, reformat
2017-09-10 20:35:34 +01:00
Richard Townsend
aee475ca14
Fix the trees tests
2017-09-10 20:13:41 +01:00
Richard Townsend
e27215052b
ensemble: tests pass
2017-09-10 19:30:02 +01:00
Richard Townsend
768d2cd19f
meta: tests are almost passing
2017-09-10 16:59:05 +01:00
Richard Townsend
57e6054404
base: fix unmarshalling attributes, add JSON
2017-08-26 14:56:31 +01:00
Richard Townsend
e68361c162
Genericize for ensemble use
2017-08-08 12:37:57 +01:00
Richard Townsend
a90ef09781
Remove excessive logging
2017-08-08 12:29:00 +01:00
Richard Townsend
d23619eac2
OK, but with a lot of extra printing
2017-08-07 17:26:11 +01:00
meirwahnon
674de9cae3
change Probability order
2017-07-17 16:01:49 +03:00
meirwahnon
518c0d84c4
export fields of ClassProba
2017-07-17 15:35:35 +03:00
meirwahnon
2b478a0513
fix float precision
2017-07-17 15:01:08 +03:00
meirwahnon
f56fce1a43
support PredictProba
2017-07-17 14:48:38 +03:00
Ryan Schmukler
cf6192c81c
fix(id3): fix panic on SplitAttribute being nil
2016-06-28 14:36:48 -04:00
Richard Townsend
7ba57fe6df
trees: Handling FloatAttributes.
...
This patch adds:
* Gini index and information gain ratio as
DecisionTree split options;
* handling for numeric Attributes (split point
chosen naïvely on the basis of maximum entropy);
* A couple of additional utility functions in base/
* A new dataset (see sources.txt) for testing.
Performance on Iris differs markedly without discretisation.
2014-10-26 17:40:38 +00:00
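For reference, the two split criteria named in the commit above can be written down as a short sketch. It assumes the usual textbook definitions (Gini impurity as one minus the sum of squared class proportions, and gain ratio as information gain divided by split information); the function names `gini`, `entropyBits` and `gainRatio` are hypothetical and do not reflect the package's actual API.

```go
package main

import "math"

// sum adds up a slice of class counts.
func sum(counts []int) int {
	total := 0
	for _, c := range counts {
		total += c
	}
	return total
}

// gini returns the Gini impurity 1 - sum(p_i^2) of a class-count slice.
func gini(counts []int) float64 {
	total := sum(counts)
	if total == 0 {
		return 0
	}
	sumSq := 0.0
	for _, c := range counts {
		p := float64(c) / float64(total)
		sumSq += p * p
	}
	return 1 - sumSq
}

// entropyBits returns the Shannon entropy, in bits, of a class-count slice.
func entropyBits(counts []int) float64 {
	total := sum(counts)
	if total == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

// gainRatio returns information gain divided by split information for a
// split of parent into the given children (one class-count slice per branch).
// It returns 0 when the split information is 0, i.e. all rows fall into a
// single branch.
func gainRatio(parent []int, children [][]int) float64 {
	n := sum(parent)
	if n == 0 {
		return 0
	}
	gain := entropyBits(parent)
	splitInfo := 0.0
	for _, child := range children {
		nc := sum(child)
		if nc == 0 {
			continue
		}
		w := float64(nc) / float64(n)
		gain -= w * entropyBits(child)
		splitInfo -= w * math.Log2(w)
	}
	if splitInfo == 0 {
		return 0
	}
	return gain / splitInfo
}
```

A perfectly balanced two-class node has a Gini impurity of 0.5, and a pure node has 0. The gain ratio penalises splits that create many small branches, which plain information gain tends to favour.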
Amit Kumar Gupta
4d93b9de89
Convert remaining tests to goconvey
2014-08-23 05:22:16 +00:00
Amit Kumar Gupta
1809a8b358
RandomForest returns error when fitting data with fewer features than the RandomForest plans to use
...
- BaseClassifier Predict and Fit methods return errors
- go fmt ./...
Conflicts:
ensemble/randomforest.go
ensemble/randomforest_test.go
trees/tree_test.go
2014-08-22 13:39:29 +00:00
Amit Kumar Gupta
529b3bcaa5
Avoid renaming packages on import
2014-08-22 13:39:29 +00:00
Amit Kumar Gupta
947ee8380e
Return error instead of panicking when unable to get confusion matrix
2014-08-22 13:39:29 +00:00
Amit Kumar Gupta
14aad31821
Consistently use (t *testing.T) instead of T or testEnv
2014-08-22 08:44:41 +00:00
Amit Kumar Gupta
695aec6eb6
Favor idiomatic t.Fatalf over panic for test failures
2014-08-22 08:07:55 +00:00
Amit Kumar Gupta
45545d6ebd
Remove Println's from automated test suite since they aren't assertions
2014-08-22 07:58:01 +00:00
Amit Kumar Gupta
21bb2fc9fa
Remove redundant import renames
2014-08-22 07:21:24 +00:00
Richard Townsend
f9c1e24e5b
neural: stop-gap support for neural networks
2014-08-09 19:27:20 +01:00
Richard Townsend
47341b2869
base: Cleaned up duplicate Attribute resolution functions
2014-08-03 15:17:20 +01:00
Richard Townsend
c2d040af30
trees: merge from v2-instances
2014-08-03 15:17:13 +01:00
albrow
132e3f4527
Create a new default logger and change some print statements to use the logger instead of fmt.Println.
2014-07-20 15:26:13 -04:00
Niclas Jern
627a5537d3
Comments should be of the form "<Struct> ..." or "<MethodName> ..."
2014-07-18 13:48:28 +03:00
Niclas Jern
32f36f28c3
if block ends with a return statement -> drop this else and outdent its block
2014-07-18 13:20:46 +03:00
Remo Hertig
f77c1dcde0
use multiple return values instead of an array in InstancesTrainTestSplit
2014-06-06 21:33:17 +02:00
Richard Townsend
a6072ac9de
Package documentation
2014-05-19 12:59:11 +01:00
Richard Townsend
889fec4419
Examples for RandomForest, ID3 and Random trees
2014-05-19 12:42:03 +01:00
Richard Townsend
45ca6063f1
Not sure if this bagging version is better or not
...
More similar to "Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets" (Bryll)
2014-05-18 11:49:35 +01:00
Richard Townsend
26660e1470
Corrected a problem with pruning, actual ID3 decision tree type
...
Going to modify Bagging to select attributes on its own
2014-05-17 21:45:26 +01:00
Richard Townsend
12ace9def5
Identified source of the low accuracy
2014-05-17 20:37:19 +01:00
Richard Townsend
13c0dc3eba
Reduced-error pruning
2014-05-17 18:06:01 +01:00
Richard Townsend
c516907b13
Passes all the tests
2014-05-17 17:35:10 +01:00
Richard Townsend
db3ac3c695
ID3 algorithm working
2014-05-17 17:28:51 +01:00
Richard Townsend
cf165695c8
ChiMerge seems to improve accuracy
2014-05-17 16:20:56 +01:00
Richard Townsend
fdb67a4355
Initial work on decision trees
...
Random Forest has occasional disastrous accuracy:
never seen that happen in WEKA
2014-05-14 14:00:22 +01:00
Stephen Whitworth
1ade0afca6
Refactored KNN to implement the estimator interface
2014-05-05 22:41:55 +01:00