# Each command in the session below is preceded by a hashed comment.
# For more detailed descriptions of individual commands, see "*readme" files.
# The user inputs are preceded by a $ sign. Immediately following the user
# inputs are the system outputs.
----------------------------------------------
# To compile everything, type
$ make mktree
$ make gendata
$ make display
----------------------------------------------
# The following is the simplest use of mktree.
# It induces a decision tree for the dataset linear.train, and stores it.
# The decision trees induced by mktree are always written to files.
# When cross validation is used, only the tree for the first fold is
# written to a file. The default file names are <training data>.dt
# for the pruned file and <training data>.dt.unpruned
# for the unpruned file. If pruning is turned off (-p0) or if it is not
# possible to prune, the unpruned tree is written to <training data>.dt.
# Note that every run of MKTREE produces a log file (default name = oc1.log)
# in which all the parameter settings used for that run are written.
$ mktree -tlinear.train
Unpruned decision tree written to linear.train.dt.
acc. on training set = 100.00 #leaves = 2 max depth = 1
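# (Not part of the run above: the settings recorded in oc1.log can be
#  inspected afterwards with an ordinary command such as "cat oc1.log".)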
----------------------------------------------
# The next command chooses 48 points randomly out of the file
# linear.train, induces a decision tree for them, and computes the
# classification accuracy of that tree on the rest of the points
# in the file. Since the portion of data reserved for pruning is set to
# zero (-p0), no pruning is done. The output is in "very verbose" mode,
# as the -v option is specified
# more than once. The decision trees, as usual, are written to files.
$ mktree -tlinear.train -n48 -p0 -v -v
48 training examples loaded from linear.train.
32 testing examples loaded from linear.train.
Attributes = 2, Classes = 2
Restart 1: Initial Impurity = 6.484
hill climbing for coeff. 1. impurity 6.484 -> 6.484
hill climbing for coeff. 2. impurity 6.484 -> 6.484
hill climbing for coeff. 1. impurity 6.484 -> 6.484
hill climbing for coeff. 2. impurity 6.484 -> 6.484
hill climbing for coeff. 1. impurity 6.484 -> 6.484
hill climbing for coeff. 2. impurity 6.484 -> 6.484
hill climbing for coeff. 1. impurity 6.484 -> 6.484
hill climbing for coeff. 2. impurity 6.484 -> 6.484
hill climbing for coeff. 1. impurity 6.484 -> 6.484
hill climbing for coeff. 2. impurity 6.484 -> 6.484
hill climbing for coeff. 1. impurity 6.484 -> 6.484
hill climbing for coeff. 2. impurity 6.484 -> 6.261
hill climbing for coeff. 1. impurity 6.261 -> 5.359
hill climbing for coeff. 2. impurity 5.359 -> 5.299
hill climbing for coeff. 1. impurity 5.299 -> 4.485
hill climbing for coeff. 2. impurity 4.485 -> 4.480
hill climbing for coeff. 1. impurity 4.480 -> 4.480
hill climbing for coeff. 2. impurity 4.480 -> 4.480
hill climbing for coeff. 1. impurity 4.480 -> 4.480
hill climbing for coeff. 2. impurity 4.480 -> 4.480
hill climbing for coeff. 1. impurity 4.480 -> 4.480
hill climbing for coeff. 2. impurity 4.480 -> 4.480
hill climbing for coeff. 1. impurity 4.480 -> 4.480
hill climbing for coeff. 2. impurity 4.480 -> 4.480
hill climbing for coeff. 1. impurity 4.480 -> 4.480
hill climbing for coeff. 2. impurity 4.480 -> 4.480
hill climbing for coeff. 1. impurity 4.480 -> 4.480
random jump. impurity 4.480 -> 0.000
** Root: Left:[20,0] Right:[0,28]
Unpruned decision tree written to linear.train.dt.
accuracy = 87.50 #leaves = 2.00 max depth = 1.00
Category 1 : accuracy = 88.89 (16/18)
Category 2 : accuracy = 85.71 (12/14)
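# (The overall figure above follows from the per-category counts:
#  16 + 12 = 28 correct out of the 32 test instances = 87.50%.)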
----------------------------------------------
# The next task is to induce a decision tree for the data
# in one file (linear.train), and estimate the classification
# accuracy of that tree on the data in a second file (linear.test).
# Both files should have the same format - one instance per line,
# attributes and class label listed in the same order for all instances,
# attributes separated by blanks, tabs or commas, and missing values
# denoted with ?.
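# For illustration only (these values are made up, not taken from the
# linear.* files), an instance with two attributes and class label 1
# could appear on a line as
#     0.25 0.73 1
# or, comma-separated with the second attribute missing,
#     0.25,?,1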
# The output is in "verbose" mode due to one use of the -v option.
$ mktree -tlinear.train -Tlinear.test -v
80 training examples loaded from linear.train.
Attributes = 2, Classes = 2
8 randomly chosen instances kept away for pruning.
** Root: Left:[33,0] Right:[0,39]
No pruning possible.
Unpruned decision tree written to linear.train.dt.
20 testing examples loaded from linear.test.
accuracy = 95.00 #leaves = 2.00 max depth = 1.00
Category 1 : accuracy = 100.00 (12/12)
Category 2 : accuracy = 87.50 (7/8)
# The line
** Root: Left:[33,0] Right:[0,39]
# means that a hyperplane is induced at the root of the decision tree,
# and it has 33 instances of category 1 on the left,
# 0 instances of category 2 on the left,
# 0 instances of category 1 on the right and
# 39 instances of category 2 on the right.
# The "labels" of the decision tree nodes have
# the form "Root","l","r","lr" etc. "lr" is the label of the node which
# is the *r*ight child of the *l*eft child of the root. See sample.dt
# for a sample decision tree.
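# As an illustration of this labeling (a sketch, not the contents of
# sample.dt), the nodes of a depth-two tree are labeled
#
#                  Root
#                 /    \
#                l      r
#               / \    / \
#             ll   lr rl   rr
#
# i.e., descending to a left child appends "l" to the label and
# descending to a right child appends "r".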
# A simple variant of the above command occurs when the data in the test
# file is unlabeled. In this case, we want to classify that data using the
# tree we built on linear.train.
$ mktree -tlinear.train -Tlinear.unl -u
Unpruned decision tree written to linear.train.dt.
Test instances with labels written to linear.unl.classified.
----------------------------------------------
# An obvious reason for writing the decision trees to files
# in every run is that we want to use the trees later - for classification,
# for graphical displays or for generating data.
# Later on in this file, we will see ways in which we can 1) graphically
# display decision trees and 2) generate data given a decision tree.
# The following command illustrates how to estimate the accuracy of
# a decision tree (in the file linear.train.dt) on some data
# (in the file linear.test).
$ mktree -Dlinear.train.dt -Tlinear.test
accuracy = 95.00 #leaves = 2.00 max depth = 1.00
----------------------------------------------
# There are four different modes in which MKTREE can be used to
# induce a decision tree. See mktree.readme for details. The following
# is an example of using mktree in axis parallel mode. Note that
# the axis parallel tree is inappropriate for this data, as the correct
# concept description is known to be a single oblique hyperplane (as
# was found in the first example above).
$ mktree -a -tlinear.train
Pruned decision tree written to linear.train.dt.
Unpruned decision tree written to linear.train.dt.unpruned.
acc. on training set = 97.50 #leaves = 4 max depth = 2
# Another mode in which mktree can be run is the
# CART linear-combinations mode (-K option).
$ mktree -K -tlinear.train -v -v
80 training examples loaded from linear.train.
Attributes = 2, Classes = 2
8 randomly chosen instances kept away for pruning.
CART hill climbing for coeff. 1. impurity 7.111 -> 7.111
CART hill climbing for coeff. 2. impurity 7.111 -> 5.950
CART hill climbing for coeff. 3. impurity 5.950 -> 4.508
CART hill climbing for coeff. 1. impurity 4.508 -> 0.000
** Root: Left:[33,0] Right:[0,39]
No pruning possible.
Unpruned decision tree written to linear.train.dt.
acc. on training set = 100.00 #leaves = 2 max depth = 1
Category 1 : accuracy = 100.00 (38/38)
Category 2 : accuracy = 100.00 (42/42)
----------------------------------------------
# Each run of MKTREE can be customized in several ways by using the
# command line options (and the #define'd constants in oc1.h).
# The following illustrates a complicated use of mktree.
# The aim is to do a 5-fold cross validation (-V5) on linear.dta,
# restarting from at most 30 different random hyperplanes
# at each node (-r30), trying at most 25 random perturbations at
# each local minimum (-m25). 20% of the training data is reserved for
# pruning (-p0.2). The initial random seed is 200 (-s200). The order in
# which the coefficients are perturbed during hill climbing is set to
# Best-First (-b). The decision tree generated
# is written to linear.howtonameit.dt.
$ mktree -V5 -tlinear.dta -r30 -m25 -p0.2 -s200 -b -Dlinear.howtonameit.dt
Unpruned decision tree written to linear.howtonameit.dt.
fold 1: acc. = 100.00 #leaves = 2 max. depth = 1
fold 2: acc. = 95.00 #leaves = 2 max. depth = 1
fold 3: acc. = 85.00 #leaves = 2 max. depth = 1
fold 4: acc. = 100.00 #leaves = 2 max. depth = 1
fold 5: acc. = 95.00 #leaves = 2 max. depth = 1
accuracy = 95.00 #leaves = 2.00 max depth = 1.00
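# (The overall accuracy reported is consistent with the average of the
#  five fold accuracies: (100 + 95 + 85 + 100 + 95) / 5 = 95.00.)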
----------------------------------------------
# GENDATA is the second module in OC1. The main purpose of this module
# is to generate data given a decision tree.
# In the following example, gendata generates 200 random, uniformly
# distributed instances in the unit square, reads in the decision tree
# from linear.howtonameit.dt, labels the 200 points according to the
# decision tree, and writes the labeled points into the file linear.test2.
# The output is in verbose mode. (There is no "very verbose" mode
# for display and gendata.)
$ gendata -v -Dlinear.howtonameit.dt -n200 -olinear.test2
Decision tree read from linear.howtonameit.dt.
Number of attributes = 2, Number of classes = 2
200 instances generated.
Category 1 : 109 points
Category 2 : 91 points
Instances written to linear.test2.
# GENDATA can also generate unlabeled data (-u), or data with
# random class labels (when no decision tree is given).
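# (Illustrative only, not run in this session: a command like
#      gendata -Dlinear.train.dt -n100 -u -olinear.unl2
#  would write 100 unlabeled instances to linear.unl2, and leaving out
#  the -D option altogether would assign the class labels at random.)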
----------------------------------------------
# We will now induce an axis parallel tree on the data generated
# above, and use the DISPLAY program to display that tree along
# with the oblique concept underlying the data.
# First, build an axis parallel decision tree on the set linear.test2.
# Output the decision tree to linear.ap.dt.
$ mktree -tlinear.test2 -a -Dlinear.ap.dt
Pruned decision tree written to linear.ap.dt.
Unpruned decision tree written to linear.ap.dt.unpruned.
acc. on training set = 88.50 #leaves = 3 max depth = 2
# Now display the axis parallel tree induced above, along with the
# data as the PostScript file linear.ap.ps.
$ display -tlinear.test2 -Dlinear.ap.dt -o linear.ap.ps
# Separately display the underlying concept for the data
# (linear.howtonameit.dt), along with the data instances, as the
# PostScript file linear.o.ps.
$ display -tlinear.test2 -Dlinear.howtonameit.dt -o linear.o.ps
# Now the PostScript files can be viewed, using any standard
# PostScript viewer, to contrast the relative appropriateness of
# the axis parallel and oblique trees for this data.
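# (For example, "gv linear.ap.ps" and "gv linear.o.ps", if the gv
#  viewer is installed.)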
----------------------------------------------
# This final example illustrates the animation (-A) option for mktree.
# See mktree.readme for details.
# When a file name is specified with the -A option, mktree dumps all
# the hyperplanes considered into that file. DISPLAY can read from
# such a dump, to show-erase-and-show consecutive hyperplane locations
# as an animation. I found this facility useful to understand what goes on
# during tree induction. Hope you do too!
$ mktree -Alinear.anim -tlinear.dta -r1 -v -p0 -o
100 training examples loaded from linear.dta.
Attributes = 2, Classes = 2
All hyperplane perturbations being written to linear.anim.
Use the Display() program for animation.
Restart 1: Initial Impurity = 16.063
hill climbing for coeff. 1. impurity 16.063 -> 15.048
hill climbing for coeff. 2. impurity 15.048 -> 9.375
hill climbing for coeff. 1. impurity 9.375 -> 6.811
hill climbing for coeff. 2. impurity 6.811 -> 6.210
hill climbing for coeff. 1. impurity 6.210 -> 6.160
hill climbing for coeff. 2. impurity 6.160 -> 6.160
hill climbing for coeff. 1. impurity 6.160 -> 6.160
hill climbing for coeff. 2. impurity 6.160 -> 6.160
hill climbing for coeff. 1. impurity 6.160 -> 6.160
hill climbing for coeff. 2. impurity 6.160 -> 5.889
hill climbing for coeff. 1. impurity 5.889 -> 5.889
hill climbing for coeff. 2. impurity 5.889 -> 5.889
hill climbing for coeff. 1. impurity 5.889 -> 5.889
hill climbing for coeff. 2. impurity 5.889 -> 5.889
hill climbing for coeff. 1. impurity 5.889 -> 5.889
random jump. impurity 5.889 -> 4.333
hill climbing for coeff. 1. impurity 4.333 -> 4.163
hill climbing for coeff. 2. impurity 4.163 -> 4.163
hill climbing for coeff. 1. impurity 4.163 -> 4.163
hill climbing for coeff. 2. impurity 4.163 -> 4.163
random jump. impurity 4.163 -> 0.000
** Root: Left:[50,0] Right:[0,50]
Unpruned decision tree written to linear.dta.dt.
acc. on training set = 100.00 #leaves = 2 max depth = 1
Category 1 : accuracy = 100.00 (50/50)
Category 2 : accuracy = 100.00 (50/50)
# Now generate a PostScript file from the animation dump generated above.
$ display -tlinear.dta -Dlinear.anim -olinear.anim.ps
# View the PostScript file using any of your favorite viewers. (See
# mktree.readme.)
----------------------------------------------