Algorithm Parameters
New algorithms are frequently added to the Learning Framework, and we try to support their parameters. This section is a work in progress, but it describes the parameters that are currently available.
LibSVM
LibSVM can be used as a standard classifier in NER or classification mode. Probability estimates are produced by default, and are “real” probabilities. However, the resulting model is a little slower than one trained without them. If you only need the classification itself, without an estimate of how likely it is to be correct, and you need your learner to be faster, you can turn off probability generation using “-b 0” as described below.
The full range of LibSVM parameters is supported. Specify them in the “algorithmParameters” field as a space-separated sequence of flags, as described in the LibSVM documentation:
-s svm_type : set type of SVM (default 0)
0 – C-SVC
1 – nu-SVC
2 – one-class SVM
3 – epsilon-SVR
4 – nu-SVR
-t kernel_type : set type of kernel function (default 2)
0 – linear: u'*v
1 – polynomial: (gamma*u'*v + coef0)^degree
2 – radial basis function: exp(-gamma*|u-v|^2)
3 – sigmoid: tanh(gamma*u'*v + coef0)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking: whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train an SVC or SVR model for probability estimates, 0 or 1 (LibSVM's own default is 0; as noted above, the Learning Framework produces probabilities by default)
-wi weight: set the parameter C of class i to weight*C, for C-SVC (default 1)
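As a minimal sketch of how such a flagged string might be put together, the Python helper below joins flag/value pairs with spaces. The helper name `build_libsvm_params` is hypothetical and is not part of the Learning Framework or LibSVM; only the flags themselves come from the list above.

```python
def build_libsvm_params(options):
    """Join flag/value pairs into a space-separated string of LibSVM flags.

    `options` maps a flag name without its leading dash (e.g. "t") to a value.
    Illustrative helper only; not part of the Learning Framework.
    """
    parts = []
    for flag, value in options.items():
        parts.append(f"-{flag} {value}")
    return " ".join(parts)


# C-SVC with a linear kernel, cost 10, and probability estimates turned off.
params = build_libsvm_params({"s": 0, "t": 0, "c": 10, "b": 0})
print(params)
```

The resulting string is what would be pasted into the “algorithmParameters” field.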
More on weights
For example, set weights like so: “-w0 1 -w1 2.5 -w2 2.5 -w3 4”. Assign them assuming the classes are in alphabetical/natural order. The weight assignments are reported back through the logger so that you can check them. If the number of weights you assign does not match the number of classes, all the weights are rejected. It is your responsibility to know how many classes your training corpus contains.
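The following Python sketch shows one way to derive the weight flags from a mapping of class labels to weights, and to check the count before training. The function names and the class labels are hypothetical, and it is an assumption that Python's `sorted` order matches the ordering the Learning Framework uses, so the logger output remains the authoritative check.

```python
def build_weight_flags(class_weights):
    """Return "-w0 ... -w1 ..." flags with classes taken in sorted label order."""
    labels = sorted(class_weights)  # assumed to match alphabetical/natural order
    return " ".join(
        f"-w{index} {class_weights[label]}" for index, label in enumerate(labels)
    )


def weight_count_matches(flags, num_classes):
    """Check that the number of -wN flags equals the number of classes."""
    count = sum(1 for token in flags.split() if token.startswith("-w"))
    return count == num_classes


# Hypothetical class labels and weights, for illustration only.
weights = {"person": 4, "date": 1, "organization": 2.5, "location": 2.5}
flags = build_weight_flags(weights)
print(flags)
print(weight_count_matches(flags, num_classes=4))
```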
Mallet CRF
No parameters are used with Mallet CRF. You will need to set the sequenceSpan to something sensible; one learning instance will be generated for each sequence annotation. For example, if you are doing NER and are using “token” as the instance, then you might give a sequence span of “sentence”, because tokens fall into meaningful patterns within sentences, but not so much across sentence boundaries. Mallet CRF will then learn the material in sentence-long chunks. Your documents need to have sentence annotations as well as token annotations.
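To illustrate the idea of a sequence span, the sketch below groups token annotations by the sentence that contains them, so that each sentence yields one sequence. It is a conceptual illustration in Python only: the function `group_tokens_by_span` and the offset tuples are hypothetical and do not reflect the GATE or Mallet APIs.

```python
def group_tokens_by_span(tokens, spans):
    """Group tokens into one sequence per span.

    tokens: list of (start, end, text) tuples
    spans:  list of (start, end) tuples, e.g. sentence annotations
    """
    sequences = []
    for span_start, span_end in spans:
        # Keep only the tokens that lie entirely inside this span.
        sequence = [
            text
            for start, end, text in tokens
            if start >= span_start and end <= span_end
        ]
        sequences.append(sequence)
    return sequences


# Offsets for the text "John lives here. Mary left."
tokens = [
    (0, 4, "John"), (5, 10, "lives"), (11, 15, "here"), (15, 16, "."),
    (17, 21, "Mary"), (22, 26, "left"), (26, 27, "."),
]
sentences = [(0, 16), (17, 27)]
sequences = group_tokens_by_span(tokens, sentences)
print(sequences)
```

Each inner list corresponds to one sentence-long chunk of the kind described above.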
A sequence classifier is an appropriate choice in any context where your instances tend to fall into predictable sequences. It’s good for named entity detection, for example, because named entities are often predictable sequences, such as descriptions of symptoms in clinical applications, and the contexts in which they appear are also often meaningful sequences. However, it wouldn’t be so good for document classification because documents don’t tend to form a meaningful sequence.
Mallet Classification Algorithms
A number of Mallet classification algorithms are integrated, and the following parameters are available. In each case, parameters are given as space-separated values, without flags, in the order specified:
Balanced Winnow:
epsilon (double, default 0.5)
delta (double, default 0.1)
max iterations (int, default 30)
cooling rate (double, default 0.5)
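As a rough sketch of what “space-separated and unflagged, in the order specified” means, the Python snippet below reads a Balanced Winnow parameter string positionally. The function `parse_positional` and the spec list are hypothetical. How the Learning Framework treats a partial list is not stated here, so this sketch simply assumes that all four values are supplied.

```python
# Parameter names and types in the order listed above for Balanced Winnow.
BALANCED_WINNOW_SPEC = [
    ("epsilon", float),
    ("delta", float),
    ("max_iterations", int),
    ("cooling_rate", float),
]


def parse_positional(param_string, spec):
    """Map space-separated values onto named parameters by position."""
    values = param_string.split()
    if len(values) != len(spec):
        raise ValueError(f"expected {len(spec)} values, got {len(values)}")
    return {name: cast(raw) for (name, cast), raw in zip(spec, values)}


# The default values, written out explicitly.
parsed = parse_positional("0.5 0.1 30 0.5", BALANCED_WINNOW_SPEC)
print(parsed)
```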
C45:
max depth (int)
Decision Tree:
max depth (int)
Max Ent GE Range, Max Ent GE, Max Ent PR
These all take an array of constraints. This isn’t currently supported.
Max Ent:
gaussian prior (double, default 1.0; a parameter to avoid overtraining)
max iterations (int; the Learning Framework passes this value on, but it is possible that Mallet still doesn’t use it)
MC Max Ent:
Only the following configurations are supported:
gaussianPriorVariance (double, a parameter to avoid overtraining)
OR
gaussianPriorVariance (double)
useMultiConditionalTraining (boolean)
OR
hyperbolicPriorSlope (double)
hyperbolicPriorSharpness (double)
OR no arguments.
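The four configurations can be told apart by the number and type of the arguments. The Python sketch below shows one possible way to make that distinction. It is an assumption, not documented behaviour, that a two-argument form is disambiguated by whether the second value is a boolean, and the function name and returned labels are hypothetical.

```python
def mc_maxent_configuration(param_string):
    """Name the MC Max Ent configuration that a parameter string matches."""
    args = param_string.split()
    if len(args) == 0:
        return "no arguments"
    if len(args) == 1:
        float(args[0])  # gaussianPriorVariance must be a number
        return "gaussianPriorVariance"
    if len(args) == 2:
        float(args[0])  # first value is numeric in both two-argument forms
        if args[1].lower() in ("true", "false"):
            return "gaussianPriorVariance + useMultiConditionalTraining"
        float(args[1])  # hyperbolicPriorSharpness must be a number
        return "hyperbolicPriorSlope + hyperbolicPriorSharpness"
    raise ValueError("unsupported number of MC Max Ent parameters")


print(mc_maxent_configuration("1.0 true"))
print(mc_maxent_configuration("0.2 10.0"))
```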
Naive Bayes, Naive Bayes EM
These don’t take any parameters.
Winnow:
a (double)
b (double)
nfact (double, optional)