Using Weka

IMPORTANT NOTE: Weka can currently only be used on Linux. On OS X or Windows, the steps below may work if some form of Linux compatibility layer is available (e.g. Cygwin on Windows), but no instructions for this are provided yet.

Weka (http://www.cs.waikato.ac.nz/ml/weka/) is a collection of machine learning and other tools implemented in Java. However, Weka cannot be integrated directly into the LearningFramework plugin because its license (GPL) is incompatible with the licenses of other projects used by the plugin and with the license of the plugin itself.

Instead, Weka can be used by running everything related to it externally, in a separate process (and a separate VM):

Choosing the Weka algorithm for training

If the [[LF_TrainClassification]] or [[LF_TrainRegression]] PR is used to train a Weka model from the LearningFramework, the PR parameters should be set as for other learning algorithms. In addition, the algorithmParameters parameter must be set:

weka-wrapper

The weka-wrapper (https://github.com/GateNLP/weka-wrapper) software is necessary to apply a model or to automatically train a model from inside the LearningFramework.

The following steps are needed to prepare the weka-wrapper for use with the LearningFramework:

Installing weka-wrapper

Making sure the weka-wrapper commands work

From within the weka-wrapper directory, run the command ./bin/wekaWrapperApply.sh. Because the command is run without arguments, an error message about missing parameters is the expected result and indicates that weka-wrapper should be ready to use.
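Part of this check can also be scripted. The following is a minimal Python sketch, not part of weka-wrapper or the LearningFramework: it only verifies that the two command scripts named on this page exist and are executable. The function name `missing_wrapper_scripts` is hypothetical, and the assumption that wekaWrapperTrain.sh lives in the same bin directory as wekaWrapperApply.sh is not confirmed by this page.

```python
import os
from pathlib import Path

# Script names are taken from this page. That both scripts live in bin/
# is an assumption; only wekaWrapperApply.sh is shown there above.
WRAPPER_SCRIPTS = ("wekaWrapperApply.sh", "wekaWrapperTrain.sh")


def missing_wrapper_scripts(wrapper_dir):
    """Return the script names that are absent or not executable in wrapper_dir/bin."""
    bin_dir = Path(wrapper_dir) / "bin"
    missing = []
    for name in WRAPPER_SCRIPTS:
        script = bin_dir / name
        # A script counts as usable only if it is a regular file
        # and the current user is allowed to execute it.
        if not (script.is_file() and os.access(script, os.X_OK)):
            missing.append(name)
    return missing
```

Calling `missing_wrapper_scripts("weka-wrapper")` from the parent directory returns an empty list when both scripts are in place. This does not replace running the command itself, since it says nothing about whether Java or Weka can actually be started.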

Telling the LearningFramework how to run weka-wrapper commands

The LearningFramework needs to be able to run the weka-wrapper commands wekaWrapperApply.sh and wekaWrapperTrain.sh in order to use Weka. For this, it needs to know where weka-wrapper is installed, i.e. the path of the directory (called weka-wrapper by default) that was created when the zip file was extracted during installation. This is done by setting one of the following to the full path of that directory:

The setting in weka.yaml takes precedence over the Java property, which in turn takes precedence over the environment variable.
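The precedence order can be illustrated with a small Python sketch. This is not the LearningFramework's actual code: the function name `pick_wrapper_home` and its parameter names are hypothetical, the three values are passed in directly instead of being read from weka.yaml, the JVM, or the environment, and treating an empty string as "not set" is an assumption.

```python
def pick_wrapper_home(yaml_value=None, java_property=None, env_value=None):
    """Return the first setting that is present, in precedence order, or None."""
    # Order matters: weka.yaml first, then the Java property,
    # then the environment variable.
    for candidate in (yaml_value, java_property, env_value):
        if candidate:  # assumption: None and "" both mean "not set"
            return candidate
    return None
```

For example, `pick_wrapper_home(None, "/opt/a", "/opt/b")` returns `"/opt/a"`, because the Java property wins over the environment variable when weka.yaml has no setting.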

If any of these is set to a relative path, the LearningFramework will try to interpret it as relative to the data directory in use.
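As a rough illustration of the relative-path rule, the sketch below resolves a configured value against a data directory. It is only a sketch under stated assumptions: the function name `resolve_wrapper_home` and both parameter names are hypothetical, and the real plugin may normalise or validate paths differently.

```python
from pathlib import Path


def resolve_wrapper_home(setting, data_dir):
    """Return setting unchanged if absolute, otherwise relative to data_dir."""
    path = Path(setting)
    if path.is_absolute():
        # A full path is used exactly as configured.
        return path
    # A relative path is interpreted relative to the data directory.
    return Path(data_dir) / path
```

With a data directory of `/data/model1`, the value `tools/weka-wrapper` would resolve to `/data/model1/tools/weka-wrapper`, while `/opt/weka-wrapper` would be used as is. Setting the full path, as recommended above, avoids any dependence on which data directory is in use.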