LF_ApplyTopicModel Processing Resource
This processing resource uses a trained LDA topic model to assign topic distributions to documents/instances. As with the training algorithm, a “document” for the topic model algorithm may be identical to a GATE document, but it can also be just the text under an instance annotation. In both cases the words of the document are given by the token annotation type and, optionally, a token feature.
- algorithmParameters (String, default: empty): parameters to pass on to the LDA application algorithm. Note that this depends on the algorithm that has been used for training.
- dataDirectory (file URL, default: no default): where the model is stored and where to save any additional report files
- inputASName (String, default: empty = default annotation set): annotation set containing any instance annotations and the token annotations
- instanceType (String, default: empty = use “Document”): if specified, the annotation type used to cover the text that corresponds to one document for the LDA algorithm; if not specified, use the whole GATE document
- tokenAnnotationType (String, default: Token): annotation type used to identify the document words/tokens
- tokenFeature (String, default: string): feature containing the token text for each token; if missing, use the cleaned text of the underlying document text
If the instanceType parameter is empty, then the algorithm will check whether the input annotation set contains any “Document” annotations and, if so, will use them. If there are no Document annotations, the algorithm will create one Document annotation that covers the whole GATE document and use that.
The following features are set in each of the “Document” or instanceType annotations:
- whatever is specified for parameter featurePrefix is put in front of the following feature names:
  - BestTopic: integer, index of the most prominent/likely topic for this document
  - BestTopicProb: float, the probability of the most likely topic for this document
  - TopicDist: a list of as many float values as there are topics, representing the probabilities for each of the topics in the document.
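The relationship between the three features can be sketched as follows. This is only an illustration in plain Python, not the plugin's code: the function name `topic_features` is hypothetical, and the sketch assumes that topic indices start at 0 and that BestTopic is simply the position of the largest value in TopicDist.

```python
from typing import Dict, List, Union

FeatureValue = Union[int, float, List[float]]


def topic_features(topic_dist: List[float], feature_prefix: str = "") -> Dict[str, FeatureValue]:
    """Build the three features described above from one topic distribution.

    Hypothetical helper: assumes 0-based topic indices and that the best
    topic is the one with the highest probability in topic_dist.
    """
    if not topic_dist:
        raise ValueError("topic_dist must contain one probability per topic")
    # Index of the largest probability (first one wins on ties).
    best = max(range(len(topic_dist)), key=lambda k: topic_dist[k])
    return {
        feature_prefix + "BestTopic": best,
        feature_prefix + "BestTopicProb": topic_dist[best],
        feature_prefix + "TopicDist": list(topic_dist),
    }


features = topic_features([0.1, 0.7, 0.2], feature_prefix="LDA_")
```

With the prefix `LDA_` (an arbitrary example value), the resulting feature names are `LDA_BestTopic`, `LDA_BestTopicProb` and `LDA_TopicDist`.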
AlgorithmParameters
Algorithm MalletLDA_CLUS_MR
The algorithm supports the following parameters for application:
- -i/-iters (int, default 10): total number of iterations for Gibbs sampling
- -B/-burnin (int, default 10): number of iterations before first sample
- -T/-thinning (int, default 0): number of iterations between saved samples
- -s/-seed (int, default 0): random seed to use
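To make the option names and defaults concrete, here is a minimal sketch that parses such a parameter string with Python's `argparse`. It is not the plugin's own parser: the function name `parse_application_parameters` is hypothetical, and the sketch assumes that the algorithmParameters value is a whitespace-separated list of options and values.

```python
import argparse
import shlex


def parse_application_parameters(parameter_string: str) -> argparse.Namespace:
    """Parse an algorithmParameters string for MalletLDA_CLUS_MR.

    Hypothetical helper that mirrors the documented options and defaults;
    the whitespace-separated format is an assumption.
    """
    parser = argparse.ArgumentParser(prog="MalletLDA_CLUS_MR", allow_abbrev=False)
    parser.add_argument("-i", "-iters", dest="iters", type=int, default=10,
                        help="total number of iterations for Gibbs sampling")
    parser.add_argument("-B", "-burnin", dest="burnin", type=int, default=10,
                        help="number of iterations before first sample")
    parser.add_argument("-T", "-thinning", dest="thinning", type=int, default=0,
                        help="number of iterations between saved samples")
    parser.add_argument("-s", "-seed", dest="seed", type=int, default=0,
                        help="random seed to use")
    return parser.parse_args(shlex.split(parameter_string))


params = parse_application_parameters("-iters 100 -B 20")
```

Options that are not given keep their documented defaults, so in this example `thinning` and `seed` both stay at 0.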
Algorithm GensimLDA_CLUS_DR
NOT IMPLEMENTED YET