AssignStatsPR Processing Resource

This PR loads a data file previously created by the CorpusStatsPR processing resource and uses the loaded corpus statistics to add features holding term-specific statistics, such as tf*idf, to annotations.

Runtime Parameters

Available Term-Statistics

The statistics are calculated from the loaded corpus statistics data (previously calculated by the CorpusStatsPR processing resource), from the current document, or from both. The corpus from which the loaded statistics were calculated is called the “stats corpus” below. It can be the same corpus on which the AssignStatsPR is run or a different one.
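
To illustrate how a statistic can combine both sources, the following is a minimal Python sketch of one common tf*idf formulation: the term frequency comes from the current document and the document frequency comes from the stats corpus. It is not the PR's actual implementation; the function name `tf_idf`, the `doc_freq` mapping, and the choice of `tf * ln(N / df)` are assumptions made for illustration, and the PR may use different weighting or smoothing.

```python
import math
from collections import Counter


def tf_idf(doc_terms, doc_freq, n_docs):
    """Return {term: tf*idf} for the terms of one document.

    doc_terms: list of term strings from the current document
    doc_freq:  hypothetical mapping of term -> number of stats-corpus
               documents containing the term
    n_docs:    number of documents in the stats corpus
    """
    tf = Counter(doc_terms)  # term frequency in the current document
    scores = {}
    for term, count in tf.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            # Term never seen in the stats corpus: no idf can be computed
            # in this sketch, so the term is skipped.
            continue
        scores[term] = count * math.log(n_docs / df)
    return scores


# Example: "gate" occurs twice in the document and in 2 of 10 corpus documents.
scores = tf_idf(["gate", "corpus", "gate"], {"gate": 2, "corpus": 10}, 10)
```

A term that occurs in every document of the stats corpus, such as "corpus" in this example, gets a score of zero under this formulation, whereas rarer terms score higher.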

The following statistics are supported:

Multi-Threaded Operation

This PR can be safely used in a pipeline that is run in multi-threaded mode, e.g. in GCP, by duplicating the PR using GATE’s duplication mechanism.