CorpusStats - GATE plugin

The CorpusStats plugin for the GATE NLP framework can be used to calculate statistics for terms and term pairs over a corpus.
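To make "statistics for terms and term pairs" more concrete, here is a minimal Python sketch. It is not the plugin's code and does not use the GATE API. The function name `corpus_stats`, the toy corpus, and the choice of statistics (term frequency, document frequency, and document-level pair co-occurrence counts) are illustrative assumptions only.

```python
from collections import Counter
from itertools import combinations


def corpus_stats(docs):
    """Count simple term and term-pair statistics over tokenized documents.

    Hypothetical helper for illustration; not part of the CorpusStats plugin.
    """
    term_freq = Counter()      # total occurrences of each term in the corpus
    doc_freq = Counter()       # number of documents that contain each term
    pair_doc_freq = Counter()  # number of documents that contain both terms of a pair
    for tokens in docs:
        term_freq.update(tokens)
        unique = sorted(set(tokens))  # sort so each pair has one canonical order
        doc_freq.update(unique)
        pair_doc_freq.update(combinations(unique, 2))
    return term_freq, doc_freq, pair_doc_freq


# Toy corpus (assumed data): each document is a list of already-tokenized terms.
docs = [
    ["gate", "plugin", "corpus"],
    ["corpus", "statistics", "corpus"],
    ["gate", "corpus"],
]
tf, df, pairs = corpus_stats(docs)
```

Here the whole document serves as the context for a pair. A narrower context, such as a sentence or a sliding window, would only change how the token lists passed to the function are built.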

Documentation Overview

For individual terms it can calculate:

For pairs of terms it can calculate, for various kinds of contexts:

The plugin offers the following processing resources (PRs):

Installation

With GATE version 8.5 or newer, the CorpusStats plugin is installed like most other standard GATE plugins, through the plugin manager. This is only necessary when you start a new pipeline that requires the plugin; if you load a pipeline that already uses the plugin, GATE downloads it to your computer automatically.

In the GATE GUI:

For GATE version 8.4.x or earlier, older versions of the CorpusStats plugin can be installed manually: