GATE Plugin Format Bdoc

This plugin adds support for loading and saving GATE documents represented as “Bdoc” (BasicDocument) instances. The representation aims to be as simple as possible while still capturing everything that a GATE SimpleDocument instance can represent. It is also almost identical to how Document instances are represented in the Python gatenlp package, which makes it well suited for exchanging documents between Java GATE and Python gatenlp. The representation accounts for the different ways Python and Java represent Unicode strings and allows annotation offsets to be converted between the two.
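The offset difference mentioned above comes from the fact that Java strings are indexed by UTF-16 code units, while Python strings are indexed by Unicode code points. The following is a minimal sketch of such a conversion using only the Java standard library. The class `OffsetMapper` and its method names are hypothetical and are not part of the plugin's API; the plugin's own conversion may work differently.

```java
/**
 * Hypothetical helper (not part of the plugin): converts offsets between
 * Java-style indexing (UTF-16 code units) and Python-style indexing
 * (Unicode code points) for the same text.
 */
final class OffsetMapper {
    private OffsetMapper() {}

    /** Java offset (UTF-16 code units) to Python offset (code points). */
    static int javaToPython(String text, int javaOffset) {
        // Count the code points in the prefix that ends at the Java offset.
        return text.codePointCount(0, javaOffset);
    }

    /** Python offset (code points) to Java offset (UTF-16 code units). */
    static int pythonToJava(String text, int pythonOffset) {
        // Advance from index 0 by the given number of code points.
        return text.offsetByCodePoints(0, pythonOffset);
    }
}
```

For the text "a😀b", where the emoji occupies two UTF-16 code units, the Java offset 3 (the start of "b") corresponds to the Python offset 2. Offsets only differ when the text contains characters outside the Basic Multilingual Plane.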

This plugin can save and load GATE documents represented as BasicDocument instances in several serialization formats (see the Formats section below for details).

See BasicDocument for a description of the “Basic Document” representation of a GATE document.

In addition, it can load and process gatenlp ChangeLog instances (data that represents changes to be made to a GATE document).

Saving and loading work exactly as for the default GATE XML format with the following exceptions:

Maven Coordinates for the plugin:

Formats

The following formats are supported for loading and saving (all formats are supported by the Python gatenlp package):

JSON

JSON, Gzip compressed

YAML

YAML, Gzip compressed

MessagePack
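As an illustration of the Gzip-compressed variants, the sketch below round-trips a JSON string through standard gzip using only the Java standard library. It assumes that the compressed formats are the plain serialization wrapped in ordinary gzip, which the text above does not spell out. The class `GzipText` and the sample payload are hypothetical; the payload is a placeholder and does not follow the real Bdoc schema.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper (not part of the plugin): gzip a UTF-8 string and back. */
final class GzipText {
    private GzipText() {}

    /** Encodes the text as UTF-8 and compresses it with gzip. */
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the gzip trailer is written.
        return buffer.toByteArray();
    }

    /** Decompresses gzip data and decodes it as UTF-8. */
    static String decompress(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `GzipText.decompress(GzipText.compress(json))` returns the original string. For actual documents, the plugin's own load and save support should be used rather than a hand-written wrapper like this one.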

ResourceHelper API

The ResourceHelper API allows some of the plugin features to be used programmatically at run time without making the calling code depend on the plugin. Instead, the generic GATE ResourceHelper interface is used to invoke plugin functionality.

The call method takes the following parameters:

The method returns an Object or null.
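The general calling pattern, a single generic entry point selected by an action string and returning an Object or null, can be sketched without GATE on the classpath. Everything in the block below is a hypothetical stand-in: `GenericHelper`, `DemoHelper`, and the action names "describe" and "noop" are invented for illustration and are neither GATE's ResourceHelper nor actions offered by this plugin.

```java
/** Hypothetical stand-in for a helper exposing one generic entry point. */
interface GenericHelper {
    Object call(String action, Object resource, Object... params);
}

/** Hypothetical implementation that dispatches on the action string. */
final class DemoHelper implements GenericHelper {
    @Override
    public Object call(String action, Object resource, Object... params) {
        switch (action) {
            case "describe":
                // Returns a result object built from the resource and parameters.
                return "resource=" + resource + ", params=" + params.length;
            case "noop":
                // Some actions have nothing to return.
                return null;
            default:
                throw new IllegalArgumentException("Unknown action: " + action);
        }
    }
}
```

The calling code is compiled only against the generic interface and selects the functionality by name, which is why it needs no dependency on the implementing class. The cost of this design is that action names and parameter types are checked only at run time.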

The following action values can be used; the expected parameters and their types are listed in parentheses:

Speed and Size comparison with GATE XML and FastInfoset formats

Single Threaded

Format     load   save   size
xml        0.009  0.022  0.458
finf       0.008  0.028  0.309
bdocjs     0.008  0.097  0.173
bdocjs.gz  0.007  0.104  0.153
bdocmp     0.008  0.065  0.065
bdocym     0.008  0.075  0.169
bdocym.gz  0.007  0.092  0.152

JavaDocs

See https://javadoc.io/doc/uk.ac.gate.plugins/format-bdoc/latest/index.html