Format bdocjs/bdocjsgz

The “Basic Document” or “Bdoc” representation is a simple way to represent GATE documents, features, annotation sets and annotations through basic datatypes like strings, integers, maps and arrays so that the same representation can be easily used from several programming languages. The representation is limited to the following data types: string, integer, float, boolean, array/list, map (basically what is supported by basic JSON).

The bdocjs file format is a JSON serialization of the bdoc representation of a document as a map/dictionary, usually stored in a file with the “.bdocjs” extension. The bdocjsgz file format is simply a gzip-compressed bdocjs file, usually stored in a file with the “.bdocjs.gz” extension.
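Since bdocjs is plain JSON and bdocjsgz is the same content gzip-compressed, both can be read and written with a language's standard JSON and gzip facilities. The following is a minimal sketch in Python, not part of any GATE API: the function names `save_bdoc` and `load_bdoc` are hypothetical, and choosing the format from a `.gz` file name suffix and using UTF-8 as the file encoding are assumptions made for illustration.

```python
import gzip
import json


def _opener(path):
    # Assumption: a path ending in ".gz" means bdocjsgz, anything else bdocjs.
    return gzip.open if str(path).endswith(".gz") else open


def save_bdoc(bdoc, path):
    """Write a bdoc map (a plain dict) as bdocjs or bdocjsgz."""
    with _opener(path)(path, "wt", encoding="utf-8") as out:
        json.dump(bdoc, out, ensure_ascii=False)


def load_bdoc(path):
    """Read a bdocjs or bdocjsgz file back into a plain dict."""
    with _opener(path)(path, "rt", encoding="utf-8") as inp:
        return json.load(inp)
```

A document saved with `save_bdoc(doc, "doc.bdocjs.gz")` should come back unchanged from `load_bdoc("doc.bdocjs.gz")`.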

API

The abstract BdocDocument representation

A document is a map with the following keys. All keys are optional!

The document text must be able to represent any Unicode text; different serialization methods may encode the text in different ways.
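As an illustration of how one serialization can encode the same text in more than one way, the sketch below uses Python's standard `json` module. It is only an example of the general point and does not describe what any particular bdoc writer does: JSON can carry non-ASCII characters either as `\uXXXX` escapes or as the characters themselves, and both forms decode to the same text.

```python
import json

# Sample text with accented letters and one character outside the
# Basic Multilingual Plane (the emoji).
text = "naïve café 😀"

escaped = json.dumps(text)                  # ASCII only, \uXXXX escapes
raw = json.dumps(text, ensure_ascii=False)  # characters written as-is

# The two serialized forms differ, but both decode to the original text.
assert escaped != raw
assert json.loads(escaped) == text
assert json.loads(raw) == text
```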

Features are represented as a map:

An annotation set is represented as a map with the following keys:

Each annotation is represented as a map with the following keys:
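Because the representation uses only basic data types, a document can be assembled from ordinary maps and lists. The sketch below does this in Python using the key names that appear in the JSON example in this document. The helper names `make_annotation` and `make_annotation_set` are hypothetical, and treating `next_annid` as one more than the largest annotation id in the set is an assumption based on that example.

```python
import json


def make_annotation(annid, anntype, start, end, features=None):
    """Build one annotation as a plain map."""
    return {
        "id": annid,
        "type": anntype,
        "start": start,
        "end": end,
        "features": dict(features or {}),
    }


def make_annotation_set(name, annotations):
    """Build an annotation set as a plain map."""
    annotations = list(annotations)
    # Assumption: next_annid is the next unused annotation id in this set.
    next_annid = max((a["id"] for a in annotations), default=-1) + 1
    return {"name": name, "annotations": annotations, "next_annid": next_annid}


doc = {
    "text": "A simple document",
    "features": {"feat1": "value1"},
    "annotation_sets": {
        "Set2": make_annotation_set("Set2", [make_annotation(0, "Type2", 2, 8)]),
    },
}

# Only basic types are used, so the map serializes to JSON directly.
serialized = json.dumps(doc)
```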

Examples

Here is a simple example document serialized as JSON (bdocjs):

{
   "offset_type" : "p",
   "name" : "",
   "features" : {
      "feat1" : "value1"
   },
   "annotation_sets" : {
      "" : {
         "annotations" : [
            {
               "end" : 2,
               "id" : 0,
               "features" : {
                  "a" : 1,
                  "b" : true,
                  "c" : "some string"
               },
               "start" : 0,
               "type" : "Type1"
            }
         ],
         "name" : "",
         "next_annid" : 1
      },
      "Set2" : {
         "annotations" : [
            {
               "id" : 0,
               "start" : 2,
               "features" : {},
               "type" : "Type2",
               "end" : 8
            }
         ],
         "next_annid" : 1,
         "name" : "Set2"
      }
   },
   "text" : "A simple document"
}
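To show how such a document might be consumed, the following sketch parses the example above (reformatted compactly) and lists the text covered by each annotation. It assumes that `start` and `end` are zero-based character offsets into `text` with an exclusive end, which the example is consistent with but does not state. The function name `covered_spans` is hypothetical.

```python
import json

# The example document from above, reformatted compactly.
BDOCJS = """
{"offset_type": "p", "name": "", "features": {"feat1": "value1"},
 "annotation_sets": {
   "": {"name": "", "next_annid": 1, "annotations": [
     {"id": 0, "type": "Type1", "start": 0, "end": 2,
      "features": {"a": 1, "b": true, "c": "some string"}}]},
   "Set2": {"name": "Set2", "next_annid": 1, "annotations": [
     {"id": 0, "type": "Type2", "start": 2, "end": 8, "features": {}}]}},
 "text": "A simple document"}
"""


def covered_spans(bdoc):
    """Yield (set name, annotation type, covered text) for every annotation."""
    # All keys are optional, so fall back to empty values when one is missing.
    text = bdoc.get("text", "")
    for set_name, annset in bdoc.get("annotation_sets", {}).items():
        for ann in annset.get("annotations", []):
            # Assumption: offsets index characters, end is exclusive.
            yield set_name, ann["type"], text[ann["start"]:ann["end"]]


doc = json.loads(BDOCJS)
spans = list(covered_spans(doc))
for set_name, anntype, covered in spans:
    print(repr(set_name), anntype, repr(covered))
```

Under that offset assumption, the `Type2` annotation in `Set2` covers the word `simple`.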