public class DocumentJsonUtils extends Object
This class contains utility methods to output GATE documents in a JSON format which is (deliberately) close to the format used by Twitter to represent entities such as user mentions and hashtags in Tweets.
    {
      "text":"Text of the document",
      "entities":{
        "Person":[
          {
            "indices":[startOffset, endOffset],
            // other features here
          },
          { ... }
        ],
        "Location":[
          {
            "indices":[startOffset, endOffset],
            // other features here
          },
          { ... }
        ]
      }
    }
The document is represented as a JSON object with two properties, "text" holding the text of the document and "entities" representing the annotations. The "entities" property is an object mapping each "annotation type" to an array of objects, one per annotation, that holds the annotation's start and end offsets as a property "indices" and the other features of the annotation as its remaining properties. Features are serialized using Jackson's ObjectMapper, so string-valued features become JSON strings, numeric features become JSON numbers, Boolean features become JSON booleans, and other types are serialized according to Jackson's normal rules (e.g. Map values become nested JSON objects).
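The layout described above can be illustrated without GATE or Jackson. The following is a minimal sketch under stated assumptions, not the library's implementation: the nested Ann record is a hypothetical stand-in for a GATE Annotation, and the names TwitterStyleJson, toJson, quote and value are invented for illustration. Strings, numbers and booleans are written the way the paragraph describes, but unlike Jackson this sketch simply quotes the toString() of any other value instead of serializing, for example, a Map as a nested object.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Twitter-style layout; NOT GATE's DocumentJsonUtils.
class TwitterStyleJson {

    // Hypothetical stand-in for a GATE annotation: start/end offsets plus features.
    record Ann(long start, long end, Map<String, Object> features) {}

    // Escapes a string as a JSON string literal, including the surrounding quotes.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    // Other control characters become hex escapes.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Numbers and booleans are written bare, null as null, anything else as a quoted string.
    // (Jackson would instead serialize e.g. a Map feature as a nested JSON object.)
    static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    // Each map key becomes a property of "entities"; each annotation becomes
    // {"indices":[start,end], <feature>:<value>, ...}.
    static String toJson(String text, Map<String, List<Ann>> annotationsMap) {
        StringBuilder sb = new StringBuilder("{\"text\":").append(quote(text)).append(",\"entities\":{");
        boolean firstGroup = true;
        for (Map.Entry<String, List<Ann>> group : annotationsMap.entrySet()) {
            if (!firstGroup) sb.append(',');
            firstGroup = false;
            sb.append(quote(group.getKey())).append(":[");
            boolean firstAnn = true;
            for (Ann a : group.getValue()) {
                if (!firstAnn) sb.append(',');
                firstAnn = false;
                sb.append("{\"indices\":[").append(a.start()).append(',').append(a.end()).append(']');
                for (Map.Entry<String, Object> f : a.features().entrySet()) {
                    sb.append(',').append(quote(f.getKey())).append(':').append(value(f.getValue()));
                }
                sb.append('}');
            }
            sb.append(']');
        }
        return sb.append("}}").toString();
    }
}
```

For a single Person annotation over offsets 0 to 5 with features gender="female" and score=3, this produces {"text":"Alice met Bob","entities":{"Person":[{"indices":[0,5],"gender":"female","score":3}]}}.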
The grouping of annotations into blocks is the responsibility of the caller: annotations are supplied as a Map<String, Collection<Annotation>>, whose keys become the property names within the "entities" object and whose values become the corresponding annotation arrays. In particular, the actual type of each annotation within a collection is ignored, so annotations of different types may be mixed within one collection; the name of the group in the "entities" object always comes from the map key. However, some overloads of writeDocument provide the option to write the annotation type as if it were a feature, i.e. as one of the JSON properties of the annotation object.
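As a hedged illustration of caller-side grouping, the sketch below builds the map shape in two ways, using a hypothetical Ann record in place of GATE's Annotation: byType produces one block per annotation type, while singleBlock folds several types into one block whose JSON name is simply the map key. The class and method names are invented for illustration. With the real API, an AnnotationSet (which is itself a collection of annotations) could be used directly as a map value.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of caller-side grouping; only the Map shape mirrors the GATE API.
class AnnotationGrouping {

    // Hypothetical stand-in for a GATE annotation; only its type and offsets matter here.
    record Ann(String type, long start, long end) {}

    // One block per annotation type, keyed by the type name, in first-seen order.
    static Map<String, Collection<Ann>> byType(List<Ann> anns) {
        Map<String, Collection<Ann>> blocks = new LinkedHashMap<>();
        for (Ann a : anns) {
            blocks.computeIfAbsent(a.type(), k -> new ArrayList<>()).add(a);
        }
        return blocks;
    }

    // Several types folded into a single block: the JSON property name is blockName,
    // so the individual annotation types are not reflected in the block name.
    static Map<String, Collection<Ann>> singleBlock(List<Ann> anns, String blockName, String... types) {
        List<String> wanted = List.of(types);
        Collection<Ann> block = new ArrayList<>();
        for (Ann a : anns) {
            if (wanted.contains(a.type())) block.add(a);
        }
        Map<String, Collection<Ann>> blocks = new LinkedHashMap<>();
        blocks.put(blockName, block);
        return blocks;
    }
}
```

Using a LinkedHashMap keeps the blocks in insertion order, which makes the resulting "entities" object predictable if the serializer iterates the map in order.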
Constructor and Description |
---|
DocumentJsonUtils() |
Modifier and Type | Method and Description |
---|---|
static String | toJson(Document doc, Map<String,Collection<Annotation>> annotationsMap) - Convert a GATE document to a JSON representation and return it as a string. |
static void | writeDocument(Document doc, Long start, Long end, Map<String,Collection<Annotation>> annotationsMap, com.fasterxml.jackson.core.JsonGenerator json) - Write a substring of a GATE document to the specified JsonGenerator. |
static void | writeDocument(Document doc, Long start, Long end, Map<String,Collection<Annotation>> annotationsMap, Map<?,?> extraFeatures, com.fasterxml.jackson.core.JsonGenerator json) - Write a substring of a GATE document, plus additional top-level properties, to the specified JsonGenerator. |
static void | writeDocument(Document doc, Long start, Long end, Map<String,Collection<Annotation>> annotationsMap, Map<?,?> extraFeatures, String annotationTypeProperty, com.fasterxml.jackson.core.JsonGenerator json) - Write a substring of a GATE document, plus additional top-level properties, to the specified JsonGenerator, optionally writing each annotation's type as a property. |
static void | writeDocument(Document doc, Long start, Long end, Map<String,Collection<Annotation>> annotationsMap, Map<?,?> extraFeatures, String annotationTypeProperty, String annotationIDProperty, com.fasterxml.jackson.core.JsonGenerator json) - Write a substring of a GATE document, plus additional top-level properties, to the specified JsonGenerator, optionally writing each annotation's type and ID as properties. |
static void | writeDocument(Document doc, Map<String,Collection<Annotation>> annotationsMap, File out) - Write a GATE document to the specified File. |
static void | writeDocument(Document doc, Map<String,Collection<Annotation>> annotationsMap, com.fasterxml.jackson.core.JsonGenerator json) - Write a GATE document to the specified JsonGenerator. |
static void | writeDocument(Document doc, Map<String,Collection<Annotation>> annotationsMap, OutputStream out) - Write a GATE document to the specified OutputStream. |
static void | writeDocument(Document doc, Map<String,Collection<Annotation>> annotationsMap, Writer out) - Write a GATE document to the specified Writer. |
public static void writeDocument(Document doc, Map<String,Collection<Annotation>> annotationsMap, OutputStream out) throws com.fasterxml.jackson.core.JsonGenerationException, IOException
Write a GATE document to the specified OutputStream.
Parameters:
doc - the document to write
annotationsMap - annotations to write
out - the OutputStream to write to
Throws:
com.fasterxml.jackson.core.JsonGenerationException - if a problem occurs while generating the JSON
IOException - if an I/O error occurs

public static void writeDocument(Document doc, Map<String,Collection<Annotation>> annotationsMap, Writer out) throws com.fasterxml.jackson.core.JsonGenerationException, IOException
Write a GATE document to the specified Writer.
Parameters:
doc - the document to write
annotationsMap - annotations to write
out - the Writer to write to
Throws:
com.fasterxml.jackson.core.JsonGenerationException - if a problem occurs while generating the JSON
IOException - if an I/O error occurs

public static void writeDocument(Document doc, Map<String,Collection<Annotation>> annotationsMap, File out) throws com.fasterxml.jackson.core.JsonGenerationException, IOException
Write a GATE document to the specified File.
Parameters:
doc - the document to write
annotationsMap - annotations to write
out - the File to write to
Throws:
com.fasterxml.jackson.core.JsonGenerationException - if a problem occurs while generating the JSON
IOException - if an I/O error occurs

public static String toJson(Document doc, Map<String,Collection<Annotation>> annotationsMap) throws com.fasterxml.jackson.core.JsonGenerationException, IOException
Convert a GATE document to a JSON representation and return it as a string.
Parameters:
doc - the document to write
annotationsMap - annotations to write
Returns:
the JSON representation of the document, as a string
Throws:
com.fasterxml.jackson.core.JsonGenerationException - if a problem occurs while generating the JSON
IOException - if an I/O error occurs

public static void writeDocument(Document doc, Map<String,Collection<Annotation>> annotationsMap, com.fasterxml.jackson.core.JsonGenerator json) throws com.fasterxml.jackson.core.JsonGenerationException, IOException
Write a GATE document to the specified JsonGenerator.
Parameters:
doc - the document to write
annotationsMap - annotations to write
json - the JsonGenerator to write to
Throws:
com.fasterxml.jackson.core.JsonGenerationException - if a problem occurs while generating the JSON
IOException - if an I/O error occurs

public static void writeDocument(Document doc, Long start, Long end, Map<String,Collection<Annotation>> annotationsMap, com.fasterxml.jackson.core.JsonGenerator json) throws com.fasterxml.jackson.core.JsonGenerationException, IOException, InvalidOffsetException
Write a substring of a GATE document to the specified JsonGenerator.
Parameters:
doc - the document to write
start - the start offset of the segment to write
end - the end offset of the segment to write
annotationsMap - annotations to write
json - the JsonGenerator to write to
Throws:
com.fasterxml.jackson.core.JsonGenerationException - if a problem occurs while generating the JSON
IOException - if an I/O error occurs
InvalidOffsetException - if start or end is not a valid offset in the document
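The start/end overloads write only part of the document. The sketch below shows one plausible way to handle the offsets; it is an illustration, not the library's code. It ASSUMES that the emitted "indices" are shifted by start so that they point into the emitted segment text rather than into the full document, and it uses IllegalArgumentException as a stand-in for GATE's InvalidOffsetException. SegmentOffsets, Span, segmentText and rebase are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of writing a [start, end) segment; NOT GATE's implementation.
class SegmentOffsets {

    // Hypothetical stand-in for an annotation's offsets.
    record Span(long start, long end) {}

    // Extracts the segment text; IllegalArgumentException stands in for InvalidOffsetException.
    static String segmentText(String text, long start, long end) {
        if (start < 0 || end < start || end > text.length()) {
            throw new IllegalArgumentException("invalid offsets " + start + ".." + end);
        }
        return text.substring((int) start, (int) end);
    }

    // ASSUMPTION: offsets are shifted by -start so that "indices" index into the
    // emitted segment text. Spans are assumed to lie within [start, end].
    static List<Span> rebase(List<Span> spans, long start) {
        List<Span> out = new ArrayList<>();
        for (Span s : spans) {
            out.add(new Span(s.start() - start, s.end() - start));
        }
        return out;
    }
}
```

For the text "Alice met Bob" and a segment from 6 to 13, the segment text is "met Bob", and an annotation on "Bob" at 10 to 13 is rebased to 4 to 7.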

public static void writeDocument(Document doc, Long start, Long end, Map<String,Collection<Annotation>> annotationsMap, Map<?,?> extraFeatures, com.fasterxml.jackson.core.JsonGenerator json) throws com.fasterxml.jackson.core.JsonGenerationException, IOException, InvalidOffsetException
Write a substring of a GATE document, plus additional top-level properties, to the specified JsonGenerator.
Parameters:
doc - the document to write
start - the start offset of the segment to write
end - the end offset of the segment to write
annotationsMap - annotations to write
extraFeatures - additional properties to add to the generated JSON. If the map includes a "text" key, it is ignored; if it contains an "entities" key whose value is a map, those entities are merged with the ones generated from annotationsMap. This would typically be used for documents that were originally derived from Twitter data, to re-create the original JSON.
json - the JsonGenerator to write to
Throws:
com.fasterxml.jackson.core.JsonGenerationException - if a problem occurs while generating the JSON
IOException - if an I/O error occurs
InvalidOffsetException - if start or end is not a valid offset in the document
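The extraFeatures rules above ("text" ignored, a map-valued "entities" merged) can be sketched with plain maps. This is a hedged illustration, not GATE's code: ExtraFeaturesMerge and buildTopLevel are hypothetical names, and three behaviours are ASSUMPTIONS the documentation does not specify: when a block name appears in both sources the arrays are concatenated, extra entities come before generated ones, and a non-map "entities" value is replaced by the generated block.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of merging extraFeatures into the top-level object; NOT GATE's code.
class ExtraFeaturesMerge {

    static Map<String, Object> buildTopLevel(String text,
                                             Map<String, List<Object>> generatedEntities,
                                             Map<?, ?> extraFeatures) {
        Map<String, Object> top = new LinkedHashMap<>();
        top.put("text", text);
        Map<String, List<Object>> entities = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : extraFeatures.entrySet()) {
            String key = String.valueOf(e.getKey());
            if (key.equals("text")) continue; // the document text always wins
            if (key.equals("entities") && e.getValue() instanceof Map<?, ?> extra) {
                for (Map.Entry<?, ?> block : extra.entrySet()) {
                    List<Object> items = entities.computeIfAbsent(String.valueOf(block.getKey()), k -> new ArrayList<>());
                    // ASSUMPTION: collection values are flattened into the block's array.
                    if (block.getValue() instanceof Collection<?> c) items.addAll(c);
                    else items.add(block.getValue());
                }
                continue;
            }
            // Any other extra property is copied to the top level as-is
            // (in this sketch a non-map "entities" value is overwritten below).
            top.put(key, e.getValue());
        }
        // ASSUMPTION: generated annotations are appended after any extra entities of the same name.
        for (Map.Entry<String, List<Object>> g : generatedEntities.entrySet()) {
            entities.computeIfAbsent(g.getKey(), k -> new ArrayList<>()).addAll(g.getValue());
        }
        top.put("entities", entities);
        return top;
    }
}
```

In the Twitter use case, extraFeatures would carry the original tweet's other fields and its own "entities" (for example hashtags), so the output combines them with the blocks generated from GATE annotations.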

public static void writeDocument(Document doc, Long start, Long end, Map<String,Collection<Annotation>> annotationsMap, Map<?,?> extraFeatures, String annotationTypeProperty, com.fasterxml.jackson.core.JsonGenerator json) throws com.fasterxml.jackson.core.JsonGenerationException, IOException, InvalidOffsetException
Write a substring of a GATE document, plus additional top-level properties, to the specified JsonGenerator, optionally writing each annotation's type as a property.
Parameters:
doc - the document to write
start - the start offset of the segment to write
end - the end offset of the segment to write
annotationsMap - annotations to write
extraFeatures - additional properties to add to the generated JSON. If the map includes a "text" key, it is ignored; if it contains an "entities" key whose value is a map, those entities are merged with the ones generated from annotationsMap. This would typically be used for documents that were originally derived from Twitter data, to re-create the original JSON.
annotationTypeProperty - if non-null, the annotation type will be written as a property under this name, as if it were an additional feature of each annotation
json - the JsonGenerator to write to
Throws:
com.fasterxml.jackson.core.JsonGenerationException - if a problem occurs while generating the JSON
IOException - if an I/O error occurs
InvalidOffsetException - if start or end is not a valid offset in the document

public static void writeDocument(Document doc, Long start, Long end, Map<String,Collection<Annotation>> annotationsMap, Map<?,?> extraFeatures, String annotationTypeProperty, String annotationIDProperty, com.fasterxml.jackson.core.JsonGenerator json) throws com.fasterxml.jackson.core.JsonGenerationException, IOException, InvalidOffsetException
Write a substring of a GATE document, plus additional top-level properties, to the specified JsonGenerator, optionally writing each annotation's type and ID as properties.
Parameters:
doc - the document to write
start - the start offset of the segment to write
end - the end offset of the segment to write
annotationsMap - annotations to write
extraFeatures - additional properties to add to the generated JSON. If the map includes a "text" key, it is ignored; if it contains an "entities" key whose value is a map, those entities are merged with the ones generated from annotationsMap. This would typically be used for documents that were originally derived from Twitter data, to re-create the original JSON.
annotationTypeProperty - if non-null, the annotation type will be written as a property under this name, as if it were an additional feature of each annotation
annotationIDProperty - if non-null, the annotation ID will be written as a property under this name, as if it were an additional feature of each annotation
json - the JsonGenerator to write to
Throws:
com.fasterxml.jackson.core.JsonGenerationException - if a problem occurs while generating the JSON
IOException - if an I/O error occurs
InvalidOffsetException - if start or end is not a valid offset in the document
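The annotationTypeProperty and annotationIDProperty options described above treat the type and ID as extra features of each annotation object. The sketch below models one annotation object as an ordered map to show the effect of passing a name or null. AnnotationObjectSketch, Ann and toObject are hypothetical names, and the property order and the handling of a feature whose name collides with the chosen property name (here the real feature overwrites it) are ASSUMPTIONS rather than documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one annotation's JSON properties; NOT GATE's code.
class AnnotationObjectSketch {

    // Hypothetical stand-in for a GATE annotation with an ID, a type, offsets and features.
    record Ann(int id, String type, long start, long end, Map<String, Object> features) {}

    static Map<String, Object> toObject(Ann a, String annotationTypeProperty, String annotationIDProperty) {
        Map<String, Object> obj = new LinkedHashMap<>();
        obj.put("indices", List.of(a.start(), a.end()));
        // A null property name means "do not write this pseudo-feature".
        if (annotationTypeProperty != null) obj.put(annotationTypeProperty, a.type());
        if (annotationIDProperty != null) obj.put(annotationIDProperty, a.id());
        // ASSUMPTION: real features are written last and win on a name collision.
        obj.putAll(a.features());
        return obj;
    }
}
```

Passing "type" and "id" adds those two properties next to "indices"; passing null for both yields only "indices" and the annotation's own features.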
Copyright © 2024 GATE. All rights reserved.