public class DocumentStaxUtils extends Object
Modifier and Type | Field and Description |
---|---|
static String | GATE_XML_VERSION |
static char | INVALID_CHARACTER_REPLACEMENT: The char used to replace characters in text content that are illegal in XML. |
static Comparator<Annotation> | LONGEST_FIRST_OFFSET_COMPARATOR: Comparator that compares annotations based on their offsets; when two annotations start at the same location, the longer one is considered to come first in the ordering. |
static int | LT_THRESHOLD: The number of < signs after which we encode a string using CDATA rather than writeCharacters. |
static String | XCES_NAMESPACE: XCES namespace URI. |
static String | XCES_VERSION: Version of XCES that this class can handle. |
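The LT_THRESHOLD description above implies a simple decision rule: count the < signs in a string and switch from writeCharacters to a CDATA section once the count passes a threshold. The following is a minimal sketch of that idea using only the JDK StAX API. The threshold value, the class name and the extra check for "]]>" (which a CDATA section cannot contain) are assumptions made for illustration, not details taken from this class.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class CdataThresholdSketch {
    // Hypothetical value for illustration; the real LT_THRESHOLD is not stated here.
    static final int ASSUMED_LT_THRESHOLD = 5;

    /** Counts the '<' characters in the given string. */
    static int countLessThan(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '<') {
                count++;
            }
        }
        return count;
    }

    /** Decides whether the text should be written as CDATA under the assumed rule. */
    static boolean useCData(String s) {
        // "]]>" would terminate a CDATA section early, so such text is escaped instead.
        return countLessThan(s) > ASSUMED_LT_THRESHOLD && !s.contains("]]>");
    }

    /** Writes the text either as a CDATA section or as escaped character data. */
    static void writeText(XMLStreamWriter xsw, String s) throws XMLStreamException {
        if (useCData(s)) {
            xsw.writeCData(s);
        } else {
            xsw.writeCharacters(s);
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xsw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xsw.writeStartElement("Values");
        writeText(xsw, "a < b");                   // one '<': escaped
        writeText(xsw, "<b><i><u>x</u></i></b>");  // six '<': CDATA
        xsw.writeEndElement();
        xsw.flush();
        xsw.close();
        System.out.println(out);
    }
}
```

The point of such a rule is readability of the output: text with many < signs would otherwise turn into a long run of entity references.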
Constructor and Description |
---|
DocumentStaxUtils() |
Modifier and Type | Method and Description |
---|---|
static Boolean | readAnnotationSet(XMLStreamReader xsr, AnnotationSet annotationSet, Map<Integer,Long> nodeIdToOffsetMap, Set<Integer> allAnnotIds, Boolean requireAnnotationIds): Processes an AnnotationSet element from the given reader and fills the given annotation set with the corresponding annotations. |
static FeatureMap | readFeatureMap(XMLStreamReader xsr): Processes a GateDocumentFeatures or Annotation element to build a feature map. |
static void | readGateXmlDocument(XMLStreamReader xsr, Document doc): Reads GATE XML format data from the given XMLStreamReader and puts the content and annotation sets into the given Document, replacing its current content. |
static void | readGateXmlDocument(XMLStreamReader xsr, Document doc, StatusListener statusListener): Reads GATE XML format data from the given XMLStreamReader and puts the content and annotation sets into the given Document, replacing its current content. |
static void | readRelationSet(XMLStreamReader xsr, RelationSet relations, Set<Integer> allAnnotIds) |
static String | readTextWithNodes(XMLStreamReader xsr, Map<Integer,Long> nodeIdToOffsetMap): Processes the TextWithNodes element from this XMLStreamReader, returning the text content of the document. |
static void | readXces(InputStream is, AnnotationSet as): Read XML data in XCES format from the given stream and add the corresponding annotations to the given annotation set. |
static void | readXces(XMLStreamReader xsr, AnnotationSet as): Read XML data in XCES format from the given reader and add the corresponding annotations to the given annotation set. |
static FeatureMap | readXcesFeatureMap(XMLStreamReader xsr): Processes a struct element to build a feature map. |
static String | toXml(Document doc): Returns a string containing the specified document in GATE XML format. |
static void | writeAnnotationSet(AnnotationSet annotations, String asName, XMLStreamWriter xsw, String namespaceURI): Retained for binary compatibility; new code should call the Collection<Annotation> version instead. |
static void | writeAnnotationSet(AnnotationSet annotations, XMLStreamWriter xsw, String namespaceURI): Writes the given annotation set to an XMLStreamWriter as GATE XML format. |
static void | writeAnnotationSet(Collection<Annotation> annotations, String asName, XMLStreamWriter xsw, String namespaceURI): Writes the given annotation set to an XMLStreamWriter as GATE XML format. |
static void | writeDocument(Document doc, File file): Write the specified GATE document to a File. |
static void | writeDocument(Document doc, File file, String namespaceURI): Write the specified GATE document to a File, optionally putting the XML in a namespace. |
static void | writeDocument(Document doc, Map<String,Collection<Annotation>> annotationSets, XMLStreamWriter xsw, String namespaceURI): Write the specified GATE Document to an XMLStreamWriter. |
static void | writeDocument(Document doc, OutputStream outputStream, String namespaceURI) |
static void | writeDocument(Document doc, XMLStreamWriter xsw, String namespaceURI): Write the specified GATE Document to an XMLStreamWriter. |
static void | writeFeatures(FeatureMap features, XMLStreamWriter xsw, String namespaceURI): Write a feature map to the given XMLStreamWriter. |
static void | writeRelationSet(RelationSet relations, XMLStreamWriter xsw, String namespaceURI) |
static void | writeTextWithNodes(Document doc, Collection<Collection<Annotation>> annotationSets, XMLStreamWriter xsw, String namespaceURI): Writes the content of the given document to an XMLStreamWriter as a mixed content element called "TextWithNodes". |
static void | writeTextWithNodes(Document doc, XMLStreamWriter xsw, String namespaceURI): Write a TextWithNodes section containing nodes for all annotations in the given document. |
static void | writeXcesAnnotations(Collection<Annotation> annotations, OutputStream os, String encoding): Save annotations to the given output stream in XCES format, with their IDs included as the "n" attribute of each struct. |
static void | writeXcesAnnotations(Collection<Annotation> annotations, XMLStreamWriter xsw): Save annotations to the given XMLStreamWriter in XCES format, with their IDs included as the "n" attribute of each struct. |
static void | writeXcesAnnotations(Collection<Annotation> annotations, XMLStreamWriter xsw, boolean includeId): Save annotations to the given XMLStreamWriter in XCES format. |
static void | writeXcesContent(Document doc, OutputStream out, String encoding): Save the content of a document to the given output stream. |
public static final char INVALID_CHARACTER_REPLACEMENT
public static final String GATE_XML_VERSION
public static final int LT_THRESHOLD
public static final String XCES_VERSION
public static final String XCES_NAMESPACE
public static final Comparator<Annotation> LONGEST_FIRST_OFFSET_COMPARATOR
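The ordering described for LONGEST_FIRST_OFFSET_COMPARATOR (by start offset, with the longer annotation first when two start at the same place) can be sketched with plain Java types. The Span class below is a hypothetical stand-in for an annotation, introduced only for this example; the sketch illustrates the ordering rule and is not the comparator defined by this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class LongestFirstSketch {

    /** Hypothetical stand-in for an annotation: a start and an end offset. */
    static final class Span {
        final long start;
        final long end;

        Span(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Orders by start offset; for equal starts, the span with the later end (the longer one) comes first. */
    static final Comparator<Span> LONGEST_FIRST = (a, b) -> {
        int byStart = Long.compare(a.start, b.start);
        if (byStart != 0) {
            return byStart;
        }
        return Long.compare(b.end, a.end);
    };

    /** Returns a sorted copy, leaving the input list untouched. */
    static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(LONGEST_FIRST);
        return copy;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(new Span(0, 3), new Span(5, 7), new Span(0, 10));
        System.out.println(sorted(spans));
    }
}
```

Sorting this way means that an enclosing span is visited before the spans nested inside it, which is convenient when emitting nested markup.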
public static void readGateXmlDocument(XMLStreamReader xsr, Document doc) throws XMLStreamException
Parameters:
xsr - the source of the XML to parse
doc - the document to update
Throws: XMLStreamException
public static void readGateXmlDocument(XMLStreamReader xsr, Document doc, StatusListener statusListener) throws XMLStreamException
Parameters:
xsr - the source of the XML to parse
doc - the document to update
statusListener - optional status listener to receive status messages
Throws: XMLStreamException
public static Boolean readAnnotationSet(XMLStreamReader xsr, AnnotationSet annotationSet, Map<Integer,Long> nodeIdToOffsetMap, Set<Integer> allAnnotIds, Boolean requireAnnotationIds) throws XMLStreamException
Parameters:
xsr - the reader
annotationSet - the annotation set to fill.
nodeIdToOffsetMap - a map from node IDs (Integer) to their offsets in the text (Long). If null, node IDs and offsets are assumed to be the same (useful when parsing an annotation set in isolation).
allAnnotIds - a set to contain all annotation IDs specified in the annotation set. It should initially be empty and will be updated if any of the annotations in this set specify an ID.
requireAnnotationIds - whether annotations are required to specify their IDs. If true, it is an error for an annotation to omit the Id attribute. If false, it is an error for the Id to be present. If null, we have not yet determined what style of XML this is.
Returns: the value of requireAnnotationIds. If the passed-in value was null, and we have since determined what it should be, the updated value is returned.
Throws: XMLStreamException
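The three-valued requireAnnotationIds parameter can be read as a small state machine: null means "undecided", and once a value is known every later annotation must agree with it. The sketch below illustrates that reading. It assumes the first annotation seen is what settles an undecided value, and it throws an unchecked IllegalStateException to stay self-contained; both choices, and the class and method names, are assumptions for illustration rather than the behaviour of readAnnotationSet itself.

```java
final class RequireIdsSketch {

    /**
     * Checks one annotation against the current setting and returns the
     * setting to use for the next annotation.
     *
     * @param requireIds current setting: TRUE, FALSE, or null if undecided
     * @param hasId      whether this annotation carries an Id attribute
     */
    static Boolean check(Boolean requireIds, boolean hasId) {
        if (requireIds == null) {
            // Undecided so far: assume this annotation fixes the style.
            return hasId;
        }
        if (requireIds && !hasId) {
            throw new IllegalStateException("Id attribute is required but missing");
        }
        if (!requireIds && hasId) {
            throw new IllegalStateException("Id attribute is present but not expected");
        }
        return requireIds;
    }

    public static void main(String[] args) {
        Boolean state = null;
        state = check(state, true);  // decides that IDs are required
        state = check(state, true);  // consistent with that decision
        System.out.println(state);
    }
}
```

Returning the updated value lets a caller thread the same setting through several annotation sets in one document.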
public static void readRelationSet(XMLStreamReader xsr, RelationSet relations, Set<Integer> allAnnotIds) throws XMLStreamException
Throws: XMLStreamException
public static String readTextWithNodes(XMLStreamReader xsr, Map<Integer,Long> nodeIdToOffsetMap) throws XMLStreamException
Parameters:
xsr
nodeIdToOffsetMap
Throws: XMLStreamException
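To make the relationship between the returned text and nodeIdToOffsetMap concrete, here is a minimal sketch of reading a mixed-content element with the JDK StAX API. It assumes that the element contains empty Node elements carrying an id attribute, interleaved with the document text; that layout, the sample data and the class name are assumptions for illustration, and this is not the implementation of readTextWithNodes.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class TextWithNodesSketch {

    /**
     * Collects the text of a TextWithNodes-style element and records, for each
     * Node element, the character offset at which it occurs. Assumes the reader
     * is positioned on the TextWithNodes start tag.
     */
    static String readText(XMLStreamReader xsr, Map<Integer, Long> nodeIdToOffset)
            throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        while (xsr.hasNext()) {
            int event = xsr.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && "Node".equals(xsr.getLocalName())) {
                int id = Integer.parseInt(xsr.getAttributeValue(null, "id"));
                nodeIdToOffset.put(id, (long) text.length());
            } else if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA
                    || event == XMLStreamConstants.SPACE) {
                text.append(xsr.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "TextWithNodes".equals(xsr.getLocalName())) {
                break; // leave the reader on the closing tag
            }
        }
        return text.toString();
    }

    /** Demo helper: parses a string and wraps checked StAX errors. */
    static String readText(String xml, Map<Integer, Long> nodeIdToOffset) {
        try {
            XMLStreamReader xsr =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            xsr.nextTag(); // move from the document start to the root start tag
            return readText(xsr, nodeIdToOffset);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input; the node ids deliberately differ from the offsets.
        String xml = "<TextWithNodes><Node id=\"10\"/>Hello<Node id=\"11\"/> "
                   + "<Node id=\"12\"/>world<Node id=\"13\"/></TextWithNodes>";
        Map<Integer, Long> offsets = new HashMap<>();
        System.out.println(readText(xml, offsets));
        System.out.println(offsets);
    }
}
```

A map built this way is what lets an annotation that refers to start and end node IDs be resolved to character offsets in the text.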
public static FeatureMap readFeatureMap(XMLStreamReader xsr) throws XMLStreamException
Throws: XMLStreamException
public static void readXces(InputStream is, AnnotationSet as) throws XMLStreamException
Parameters:
is - the input stream to read from, which will not be closed before returning.
as - the annotation set to read into.
Throws: XMLStreamException
public static void readXces(XMLStreamReader xsr, AnnotationSet as) throws XMLStreamException
The reader must be positioned on the starting cesAna tag and will be left pointing to the corresponding end tag.
Parameters:
xsr - the XMLStreamReader to read from.
as - the annotation set to read into.
Throws: XMLStreamException
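Because this overload expects the reader to sit on the cesAna start tag, a caller that embeds XCES data inside a larger document has to advance the reader first. The following sketch shows one way to do that with the JDK StAX API. The helper names and the sample XML are hypothetical, and the call to readXces is only indicated in a comment so that the example runs without GATE on the classpath.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CesAnaPositioningSketch {

    /**
     * Advances the reader until it is positioned on a start tag with the given
     * local name. Returns false if the document ends before one is found.
     */
    static boolean advanceToStartTag(XMLStreamReader xsr, String localName)
            throws XMLStreamException {
        if (xsr.isStartElement() && localName.equals(xsr.getLocalName())) {
            return true; // already in the right place
        }
        while (xsr.hasNext()) {
            xsr.next();
            if (xsr.isStartElement() && localName.equals(xsr.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    /** Demo helper: returns the tag the reader ends up on, or null if not found. */
    static String positionedTag(String xml, String localName) {
        try {
            XMLStreamReader xsr =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            if (!advanceToStartTag(xsr, localName)) {
                return null;
            }
            // At this point one would call, for example:
            // DocumentStaxUtils.readXces(xsr, annotationSet);
            return xsr.getLocalName();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical wrapper document around a cesAna element.
        String xml = "<doc><header/><cesAna><struct type=\"Token\" from=\"0\" to=\"5\"/></cesAna></doc>";
        System.out.println(positionedTag(xml, "cesAna"));
    }
}
```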
public static FeatureMap readXcesFeatureMap(XMLStreamReader xsr) throws XMLStreamException
Throws: XMLStreamException
public static String toXml(Document doc)
Parameters:
doc - the document
public static void writeDocument(Document doc, File file) throws XMLStreamException, IOException
Parameters:
doc - the document to write
file - the file to write it to
Throws: XMLStreamException, IOException
public static void writeDocument(Document doc, File file, String namespaceURI) throws XMLStreamException, IOException
Parameters:
doc - the document to write
file - the file to write it to
namespaceURI - the namespace URI to use for the XML elements. Must not be null, but can be the empty string if no namespace is desired.
Throws: XMLStreamException, IOException
public static void writeDocument(Document doc, OutputStream outputStream, String namespaceURI) throws XMLStreamException, IOException
Throws: XMLStreamException, IOException
public static void writeDocument(Document doc, Map<String,Collection<Annotation>> annotationSets, XMLStreamWriter xsw, String namespaceURI) throws XMLStreamException
Parameters:
doc - the Document to write
annotationSets - the annotations to include. If the map contains an entry for the key null, this will be treated as the default set. All other entries are treated as named annotation sets.
xsw - the StAX XMLStreamWriter to use for output
Throws: XMLStreamException - if an error occurs during writing
public static void writeDocument(Document doc, XMLStreamWriter xsw, String namespaceURI) throws XMLStreamException
See writeDocument(Document, Map, XMLStreamWriter, String).
Throws: XMLStreamException
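The annotationSets map accepted by writeDocument(Document, Map, XMLStreamWriter, String) uses the null key for the default set, which restricts the choice of Map implementation: HashMap and LinkedHashMap accept a null key, whereas Map.of and a TreeMap with natural ordering reject it. The sketch below builds such a map, with plain strings standing in for annotations; the helper name and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

final class AnnotationSetMapSketch {

    /**
     * Builds a map in the shape described for the annotationSets parameter:
     * the default set under the null key, named sets under their names.
     * Strings stand in for annotations to keep the example self-contained.
     */
    static Map<String, Collection<String>> buildSets(
            Collection<String> defaultSet, Map<String, Collection<String>> namedSets) {
        // LinkedHashMap permits a null key and keeps insertion order.
        Map<String, Collection<String>> sets = new LinkedHashMap<>();
        sets.put(null, defaultSet);
        sets.putAll(namedSets);
        return sets;
    }

    public static void main(String[] args) {
        Map<String, Collection<String>> named = new LinkedHashMap<>();
        named.put("Key", Arrays.asList("Person", "Location"));
        Map<String, Collection<String>> sets = buildSets(Arrays.asList("Token"), named);
        System.out.println(sets.keySet());
    }
}
```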
public static void writeAnnotationSet(AnnotationSet annotations, XMLStreamWriter xsw, String namespaceURI) throws XMLStreamException
The set name is taken from annotations.getName.
Parameters:
annotations - the annotation set to write
xsw - the writer to use for output
namespaceURI
Throws: XMLStreamException
public static void writeAnnotationSet(Collection<Annotation> annotations, String asName, XMLStreamWriter xsw, String namespaceURI) throws XMLStreamException
The set is written under the name given by asName.
Parameters:
annotations - the annotation set to write
asName - the name under which to write the annotation set. null means that no name will be used.
xsw - the writer to use for output
namespaceURI
Throws: XMLStreamException
public static void writeRelationSet(RelationSet relations, XMLStreamWriter xsw, String namespaceURI) throws XMLStreamException
Throws: XMLStreamException
public static void writeAnnotationSet(AnnotationSet annotations, String asName, XMLStreamWriter xsw, String namespaceURI) throws XMLStreamException
Retained for binary compatibility; new code should call the Collection<Annotation> version instead.
Throws: XMLStreamException
public static void writeTextWithNodes(Document doc, Collection<Collection<Annotation>> annotationSets, XMLStreamWriter xsw, String namespaceURI) throws XMLStreamException
Parameters:
doc - the document whose content is to be written
annotationSets - the annotations for which nodes are required. This is a collection of collections.
xsw - the XMLStreamWriter to write to.
namespaceURI - the namespace URI. May be empty but may not be null.
Throws: XMLStreamException
public static void writeTextWithNodes(Document doc, XMLStreamWriter xsw, String namespaceURI) throws XMLStreamException
public static void writeFeatures(FeatureMap features, XMLStreamWriter xsw, String namespaceURI) throws XMLStreamException
Characters in feature values that are illegal in XML are replaced by INVALID_CHARACTER_REPLACEMENT (a space). Feature names are not modified - an illegal character in a feature name will cause the serialization to fail.
Parameters:
features
xsw
namespaceURI
Throws: XMLStreamException
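The replacement of illegal characters by a space can be illustrated with a short, self-contained sketch. It assumes that "illegal in XML" means "outside the Char production of XML 1.0" and that each offending char is replaced one-for-one, so the string length stays the same; both are assumptions for illustration, and the class and method names are hypothetical rather than part of DocumentStaxUtils.

```java
final class XmlCharSketch {
    // Mirrors the documented replacement character (a space).
    static final char REPLACEMENT = ' ';

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every char that is not legal in XML 1.0 with a space. */
    static String replaceIllegal(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            // codePointAt joins valid surrogate pairs; an unpaired surrogate
            // comes back as itself and is rejected by isLegalXml10.
            int cp = s.codePointAt(i);
            if (isLegalXml10(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(REPLACEMENT);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirty = "form" + (char) 0x0C + "feed";
        System.out.println(replaceIllegal(dirty));
    }
}
```

Replacing rather than deleting keeps the string length unchanged, so any offsets computed against the original text remain valid for the cleaned text.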
public static void writeXcesContent(Document doc, OutputStream out, String encoding) throws IOException
Parameters:
doc - the document to save
out - the stream to write to
encoding - the character encoding to use. If null, defaults to UTF-8
Throws: IOException
public static void writeXcesAnnotations(Collection<Annotation> annotations, OutputStream os, String encoding) throws XMLStreamException
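Save annotations to the given output stream in XCES format, with their IDs included as the "n" attribute of each struct. The stream is not closed by this method; that is left to the caller.
Parameters:
annotations - the annotations to save, typically an AnnotationSet
os - the output stream to write to
encoding - the character encoding to use.
Throws: XMLStreamException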
public static void writeXcesAnnotations(Collection<Annotation> annotations, XMLStreamWriter xsw) throws XMLStreamException
Save annotations to the given XMLStreamWriter in XCES format, with their IDs included as the "n" attribute of each struct. The writer is not closed by this method; that is left to the caller. This method writes just the cesAna element - the XML declaration must be filled in by the caller if required.
Parameters:
annotations - the annotations to save, typically an AnnotationSet
xsw - the XMLStreamWriter to write to
Throws: XMLStreamException
public static void writeXcesAnnotations(Collection<Annotation> annotations, XMLStreamWriter xsw, boolean includeId) throws XMLStreamException
Characters in feature values that are illegal in XML are replaced by INVALID_CHARACTER_REPLACEMENT (a space). Feature names are not modified, nor are annotation types - an illegal character in one of these will cause the serialization to fail.
Parameters:
annotations - the annotations to save, typically an AnnotationSet
xsw - the XMLStreamWriter to write to
includeId - should we include the annotation IDs (as the "n" attribute on each struct)?
Throws: XMLStreamException
Copyright © 2024 GATE. All rights reserved.