@CreoleResource(name="GATE Document", interfaceName="gate.Document", comment="GATE transient document.", icon="document", helpURL="http://gate.ac.uk/userguide/sec:developer:documents")
public class DocumentImpl
extends AbstractLanguageResource
implements TextualDocument, CreoleListener, DatastoreListener
The DocumentImpl class implements the Document interface. The DocumentContentImpl class models the textual or audio-visual materials which are the source and content of Documents. The AnnotationSetImpl class supplies annotations on Documents.
Abbreviations: D = Document, DC = DocumentContent, AS = AnnotationSet.
We add an edit method to each of these classes; for DC and AS the methods are package private; D has the public method.
void edit(Long start, Long end, DocumentContent replacement) throws InvalidOffsetException;
D receives edit requests and forwards them to DC and AS. On DC, this method makes a change to the content - e.g. replacing a String range from start to end with replacement. (Deletions are catered for by having replacement = null.) D then calls AS.edit on each of its annotation sets.
On AS, edit calls replacement.size() (i.e. DC.size()) to figure out how long the replacement is (0 for null). It then considers annotations that terminate (start or end) in the altered or deleted range as invalid; annotations that terminate after the range have their offsets adjusted. I.e.:
A note re. AS and annotations: annotations no longer have offsets as in the old model, they now have nodes, and nodes have offsets.
To implement AS.edit, we have several indices:
HashMap annotsByStartNode, annotsByEndNode; which map node ids to annotations;
RBTreeMap nodesByOffset; which maps offsets to Nodes.
When we get an edit request, we traverse that part of the nodesByOffset tree representing the altered or deleted range of the DC. For each node found, we delete any annotations that terminate on the node, and then delete the node itself. We then traverse the rest of the tree, changing the offset on all remaining nodes by:
newOffset =
  oldOffset -
  (
    (end - start) -                                     // size of mod
    ( (replacement == null) ? 0 : replacement.size() )  // size of repl
  );
Note that we use the same convention as e.g. java.lang.String: start offsets are inclusive; end offsets are exclusive. I.e. for string "abcd" range 1-3 = "bc". Examples, for a node with offset 4:
edit(1, 3, "BC"); newOffset = 4 - ( (3 - 1) - 2 ) = 4
edit(1, 3, null); newOffset = 4 - ( (3 - 1) - 0 ) = 2
edit(1, 3, "BBCC"); newOffset = 4 - ( (3 - 1) - 4 ) = 6
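The offset arithmetic above can be sketched in plain Java. This is an illustration only, not the GATE implementation: the class EditOffsetSketch, its methods, and the use of java.util.TreeMap as a stand-in for nodesByOffset are all assumptions made for the example. It also assumes that nodes inside the half-open range [start, end) are dropped and that nodes at or after end are shifted.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; class and method names are hypothetical, not GATE API.
class EditOffsetSketch {

  // Shift applied to a node that lies at or after the end of the edited range.
  static long newOffset(long oldOffset, long start, long end, long replacementSize) {
    return oldOffset - ((end - start) - replacementSize);
  }

  // Stand-in for nodesByOffset: maps an offset to a node id.
  // Assumption: nodes in [start, end) are dropped, nodes at or after end are shifted.
  static TreeMap<Long, Integer> applyEdit(TreeMap<Long, Integer> nodesByOffset,
      long start, long end, long replacementSize) {
    // Nodes strictly before the edited range keep their offsets.
    TreeMap<Long, Integer> result = new TreeMap<>(nodesByOffset.headMap(start, false));
    // Nodes at or after the end of the range are re-keyed with the shifted offset.
    for (Map.Entry<Long, Integer> e : nodesByOffset.tailMap(end, true).entrySet()) {
      result.put(newOffset(e.getKey(), start, end, replacementSize), e.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    // The three documented examples for a node at offset 4.
    System.out.println(newOffset(4, 1, 3, 2)); // replacement "BC"   -> 4
    System.out.println(newOffset(4, 1, 3, 0)); // replacement null   -> 2
    System.out.println(newOffset(4, 1, 3, 4)); // replacement "BBCC" -> 6

    TreeMap<Long, Integer> nodes = new TreeMap<>();
    nodes.put(0L, 0);
    nodes.put(2L, 1);
    nodes.put(4L, 2);
    nodes.put(6L, 3);
    // Deleting range 1-3 drops the node at offset 2 and shifts the later nodes by 2.
    System.out.println(applyEdit(nodes, 1, 3, 0)); // {0=0, 2=2, 4=3}
  }
}
```

The sketch builds a new map rather than mutating keys in place, because changing a key inside a sorted map while iterating over it is not safe; a real implementation that stores the offset on the node object would have other options.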
Modifier and Type | Field and Description |
---|---|
protected DocumentContent | content: The content of the document |
protected AnnotationSet | defaultAnnots: The default annotation set |
protected String | encoding: The encoding of the source of the document content |
protected Boolean | markupAware: Is the document markup-aware? |
protected String | mimeType: The document's MIME type. |
protected Map<String,AnnotationSet> | namedAnnotSets: Named sets of annotations |
protected int | nextAnnotationId: The id of the next new annotation |
protected int | nextNodeId: The id of the next new node |
protected URL | sourceUrl: The source URL |
protected Long | sourceUrlEndOffset: The end of the range that the content comes from at the source URL (or null if none). |
protected Long | sourceUrlStartOffset: The start of the range that the content comes from at the source URL (or null if none). |
dataStore, lrPersistentId
name
features
DOCUMENT_ENCODING_PARAMETER_NAME, DOCUMENT_END_OFFSET_PARAMETER_NAME, DOCUMENT_MARKUP_AWARE_PARAMETER_NAME, DOCUMENT_MIME_TYPE_PARAMETER_NAME, DOCUMENT_PRESERVE_CONTENT_PARAMETER_NAME, DOCUMENT_REPOSITIONING_PARAMETER_NAME, DOCUMENT_START_OFFSET_PARAMETER_NAME, DOCUMENT_STRING_CONTENT_PARAMETER_NAME, DOCUMENT_TYPE_PARAMETER_NAME
DOCUMENT_URL_PARAMETER_NAME
Constructor and Description |
---|
DocumentImpl(): Default construction. |
Modifier and Type | Method and Description |
---|---|
void | addDocumentListener(DocumentListener l): Adds a DocumentListener to this document. |
void | cleanup(): Clear all the data members of the object. |
int | compareTo(Object o): Ordering based on URL.toString() and the URL offsets (if any) |
void | datastoreClosed(CreoleEvent e): Called when a DataStore has been closed |
void | datastoreCreated(CreoleEvent e): Called when a DataStore has been created |
void | datastoreOpened(CreoleEvent e): Called when a DataStore has been opened |
void | edit(Long start, Long end, DocumentContent replacement): Propagate edit changes to the document content and annotations. |
protected void | fireAnnotationSetAdded(DocumentEvent e) |
protected void | fireAnnotationSetRemoved(DocumentEvent e) |
protected void | fireContentEdited(DocumentEvent e) |
AnnotationSet | getAnnotations(): Get the default set of annotations. |
AnnotationSet | getAnnotations(String name): Get a named set of annotations. |
Set<String> | getAnnotationSetNames() |
Boolean | getCollectRepositioningInfo(): Get the status of collecting and preserving repositioning information for the Document. |
DocumentContent | getContent(): The content of the document: a String for text; MPEG for video; etc. |
String | getEncoding(): Get the encoding of the document content source |
FeatureMap | getFeatures(): Cover unpredictable Features creation |
Boolean | getMarkupAware(): Get the markup awareness status of the Document. |
String | getMimeType(): Get the specific MIME type for this document, if set |
Map<String,AnnotationSet> | getNamedAnnotationSets(): Returns a map (possibly empty) with the named annotation sets. |
Integer | getNextAnnotationId(): Generate and return the next annotation ID |
Integer | getNextNodeId(): Generate and return the next node ID |
protected String | getOrderingString(): Utility method to produce a string for comparison in ordering. |
Boolean | getPreserveOriginalContent(): Get the content-preservation status of the Document. |
URL | getSourceUrl(): Documents are identified by URLs |
Long | getSourceUrlEndOffset(): Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. |
Long[] | getSourceUrlOffsets(): Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. |
Long | getSourceUrlStartOffset(): Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. |
String | getStringContent(): The stringContent of a document is a property of the document that will be set when the user wants to create the document from a string, as opposed to from a URL. |
Resource | init(): Initialise this resource, and return it. |
boolean | isValidOffset(Long offset): Check that an offset is valid, i.e. |
boolean | isValidOffsetRange(Long start, Long end): Check that both start and end are valid offsets and that they constitute a valid offset range, i.e. |
Integer | peakAtNextAnnotationId(): Look at the next annotation ID without incrementing it |
void | removeAnnotationSet(String name): Removes one of the named annotation sets. |
void | removeDocumentListener(DocumentListener l): Removes one of the previously registered document listeners. |
void | resourceAdopted(DatastoreEvent evt): Called by a datastore when a new resource has been adopted |
void | resourceDeleted(DatastoreEvent evt): Called by a datastore when a resource has been deleted |
void | resourceLoaded(CreoleEvent e): Called when a new Resource has been loaded into the system |
void | resourceRenamed(Resource resource, String oldName, String newName): Called when the creole register has renamed a resource. |
void | resourceUnloaded(CreoleEvent e): Called when a Resource has been removed from the system |
void | resourceWritten(DatastoreEvent evt): Called by a datastore when a resource has been written into the datastore |
void | setCollectRepositioningInfo(Boolean b): Allow/disallow collecting of repositioning information. |
void | setContent(DocumentContent content): Set method for the document content |
void | setDataStore(DataStore dataStore): Set the data store that this LR lives in. |
void | setDefaultAnnotations(AnnotationSet defaultAnnotations): This method added by Shafirin Andrey, to allow access to protected member defaultAnnots. Required for JAPE-Debugger. |
void | setEncoding(String encoding): Set the encoding of the document content source |
void | setLRPersistenceId(Object lrID): Sets the persistence id of this LR. |
void | setMarkupAware(Boolean newMarkupAware): Make the document markup-aware. |
void | setMimeType(String newMimeType): Set the specific MIME type for this document |
void | setNextAnnotationId(int aNextAnnotationId): Sets the nextAnnotationId |
void | setPreserveOriginalContent(Boolean b): Allow/disallow preserving of the original document content. |
void | setSourceUrl(URL sourceUrl): Set method for the document's URL |
void | setSourceUrlEndOffset(Long sourceUrlEndOffset): Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. |
void | setSourceUrlStartOffset(Long sourceUrlStartOffset): Documents may be packed within files; in this case an optional pair of offsets refers to the location of the document. |
void | setStringContent(String stringContent): The stringContent of a document is a property of the document that will be set when the user wants to create the document from a string, as opposed to from a URL. |
String | toString(): String representation |
String | toXml(): Returns a GateXml document that is a custom XML format for which there is a reader inside GATE called gate.xml.GateFormatXmlHandler. |
String | toXml(Set<Annotation> aSourceAnnotationSet): Returns an XML document aiming to preserve the original markups (the original markup will be in the same place and format as it was before processing the document) and include (if possible) the annotations specified in the aSourceAnnotationSet. |
String | toXml(Set<Annotation> aSourceAnnotationSet, boolean includeFeatures): Returns an XML document aiming to preserve the original markups (the original markup will be in the same place and format as it was before processing the document) and include (if possible) the annotations specified in the aSourceAnnotationSet. |
getDataStore, getLRPersistenceId, getParent, isModified, setParent, sync
checkParameterValues, flushBeanInfoCache, forgetBeanInfo, getBeanInfo, getInitParameterValues, getInitParameterValues, getName, getParameterValue, getParameterValue, getParameterValues, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners
setFeatures
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
getDataStore, getLRPersistenceId, getParent, isModified, setParent, sync
getParameterValue, setParameterValue, setParameterValues
setFeatures
getName, setName
protected int nextAnnotationId
protected int nextNodeId
protected URL sourceUrl
protected String mimeType
protected DocumentContent content
protected String encoding
protected Long sourceUrlStartOffset
protected Long sourceUrlEndOffset
protected AnnotationSet defaultAnnots
protected Map<String,AnnotationSet> namedAnnotSets
protected Boolean markupAware
public FeatureMap getFeatures()
getFeatures
in interface FeatureBearer
getFeatures
in class AbstractFeatureBearer
public Resource init() throws ResourceInstantiationException
init
in interface Resource
init
in class AbstractResource
ResourceInstantiationException
public void cleanup()
cleanup
in interface Resource
cleanup
in class AbstractLanguageResource
public String getMimeType()
@Optional
@CreoleParameter(comment="MIME type of the document. If unspecified it will be inferred from the file extension, etc.")
public void setMimeType(String newMimeType)
public URL getSourceUrl()
getSourceUrl
in interface SimpleDocument
@CreoleParameter(disjunction="source", priority=1, comment="Source URL", suffixes="txt;text;xml;xhtm;xhtml;html;htm;sgml;sgm;mail;email;eml;rtf;pdf;doc;ppt;pptx;docx;xls;xlsx;ods;odt;odp;iob;conll")
public void setSourceUrl(URL sourceUrl)
setSourceUrl
in interface SimpleDocument
public Long[] getSourceUrlOffsets()
getSourceUrlOffsets
in interface Document
@CreoleParameter(comment="Should the document preserve the original content?", defaultValue="false")
public void setPreserveOriginalContent(Boolean b)
setPreserveOriginalContent
in interface Document
public Boolean getPreserveOriginalContent()
getPreserveOriginalContent
in interface Document
@CreoleParameter(defaultValue="false", comment="Should the document collect repositioning information")
public void setCollectRepositioningInfo(Boolean b)
setCollectRepositioningInfo
in interface Document
public Boolean getCollectRepositioningInfo()
getCollectRepositioningInfo
in interface Document
public Long getSourceUrlStartOffset()
getSourceUrlStartOffset
in interface Document
@Optional
@CreoleParameter(comment="Start offset for documents based on ranges")
public void setSourceUrlStartOffset(Long sourceUrlStartOffset)
setSourceUrlStartOffset
in interface Document
public Long getSourceUrlEndOffset()
getSourceUrlEndOffset
in interface Document
@Optional
@CreoleParameter(comment="End offset for documents based on ranges")
public void setSourceUrlEndOffset(Long sourceUrlEndOffset)
setSourceUrlEndOffset
in interface Document
public DocumentContent getContent()
getContent
in interface SimpleDocument
public void setContent(DocumentContent content)
setContent
in interface SimpleDocument
public String getEncoding()
getEncoding
in interface TextualDocument
@Optional
@CreoleParameter(comment="Encoding", defaultValue="UTF-8")
public void setEncoding(String encoding)
public AnnotationSet getAnnotations()
getAnnotations
in interface SimpleDocument
public AnnotationSet getAnnotations(String name)
getAnnotations
in interface SimpleDocument
@CreoleParameter(defaultValue="true", comment="Should the document read the original markup?")
public void setMarkupAware(Boolean newMarkupAware)
setMarkupAware
in interface Document
newMarkupAware - markup awareness status.
public Boolean getMarkupAware()
getMarkupAware
in interface Document
public String toXml(Set<Annotation> aSourceAnnotationSet)
public String toXml(Set<Annotation> aSourceAnnotationSet, boolean includeFeatures)
toXml
in interface Document
aSourceAnnotationSet - is an annotation set containing all the annotations that will be combined with the original markup set. If the param is null it will only dump the original markups.
includeFeatures - is a boolean that controls whether the annotation features should be included or not. If false, only the annotation type is included in the tag.
public String toXml()
DocumentStaxUtils.toXml(gate.Document) method
toXml
in interface Document
DocumentStaxUtils
public Map<String,AnnotationSet> getNamedAnnotationSets()
null if no named annotation set exists.
getNamedAnnotationSets
in interface Document
public Set<String> getAnnotationSetNames()
getAnnotationSetNames
in interface SimpleDocument
public void removeAnnotationSet(String name)
removeAnnotationSet
in interface SimpleDocument
name - the name of the annotation set to be removed
public void edit(Long start, Long end, DocumentContent replacement) throws InvalidOffsetException
edit
in interface Document
InvalidOffsetException
public boolean isValidOffset(Long offset)
public boolean isValidOffsetRange(Long start, Long end)
public void setNextAnnotationId(int aNextAnnotationId)
public Integer getNextAnnotationId()
public Integer peakAtNextAnnotationId()
public Integer getNextNodeId()
public int compareTo(Object o) throws ClassCastException
compareTo
in interface Comparable<Object>
ClassCastException
protected String getOrderingString()
public String getStringContent()
@CreoleParameter(disjunction="source", priority=2, comment="The content of the document")
public void setStringContent(String stringContent)
public String toString()
toString
in class AbstractResource
public void removeDocumentListener(DocumentListener l)
Document
removeDocumentListener
in interface Document
public void addDocumentListener(DocumentListener l)
Document
Adds a DocumentListener to this document. All the registered listeners will be notified of changes that occurred to the document.
addDocumentListener
in interface Document
protected void fireAnnotationSetAdded(DocumentEvent e)
protected void fireAnnotationSetRemoved(DocumentEvent e)
protected void fireContentEdited(DocumentEvent e)
public void resourceLoaded(CreoleEvent e)
CreoleListener
Called when a new Resource has been loaded into the system
resourceLoaded
in interface CreoleListener
public void resourceUnloaded(CreoleEvent e)
CreoleListener
Called when a Resource has been removed from the system
resourceUnloaded
in interface CreoleListener
public void datastoreOpened(CreoleEvent e)
CreoleListener
Called when a DataStore has been opened
datastoreOpened
in interface CreoleListener
public void datastoreCreated(CreoleEvent e)
CreoleListener
Called when a DataStore has been created
datastoreCreated
in interface CreoleListener
public void resourceRenamed(Resource resource, String oldName, String newName)
CreoleListener
resourceRenamed
in interface CreoleListener
public void datastoreClosed(CreoleEvent e)
CreoleListener
Called when a DataStore has been closed
datastoreClosed
in interface CreoleListener
public void setLRPersistenceId(Object lrID)
AbstractLanguageResource
setLRPersistenceId
in interface LanguageResource
setLRPersistenceId
in class AbstractLanguageResource
public void resourceAdopted(DatastoreEvent evt)
DatastoreListener
resourceAdopted
in interface DatastoreListener
public void resourceDeleted(DatastoreEvent evt)
DatastoreListener
resourceDeleted
in interface DatastoreListener
public void resourceWritten(DatastoreEvent evt)
DatastoreListener
resourceWritten
in interface DatastoreListener
public void setDataStore(DataStore dataStore) throws PersistenceException
AbstractLanguageResource
setDataStore
in interface LanguageResource
setDataStore
in class AbstractLanguageResource
PersistenceException
public void setDefaultAnnotations(AnnotationSet defaultAnnotations)
This method added by Shafirin Andrey, to allow access to protected member defaultAnnots. Required for JAPE-Debugger.
Copyright © 2024 GATE. All rights reserved.