ChangeLog

The ChangeLog can be used to record all changes made to a document. Its main use is in the Java GATE “Python” plugin (http://gatenlp.github.io/gateplugin-Python/). The plugin converts a Java GATE document to its Python representation and sends it to Python. There the document is modified, and the modifications are recorded in a ChangeLog instance. The ChangeLog instance is then sent back to Java, where the recorded changes are applied to the original Java GATE document.
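Each recorded change is a plain dictionary of strings, numbers and nested dicts, as the printed ChangeLog further down shows. A list of such changes can therefore travel between processes in a language-neutral format. The sketch below assumes JSON as the wire format purely for illustration. It does not show how the Python plugin actually transfers a ChangeLog, and the records are written by hand rather than produced by gatenlp.

```python
import json

# Hand-written change records with the same shape as the ones a gatenlp
# ChangeLog prints (assumption: copied from the example output, not generated).
changes = [
    {"command": "doc-feature:set", "feature": "feature1", "value": "some value"},
    {"command": "annotations:add", "set": ""},
    {"command": "annotation:add", "set": "", "start": 0, "end": 4,
     "type": "SomeType", "features": {"annfeature1": 1}, "id": 0},
]

# "Python side": serialize the list of change dicts to a JSON string.
payload = json.dumps(changes)

# "Receiving side" (simulated here in Python): parse the string back.
received = json.loads(payload)

for change in received:
    print(change["command"])
```

Because the records contain only JSON-compatible values, the round trip reproduces them unchanged.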

A ChangeLog can also be used in other situations. One example is a document annotated in parallel by several separate processes, where each process adds different kinds of annotations or annotates a different part of the document. Each process can then send its changelog to the one process responsible for applying all of them to a single document.
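The parallel scenario can be sketched without gatenlp. Threads stand in for the separate processes. Each hypothetical worker (`annotate_tokens`) annotates the whitespace-separated tokens in its own slice of the text and returns a list of change dicts. A hypothetical coordinator (`merge`) concatenates the lists. Workers that run independently could hand out clashing annotation ids, so in this sketch they emit none and the coordinator assigns fresh ones. This is a design choice for the illustration, not documented gatenlp behaviour.

```python
from concurrent.futures import ThreadPoolExecutor

TEXT = "This is some text"

def annotate_tokens(start, end):
    """Hypothetical worker: record a Token annotation for each word in TEXT[start:end]."""
    changes = []
    offset = start
    for token in TEXT[start:end].split(" "):
        if token:
            changes.append({"command": "annotation:add", "set": "",
                            "start": offset, "end": offset + len(token),
                            "type": "Token", "features": {}})
        offset += len(token) + 1  # skip the token and the following space
    return changes

def merge(change_lists):
    """Hypothetical coordinator: concatenate worker changes, assigning unique ids."""
    merged = []
    next_id = 0
    for changes in change_lists:
        for change in changes:
            change = dict(change)  # do not mutate the worker's record
            if change["command"] == "annotation:add":
                change["id"] = next_id
                next_id += 1
            merged.append(change)
    return merged

# Two workers: one for "This is" (0-7), one for "some text" (8-17).
with ThreadPoolExecutor() as pool:
    results = list(pool.map(annotate_tokens, [0, 8], [7, 17]))

merged = merge(results)
for change in merged:
    print(change["start"], change["end"], change["id"])
```

`Executor.map` returns results in input order, so the merged list is deterministic even though the workers may finish in any order.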

Changes to a document are only recorded if a ChangeLog instance has been set for the document. This does not happen by default.
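This opt-in behaviour can be illustrated with a hypothetical `ToyDocument` class, which is not gatenlp's `Document`. Recording happens only while its `changelog` attribute holds a list. Changes made before a changelog is attached are applied to the document but never recorded.

```python
class ToyDocument:
    """Hypothetical stand-in for a document; not gatenlp's Document class."""

    def __init__(self, text):
        self.text = text
        self.features = {}
        self.changelog = None  # recording is off until a list is attached

    def set_feature(self, name, value):
        self.features[name] = value
        # Record the change only when a changelog has been attached.
        if self.changelog is not None:
            self.changelog.append({"command": "doc-feature:set",
                                   "feature": name, "value": value})

doc = ToyDocument("This is some text")
doc.set_feature("before", 1)   # applied, but not recorded: no changelog yet
doc.changelog = []
doc.set_feature("after", 2)    # applied and recorded
print(doc.changelog)
```

A changelog attached late therefore misses earlier edits. That is why the example attaches it right after creating the document.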

from gatenlp import Document, ChangeLog
from IPython.display import display
# Create two documents from the same text
text = "This is some text"
doc1 = Document(text)
doc2 = Document(text)

# Create a ChangeLog instance
chlog = ChangeLog()

print("Empty ChangeLog:", chlog)

# Make doc1 use a ChangeLog
doc1.changelog = chlog

Empty ChangeLog: ChangeLog([])
# Create a few annotations and features

doc1.features["feature1"] = "some value"
defset = doc1.annset()
ann1 = defset.add(0,4,"SomeType", features={"annfeature1": 1, "annfeature2": 2})
ann2 = defset.add(5,7,"SomeType")
ann3 = defset.add(8,12,"SomeType")
# to also record a removal, uncomment the next line; it is disabled here,
# so the ChangeLog shown below contains no removal
#defset.remove(ann2)
# add a feature to the first annotation
ann1.features["annfeature3"] = 3
# update a feature
ann1.features["annfeature1"] = 11

# show the ChangeLog
print(chlog)
ChangeLog([{'command': 'doc-feature:set', 'feature': 'feature1', 'value': 'some value'},{'command': 'annotations:add', 'set': ''},{'command': 'annotation:add', 'set': '', 'start': 0, 'end': 4, 'type': 'SomeType', 'features': {'annfeature1': 1, 'annfeature2': 2}, 'id': 0},{'command': 'annotation:add', 'set': '', 'start': 5, 'end': 7, 'type': 'SomeType', 'features': {}, 'id': 1},{'command': 'annotation:add', 'set': '', 'start': 8, 'end': 12, 'type': 'SomeType', 'features': {}, 'id': 2},{'command': 'ann-feature:set', 'type': 'annotation', 'set': '', 'id': 0, 'feature': 'annfeature3', 'value': 3},{'command': 'ann-feature:set', 'type': 'annotation', 'set': '', 'id': 0, 'feature': 'annfeature1', 'value': 11}])
# The changelog really just contains a list of dictionaries, each describing some action
# print the list of actions a bit more nicely
for action in chlog.changes:
    print(action)
{'command': 'doc-feature:set', 'feature': 'feature1', 'value': 'some value'}
{'command': 'annotations:add', 'set': ''}
{'command': 'annotation:add', 'set': '', 'start': 0, 'end': 4, 'type': 'SomeType', 'features': {'annfeature1': 1, 'annfeature2': 2}, 'id': 0}
{'command': 'annotation:add', 'set': '', 'start': 5, 'end': 7, 'type': 'SomeType', 'features': {}, 'id': 1}
{'command': 'annotation:add', 'set': '', 'start': 8, 'end': 12, 'type': 'SomeType', 'features': {}, 'id': 2}
{'command': 'ann-feature:set', 'type': 'annotation', 'set': '', 'id': 0, 'feature': 'annfeature3', 'value': 3}
{'command': 'ann-feature:set', 'type': 'annotation', 'set': '', 'id': 0, 'feature': 'annfeature1', 'value': 11}
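Since the changes are ordinary dictionaries, standard Python tools can inspect them. For example, you can count how often each command occurs, or list every feature update made to one annotation. The sketch copies the records printed above by hand so that it runs without gatenlp. Inside the notebook, `chlog.changes` could be used in place of the literal list.

```python
from collections import Counter

# Copy of the change dictionaries printed above.
changes = [
    {'command': 'doc-feature:set', 'feature': 'feature1', 'value': 'some value'},
    {'command': 'annotations:add', 'set': ''},
    {'command': 'annotation:add', 'set': '', 'start': 0, 'end': 4, 'type': 'SomeType',
     'features': {'annfeature1': 1, 'annfeature2': 2}, 'id': 0},
    {'command': 'annotation:add', 'set': '', 'start': 5, 'end': 7, 'type': 'SomeType',
     'features': {}, 'id': 1},
    {'command': 'annotation:add', 'set': '', 'start': 8, 'end': 12, 'type': 'SomeType',
     'features': {}, 'id': 2},
    {'command': 'ann-feature:set', 'type': 'annotation', 'set': '', 'id': 0,
     'feature': 'annfeature3', 'value': 3},
    {'command': 'ann-feature:set', 'type': 'annotation', 'set': '', 'id': 0,
     'feature': 'annfeature1', 'value': 11},
]

# How many times each kind of command was recorded
counts = Counter(change["command"] for change in changes)
print(counts)

# All feature updates made to annotation 0, in the order they were recorded
ann0_updates = [(c["feature"], c["value"]) for c in changes
                if c["command"] == "ann-feature:set" and c["id"] == 0]
print(ann0_updates)
```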
# Show doc1 and doc2 

print("doc1")
display(doc1)

print("doc2")
display(doc2)
doc1
doc2
# Apply the changelog to doc2 and show it: 
# the second document now has the same features and annotations as the first one
doc2.apply_changes(chlog)
display(doc2)
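To make the replay step concrete, here is a simplified, hypothetical `apply_changes` function that applies the recorded commands in order to a dict-based document. It is not gatenlp's `Document.apply_changes` implementation. It handles only the four commands that occur in this example and rejects any others, whereas a real ChangeLog can contain further kinds of changes.

```python
def apply_changes(doc, changes):
    """Replay change dicts onto a dict-based document (hypothetical, simplified)."""
    for change in changes:
        cmd = change["command"]
        if cmd == "doc-feature:set":
            doc["features"][change["feature"]] = change["value"]
        elif cmd == "annotations:add":
            doc["annsets"].setdefault(change["set"], {})
        elif cmd == "annotation:add":
            annset = doc["annsets"].setdefault(change["set"], {})
            annset[change["id"]] = {
                "start": change["start"], "end": change["end"],
                "type": change["type"],
                "features": dict(change["features"]),  # copy, keep the record intact
            }
        elif cmd == "ann-feature:set":
            ann = doc["annsets"][change["set"]][change["id"]]
            ann["features"][change["feature"]] = change["value"]
        else:
            raise ValueError(f"unsupported command in this sketch: {cmd}")
    return doc

# The changes recorded for doc1 in the example above
changes = [
    {'command': 'doc-feature:set', 'feature': 'feature1', 'value': 'some value'},
    {'command': 'annotations:add', 'set': ''},
    {'command': 'annotation:add', 'set': '', 'start': 0, 'end': 4, 'type': 'SomeType',
     'features': {'annfeature1': 1, 'annfeature2': 2}, 'id': 0},
    {'command': 'annotation:add', 'set': '', 'start': 5, 'end': 7, 'type': 'SomeType',
     'features': {}, 'id': 1},
    {'command': 'annotation:add', 'set': '', 'start': 8, 'end': 12, 'type': 'SomeType',
     'features': {}, 'id': 2},
    {'command': 'ann-feature:set', 'type': 'annotation', 'set': '', 'id': 0,
     'feature': 'annfeature3', 'value': 3},
    {'command': 'ann-feature:set', 'type': 'annotation', 'set': '', 'id': 0,
     'feature': 'annfeature1', 'value': 11},
]

doc2 = {"text": "This is some text", "features": {}, "annsets": {}}
apply_changes(doc2, changes)
print(doc2["features"])
print(doc2["annsets"][""][0])
```

Order matters: replaying the two `ann-feature:set` records after the `annotation:add` records is what leaves `annfeature1` at 11 rather than 1.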

Notebook last updated

import gatenlp
print("NB last updated with gatenlp version", gatenlp.__version__)
NB last updated with gatenlp version 1.0.8a1