# Documents & Annotations
The Documents & Annotations tab in the Project management page lets you view and manage the documents and annotations that belong to the project.
# Document & Annotation status
# Annotation status
Annotations can be in one of five states:
- Annotation is completed - The annotator has completed this annotation task.
- Annotation is rejected - The annotator has chosen not to annotate the document.
- Annotation is timed out - The annotation task was not completed within the time specified in the project's configuration. The task is freed and can be assigned to another annotator.
- Annotation is aborted - The annotation task was aborted for reasons other than timing out, such as when an annotator with a pending task is removed from a project.
- Annotation is pending - The annotator has started the annotation task but has not completed it.
# Document status
Each document also displays how many of its annotations are in each state:
- Completed - Number of completed annotations in the document.
- Rejected - Number of rejected annotations in the document.
- Timed out - Number of timed out annotations in the document.
- Aborted - Number of aborted annotations in the document.
- Pending - Number of pending annotations in the document.
# Importing documents
Documents can be imported using the Import button. The supported file types are:
- `.json` - The app expects a list of documents (each represented as a dictionary object), e.g. `[{"id": 1, "text": "Text1"}, ...]`.
- `.jsonl` - The app expects one document (represented as a dictionary object) per line.
- `.csv` - The file must have a header row. It will be internally converted to JSON format.
- `.zip` - Can contain any number of `.json`, `.jsonl` and `.csv` files inside.
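If you generate import files with a script, it can help to check them locally against the rules above before uploading. The following is a minimal Python sketch of those rules, not Teamware's own importer: `parse_documents` is a hypothetical helper, CSV values are left as flat strings (the dot-notation conversion described in the CSV guidance below is not applied), and files inside a zip are handled according to their extension.

```python
import csv
import io
import json
import zipfile


def parse_documents(filename: str, data: bytes) -> list:
    """Parse one import file into a list of document dicts (hypothetical helper, not Teamware code)."""
    name = filename.lower()
    if name.endswith(".json"):
        docs = json.loads(data)
        # A .json file must contain a list of document objects
        if not isinstance(docs, list):
            raise ValueError(f"{filename}: expected a JSON list of documents")
        return docs
    if name.endswith(".jsonl"):
        # One JSON object per line; blank lines are skipped
        return [json.loads(line) for line in data.decode("utf-8").splitlines() if line.strip()]
    if name.endswith(".csv"):
        # The header row provides the keys; every value is read as a string
        return [dict(row) for row in csv.DictReader(io.StringIO(data.decode("utf-8")))]
    if name.endswith(".zip"):
        docs = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.namelist():
                # Parse only the supported inner formats; skip folders and other files
                if member.lower().endswith((".json", ".jsonl", ".csv")):
                    docs.extend(parse_documents(member, archive.read(member)))
        return docs
    raise ValueError(f"{filename}: unsupported file type")
```

For example, `parse_documents("docs.jsonl", open("docs.jsonl", "rb").read())` raises an error on a malformed line before you ever reach the Import button.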
# Importing documents with pre-annotation
In the Project Configurations page, it is possible to set a field in which Teamware will look for pre-annotation. If the field is found inside the document, the annotation form will be pre-filled with the data provided in the document.

The format for pre-annotation is exactly the same as the annotation output. You can see an example of the generated annotation by filling out the form in the Annotation Preview and observing the values in the Annotation Output Preview.
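As an illustration, suppose the pre-annotation field has been set to `preanno` in the Project Configurations page and the form has a widget named `sentiment`. Both names are hypothetical, and the sketch assumes the annotation output is an object keyed by widget name. A JSON Lines file with one pre-annotated document could then be generated like this:

```python
import json

# Hypothetical: must match the pre-annotation field set in Project Configurations
PREANNOTATION_FIELD = "preanno"


def with_preannotation(doc: dict, annotation: dict, field: str = PREANNOTATION_FIELD) -> dict:
    """Return a copy of doc carrying annotation under the pre-annotation field."""
    result = dict(doc)
    # Assumed to have the same shape as the Annotation Output Preview, e.g. {"sentiment": "positive"}
    result[field] = annotation
    return result


def to_jsonl(docs: list) -> str:
    """Serialise documents as JSON Lines: one document object per line."""
    return "".join(json.dumps(doc) + "\n" for doc in docs)


docs = [
    with_preannotation({"id": 1, "text": "Great product"}, {"sentiment": "positive"}),
    # No pre-annotation field, so this form is not pre-filled
    {"id": 2, "text": "No suggestion for this one"},
]
print(to_jsonl(docs), end="")
```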
# Importing Training and Test documents
When importing documents for the training and testing phases, Teamware expects a field/column (called `gold` by default) that contains the correct annotation response for each label and, for training documents only, an explanation.

For example, if we're expecting a multi-choice label for sentiment classification with a widget named `sentiment` and choices of `positive`, `negative` and `neutral`:
```jsonc
[
  {
    "text": "What's my sentiment",
    "gold": {
      "sentiment": {
        "value": "positive", // For this document, the correct value is positive
        "explanation": "Because..." // Explanations are only given in the training phase and are optional in test documents
      }
    }
  }
]
```
In CSV:
text | gold.sentiment.value | gold.sentiment.explanation |
---|---|---|
What's my sentiment | positive | Because... |
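When generating many training or test documents, a small helper keeps the gold structure consistent. This Python sketch uses a hypothetical `gold_document` function; its `gold_field` parameter defaults to `gold` and must match the field name used by your project. It adds an `explanation` only for the widgets you supply one for, so the same helper serves both training documents and test documents without explanations.

```python
import json
from typing import Optional


def gold_document(text: str, values: dict, explanations: Optional[dict] = None,
                  gold_field: str = "gold") -> dict:
    """Build a document whose gold field holds the correct value for each widget.

    Hypothetical helper; gold_field must match the field name used by the project.
    """
    explanations = explanations or {}
    gold = {}
    for widget, value in values.items():
        entry = {"value": value}
        # Only add an explanation where one was supplied (optional for test documents)
        if widget in explanations:
            entry["explanation"] = explanations[widget]
        gold[widget] = entry
    return {"text": text, gold_field: gold}


training_doc = gold_document("What's my sentiment", {"sentiment": "positive"},
                             {"sentiment": "Because..."})
test_doc = gold_document("What's my sentiment", {"sentiment": "positive"})
print(json.dumps([training_doc, test_doc], indent=2))
```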
# Guidance on CSV column headings
It is recommended that:
- Spaces are not used in column headings; use a dash (`-`), underscore (`_`) or camel case (e.g. `fieldName`) instead.
- The dot/full stop (`.`) is only used to indicate hierarchical information, so don't use it if that's not what's intended. This feature is explained below.
Documents imported from CSV files are converted to JSON for use internally in Teamware, and are converted back from JSON when exporting to CSV. To allow a CSV to represent a hierarchical structure, a dot notation is used to indicate a sub-field.
In the following example, we can see that `gold` has a child field named `sentiment`, which in turn has child fields named `value` and `explanation`:
text | gold.sentiment.value | gold.sentiment.explanation |
---|---|---|
What's my sentiment | positive | Because... |
The above column headers will generate the following JSON:
```jsonc
[
  {
    "text": "What's my sentiment",
    "gold": {
      "sentiment": {
        "value": "positive", // For this document, the correct value is positive
        "explanation": "Because..." // Explanations are only given in the training phase and are optional in test documents
      }
    }
  }
]
```
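The dot-notation mapping can be sketched in a few lines of Python. This illustrates the rule described above rather than reproducing Teamware's converter: `unflatten` and `flatten` are hypothetical names, all CSV values stay strings, and the sketch assumes no header is both a leaf and a parent (for example `gold` alongside `gold.sentiment.value`).

```python
import csv
import io


def unflatten(row: dict) -> dict:
    """Turn dot-separated headers into nested objects (hypothetical helper)."""
    doc: dict = {}
    for header, value in row.items():
        *parents, leaf = header.split(".")
        node = doc
        for key in parents:
            # Create each intermediate object the first time it is seen
            node = node.setdefault(key, {})
        node[leaf] = value
    return doc


def flatten(doc: dict, prefix: str = "") -> dict:
    """Reverse direction: nested objects become dot-separated headers."""
    row = {}
    for key, value in doc.items():
        header = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, header))
        else:
            row[header] = value
    return row


csv_text = (
    "text,gold.sentiment.value,gold.sentiment.explanation\n"
    "What's my sentiment,positive,Because...\n"
)
docs = [unflatten(row) for row in csv.DictReader(io.StringIO(csv_text))]
```

The same logic shows why dots should be reserved for hierarchy: a header such as `version.2` would become `{"version": {"2": ...}}` rather than a single field.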
# Exporting documents
Documents and annotations can be exported using the Export button. A zip file is generated containing files with 500 documents each. The option to "anonymize annotators" controls whether individual annotators are identified by their numeric ID or by their actual username. Since usernames are often personally identifiable information (e.g. an email address), the anonymous mode is recommended if you intend to share the annotation data with third parties. Note that the anonymous IDs are consistent within a single installation of Teamware, so even in anonymous mode it is still possible to determine which documents were annotated by the same person, just not who that person was.
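The chunking and the anonymous naming can be pictured with the Python sketch below. It is not Teamware's exporter: the `export-NNNN.json` file names are made up, and `anonymous_name` assumes the `annotator` plus numeric ID pattern suggested by the `annotator105` example in the export format description. Because such a name depends only on the numeric ID, the same annotator gets the same name in every export from one installation, which is why anonymous mode still reveals which documents share an annotator.

```python
import io
import json
import zipfile

CHUNK_SIZE = 500  # documents per file, as described above


def anonymous_name(user_id: int) -> str:
    # Assumed format based on the "annotator105" example; stable for a given numeric ID
    return f"annotator{user_id}"


def export_zip(docs: list, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Write docs into an in-memory zip, chunk_size documents per JSON file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        for start in range(0, len(docs), chunk_size):
            chunk = docs[start:start + chunk_size]
            # Hypothetical file naming; Teamware's own file names may differ
            archive.writestr(f"export-{start // chunk_size:04d}.json", json.dumps(chunk))
    return buf.getvalue()
```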
You can choose how documents are exported:

- `.json` & `.jsonl` - JSON or JSON Lines files can be generated in one of two formats:
  - `raw` - Exports the original JSON combined with an additional field named `annotation_sets` for storing annotations. The annotations are laid out in the same way as the GATE bdocjs format. For example, if a document has been annotated by `user1` with labels and values `text`: `Annotation text`, `radio`: `val3`, and `checkbox`: `["val2", "val4"]`, the non-anonymous export might look like this:

    ```json
    {
      "id": 32, "text": "Document text", "text2": "Document text 2", "feature1": "Feature text",
      "annotation_sets": {
        "user1": {
          "name": "user1",
          "annotations": [
            { "type": "Document", "start": 0, "end": 10, "id": 0,
              "features": { "text": "Annotation text", "radio": "val3", "checkbox": ["val2", "val4"] } }
          ],
          "next_annid": 1
        }
      },
      "teamware_status": { "rejected_by": ["user2"], "timed_out": ["user3"], "aborted": [] }
    }
    ```

    In anonymous mode the name `user1` would instead be derived from the user's opaque numeric identifier (e.g. `annotator105`).

    The field `teamware_status` gives the usernames or anonymous IDs (depending on the "anonymize" setting) of those annotators who rejected the document, "timed out" because they did not complete their annotation in the time allowed by the project, or "aborted" for some other reason (e.g. they were removed from the project).
  - `gate` - Converts documents to GATE bdocjs format and exports them. A `name` field is added that takes its value from the ID field specified in the project configuration. Any top-level fields apart from `text`, `features`, `offset_type`, `annotation_sets`, and the ID field specified in the project config are placed in the `features` field, as is the `teamware_status` information. An `annotation_sets` field is added for storing annotations if it doesn't already exist.

    For example, in the case of this uploaded JSON document:

    ```json
    { "id": 32, "text": "Document text", "text2": "Document text 2", "feature1": "Feature text" }
    ```

    The generated output is as follows. The annotations and `teamware_status` are formatted the same as in the `raw` output above:

    ```json
    {
      "name": 32,
      "text": "Document text",
      "features": { "text2": "Document text 2", "feature1": "Feature text", "teamware_status": {...} },
      "offset_type": "p",
      "annotation_sets": {...}
    }
    ```
- `.csv` - The JSON documents will be flattened to CSV's column-based format. Annotations are added as additional columns with headers of the form `annotations.username.label`, and the status information is in columns named `teamware_status.rejected_by`, `teamware_status.timed_out` and `teamware_status.aborted`.
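The `gate` reshaping rules can be expressed as a short Python function. This is a sketch under stated assumptions, not Teamware's implementation: `raw_to_gate` is a hypothetical name, `id_field` stands in for the ID field from the project configuration, an existing `features` object is assumed to be kept and extended, and `offset_type` defaults to `"p"` only because that value appears in the example output above.

```python
from typing import Optional

# Top-level keys that stay at the top level in the gate format, per the rules above
RESERVED_KEYS = {"text", "features", "offset_type", "annotation_sets"}


def raw_to_gate(doc: dict, id_field: str = "id", status: Optional[dict] = None) -> dict:
    """Reshape an uploaded document into a bdocjs-style dict (hypothetical helper)."""
    # Assumption: an existing "features" object is kept and extended
    features = dict(doc.get("features", {}))
    for key, value in doc.items():
        if key not in RESERVED_KEYS and key != id_field:
            features[key] = value
    if status is not None:
        # teamware_status is stored alongside the other features
        features["teamware_status"] = status
    return {
        "name": doc.get(id_field),
        "text": doc.get("text", ""),
        "features": features,
        # Assumption: "p" is used because it appears in the example output above
        "offset_type": doc.get("offset_type", "p"),
        # Keep existing annotation sets, or start with none
        "annotation_sets": doc.get("annotation_sets", {}),
    }
```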
Note: Documents that contain existing annotations (i.e. the `annotation_sets` field for JSON or `annotations` for CSV) are merged with the new sets of annotations. Be aware that if the document has a new annotation from an annotator with the same username, the previous annotation will be overwritten. Existing annotations are also not anonymized when exporting the document.
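The merge behaviour in the note amounts to a key-by-key update of the annotation sets, keyed by annotator name. In this minimal Python sketch, `merge_annotation_sets` is a hypothetical helper and the example data is invented: a set from the same username replaces the old one wholesale, while other existing sets keep whatever name they were stored under, since they are not anonymized.

```python
def merge_annotation_sets(existing: dict, new: dict) -> dict:
    """Merge annotation sets keyed by annotator name (hypothetical helper).

    Sets in new replace sets in existing with the same name; other existing
    sets are kept unchanged, including their original (non-anonymized) names.
    """
    merged = dict(existing)  # shallow copy so the input is not modified
    merged.update(new)       # same username: the previous set is overwritten, not combined
    return merged


existing = {
    "user1": {"name": "user1", "annotations": [{"features": {"sentiment": "negative"}}]},
    "alice@example.com": {"name": "alice@example.com", "annotations": []},
}
new = {"user1": {"name": "user1", "annotations": [{"features": {"sentiment": "positive"}}]}}
merged = merge_annotation_sets(existing, new)
```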
# Deleting documents and annotations
To select a document or annotation, click on its top-left corner, then click the Delete button to delete the selected items.
TIP: Selecting a document also selects all its associated annotations.