# Project configuration
The Configuration tab in the Project management page allows you to change project settings including what annotations are captured.
Project configurations can be imported and exported in the format of a JSON file.
The project can be also be cloned (have configurations copied to a new project). Note that cloning does not copy documents, annotations or annotators to the new project.
# Configuration fields
Name - The name of this annotation project.
Description - The description of this annotation project that will be shown to annotators. Supports markdown and HTML.
Annotator guideline - The description of this annotation project that will be shown to annotators. Supports markdown and HTML.
Annotations per document - The project completes when each document in this annotation project have this many number of valid annotations. When a project completes, all project annotators will be un-recruited and be allowed to annotate other projects.
Maximum proportion of documents annotated per annotator (between 0 and 1) - A single annotator cannot annotate more than this proportion of documents.
Timeout for pending annotation tasks (minutes) - Specify the number of minutes a user has to complete an annotation task (i.e. annotating a single document).
Reject documents - Switching this off will mean that annotators for this project will be unable to choose to reject documents.
Document ID field - The field in your uploaded documents that is used as a unique identifier. GATE's json format uses the name field. You can use a dot limited key path to access subfields e.g. enter features.name to get the id from the object
{'features':{'name':'nameValue'}}
Training stage enable/disable - Enable or disable training stage, allows testing documents to be uploaded to the project.
Test stage enable/disable - Enable or disable testing stage, allows test documents to be uploaded to the project.
Auto elevate to annotator - The option works in combination with the training and test stage options, see table below for the behaviour:
Training stage Testing stage Auto elevate to annotator Desciption Disabled Disabled Enabled/Disabled User allowed to annotate without manual approval. Enabled Disabled Disabled Manual approval required. Disabled Enabled Disabled " Enabled Disabled Enabled User always allowed to annotate after training phase completed Disabled Enabled Enabled User automatically allowed to annotate after passing test, if user fails test they have to be manually approved. Enabled Enabled Enabled " Test pass proportion - The proportion of correct test annotations to be automatically allowed to annotate documents.
Gold standard field - The field in document's JSON/column that contains the ideal annotation values and explanation for the annotation.
Pre-annotation - Pre-fill the form with annotation provided in the specified field. See Importing Documents with pre-annotation section for more detail.
# Annotation configuration
The annotation configuration takes a json
string for configuring how the document is displayed to the user and types
of annotation will be collected. Here's an example configuration and a preview of how it is shown to annotators:
Within the configuration, it is possible to specify how your documents will be displayed. The Document input preview box can be used to provide a sample of your document for rendering of the preview.
// Example contents for the Document input preview
{
"text": "Sometext with <strong>html</strong>"
}
The above configuration displays the value from the text
field from the document to be annotated. It then shows a set
of 3 radio inputs that allows the user to select a Negative, Neutral, or Positive sentiment with the label
name sentiment
.
All fields require the properties name and type, it is used to name our label and determine the type of input/display to be shown to the user respectively.
Another field can be added to collect more information, e.g. a text field for opinions:
Note that for the above case, the optional
field is added ensure that allows user to not have to input any value.
This optional
field can be used on all components. Any component may optionally have a field named if
, containing an expression that is used to determine whether or not the component appears based on information in the document and/or the values entered in the other components. For example the user could be presented with a set of options that includes an "other" choice, and if the annotator chooses "other" then an additional free text field appears for them to fill in. The if
option is described in more detail under the conditional components section below.
Some fields are available to configure which are specific to components, e.g. the options
field are only available for
the radio
, checkbox
and selector
components. See details below on the usage of each specific component.
The captured annotation results in a JSON dictionary, an example can be seen in the Annotation output preview box. The annotation is linked to a Document and is converted to a GATE JSON annotation format when exported.
# Displaying text
The htmldisplay
widget allows you to display the text you want annotated. It accepts almost full range of HTML
input which gives full styling flexibility.
Any field/column from the document can be inserted by surrounding a field/column name with double or triple curly brackets. Double curly brackets renders text as-is and triple curly brackets accepts HTML string:
The widget makes no assumption about your document structure and any field/column names can be used,
even sub-fields by using the dot notation e.g. parentField.childField
:
If your documents are plain text and include line breaks that need to be preserved when rendering, this can be achieved by using a special HTML wrapper which sets the white-space
CSS property (opens new window).
white-space: pre-line
preserves line breaks but collapses other whitespace down to a single space, white-space: pre-wrap
would preserve all whitespace including indentation at the start of a line, but would still wrap lines that are too long for the available space.
# Text input
# Textarea input
# Radio input
# Checkbox input
# Richer labels for radios & checkboxes
The label
of radio and checkbox inputs is normally plain text, however both input types support an htmlLabel
property as an alternative to label
, which allows for HTML tags within the option label. The htmlLabel
is rendered within a <span></span>
inside the <label>
element for the option, so it should be limited to presentational tags such as <em>
, <b>
, <tt>
, or custom CSS directives via <span style='...'>
or <div style='...'>
. To allow for breaking up a long list of radio buttons or checkboxes (with "orientation": "vertical"
) into logical sections, a set of special CSS classes are available named tw-space-above-N
and tw-space-below-N
for numbers 1 to 5. Applying one of these classes to an element in the htmlLabel
(e.g. <span class='tw-space-below-4'>Label</span>
) will add a corresponding amount of space above or below that option in the annotation UI.
# Selector input
# Optional help text
Optionally, radio buttons and checkboxes can be given help text to provide additional per-choice context or information to help annotators.
# Alternative way to provide options for radio, checkbox and selector
A dictionary (key value pairs) and also be provided to the options
field of the radio, checkbox and selector widgets
but note that the ordering of the options are not guaranteed as javascript does not sort dictionaries by
the order in which keys are added. Note that additional help texts for radio buttons and checkboxes are not supported using this syntax.
# Dynamic options for radio, checkbox and selector
All the examples above have a "static" list of available options for the radio, checkbox and selector widgets, where the complete options list is enumerated in the project configuration and every document offers the same set of options. However it is also possible to take some or all of the options from the document data rather than the configuration data. For example:
"fromDocument"
is a dot-separated property path leading to the location within each document where the additional options can be found, for example "fromDocument":"candidates"
looks for a top-level property named candidates
in each document, "fromDocument": "options.custom"
would look for a property named options
which is itself an object with a property named custom
. The target property in the document may be in any of the following forms:
- an array of objects, each with
value
andlabel
(and optionallyhelptext
) properties, exactly as in the static configuration format - this is the format used in the example above - an array of strings, where the same string will be used as both the value and the label for that option
- an arbitrary "dictionary" object mapping values to labels
- a single string, which is parsed into a list of options
The "single string" alternative is designed to be easier to use when importing documents from CSV files. It allows you to provide any number of options in a single CSV column value. Within the column the options are separated by semicolons, and each option is of the form value=label
. Whitespace around the delimiters is ignored, both between options and between the value and label of a single option. For example given CSV document data of
text | options |
---|---|
Favourite fruit | apple=Apples; orange = Oranges; kiwi=Kiwi fruit |
a {"fromDocument": "options"}
configuration would produce the equivalent of
[
{"value": "apple", "label": "Apples"},
{"value": "orange", "label": "Oranges"},
{"value": "kiwi", "label": "Kiwi fruit"}
]
If your values or labels may need to contain the default separator characters ;
or =
you can select different separators by adding extra properties to the configuration:
{"fromDocument": "options", "separator": "~~", "valueLabelSeparator": "::"}
text | options |
---|---|
Favourite fruit | apple::Apples ~~ orange::Oranges ~~ kiwi::Kiwi fruit |
The separators can be more than one character, and you can set "valueLabelSeparator":""
to disable label splitting altogether and just use the value as its own label.
# Mixing static and dynamic options
Static and fromDocument
options may be freely interspersed in any order, so you can have a fully-dynamic set of options by specifying only a fromDocument
entry with no static options, or you can have static options that are listed first followed by dynamic options, or dynamic options first followed by static, etc.
# Conditional components
By default all components listed in the project configuration will be shown for all documents. However this is not always appropriate, for example you may have some components that are only relevant to certain documents, or only relevant for particular combinations of values in other components. To allow for these kinds of scenarios any component can have a field named if
specifying the conditions under which that component should be shown.
The if
field is an expression that is able to refer to fields in both the current document being annotated and the current state of the other annotation components. The expression language is largely based on a subset of the standard JavaScript expression syntax but with a few additional syntax elements to ease working with array data and regular expressions.
The following simple example shows how you might implement an "Other (please specify)" pattern, where the user can select from a list of choices but also has the option to supply their own answer if none of the choices are appropriate. The free text field is only shown if the user selects the "other" choice.
Note that validation rules (such as optional
, minSelected
or regex
) are not applied to components that are hidden by an if
expression - hidden components will never be included in the annotation output, even if they would be considered "required" had they been visible.
Components can also be made conditional on properties of the document, or a combination of the document and the annotation values, for example
The full list of supported constructions is as follows:
- the
annotation
variable refers to the current state of the annotation components for this document- the current value of a particular component can be accessed as
annotation.componentName
orannotation['component name']
- the brackets version will always work, the dot version works if the component'sname
is a valid JavaScript identifier - if a component has not been set since the form was last cleared the value may be
null
orundefined
- the expression should be written to cope with both - the value of a
text
,textarea
,radio
orselector
component will be a single string (or null/undefined), the value of acheckbox
component will be an array of strings since more than one value may be selected. If no value is selected the array may be null, undefined or empty, the expression must be prepared to handle any of these
- the current value of a particular component can be accessed as
- the
document
variable refers to the current document that is being annotated- again properties of the document can be accessed as
document.propertyName
ordocument['property name']
- continue the same pattern for nested properties e.g.
document.scores.label1
- individual elements of array properties can be accessed by zero-based index (e.g.
document.options[0]
)
- again properties of the document can be accessed as
- various comparison operators are available:
==
and!=
(equal and not-equal)<
,<=
,>=
,>
(less-than, less-or-equal, greater-or-equal, greater-than)- these operators follow JavaScript rules, which are not always intuitive. Generally if both arguments are strings then they will be compared by lexicographic order, but if either argument is a number then the other one will also be converted to a number before comparing. So if the
score
component is set to the value "10" (a string of two digits) thenannotation.score < 5
would be false (10 is converted to number and compared to 5) butannotation.score < '5'
would be true (the string "10" sorts before the string "5")
- these operators follow JavaScript rules, which are not always intuitive. Generally if both arguments are strings then they will be compared by lexicographic order, but if either argument is a number then the other one will also be converted to a number before comparing. So if the
in
checks for the presence of an item in an array or a key in an object- e.g.
'other' in annotation.someCheckbox
checks if theother
option has been ticked in a checkbox component (whose value is an array) - this is different from normal JavaScript rules, where
i in myArray
checks for the presence of an array index rather than an array item
- e.g.
- other operators
+
(concatenate strings, or add numbers)- if either argument is a string then both sides are converted to strings and concatenated together
- otherwise both sides are treated as numbers and added
-
,*
,/
,%
(subtraction, multiplication, division and remainder)&&
,||
(boolean AND and OR)!
(prefix boolean NOT, e.g.!annotation.selected
is true ifselected
is false/null/undefined and false otherwise)- conditional operator
expr ? valueIfTrue : valueIfFalse
(exactly as in JavaScript, first evaluates the testexpr
, then either thevalueIfTrue
orvalueIfFalse
depending on the outcome of the test)
value =~ /regex/
tests whether the given string value contains any matches for the given regular expression (opens new window)- use
^
and/or$
to anchor the match to the start and/or end of the value, for exampleannotation.example =~ /^a/i
checks whether theexample
annotation value starts with "a" or "A" (the/i
flag makes the expression case-insensitive) - since the project configuration is entered as JSON, any backslash characters within the regex must be doubled to escape them from the JSON parser, i.e.
"if": "annotation.option =~ /\\s/"
would check ifoption
contains any space characters (for which the regular expression literal is/\s/
)
- use
- Quantifier expressions let you check whether
any
orall
of the items in an array or key/value pairs in an object match a predicate expression. The general form isany(x in expr, predicate)
orall(x in expr, predicate)
, whereexpr
is an expression that resolves to an array or object value,x
is a new identifier, andpredicate
is the expression to test each item against. Thepredicate
expression can refer to thex
identifierany(option in annotation.someCheckbox, option > 3)
all(e in document.scores, e.value < 0.7)
(assumingscores
is an object mapping labels to scores, e.g.{"scores": {"positive": 0.5, "negative": 0.3}}
)- when testing a predicate against an object each entry has
.key
and.value
properties giving the key and value of the current entry
- when testing a predicate against an object each entry has
- on a null, undefined or empty array/object,
any
will return false (since there are no items that pass the test) andall
will return true (since there are no items that fail the test) - the predicate is optional -
any(arrayExpression)
resolves totrue
if any item in the array has a value that JavaScript considers to be "truthy", i.e. anything other than the number 0, the empty string, null or undefined. Soany(annotation.myCheckbox)
is a convenient way to check whether at least one option has been selected in acheckbox
component.
If the if
expression for a particular component is syntactically invalid (missing operands, mis-matched brackets, etc.) then the condition will be ignored and the component will always be displayed as though it did not have an if
expression at all. Conversely, if the expression is valid but an error occurs while evaluating it, this will be treated the same as if the expression returned false
, and the associated component will not be displayed. The behaviour is this way around as the most common reason for errors during evaluation is attempting to refer to annotation components that have not yet been filled in - if this is not appropriate in your use case you must account for the possibility within your expression. For example, suppose confidence
is a radio
or selector
component with values ranging from 1 to 5, then another component that declares
"if": "annotation.confidence && annotation.confidence < 4"`
will hide this component if confidence
is unset, displaying it only if confidence
is set to a value less than 4, whereas
"if": "!annotation.confidence || annotation.confidence < 4"
will hide this component only if confidence
is actually set to a value of 4 or greater - it will show this component if confidence
is unset. Either approach may be correct depending on your project's requirements.
To assist managers in authoring project configurations with if
conditions, the "preview" mode on the project configuration page will display details of any errors that occur when parsing the expressions, or when evaluating them against the Document input preview data. You are encouraged to test your expressions thoroughly against a variety of inputs to ensure they behave as intended, before opening your project to annotators.