IBM Natural Language Understanding (NLU) Annotator
The IbmNluAnnotator is an annotator that uses the IBM NLU service to annotate documents. For each document processed, the text is sent to an HTTP endpoint and analyzed there, and the information sent back is used to annotate the document.
import os
from gatenlp import Document
from gatenlp.processing.client.ibmnlu import IbmNluAnnotator
from ibm_watson.natural_language_understanding_v1 import Features
The IBM NLU service can return various kinds of information about a text; these are called “features” in the context of the IBM service. The IbmNluAnnotator lets you configure which features to return, either by using the Features class from the ibm-watson package or by specifying a comma-separated list of feature names for the which_features parameter.
Currently the following features can be selected and are converted to annotations or document features:
- concepts (document feature)
- emotion (document features)
- entities (annotations)
- keywords (document feature)
- sentiment (document features)
- categories (document feature)
- syntax (annotations)
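As a rough illustration of the comma-separated form, the sketch below checks a which_features string against the feature names listed above before it is handed to the annotator. The helper parse_features and the constant SUPPORTED_FEATURES are hypothetical names introduced here and are not part of gatenlp; the annotator may well perform its own validation.

```python
# Hypothetical helper, not part of gatenlp: validate a which_features string
# against the feature names listed in this notebook.
SUPPORTED_FEATURES = (
    "concepts", "emotion", "entities", "keywords",
    "sentiment", "categories", "syntax",
)


def parse_features(which_features):
    """Split a comma-separated feature string and reject unknown names."""
    # Tolerate spaces around the commas and skip empty entries.
    names = [name.strip() for name in which_features.split(",") if name.strip()]
    unknown = [name for name in names if name not in SUPPORTED_FEATURES]
    if unknown:
        raise ValueError(f"Unsupported feature(s): {', '.join(unknown)}")
    return names


print(parse_features("entities, syntax,keywords"))
```

Checking the names locally catches a misspelled feature before any request is sent to the service.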
If the lang parameter is specified, it sets the language used for processing; otherwise the language is auto-detected.
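One way to keep the language optional in calling code is to assemble the keyword arguments first and add lang only when a value is given. This is only a sketch: build_annotator_kwargs is a hypothetical helper, the URL and key are placeholders, and it assumes that simply omitting lang is what leaves the service to auto-detect the language.

```python
# Hypothetical helper, not part of gatenlp: collect constructor arguments,
# including lang only when the caller supplies one.
def build_annotator_kwargs(url, apikey, which_features, lang=None):
    kwargs = {"url": url, "apikey": apikey, "which_features": which_features}
    if lang is not None:
        # Assumption: leaving lang out lets the service auto-detect it.
        kwargs["lang"] = lang
    return kwargs


# Placeholder values for illustration only.
kwargs = build_annotator_kwargs(
    "https://example.invalid/nlu", "dummy-key", "entities,keywords", lang="en"
)
print(sorted(kwargs))
```

The resulting dictionary could then be passed on as IbmNluAnnotator(**kwargs).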
# get the API KEY and URL to use for the service
apikey = os.environ["NATURAL_LANGUAGE_UNDERSTANDING_APIKEY"]
url = os.environ["NATURAL_LANGUAGE_UNDERSTANDING_URL"]
text = """
This is just some example text. It mentions the name US President Barack Obama and the
companies Microsoft, IBM, Google as well as a few place names like Los Angeles and New York.
It has strange characters like 💩 and emojis like 😊 and 😂.
Here we mention Barack Obama again and here we mention Charlie Chaplin.
It mentions mathematics and economy and the terms cloud computing and gene therapy.
John Smith beat Harald Schmidt at the tennis tournament.
"""
doc = Document(text)
doc
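# remove any existing annotations from the default annotation set
doc.annset().clear()
# create the annotator, requesting all currently supported features
annt = IbmNluAnnotator(
    url=url,
    apikey=apikey,
    which_features="entities,syntax,concepts,categories,emotion,keywords,sentiment",
    debug=False
)
# send the document to the service and annotate it with the response
doc = annt(doc)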
doc
Notebook last updated
import gatenlp
print("NB last updated with gatenlp version", gatenlp.__version__)
NB last updated with gatenlp version 1.0.8a1