IBM Natural Language Understanding (NLU) Annotator

The IbmNluAnnotator uses the IBM Natural Language Understanding service to annotate documents. For each processed document, the text is sent to an HTTP endpoint and analyzed by the service. The information the service returns is then used to annotate the document.
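The round trip described above can be sketched offline. This is a minimal sketch, not the IbmNluAnnotator implementation. It assumes a response dictionary shaped loosely like an IBM NLU entities result: a list of entities, each with a `type` and with `mentions` that carry `[start, end]` character locations. It turns that response into plain `(start, end, type, features)` tuples that could then be added as annotations. The helper name `response_to_spans` and the sample response are hypothetical.

```python
from __future__ import annotations

from typing import Any


def response_to_spans(response: dict[str, Any]) -> list[tuple[int, int, str, dict[str, Any]]]:
    """Convert a (hypothetical) NLU-style entities response into span tuples."""
    spans = []
    for entity in response.get("entities", []):
        etype = entity.get("type", "Entity")
        # Assumption: each mention carries a [start, end] location into the submitted text.
        for mention in entity.get("mentions", []):
            start, end = mention["location"]
            features = {"text": mention.get("text"), "relevance": entity.get("relevance")}
            spans.append((start, end, etype, features))
    # Sort by offset so the spans come out in document order.
    return sorted(spans, key=lambda s: (s[0], s[1]))


text = "Barack Obama visited IBM."
# Hypothetical, heavily simplified response; real service output has more fields.
sample_response = {
    "entities": [
        {"type": "Company", "relevance": 0.7,
         "mentions": [{"text": "IBM", "location": [21, 24]}]},
        {"type": "Person", "relevance": 0.9,
         "mentions": [{"text": "Barack Obama", "location": [0, 12]}]},
    ]
}

for start, end, etype, feats in response_to_spans(sample_response):
    print(etype, repr(text[start:end]), feats)
```

With gatenlp, each tuple could be turned into an annotation with something like `doc.annset().add(start, end, etype, features)`.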

import os
from gatenlp import Document
from gatenlp.processing.client.ibmnlu import IbmNluAnnotator
# only needed when the features are selected with the Features class instead of a string
from ibm_watson.natural_language_understanding_v1 import Features

The IBM NLU service can return various kinds of information about a text. The IBM service calls these kinds of information “features”. The IbmNluAnnotator lets you choose which features to return in one of two ways. You can use the Features class from the ibm-watson package, or you can pass a comma-separated list of feature names as the which_features parameter.
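The sketch below illustrates the comma-separated form. It normalizes a `which_features` string into a set of names and rejects unknown ones. The set of allowed names is an assumption, taken from the feature list passed in this notebook's example call. The real annotator may accept or validate names differently, and `parse_which_features` is a hypothetical helper.

```python
from __future__ import annotations

# Assumption: names taken from the which_features string used in this notebook.
KNOWN_FEATURES = {
    "entities", "syntax", "concepts", "categories", "emotion", "keywords", "sentiment",
}


def parse_which_features(spec: str) -> set[str]:
    """Split a comma-separated feature list, ignoring whitespace and empty items."""
    names = {part.strip().lower() for part in spec.split(",") if part.strip()}
    unknown = names - KNOWN_FEATURES
    if unknown:
        raise ValueError(f"Unknown feature name(s): {sorted(unknown)}")
    return names


print(sorted(parse_which_features("entities, keywords,sentiment")))
```

In this design, stray spaces and empty items are tolerated, and a misspelled name fails early. Without that check, the name would simply be ignored.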

Currently the following features can be selected and are converted to annotations or document features:

If the lang parameter is specified, it sets the language used for processing. Otherwise, the language is auto-detected.
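Here is a minimal sketch of the behaviour described above. It assumes the request options are collected in a plain dictionary. The `language` key is set only when a language code is given. Otherwise it is left out, so the service detects the language itself. The helper `build_analyze_options` and this dictionary layout are hypothetical. They only mirror the fact that the IBM NLU `analyze` call accepts an optional `language` argument.

```python
from __future__ import annotations

from typing import Any, Optional


def build_analyze_options(text: str, lang: Optional[str] = None) -> dict[str, Any]:
    """Collect request options; leave out 'language' so the service auto-detects it."""
    options: dict[str, Any] = {"text": text}
    if lang:
        # e.g. "en" or "de": forces the processing language instead of auto-detection
        options["language"] = lang
    return options


print(build_analyze_options("Hello world"))
print(build_analyze_options("Hallo Welt", lang="de"))
```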

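# get the API key and URL for the service from environment variables
apikey = os.environ["NATURAL_LANGUAGE_UNDERSTANDING_APIKEY"]
url = os.environ["NATURAL_LANGUAGE_UNDERSTANDING_URL"]
# example text with entities, concepts, keywords, and some non-ASCII characters (emoji)
text = """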
This is just some example text. It mentions the name US President Barack Obama and the 
companies Microsoft, IBM, Google as well as a few place names like Los Angeles and New York.
It has strange characters like 💩 and emojis like 😊 and 😂.
Here we mention Barack Obama again and here we mention Charlie Chaplin.
It mentions mathematics and economy and the terms cloud computing and gene therapy.
John Smith beat Harald Schmidt at the tennis tournament.
"""
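doc = Document(text)
doc  # display the document in the notebook
# make sure the default annotation set is empty before annotating
doc.annset().clear()
# create the annotator and select the features to request from the service
annt = IbmNluAnnotator(
    url=url,
    apikey=apikey,
    which_features="entities,syntax,concepts,categories,emotion,keywords,sentiment",
    debug=False
)
# send the document to the service and add the returned information to it
doc = annt(doc)
doc  # display the annotated document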
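The example text deliberately contains characters outside the Basic Multilingual Plane (💩, 😊, 😂). Python strings, and therefore gatenlp offsets, count code points. A service built on a UTF-16 platform may instead report offsets in UTF-16 code units, where each such emoji counts as two. This notebook does not say whether the IBM service does this, or how the annotator handles it. Under that assumption, the sketch below shows how UTF-16 offsets could be mapped back to Python string indices. `utf16_to_py_offsets` is a hypothetical helper.

```python
def utf16_to_py_offsets(text: str) -> dict[int, int]:
    """Map each UTF-16 code-unit offset at a character boundary to a Python str index."""
    mapping = {}
    utf16_pos = 0
    for py_pos, ch in enumerate(text):
        mapping[utf16_pos] = py_pos
        # Characters above U+FFFF need two UTF-16 code units (a surrogate pair);
        # offsets pointing into the middle of such a pair are not in the mapping.
        utf16_pos += 2 if ord(ch) > 0xFFFF else 1
    mapping[utf16_pos] = len(text)  # offset of the end of the text
    return mapping


text = "Odd 💩 and 😊 here: IBM"
m = utf16_to_py_offsets(text)
# Suppose a service reported "IBM" at UTF-16 offsets [20, 23]:
print(text[m[20]:m[23]])
```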

Notebook last updated

import gatenlp
print("NB last updated with gatenlp version", gatenlp.__version__)
NB last updated with gatenlp version 1.0.8a1