Google NLP Annotator

The GoogleNlpAnnotator is an annotator that uses the Google Natural Language AI service to annotate documents: for each document, the text is sent to an HTTP endpoint, processed there, and the returned information is used to annotate the document.

import os
import json
from gatenlp import Document, Span
from gatenlp.processing.client.googlenlp import GoogleNlpAnnotator

The Google NLP service can return various kinds of information about a text; these are called “features” in the context of the Google service. The GoogleNlpAnnotator lets you choose which features to return by passing a comma-separated list of feature names as the which_features parameter.

Currently the following features can be selected and are converted to annotations or document features:

entities (annotations)
entity_sentiment (annotation features)
document_sentiment (document features)
syntax (annotations)
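Since which_features is a plain comma-separated string, a typo in a feature name is easy to make. The following is a minimal sketch of how the string could be assembled and checked before it is passed to the annotator. The helper build_which_features and the constant SUPPORTED_FEATURES are hypothetical names introduced for illustration and are not part of gatenlp; the set of names is simply the list above.

```python
# Feature names taken from the list above (hypothetical constant, not a gatenlp name).
SUPPORTED_FEATURES = ("entities", "entity_sentiment", "document_sentiment", "syntax")


def build_which_features(names):
    """Join feature names into a comma-separated string, rejecting unknown names.

    Hypothetical helper for illustration only; not part of gatenlp.
    """
    names = [name.strip() for name in names]
    unknown = [name for name in names if name not in SUPPORTED_FEATURES]
    if unknown:
        raise ValueError(f"Unknown feature name(s): {', '.join(unknown)}")
    return ",".join(names)


which_features = build_which_features(["entities", "syntax"])
print(which_features)  # prints: entities,syntax
```

The resulting string can then be passed as the which_features argument when creating the GoogleNlpAnnotator.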

If the lang parameter is specified, that language is used for processing; otherwise the language is auto-detected.

Currently, authentication has to be provided by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains the proper authentication settings.
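Because the credentials are picked up from the environment, it can help to check the variable before creating the annotator. The sketch below uses only the Python standard library. The function credentials_problem is a hypothetical helper, not part of gatenlp or the Google client library, and it only checks that the variable is set and points to an existing file, not that the credentials themselves are valid.

```python
import os


def credentials_problem(environ=None):
    """Return a description of what is wrong with the setting, or None if it looks usable.

    Hypothetical helper for illustration only: it does not validate the
    contents of the JSON file or contact Google.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}"
    return None


problem = credentials_problem()
if problem:
    print("Cannot authenticate:", problem)
```

In a notebook, the variable can also be set for the current session with os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "<path to your JSON file>" before the annotator is created.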

text = """
This is just some example text. It mentions the name US President Barack Obama and the 
companies Microsoft, IBM, Google as well as a few place names like Los Angeles and New York.
It has strange characters like 💩 and emojis like 😊 and 😂.
Here we mention Barack Obama again and here we mention Charlie Chaplin.
It mentions mathematics and economy and the terms cloud computing and gene therapy.
John Smith beat Harald Schmidt at the tennis tournament.
"""
doc = Document(text)
doc

doc.annset().clear()
annt = GoogleNlpAnnotator(
    which_features="entities,syntax,document_sentiment,entity_sentiment",
    debug=False
)
doc = annt(doc)
doc

Notebook last updated

import gatenlp
print("NB last updated with gatenlp version", gatenlp.__version__)

NB last updated with gatenlp version 1.0.8a1