Google NLP Annotator
The GoogleNlpAnnotator is an annotator that uses the Google Natural Language AI service to annotate documents. For each processed document, data is sent to an HTTP endpoint and processed there. The information sent back is then used to annotate the document.
import os
import json
from gatenlp import Document, Span
from gatenlp.processing.client.googlenlp import GoogleNlpAnnotator
The Google Natural Language service can return various kinds of information about a text; in the context of the Google service these are called “features”. The GoogleNlpAnnotator lets you configure which features to return by passing a comma-separated list of feature names as the which_features parameter.
Currently the following features can be selected and are converted to annotations or document features:
- entities (annotations)
- entity_sentiment (annotation features)
- document_sentiment (document features)
- syntax (annotations)
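To make the format of which_features concrete, here is a minimal sketch of how a comma-separated feature string breaks down into individual feature names, checked against the four features listed above. The helper parse_which_features and the constant SUPPORTED_FEATURES are hypothetical and not part of gatenlp; the annotator's own parsing may differ (for example in how it treats whitespace or unknown names).

```python
from typing import List

# Hypothetical constant: the feature names listed in this document.
SUPPORTED_FEATURES = {"entities", "entity_sentiment", "document_sentiment", "syntax"}


def parse_which_features(which_features: str) -> List[str]:
    """Split a comma-separated feature string and check each name (hypothetical helper)."""
    # Split on commas, strip surrounding whitespace, drop empty entries.
    names = [name.strip() for name in which_features.split(",") if name.strip()]
    unknown = [name for name in names if name not in SUPPORTED_FEATURES]
    if unknown:
        raise ValueError(f"Unsupported feature(s): {', '.join(unknown)}")
    return names


if __name__ == "__main__":
    print(parse_which_features("entities, syntax,document_sentiment"))
    # -> ['entities', 'syntax', 'document_sentiment']
```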
If the lang parameter is specified, it sets the language used for processing; otherwise, the language is auto-detected.
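One way to keep the "specified vs. auto-detected" choice explicit is to pass lang only when a language is actually known. The sketch below builds the keyword arguments for the annotator that way. The helper annotator_kwargs is hypothetical, and the assumption that lang takes a language code such as "en" is ours, not taken from this document.

```python
from typing import Any, Dict, Optional


def annotator_kwargs(which_features: str, lang: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for GoogleNlpAnnotator (hypothetical helper).

    lang is only included when given; leaving it out means the language
    is auto-detected by the service.
    """
    kwargs: Dict[str, Any] = {"which_features": which_features}
    if lang is not None:
        # Assumption: lang is a language code such as "en".
        kwargs["lang"] = lang
    return kwargs


if __name__ == "__main__":
    print(annotator_kwargs("entities,syntax"))
    print(annotator_kwargs("entities,syntax", lang="en"))
```

The result can then be unpacked into the constructor, e.g. GoogleNlpAnnotator(**annotator_kwargs("entities,syntax", lang="en")).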
Currently, authentication must be provided by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains the required authentication settings.
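Because a missing or misconfigured credentials file usually only surfaces as an error once the service is called, it can help to check the environment variable before creating the annotator. The following is a minimal sketch under that assumption: check_google_credentials is a hypothetical helper that only verifies that GOOGLE_APPLICATION_CREDENTIALS points to an existing file containing valid JSON. It does not check that the credentials are accepted by Google.

```python
import json
import os
from typing import Optional


def check_google_credentials(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> Optional[str]:
    """Return the credentials path if env_var points to a readable JSON file, else None.

    Hypothetical helper: it only checks that the file exists and parses as JSON.
    """
    path = os.environ.get(env_var)
    if not path or not os.path.isfile(path):
        return None
    try:
        with open(path, encoding="utf-8") as fh:
            json.load(fh)
    except (OSError, json.JSONDecodeError):
        return None
    return path


if __name__ == "__main__":
    found = check_google_credentials()
    print("credentials file:", found if found else "not set or not a valid JSON file")
```

The variable can also be set from within the notebook, e.g. os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/credentials.json" (a placeholder path), before the annotator is created. This should work because the Google client libraries look up the variable when they load credentials, but setting it in the shell before starting Python is the more conventional route.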
text = """
This is just some example text. It mentions the name US President Barack Obama and the
companies Microsoft, IBM, Google as well as a few place names like Los Angeles and New York.
It has strange characters like 💩 and emojis like 😊 and 😂.
Here we mention Barack Obama again and here we mention Charlie Chaplin.
It mentions mathematics and economy and the terms cloud computing and gene therapy.
John Smith beat Harald Schmidt at the tennis tournament.
"""
doc = Document(text)
doc
doc.annset().clear()
annt = GoogleNlpAnnotator(
which_features="entities,syntax,document_sentiment,entity_sentiment",
debug=False
)
doc = annt(doc)
doc
Notebook last updated
import gatenlp
print("NB last updated with gatenlp version", gatenlp.__version__)
NB last updated with gatenlp version 1.0.8a1