Stanza pipeline
If gatenlp has been installed with the stanza extra (pip install gatenlp[stanza] or pip install gatenlp[all]), you can run a Stanford Stanza pipeline on a document and get the result as gatenlp annotations.
from gatenlp import Document
from gatenlp.lib_stanza import AnnStanza
import stanza
print("Stanza version:", stanza.__version__)
Stanza version: 1.3.0
# To use the English pipeline with Stanza, the models have to be downloaded first
stanza.download('en')
2022-11-09 22:01:30,835|INFO|stanza|Downloading default packages for language: en (English)...
2022-11-09 22:01:36,778|INFO|stanza|File exists: /data/johann/stanza_resources/en/default.zip.
2022-11-09 22:01:44,940|INFO|stanza|Finished downloading models and saved to /data/johann/stanza_resources.
doc = Document.load("https://gatenlp.github.io/python-gatenlp/testdocument2.txt")
doc
Annotating the document using Stanza
To annotate one or more documents using Stanza, first create an AnnStanza annotator object and then run the document(s) through this annotator:
stanza_annotator = AnnStanza(lang="en")
2022-11-09 22:01:45,098|INFO|stanza|Loading these models for language: en (English):
============================
| Processor    | Package   |
----------------------------
| tokenize     | combined  |
| pos          | combined  |
| lemma        | combined  |
| depparse     | combined  |
| sentiment    | sstplus   |
| constituency | wsj       |
| ner          | ontonotes |
============================
2022-11-09 22:01:45,121|INFO|stanza|Use device: gpu
2022-11-09 22:01:45,121|INFO|stanza|Loading: tokenize
2022-11-09 22:01:59,883|INFO|stanza|Loading: pos
2022-11-09 22:02:00,295|INFO|stanza|Loading: lemma
2022-11-09 22:02:00,528|INFO|stanza|Loading: depparse
2022-11-09 22:02:03,370|INFO|stanza|Loading: sentiment
2022-11-09 22:02:03,943|INFO|stanza|Loading: constituency
2022-11-09 22:02:04,823|INFO|stanza|Loading: ner
2022-11-09 22:02:08,313|INFO|stanza|Done loading processors!
doc = stanza_annotator(doc)
doc
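To get a quick overview of what the annotator added, the annotation types can be tallied. The following is a minimal sketch and not part of the original notebook: count_annotation_types is a hypothetical helper, and it assumes only that iterating over a gatenlp annotation set yields annotations with a type attribute. The stand-in objects and their type names are illustrative data, not real Stanza output; they mimic that interface so the snippet runs without Stanza installed.

```python
from collections import Counter
from types import SimpleNamespace


def count_annotation_types(annotations):
    """Return a Counter mapping annotation type name -> number of annotations.

    Assumes every item exposes a `type` attribute, as gatenlp annotations do.
    """
    return Counter(ann.type for ann in annotations)


# Stand-in annotations (hypothetical data, NOT real Stanza output).
fake_annotations = [
    SimpleNamespace(type="Token", start=0, end=4),
    SimpleNamespace(type="Token", start=5, end=7),
    SimpleNamespace(type="Sentence", start=0, end=7),
]

counts = count_annotation_types(fake_annotations)
for type_name, n in sorted(counts.items()):
    print(f"{type_name}: {n}")

# With the annotated document from above, assuming the annotations were
# written to the default annotation set:
# counts = count_annotation_types(doc.annset())
```

Passing the annotation set rather than the document keeps the helper independent of where the annotator stored its results, so it should work the same way for a named set.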
Notebook last updated
import gatenlp
print("NB last updated with gatenlp version", gatenlp.__version__)
NB last updated with gatenlp version 1.0.8a1