Python GateNLP | python-gatenlp
Python GateNLP
A Python package for NLP similar to the Java GATE NLP framework
Python GateNLP is an NLP and text processing framework implemented in Python.
Python GateNLP represents documents and stand-off annotations in a way very similar to
the Java GATE framework: annotations describe arbitrary character ranges in the text, and each annotation can have an arbitrary number of features. Documents can have arbitrary features and an arbitrary number of named annotation sets, where each annotation set can have an arbitrary number of annotations which can overlap in any way. Python GateNLP documents can be exchanged with Java GATE by using the bdocjs/bdocym/bdocmp formats, which are supported in Java GATE via the Format Bdoc Plugin.
Unlike many other Python NLP tools, GateNLP does not require text to be split into tokens in one specific way: tokens can be represented by annotations in any way, and a document can have several different tokenizations simultaneously, if needed. Similarly, entities can be represented by annotations without restriction: they do not need to start or end at token boundaries and can overlap arbitrarily.
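The stand-off model described above can be illustrated with a small sketch in plain Python. This is not the GateNLP API: the `Ann` class, the `annsets` dictionary and the set names are hypothetical and only show how two tokenizations and an entity that ignores token boundaries can coexist over the same text.

```python
from dataclasses import dataclass, field


@dataclass
class Ann:
    """Hypothetical stand-off annotation: a typed character range plus features."""
    start: int
    end: int
    type: str
    features: dict = field(default_factory=dict)


text = "New York-based firm"

# Named annotation sets over the same text (the names are illustrative only).
annsets = {
    # Tokenization 1: split on whitespace only.
    "ws_tokens": [Ann(0, 3, "Token"), Ann(4, 14, "Token"), Ann(15, 19, "Token")],
    # Tokenization 2: also split at the hyphen.
    "fine_tokens": [Ann(0, 3, "Token"), Ann(4, 8, "Token"), Ann(8, 9, "Token"),
                    Ann(9, 14, "Token"), Ann(15, 19, "Token")],
    # An entity that ends in the middle of a whitespace token.
    "entities": [Ann(0, 8, "Location", {"source": "manual"})],
}


def covered_text(ann):
    """Return the text range an annotation points to; the text itself is never modified."""
    return text[ann.start:ann.end]


ws = [covered_text(a) for a in annsets["ws_tokens"]]
fine = [covered_text(a) for a in annsets["fine_tokens"]]
entity = annsets["entities"][0]

# The entity does not end at any boundary of the whitespace tokenization.
ws_ends = {a.end for a in annsets["ws_tokens"]}
entity_ends_on_ws_token = entity.end in ws_ends
```

Because annotations only store offsets, both tokenizations and the entity can be kept side by side without copying or altering the document text.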
GateNLP provides ways to process text and create annotations using annotating pipelines, which are sequences of one or more annotators.
There are gazetteer annotators for matching text against gazetteer lists and annotators for rule-like matching of complex annotation and text sequences (see PAMPAC).
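As a rough illustration of the idea of an annotating pipeline, the following plain-Python sketch runs a sequence of annotators over a document. It does not use GateNLP classes: the dictionary-based document, `tokenizer`, `make_gazetteer` and `run_pipeline` are hypothetical names chosen for this example, and the gazetteer is a naive string matcher.

```python
import re


def tokenizer(doc):
    """Annotator that adds a Token annotation for every run of word characters."""
    for m in re.finditer(r"\w+", doc["text"]):
        doc["anns"].append({"start": m.start(), "end": m.end(), "type": "Token"})
    return doc


def make_gazetteer(entries, anntype):
    """Build an annotator that marks every literal occurrence of a list entry."""
    def annotate(doc):
        for entry in entries:
            for m in re.finditer(re.escape(entry), doc["text"]):
                doc["anns"].append({"start": m.start(), "end": m.end(), "type": anntype})
        return doc
    return annotate


def run_pipeline(annotators, doc):
    """Apply each annotator in order; each one receives the result of the previous one."""
    for annotator in annotators:
        doc = annotator(doc)
    return doc


doc = {"text": "Alice flew from Berlin to New York.", "anns": []}
pipeline = [tokenizer, make_gazetteer(["Berlin", "New York"], "Location")]
doc = run_pipeline(pipeline, doc)

tokens = [a for a in doc["anns"] if a["type"] == "Token"]
locations = [doc["text"][a["start"]:a["end"]] for a in doc["anns"] if a["type"] == "Location"]
```

Note that the "New York" location spans two tokens; since annotations are independent character ranges, the gazetteer does not need to know how the text was tokenized.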
There is also support for creating GateNLP annotations with other NLP packages like Spacy or Stanford Stanza.
The GateNLP document representation can also optionally track all changes
made to a document in a “change log”.
Such changes can later be applied to other Python GateNLP documents or to Java GATE documents.
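The change log idea can be sketched in plain Python as follows. This is a conceptual illustration only, not GateNLP's ChangeLog implementation: `LoggedDoc`, `apply_changes` and the record layout with a `"command"` key are assumptions made for this example.

```python
class LoggedDoc:
    """Hypothetical document that records annotation changes in an optional log."""

    def __init__(self, text, changelog=None):
        self.text = text
        self.anns = {}              # annotation id -> (start, end, type)
        self.changelog = changelog  # a list that receives change records, or None
        self._next_id = 0

    def add_ann(self, start, end, anntype, annid=None):
        if annid is None:
            annid = self._next_id
        self._next_id = max(self._next_id, annid + 1)
        self.anns[annid] = (start, end, anntype)
        if self.changelog is not None:
            self.changelog.append({"command": "add", "id": annid,
                                   "start": start, "end": end, "type": anntype})
        return annid

    def remove_ann(self, annid):
        del self.anns[annid]
        if self.changelog is not None:
            self.changelog.append({"command": "remove", "id": annid})


def apply_changes(doc, changelog):
    """Replay recorded changes, in order, on another document with the same text."""
    for change in changelog:
        if change["command"] == "add":
            doc.add_ann(change["start"], change["end"], change["type"], annid=change["id"])
        elif change["command"] == "remove":
            doc.remove_ann(change["id"])


log = []
original = LoggedDoc("Some text", changelog=log)
first = original.add_ann(0, 4, "Token")
second = original.add_ann(5, 9, "Token")
original.remove_ann(first)

# A second copy of the document that never saw the edits directly.
replica = LoggedDoc("Some text")
apply_changes(replica, log)
```

Replaying the log leaves the replica with the same annotations as the original, which is the property that makes it possible to transfer only the changes rather than the whole document.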
This library also implements interaction with
a Java GATE process in two different ways:
The Java GATE Python plugin can invoke a process running Python GateNLP to annotate GATE documents.
Python code can remote-control a Java GATE instance via the GateNLP GateSlave.
Installation
Install GateNLP with all optional dependencies:
pip install -U gatenlp[all]
For more details, see Installation.
Overview of the documentation:
NOTE: most of the documentation pages below can be viewed as HTML or as a Jupyter notebook, and the notebook can be downloaded
and run on your own computer.
Installation
Getting Started / Getting Started Notebook / Notebook Download
The Document class and classes related to components of a document:

Annotation / Annotation Notebook / Notebook Download
AnnotationSet / AnnotationSet Notebook / Notebook Download
Document / Document Notebook / Notebook Download
The Changelog class for recording changes to a document

ChangeLogs / ChangeLogs Notebook / Notebook Download
A comparison with the Java GATE API
The module for running Python code from the GATE Python plugin

GateInteraction
The module for running Java GATE code from Python

GateSlave / GateSlave Notebook / Notebook Download
Modules for interaction with other NLP packages and converting their documents

lib_spacy / lib_spacy Notebook / Notebook Download for interacting with Spacy
lib_stanza / lib_stanza Notebook / Notebook Download for interacting with Stanza
lib_stanfordnlp for interacting with StanfordNLP
Connecting to annotation services on the web:

Client Annotators / Client Annotators Notebook / Notebook Download
Modules related to NLP processing:

Corpora / Corpora Notebook / Notebook Download
Processing / Processing Notebook / Notebook Download
Gazetteers / Gazetteers Notebook / Notebook Download
Complex Annotation Patterns for matching text and annotation sequences:

PAMPAC / PAMPAC Notebook / Notebook Download
PAMPAC Reference
The Generated Python Documentation
python-gatenlp is maintained by GateNLP.
This page was generated by GitHub Pages.