Package gatenlp
The following classes are imported into the gatenlp package by default: Span, Annotation, AnnotationSet, ChangeLog, Document as well as GateNlpPr and interact for the GATE Python plugin.
Where to find other important classes:
- corpora, document sources, document destinations: in gatenlp.corpora
- GateWorker, GateWorkerAnnotator in gatenlp.gateworker
- AnnSpacy in gatenlp.lib_spacy
- AnnStanza in gatenlp.lib_stanza
- TODO: include all the others!
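As a quick way to confirm which of the names listed above are really available at the top level of an installed package, a small check like the following may help. It is only a sketch: the helper `missing_exports` and the constant `DEFAULT_EXPORTS` are hypothetical names introduced here, not part of gatenlp, and the check uses only the standard library.

```python
import importlib
import importlib.util

# Names the package description lists as imported by default.
DEFAULT_EXPORTS = (
    "Span", "Annotation", "AnnotationSet", "ChangeLog", "Document",
    "GateNlpPr", "interact",
)


def missing_exports(package="gatenlp", names=DEFAULT_EXPORTS):
    """Return the listed names `package` does not expose, or None if it is not installed.

    Hypothetical helper for illustration only; not part of gatenlp.
    """
    if importlib.util.find_spec(package) is None:
        return None  # package is not installed, nothing to inspect
    module = importlib.import_module(package)
    return [name for name in names if not hasattr(module, name)]
```

With gatenlp installed, `missing_exports()` should return an empty list if the description above matches the installed version.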
"""
The following classes are imported into the gatenlp package by default: `gatenlp.span.Span`,
`gatenlp.annotation.Annotation`,
`gatenlp.annotation_set.AnnotationSet`, `gatenlp.changelog.ChangeLog`, `gatenlp.document.Document` as well
as `GateNlpPr` and `interact` for the GATE Python plugin.
Where to find other important classes:
* corpora, document sources, document destinations: in `gatenlp.corpora`
* `gatenlp.gateworker.gateworker.GateWorker`, `gatenlp.gateworker.gateworkerannotator.GateWorkerAnnotator`
in `gatenlp.gateworker`
* `gatenlp.lib_spacy.AnnSpacy` in `gatenlp.lib_spacy`
* `gatenlp.lib_stanza.AnnStanza` in `gatenlp.lib_stanza`
* TODO: include all the others!
"""
# NOTE: do not place a comment at the end of the version assignment
# line since we parse that line in a shell script!
# __version__ = "0.9.9"
from gatenlp.version import __version__
try:
    import sortedcontainers
except Exception:
    import sys

    print(
        "ERROR: required package sortedcontainers cannot be imported!", file=sys.stderr
    )
    print(
        "Please install it, using e.g. 'pip install -U sortedcontainers'",
        file=sys.stderr,
    )
    sys.exit(1)
# TODO: check version of sortedcontainers (we have 2.1.0)
from gatenlp.utils import init_logger
logger = init_logger("gatenlp")
from gatenlp.span import Span
from gatenlp.annotation import Annotation
from gatenlp.annotation_set import AnnotationSet
from gatenlp.changelog import ChangeLog
from gatenlp.document import Document
from gatenlp.gate_interaction import _pr_decorator as GateNlpPr
from gatenlp.gate_interaction import interact
# Importing GateWorker or other classes which depend on any package other than sortedcontainers will
# break the Python plugin!
# from gatenlp.gateworker import GateWorker, GateWorkerAnnotator
def init_notebook():  # pragma: no cover
    """
    Helper method to initialize a Jupyter or similar notebook.
    """
    from gatenlp.serialization.default_htmlannviewer import init_javascript
    from gatenlp.gatenlpconfig import gatenlpconfig

    init_javascript()
    gatenlpconfig.notebook_js_initialized = True
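The source above guards the import of sortedcontainers and leaves a TODO about checking its version. One possible way to combine both checks is sketched below using only the standard library. The function names `version_tuple` and `check_dependency` are hypothetical, and treating 2.1.0 as a minimum version is an assumption; the TODO only says that 2.1.0 is the version in use.

```python
import importlib
from importlib import metadata


def version_tuple(version):
    """Turn '2.1.0' into (2, 1, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_dependency(name, min_version=None):
    """Return None if `name` is importable (and new enough), else an error message.

    Hypothetical helper for illustration; not part of gatenlp.
    """
    try:
        importlib.import_module(name)
    except Exception:
        return f"required package {name} cannot be imported! Try 'pip install -U {name}'"
    if min_version is not None:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            return None  # importable but no distribution metadata; skip the version check
        if version_tuple(installed) < version_tuple(min_version):
            return f"{name} {installed} is older than the required {min_version}"
    return None
```

A caller could print the message returned by `check_dependency("sortedcontainers", "2.1.0")` to `sys.stderr` and exit, mirroring the behaviour of the original guard. Returning the message instead of exiting keeps the check easy to test.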
Sub-modules
gatenlp.annotation-
Module for Annotation class which represents information about a span of text in a document.
gatenlp.annotation_set-
Module for AnnotationSet class which represents a named collection of annotations which can arbitrarily overlap.
gatenlp.annotation_utils-
Module defining several utility functions for annotating documents in various ways.
gatenlp.changelog-
Module for ChangeLog class which represents a log of changes to any of the components of a Document: document features, annotations, annotation features.
gatenlp.changelog_consts-
Module for defining the constants used in the changelog module.
gatenlp.chunking-
Module for chunking-related methods and annotators.
gatenlp.corpora-
Module that defines base and implementation classes for representing document collections …
gatenlp.document-
Module that implements the Document class for representing gatenlp documents with features and annotation sets.
gatenlp.features-
Module that implements class Feature for representing features.
gatenlp.gate_interaction-
Support for interaction between a GATE (Java) process and a gatenlp (Python) process. This is used by the Java GATE Python plugin.
gatenlp.gatenlpconfig-
Module that provides the class GatenlpConfig and the instance gatenlpconfig which stores various global configuration options.
gatenlp.gateworker-
Module for interacting with a Java GATE process.
gatenlp.impl-
This subpackage contains modules for (temporary) implementations of data structures and algorithms needed. Some of these may get replaced by other …
gatenlp.lang-
Subpackage for future language-specific resources and annotators.
gatenlp.lib_spacy-
Support for using spaCy: convert from spaCy to gatenlp documents and annotations.
gatenlp.lib_stanza-
Support for using Stanford Stanza (see https://stanfordnlp.github.io/stanza/): convert from Stanford Stanza output to gatenlp documents and annotations.
gatenlp.offsetmapper-
Module that implements the OffsetMapper class for mapping between Java-style and Python-style string offsets. Java strings are represented as UTF16 …
gatenlp.pam-
Subpackage for modules related to pattern matching.
gatenlp.processing-
Package for annotators, and other things related to processing documents.
gatenlp.serialization-
Subpackage for modules related to serialization.
gatenlp.span-
Module for the Span class.
gatenlp.urlfileutils-
Module for functions that help reading binary and textual data from either URLs or local files.
gatenlp.utils-
Various utilities that could be useful in several modules.
gatenlp.version
gatenlp.visualization
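The gatenlp.offsetmapper entry above refers to mapping between Java-style and Python-style string offsets. The underlying issue is that Java counts UTF-16 code units, so a character outside the Basic Multilingual Plane takes two positions, while Python counts code points. The sketch below illustrates the idea with hypothetical function names; it is not the OffsetMapper API and makes no claim about how gatenlp implements the mapping.

```python
from bisect import bisect_right


def build_java_offsets(text):
    """Return a list where entry i is the UTF-16 offset of Python offset i.

    The list has len(text) + 1 entries so that end offsets can be mapped too.
    """
    offsets = [0]
    for ch in text:
        # Characters above U+FFFF need a surrogate pair, i.e. two UTF-16 code units.
        offsets.append(offsets[-1] + (2 if ord(ch) > 0xFFFF else 1))
    return offsets


def python_to_java(offsets, py_offset):
    """Map a Python (code point) offset to a Java (UTF-16 code unit) offset."""
    return offsets[py_offset]


def java_to_python(offsets, java_offset):
    """Map a Java offset to a Python offset.

    An offset that points into the middle of a surrogate pair is mapped to the
    character that pair belongs to.
    """
    return bisect_right(offsets, java_offset) - 1
```

For a text such as `"a😀b"`, the emoji occupies one Python position but two Java positions, so every offset after it differs by one between the two conventions.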
Functions
def init_notebook()-
Helper method to initialize a Jupyter or similar notebook.
def init_notebook():  # pragma: no cover
    """
    Helper method to initialize a Jupyter or similar notebook.
    """
    from gatenlp.serialization.default_htmlannviewer import init_javascript
    from gatenlp.gatenlpconfig import gatenlpconfig

    init_javascript()
    gatenlpconfig.notebook_js_initialized = True
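init_notebook places its imports inside the function body, so the viewer module is loaded only when the function is called and not when the package itself is imported. The sketch below shows that deferred-import pattern in isolation. Here `config` is a hypothetical stand-in for gatenlpconfig, `colorsys` stands in for an optional dependency, and the early return when the flag is already set is an addition that the original function does not have.

```python
import sys
from types import SimpleNamespace

# Hypothetical stand-in for gatenlpconfig; only the flag name is taken from the source.
config = SimpleNamespace(notebook_js_initialized=False)


def init_viewer():
    """Load the viewer dependency on first use and record that setup has run."""
    if config.notebook_js_initialized:
        return False  # already initialized, nothing to do (guard added for this sketch)
    # Deferred import: the module is loaded here, not at package import time.
    import colorsys  # noqa: F401  (stand-in for an optional, heavier dependency)

    config.notebook_js_initialized = True
    return True
```

Keeping such imports local means that code which only needs the core classes does not pay for, or fail on, dependencies it never uses.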