Package gatenlp
The following classes are imported into the gatenlp package by default: Span, Annotation, AnnotationSet, ChangeLog, Document, as well as GateNlpPr and interact for the GATE Python plugin.
Where to find other important classes:
- corpora, document sources, document destinations: in gatenlp.corpora
- GateWorker, GateWorkerAnnotator in gatenlp.gateworker
- AnnSpacy in gatenlp.lib_spacy
- AnnStanza in gatenlp.lib_stanza
- TODO: include all the others!
"""
The following classes are imported into the gatenlp package by default: `gatenlp.span.Span`,
`gatenlp.annotation.Annotation`,
`gatenlp.annotation_set.AnnotationSet`, `gatenlp.changelog.ChangeLog`, `gatenlp.document.Document` as well
as `GateNlpPr` and `interact` for the GATE Python plugin.
Where to find other important classes:
* corpora, document sources, document destinations: in `gatenlp.corpora`
* `gatenlp.gateworker.gateworker.GateWorker`, `gatenlp.gateworker.gateworkerannotator.GateWorkerAnnotator`
in `gatenlp.gateworker`
* `gatenlp.lib_spacy.AnnSpacy` in `gatenlp.lib_spacy`
* `gatenlp.lib_stanza.AnnStanza` in `gatenlp.lib_stanza`
* TODO: include all the others!
"""
# NOTE: do not place a comment at the end of the version assignment
# line since we parse that line in a shell script!
# __version__ = "0.9.9"
from gatenlp.version import __version__
try:
    import sortedcontainers
except Exception:
    import sys

    print(
        "ERROR: required package sortedcontainers cannot be imported!", file=sys.stderr
    )
    print(
        "Please install it, using e.g. 'pip install -U sortedcontainers'",
        file=sys.stderr,
    )
    sys.exit(1)
# TODO: check version of sortedcontainers (we have 2.1.0)
from gatenlp.utils import init_logger
logger = init_logger("gatenlp")
from gatenlp.span import Span
from gatenlp.annotation import Annotation
from gatenlp.annotation_set import AnnotationSet
from gatenlp.changelog import ChangeLog
from gatenlp.document import Document
from gatenlp.gate_interaction import _pr_decorator as GateNlpPr
from gatenlp.gate_interaction import interact
# Importing GateWorker or other classes which depend on any package other than sortedcontainers will
# break the Python plugin!
# from gatenlp.gateworker import GateWorker, GateWorkerAnnotator
def init_notebook():  # pragma: no cover
    """
    Helper method to initialize a Jupyter or similar notebook.
    """
    from gatenlp.serialization.default_htmlannviewer import init_javascript
    from gatenlp.gatenlpconfig import gatenlpconfig

    init_javascript()
    gatenlpconfig.notebook_js_initialized = True
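The import guard for sortedcontainers in the source above can be sketched as a small reusable helper. This is an illustration only and not part of gatenlp: the names `require_package` and `installed_version` are hypothetical, the helper returns a boolean instead of calling `sys.exit(1)`, and it catches only `ImportError` rather than `Exception`. The version lookup shows one possible way to approach the TODO about checking the installed sortedcontainers version.

```python
import importlib
import sys
from importlib import metadata
from typing import Optional


def require_package(name: str) -> bool:
    """Return True if `name` can be imported; otherwise report on stderr."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Same two-line message style as the module source, but no sys.exit().
        print(f"ERROR: required package {name} cannot be imported!", file=sys.stderr)
        print(f"Please install it, using e.g. 'pip install -U {name}'", file=sys.stderr)
        return False
    return True


def installed_version(name: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# "json" ships with Python, so this check passes.
json_ok = require_package("json")
# A deliberately nonexistent name, to show the failure path.
missing_ok = require_package("no_such_package_xyz_123")
```

A caller that wants the original behavior could exit with status 1 whenever `require_package` returns False.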
Sub-modules
gatenlp.annotation
-
Module for Annotation class which represents information about a span of text in a document.
gatenlp.annotation_set
-
Module for AnnotationSet class which represents a named collection of annotations which can arbitrarily overlap.
gatenlp.annotation_utils
-
Module defining several utility functions for annotating documents in various ways.
gatenlp.changelog
-
Module for ChangeLog class which represents a log of changes to any of the components of a Document: document features, annotations, annotation features.
gatenlp.changelog_consts
-
Module defining the constants used in the changelog module.
gatenlp.chunking
-
Module for chunking-related methods and annotators.
gatenlp.corpora
-
Module that defines base and implementation classes for representing document collections …
gatenlp.document
-
Module that implements the Document class for representing gatenlp documents with features and annotation sets.
gatenlp.features
-
Module that implements class Feature for representing features.
gatenlp.gate_interaction
-
Support for interaction between a GATE (Java) process and a gatenlp (Python) process. This is used by the Java GATE Python plugin.
gatenlp.gatenlpconfig
-
Module that provides the class GatenlpConfig and the instance gatenlpconfig which stores various global configuration options.
gatenlp.gateworker
-
Module for interacting with a Java GATE process.
gatenlp.impl
-
This subpackage contains modules for (temporary) implementations of data structures and algorithms needed. Some of these may get replaced by other …
gatenlp.lang
-
Subpackage for future language-specific resources and annotators.
gatenlp.lib_spacy
-
Support for using spaCy: convert from spaCy to gatenlp documents and annotations.
gatenlp.lib_stanza
-
Support for using Stanford Stanza (see https://stanfordnlp.github.io/stanza/): convert from Stanford Stanza output to gatenlp documents and annotations.
gatenlp.offsetmapper
-
Module that implements the OffsetMapper class for mapping between Java-style and Python-style string offsets. Java strings are represented as UTF-16 …
gatenlp.pam
-
Subpackage for modules related to pattern matching.
gatenlp.processing
-
Package for annotators, and other things related to processing documents.
gatenlp.serialization
-
Subpackage for modules related to serialization.
gatenlp.span
-
Module for the Span class.
gatenlp.urlfileutils
-
Module for functions that help reading binary and textual data from either URLs or local files.
gatenlp.utils
-
Various utilities that could be useful in several modules.
gatenlp.version
gatenlp.visualization
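The offset difference that gatenlp.offsetmapper deals with can be illustrated with the standard library alone. The sketch below is not the OffsetMapper API: `python_to_java_offsets` is a hypothetical helper. It assumes that Java offsets count UTF-16 code units while Python offsets count Unicode code points, so any character outside the Basic Multilingual Plane shifts all later Java offsets by one.

```python
from typing import List


def python_to_java_offsets(text: str) -> List[int]:
    """For each Python offset 0..len(text), return the matching UTF-16 offset."""
    offsets = [0]
    for ch in text:
        # Code points above U+FFFF need a surrogate pair: two UTF-16 code units.
        offsets.append(offsets[-1] + (2 if ord(ch) > 0xFFFF else 1))
    return offsets


# "a", an emoji outside the Basic Multilingual Plane, then "b".
text = "a\U0001F600b"
mapping = python_to_java_offsets(text)

# Cross-check: number of UTF-16 code units in the whole string.
utf16_length = len(text.encode("utf-16-le")) // 2
```

For this three-character string the mapping is `[0, 1, 3, 4]`: the Python offset 2 (start of "b") corresponds to the Java offset 3, and the final entry equals the UTF-16 length of the string.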
Functions
def init_notebook()
-
Helper method to initialize a Jupyter or similar notebook.