{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Documents" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "from gatenlp import Document\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Document(This is a test document.\n", "\n", "It contains just a few sentences. \n", "Here is a sentence that mentions a few named entities like \n", "the persons Barack Obama or Ursula von der Leyen, locations\n", "like New York City, Vienna or Beijing or companies like \n", "Google, UniCredit or Huawei. \n", "\n", "Here we include a URL https://gatenlp.github.io/python-gatenlp/ \n", "and a fake email address john.doe@hiscoolserver.com as well \n", "as #some #cool #hastags and a bunch of emojis like 😽 (a kissing cat),\n", "👩‍🏫 (a woman teacher), 🧬 (DNA), \n", "🧗 (a person climbing), \n", "💩 (a pile of poo). \n", "\n", "Here we test a few different scripts, e.g. Hangul 한글 or \n", "simplified Hanzi 汉字 or Farsi فارسی which goes from right to left. \n", "\n", "\n", ",features=Features({}),anns=[])\n" ] } ], "source": [ "# To load a document from a local file, for example one named \"test2a.bdocjs\", simply use:\n", "# doc = Document.load(\"test2a.bdocjs\")\n", "\n", "# It is also possible to load a document directly from a URL. For this notebook, we use\n", "# an example document that gets loaded from a URL:\n", "doc = Document.load(\"https://gatenlp.github.io/python-gatenlp/testdocument1.txt\")\n", "\n", "# We can inspect the document by printing it:\n", "print(doc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Printing the document shows the document text and indicates that there are no document features and no \n", "annotations, which is expected because the document was just loaded from a plain text file. 
\n", "\n", "In a Jupyter notebook, a `gatenlp` document can also be visualized graphically, either by making the document \n", "the last value of a cell or by calling the IPython `display` function:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "