public abstract class DocumentFormat extends AbstractLanguageResource
Modifier and Type | Field and Description |
---|---|
protected Map<String,String> | element2StringMap: This map is used inside the unpackMarkup() method... |
protected static Map<String,MimeType> | magic2mimeTypeMap: Map of magic numbers to MimeType. |
protected Map<String,String> | markupElementsMap: Map of markup elements to annotation types. |
protected static Map<String,DocumentFormat> | mimeString2ClassHandlerMap: Map of MIME type strings to the DocumentFormat that handles them. |
protected static Map<String,MimeType> | mimeString2mimeTypeMap: Map of MIME type strings to MimeType objects. |
protected static Map<String,MimeType> | suffixes2mimeTypeMap: Map of file suffixes to MimeType. |
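Fields inherited from class AbstractLanguageResource: dataStore, lrPersistentId
Fields inherited from class AbstractResource: name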
Constructor and Description |
---|
DocumentFormat(): Default constructor. |
Modifier and Type | Method and Description |
---|---|
void | addStatusListener(StatusListener l) |
protected static boolean | areEqual(MimeType aMimeType, MimeType anotherMimeType): Tests whether two MimeType objects are equal. |
protected static MimeType | decideBetweenThreeMimeTypes(MimeType aMimeTypeFromWebServer, MimeType aMimeTypeFromFileSuffix, MimeType aMimeTypeFromMagicNumbers): Decides which MIME type is in the majority. |
protected static MimeType | decideBetweenTwoMimeTypes(MimeType aMimeType, MimeType anotherMimeType): Decides between two MIME types. |
protected void | fireStatusChanged(String e) |
static DocumentFormat | getDocumentFormat(Document aGateDocument, MimeType mimeType): Finds a DocumentFormat implementation that deals with a particular MIME type, given that type. |
static DocumentFormat | getDocumentFormat(Document aGateDocument, String fileSuffix): Finds a DocumentFormat implementation that deals with a particular MIME type, given a file suffix. |
static DocumentFormat | getDocumentFormat(Document aGateDocument, URL url): Finds a DocumentFormat implementation that deals with a particular MIME type, given the URL of the Document. |
static DocumentFormat | getDocumentFormat(MimeType mimeType): Finds the DocumentFormat implementation that deals with the given MIME type. |
Map<String,String> | getElement2StringMap(): Gets the element-to-string map. |
FeatureMap | getFeatures(): Gets the feature set. |
Map<String,String> | getMarkupElementsMap(): Gets the markup elements map. |
MimeType | getMimeType(): Gets the MIME type. |
static MimeType | getMimeTypeForString(String typeString): Utility method to get a MimeType given the type string. |
Boolean | getShouldCollectRepositioning() |
static Set<String> | getSupportedFileSuffixes(): Utility method to get the set of all file suffixes registered with this class. |
static Set<String> | getSupportedMimeTypes() |
protected static MimeType | guessTypeUsingMagicNumbers(InputStream aInputStream, String anEncoding): Tries to guess the MIME type using magic numbers. |
void | removeStatusListener(StatusListener l) |
protected static MimeType | runMagicNumbers(Reader aReader): Runs the magic-number checks over the content read from aReader. |
void | setElement2StringMap(Map<String,String> anElement2StringMap): Sets the element-to-string map. |
void | setFeatures(FeatureMap features): Sets the features map. |
void | setMarkupElementsMap(Map<String,String> markupElementsMap): Sets the markup elements map. |
void | setMimeType(MimeType aMimeType): Sets the MIME type. |
void | setShouldCollectRepositioning(Boolean b) |
Boolean | supportsRepositioning(): Returns true if the document format can collect repositioning information during the unpack phase. |
abstract void | unpackMarkup(Document doc): Unpacks the markup in the document. |
abstract void | unpackMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo) |
void | unpackMarkup(Document doc, String originalContentFeatureType): Unpacks the markup in the document. |
static boolean | willReadFromUrl(String mimeTypeStr, URL docUrl): Utility function to determine whether reading from the URL will be done by a registered DocumentFormat. |
Methods inherited from class AbstractLanguageResource: cleanup, getDataStore, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync
Methods inherited from class AbstractResource: checkParameterValues, flushBeanInfoCache, forgetBeanInfo, getBeanInfo, getInitParameterValues, getInitParameterValues, getName, getParameterValue, getParameterValue, getParameterValues, init, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners, toString
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface Resource: getParameterValue, init, setParameterValue, setParameterValues
Methods inherited from interface NameBearer: getName, setName
protected static final Map<String,DocumentFormat> mimeString2ClassHandlerMap
protected static final Map<String,MimeType> mimeString2mimeTypeMap
protected static final Map<String,MimeType> suffixes2mimeTypeMap
protected static final Map<String,MimeType> magic2mimeTypeMap
protected Map<String,String> markupElementsMap
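The four static maps above appear to act as a lookup registry: a file suffix or a magic number resolves to a MIME type, and the MIME type string resolves to the DocumentFormat that handles it. The sketch below imitates that two-step lookup with plain strings in place of GATE's MimeType and DocumentFormat objects; the class FormatRegistry, its methods, and the handler label are hypothetical, not part of GATE.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the static registry maps; not GATE's API.
// Plain strings replace MimeType and DocumentFormat objects.
class FormatRegistry {
    // Plays the role of suffixes2mimeTypeMap: file suffix -> MIME type string.
    private final Map<String, String> suffixes2mimeType = new HashMap<>();
    // Plays the role of mimeString2ClassHandlerMap: MIME type string -> handler.
    private final Map<String, String> mimeString2Handler = new HashMap<>();

    // Registers one handler for a MIME type and maps each suffix to that type.
    void register(String mimeType, String handlerName, String... suffixes) {
        mimeString2Handler.put(mimeType, handlerName);
        for (String suffix : suffixes) {
            suffixes2mimeType.put(suffix.toLowerCase(Locale.ROOT), mimeType);
        }
    }

    // Two-step lookup: suffix -> MIME type, then MIME type -> handler.
    // Empty when either step finds nothing.
    Optional<String> handlerForSuffix(String suffix) {
        String mimeType = suffixes2mimeType.get(suffix.toLowerCase(Locale.ROOT));
        return Optional.ofNullable(mimeType).map(mimeString2Handler::get);
    }
}
```

For example, after `register("text/html", "HtmlFormat", "html", "htm")`, `handlerForSuffix("HTM")` yields the `HtmlFormat` label while `handlerForSuffix("pdf")` is empty. Keeping suffixes and handlers in separate maps lets several suffixes share one MIME type, and hence one handler.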
public Boolean supportsRepositioning()
public void setShouldCollectRepositioning(Boolean b)
public Boolean getShouldCollectRepositioning()
public abstract void unpackMarkup(Document doc) throws DocumentFormatException
Throws: DocumentFormatException
public abstract void unpackMarkup(Document doc, RepositioningInfo repInfo, RepositioningInfo ampCodingInfo) throws DocumentFormatException
Throws: DocumentFormatException
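DocumentFormat exposes two abstract unpackMarkup overloads together with a repositioning flag. One plausible way a caller could choose between them is sketched below, assuming the three-argument form is used only when supportsRepositioning() returns true and collection has been switched on with setShouldCollectRepositioning. Everything here is a hypothetical string-based analogue: OffsetLog stands in for RepositioningInfo, the methods return the extracted text instead of modifying a Document, and TagStrippingFormat is a toy format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of RepositioningInfo: (input offset, output offset) pairs.
class OffsetLog {
    final List<long[]> pairs = new ArrayList<>();
}

// Hypothetical, string-based analogue of DocumentFormat's repositioning API.
// Unlike GATE's void unpackMarkup methods, these return the extracted text.
abstract class RepositioningFormat {
    private Boolean shouldCollectRepositioning = Boolean.FALSE;

    // Formats that can record offsets while unpacking override this.
    Boolean supportsRepositioning() { return Boolean.FALSE; }

    void setShouldCollectRepositioning(Boolean b) { shouldCollectRepositioning = b; }
    Boolean getShouldCollectRepositioning() { return shouldCollectRepositioning; }

    abstract String unpackMarkup(String doc);
    abstract String unpackMarkup(String doc, OffsetLog repInfo, OffsetLog ampCodingInfo);

    // Assumed caller-side choice: use the three-argument overload only when the
    // format supports repositioning and collection has been requested.
    String unpack(String doc, OffsetLog repInfo, OffsetLog ampCodingInfo) {
        if (Boolean.TRUE.equals(supportsRepositioning())
                && Boolean.TRUE.equals(getShouldCollectRepositioning())) {
            return unpackMarkup(doc, repInfo, ampCodingInfo);
        }
        return unpackMarkup(doc);
    }
}

// Toy format that strips <...> tags.
class TagStrippingFormat extends RepositioningFormat {
    @Override Boolean supportsRepositioning() { return Boolean.TRUE; }

    @Override String unpackMarkup(String doc) {
        return doc.replaceAll("<[^>]*>", "");
    }

    // Records, for every kept character, its offset in the input and in the output.
    // ampCodingInfo is accepted but unused, since this toy decodes no entities.
    @Override String unpackMarkup(String doc, OffsetLog repInfo, OffsetLog ampCodingInfo) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        for (int i = 0; i < doc.length(); i++) {
            char c = doc.charAt(i);
            if (c == '<') { inTag = true; continue; }
            if (c == '>' && inTag) { inTag = false; continue; }
            if (!inTag) {
                repInfo.pairs.add(new long[] {i, out.length()});
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With collection off, `unpack("<b>hi</b>", ...)` takes the one-argument path and leaves the offset log empty; after `setShouldCollectRepositioning(Boolean.TRUE)` the same call records that output offset 0 came from input offset 3.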
public void unpackMarkup(Document doc, String originalContentFeatureType) throws DocumentFormatException
Parameters:
doc - the document that will be unpacked
originalContentFeatureType - the name of the feature that will hold the document's content.
Throws: DocumentFormatException
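The two-argument unpackMarkup keeps the document's content in a feature named by originalContentFeatureType. A minimal sketch, assuming the feature receives the content as it was before unpacking and that the work is then delegated to the one-argument overload; SimpleDoc and ContentPreservingFormat are hypothetical stand-ins for Document and DocumentFormat.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a GATE Document: mutable content plus features.
class SimpleDoc {
    String content;
    final Map<String, Object> features = new HashMap<>();
    SimpleDoc(String content) { this.content = content; }
}

abstract class ContentPreservingFormat {
    // Subclasses replace doc.content with the markup-free text.
    abstract void unpackMarkup(SimpleDoc doc);

    // Assumed behaviour of the two-argument overload: store the content as it
    // was before unpacking under the given feature name (skipped when no name
    // is given), then unpack as usual.
    void unpackMarkup(SimpleDoc doc, String originalContentFeatureType) {
        if (originalContentFeatureType != null) {
            doc.features.put(originalContentFeatureType, doc.content);
        }
        unpackMarkup(doc);
    }
}
```

A subclass only has to implement the one-argument method; calling `unpackMarkup(doc, "originalContent")` (an illustrative feature name) then leaves the markup-free text in `doc.content` and the original text in that feature.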
protected static MimeType decideBetweenThreeMimeTypes(MimeType aMimeTypeFromWebServer, MimeType aMimeTypeFromFileSuffix, MimeType aMimeTypeFromMagicNumbers)
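Parameters:
aMimeTypeFromWebServer - the MimeType reported by the web server
aMimeTypeFromFileSuffix - the MimeType inferred from the file suffix
aMimeTypeFromMagicNumbers - the MimeType inferred from magic numbers
protected static MimeType decideBetweenTwoMimeTypes(MimeType aMimeType, MimeType anotherMimeType)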
Parameters:
aMimeType - a MimeType object with the "Priority" parameter set
anotherMimeType - a MimeType object with the "Priority" parameter set
protected static boolean areEqual(MimeType aMimeType, MimeType anotherMimeType)
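decideBetweenThreeMimeTypes picks the MIME type "in the majority" among the web-server, file-suffix, and magic-number guesses, decideBetweenTwoMimeTypes relies on a "Priority" parameter, and areEqual compares two types. The sketch below combines the three under stated assumptions: equality ignores case and the priority, a null never matches anything, a lower priority value wins, and when no two sources agree the choice falls back to priority. Mime and MimeArbiter are hypothetical; GATE's actual tie-breaking rules may differ.

```java
// Hypothetical stand-in for GATE's MimeType: type/subtype plus a numeric
// "Priority" (lower value = preferred, an assumption of this sketch).
final class Mime {
    final String type, subtype;
    final int priority;

    Mime(String type, String subtype, int priority) {
        this.type = type;
        this.subtype = subtype;
        this.priority = priority;
    }

    @Override public String toString() { return type + "/" + subtype; }
}

final class MimeArbiter {
    // Case-insensitive equality on type/subtype; priority is ignored.
    // Two nulls do not count as agreement, so missing guesses never win a vote.
    static boolean areEqual(Mime a, Mime b) {
        if (a == null || b == null) return false;
        return a.type.equalsIgnoreCase(b.type) && a.subtype.equalsIgnoreCase(b.subtype);
    }

    // Returns the candidate with the lower priority value; a null loses to a non-null.
    static Mime decideBetweenTwo(Mime a, Mime b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.priority <= b.priority ? a : b;
    }

    // Majority vote first; with no agreement, fall back to priority among all three.
    static Mime decideBetweenThree(Mime fromServer, Mime fromSuffix, Mime fromMagic) {
        if (areEqual(fromServer, fromSuffix) || areEqual(fromServer, fromMagic)) return fromServer;
        if (areEqual(fromSuffix, fromMagic)) return fromSuffix;
        return decideBetweenTwo(decideBetweenTwo(fromServer, fromSuffix), fromMagic);
    }
}
```

Treating two nulls as unequal matters: if neither the server nor the suffix yields a type, the magic-number guess still wins instead of the vote returning null.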
protected static MimeType guessTypeUsingMagicNumbers(InputStream aInputStream, String anEncoding)
Parameters:
aInputStream - an InputStream that will be wrapped in an InputStreamReader
anEncoding - the encoding. If it is null or unknown, an InputStreamReader with the default encoding is created.
protected static MimeType runMagicNumbers(Reader aReader)
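guessTypeUsingMagicNumbers wraps an InputStream in an InputStreamReader, falling back to the default encoding when anEncoding is null or unknown, and the reader is then checked for magic numbers by runMagicNumbers. A sketch of that flow follows; the magic strings, the 256-character prefix, the substring match, and the class MagicSniffer are assumptions for illustration, not GATE's magic2mimeTypeMap or its matching rules.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical magic-number sniffer; MIME types are plain strings here.
final class MagicSniffer {
    // Illustrative table, checked in insertion order.
    private static final Map<String, String> MAGIC = new LinkedHashMap<>();
    static {
        MAGIC.put("<?xml", "text/xml");
        MAGIC.put("<html", "text/html");
        MAGIC.put("{\\rtf", "text/rtf");
    }
    private static final int PREFIX_CHARS = 256;

    // Analogue of guessTypeUsingMagicNumbers: use the platform default charset
    // when the encoding is null, malformed or unsupported.
    static String guess(InputStream in, String encoding) {
        Charset charset = isUsable(encoding) ? Charset.forName(encoding) : Charset.defaultCharset();
        return runMagic(new InputStreamReader(in, charset));
    }

    private static boolean isUsable(String encoding) {
        try {
            return encoding != null && Charset.isSupported(encoding);
        } catch (IllegalArgumentException illegalName) { // covers IllegalCharsetNameException
            return false;
        }
    }

    // Analogue of runMagicNumbers: read a bounded prefix and return the first
    // registered type whose magic string occurs in it, or null if none does.
    static String runMagic(Reader reader) {
        char[] buf = new char[PREFIX_CHARS];
        int total = 0;
        try {
            int n;
            while (total < PREFIX_CHARS && (n = reader.read(buf, total, PREFIX_CHARS - total)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String prefix = new String(buf, 0, total).toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : MAGIC.entrySet()) {
            if (prefix.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }
}
```

Reading only a bounded prefix keeps the check cheap on large documents; the trade-off is that a marker appearing later in the file is not seen.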
public static DocumentFormat getDocumentFormat(Document aGateDocument, MimeType mimeType)
Parameters:
aGateDocument - this document will receive the associated MIME type as a feature. The name of the feature is MimeType and its value is in the format type/subtype.
mimeType - the MIME type that is given as input
public static DocumentFormat getDocumentFormat(Document aGateDocument, String fileSuffix)
Parameters:
aGateDocument - this document will receive the associated MIME type as a feature. The name of the feature is MimeType and its value is in the format type/subtype.
fileSuffix - the file suffix that is given as input
public static DocumentFormat getDocumentFormat(MimeType mimeType)
Parameters:
mimeType - the MIME type you want the DocumentFormat for
public static DocumentFormat getDocumentFormat(Document aGateDocument, URL url)
Parameters:
aGateDocument - this document will receive the associated MIME type as a feature. The name of the feature is MimeType and its value is in the format type/subtype.
url - the URL that is given as input
public FeatureMap getFeatures()
Specified by: getFeatures in interface FeatureBearer
Overrides: getFeatures in class AbstractFeatureBearer
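For the getDocumentFormat(Document, URL) overload documented above, a suffix-only sketch follows. It assumes the suffix is taken from the last segment of the URL path and that the chosen type is recorded as a MimeType feature in type/subtype form, as the aGateDocument parameter describes. The parameters of decideBetweenThreeMimeTypes suggest the real lookup may also weigh the web server's content type and magic numbers, which this sketch omits; UrlFormatLookup, its tables, and the handler labels are hypothetical.

```java
import java.net.URL;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: resolves a handler from the URL's file suffix only.
final class UrlFormatLookup {
    private static final Map<String, String> SUFFIX_TO_MIME = new HashMap<>();
    private static final Map<String, String> MIME_TO_HANDLER = new HashMap<>();
    static {
        SUFFIX_TO_MIME.put("html", "text/html");
        SUFFIX_TO_MIME.put("xml", "text/xml");
        MIME_TO_HANDLER.put("text/html", "HtmlFormat");
        MIME_TO_HANDLER.put("text/xml", "XmlFormat");
    }

    // Returns the handler name, or null, and records the chosen type as
    // "type/subtype" under the "MimeType" key of the document's feature map.
    static String lookup(URL url, Map<String, Object> docFeatures) {
        String path = url.getPath();                  // query string and fragment excluded
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot < path.lastIndexOf('/')) {
            return null;                               // last path segment has no suffix
        }
        String mime = SUFFIX_TO_MIME.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
        if (mime == null) {
            return null;
        }
        docFeatures.put("MimeType", mime);
        return MIME_TO_HANDLER.get(mime);
    }
}
```

Checking the dot against the last slash avoids mistaking a dotted directory name, as in `/v1.2/readme`, for a file suffix.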
public void setMarkupElementsMap(Map<String,String> markupElementsMap)
public void setElement2StringMap(Map<String,String> anElement2StringMap)
public void setFeatures(FeatureMap features)
Specified by: setFeatures in interface FeatureBearer
Overrides: setFeatures in class AbstractFeatureBearer
public void setMimeType(MimeType aMimeType)
public MimeType getMimeType()
public static MimeType getMimeTypeForString(String typeString)
Utility method to get a MimeType given the type string.
public static Set<String> getSupportedFileSuffixes()
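getMimeTypeForString resolves a type string against the registered types, and getSupportedFileSuffixes lists the registered suffixes. A minimal sketch, assuming the type string is matched after dropping parameters such as "; charset=UTF-8" and lower-casing, and that the suffix set is handed out as a read-only view; MimeTypeLookup is hypothetical and uses strings in place of MimeType objects.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical registry mirroring mimeString2mimeTypeMap and suffixes2mimeTypeMap.
final class MimeTypeLookup {
    private static final Map<String, String> MIME_STRINGS = new HashMap<>();
    private static final Map<String, String> SUFFIXES = new HashMap<>();

    static void register(String mimeType, String... suffixes) {
        String key = normalise(mimeType);
        MIME_STRINGS.put(key, key);
        for (String s : suffixes) {
            SUFFIXES.put(s.toLowerCase(Locale.ROOT), key);
        }
    }

    // Drops parameters such as "; charset=UTF-8" and lower-cases the rest.
    private static String normalise(String typeString) {
        int semi = typeString.indexOf(';');
        String bare = semi >= 0 ? typeString.substring(0, semi) : typeString;
        return bare.trim().toLowerCase(Locale.ROOT);
    }

    // Analogue of getMimeTypeForString: null when the type is not registered.
    static String forString(String typeString) {
        return typeString == null ? null : MIME_STRINGS.get(normalise(typeString));
    }

    // Analogue of getSupportedFileSuffixes: callers cannot modify the registry through it.
    static Set<String> supportedSuffixes() {
        return Collections.unmodifiableSet(SUFFIXES.keySet());
    }
}
```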
public static boolean willReadFromUrl(String mimeTypeStr, URL docUrl)
Parameters:
mimeTypeStr - the MIME type string parameter for the Document
docUrl - the sourceUrl parameter for the Document
public void removeStatusListener(StatusListener l)
public void addStatusListener(StatusListener l)
protected void fireStatusChanged(String e)
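addStatusListener, removeStatusListener and fireStatusChanged follow the usual observer pattern. A sketch, assuming duplicate registrations are ignored and listeners are notified in registration order; StatusSink is a hypothetical stand-in for the StatusListener parameter type, and StatusSource for the class that owns the listeners.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener type standing in for StatusListener.
interface StatusSink {
    void statusChanged(String text);
}

// Sketch of the add/remove/fire trio.
class StatusSource {
    // Copy-on-write, so a listener may deregister itself while being notified.
    private final List<StatusSink> listeners = new CopyOnWriteArrayList<>();

    public void addStatusListener(StatusSink l) {
        if (l != null && !listeners.contains(l)) {
            listeners.add(l);
        }
    }

    public void removeStatusListener(StatusSink l) {
        listeners.remove(l);
    }

    // Notifies every registered listener, in registration order.
    protected void fireStatusChanged(String e) {
        for (StatusSink l : listeners) {
            l.statusChanged(e);
        }
    }
}
```

A CopyOnWriteArrayList suits this case because status messages are frequent while listener registration is rare, and iteration never throws ConcurrentModificationException.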
Copyright © 2024 GATE. All rights reserved.