public abstract class AbstractTextRestrictiveHandler extends AbstractRestrictiveHandler
Base class for handlers dealing with text documents only. Subclasses can safely be used as either pre-parse or post-parse handlers.
For pre-parsing, non-text documents are simply ignored and no
transformation occurs. Whether a document is text is determined from the
Importer.DOC_CONTENT_TYPE metadata value. By default,
any content type starting with "text/" is considered text. This default
behavior can be changed with the setContentTypeRegex(String) method.
Make sure the regular expression only matches text documents, to avoid parsing exceptions.
For post-parsing, all documents are assumed to be text.
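The content-type check described above can be sketched with plain java.util.regex. This is only an illustration: the class name, the constant, and the exact form of the default pattern are assumptions (the documentation only states that content types starting with "text/" are treated as text), and whether the library matches the whole string or a substring is not stated here.

```java
import java.util.regex.Pattern;

/** Illustrative only: mimics content-type matching with a regular expression. */
final class ContentTypeRegexDemo {

    // Assumed stand-in for the default rule ("starts with text/").
    // The library's actual default pattern may be written differently.
    static final String ASSUMED_DEFAULT_REGEX = "^text/.*$";

    /** Returns true if the whole content type matches the given regex. */
    static boolean isText(String contentType, String regex) {
        return contentType != null && Pattern.matches(regex, contentType);
    }

    public static void main(String[] args) {
        // Hypothetical wider regex that also treats XHTML as text.
        String custom = "^(text/.*|application/xhtml\\+xml)$";

        System.out.println(isText("text/html", ASSUMED_DEFAULT_REGEX));       // true
        System.out.println(isText("application/pdf", ASSUMED_DEFAULT_REGEX)); // false
        System.out.println(isText("application/xhtml+xml", custom));          // true
    }
}
```

A pattern such as the hypothetical `custom` value is the kind of string one would pass to setContentTypeRegex(String). Keeping it anchored and narrow helps avoid matching binary content types by accident.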
Subclasses can restrict which documents they apply to
based on document metadata (see AbstractRestrictiveHandler).
Subclasses must test whether a document is accepted using the
documentAccepted(String, Properties, boolean) method.
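The accept-then-transform pattern expected from subclasses might look like the following simplified stand-in. It is not the Norconex class: TextHandlerSketch, the handle method, the metadata key, and the use of a plain Map in place of Properties are all hypothetical, and the metadata restrictions inherited from AbstractRestrictiveHandler are left out for brevity.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Simplified stand-in (NOT the library class) for the accept/reject decision. */
final class TextHandlerSketch {

    // Placeholder key; the real key is the value of Importer.DOC_CONTENT_TYPE.
    static final String CONTENT_TYPE_KEY = "placeholder.contentType";

    // Assumed form of the "starts with text/" default.
    private String contentTypeRegex = "^text/.*$";

    void setContentTypeRegex(String regex) {
        this.contentTypeRegex = regex;
    }

    /** Pre-parse: accept only matching content types. Post-parse: all text. */
    boolean documentAccepted(
            String reference, Map<String, String> metadata, boolean parsed) {
        if (parsed) {
            return true; // parsed documents are assumed to be text
        }
        String type = metadata.get(CONTENT_TYPE_KEY);
        return type != null && Pattern.matches(contentTypeRegex, type);
    }

    /** What a subclass would typically do: check acceptance, then transform. */
    String handle(String reference, String content,
            Map<String, String> metadata, boolean parsed) {
        if (!documentAccepted(reference, metadata, parsed)) {
            return content; // rejected: leave the document untouched
        }
        return content.toUpperCase(Locale.ROOT); // placeholder transformation
    }

    public static void main(String[] args) {
        TextHandlerSketch handler = new TextHandlerSketch();
        Map<String, String> pdfMeta = new HashMap<>();
        pdfMeta.put(CONTENT_TYPE_KEY, "application/pdf");

        // Pre-parse: the PDF is ignored. Post-parse: it is treated as text.
        System.out.println(handler.handle("doc.pdf", "hello", pdfMeta, false));
        System.out.println(handler.handle("doc.pdf", "hello", pdfMeta, true));
    }
}
```

Returning the content unchanged on rejection mirrors the "simply ignored" behavior described for non-text documents in pre-parse mode.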
Subclasses implementing IXMLConfigurable should allow this inner
configuration:
<contentTypeRegex>
    (regex to identify text content-types, overriding default)
</contentTypeRegex>
<restrictTo caseSensitive="[false|true]"
        property="(name of header/metadata field to match)">
    (regular expression of value to match)
</restrictTo>
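To make the shape of this configuration concrete, here is a minimal sketch that reads a filled-in fragment with the JDK's DOM parser. The `<handler>` wrapper element, the `title` property, and both regular expressions are hypothetical values chosen for illustration. The library itself loads its configuration through Apache Commons Configuration (see loadFromXML), which this sketch does not use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Illustrative only: reads a sample inner configuration with the JDK DOM API. */
final class HandlerConfigSketch {

    // Hypothetical wrapper element and values, for illustration only.
    static final String SAMPLE_XML =
            "<handler>"
          + "<contentTypeRegex>^text/.*$</contentTypeRegex>"
          + "<restrictTo caseSensitive=\"false\" property=\"title\">"
          + ".*report.*"
          + "</restrictTo>"
          + "</handler>";

    /** Parses the XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML configuration", e);
        }
    }

    /** Returns the first descendant element with the given tag name. */
    static Element child(Element parent, String name) {
        return (Element) parent.getElementsByTagName(name).item(0);
    }

    public static void main(String[] args) {
        Element root = parse(SAMPLE_XML);
        Element restrict = child(root, "restrictTo");

        System.out.println(child(root, "contentTypeRegex").getTextContent());
        System.out.println(restrict.getAttribute("property"));
        System.out.println(restrict.getAttribute("caseSensitive"));
        System.out.println(restrict.getTextContent());
    }
}
```

In this sample, the element text of `restrictTo` carries the value regex, while `property` and `caseSensitive` are attributes on the opening tag.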
Constructor and Description |
---|
AbstractTextRestrictiveHandler() |
Modifier and Type | Method and Description |
---|---|
protected boolean | documentAccepted(String reference, Properties metadata, boolean parsed) Method invoked by subclasses to find out whether a document should be accepted or rejected, based on the metadata restrictions provided. |
boolean | equals(Object obj) |
String | getContentTypeRegex() Gets the regular expression to match the content type. |
int | hashCode() |
protected void | loadFromXML(org.apache.commons.configuration.XMLConfiguration xml) Convenience method for subclasses to load content type regex. |
protected void | saveToXML(XMLStreamWriter writer) Convenience method for subclasses to save content type regex. |
void | setContentTypeRegex(String contentTypeRegex) Sets the regular expression to match the content type. |
String | toString() |
Methods inherited from class AbstractRestrictiveHandler: setRestriction
public String getContentTypeRegex()
public void setContentTypeRegex(String contentTypeRegex)
Parameters:
contentTypeRegex - the regular expression
protected boolean documentAccepted(String reference, Properties metadata, boolean parsed) throws IOException
Overrides:
documentAccepted in class AbstractRestrictiveHandler
Parameters:
reference - document reference
metadata - document metadata
parsed - if the document was parsed (i.e. imported) already
Returns:
true if the document is accepted
Throws:
IOException
protected void loadFromXML(org.apache.commons.configuration.XMLConfiguration xml)
Overrides:
loadFromXML in class AbstractRestrictiveHandler
Parameters:
xml - XML configuration
protected void saveToXML(XMLStreamWriter writer) throws XMLStreamException
Overrides:
saveToXML in class AbstractRestrictiveHandler
Parameters:
writer - XML writer
Throws:
XMLStreamException - problem saving
public String toString()
Overrides:
toString in class AbstractRestrictiveHandler
public int hashCode()
Overrides:
hashCode in class AbstractRestrictiveHandler
public boolean equals(Object obj)
Overrides:
equals in class AbstractRestrictiveHandler
Copyright © 2009-2014 Norconex Inc. All Rights Reserved.