public abstract class AbstractStringTagger extends AbstractCharStreamTagger
Base class to facilitate creating taggers based on text content. It loads
text into a StringBuilder for in-memory processing, which also allows more
options (such as advanced regular expressions). This class checks for free
memory after every 10KB of text read. If enough memory is available, it reads
another 10KB, and continues until all the content is read or the buffer size
reaches half the available memory. In either case, it passes the content
buffered so far for tagging (all of it for small enough content, or in
several chunks for large content).
Implementors should be mindful of memory use when working with the StringBuilder.
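The buffering behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual AbstractStringTagger source: `ChunkedBufferSketch`, `ChunkHandler`, `availableMemory()` and `defaultLimit()` are hypothetical names, the way available memory is estimated is a guess, and the `partialContent` flag is assumed to be true whenever a document is delivered in more than one chunk.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the chunked buffering; not the library's code. */
final class ChunkedBufferSketch {

    /** Hypothetical callback standing in for tagStringDocument. */
    interface ChunkHandler {
        void handle(StringBuilder content, boolean partialContent);
    }

    /** Characters read between two memory checks (10KB of text). */
    static final int READ_SIZE = 10 * 1024;

    /** Assumed estimate of the memory the JVM can still allocate, in bytes. */
    static long availableMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }

    /** Assumed limit: half the available memory, expressed in chars. */
    static long defaultLimit() {
        return availableMemory() / 2 / Character.BYTES;
    }

    /**
     * Reads the input in READ_SIZE steps. After each step the limit is
     * re-evaluated; when the buffer has reached it, the buffered text is
     * handed over as a partial chunk and the buffer is emptied.
     */
    static void process(Reader input, LongSupplier maxChars,
            ChunkHandler handler) throws IOException {
        StringBuilder buffer = new StringBuilder();
        char[] block = new char[READ_SIZE];
        boolean flushedBefore = false;
        int read;
        while ((read = input.read(block)) != -1) {
            buffer.append(block, 0, read);
            if (buffer.length() >= maxChars.getAsLong()) {
                handler.handle(buffer, true);
                buffer.setLength(0);
                flushedBefore = true;
            }
        }
        // Remaining text: the whole document if nothing was flushed before,
        // otherwise the last chunk of a larger one. Simplification: a
        // document that ends exactly on the limit is still flagged partial.
        if (buffer.length() > 0 || !flushedBefore) {
            handler.handle(buffer, flushedBefore);
        }
    }
}
```

A caller could run `ChunkedBufferSketch.process(reader, ChunkedBufferSketch::defaultLimit, handler)`. Because the buffer is reused and cleared between chunks, a handler in this sketch must not keep a reference to the StringBuilder after it returns.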
Subclasses implementing IXMLConfigurable should allow this inner configuration:

  <contentTypeRegex>
      (regex to identify text content-types, overriding default)
  </contentTypeRegex>
  <restrictTo caseSensitive="[false|true]"
          property="(name of header/metadata name to match)">
      (regular expression of value to match)
  </restrictTo>
Constructor and Description |
---|
AbstractStringTagger() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object obj) |
int | hashCode() |
protected abstract void | tagStringDocument(String reference, StringBuilder content, Properties metadata, boolean parsed, boolean partialContent) |
protected void | tagTextDocument(String reference, Reader input, Properties metadata, boolean parsed) |
String | toString() |
Methods inherited from superclasses:
tagDocument
documentAccepted, getContentTypeRegex, loadFromXML, saveToXML, setContentTypeRegex
setRestriction
protected final void tagTextDocument(String reference, Reader input, Properties metadata, boolean parsed) throws IOException
Specified by: tagTextDocument in class AbstractCharStreamTagger
Throws: IOException
protected abstract void tagStringDocument(String reference, StringBuilder content, Properties metadata, boolean parsed, boolean partialContent)
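As an illustration of what an implementation of this method might do, here is a minimal, self-contained sketch. It does not extend the real class: `HashtagTaggerSketch`, the `hashtags` field name, and the use of a plain `Map<String, List<String>>` in place of the library's Properties type are all assumptions made so the example compiles on its own. It relies only on the fact that a StringBuilder is a CharSequence and can be matched by a regular expression directly, without first copying it into a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a subclass; not part of the library. */
final class HashtagTaggerSketch {

    /** Matches a '#' followed by word characters and captures the word. */
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    /**
     * Mirrors the shape of tagStringDocument, with a plain Map standing in
     * for the metadata object. Values are appended rather than overwritten,
     * since large content may arrive in several chunks.
     */
    static void tagStringDocument(String reference, StringBuilder content,
            Map<String, List<String>> metadata, boolean parsed,
            boolean partialContent) {
        // Matching the StringBuilder directly avoids a second in-memory copy.
        Matcher m = HASHTAG.matcher(content);
        while (m.find()) {
            metadata.computeIfAbsent("hashtags", k -> new ArrayList<>())
                    .add(m.group(1));
        }
        // When partialContent is true, a match that straddles two chunks
        // can be missed; this sketch does not attempt to handle that case.
    }
}
```

Calling the method once per chunk with the same metadata map accumulates every hashtag found across chunks under the assumed `hashtags` key.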
public boolean equals(Object obj)
Overrides: equals in class AbstractCharStreamTagger
public int hashCode()
Overrides: hashCode in class AbstractCharStreamTagger
public String toString()
Overrides: toString in class AbstractTextRestrictiveHandler
Copyright © 2009-2014 Norconex Inc. All Rights Reserved.