public class StripBetweenTransformer extends AbstractStringTransformer implements IXMLConfigurable
Strips any content found between matching start and end strings. The matching strings are defined in pairs, and multiple pairs can be specified at once.
This class can be used as a pre-parsing handler (text content types only) or as a post-parsing handler.
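The stripping behavior described above can be illustrated with a minimal, self-contained sketch. It is not the Norconex implementation: the class `StripBetweenSketch` and its `strip` method are hypothetical names, and the sketch assumes that start and end strings are regular expressions, that the nearest end match after each start match closes the span, and that an unmatched start leaves the remaining text untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex Importer API.
final class StripBetweenSketch {

    // Removes text between each start match and the next end match.
    // When inclusive is true, the start and end matches are removed as well.
    static String strip(String content, String startRegex,
            String endRegex, boolean inclusive) {
        Matcher start = Pattern.compile(startRegex).matcher(content);
        Matcher end = Pattern.compile(endRegex).matcher(content);
        StringBuilder out = new StringBuilder();
        int copyFrom = 0;   // next character to copy to the output
        int searchFrom = 0; // where to look for the next start match
        while (searchFrom <= content.length() && start.find(searchFrom)) {
            if (!end.find(start.end())) {
                break; // assumption: no closing match means nothing more to strip
            }
            int cutFrom = inclusive ? start.start() : start.end();
            int cutTo = inclusive ? end.end() : end.start();
            out.append(content, copyFrom, cutFrom);
            copyFrom = cutTo;
            // Always advance, so empty matches cannot loop forever.
            searchFrom = Math.max(end.end(), searchFrom + 1);
        }
        out.append(content, copyFrom, content.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "keep <!--x-->drop<!--y--> keep";
        System.out.println(strip(text, "<!--x-->", "<!--y-->", true));
        System.out.println(strip(text, "<!--x-->", "<!--y-->", false));
    }
}
```

With the sample text, the inclusive call removes the two markers together with the text between them, while the non-inclusive call removes only the text between the markers.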
XML configuration usage:
<transformer class="com.norconex.importer.transformer.impl.StripBetweenTransformer"
    inclusive="[false|true]"
    caseSensitive="[false|true]">
  <contentTypeRegex>
    (regex to identify text content-types for pre-import, overriding default)
  </contentTypeRegex>
  <restrictTo caseSensitive="[false|true]"
      property="(name of header/metadata property to match)">
    (regular expression of value to match)
  </restrictTo>
  <stripBetween>
    <start>(regex)</start>
    <end>(regex)</end>
  </stripBetween>
  <!-- multiple stripBetween tags allowed -->
</transformer>
Constructor and Description |
---|
StripBetweenTransformer() |
Modifier and Type | Method and Description |
---|---|
void | addStripEndpoints(String fromText, String toText) |
boolean | equals(Object obj) |
List<org.apache.commons.lang3.tuple.Pair<String,String>> | getStripEndpoints() |
int | hashCode() |
boolean | isCaseSensitive() |
boolean | isInclusive() |
void | loadFromXML(Reader in) |
void | saveToXML(Writer out) |
void | setCaseSensitive(boolean caseSensitive): Sets whether to consider character case when matching start and end text. |
void | setInclusive(boolean inclusive): Sets whether start and end text pairs should themselves be stripped or not. |
String | toString() |
protected void | transformStringDocument(String reference, StringBuilder content, Properties metadata, boolean parsed, boolean partialContent) |
Methods inherited from superclasses: transformTextDocument, transformDocument, documentAccepted, getContentTypeRegex, loadFromXML, saveToXML, setContentTypeRegex, setRestriction
protected void transformStringDocument(String reference, StringBuilder content, Properties metadata, boolean parsed, boolean partialContent)
Specified by: transformStringDocument in class AbstractStringTransformer
public boolean isInclusive()
public void setInclusive(boolean inclusive)
Parameters: inclusive - true to strip start and end text
public boolean isCaseSensitive()
public void setCaseSensitive(boolean caseSensitive)
Parameters: caseSensitive - true to consider character case
public List<org.apache.commons.lang3.tuple.Pair<String,String>> getStripEndpoints()
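To make the interplay of the inclusive flag, the caseSensitive flag, and multiple endpoint pairs concrete, here is a small sketch written against plain `java.util.regex`. It is an illustration under stated assumptions, not the library's code: `StripPairsSketch` and `apply` are hypothetical, the method name `addStripEndpoints` is borrowed from this class only for readability, pairs are assumed to be applied in the order they were added, `Pattern.DOTALL` is assumed so spans may cross line breaks, and the group names `gs` and `ge` are assumed not to clash with groups in the supplied expressions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in used for illustration; not the Norconex class.
final class StripPairsSketch {
    private final List<String[]> endpoints = new ArrayList<>();
    private final boolean inclusive;
    private final boolean caseSensitive;

    StripPairsSketch(boolean inclusive, boolean caseSensitive) {
        this.inclusive = inclusive;
        this.caseSensitive = caseSensitive;
    }

    // Registers one start/end pair of regular expressions.
    void addStripEndpoints(String fromText, String toText) {
        endpoints.add(new String[] {fromText, toText});
    }

    // Applies every registered pair, in insertion order, to the content.
    String apply(String content) {
        int flags = Pattern.DOTALL; // assumption: spans may cross lines
        if (!caseSensitive) {
            flags |= Pattern.CASE_INSENSITIVE;
        }
        String result = content;
        for (String[] pair : endpoints) {
            // Reluctant match from a start expression to the nearest end.
            Pattern p = Pattern.compile(
                    "(?<gs>" + pair[0] + ").*?(?<ge>" + pair[1] + ")", flags);
            // Inclusive drops the whole match; otherwise keep both endpoints.
            result = p.matcher(result)
                    .replaceAll(inclusive ? "" : "${gs}${ge}");
        }
        return result;
    }

    public static void main(String[] args) {
        StripPairsSketch sketch = new StripPairsSketch(true, false);
        sketch.addStripEndpoints("<h1>", "</h1>");
        sketch.addStripEndpoints("\\[", "\\]");
        System.out.println(sketch.apply("<H1>Title</H1>Body [note] end"));
    }
}
```

In this sketch, setting caseSensitive to true means the pattern `<h1>` no longer matches `<H1>`, so only the bracketed note is affected by the second pair.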
public void loadFromXML(Reader in) throws IOException
Specified by: loadFromXML in interface IXMLConfigurable
Throws: IOException
public void saveToXML(Writer out) throws IOException
Specified by: saveToXML in interface IXMLConfigurable
Throws: IOException
public int hashCode()
Overrides: hashCode in class AbstractStringTransformer
public boolean equals(Object obj)
Overrides: equals in class AbstractStringTransformer
public String toString()
Overrides: toString in class AbstractStringTransformer
Copyright © 2009-2014 Norconex Inc. All Rights Reserved.