public class ReduceConsecutivesTransformer extends AbstractStringTransformer implements IXMLConfigurable
Reduces specified consecutive characters or strings to only one instance (document content only). If reducing duplicate words, you usually have to add a space at the beginning or end of the word.
This class can be used as a pre-parsing (text content-types only) or post-parsing handler.
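The reduction described above can be illustrated with a small standalone sketch. It does not use or reproduce the Norconex implementation: the class `ReduceConsecutivesSketch` and the helper `reduceConsecutives` are hypothetical names, and the regex-based approach (including keeping the first matched instance when case is ignored) is an assumption made for illustration only.

```java
import java.util.regex.Pattern;

class ReduceConsecutivesSketch {

    // Hypothetical helper, not part of the Norconex API: collapses each run
    // of two or more consecutive occurrences of `reduction` into a single one.
    static String reduceConsecutives(String text, String reduction, boolean caseSensitive) {
        if (reduction.isEmpty()) {
            return text; // nothing to reduce
        }
        int flags = caseSensitive ? 0 : (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // Quote the reduction so characters such as "." or "*" are matched literally.
        String literal = Pattern.quote(reduction);
        // Group 1 is the first occurrence; the non-capturing group matches the repeats.
        Pattern runs = Pattern.compile("(" + literal + ")(?:" + literal + ")+", flags);
        // Assumption: the first matched instance is the one that is kept.
        return runs.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Consecutive spaces reduced to one space.
        System.out.println(reduceConsecutives("a    b", " ", true));
        // Duplicate words: the trailing space is part of the string to reduce.
        System.out.println(reduceConsecutives("hello hello hello world", "hello ", true));
        // Ignoring case, "Hello hello " is treated as one run.
        System.out.println(reduceConsecutives("Hello hello world", "hello ", false));
    }
}
```

The second call shows why a duplicate word usually needs a leading or trailing space: without it, "hello hello" contains a space between the two occurrences, so they are not consecutive and nothing is reduced.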
XML configuration usage:
<transformer class="com.norconex.importer.transformer.impl.ReduceConsecutivesTransformer"
    caseSensitive="[false|true]" >
  <contentTypeRegex>
    (regex to identify text content-types for pre-import, overriding default)
  </contentTypeRegex>
  <restrictTo caseSensitive="[false|true]"
      property="(name of header/metadata name to match)" >
    (regular expression of value to match)
  </restrictTo>
  <reduce>(character or string to strip)</reduce>
  <!-- multiple reduce tags allowed -->
</transformer>
You can specify these special characters in your XML:
Constructor and Description |
---|
ReduceConsecutivesTransformer() |
Modifier and Type | Method and Description |
---|---|
void | addReductions(String... reductions) |
boolean | equals(Object obj) |
List<String> | getReductions() |
int | hashCode() |
boolean | isCaseSensitive() |
void | loadFromXML(Reader in) |
void | saveToXML(Writer out) |
void | setCaseSensitive(boolean caseSensitive) Sets whether character case is considered when matching characters or strings to reduce. |
void | setReductions(String... reductions) |
String | toString() |
protected void | transformStringDocument(String reference, StringBuilder content, Properties metadata, boolean parsed, boolean partialContent) |
transformTextDocument
transformDocument
documentAccepted, getContentTypeRegex, loadFromXML, saveToXML, setContentTypeRegex
setRestriction
protected void transformStringDocument(String reference, StringBuilder content, Properties metadata, boolean parsed, boolean partialContent)
transformStringDocument
in class AbstractStringTransformer
public void setReductions(String... reductions)
public void addReductions(String... reductions)
public boolean isCaseSensitive()
public void setCaseSensitive(boolean caseSensitive)
caseSensitive - true to consider character case
public void loadFromXML(Reader in) throws IOException
loadFromXML
in interface IXMLConfigurable
IOException
public void saveToXML(Writer out) throws IOException
saveToXML
in interface IXMLConfigurable
IOException
public int hashCode()
hashCode
in class AbstractStringTransformer
public boolean equals(Object obj)
equals
in class AbstractStringTransformer
public String toString()
toString
in class AbstractStringTransformer
Copyright © 2009-2014 Norconex Inc.. All Rights Reserved.