info.bliki.wiki.filter
Class WikipediaScanner

java.lang.Object
  extended by info.bliki.wiki.filter.WikipediaScanner
Direct Known Subclasses:
AbstractParser

public class WikipediaScanner
extends java.lang.Object


Field Summary
static int EOF
          Return value when the source is exhausted.
protected  int fScannerPosition
           
protected  char[] fSource
          The corresponding char[] array for the string source
protected  java.lang.String fStringSource
          The String of the given raw wiki text
protected  IWikiModel fWikiModel
           
static java.lang.String TAG_NAME
           
 
Constructor Summary
WikipediaScanner(java.lang.String src)
           
WikipediaScanner(java.lang.String src, int position)
           
 
Method Summary
static int findNestedEnd(char[] sourceArray, char startCh, char endChar, int startPosition)
           
static int[] findNestedParamEnd(char[] sourceArray, int startPosition)
           
static int findNestedTemplateEnd(char[] sourceArray, int startPosition)
           
 int getPosition()
           
 int indexEndOfComment()
           
 int indexEndOfNowiki()
           
 int indexEndOfTable()
           
 int indexOfAttributes()
          Scan the attributes of a wiki table cell
protected  int indexOfUntilNoLetter(char testChar, int fromIndex)
          Read the characters until no more letters are found or the given testChar is found.
protected  WikiTagNode makeTag(int start, int end, java.util.ArrayList<NodeAttribute> attributes)
          Create a tag node based on the current cursor and the one provided.
 int nextNewline()
           
protected  java.util.List<NodeAttribute> parseAttributes(int start, int end)
           
protected  WikiTagNode parseTag(int start)
          Parse a tag.
protected  int readSpecialWikiTags(int start)
           
protected  int readUntilIgnoreCase(int start, java.lang.String startString, java.lang.String endString)
          Read the characters until the concatenated start and end substring is found.
 java.lang.StringBuilder replaceTemplateParameters(java.lang.String template, java.util.Map<java.lang.String,java.lang.String> templateParameters)
          Replace the wiki template parameters in the given template string
 void scanWhiteSpace()
           
 void setModel(IWikiModel wikiModel)
           
 void setPosition(int newPos)
           
static java.util.List<java.lang.String> splitByPipe(char[] srcArray, int currOffset, int endOffset, java.util.List<java.lang.String> resultList)
          Split the given src character array by pipe symbol (i.e. "|")
static java.util.List<java.lang.String> splitByPipe(java.lang.String sourceString, java.util.List<java.lang.String> resultList)
          Split the given src string by pipe symbol (i.e. "|")
static boolean startsWith(java.lang.String str, int toffset, java.lang.String prefix, boolean ignoreCase)
           Check if a String starts with a specified prefix (optionally case insensitive).
 WPTable tracTable(TableOfContentTag tableOfContentTag)
          Scan a Trac simple wiki table
 WPList wpList()
           
 WPTable wpTable(ITableOfContent tableOfContentTag)
          Scan a wikipedia table.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TAG_NAME

public static final java.lang.String TAG_NAME
See Also:
Constant Field Values

EOF

public static final int EOF
Return value when the source is exhausted. Has a value of -1.

See Also:
Constant Field Values

fScannerPosition

protected int fScannerPosition

fWikiModel

protected IWikiModel fWikiModel

fStringSource

protected final java.lang.String fStringSource
The String of the given raw wiki text


fSource

protected final char[] fSource
The corresponding char[] array for the string source

Constructor Detail

WikipediaScanner

public WikipediaScanner(java.lang.String src)

WikipediaScanner

public WikipediaScanner(java.lang.String src,
                        int position)
Method Detail

setModel

public void setModel(IWikiModel wikiModel)

getPosition

public int getPosition()

setPosition

public void setPosition(int newPos)

wpTable

public WPTable wpTable(ITableOfContent tableOfContentTag)
Scan a wikipedia table. See: Help - Table

Parameters:
tableOfContentTag -
Returns:
null if no wiki table was found

tracTable

public WPTable tracTable(TableOfContentTag tableOfContentTag)
Scan a Trac simple wiki table

Parameters:
tableOfContentTag -
Returns:

wpList

public WPList wpList()

nextNewline

public int nextNewline()

indexEndOfComment

public int indexEndOfComment()

indexEndOfNowiki

public int indexEndOfNowiki()

indexEndOfTable

public int indexEndOfTable()

indexOfAttributes

public int indexOfAttributes()
Scan the attributes of a wiki table cell

Returns:

startsWith

public static boolean startsWith(java.lang.String str,
                                 int toffset,
                                 java.lang.String prefix,
                                 boolean ignoreCase)

Check if a String starts with a specified prefix (optionally case insensitive).
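The documented contract can be illustrated with a small standalone sketch. It is not the library's source: the class name StartsWithSketch is hypothetical, and the choice of String.regionMatches is an assumption about one reasonable way to meet the description (null handling as listed under Parameters and Returns).

```java
final class StartsWithSketch {

    /** Hypothetical re-implementation of the documented contract. */
    static boolean startsWith(String str, int toffset, String prefix, boolean ignoreCase) {
        if (str == null || prefix == null) {
            // Per the documentation, two nulls count as a match.
            return str == null && prefix == null;
        }
        // regionMatches returns false (it does not throw) for out-of-range offsets.
        return str.regionMatches(ignoreCase, toffset, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWith("{{Infobox}}", 2, "infobox", true));  // true
        System.out.println(startsWith("{{Infobox}}", 2, "infobox", false)); // false
    }
}
```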

Parameters:
str - the String to check, may be null
toffset - the starting offset of the subregion of the String to check
prefix - the prefix to find, may be null
ignoreCase - indicates whether the comparison should ignore case (case insensitive) or not.
Returns:
true if the String starts with the prefix, or if both are null
See Also:
String.startsWith(String)

scanWhiteSpace

public void scanWhiteSpace()

replaceTemplateParameters

public java.lang.StringBuilder replaceTemplateParameters(java.lang.String template,
                                                         java.util.Map<java.lang.String,java.lang.String> templateParameters)
Replace the wiki template parameters in the given template string
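As a rough illustration of template parameter substitution, the sketch below replaces MediaWiki-style {{{name}}} and {{{name|default}}} placeholders and returns null when nothing was replaced, mirroring the documented return value. The class name TemplateParameterSketch, the regular expression, and the decision to leave unknown placeholders without a default untouched are all assumptions; the library's actual scanner may behave differently, for example with nested parameters.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateParameterSketch {

    // Matches {{{name}}} or {{{name|default}}} without nested braces (a simplification).
    private static final Pattern PARAM =
            Pattern.compile("\\{\\{\\{([^{}|]+)(?:\\|([^{}]*))?\\}\\}\\}");

    /** Hypothetical stand-in for the documented method; returns null if nothing was replaced. */
    static StringBuilder replaceTemplateParameters(String template, Map<String, String> templateParameters) {
        Matcher m = PARAM.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        boolean replaced = false;
        while (m.find()) {
            String value = templateParameters.get(m.group(1).trim());
            if (value == null) {
                value = m.group(2); // fall back to the default after the pipe, if any
            }
            if (value == null) {
                continue; // unknown parameter without default: leave the placeholder as is
            }
            out.append(template, last, m.start()).append(value);
            last = m.end();
            replaced = true;
        }
        return replaced ? out.append(template, last, template.length()) : null;
    }
}
```

With the map {1=World}, the template "Hello {{{1}}}, {{{name|stranger}}}!" yields "Hello World, stranger!" in this sketch.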

Parameters:
template -
templateParameters -
Returns:
null if no replacement could be found

splitByPipe

public static java.util.List<java.lang.String> splitByPipe(java.lang.String sourceString,
                                                           java.util.List<java.lang.String> resultList)
Split the given src string by pipe symbol (i.e. "|")

Parameters:
sourceString -
resultList - the list which contains the split strings
Returns:

splitByPipe

public static java.util.List<java.lang.String> splitByPipe(char[] srcArray,
                                                           int currOffset,
                                                           int endOffset,
                                                           java.util.List<java.lang.String> resultList)
Split the given src character array by pipe symbol (i.e. "|")
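A minimal sketch of pipe splitting over a character range is shown below. It is not the library's implementation. The class name SplitByPipeSketch is hypothetical, and it is assumed that endOffset is exclusive, that a null resultList results in a new list, and that the passed list is returned. It treats every "|" as a separator and makes no attempt to handle pipes nested inside links or templates.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitByPipeSketch {

    /** Hypothetical sketch: splits src[currOffset, endOffset) at every '|'. */
    static List<String> splitByPipe(char[] srcArray, int currOffset, int endOffset, List<String> resultList) {
        if (resultList == null) {
            resultList = new ArrayList<>(); // assumption: a null list means "create one"
        }
        int segmentStart = currOffset;
        for (int i = currOffset; i < endOffset; i++) {
            if (srcArray[i] == '|') {
                resultList.add(new String(srcArray, segmentStart, i - segmentStart));
                segmentStart = i + 1;
            }
        }
        // The remainder after the last pipe (possibly empty) is the final segment.
        resultList.add(new String(srcArray, segmentStart, endOffset - segmentStart));
        return resultList;
    }

    /** String overload, delegating to the char[] variant. */
    static List<String> splitByPipe(String sourceString, List<String> resultList) {
        return splitByPipe(sourceString.toCharArray(), 0, sourceString.length(), resultList);
    }
}
```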

Parameters:
srcArray -
currOffset -
endOffset -
resultList - the list which contains the split strings
Returns:

findNestedEnd

public static int findNestedEnd(char[] sourceArray,
                                char startCh,
                                char endChar,
                                int startPosition)

findNestedTemplateEnd

public static int findNestedTemplateEnd(char[] sourceArray,
                                        int startPosition)

findNestedParamEnd

public static int[] findNestedParamEnd(char[] sourceArray,
                                       int startPosition)

parseTag

protected WikiTagNode parseTag(int start)
Parse a tag. Parse the name and attributes from a start tag.

From the HTML 4.01 Specification, W3C Recommendation 24 December 1999 http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2

3.2.2 Attributes

Elements may have associated properties, called attributes, which may have values (by default, or set by authors or scripts). Attribute/value pairs appear before the final ">" of an element's start tag. Any number of (legal) attribute value pairs, separated by spaces, may appear in an element's start tag. They may appear in any order.

In this example, the id attribute is set for an H1 element:

<H1 id="section1">
This is an identified heading thanks to the id attribute
</H1>

By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;). For double quotes authors can also use the character entity reference &quot;.

In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them.

Attribute names are always case-insensitive.

Attribute values are generally case-insensitive. The definition of each attribute in the reference manual indicates whether its value is case-insensitive.

All the attributes defined by this specification are listed in the attribute index.

This method uses a state machine with the following states:

  1. state 0 - outside of any attribute
  2. state 1 - within attribute name
  3. state 2 - equals hit
  4. state 3 - within naked attribute value.
  5. state 4 - within single quoted attribute value
  6. state 5 - within double quoted attribute value
  7. state 6 - whitespace after attribute name; could lead to state 2 (=) or state 0

The starting point for the various components is stored in an array of integers that match the initiation point for the states one-for-one, i.e. bookmarks[0] is where state 0 began, bookmarks[1] is where state 1 began, etc. Attributes are stored in a Vector having one slot for each whitespace or attribute/value pair. The first slot is for attribute name (kind of like a standalone attribute).
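The seven states listed above can be sketched as a small standalone parser. This is a simplified illustration, not the library's parseTag: the class name AttributeStateMachineSketch is hypothetical, it works on the attribute portion of a tag only, it keeps two start offsets instead of a full bookmarks array, and it collects results in a map rather than a list of NodeAttribute objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AttributeStateMachineSketch {

    /** Hypothetical sketch: parses e.g. {@code id="x" class=big hidden} into name/value pairs. */
    static Map<String, String> parse(String s) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int state = 0;       // current state, numbered as in the list above
        int nameStart = 0;   // where state 1 began
        int valueStart = 0;  // where the current value (state 3, 4 or 5) began
        String name = null;
        for (int i = 0; i <= s.length(); i++) {
            // A trailing space acts as a sentinel that flushes the last attribute.
            char c = i < s.length() ? s.charAt(i) : ' ';
            boolean ws = Character.isWhitespace(c);
            switch (state) {
                case 0: // outside of any attribute
                    if (!ws) { nameStart = i; state = 1; }
                    break;
                case 1: // within attribute name
                    if (c == '=') { name = s.substring(nameStart, i); state = 2; }
                    else if (ws) { name = s.substring(nameStart, i); state = 6; }
                    break;
                case 2: // equals hit
                    if (c == '\'') { valueStart = i + 1; state = 4; }
                    else if (c == '"') { valueStart = i + 1; state = 5; }
                    else if (!ws) { valueStart = i; state = 3; }
                    break;
                case 3: // within naked attribute value
                    if (ws) { attrs.put(name, s.substring(valueStart, i)); state = 0; }
                    break;
                case 4: // within single quoted attribute value
                    if (c == '\'') { attrs.put(name, s.substring(valueStart, i)); state = 0; }
                    break;
                case 5: // within double quoted attribute value
                    if (c == '"') { attrs.put(name, s.substring(valueStart, i)); state = 0; }
                    break;
                default: // state 6: whitespace after attribute name
                    if (c == '=') { state = 2; }
                    else if (!ws) { attrs.put(name, ""); nameStart = i; state = 1; }
                    break;
            }
        }
        if (state == 2 || state == 6) {
            attrs.put(name, ""); // name without a value at end of input
        } else if (state == 4 || state == 5) {
            attrs.put(name, s.substring(valueStart)); // unterminated quote: keep the rest
        }
        return attrs;
    }
}
```

For the input id="section1" class=big hidden title = 'a b', this sketch produces the pairs id=section1, class=big, hidden with an empty value, and title=a b.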

Parameters:
start - The position at which to start scanning.
Returns:
The parsed tag.
Throws:
ParserException - If a problem occurs reading from the source.

parseAttributes

protected java.util.List<NodeAttribute> parseAttributes(int start,
                                                        int end)

makeTag

protected WikiTagNode makeTag(int start,
                              int end,
                              java.util.ArrayList<NodeAttribute> attributes)
Create a tag node based on the current cursor and the one provided.

Parameters:
start - The starting point of the node.
end - The ending point of the node.
attributes - The attributes parsed from the tag.
Returns:
The new Tag node.
Throws:
ParserException - If the nodefactory creation of the tag node fails.

readSpecialWikiTags

protected int readSpecialWikiTags(int start)

readUntilIgnoreCase

protected final int readUntilIgnoreCase(int start,
                                        java.lang.String startString,
                                        java.lang.String endString)
Read the characters until the concatenated start and end substring is found. The end substring is matched ignoring case considerations.
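The matching rule (start substring in exact case, immediately followed by the end substring ignoring case) can be sketched as follows. The class name ReadUntilIgnoreCaseSketch is hypothetical, the source is passed explicitly instead of being read from fSource, and the return value (index where the match begins, or -1) is an assumption, since the documentation leaves it unspecified.

```java
final class ReadUntilIgnoreCaseSketch {

    /**
     * Hypothetical sketch: finds the first position at or after {@code start} where
     * {@code startString} occurs in exact case, directly followed by {@code endString}
     * in any case. Returns that position, or -1 if there is none (assumed semantics).
     */
    static int readUntilIgnoreCase(String source, int start, String startString, String endString) {
        int total = startString.length() + endString.length();
        for (int i = Math.max(start, 0); i + total <= source.length(); i++) {
            if (source.startsWith(startString, i)
                    && source.regionMatches(true, i + startString.length(), endString, 0, endString.length())) {
                return i;
            }
        }
        return -1;
    }
}
```

For example, with startString "</" and endString "nowiki>", this sketch locates a closing tag written as </NoWiki>.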

Parameters:
startString - the start string which should be searched in exact case mode
endString - the end string which should be searched in ignore case mode
Returns:

indexOfUntilNoLetter

protected int indexOfUntilNoLetter(char testChar,
                                   int fromIndex)
Read the characters until no more letters are found or the given testChar is found. If testChar was found, return the offset position.
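The scan described here can be sketched in a few lines. This is an illustration under assumptions, not the library's code: the class name IndexOfUntilNoLetterSketch is hypothetical, the character array is passed in explicitly instead of using fSource, and testChar is checked before the letter test.

```java
final class IndexOfUntilNoLetterSketch {

    /**
     * Hypothetical sketch: walks forward from {@code fromIndex} over letters and returns
     * the offset of {@code testChar} if it is reached first; otherwise returns -1.
     */
    static int indexOfUntilNoLetter(char[] source, char testChar, int fromIndex) {
        for (int i = fromIndex; i < source.length; i++) {
            char c = source[i];
            if (c == testChar) {
                return i; // found the test character
            }
            if (!Character.isLetter(c)) {
                return -1; // a non-letter ended the run before testChar appeared
            }
        }
        return -1; // end of input reached without finding testChar
    }
}
```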

Parameters:
testChar - the test character
fromIndex - read from this offset
Returns:
-1 if the character could not be found or no more letter characters were found.


Copyright © 2012 Java Wikipedia API (Bliki engine). All Rights Reserved.