cox.jmatt.java.MathTools.util
Class NoteworthyParser

java.lang.Object
  extended by cox.jmatt.java.MathTools.util.NoteworthyParser

public class NoteworthyParser
extends java.lang.Object

This class is designed to take a Noteworthy-formatted document and convert it to HTML. To use the parser, first create an instance. Next, create and configure a MathGenHTML for the parser to use; this is handy if some of the generated tags need default attributes set. Finally, feed the parse() method a newline-separated String of Noteworthy-formatted text. Unless the universe collapses, the method will return an HTMLTag containing the HTML-formatted document, which can then be manipulated as needed.

The Noteworthy Markup Format

Noteworthy is a set of rules and a parser designed to convert plain text lecture notes into HTML documents. Specifically, it allows both unparsed notes and parsed markup to be mixed in a single document. The basic parse-marker is the double quote (N.B. this is the single-key double-quote character, not a double apostrophe!) followed by a special formatting character. The parser extracts these lines, converts them to HTML, and returns them in an HTMLTag instance. Unquoted lines are ignored.

Noteworthy input is, as stated previously, plain text. Lines that begin with special (and logical) formatting characters are formatted per their function. Other parsed lines are assumed to be part of a paragraph, division, or directly within the <body> of the document. Some nesting of tag types is possible but it is tricky at best.

Any tag, list, paragraph or division can be closed with a single blank line. One blank line closes the closest previous open 'thing.' So if a list follows a paragraph the first blank line closes the list. In the case of lists, though, any line starting with a non-list character terminates the list. If the list immediately follows an open paragraph (no blank lines between) it will be included in the paragraph. The best way to gain a feel for this is to experiment!

Noteworthy Structure

The structural philosophy of Noteworthy is fairly rigid. The ultimate top-level container is the <body> tag. All elements are contained within it and it doesn't close until the end of the document. The body can hold divisions, paragraphs, or lists. Divisions can hold paragraphs and lists. Heading elements of any level can be included in the body or a division.

Opening a new division ('<div>' tag) closes the previous division, if one was defined, which automatically closes all other open things within it. A division can also be explicitly closed by issuing the close all command: '!'.
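
The closing rules above behave like a stack of open tags with <body> at the bottom. The following is only a minimal sketch of that idea, not the parser's actual code: the class OpenTagStackSketch and its method names are hypothetical, and the extra rule that a non-list line terminates a list is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; NoteworthyParser's real internals are not documented here.
final class OpenTagStackSketch {
    private final Deque<String> open = new ArrayDeque<>();

    OpenTagStackSketch() {
        open.push("body"); // <body> stays open until the end of the document
    }

    /** Record that a new tag has been opened inside the current one. */
    void open(String tag) {
        open.push(tag);
    }

    /** A blank line closes the closest previous open 'thing', but never <body>. */
    void blankLine() {
        if (open.size() > 1) {
            open.pop();
        }
    }

    /** The '!' command closes everything back to <body>. */
    void closeAll() {
        while (open.size() > 1) {
            open.pop();
        }
    }

    /** The tag that would receive the next line of content. */
    String current() {
        return open.peek();
    }
}
```

With a division, a paragraph and a list open, one blank line leaves the paragraph as the current tag; a following '!' leaves only <body>.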

The text structure is very simple. Special characters must be the first non-whitespace characters on the line; since all lines are trimmed before parsing, leading indentation does not matter. The data portion of the line, which is anything after a special character or the entire line for paragraphs, is also trimmed before it is used. This permits hard-formatting with spaces or tabs, which encourages organized and readable input.

Headings are preceded by one or more hash marks ('#'). The number of these characters indicates the heading level. Any hash marks at the end of the line are ignored and stripped before the heading is created, so closing hashes are optional; a symmetric heading is simply more visually appealing.
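
The heading rule can be sketched in a few lines. This is an illustration under assumptions, not the parser's implementation: HeadingSketch is a hypothetical helper, and it expects a line from which the leading double-quote marker has already been removed.

```java
// Hypothetical helper, not part of NoteworthyParser.
final class HeadingSketch {
    /** Counts the leading '#' characters of the trimmed line; this is the heading level. */
    static int level(String line) {
        String s = line.trim();
        int n = 0;
        while (n < s.length() && s.charAt(n) == '#') {
            n++;
        }
        return n;
    }

    /** Returns the heading text with leading and trailing hash marks stripped. */
    static String text(String line) {
        String s = line.trim();
        int start = level(s);
        int end = s.length();
        while (end > start && s.charAt(end - 1) == '#') {
            end--;
        }
        return s.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String line = "## Limits ##";
        System.out.println(level(line) + ": " + text(line));
    }
}
```

Both "## Limits ##" and "## Limits" give level 2 and the text "Limits" here, which matches the statement that trailing hashes are optional.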

Tables

Tables are started with a single pipe character, '|'. The first such line indicates that a table is to be created; subsequent lines become the rows of the table. Table cell elements ('<td>') are separated by a double pipe with no space between the two pipes. If the first line of the table contains cell elements they become the headers ('<th>'); otherwise the table has none. Subsequent rows all describe standard cells within the table, '<td>'.
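
Splitting a table line into cells might look roughly like the sketch below. TableRowSketch is a hypothetical helper, and it assumes the double-quote marker has already been removed, so that a row looks like '| x || f(x)'. The real parser may differ in details such as how empty cells are treated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of NoteworthyParser.
final class TableRowSketch {
    /** Splits one table line into trimmed cells; a bare '|' yields no cells. */
    static List<String> cells(String line) {
        String s = line.trim();
        if (s.startsWith("|")) {
            s = s.substring(1); // drop the single pipe that marks a table line
        }
        List<String> out = new ArrayList<>();
        if (s.trim().isEmpty()) {
            return out; // first line without cells: the table has no headers
        }
        // Cells are separated by a double pipe; -1 keeps trailing empty cells.
        for (String cell : s.split("\\|\\|", -1)) {
            out.add(cell.trim());
        }
        return out;
    }
}
```

Under these assumptions the cells of the first table line would become '<th>' elements and the cells of every later line would become '<td>' elements.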

ID Attributes

All major structural elements within the document are assigned 'id=' attributes automatically: <div>, <p>, <ol>, <ul> and <dl>. Other tags do not receive IDs automatically. The IDs are very straightforward: tag type plus a count appended to the end. For example the second paragraph in the document receives the ID 'par2'.
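
The ID scheme amounts to one counter per tag type. A minimal sketch, assuming a hypothetical IdCounterSketch class: only the 'par' prefix is taken from the description above, and any other prefix (such as 'div' in the usage note) is a guess.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "tag type plus a count"; not the parser's code.
final class IdCounterSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Returns the next ID for the given prefix, counting from 1. */
    String next(String prefix) {
        int n = counts.merge(prefix, 1, Integer::sum);
        return prefix + n;
    }
}
```

Calling next("par") twice yields "par1" and then "par2"; a first call to next("div") would yield "div1", since each prefix counts independently.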

Raw Mode

When the automatic behavior simply will not suffice, raw mode is available. Raw mode is toggled by three equal signs alone on a line, '==='. Everything from the line after raw mode is activated to the line before it ends is added to the document without formatting. The lines are trimmed and any blank lines are ignored, but any non-blank lines are included exactly as written.
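
The toggle behaviour can be sketched as a single boolean flag. RawModeSketch is a hypothetical helper that only collects the raw lines. It ignores the double-quote marker and all other formatting, so treat it as an illustration of the rule rather than the parser's logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of NoteworthyParser.
final class RawModeSketch {
    /** Returns the trimmed, non-blank lines found between '===' toggles. */
    static List<String> rawLines(String input) {
        List<String> raw = new ArrayList<>();
        boolean inRaw = false;
        for (String line : input.split("\n")) {
            String s = line.trim();
            if (s.equals("===")) {
                inRaw = !inRaw;        // the toggle line itself is never emitted
            } else if (inRaw && !s.isEmpty()) {
                raw.add(s);            // trimmed, otherwise exactly as written
            }
        }
        return raw;
    }
}
```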

Formatting Characters

The special formatting characters and their actions are:

Character             Action
/ or /#...            Open a division. This closes all open tags back to <body> and starts a fresh division.
                      The second version ('/#') opens a division and includes a header of the given level.
"0, "1, "2, "3, "4,   Start an ordered list or add an item to an existing list or a <dd> to a definition list.
"5, "6, "7, "8, "9
"*, "., "+, "-        Start an unordered list or add an item to an existing list or a <dd> to a definition list.
":                    Start a definition list or add a <dt> to an existing one.
"#                    Add a heading tag to any eligible element. The number of hashes indicates the heading level.
"|                    Start a table or add a row to an existing one.
                      Individual elements are double-pipe separated.
                      The first row (if defined) becomes the table headers, subsequent rows are <td> elements.
"(blank line)         A blank line closes the last tag opened.
"=                    A single equals sign is used to escape a line that begins with an otherwise-significant character.
                      To begin a line with an actual equals sign, double it! To start a line with a double-equal use '= =='.
"!                    Forcibly close all tags back to the <body> tag.
"{                    Execute a macro command (see below).
"===                  Toggle raw mode. In raw mode all content is added as-is to whatever tag is currently open.
                      Null or blank lines are ignored and the lines added are trimmed.
                      The triple-equals must be the only thing on the line.
Any other character   Other characters either form <p> tags or add directly to existing <p> or <div> tags.

Ordered, unordered and definition lists opened immediately beneath a paragraph (no blank lines between) are included in the paragraph. Lines beginning immediately after this are also included in the same paragraph. Whitespace before and after the formatting characters is trimmed as is any at the end of each line. Where necessary space is replaced.
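
The equals-sign escape from the table above is easy to misread, so here is one possible reading as code. This is an interpretation, not confirmed behaviour: the hypothetical EscapeSketch helper drops one leading '=' and trims the remainder, and it leaves the raw-mode toggle '===' alone.

```java
// Hypothetical helper showing one reading of the '=' escape rule.
final class EscapeSketch {
    /** Removes a single leading '=' escape and trims what follows. */
    static String unescape(String line) {
        String s = line.trim();
        if (s.startsWith("=") && !s.equals("===")) {
            return s.substring(1).trim();
        }
        return s; // not escaped, or the raw-mode toggle
    }
}
```

Under this reading '=# not a heading' becomes the literal text '# not a heading', '== 2 + 2' becomes '= 2 + 2', and '= ==' becomes '=='.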

Macro Commands

Macro commands are directives that do not specifically affect formatting. They may affect processing or inline configuration.

All macro commands consist of a left curly brace followed (no space!) by one or more other letters or symbols. Some commands have arguments; these are separated from the command itself by at least one space. The closing brace is not required. Available macro commands are:

Command          Result
"{t Page Title   Set the title of the HTML document; HTMLTag setTitle().
"{i fileName     Copy the contents of an external file into the document (rawly!).
"{g fileName     The first version uses a standard FileReader and the second uses getResourceAsStream().
                 Any exceptions are reported at Error level.

Macro commands are still considered standard directives; they will not be processed in raw mode!
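
Separating a macro line into its command and argument could be done along these lines. MacroSketch is a hypothetical helper and assumes the double-quote marker has already been removed. Treating a trailing '}' as an optional closing brace is also an assumption, based on the remark that the closing brace is not required.

```java
// Hypothetical helper, not part of NoteworthyParser.
final class MacroSketch {
    /** Returns {command, argument}; the argument is "" when none is given. */
    static String[] split(String line) {
        String s = line.trim();
        if (!s.startsWith("{")) {
            throw new IllegalArgumentException("not a macro line: " + line);
        }
        s = s.substring(1);
        if (s.endsWith("}")) {
            s = s.substring(0, s.length() - 1); // optional closing brace
        }
        // The command follows the brace directly; arguments come after whitespace.
        String[] parts = s.split("\\s+", 2);
        String argument = parts.length > 1 ? parts[1].trim() : "";
        return new String[] { parts[0], argument };
    }
}
```

For '{t Page Title' this gives the command 't' and the argument 'Page Title'; for '{i notes.txt}' (a made-up file name) it gives 'i' and 'notes.txt'.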


Constructor Summary
NoteworthyParser()
          Standard constructor used for scripting and other instances.
 
Method Summary
 HTMLTag parse(java.lang.String pNoteworthy)
          This is the reason this class exists.
 void reset()
          Reset the parser to its initial state.
 void setGenerator(MathGenHTML pGen)
          Set the MathGenHTML instance used to generate tag classes.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NoteworthyParser

public NoteworthyParser()
Standard constructor used for scripting and other instances.

Method Detail

setGenerator

public void setGenerator(MathGenHTML pGen)
Set the MathGenHTML instance used to generate tag classes. Null clears it and one will be generated automatically when needed.


parse

public HTMLTag parse(java.lang.String pNoteworthy)
This is the reason this class exists. It takes a String of Noteworthy data and converts it into HTML. If the string sent in is null or blank it is returned unaltered.

Parameters:
pNoteworthy - The Noteworthy-formatted data String.
Returns:
An HTMLTag containing the proper markup.

reset

public void reset()
Reset the parser to its initial state. All settings return to default values.