Public Member Functions |
| GTokenizer (const char *szFilename) |
| Opens the specified filename.
|
| GTokenizer (const char *pFile, size_t len) |
| Uses the provided buffer of data. (If len is 0, then it will read until a null-terminator is found.)
|
| ~GTokenizer () |
GCharSet & | charSet (const char *szChars) |
| Returns a GCharSet. Many of the methods in this class require a GCharSet as a parameter. You get it by calling this method. szChars is an un-ordered set of characters (with no separator between them). The only special character is '-', which is used to indicate a range of characters if it is not the first character in the string. (So, if you want '-' in your set of characters, it should come first.) For example, the following string includes all letters: "a-zA-Z", and the following string includes all characters that might appear in a floating-point number: "-.,0-9e". (There is no way to include '\0' as a character in the set, since that character indicates the end of the string, but that is okay since '\0' should not occur in text files anyway, and this class is designed for parsing text files.)
|
char | peek () |
| Returns the next character in the stream. Returns '\0' if there are no more characters in the stream. (This could theoretically be ambiguous if the next character in the stream is '\0', but presumably this class is mostly used for parsing text files, and that character should not occur in a text file.)
|
char * | nextUntil (GCharSet &delimeters, size_t minLen=1) |
| Reads until the next character would be one of the specified delimiters. The delimiter character is not read. Throws an exception if fewer than minLen characters are read. The token returned by this method is copied into an internal buffer and null-terminated, and a pointer to that buffer is returned.
|
char * | nextUntilNotEscaped (char escapeChar, GCharSet &delimeters) |
| Reads until the next character would be one of the specified delimiters, and the current character is not escapeChar. The token returned by this method is copied into an internal buffer and null-terminated, and a pointer to that buffer is returned.
|
char * | nextWhile (GCharSet &set, size_t minLen=1) |
| Reads while the character is one of the specified characters. Throws an exception if fewer than minLen characters are read. The token returned by this method will have been copied into an internal buffer, null-terminated, and a pointer to that buffer is returned.
|
char * | nextArg (GCharSet &delimiters, char escapeChar= '\\') |
| Returns the next token defined by the given delimiters.
|
void | skip (GCharSet &delimeters) |
| Reads past any characters specified in the list of delimiters. If szDelimeters is NULL, then any characters <= ' ' are considered to be delimiters. (This method is similar to nextWhile, except that it does not buffer the characters it reads.)
|
void | skipTo (GCharSet &delimeters) |
| Skips until the next character is one of the delimiters. (This method is the same as nextUntil, except that it does not buffer what it reads.)
|
void | advance (size_t n) |
| Advances past the next 'n' characters. (Stops if the end-of-file is reached.)
|
void | expect (const char *szString) |
| Reads past the specified string of characters. If the characters that are read from the file do not exactly match those in the string, an exception is thrown.
|
char * | trim (GCharSet &set) |
| Returns the previously-returned token, except with any of the specified characters trimmed off of both the beginning and end of the token. For example, if the last token that was returned was " tok ", then this will return "tok". (Calling this method will not change the value returned by tokenLength.)
|
size_t | line () |
| Returns the current line number. (Begins at 1. Each time a '\n' is encountered, the line number is incremented. Mac line-endings do not increment the line number.)
|
size_t | col () |
| Returns the current column index, which is the number of characters that have been read since the last newline character, plus 1.
|
size_t | remaining () |
| Returns the number of remaining bytes to be read from the file.
|
size_t | tokenLength () |
| Returns the length of the last token that was returned.
|
Protected Member Functions |
void | growBuf () |
char | get () |
void | bufferChar (char c) |
Protected Attributes |
GHeap * | m_pHeap |
std::map< const char *, GCharSet *, GTokenizerMapComparer > | m_charGroups |
char * | m_pBufStart |
char * | m_pBufPos |
char * | m_pBufEnd |
std::istream * | m_pStream |
size_t | m_lineStart |
size_t | m_len |
size_t | m_line |
This is a simple tokenizer that reads a file one token at a time.
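To make the documented behaviour of peek, nextUntil, nextWhile, line, col and remaining concrete, here is a minimal sketch of a tokenizer with the same method names. The class MiniTokenizer is hypothetical: it works on an in-memory std::string, uses a plain std::string in place of a GCharSet, returns std::string references instead of char pointers, and throws std::runtime_error (the exception type used by GTokenizer is not stated in this page).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical miniature tokenizer. It is NOT the GTokenizer implementation;
// it only mirrors the behaviour described in the member documentation.
class MiniTokenizer
{
public:
    explicit MiniTokenizer(std::string text) : m_text(std::move(text)) {}

    // Next character without consuming it, or '\0' at end of input.
    char peek() const { return m_pos < m_text.size() ? m_text[m_pos] : '\0'; }

    // Reads until the next character would be a delimiter; the delimiter
    // itself is left unread.
    const std::string& nextUntil(const std::string& delimiters, std::size_t minLen = 1)
    {
        m_token.clear();
        while (m_pos < m_text.size() && delimiters.find(m_text[m_pos]) == std::string::npos)
            m_token += get();
        return checked(minLen);
    }

    // Reads while the next character belongs to the given set.
    const std::string& nextWhile(const std::string& set, std::size_t minLen = 1)
    {
        m_token.clear();
        while (m_pos < m_text.size() && set.find(m_text[m_pos]) != std::string::npos)
            m_token += get();
        return checked(minLen);
    }

    std::size_t line() const { return m_line; }                     // starts at 1
    std::size_t col() const { return m_pos - m_lineStart + 1; }     // characters since last '\n', plus 1
    std::size_t remaining() const { return m_text.size() - m_pos; }

private:
    // Consumes one character and updates the line bookkeeping on '\n'.
    char get()
    {
        const char c = m_text[m_pos++];
        if (c == '\n')
        {
            ++m_line;
            m_lineStart = m_pos;
        }
        return c;
    }

    // Enforces the minLen contract shared by nextUntil and nextWhile.
    const std::string& checked(std::size_t minLen) const
    {
        if (m_token.size() < minLen)
            throw std::runtime_error("token shorter than minLen");
        return m_token;
    }

    std::string m_text;
    std::string m_token;          // plays the role of the internal token buffer
    std::size_t m_pos = 0;
    std::size_t m_line = 1;
    std::size_t m_lineStart = 0;
};
```

For the input "ab=12\ncd", calling nextUntil("=") on this sketch yields "ab" and leaves '=' as the next character, so col() reports 3. Only a '\n' advances line(), matching the note that Mac line-endings do not increment the line number.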
char* GClasses::GTokenizer::nextArg (GCharSet & delimiters, char escapeChar = '\\')
Returns the next token delimited by the given delimiters. (The default delimiters are white-space or {.)
Allows quoting with " or ' and escaping with an escape character.
The token may include delimiters if it is enclosed in quotes or the delimiters are escaped.
If the next token begins with single or double quotes, then the token will be delimited by the quotes. If a newline character or the end-of-file is encountered before the matching quote, then an exception is thrown. The quotation marks are not included in the token, but they are consumed by the operation. The escape character is ignored inside quotes - unlike what would happen in C++.
If the first character of the token is not a quotation mark, then the escape character is used. If an escape character precedes any character, then that character is included in the token. The escape character is consumed but not included in the token. Thus, if the input is (The \rain\ in "spain") (not including the parentheses) and the escapeChar is '\', then the token read will be (The \ in "spain").
No token may extend over multiple lines, so the new-line character acts as an unescapable delimiter, no matter what set of delimiters is passed to the function.
- Parameters:
delimiters | the set of delimiters used to separate tokens |
escapeChar | the character that can be used to escape delimiters when quoting is not active |
- Returns:
a pointer to an internal character buffer containing the null-terminated token
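The quoting and escaping rules above can be sketched as a stand-alone function. The name nextArgSketch, the position parameter pos, the use of std::string for the delimiter set and the std::runtime_error exception are all assumptions for illustration; this is not the GTokenizer code. The sketch also assumes that leading delimiters are not skipped, and that an escape character directly before a newline or the end of input is consumed and ends the token, since the documentation does not cover those cases.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for nextArg that works on an in-memory string.
// `pos` is advanced past everything that was consumed.
std::string nextArgSketch(const std::string& text, std::size_t& pos,
                          const std::string& delimiters, char escapeChar = '\\')
{
    std::string token;

    // Quoted token: delimited by the matching quote, escapes are not interpreted.
    if (pos < text.size() && (text[pos] == '"' || text[pos] == '\''))
    {
        const char quote = text[pos++]; // consume the opening quote
        while (true)
        {
            if (pos >= text.size() || text[pos] == '\n')
                throw std::runtime_error("unterminated quoted token");
            const char c = text[pos++];
            if (c == quote)
                return token;           // closing quote consumed, not stored
            token += c;                 // escapeChar is an ordinary character here
        }
    }

    // Unquoted token: stops at a delimiter or at a newline.
    while (pos < text.size() && text[pos] != '\n')
    {
        const char c = text[pos];
        if (c == escapeChar)
        {
            ++pos;                      // consume the escape character
            if (pos >= text.size() || text[pos] == '\n')
                break;                  // a newline cannot be escaped
            token += text[pos++];       // keep the escaped character verbatim
            continue;
        }
        if (delimiters.find(c) != std::string::npos)
            break;                      // delimiter is left unread
        token += c;
        ++pos;
    }
    return token;
}
```

With a space as the only delimiter, the input (a\ b c) yields the token (a b) in this sketch, because the escaped space is kept while the backslash is dropped, and the input ("two words" rest) yields (two words) with both quotation marks consumed.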