public class TextSearch
extends java.lang.Object
TextSearch searches through a PDF document for a user-given search pattern.
The current implementation supports both verbatim search and search using
regular expressions, whose detailed syntax can be found at:
TextSearch also provides several search modes and can return extra
information besides the found string that matches the pattern. TextSearch can
either keep running until a matched string is found or be set to return
periodically so that the caller can perform any necessary updates (e.g.,
UI updates). The search modes can also be changed on the fly while searching
through a document.
Possible use case scenarios for TextSearch include:
Highlights
class for details)
from files for external use.
Note: Hyphens ('-') are frequently used in PDF documents to join the two
pieces of a word that is broken at the end of a line. For example, in
"TextSearch is powerful for finding patterns in PDF files; yes, it is really
pow- erful."
a search for "powerful" should return both instances. However, not all
end-of-line hyphens were added to connect a broken word; some of them
could be "real" hyphens. In addition, an input search pattern may itself
contain hyphens, which complicates the situation. To tackle this problem, the
following conventions are adopted:
A sample use case is shown below (for a full sample, see the TextSearch sample project):
// ... Initialize PDFNet ...
PDFDoc doc = new PDFDoc(filein);
doc.initSecurityHandler();
int mode = TextSearch.e_whole_word | TextSearch.e_page_stop;
String pattern = "joHn sMiTh";
TextSearch txt_search = new TextSearch();
// PDFDoc doesn't allow simultaneous access from different threads. If this
// document could be used from other threads (e.g., the rendering thread inside
// PDFView/PDFViewCtrl, if used), it is good practice to lock it.
// Notice: don't forget to call doc.unlock() to avoid deadlock.
doc.lock();
txt_search.begin( doc, pattern, mode, -1, -1 );
while ( true )
{
    TextSearchResult result = txt_search.run();
    if ( result.getCode() == TextSearchResult.e_found )
    {
        System.out.println("found one instance: " + result.getResultStr());
    }
    else
    {
        break;
    }
}
// Unlock the document to avoid deadlock.
doc.unlock();
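The sample above leaves its loop on the first result that is not a match. Since e_page_stop makes the search return whenever a page is finished, a caller that wants every match in the document would presumably keep looping on page-boundary results and stop only when the search reports completion. The following is a minimal, self-contained sketch of that loop shape. It does not use PDFNet: the class PagedSearchSketch, its Code enum, and the case-insensitive substring matching are hypothetical stand-ins chosen only to make the control flow runnable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a page-by-page searcher; NOT part of PDFNet. */
class PagedSearchSketch {
    /** Illustrative result codes: finished, page boundary reached, match found. */
    enum Code { DONE, PAGE, FOUND }

    static final class Result {
        final Code code;
        final int page;     // 1-based page number, 0 when the search is done
        final String text;  // matched text, null unless code == FOUND
        Result(Code code, int page, String text) {
            this.code = code;
            this.page = page;
            this.text = text;
        }
    }

    private final List<String> pages;
    private final String needle;
    private int pageIndex = 0;  // 0-based index of the page being searched
    private int offset = 0;     // where to resume inside the current page

    PagedSearchSketch(List<String> pages, String needle) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        this.pages = pages;
        this.needle = needle.toLowerCase(Locale.ROOT);
    }

    /** Returns the next match, or a PAGE result when a page is exhausted. */
    Result run() {
        if (pageIndex >= pages.size()) {
            return new Result(Code.DONE, 0, null);
        }
        String original = pages.get(pageIndex);
        int hit = original.toLowerCase(Locale.ROOT).indexOf(needle, offset);
        if (hit >= 0) {
            offset = hit + needle.length();
            return new Result(Code.FOUND, pageIndex + 1, original.substring(hit, offset));
        }
        pageIndex++;   // move on to the next page
        offset = 0;
        return new Result(Code.PAGE, pageIndex, null);
    }

    /** Drives the search to completion, continuing across page boundaries. */
    static List<String> collect(PagedSearchSketch search) {
        List<String> hits = new ArrayList<>();
        while (true) {
            Result r = search.run();
            if (r.code == Code.FOUND) {
                hits.add(r.page + ":" + r.text);
            } else if (r.code == Code.PAGE) {
                continue;  // page finished: a UI could be refreshed here
            } else {
                break;     // DONE: nothing left to search
            }
        }
        return hits;
    }
}
```

With real TextSearch code the same structure would apply to the codes returned by result.getCode(); which constant marks a page boundary should be taken from the TextSearchResult documentation rather than from this sketch.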
Modifier and Type | Field and Description |
---|---|
static int | e_ambient_string - Tells the search process to compute the ambient string of the found pattern. |
static int | e_case_sensitive - Match case-sensitively. |
static int | e_highlight - Tells the search process to compute Highlight information. |
static int | e_page_stop - Tells the search process to return when each page is finished; this is useful when a user needs run() to return periodically so that certain things (e.g., UI) can be updated from time to time. |
static int | e_reg_expression - Use regular expressions. |
static int | e_search_up - Search upward (from the end of the file and from the bottom of a page). |
static int | e_whole_word - Match the entire word. |
Constructor and Description |
---|
TextSearch() - Constructor. |
Modifier and Type | Method and Description |
---|---|
boolean | begin(PDFDoc doc, java.lang.String pattern, int mode, int start_page, int end_page) - Initialize the search process. |
void | destroy() - Frees the native memory of the object. |
int | getCurrentPage() - Retrieve the number of the page currently being searched. |
int | getMode() - Retrieve the current search mode. |
TextSearchResult | run() - Search the document. |
void | setMode(int mode) - Set the current search mode. |
boolean | setPattern(java.lang.String pattern) - Set the current search pattern. |
void | setRightToLeftLanguage(boolean flag) - Tells TextSearch that the document reads from right to left. |
public static final int e_reg_expression
public static final int e_case_sensitive
public static final int e_whole_word
public static final int e_search_up
public static final int e_page_stop
public static final int e_highlight
public static final int e_ambient_string
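The mode constants above are combined with bitwise OR in the sample code, which suggests they are independent bit flags. Under that assumption, a mode can be switched on, switched off, or tested without disturbing the other modes. The sketch below illustrates the arithmetic only: the class SearchModeFlags and its bit values are placeholders invented for this example, and real code should use the TextSearch constants rather than these numbers.

```java
/** Illustrative bit-flag helpers; the values below are placeholders, NOT the TextSearch constants. */
class SearchModeFlags {
    static final int REG_EXPRESSION = 1 << 0;  // placeholder value
    static final int CASE_SENSITIVE = 1 << 1;  // placeholder value
    static final int WHOLE_WORD     = 1 << 2;  // placeholder value
    static final int PAGE_STOP      = 1 << 3;  // placeholder value

    /** Returns mode with the given flag switched on. */
    static int enable(int mode, int flag) {
        return mode | flag;
    }

    /** Returns mode with the given flag switched off; other flags are untouched. */
    static int disable(int mode, int flag) {
        return mode & ~flag;
    }

    /** Returns true when every bit of flag is set in mode. */
    static boolean isSet(int mode, int flag) {
        return (mode & flag) == flag;
    }
}
```

In real code the same expressions would be applied to the value returned by getMode() and the result passed back through setMode(int), for example to drop e_page_stop while keeping e_whole_word.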
public void destroy()
public boolean begin(PDFDoc doc, java.lang.String pattern, int mode, int start_page, int end_page)
Parameters:
doc - the PDF document to search in.
pattern - the pattern to search for. When regular expressions are used, it contains the expression; in verbatim mode, it is the exact string to search for.
mode - the mode of the search process.
start_page - the start page of the page range to search in. -1 indicates the range starts from the first page.
end_page - the end page of the page range to search in. -1 indicates the range ends at the last page.
Returns:
true if the initialization has succeeded.
public TextSearchResult run()
public boolean setPattern(java.lang.String pattern)
Set the current search pattern. The pattern is already set by the begin() method; this method is provided for users to change the search pattern while searching through a document.
Parameters:
pattern - the search pattern to set.
Returns:
true if the setting has succeeded.
public int getMode()
public void setMode(int mode)
TextSearch ts = new TextSearch();
int mode = ts.getMode();
mode |= TextSearch.e_reg_expression;
ts.setMode(mode);
Parameters:
mode - the search mode to set.
public int getCurrentPage()
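Returns:
the number of the current page. If the returned value is negative, the search process has not been initialized (e.g., begin() is not called yet); if the returned value is 0, the search process has finished; and if the returned value is positive, it is a valid page number.
public void setRightToLeftLanguage(boolean flag)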