Functions and Data Definition in TEXminer 1.0


TEXminer analyzes Texts in Unicode Format. Save your Text in Unicode/UTF-8 Format so that all characters are read correctly.
The Text Database can be saved as XML, which stores the original Text, the Sentence and Word Lists, and additional Parameters (e.g. Abbreviations).
The Functionality will be extended in the coming months.

Overview




Quick Start

The program is easy to use with the sample Text Databases (see Installation at the end of this document):

TEXminer has Abbreviations Lists for the following Languages (they are kept in the Data Dictionary, where you can also create your own List):

The Analysis Functionality covers the following Topics:



Building a Text Database

A Text Database consists of a Sentence List and a Word List. You can load more than one Text and build a common Text Database. The Sentence List and the Word List each have a Column that indicates the Text ID. The Word List counts how often each Word Form occurs instead of storing duplicates.
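
The structure described above can be sketched in a few lines of Python. This is only an illustration, not TEXminer's VB.NET code: the class name `TextDatabase`, its methods, the naive sentence split, and the choice to count Word Forms per Text ID are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TextDatabase:
    """Hypothetical model of a Text Database (not TEXminer's own type)."""
    # Sentence List rows: (text_id, sentence)
    sentences: list = field(default_factory=list)
    # Word List rows: [text_id, word_form, count]
    words: list = field(default_factory=list)

    def add_text(self, text_id, text):
        """Add one Text: split it into sentences, then into words."""
        # Naive split after '.', '!' or '?'; abbreviations are ignored here.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if not sentence:
                continue
            self.sentences.append((text_id, sentence))
            for raw in sentence.split():
                word = raw.strip(".,;:!?()[]\"'")
                if word:
                    self._count_word(text_id, word)

    def _count_word(self, text_id, word):
        """Increase the count of a known Word Form or add a new row."""
        # Assumption: counts are kept separately for each Text ID.
        for row in self.words:
            if row[0] == text_id and row[1] == word:
                row[2] += 1
                return
        self.words.append([text_id, word, 1])


# Two Texts loaded into one common Text Database
db = TextDatabase()
db.add_text(0, "The cat sat. The dog sat too.")
db.add_text(1, "A cat ran.")
```

After these two calls, `db.sentences` holds three rows and `db.words` holds one row per distinct Word Form and Text ID, so "cat" appears once for Text 0 and once for Text 1.
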
To build a Text Database for one Text:



Additional Parameters

The most important Additional Parameters are the Abbreviations Lists. They are needed to segment the Sentences correctly.
These Lists only need Abbreviations that end with a Full Stop: if such an Abbreviation is detected in your Texts, its Full Stop does not end the Sentence. Before creating your own Lists or extending the existing ones, have a look at one of the provided Lists. They are plain Unicode Files with one Abbreviation per line (the following are the Abbreviations beginning with a/A in the English Abbreviations List; alphabetical order gives a better Overview):
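
Independently of the provided Lists, the role of an Abbreviations List during segmentation can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the sample abbreviations are made up for the example and are not taken from TEXminer's files, and the real segmentation rules may differ.

```python
def load_abbreviations(lines):
    """Build a set from an iterable of lines, one abbreviation per line."""
    # Blank lines are skipped; surrounding whitespace is removed.
    return {line.strip() for line in lines if line.strip()}


def segment_sentences(text, abbreviations):
    """Split text into sentences, ignoring full stops of abbreviations."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_with_stop = token.endswith((".", "!", "?"))
        # A known abbreviation such as "Dr." does not end the sentence.
        if ends_with_stop and token not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing text without a final full stop still forms a sentence.
        sentences.append(" ".join(current))
    return sentences


# Sample entries chosen for illustration only
abbrevs = load_abbreviations(["approx.", "Dr.", "", "e.g."])
text = "Dr. Smith waited approx. two hours. Then he left."
result = segment_sentences(text, abbrevs)
```

Here `result` contains two sentences, because the Full Stops in "Dr." and "approx." are not treated as sentence ends. To read a real List, an open file can be passed directly, for example `load_abbreviations(open(path, encoding="utf-8"))`, where `path` is a placeholder for the file location.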



Storing and Loading a Text Database

A Text Database can be saved in a TEXminer-specific XML Document, so after loading it again you can start analyzing right away (Analysis Results are not stored).
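
The TEXminer XML schema is not described in this document. Purely to illustrate the idea of storing the original Text, the Lists, and the Abbreviations in one XML Document and reading them back, here is a Python sketch in which every element and attribute name is a hypothetical choice, not the actual TEXminer format.

```python
import xml.etree.ElementTree as ET


def database_to_xml(original_text, sentences, words, abbreviations):
    """Serialize the parts of a Text Database to an XML string."""
    root = ET.Element("textdatabase")  # hypothetical root element
    ET.SubElement(root, "text").text = original_text
    sents_el = ET.SubElement(root, "sentences")
    for sentence in sentences:
        ET.SubElement(sents_el, "sentence").text = sentence
    words_el = ET.SubElement(root, "words")
    for form, count in words:
        # The count is stored as an attribute of the word element.
        ET.SubElement(words_el, "word", count=str(count)).text = form
    abbr_el = ET.SubElement(root, "abbreviations")
    for abbreviation in abbreviations:
        ET.SubElement(abbr_el, "abbreviation").text = abbreviation
    return ET.tostring(root, encoding="unicode")


def database_from_xml(xml_string):
    """Read the same parts back from the XML string."""
    root = ET.fromstring(xml_string)
    return {
        "text": root.findtext("text"),
        "sentences": [e.text for e in root.iter("sentence")],
        "words": [(e.text, int(e.get("count"))) for e in root.iter("word")],
        "abbreviations": [e.text for e in root.iter("abbreviation")],
    }


xml_doc = database_to_xml(
    "Dr. Who left. He ran.",
    ["Dr. Who left.", "He ran."],
    [("Dr", 1), ("Who", 1)],
    ["Dr."],
)
restored = database_from_xml(xml_doc)
```

As in TEXminer, nothing derived from an analysis is written; only the data needed to resume work is stored.
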
Storing and loading a Text Database is easy:



Analyzing Texts

Analyzing the Text is the main purpose of TEXminer, and this Functionality will be extended in the future. Here is a List of the Analysis Functionality:

Now all Functions in Detail:

Searching a Word

...
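
As a rough illustration of what a Word Search over the Sentence List can look like, the following Python sketch returns the indexes of all Sentences that contain a given Word. The function name and the case-insensitive whole-word matching are assumptions for the example; TEXminer's own search may behave differently.

```python
import re


def search_word_in_sentences(word, sentences):
    """Return the indexes of all sentences that contain the word."""
    # \b keeps "cat" from matching inside "Cats"; case is ignored.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [i for i, s in enumerate(sentences) if pattern.search(s)]


sentences = ["The cat sat.", "A dog barked.", "Cats sleep; the cat purrs."]
hits = search_word_in_sentences("cat", sentences)
```

The returned indexes could then be used to mark the matching Sentences for display.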



Adaptation and Configuration

So far, no Adaptation or Configuration is needed.



Implemented Functions

This table gives an overview of the Functions implemented in the VB.NET Source Code:


Name                   Parameters                                Return Type  Explanation
---------------------  ----------------------------------------  -----------  ----------------------------------------------
initTEXdatasets        var As Integer                            Boolean      Initialization of Text-Dataset List
initTEXabbreviations   var As Integer                            Boolean      Initialization of Text-Abbreviations List
initTEXdatabase        var As Integer                            Boolean      Initialization of Text-Database Lists
minOfTwo               first As Long, second As Long             Long         Minimum of two Values
maxOfTwo               first As Long, second As Long             Long         Maximum of two Values
readSerialXML          fSerial As String                         Boolean      Load from XML Serialisation
saveSerialXML          fSerial As String                         Boolean      Save to XML Serialisation
readASCII              ASCIIfile As String                       Boolean      read Unicode/UTF8 File
readAbbreviations      fSerial As String                         Boolean      read Language-specific Abbreviations
saveSortedWlist        HTMLfile As String                        Boolean      save HTML File
displayTextDatasets    var As Integer                            Boolean      display all Text-Datasets in TabView
buildTextDatabase      var As Integer                            Long         build Text-Database
segmentToSentences     dsIndex As Integer                        Long         segment Text-Dataset dsIndex into Sentences
checkAbbreviation      tstSentence As String                     Boolean      check if Sentence ends with an Abbreviation
checkNumber            tstSentence As String                     Boolean      check if Sentence ends with a Number
segmentToWords         dsIndex As Integer                        Long         segment Text-Dataset dsIndex into Words
refineWord             rawWord As String                         String       refine raw Word (trimming)
testWordInDB           tstWord As String                         Long         test if Word is in Database
displayTextDatabase    var As Integer                            Boolean      display Text-Database in TabViews
getWordDSindex         wordIndex As Long                         Long         give Dataset Index for Word Index
searchWordInSents      tstWord As String, sentsDSind As Integer  Long         search Word in Sentences
displayMarkedSentence  var As Integer                            Boolean      display Marked Sentences (e.g. by Word Search)
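
The table gives only one-line descriptions, so the exact behavior of the helpers is not known from this document. As a hedged illustration, two of them, refineWord and checkNumber, might work roughly like the Python sketch below. The trimming rules and the interpretation of "ends with a Number" (a number directly before the final Full Stop) are assumptions, and the code is not a translation of the VB.NET source.

```python
import string


def refine_word(raw_word):
    """Trim punctuation and whitespace from both ends of a raw word."""
    return raw_word.strip(string.punctuation + string.whitespace)


def check_number(sentence):
    """Return True if the last token before a final full stop is a number."""
    stripped = sentence.rstrip()
    if not stripped.endswith("."):
        return False
    tokens = stripped[:-1].split()
    # Assumption: only plain digit sequences such as "3" count as numbers.
    return bool(tokens) and tokens[-1].isdigit()


cleaned = refine_word('"Hello,"')
ends_numeric = check_number("He arrived on the 3.")
```

A check like this can matter for segmentation, since a Full Stop after a number does not necessarily end the Sentence.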



Installation and Start

Requirements

Installation

Start




Status: Beta Version 1.0 / Nov 2012 by gearwheelsoft