Reports

The application offers two kinds of reports: those that appear in the interface, and those that are computed and written straight to a file for further analysis.

Counting Words

Perhaps the simplest report shows the frequency of each word type in the current document in a dialog. To create this report, use Reports>Count Words>Current Document.
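To make the idea of a word type frequency concrete, here is a minimal Python sketch. The tokenization rule (lowercasing and splitting on non-word characters) and the function name count_word_types are assumptions made for illustration; the application's own tokenizer may behave differently.

```python
import re
from collections import Counter

def count_word_types(text):
    """Return a Counter mapping each lowercased word type to its frequency.

    Assumption: a token is any run of word characters; the application's
    tokenizer may split text differently.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)

counts = count_word_types("The cat sat on the mat. The end.")
for word, n in counts.most_common(3):
    print(word, n)
```

Here 'the' is one word type that occurs three times, while 'cat' and 'mat' are types that occur once each.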

To get the same information for multiple documents, select the documents you are interested in and use Reports>Count Words>Multiple Documents. You will be prompted for a file, which, for resource reasons, is limited to UTF-8 encoded CSV (a widely readable spreadsheet format). The resulting file has document names as rows and word types as columns.
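The layout described here, one row per document and one column per word type, can be sketched in Python as follows. This is only an illustration of the shape of the data: the function name write_count_matrix, the header label 'document', and the alphabetical column order are assumptions, not a description of the file the application actually writes.

```python
import csv
import io
from collections import Counter

def write_count_matrix(doc_counts, out):
    """Write one row per document and one column per word type."""
    # The vocabulary is the union of word types over all documents.
    vocab = sorted(set().union(*doc_counts.values()))
    writer = csv.writer(out)
    writer.writerow(["document"] + vocab)  # header label is hypothetical
    for name, counts in doc_counts.items():
        # Word types absent from a document get a count of 0.
        writer.writerow([name] + [counts.get(w, 0) for w in vocab])

doc_counts = {
    "doc1.txt": Counter({"gun": 2, "law": 1}),
    "doc2.txt": Counter({"law": 3, "tax": 1}),
}
buffer = io.StringIO()
write_count_matrix(doc_counts, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open it with open(path, "w", encoding="utf-8", newline=""). Because every word type in any document becomes a column, the number of columns grows with the combined vocabulary, and most cells are zero.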

Be aware that these files can get big and can be difficult to deal with. If you are serious about counting lots of words, a more efficient method and representation is recommended, e.g. the one used by JFreq.

Applying the dictionary to documents

To apply the dictionary to the current document only use Reports>Apply Dictionary>Current Document. The results are shown in a dialog box.

If you are only interested in applying one part of the dictionary, select a category entry of interest before you apply the dictionary to the current document. Then only matches to that category and the ones below it will be computed. If the top level category is selected, a complete analysis will be computed.
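The effect of selecting a category can be pictured with a small Python sketch in which the dictionary is a tree and the analysis starts from whichever node is selected. Everything here is hypothetical: the nested dict representation, the function name count_category, the use of shell-style wildcards for patterns, and the choice to add the counts of lower categories into their parent are assumptions for illustration, not the application's implementation.

```python
from fnmatch import fnmatchcase

def count_category(node, tokens):
    """Return {category path: match count} for node and everything below it."""
    results = {}

    def visit(n, path):
        patterns = n.get("patterns", [])
        # A token counts once for a category if it matches any of its patterns.
        total = sum(1 for t in tokens if any(fnmatchcase(t, p) for p in patterns))
        for child in n.get("children", []):
            total += visit(child, path + ">" + child["name"])
        results[path] = total
        return total

    visit(node, node["name"])
    return results

dictionary = {
    "name": "Policy",
    "children": [
        {"name": "Gun control", "patterns": ["gun*", "firearm*"]},
        {"name": "Tax", "patterns": ["tax*"]},
    ],
}
tokens = "new gun laws and taxes on guns".split()

whole = count_category(dictionary, tokens)                 # top level selected
subset = count_category(dictionary["children"][0], tokens)  # one category selected
```

Starting from the top level node visits every category, while starting from 'Gun control' computes matches only for that category and anything beneath it.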

If you want the same information for a larger number of documents, select the documents and the appropriate dictionary category and choose a file for the results. For this report, it is possible to choose either UTF-8 encoded CSV or MS Excel as a format. (If you have hundreds of categories, stick with the former.)

Comparing documents

It is possible to do statistical comparisons of two documents using 'relative risk', a quantity used in epidemiology. In this report the dictionary, or a selected subset of its entries, is applied to two documents using Reports>Apply Dictionary>Compare Document Pair. The report computes the relative probability of seeing each category in each document, controlling for their document lengths.

This is best seen with an example. Assume a dictionary containing the category 'Gun control' and two documents: 'Lib', containing 1200 words, of which 32 match patterns in 'Gun control', and 'Cons', containing 900 words, of which 10 match patterns in 'Gun control'. The estimated probability of seeing a 'Gun control' match in 'Lib' is 32/1200, which is about 0.027. For 'Cons' it is 10/900, which is about 0.011. The ratio is about 2.4, so 'Lib' uses this category about 2.4 times as often as 'Cons', that is, about 140% more often.
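The arithmetic in this example can be checked with a few lines of Python. The function name relative_risk is chosen here for illustration; the calculation itself is just the ratio of the two match rates.

```python
def relative_risk(matches_a, length_a, matches_b, length_b):
    """Ratio of the per-word match rate in document A to that in document B."""
    rate_a = matches_a / length_a
    rate_b = matches_b / length_b
    return rate_a / rate_b

# 'Lib': 32 matches in 1200 words; 'Cons': 10 matches in 900 words.
rr = relative_risk(32, 1200, 10, 900)
print(round(32 / 1200, 3), round(10 / 900, 3), round(rr, 2))
```

Dividing by document length is what controls for the two documents being of different sizes: the raw counts differ by a factor of 3.2, but the rates differ by a factor of 2.4.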

The report also computes a 95% confidence interval around this ratio. A ratio of 1 would mean that 'Gun control' is used at an equal rate in both documents, so intervals that do not include 1 are marked with a single asterisk. (In the example above, the confidence interval would exclude 1.)
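The text does not say how the interval is calculated. One common choice for a ratio of two proportions is the log-based (Katz) interval, sketched below in Python; treating this as the method is an assumption, and the application may use a different formula. The function name relative_risk_ci is also hypothetical.

```python
import math

def relative_risk_ci(matches_a, length_a, matches_b, length_b, z=1.96):
    """Relative risk with an approximate 95% interval (log method, assumed)."""
    rr = (matches_a / length_a) / (matches_b / length_b)
    # Standard error of log(rr) under the usual large-sample approximation.
    se = math.sqrt(1 / matches_a - 1 / length_a + 1 / matches_b - 1 / length_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# 'Lib': 32 matches in 1200 words; 'Cons': 10 matches in 900 words.
rr, lower, upper = relative_risk_ci(32, 1200, 10, 900)
flag = "*" if lower > 1 or upper < 1 else ""
print(f"{rr:.2f} [{lower:.2f}, {upper:.2f}] {flag}")
```

Under this assumed method the interval for the example runs from roughly 1.2 to 4.9, which excludes 1, so the ratio would be marked with an asterisk.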

Applying the dictionary to concordances

It is also possible to treat concordances as documents and apply the entire dictionary to them. This can be used to characterise the content of local contexts of matches. To apply the dictionary to a concordance created with Concordance>Make Concordance, use Reports>Apply Dictionary>Current Concordance. The left and right hand sides of the concordance are treated like a single document and the report appears in a dialog.
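Treating a concordance as a document can be sketched in Python as collecting the words to the left and right of every match into one list, which is then analysed like any other text. The window size, the wildcard pattern syntax, the function name concordance_tokens, and the decision to leave out the matched word itself are all assumptions made for this illustration.

```python
from fnmatch import fnmatchcase

def concordance_tokens(tokens, patterns, window=5):
    """Collect the left and right context of every match as one token list."""
    context = []
    for i, token in enumerate(tokens):
        if any(fnmatchcase(token, p) for p in patterns):
            context.extend(tokens[max(0, i - window):i])   # left hand side
            context.extend(tokens[i + 1:i + 1 + window])   # right hand side
    return context

tokens = "strict new gun laws were passed after the debate".split()
pseudo_document = concordance_tokens(tokens, ["gun*"], window=2)
print(pseudo_document)
```

In this sketch, contexts of nearby matches may overlap, so a word can appear in the combined list more than once.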

If you want to do this analysis for a larger number of documents, then select the documents you are interested in, and the dictionary entry you want to use to define the concordance, and then use Reports>Apply Dictionary>Multiple Concordances. You will be prompted for a file to hold the results, which may be either UTF-8 encoded CSV or MS Excel. In the results, each row is labeled by the document whose text was used to generate the concordance, and each column is the name of a category in the dictionary. All categories are used.

Duplicate Patterns

The duplicate pattern report generated by Reports>Duplicate Report scans the dictionary and reports all patterns that occur in more than one category along with the categories they appear in. This can be useful to ensure minimal levels of double counting in reports.
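The idea behind this report can be sketched in Python by inverting the dictionary, so that each pattern maps to the categories containing it, and keeping only patterns with more than one category. The flat mapping from category name to pattern list and the function name find_duplicate_patterns are simplifying assumptions; the application's dictionary is hierarchical and its report may be organised differently.

```python
from collections import defaultdict

def find_duplicate_patterns(dictionary):
    """Map each pattern found in more than one category to those categories."""
    categories_by_pattern = defaultdict(list)
    for category, patterns in dictionary.items():
        # set() ignores a pattern repeated within a single category.
        for pattern in set(patterns):
            categories_by_pattern[pattern].append(category)
    return {
        pattern: sorted(categories)
        for pattern, categories in categories_by_pattern.items()
        if len(categories) > 1
    }

dictionary = {
    "Gun control": ["gun*", "firearm*", "ban"],
    "Crime": ["gun*", "theft"],
    "Trade": ["tariff*", "ban"],
}
duplicates = find_duplicate_patterns(dictionary)
for pattern, categories in sorted(duplicates.items()):
    print(pattern, "->", ", ".join(categories))
```

A word matching 'gun*' would be counted under both 'Gun control' and 'Crime' here, which is the kind of double counting the report helps to find.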