NONLINEAR NETWORKS FOR CLASSIFICATION:
--------------------------------------                 

This is a readme file for the training data files included in the NuClass 7.04 package.

Location of Training files:
---------------------------

A few training files are included in the NuClass 7.04 package. They can be found in the folder "Training Files", under the NuClass7 directory. These training data files are already in the standard format and are ready to be used with NuClass 7.04.

Obtaining new Training Files:
-----------------------------

The Image Processing and Neural Networks Lab (IPNNL) website offers free source code for generating training files. Use it to generate training files with the desired patterns, inputs, and outputs. It is available at:

	http://www-ee.uta.edu/eeweb/IP/training_data_files.htm

Below is a quick reference table for the training files provided with NuClass7, followed by a detailed description of each file.


TRAINING FILE NAME	INPUTS		CLASSES		PATTERNS
-----------------------------------------------------------------

1. GRNG.TRN 		  16		   4		  800

2. GONGTRN.TRA		  16		   10		  3000

3. COMF18.TRA		  18		   4		  12,392
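
The table above and the "(N Inputs, Class Id, ...)" headers in the descriptions suggest that each pattern is stored as N input values followed by one class id. The following Python sketch reads a file under that assumption. It also assumes plain whitespace-separated numeric text, which this readme does not confirm, and the function name read_patterns is hypothetical (it is not part of NuClass).

```python
def read_patterns(text, n_inputs):
    """Parse whitespace-separated values into (inputs, class_id) pairs.

    Assumed layout (not confirmed by the readme): each pattern is
    n_inputs feature values followed by a single class id.
    """
    values = text.split()
    width = n_inputs + 1
    if len(values) % width != 0:
        raise ValueError("value count is not a multiple of n_inputs + 1")
    patterns = []
    for start in range(0, len(values), width):
        row = values[start:start + width]
        inputs = [float(v) for v in row[:n_inputs]]
        class_id = int(float(row[n_inputs]))  # tolerates ids written as "3.0"
        patterns.append((inputs, class_id))
    return patterns
```

If the assumed layout holds, read_patterns(open("GRNG.TRN").read(), 16) should return 800 pairs, matching the PATTERNS column.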


Description:
------------

1. GRNG.TRN : (16 Inputs, Class Id, 800 Training Patterns, 196K) 

The geometric shape recognition data file covers four geometric shapes: ellipse, triangle, quadrilateral, and pentagon. Each shape is represented by a 64 x 64 matrix. For each shape, 200 training patterns were generated using different degrees of deformation. The deformations included rotation, scaling, translation, and oblique distortions. The feature set is ring-wedge energy (RNG) and has 16 features.

For more information on the data file, see 

H.C. Yau and M.T. Manry, "Iterative Improvement of a Nearest Neighbor Classifier," Neural Networks, vol. 4, 1991, pp. 517-524.

2. GONGTRN.TRA : (16 Inputs, Class Id, 3000 Training Patterns, 780K)

The raw data consists of images of handprinted numerals collected from 3,000 people by the Internal Revenue Service. We randomly chose 300 characters from each class to form a training set of 3,000 characters. Images are 32 by 24 binary matrices. An image scaling algorithm is used to remove size variation in characters. The feature set contains 16 elements. The 10 classes correspond to the 10 Arabic numerals.

For more details concerning the features, see 

W. Gong, H.C. Yau, and M.T. Manry, "Non-Gaussian Feature Analyses Using a Neural Network," Progress in Neural Networks, vol. 2, 1994, pp. 253-269. 

A testing version, GONGTST, is also available for download (780K).
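
The per-class random selection described above (300 characters from each of 10 classes) can be illustrated with a short Python sketch. This is not the code used to build GONGTRN.TRA; the function name sample_per_class, the (item, label) input layout, and the fixed seed are assumptions made for the example.

```python
import random
from collections import defaultdict

def sample_per_class(samples, per_class, seed=0):
    """Pick `per_class` items at random, without replacement, from each class.

    `samples` is an iterable of (item, label) pairs (hypothetical layout).
    """
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append(item)
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    chosen = []
    for label in sorted(by_class):
        # random.sample raises ValueError if a class has too few items
        picked = rng.sample(by_class[label], per_class)
        chosen.extend((item, label) for item in picked)
    return chosen
```

With 10 classes and per_class=300, the result holds 3,000 labeled items, the same size as the training set described here.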

3. COMF18.TRA : (18 Inputs, Class Id, 12,392 Training Patterns, 3.8M)

The training data file is generated from segmented images. Each segmented region is separately histogram equalized to 20 levels. Then the joint probability density of pairs of pixels separated by a given distance in a given direction is estimated. We use 0, 90, 180, and 270 degrees for the directions and 1, 3, and 5 pixels for the separations. The density estimates are computed for each classification window. For each separation, the co-occurrences for the four directions are folded together to form a triangular matrix. From each of the three resulting matrices, six features are computed: angular second moment, contrast, entropy, correlation, and the sums of the main diagonal and the first off-diagonal. This results in 18 features for each classification window.
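
The feature computation described above can be sketched in Python as follows. This is an illustration only, not the code that produced COMF18.TRA. It assumes the window is already quantized to integer levels 0 to 19 (the histogram equalization step is not shown), it uses the common textbook definitions of the four named statistics, which may differ in detail from those in the cited paper, and it treats the first off-diagonal of the folded matrix as both bands where the levels differ by one. The function names are hypothetical.

```python
import math

def cooccurrence(window, levels, distance):
    """Co-occurrence probabilities pooled over 0, 90, 180, and 270 degrees.

    Pooling the four directions makes the matrix symmetric, so only one
    triangle carries independent information.
    """
    rows, cols = len(window), len(window[0])
    counts = [[0.0] * levels for _ in range(levels)]
    offsets = [(0, distance), (-distance, 0), (0, -distance), (distance, 0)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    counts[window[r][c]][window[r2][c2]] += 1
                    total += 1
    if total == 0:
        raise ValueError("window is too small for this separation")
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Six features from a normalized, symmetric co-occurrence matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    asm = sum(v * v for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    # p is symmetric, so both marginals share one mean and one variance.
    mean = sum(i * v for i, _, v in cells)
    var = sum((i - mean) ** 2 * v for i, _, v in cells)
    cov = sum((i - mean) * (j - mean) * v for i, j, v in cells)
    correlation = cov / var if var > 0 else 0.0
    main_diag = sum(p[i][i] for i in range(n))
    off_diag = sum(p[i][i + 1] + p[i + 1][i] for i in range(n - 1))
    return [asm, contrast, entropy, correlation, main_diag, off_diag]

def window_features(window, levels=20, distances=(1, 3, 5)):
    """Concatenate the six features for each separation (18 by default)."""
    feats = []
    for d in distances:
        feats.extend(texture_features(cooccurrence(window, levels, d)))
    return feats
```

Calling window_features on a quantized classification window returns 18 values, six for each of the separations 1, 3, and 5.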

For more details concerning the features, see 

R.R. Bailey, E.J. Pettit, R.T. Borochoff, M.T. Manry, and X. Jiang, "Automatic Recognition of USGS Land Use/Cover Categories Using Statistical and Neural Network Classifiers," Proceedings of SPIE OE/Aerospace and Remote Sensing, April 12-16, 1993, Orlando, Florida.

Four land use/cover types were identified in the images per Level I of the U.S. Geological Survey Land Use/Land Cover Classification System: urban areas, fields or open grassy land, trees (forested land), and water (lakes or rivers).