cox.jmatt.java.MathTools
Class MathStat

java.lang.Object
  extended by cox.jmatt.java.MathTools.MathStat

public class MathStat
extends java.lang.Object

This class gives MathTools its statistical component. In addition to static calculation methods MathStat also acts as an immutable container for the raw data.

As a container MathStat can hold up to two data sets represented as double[] arrays. These are, respectively, the primary or first data set and the secondary or second data set. Many of the method and parameter names suggest X- and Y-coordinates for these data sets, but the two sets are NOT necessarily coordinate pairs. Methods and calculations requiring X and Y values do interpret them as such but others do not.

If a MathStat is created around two data sets there must be two valid data sets defined. If the first data set in the two-argument constructor is null or empty and the second is not, the second becomes the primary data set and the empty one is ignored.
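
The fallback rule above can be sketched as follows. This is an illustration only: TwoSetHolder and its members are hypothetical stand-ins, not the MathStat source, and the defensive copies are an assumption about how the container stays immutable.

```java
/** Hypothetical stand-in for the two-array constructor rule; not the MathStat source. */
final class TwoSetHolder {
    private final double[] first;
    private final double[] second;

    TwoSetHolder(double[] pXData, double[] pYData) {
        boolean xOk = pXData != null && pXData.length > 0;
        boolean yOk = pYData != null && pYData.length > 0;
        if (!xOk && yOk) {
            // Only the second argument holds data: promote it to the primary set.
            first = pYData.clone();
            second = new double[0];
        } else {
            // Assumption: copies are taken so later changes by the caller are not seen.
            first = xOk ? pXData.clone() : new double[0];
            second = (xOk && yOk) ? pYData.clone() : new double[0];
        }
    }

    int count()  { return first.length; }   // size of the primary set
    int yCount() { return second.length; }  // size of the secondary set
    double firstValue(int i) { return first[i]; }
}
```

With this sketch, new TwoSetHolder(null, new double[] {1, 2, 3}) reports count() of 3 and yCount() of 0.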


Field Summary
static java.lang.String FORMAT_DEBUG
          This format template is used for debugging and development.
 
Constructor Summary
MathStat()
          Standard scripting constructor.
MathStat(double[] pAry)
          Construct a MathStat instance around a single data set (array).
MathStat(double[] pXData, double[] pYData)
          Construct a MathStat instance around two data sets.
MathStat(java.lang.String pXData, java.lang.String pYData)
          Construct a MathStat around up to two Strings of space-separated numbers.
 
Method Summary
 double correlation()
          This method calculates the correlation between the two data sets.
 int count()
          Return the number of values in the first data set.
 java.lang.String data_String(double[] pAry, java.lang.String pSep)
          Instance version.
static java.lang.String dataString(double[] pAry, java.lang.String pSep)
          This method converts a double[] array into a separated String.
 java.lang.String format(java.lang.String pTemplate, java.lang.String pSepX, java.lang.String pSepY)
          This is MathStat's power-formatting method.
 Question format(java.lang.String pProb, java.lang.String pAns, java.lang.String pSepX, java.lang.String pSepY)
          This format() method accepts two templates and returns a Question.
 MathStat from_Points(java.lang.String pPoints)
          Instance version.
static MathStat fromPoints(java.lang.String pPoints)
          This method creates a MathStat from a collection of points.
 boolean isValidArray(double[] pAry)
          Instance version of validArray().
 double mean()
          Return the simple mean of the first data set.
 MathStat newInstance(double[] pXVals, double[] pYVals)
          Create a new MathStat given up to two data sets.
 MathStat newInstance(java.lang.String pXVals, java.lang.String pYVals)
          Create a new MathStat given up to two space-separated String data sets.
 MathStat newLinear(int pNum, double pX1, double pY1, double pX2, double pY2, int pDelta, boolean pFloat)
          Instance method to generate a linear regression MathStat.
static MathStat newLinearInstance(int pNum, double pX1, double pY1, double pX2, double pY2, int pDelta, boolean pFloat)
          Use this method to create regression problems that should have a fairly high correlation.
 MathStat newRandom(int pNum, double pLow, double pHigh, boolean pFloat)
          Convenience instance method for a random single-set MathStat.
 MathStat newRandom(int pNum, double pLow, double pHigh, boolean pFloat, int pYNum, double pYLow, double pYHigh, boolean pYFloat)
          Instance method to generate a MathStat with random values, two data sets.
static MathStat newRandomInstance(int pNum, double pLow, double pHigh, boolean pFloat, int pYNum, double pYLow, double pYHigh, boolean pYFloat)
          Static method to create a MathStat with random values.
 double pDev()
          Calculate the population standard deviation of the first data set.
 double pVar()
          Calculate the population variance of the first data set.
 double regressionIntercept()
          Linear regression: calculate the y-intercept of the regression line.
 java.lang.String regressionLine()
          Linear regression: calculate and print the slope and y-intercept of the regression line in the form Y = mX + b.
 double regressionSlope()
          Linear regression: calculate the slope of the line of best fit between the points in the data set.
 double sDev()
          Calculate the sample standard deviation of the first data set.
static double simpleMean(double[] pAry)
          Find the simple arithmetic mean of a double[] array.
static double[] sparse(java.lang.String pVals)
          This method splits a String of space-separated doubles and parses them into an array.
 double[] sParse(java.lang.String pVals)
          Instance version of sparse().
 double sVar()
          Calculate the sample variance of the first data set.
 java.util.Map toMap(java.lang.String pPrefx, java.lang.String pSepX, java.lang.String pSepY)
          Create and return a java.util.Map<String, String> of this data.
 java.lang.String toString()
          Return a constructor-like representation of this instance.
 java.lang.String toXML(java.lang.String pID)
          Return a simple XML representation of this class: an empty MathStat tag with 'data=' and 'data2=' attributes.
static boolean validArray(double[] pAry)
          Check to make sure the supplied array is not null or empty.
static double variance(double[] pAry, boolean pSample)
          This method calculates the variance of the numbers in the array sent in.
 int yCount()
          Return the number of values in the second data set.
 double yMean()
          Calculate the simple mean of the second data set.
 double ypDev()
          Calculate the population standard deviation of the second data set.
 double ypVar()
          Calculate the population variance of the second data set.
 double ysDev()
          Calculate the sample standard deviation of the second data set.
 double ysVar()
          Calculate the sample variance of the second data set.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

FORMAT_DEBUG

public static final java.lang.String FORMAT_DEBUG
This format template is used for debugging and development. NOT an official format!

See Also:
Constant Field Values
Constructor Detail

MathStat

public MathStat()
Standard scripting constructor.


MathStat

public MathStat(double[] pAry)
Construct a MathStat instance around a single data set (array).

Parameters:
pAry - The double[] array containing data values.

MathStat

public MathStat(double[] pXData,
                double[] pYData)
Construct a MathStat instance around two data sets. If pXData is null or empty and pYData is not, pYData becomes the primary (first) data set and the empty pXData is ignored.

Parameters:
pXData - The double[] array containing the first data set.
pYData - The array containing the second data set.

MathStat

public MathStat(java.lang.String pXData,
                java.lang.String pYData)
Construct a MathStat around up to two Strings of space-separated numbers.

Parameters:
pXData - The String containing the first data set.
pYData - The String containing the second data set.
Method Detail

newInstance

public MathStat newInstance(double[] pXVals,
                            double[] pYVals)
Create a new MathStat given up to two data sets.

Parameters:
pXVals - The first data set.
pYVals - The second data set.
Returns:
A new MathStat containing the given data.

newInstance

public MathStat newInstance(java.lang.String pXVals,
                            java.lang.String pYVals)
Create a new MathStat given up to two space-separated String data sets.

Parameters:
pXVals - The first data set as a String of space-separated numbers.
pYVals - The second data set as a String.
Returns:
A new MathStat containing the given data.

fromPoints

public static MathStat fromPoints(java.lang.String pPoints)

This method creates a MathStat from a collection of points. The instance automatically has both data sets defined because the input data is in the form of points.

The input String must be of the form 'x0,y0 x1,y1 x2,y2 ...' where x# goes to the first data set and y# goes to the second. It is critical that there be NO SPACES between the numbers and commas, and exactly one space between the pairs. This method works by splitting the input on spaces first and then splitting each element on the comma. The actual parsing is error-protected, so one bad number won't choke the entire process, but it WILL corrupt the data.

Within each pair the values are parsed in order. If the first item can't be converted, both it and the second become zero. If the first item parses as a double but the second fails, the first number will be accurate and the second will be zero. Any values absent from the resulting arrays are automatically replaced with zeros.

Any parse errors are reported at Debug level. If the input String is null or empty an empty MathStat is returned.

Parameters:
pPoints - The String containing space-separated point-pairs consisting of two comma-separated numbers.
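
The parsing rules above can be illustrated with a small standalone sketch. PointParseSketch and parsePoints are hypothetical names, and the code is an assumption about the behaviour described here rather than the actual fromPoints() implementation. It returns the two arrays directly instead of a MathStat.

```java
/** Hypothetical illustration of the point-parsing rules; not the MathStat source. */
final class PointParseSketch {
    /** Returns {xs, ys}; unparseable or missing numbers are left at 0.0. */
    static double[][] parsePoints(String pPoints) {
        if (pPoints == null || pPoints.trim().isEmpty()) {
            return new double[][] { new double[0], new double[0] };
        }
        String[] pairs = pPoints.trim().split(" ");  // split on single spaces first
        double[] xs = new double[pairs.length];      // Java zero-fills new arrays
        double[] ys = new double[pairs.length];
        for (int i = 0; i < pairs.length; i++) {
            String[] xy = pairs[i].split(",");       // then split each pair on the comma
            try {
                double x = Double.parseDouble(xy[0]); // failure here leaves x and y at 0.0
                xs[i] = x;
                ys[i] = Double.parseDouble(xy[1]);    // failure here keeps x, leaves y at 0.0
            } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
                // MathStat reports parse errors at Debug level; this sketch just continues.
            }
        }
        return new double[][] { xs, ys };
    }
}
```

For the input "1,2 a,4 5,b" the sketch yields x-values 1, 0, 5 and y-values 2, 0, 0. A doubled space between pairs produces an extra all-zero point, which is one way a formatting slip corrupts the data.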

from_Points

public MathStat from_Points(java.lang.String pPoints)
Instance version. See static method for details.


newRandomInstance

public static MathStat newRandomInstance(int pNum,
                                         double pLow,
                                         double pHigh,
                                         boolean pFloat,
                                         int pYNum,
                                         double pYLow,
                                         double pYHigh,
                                         boolean pYFloat)
Static method to create a MathStat with random values. If 'pYNum' is zero no second set is created. The 'pFloat' and 'pYFloat' arguments select floating-point (double) values rather than integers.

Parameters:
pNum - Number of values in the first data set.
pLow - The minimum value for the first data set (inclusive).
pHigh - The maximum for the first data set (exclusive).
pFloat - true for decimals, false to force integer (as double).
pYNum - Number of values in the second data set. Zero for no second data set.
pYLow - The minimum value for the second data set (inclusive).
pYHigh - The maximum for the second data set (exclusive).
pYFloat - true to allow fractions, false for int.
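
A minimal sketch of how one such data set might be generated, assuming a uniform distribution and integer bounds when integers are forced. RandomSetSketch, randomSet and the explicit Random argument are hypothetical; MathStat's own generator may differ.

```java
import java.util.Random;

/** Hypothetical illustration of one random data set; not the MathStat source. */
final class RandomSetSketch {
    static double[] randomSet(int pNum, double pLow, double pHigh, boolean pFloat, Random rng) {
        double[] out = new double[Math.max(pNum, 0)];  // zero (or less) gives an empty set
        for (int i = 0; i < out.length; i++) {
            // nextDouble() is in [0, 1), so v is at least pLow and, up to rounding, below pHigh.
            double v = pLow + rng.nextDouble() * (pHigh - pLow);
            // Assumption: integers are produced by flooring, stored as double.
            out[i] = pFloat ? v : Math.floor(v);
        }
        return out;
    }
}
```

Calling randomSet(5, 1, 7, false, new Random()) gives five whole numbers from 1 through 6, which matches the inclusive minimum and exclusive maximum described for the parameters.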

newRandom

public MathStat newRandom(int pNum,
                          double pLow,
                          double pHigh,
                          boolean pFloat,
                          int pYNum,
                          double pYLow,
                          double pYHigh,
                          boolean pYFloat)
Instance method to generate a MathStat with random values, two data sets. Parameters are per newRandomInstance().


newRandom

public MathStat newRandom(int pNum,
                          double pLow,
                          double pHigh,
                          boolean pFloat)
Convenience instance method for a random single-set MathStat.


newLinearInstance

public static MathStat newLinearInstance(int pNum,
                                         double pX1,
                                         double pY1,
                                         double pX2,
                                         double pY2,
                                         int pDelta,
                                         boolean pFloat)

Use this method to create regression problems that should have a fairly high correlation. It generates 'pNum' points around the line from (pX1, pY1) to (pX2, pY2). The 'pDelta' parameter generates a random percentage by which the generated y-values vary from the actual line. The actual y-value is the calculated y-value plus or minus (pDelta/100). The x-values can be forced to be int but the y-values cannot.

Parameters:
pNum - The number of data points in the set.
pX1 - The x-coordinate of the first endpoint.
pY1 - The y-coordinate of the first endpoint.
pX2 - The x-coordinate of the second endpoint.
pY2 - The y-coordinate of the second endpoint.
pDelta - The percent by which the actual y-values vary. If less than 2 it becomes 2.
pFloat - 'true' to allow floating-point values, false for int only.
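
One possible reading of this description is sketched here. Several details are assumptions, not documented behaviour: x-values are drawn uniformly between the endpoints, integer x-values are obtained by rounding, and each y-value is the on-line value scaled by a random percentage of at most pDelta. NoisyLineSketch and noisyLine are hypothetical names.

```java
import java.util.Random;

/** Hypothetical illustration of noisy points around a line; not the MathStat source. */
final class NoisyLineSketch {
    /** Returns {xs, ys} for pNum points scattered around the segment (pX1,pY1)-(pX2,pY2). */
    static double[][] noisyLine(int pNum, double pX1, double pY1, double pX2, double pY2,
                                int pDelta, boolean pFloat, Random rng) {
        int delta = Math.max(pDelta, 2);               // documented floor: less than 2 becomes 2
        double[] xs = new double[Math.max(pNum, 0)];
        double[] ys = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            double x = pX1 + rng.nextDouble() * (pX2 - pX1);
            if (!pFloat) x = Math.rint(x);             // only x can be forced to a whole number
            // Interpolate along the segment; t is 0 at the first endpoint, 1 at the second.
            double t = (pX2 == pX1) ? 0.0 : (x - pX1) / (pX2 - pX1);
            double onLine = pY1 + t * (pY2 - pY1);
            // Assumption: the offset is a random fraction in [-delta/100, +delta/100).
            double pct = (rng.nextDouble() * 2.0 - 1.0) * delta / 100.0;
            xs[i] = x;
            ys[i] = onLine * (1.0 + pct);
        }
        return new double[][] { xs, ys };
    }
}
```

Because the offset is proportional to the on-line value, points near y = 0 barely move under this reading. An additive offset would be a different, equally plausible interpretation.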

newLinear

public MathStat newLinear(int pNum,
                          double pX1,
                          double pY1,
                          double pX2,
                          double pY2,
                          int pDelta,
                          boolean pFloat)
Instance method to generate a linear regression MathStat.


validArray

public static final boolean validArray(double[] pAry)
Check to make sure the supplied array is not null or empty.

Parameters:
pAry - The double[] array to check.
Returns:
true if the array is defined and at least length 1, false otherwise.

isValidArray

public final boolean isValidArray(double[] pAry)
Instance version of validArray().


simpleMean

public static double simpleMean(double[] pAry)
Find the simple arithmetic mean of a double[] array. If pAry is null or length zero the return value is zero.

Parameters:
pAry - The array of values to average.

variance

public static double variance(double[] pAry,
                              boolean pSample)
This method calculates the variance of the numbers in the array sent in. If the array is null or its length is less than 2 the return value is zero. The variance is calculated as sum((pAry[i] - mean)^2)/N where N is pAry.length or, if pSample is true, N is pAry.length - 1.

Parameters:
pAry - The numbers whose variance is to be calculated.
pSample - true for sample variance, false for population variance.
Returns:
The sample or population variance as calculated above.
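
The formula above translates directly into code. The sketch below is a standalone illustration of that calculation; VarianceSketch is a hypothetical class, not the MathStat source.

```java
/** Hypothetical illustration of the documented variance formula; not the MathStat source. */
final class VarianceSketch {
    static double variance(double[] pAry, boolean pSample) {
        if (pAry == null || pAry.length < 2) {
            return 0.0;                      // documented result for null or short arrays
        }
        double sum = 0.0;
        for (double v : pAry) {
            sum += v;
        }
        double mean = sum / pAry.length;

        double squares = 0.0;                // sum((pAry[i] - mean)^2)
        for (double v : pAry) {
            double d = v - mean;
            squares += d * d;
        }
        // N is the length for a population, length - 1 for a sample.
        int n = pSample ? pAry.length - 1 : pAry.length;
        return squares / n;
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared deviations sum to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7.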

dataString

public static java.lang.String dataString(double[] pAry,
                                          java.lang.String pSep)
This method converts a double[] array into a separated String. If the array is null or empty a blank but non-null String is returned. The separator String can be specified. If null or blank a single space is used. The separator is not appended after the last value.

Parameters:
pAry - The array to stringify.
pSep - The separator to put between the elements.
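
A minimal sketch of the joining behaviour described above. JoinSketch is a hypothetical class; treating a whitespace-only separator as blank and printing each value with Java's default double formatting are both assumptions.

```java
/** Hypothetical illustration of joining a double[] with a separator; not the MathStat source. */
final class JoinSketch {
    static String dataString(double[] pAry, String pSep) {
        if (pAry == null || pAry.length == 0) {
            return "";                                   // blank but non-null
        }
        // Assumption: "blank" means empty or whitespace-only.
        String sep = (pSep == null || pSep.trim().isEmpty()) ? " " : pSep;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pAry.length; i++) {
            if (i > 0) {
                sb.append(sep);                          // separator goes between values only
            }
            sb.append(pAry[i]);                          // default formatting, e.g. 1.0
        }
        return sb.toString();
    }
}
```

Under these assumptions, dataString(new double[] {1, 2.5, 3}, ", ") produces "1.0, 2.5, 3.0" with no trailing separator.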

data_String

public java.lang.String data_String(double[] pAry,
                                    java.lang.String pSep)
Instance version.


sparse

public static double[] sparse(java.lang.String pVals)
This method splits a String of space-separated doubles and parses them into an array. If any of the resulting Strings does not parse properly it is replaced with 0.0 and the process continues. Any parse errors are reported at the Debug level. If the String sent in is null or empty an empty array is returned. The name is from 'split-parse'.

Parameters:
pVals - A String of space-separated numbers parsed in as doubles.
Returns:
A double[] array consisting of the parsed values.
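
The split-then-parse behaviour can be sketched as follows. SplitParseSketch is a hypothetical class, and splitting on runs of whitespace (rather than on exactly one space) is an assumption.

```java
/** Hypothetical illustration of split-parse; not the MathStat source. */
final class SplitParseSketch {
    static double[] sparse(String pVals) {
        if (pVals == null || pVals.trim().isEmpty()) {
            return new double[0];                 // null or empty input gives an empty array
        }
        String[] parts = pVals.trim().split("\\s+");
        double[] out = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            try {
                out[i] = Double.parseDouble(parts[i]);
            } catch (NumberFormatException e) {
                out[i] = 0.0;                     // bad token becomes 0.0 and parsing continues
            }
        }
        return out;
    }
}
```

For the input "1 2.5 x 4" this yields the array 1.0, 2.5, 0.0, 4.0. The zero is indistinguishable from a genuine zero in the data, so callers who care should validate their input first.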

sParse

public double[] sParse(java.lang.String pVals)
Instance version of sparse(). NOTE: The instance method capitalizes the 'P'!


count

public int count()
Return the number of values in the first data set.


yCount

public int yCount()
Return the number of values in the second data set.


mean

public double mean()
Return the simple mean of the first data set.


yMean

public double yMean()
Calculate the simple mean of the second data set.


pVar

public double pVar()
Calculate the population variance of the first data set.


ypVar

public double ypVar()
Calculate the population variance of the second data set.


sVar

public double sVar()
Calculate the sample variance of the first data set.


ysVar

public double ysVar()
Calculate the sample variance of the second data set.


pDev

public double pDev()
Calculate the population standard deviation of the first data set.


ypDev

public double ypDev()
Calculate the population standard deviation of the second data set.


sDev

public double sDev()
Calculate the sample standard deviation of the first data set.


ysDev

public double ysDev()
Calculate the sample standard deviation of the second data set.


correlation

public double correlation()

This method calculates the correlation between the two data sets. It only works if:

  1. Both data sets are defined.
  2. Both data sets are the same size.
  3. Both data sets have more than 2 values.

If any of these conditions is false the return value is zero.
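
The three conditions can be expressed as a guard in front of the calculation. The sketch below assumes the Pearson product-moment coefficient, which this page does not name explicitly; CorrelationSketch is a hypothetical class, not the MathStat source.

```java
/** Hypothetical illustration of a guarded Pearson correlation; not the MathStat source. */
final class CorrelationSketch {
    static double correlation(double[] xs, double[] ys) {
        // Conditions 1-3: both defined, same size, more than 2 values.
        if (xs == null || ys == null || xs.length != ys.length || xs.length <= 2) {
            return 0.0;
        }
        int n = xs.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += xs[i];
            my += ys[i];
        }
        mx /= n;
        my /= n;

        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = xs[i] - mx;
            double dy = ys[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        double denom = Math.sqrt(sxx * syy);
        // Assumption: constant data (zero spread) is also reported as 0.
        return denom == 0.0 ? 0.0 : sxy / denom;
    }
}
```

Because zero is also a legitimate correlation value, a caller cannot tell "no correlation" from "bad data" by the result alone. Checking count() and yCount() beforehand avoids the ambiguity.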


regressionSlope

public double regressionSlope()
Linear regression: calculate the slope of the line of best fit between the points in the data set. Restrictions are per correlation() and the return value for bad data is zero.


regressionIntercept

public double regressionIntercept()
Linear regression: calculate the y-intercept of the regression line. This method calls regressionSlope() if necessary. If no regression is possible the return value is zero.


regressionLine

public java.lang.String regressionLine()
Linear regression: calculate and print the slope and y-intercept of the regression line in the form Y = mX + b.
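
A minimal sketch of the three regression results, assuming ordinary least squares and the same data restrictions as correlation(). RegressionSketch and its methods are hypothetical, and the exact number formatting of the Y = mX + b String is an assumption.

```java
/** Hypothetical illustration of least-squares regression; not the MathStat source. */
final class RegressionSketch {
    /** True when both sets exist, match in length, and hold more than 2 values. */
    static boolean usable(double[] xs, double[] ys) {
        return xs != null && ys != null && xs.length == ys.length && xs.length > 2;
    }

    static double mean(double[] a) {
        double s = 0.0;
        for (double v : a) {
            s += v;
        }
        return s / a.length;
    }

    static double slope(double[] xs, double[] ys) {
        if (!usable(xs, ys)) {
            return 0.0;
        }
        double mx = mean(xs), my = mean(ys), sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double dx = xs[i] - mx;
            sxy += dx * (ys[i] - my);
            sxx += dx * dx;
        }
        return sxx == 0.0 ? 0.0 : sxy / sxx;   // assumption: all-equal x-values give 0
    }

    static double intercept(double[] xs, double[] ys) {
        if (!usable(xs, ys)) {
            return 0.0;
        }
        return mean(ys) - slope(xs, ys) * mean(xs);  // b = mean(y) - m * mean(x)
    }

    static String line(double[] xs, double[] ys) {
        return "Y = " + slope(xs, ys) + "X + " + intercept(xs, ys);
    }
}
```

For the points (1,3), (2,5), (3,7), (4,9) the sketch gives a slope of 2 and an intercept of 1.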


toString

public java.lang.String toString()
Return a constructor-like representation of this instance.

Overrides:
toString in class java.lang.Object

toXML

public java.lang.String toXML(java.lang.String pID)
Return a simple XML representation of this class: an empty MathStat tag with 'data=' and 'data2=' attributes. If there is no secondary data set the second attribute will not be present.

Parameters:
pID - The value of the tag's 'id=' attribute. Ignored if null or empty.
Returns:
A simple XML representation of the data in this class.
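
The shape of the output might look like the following sketch. Only the tag name and the attribute names come from this page; XmlSketch is hypothetical, and the space-separated attribute values, the attribute order and the lack of escaping are all assumptions.

```java
/** Hypothetical illustration of the empty-tag XML form; not the MathStat source. */
final class XmlSketch {
    static String toXml(String pID, double[] first, double[] second) {
        StringBuilder sb = new StringBuilder("<MathStat");
        if (pID != null && !pID.isEmpty()) {
            sb.append(" id=\"").append(pID).append('"');   // no escaping in this sketch
        }
        sb.append(" data=\"").append(join(first)).append('"');
        if (second != null && second.length > 0) {
            // 'data2=' appears only when a secondary data set exists.
            sb.append(" data2=\"").append(join(second)).append('"');
        }
        return sb.append("/>").toString();
    }

    /** Assumption: values are space-separated inside the attribute. */
    private static String join(double[] a) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; a != null && i < a.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(a[i]);
        }
        return sb.toString();
    }
}
```

Under these assumptions a single-set instance with id "q1" and values 1 and 2 would render as <MathStat id="q1" data="1.0 2.0"/>.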

format

public java.lang.String format(java.lang.String pTemplate,
                               java.lang.String pSepX,
                               java.lang.String pSepY)

This is MathStat's power-formatting method. Every single piece of data, raw or calculated, is available. The formatting process itself uses Question.fillTemplate() with the following tokens:

Tokens 16 and 17 use, respectively, pSepX and pSepY as separators when concatenating the data values. If pTemplate is null or empty an empty String is returned.

Parameters:
pTemplate - The template String for Question.fillTemplate().
pSepX - The separator String to use when concatenating the first data set.
pSepY - The separator for the second data set.

format

public Question format(java.lang.String pProb,
                       java.lang.String pAns,
                       java.lang.String pSepX,
                       java.lang.String pSepY)
This format() method accepts two templates and returns a Question. The first template is for the Problem component and the second is for the Answer. The pSepX and pSepY Strings are per format(String, String, String).

Parameters:
pProb - The Problem template String.
pAns - The Answer template String.
pSepX - The first data set separator.
pSepY - The second data set separator.
Returns:
A Question formed by sending the problem and answer templates through the other format() method.

toMap

public java.util.Map toMap(java.lang.String pPrefx,
                           java.lang.String pSepX,
                           java.lang.String pSepY)
Create and return a java.util.Map<String, String> of this data. The values entered are per the replacement tokens in format() and the keys are the method names with the following exceptions: the data Strings are 'dsX' and 'dsY' respectively. A prefix may be added to each key but is not required. If something catastrophic happens it is reported at the Debug level and the returned Map will not be reliable.

Parameters:
pPrefx - The prefix to add to the key String. Ignored if null.
pSepX - The separator for the first data set.
pSepY - The separator String for the second data set.