iconEuler Reference

Statistics with Euler

Statistical functions.

Euler has a lot of routines to generate random numbers, like the builtin functions random and normal. Moreover, Euler has builtin routines for distributions.

This file provides more distributions, random numbers, and tests.

Distributions

function erf (x:real)

  Gau? normal distribution with standard deviation 1/2
  
  This is the result of integrate(exp(-t^2),t,0,x)*2/sqrt(pi). The
  "erf" function is defined in the literature, and can easily be
  computed from the "normaldis" function of Euler.
  
  See: 
normaldis (Euler Core),
normaldis (Statistics with Euler)
function binsum (i:natural, n:natural, p:number)

  Probability of getting i or less hits in n runs.
  
  Returns the cumulative sum of the binomial distribution with
  probability p. I.e., if we have n independent random throws, with a
  probability of p for a hit, the functions computes the probability
  to get i or less hits.
  
  For large n and i, use "normalsum".
  
  See: 
normalsum (Statistics with Euler),
invbinsum (Statistics with Euler)
function normaldis (x : real, mean : real = 0, dev : real = 1)

  Probability of being smaller than x with the normal distribution.
function invnormaldis (p : real, mean : real = 0, dev : real = 1)

  Probability of being smaller than x with the normal distribution.
function normalsum (i:natural, n:natural, p:number)

  Probability of getting i or less hits in n runs.
  
  Works like binsum, but is much faster for large n and medium p.
  
  See: 
binsum (Statistics with Euler)
function invbinsum (x:number, n:natural, p:number)

  Computes i, such that binsum(i,n,p) is just larger than x.
  
  This is the inverse function to binsum. We now the probability for
  i hits in n random events, and want to know i.
  
function hypergeomsum (i:natural, n:natural, itot:natural, ntot:natural)

  Hypergemotric sum.
  
  Return the probability of hitting i or less black balls if n are
  chosen out of ntot, and there are a total of itot black balls (and
  ntot-itot white balls, of course).
function gammarestr (x)

  Special Gamma function, works only for 2x natural
  look into gamma.e for the general gamma function
function qnormal (x, m=0, d=1)

  Returns the density m-d-normal distribution
  
  This is the density function the Gauss normal distribution with
  mean m and standard deviation 1.
  
  See: 
normaldis (Euler Core),
normaldis (Statistics with Euler),
erf (Statistics with Euler),
erf (Maxima Documentation)
function qtdis (t:real, n:nonnegative integer)

  Density of the student t distribution
function qchidis (x, n)

  density of the chi distribution
function qfdis (x,n,m)

The following distributions are based on Julia code by John D. Cook.

function randmatrix (n:index, m:index, f:string)

  Apply the random generator to generate a matrix.
function randuniform (n : index, m : index, a : number, b : number)

  return a uniform random sample from the interval (a, b)
function randnormal (n : index, m : index,
    mean : number = 0, stdev : nonnegative number = 1)

  return a random sample from a normal (Gaussian) distribution
function randexponential (n : index, m : index,
    mean : positive number = 1)

  return a random matrix from an exponential distribution
function randgamma (n : index, m : index,
    shape : nonnegative number, scale : nonnegative number)

  return a random matrix from a gamma distribution
  
  Implementation based on "A Simple Method for Generating Gamma Variables"
  by George Marsaglia and Wai Wan Tsang.
  ACM Transactions on Mathematical Software
  Vol 26, No 3, September 2000, pages 363-372.
function randchi (n : index, m : index, dof : index)

  return a random matrix from a chi square distribution
  with the specified degrees of freedom
function randinversegamma (n : index, m : index,
    shape : positive number, scale : positive number)

  return a random matrix from an inverse gamma random variable
function randweibull (n : index, m : index,
    shape : positive number, scale : positive number)

  return a random matrix from a Weibull distribution
function randcauchy (n : index, m : index,
    median : number, scale : positive number)

  return a random matrix from a Cauchy distribution
function randt (n : index, m : index,
    dof : positive integer)

  return a random sample from a Student t distribution
  
  See Seminumerical Algorithms by Knuth
function randlaplace (n : index, m : index,
    mean : number, scale : positive number)

  return a random sample from a Laplace distribution
  The Laplace distribution is also known as the double exponential distribution.
function randlognormal (n : index, m : index,
    mu : number, sigma : positive number)

  return a random sample from a log-normal distribution
function randbeta (n : index, m : index,
    a: positive number, b : positive number)

  return a random sample from a beta distribution
  
  There are more efficient methods for generating beta samples.
  However such methods are little more efficient and much more complicated.

Descriptive Functions

function meandev (x:numerical, v=none)

  Return the mean value and the standard deviation of [x1,...,xn]
  
  An optional additional parameter v contains the multiplicities of
  x. m=mean(x) will assign the mean value only! If x is a matrix the
  function works on each row.
  
  x : data (1xm or nxm)
  v : multiplicities (1xn or nxm)
  
  See: 
mean (Maxima Documentation)
function mean (x:numerical, v:real vector=none)

  Returns the mean value of x.
  
  An optional additional parameter contains multiplicities.
  
  See: 
meandev (Statistics with Euler),
median (Statistics with Euler),
median (Maxima Documentation)
function dev (x:numerical, v:real vector=none)

  Returns the standard deviation of x
  
  An additional parameter may contain multiplicities.
  
  See: 
meandev (Statistics with Euler)
function median (x, v=none, p=0.5)

  Returns a value such that p of the x[i] are less equal.
  
  v are optional multiplicities for the values. If x is a matrix, the
  function works on all rows of x.
  
  x : data (1xm or nxm)
  v : multiplicites (1xm or nxm)
  p : desired percentage
  
  See: 
mean (Statistics with Euler),
mean (Maxima Documentation)
function pfold (v: real vector, w: real vector)

  Returns the distribution of the sum of two distributions
  
  v[i], w[i] contain the probabilities that each random variable is
  equal to i-1. result[i] contains the probability that the sum of
  the two random variables is i-1.
  
  See: 
fold (Euler Core),
fftfold (Numerical Algorithms)
function quantiles (x,q:vector=[0,0.25,0.5,0.75,1])

  Return the given quantiles for each row of x.
  
  See: 
boxplot (Statistics with Euler),
boxplot (Maxima Documentation)
function covarmatrix (x:real)

  Emprical covariance matrix of the rows of x
function covar (x:real vector, y:real vector)

  Empirical covariance of x and y
function correlmatrix (x:real)

  Correlation matrix of the rows of x
function correl (x:real vector, y:real vector)

  Correlation of x and y
function ranks (x)

  Ranks of the elements of x in x.
  
  With multiplicities the rank is the mean rank of the equal
  elements.
  
  Works for reals, real vectors, or string vectors x.
function rankcorrel (x:real vector, y:real vector)

  Correlation of x and y
function getfrequencies (x:real vector, r: real vector)

  Count the number of x in the intervals of the sorted vector r.
  
  The function returns the number of x[j] in the intervals r[i-1] to
  r[i].
  
  x : real row vector (1xn)
  r : real sorted row vector (1xm)
  
  Returns the frequencies f as a row vector (1x(m-1))
  
  See: 
count (Euler Core),
histo (Statistics with Euler)
function getmultiplicities (x, y, sorted=0)

  Counts how often the elements of x appear in y.
  
  This works for string vectors and for real vectors.
  
  sorted : if true, then y is assumed to be sorted.
  
  See: 
count (Euler Core),
getfrequencies (Statistics with Euler)
function getstatistics (x:real vector, y:real vector=none)

  Return a statics of the values in the vector x.
  
  If y is none, the function returns {xu,mu}, where xu are the
  unique elements of x, and mu are the multiplicities of these
  values.
  
  Else the function returns {xu,yu,m}, where xu are the unique
  elements of x, yu the unique elements of y, and M is a table of
  multiplicities of pairs (xu[i],yu[j]) in (x[k],y[k]), k=1...n.
function empdist (x:real vector, vsorted:real vector)

  Compute the empirical distribution
  
  Needs a sorted vector of empirical values.
function histo (d:real vector, n:index=10,
    integer:integer=0, even:integer=0, v:real vector=none,
    bar=1)

  Computes {x,y} for histogram plots.
  
  d is an 1xm vector.
  
  x - End points of the intervals (equispaced n+1 points)
  y - The number of data in the subintervals (frequencies)
  
  integer=1 - for distributions on integer
  even=1 - better for evenly spaced discrete distributions
  To be used for plot2d and the bar style.
  
  v : intervall boundaries (ordered).
  
  bar : It true the functin returns two vectors for >bar in plot2d.
  If false it returns a sawtooth function for plot2d.
  
  The plot functions plot2d has parameters distribution=1, histgram=1
  to achieve the same effect.
  
  See: 
plot2d (Plot Functions plot2d and plot3d),
plot2d (Maxima Documentation)

Tests

function chitest (x:real vector, y:real vector)

  Perform a xhi^2 test, if x is the expected frequency y
  
  This functions test an observed frequency x against an expected
  frequency y. E.g., if 40 men are found sampling 100 persons, then
  [40,60] has to be tested against [50,50]. The result of the test is
  0.049, which means that the sample does not obey the expected
  frequency with an error less than 5%.
  
  To get frequencies of data from the data, use "getfrequencies",
  "count", or "histo".
  
  x,y : two real row vectors (1xn)
  
  Returns the error level for the hypothesis that the observed
  frequency x comes from another distribution as the expected
  frequency y.
  
  See: 
getfrequencies (Statistics with Euler),
count (Euler Core),
histo (Statistics with Euler)
function testnormal (r:real vector, n:integer, v:real vector, ..
    m:number, d:number)

  Test an observed frequency for normal distribution.
  
  Test the number of data v[i] in the ranges r[i],r[i+1] against the
  normal distribution with mean m and deviation d, using the chi^2
  method.
  
  r : ranges (sorted 1xm vector)
  n : total number of data
  v : number of data in the ranges (1x(m-1) vector)
  m : expected mean value
  d : expected deviation
  
  Return the error we get, if we reject the normal distribution.
function ttest (m:number, d:real scalar, n:natural, mu:number)

  T student test
  
  Test, if the measured mean m with measured deviation d of n data
  comes from a distribution with mean value mu.
  
  m : mean value of data
  d : standard deviation of data
  n : number of data
  mu : mean value to test for
  
  Returns the error alpha, if we reject that the data come from a
  distribution with mean mu.
function tcompare (m1:number, d1:number, n1:natural, ..
    m2:number, d2:number, n2:natural)

  Test, if two measured data agree in mean.
  
  The data must be normally distributed. Returns the error you have,
  if you reject this.
  
  m1,m2 : means of the data
  d1,d2 : standard deviation of the data
  n1,n2 : number of data
  
  Returns the error alpha, if we reject that the data come from a
  distribution with the same expected mean.
function tcomparedata (x:real vector, y:real vector)

  Compare x and y for same mean
  
  Calls "tcompare" to compare the two observations for the same mean.
  
  Returns the error alpha, if we reject that the data come from a
  distribution with the same expected mean.
  
function tabletest (A:real)

  Chi^2-Test the results a[i,j] for independence of the rows from the columns.
  
  The table test test for indepence of the rows of the tables
  from the column. E.g., if some items are observed [40,50] times
  for men, and [50,30] times for woman, we can ask, if the
  observations depend on the gender. In this case we can reject
  independece with 1.8% error level.
  
  This test should only be used for large table entries.
  
  Return the error you have, if you reject independence.
function expectedtable (A:real)

function contingency (A:real, correct=1)

  Contigency Coefficent of a matrix A.
  
  If the coefficient is close to 0, we tend to say that the rows and
  the colums are independent.
  
  correct : Correct the coefficient, so that it is between 0 and 1
function varanalysis 

  varanalysis(x1,x2,x3,...) test for same mean.
  
  Test the data sets for the same mean, assuming normal distributed
  data sets.
  
  Returns the error we have, if we reject same mean.
function mediantest (a:real vector, b:real vector)

  Median test for equal mean.
  
  Test the two distributions a and b on equal mean value. For this,
  both distributions are checked on exceeding the median of the
  cumulative distribution.
  
  Returns the error we have, if we reject that a and b can have the
  same mean.
function ranktest (a:real vector, b:real vector, eps=epsilon())

  The Mann-Whitney test tests a and b on same distribution
  
  Return the error we have, if we reject the same distribution.
function signtest (a:real vector, b:real vector)

  Test, if the expected mean of a is not better than b
  
  Assume a(i) and b(i) are results of a treatment. Then we can ask,
  if a is better than b.
  
  a,b : row vectors of same size
  
  Return the error we have, if we decide that a is better
  than b.
function wilcoxon (a:real vector, b:real vector, eps=sqrt(epsilon()))

  Test, if the expected mean of a is not better than b
  
  This is a sharper test for the same problem as in "signtest".
  
  Returns the error you have, if you decide that a is better
  than b.
  
  See: 
signtest (Statistics with Euler)

Statistical Plots

function boxplot (x:real, lab=none, style="0#",
    textcolor=red)

  Summary of the quantiles in x in graphical form.
  
  Use quantiles to compute the quantiles with default q equal to
  [0,0.25,0.5,0.75,1].
  
  style : If present, it is used as fill style, the default is "O"
  
  See: 
quantiles (Statistics with Euler),
barstyle (Euler Core)
function columnsplot (x:vector, lab=none,
    style="O#", color=green, textcolor=red)

  Plot the elements of x as columns.
  
  s : a string vector with one label for each element of x.
function dataplot (x:real, y:real, style="[]w", color=1)

  Plot the data (x,y) with point and line plots.
  
  x : real row vector
  y : real row vector or matrix (one row for each data).
  style : a style or a vector of styles
  color : a color or a vector of colors
  
  You can use a vector of styles and a vector of colors. These
  vectors must contain as many elements as there are rows of y.
  
  See: 
statplot (Statistics with Euler)
function piechart (x:real vector, style="0#",
    color=green, lab:string vector=none, r=1.5, textcolor=red)

  plot the data x in a pie chart.
  
  x : the vector of data
  color : a color or a vector of colors (same length as x)
  style : a style or a vector of styles
  lab : a vector of labels (same length as x)
  r : The piechart has radius 1. To leave space use r=1.5.
function starplot (v, style="/", color=green, lab:string=none,
    rays:integer=0, pstyle="[]w", textcolor=red, r=1.5)

  A star like plot with a filled star or with rays and dots only
function logimpulseplot (x, y=none, style="O#", color=green, d=0.1)

  Logarithmic impulse plot of y.
function columnsplot3d (z:real, srows=none, scols=none,
    angle=30°, height=40°, zoom=2.5, distance=5,
    crows:vector=none, ccols:vector=none)

  Plot 3D columns from the matrix z.
  
  This function shows a 3D plot of columns with heights z[i,j] in
  a rectangular array. z can be any real nxm matrix.
  
  z : the values to be displayed
  srows : labels for the rows
  scols : labels for the columns
  crows : colors of the rows
  ccols : colors of the columns (alternatively)
function mosaicplot (z: real, srows=none, scols=none,
    textcolor=red, color=green, style="O#")

  Moasaic plot of the data in z.
  
  z : matrix with values
  srows, scols : label strings for the rows and columns (string
  vectors)
  color : a color or a vector of colors for the columns of the plot.
  style : a style or a vector of styles.
  
  For an example see the introduction to statistics.
function scatterplots (M:real, lab=none,
    ticks=1, grid=4, style="..")

  Plot all rows of M against all rows of M.
  
  The labels are shown in the diagonal of the plot.
  
  lab : labels for the rows.
function statplot (x, y=none, plottype="b",
    pstyle="[]w", lstyle="-", fstyle="O#",
    xl="", yl="")

  Plots x against y.
  
  This is a simple form of using plot2d with point, line or bar
  options.
  
  The available plotplottypes are
  
  'p' : point plot
  'l' : line plot
  'b' : both
  'h' : histgram plot
  's' : surface plot
  
  pstyle, lstyle, fstyle : Styles for the points, lines and bars
  
  See: 
style (Euler Core),
style (Maxima Documentation)

Data Tables

function writetable (x:real,
    fixed:integer=0, wc:index=10, dc:nonnegative integer=2,
    labc=none, labr=none, wlabr=none, lablength=1,
    NA=".", NAval=NAN,
    ctok:index=none, tok:string vector=none,
    file=none)

  Write a table x of statical values
  
  Write a table with labels for the columns and rows and formats for
  each row. A typical table looks like this
  
  A     B     C
  G   1.02     2     f
  H   3.05     5     m
  
  Each number in the table can be translated into a token string.
  This translation can be set with a global variable tok (string
  vector) wich applies to all columns with indices in ctok (index
  vctor). Or it can be set in each column with an assigned variable
  ctok? (string vector), where ? is the number of the column.
  
  See the intreoduction to statistics for an example.
  
  wc : default width for all columns or vector of widths.
  dc : default decimal precission for all columns or vector of
  precissions.
  fixed : use fixed number of decimal digits (boolean or vector of
  boolean).
  
  labc : labels for the columns (string or real vector)
  lablength : increase the width of the columns, if labels are wider.
  labr : labels for the rows (string or real vector)
  
  NA, NAval : Token string and value to represent "Not Available". By
  default "." and NAN is used.
  
  See: 
readtable (Statistics with Euler)
function printfile (filename:string, lines:index=100)

  Print the first lines of a file
  
  See: 
open (Euler Core),
close (Euler Core),
close (Maxima Documentation)
function readtable (filename:string, clabs=1, rlabs=0,
    NA=".", NAval=NAN,
    ctok:index=none, tokens=[none])

  Read a table from a file.
  
  The table can have a header line (clabs=1) and row labels
  (rlabs=1). The entries of the table can be numbers (with decimal
  dots) or strings. In case of strings, these tokens are translated
  to unique numbers. The tranlation can either be set for each
  column separately in string vectors with names tok1, tok2 etc.,
  or for the complete table in the tokens parameter.
  
  The the tokens are collected from the columns with indices in the
  ctok vector. If a column has a tok? parameter, tokens are not
  collected automatically from that column.
  
  "Not Available" can be represented by a special string. The
  default is ".". In the numerical table, it is represented by
  default as NAN. If you do not like this, simply let NAN be
  represented by any other string and translate ti into a numerical
  token.
  
  See the introduction for statistics for an example.
  
  clabs : The table has a line with headings
  rlabs : Each line has a heading label.
  NA, MAval : Sets the string and the returned value for NA (not
  available).
  ctok : Indices of the columns, where tokens are to be collected.
  tok1=..., tok2=... : Individual string arays for columns.
  
  Returns {table, heading string, token strings, rowlabel strings}
  
  See: 
writetable (Statistics with Euler)
function tablecol (M:real, j:nonnegative vector, NAval=NAN)

  The non-NAN values in the columns j of the table M.
  
  To access a table column, you could simply use M[,j], where j is a
  row vector of indices or a single index. But this function skips
  any NAN values in any of the columns j. It returns the columns
  as rows (transposed) and the indices of the rows.
  
  NANval : The value that should be treated as "Not Available"
  
  Returns {colums as rows, indices of non-NAN rows}
function selectrows (M:real, j:index, v:real vector, NAval=NAN)

  Select the rows indices i with M[i,j] in v and not-NAN.
function sortedrows (M:real, j:nonnegative integer vector)

  Index of rows for sorted table with respect to columns in j
  
  The table gets sorted in lexicographic order.
  
  Returns : {sorted table, index of sorted values}

Functions with Extended Accuracy

function xtdis (x,n)

  Returns the t-distribution with n degrees at x
  
  More accurate than tdis, using a Romberg integration.
  
  See: 
tdis (Euler Core)
function xinvtdis (y,n)

  Inverse of xtdis
  
  See: 
xtdis (Statistics with Euler),
invtdis (Euler Core)
function xchidis (x:real, n:natural)

  Returns the chi^2 distribution with n degrees
  
  More accurate than chidis, using a Romberg integration.
  
  See: 
chidis (Euler Core)
function xinvchidis (y:nonnegative real, n:natural)

  Returns the inverse of xchidis
  
  See: 
xchidis (Statistics with Euler)
function xfdis (x:nonnegative real, n:natural, m:natural)

  Returns the F distribution with n and m degrees at x
  
  More accurate than fdis, using a Romberg integration. The
  integration is sometimes unstable.
  
  See: 
fdis (Euler Core)
function xinvfdis (y,n,m)

  Returns the inverse of xfdis
  
  See: 
xfdis (Statistics with Euler)
function xbetai (t:nonnegative real, a:real, b:real)

  Incomplete beta function
  
  A more accurate version of betai, using Romberg integration.
  
  See: 
betai (Euler Core),
betai (Mathematical Functions)
function cfnormaldis (x:real, eps=1e-15, interval=0, limit=3)

  Normal distribution with continued fractions
  
  Using Continued Fractions (by H. Vogel following 26.2.15 Abramowitz-Stegun).
  
  I:=Integral{x,oo} Z(u) du mit Z=exp(-u^2/2)/sqrt(2*pi).
  eps = relative Schachtelungsbreite der Kettenbrüche K
  
  Bei interval<>0 Intervall-Ausgabe.
function xnormaldis (x : real,
    mean : real = 0, dev : real = 1)

  Probability of being smaller than x with the normal distribution.
  
  Alias to normaldis for |x|<5, else uses cfnormaldis.
  
  See: 
normaldis (Euler Core)
function xinvnormaldis (p : real, mean : real = 0, dev : real = 1)

  Probability of being smaller than x with the normal distribution.
  
  Alias to invnormaldis for p>1e-7, else uses bisectoin and
  xnormaldis.
  
  See: 
invnormaldis (Euler Core)

Documentation Homepage