Statistical functions.
Euler has a lot of routines to generate random numbers, like the builtin functions random and normal. Moreover, Euler has builtin routines for distributions.
This file provides more distributions, random numbers, and tests.
function erf (x:real) Gau? normal distribution with standard deviation 1/2 This is the result of integrate(exp(-t^2),t,0,x)*2/sqrt(pi). The "erf" function is defined in the literature, and can easily be computed from the "normaldis" function of Euler. See:
normaldis (Euler Core),
normaldis (Statistics with Euler)
function binsum (i:natural, n:natural, p:number) Probability of getting i or less hits in n runs. Returns the cumulative sum of the binomial distribution with probability p. I.e., if we have n independent random throws, with a probability of p for a hit, the functions computes the probability to get i or less hits. For large n and i, use "normalsum". See:
normalsum (Statistics with Euler),
invbinsum (Statistics with Euler)
function normaldis (x : real, mean : real = 0, dev : real = 1) Probability of being smaller than x with the normal distribution.
function invnormaldis (p : real, mean : real = 0, dev : real = 1) Probability of being smaller than x with the normal distribution.
function normalsum (i:natural, n:natural, p:number) Probability of getting i or less hits in n runs. Works like binsum, but is much faster for large n and medium p. See:
binsum (Statistics with Euler)
function invbinsum (x:number, n:natural, p:number) Computes i, such that binsum(i,n,p) is just larger than x. This is the inverse function to binsum. We now the probability for i hits in n random events, and want to know i.
function hypergeomsum (i:natural, n:natural, itot:natural, ntot:natural) Hypergemotric sum. Return the probability of hitting i or less black balls if n are chosen out of ntot, and there are a total of itot black balls (and ntot-itot white balls, of course).
function gammarestr (x) Special Gamma function, works only for 2x natural look into gamma.e for the general gamma function
function qnormal (x, m=0, d=1) Returns the density m-d-normal distribution This is the density function the Gauss normal distribution with mean m and standard deviation 1. See:
normaldis (Euler Core),
normaldis (Statistics with Euler),
erf (Statistics with Euler),
erf (Maxima Documentation)
function qtdis (t:real, n:nonnegative integer) Density of the student t distribution
function qchidis (x, n) density of the chi distribution
function qfdis (x,n,m)
The following distributions are based on Julia code by John D. Cook.
function randmatrix (n:index, m:index, f:string) Apply the random generator to generate a matrix.
function randuniform (n : index, m : index, a : number, b : number) return a uniform random sample from the interval (a, b)
function randnormal (n : index, m : index, mean : number = 0, stdev : nonnegative number = 1) return a random sample from a normal (Gaussian) distribution
function randexponential (n : index, m : index, mean : positive number = 1) return a random matrix from an exponential distribution
function randgamma (n : index, m : index, shape : nonnegative number, scale : nonnegative number) return a random matrix from a gamma distribution Implementation based on "A Simple Method for Generating Gamma Variables" by George Marsaglia and Wai Wan Tsang. ACM Transactions on Mathematical Software Vol 26, No 3, September 2000, pages 363-372.
function randchi (n : index, m : index, dof : index) return a random matrix from a chi square distribution with the specified degrees of freedom
function randinversegamma (n : index, m : index, shape : positive number, scale : positive number) return a random matrix from an inverse gamma random variable
function randweibull (n : index, m : index, shape : positive number, scale : positive number) return a random matrix from a Weibull distribution
function randcauchy (n : index, m : index, median : number, scale : positive number) return a random matrix from a Cauchy distribution
function randt (n : index, m : index, dof : positive integer) return a random sample from a Student t distribution See Seminumerical Algorithms by Knuth
function randlaplace (n : index, m : index, mean : number, scale : positive number) return a random sample from a Laplace distribution The Laplace distribution is also known as the double exponential distribution.
function randlognormal (n : index, m : index, mu : number, sigma : positive number) return a random sample from a log-normal distribution
function randbeta (n : index, m : index, a: positive number, b : positive number) return a random sample from a beta distribution There are more efficient methods for generating beta samples. However such methods are little more efficient and much more complicated.
function meandev (x:numerical, v=none) Return the mean value and the standard deviation of [x1,...,xn] An optional additional parameter v contains the multiplicities of x. m=mean(x) will assign the mean value only! If x is a matrix the function works on each row. x : data (1xm or nxm) v : multiplicities (1xn or nxm) See:
mean (Maxima Documentation)
function mean (x:numerical, v:real vector=none) Returns the mean value of x. An optional additional parameter contains multiplicities. See:
meandev (Statistics with Euler),
median (Statistics with Euler),
median (Maxima Documentation)
function dev (x:numerical, v:real vector=none) Returns the standard deviation of x An additional parameter may contain multiplicities. See:
meandev (Statistics with Euler)
function median (x, v=none, p=0.5) Returns a value such that p of the x[i] are less equal. v are optional multiplicities for the values. If x is a matrix, the function works on all rows of x. x : data (1xm or nxm) v : multiplicites (1xm or nxm) p : desired percentage See:
mean (Statistics with Euler),
mean (Maxima Documentation)
function pfold (v: real vector, w: real vector) Returns the distribution of the sum of two distributions v[i], w[i] contain the probabilities that each random variable is equal to i-1. result[i] contains the probability that the sum of the two random variables is i-1. See:
fold (Euler Core),
fftfold (Numerical Algorithms)
function quantiles (x,q:vector=[0,0.25,0.5,0.75,1]) Return the given quantiles for each row of x. See:
boxplot (Statistics with Euler),
boxplot (Maxima Documentation)
function covarmatrix (x:real) Emprical covariance matrix of the rows of x
function covar (x:real vector, y:real vector) Empirical covariance of x and y
function correlmatrix (x:real) Correlation matrix of the rows of x
function correl (x:real vector, y:real vector) Correlation of x and y
function ranks (x) Ranks of the elements of x in x. With multiplicities the rank is the mean rank of the equal elements. Works for reals, real vectors, or string vectors x.
function rankcorrel (x:real vector, y:real vector) Correlation of x and y
function getfrequencies (x:real vector, r: real vector) Count the number of x in the intervals of the sorted vector r. The function returns the number of x[j] in the intervals r[i-1] to r[i]. x : real row vector (1xn) r : real sorted row vector (1xm) Returns the frequencies f as a row vector (1x(m-1)) See:
count (Euler Core),
histo (Statistics with Euler)
function getmultiplicities (x, y, sorted=0) Counts how often the elements of x appear in y. This works for string vectors and for real vectors. sorted : if true, then y is assumed to be sorted. See:
count (Euler Core),
getfrequencies (Statistics with Euler)
function getstatistics (x:real vector, y:real vector=none) Return a statics of the values in the vector x. If y is none, the function returns {xu,mu}, where xu are the unique elements of x, and mu are the multiplicities of these values. Else the function returns {xu,yu,m}, where xu are the unique elements of x, yu the unique elements of y, and M is a table of multiplicities of pairs (xu[i],yu[j]) in (x[k],y[k]), k=1...n.
function empdist (x:real vector, vsorted:real vector) Compute the empirical distribution Needs a sorted vector of empirical values.
function histo (d:real vector, n:index=10, integer:integer=0, even:integer=0, v:real vector=none, bar=1) Computes {x,y} for histogram plots. d is an 1xm vector. x - End points of the intervals (equispaced n+1 points) y - The number of data in the subintervals (frequencies) integer=1 - for distributions on integer even=1 - better for evenly spaced discrete distributions To be used for plot2d and the bar style. v : intervall boundaries (ordered). bar : It true the functin returns two vectors for >bar in plot2d. If false it returns a sawtooth function for plot2d. The plot functions plot2d has parameters distribution=1, histgram=1 to achieve the same effect. See:
plot2d (Plot Functions plot2d and plot3d),
plot2d (Maxima Documentation)
function chitest (x:real vector, y:real vector) Perform a xhi^2 test, if x is the expected frequency y This functions test an observed frequency x against an expected frequency y. E.g., if 40 men are found sampling 100 persons, then [40,60] has to be tested against [50,50]. The result of the test is 0.049, which means that the sample does not obey the expected frequency with an error less than 5%. To get frequencies of data from the data, use "getfrequencies", "count", or "histo". x,y : two real row vectors (1xn) Returns the error level for the hypothesis that the observed frequency x comes from another distribution as the expected frequency y. See:
getfrequencies (Statistics with Euler),
count (Euler Core),
histo (Statistics with Euler)
function testnormal (r:real vector, n:integer, v:real vector, .. m:number, d:number) Test an observed frequency for normal distribution. Test the number of data v[i] in the ranges r[i],r[i+1] against the normal distribution with mean m and deviation d, using the chi^2 method. r : ranges (sorted 1xm vector) n : total number of data v : number of data in the ranges (1x(m-1) vector) m : expected mean value d : expected deviation Return the error we get, if we reject the normal distribution.
function ttest (m:number, d:real scalar, n:natural, mu:number) T student test Test, if the measured mean m with measured deviation d of n data comes from a distribution with mean value mu. m : mean value of data d : standard deviation of data n : number of data mu : mean value to test for Returns the error alpha, if we reject that the data come from a distribution with mean mu.
function tcompare (m1:number, d1:number, n1:natural, .. m2:number, d2:number, n2:natural) Test, if two measured data agree in mean. The data must be normally distributed. Returns the error you have, if you reject this. m1,m2 : means of the data d1,d2 : standard deviation of the data n1,n2 : number of data Returns the error alpha, if we reject that the data come from a distribution with the same expected mean.
function tcomparedata (x:real vector, y:real vector) Compare x and y for same mean Calls "tcompare" to compare the two observations for the same mean. Returns the error alpha, if we reject that the data come from a distribution with the same expected mean.
function tabletest (A:real) Chi^2-Test the results a[i,j] for independence of the rows from the columns. The table test test for indepence of the rows of the tables from the column. E.g., if some items are observed [40,50] times for men, and [50,30] times for woman, we can ask, if the observations depend on the gender. In this case we can reject independece with 1.8% error level. This test should only be used for large table entries. Return the error you have, if you reject independence.
function expectedtable (A:real)
function contingency (A:real, correct=1) Contigency Coefficent of a matrix A. If the coefficient is close to 0, we tend to say that the rows and the colums are independent. correct : Correct the coefficient, so that it is between 0 and 1
function varanalysis varanalysis(x1,x2,x3,...) test for same mean. Test the data sets for the same mean, assuming normal distributed data sets. Returns the error we have, if we reject same mean.
function mediantest (a:real vector, b:real vector) Median test for equal mean. Test the two distributions a and b on equal mean value. For this, both distributions are checked on exceeding the median of the cumulative distribution. Returns the error we have, if we reject that a and b can have the same mean.
function ranktest (a:real vector, b:real vector, eps=epsilon()) The Mann-Whitney test tests a and b on same distribution Return the error we have, if we reject the same distribution.
function signtest (a:real vector, b:real vector) Test, if the expected mean of a is not better than b Assume a(i) and b(i) are results of a treatment. Then we can ask, if a is better than b. a,b : row vectors of same size Return the error we have, if we decide that a is better than b.
function wilcoxon (a:real vector, b:real vector, eps=sqrt(epsilon())) Test, if the expected mean of a is not better than b This is a sharper test for the same problem as in "signtest". Returns the error you have, if you decide that a is better than b. See:
signtest (Statistics with Euler)
function boxplot (x:real, lab=none, style="0#", textcolor=red) Summary of the quantiles in x in graphical form. Use quantiles to compute the quantiles with default q equal to [0,0.25,0.5,0.75,1]. style : If present, it is used as fill style, the default is "O" See:
quantiles (Statistics with Euler),
barstyle (Euler Core)
function columnsplot (x:vector, lab=none, style="O#", color=green, textcolor=red) Plot the elements of x as columns. s : a string vector with one label for each element of x.
function dataplot (x:real, y:real, style="[]w", color=1) Plot the data (x,y) with point and line plots. x : real row vector y : real row vector or matrix (one row for each data). style : a style or a vector of styles color : a color or a vector of colors You can use a vector of styles and a vector of colors. These vectors must contain as many elements as there are rows of y. See:
statplot (Statistics with Euler)
function piechart (x:real vector, style="0#", color=green, lab:string vector=none, r=1.5, textcolor=red) plot the data x in a pie chart. x : the vector of data color : a color or a vector of colors (same length as x) style : a style or a vector of styles lab : a vector of labels (same length as x) r : The piechart has radius 1. To leave space use r=1.5.
function starplot (v, style="/", color=green, lab:string=none, rays:integer=0, pstyle="[]w", textcolor=red, r=1.5) A star like plot with a filled star or with rays and dots only
function logimpulseplot (x, y=none, style="O#", color=green, d=0.1) Logarithmic impulse plot of y.
function columnsplot3d (z:real, srows=none, scols=none, angle=30°, height=40°, zoom=2.5, distance=5, crows:vector=none, ccols:vector=none) Plot 3D columns from the matrix z. This function shows a 3D plot of columns with heights z[i,j] in a rectangular array. z can be any real nxm matrix. z : the values to be displayed srows : labels for the rows scols : labels for the columns crows : colors of the rows ccols : colors of the columns (alternatively)
function mosaicplot (z: real, srows=none, scols=none, textcolor=red, color=green, style="O#") Moasaic plot of the data in z. z : matrix with values srows, scols : label strings for the rows and columns (string vectors) color : a color or a vector of colors for the columns of the plot. style : a style or a vector of styles. For an example see the introduction to statistics.
function scatterplots (M:real, lab=none, ticks=1, grid=4, style="..") Plot all rows of M against all rows of M. The labels are shown in the diagonal of the plot. lab : labels for the rows.
function statplot (x, y=none, plottype="b", pstyle="[]w", lstyle="-", fstyle="O#", xl="", yl="") Plots x against y. This is a simple form of using plot2d with point, line or bar options. The available plotplottypes are 'p' : point plot 'l' : line plot 'b' : both 'h' : histgram plot 's' : surface plot pstyle, lstyle, fstyle : Styles for the points, lines and bars See:
style (Euler Core),
style (Maxima Documentation)
function writetable (x:real, fixed:integer=0, wc:index=10, dc:nonnegative integer=2, labc=none, labr=none, wlabr=none, lablength=1, NA=".", NAval=NAN, ctok:index=none, tok:string vector=none, file=none) Write a table x of statical values Write a table with labels for the columns and rows and formats for each row. A typical table looks like this A B C G 1.02 2 f H 3.05 5 m Each number in the table can be translated into a token string. This translation can be set with a global variable tok (string vector) wich applies to all columns with indices in ctok (index vctor). Or it can be set in each column with an assigned variable ctok? (string vector), where ? is the number of the column. See the intreoduction to statistics for an example. wc : default width for all columns or vector of widths. dc : default decimal precission for all columns or vector of precissions. fixed : use fixed number of decimal digits (boolean or vector of boolean). labc : labels for the columns (string or real vector) lablength : increase the width of the columns, if labels are wider. labr : labels for the rows (string or real vector) NA, NAval : Token string and value to represent "Not Available". By default "." and NAN is used. See:
readtable (Statistics with Euler)
function printfile (filename:string, lines:index=100) Print the first lines of a file See:
open (Euler Core),
close (Euler Core),
close (Maxima Documentation)
function readtable (filename:string, clabs=1, rlabs=0, NA=".", NAval=NAN, ctok:index=none, tokens=[none]) Read a table from a file. The table can have a header line (clabs=1) and row labels (rlabs=1). The entries of the table can be numbers (with decimal dots) or strings. In case of strings, these tokens are translated to unique numbers. The tranlation can either be set for each column separately in string vectors with names tok1, tok2 etc., or for the complete table in the tokens parameter. The the tokens are collected from the columns with indices in the ctok vector. If a column has a tok? parameter, tokens are not collected automatically from that column. "Not Available" can be represented by a special string. The default is ".". In the numerical table, it is represented by default as NAN. If you do not like this, simply let NAN be represented by any other string and translate ti into a numerical token. See the introduction for statistics for an example. clabs : The table has a line with headings rlabs : Each line has a heading label. NA, MAval : Sets the string and the returned value for NA (not available). ctok : Indices of the columns, where tokens are to be collected. tok1=..., tok2=... : Individual string arays for columns. Returns {table, heading string, token strings, rowlabel strings} See:
writetable (Statistics with Euler)
function tablecol (M:real, j:nonnegative vector, NAval=NAN) The non-NAN values in the columns j of the table M. To access a table column, you could simply use M[,j], where j is a row vector of indices or a single index. But this function skips any NAN values in any of the columns j. It returns the columns as rows (transposed) and the indices of the rows. NANval : The value that should be treated as "Not Available" Returns {colums as rows, indices of non-NAN rows}
function selectrows (M:real, j:index, v:real vector, NAval=NAN) Select the rows indices i with M[i,j] in v and not-NAN.
function sortedrows (M:real, j:nonnegative integer vector) Index of rows for sorted table with respect to columns in j The table gets sorted in lexicographic order. Returns : {sorted table, index of sorted values}
function xtdis (x,n) Returns the t-distribution with n degrees at x More accurate than tdis, using a Romberg integration. See:
tdis (Euler Core)
function xinvtdis (y,n) Inverse of xtdis See:
xtdis (Statistics with Euler),
invtdis (Euler Core)
function xchidis (x:real, n:natural) Returns the chi^2 distribution with n degrees More accurate than chidis, using a Romberg integration. See:
chidis (Euler Core)
function xinvchidis (y:nonnegative real, n:natural) Returns the inverse of xchidis See:
xchidis (Statistics with Euler)
function xfdis (x:nonnegative real, n:natural, m:natural) Returns the F distribution with n and m degrees at x More accurate than fdis, using a Romberg integration. The integration is sometimes unstable. See:
fdis (Euler Core)
function xinvfdis (y,n,m) Returns the inverse of xfdis See:
xfdis (Statistics with Euler)
function xbetai (t:nonnegative real, a:real, b:real) Incomplete beta function A more accurate version of betai, using Romberg integration. See:
betai (Euler Core),
betai (Mathematical Functions)
function cfnormaldis (x:real, eps=1e-15, interval=0, limit=3) Normal distribution with continued fractions Using Continued Fractions (by H. Vogel following 26.2.15 Abramowitz-Stegun). I:=Integral{x,oo} Z(u) du mit Z=exp(-u^2/2)/sqrt(2*pi). eps = relative Schachtelungsbreite der Kettenbrüche K Bei interval<>0 Intervall-Ausgabe.
function xnormaldis (x : real, mean : real = 0, dev : real = 1) Probability of being smaller than x with the normal distribution. Alias to normaldis for |x|<5, else uses cfnormaldis. See:
normaldis (Euler Core)
function xinvnormaldis (p : real, mean : real = 0, dev : real = 1) Probability of being smaller than x with the normal distribution. Alias to invnormaldis for p>1e-7, else uses bisectoin and xnormaldis. See:
invnormaldis (Euler Core)