Euler Examples

Benford's Law

It has been observed in many real-life data sets that the first
digits of the data are not distributed evenly among the possible
first digits 1 to 9. Rather, the observed probability of the first
digit i is

$$p_i = \log_{10}(i+1) - \log_{10}(i) = \log_{10}\left(1 + \frac{1}{i}\right), \qquad i = 1, \dots, 9.$$

This is called Benford's law.

To test this law, I stole stock data from an internet site and wrote
the daily trade volumes of the stocks into a file.

>v=readmatrix("volumes.dat")';
Here are the first 200 values.
>v[1:200]
[ 544157  544157  156075  11862  265267  1714072  479429  1683372  22709
130265  42516  41107  6995  149473  191975  62920  42462  39557  63524
16511  865995  48299  69100  109735  86305  84685  237742  122700
153005  137054  73998  18846  55636  68715  23939  16670  149444
129363  290037  6582  187479  8488  26133  81853  26185  25048  80988
213087  881591  3185266  117500  86400  113609  18719  1054616
17776487  17118920  650646  1477220  270910  456247  67000  355126
489772  916030  1859  95172  118442  125030  172168  101264  265043
93265  588767  3664726  2108483  147272  452850  2510235  129672
599131  1170620  43360  3534076  2150107  261405  10500  41615
1016690  127458  226756  101700  2743500  274591  291125  14340  59780
178830  1210820  128018  856704  378086  664557  405999  828502
335948  576086  604670  134000  83000  654082  812007  639507  3248932
268568  3936231  9152155  8028348  1836699  221005  386657  1571592
68764  58385  896267  30670  5052200  93478  3662146  279015  5650
726680  136385  106236  490220  304752  1522164  1956736  1972160
7333688  542685  949031  336541  317255  155078  396670  48240  486741
414835  248002  238455  83740  207946  155559  284307  118769  32362
89306  278064  40171  283982  3245563  1647659  415221  48291  166822
385596  107569  266620  235617  3128708  120512  82565  235055  477742
145540  139831  226975  1003805  621854  820300  234268  556422
1528513  207940  185463  30038  142184  368815  140950  459223  140783
52595  43519  108971  233437  1516469  517530  117927  2463840 ]
To find the first digit of a number in its decimal representation, we
can use the following function.
>function fd(x) ...
n=floor(log10(x));
return floor(x/10^n)
endfunction
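For readers who want to cross-check the function outside Euler, here
is a minimal Python sketch of the same idea. The name `first_digit`
and the guard against rounding in `log10` are assumptions of this
sketch, not part of the notebook; it only handles positive numbers.

```python
import math

def first_digit(x: float) -> int:
    """Leading decimal digit of a positive number x."""
    if x <= 0:
        raise ValueError("x must be positive")
    n = math.floor(math.log10(x))   # decimal exponent of x
    d = math.floor(x / 10.0 ** n)   # scale x into [1, 10) and truncate
    # Assumed precaution: if log10 rounds across a power of ten,
    # d can come out as 10 or 0 instead of 1 or 9.
    if d >= 10:
        return 1
    if d < 1:
        return 9
    return d

# A few volumes taken from the listing above.
for value in (544157, 11862, 6995, 9152155):
    print(value, first_digit(value))
```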
The expected distribution is the following.
>d=log10(2:10)-log10(1:9)
[ 0.301029995664  0.176091259056  0.124938736608  0.0969100130081
0.0791812460476  0.0669467896306  0.0579919469777  0.0511525224474
0.0457574905607 ]
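The nine values should add up to 1, since the differences telescope
to log10(10) - log10(1). A short Python sketch of this check follows;
the variable names are chosen here for illustration only.

```python
import math

# Benford probabilities p_i = log10(i + 1) - log10(i) for i = 1..9.
benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]

# The sum telescopes to log10(10) - log10(1) = 1, up to rounding.
total = sum(benford)

print(benford)
print(total)
```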
We plot the distribution of the first digits in our data versus the
expected distribution.
>plot2d(fd(v),distribution=9,even=true);  ...
>  plot2d(1:9,d,points=true,add=true);  ...
>  plot2d(1:9,d,add=true); ...
>  insimg;

[Figure: distribution of the first digits in v, with the expected Benford distribution d overlaid]

It is obvious that Benford's law describes the data better than the
assumption of an even distribution.

Let us perform a statistical test to check the distribution. We first
need the frequencies of the first digits in v.
>frequ=count(fd(v)-1,9)
[ 976  530  350  267  195  178  175  152  124 ]
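The counting step can be reproduced on a small scale. The following
Python sketch tallies the first digits of the first 19 volumes from
the listing above. Reading the digit off the decimal string is a
shortcut that works for positive integers; it is an assumption of
this sketch, not the method used in the notebook.

```python
from collections import Counter

# First 19 volumes from the listing above.
volumes = [
    544157, 544157, 156075, 11862, 265267, 1714072, 479429, 1683372,
    22709, 130265, 42516, 41107, 6995, 149473, 191975, 62920, 42462,
    39557, 63524,
]

def leading_digit(n: int) -> int:
    # For a positive integer the first character of its decimal
    # string is the leading digit.
    return int(str(n)[0])

tally = Counter(leading_digit(n) for n in volumes)
counts = [tally[digit] for digit in range(1, 10)]  # digits 1..9 in order
print(counts)  # [7, 2, 1, 4, 2, 3, 0, 0, 0]
```

Even in this tiny sample the digit 1 is the most frequent.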
With an even distribution, we expect about 327 cases for each digit.
>nf=sum(frequ), fe=nf/9
2947
327.444444444
The chi-square test rejects the even distribution, of course.
>chitest(frequ,ones(1,9)*fe)
0
However, the chi-square test also rejects our expected distribution,
though not as decisively as before.
>chitest(frequ,d*nf)
0.00887031819185
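To see where such p-values come from, here is a Python sketch that
computes the Pearson chi-square statistic by hand. It assumes 8
degrees of freedom (9 categories minus 1) and uses the closed form of
the chi-square upper tail for an even number of degrees of freedom.
That Euler's chitest works the same way is an assumption; the function
name is hypothetical.

```python
import math

# Observed first-digit frequencies from the notebook output above.
observed = [976, 530, 350, 267, 195, 178, 175, 152, 124]
n = sum(observed)

def chi_square_p_value(obs, expected):
    """Pearson statistic and p-value for 9 categories (8 degrees of freedom)."""
    if len(obs) != 9 or len(expected) != 9:
        raise ValueError("this sketch is written for exactly 9 categories")
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    half = stat / 2.0
    # For even degrees of freedom k the upper tail is
    # exp(-x/2) * sum_{j < k/2} (x/2)^j / j!; here k = 8.
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(4))
    return stat, tail

uniform = [n / 9] * 9
benford = [n * (math.log10(i + 1) - math.log10(i)) for i in range(1, 10)]

stat_u, p_u = chi_square_p_value(observed, uniform)
stat_b, p_b = chi_square_p_value(observed, benford)
print(stat_u, p_u)  # even distribution: p-value is essentially zero
print(stat_b, p_b)  # Benford: p-value slightly below 0.01
```

Under these assumptions the second p-value should land close to the
value printed above.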
One could argue that the distribution of the values is exponential.
But this is not really the case. An exponential distribution of the
random variable X would satisfy

$$P(X > x) = e^{-\lambda x}, \qquad x \ge 0,$$

for some rate λ > 0, so that log P(X > x) is linear in x.

If we compute the logarithms of the cumulative frequency counts y,
summed from the largest values down, we see that the result is by no
means linear.
>{x,y}=histo(v,20); plot2d(x,log(cumsum(flipx(y))),bar=true); insimg;

[Figure: logarithm of the cumulative frequency counts of v]

Let us compare this with an exponential distribution with the same
mean value.
>m=mean(v)
469396.739396
The following Monte Carlo simulation produces such a distribution.
>vt=-log(random(1,100000))*m; mean(vt)
470216.888074
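The simulation uses inverse transform sampling: if U is uniform on
(0,1), then -log(U)*m is exponentially distributed with mean m. Here
is a Python sketch of the same step. The seed and the function name
are arbitrary choices of this sketch.

```python
import math
import random

def exponential_sample(mean: float, size: int, rng: random.Random) -> list:
    # 1 - rng.random() lies in (0, 1], which keeps log() away from zero.
    return [-math.log(1.0 - rng.random()) * mean for _ in range(size)]

rng = random.Random(12345)   # fixed seed, chosen arbitrarily for reproducibility
m = 469396.739396            # sample mean reported above
sample = exponential_sample(m, 100_000, rng)
sample_mean = sum(sample) / len(sample)
print(sample_mean)           # should be close to m
```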
>{x,y}=histo(vt,20); plot2d(x,log(cumsum(flipx(y))),bar=true); insimg;

[Figure: logarithm of the cumulative frequency counts of the simulated exponential data vt]

Let us compare the distribution of the first digits we found in our
sample with simulated data. For this, we generate values 10^u with u
distributed evenly in [0,1] and take their first digits, which are
then distributed according to Benford's law.
>va=fd(10^random(1,nf));
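The reason this works: 10^u has first digit i exactly when
log10(i) <= u < log10(i+1), an interval of length
log10(i+1) - log10(i). The Python sketch below repeats the experiment
with a larger sample. The seed and the sample size are arbitrary
choices of this sketch.

```python
import math
import random

rng = random.Random(2024)   # fixed seed, chosen arbitrarily
n = 100_000

# 10**u with u uniform on [0, 1) lies in [1, 10); truncating gives its
# first digit. min(9, ...) is a precaution against rounding up to 10.
digits = [min(9, int(10.0 ** rng.random())) for _ in range(n)]

freq = [digits.count(d) / n for d in range(1, 10)]
benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]

# Largest deviation between simulated and expected relative frequencies.
max_gap = max(abs(f - p) for f, p in zip(freq, benford))
print(freq)
print(max_gap)
```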
We insert these into the same plot as above.
>plot2d(fd(va),distribution=9,even=true);  ...
>  plot2d(1:9,d,points=true,add=true);  ...
>  plot2d(1:9,d,add=true); ...
>  insimg;

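[Figure: distribution of the first digits in the simulated data va, with the expected Benford distribution d overlaid]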

>frequa=count(fd(va)-1,9)
[ 876  533  375  288  239  190  167  149  130 ]
Most likely, the chi-square test cannot reject the distribution d for
our Monte Carlo data.
>chitest(frequa,d*nf)
0.994838542512
A more general way to express Benford's law is to state that the
fractional parts of the decimal logarithms of the data are
distributed evenly in [0,1].

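$$P(\text{first digit of } x = i) = P\bigl(\log_{10} i \le \log_{10} x \bmod 1 < \log_{10}(i+1)\bigr) = \log_{10}(i+1) - \log_{10}(i).$$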

We can plot the distribution of these fractions for our data.
>plot2d(mod(log10(v),1),distribution=1); insimg;

[Figure: distribution of the fractional parts of log10(v)]
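The link between these fractional parts and the first digit can be
checked directly: the first digit is i exactly when the fractional
part of log10(x) lies in [log10(i), log10(i+1)). The following Python
sketch illustrates this on a few volumes from the data. Both function
names are hypothetical.

```python
import math

def log_fraction(x: float) -> float:
    """Fractional part of log10(x) for positive x."""
    return math.log10(x) % 1.0

def digit_from_fraction(frac: float) -> int:
    """First digit i such that log10(i) <= frac < log10(i + 1)."""
    for i in range(1, 10):
        if frac < math.log10(i + 1):
            return i
    return 9  # only reached if rounding pushes frac up to 1.0

# A few volumes taken from the data above.
for value in (544157, 6995, 1859, 9152155):
    frac = log_fraction(value)
    print(value, frac, digit_from_fraction(frac))
```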
