iconEuler Examples

Black Swan Distribution and the DAX

by R. Grothmann

I have been inspired to look at ugly distributions by the book "Black
Swans" by N.N.Talib. He claims that such distributions are more real
life than the standard "bell curve". A black swan is a rare, but
important event, which is essentially unpredictable. Ugly
distributions are distributions, which can produce black swans.

http://en.wikipedia.org/wiki/Black_swan_theory

Mathematically, we are talking about the Cauchy distribution, or
distributions of Pareto type.

This notebook also contains a study of the distribution of the deltas
of the DAX (German stock index). The result is that these deltas
follow a "power law" with alpha equal to 3.6.

A good Distribution

To compare good and bad distributions, let us start with the Gauß
distribution, which is of course a good distribution. What is the mean
distance, a random walk will travel? How would a random walk "look
like"?
>&assume(n>0);
First, we investigate a 0-1-normal distributed random walk. The
expected absolute distance of n steps of the walk can be computed by
integrating |x| with respect to this distribution.

Black Swan Distribution

Due to symmetry, we can use the following integral.
>&2*integrate(sqrt(n)/sqrt(2*%pi)*exp(-n*x^2/2)*x,x,0,inf)
                               sqrt(2)
                           ----------------
                           sqrt(pi) sqrt(n)

Let us generate a Monte-Carlo simulation of random walks with
0-1-normal distributed steps.

With the following command, we do 1000 random walks of length
1000, and print the mean value of the absolute distance, as well
as the standard deviation of it.
>n:=1000; S:=abs(sum(normal(1000,n))'); mean(S), dev(S)
25.4323731799
18.9065448062
The expected distance from 0 is
>sqrt(2/(pi*n))*n
25.2313252202
Note that the deviation of the traveling distance is quite large. This
means, that it is difficult to predict the distance, even in this
"normal" case.

Here is the simulated distribution of the absolute walking distance of
our 1000 random walks of length 1000.
>{x,y}=histo(S,20); plot2d(x,y,bar=1):

Black Swan Distribution

For an optical impression, the following plot shows four such
walks.
>figure(2,2); loop 1 to 4; figure(#); plot2d(cumsum(normal(1,500))); end; ...
 insimg; figure(0);

Black Swan Distribution

Due to the strong law of large numbers, the same behavior is to
be expected by all random walks with variance 1 in each step.

Here is a {-1,1}-walk, which takes steps -1 or 1 with probability
1/2.
>n:=1000; S:=abs(sum(intrandom(n,n,2)*2-3)'); mean(S), dev(S)
25.602
19.7746462075
>{x,y}=histo(S,20); plot2d(x,y,bar=1):

Black Swan Distribution

The Cauchy Distribution

Now we consider some ugly distributions for the steps. The following
distribution is called the Cauchy distribution.

Black Swan Distribution

This distribution has neither an expected value, nor a variance, just
like the distributions in Taleb's book. 
>expr &= 1/(%pi*(1+x^2)); plot2d(expr,a=-10,b=10,c=0,d=0.5):

Black Swan Distribution

The distribution is a probabilistic random distribution.
>&integrate(expr,x,-inf,inf)
                                  1

The probability of a larger value than a behaves like pi/a. This does
not go toward 0 so rapidly.

To see this behavior, we develop the distribution P(x<=a) function
near a=infinity.
>&assume(a>0); &taylor(integrate(@expr,x,a,inf),a,inf,2)
                                  1
                                 ----
                                 pi a

To simulate this distribution, we solve P(x<=a)=y for a.
>&solve(integrate(expr,x,-inf,a)=y,a)
                          [a = - cot(pi y)]

Then, we choose y uniformly in [0,1], and return the corresponding
a.
>function prand(n,m) := -1/tan(pi*random(n,m))
If we run a simulation of this distribution, we often get large
values. These are the black swans of Teleb's book.
>s:=prand(1,1000); plot2d(s,distribution=20):

Black Swan Distribution

However, if we truncate the large values (plot only the values
close to 0), we can see the Cauchy distribution.
>s:=prand(1,1000); s=s[nonzeros(abs(s)<10)]; ...
 plot2d(s,a=-10,b=10,c=0,d=0.5,distribution=20); ...
 plot2d(expr,add=1,thickness=2,color=blue):

Black Swan Distribution

How about a random walk with this distribution?
>n:=1000;
Most probably, we get a very large walking distance, and an even
larger deviation. We cannot predict, how large the sum of n such
random variables is. This is due to the large excessive values
of the distribution.
>S:=abs(sum(prand(1000,n))'); mean(S), dev(S)
4395.69627879
19169.1503693
The plot of the absolute walking distances over all simulated
walks show this clearly.
>plot2d(S,histogram=20):

Black Swan Distribution

Here is a single random walk. It makes huge steps sometimes. In
some periods, it looks very ordinary. This behavior simulates
black swans, which are rare, decisive, unexpected events.
>plot2d(cumsum(prand(1,1000))):

Black Swan Distribution

A Pareto Distribution

Let us take a not so bad distribution. The following distribution has
an expected value (0), but no variance. 

Black Swan Distribution

It is of the same type as the so called Pareto distributions.
>&2*integrate(1/(1+x^3),x,0,inf)
                                 4 pi
                                 ----
                                  3/2
                                 3

We put the scaled density function into an expression, and test,
if it is really a probabilistic distribution.
>expr1 &= 3*sqrt(3)/(4*%pi*(1+abs(x)^3));
>&2*integrate(expr1,x,0,inf)
                                  1

We can compute P(x>=a) with Maxima, but the expression is not
so nice.
>&assume(a^2-a+1>0); function distr(a) &=1-integrate(expr1,x,a,inf)
                                         2 sqrt(3) a - sqrt(3)
                       2            atan(---------------------)
             3/2  log(a  - a + 1)                  3
       1 - (3    (--------------- - ---------------------------
                         6                    sqrt(3)
                                       log(a + 1)      pi
                                     - ---------- + ---------))/(4 pi)
                                           3        2 sqrt(3)

It behaves like 1/a^2 as already expected from the densitity function.
First a numerical test.
>plot2d(&(1-distr(x))*x^2,0,100):

Black Swan Distribution

Here is a symbolic proof of this.
>&limit((1-distr(a))*a^2,a,inf), %()
                                  3/2
                                 3
                                 ----
                                 8 pi

0.206748335783
Let us define this distribution in Euler.
>function map distr(a) ...
if a>0 then return &:distr(a)
else return 1-(&:distr(-a))
endif
endfunction
>plot2d("distr",-5,5):

Black Swan Distribution

>function map invdistr(b) ...
if b>distr(100); return 100;
else if b<distr(-100) return -100;
else
  return bisect("distr",-100,100,y=b,eps=0.001);
endif;
endfunction
>invdistr(0.2), distr(%)
-0.808334350586
0.200090252735
Then we can define a random variable for this.
>function p1rand(n,m) := invdistr(random(n,m))
Here is a random walk. Due to the complicated random variable,
the simulation is not as fast as before.

The walk exhibits the phenomenon of sudden jumps again. However,
the jumps are not so prominent.
>plot2d(cumsum(p1rand(1,1000))):

Black Swan Distribution

The expected traveling distance is not very high, however. The
following simulation takes some time. The deviation is high usually,
so there is no proper way to predict it.
>n:=100; S:=abs(sum(p1rand(100,n))'); mean(S), dev(S)
14.2630045265
15.2938113452

Development of the DAX

For a real life example, we study the development of the German
stock index DAX. I have put the data, collected over the past
18 years on a daily basis, into a data file.
>dax:=readmatrix("kurs.dat")';
The time scale is backwards, so let us flip it.
>dax:=flipx(dax);
The 4th column in the file is the rate. The first three columns
contain the date.
>rate:=dax[4]; date:=dax[1]+(dax[2]+dax[3]/31)/12; ...
 plot2d(date,rate,a=date[1],b=date[-1],c=0,d=10000):

Black Swan Distribution

We want to study the deltas, and plot the distribution of these
deltas.
>n:=cols(rate); delta:=differences(rate);
Now we can plot the distribution of the deltas.
>plot2d(delta,distribution=40):

Black Swan Distribution

The rate has a bias, of course, but only a very small one. However,
the deviation is quite high, so it is already almost unpredictable.
>mr:=mean(delta), dr:=dev(delta)
0.491942274306
66.4036869492
The Gauß normal distribution does not really look like a good
fit, and in fact it isn't.
>plot2d("qnormal(x,mr,dr)",add=1,thickness=2):

Black Swan Distribution

Especially, if we plot the excessive values, we see that there
are far more than expected. The function "excess" defined below
computes how many deltas are larger than x*vr.

The black curve is the excess of the DAX deltas, and the colored
curve would be the excess of a normal distribution.
>function map excess(x) := sum(abs(delta-mr)>x*dr)/cols(delta)
>plot2d("excess",0,5); plot2d("(1-normaldis(x))*2",add=1,color=5):

Black Swan Distribution

The behavior is what Taleb calls a "power law" with exponent 3.6.

The number of deltas larger than x times the standard deviation
behaves about like 1/x^3.6. To see this, we plot the excess times
x^a for the suitable chosen power a=3.6.
>plot2d("excess(x)*x^3.6",0,5):

Black Swan Distribution

Let us try the same with the normal distribution. The fit is quite
well, and there are almost no excessive values, not even in 10000
random values.
>n=10000; sn:=normal(1,n); ...
 plot2d(sn,distribution=40); plot2d("qnormal(x,0,1)",add=1):

Black Swan Distribution

The distribution simply does not allow excessive values.
>function map excessn(x) := sum(abs(sn)>x)/n
>plot2d("excessn(x)*x^10",0,5):

Black Swan Distribution

Now the try the same with the "prand" distribution above. It has
no variance, so we cannot scale it the same way.

Nevertheless, whatever scaling factor we take, we clearly see
the power law with exponent 1.
>n=10000; sn:=prand(1,n); ...
 plot2d("excessn(x*3)*x",0,5):

Black Swan Distribution

Likewise for the power law 2. Compare this to the normal distribution
and the distribution of the DAX!
>n=1000; sn:=p1rand(1,n); ...
 plot2d("excessn(x*3)*x^2",0,5):

Black Swan Distribution

In conclusion, we find that the deltas of the DAX obey a power law
with exponent 3.6. This distribution has already a finite variance.

I like to add my personal opinion. From the graph alone, the DAX
behaves randomly, and due to the large variance is unpredictable. Only
with insider knowledge, combined with knowledge about the general
behavior of investors (which may have nothing to do with the real
economical development), trends may be predicted and exploited.

Examples Homepage