Euler Examples

Using a Gradient Method

by R. Grothmann

We would like to demonstrate a direction search method for minimization.
First we define a function which evaluates another function f at the point
x + t*v, i.e., at x in the direction v.
>function map %fv (t; f, x, v) := f(x+t*v);
>function vmin (f,x,v,tstart,tmax=1,tmin=epsilon) ...
## Minimize f(x+t*v)
	y=f(x);
	t=tstart;
	repeat
		ynew=f(x+t*v);
		while ynew<y;
		t=t*2;
		until t>tmax;
	end;
	t=fmin("%fv",0,t;f,x,v);
	return {x+t*v,t};
endfunction
For a gradient method, we need a function and its gradient. We take a
very simple example for a first test.
>function f([x,y]) &= x^2+2*y^2
                                 2    2
                              2 y  + x

>function df([x,y]) &= gradient(f(x,y),[x,y])
                              [2 x, 4 y]

The function vmin minimizes f in the direction v, starting from the
point x. Besides the new point, it returns the step size t that was used,
which can serve as the initial step size in the next step.
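To make the whole procedure explicit, here is a minimal Python sketch of the
same idea (the names vmin_py, f and grad are illustrative stand-ins, not part
of the Euler session): the trial step is doubled until the function stops
decreasing, the step is then refined by a one-dimensional minimizer, and the
resulting step size is reused as the start value of the next iteration.

import numpy as np
from scipy.optimize import minimize_scalar

def vmin_py(f, x, v, tstart, tmax=1.0):
    # Minimize t -> f(x + t*v): double t until f stops decreasing
    # (so that [0, t] brackets a minimum), then refine with a 1-D minimizer.
    t = tstart
    while f(x + t*v) < f(x) and t <= tmax:
        t *= 2
    res = minimize_scalar(lambda s: f(x + s*v), bounds=(0, t), method="bounded")
    return x + res.x*v, res.x

f = lambda p: p[0]**2 + 2*p[1]**2             # the test function x^2 + 2*y^2
grad = lambda p: np.array([2*p[0], 4*p[1]])   # its gradient

x, t = np.array([1.0, 1.0]), 0.1
for _ in range(10):                           # a few gradient steps
    x, t = vmin_py(f, x, -grad(x), t)         # reuse t as the next start step
print(x)                                      # approaches the minimum at (0, 0)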
>x0=[1,1]; {x1,t}=vmin("f",x0,-df(x0),0.1); x1,
[ 0.444444445171  -0.111111109657 ]
One step of the gradient method does not reach the minimum at (0,0), and not even a second step does.
>{x2,t}=vmin("f",x1,-df(x1),t); x2,
[ 0.0740740740087  0.0740740731982 ]
The reason becomes obvious in the following plot, which shows the level
lines (niveau lines) of f through our iteration points.
>X=(x0_x1_x2)'; ...
>  plot2d("f",r=2,niveau=[f(x0),f(x1),f(x2)]);  ...
>  plot2d(X[1],X[2],color=red,>add); insimg;


>X
                  1      0.444444445171     0.0740740740087 
                  1     -0.111111109657     0.0740740731982 
The situation improves if we use the Newton direction, i.e. the negative
gradient multiplied by the inverse Hessian.

We can compute the inverse Hessian symbolically with Maxima, and thus the
direction of descent.
>function newtonvf([x,y]) &= -df(x,y).invert(hessian(f(x,y),[x,y]))
                             [ - x  - y ]
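As a quick hand check of this output (not part of the session): for
f(x,y) = x^2 + 2y^2 the Hessian is constant, and

    H = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}, \qquad
    -\nabla f(x,y)\,H^{-1} = -(2x,\,4y)\begin{pmatrix} 1/2 & 0 \\ 0 & 1/4 \end{pmatrix}
    = (-x,\,-y),

in agreement with the symbolic result.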

This yields a very good result in just one step.
>x0=[1,1]; x1=vmin("f",x0,newtonvf(x0),0.1)
[ 1.03672626039e-012  1.03672626039e-012 ]
Indeed, it is easy to see that the Newton algorithm finishes in just one
step whenever the gradient is a linear function, as it is for our quadratic
f. There is no need to compute the minimum of f(x+t*v) in this case.
>x0+newtonvf(x0)
[ 0  0 ]
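A short justification of the one-step claim (a standard argument, not from
the session): for a quadratic function

    f(x) = \tfrac12 x^T H x + b^T x + c, \qquad \nabla f(x) = Hx + b,

the gradient is linear, and the Newton step gives

    x - H^{-1}\nabla f(x) = x - H^{-1}(Hx + b) = -H^{-1}b,

which is the unique solution of \nabla f = 0, independent of the starting point.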

A Non-Quadratic Convex Function

Let us take another convex function, which is not a quadratic
function.
>function f([x,y]) &= x^2+y^4
                                4    2
                               y  + x

>function df([x,y]) &= gradient(f(x,y),[x,y])
                                      3
                             [2 x, 4 y ]

We take 100 steps of the gradient method.
>x=[1,1]; t=0.1; X=x;  ...
>  loop 1 to 100; {x,t}=vmin("f",x,-df(x),t); X=X_x; end;  ...
>  X=X';
And plot the result.
>plot2d("f",r=1.2,niveau=fmap(X[1,1:3],X[2,1:3])); ...
>  plot2d(X[1],X[2],>add,color=red); insimg;


As we see, the result is not very good. Essentially, the iteration takes
a zig-zag course in the x-direction, so that the y-value decreases only
very slowly.
>X[:,-10:-1]
Column 1 to 3:
  0.000120628156681 -5.90109250917e-005   0.000116746853007 
   -0.0348745189166    -0.0347481888263    -0.0344982644339 
Column 4 to 6:
 -5.7144026496e-005   0.000113084545099 -5.53661693139e-005 
   -0.0343759568583    -0.0341339342155     -0.034015450507 
Column 7 to 9:
  0.000109580698827  -5.3693072045e-005   0.000106283681283 
   -0.0337809421765    -0.0336660666202     -0.033438690026 
Column 10:
-5.20882874773e-005 
   -0.0333272632694 
Here is the logarithmic descent, i.e. log(f(x_k))/k for the iterates x_k.
It is not very good.
>k=5:cols(X); plot2d(log(fmap(X[1,k],X[2,k]))/k); insimg;
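A remark on what is plotted here (not from the original text): if the
function values decrease roughly geometrically, f(x_k) \approx C\,q^k, then

    \frac{\log f(x_k)}{k} = \frac{\log C}{k} + \log q \;\longrightarrow\; \log q,

so the curve estimates the logarithm of the contraction factor q.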


Let us try the Newton direction for this function.
>function newtonvf([x,y]) &= -df(x,y).invert(hessian(f(x,y),[x,y]))
                             [        y ]
                             [ - x  - - ]
                             [        3 ]
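Again this can be checked by hand (not part of the session): for
f(x,y) = x^2 + y^4,

    H = \begin{pmatrix} 2 & 0 \\ 0 & 12y^2 \end{pmatrix}, \qquad
    -\nabla f(x,y)\,H^{-1} = -(2x,\,4y^3)\begin{pmatrix} 1/2 & 0 \\ 0 & \tfrac{1}{12y^2} \end{pmatrix}
    = \left(-x,\,-\frac{y}{3}\right).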

Now we get good results after 20 iterations.
>x=[1,1]; t=0.1; X=x;  ...
>  loop 1 to 20; {x,t}=vmin("f",x,newtonvf(x),t); X=X_x; end;  ...
>  X=X';
>plot2d("f",r=1.2,niveau=fmap(X[1,1:3],X[2,1:3])); ...
>  plot2d(X[1],X[2],>add,color=red); insimg;


>X[:,-1]
 5.66579793109e-011 
   9.612814048e-006 
Here is the logarithmic descent.
>k=3:cols(X); plot2d(log(fmap(X[1,k],X[2,k]))/k); insimg;


But the result is not as good if we take pure Newton steps without the line search.
>x=[1,1]; X=x;  ...
>  loop 1 to 20; x=x+newtonvf(x); X=X_x; end;  ...
>  X=X';
>plot2d("f",r=1.2,niveau=fmap(X[1,1:3],X[2,1:3])); ...
>  plot2d(X[1],X[2],>add,color=red); insimg;


>X[:,-1]
                  0 
  0.000300728659822 
>k=3:cols(X); plot2d(log(fmap(X[1,k],X[2,k]))/k); insimg;


The Rosenbrock Banana

The functions above were convex and easy to minimize. We now take another
function, which is a well-known test case.
>function f([x,y]) &= 100*(y-x^2)^2+(1-x)^2;
The function f(x,y) = 100*(y-x^2)^2 + (1-x)^2 is not convex. Its level
lines have a banana-like shape. The minimum is obviously at x = y = 1.
>plot2d("f",r=3,>contour); plot2d(1,1,>points,>add); insimg;


>function df([x,y]) &= gradient(f(x,y),[x,y])
                            2                         2
             [- 400 x (y - x ) - 2 (1 - x), 200 (y - x )]

We take 100 steps of the gradient method.
>x=[0,0]; t=0.1; X=x;  ...
>  loop 1 to 100; {x,t}=vmin("f",x,-df(x),t); X=X_x; end;  ...
>  X=X';
And plot the result. The iteration does not even come close to the
minimum. It takes a terrible zig-zag course with very small steps.
>plot2d("f",r=1.2,>contour); plot2d(1,1,>points,>add); ...
>  plot2d(X[1],X[2],>add,color=red); insimg;


>function newtonvf([x,y]) &= -df(x,y).invert(hessian(f(x,y),[x,y]));
Now we get very good results after 20 iterations.
>x=[0,0]; t=0.1; X=x;  ...
>  loop 1 to 20; {x,t}=vmin("f",x,newtonvf(x),t); X=X_x; end;  ...
>  X=X';
>plot2d("f",r=1.2,>contour); plot2d(1,1,>points,>add); ...
>  plot2d(X[1],X[2],>add,color=red); insimg;


>X[:,-1]
                  1 
                  1 

Nelder-Mead 2D Demo

This demo shows the iterations for the Nelder-Mead minimization
process.

Load helper functions for the demo.
>load "Nelder Mead Demo"
The Rosenbrock banana function, in the form needed by the Nelder-Mead method.
>function f([x,y]) &= 100*(y-x^2)^2+(1-x)^2
                                 2 2          2
                       100 (y - x )  + (1 - x)

The minimum is at [1,1].
>&solve(gradient(f(x,y),[x,y]))
                           [[y = 1, x = 1]]
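This is easy to verify by hand: the second component of the gradient forces
y = x^2, and then the first component reduces to -2(1-x):

    200\,(y - x^2) = 0 \;\Rightarrow\; y = x^2, \qquad
    -400x\,(y - x^2) - 2(1 - x) = -2(1 - x) = 0 \;\Rightarrow\; x = 1,\; y = 1.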

Plot in 3D, scaled
>plot3d("f",r=3,>contour,angle=160°); insimg;


Plot the contour lines of the function in 2D. Add the minimum.
>plot2d("f",r=3,>contour); plot2d(1,1,>add,>points); insimg;


The Nelder-Mead method uses simplices. In each step it moves one corner of
the simplex, so that the simplex moves towards a minimum.

Set the start simplex.
>s=[0,1;1,0;0,0];
Show the iterations of the Nelder-Mead method.

Press Return to abort, and Space for the next step.
>nelderdemo(s,"f"), insimg;
     0.985415224543      0.969003877292 
     0.966540627635      0.934359321519 
     0.987483912325       0.97603469899 


Try the builtin function.
>nelder("f",[0,0])
[ 0.999999942023  0.999999814945 ]

Internals

The Euler file we used here contains the function vmin, which minimizes
f(x+t*v) over t.
>type vmin
function vmin (f, x, v, tstart, tmax, tmin)

## Default for tmax : 1
## Default for tmin : 2.22044604925e-012

    y=f(x);
    t=tstart;
    repeat
        ynew=f(x+t*v);
        while ynew<y;
        t=t*2;
        until t>tmax;
    end;
    t=fmin("%fv",0,t;f,x,v);
    return {x+t*v,t};
endfunction
It first doubles the given start value for t until the function no
longer descends. Then it uses the fmin function to find the minimum of
f(x+t*v) in [0,t]. fmin uses a golden section search to determine the
minimum.
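These two stages can be sketched in a few lines of Python (illustrative only;
golden_min and line_min below are stand-ins, not Euler's fmin or vmin), using
the example function f(x,y) = x + y^2 that appears further down.

import numpy as np

def golden_min(phi, a, b, tol=1e-10):
    # Golden-section search for a minimum of phi on [a, b].
    r = (np.sqrt(5) - 1) / 2                     # inverse golden ratio ~ 0.618
    c, d = b - r*(b - a), a + r*(b - a)
    while b - a > tol:
        if phi(c) < phi(d):                      # minimum lies in [a, d]
            b, d = d, c
            c = b - r*(b - a)
        else:                                    # minimum lies in [c, b]
            a, c = c, d
            d = a + r*(b - a)
    return (a + b) / 2

def line_min(f, x, v, tstart, tmax=1.0):
    # Double t until the value no longer decreases, then refine on [0, t].
    x, v = np.asarray(x, float), np.asarray(v, float)
    t = tstart
    while f(x + t*v) < f(x) and t <= tmax:
        t *= 2
    t = golden_min(lambda s: f(x + s*v), 0.0, t)
    return x + t*v, t

f = lambda p: p[0] + p[1]**2                     # example function x + y^2
print(line_min(f, [-1, -1], [1, 1], 0.1))        # step size should come out near 0.5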

We need a function %fv(t)=f(x+t*v), which takes the name of the function f
and the vectors x and v as additional parameters.
>type %fv
function map %fv (t, f, x, v)
useglobal; return f(x+t*v); 
endfunction
f must be a function which works for vectors.
>function f([x,y]) := x+y^2
This type of function works both for a vector argument and for two scalar arguments.
>f([4,5]), f(4,5)
29
29
We can plot f(x+t*v) as a function of t, if we pass "f",x,v as semicolon
parameters to plot2d.
>plot2d("%fv",0,2;"f",[-1,-1],[1,1],>insimg);


Or we can use this function in other functions that accept function
parameters.
>integrate("%fv",0,2;"f",[-1,-1],[1,1])
0.666666666667
The fmin function uses this trick to minimize f(x+tv).
>fmin("%fv",0,2;"f",[-1,-1],[1,1])
0.499999994731
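This value is easy to verify by hand: along this line

    \varphi(t) = f\bigl((-1,-1) + t\,(1,1)\bigr) = (t-1) + (t-1)^2, \qquad
    \varphi'(t) = 1 + 2(t-1) = 0 \iff t = \tfrac12,

which matches the computed minimizer; likewise \int_0^2 \varphi(t)\,dt = 0 + 2/3
matches the integral above.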
