
日々の流転


2013-11-28

λ. Playing with the nonlinear-optimization package

This is a follow-up to "Trying linear regression using automatic differentiation". I was thinking of writing steepest descent by hand this time, but then I remembered the nonlinear-optimization package on Hackage. It is a package I had wanted to try for a while, so this seemed like a good opportunity. The version I am trying is the current latest, 0.3.7, which provides bindings to the CG_DESCENT library, the authors' own implementation of the algorithm by Hager and Zhang *1.
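For reference, the kind of hand-rolled steepest descent I had in mind would look roughly like the sketch below. This is only an illustration: the fixed step size eta and the explicitly passed gradient are simplifying assumptions of the sketch, and none of it is used in the rest of this entry.

-- Fixed-step steepest descent sketch: x_{k+1} = x_k - eta * grad f(x_k).
-- The gradient is passed in explicitly (e.g. a function obtained with ad's grad).
steepestDescent
  :: Double                  -- step size eta (assumed constant here)
  -> ([Double] -> [Double])  -- gradient of the objective
  -> [Double]                -- initial point
  -> [[Double]]              -- infinite list of iterates
steepestDescent eta gradF = iterate step
  where
    step x = zipWith (\xi gi -> xi - eta * gi) x (gradF x)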

Compared with the ad package's gradientDescent function, the nonlinear-optimization API is a bit more tedious, but the code below is the previous post's code rewritten to use it. Since nonlinear-optimization does not do automatic differentiation itself, you have to pass it a function that computes the gradient explicitly, so here I pass the function obtained by differentiating the cost with the ad package's grad function.

module Main where

import Control.Monad
import qualified Data.Vector as V
import Numeric.AD
import qualified Numeric.Optimization.Algorithms.HagerZhang05 as CG
import Text.Printf
import qualified Text.CSV as CSV

main :: IO ()
main = do
  Right csv <- CSV.parseCSVFromFile "galton.csv"
  let samples :: [(Double, Double)]
      samples = [(read parent, read child) | [child,parent] <- tail csv]      
      -- hypothesis
      h theta x = (theta V.! 0) + (theta V.! 1) * x
      -- cost function
      cost theta = mse [(realToFrac x, realToFrac y) | (x,y) <- samples] (h theta)
  (theta, result, stat) <-
    CG.optimize
      CG.defaultParameters{ CG.printFinal = True, CG.printParams = True, CG.verbose = CG.Verbose }
      0.0000001                  -- grad_tol
      (V.fromList [0::Double,0]) -- Initial guess
      (CG.VFunction cost)        -- Function to be minimized.
      (CG.VGradient (grad cost)) -- Gradient of the function.
      Nothing                    -- (Optional) Combined function computing both the function and its gradient.
  print theta

-- mean squared error
mse :: Fractional y => [(x,y)] -> (x -> y) -> y
mse samples h = sum [(h x - y)^(2::Int) | (x,y) <- samples] / fromIntegral (length samples)
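Incidentally, the gradient that (grad cost) computes here can also be written out by hand, which is handy as a sanity check. The following mseGrad is a hypothetical helper, not part of the program above; it assumes the same qualified Data.Vector import (V).

-- Hand-derived gradient of the MSE cost for h theta x = theta0 + theta1 * x:
--   dJ/dtheta0 = (2/m) * sum (h theta x - y)
--   dJ/dtheta1 = (2/m) * sum ((h theta x - y) * x)
mseGrad :: [(Double, Double)] -> V.Vector Double -> V.Vector Double
mseGrad samples theta = V.fromList [g0, g1]
  where
    m  = fromIntegral (length samples)
    rs = [ (theta V.! 0) + (theta V.! 1) * x - y | (x, y) <- samples ]  -- residuals
    g0 = 2 * sum rs / m
    g1 = 2 * sum [ r * x | (r, (x, _)) <- zip rs samples ] / m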

So, running the program above gives…

PARAMETERS:

Wolfe line search parameter ..................... delta: 1.000000e-01
Wolfe line search parameter ..................... sigma: 9.000000e-01
decay factor for bracketing interval ............ gamma: 6.600000e-01
growth factor for bracket interval ................ rho: 5.000000e+00
growth factor for bracket interval after nan .. nan_rho: 1.300000e+00
truncation factor for cg beta ..................... eta: 1.000000e-02
perturbation parameter for function value ......... eps: 1.000000e-06
factor for computing average cost .............. Qdecay: 7.000000e-01
relative change in cost to stop quadstep ... QuadCufOff: 1.000000e-12
factor multiplying gradient in stop condition . StopFac: 0.000000e+00
cost change factor, approx Wolfe transition . AWolfeFac: 1.000000e-03
restart cg every restart_fac*n iterations . restart_fac: 1.000000e+00
stop when cost change <= feps*|f| ................. eps: 1.000000e-06
starting guess parameter in first iteration ...... psi0: 1.000000e-02
starting step in first iteration if nonzero ...... step: 0.000000e+00
factor multiply starting guess in quad step ...... psi1: 1.000000e-01
initial guess factor for general iteration ....... psi2: 2.000000e+00
max iterations is n*maxit_fac ............... maxit_fac: 1.797693e+308
max expansions in line search ................. nexpand: 50
max secant iterations in line search .......... nsecant: 50
print level (0 = none, 2 = maximum) ........ PrintLevel: 1
Logical parameters:
    Error estimate for function value is eps
    Use quadratic interpolation step
    Print final cost and statistics
    Print the parameter structure
    Wolfe line search ... switching to approximate Wolfe
    Stopping condition uses initial grad tolerance
    Do not check for decay of cost, debugger is off
iter:     0 f =   4.642373e+03 gnorm =   9.306125e+03 AWolfe =  0
QuadOK:  1 initial a:   1.070618e-04 f0:   4.642373e+03 dphi:  -8.662251e+07
Wolfe line search

iter:     1 f =   5.391563e+00 gnorm =   3.269753e-02 AWolfe =  0
QuadOK:  1 initial a:   1.632104e+01 f0:   5.391563e+00 dphi:  -1.069411e-03
Wolfe line search
RESTART CG

iter:     2 f =   5.387312e+00 gnorm =   1.558689e+01 AWolfe =  1
QuadOK:  0 initial a:   3.264209e+01 f0:   5.387312e+00 dphi:  -2.430188e+02
Approximate Wolfe line search

iter:     3 f =   5.374303e+00 gnorm =   3.196800e-02 AWolfe =  1
QuadOK:  1 initial a:   3.559386e+00 f0:   5.374303e+00 dphi:  -1.022238e-03
Approximate Wolfe line search
RESTART CG

iter:     4 f =   5.288871e+00 gnorm =   2.808049e-02 AWolfe =  1
QuadOK:  1 initial a:   2.376426e+01 f0:   5.288871e+00 dphi:  -7.887345e-04
Approximate Wolfe line search

iter:     5 f =   5.279499e+00 gnorm =   1.301306e+01 AWolfe =  1
QuadOK:  0 initial a:   4.752853e+01 f0:   5.279499e+00 dphi:  -1.693871e+02
Approximate Wolfe line search
RESTART CG

Termination status: 0
Convergence tolerance for gradient satisfied
maximum norm for gradient:  5.513258e-09
function value:             5.000294e+00

cg  iterations:                   6
function evaluations:            15
gradient evaluations:            11
===================================

fromList [23.941530180694265,0.6462905819889333]

This time it converged in an instant! Yay!
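Plugging the fitted parameters back into the hypothesis, the regression line is roughly child ≈ 23.94 + 0.646 × parent. A tiny usage sketch, with the constants simply copied from the run above:

-- Predict a child's height from the parent's height using the fitted theta
-- (the two components of the vector printed above).
predict :: Double -> Double
predict parent = 23.941530180694265 + 0.6462905819889333 * parent
-- e.g. predict 70.0 is about 69.2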

*1 Hager, W. W. and Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM Journal on Optimization, 16 (2005), 170-192.