2013-11-28
λ. Playing with the nonlinear-optimization package
This is a follow-up to "自動微分を使って線形回帰をしてみる" (linear regression with automatic differentiation). I was about to write steepest descent by hand, but then I remembered that Hackage has a nonlinear-optimization package. I had been meaning to try it for a while, so this seemed like a good opportunity. The version I am trying is 0.3.7, the latest at the time of writing; it provides bindings to the CG_DESCENT library, the authors' own implementation of the algorithm by Hager and Zhang *1.
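Before the real example, here is a minimal sketch of what a call into the binding looks like, just to show the shape of CG.optimize: a toy quadratic f(x,y) = (x-1)² + (y-2)² with a hand-written gradient. The names f and g, the toy objective, and the expected result are my own additions rather than anything from the package documentation; it assumes the vector and nonlinear-optimization packages are installed.

import qualified Data.Vector as V
import qualified Numeric.Optimization.Algorithms.HagerZhang05 as CG

-- Toy sketch (not from the original entry): minimize
-- f(x,y) = (x-1)^2 + (y-2)^2, mirroring the call shape used
-- in the regression program below.
main :: IO ()
main = do
  let f v = (v V.! 0 - 1)^(2::Int) + (v V.! 1 - 2)^(2::Int)
      -- hand-written gradient: (2(x-1), 2(y-2))
      g v = V.fromList [2 * (v V.! 0 - 1), 2 * (v V.! 1 - 2)]
  (x, _result, _stat) <-
    CG.optimize
      CG.defaultParameters          -- default settings, no verbose printing
      0.0000001                     -- grad_tol
      (V.fromList [0 :: Double, 0]) -- initial guess
      (CG.VFunction f)              -- function to be minimized
      (CG.VGradient g)              -- its gradient
      Nothing                       -- no combined function
  print x  -- should be close to [1.0, 2.0], the minimizer of f

The point is simply that, unlike gradientDescent from the ad package, the objective and its gradient are passed as two separate arguments.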
Compared with the gradientDescent function from the ad package, the API of the nonlinear-optimization package is a bit more tedious, but rewriting the previous code for it gives the code below. Since nonlinear-optimization does not do automatic differentiation itself, you have to pass a function that computes the gradient explicitly, so I pass the function obtained by differentiating the cost function with the ad package's grad.
module Main where

import Control.Monad
import qualified Data.Vector as V
import Numeric.AD
import qualified Numeric.Optimization.Algorithms.HagerZhang05 as CG
import Text.Printf
import qualified Text.CSV as CSV

main :: IO ()
main = do
  Right csv <- CSV.parseCSVFromFile "galton.csv"
  let samples :: [(Double, Double)]
      samples = [(read parent, read child) | [child,parent] <- tail csv]
      -- hypothesis
      h theta x = (theta V.! 0) + (theta V.! 1) * x
      -- cost function
      cost theta = mse [(realToFrac x, realToFrac y) | (x,y) <- samples] (h theta)
  (theta, result, stat) <-
    CG.optimize
      CG.defaultParameters{ CG.printFinal = True, CG.printParams = True, CG.verbose = CG.Verbose }
      0.0000001                  -- grad_tol
      (V.fromList [0::Double,0]) -- Initial guess
      (CG.VFunction cost)        -- Function to be minimized.
      (CG.VGradient (grad cost)) -- Gradient of the function.
      Nothing                    -- (Optional) Combined function computing both the function and its gradient.
  print theta

-- mean squared error
mse :: Fractional y => [(x,y)] -> (x -> y) -> y
mse samples h = sum [(h x - y)^(2::Int) | (x,y) <- samples] / fromIntegral (length samples)
So, when I run this……
PARAMETERS:

Wolfe line search parameter ..................... delta: 1.000000e-01
Wolfe line search parameter ..................... sigma: 9.000000e-01
decay factor for bracketing interval ............ gamma: 6.600000e-01
growth factor for bracket interval ................ rho: 5.000000e+00
growth factor for bracket interval after nan .. nan_rho: 1.300000e+00
truncation factor for cg beta ..................... eta: 1.000000e-02
perturbation parameter for function value ......... eps: 1.000000e-06
factor for computing average cost .............. Qdecay: 7.000000e-01
relative change in cost to stop quadstep ... QuadCufOff: 1.000000e-12
factor multiplying gradient in stop condition . StopFac: 0.000000e+00
cost change factor, approx Wolfe transition . AWolfeFac: 1.000000e-03
restart cg every restart_fac*n iterations . restart_fac: 1.000000e+00
stop when cost change <= feps*|f| ................. eps: 1.000000e-06
starting guess parameter in first iteration ...... psi0: 1.000000e-02
starting step in first iteration if nonzero ...... step: 0.000000e+00
factor multiply starting guess in quad step ...... psi1: 1.000000e-01
initial guess factor for general iteration ....... psi2: 2.000000e+00
max iterations is n*maxit_fac ............... maxit_fac: 1.797693e+308
max expansions in line search ................. nexpand: 50
max secant iterations in line search .......... nsecant: 50
print level (0 = none, 2 = maximum) ........ PrintLevel: 1
Logical parameters:
    Error estimate for function value is eps
    Use quadratic interpolation step
    Print final cost and statistics
    Print the parameter structure
    Wolfe line search ... switching to approximate Wolfe
    Stopping condition uses initial grad tolerance
    Do not check for decay of cost, debugger is off

iter: 0 f = 4.642373e+03 gnorm = 9.306125e+03 AWolfe = 0
QuadOK: 1 initial a: 1.070618e-04 f0: 4.642373e+03 dphi: -8.662251e+07
Wolfe line search

iter: 1 f = 5.391563e+00 gnorm = 3.269753e-02 AWolfe = 0
QuadOK: 1 initial a: 1.632104e+01 f0: 5.391563e+00 dphi: -1.069411e-03
Wolfe line search
RESTART CG

iter: 2 f = 5.387312e+00 gnorm = 1.558689e+01 AWolfe = 1
QuadOK: 0 initial a: 3.264209e+01 f0: 5.387312e+00 dphi: -2.430188e+02
Approximate Wolfe line search

iter: 3 f = 5.374303e+00 gnorm = 3.196800e-02 AWolfe = 1
QuadOK: 1 initial a: 3.559386e+00 f0: 5.374303e+00 dphi: -1.022238e-03
Approximate Wolfe line search
RESTART CG

iter: 4 f = 5.288871e+00 gnorm = 2.808049e-02 AWolfe = 1
QuadOK: 1 initial a: 2.376426e+01 f0: 5.288871e+00 dphi: -7.887345e-04
Approximate Wolfe line search

iter: 5 f = 5.279499e+00 gnorm = 1.301306e+01 AWolfe = 1
QuadOK: 0 initial a: 4.752853e+01 f0: 5.279499e+00 dphi: -1.693871e+02
Approximate Wolfe line search
RESTART CG

Termination status: 0
Convergence tolerance for gradient satisfied
maximum norm for gradient: 5.513258e-09
function value: 5.000294e+00

cg iterations: 6
function evaluations: 15
gradient evaluations: 11
===================================

fromList [23.941530180694265,0.6462905819889333]
This time it converged in an instant! Yay! (Reading off the final theta, the fitted line is child ≈ 23.94 + 0.646 × parent.)
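For reference, what `grad cost` hands to CG_DESCENT is just the usual analytic gradient of the mean squared error for a linear hypothesis. A hand-written version would look roughly like the sketch below; the helper name costGrad is mine, and it assumes the same samples list and Data.Vector import as the program above.

import qualified Data.Vector as V  -- same import as in the program above

-- Hand-written gradient of the MSE cost for h theta x = theta0 + theta1 * x:
--   d/dtheta0 = (2/n) * sum (h theta x - y)
--   d/dtheta1 = (2/n) * sum ((h theta x - y) * x)
-- The name costGrad is mine; the program above gets the same thing from
-- `grad cost` via the ad package.
costGrad :: [(Double, Double)] -> V.Vector Double -> V.Vector Double
costGrad samples theta = V.fromList [g0, g1]
  where
    n         = fromIntegral (length samples)
    h x       = (theta V.! 0) + (theta V.! 1) * x
    residuals = [h x - y | (x, y) <- samples]
    g0        = 2 / n * sum residuals
    g1        = 2 / n * sum [r * x | (r, (x, _)) <- zip residuals samples]

Passing `CG.VGradient (costGrad samples)` instead of `CG.VGradient (grad cost)` should give the same search directions (up to floating-point noise); automatic differentiation just saves deriving this by hand.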
*1 Hager, W. W. and Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM Journal on Optimization, 16 (2005), 170-192.