American Journal of Innovative Research and Applied Sciences. ISSN 2429-5396 I www.american-jiras.com
ORIGINAL ARTICLE
| Moulay El Mehdi Falloul 1* | Ayoub Razouk 2 | and | Youness Saoudi 2 |
1. Sultan Moulay Slimane University | Department of Economics | Beni Mellal | Polydisciplinary Faculty | Morocco |
2. Mohamed V University | Decision aid and computing | ENSIAS | Morocco |
| Received February 22, 2020 | | Accepted March 7, 2020 | | Published March 14, 2020 | | ID Article | Moulay-Ref.6-ajira250220 |
ABSTRACT
Background: Maximum likelihood estimation (MLE) is often used in econometric and other statistical models, despite its computational cost, because of its strong theoretical appeal. Objectives: The discipline of non-linear optimization provides feasible alternative methods for computing MLEs, especially when special structure can be exploited, as for example in probabilistic choice models. Methods: This paper examines the estimation of the parameters of the financial time series model GARCH(p,q) using four numerical optimization methods and gives numerical comparisons of these methods. Results: Among the issues considered in this paper are the theoretical background of MLE and methods of approximating the Hessian, including the quasi-Newton updates (DFP and BFGS) and the statistical approximation (BHHH). Conclusions: In our GARCH(p,q) case, NR proved to be the fastest to converge in terms of the number of iterations, followed by the BHHH algorithm and BFGS, with DFP in last position.
Keywords: GARCH(p,q), Log-likelihood, Numerical optimization, BHHH, Newton-Raphson, BFGS, DFP.
1. INTRODUCTION
The maximum likelihood estimator is highly regarded for its excellent asymptotic properties, yet significant effort and ingenuity have nevertheless gone into the development of alternative statistical estimators. This is especially true in econometrics, where many other estimators are available for simultaneous equations. One major reason for this is the difficulty of computing MLEs, a task requiring the solution of a non-linear optimization problem [1]. This paper examines the estimation of the parameters of the financial time series model GARCH(p,q) using four numerical optimization methods and gives numerical comparisons of these methods. Among the issues considered in this paper are the theoretical background of MLE and methods of approximating the Hessian, including the quasi-Newton updates (DFP and BFGS) and the statistical approximation (BHHH).
2. Presentation of GARCH(p,q)
An observed time series y_t can be written as the sum of a predictable and an unpredictable part,

y_t = E[y_t \mid \Omega_{t-1}] + \varepsilon_t,   (1)

where \Omega_{t-1} is the information set consisting of all relevant information up to and including time t−1. Classically, \varepsilon_t was assumed to be both unconditionally and conditionally homoscedastic, that is, E[\varepsilon_t^2 \mid \Omega_{t-1}] = \sigma^2 for all t. Here we relax part of this assumption and allow the conditional variance of \varepsilon_t to vary over time, that is, E[\varepsilon_t^2 \mid \Omega_{t-1}] = h_t. Put differently, \varepsilon_t is conditionally heteroscedastic [2]. A convenient way to express this in general is

\varepsilon_t = z_t \sqrt{h_t},   (2)

where z_t is iid with mean zero and unit variance. Engle (1982) introduced the class of Autoregressive Conditionally Heteroscedastic (ARCH) models to capture the volatility clustering of financial time series (even though the first empirical applications did not deal with high-frequency financial data) [3]. In the basic ARCH model, the conditional variance of the shock that occurs at time t is a linear function of the squares of past shocks. For example, in the ARCH model of order 1, h_t is specified as

h_t = \omega + \alpha \varepsilon_{t-1}^2,   (3)

where \omega > 0 and \alpha \geq 0. Notice that E[\varepsilon_t^2 \mid \Omega_{t-1}] = h_t. To cope with the extended persistence of the empirical autocorrelation function, one may consider generalizations of the ARCH(1) model. One possibility to allow for more persistent autocorrelations is to include additional lagged squared shocks in the conditional variance function. The general ARCH(q) model is given by

h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2.   (4)

To capture the dynamic patterns in conditional volatility adequately by means of an ARCH(q) model, q often needs to be taken quite large. It turns out that it can be quite cumbersome to estimate the parameters in such a model, because of the non-negativity and stationarity conditions that need to be imposed.
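The ARCH(1) recursion above is straightforward to compute. The following sketch (not from the paper; the parameter values and shock series are invented for illustration) evaluates the conditional variance series:

```python
# Illustrative sketch of the ARCH(1) conditional variance recursion
# h_t = omega + alpha * e_{t-1}^2; omega, alpha and the shocks are made-up values.

def arch1_variance(shocks, omega, alpha):
    """Return the conditional variance series h_1..h_n for a given shock series."""
    h = [omega / (1.0 - alpha)]            # start from the unconditional variance
    for e in shocks[:-1]:
        h.append(omega + alpha * e ** 2)   # variance driven by the last squared shock
    return h

variances = arch1_variance([0.01, 0.02, -0.01], omega=1e-5, alpha=0.3)
```

A large shock at time t−1 immediately raises h_t, which is exactly the volatility clustering the model is designed to capture.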
To reduce the computational problems, Bollerslev (1986) suggested adding lagged conditional variances to the ARCH model instead; this yields the Generalized ARCH model of order (p, q), GARCH(p,q), structured as follows [4]:

h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j h_{t-j}.   (5)

Estimation of the parameters of the GARCH model can be done by maximization of the likelihood function, which we present in the next section.
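For the common GARCH(1,1) case, the recursion adds one lagged conditional variance to the ARCH(1) term. A minimal sketch, with invented parameter values (not the estimates reported later in this paper):

```python
# Sketch of the GARCH(1,1) recursion h_t = omega + alpha*e_{t-1}^2 + beta*h_{t-1};
# omega, alpha, beta and the shocks are illustrative values only.

def garch11_variance(shocks, omega, alpha, beta):
    """Conditional variance series, initialised at the unconditional variance."""
    h = [omega / (1.0 - alpha - beta)]     # valid when alpha + beta < 1
    for e in shocks[:-1]:
        h.append(omega + alpha * e ** 2 + beta * h[-1])
    return h
```

The stationarity condition alpha + beta < 1 keeps the unconditional variance finite, which is why the recursion can be initialised at omega / (1 − alpha − beta).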
3. Estimation of the parameters of the GARCH(p,q) model
Estimation of the parameters of the GARCH model can be done by maximum likelihood. The expression of the likelihood function is as follows:

L(\theta) = f(\varepsilon_1, \ldots, \varepsilon_n; \theta),   (6)

where n is the sample size, \theta = (\omega, \alpha_1, \ldots, \alpha_q, \beta_1, \ldots, \beta_p), and f(\cdot; \theta) is the joint probability density function of (\varepsilon_1, \ldots, \varepsilon_n). Since the exact form of f is complicated, the conditional likelihood function is used instead. If we assume z_t \sim N(0,1), then \varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t), since \varepsilon_t = z_t \sqrt{h_t}; this follows from a transformation of variables. Thus, the conditional likelihood function is:

L(\theta) = \prod_{t=m+1}^{n} \frac{1}{\sqrt{2\pi h_t}} \exp\!\left( -\frac{\varepsilon_t^2}{2 h_t} \right),   (7)

where t = m+1, \ldots, n denotes the time points used in the conditional likelihood function, the first m = max(p, q) observations being conditioned upon. We note that maximizing the conditional likelihood function is equivalent to maximizing its logarithm, because ln(·) is a strictly increasing function. Since the log-likelihood is easier to handle, it is used instead:

\ln L(\theta) = -\frac{1}{2} \sum_{t=m+1}^{n} \left[ \ln(2\pi) + \ln h_t + \frac{\varepsilon_t^2}{h_t} \right],   (8)

where h_t has to be evaluated recursively. The log-likelihood in eq. (8) can be maximized using the numerical optimization methods that we present in the next sections.
4. Maximum likelihood framework
Suppose that the dependent variable and the parameters are collected in the vectors y = (y_1, \ldots, y_T)' and \theta = (\theta_1, \ldots, \theta_K)'. The joint probability density function can be written as a sequence of conditional distributions as follows:

f(y_1, \ldots, y_T; \theta) = f(y_1; \theta) \prod_{t=2}^{T} f(y_t \mid \Omega_{t-1}; \theta),   (9)

and hence for the full sample

L(\theta) = \prod_{t=1}^{T} f(y_t \mid \Omega_{t-1}; \theta).   (10)

An important special case which simplifies the density is that of independent and identically distributed (iid) observations:

L(\theta) = \prod_{t=1}^{T} f(y_t; \theta).   (11)

For many models in statistics and econometrics it is simpler to work with the log-likelihood function, which in the iid case is written as follows:

\ln L(\theta) = \sum_{t=1}^{T} \ln f(y_t; \theta).   (12)

The aim of maximum likelihood estimation is to find the value of \theta that maximizes the log-likelihood function. A natural way to do this is to use the rules of calculus, computing the first derivatives (gradient) and second derivatives (Hessian) of the log-likelihood function with respect to the parameter vector \theta.

4.1 Gradient: The first derivative of the log-likelihood function with respect to the parameter vector is written as follows:

g(\theta) = \frac{\partial \ln L(\theta)}{\partial \theta}.   (13)

This quantity is known as the score or the gradient.
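When analytic score expressions are tedious to derive, the gradient can be approximated numerically. The sketch below (a generic central-difference approximation, not a method prescribed by the paper; the test function and step size are illustrative choices) applies to any log-likelihood:

```python
# Sketch: central finite-difference approximation of the gradient of f at x.
# f maps a list of floats to a float; eps is an illustrative step size.

def num_gradient(f, x, eps=1e-6):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))   # central difference in coordinate i
    return g
```

At a maximum the approximate gradient should be close to zero, which also makes this a convenient sanity check on an optimizer's output.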
In the iid case, where \theta is a fixed (K×1) vector of parameters, the gradient is written as follows:

g(\theta) = \sum_{t=1}^{T} \frac{\partial \ln f(y_t; \theta)}{\partial \theta}.

The maximum likelihood estimate of \theta, namely \hat{\theta}, is obtained by solving the following set of equalities; in other words, \hat{\theta} satisfies:

g(\hat{\theta}) = 0.   (14)

4.2 Hessian: The second derivative of the log-likelihood function with respect to the parameter vector is known as the Hessian and is written as follows:

H(\theta) = \frac{\partial^2 \ln L(\theta)}{\partial \theta \, \partial \theta'}.   (15)

This is a symmetric K×K matrix. At the maximum likelihood estimate, the requirement is that the Hessian evaluated at \hat{\theta} is a negative definite matrix; in other words, x' H(\hat{\theta}) x < 0 for all non-zero vectors x. The log-likelihood functions above can be maximized using the numerical optimization methods that we present in the next section.
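For a two-parameter model the negative definiteness condition on the Hessian can be checked directly via Sylvester's criterion (leading principal minors alternate in sign). A minimal sketch, with an invented toy matrix:

```python
# Sketch: a 2x2 symmetric matrix H is negative definite iff H[0][0] < 0 and
# det(H) > 0 (Sylvester's criterion for -H positive definite). Toy input below.

def is_negative_definite_2x2(H):
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return H[0][0] < 0 and det > 0

print(is_negative_definite_2x2([[-2.0, 0.5], [0.5, -1.0]]))
```

For larger K one would instead check that all eigenvalues of the Hessian at the candidate estimate are negative.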
5. Numerical optimization methods of MLE
Of the numerous maximization algorithms that have been developed, we describe only the most prominent ones, which are implemented in statistical and econometric software.

5.1 Newton-Raphson: The log-likelihood for GARCH(p,q) in eq. (8) can be written in an alternative way as follows:

\ln L(\theta) = \sum_{t=m+1}^{n} \ln f(\varepsilon_t \mid \Omega_{t-1}; \theta).   (16)

Taking first derivatives of (16) and setting the resulting system of equations equal to zero yields a system that is difficult to solve analytically; thus a numerical approach is needed. It can be noticed that \ln L is always negative, since the likelihood is by definition a probability between 0 and 1 and the logarithm of any number between 0 and 1 is negative. Numerically, the maximum can be found by "walking up" the likelihood function until no further increase can be found. The researcher specifies starting values \theta_0. Each iteration moves to a new value of the parameters at which \ln L is higher than at the previous step. If we denote the current value at iteration k by \theta_k, the question is what is the best value to take next, i.e. what is the best \theta_{k+1}. To determine it, we use a second-order Taylor approximation of \ln L(\theta_{k+1}) around \theta_k:

\ln L(\theta_{k+1}) \approx \ln L(\theta_k) + g_k'(\theta_{k+1} - \theta_k) + \tfrac{1}{2}(\theta_{k+1} - \theta_k)' H_k (\theta_{k+1} - \theta_k).   (17)

Now we find the value of \theta_{k+1} that maximizes this approximation:

\theta_{k+1} = \theta_k + (-H_k)^{-1} g_k.   (18)

The Newton-Raphson method uses this formula. The step from the current value \theta_k to the new value \theta_{k+1} is (-H_k)^{-1} g_k, that is, the gradient vector multiplied by the inverse of the negative Hessian matrix. The negative Hessian is positive definite, assuming that the log-likelihood function is globally concave, and represents the degree of curvature. Each step is the slope of the likelihood function divided by its curvature. If the slope is positive, \theta is raised, and if the slope is negative, \theta is lowered, see Figure 1. The curvature determines how large a step is made: if the curvature is great, meaning that the slope changes quickly, the maximum is likely to be close and a small step is taken, and vice versa, as shown in Figure 2.

Figure 1: Direction of the step follows the slope.
Figure 2: Step size is inversely related to curvature.

Three main issues are relevant to the Newton-Raphson method:

5.1.1 Quadratics: If \ln L(\theta) were exactly quadratic in \theta, the Newton-Raphson procedure would reach the maximum in one step from any starting value. If \ln L(\theta) is quadratic, it can be written as

\ln L(\theta) = a + b'\theta + \tfrac{1}{2}\theta' C \theta.   (19)

The maximum is at

\hat{\theta} = -C^{-1} b.   (20)

Knowing the gradient g_k = b + C\theta_k and the Hessian H_k = C, the Newton-Raphson step gives \theta_{k+1} = \theta_k - C^{-1}(b + C\theta_k) = -C^{-1}b, the maximum. Most log-likelihood functions are not quadratic, and so the Newton-Raphson procedure takes more than one step to reach the maximum.

5.1.2 Step size: It is possible for the Newton-Raphson procedure to step past the maximum and move to a lower \ln L, as shown in Figure 3.

Figure 3: Step may go beyond the maximum to a lower LL.

To avoid this possibility, the step is multiplied by a scalar \lambda:

\theta_{k+1} = \theta_k + \lambda (-H_k)^{-1} g_k.   (21)

The vector (-H_k)^{-1} g_k is called the direction, and \lambda is called the step size.

5.1.3 Concavity: If the log-likelihood (LL) function is globally concave, the Newton-Raphson procedure provides an increase in the log-likelihood at each iteration. Moreover, if the LL is concave, the Hessian is negative definite at all values of \theta: the slope is declining and the second derivative is negative, so that -H_k is positive definite. By definition, a symmetric matrix M is positive definite if x'Mx > 0 for any x ≠ 0. Consider a first-order Taylor approximation of \ln L(\theta_{k+1}) around \theta_k:

\ln L(\theta_{k+1}) \approx \ln L(\theta_k) + \lambda\, g_k' (-H_k)^{-1} g_k.   (22)

Since (-H_k)^{-1} is positive definite, we have g_k'(-H_k)^{-1} g_k > 0 and hence \ln L(\theta_{k+1}) > \ln L(\theta_k). If the LL is not concave, the Newton-Raphson procedure can fail to find an increase, because the step is in the opposite direction of the slope, as shown in Figure 4.

Figure 4: NR in the convex portion of the LL.

NR has two drawbacks: on one side, calculation of the Hessian is usually computation-intensive; on the other side, NR does not guarantee an increase at each step if the LL is not globally concave. Other approaches use approximations to the Hessian that address these two issues; these methods differ in the form of the approximation. Each procedure defines a step as

\theta_{k+1} = \theta_k + \lambda M_k g_k,   (23)

where M_k is a K×K matrix. For NR, M_k = (-H_k)^{-1}. Other procedures use an M_k that is easier to calculate than the Hessian and is positive definite, so as to guarantee an increase at each iteration even in a convex region of the LL.

5.2 Berndt, Hall, Hall, and Hausman (BHHH): Maximization can be faster if we exploit the fact that the function being maximized is a sum of terms over a sample. The score of an observation is the derivative of that observation's log-likelihood with respect to the parameters:

s_t(\theta) = \frac{\partial \ln f(y_t \mid \Omega_{t-1}; \theta)}{\partial \theta}.   (24)

The gradient is the average score:

g_k = \frac{1}{T} \sum_{t=1}^{T} s_t(\theta_k).   (25)

The outer product of observation t's score is the K×K matrix

s_t(\theta_k)\, s_t(\theta_k)'.   (26)

The average outer product in the sample,

B_k = \frac{1}{T} \sum_{t=1}^{T} s_t(\theta_k)\, s_t(\theta_k)',   (27)

is related to the covariance matrix of the scores. The maximum occurs where the slope is zero, which means that the gradient (i.e., the average score) is zero and B_k is the variance of the scores in the sample. The variance of the scores provides a measure of the curvature of the log-likelihood function, similar to the Hessian: the curvature is great when the variance of the scores is high, as illustrated in the second panel of Figure 5.

Figure 5: Shape of the log-likelihood function near the maximum.

BHHH uses B_k in the optimization routine in place of -H_k. This yields two advantages over NR. Firstly, B_k is faster to calculate than H_k, since no second derivatives are needed. Secondly, B_k is necessarily positive definite, so the BHHH procedure is guaranteed to provide an increase in \ln L at each iteration, even in convex portions of the function.
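The "slope divided by curvature" step described above is easiest to see in one dimension, where the Hessian is a scalar. A minimal Newton-Raphson sketch (the quadratic objective in the usage line is an invented example, not a likelihood from this paper):

```python
# One-parameter Newton-Raphson sketch for maximising a concave function:
# beta_{k+1} = beta_k + g_k / (-H_k), i.e. slope divided by negative curvature.

def newton_raphson(grad, hess, beta0, tol=1e-10, max_iter=50):
    beta = beta0
    for _ in range(max_iter):
        step = grad(beta) / (-hess(beta))   # gradient over the negative Hessian
        beta += step
        if abs(step) < tol:                 # stop once the step is negligible
            break
    return beta

# For the quadratic LL(b) = -(b - 3)^2, NR reaches the maximum in one step:
beta_hat = newton_raphson(lambda b: -2.0 * (b - 3.0), lambda b: -2.0, 0.0)
```

For a quadratic objective the procedure converges in a single iteration, exactly as the "Quadratics" discussion above predicts.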
5.3 Quasi-Newton iterative procedures: The quasi-Newton methods, which build up an approximation of the inverse Hessian, are often regarded as the most sophisticated for solving unconstrained problems. Moreover, an estimate of the inverse Hessian is essential for variance and covariance estimates in econometric modelling. Let s_k = \theta_{k+1} - \theta_k and q_k = g_{k+1} - g_k. There are many solutions to the quasi-Newton (secant) condition M_{k+1} q_k = s_k. The initial Hessian approximation is usually chosen as the identity matrix, which is then modified by an update formula at each iteration. In the classical modified quasi-Newton iterative procedure, assuming the minimization problem

\min_{\theta} \left[ -\ln L(\theta) \right],   (28)

rank-two updates are the most widely used. The earliest update formula for constructing the inverse Hessian was originally proposed by Davidon (1959) and later developed by Fletcher and Powell (1963). The DFP update formula has a nice property: for a quadratic objective function, it simultaneously generates the directions of the conjugate gradient method while constructing the inverse Hessian. The DFP update formula for the inverse Hessian is given by

M_{k+1}^{DFP} = M_k + \frac{s_k s_k'}{s_k' q_k} - \frac{M_k q_k q_k' M_k}{q_k' M_k q_k}.   (29)

According to Broyden, Fletcher, Goldfarb and Shanno (1970), the update formula is given by

M_{k+1}^{BFGS} = M_k + \left(1 + \frac{q_k' M_k q_k}{s_k' q_k}\right) \frac{s_k s_k'}{s_k' q_k} - \frac{s_k q_k' M_k + M_k q_k s_k'}{s_k' q_k}.   (30)

Weighted combinations of these formulas lead to a whole collection of updates, the Broyden family:

M_{k+1}^{\phi} = (1 - \phi)\, M_{k+1}^{DFP} + \phi\, M_{k+1}^{BFGS}, \quad \phi \in [0, 1].   (31)
In practice the calculated gradient vector is never exactly zero, but it can be very close. Therefore the norm of the gradient, \|g_k\|, is often used to evaluate convergence. If the inequality

\|g_k\| < \epsilon,

for some chosen tolerance \epsilon, is satisfied, the iterative process stops and the parameters at the current iteration are taken as the estimates. An alternative stopping rule, used in the tables below, is that the relative change in the log-likelihood between two successive iterations falls below a tolerance.
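In one dimension the quasi-Newton idea reduces to the secant method: the slope of the gradient between two iterates replaces the exact second derivative, just as the DFP/BFGS updates do in higher dimensions. A toy sketch (the objective and starting points are invented; this is an illustration of the principle, not the paper's implementation):

```python
# One-dimensional quasi-Newton (secant) sketch: the secant slope of the gradient
# stands in for the Hessian; stops on the ||g|| < tol convergence criterion.

def secant_quasi_newton(grad, b_prev, b_curr, tol=1e-10, max_iter=100):
    g_prev, g_curr = grad(b_prev), grad(b_curr)
    for _ in range(max_iter):
        slope = (g_curr - g_prev) / (b_curr - b_prev)  # secant Hessian estimate
        b_prev, g_prev = b_curr, g_curr
        b_curr = b_curr - g_curr / slope               # quasi-Newton step
        g_curr = grad(b_curr)
        if abs(g_curr) < tol:                          # gradient-norm stopping rule
            break
    return b_curr

# Maximising LL(b) = -(b - 2)^2, whose gradient is -2*(b - 2):
beta_hat = secant_quasi_newton(lambda b: -2.0 * (b - 2.0), 0.0, 1.0)
```

No second derivative is ever evaluated; only gradient differences are used, which is the practical appeal of DFP and BFGS for likelihoods with expensive Hessians.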
6. Estimation of the GARCH(1,1) model
In this section we present the parameter estimation of our econometric model, GARCH(1,1); the number of lags was selected according to the Akaike and Schwarz information criteria. The purpose of this section is not only the estimation of the GARCH(p,q) model but also a comparison of the numerical methods for maximizing the log-likelihood function. We first plot the MASI index (Moroccan All Shares Index) time series, the main stock index of the Casablanca Stock Exchange in Morocco; it is composed of all stocks listed on the Casablanca Stock Exchange, as shown in Figure 6.

Figure 6: MASI closing prices and the logarithm of stock returns from 3 January 2002 to 18 October 2018.

Tables 1 and 2 present the GARCH(1,1) parameter estimates obtained with the NR, BFGS, DFP and BHHH numerical algorithms, for convergence criteria of less than 0.0001 and less than 0.00001, respectively.

Table 1: GARCH(1,1) parameter estimates when the convergence criterion is satisfied: the relative change in the log-likelihood between two successive iterations is less than 0.0001.

Parameter        BHHH         NR           BFGS         DFP
μ                0.000644     0.000644     0.000644     0.000644
α                0.46288864   0.4628887    0.4628914    0.4628897
β                0.3176352    0.3176343    0.3176338    0.3176353
ω                2.78e-06     2.78e-06     2.78e-06     2.78e-06
Iterations       16           7            25           26
Log-likelihood   18404.79     18404.79     18404.79     18404.79
Table 2: GARCH(1,1) parameter estimates when the convergence criterion is satisfied: the relative change in the log-likelihood between two successive iterations is less than 0.00001.

Parameter        BHHH         NR           BFGS         DFP
μ                0.000644     0.000644     0.000644     0.000644
α                0.4628746    0.4628887    0.4628914    0.4628897
β                0.3176347    0.3176343    0.3176338    0.3176353
ω                2.78e-06     2.78e-06     2.78e-06     2.78e-06
Iterations       16           7            25           26
Log-likelihood   18404.79     18404.79     18404.79     18404.79
7. CONCLUSION
In our GARCH(p,q) case, NR proved to be the fastest to converge in terms of the number of iterations, followed by the BHHH algorithm and BFGS, with DFP in last position. Even so, convergence problems may arise: the more parameters are entered into the model, the "flatter" the log-likelihood function becomes, and therefore the more difficult it is to maximize.

8. REFERENCES
1. Bunch D.S. A comparison of algorithms for maximum likelihood estimation of choice models. Journal of Econometrics. 1988; 145-167.
2. Franses P.H., van Dijk D. Non-Linear Time Series Models in Empirical Finance. Cambridge: Cambridge University Press; 2000.
3. Engle R.F. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica. 1982; 50: 987-1007.
4. Bollerslev T. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics. 1986; 31(3): 307-327.
5. Orskaug E. Multivariate DCC-GARCH Model - With Various Error Distributions. Master of Science thesis in Physics and Mathematics, Norwegian University of Science and Technology; June 2009.
6. Hurn S. Likelihood methods in financial econometrics. School of Economics and Finance, Queensland University of Technology; April 2009.
7. Arnenic J., Barbic Z., Skrabic B. Maximization of the likelihood function in financial time series models. Unpublished.