Technical Discussion
 
Nonparametric Kernel
Kernel Functions
Bandwidth
Automatic Bandwidth Selection
Andrews Automatic Selection
Newey-West Automatic Selection
Parametric VARHAC
Prewhitened Kernel
Our basic discussion and notation follows the framework of Andrews (1991) and Hansen (1992a).
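Consider a sequence of mean-zero random $p$-vectors $\{V_t(\theta)\}$ that may depend on a $K$-vector of parameters $\theta$, and let $V_t \equiv V_t(\theta_0)$, where $\theta_0$ is the true value of $\theta$. We are interested in estimating the LRCOV matrix $\Omega$,
$$\Omega = \sum_{j=-\infty}^{\infty} \Gamma(j) \tag{59.32}$$
where
$$\Gamma(j) = \begin{cases} E\left(V_t V_{t-j}'\right), & j \ge 0 \\ \Gamma(-j)', & j < 0 \end{cases} \tag{59.33}$$
is the autocovariance matrix of $V_t$ at lag $j$. When $V_t$ is second-order stationary, $\Omega$ equals $2\pi$ times the spectral density matrix of $V_t$ evaluated at frequency zero (Hansen 1982, Andrews 1991, Hamilton 1994).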
Closely related to $\Omega$ are two measures of the one-sided LRCOV matrix:
$$\Lambda_1 = \sum_{j=1}^{\infty} \Gamma(j), \qquad \Lambda_0 = \sum_{j=0}^{\infty} \Gamma(j) = \Lambda_1 + \Gamma(0) \tag{59.34}$$
The matrix $\Lambda_1$, which we term the strict one-sided LRCOV, is the sum of the lag covariances, while $\Lambda_0$ also includes the contemporaneous covariance $\Gamma(0)$. The two-sided LRCOV matrix $\Omega$ is related to the one-sided matrices through $\Omega = \Lambda_1 + \Lambda_1' + \Gamma(0)$ and $\Omega = \Lambda_0 + \Lambda_0' - \Gamma(0)$.
Despite the important role the one-sided LRCOV matrix plays in the literature, we will focus our attention on $\Omega$, since results are generally applicable to all three measures; exceptions will be made for specific issues that require additional comment.
In the econometric literature, methods for using a consistent estimator $\hat{\theta}$ and the corresponding $\hat{V}_t = V_t(\hat{\theta})$ to form a consistent estimate of $\Omega$ are often referred to as heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators.
There have been three primary approaches to estimating $\Omega$:
1. The nonparametric kernel approach (Andrews 1991, Newey and West 1987) forms estimates of $\Omega$ by taking a weighted sum of the sample autocovariances of the observed data.
2. The parametric VARHAC approach (Den Haan and Levin 1997) specifies and fits a parametric time series model to the data, then uses the estimated model to obtain the implied autocovariances and corresponding $\Omega$.
3. The prewhitened kernel approach (Andrews and Monahan 1992) is a hybrid method that combines the first two approaches, using a parametric model to obtain residuals that “whiten” the data, and a nonparametric kernel estimator to obtain an estimate of the LRCOV of the whitened data. The estimate of $\Omega$ is obtained by “recoloring” the prewhitened LRCOV to undo the effects of the whitening transformation.
Below, we offer a brief description of each of these approaches, paying particular attention to issues of kernel choice, bandwidth selection, and lag selection.
Nonparametric Kernel
The class of kernel HAC covariance matrix estimators in Andrews (1991) may be written as:
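$$\hat{\Omega} = \frac{T}{T-K} \sum_{j=-\infty}^{\infty} k\!\left(\frac{j}{b_T}\right) \hat{\Gamma}(j) \tag{59.35}$$
where the sample autocovariances $\hat{\Gamma}(j)$ are given by
$$\hat{\Gamma}(j) = \begin{cases} \dfrac{1}{T} \displaystyle\sum_{t=j+1}^{T} \hat{V}_t \hat{V}_{t-j}', & j \ge 0 \\ \hat{\Gamma}(-j)', & j < 0 \end{cases} \tag{59.36}$$
$k$ is a symmetric kernel (or lag window) function that, among other conditions, is continuous at the origin and satisfies $|k(x)| \le 1$ for all $x$ with $k(0) = 1$, and $b_T > 0$ is a bandwidth parameter. The leading $T/(T-K)$ term is an optional correction for degrees-of-freedom associated with the estimation of the $K$ parameters in $\theta$.
The choice of a kernel function $k$ and a value for the bandwidth parameter $b_T$ completely characterizes the kernel HAC estimator.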
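To make the structure of the kernel estimator in Equation (59.35) concrete, the following is a minimal Python sketch, not EViews code. The function names (`bartlett`, `sample_autocov`, `kernel_lrcov`) and the argument `n_params` (standing in for the number of estimated parameters $K$) are hypothetical, and the Bartlett kernel is used only as a default for illustration. The data are assumed to be a mean-zero $T \times p$ array.

```python
import numpy as np

def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0

def sample_autocov(V, j):
    """Sample autocovariance (1/T) * sum_{t=j+1}^{T} V_t V_{t-j}' for j >= 0."""
    T = V.shape[0]
    return V[j:].T @ V[:T - j] / T

def kernel_lrcov(V, bandwidth, kernel=bartlett, n_params=0):
    """Kernel estimate of the two-sided long-run covariance matrix.

    V         : (T, p) array of mean-zero observations (assumed)
    bandwidth : real-valued bandwidth, must be positive
    n_params  : K in the optional T / (T - K) degrees-of-freedom correction
    """
    V = np.asarray(V, dtype=float)
    T = V.shape[0]
    omega = sample_autocov(V, 0)
    for j in range(1, T):
        w = kernel(j / bandwidth)
        if w == 0.0:
            continue  # skip lags the kernel gives no weight
        G = sample_autocov(V, j)
        omega += w * (G + G.T)  # lag j and lag -j (its transpose)
    return T / (T - n_params) * omega
```

The loop adds each positive lag together with its transpose, which is how the symmetric sum over negative and positive lags collapses when the autocovariance at lag $-j$ is the transpose of the one at lag $j$.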
Kernel Functions
There are a large number of kernel functions that satisfy the required conditions. EViews supports use of the following kernel shapes:
 
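Kernel: $k(x)$ (each truncated kernel is zero for $|x| > 1$)
Truncated uniform: $1$ if $|x| \le 1$
Bartlett: $1 - |x|$ if $|x| \le 1$
Bohman: $(1 - |x|)\cos(\pi x) + \sin(\pi |x|)/\pi$ if $|x| \le 1$
Daniell: $\sin(\pi x)/(\pi x)$
Parzen: $1 - 6x^2(1 - |x|)$ if $0 \le |x| \le 0.5$; $2(1 - |x|)^3$ if $0.5 < |x| \le 1$
Parzen-Riesz: $1 - x^2$ if $|x| \le 1$
Parzen-Geometric: $1/(1 + |x|)$ if $|x| \le 1$
Parzen-Cauchy: $1/(1 + x^2)$ if $|x| \le 1$
Quadratic Spectral: $\dfrac{25}{12\pi^2 x^2}\left(\dfrac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5)\right)$
Tukey-Hamming: $0.54 + 0.46\cos(\pi x)$ if $|x| \le 1$
Tukey-Hanning: $0.50 + 0.50\cos(\pi x)$ if $|x| \le 1$
Tukey-Parzen: $0.436 + 0.564\cos(\pi x)$ if $|x| \le 1$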
Note that $k(x) = 0$ for $|x| > 1$ for all kernels with the exception of the Daniell and the Quadratic Spectral. The Daniell kernel is presented in truncated form in Neave (1972), but EViews uses the more common untruncated form. The Bartlett kernel is sometimes referred to as the Fejer kernel (Neave 1972).
A wide range of kernels have been employed in HAC estimation. The truncated uniform is used by Hansen (1982) and White (1984), the Bartlett kernel is used by Newey and West (1987), and the Parzen is used by Gallant (1987). The Tukey-Hanning and Quadratic Spectral were introduced to the econometrics literature by Andrews (1991), who shows that the latter is optimal in the sense of minimizing the asymptotic truncated MSE of the estimator (within a particular class of kernels). The remaining kernels are discussed in Parzen (1958, 1961, 1967).
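As an illustration of what these lag windows look like in practice, here is a small Python sketch of five of the kernels, written from their standard textbook definitions (as given, for example, in Andrews 1991). The function names are hypothetical, and the exact parameterization EViews uses internally is an assumption here.

```python
import math

def truncated(x):
    """Truncated uniform: 1 on [-1, 1], zero elsewhere."""
    return 1.0 if abs(x) <= 1.0 else 0.0

def bartlett(x):
    """Bartlett: 1 - |x| on [-1, 1], zero elsewhere."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0

def parzen(x):
    """Parzen: piecewise cubic on [-1, 1], zero elsewhere."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax ** 2 * (1.0 - ax)
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0

def tukey_hanning(x):
    """Tukey-Hanning: (1 + cos(pi x)) / 2 on [-1, 1], zero elsewhere."""
    return 0.5 + 0.5 * math.cos(math.pi * x) if abs(x) <= 1.0 else 0.0

def quadratic_spectral(x):
    """Quadratic Spectral: not truncated, so every lag gets some weight."""
    if x == 0.0:
        return 1.0  # limiting value at the origin
    z = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi ** 2 * x ** 2) * (math.sin(z) / z - math.cos(z))
```

All five satisfy the normalization at the origin (value one at zero) and are symmetric; only the Quadratic Spectral kernel assigns non-zero weight beyond the unit interval.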
Bandwidth
The bandwidth operates in concert with the kernel function to determine the weights for the various sample autocovariances in Equation (59.35). While some authors restrict the bandwidth values to integers, we follow Andrews (1991) who argues in favor of allowing real valued bandwidths.
To construct an operational nonparametric kernel estimator, we must choose a value for the bandwidth $b_T$. Under general conditions (Andrews 1991), consistency of the kernel estimator requires that $b_T$ be chosen so that $b_T \to \infty$ and $b_T / T \to 0$ as $T \to \infty$. Alternatively, Kiefer and Vogelsang (2002) propose setting $b_T = T$ in a testing context.
For the great majority of supported kernels, $k(j/b_T) = 0$ for $|j| > b_T$, so the bandwidth acts indirectly as a lag truncation parameter. Relating $b_T$ to the corresponding integer number of included lags $m$ requires, however, examining the properties of the kernel at the endpoints ($|j/b_T| = 1$). For kernel functions where $k(1) \neq 0$ (e.g., Truncated, Parzen-Geometric, Tukey-Hamming), $b_T$ is simply a real-valued truncation lag, with at most $\lfloor b_T \rfloor$ lagged autocovariances on each side of lag zero having non-zero weight. Alternatively, for kernel functions where $k(1) = 0$ (e.g., Bartlett, Bohman, Parzen), the relationship is slightly more complex: a lag exactly equal to $b_T$ receives zero weight, so only lags with $|j| < b_T$ enter the estimator with non-zero weights (for an integer $b_T$, that is $b_T - 1$ lags on each side).
The varying relationship between the bandwidth and the lag-truncation parameter implies that one should examine the kernel function when choosing bandwidth values to match computations that are quoted in lag truncation form. For example, matching Newey and West’s (1987) Bartlett kernel estimator, which uses $m$ weighted autocovariance lags, requires setting $b_T = m + 1$. In contrast, Hansen’s (1982) or White’s (1984) estimators, which sum the first $m$ unweighted autocovariances, should be implemented using the Truncated kernel with $b_T = m$.
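The mapping between a bandwidth and the lags that actually enter the estimator can be checked numerically. The sketch below uses hypothetical helper names and the standard Bartlett and Truncated kernel definitions. It lists the positive lags that receive non-zero weight for a given bandwidth.

```python
def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1], zero elsewhere (zero at the endpoint)."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0

def truncated(x):
    """Truncated uniform kernel: 1 on [-1, 1], zero elsewhere (one at the endpoint)."""
    return 1.0 if abs(x) <= 1.0 else 0.0

def nonzero_lags(kernel, bandwidth, max_lag=1000):
    """Positive lags j whose kernel weight k(j / bandwidth) is non-zero."""
    return [j for j in range(1, max_lag + 1) if kernel(j / bandwidth) != 0.0]

m = 4
# Bartlett with bandwidth m + 1 uses exactly lags 1..m (Newey-West style).
bartlett_lags = nonzero_lags(bartlett, m + 1)
# Truncated with bandwidth m uses exactly lags 1..m (Hansen/White style).
truncated_lags = nonzero_lags(truncated, m)
```

With `m = 4`, both calls select lags 1 through 4, even though the two bandwidths differ by one. This is the endpoint effect described above.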
Automatic Bandwidth Selection
Theoretical results on the relationship between bandwidths and the asymptotic truncated MSE of the kernel estimator provide finer discrimination in the rates at which bandwidths should increase. The optimal bandwidths may be written in the form:
$$b_T^{*} = \gamma\, T^{1/(2q+1)} \tag{59.37}$$
where $\gamma$ is a constant, and $q$ is a parameter that depends on the kernel function that you select (Parzen 1958, Andrews 1991). For the Bartlett and Parzen-Geometric kernels ($q = 1$), $b_T$ should grow (at most) at the rate $T^{1/3}$. The Truncated kernel does not have a theoretical optimal rate, but Andrews (1991) reports Monte Carlo simulations that suggest that $T^{1/5}$ works well. The remaining EViews supported kernels have $q = 2$, so their optimal bandwidths grow at rate $T^{1/5}$ (though we point out that the Daniell kernel does not satisfy the conditions for the optimal bandwidth theorems).
While theoretically useful, knowledge of the rate at which bandwidths should increase as $T \to \infty$ does not tell us the optimal bandwidth for a given sample size, since the constant $\gamma$ remains unspecified.
Andrews (1991) and Newey and West (1994) offer two approaches to estimating $\gamma$. We may term these techniques automatic bandwidth selection methods, since they involve estimating the optimal bandwidth from the data, rather than specifying a value a priori. Both the Andrews and Newey-West estimators for $\gamma$ may be written as:
$$\hat{\gamma}(q) = c_k \left(\hat{\alpha}(q)\right)^{1/(2q+1)} \tag{59.38}$$
where $q$ and the constant $c_k$ depend on properties of the selected kernel and $\hat{\alpha}(q)$ is an estimator of $\alpha(q)$, a measure of the smoothness of the spectral density at frequency zero that depends on the autocovariances $\Gamma(j)$. Substituting into Equation (59.37), the resulting plug-in estimator for the optimal automatic bandwidth is given by:
$$\hat{b}_T^{*} = c_k \left(\hat{\alpha}(q)\, T\right)^{1/(2q+1)} \tag{59.39}$$
The $q$ that one uses depends on properties of the selected kernel function. The Bartlett and Parzen-Geometric kernels should use $\hat{\alpha}(1)$ since they have $q = 1$. $\hat{\alpha}(2)$ should be used for the other EViews supported kernels, which have $q = 2$. The Truncated kernel does not have a theoretically prescribed choice, but Andrews recommends using $\hat{\alpha}(2)$. The Daniell kernel has $q = 2$, though we remind you that it does not satisfy the conditions for Andrews’s theorems. “Kernel Function Properties” summarizes the values of $c_k$ and $q$ for the various kernel functions.
Note that the Andrews and Newey-West estimators both require an estimate of $\alpha(q)$, which in turn requires forming preliminary estimates of $\Omega$ and of the smoothness of $\Omega$. Andrews and Newey-West offer alternative methods for forming these estimates.
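The plug-in rule in Equation (59.39) is simple to evaluate once an estimate of the smoothness measure is available. The sketch below is illustrative only: the function and dictionary names are hypothetical, and the kernel constants are the values reported in Andrews (1991) for four common kernels, which are assumed (not verified here) to agree with the “Kernel Function Properties” table.

```python
# (q, c_k) pairs as reported in Andrews (1991); assumed to match the
# values EViews tabulates under "Kernel Function Properties".
ANDREWS_CONSTANTS = {
    "bartlett": (1, 1.1447),
    "parzen": (2, 2.6614),
    "tukey_hanning": (2, 1.7462),
    "quadratic_spectral": (2, 1.3221),
}

def plugin_bandwidth(kernel_name, alpha_hat, T):
    """Plug-in bandwidth c_k * (alpha_hat * T) ** (1 / (2q + 1)).

    alpha_hat must be the smoothness estimate for the kernel's own q,
    i.e. alpha_hat(1) for Bartlett and alpha_hat(2) for the others here.
    """
    q, c_k = ANDREWS_CONSTANTS[kernel_name]
    return c_k * (alpha_hat * T) ** (1.0 / (2 * q + 1))

# Example: smoothness estimate of 1.0 with 1000 observations.
b_bartlett = plugin_bandwidth("bartlett", 1.0, 1000)       # grows like T^(1/3)
b_qs = plugin_bandwidth("quadratic_spectral", 1.0, 1000)   # grows like T^(1/5)
```

Because the exponent depends on $q$, the same sample size and smoothness estimate yield a noticeably larger bandwidth for the Bartlett kernel than for the Quadratic Spectral kernel.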
Andrews Automatic Selection
The Andrews (1991) method estimates $\alpha(q)$ parametrically: fitting a simple parametric time series model to the original data, then deriving the autocovariances and corresponding $\alpha(q)$ implied by the estimated model.
Andrews derives formulae for several parametric models, noting that the choice between specifications depends on a tradeoff between simplicity and parsimony on one hand and flexibility on the other. EViews employs the parsimonious approach used by Andrews in his Monte Carlo simulations, estimating $p$ univariate AR(1) models (one for each element of $\hat{V}_t$), then combining the estimated coefficients into an estimator for $\alpha(q)$.
For the univariate AR(1) approach, we have:
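$$\hat{\alpha}(q) = \frac{\sum_{s=1}^{p} w_s \left(\hat{f}_s^{(q)}\right)^2}{\sum_{s=1}^{p} w_s \left(\hat{f}_s^{(0)}\right)^2} \tag{59.40}$$
where $\hat{f}_s^{(q)}$ are parametric estimators of the smoothness of the spectral density for the $s$-th variable (Parzen’s (1957) $q$-th generalized spectral derivatives) at frequency zero. Estimators for $f_s^{(q)}$ are given by:
$$\hat{f}_s^{(q)} = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} |j|^q\, \tilde{\Gamma}_s(j) \tag{59.41}$$
for $s = 1, \ldots, p$ and $q = 0, 1, 2$, where $\tilde{\Gamma}_s(j)$ are the estimated autocovariances at lag $j$ implied by the univariate AR(1) specification for the $s$-th variable.
Substituting the univariate AR(1) estimated coefficients $\hat{\rho}_s$ and standard errors $\hat{\sigma}_s$ into the theoretical expressions for $\tilde{\Gamma}_s(j)$, we have:
$$\begin{aligned} \hat{\alpha}(1) &= \left. \sum_{s=1}^{p} w_s \frac{4 \hat{\rho}_s^2 \hat{\sigma}_s^4}{(1-\hat{\rho}_s)^6 (1+\hat{\rho}_s)^2} \right/ \sum_{s=1}^{p} w_s \frac{\hat{\sigma}_s^4}{(1-\hat{\rho}_s)^4} \\ \hat{\alpha}(2) &= \left. \sum_{s=1}^{p} w_s \frac{4 \hat{\rho}_s^2 \hat{\sigma}_s^4}{(1-\hat{\rho}_s)^8} \right/ \sum_{s=1}^{p} w_s \frac{\hat{\sigma}_s^4}{(1-\hat{\rho}_s)^4} \end{aligned} \tag{59.42}$$
which may be inserted into Equation (59.39) to obtain expressions for the optimal bandwidths.
Lastly, we note that the expressions for $\hat{\alpha}(q)$ depend on the weighting vector $w$, which governs how we combine the individual $\hat{f}_s^{(q)}$ into a single measure of relative smoothness. Andrews suggests using either $w_s = 1$ for all $s$ or $w_s = 1$ for all but the instrument corresponding to the intercept in regression settings. EViews adopts the first suggestion, setting $w_s = 1$ for all $s$.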
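The univariate AR(1) calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the EViews implementation: each series is assumed to be mean-zero and fitted by least squares without an intercept, the innovation variance is the mean squared residual, and the function names are hypothetical. The closed forms are the AR(1) expressions from Andrews (1991).

```python
def fit_ar1(x):
    """Least-squares AR(1) fit without intercept: returns (rho, sigma2)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    rho = num / den
    resid = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    sigma2 = sum(e * e for e in resid) / len(resid)  # innovation variance
    return rho, sigma2

def andrews_alpha_from_ar1(params, weights=None):
    """Return (alpha_hat(1), alpha_hat(2)) from a list of (rho, sigma2) pairs."""
    if weights is None:
        weights = [1.0] * len(params)  # unit weights on every series
    num1 = num2 = den = 0.0
    for w, (rho, sigma2) in zip(weights, params):
        s4 = sigma2 ** 2  # sigma^4
        num1 += w * 4.0 * rho ** 2 * s4 / ((1.0 - rho) ** 6 * (1.0 + rho) ** 2)
        num2 += w * 4.0 * rho ** 2 * s4 / (1.0 - rho) ** 8
        den += w * s4 / (1.0 - rho) ** 4
    return num1 / den, num2 / den

def andrews_alpha(columns, weights=None):
    """Fit one AR(1) per series (each a list of floats) and combine."""
    return andrews_alpha_from_ar1([fit_ar1(x) for x in columns], weights)
```

For a single series the innovation variance cancels, so the result depends only on the AR(1) coefficient. A coefficient of 0.5, for instance, gives 16/9 for the first-order measure and 16 for the second-order measure.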
Newey-West Automatic Selection
Newey and West (1994) employ a nonparametric approach to estimating $\alpha(q)$. In contrast to Andrews, who computes parametric estimates of the individual $f_s^{(q)}$, Newey and West use a Truncated kernel estimator to estimate the $f^{(q)}$ corresponding to aggregated data.
First, Newey and West define, for various lags, the scalar autocovariance estimators:
$$\hat{\sigma}(j) = \frac{1}{T} \sum_{t=j+1}^{T} w' \hat{V}_t \hat{V}_{t-j}' w = w' \hat{\Gamma}(j)\, w \tag{59.43}$$
The $\hat{\sigma}(j)$ may either be viewed as the sample autocovariance of a weighted linear combination of the data using weights $w$, or as a weighted combination of the sample autocovariances.
Next, Newey and West use the $\hat{\sigma}(j)$ to compute nonparametric truncated kernel estimators of the Parzen measures of smoothness:
$$\hat{f}^{(q)} = \frac{1}{2\pi} \sum_{j=-n}^{n} |j|^q\, \hat{\sigma}(j) \tag{59.44}$$
for $q = 0, 1, 2$. These nonparametric estimators are weighted sums of the scalar autocovariances $\hat{\sigma}(j)$ obtained above for $j$ from $-n$ to $n$, where $n$, which Newey and West term the lag selection parameter, may be viewed as the bandwidth of a kernel estimator for $f^{(q)}$.
The Newey and West estimator for $\alpha(q)$ may then be written as:
$$\hat{\alpha}(q) = \left(\frac{\hat{f}^{(q)}}{\hat{f}^{(0)}}\right)^2 \tag{59.45}$$
for $q = 1, 2$. This expression may be inserted into Equation (59.39) to obtain the expression for the plug-in optimal bandwidth estimator.
In comparing the Andrews estimator in Equation (59.42) with the Newey-West estimator in Equation (59.45), we see two very different methods of distilling results from the $p$ dimensions of the original data into a scalar measure $\alpha(q)$. Andrews computes parametric estimates of the generalized derivatives for the $p$ individual elements, then aggregates the estimates into a single measure. In contrast, Newey and West aggregate early, forming linear combinations of the autocovariance matrices, then use the scalar results to compute nonparametric estimators of the Parzen smoothness measures.
To implement the Newey-West optimal bandwidth selection method we require a value for $n$, the lag-selection parameter, which governs how many autocovariances to use in forming the nonparametric estimates of $f^{(q)}$. Newey and West show that $n$ should increase at (less than) a rate that depends on the properties of the kernel. For the Bartlett and the Parzen-Geometric kernels, the rate is $T^{2/9}$. For the Quadratic Spectral kernel, the rate is $T^{2/25}$. For the remaining kernels, the rate is $T^{4/25}$ (with the exception of the Truncated and the Daniell kernels, for which the Newey-West theorems do not apply).
In addition, one must choose a weight vector $w$. Newey and West (1994) leave open the choice of $w$, but follow Andrews’s (1991) suggestion of $w_s = 1$ for all but the intercept in their Monte Carlo simulations. EViews differs from this choice slightly, setting $w_s = 1$ for all $s$.
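The Newey-West calculation can be sketched as follows. This is a hedged illustration with hypothetical names: the data are passed as a list of rows (one row per observation, assumed mean-zero), the lag selection parameter `n_lags` is supplied by the caller rather than chosen by any particular rule, and unit weights are the default.

```python
import math

def nw_alpha(rows, n_lags, q, weights=None):
    """Newey-West style smoothness estimate (f_hat^(q) / f_hat^(0)) ** 2.

    rows    : list of observations, each a list of p floats
    n_lags  : lag selection parameter n (number of autocovariances used)
    q       : 1 or 2, the order of the smoothness measure
    weights : weight vector w; defaults to all ones
    """
    T, p = len(rows), len(rows[0])
    if weights is None:
        weights = [1.0] * p
    # Aggregate first: u_t = w' V_t is a scalar series.
    u = [sum(w * v for w, v in zip(weights, row)) for row in rows]

    def sigma(j):
        """Scalar sample autocovariance of u at lag j >= 0."""
        return sum(u[t] * u[t - j] for t in range(j, T)) / T

    def f_hat(order):
        """Truncated-kernel estimate of the order-th smoothness measure."""
        total = sigma(0) if order == 0 else 0.0  # |0|^order vanishes for order > 0
        total += 2.0 * sum(j ** order * sigma(j) for j in range(1, n_lags + 1))
        return total / (2.0 * math.pi)

    return (f_hat(q) / f_hat(0)) ** 2
```

Note that the factor of one over two pi cancels in the ratio, and that the truncated-kernel denominator is not guaranteed to be positive in small samples, so a production implementation would need to guard against a zero or near-zero value.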
Parametric VARHAC
Den Haan and Levin (1997) advocate the use of parametric methods, notably VARs, for LRCOV estimation. The VAR spectral density estimator, which they term VARHAC, involves estimating a parametric VAR model to filter the $\hat{V}_t$, computing the contemporaneous covariance of the filtered data, then using the estimates from the VAR model to obtain the implied autocovariances and corresponding LRCOV matrix of the original data.
Suppose we fit a VAR($q$) model to the $\hat{V}_t$. Let $\hat{A}_j$ be the $p \times p$ matrix of estimated $j$-th order AR coefficients, $j = 1, \ldots, q$. Then we may define the innovation (filtered) data and estimated innovation covariance matrix as:
$$\hat{V}_t^{*} = \hat{V}_t - \sum_{j=1}^{q} \hat{A}_j \hat{V}_{t-j} \tag{59.46}$$
and
$$\hat{\Gamma}^{*}(0) = \frac{1}{T-q} \sum_{t=q+1}^{T} \hat{V}_t^{*} \hat{V}_t^{*\prime} \tag{59.47}$$
Given an estimate of the innovation contemporaneous variance matrix $\hat{\Gamma}^{*}(0)$ and the VAR coefficients $\hat{A}_j$, we can compute the implied theoretical autocovariances of $V_t$. Summing the autocovariances yields a parametric estimator for $\Omega$, given by:
$$\hat{\Omega} = \frac{T-q}{T-q-K}\, \hat{D}\, \hat{\Gamma}^{*}(0)\, \hat{D}' \tag{59.48}$$
where
$$\hat{D} = \left(I_p - \sum_{j=1}^{q} \hat{A}_j\right)^{-1} \tag{59.49}$$
Implementing VARHAC requires a specification for $q$, the order of the VAR. Den Haan and Levin use model selection criteria (AIC or BIC-Schwarz) with a maximum lag of $T^{1/3}$ to determine the lag order, and provide simulations of the performance of the estimator using a data-dependent lag order.
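The VARHAC steps (fit a VAR, compute the innovation covariance, then rescale by the inverse of one minus the summed coefficients) can be sketched with NumPy. This is a minimal illustration, not the EViews routine: the VAR is fitted by ordinary least squares without an intercept on mean-zero data, the lag order is supplied by the caller (no AIC or BIC selection), and the names `fit_var`, `varhac`, and `n_params` are hypothetical.

```python
import numpy as np

def fit_var(V, order):
    """Least-squares VAR(order) without intercept.

    Returns (A, resid): A is a list of (p, p) coefficient matrices,
    resid is the (T - order, p) array of filtered (innovation) data.
    """
    T, p = V.shape
    Y = V[order:]
    # Column block j holds the lag-j regressors V_{t-j}.
    X = np.hstack([V[order - j:T - j] for j in range(1, order + 1)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # Y is approximately X @ B
    A = [B[(j - 1) * p:j * p].T for j in range(1, order + 1)]
    return A, Y - X @ B

def varhac(V, order, n_params=0):
    """Parametric (VAR-based) estimate of the two-sided long-run covariance."""
    V = np.asarray(V, dtype=float)
    T, p = V.shape
    A, resid = fit_var(V, order)
    n = T - order
    gamma_star0 = resid.T @ resid / n            # innovation covariance
    D = np.linalg.inv(np.eye(p) - sum(A))        # (I - sum of A_j)^(-1)
    return n / (n - n_params) * D @ gamma_star0 @ D.T
```

In the scalar AR(1) case this reduces to the familiar innovation variance divided by the squared value of one minus the AR coefficient. The matrix inverse becomes ill-conditioned when the fitted VAR is close to a unit root.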
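The corresponding VARHAC estimators for the one-sided matrices $\Lambda_1$ and $\Lambda_0$ do not have simple expressions in terms of $\hat{A}_j$ and $\hat{\Gamma}^{*}(0)$. We can, however, obtain insight into the construction of the one-sided VARHAC LRCOVs by examining results for the VAR(1) case. Given estimation of a VAR(1) specification, and ignoring the optional degrees-of-freedom correction, the estimators for the one-sided long-run variances may be written as:
$$\hat{\Lambda}_1 = \sum_{j=1}^{\infty} \hat{A}_1^{\,j}\, \Gamma(0) = \hat{D} \hat{A}_1 \Gamma(0), \qquad \hat{\Lambda}_0 = \sum_{j=0}^{\infty} \hat{A}_1^{\,j}\, \Gamma(0) = \hat{D}\, \Gamma(0) \tag{59.50}$$
with $\hat{D} = (I_p - \hat{A}_1)^{-1}$. Both estimators require the VAR(1) coefficient estimates $\hat{A}_1$, as well as an estimate of $\Gamma(0)$, the contemporaneous covariance matrix of $V_t$.
One could, as in Park and Ogaki (1991) and Hansen (1992b), use the sample covariance matrix $\hat{\Gamma}(0)$ so that the estimates of $\Lambda_1$ and $\Lambda_0$ employ a mix of parametric and non-parametric autocovariance estimates. Alternatively, in keeping with the spirit of the parametric methodology, EViews constructs a parametric estimator of $\Gamma(0)$ using the estimated VAR(1) coefficients $\hat{A}_1$ and $\hat{\Gamma}^{*}(0)$.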
Prewhitened Kernel
Andrews and Monahan (1992) propose a simple modification of the kernel estimator which performs a parametric VAR prewhitening step to reduce autocorrelation in the data, followed by kernel estimation performed on the whitened data. The resulting prewhitened LRCOV estimate is then recolored to undo the effects of the transformation. The Andrews and Monahan approach is a hybrid that combines the parametric VARHAC and nonparametric kernel techniques.
There is evidence (Andrews and Monahan 1992, Newey-West 1994) that this prewhitening approach has desirable properties, reducing bias, improving confidence interval coverage probabilities and improving sizes of test statistics constructed using the kernel HAC estimators.
The Andrews and Monahan estimator follows directly from our earlier discussion. As in a VARHAC, we first fit a VAR($q$) model to the $\hat{V}_t$ and obtain the whitened data (residuals):
$$\hat{V}_t^{*} = \hat{V}_t - \sum_{j=1}^{q} \hat{A}_j \hat{V}_{t-j} \tag{59.51}$$
In contrast to the VAR specification in the VARHAC estimator, the prewhitening VAR specification is not necessarily believed to be the true time series model, but is merely a tool for obtaining $\hat{V}_t^{*}$ values that are closer to white noise. (In addition, Andrews and Monahan adjust their VAR(1) estimates to avoid singularity when the VAR is near unstable, but EViews does not perform this eigenvalue adjustment.)
Next, we obtain an estimate of the LRCOV of the whitened data by applying a kernel estimator to the residuals:
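$$\hat{\Omega}^{*} = \sum_{j=-\infty}^{\infty} k\!\left(\frac{j}{b_T}\right) \hat{\Gamma}^{*}(j) \tag{59.52}$$
where the sample autocovariances $\hat{\Gamma}^{*}(j)$ are given by
$$\hat{\Gamma}^{*}(j) = \begin{cases} \dfrac{1}{T-q} \displaystyle\sum_{t=j+q+1}^{T} \hat{V}_t^{*} \hat{V}_{t-j}^{*\prime}, & j \ge 0 \\ \hat{\Gamma}^{*}(-j)', & j < 0 \end{cases} \tag{59.53}$$
Lastly, we recolor the estimator to obtain the VAR prewhitened kernel LRCOV estimator:
$$\hat{\Omega} = \frac{T-q}{T-q-K}\, \hat{D}\, \hat{\Omega}^{*}\, \hat{D}' \tag{59.54}$$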
The prewhitened kernel procedure differs from VARHAC only in the computation of the LRCOV of the residuals. The VARHAC estimator in Equation (59.48) assumes that the residuals $\hat{V}_t^{*}$ are white noise so that the LRCOV may be estimated using the contemporaneous variance matrix $\hat{\Gamma}^{*}(0)$, while the prewhitening kernel estimator in Equation (59.52) allows for residual heteroskedasticity and serial dependence through its use of the HAC estimator $\hat{\Omega}^{*}$. Accordingly, it may be useful to view the VARHAC procedure as a special case of the prewhitened kernel with $k(0) = 1$ and $k(x) = 0$ for $x \neq 0$.
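The whiten, estimate, recolor sequence can be sketched for the VAR(1) case as follows. This is an illustrative sketch under stated assumptions, not the EViews implementation: prewhitening uses a least-squares VAR(1) without intercept on mean-zero data, the Bartlett kernel is the default, no eigenvalue adjustment is applied to the VAR coefficients, and the function and argument names are hypothetical.

```python
import numpy as np

def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0

def prewhitened_lrcov(V, bandwidth, kernel=bartlett, n_params=0):
    """VAR(1)-prewhitened kernel estimate of the long-run covariance."""
    V = np.asarray(V, dtype=float)
    T, p = V.shape
    # Step 1: whiten with a least-squares VAR(1), V_t = A1 V_{t-1} + e_t.
    X, Y = V[:-1], V[1:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    A1 = B.T
    E = Y - X @ B                      # whitened data, T - 1 rows
    n = E.shape[0]
    # Step 2: kernel estimator applied to the whitened data.
    omega_star = E.T @ E / n
    for j in range(1, n):
        w = kernel(j / bandwidth)
        if w == 0.0:
            continue
        G = E[j:].T @ E[:n - j] / n
        omega_star += w * (G + G.T)
    # Step 3: recolor with D = (I - A1)^(-1).
    D = np.linalg.inv(np.eye(p) - A1)
    return n / (n - n_params) * D @ omega_star @ D.T
```

With a Bartlett bandwidth of one or less, every lagged residual autocovariance receives zero weight, so the function returns the VAR(1) VARHAC estimate. This is the special-case relationship described above.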
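The recoloring step for one-sided prewhitened kernel estimators is complicated when we allow for HAC estimation of $\hat{\Omega}^{*}$ (Park and Ogaki, 1991). As in the VARHAC setting, the expressions for one-sided LRCOVs are quite involved but the VAR(1) specification may be used to provide insight. Suppose that the VARHAC estimators of the one-sided LRCOV matrices defined in Equation (59.50) are given by $\hat{\Lambda}_1$ and $\hat{\Lambda}_0$, and let $\hat{\Lambda}_1^{*}$ be the strict one-sided kernel estimator computed using the prewhitened data:
$$\hat{\Lambda}_1^{*} = \sum_{j=1}^{\infty} k\!\left(\frac{j}{b_T}\right) \hat{\Gamma}^{*}(j) \tag{59.55}$$
Then, again ignoring the optional degrees-of-freedom correction, the prewhitened kernel one-sided LRCOV estimators are given by:
$$\tilde{\Lambda}_1 = \hat{\Lambda}_1 + \hat{D}\, \hat{\Lambda}_1^{*}\, \hat{D}', \qquad \tilde{\Lambda}_0 = \hat{\Lambda}_0 + \hat{D}\, \hat{\Lambda}_1^{*}\, \hat{D}' \tag{59.56}$$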