Department of Mathematical Sciences
Thursday, December 4, 2014
1:15pm to 2:40pm
Lu Yao (Binghamton University)
Precision Matrix Estimation with Applications to Stock Return Data
This thesis studies estimation of the precision matrix, which is the inverse of the covariance matrix and represents the conditional dependence relationships among variables. For a normal random vector, a zero element in the precision matrix indicates that the two corresponding variables are conditionally independent given all other variables. Such relations are often illustrated using a graph, hence the name Gaussian graphical model. When the precision matrix is sparse, the Gaussian graphical model gives a clear picture of these relationships.
Meinshausen and Bühlmann (2006) proposed neighborhood selection with the lasso, which estimates the conditional independence relations separately for each node; repeating the estimation for all nodes gives the full conditional dependence structure. Yuan and Lin (2007) then proposed a penalized negative log-likelihood method, which provides a sparse shrinkage estimator of the precision matrix. Banerjee et al. (2008) estimated the covariance matrix in a way that also leads to an estimate of the precision matrix similar to that of Yuan and Lin (2007), and Friedman et al. (2008) introduced the block-wise coordinate descent method as a fast implementation of that approach. This method led to the Graphical Lasso algorithm, which was more attractive than the competing methods. In the Graphical Lasso algorithm, the current estimate of the covariance matrix is partitioned, and a dual problem is solved to update the estimate. The dual problem turns out to be a lasso (regression) problem, which can be solved efficiently using coordinate descent. The algorithm repeats these steps until the estimate of the covariance matrix converges, which in turn yields the estimate of the precision matrix.
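The partition-and-update loop described above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the implementation used in the thesis: the function names `soft_threshold` and `graphical_lasso`, the penalty parameter `lam`, the stopping tolerances, and the toy 3 x 3 covariance matrix `S_chain` are all hypothetical choices made for the example.

```python
def soft_threshold(x, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def graphical_lasso(S, lam, max_sweeps=100, max_inner=1000, tol=1e-10):
    """Sketch of block-wise coordinate descent for the graphical lasso.

    S   : covariance matrix as a list of lists (p x p), assumed symmetric
    lam : non-negative penalty on the precision matrix entries (assumed name)
    Returns (W, Theta): covariance estimate and precision estimate.
    """
    p = len(S)
    # Start from S with the penalty added to the diagonal.
    W = [[S[i][j] + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    # B[j] holds the lasso coefficients for column j (entry j itself is unused).
    B = [[0.0] * p for _ in range(p)]

    for _sweep in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            rest = [k for k in range(p) if k != j]
            beta = B[j]  # warm start from the previous sweep
            # Inner lasso problem for column j, solved by coordinate descent.
            for _inner in range(max_inner):
                inner_change = 0.0
                for k in rest:
                    r = S[k][j] - sum(W[k][l] * beta[l] for l in rest if l != k)
                    new = soft_threshold(r, lam) / W[k][k]
                    inner_change = max(inner_change, abs(new - beta[k]))
                    beta[k] = new
                if inner_change < tol:
                    break
            # Update row and column j of the covariance estimate.
            new_col = [sum(W[k][l] * beta[l] for l in rest) for k in rest]
            for k, w in zip(rest, new_col):
                max_change = max(max_change, abs(w - W[k][j]))
                W[k][j] = W[j][k] = w
        if max_change < tol:
            break

    # Recover the precision matrix column by column from W and the coefficients.
    Theta = [[0.0] * p for _ in range(p)]
    for j in range(p):
        rest = [k for k in range(p) if k != j]
        theta_jj = 1.0 / (W[j][j] - sum(W[k][j] * B[j][k] for k in rest))
        Theta[j][j] = theta_jj
        for k in rest:
            Theta[k][j] = -B[j][k] * theta_jj
    return W, Theta


# Toy covariance of a three-variable chain (illustrative numbers only).
S_chain = [[1.0, 0.5, 0.25],
           [0.5, 1.0, 0.5],
           [0.25, 0.5, 1.0]]

W_hat, Theta_hat = graphical_lasso(S_chain, lam=0.2)
for row in Theta_hat:
    print([round(v, 4) for v in row])
```

In this sketch, setting `lam=0.0` simply inverts the toy matrix, while a penalty at least as large as the largest off-diagonal covariance shrinks every off-diagonal entry to zero. Because each column is recovered separately, the returned precision estimate is symmetric only up to the convergence tolerance.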
This thesis uses the Graphical Lasso algorithm to analyze real-life stock return data. The data set consists of the log-returns of 42 US and Chinese stocks traded in the US stock market, retrieved from Yahoo Finance. The output of the algorithm is depicted as undirected graphs that represent the adjacency between stocks. A central question in this application is whether certain publicly traded companies sit at the center of the stock market, and whether there are clusters of stocks that are conditionally dependent given all other stocks in the study. The undirected graphs give a more direct sense of the conditional dependence relationships between stocks.
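One way to read such a graph off an estimated precision matrix is sketched below. It is a hypothetical illustration rather than the thesis's procedure: the helper names `precision_to_graph` and `connected_components`, the threshold `eps`, the labels "A" to "E", and the toy matrix `theta_toy` are all assumptions, and none of the numbers come from the stock data. Node degree stands in for "being at the center", and connected components stand in for clusters; both are rough simplifications.

```python
def precision_to_graph(theta, names, eps=1e-8):
    """Build adjacency sets: an edge i-j whenever |theta[i][j]| exceeds eps."""
    adj = {name: set() for name in names}
    p = len(names)
    for i in range(p):
        for j in range(i + 1, p):
            if abs(theta[i][j]) > eps or abs(theta[j][i]) > eps:
                adj[names[i]].add(names[j])
                adj[names[j]].add(names[i])
    return adj


def connected_components(adj):
    """Group nodes into connected components with a depth-first search."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        components.append(sorted(comp))
    return components


# Toy precision matrix (made-up values): "A" is linked to "B", "C" and "D";
# "E" has no non-zero off-diagonal entries and is therefore isolated.
names = ["A", "B", "C", "D", "E"]
theta_toy = [
    [2.0, -0.5, -0.4, 0.3, 0.0],
    [-0.5, 2.0, 0.0, 0.0, 0.0],
    [-0.4, 0.0, 2.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
]

graph = precision_to_graph(theta_toy, names)
degree = {name: len(neighbors) for name, neighbors in graph.items()}
hub = max(degree, key=degree.get)       # node with the most neighbors
clusters = connected_components(graph)  # groups of mutually reachable nodes
print(hub, degree, clusters)
```

The threshold `eps` matters in practice because numerically estimated entries that should be zero may come out as very small non-zero values.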