##Statistics Seminar##\\ Department of Mathematical Sciences
~~META:title =October 27, 2016~~
^ **DATE:**|Thursday, October 27, 2016 |
^ **TIME:**|1:15p-2:40p |
^ **LOCATION:**|WH 100E |
^ **SPEAKER:**|Haomiao Meng, Binghamton University|
^ **TITLE:**|Regression in Heterogeneous Problems|
\\
**Abstract**
We develop a new framework for modeling the impact of sub-cluster structure in the data on
regression. The framework is specifically designed for situations in which the sample is not
homogeneous, in the sense that the response variables in different regions of the covariate
space are generated by different mechanisms. In such a situation, the sample can be viewed
as a composition of multiple data sets, each of which is homogeneous. Traditional linear and
general nonlinear methods may perform poorly because it is hard to find a single unifying
model that fits multiple data sets simultaneously. The proposed method is flexible enough that
data generated from different regions can be modeled using different functions. The key step of
our method is to incorporate the k-means clustering idea into the traditional regression framework
so that the regression and clustering tasks are performed simultaneously. The k-means
clustering algorithm is extended to solve the resulting optimization problem, grouping together
samples with similar response-covariate relationships. We investigate general conditions under
which the estimates of the model parameters are consistent. By adding appropriate penalty
terms, the proposed model can perform variable selection to eliminate uninformative variables.
The conditions under which the proposed model achieves asymptotic selection consistency
are also studied. The effectiveness of the proposed method is demonstrated through applications to
simulated and real data.
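The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of combining k-means with regression: a generic "clusterwise" linear regression that alternates between fitting one least-squares model per cluster and reassigning each sample to the model that fits it best. It is a minimal sketch assuming linear models within each cluster and squared-error loss, with no penalty terms; the function `kmeans_regression` and its parameters are hypothetical and this is not the speaker's method.

```python
import numpy as np


def kmeans_regression(X, y, k, init_labels=None, n_iter=100, seed=0):
    """Hypothetical sketch of k-means-style clusterwise linear regression.

    Alternates two steps, like Lloyd's k-means algorithm:
      1. fit an ordinary least-squares model within each cluster;
      2. reassign every sample to the cluster whose model gives it the
         smallest squared residual.
    Neither step increases the total squared residual, but the result
    can be a local optimum that depends on the initial labels.
    """
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])  # prepend an intercept column
    p = Xd.shape[1]
    if init_labels is None:
        rng = np.random.default_rng(seed)
        labels = rng.integers(k, size=n)  # random initial clustering
    else:
        labels = np.asarray(init_labels).copy()
    coefs = np.zeros((k, p))
    for _ in range(n_iter):
        # Step 1: least-squares fit per cluster (skip clusters too small to fit;
        # they keep their previous coefficients).
        for j in range(k):
            mask = labels == j
            if mask.sum() >= p:
                coefs[j], *_ = np.linalg.lstsq(Xd[mask], y[mask], rcond=None)
        # Step 2: squared residual of every sample under every cluster model.
        resid = (y[:, None] - Xd @ coefs.T) ** 2  # shape (n, k)
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the fits are too
        labels = new_labels
    return labels, coefs
```

For example, two groups of points lying on the parallel lines y = 2x and y = 2x + 20 over the same covariate range cannot be fit well by one line, but starting from a roughly correct clustering the sketch separates them and returns the coefficient rows [0, 2] and [20, 2] (intercept, slope). In practice one would run several random initializations and keep the solution with the smallest total squared residual.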