seminars:stat:10082015

seminars:stat:10082015 [2015/09/05 22:20]
qiao
seminars:stat:10082015 [2015/09/05 22:29] (current)
qiao
Classification is an important tool with many useful applications. Among the many existing classification methods, Fisher's Linear Discriminant Analysis (LDA) is a traditional model-based approach that makes use of distributional information such as the covariance of the features. However, in the HDLSS setting, LDA cannot be directly deployed because the sample covariance is not invertible. While there are modern methods designed to deal with high dimensionality, it is difficult to obtain good performance on partially labeled data when the analysis is based on the labeled data alone, because labeled data are scarce. To overcome this difficulty, and to make full use of the seemingly useless unlabeled data, we propose a semi-supervised sparse LDA classifier in this dissertation. Our method combines LDA, a model-based approach, with some machine-learning-oriented components. The extra components help to extract useful information from the unlabeled data, which can boost the classification performance in some situations.
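The claim that the sample covariance is not invertible in the HDLSS setting can be checked numerically. The sketch below is only an illustration and is not part of the proposed method: the sample size `n = 5`, the dimension `p = 20`, the random integer data, and the helper names `sample_covariance` and `matrix_rank` are all assumptions made for the example. It uses exact rational arithmetic so that the rank is not affected by rounding.

```python
import random
from fractions import Fraction


def sample_covariance(rows):
    """Return the p x p sample covariance matrix of n observations (rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def matrix_rank(matrix):
    """Compute the rank by Gaussian elimination over exact fractions."""
    m = [row[:] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical HDLSS data: n = 5 observations of p = 20 features.
random.seed(0)
n, p = 5, 20
data = [[Fraction(random.randint(-9, 9)) for _ in range(p)] for _ in range(n)]

cov = sample_covariance(data)
cov_rank = matrix_rank(cov)

# Centering removes one degree of freedom, so the rank is at most n - 1 < p
# and the p x p covariance matrix cannot be inverted.
print(f"p = {p}, n = {n}, rank of sample covariance = {cov_rank}")
```

Because the centered data matrix has only n rows that sum to zero, the rank of the sample covariance can never exceed n - 1, whatever the data. This is why LDA, which needs the inverse covariance, requires some form of regularization or sparsity when p is larger than n.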
  
Before learning from a data set, a natural question to ask is whether the predefined classes are really different from one another (in the context of classification), or whether clusters are really there (in the context of clustering). Such a question may be answered by significance tests. Even in the challenging HDLSS setting, there have been some recent developments. However, a significance analysis tool for partially labeled data has not been developed in the HDLSS setting. In this dissertation, we propose a significance analysis approach for HDLSS partially labeled data. Our method makes use of the whole data set and tests the class difference as if all the labels were observed. Compared to a testing method that ignores the label information, our method provides greater power while maintaining the size of the test.
  
In studying both aspects of partially labeled data, we provide theoretical justifications for the proposed methods. In particular, our theoretical study emphasizes the HDLSS setting, shedding light on the usefulness of the proposed methods. Lastly, comprehensive simulations and data examples illustrate the effectiveness of the methods.
</WRAP>
  
seminars/stat/10082015.1441506056.txt · Last modified: 2015/09/05 22:20 by qiao