Statistical Machine Learning Seminar
Hosted by Department of Mathematical Sciences
This PhD dissertation is divided into four chapters, which discuss right-censored (RC) data and interval-censored (IC) data under several different assumptions on time-dependent covariates.
In Chapter 0, we introduce some basic concepts and notation of survival analysis.
Chapter 1 reproduces the paper of Yu et al. (2015). In this chapter, piecewise Cox models with right-censored data are discussed. Piecewise Cox models are regression models that follow different Cox models when restricted to different time intervals. We study a general class of piecewise Cox models that involve a single cut point, so that there are two separate Cox models corresponding to the two time intervals created. We discuss the computation of the semi-parametric maximum likelihood estimates (SMLE) of the parameters with right-censored data, and a simplified algorithm for the maximum partial likelihood estimates (MPLE). Simulation studies suggest that the MPLE compares favorably with its SMLE counterpart, even though the SMLE is more efficient. To assess the appropriateness of the model assumption, we propose a simple diagnostic plotting method, which also enables us to determine an appropriate cut point. We show that the results for the case of a single cut point can be extended to more than one cut point. Finally, we apply the methodology developed for piecewise Cox models to the survival analysis of a long-term breast cancer follow-up study on the prognostic significance of bone marrow micrometastasis. Our diagnostic plots suggest that it is appropriate to apply the piecewise Cox model to our data.
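As an illustration of the single-cut-point idea, the following minimal sketch evaluates the hazard and survival function of a two-piece Cox model. It is not the dissertation's implementation: the function names, the scalar covariate z, the shared baseline hazard, and the constant baseline used for the closed-form survival function are all assumptions made for this example.

```python
import math


def piecewise_cox_hazard(t, z, beta1, beta2, cut, baseline=lambda t: 1.0):
    """Hazard at time t for scalar covariate z under a two-piece Cox model.

    Assumption for illustration: both pieces share one baseline hazard and
    differ only in the regression coefficient (beta1 before the cut point,
    beta2 from the cut point onward).
    """
    beta = beta1 if t < cut else beta2
    return baseline(t) * math.exp(beta * z)


def piecewise_cox_survival(t, z, beta1, beta2, cut, h0=1.0):
    """Survival S(t | z), assuming a constant baseline hazard h0.

    The cumulative hazard is the sum of the two pieces:
    h0 * exp(beta1 * z) * (time spent before the cut point)
    + h0 * exp(beta2 * z) * (time spent after the cut point).
    """
    time_before = min(t, cut)
    time_after = max(t - cut, 0.0)
    cumulative = h0 * (time_before * math.exp(beta1 * z)
                       + time_after * math.exp(beta2 * z))
    return math.exp(-cumulative)
```

With beta1 = beta2 the sketch reduces to an ordinary Cox model with a time-independent covariate, which is a convenient sanity check.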
In Chapter 2, we consider the time-dependent covariates proportional hazards (TDCPH) model with interval-censored (IC) relapse times under the distribution-free set-up. The partial likelihood approach is not applicable to IC data, so we use the full likelihood approach. It turns out that under the TDCPH model with IC data, the semi-parametric MLE (SMLE) of the covariate effect under the standard generalized likelihood is neither unique nor consistent. In fact, the parameter under the TDCPH model with IC data is not identifiable unless some stronger assumptions are imposed. We propose a modification to the likelihood function so that its SMLE is unique, and we show that the parameter is identifiable under certain regularity conditions. Under these conditions, our simulation studies suggest that the modified SMLE is consistent, and we also give a rigorous proof of its consistency. We apply the method to our cancer relapse time data and conclude that bone marrow micrometastasis is not a significant prognostic factor.
In Chapter 3, we consider the semi-parametric estimation problem under the proportional hazards (PH) model with continuous time-dependent covariates and interval-censored data. We show that, unlike the PH model with time-independent covariates, if the observable random vector takes on finitely many values, then the parameters in the model are not identifiable and no consistent estimators of the parameters exist. We establish an identifiability condition for this issue. It provides a guideline for carrying out simulation studies and for proving the consistency of certain semi-parametric estimators. Moreover, the naive extension of the generalized likelihood function does not lead to a consistent estimator. We propose two proper modifications of the generalized likelihood function, both of which yield consistent estimators, and we carry out simulation studies for these estimators. The covariate z(t) = u · 1(t > c)(t − c), where 1(·) denotes the indicator function, will be discussed.
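To make the covariate in the last sentence concrete, here is a minimal sketch that reads it as u times the indicator of t > c times (t − c), and computes the survival function and the likelihood contribution of one interval-censored observation under a PH model. The constant baseline hazard h0, the scalar coefficient beta, the restriction to t ≥ 0, and all function names are assumptions made for this example. The contribution S(left) − S(right) is the standard form for an event known only to lie in (left, right]; it is not one of the modified likelihoods proposed in the dissertation.

```python
import math


def covariate(t, u, c):
    """z(t) = u * 1(t > c) * (t - c): zero up to time c, then linear in t."""
    return u * (t - c) if t > c else 0.0


def cumulative_hazard(t, u, c, beta, h0=1.0):
    """Integral of h0 * exp(beta * z(s)) over [0, t], for t >= 0.

    Assumes a constant baseline hazard h0 (an illustrative choice).
    """
    if t <= c:
        return h0 * t
    rate = beta * u
    if rate == 0.0:
        return h0 * t
    # Closed form of the integral of exp(rate * (s - c)) over [c, t].
    return h0 * c + h0 * math.expm1(rate * (t - c)) / rate


def survival(t, u, c, beta, h0=1.0):
    """S(t | u) = exp(-cumulative hazard)."""
    return math.exp(-cumulative_hazard(t, u, c, beta, h0))


def interval_likelihood(left, right, u, c, beta, h0=1.0):
    """P(left < T <= right | u); right may be math.inf (right-censored)."""
    s_right = 0.0 if math.isinf(right) else survival(right, u, c, beta, h0)
    return survival(left, u, c, beta, h0) - s_right
```

Because z(t) is zero before c, two subjects with different values of u have identical hazards up to time c, so only observations whose censoring intervals extend beyond c carry information about beta in this sketch.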