Data Science Seminar
Hosted by Department of Mathematical Sciences

Abstract

This talk surveys density estimation with noisy data and discusses applications to Monte Carlo simulation experiments. Suppose we wish to estimate the probability density function of independent random variables $X_i$, but we can observe only $W_i = X_i + \epsilon_i$, that is, data contaminated with measurement errors $\epsilon_i$.
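To make the setting concrete, the following minimal simulation sketch (the gamma density for $X_i$ and the standard normal errors are illustrative assumptions, not choices from the talk) shows how a naive density estimate based on the observed $W_i$ is biased for the density of $X_i$:

```python
# Illustrative simulation of the measurement error model W_i = X_i + eps_i.
# The gamma density for X and the N(0, 1) errors are assumptions for the demo.
import numpy as np
from scipy.stats import gaussian_kde, gamma

rng = np.random.default_rng(0)
n = 2000
x = rng.gamma(shape=4.0, scale=1.0, size=n)   # latent X_i (never observed)
w = x + rng.normal(0.0, 1.0, size=n)          # observed, contaminated W_i

grid = np.linspace(0.0, 14.0, 200)
naive = gaussian_kde(w)(grid)                 # naive KDE applied to W
truth = gamma.pdf(grid, a=4.0)                # true density of X
# `naive` is visibly flatter and wider than `truth`: the extra variance of
# the measurement errors smears the density, which deconvolution undoes.
```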

Early work on this problem used Fourier-based deconvoluting kernel estimators. These have an impressive asymptotic theory, but their finite-sample performance is often less than fully satisfactory. Moreover, they assume a constant measurement error variance and can be seriously biased if this assumption is violated. These problems have motivated the search for other methods of density estimation in the presence of measurement error. Staudenmayer et al.\ (2008) proposed a spline-based Bayesian estimator that accommodates non-constant measurement error variance.
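As an illustration of the classical approach, here is a sketch of a Fourier-based deconvoluting kernel estimator; it is a textbook-style construction, not code from any of the papers cited, and it assumes Gaussian errors with a known, constant standard deviation, precisely the restrictive assumption noted above:

```python
# Sketch of a deconvoluting kernel density estimator, assuming Gaussian errors
# with KNOWN, CONSTANT standard deviation sigma, and using a kernel whose
# Fourier transform is (1 - t^2)^3 on [-1, 1] (compactly supported).
import numpy as np

def deconv_kde(x_grid, w, h, sigma):
    """Estimate the density of X on x_grid from contaminated observations w."""
    t = np.linspace(-1.0, 1.0, 401)                  # Fourier integration grid
    dt = t[1] - t[0]
    phi_K = (1.0 - t**2) ** 3                        # kernel Fourier transform
    phi_eps = np.exp(-0.5 * (sigma * t / h) ** 2)    # Gaussian error FT at t/h
    ratio = phi_K / phi_eps                          # deconvolution weights
    f_hat = np.empty(len(x_grid))
    for i, x in enumerate(x_grid):
        u = (x - w)[:, None] / h                     # shape (n, 1)
        # real part of exp(-i*t*u), integrated over t, averaged over the data
        f_hat[i] = (np.cos(u * t) * ratio).sum() * dt / (2 * np.pi * len(w) * h)
    return f_hat
```

For example, `deconv_kde(grid, w, h=0.6, sigma=1.0)` targets the density of $X_i$ rather than that of $W_i$. Because the error characteristic function sits in the denominator, the deconvolution weights grow rapidly as the bandwidth $h$ shrinks, which is one concrete source of the finite-sample difficulties mentioned above.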

Steckley et al.\ (2016) applied deconvolution to data from simulation experiments to estimate the density of a conditional expectation. Also motivated by simulation experiments, Yang et al.\ (2018) developed deconvolution estimators that use quadratic programming (QP). An advantage of the QP approach is that the natural constraints that a density function be nonnegative and integrate to one are easily enforced. Other constraints, such as tail convexity, tail monotonicity, and support restrictions, can be imposed as well.
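The following is a hedged sketch of the QP idea, not the Yang et al.\ (2018) implementation: discretize the density of $X_i$ on a grid, map it to the density of $W_i$ through a convolution matrix, and fit by least squares subject to the nonnegativity and integrate-to-one constraints (Gaussian errors with known standard deviation are again an assumption of the demo):

```python
# Hedged sketch of constrained deconvolution via a quadratic program:
# the objective is quadratic in the discretized density f, and the density
# constraints are linear, so this is a QP (solved here with generic SLSQP).
import numpy as np
from scipy.stats import norm, gaussian_kde
from scipy.optimize import minimize

rng = np.random.default_rng(0)
sigma = 1.0                                     # assumed known error sd
w = rng.gamma(4.0, 1.0, size=2000) + rng.normal(0.0, sigma, size=2000)

grid = np.linspace(-2.0, 14.0, 80)
dx = grid[1] - grid[0]
A = norm.pdf(grid[:, None] - grid[None, :], scale=sigma) * dx  # convolution
g_hat = gaussian_kde(w)(grid)                   # estimated density of W

res = minimize(
    lambda f: np.sum((A @ f - g_hat) ** 2),     # quadratic objective in f
    x0=np.full(grid.size, 1.0 / (grid.size * dx)),
    bounds=[(0.0, None)] * grid.size,           # density is nonnegative
    constraints=[{"type": "eq", "fun": lambda f: f.sum() * dx - 1.0}],
    method="SLSQP",                             # handles the equality constraint
)
f_qp = res.x                                    # constrained density estimate
```

Additional shape constraints, such as tail monotonicity or convexity, would enter the same program as extra linear inequalities on `f`.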

This is joint work with John Staudenmayer, John Buonaccorsi, Sam Steckley, Shane Henderson, Ran Yang, Dan Apley, and Jeremy Staum.

This seminar is part of the Dean's Speaker Series in Statistics and Data Science.