For example, in a recent study we attempted to improve the US guidelines for risk stratification after screening colonoscopy (Cancer Causes Control 27 (2016) 1175-1185), with the aim of helping to reduce both overuse and underuse of follow-on surveillance colonoscopy. For semiparametric methods using the generalized estimating functions (Liang and Zeger, 1986), as another class of examples, if data are missing at random and the missing propensity function is. A missing data perspective (Ding, Peng and Li, Fan, Statistical Science, 2018). Semiparametric Theory and Missing Data, by Tsiatis, 2006, 404 pages. In this book, Tsiatis very carefully and didactically explains this theory. We introduce below novel bounded influence function estimators. On differentiability of implicitly defined function in semiparametric profile likelihood estimation (Hirose, Yuichi, Bernoulli, 2016). Semiparametric estimation of structural failure time models.
In many cases, the treatment of missing data in an analysis is carried out in a casual. We develop inference tools in a semiparametric regression model with missing response data. Statistics in the Pharmaceutical Industry, 3rd edition. Productivity, innovation, and entrepreneurship. In missing data analysis, there is often a need to assess the sensitivity of key inferences to departures from untestable assumptions regarding the missing data process. Semiparametric regression analysis under imputation for. Bayesian inference for causal effects in randomized experiments with noncompliance (Imbens, Guido W.). This sensitivity is exacerbated when inverse probability weighting methods are used, which may overweight contaminated observations. Semiparametric estimation of nonstationary censored panel data models with time varying factor loads, Volume 24, Issue 5, Songnian Chen, Shakeeb Khan. A semiparametric model for heterogeneous panel data with. Methods for the analysis of sampled cohort data in the Cox proportional hazards model. A semiparametric approach for analyzing nonignorable missing data (Hui Xie, Yi Qian, Leming Qu). Semiparametric estimation of multinomial discrete-choice models using a subset of choices, Jeremy T.
We use a flexible semiparametric imputation technique to place individuals into strata. When data are MCAR, the analysis performed on the data is unbiased. The maximum pseudolikelihood estimation of Genest et al. This perspective makes clear the role of mechanisms that sample experimental units, assign treatments and record data. Moreover, we propose using inverse probability of censoring weighting to deal with dependent censoring. In the 1990s, Jamie Robins and colleagues at Harvard applied recently developed theory for semiparametric models to the problem of handling missing data. Those results confirm that the parametric approach utterly fails if the model is misspecified. Methods for estimating parameters with missing or coarsened data in as e.
A semiparametric approach for analyzing nonignorable. If the study variable does not affect the probability of the response, the response mechanism. Semiparametric regression and risk prediction with. Semiparametric regression models with missing data. The second stage estimates a spatial panel data model with the estimated weights matrix from the first stage. Aug 24, 2012. Based on semiparametric theory and taking into account the symmetric nature of the population distribution, we propose both consistent estimators, i. Parameter estimation in parametric regression models with missing covariates is considered under a survey sampling setup. Specification of realistic parametric models for the mechanism generating high-dimensional data is most often very challenging, if not impossible. With a semiparametric model, the parameter has both a finite-dimensional component and an infinite-dimensional component (often a real-valued function defined on the real line). This book combines much of what is known in regard to the theory of estimation for semiparametric models with missing data in an organized and comprehensive manner. Semiparametric analysis of binary games of incomplete information. Survival data have missing values that arise through censoring mechanisms. Classical semiparametric inference with missing outcome data is not robust to contamination of the observed data, and a single observation can have arbitrarily large influence on estimation of a parameter of interest. Non-logit maximum-likelihood estimators are inconsistent when using data on a subset of the choices available to agents.
Censoring, mostly right censoring, is central to the entire discipline of survival analysis. The description of the theory of estimation for semiparametric models is both rigorous and intuitive, relying on geometric ideas to reinforce the intuition and understanding of the theory. A semiparametric missing-data-induced intensity method for. Semiparametric location estimation under nonrandom sampling. Calibration estimation of semiparametric copula models with. A semiparametric inference to regression analysis with.
Missing data arise in almost all scientific disciplines. A semiparametric approach, Working Papers 201447, University of Pretoria, Department of Economics. They achieve the semiparametric efficiency bound in the. Strategies for Bayesian modeling and sensitivity analysis, M. Analysis of semiparametric regression models for repeated. Semiparametric Theory and Missing Data, Anastasios Tsiatis. Furthermore, most of the existing theory assumes a smooth loss function, which excludes many interesting applications, such as those arising from quantile regression, survival analysis and missing data analysis. Semiparametric Theory and Missing Data, Springer Series in Statistics, by Anastasios Tsiatis. Semiparametric regression models with missing data. Missing data methods based on an induced intensity. To remove this serious limitation on the methodology, we. The theory of missing data applied to semiparametric models is scattered.
Locally efficient estimators for coarsened-data semiparametric models. Parametric assumptions equate to hidden observations. Following Borgan, Goldstein, and Langholz (1995), let N_i(t) be the binary indicator of whether subject i has experienced the event by time t, t. Time series data are widely used to explore causal relationships, typically in a regression framework with lagged dependent variables.
A semiparametric model for heterogeneous panel data with fixed effects. Lena Körber (The London School of Economics), Oliver Linton and Michael Vogt (University of Cambridge), January 18, 20. This paper develops methodology for semiparametric panel data models in a setting where both the time series and the cross section are large. STAT 992/BMI 826, University of Wisconsin-Madison: missing data. In many cases, the treatment of missing data in an analysis is carried out in a casual and ad hoc manner, leading. Multiple imputation in quantile regression, Biometrika.
The geometric ideas for semiparametric full-data models are extended to missing-data models. Semiparametric regression analysis with missing response. We propose a nonparametric imputation method for the missing values, which then leads to imputed estimating equations for the finite-dimensional parameter of interest. Calibration estimation of semiparametric copula models. All the estimators are proved to be asymptotically normal, with the same asymptotic variance. Estimation in semiparametric models with missing data. Penalized profiled semiparametric estimating functions. Robins, Andrea Rotnitzky, and Lue Ping Zhao. We propose a class of inverse probability of censoring weighted estimators for the parameters of models for the dependence of the. Semiparametric optimal estimation with nonignorable. Conditional moment models with data missing at random. Our theoretical results provide new insight for the literature on semiparametric efficiency bounds and open the door to new applications.
I show that the semiparametric, multinomial maximum-score estimator is consistent when using data on a subset of choices. The semiparametric models allow for estimating functions that are nonsmooth with respect to the parameter. In many cases, the treatment of missing data in an analysis is carried out in a casual and ad hoc manner, leading, in many cases, to invalid inference and erroneous conclusions. Bayesian inference for causal effects follows from finding the predictive distribution of the values under the other assignments of treatments. Attrition is a type of missingness that can occur in longitudinal studies, for instance. In a semiparametric model, the parameter space satisfies $\Theta \subseteq \mathbb{R}^k \times V$, where $V$ is an infinite-dimensional space. Moment and conditional moment restriction models are widely used in statistics, biostatistics and econometrics. Semiparametric analysis of binary games of incomplete information, Department of Economics Working Papers 911, The University of Texas at Austin, Department of Economics, revised Nov 2012. A semiparametric regression imputation estimator, a marginal average estimator and a marginal propensity score weighted estimator are defined.
This paper investigates the estimation of semiparametric copula models with data missing at random. Evaluating the causal effect of university grants on student dropout. Semiparametric causal inference in matched cohort studies. Chapter 5 preliminaries on semiparametric theory and missing. We propose a nonparametric imputation method for the missing values, which then leads to imputed estimating. Semiparametric inverse propensity weighting for nonignorable.
A common alternative (though in many cases equivalent) score-based definition of the IF is presented in the appendix; see Bickel et al. Asymptotic theory for the semiparametric accelerated. Theory on semiparametric efficient estimation in missing data problems has been systematically developed by Robins and his coauthors. It is common to encounter missing data among the potential predictor variables in the setting of model selection. Sieve maximum likelihood estimation for a general class of accelerated hazards models with bundled parameters (Zhao, Xingqiu, Wu, Yuanshan, and Yin, Guosheng, Bernoulli, 2017). Bridging a survey redesign using multiple imputation.
Empirical process approach in a two-sample location-scale model with censored data (Hsieh, Fushing, The Annals of Statistics, 1996). Semiparametric estimation of multinomial discrete-choice. We can treat the traditional sample as if the responses were missing for income sources targeted by the redesign and use multiple imputation to generate plausible responses. A semiparametric estimation of mean functionals with. Some items are more likely to generate a nonresponse than others.
Cluster allocation design networks (Madrigal, Ana Maria, Bayesian Analysis, 2007). Censoring is the problem of not observing the exact time of an event during an experimental or observational study, which makes the analysis much more complex. Using the semiparametric efficiency theory, we derive the first semiparametric doubly robust estimators, which are consistent if the model for the treatment process or the failure time model, but not necessarily both, is correctly specified. See, for example, the top half of Table 2, where the bias is. Analysis of semiparametric regression models for. Session on semiparametric inference in practice, organized by Florentina Bunea, Florida State University: semiparametric models with data missing by design and inverse probability weighted empirical processes. This treatment will give the reader a deep understanding of the underlying theory for missing and coarsened data. This book summarizes current knowledge regarding the theory of estimation for semiparametric models with missing data, in an organized and. Abstract: We develop inference tools in a semiparametric partially linear regression model with missing response data. Unified methods for censored longitudinal data and causality. In this article, we consider a general rank-based estimating method for model (1). Chen, Jinbo and Norman Breslow (2004). Semiparametric efficient estimation for the auxiliary outcome problem with the conditional mean model. Canad.
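The double robustness property described above (consistency when either of two working models is correct, but not necessarily both) can be sketched in a much simpler setting than the structural failure time models mentioned in the text: estimating a mean when some outcomes are missing at random. The following is a minimal illustration, not the estimator from any of the cited papers. The data-generating model, the function names, and the deliberately wrong working models are all assumptions made for the example.

```python
import math
import random


def simulate(n, seed=1):
    """Simulate (x, y, r). The outcome y depends on x, and y is observed
    (r = 1) with a probability that depends only on x, so data are MAR.
    The true mean of y is 1.0 (assumed model, for illustration only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(0.5 + x)))
        r = 1 if rng.random() < p else 0
        data.append((x, y, r))
    return data


def true_propensity(x):
    """Correct model for P(r = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(0.5 + x)))


def wrong_propensity(x):
    """Misspecified propensity model: ignores x entirely."""
    return 0.6


def true_outcome(x):
    """Correct model for E[y | x]."""
    return 1.0 + 2.0 * x


def wrong_outcome(x):
    """Misspecified outcome model: always predicts 0."""
    return 0.0


def aipw_mean(data, propensity, outcome):
    """Augmented inverse probability weighted estimate of E[y].
    Each term is r*y/p - (r - p)/p * m(x); y is only used when r = 1."""
    total = 0.0
    for x, y, r in data:
        p = propensity(x)
        m = outcome(x)
        total += r * y / p - (r - p) / p * m
    return total / len(data)


if __name__ == "__main__":
    data = simulate(20000)
    print("outcome model correct only:   ", aipw_mean(data, wrong_propensity, true_outcome))
    print("propensity model correct only:", aipw_mean(data, true_propensity, wrong_outcome))
    print("both models wrong:            ", aipw_mean(data, wrong_propensity, wrong_outcome))
```

In this sketch the estimate stays close to the true mean of 1.0 when either working model is correct, and drifts away when both are wrong. In practice both models would be estimated from the data rather than supplied as known functions.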
Introduction to double robust methods for incomplete data. In particular, we investigate a class of regression-like (mean regression, quantile regression) models with missing data, an example of a supply and demand simultaneous equations model and a. Full-semiparametric-likelihood-based inference for non. The first stage follows Pinkse, Slade, and Brett's semiparametric approach to estimate the spatial weights matrix using pooled data, which gives a consistent estimate for the spatial weights matrix backed up by economic theory. We consider a class of doubly weighted rank-based estimating methods for the transformation or accelerated failure time model with missing data as arise, for. To remove this serious limitation on the methodology, we use an instrument, i. Regression-based causality tests rely on an array of functional form and distributional assumptions for valid causal inference. Semiparametric regression and risk prediction with competing risks data under missing cause of failure. Article in Lifetime Data Analysis, January 2020. This paper considers the problem of parameter estimation in a general class of semiparametric models when observations are subject to missingness at random. In statistics, a semiparametric model is a statistical model that has parametric and nonparametric components. A statistical model is a parameterized family of distributions. Semiparametric Theory and Missing Data by Tsiatis, A.
A semiparametric inference to regression analysis with missing covariates in survey data. Shu Yang and Jae Kwang Kim, Department of Statistics, Iowa State University. Abstract. Semiparametric regression analysis with missing response at random, CeMMAP Working Papers CWP11/03, Centre for Microdata Methods and Practice, Institute for Fiscal Studies. For this model, we consider the case where some Y-values in a sample of size n may be missing, but X and T are observed completely. Introduction: Handling missing data often requires some assumptions about the response mechanism. All data generated or analyzed during this study are included in the published article, Lau et al.
Competing risk regression models for epidemiologic data. Information bounds for Cox regression models with missing data (Nan, Bin, Emond, Mary J.). Except in relatively simple problems, semiparametric efficient. Values in a data set are missing completely at random (MCAR) if the events that lead to any particular data item being missing are independent both of observable variables and of unobservable parameters of interest, and occur entirely at random. Parameter estimation in parametric regression models with missing covariates is considered under a survey sampling setup. Missing data is frequently encountered in many areas of statistics. Semiparametric causality tests using the policy propensity score. Statistical analysis in the presence of missing data has been an area of considerable interest because ignoring the missing data often destroys the representativeness of the remaining sample and is likely to lead to biased parameter estimates. Second, the parametric approach produces extremely large bias for all cases when the propensity score model is misspecified. The theory of missing data applied to semiparametric models is scattered throughout the literature with no thorough comprehensive treatment of the subject. Under missingness at random, a semiparametric maximum likelihood approach is proposed which requires no parametric specification of the marginal covariate distribution. If the study variable does not affect the probability of the response, the response mechanism is called missing at random (MAR) [27]. It starts with the study of semiparametric methods when there are no missing data. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data, James M.
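To make the point about lost representativeness concrete, the sketch below simulates outcomes that are missing at random given a covariate and compares the complete-case mean with an inverse probability weighted mean. It is a minimal illustration under assumed conditions: the data-generating model and the function names are hypothetical, and the response probability is treated as known, whereas in practice it would be estimated with a propensity score model.

```python
import math
import random


def simulate(n, seed=0):
    """Simulate rows (x, y, r, p). The outcome y depends on x, and y is
    observed (r = 1) with probability p that depends only on x (MAR).
    The true mean of y is 1.0 (assumed model, for illustration only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(0.5 + x)))  # response probability
        r = 1 if rng.random() < p else 0
        rows.append((x, y, r, p))
    return rows


def complete_case_mean(rows):
    """Average of the observed outcomes only, ignoring the missing ones."""
    observed = [y for (_, y, r, _) in rows if r == 1]
    return sum(observed) / len(observed)


def ipw_mean(rows):
    """Normalized inverse probability weighted mean: each observed
    outcome is weighted by 1/p, the inverse of its response probability."""
    numerator = sum(r * y / p for (_, y, r, p) in rows)
    denominator = sum(r / p for (_, _, r, p) in rows)
    return numerator / denominator


if __name__ == "__main__":
    rows = simulate(20000)
    print("complete-case mean:", complete_case_mean(rows))
    print("IPW mean:          ", ipw_mean(rows))
```

Because units with larger x are both more likely to respond and have larger outcomes, the complete-case mean overstates the true mean here, while the weighted mean corrects for the selection. Under MCAR the response probability would be constant and the two estimates would agree up to sampling noise.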