# Survey Working Group - Fall 2012

Unless otherwise specified, these meetings will be held at 3:10 p.m. in Snedecor 2113

**November 26**

**Spatial Anomaly Events with AT&T Cellular Network Data**

Spatial anomaly detection is the search for unusually high concentrations or clusters of spatial incidents, with wide applications in epidemiology and environmental science. In this study, we use the K-scan method to detect spatial anomalies in the AT&T cellular network in the New York City area. The data are customer-submitted via an iPhone app that records the location and time of network traffic incidents. Clusters are identified using a bottom-up approach, and their significance is determined using a parametric bootstrap under an inhomogeneous Poisson process.
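The abstract does not spell out the K-scan procedure, but the parametric-bootstrap significance step can be sketched roughly as follows. This is a toy illustration, not the speaker's method: a simple sliding-window Poisson scan statistic stands in for K-scan, and all names and window sizes are my own.

```python
import numpy as np

def scan_statistic(counts, baseline):
    """Max Poisson log-likelihood-ratio over sliding square windows
    (toy stand-in for a spatial scan statistic)."""
    n = counts.shape[0]
    best = 0.0
    for size in (1, 2, 3):
        for i in range(n - size + 1):
            for j in range(n - size + 1):
                c = counts[i:i + size, j:j + size].sum()
                b = baseline[i:i + size, j:j + size].sum()
                if c > b:  # only excess counts signal an anomaly
                    best = max(best, c * np.log(c / b) - (c - b))
    return best

def bootstrap_pvalue(counts, baseline, n_boot=200, seed=0):
    """Parametric bootstrap under the inhomogeneous Poisson null:
    simulate count fields from the baseline intensity and compare
    their scan statistics with the observed one."""
    rng = np.random.default_rng(seed)
    observed = scan_statistic(counts, baseline)
    null = [scan_statistic(rng.poisson(baseline), baseline)
            for _ in range(n_boot)]
    return (1 + sum(t >= observed for t in null)) / (n_boot + 1)
```

A planted hotspot (a cell with far more incidents than its baseline intensity) yields a large observed statistic and hence a small bootstrap p-value.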

**November 12**

**Regression Analysis with Missing Covariates Without Specifying the Prediction Model for Missing Covariates**

The naive approach to regression analysis with missing covariates is to discard incomplete cases and use only the fully observed data. Maximum likelihood estimation (MLE) and inverse probability weighting (IPW) are well-known model-based approaches to this problem. However, MLE can be biased when the marginal distribution of the missing covariates is misspecified, and for IPW there is room to improve efficiency by accounting for additional information from the fully observed covariates. Our proposed hot-deck fractional imputation (HDFI) method improves on MLE in robustness and on IPW in efficiency. Variance estimation using Louis' formula is discussed for the proposed method, and simulation results are presented comparing it with several existing methods.
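To make the fractional-weighting idea concrete, here is a minimal sketch of hot-deck fractional imputation for a mean, under assumptions of my own choosing (nearest-neighbor donor cells on an observed variable, equal fractional weights 1/M per donor); the actual HDFI estimator in the talk is more general.

```python
import numpy as np

def hdfi_mean(y, x, delta, n_donors=3):
    """Toy hot-deck fractional imputation for the mean of x.
    delta = 1 if x is observed.  Each missing x receives n_donors
    donor values from respondents with similar y, each donor value
    carrying fractional weight 1/n_donors."""
    resp = np.where(delta == 1)[0]            # respondents (x observed)
    vals, wts = list(x[resp]), [1.0] * len(resp)
    for i in np.where(delta == 0)[0]:
        # nearest respondents in y act as the imputation cell
        order = resp[np.argsort(np.abs(y[resp] - y[i]))]
        for d in order[:n_donors]:
            vals.append(x[d])
            wts.append(1.0 / n_donors)        # fractional weight
    vals, wts = np.array(vals), np.array(wts)
    return np.sum(wts * vals) / np.sum(wts)
```

Because every missing case contributes several plausible donor values with fractional weights rather than a single random draw, the estimator avoids the extra imputation noise of a single hot-deck draw.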

**November 5**

**Estimating Multiple Treatment Effects Using a Generalized Method of Moments Based Estimator**

A Generalized Method of Moments (GMM) based estimator is developed to estimate marginal treatment effects by incorporating auxiliary covariates into the moment functions. The estimator applies the idea of GMM to survey data and handles data that are missing at random under a strong ignorability mechanism. Under regularity conditions, we establish the efficiency of this estimator relative to alternatives, prove its consistency and asymptotic normality, and provide a consistent variance estimator. Simulation studies are presented for comparison.
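The abstract does not give the moment functions, so as a rough illustration of how an auxiliary moment sharpens a GMM estimate, consider the simplest case (my own toy construction, not the speaker's estimator): estimating a mean when the population mean of an auxiliary covariate is known. With linear moments, optimal GMM reduces to the familiar regression-estimator form.

```python
import numpy as np

def gmm_mean(y, x, mu_x):
    """Toy optimal GMM for theta = E[y] using the auxiliary moment
    E[x] - mu_x = 0 (known population mean of x).  With linear moment
    functions the optimal GMM solution has the closed regression-
    estimator form theta = ybar - b * (xbar - mu_x)."""
    ybar, xbar = y.mean(), x.mean()
    # the optimal weighting of the two moments reduces here to the
    # OLS slope of y on x
    b = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    return ybar - b * (xbar - mu_x)
```

When x and y are correlated, the auxiliary moment pulls the estimate toward the value consistent with the known mean of x, reducing variance relative to the sample mean of y alone.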

**October 29**

**Methodological Overview and Assessment of the Physical Activity Measurement Survey (PAMS) Project**

Imagine this situation. You develop statistical methodology with reasonable statistical assumptions (e.g., normality). You then apply this methodology to real-world data but are unable to meet all the methodological assumptions (e.g., you cannot easily find a transformation to normality). Despite this, you decide to forge ahead, executing the steps of the methodology until results are produced. When you examine the results, however, you find them to be undesirable, suspect, or downright nonsensical. That is, the results your methodology produced don't make contextual sense, leaving you with the questions, "Do I believe these results? Are my departures from the methodological assumptions influencing these results? If so, in what way?" This is precisely the situation we on the Physical Activity Measurement Survey (PAMS) project have found ourselves in of late. In this seminar, I will discuss the PAMS methodology, the results produced when it is applied to the PAMS survey, why we have been cautious about accepting their validity, and what we have done to answer the question, "Do we believe these results?"

**October 22**

**Imputation Methods for Quantile Estimation with MAR Missing Values**

Inference in the presence of missing data is a widely encountered and difficult problem in statistics. Imputation is often used to facilitate parameter estimation by applying complete-sample estimators to the imputed data set. We are interested in quantile estimation with missing values when the missingness mechanism is missing at random (MAR) in the sense of Rubin (1976). We discuss several imputation methods applied to quantile estimation, including multiple imputation, parametric fractional imputation, fractional hot-deck imputation, and nonparametric imputation. The first three methods rely on a statistical model. We can also express the quantile function in a kernel regression framework and thereby avoid distributional assumptions, achieving robustness; the resulting estimator can be called a nonparametric imputation estimator. Variance estimation is also discussed for each imputation method. Results from a limited simulation study are presented to compare the different methods.
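As a concrete point of reference for the model-based methods above, here is a minimal multiple-imputation sketch for a quantile under MAR. The normal linear imputation model and all names are my own illustrative assumptions; the talk covers richer variants (parametric fractional, fractional hot-deck, and kernel-based nonparametric imputation).

```python
import numpy as np

def mi_quantile(y, x, delta, tau=0.5, m=20, seed=0):
    """Toy multiple-imputation estimate of the tau-quantile of y under
    MAR: fit a normal linear model y | x on respondents, create m
    imputed data sets, and average the complete-data sample quantiles."""
    rng = np.random.default_rng(seed)
    obs, mis = delta == 1, delta == 0
    # respondent-only linear fit of y on x
    b1, b0 = np.polyfit(x[obs], y[obs], 1)
    resid = y[obs] - (b0 + b1 * x[obs])
    sigma = resid.std(ddof=2)
    qs = []
    for _ in range(m):
        y_imp = y.copy()
        # draw imputed values from the fitted conditional distribution
        y_imp[mis] = b0 + b1 * x[mis] + rng.normal(0.0, sigma, mis.sum())
        qs.append(np.quantile(y_imp, tau))
    return float(np.mean(qs))
```

Averaging the quantile across imputed data sets is the simplest pooling rule; the fractional-imputation variants instead attach weights to several imputed values within a single data set.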

**October 15**

**Efficient Quantile Estimation Using Auxiliary Information**

We develop an estimator for a finite population quantile by incorporating auxiliary information through a quantile regression model. The weights in the estimator are calibrated through the pseudo empirical likelihood approach. Under some regularity conditions and an assumed superpopulation working model, the estimator is optimal. Variance estimation is also presented. Simulation studies were conducted to compare the efficiency of the new estimator with that of the direct estimator.
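The calibration step can be illustrated in its simplest form. The sketch below (a toy scalar case of my own construction, not the speaker's estimator) finds empirical-likelihood-type weights that sum to one and reproduce a known population mean of the auxiliary variable; these calibrated weights would then be used in the weighted quantile estimator.

```python
import numpy as np

def el_calibrate(x, mu_x, tol=1e-12):
    """Empirical-likelihood-type calibration for one scalar auxiliary
    variable: find weights w_i = 1 / (n * (1 + lam * (x_i - mu_x)))
    such that sum(w) = 1 and sum(w * x) = mu_x.  lam is found by
    Newton iteration on the calibration equation."""
    n = len(x)
    u = x - mu_x
    lam = 0.0
    for _ in range(100):
        denom = 1.0 + lam * u
        f = np.sum(u / denom)         # calibration equation, want f = 0
        if abs(f) < tol:
            break
        lam += f / np.sum(u**2 / denom**2)   # Newton step
    return 1.0 / (n * (1.0 + lam * u))
```

A standard property of this weight form is that once the calibration equation is solved, the weights automatically sum to one, so only the single equation in lam needs solving.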

**October 8**

**An Imputation Approach for Handling Mixed-Mode Surveys**

Mixed-mode surveys are frequently used to improve survey participation, but statistical tools for analyzing mixed-mode survey data are relatively underdeveloped. Motivated by a real survey in Korea, we consider an imputation approach to handling mixed-mode surveys. The proposed method uses measurement error models to explain the mode effects, and imputation is then used to predict the counterfactual potential outcome in the measurement error model. In particular, the parametric fractional imputation of Kim (2011) can be used in this setup. The proposed method is applied to the survey of private education expenses in Korea.
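A stripped-down version of the counterfactual-imputation idea might look as follows. All modeling choices here are my own illustrative assumptions (a linear measurement error model fit on a calibration sample observed under both modes), not the parametric fractional imputation actually used in the talk.

```python
import numpy as np

def impute_counterfactual(y_b, fit_pairs, seed=0):
    """Toy mode-effect imputation: using a calibration sample observed
    under both modes (fit_pairs = (y_a, y_b)), fit a linear measurement
    error model y_a = a + b * y_b + e, then impute the counterfactual
    mode-A outcome for cases observed only under mode B."""
    rng = np.random.default_rng(seed)
    ya, yb = fit_pairs
    b, a = np.polyfit(yb, ya, 1)
    resid = ya - (a + b * yb)
    sigma = resid.std(ddof=2)
    # predicted counterfactual plus a model-based residual draw
    return a + b * y_b + rng.normal(0.0, sigma, len(y_b))
```

Once counterfactual outcomes are imputed, all respondents can be analyzed as if measured under a single mode, which is what makes the approach attractive for mixed-mode data.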

**October 1**

**A Modified Random Forest Procedure with Survey Applications**

Plant breeders and survey statisticians share many similar goals and challenges in their work. For instance, both often want to make inference about quantities that are expensive, if not impossible, to measure, and both face challenges in doing so successfully. A typical solution is to measure a surrogate feature and use a model to relate that surrogate to the quantity of interest. This past summer, I worked at a chemical company turned seed company that would like to predict crop traits without having to grow the plant. To do this, they use methods ranging from the classic BLUP to more modern machine learning techniques. I have since applied some of these methods to my research on the Physical Activity Measurement Survey (PAMS), yielding mixed results. In this talk, I will detail some of these methods, how they might be applied to surveys, and my results.

**September 17**

**Benchmarked Small Area Prediction**

Small area estimation often involves constructing predictions with an estimated model, followed by a benchmarking step in which the predictions are modified so that weighted sums satisfy constraints. The most common constraint requires that a weighted sum of the predictions equal the same weighted sum of the original observations. Augmented models as a method of imposing the constraint are investigated for both linear and nonlinear models, and variance estimators for the benchmarked predictors are presented.
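The benchmarking constraint itself is easy to state in code. The sketch below shows the simplest ratio-adjustment form (my own illustration; the talk studies the augmented-model approach rather than this raking step):

```python
import numpy as np

def ratio_benchmark(pred, y, w):
    """Ratio-adjust small area predictions so that the weighted sum of
    benchmarked predictions equals the weighted sum of the direct
    observations -- the most common benchmarking constraint."""
    return pred * (np.sum(w * y) / np.sum(w * pred))
```

Ratio adjustment spreads the discrepancy proportionally across areas; augmented-model benchmarking instead builds the constraint into the model fit, which changes how the adjustment (and its variance) is distributed.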

**September 10**

**A Hierarchical Multivariate Stratification Algorithm Applied to June Area Survey**

The last time I spoke at the Survey Working Group, I introduced a stratification algorithm, which I had applied and compared to other algorithms using simulated data. This time, I have applied the algorithm to June Area Survey (JAS) data and compared it to the current stratification. The algorithm is multivariate in nature (multiple crop types are used), whereas the current algorithm takes into account only one variable (percent agriculture). In states with somewhat diverse agriculture, I show that the new algorithm improves on the current one in terms of the sample size necessary to reach target CVs.

**September 3**

Labor day! Have some fun in the sun!