2409 Snedecor Hall
Ames, IA 50011-1210
Phone: 515-294-5582
Fax: 515-294-4040


A Centered Bivariate Spatial Regression Model for Binary Data with an Application to Presettlement Vegetation Data in the Midwestern United States
Author(s): Caragea, P.C., Berg, E.
Journal of Agricultural, Biological and Environmental Statistics, 19(4): 453-471, 2014

Abstract:
Spatially structured discrete data arise in diverse areas of application, such as forestry, epidemiology, or soil sciences. Data from several binary variables are often collected at each location. Variation in distributional properties across the spatial domain is of interest. The specific application that motivates our work involves characterizing historical distributions of two species of Oak in the Driftless Area in the Midwestern United States. Scientists are interested in understanding the patterns of interaction between species, as well as their relationships to spatial covariates. Accounting for spatial dependence is not only of inherent interest but also reduces prediction mean squared error, and is necessary for obtaining appropriate measures of uncertainty (i.e., standard errors and confidence intervals). To address the needs of the application, we introduce a centered bivariate autologistic model, which accounts for the statistical dependence in two response variables simultaneously, for the association between them and for the effect of spatial covariates. The model proposed here offers a relatively stable large-scale model structure, with model parameters which can be interpreted in the usual sense across levels of dependence. Since the model allows for separate dependence parameters for each variable, it offers, in essence, the equivalent of a model with a non-separable covariance function. The flexible model framework permits straightforward generalizations to structures with more than two variables, a temporal component, or an irregular lattice domain.

Keywords:
Presettlement vegetation data (PLSS); Bivariate binary data; Centered autologistic models; Spatial dependence; Conditional autoregressive (CAR) model; Markov random field (MRF); Spatial regression for binary data; Discrete index random field.

Link to article in the
Journal of Agricultural, Biological and Environmental Statistics
Spatial Distribution of Aphis glycines (Hemiptera: Aphididae): A Summary of the Suction Trap Network
Author(s): Schmidt, N., O'Neal, M., Anderson, P., Lagos, D., Voegtlin, D., Bailey, W., Caragea, P.C., Cullen, E., DiFonzo, C., Elliott, K., Gratton, C., Johnson, D., Krupke, C., McCornack, B., O'Neil, R., Ragsdale, D.W., Tilmon, K., Whitworth, J.
Journal of Economic Entomology, 105(1): 259-271, 2012.

Abstract:
The soybean aphid, Aphis glycines Matsumura (Hemiptera: Aphididae), is an economically important pest of soybean, Glycine max (L.) Merrill, in the United States. Phenological information of A. glycines is limited; specifically, little is known about factors guiding migrating aphids and potential impacts of long distance flights on local population dynamics. Increasing our understanding of A. glycines population dynamics may improve predictions of A. glycines outbreaks and improve management efforts. In 2005 a suction trap network (STN) was established in seven Midwest states to monitor the occurrence of alates. By 2006, this network expanded to 10 states and consisted of 42 traps. The goal of the STN was to monitor movement of A. glycines from their overwintering host Rhamnus spp. to soybean in spring, movement among soybean fields during summer, and emigration from soybean to Rhamnus in fall. The objective of this study was to infer movement patterns of A. glycines on a regional scale based on trap captures, and determine the suitability of certain statistical methods for future analyses. Overall, alates were not commonly collected in suction traps until June. The most alates were collected during a 3-wk period in the summer (late July to mid-August), followed by the fall, with a peak capture period during the last 2 wk of September. Alate captures were positively correlated with latitude, a pattern consistent with the distribution of Rhamnus in the United States, suggesting that more southern regions are infested by immigrants from the north.

Keywords:
Forecasting, Migration, Dispersal

Link to article in the
Journal of Economic Entomology
Cropping pattern choice with proximity to ethanol production and animal feeding operations
Author(s): Khanal, S., Anex, R., Gelder, B., Dixon, P., Caragea, P.
Biofuels, Bioproducts and Biorefining Journal, 6(4): 431-443, 2012.

Abstract:
Growing demand for corn due to the expansion of corn ethanol production has increased concerns that corn demand will be met by growing corn more intensively and shifting cropland from cropping systems with lower environmental impact into continuous corn (CC). Cropping system choice may also be influenced by the advantage of the high nutrient uptake of CC, which allows higher manure application rates and lowers manure management costs. A binary logistic regression model is used to estimate the probability of crop rotation choice as a function of proximity to concentrated animal feeding operations (CAFOs) and ethanol plants during 2004/2005 and 2006/2007 in the Des Moines Lobe region of Iowa. The probability of CC is found to be elevated in the vicinity of large hog operations as well as in a larger area around ethanol plants. The probability of CC around hog-feeding operations decreases rapidly with distance away from the facility, but the large number of hog facilities in the region results in a large cumulative influence. Understanding the multiple motivations for cropping system choices is vital to forming effective agricultural policy.

Keywords:
Crop rotation; Concentrated animal feeding operations; Ethanol plants

Link to article in the
Biofuels, Bioproducts and Biorefining Journal
Centered Parameterizations and Dependence Limitations in Markov Random Field Models
Author(s): Kaiser, M.S., Caragea, P.C., Furukawa, K.
Journal of Statistical Planning and Inference, 142(7): 1855-1863, 2012

Abstract:
Markov random field models incorporate terms representing local statistical dependence among variables in a discrete-index random field. Traditional parameterizations for models based on one-parameter exponential family conditional distributions contain components that would appear to reflect large-scale and small-scale model behaviors, and it is natural to attempt to match these structures with large-scale and small-scale patterns in a set of data. Traditional manners of parameterizing Markov random field models do not allow such correspondence, however. We propose an alternative centered parameterization that, while not leading to different models, allows a correspondence between model structures and data structures to be successfully accomplished. The ability to make these connections is important when incorporating covariate information into a model or if a sequence of models is fit over time to investigate and interpret possible changes in data structure. We demonstrate the improved interpretation that results from use of centered parameterizations. Centered parameterizations also lend themselves to computation of an interpretable decomposition of mean squared error, and this is demonstrated both analytically and through a simulated example. A breakdown in model behavior occurs even with centered parameterizations if dependence parameters in Markov random field models are allowed to become too large. This phenomenon is discussed and illustrated using an auto-logistic model.

Keywords:
Auto-models; Conditionally specified models; Lattice data; Spatial structure; Spatial dependence

Link to article in the
Journal of Statistical Planning and Inference
Markov random field models for binary data on a lattice
Author(s): Hughes, J.P., Haran, M., Caragea, P.C.
Environmetrics, 7: 857-871, 2011

Abstract:
The autologistic model is a Markov random field model for spatial binary data. Because it can account for both statistical dependence among the data and for the effects of potential covariates, the autologistic model is particularly suitable for problems in many fields, including ecology, where binary responses, indicating the presence or absence of a certain plant or animal species, are observed over a two-dimensional lattice. We consider inference and computation for two models: the original autologistic model due to Besag, and the centered autologistic model proposed recently by Caragea and Kaiser. Parameter estimation and inference for these models is a notoriously difficult problem due to the complex form of the likelihood function. We study pseudolikelihood (PL), maximum likelihood (ML), and Bayesian approaches to inference and describe ways to optimize the efficiency of these algorithms and the perfect sampling algorithms upon which they depend, taking advantage of parallel computing when possible. We conduct a simulation study to investigate the effects of spatial dependence and lattice size on parameter inference, and find that inference for regression parameters in the centered model is reliable only for reasonably large lattices (n > 900) and no more than moderate spatial dependence. When the lattice is large enough, and the dependence small enough, to permit reliable inference, the three approaches perform comparably, and so we recommend the PL approach for its easier implementation and much faster execution.
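As a rough illustration of why the PL approach is comparatively easy to implement, the sketch below evaluates a log-pseudolikelihood for a centered autologistic model on a regular lattice. It is a minimal sketch under stated assumptions, not the authors' code: the four-nearest-neighbor structure, the zero contribution from beyond the lattice edge, the conditional log-odds form x_i'beta + eta * sum_j (y_j - kappa_j), and the names neighbor_sum, log_pseudolikelihood, beta, and eta are all chosen here for illustration.

```python
import numpy as np

def neighbor_sum(field):
    """Sum over the four nearest neighbors of each site (zero beyond the edge)."""
    padded = np.pad(field, 1, mode="constant", constant_values=0.0)
    return (padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:])

def log_pseudolikelihood(beta, eta, y, X):
    """Log-pseudolikelihood of an (assumed) centered autologistic model.

    y : (rows, cols) array of 0/1 responses
    X : (rows, cols, p) array of covariates
    beta : (p,) regression coefficients; eta : scalar dependence parameter
    """
    linpred = X @ beta                        # large-scale structure, shape (rows, cols)
    kappa = 1.0 / (1.0 + np.exp(-linpred))    # independence expectation at each site
    # Conditional log-odds: regression term plus centered neighbor term.
    a = linpred + eta * neighbor_sum(y - kappa)
    # Sum of Bernoulli log conditional probabilities, log(1 + e^a) computed stably.
    return np.sum(y * a - np.logaddexp(0.0, a))
```

Under these assumptions, passing the negative of this function to a general-purpose optimizer would yield PL estimates of beta and eta; no normalizing constant or sampling is required, which is what makes the approach fast.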

Keywords:
Bayesian; Markov random field; Maximum likelihood; Parallel computing; Perfect sampling; Pseudolikelihood; Spatial

Link to article in the
Environmetrics
Categorical Analysis of Spatial Variability in Economic Yield Response of Corn to Nitrogen Fertilization
Author(s): Kyveryga, P., Blackmer, T.M., Caragea, P.C.
Agronomy Journal, 103(3), 2011

Abstract:
Despite growing interest in variable-rate nitrogen (VRN) fertilizer applications, we still lack basic knowledge and practical methodology for identifying major factors that can be used to guide VRN application to corn (Zea mays L.). The objective was to develop a methodology for identifying a predictable relationship between economic yield response (YR) to N and commonly measured soil and terrain attributes in the presence of spatial dependence. Six 30-ha no-till fields in central Iowa were studied during 6 yr. Urea-ammonium nitrate solution was sidedressed at 112 and 140 kg N ha⁻¹ in alternating strips replicated from 10 to 22 times. Yield responses to the high rate were calculated in a 20 by 25 m grid pattern and classified into profitable and nonprofitable categories within each field. Autologistic models were used to identify which (if any) of the following factors economically affected YR: elevation, apparent soil electrical conductivity (ECa), slope, topographic wetness index (TWI), or digital soil map units. Significant effects of some of the factors were found within 8 of 15 site-years. Within five of these site-years, well-drained areas with lower ECa and TWI, and higher elevation and slope had the higher probability of profitable YR, but these effects were not stable over time. Within the proposed methodology, a high spatial resolution of YR is used that increases the ability to identify areas where N application is profitable, and farmers can explore VRN possibilities by applying a small fertilizer increment below or above a uniform optimal rate in many alternating strips across fields.

Link to article in the
Agronomy Journal
Autologistic models with interpretable parameters
Author(s): Caragea, P.C., Kaiser, M.S.
Journal of Agricultural, Biological, and Environmental Statistics, 14(3): 281-300, 2009

Abstract:
Ecologists are interested in characterizing succession processes, in particular monitoring the spread of invasive species and their effect on resident species. In situations for which binary response variables representing presence or absence of plants are observed over a spatial lattice, it may be desirable to use a model that accounts for the statistical dependence in the data, as well as the effect of potential covariates. One such model is the autologistic regression model. We show that the typical parametrization of the autologistic model presents difficulties in interpreting model parameters across varying levels of statistical dependence, and propose an alternative (centered) parametrization that overcomes this difficulty. We use the centered autologistic model to study the dynamics over time of two species, Rumex acetosella and Lonicera japonica, in an abandoned agricultural field in New Jersey, and compare the results to those obtained from using the traditional autologistic parametrization.
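The contrast between the two parametrizations can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the paper's code: it takes the conditional log-odds at site i to be logit(kappa_i) + eta * sum_j y_j in the traditional form and logit(kappa_i) + eta * sum_j (y_j - kappa_j) in the centered form, on a regular lattice with four nearest neighbors. The names conditional_prob, kappa, and eta are hypothetical.

```python
import numpy as np

def neighbor_sum(field):
    """Sum over the four nearest neighbors of each site (zero beyond the edge)."""
    padded = np.pad(field, 1, mode="constant", constant_values=0.0)
    return (padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:])

def conditional_prob(y, kappa, eta, centered=True):
    """P(y_i = 1 | neighbors) for every site of a lattice.

    y     : (rows, cols) array of 0/1 responses
    kappa : (rows, cols) array of large-scale probabilities, each in (0, 1)
    eta   : scalar dependence parameter
    """
    logit_kappa = np.log(kappa / (1.0 - kappa))
    if centered:
        # Neighbors contribute only through departures from their expectation.
        dependence = eta * neighbor_sum(y - kappa)
    else:
        # Traditional form: neighbors contribute their raw 0/1 values.
        dependence = eta * neighbor_sum(y)
    return 1.0 / (1.0 + np.exp(-(logit_kappa + dependence)))
```

In this sketch, when neighboring values sit near their expectations the centered dependence term is close to zero and the conditional probability stays near kappa. With eta > 0 the traditional term can only push probabilities upward, which is one way to see why kappa loses its usual interpretation as dependence grows.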

Keywords:
binary response; conditionally specified models; large-scale, small-scale model structure; Old field succession; parameter interpretation; spatial models.

Link to article in the
Journal of Agricultural, Biological, and Environmental Statistics
Exploring dependence with data on spatial lattices
Author(s): Kaiser, M.S., Caragea, P.C.
Biometrics, 65(3): 857-865, 2009.

Abstract:
The application of Markov random field models to problems involving spatial data on lattice systems requires decisions regarding a number of important aspects of model structure. Existing exploratory techniques appropriate for spatial data do not provide direct guidance to an investigator about these decisions. We introduce an exploratory quantity that is directly tied to the structure of Markov random field models based on one-parameter exponential family conditional distributions. This exploratory diagnostic is shown to be a meaningful statistic that can inform decisions involved in modeling spatial structure with statistical dependence terms. In this article, we develop the diagnostic, illustrate its use in guiding modeling decisions with simulated examples, and re-examine a previously published application.

Keywords:
auto-models; conditionally specified models; exploratory analysis; spatial structure; statistical dependence.

Link to article in the
Biometrics
Point and interval estimation of variogram models using spatial empirical likelihood
Author(s): Nordman, D. J., Caragea, P.C.
Journal of the American Statistical Association, 103(481): 350-361, 2008

Abstract:
We present a spatial blockwise empirical likelihood method for estimating variogram model parameters in the analysis of spatial data on a grid. The method produces point estimators that require no spatial variance estimates to compute, unlike least squares methods for variogram fitting, but are as efficient as the best least squares estimator in large samples. Our approach also produces confidence regions for the variogram, without requiring knowledge of the full joint distribution of the spatial data. Additionally, the empirical likelihood formulation extends to spatial regression problems and allows simultaneous inference on both spatial trend and variogram parameters. The asymptotic behavior of the estimator is examined analytically, while its behavior in finite samples is investigated through simulation studies.

Keywords:
increasing domain asymptotics, least squares estimator, spatial regression, spatial subsampling

Link to article in the
Journal of the American Statistical Association
Seed quality assurance in maize breeding programs: tests to explain variations in corn inbreds and populations
Author(s): Goggi, A.S., Caragea, P.C., Pollak, L.M., McAndrews, G., DeVries, M., and Montgomery, K.
Agronomy Journal, 100(2): 337-343, 2008

Abstract:
Maize (Zea mays L.) breeders are interested in evaluating the seed quality of their inbred lines, as seed quality has a strong relationship to field emergence. There is little information, however, on the influence of the seed quality of the inbred on field emergence of the hybrid. The objectives of this research were to (i) determine whether seed quality tests and a seed quality index of the inbred parents and F2 seed are correlated with field emergence of F1 hybrids, and (ii) determine how many tests are necessary to calculate this index. Standard germination (SG), saturated cold (SC), and soak (Soak) tests, and the inbred quality index (IQI) were calculated on inbred parents and their corresponding F2 progeny, and field emergence was measured on associated F1 hybrids produced in Clinton, IL in 2002 and 2003. The tests and index of the parental inbreds and F2 progeny correlated poorly with early field emergence of the F1 hybrids. All tests were required to calculate the seed quality index. By averaging several seed quality tests into a single index, the poor seed quality performance of inbreds and F2 populations in some tests can be masked by other tests. The seed quality index might be useful when ranking inbreds based on seed quality but not as a selection tool.

Keywords:
inbred quality index, seed quality, maize, inbreds

Link to article in the
Agronomy Journal
Asymptotic properties of computationally efficient alternative estimators for a class of multivariate normal models
Author(s): Caragea, P.C., Smith, R.L.
Journal of Multivariate Analysis, 98(7): 1417-1440, 2007

Abstract:
Parameters of Gaussian multivariate models are often estimated using the maximum likelihood approach. In spite of its merits, this methodology is not practical when the sample size is very large, as, for example, in the case of massive georeferenced data sets. In this paper, we study the asymptotic properties of the estimators that minimize three alternatives to the likelihood function, designed to increase the computational efficiency. This is achieved by applying the information sandwich technique to expansions of the pseudo-likelihood functions as quadratic forms of independent normal random variables. Theoretical calculations are given for a first-order autoregressive time series and then extended to a two-dimensional autoregressive process on a lattice. We compare the efficiency of the three estimators to that of the maximum likelihood estimator as well as among themselves, using numerical calculations of the theoretical results and simulations.

Keywords:
approximate likelihood; massive data sets; computational efficiency; statistical efficiency analysis; spatial statistics; autoregressive processes on a lattice

Link to article in the
Journal of Multivariate Analysis
Gene flow in maize fields with different local pollen densities
Author(s): Goggi, A.S., Lopez-Sanchez, H., Caragea, P.C., Clark, C., Westgate, M., Arritt, R.
International Journal of Biometeorology, 51(6): 493-503, 2007

Abstract:
The development of maize (Zea mays L.) varieties as factories of pharmaceutical and industrial compounds has renewed interest in controlling pollen dispersal. The objective of this study was to compare gene flow into maize fields of different local pollen densities under the same environmental conditions. Two fields of approximately 36 hectares were planted with a nontransgenic, white hybrid, in Ankeny, IA. In the center of both fields, a 1-ha plot of a yellow-seeded stacked RR/Bt transgenic hybrid was planted as a pollen source. Before flowering, the white receiver maize of one field was detasseled in a 4:1 ratio to reduce the local pollen density (RPD). The percentage of outcross in the field with RPD was 42.2%, 6.3%, and 1.3% at 1, 10, and 35 m from the central plot, respectively. The percentage of outcross in the white maize with normal pollen density (NPD) was 30.1%, 2.7%, and 0.4%, respectively, at these distances. At distances greater than 100 m, the outcross frequency decreased below 0.1% and 0.03% in the fields with RPD and NPD, respectively. A statistical model was used to compare pollen dispersal based on observed outcross percentages. The likelihood ratio test confirmed that the models of outcrossing in the two fields were significantly different (P is practically 0). Results indicated that when local pollen is low, the incoming pollen has a competitive advantage and the level of outcross is significantly greater than when the local pollen is abundant.

Keywords:
Maize, Pollen, Dispersion, Statistical Models

Link to article in the
International Journal of Biometeorology
Methodology to link production and environmental risks of precision nitrogen management strategies in corn
Author(s): Thorp, K.R., Batchelor, W.D., Paz, J.O., Steward, B.L., Caragea, P.C.
Agricultural Systems, 89(2-3): 272-298, 2006

Abstract:
A new decision support system called Apollo, which runs the CERES-Maize crop growth model, was used to study the corn (Zea mays L.) yield response and the nitrogen (N) dynamics of a cornfield in central Iowa, USA. The model was calibrated to minimize error between simulated and measured yield over five growing seasons. Model simulations were then completed for 13 spring-applied N rates in each of 100 grid cells with varying soil properties. For each N rate and grid cell, simulations were repeated for 37 years of historical weather information collected near the study site. Model runs provided the crop yield and unused N in the soil at harvest for all combinations of N rate, grid cell, and weather year. Using these simulated datasets, a methodology involving cumulative probability distributions was developed such that the yield and unused N resulting from each N rate applied in each grid cell could be directly linked according to their probability of occurrence over the 37 simulated growing seasons. These cumulative probability distributions were used to evaluate the economic and environmental risks of two alternate precision N management strategies for the study area. In the first strategy, N rates were selected to maximize the producer's marginal net return in each grid cell. The environmental cost of this management strategy, in terms of N left behind, was determined to be 56.2 kg per ha on average over all grid cells. In the second strategy, N rates were selected to ensure that the amount of N left in the soil at harvest would not exceed 40 kg per ha in 80% of growing seasons. The producer's opportunity cost for reducing N rates to achieve this environmental objective was calculated to be $48.12 per ha on average over all grid cells. The overall goal of this work was to develop a methodology for directly contrasting the production and environmental concerns of N management in agricultural systems. In this way, N management plans can be designed to achieve a proper balance between production and environmental goals.
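The second strategy can be read as a reliability constraint on simulated residual N. The sketch below shows one way such a rule might be coded for a single grid cell. It is a hypothetical illustration, not the Apollo implementation: the function name, the array layout, and the choice to return the largest rate that satisfies the constraint are assumptions, and only the 40 kg per ha limit and the 80% level come from the abstract.

```python
import numpy as np

def max_rate_meeting_limit(rates, unused_n, limit=40.0, reliability=0.80):
    """Pick the largest N rate whose residual N stays within a limit often enough.

    rates    : (k,) array of candidate N rates
    unused_n : (k, years) array of simulated N left in the soil at harvest,
               one row per rate and one column per weather year
    Returns the largest qualifying rate, or None if no rate qualifies.
    """
    rates = np.asarray(rates, dtype=float)
    unused_n = np.asarray(unused_n, dtype=float)
    # Empirical probability, over weather years, that residual N is within the limit.
    fraction_within = np.mean(unused_n <= limit, axis=1)
    feasible = np.flatnonzero(fraction_within >= reliability)
    if feasible.size == 0:
        return None
    return float(rates[feasible].max())
```

Applying a rule of this kind cell by cell, and comparing the net return at the selected rate with the return at the profit-maximizing rate, would give an opportunity cost of the sort reported above.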

Keywords:
corn; yield; nitrogen; nitrate leaching; crop modeling; variable-rate; prescriptions

Link to article in
Agricultural Systems
Statistical analysis of outcrossing between adjacent maize grain production fields
Author(s): Goggi, A. S., Caragea, P.C., Lopez-Sanchez, H., Westgate, M., Arritt, R., and Clark, C.
Field Crops Research, 99: 147-157, 2006

Abstract:
The presence of transgenes in conventional maize, Zea mays L., crops is a serious concern when the genetic purity affects the value of the harvested product (i.e., specialty markets, organic products, crops with value-added traits, and hybrid seed production). Gene flow from a central transgenic source plot into a conventional grain production field was quantified using a combination of three marker genes: the y1 seed color gene, the Bt-Cry1Ab gene derived from the soil bacterium Bacillus thuringiensis (Bt) and the Roundup Ready® (RR) gene. Two fields of approximately 36 ha were planted with a nontransgenic, white-seeded corn hybrid in Ankeny, IA, in 2003 and 2004. In the center of each field, a 1-ha plot of yellow-seeded, Bt/RR hybrid corn was planted as an adventitious pollen source. Detailed measurements of flowering dynamics confirmed the white- and yellow-seeded hybrids flowered synchronously both years. Grain samples were collected at 1, 10, 35, 100, 150, 200, and 250 m from the transgenic pollen source along eight transects (north [N], northeast [NE], east [E], southeast [SE], south [S], southwest [SW], west [W], and northwest [NW]) and were analyzed for number of Bt-/RR-/y1- kernels. The statistical model describes the proportion of outcrossed kernels as decreasing exponentially with distance from the yellow pollen source and linearly with the wind speed and direction during silking of the white hybrid. On average, outcrossing at 35 m was 0.4% in both years. At 100 m and beyond, the average level of outcrossing decreased to 0.05% or less. A few Bt-/RR-/y1- kernels, however, were detected in the white corn field even at 250 m from the source plot. A single empirical model captured the field-scale patterns of outcrossing from the source plot for both years. These results indicate gene flow from a transgenic pollen source follows a fairly predictable pattern. The results also suggest that the extent of outcrossing can be reduced by surrounding the transgenic pollen source with nontransgenic corn producing a high density of local pollen.

Keywords:
corn, pollen flow, statistical modeling

Link to article in the
Field Crops Research
Trends in rural sulfur concentration
Author(s): Holland, D.M., Caragea, P.C., Smith, R.L.
Atmospheric Environment, 38: 1673-1684, 2004

Abstract:
This paper presents an analysis of trends in atmospheric concentrations of sulfur dioxide (SO2) and particulate sulfate at rural monitoring sites in the Clean Air Act Status and Trends Monitoring Network (CASTNet) from 1990 to 1999. A two-stage approach is used to estimate regional trends and standard errors in the Midwest and Mid-Atlantic regions of the US. In the first stage, a linear regression model is used to estimate site-specific trends in data adjusted for the effects of season and meteorology. In the second stage, kriging methodology based on maximum likelihood estimation is used to estimate regional trends and standard errors. The method is extended to include a Bayesian analysis to account for the uncertainty in estimating the spatial covariance parameters. For both pollutants, significant improvement in air quality was detected that appears similar to the large drop in SO2 power plant emissions. Spatial patterns of trends in SO2 and particulate sulfate concentrations vary by location over the eastern US. For SO2, trends at monitoring sites in the Midwest and Mid-Atlantic were in the -30% to -42% range with smaller changes in the South. Across most of the US, trends in particulate sulfate were smaller than for SO2. Both spatial prediction techniques produced similar results in terms of regional trends and standard errors.

Keywords:
CASTNet; Clean Air Act Amendments; kriging; Bayesian methods

Link to article in the
Atmospheric Environment