Area Sample: In an area sample, we divide our population into land areas, and the units we select for our sample are those land areas.
Bias: Bias is a statistical measure of how accurate an estimator is. Bias measures how different our estimates will be on average from the true population characteristic being estimated. Bias is a theoretical property, and we often choose estimators that have no bias. We can think of bias in the following way: If we were to conduct our survey over and over and our estimates end up being centered around the true population value, then our estimator is unbiased.
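The "conduct our survey over and over" idea can be sketched in a few lines. This is a minimal illustration, assuming a made-up five-unit population and simple random samples of size 2; instead of repeating the survey at random, it enumerates every possible sample, which gives the same long-run average exactly. The population values and variable names are hypothetical.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]   # hypothetical values for a 5-unit population
n = 2                           # sample size

true_mean = mean(population)

# Under simple random sampling every possible sample of size n is equally
# likely, so averaging the estimate over all of them gives its expected value.
estimates = [mean(sample) for sample in combinations(population, n)]
expected_estimate = mean(estimates)

# Bias is the expected estimate minus the true population value.
bias = expected_estimate - true_mean
print(true_mean, expected_estimate, bias)
```

Individual estimates differ from the true mean, but they are centered on it, so the sample mean is unbiased in this setting.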
Census: In a census, the sample size is equal to the population size. There is no sampling error involved in a census, but there will be measurement error.
Characteristic: A population characteristic is something we want to estimate about the population. Common examples of population characteristics are totals and averages for variables. Statisticians often call these characteristics "parameters".
County Base Data: County base data includes information on surface areas, large water areas, and federal land for each polygon, or HUCCO.
Cross-Sectional Design: In a longitudinal cross-sectional design, a new sample is drawn each time a survey of the population is conducted.
Data Element: A data element is a measurement or observation on a unit that will be stored in a database. In statistics, we call this a variable.
Domain Estimation: Domain estimation occurs when we estimate a population characteristic for a subpopulation that does not coincide with any stratum or collection of strata. Domain estimation is a special case of ratio estimation.
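As a rough sketch of why domain estimation is a form of ratio estimation: a domain mean can be estimated as one estimated total divided by another, both restricted to the sampled units that fall in the domain. The records, weights, and field layout below are hypothetical and are not taken from any NRI dataset.

```python
# Hypothetical sample records: (weight, y value, in-domain indicator)
records = [
    (10, 5.0, True),
    (10, 7.0, False),
    (20, 3.0, True),
    (20, 4.0, False),
]

# Estimated total of y within the domain (numerator of the ratio).
domain_total = sum(w * y for w, y, in_domain in records if in_domain)

# Estimated number of population units in the domain (denominator).
domain_size = sum(w for w, y, in_domain in records if in_domain)

# The domain mean is a ratio of two estimates, so both parts vary
# from sample to sample.
domain_mean = domain_total / domain_size
print(domain_total, domain_size, domain_mean)
```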
Estimate: An estimate is a value we calculate from the sample data that we use to state what we know from the sample about a population characteristic.
Frame Error: Frame error occurs when the sampling frame does not match the target population. If some target population units are not included in the frame, these units will not have a chance of being included in the sample, which can create bias. If the frame contains units that do not belong to the target population, units that should not have a chance of being selected will have a chance to be in the sample. In many settings, we are able to screen ineligible (non-target population) units out of the sample before collecting data on them.
HUCCO: A HUCCO is a polygon or land area defined by the intersection of a 4-digit hydrologic unit (HUC) and a county unit (CO). A HUCCO is a unit of area used in the NRI county base data collection.
Imputation: Imputation is a method of determining values to replace missing data in the dataset. Imputation methods can draw from the nonmissing sample data as well as from information outside the sample. In NRI, imputation is also used for creating pseudo points to represent changes observed in PSU and county base data.
Independent: In sampling, independence implies that the method used to select units in one sample has no impact on how units were selected in other samples. If we have independent samples, we can analyze them separately.
Longitudinal Study: A longitudinal study is a study that involves observations on a population over time. Longitudinal studies are designed to measure changes in the population.
Measurement Error: Measurement error occurs when we do not measure the true value for a unit, but instead measure the true value plus an error. Measurement error is unavoidable in surveys, but it can be reduced.
Metadata: Metadata are additional information about a dataset, including information about the data elements, sample design, and how weights were calculated.
Missing Data: Missing data occur when some or all of the values for a sampled unit are absent in the dataset.
Nonresponse: Nonresponse occurs when some or all of the values for a sampled unit are not observed. Nonresponse leads to missing data.
Nonsampling Error: Nonsampling error is error that occurs from sources other than sampling only a fraction of the population. Nonsampling error includes frame error, nonresponse, and measurement error.
Objective: A survey objective is a clear statement of a goal of the survey. There are larger goals and specific goals that may be expressed as objectives. A specific goal usually includes information on the target population, the specific questions of the study, data elements and population characteristics we are interested in, the estimates required to achieve the goal, and the precision desired for our estimates.
Panel Design: In a longitudinal panel design, the same set of units are observed repeatedly over time.
Population: A population is a collection of units that we want to make statements about by estimating some population characteristic.
Precision: Precision describes how close to one another our estimates would be if we were to conduct our survey over and over again, each time selecting a new sample using the same design. Precise estimators produce similar estimates across samples. Statisticians typically use the variance to measure the precision of an estimate.
Primary Sampling Unit (PSU): A Primary Sampling Unit is a collection of secondary sampling units that we select in the first stage of a two-stage sampling design. In two-stage sampling, we first select a PSU, then select secondary sampling units within each selected PSU.
Pseudo Point: A pseudo point is point data created by imputation to represent patterns in PSUs or HUCCOs that do not contain a sample point with this known pattern.
Quality Assurance: Quality assurance procedures are used to check whether data are being collected correctly. We also use quality evaluation studies to quantify the properties of measurement error, improve future data collection, and ensure that measurement tools meet certain specifications. Quality assurance programs are used to reduce the effects of measurement error on survey estimates.
Random Sampling: Random sampling occurs when each unit in the sample frame has a known positive probability of being included in the sample. Then units are selected randomly using a computer or random number table and the assigned probabilities. This type of sample is also called probability sampling.
Reliability: Reliability has many meanings in statistics. Here we say that data are reliable if repeated measurements on the units generate similar values.
Sample: A sample is a selected fraction of the units in the population. For each unit in the sample, something is measured related to that unit, and this measurement provides information solely about that unit.
Sample Design: The sample design describes how we will select the sample. It includes specifying the sampling units and assigning probabilities for units to be included in our sample. Examples of sample designs are stratified designs and two-stage designs.
Sample Size: The sample size is the number of units included in a sample. The larger the sample size, the more information we have about the population.
Sampling Error: Sampling error is error in our estimates that occurs because we do not observe every unit in the population.
Sampling Frame: The sampling frame is the list, real or theoretical, from which we select units for our sample. It is desirable for the sampling frame to match the target population and avoid frame errors associated with omitting target population units or including units outside the target population.
Secondary Sampling Unit (SSU): A secondary sampling unit is a unit or collection of units. In two-stage sampling, we first select PSUs, then we select SSUs within the selected PSUs. In two-stage sampling, the SSUs typically are the units we will measure.
Simple Random Sampling: Simple random sampling (SRS) is when we assign each unit in the population an equal chance of being included in the sample. This design does not incorporate any structure to improve precision or reduce costs.
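A simple random sample can be drawn with Python's standard library, as in this minimal sketch. The frame of 100 unit IDs, the sample size, and the fixed seed are assumptions made only for illustration.

```python
import random

rng = random.Random(0)           # fixed seed so the draw is repeatable
frame = list(range(1, 101))      # hypothetical frame: unit IDs 1..100
n = 10                           # sample size

# random.sample draws without replacement, giving every unit in the
# frame the same chance (n / len(frame)) of being included.
srs = rng.sample(frame, n)
print(sorted(srs))
```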
Stratified Random Sampling: Stratified random sampling (STRS) occurs when we divide our population into separate groups of units called strata. We then sample independently from each stratum.
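One way to picture stratified sampling is as a separate simple random draw inside each stratum. The stratum names, frame sizes, and per-stratum sample sizes in this sketch are hypothetical.

```python
import random

rng = random.Random(0)   # fixed seed so the draw is repeatable

# Hypothetical frame, already divided into non-overlapping strata.
strata = {
    "cropland": [f"C{i}" for i in range(1, 51)],
    "forest": [f"F{i}" for i in range(1, 31)],
    "urban": [f"U{i}" for i in range(1, 21)],
}
n_per_stratum = {"cropland": 5, "forest": 3, "urban": 2}

# Each stratum is sampled on its own; the draw in one stratum does not
# depend on which units were drawn in any other stratum.
stratified = {
    name: rng.sample(units, n_per_stratum[name])
    for name, units in strata.items()
}
print(stratified)
```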
Subpopulation: A subpopulation is a group that we are interested in, defined by a subset of units in the population.
Supplemented Panel Design: In a longitudinal supplemented panel design we incorporate a panel design and a rotation component. One set of units is observed every time period, while other sets of units are rotated into the sample over time. The rotation component "supplements" the panel.
Systematic Random Sampling: Systematic random sampling (SYS) occurs when we order the sampling frame, select a random unit to start the sampling process, and then select every k-th (say, 100th) unit after the starting unit from the sampling frame.
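The procedure can be sketched as follows. This version assumes the random start is chosen among the first k units of the ordered frame, which is one common convention; the function name and the frame of 1,000 unit IDs are hypothetical.

```python
import random

def systematic_sample(frame, k, rng):
    """Return every k-th unit of an ordered frame after a random start."""
    start = rng.randrange(k)     # random position among the first k units
    return frame[start::k]       # the start unit, then every k-th unit after it

rng = random.Random(0)               # fixed seed so the draw is repeatable
frame = list(range(1, 1001))         # hypothetical ordered frame: IDs 1..1000
sample = systematic_sample(frame, 100, rng)
print(sample)
```

With 1,000 units and k = 100, every possible start yields a sample of 10 units spaced 100 apart.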
Target Population: The target population is the population we want to make conclusions about. In an ideal situation, the sampling frame matches the target population.
Total Survey Error: Total survey error includes both sampling error and nonsampling error.
Two-stage Sampling: Two-stage sampling is when we first select a collection of units called primary sampling units, or PSUs, then select from the units called secondary sampling units, or SSUs, within selected PSUs.
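A small sketch of the two stages, assuming simple random sampling at both stages. The PSU labels, the number of SSUs per PSU, and the sample sizes are made up for illustration.

```python
import random

rng = random.Random(0)   # fixed seed so the draw is repeatable

# Hypothetical frame: each PSU contains six SSUs.
psus = {
    f"PSU-{p}": [f"PSU-{p}/SSU-{s}" for s in range(1, 7)]
    for p in range(1, 5)
}

# Stage 1: select PSUs.
selected_psus = rng.sample(sorted(psus), 2)

# Stage 2: select SSUs only within the PSUs chosen in stage 1.
two_stage = {psu: rng.sample(psus[psu], 3) for psu in selected_psus}
print(two_stage)
```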
Unit: A unit is the object in the population that we select for our sample or that we make an observation on.
Variable: A variable is a property of a unit that we can measure. For example, if our units are people, variables could be height, weight, and an indicator of whether the person was on a diet. For the indicator, we can make a variable that takes the value of 1 if the person is on a diet and 0 if the person is not on a diet. Sometimes variables are called "data elements".
Variance: In estimating population characteristics, the variance is a description of the precision of an estimate. The smaller the variance, the more certain we are that the true population value is close to our estimate. The standard error, another common measure of precision, is the square root of the variance.
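To make the link between variance and standard error concrete, here is a minimal sketch for one specific case: the mean of a simple random sample drawn without replacement. It assumes the usual textbook estimator for that design, (1 - n/N) * s^2 / n; other designs need other variance formulas. The sample values and population size are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

sample = [4, 8, 6, 10, 12]   # hypothetical observed values
N = 100                      # hypothetical population size
n = len(sample)

estimate = mean(sample)      # estimated population mean
s2 = variance(sample)        # sample variance (divides by n - 1)

# Estimated variance of the sample mean under simple random sampling
# without replacement, including the finite population correction.
var_of_mean = (1 - n / N) * s2 / n

# The standard error is the square root of the variance.
standard_error = sqrt(var_of_mean)
print(estimate, var_of_mean, standard_error)
```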
Weight: A weight is a value we calculate for each record in the dataset. The weight is a measure of the number of population units the unit represents. We use weights when we calculate estimates such as totals and averages.
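As an illustration of how weights enter the calculations, the sketch below computes a weighted total and a weighted average. The weights and values are hypothetical; in many designs the weight is derived from the unit's probability of being selected, but how weights are actually built for a given dataset should be checked in its metadata.

```python
# Hypothetical records: each sampled unit has a weight and a measured value y.
weights = [10, 10, 20, 20]
y = [5.0, 7.0, 3.0, 4.0]

# Estimated population total: each value counts once for every
# population unit its record represents.
estimated_total = sum(w * v for w, v in zip(weights, y))

# Estimated number of population units represented by the sample.
estimated_units = sum(weights)

# Estimated population average: weighted total divided by the sum of weights.
estimated_mean = estimated_total / estimated_units
print(estimated_total, estimated_units, estimated_mean)
```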