Blog Stat 585

January 21 - Rearranging and summarizing

http://www.jstatsoft.org/v40/i01/

The split-apply-combine strategy consists of dividing a problem into smaller pieces, performing some operation on each piece, and combining all the pieces back together. This paper describes an implementation of that strategy in the plyr package. An important characteristic of plyr is that it replaces explicit loops with shorter code that states its intent more clearly.

The package provides a family of functions, for example aaply, adply, alply, daply and ddply. Every function has an informative name: the first character describes the input data type and the second the output data type (a for array, d for data frame, l for list), which makes it easy to remember what each function does. We can therefore choose whichever structure is most natural for storing the data in each case.

The functions have two or three main arguments. The first argument, .data, is the input that will be split, processed and recombined. The second argument, .variables for data frames or .margins for arrays, describes how to split the input into pieces; functions that take a list have no such argument, because a list is split element by element. The last main argument, .fun, is the processing function, and it is applied to each piece in turn. Any additional arguments are passed on to the processing function. If we omit .fun, the individual pieces are not modified, but the data structure is converted from one type to another.

Each input type has its own rules for how it is split up. For example, aaply takes an array as input and returns the results in an array. ozone is a three-dimensional array, and aaply(ozone, c(1,2), mean) returns a two-dimensional array containing the mean ozone for each combination of latitude and longitude, averaged over time (the third dimension). Here c(1,2) indicates that we slice the array by its first two dimensions, one piece per cell. The output type determines how the pieces are joined back together and how they are labelled. It also restricts what type of result the processing function should return.
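As a minimal sketch of the ideas above, the following R code first does split-apply-combine by hand in base R and then does the same with ddply, and finally runs the aaply call on ozone. It assumes the plyr package is installed; the scores data frame and its column names are invented here purely for illustration.

```r
# Assumes plyr is installed: install.packages("plyr")
library(plyr)

# Hypothetical toy data, made up for this example
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Split-apply-combine by hand in base R
pieces <- split(scores, scores$group)              # split by group
results <- lapply(pieces, function(piece) {        # apply to each piece
  data.frame(group = piece$group[1], mean_value = mean(piece$value))
})
by_hand <- do.call(rbind, results)                 # combine the pieces

# The same computation with plyr: data frame in (d), data frame out (d)
with_plyr <- ddply(scores, "group", summarise, mean_value = mean(value))

# Array in (a), array out (a): slice ozone by its first two dimensions
# and take the mean of each slice over the third dimension
ozone_mean <- aaply(ozone, c(1, 2), mean)

print(by_hand)
print(with_plyr)
print(dim(ozone_mean))
```

Both versions give one mean per group, but the ddply call keeps the splitting variable, the processing function and the output type visible in a single line.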
In addition, plyr has several helper functions that take a function as input and return another function as output: splat, each, colwise and failwith. For example, splat converts a function that takes multiple arguments into one that takes a list as its single argument. With the plyr package, the split-apply-combine strategy can be easier to apply than with other approaches in R.
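The helper functions can be tried out with a short sketch like the one below. It assumes plyr is installed; the functions ratio and strict_log and the small data frame are hypothetical names introduced only for this example.

```r
# Assumes plyr is installed: install.packages("plyr")
library(plyr)

# splat(): turn a multi-argument function into one that takes a list
ratio <- function(x, y) x / y                 # hypothetical example function
ratio_from_list <- splat(ratio)
splat_result <- ratio_from_list(list(x = 10, y = 2))

# each(): combine several functions into one that returns a named vector
summaries <- each(min, max, mean)
each_result <- summaries(c(1, 5, 9))

# colwise(): turn a vector function into one that works column by column
col_means <- colwise(mean)(data.frame(x = c(1, 2, 3), y = c(4, 5, 6)))

# failwith(): return a default value instead of stopping on an error
strict_log <- function(x) {                   # hypothetical example function
  if (x < 0) stop("negative input")
  log(x)
}
safe_log <- failwith(NA, strict_log, quiet = TRUE)
failed <- safe_log(-1)
worked <- safe_log(1)

print(splat_result)
print(each_result)
print(col_means)
print(c(failed, worked))
```

Helpers like these are mostly useful as the processing function passed to the main plyr functions, for instance when one bad piece should not abort the whole computation.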

January 15 - Reshaping data

http://www.jstatsoft.org/v21/i12/paper

We frequently have to reshape data, and this can be very tedious without the right tools. With the reshape package we can rearrange data in whatever way suits our needs. The package has two useful functions, melt and cast, that should be our best friends when we are dealing with ugly data structures. To use melt we have to identify which variables are measured and which are identifiers. melt makes one assumption: all the measured variables must be of the same type. After melting, the data are in molten form and ready for cast. With cast we can rearrange the data in the way we need. The basic idea is to use a formula that defines which variables appear in the columns and which in the rows. cast can also summarize the data by the values of a variable in the rows or in the columns. Additionally, we can use cast to create structures with more than two dimensions.
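The melt and cast workflow can be sketched as follows. This is only an illustration under stated assumptions: the reshape package is installed, and the wide data frame with its subject, time, height and weight columns is invented for the example.

```r
# Assumes reshape is installed: install.packages("reshape")
library(reshape)

# Hypothetical toy data: two identifier columns and two measured columns
wide <- data.frame(
  subject = c(1, 1, 2, 2),
  time    = c(1, 2, 1, 2),
  height  = c(10, 12, 20, 22),
  weight  = c(5, 6, 7, 8)
)

# Melt: identifiers are kept, every measurement becomes one row
# with a "variable" column and a "value" column
molten <- melt(wide, id.vars = c("subject", "time"))

# Cast back to the original layout: rows on the left of ~, columns on the right
wide_again <- cast(molten, subject + time ~ variable)

# Summarize while casting: the mean of each variable per subject
by_subject <- cast(molten, subject ~ variable, mean)

# Two ~ in the formula give a structure with three dimensions
cube <- cast(molten, subject ~ time ~ variable)

print(molten)
print(wide_again)
print(by_subject)
print(dim(cube))
```

Because the molten form stores every measurement in a single value column, the same molten data can be cast into several different layouts without being melted again.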