Factors and arrays are powerful data structures in R that help organize and analyze complex data. Factors represent categorical variables with predefined levels, while arrays store multi-dimensional data of the same type. These structures are essential for handling survey responses, experimental treatments, and spatial data.
Understanding how to create, manipulate, and convert factors and arrays is crucial for efficient data analysis in R. These structures enable you to work with categorical data, perform multi-dimensional computations, and apply functions across different dimensions of your dataset. Mastering factors and arrays will greatly enhance your R programming skills.
Factors and Arrays in R
Creating and Manipulating Factors
- Factors represent categorical data with a fixed set of possible values called levels
- Factors can be ordered (ordinal data) or unordered (nominal data)
- Examples of factors: survey responses (agree, neutral, disagree), experimental treatments (control, treatment1, treatment2)
- Create factors using the factor() function
- Takes a vector as input and an optional levels argument to specify the order of levels
- Example: factor(c("a", "b", "c", "a"), levels = c("a", "b", "c"))
- Access and modify factor levels using the levels() function
- Modifying levels will change all corresponding values in the factor
- Example: levels(factor_var) <- c("new_level1", "new_level2", "new_level3")
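The factor operations above can be sketched as follows. This is a minimal illustration in base R; the survey responses and the variable names (responses, resp, resp_ord) are hypothetical and chosen only for the example.

```r
# Hypothetical survey responses, used only for illustration
responses <- c("agree", "neutral", "disagree", "agree", "agree")

# Unordered (nominal) factor with an explicit level order
resp <- factor(responses, levels = c("disagree", "neutral", "agree"))
levels(resp)                 # "disagree" "neutral" "agree"

# Ordered (ordinal) factor: comparisons between levels become meaningful
resp_ord <- factor(responses,
                   levels = c("disagree", "neutral", "agree"),
                   ordered = TRUE)
resp_ord[1] > resp_ord[2]    # TRUE: "agree" ranks above "neutral"

# Renaming levels relabels every matching element at once
levels(resp) <- c("no", "maybe", "yes")
as.character(resp)           # "yes" "maybe" "no" "yes" "yes"
```

Passing levels explicitly is what fixes the order; without it, factor() sorts the unique values.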
Creating and Manipulating Arrays
- Arrays are multi-dimensional data structures that hold elements of the same data type
- Created using the array() function, specifying dimensions and an optional dimnames attribute
- Example: array(1:24, dim = c(4, 3, 2), dimnames = list(c("r1", "r2", "r3", "r4"), c("c1", "c2", "c3"), c("d1", "d2")))
- Access and modify array elements using square brackets [] with indices for each dimension
- Arrays are stored in column-major order
- Example: array_var[1, 2, 1] accesses the element in the first row, second column, and first depth dimension
- Access array dimensions using the dim() function
- Obtain the number of dimensions using length(dim(array_var))
- Example: dim(array_var) returns the dimensions of the array
- Subset arrays by specifying indices or logical vectors for each dimension
- Similar to matrix subsetting
- Example: array_var[1:2, , 1] subsets the first two rows and all columns of the first depth dimension
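A short sketch of creating, indexing, and subsetting a 3D array, reusing the 4 x 3 x 2 shape from the example above. The names arr and sub are illustrative, not part of any API.

```r
# 4 x 3 x 2 array filled in column-major order (the first index varies fastest)
arr <- array(1:24, dim = c(4, 3, 2),
             dimnames = list(c("r1", "r2", "r3", "r4"),
                             c("c1", "c2", "c3"),
                             c("d1", "d2")))

dim(arr)                # 4 3 2
length(dim(arr))        # 3

# Row 1, column 2, layer 1 is the 5th stored element: 1 + (2 - 1) * 4
arr[1, 2, 1]            # 5
arr["r1", "c2", "d1"]   # same element, selected by dimnames

# An empty index keeps the whole dimension; the single layer is dropped
sub <- arr[1:2, , 1]
dim(sub)                # 2 3

# drop = FALSE keeps all three dimensions
dim(arr[1:2, , 1, drop = FALSE])   # 2 3 1

# Logical subsetting: rows of layer 1 whose first-column value is even
arr[arr[, 1, 1] %% 2 == 0, , 1]

# Assignment uses the same indexing
arr[1, 2, 1] <- 100L
```

Note that subsetting a single layer returns a matrix unless drop = FALSE is given, which matters when later code expects three dimensions.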
Use Cases for Factors and Arrays
- Factors are useful for working with categorical variables that have a limited number of possible values
- Ensure data integrity by limiting possible values to a predefined set of levels
- Allow for efficient storage and processing of categorical data
- Examples: survey responses, experimental treatments, product categories
- Arrays are suitable for representing and manipulating multi-dimensional data
- Organize and access data based on multiple dimensions
- Enable efficient computation and analysis of complex data structures
- Examples: 3D spatial data, time series with multiple variables, multi-dimensional mathematical objects (tensors)
Data Structure Conversion
Converting Factors
- Convert factors to character vectors using as.character()
- Example: as.character(factor_var) returns a character vector
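- Convert factors to numeric vectors using as.numeric()
- Returns the integer codes assigned to each level, not the level labels
- To recover numbers that are stored as labels, convert to character first: as.numeric(as.character(factor_var))
- Example: as.numeric(factor_var) returns a numeric vector with level codes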
- Convert character or numeric vectors to factors using as.factor()
- Default: levels are assigned in alphabetical order for character vectors and increasing order for numeric vectors
- Example: as.factor(char_vec) converts a character vector to a factor
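The conversions above are easiest to compare on a factor whose labels look like numbers. A minimal sketch in base R; the values and the names f and g are made up for illustration.

```r
# Factor whose labels are digit strings; levels are sorted as strings
f <- factor(c("10", "5", "20", "5"))
levels(f)                     # "10" "20" "5"

as.character(f)               # "10" "5" "20" "5"
as.numeric(f)                 # 1 3 2 3  (level codes, not the labels)

# Recover the numbers stored in the labels
as.numeric(as.character(f))   # 10 5 20 5

# Numeric input: levels follow increasing numeric order
g <- as.factor(c(3, 1, 2, 1))
levels(g)                     # "1" "2" "3"
```

The gap between as.numeric(f) and as.numeric(as.character(f)) is a common source of silent errors when numeric columns are read in as factors.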
Converting Arrays and Matrices
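- Convert arrays to matrices using as.matrix()
- A two-dimensional array is already a matrix and is returned unchanged
- An array with three or more dimensions is collapsed into a single-column matrix, with elements in column-major order
- To choose the shape yourself, use matrix() with nrow or ncol, for example matrix(array_var, nrow = 4)
- Example: as.matrix(array_var) converts an array to a matrix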
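- Convert matrices to higher-dimensional arrays using array(), specifying the desired dimensions
- as.array() returns a matrix unchanged, because a matrix is already a two-dimensional array; it does not reshape
- Elements are filled in column-major order
- Example: array(matrix_var, dim = c(2, 3, 2)) converts a 12-element matrix to a 3D array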
- Convert data frames to matrices using as.matrix()
- Converts all columns to the same data type
- A data frame that contains factor or character columns becomes a character matrix, with factors converted to their labels
- Use data.matrix() to get a numeric matrix in which factors are replaced by their underlying integer codes
- Example: as.matrix(df_var) converts a data frame to a matrix
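A small sketch of these conversions in base R, showing the shape each call produces. The objects arr, m1, m2, arr2, and df are hypothetical examples.

```r
arr <- array(1:12, dim = c(2, 3, 2))

# as.matrix() on a 3D array gives one column, in column-major order
m1 <- as.matrix(arr)
dim(m1)                       # 12 1

# matrix() lets you pick the shape; here the two layers sit side by side
m2 <- matrix(arr, nrow = 2)
dim(m2)                       # 2 6

# array() reshapes a matrix back to 3D, reusing the same element order
arr2 <- array(m2, dim = c(2, 3, 2))
identical(arr, arr2)          # TRUE

# as.array() leaves a matrix as it is
dim(as.array(m2))             # 2 6

# Data frame with a factor column
df <- data.frame(id = 1:3, grp = factor(c("b", "a", "b")))
as.matrix(df)                 # character matrix; grp keeps its labels
data.matrix(df)               # numeric matrix; grp becomes codes 2 1 2
```

Because every reshape here preserves column-major order, a round trip through matrix() and array() returns the original array.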
Functions for Factors and Arrays
Factor Functions
- Use summary() to obtain a summary of a factor, showing the count of each level
- Example: summary(factor_var) returns the number of observations at each level
- Use table() to create a contingency table of factor levels
- Useful for cross-tabulation and frequency analysis
- Example: table(factor_var1, factor_var2) creates a contingency table of two factors
- Use tapply() to apply a function to subsets of a vector based on a factor
- Returns an array with results for each level
- Example: tapply(numeric_vec, factor_var, mean) calculates the mean of numeric_vec for each level of factor_var
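The three factor functions above can be tried together on a toy dataset. This is an illustrative sketch; treatment, outcome, and score are invented values, not data from the text.

```r
# Hypothetical experiment: treatment group, yes/no outcome, numeric score
treatment <- factor(c("control", "t1", "t1", "control", "t2", "t2"))
outcome   <- factor(c("yes", "no", "yes", "yes", "no", "no"))
score     <- c(10, 12, 14, 8, 20, 22)

# Count of observations per level
summary(treatment)            # control 2, t1 2, t2 2

# Cross-tabulation of two factors (3 x 2 table)
tab <- table(treatment, outcome)
tab["control", "yes"]         # 2

# Mean score within each treatment level
means <- tapply(score, treatment, mean)
means                         # control 9, t1 13, t2 21
```

The result of tapply() is named by factor level, so individual group values can be pulled out with means["t1"].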
Array Functions
- Apply summary functions like sum(), mean(), min(), max() directly to arrays
- Each function reduces over all elements of the array and returns a single value
- Example: sum(array_var) calculates the sum of all elements in the array
- Use apply() to apply a function over the margins (dimensions) of an array
- Takes the array, the margin (1 for rows, 2 for columns, 3 for the third dimension, or a vector such as c(1, 2) for several at once), and the function to apply as arguments
- Example: apply(array_var, 1, mean) calculates the mean of each row in the array
- Use sweep() to apply a function to each element of an array based on a summary statistic of the corresponding row or column
- Example: sweep(array_var, 1, rowMeans(array_var), "-") subtracts the row means from each element in the array
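A minimal sketch of the array functions above on a 4 x 3 x 2 array. The variable names are hypothetical, and apply(arr, 1, mean) is used for the row means so the same statistic appears in both the apply() and sweep() steps.

```r
arr <- array(1:24, dim = c(4, 3, 2))

# Whole-array reductions return a single value
sum(arr)                      # 300
mean(arr)                     # 12.5

# Margin 1: one mean per row, taken across columns and layers
row_means <- apply(arr, 1, mean)
row_means                     # 11 12 13 14

# Margin 3: one sum per layer
layer_sums <- apply(arr, 3, sum)
layer_sums                    # 78 222

# Margins c(1, 2): sum over layers for each row/column cell (4 x 3 matrix)
cell_sums <- apply(arr, c(1, 2), sum)

# Subtract each row's mean from every element in that row
centered <- sweep(arr, 1, row_means, "-")
apply(centered, 1, mean)      # 0 0 0 0
```

After sweeping out the row means, each row of the centered array averages to zero while the array keeps its original dimensions.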