Fiveable

💻 Advanced R Programming Unit 2 Review

2.3 Factors and arrays

Written by the Fiveable Content Team • Last updated September 2025

Factors and arrays are powerful data structures in R that help organize and analyze complex data. Factors represent categorical variables with predefined levels, while arrays store multi-dimensional data of the same type. These structures are essential for handling survey responses, experimental treatments, and spatial data.

Knowing how to create, manipulate, and convert factors and arrays lets you work with categorical data, perform multi-dimensional computations, and apply functions across the dimensions of a dataset.

Factors and Arrays in R

Creating and Manipulating Factors

  • Factors represent categorical data with a fixed set of possible values called levels
    • Factors can be ordered (ordinal data) or unordered (nominal data)
    • Examples of factors: survey responses (agree, neutral, disagree), experimental treatments (control, treatment1, treatment2)
  • Create factors using the factor() function
    • Takes a vector as input, an optional levels argument to specify the order of levels, and ordered = TRUE to create an ordered factor
    • Example: factor(c("a", "b", "c", "a"), levels = c("a", "b", "c"))
  • Access and modify factor levels using the levels() function
    • Assigning to levels() renames the levels by position, so every element with a given level takes the new label
    • Example: levels(factor_var) <- c("new_level1", "new_level2", "new_level3")
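
The points above can be sketched in a few lines of R. The survey data and variable names (responses, resp, resp_ord) are made up for illustration; only base R functions are used.

```r
# Hypothetical survey responses (illustrative data only)
responses <- c("agree", "neutral", "disagree", "agree", "agree")

# Unordered (nominal) factor with an explicit level order
resp <- factor(responses, levels = c("disagree", "neutral", "agree"))
levels(resp)     # "disagree" "neutral" "agree"
nlevels(resp)    # 3

# Ordered (ordinal) factor: comparisons follow the level order
resp_ord <- factor(responses,
                   levels = c("disagree", "neutral", "agree"),
                   ordered = TRUE)
resp_ord[1] > resp_ord[2]   # TRUE: "agree" ranks above "neutral"

# Renaming levels is positional and relabels every matching element
levels(resp) <- c("D", "N", "A")
as.character(resp)          # "A" "N" "D" "A" "A"
```

Because levels() assignment works by position, the new labels must be listed in the same order as the existing levels.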

Creating and Manipulating Arrays

  • Arrays are multi-dimensional data structures that hold elements of the same data type
    • Created using the array() function, specifying dimensions and an optional dimnames attribute
    • Example: array(1:24, dim = c(4, 3, 2), dimnames = list(c("r1", "r2", "r3", "r4"), c("c1", "c2", "c3"), c("d1", "d2")))
  • Access and modify array elements using square brackets [] with indices for each dimension
    • Arrays are stored in column-major order
    • Example: array_var[1, 2, 1] accesses the element in the first row, second column, and first depth dimension
  • Access array dimensions using the dim() function
    • Obtain the number of dimensions using length(dim(array))
    • Example: dim(array_var) returns the dimensions of the array
  • Subset arrays by specifying indices or logical vectors for each dimension
    • Similar to matrix subsetting
    • Example: array_var[1:2, , 1] subsets the first two rows and all columns of the first depth dimension
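
As a rough illustration of the indexing rules above, the following sketch builds the 4 x 3 x 2 array from the earlier example and reads from it. The variable names (arr, elem, sub, kept, picked) are placeholders chosen for this example.

```r
# Build a 4 x 3 x 2 array with named dimensions
arr <- array(1:24, dim = c(4, 3, 2),
             dimnames = list(c("r1", "r2", "r3", "r4"),
                             c("c1", "c2", "c3"),
                             c("d1", "d2")))

dim(arr)          # 4 3 2
length(dim(arr))  # 3 dimensions

# Column-major storage: the first index varies fastest
elem <- arr[1, 2, 1]            # 5
same <- arr["r1", "c2", "d1"]   # 5, same element selected by name

# Subsetting: an empty index means "everything along that dimension"
sub  <- arr[1:2, , 1]                 # 2 x 3 matrix (third dimension dropped)
kept <- arr[1:2, , 1, drop = FALSE]   # 2 x 3 x 1 array (dimension kept)

# Logical vectors select rows where the value is TRUE
picked <- arr[c(TRUE, FALSE, FALSE, TRUE), , 2]   # rows r1 and r4 of layer d2

# Modify a single element in place
arr[1, 2, 1] <- 0L
```

Passing drop = FALSE is worth considering whenever later code expects the result to keep all three dimensions.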

Use Cases for Factors and Arrays

  • Factors are useful for working with categorical variables that have a limited number of possible values
    • Ensure data integrity by limiting possible values to a predefined set of levels; assigning a value outside that set produces NA with a warning
    • Stored internally as integer codes plus a levels attribute, so each label is kept only once
    • Examples: survey responses, experimental treatments, product categories
  • Arrays are suitable for representing and manipulating multi-dimensional data
    • Organize and access data based on multiple dimensions
    • Enable efficient computation and analysis of complex data structures
    • Examples: 3D spatial data, time series with multiple variables, multi-dimensional mathematical objects (tensors)

Data Structure Conversion

Converting Factors

  • Convert factors to character vectors using as.character()
    • Example: as.character(factor_var) returns a character vector
  • Convert factors to numeric vectors using as.numeric()
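    • Returns the integer codes assigned to each level, not the values shown in the labels
    • Example: as.numeric(factor_var) returns a numeric vector with level codes; to recover numbers stored as labels, use as.numeric(as.character(factor_var))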
  • Convert character or numeric vectors to factors using as.factor()
    • Default: levels are assigned in alphabetical order for character vectors and increasing order for numeric vectors
    • Example: as.factor(char_vec) converts a character vector to a factor
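
A short sketch of these conversions, including the common pitfall of calling as.numeric() on a factor whose labels look like numbers. The sizes vector is invented for illustration.

```r
# Hypothetical factor whose labels are numbers stored as text
sizes <- factor(c("10", "5", "20", "5"))
levels(sizes)          # "10" "20" "5": sorted as strings, not as numbers

as.character(sizes)    # "10" "5" "20" "5"

# as.numeric() gives the level codes, not the labels
as.numeric(sizes)      # 1 3 2 3

# Go through character first to recover the numbers in the labels
as.numeric(as.character(sizes))   # 10 5 20 5

# as.factor() on a numeric vector sorts levels in increasing order
num_fac <- as.factor(c(3, 1, 2, 1))
levels(num_fac)        # "1" "2" "3"
```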

Converting Arrays and Matrices

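  • Convert arrays to matrices by reshaping them
    • as.matrix() returns a matrix unchanged, but turns an array with three or more dimensions into a single-column matrix
    • To get a specific 2D shape, use matrix(array_var, nrow = ...) or assign new dimensions with dim(); elements are taken in column-major order
    • Example: matrix(array_var, nrow = 4) reshapes a 4 x 3 x 2 array into a 4 x 6 matrix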
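  • Convert matrices to higher-dimensional arrays using array(), specifying the desired dimensions
    • as.array() has no dim argument and returns a matrix unchanged, because a matrix is already a 2D array
    • Elements are filled in column-major order
    • Example: array(matrix_var, dim = c(2, 3, 2)) converts a 12-element matrix to a 3D array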
  • Convert data frames to matrices using as.matrix()
    • Converts all columns to the same data type
    • If any column is a factor or character vector, the result is a character matrix and factors appear as their labels; use data.matrix() to get the underlying integer codes instead
    • Example: as.matrix(df_var) converts a data frame to a matrix
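
The sketch below shows one way to move between arrays, matrices, and data frames in base R. It uses matrix() and array() for reshaping and contrasts as.matrix() with data.matrix() on a data frame that contains a factor. All object names (arr, mat, m, arr3, df) are illustrative.

```r
arr <- array(1:24, dim = c(4, 3, 2))

# as.matrix() on a 3D array gives a single-column matrix
dim(as.matrix(arr))    # 24 1

# Reshape to a chosen 2D shape instead (layers end up side by side)
mat <- matrix(arr, nrow = 4)
dim(mat)               # 4 6

# Matrix to 3D array: array() refills the values in column-major order
m    <- matrix(1:12, nrow = 3)
arr3 <- array(m, dim = c(2, 3, 2))
arr3[2, 1, 1]          # 2

# Data frame with a factor column (hypothetical data)
df <- data.frame(id = 1:3, grp = factor(c("b", "a", "b")))

chr_mat <- as.matrix(df)     # character matrix, factor shown as labels
num_mat <- data.matrix(df)   # numeric matrix, factor as codes (a = 1, b = 2)
```

array() recycles or truncates values when the requested dimensions do not match the number of elements, so it helps to check that prod(dim) equals length(m) first.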

Functions for Factors and Arrays

Factor Functions

  • Use summary() to obtain a summary of a factor, showing the count of each level
    • Example: summary(factor_var) returns a summary of the factor
  • Use table() to create a contingency table of factor levels
    • Useful for cross-tabulation and frequency analysis
    • Example: table(factor_var1, factor_var2) creates a contingency table of two factors
  • Use tapply() to apply a function to subsets of a vector based on a factor
    • Returns an array with results for each level
    • Example: tapply(numeric_vec, factor_var, mean) calculates the mean of numeric_vec for each level of factor_var
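
These three functions can be tried on a small made-up experiment. The vectors treatment, outcome, and score are hypothetical and exist only to make the output easy to check by hand.

```r
# Hypothetical experiment data (illustrative only)
treatment <- factor(c("control", "t1", "t1", "control", "t2", "t2"))
outcome   <- factor(c("yes", "no", "yes", "yes", "no", "no"))
score     <- c(10, 14, 16, 12, 20, 22)

# Count of observations per level
summary(treatment)      # control: 2, t1: 2, t2: 2

# Cross-tabulate two factors
tab <- table(treatment, outcome)
tab["control", "yes"]   # 2

# Mean score within each treatment level
means <- tapply(score, treatment, mean)
means                   # control 11, t1 15, t2 21
```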

Array Functions

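  • Apply summary functions like sum(), mean(), min(), max() directly to arrays
    • Each reduces all elements to a single value; element-wise functions such as sqrt() and the arithmetic operators return an array of the same shape instead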
    • Example: sum(array_var) calculates the sum of all elements in the array
  • Use apply() to apply a function over the margins (dimensions) of an array
    • Takes the array, the margin (1 for rows, 2 for columns, 3 for the third dimension, or a vector such as c(1, 2)), and the function to apply as arguments
    • Example: apply(array_var, 1, mean) calculates the mean of each row across all other dimensions
  • Use sweep() to combine each element of an array with a summary statistic for its row, column, or other margin
    • Example: sweep(array_var, 1, rowMeans(array_var), "-") subtracts the row means from each element in the array
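
A minimal sketch of these array functions on a 4 x 3 x 2 array of the integers 1 to 24. The object names (arr, row_means, layer_sums, cell_sums, centered) are placeholders for this example.

```r
arr <- array(1:24, dim = c(4, 3, 2))

# Summary functions collapse the whole array to one value
sum(arr)     # 300
mean(arr)    # 12.5

# Element-wise functions keep the array's shape
dim(sqrt(arr))   # 4 3 2

# apply() over a margin: 1 = rows, 2 = columns, 3 = third dimension
row_means  <- apply(arr, 1, mean)       # 11 12 13 14
layer_sums <- apply(arr, 3, sum)        # 78 222
cell_sums  <- apply(arr, c(1, 2), sum)  # 4 x 3 matrix, summed over layers

# sweep() subtracts each row's mean from every element in that row
centered <- sweep(arr, 1, row_means, "-")
apply(centered, 1, mean)                # all zero after centering
```

Here row_means is computed with apply(), but rowMeans(arr) gives the same four values for this array.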