Fiveable

💻 Advanced R Programming Unit 2 Review

2.2 Lists and data frames

Written by the Fiveable Content Team • Last updated September 2025

Lists and data frames are essential data structures in R, allowing you to organize and manipulate complex datasets. Lists offer flexibility, storing elements of different types and lengths, while data frames provide a structured format for tabular data.

Understanding these structures is crucial for effective data analysis in R. You'll learn how to create, access, and manipulate lists and data frames, as well as combine and reshape data for various analytical tasks. These skills form the foundation for working with real-world datasets in R.

Lists and Data Frames in R

Creating Lists and Data Frames

  • Create lists using the list() function
    • Lists can contain elements of different data types (vectors, matrices, other lists)
    • Example: my_list <- list(1, "apple", c(TRUE, FALSE), list(1, 2, 3))
  • Create data frames using the data.frame() function or by combining vectors of equal length
    • Data frames store tabular data, similar to a spreadsheet or SQL table
    • Each column in a data frame is a vector of equal length
    • Example: my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
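  • Combine existing data frames and vectors using cbind() or rbind()
    • cbind() combines column-wise; rbind() combines row-wise
    • Called on plain vectors alone, both return a matrix (not a data frame) and coerce all values to a single type
    • To get a data frame, include a data frame among the arguments or call data.frame() directly
    • Example: my_df <- cbind(data.frame(x = 1:3), y = c("a", "b", "c"))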
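
The creation functions above can be sketched as follows. This is a minimal illustration rather than a complete reference; the matrix name m is introduced here only for the example.

```r
# A list can hold elements of different types and lengths.
my_list <- list(1, "apple", c(TRUE, FALSE), list(1, 2, 3))
length(my_list)          # 4 top-level elements

# data.frame() keeps each column's own type.
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
class(my_df)             # "data.frame"
class(my_df$x)           # "integer"

# cbind() on plain vectors builds a matrix and coerces everything to one type.
m <- cbind(x = 1:3, y = c("a", "b", "c"))
is.matrix(m)             # TRUE
typeof(m)                # "character"
```

The last two lines show why data.frame() is usually the safer choice when columns have different types.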

Naming and Manipulating Lists and Data Frames

  • Name elements in a list using the names() function or by assigning names directly during list creation
    • Example: names(my_list) <- c("num", "char", "log", "list")
    • Example: my_list <- list(num = 1, char = "apple", log = c(TRUE, FALSE), list = list(1, 2, 3))
  • Name columns in a data frame during creation or by assigning to colnames() (or, equivalently, names())
    • Example: colnames(my_df) <- c("x", "y")
    • Example: my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
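  • Manipulate lists and data frames using assignment and base R functions
    • append(): Add elements to a vector or list; add a column to a data frame by assignment (my_df$z <- ...) or with cbind()
    • Remove an element by assigning NULL (my_list$num <- NULL) or by negative indexing (my_list[-1]); remove() and rm() delete whole objects from the environment, not elements
    • Modify an element by assignment (my_list$num <- 2); modifyList() updates several named list elements at once, while update() is meant for re-fitting models
    • merge(): Combine two data frames based on common columns (it does not merge plain lists)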
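
A short sketch of common base R idioms for naming, adding, changing, and removing elements. The element and column names (extra, z) are made up for illustration.

```r
my_list <- list(1, "apple", c(TRUE, FALSE))
names(my_list) <- c("num", "char", "log")

# Add an element: append() works on lists and returns a new list.
my_list <- append(my_list, list(extra = 3.14))

# Modify an element by assignment; remove one by assigning NULL.
my_list$num <- 10
my_list$char <- NULL
names(my_list)               # "num" "log" "extra"

# modifyList() updates several named elements at once (NULL drops an element).
my_list <- modifyList(my_list, list(num = 20, extra = NULL))
names(my_list)               # "num" "log"

# Data frame: rename columns, add a column, drop a column.
my_df <- data.frame(a = 1:3, b = c("a", "b", "c"))
colnames(my_df) <- c("x", "y")
my_df$z <- my_df$x * 2
my_df$y <- NULL
names(my_df)                 # "x" "z"
```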

Lists vs Data Frames

Flexibility and Structure

  • Lists are more flexible than data frames
    • Lists can contain elements of different data types and lengths
    • Data frames require all columns to have the same length; columns may differ in type from one another, but each column typically holds values of a single type
  • Data frames are a special type of list where each element is a vector of equal length
    • Data frames are suitable for storing tabular data
  • Lists are often used to store and organize related data objects of different types
    • Example: my_list <- list(name = "John", age = 30, scores = c(85, 92, 88))
  • Data frames are used to store structured, rectangular data
    • Example: my_df <- data.frame(name = c("John", "Alice", "Bob"), age = c(30, 25, 35), score = c(85, 92, 88))
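
The difference in structure can be checked directly. The sketch below reuses the two examples above; the name bad is hypothetical and only holds the result of a deliberately failing call.

```r
my_list <- list(name = "John", age = 30, scores = c(85, 92, 88))
lengths(my_list)             # name 1, age 1, scores 3: lengths may differ

my_df <- data.frame(
  name  = c("John", "Alice", "Bob"),
  age   = c(30, 25, 35),
  score = c(85, 92, 88)
)
is.list(my_df)               # TRUE: a data frame is built on a list
sapply(my_df, class)         # one type per column; types differ across columns
dim(my_df)                   # 3 rows, 3 columns

# Columns of unequal length are rejected with an error.
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```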

Attributes and Metadata

  • Data frames have additional attributes compared to plain lists
    • names: Holds the column names, also accessible with colnames()
    • row.names: Holds a name for each row, accessible with rownames()
    • class: Set to "data.frame", which tells R to treat the list as a table
  • These attributes provide metadata about the data stored in the data frame
  • Many functions in R automatically create data frames when importing data from external sources
    • read.csv(): Reads data from a CSV file and creates a data frame
    • read.table(): Reads data from a delimited text file and creates a data frame
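
The attributes and the import functions can be inspected as sketched below. To stay independent of any file on disk, the example passes a small made-up CSV string through the text argument of read.csv(); the names csv_text and people are illustrative.

```r
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
names(attributes(my_df))     # includes "names", "class", "row.names"
names(my_df)                 # "x" "y" (colnames(my_df) gives the same)
rownames(my_df)              # "1" "2" "3"

# read.csv() returns a data frame; text= reads from a string instead of a file.
csv_text <- "name,age\nJohn,30\nAlice,25"
people <- read.csv(text = csv_text)
class(people)                # "data.frame"
str(people)                  # shows the column names and types
```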

Extracting Data from Lists and Data Frames

Accessing Elements in Lists

  • Subset a list using single square brackets [] with an index or name; the result is always a list, even when it holds one element
    • Example: my_list[1], my_list["num"]
  • Extract a single element from a list, removing the list structure, using double square brackets [[]]
    • Example: my_list[[1]], my_list[["num"]]
  • Use the $ operator followed by the element name to access named elements in a list
    • Example: my_list$num
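
The three access styles differ in what they return, which the following sketch makes visible. The variable names sub and val are introduced only for the example.

```r
my_list <- list(num = 1, char = "apple", log = c(TRUE, FALSE))

sub <- my_list["num"]        # single brackets: still a list, of length 1
class(sub)                   # "list"

val <- my_list[["num"]]      # double brackets: the element itself
class(val)                   # "numeric"

my_list$log                  # same as my_list[["log"]]
my_list[c("num", "char")]    # single brackets can select several elements
my_list[["log"]][2]          # chain indexing to reach inside: FALSE
```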

Accessing Data in Data Frames

  • Access columns in a data frame using the $ operator followed by the column name
    • Example: my_df$x
  • Select columns in a data frame using single square brackets [] with the column index or name; this returns a data frame, whereas $ and [[]] return the column as a vector
    • Example: my_df[1], my_df["x"]
  • Access rows in a data frame using single square brackets [] with the row index or a logical vector
    • Example: my_df[1, ], my_df[c(TRUE, FALSE, TRUE), ]
  • Extract rows and columns from a data frame based on logical conditions using the subset() function
    • Example: subset(my_df, x > 1)
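
A compact sketch of the access patterns above, using the same small my_df:

```r
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

my_df$x                      # column as a vector: 1 2 3
my_df[["x"]]                 # same vector
my_df["x"]                   # one-column data frame
my_df[1, ]                   # first row, as a data frame
my_df[my_df$x > 1, ]         # rows where the condition holds
my_df[my_df$x > 1, "y"]      # those rows, column y only
subset(my_df, x > 1, select = y)   # same selection, kept as a data frame
```

subset() is convenient interactively; bracket indexing is usually preferred inside functions because it does not rely on non-standard evaluation.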

Applying Functions to Lists and Data Frames

  • Apply a function to each element of a list or each row/column of a data frame using the apply() family of functions
    • lapply(): Applies a function to each element of a list and returns a list
    • sapply(): Applies a function to each element of a list and returns a simplified vector or matrix
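    • apply(): Applies a function over the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix; a data frame is first converted to a matrix, so mixed column types are coerced to one type
  • Example: lapply(my_list, length), sapply(my_df, class) (sapply(my_df, mean) is only meaningful when every column is numeric)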
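
The sketch below contrasts the return shapes of these functions. The scores data frame is made up for illustration and is all-numeric so that mean() is meaningful; vapply() is shown as a stricter base R alternative to sapply().

```r
my_list <- list(a = 1:3, b = letters[1:5], c = TRUE)
lapply(my_list, length)              # list: a = 3, b = 5, c = 1
sapply(my_list, length)              # named integer vector: 3 5 1

scores <- data.frame(test1 = c(80, 90, 70), test2 = c(85, 95, 75))
sapply(scores, mean)                 # column means: test1 = 80, test2 = 85
apply(scores, 1, mean)               # row means: 82.5 92.5 72.5
vapply(scores, mean, numeric(1))     # like sapply(), with a checked return type
```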

Combining and Reshaping Data Frames

Combining Data Frames

  • Combine data frames or vectors column-wise using the cbind() function
    • Example: cbind(my_df, new_column = c(1, 2, 3))
  • Combine data frames or vectors row-wise using the rbind() function
    • Example: rbind(my_df, data.frame(x = 4, y = "d")) (a bare c(4, "d") is a character vector and would turn x into a character column)
  • Combine data frames based on common columns using the merge() function, similar to a SQL join operation
    • Example: merge(df1, df2, by = "common_column")
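
The three combining functions can be sketched as follows. df1, df2, and the id column are hypothetical stand-ins for the df1, df2, and "common_column" mentioned above.

```r
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

wider  <- cbind(my_df, new_column = c(1, 2, 3))        # adds a column
longer <- rbind(my_df, data.frame(x = 4L, y = "d"))    # adds a row, types preserved

# merge() joins on shared key columns; the default is an inner join.
df1 <- data.frame(id = c(1, 2, 3), name = c("John", "Alice", "Bob"))
df2 <- data.frame(id = c(2, 3, 4), score = c(92, 88, 75))
merge(df1, df2, by = "id")                 # ids 2 and 3 only
merge(df1, df2, by = "id", all.x = TRUE)   # left join: id 1 kept, score is NA
```

The all, all.x, and all.y arguments of merge() correspond to full, left, and right outer joins.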

Reshaping Data Frames

  • Convert data frames between wide and long formats using functions from the reshape2 package (now retired; tidyr is its maintained successor)
    • Wide format: One row per observational unit, multiple columns for different variables
    • Long format: One row per observation, columns for the observational unit, variable, and value
    • melt(): Convert a data frame from wide to long format
    • dcast(): Convert a data frame from long to wide format
  • Use functions from the tidyr package for reshaping data frames, providing a more intuitive syntax
    • pivot_longer(): Convert a data frame from wide to long format
    • pivot_wider(): Convert a data frame from long to wide format
  • Example, assuming my_df has columns name, score1, and score2: melt(my_df, id.vars = "name"), pivot_longer(my_df, cols = c("score1", "score2"), names_to = "test", values_to = "score")
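
A round trip between the two formats might look like the sketch below. It assumes the tidyr package is installed, and the data frame wide is made up for illustration.

```r
library(tidyr)   # assumed to be installed

# Hypothetical wide data: one row per student, one column per test.
wide <- data.frame(
  name   = c("John", "Alice"),
  score1 = c(85, 92),
  score2 = c(88, 95)
)

# Wide to long: one row per student-test pair.
long <- pivot_longer(wide, cols = c("score1", "score2"),
                     names_to = "test", values_to = "score")
dim(long)            # 4 rows, 3 columns: name, test, score

# Long to wide: back to one column per test.
wide_again <- pivot_wider(long, names_from = "test", values_from = "score")
names(wide_again)    # "name" "score1" "score2"
```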

Data Manipulation with dplyr

  • Use functions from the dplyr package for manipulating and transforming data frames in a concise and readable manner
    • select(): Select specific columns from a data frame
    • filter(): Filter rows based on logical conditions
    • arrange(): Arrange rows based on one or more columns
    • mutate(): Create new columns or modify existing ones
    • summarise(): Summarize data by calculating aggregate functions
  • Example: my_df %>% select(name, age) %>% filter(age > 30) %>% arrange(desc(age)) %>% mutate(age_squared = age^2)
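
The verbs above chain naturally with the pipe. The following is a sketch assuming the dplyr package is installed; the data reuses the name, age, and score example from earlier, and the filter threshold is chosen so that more than one row survives.

```r
library(dplyr)   # assumed to be installed

my_df <- data.frame(
  name  = c("John", "Alice", "Bob"),
  age   = c(30, 25, 35),
  score = c(85, 92, 88)
)

result <- my_df %>%
  filter(age >= 30) %>%              # keep John and Bob
  select(name, age, score) %>%       # keep these columns, in this order
  arrange(desc(age)) %>%             # Bob first
  mutate(age_squared = age^2)        # add a derived column

result$name                          # "Bob" "John"

# summarise() collapses the rows to one row of aggregates.
my_df %>% summarise(mean_score = mean(score), oldest = max(age))
```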