Lists and data frames are essential data structures in R, allowing you to organize and manipulate complex datasets. Lists offer flexibility, storing elements of different types and lengths, while data frames provide a structured format for tabular data.
Understanding these structures is crucial for effective data analysis in R. You'll learn how to create, access, and manipulate lists and data frames, as well as combine and reshape data for various analytical tasks. These skills form the foundation for working with real-world datasets in R.
Lists and Data Frames in R
Creating Lists and Data Frames
- Create lists using the `list()` function
  - Lists can contain elements of different data types (vectors, matrices, other lists)
  - Example: `my_list <- list(1, "apple", c(TRUE, FALSE), list(1, 2, 3))`
- Create data frames using the `data.frame()` function, which combines vectors of equal length into columns
  - Data frames store tabular data, similar to a spreadsheet or SQL table
  - Each column in a data frame is a vector of equal length
  - Example: `my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))`
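- Combine vectors or data frames of matching size using `cbind()` or `rbind()`
  - `cbind()` combines its arguments column-wise
  - `rbind()` combines its arguments row-wise
  - Called on bare vectors, both return a matrix rather than a data frame, so mixed types are coerced to a single type; pass at least one data frame to get a data frame back
  - Example: `my_df <- cbind(data.frame(x = 1:3), y = c("a", "b", "c"))`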
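The difference between these constructors is easiest to see by inspecting what each one returns. The following is a minimal sketch; the object name `m` is introduced here only for illustration.

```r
# A list can mix types and lengths freely.
my_list <- list(1, "apple", c(TRUE, FALSE), list(1, 2, 3))
length(my_list)   # number of top-level elements

# data.frame() keeps each column's own type.
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
str(my_df)

# cbind() on bare vectors builds a matrix, so the integers in x
# are coerced to character to match y.
m <- cbind(x = 1:3, y = c("a", "b", "c"))
is.matrix(m)      # TRUE
typeof(m)         # "character"
```

If a data frame is the goal, `data.frame()` is usually the safer choice because it preserves the type of each column.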
Naming and Manipulating Lists and Data Frames
- Name elements in a list using the `names()` function or by assigning names directly during list creation
  - Example: `names(my_list) <- c("num", "char", "log", "list")`
  - Example: `my_list <- list(num = 1, char = "apple", log = c(TRUE, FALSE), list = list(1, 2, 3))`
- Name columns in a data frame during creation or by assigning to `colnames()` (or `names()`) afterwards
  - Example: `colnames(my_df) <- c("x", "y")`
  - Example: `my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))`
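- Manipulate lists and data frames using base R functions and assignment
  - `append()`: Add elements to a list or vector; add a column to a data frame by assigning to a new name, as in `my_df$z <- ...`
  - Assigning `NULL`: Remove a list element or data frame column, as in `my_list$num <- NULL` (note that `remove()`/`rm()` deletes whole objects from the environment, not elements)
  - `$<-`, `[[<-`, `[<-`: Modify elements in a list or data frame by assignment (note that `update()` re-fits model objects and does not edit lists or data frames)
  - `merge()`: Combine two data frames based on common columns; lists can be joined with `c()` or `append()`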
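As a rough illustration of the naming and manipulation steps above, the sketch below adds, changes, and removes elements using only base R. The element names `extra` and `more` and the column `z` are made up for the example.

```r
my_list <- list(1, "apple", c(TRUE, FALSE))
names(my_list) <- c("num", "char", "log")

# Add elements: append() or assignment to a new name.
my_list <- append(my_list, list(extra = 3.14))
my_list$more <- "new"

# Modify an element in place, then remove one by assigning NULL.
my_list$num <- 2
my_list$char <- NULL
names(my_list)    # "num" "log" "extra" "more"

my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
my_df$z <- my_df$x * 10          # add a column
my_df$y <- NULL                  # remove a column
my_df[my_df$x == 2, "z"] <- 0    # modify a single cell
```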
Lists vs Data Frames
Flexibility and Structure
- Lists are more flexible than data frames
- Lists can contain elements of different data types and lengths
- Data frames require all columns to be of equal length; each column holds a single data type, but different columns can hold different types
- Data frames are a special type of list where each element is a vector of equal length
- Data frames are suitable for storing tabular data
- Lists are often used to store and organize related data objects of different types
  - Example: `my_list <- list(name = "John", age = 30, scores = c(85, 92, 88))`
- Data frames are used to store structured, rectangular data
  - Example: `my_df <- data.frame(name = c("John", "Alice", "Bob"), age = c(30, 25, 35), score = c(85, 92, 88))`
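The relationship between the two structures can be checked directly. This sketch uses the hypothetical names `person`, `people`, and `res`; it shows that a data frame passes `is.list()` and that columns of mismatched length are rejected.

```r
person <- list(name = "John", age = 30, scores = c(85, 92, 88))
lengths(person)          # elements may differ in length: 1, 1, 3

people <- data.frame(name  = c("John", "Alice", "Bob"),
                     age   = c(30, 25, 35),
                     score = c(85, 92, 88))
is.list(people)          # TRUE: a data frame is a list of columns
sapply(people, class)    # each column has its own type

# Column lengths that cannot be recycled to a common length raise an error.
res <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(res, "try-error")   # TRUE
```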
Attributes and Metadata
- Data frames have additional attributes compared to lists
  - `row.names`: Provides names for each row in the data frame
  - `names`: Provides names for each column in the data frame; `colnames()` reads and sets the same values
- These attributes provide metadata about the data stored in the data frame
- Many functions in R automatically create data frames when importing data from external sources
  - `read.csv()`: Reads data from a CSV file and creates a data frame
  - `read.table()`: Reads data from a delimited text file and creates a data frame
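To make the metadata concrete, the sketch below inspects a data frame's attributes and then imports a small CSV. The CSV content is invented, and it is passed through the `text` argument instead of a file path so the example runs without any file on disk.

```r
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# A data frame carries names, class and row.names attributes.
names(attributes(my_df))

# Row names can be replaced with meaningful labels.
rownames(my_df) <- c("r1", "r2", "r3")
my_df["r2", ]

# read.csv() returns a data frame; here it parses an in-memory string.
csv_text <- "name,age\nJohn,30\nAlice,25"
imported <- read.csv(text = csv_text)
class(imported)    # "data.frame"
```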
Extracting Data from Lists and Data Frames
Accessing Elements in Lists
- Select elements of a list using single square brackets `[]` with the element's index or name; the result is always a list (a sub-list), even for one element
  - Example: `my_list[1]`, `my_list["num"]`
- Extract a single element from a list, removing the list structure, using double square brackets `[[]]`
  - Example: `my_list[[1]]`, `my_list[["num"]]`
- Use the `$` operator followed by the element name to access named elements in a list
  - Example: `my_list$num`
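The practical difference between `[]` and `[[]]` shows up in the class of the result. A short sketch, using a named list similar to the one above:

```r
my_list <- list(num = 1, char = "apple", log = c(TRUE, FALSE))

# Single brackets keep the list wrapper.
class(my_list["num"])      # "list"

# Double brackets and $ return the element itself.
class(my_list[["num"]])    # "numeric"
class(my_list$char)        # "character"

# Once extracted, the element can be indexed like any vector.
my_list$log[2]             # FALSE
```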
Accessing Data in Data Frames
- Access columns in a data frame using the `$` operator followed by the column name
  - Example: `my_df$x`
- Access columns in a data frame using single square brackets `[]` with the column index or name
  - Example: `my_df[1]`, `my_df["x"]`
- Access rows in a data frame using single square brackets `[]` with the row index or a logical vector
  - Example: `my_df[1, ]`, `my_df[c(TRUE, FALSE, TRUE), ]`
- Extract rows and columns from a data frame based on logical conditions using the `subset()` function
  - Example: `subset(my_df, x > 1)`
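The access patterns above return different kinds of objects, which matters when the result is passed to another function. The sketch below is illustrative; the variable names `col_vec`, `col_df`, `kept`, and `only_y` are assumptions made for the example.

```r
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

col_vec <- my_df$x        # plain vector
col_df  <- my_df["x"]     # one-column data frame
first   <- my_df[1, ]     # first row, still a data frame

# Logical indexing and subset() select the same rows here.
kept   <- my_df[my_df$x > 1, ]
only_y <- subset(my_df, x > 1, select = y)
```

`subset()` is convenient interactively; inside functions, explicit indexing with `[` is generally the more predictable choice.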
Applying Functions to Lists and Data Frames
- Apply a function to each element of a list or each row/column of a data frame using the `apply()` family of functions
  - `lapply()`: Applies a function to each element of a list and returns a list
  - `sapply()`: Applies a function to each element of a list and returns a simplified vector or matrix
  - `apply()`: Applies a function to the margins (rows or columns) of a matrix or data frame
- Example: `lapply(my_list, length)`, `sapply(my_df[sapply(my_df, is.numeric)], mean)` (restricted to numeric columns, because `mean()` returns `NA` with a warning for a character column)
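A small worked example may help distinguish the three functions. The data frame `scores` and its columns `test1` and `test2` are invented for illustration.

```r
my_list <- list(num = 1, char = "apple", log = c(TRUE, FALSE))

lens_list <- lapply(my_list, length)   # list of lengths
lens_vec  <- sapply(my_list, length)   # named integer vector

scores <- data.frame(test1 = c(85, 92, 88),
                     test2 = c(90, 80, 70))

col_means <- sapply(scores, mean)      # one mean per column
row_means <- apply(scores, 1, mean)    # one mean per row (margin 1)
```

Because a data frame is a list of columns, `sapply()` and `lapply()` iterate over columns, while `apply()` first converts the data frame to a matrix.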
Combining and Reshaping Data Frames
Combining Data Frames
- Combine data frames or vectors column-wise using the `cbind()` function
  - Example: `cbind(my_df, new_column = c(1, 2, 3))`
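- Combine data frames or vectors row-wise using the `rbind()` function
  - Pass the new row as a data frame with matching column names; a bare vector such as `c(4, "d")` is coerced to character and changes the type of numeric columns
  - Example: `rbind(my_df, data.frame(x = 4, y = "d"))`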
- Combine data frames based on common columns using the `merge()` function, similar to a SQL join operation
  - Example: `merge(df1, df2, by = "common_column")`
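The three combining functions can be compared on a pair of small tables. In this sketch, `df1`, `df2`, and their columns are hypothetical; `all.x = TRUE` is the `merge()` argument that turns the default inner join into a left join.

```r
df1 <- data.frame(id = 1:3, name = c("John", "Alice", "Bob"))
df2 <- data.frame(id = 2:4, score = c(92, 88, 75))

# Column-wise: the new vector must have one value per row.
wide <- cbind(df1, active = c(TRUE, FALSE, TRUE))

# Row-wise: the new row is itself a data frame with matching columns.
taller <- rbind(df1, data.frame(id = 4L, name = "Dana"))

# Inner join keeps only ids present in both tables (2 and 3).
joined <- merge(df1, df2, by = "id")

# Left join keeps every row of df1; missing scores become NA.
left <- merge(df1, df2, by = "id", all.x = TRUE)
```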
Reshaping Data Frames
- Convert data frames between wide and long formats using functions from the `reshape2` package
  - Wide format: One row per observational unit, multiple columns for different variables
  - Long format: One row per observation, columns for the observational unit, variable, and value
  - `melt()`: Convert a data frame from wide to long format
  - `dcast()`: Convert a data frame from long to wide format
- Use functions from the `tidyr` package for reshaping data frames, providing a more intuitive syntax
  - `pivot_longer()`: Convert a data frame from wide to long format
  - `pivot_wider()`: Convert a data frame from long to wide format
- Example: `melt(my_df, id.vars = "name")`
- Example (assuming `my_df` has columns `score1` and `score2`): `pivot_longer(my_df, cols = c("score1", "score2"), names_to = "test", values_to = "score")`
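A round trip between the two layouts can be sketched with `tidyr`, assuming the package is installed. The data frame `wide` and its columns `score1` and `score2` are invented to match the column names used in the example above.

```r
library(tidyr)

wide <- data.frame(name   = c("John", "Alice"),
                   score1 = c(85, 92),
                   score2 = c(88, 95))

# Wide to long: one row per (name, test) pair.
long <- pivot_longer(wide,
                     cols      = c("score1", "score2"),
                     names_to  = "test",
                     values_to = "score")

# Long back to wide: one column per distinct value of `test`.
back <- pivot_wider(long, names_from = "test", values_from = "score")
```

Both functions return tibbles; wrap the result in `as.data.frame()` if a plain data frame is needed.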
Data Manipulation with dplyr
- Use functions from the `dplyr` package for manipulating and transforming data frames in a concise and readable manner
  - `select()`: Select specific columns from a data frame
  - `filter()`: Filter rows based on logical conditions
  - `arrange()`: Arrange rows based on one or more columns
  - `mutate()`: Create new columns or modify existing ones
  - `summarise()`: Summarize data by calculating aggregate functions
- Example: `my_df %>% select(name, age) %>% filter(age > 30) %>% arrange(desc(age)) %>% mutate(age_squared = age^2)`
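The pipeline above can be run end to end on a small table, and `summarise()` can be shown alongside it. This is a sketch assuming `dplyr` is installed; the fourth row ("Dana") is added here only so the filter leaves more than one row to sort.

```r
library(dplyr)

my_df <- data.frame(name  = c("John", "Alice", "Bob", "Dana"),
                    age   = c(30, 25, 35, 41),
                    score = c(85, 92, 88, 79))

# Each verb takes a data frame and returns a new one.
result <- my_df %>%
  filter(age >= 30) %>%            # keep rows where age is at least 30
  select(name, age, score) %>%     # keep these columns
  arrange(desc(age)) %>%           # oldest first
  mutate(age_squared = age^2)      # add a derived column

# summarise() collapses the rows into one row of aggregates.
summary_row <- my_df %>%
  summarise(mean_score = mean(score), n = n())
```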