R is a programming language and environment for statistical analysis, with built-in functions for data manipulation and visualization. It lets users create vectors, compute basic statistics, and produce informative plots in a few lines of code.
R's data structures, such as data frames and matrices, organize complex datasets efficiently. Custom functions and add-on packages extend R's capabilities, making it a flexible platform for a wide range of statistical tasks and data science projects.
Introduction to R for Statistical Analysis
Vector creation in R
- Vectors store multiple elements of the same data type in a one-dimensional array
  - Numeric vectors store numbers (1, 2.5, -3)
  - Character vectors store strings ("apple", "banana", "cherry")
  - Logical vectors store TRUE or FALSE values
- Create vectors using the `c()` function to combine elements
  - `x <- c(1, 2, 3, 4, 5)` creates a numeric vector with values 1, 2, 3, 4, and 5
  - `y <- c("a", "b", "c")` creates a character vector with values "a", "b", and "c"
- Generate sequences of numbers using the `:` operator or the `seq()` function
  - `z <- 1:10` creates an integer vector with values 1, 2, 3, ..., 10
  - `seq(from = 0, to = 1, by = 0.1)` creates a sequence from 0 to 1 in increments of 0.1
- Assign vectors to variables using the assignment operator `<-`
  - `prices <- c(10.99, 15.50, 8.75)` assigns the vector to the variable `prices`
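A minimal sketch tying the vector commands together. The extra vectors `flags` and `mixed` are illustrative additions, not from the text above. `class()` reports each vector's type, and the `mixed` example shows that `c()` coerces mixed inputs to one common type (here character), which is why a vector holds only a single data type.

```r
# Numeric, character, and logical vectors built with c()
x <- c(1, 2, 3, 4, 5)
y <- c("a", "b", "c")
flags <- c(TRUE, FALSE, TRUE)          # illustrative logical vector

# Sequences: the colon operator and seq()
z <- 1:10                              # 1, 2, ..., 10
s <- seq(from = 0, to = 1, by = 0.1)   # 0.0, 0.1, ..., 1.0

# Assignment with <-
prices <- c(10.99, 15.50, 8.75)

# class() reports the type of each vector
class(x)      # "numeric"
class(y)      # "character"
class(flags)  # "logical"
class(z)      # "integer": 1:10 yields integers rather than doubles
length(s)     # 11

# Mixing types in c() coerces every element to one common type
mixed <- c(1, "a", TRUE)               # illustrative mixed input
mixed         # "1" "a" "TRUE"
class(mixed)  # "character"
```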
Basic R commands for statistics
- `length(x)` returns the number of elements in vector `x`
- `sum(x)` calculates the sum of all elements in numeric vector `x`
- `mean(x)` calculates the arithmetic mean of numeric vector `x`
- `median(x)` calculates the median value of numeric vector `x`
- `min(x)` and `max(x)` return the minimum and maximum values in numeric vector `x`
- `var(x)` and `sd(x)` calculate the sample variance and standard deviation (using the n - 1 denominator) of numeric vector `x`
- `summary(x)` summarizes the distribution of vector `x`; for a numeric vector it reports the minimum, first quartile, median, mean, third quartile, and maximum
- Perform arithmetic operations on vectors element-wise
  - `x + y` adds corresponding elements of numeric vectors `x` and `y`
  - `x * 2` multiplies each element of vector `x` by 2
- Subset vectors using logical operations based on conditions
  - `x[x > 3]` returns a vector containing only the elements of `x` greater than 3
  - `y[y == "a"]` returns a vector containing only the elements of `y` equal to "a"
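The sketch below runs the summary commands, element-wise arithmetic, and logical subsetting on a small made-up data set; the values in `x`, `a`, `b`, and `y` are arbitrary examples. The commented results follow from that data, and `var()` and `sd()` divide by n - 1.

```r
# Small example data set (arbitrary values)
x <- c(4, 8, 15, 16, 23, 42)

# Descriptive statistics
length(x)    # 6
sum(x)       # 108
mean(x)      # 18
median(x)    # 15.5 (average of the two middle values, 15 and 16)
min(x)       # 4
max(x)       # 42
var(x)       # 182: sum of squared deviations (910) divided by n - 1 = 5
sd(x)        # square root of var(x), about 13.49
summary(x)   # Min, 1st Qu., Median, Mean, 3rd Qu., Max

# Element-wise arithmetic on equal-length numeric vectors
a <- c(1, 2, 3)
b <- c(10, 20, 30)
a + b        # 11 22 33
a * 2        # 2 4 6

# Logical subsetting: the condition yields a logical vector ...
x > 15       # FALSE FALSE FALSE TRUE TRUE TRUE
# ... and indexing keeps only the TRUE positions
x[x > 15]    # 16 23 42

y <- c("a", "b", "a", "c")
y[y == "a"]  # "a" "a"
```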
Statistical plots with R functions
- Create basic scatter plots using the `plot()` function
  - `plot(x, y)` creates a scatter plot with `x` values on the x-axis and `y` values on the y-axis
  - Customize plots using arguments like `main` (title), `xlab` and `ylab` (axis labels), `col` (color), and `pch` (point symbol)
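- Generate histograms of a numeric vector using the `hist()` function
  - `hist(x)` creates a histogram of the values in vector `x`
  - Adjust the number of bins, colors, and labels with arguments like `breaks`, `col`, and `main` (a single number for `breaks` is treated as a suggestion, not an exact bin count)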
- Create box plots of a numeric vector, or grouped by a factor variable, using the `boxplot()` function
  - `boxplot(x)` creates a box plot of the values in vector `x`
  - `boxplot(x ~ f)` creates box plots of `x` grouped by the levels of factor `f`
- Arrange multiple plots in a grid using the `par()` function with the `mfrow` or `mfcol` argument
  - `par(mfrow = c(2, 2))` creates a 2x2 grid of plots
  - Subsequent plots fill the grid row by row with `mfrow`, or column by column with `mfcol`
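A sketch combining the plotting functions above in one 2x2 layout, assuming simulated data: the `rnorm()` and `sample()` inputs, the factor `f`, and the colors are arbitrary choices for illustration. Saving the old settings returned by `par()` and restoring them afterwards keeps the grid from affecting later plots. The `pdf()`/`dev.off()` pair only lets the script run non-interactively and can be dropped in RStudio.

```r
# Send plots to a temporary PDF so the script also runs non-interactively
pdf(file = tempfile(fileext = ".pdf"))

set.seed(42)                               # reproducible simulated data
x <- rnorm(100, mean = 50, sd = 10)
y <- 2 * x + rnorm(100, sd = 5)
f <- factor(sample(c("A", "B", "C"), 100, replace = TRUE))

op <- par(mfrow = c(2, 2))                 # 2x2 grid, filled row by row

plot(x, y, main = "Scatter plot", xlab = "x", ylab = "y",
     col = "steelblue", pch = 19)          # pch = 19: solid circles
h <- hist(x, breaks = 10, col = "grey",    # breaks = 10 is only a suggestion
          main = "Histogram of x")
boxplot(x, main = "Box plot of x")
b <- boxplot(x ~ f, main = "x grouped by f", xlab = "f", ylab = "x")

par(op)                                    # restore previous graphics settings
invisible(dev.off())                       # close the PDF device
```

Both `hist()` and `boxplot()` also return (invisibly) the numbers behind the plot, such as `h$counts` and `b$n`, which is handy for checking what was drawn.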
Data structures and functions in R
- Data frames are two-dimensional structures that can hold different types of data in columns
  - Create data frames using the `data.frame()` function
  - Access columns using the `$` operator or by name with square brackets
- Matrices are two-dimensional structures that hold data of the same type
  - Create matrices using the `matrix()` function or by combining vectors with `cbind()` or `rbind()`
- Factors are used to represent categorical data with predefined levels
  - Convert vectors to factors using the `factor()` function
- Functions are reusable blocks of code that perform specific tasks
  - Create custom functions using the `function()` keyword
- Packages extend R's functionality with additional functions and data sets
  - Install packages using `install.packages()` and load them with `library()`
- RStudio is an integrated development environment (IDE) for R that provides a user-friendly interface for coding, data analysis, and visualization
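A minimal sketch of the data structures and a custom function, using made-up values. `standardize()` is a hypothetical helper written for this example, not a built-in R function. The `ggplot2` lines are left commented out because installing a package needs an internet connection.

```r
# Data frame: columns may hold different types (made-up values)
df <- data.frame(
  name  = c("apple", "banana", "cherry"),
  price = c(1.20, 0.50, 3.00),
  stock = c(TRUE, FALSE, TRUE)
)
df$price          # access a column with $
df[["name"]]      # or by name with double square brackets
df[, "stock"]     # or by name inside [row, column]

# Matrix: every element shares one type
m  <- matrix(1:6, nrow = 2, ncol = 3)  # filled column by column
m2 <- cbind(c(1, 2), c(3, 4))          # bind vectors together as columns
dim(m)            # 2 3

# Factor: categorical data with a fixed, ordered set of levels
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
levels(sizes)     # "small" "medium" "large"
table(sizes)      # counts per level

# standardize() is a hypothetical example helper:
# it rescales a numeric vector to mean 0 and standard deviation 1
standardize <- function(v) {
  (v - mean(v)) / sd(v)
}
standardize(df$price)

# Packages: install once (needs internet), then load in each session
# install.packages("ggplot2")
# library(ggplot2)
library(stats)    # a base package that ships with R
```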