R's basic syntax and data types are the building blocks of programming in this language. Understanding these fundamentals is crucial for writing efficient and effective code. From case-sensitivity to assignment operators, these rules form the foundation of R programming.
Data types in R, like numeric, character, and logical, allow you to work with different kinds of information. Knowing how to create, manipulate, and analyze these data types is essential for handling various data science tasks and statistical analyses in R.
R Syntax and Conventions
Basic Syntax Rules
- R is case-sensitive
  - Variable names, function names, and other identifiers must match exactly in uppercase and lowercase letters
  - Example: `myVariable` and `myvariable` are treated as different variables
- R uses the `<-` operator for assignment
  - Assigns a value to a variable or object
  - Example: `x <- 10` assigns the value 10 to the variable `x`
- Comments in R are denoted by the `#` symbol
  - Anything following `#` on the same line is treated as a comment and not executed as code
  - Example: `# This is a comment`
- R statements are typically written on separate lines, or separated by semicolons `;` if multiple statements are on the same line
  - Example: `x <- 10; y <- 20` assigns values to `x` and `y` on the same line
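The syntax rules above can be tried out in a short script. This is a minimal sketch; the variable names `a` and `b` are illustrative and do not come from the original text.

```r
# R is case-sensitive: these are two distinct variables.
myVariable <- 1
myvariable <- 2
identical(myVariable, myvariable)  # FALSE

# Assignment with the <- operator.
x <- 10

# Two statements on one line, separated by a semicolon.
a <- 10; b <- 20
a + b  # 30
```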
Function and Indexing Conventions
- R uses parentheses `()` to enclose function arguments
  - Example: `sum(1, 2, 3)` calls the `sum` function with arguments 1, 2, and 3
- R uses square brackets `[]` for indexing and subsetting
  - Example: `my_vector[1]` accesses the first element of `my_vector`
- R uses a variety of operators for arithmetic, comparison, and logical operations
  - Arithmetic operators: `+`, `-`, `*`, `/`
  - Comparison operators: `==`, `<`, `>`
  - Logical operators: `&` (and), `|` (or)
  - Example: `x + y` adds the values of `x` and `y`
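A brief sketch of these conventions in use follows. The values stored in `my_vector`, `x`, and `y` are assumed here purely for illustration.

```r
# Illustrative values (not from the original text).
my_vector <- c(10, 20, 30)
x <- 10
y <- 20

total <- sum(1, 2, 3)   # function call with parentheses: 6
first <- my_vector[1]   # indexing with square brackets: 10

x + y    # 30
x * y    # 200
x == y   # FALSE
x < y    # TRUE

(x > 5) & (y > 50)   # FALSE: both conditions must hold
(x > 5) | (y > 50)   # TRUE: at least one condition holds
```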
Data Types in R
Basic Data Types
- Numeric data represents numbers
  - Can be further classified as integers (whole numbers) or doubles (numbers with decimal points)
  - A literal such as `10` is stored as a double unless it is written with the `L` suffix, as in `10L`
  - Example: `10`, `3.14`
- Character data represents text or strings
  - Enclosed in quotation marks, such as `"hello"` or `'world'`
  - Example: `"John Doe"`, `'123 Main St.'`
- Logical data represents boolean values
  - Can be either `TRUE` or `FALSE`
  - Example: `TRUE`, `FALSE`
- Factor data represents categorical or qualitative variables
  - Used to store predefined levels or categories
  - Example: `factor(c("male", "female", "male"))` creates a factor with levels "female" and "male" (levels are sorted alphabetically by default)
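The basic types can be inspected with `class()`, as in the sketch below. The variable names are illustrative, and the `10L` integer literal is an addition not shown in the original examples.

```r
num  <- 3.14
int  <- 10L          # the L suffix requests an integer
name <- "John Doe"
flag <- TRUE
sex  <- factor(c("male", "female", "male"))

class(num)    # "numeric"
class(int)    # "integer"
class(name)   # "character"
class(flag)   # "logical"
class(sex)    # "factor"

levels(sex)   # "female" "male"
```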
Special Data Types and Type Checking
- R has special data types for specific purposes
  - `Date` for representing dates
  - `POSIXct` for representing date-time values
  - Example: `as.Date("2023-06-01")` creates a `Date` object
- The `typeof()` function can be used to determine the data type of an object in R
  - Example: `typeof(10)` returns `"double"`, indicating that 10 is a numeric value
  - Example: `typeof("hello")` returns `"character"`, indicating that "hello" is a character string
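The following sketch shows date objects and type checking side by side. The time zone `"UTC"` and the specific timestamp are assumptions made so the example is reproducible.

```r
d <- as.Date("2023-06-01")
class(d)    # "Date"
d + 1       # the following day, 2023-06-02

# A date-time value; the timestamp and time zone are illustrative.
dt <- as.POSIXct("2023-06-01 12:30:00", tz = "UTC")
class(dt)   # "POSIXct" "POSIXt"

typeof(10)        # "double"
typeof(10L)       # "integer"
typeof("hello")   # "character"
typeof(TRUE)      # "logical"
```

Note that `typeof()` reports the internal storage type, while `class()` reports the higher-level class such as `"Date"`.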
Data Structures in R
Vectors and Matrices
- Vectors are one-dimensional arrays that can hold elements of the same data type
  - Created using the `c()` function
  - Example: `my_vector <- c(1, 2, 3, 4, 5)` creates a numeric vector
- Matrices are two-dimensional arrays with elements of the same data type
  - Created using the `matrix()` function
  - Example: `my_matrix <- matrix(1:6, nrow = 2, ncol = 3)` creates a 2x3 matrix
- Elements in vectors and matrices can be accessed and modified using indexing and subsetting techniques
  - Example: `my_vector[1]` accesses the first element of `my_vector`
  - Example: `my_matrix[1, 2]` accesses the element in the first row and second column of `my_matrix`
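As a sketch of how these structures behave, the code below builds the vector and matrix from the examples and reads and modifies individual elements. The `mixed` vector is an added illustration of what happens when types are combined.

```r
my_vector <- c(1, 2, 3, 4, 5)
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
# matrix() fills column by column by default:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

my_vector[1]      # 1
my_matrix[1, 2]   # 3

# Modify elements through the same indexing syntax.
my_vector[1] <- 100
my_matrix[1, 2] <- 0L

# A vector holds a single type, so mixed inputs are coerced.
mixed <- c(1, "a", TRUE)
class(mixed)      # "character"
```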
Data Frames and Basic Operations
- Data frames are two-dimensional structures similar to matrices but can hold elements of different data types
  - Created using the `data.frame()` function
  - Example: `my_dataframe <- data.frame(x = 1:3, y = c("a", "b", "c"))` creates a data frame with columns `x` and `y`
- Basic operations such as arithmetic, comparison, and logical operations can be applied element-wise to vectors, matrices, and data frames
  - Example: `my_vector + 1` adds 1 to each element of `my_vector`
  - Example: `my_dataframe$x > 2` returns a logical vector indicating which elements in the `x` column of `my_dataframe` are greater than 2
- Functions like `length()`, `nrow()`, `ncol()`, `dim()`, and `str()` can be used to examine the properties and structure of data objects
  - Example: `length(my_vector)` returns the number of elements in `my_vector`
  - Example: `dim(my_matrix)` returns the dimensions (number of rows and columns) of `my_matrix`
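A minimal sketch that ties these operations together, reusing the objects from the examples above:

```r
my_vector    <- c(1, 2, 3, 4, 5)
my_matrix    <- matrix(1:6, nrow = 2, ncol = 3)
my_dataframe <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Element-wise operations.
my_vector + 1        # 2 3 4 5 6
my_dataframe$x > 2   # FALSE FALSE TRUE

# Inspecting size and structure.
length(my_vector)    # 5
dim(my_matrix)       # 2 3
nrow(my_dataframe)   # 3
ncol(my_dataframe)   # 2
str(my_dataframe)    # prints a compact summary of each column
```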
Indexing and Subsetting in R
Basic Indexing Techniques
- Indexing in R starts at 1
  - The first element of a vector, matrix, or data frame has an index of 1
  - Example: `my_vector[1]` accesses the first element of `my_vector`
- Elements in a vector can be accessed using square brackets `[]` and the index or a vector of indices
  - Example: `my_vector[c(1, 3, 5)]` accesses the first, third, and fifth elements of `my_vector`
- Negative indexing can be used to exclude specific elements from a vector
  - Example: `my_vector[-2]` returns all elements of `my_vector` except the second element
- Matrices and data frames can be indexed using square brackets `[]` with row and column indices separated by a comma
  - Example: `my_matrix[1, 2]` accesses the element in the first row and second column of `my_matrix`
  - Example: `my_dataframe[2, "y"]` accesses the element in the second row and the column named "y" in `my_dataframe`
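The indexing forms above can be compared in one short script. The vector values are chosen for illustration so that positions and values are easy to tell apart, and `stringsAsFactors = FALSE` is set explicitly as an assumption so the result is the same across R versions.

```r
my_vector    <- c(10, 20, 30, 40, 50)
my_matrix    <- matrix(1:6, nrow = 2, ncol = 3)
my_dataframe <- data.frame(x = 1:3, y = c("a", "b", "c"),
                           stringsAsFactors = FALSE)

my_vector[1]             # 10 (indexing starts at 1)
my_vector[c(1, 3, 5)]    # 10 30 50
my_vector[-2]            # 10 30 40 50 (second element excluded)

my_matrix[1, 2]          # 3 (row 1, column 2)
my_dataframe[2, "y"]     # "b" (row 2, column "y")
```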
Subsetting and Column Access
- Subsetting can be performed using logical vectors, which select elements that meet a specific condition
  - Example: `my_vector[my_vector > 3]` returns all elements of `my_vector` that are greater than 3
- The `$` operator can be used to access columns of a data frame by name
  - Example: `my_dataframe$x` accesses the column named "x" in `my_dataframe`
- The `subset()` function can be used to extract subsets of a data frame based on specified conditions
  - Example: `subset(my_dataframe, x > 1)` returns a subset of `my_dataframe` where the "x" column is greater than 1
  - Example: `subset(my_dataframe, y == "a")` returns a subset of `my_dataframe` where the "y" column is equal to "a"
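A sketch of logical subsetting and column access, reusing the example objects. The intermediate names `big`, `rows_gt1`, and `rows_a` are hypothetical and used only to hold the results.

```r
my_vector    <- c(1, 2, 3, 4, 5)
my_dataframe <- data.frame(x = 1:3, y = c("a", "b", "c"),
                           stringsAsFactors = FALSE)

# The condition yields a logical vector, which then selects elements.
my_vector > 3                    # FALSE FALSE FALSE TRUE TRUE
big <- my_vector[my_vector > 3]  # 4 5

# Column access by name.
my_dataframe$x                   # 1 2 3

# Row filtering with subset().
rows_gt1 <- subset(my_dataframe, x > 1)     # rows 2 and 3
rows_a   <- subset(my_dataframe, y == "a")  # row 1 only
```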