Probability theory forms the backbone of causal inference, providing tools to quantify uncertainty and make informed decisions. It introduces key concepts like probability distributions, independence, and conditional probability, which are essential for understanding cause-and-effect relationships.
Mastering probability theory enables researchers to model complex scenarios, estimate causal effects, and assess the strength of evidence. From basic axioms to advanced concepts like Bayes' theorem and limit theorems, probability theory equips us with the necessary framework to tackle causal inference challenges.
Basics of probability
- Probability is a fundamental concept in statistics and causal inference that quantifies the likelihood of an event occurring
- Understanding probability is crucial for making inferences about cause-and-effect relationships and assessing the strength of evidence for causal claims
Probability axioms
- Non-negativity: Probability of an event is always greater than or equal to 0, $P(A) \geq 0$
- Normalization: Probability of the entire sample space is equal to 1, $P(S) = 1$
- Additivity: If events A and B are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$
- Complementary events: The probabilities of an event A and its complement A' sum to 1, $P(A) + P(A') = 1$ (a consequence of the axioms above)
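As a quick numerical check, the sketch below verifies these properties for a fair six-sided die, assigning each outcome probability 1/6 (the events `A` and `B` are illustrative choices):

```python
# Fair six-sided die: each outcome has probability 1/6.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
P = {outcome: Fraction(1, 6) for outcome in S}

def prob(event):
    """Probability of an event (a subset of the sample space)."""
    return sum(P[outcome] for outcome in event)

A = {2, 4, 6}   # rolling an even number
B = {1, 3}      # rolling a 1 or a 3 (disjoint from A)

assert prob(A) >= 0                      # non-negativity
assert prob(S) == 1                      # normalization
assert prob(A | B) == prob(A) + prob(B)  # additivity for mutually exclusive events
assert prob(A) + prob(S - A) == 1        # complement rule
```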
Sample spaces and events
- Sample space (S) is the set of all possible outcomes of a random experiment (coin toss, rolling a die)
- An event (A) is a subset of the sample space, representing a specific outcome or group of outcomes (getting heads, rolling an even number)
- Events can be simple (a single outcome) or compound (a combination of outcomes)
- Mutually exclusive events cannot occur simultaneously (rolling a 1 and rolling a 6 on a single die roll)
Conditional probability
- Conditional probability $P(A|B)$ is the probability of event A occurring given that event B has already occurred
- Calculated as $P(A|B) = \frac{P(A \cap B)}{P(B)}$ (defined when $P(B) > 0$), where $P(A \cap B)$ is the probability of both A and B occurring; see the worked example after this list
- Allows for updating probabilities based on new information or evidence
- Helps in understanding the dependence between events and is crucial for causal inference
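A minimal worked example, reusing the fair-die setup from the sketch above: conditioning on B = "the roll is greater than 3" changes the probability that the roll is even.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
P = {outcome: Fraction(1, 6) for outcome in S}

def prob(event):
    return sum(P[outcome] for outcome in event)

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3

# P(A | B) = P(A and B) / P(B), defined when P(B) > 0
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)   # 2/3: of the outcomes {4, 5, 6}, two are even
print(prob(A))       # 1/2: knowing that B occurred changed the probability of A
```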
Probability distributions
- A probability distribution is a function that describes the likelihood of the different possible values of a random variable
- Probability distributions are essential for modeling uncertainty and variability in causal inference
Discrete probability distributions
- Discrete random variables have a countable number of possible outcomes (number of defective items in a batch)
- Probability mass function (PMF) assigns probabilities to each possible outcome
- Examples include Bernoulli, binomial, and Poisson distributions
Continuous probability distributions
- Continuous random variables can take on any value within a specified range (height, weight)
- Probability density function (PDF) describes the relative likelihood of different values
- Examples include normal, exponential, and uniform distributions
- Probabilities are calculated using integrals of the PDF over a given range
Joint probability distributions
- Joint probability distribution gives the probability of two or more random variables taking particular values together
- Denoted as $P(X, Y)$ for random variables X and Y
- Allows for modeling the dependence between multiple variables
- Marginal and conditional probabilities can be derived from the joint distribution
Marginal probability distributions
- Marginal probability distribution is the probability distribution of a single random variable, ignoring the others
- Obtained by summing (discrete) or integrating (continuous) the joint distribution over the other variables
- Provides information about the individual behavior of a random variable
- Useful for simplifying complex joint distributions and focusing on specific variables of interest
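The short sketch below illustrates a joint distribution for two binary variables and how marginal and conditional distributions are recovered from it; the joint probabilities are made up for illustration.

```python
import numpy as np

# Hypothetical joint distribution P(X, Y) for binary X (rows) and Y (columns).
joint = np.array([[0.30, 0.10],    # X = 0
                  [0.20, 0.40]])   # X = 1

marginal_x = joint.sum(axis=1)     # P(X): sum over Y  -> [0.40, 0.60]
marginal_y = joint.sum(axis=0)     # P(Y): sum over X  -> [0.50, 0.50]

# Conditional distribution P(Y | X = 1) = P(X = 1, Y) / P(X = 1)
cond_y_given_x1 = joint[1] / marginal_x[1]   # -> [1/3, 2/3]

print(marginal_x, marginal_y, cond_y_given_x1)
```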
Independence and dependence
- Independence and dependence describe the relationship between events or random variables
- Understanding these concepts is crucial for correctly modeling and interpreting causal relationships
Independent events
- Events A and B are independent if the occurrence of one does not affect the probability of the other
- Mathematically, $P(A|B) = P(A)$ and $P(B|A) = P(B)$
- For independent events, the joint probability is the product of the individual probabilities, $P(A \cap B) = P(A) \times P(B)$
- Example: Flipping a fair coin twice, the outcome of the second flip is independent of the first
Dependent events
- Events A and B are dependent if the occurrence of one affects the probability of the other
- Mathematically, $P(A|B) \neq P(A)$ or $P(B|A) \neq P(B)$
- The joint probability of dependent events is not equal to the product of their individual probabilities
- Example: Drawing cards from a deck without replacement, the probability of drawing a specific card changes after each draw
Conditional independence
- Events A and B are conditionally independent given event C if $P(A|B,C) = P(A|C)$ and $P(B|A,C) = P(B|C)$
- Conditional independence implies that once we know the outcome of C, the occurrence of A does not provide any additional information about B, and vice versa
- Plays a crucial role in causal inference, as it helps in identifying confounding factors and estimating causal effects
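The simulation below sketches these three ideas at once, using a made-up common-cause structure: C influences both A and B, so A and B are marginally dependent, yet within each level of C they are (approximately) independent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Common cause C influences both A and B; A and B have no direct link.
C = rng.random(n) < 0.5
A = rng.random(n) < np.where(C, 0.8, 0.2)
B = rng.random(n) < np.where(C, 0.7, 0.3)

# Marginal dependence: P(A | B) differs from P(A)
print(A.mean(), A[B].mean())

# Conditional independence: within each level of C, P(A | B, C) ~ P(A | C)
for c in (False, True):
    mask = C == c
    print(A[mask].mean(), A[mask & B].mean())
```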
Bayes' theorem
- Bayes' theorem is a fundamental rule in probability theory that describes how to update probabilities based on new evidence
- It is named after the Reverend Thomas Bayes, an 18th-century British statistician and Presbyterian minister
Bayes' rule
- Bayes' rule states that the probability of an event A given event B is equal to the probability of event B given A, multiplied by the probability of A, divided by the probability of B
- Mathematically, $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
- Allows for updating prior probabilities (before observing evidence) to posterior probabilities (after observing evidence)
- Example: In medical testing, Bayes' rule can be used to calculate the probability of a patient having a disease given a positive test result
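A minimal worked version of the medical-testing example, with hypothetical numbers for prevalence, sensitivity, and specificity:

```python
# Hypothetical medical test: how likely is disease given a positive result?
prevalence = 0.01      # P(disease)
sensitivity = 0.95     # P(positive | disease)
specificity = 0.90     # P(negative | no disease)

# Total probability of a positive test
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' rule: P(disease | positive)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_disease_given_positive, 3))   # ~0.088: still unlikely despite the positive test
```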
Prior vs posterior probabilities
- Prior probability $P(A)$ is the initial probability of an event A before observing any evidence
- Posterior probability $P(A|B)$ is the updated probability of event A after observing evidence B
- Bayes' rule provides a way to calculate the posterior probability by combining the prior probability with the likelihood of the evidence
- Example: Prior probability of a patient having a disease based on population prevalence, updated to a posterior probability after a positive test result
Bayesian inference
- Bayesian inference is a method of statistical inference that uses Bayes' theorem to update probabilities as more evidence becomes available
- Involves specifying a prior distribution for the parameters of interest, then updating it with observed data to obtain a posterior distribution
- Allows for incorporating prior knowledge and beliefs into the analysis
- Widely used in causal inference for estimating causal effects, handling missing data, and assessing the sensitivity of results to assumptions
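A small sketch of a Bayesian update using the conjugate Beta-Binomial model (the prior parameters and data are illustrative): a Beta(a, b) prior on a success probability becomes a Beta(a + successes, b + failures) posterior after observing the data.

```python
from scipy import stats

# Prior belief about a success probability p: Beta(2, 2), centered at 0.5
a_prior, b_prior = 2, 2

# Observed data: 7 successes in 10 trials
successes, trials = 7, 10

# Conjugate update: posterior is Beta(a + successes, b + failures)
a_post = a_prior + successes
b_post = b_prior + (trials - successes)

posterior = stats.beta(a_post, b_post)
print(posterior.mean())          # posterior mean = 9/14, about 0.643
print(posterior.interval(0.95))  # 95% credible interval
```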
Expectation and variance
- Expectation and variance are two fundamental concepts in probability theory that describe the central tendency and variability of a random variable
- They are essential for summarizing and comparing probability distributions in causal inference
Expected value
- The expected value (or mean) of a random variable X, denoted as $E(X)$, is the probability-weighted average of its possible values
- For a discrete random variable, $E(X) = \sum_{x} x \times P(X=x)$, where $x$ are the possible values of X
- For a continuous random variable, $E(X) = \int_{-\infty}^{\infty} x \times f(x) dx$, where $f(x)$ is the probability density function
- Represents the long-run average value of the random variable if the experiment is repeated many times
Variance and standard deviation
- Variance, denoted as $Var(X)$ or $\sigma^2$, measures the average squared deviation of a random variable X from its expected value
- Calculated as $Var(X) = E[(X - E(X))^2]$
- Standard deviation, denoted as $\sigma$, is the square root of the variance and expresses the typical deviation from the mean in the same units as X
- Both variance and standard deviation quantify the spread or dispersion of a probability distribution
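A quick sketch computing the expected value, variance, and standard deviation of a fair die roll directly from the definitions:

```python
import numpy as np

# Fair six-sided die: outcomes and their probabilities
x = np.array([1, 2, 3, 4, 5, 6])
p = np.full(6, 1 / 6)

mean = np.sum(x * p)                    # E(X) = 3.5
variance = np.sum((x - mean) ** 2 * p)  # Var(X) = E[(X - E(X))^2], about 2.917
std_dev = np.sqrt(variance)             # sigma, about 1.708

print(mean, variance, std_dev)
```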
Covariance and correlation
- Covariance, denoted as $Cov(X,Y)$, measures the joint variability of two random variables X and Y
- Calculated as $Cov(X,Y) = E[(X - E(X))(Y - E(Y))]$
- A positive covariance indicates that X and Y tend to increase or decrease together, while a negative covariance suggests an inverse relationship
- Correlation, denoted as $\rho(X,Y)$, is a standardized version of covariance that ranges from -1 to 1
- Calculated as $\rho(X,Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$, where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y
- Correlation measures the strength and direction of the linear relationship between two variables
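A brief sketch estimating covariance and correlation from simulated data; the data-generating model (Y as a noisy linear function of X) is chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate X and a noisy linear function of X, so they are positively related
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(scale=0.5, size=10_000)

cov_xy = np.cov(x, y)[0, 1]        # sample covariance, close to 2
corr_xy = np.corrcoef(x, y)[0, 1]  # sample correlation, close to 0.97

print(cov_xy, corr_xy)
```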
Common probability distributions
- Probability distributions are mathematical functions that describe the likelihood of different outcomes for a random variable
- Understanding common probability distributions is essential for modeling and analyzing data in causal inference
Bernoulli and binomial distributions
- Bernoulli distribution models a single trial with two possible outcomes (success or failure), with a fixed probability of success $p$
- Probability mass function: $P(X=1) = p$ and $P(X=0) = 1-p$
- Binomial distribution models the number of successes in a fixed number of independent Bernoulli trials
- Probability mass function: $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$, where $n$ is the number of trials and $k$ is the number of successes
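A short sketch evaluating the Bernoulli and binomial PMFs with `scipy.stats`; the values of n, p, and k are arbitrary.

```python
from scipy import stats

n, p = 10, 0.3                    # 10 independent trials, success probability 0.3

print(stats.bernoulli.pmf(1, p))  # P(X = 1) for a single trial = 0.3
print(stats.binom.pmf(3, n, p))   # P(exactly 3 successes in 10 trials), about 0.267
print(stats.binom.cdf(3, n, p))   # P(at most 3 successes), about 0.650
```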
Poisson distribution
- Models the number of events occurring in a fixed interval of time or space, given a constant average rate of occurrence
- Probability mass function: $P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}$, where $\lambda$ is the average rate of occurrence
- Often used to model rare events, such as the number of defects in a manufacturing process or the number of accidents in a given time period
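A quick sketch of the Poisson PMF for an illustrative average rate of 2 events per interval:

```python
from scipy import stats

lam = 2.0                             # average rate of occurrence per interval

print(stats.poisson.pmf(0, lam))      # P(no events) = e^-2, about 0.135
print(stats.poisson.pmf(3, lam))      # P(exactly 3 events), about 0.180
print(1 - stats.poisson.cdf(4, lam))  # P(more than 4 events), about 0.053
```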
Normal distribution
- Also known as the Gaussian distribution, it is a continuous probability distribution that is symmetric and bell-shaped
- Probability density function: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma$ is the standard deviation
- Many natural phenomena and measurement errors follow a normal distribution
- Central Limit Theorem states that the sum or average of a large number of independent and identically distributed random variables with finite variance will be approximately normally distributed
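A short sketch evaluating the normal PDF and CDF with `scipy.stats`; the mean and standard deviation are illustrative.

```python
from scipy import stats

mu, sigma = 100, 15                  # e.g. a test-score scale

dist = stats.norm(loc=mu, scale=sigma)
print(dist.pdf(100))                 # density at the mean, about 0.0266
print(dist.cdf(130) - dist.cdf(70))  # P(70 < X < 130), about 0.954 (within 2 sigma)
```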
Exponential distribution
- Models the time between events in a Poisson process, or the time until a specific event occurs
- Probability density function: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$, where $\lambda$ is the rate parameter
- Memoryless property: The probability of an event occurring in the next time interval does not depend on how much time has already passed
- Often used to model waiting times, such as the time between customer arrivals or the time until a machine failure
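The simulation below sketches the memoryless property for an arbitrarily chosen rate: among waiting times that have already exceeded s, the chance of lasting another t units matches the unconditional chance of exceeding t.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.5                                # rate parameter

waits = rng.exponential(scale=1 / lam, size=1_000_000)

# Memoryless property: P(X > s + t | X > s) = P(X > t)
s, t = 1.0, 2.0
lhs = np.mean(waits[waits > s] > s + t)  # conditional survival
rhs = np.mean(waits > t)                 # unconditional survival, = e^(-lam * t), about 0.368
print(lhs, rhs)
```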
Limit theorems
- Limit theorems are fundamental results in probability theory that describe the behavior of random variables and their distributions as the sample size increases
- They are crucial for making inferences and justifying statistical methods in causal inference
Law of large numbers
- States that the average of a large number of independent and identically distributed (i.i.d.) random variables will converge to their expected value as the sample size increases
- Weak law of large numbers: The sample mean converges in probability to the expected value
- Strong law of large numbers: The sample mean converges almost surely to the expected value
- Provides a theoretical justification for using sample averages to estimate population means
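A brief simulation of the law of large numbers for a biased coin with an arbitrary success probability of 0.3: the running sample mean drifts toward the expected value as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                # expected value of each Bernoulli draw

flips = rng.random(100_000) < p
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in (10, 100, 1_000, 100_000):
    print(n, running_mean[n - 1])      # approaches 0.3 as n grows
```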
Central limit theorem
- States that the sum or average of a large number of i.i.d. random variables with finite variance will be approximately normally distributed, regardless of the shape of the underlying distribution
- More precisely, if $X_1, X_2, ..., X_n$ are i.i.d. random variables with mean $\mu$ and variance $\sigma^2$, then $\frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}}$ converges in distribution to a standard normal random variable as $n \to \infty$
- Allows for using normal-based inference methods, such as confidence intervals and hypothesis tests, for non-normal data when the sample size is large
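A small simulation of the central limit theorem using a heavily skewed exponential distribution (sample size and repetition count are arbitrary): the standardized sample means behave approximately like a standard normal variable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 50, 20_000
mu, sigma = 1.0, 1.0                 # mean and std of an Exponential(1) variable

# Draw many samples of size n from a skewed (exponential) distribution
samples = rng.exponential(scale=1.0, size=(reps, n))

# Standardize each sample sum: (sum - n*mu) / (sigma * sqrt(n))
z = (samples.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

# Compare to standard normal behavior: roughly 95% should lie within +/-1.96
print(np.mean(np.abs(z) < 1.96))     # close to 0.95
print(stats.skew(z))                 # skewness shrinks toward 0 as n grows
```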
Convergence in probability vs distribution
- Convergence in probability: A sequence of random variables $X_n$ converges in probability to a random variable $X$ if, for any $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$
- Convergence in distribution: A sequence of random variables $X_n$ converges in distribution to a random variable $X$ if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ for all continuity points $x$ of $F_X$, where $F_{X_n}$ and $F_X$ are the cumulative distribution functions of $X_n$ and $X$, respectively
- Convergence in probability is a stronger notion than convergence in distribution: it implies convergence in distribution, while the converse holds only in special cases (such as convergence to a constant)
- Both types of convergence are important in causal inference for establishing the asymptotic properties of estimators and test statistics
Probability in causal inference
- Probability plays a crucial role in causal inference by quantifying the uncertainty associated with cause-and-effect relationships
- It provides a framework for defining and estimating causal effects, assessing the strength of evidence, and making predictions under different scenarios
Probability of causation
- The probability of causation (PC) is the probability that an observed outcome would not have occurred in the absence of a particular cause, given that the cause was present and the outcome did occur
- Formally, $PC = P(Y_0 = 0 \mid X = 1, Y = 1)$, where $X = 1$ indicates that the cause was present, $Y = 1$ that the outcome was observed, and $Y_0$ is the counterfactual outcome under the absence of the cause
- Quantifies the extent to which a cause is responsible for an observed effect
- Helps in attributing outcomes to specific causes and making causal attributions
Probability of necessity and sufficiency
- The probability of necessity (PN) is the probability that the outcome would not have occurred had the cause been absent, given that both the cause and the outcome were observed
- Formally, $PN = P(Y_0 = 0 \mid X = 1, Y = 1)$, which coincides with the probability of causation above
- The probability of sufficiency (PS) is the probability that the outcome would have occurred had the cause been present, given that both the cause and the outcome were absent
- Formally, $PS = P(Y_1 = 1 \mid X = 0, Y = 0)$
- PN and PS provide information about the causal relationship between a cause and an effect
- High PN suggests that the cause is necessary for the effect, while high PS suggests that the cause is sufficient for the effect
Probability and counterfactuals
- Counterfactuals are hypothetical scenarios that describe what would have happened under different causal conditions
- In causal inference, counterfactuals are used to define causal effects and reason about cause-and-effect relationships
- Probability is used to express the uncertainty associated with counterfactual outcomes
- For example, the average causal effect (ACE) can be defined as $ACE = E[Y_1 - Y_0] = E[Y_1] - E[Y_0]$, where $Y_1$ and $Y_0$ are the potential outcomes under treatment and control, respectively
- Counterfactual probabilities, such as $P(Y_1 = 1)$ and $P(Y_0 = 1)$, are used to estimate causal effects from observational data
- Probability and counterfactuals provide a unified framework for causal reasoning and inference
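As a closing sketch, the simulation below generates both potential outcomes for every unit under a made-up outcome model, computes the true ACE, and shows that a randomized treatment assignment recovers it from observed data alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical binary potential outcomes: treatment raises P(Y = 1) from 0.3 to 0.5
y0 = (rng.random(n) < 0.3).astype(float)  # outcome each unit would have without treatment
y1 = (rng.random(n) < 0.5).astype(float)  # outcome each unit would have with treatment

# True average causal effect, computable only because both outcomes are simulated
ace_true = y1.mean() - y0.mean()          # about 0.2

# In practice each unit reveals only one potential outcome; randomization
# makes the treated/control comparison an unbiased estimate of the ACE
treated = rng.random(n) < 0.5
y_obs = np.where(treated, y1, y0)
ace_est = y_obs[treated].mean() - y_obs[~treated].mean()

print(ace_true, ace_est)                  # both close to 0.2
```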