2.3 Joint probability distributions

🔀Stochastic Processes Unit 2 Review
Written by the Fiveable Content Team • Last updated September 2025

Joint probability distributions are a fundamental concept in stochastic processes, describing how multiple random variables interact. They allow us to model complex systems with multiple uncertain components, providing a framework for analyzing their behavior and making predictions.

These distributions come in discrete and continuous forms, each with unique properties and calculation methods. Understanding marginal and conditional distributions derived from joint distributions is crucial for extracting specific information and updating probabilities based on observed data.

Joint probability distribution definition

  • Joint probability distributions describe the probabilistic relationship between two or more random variables, capturing how likely different combinations of values are to occur simultaneously
  • Allow modeling and analyzing systems or experiments involving multiple uncertain quantities, which is foundational in stochastic processes and many real-world applications

Discrete vs continuous

  • Discrete joint distributions are used when the random variables can only take on a countable number of distinct values (integers, specific categories)
  • Continuous joint distributions apply when the variables have an uncountably infinite range of possible values (real numbers on an interval or the whole real line)
  • The type of joint distribution affects how probabilities are calculated and represented mathematically (sums for discrete, integrals for continuous)

Marginal vs conditional distributions

  • Marginal distributions consider only one variable at a time, ignoring information about the others
    • Obtained by summing (discrete) or integrating (continuous) the joint distribution over the other variables
    • Represents the individual behavior of each component variable
  • Conditional distributions fix the values of some variables and look at the probabilities for the remaining ones
    • Calculated by dividing the joint probability by the marginal of the fixed variables (like Bayes' rule)
    • Shows how the distribution of certain variables changes based on knowledge of others
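
The sketch below illustrates these operations on a small, hypothetical joint PMF stored as a NumPy array (the numbers are made up for illustration): marginals come from summing over an axis, and a conditional comes from dividing a slice of the joint PMF by a marginal.

```python
import numpy as np

# Hypothetical joint PMF of X in {0, 1} and Y in {0, 1, 2}; entry [i, j] is P(X = i, Y = j).
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.20, 0.15]])
assert np.isclose(joint.sum(), 1.0)   # a valid PMF sums to 1

# Marginal PMFs: sum the joint PMF over the other variable.
p_X = joint.sum(axis=1)               # P(X = i) -> [0.40, 0.60]
p_Y = joint.sum(axis=0)               # P(Y = j) -> [0.35, 0.40, 0.25]

# Conditional PMF of Y given X = 1: divide the joint row by the marginal P(X = 1).
p_Y_given_X1 = joint[1, :] / p_X[1]   # -> [0.4167, 0.3333, 0.25], sums to 1
print(p_X, p_Y, p_Y_given_X1)
```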

Joint probability mass functions

  • A joint probability mass function (PMF) gives the probability of each possible combination of values for discrete random variables
  • The PMF is a function $p(x_1, x_2, \ldots, x_n)$ that maps from the possible values of the variables to probabilities between 0 and 1
  • The probabilities for all possible outcomes must sum to 1, a key property of valid PMFs

Discrete random variables

  • PMFs are defined over a countable sample space, the set of all possible combinations of values the discrete random variables can take
  • Common discrete distributions used in multivariate settings include multinomial, Poisson, geometric, and more
  • Many concepts from univariate discrete distributions extend intuitively to the multivariate case (expected values, variance, generating functions)

Multivariate distributions

  • A multivariate distribution is a joint distribution over more than one variable, discrete or continuous
  • Multivariate PMFs can be represented by tables or matrices enumerating the probability of each possible combination of values
  • Sums and other operations on the PMF can be used to derive useful quantities and distributions (marginals, conditionals, moments)

Calculating probabilities

  • Probabilities of events are calculated by summing the PMF values for all outcomes contained in the event
  • For an event $A$ defined by conditions on the variables: $P(A) = \sum_{(x_1,\ldots,x_n) \in A} p(x_1,\ldots,x_n)$
  • The inclusion-exclusion principle and other counting techniques are often helpful in determining which outcomes satisfy the conditions defining an event of interest
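
Continuing the hypothetical PMF from the sketch above, event probabilities are just sums of the entries whose outcomes satisfy the defining condition:

```python
import numpy as np

# Hypothetical joint PMF of X in {0, 1} and Y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.20, 0.15]])
X, Y = np.meshgrid([0, 1], [0, 1, 2], indexing="ij")

# Event A = {X + Y <= 2}: mask the outcome grid and sum the matching probabilities.
mask = (X + Y) <= 2
p_A = joint[mask].sum()
print(p_A)   # every outcome except (X=1, Y=2), so 1 - 0.15 = 0.85
```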

Joint probability density functions

  • A joint probability density function (PDF) is used to specify a continuous multivariate distribution
  • Gives the relative likelihood of different combinations of values, but not directly interpretable as probabilities
  • Probabilities are found by integrating the PDF over a region of interest, not just evaluating it at a point

Continuous random variables

  • Joint PDFs apply to continuous random variables that can take any value in a specified range
  • Common continuous multivariate distributions include multivariate normal, exponential, beta, gamma, and more
  • Densities allow working with continuous quantities (measurements, times, etc.) without discretization

Multivariate density functions

  • A multivariate PDF is a function $f(x_1,\ldots,x_n)$ that gives the joint density of continuous random variables $X_1,\ldots,X_n$
  • Must be non-negative everywhere, and integrate to 1 over the entire domain
  • Can be used to find marginal and conditional PDFs through integration and division similar to the discrete case

Probability calculations with integrals

  • For an event $A$ defined by conditions on the continuous random variables, the probability is given by an integral: $P(A) = \int_{A} f(x_1,\ldots,x_n) dx_1\cdots dx_n$
  • Multiple integrals are often required, taken over the region of the sample space corresponding to event $A$
  • Computational tools and clever manipulations are often needed to evaluate the integrals for complex regions
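
As a sketch of such a calculation, assume the joint density $f(x, y) = x + y$ on the unit square (it integrates to 1 there); scipy.integrate.dblquad evaluates probabilities of regions numerically:

```python
from scipy import integrate

# Assumed joint PDF f(x, y) = x + y on [0, 1] x [0, 1]; dblquad integrates over the inner
# variable (y) first, so the integrand takes its arguments in the order (y, x).
f = lambda y, x: x + y

total, _ = integrate.dblquad(f, 0, 1, lambda x: 0.0, lambda x: 1.0)   # normalization check
p, _ = integrate.dblquad(f, 0, 1, lambda x: 0.0, lambda x: 1.0 - x)   # P(X + Y <= 1)
print(total)   # ~1.0
print(p)       # exact value is 1/3
```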

Joint cumulative distribution functions

  • The joint cumulative distribution function (CDF) of random variables $X_1,\ldots,X_n$ is defined as $F(x_1,\ldots,x_n) = P(X_1 \leq x_1,\ldots, X_n \leq x_n)$
  • Gives the probability that each variable is less than or equal to a specified value simultaneously
  • Applies to both discrete and continuous distributions, unifying the PMF and PDF perspectives

CDF definition for joint distributions

  • For discrete variables, the joint CDF can be expressed as a sum: $F(x_1,\ldots,x_n) = \sum_{y_1 \leq x_1} \cdots \sum_{y_n \leq x_n} p(y_1,\ldots,y_n)$
  • In the continuous case, the CDF is an integral: $F(x_1,\ldots,x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(y_1,\ldots,y_n) dy_n \cdots dy_1$
  • The CDF is the fundamental way to specify any multivariate distribution, from which other representations can be derived
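
For a discrete joint PMF stored as an array, the nested sums in the CDF formula are just cumulative sums along each axis; a minimal sketch with the hypothetical PMF used earlier:

```python
import numpy as np

# Hypothetical joint PMF of X in {0, 1} and Y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.20, 0.15]])

# Joint CDF on the grid: F[i, j] = P(X <= i, Y <= j).
F = joint.cumsum(axis=0).cumsum(axis=1)
print(F)
# F[0, 1] = 0.10 + 0.20 = 0.30 and F[-1, -1] = 1.0, as expected
```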

Properties of joint CDFs

  • Joint CDFs are monotonically increasing in each argument: if $x_i \leq y_i$ for all $i$, then $F(x_1,\ldots,x_n) \leq F(y_1,\ldots,y_n)$
  • Marginal CDFs can be found by taking limits as the other arguments go to infinity: $\lim_{x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n \to \infty} F(x_1,\ldots,x_n) = F_i(x_i)$
  • The joint CDF converges to 1 as all arguments go to infinity, and to 0 if any argument goes to $-\infty$

Relationship to probability

  • The joint CDF evaluated at particular values gives the probability of the random variables falling in the rectangular region bounded above by those values
  • For two variables: $P(a_1 < X_1 \leq b_1, a_2 < X_2 \leq b_2) = F(b_1,b_2) - F(a_1,b_2) - F(b_1,a_2) + F(a_1,a_2)$; in general, the probability of the box $(a_1,b_1] \times \cdots \times (a_n,b_n]$ is an alternating sum of the CDF over its $2^n$ corners, with sign $(-1)^k$ when $k$ of the arguments are lower endpoints $a_i$
  • Intuitively, the probability is found by including and excluding the relevant corners of the rectangular region
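
A sketch of the two-variable identity, assuming a bivariate normal distribution (chosen only because its joint CDF is available as scipy.stats.multivariate_normal.cdf):

```python
from scipy.stats import multivariate_normal

# Assumed joint distribution: standard bivariate normal with correlation 0.5.
rv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]])
F = lambda x1, x2: rv.cdf([x1, x2])

a1, b1 = -1.0, 1.0
a2, b2 = -0.5, 2.0

# Inclusion-exclusion over the four corners of the rectangle (a1, b1] x (a2, b2].
p_rect = F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)
print(p_rect)
```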

Independent vs dependent variables

  • Independence and dependence describe the relationship between random variables in a joint distribution
  • Determine whether knowing the value of one variable provides any information about the likely values of the others
  • Have significant implications for inference, sampling, and many applications of joint distributions

Definition of independence

  • Random variables $X_1,\ldots,X_n$ are independent if their joint PMF or PDF factors as a product of marginals: $p(x_1,\ldots,x_n) = p_1(x_1) \cdots p_n(x_n)$ or $f(x_1,\ldots,x_n) = f_1(x_1) \cdots f_n(x_n)$
  • Intuitively, the variables are independent if knowing the values of some of them provides no information about the probabilities of the others
  • Independent variables can be treated separately, simplifying analysis and allowing results from univariate distributions to be applied more easily

Factoring joint distributions

  • For independent variables, the joint PMF, PDF, or CDF can be written as a product of the marginal distributions for each variable
  • This factorization greatly simplifies working with the joint distribution, as the individual variables can be considered in isolation
  • Many results for sums and transformations of independent random variables rely on this product structure
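
A minimal numerical check of the factorization: a joint PMF corresponds to independent variables exactly when it equals the outer product of its marginals.

```python
import numpy as np

def is_independent(joint, tol=1e-12):
    """True if the joint PMF factors as the outer product of its marginals."""
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return np.allclose(joint, np.outer(p_x, p_y), atol=tol)

indep = np.outer([0.4, 0.6], [0.5, 0.5])   # built as a product, so independent
dep = np.array([[0.5, 0.0],                # all mass on the diagonal: strongly dependent
                [0.0, 0.5]])

print(is_independent(indep))   # True
print(is_independent(dep))     # False
```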

Conditional distributions for dependence

  • If random variables are not independent, their conditional distributions provide a way to describe the dependence between them
  • The conditional PMF or PDF of $X_1,\ldots,X_k$ given $X_{k+1},\ldots,X_n$ is defined as $p(x_1,\ldots,x_k | x_{k+1},\ldots,x_n) = \frac{p(x_1,\ldots,x_n)}{p(x_{k+1},\ldots,x_n)}$ or $f(x_1,\ldots,x_k | x_{k+1},\ldots,x_n) = \frac{f(x_1,\ldots,x_n)}{f(x_{k+1},\ldots,x_n)}$
  • Conditional distributions allow updating probabilities based on observed values, a key idea in Bayesian inference and many applications

Covariance and correlation

  • Covariance and correlation are two measures of the linear dependence between random variables
  • Provide a way to quantify the strength and direction of any linear relationship
  • Are important summary statistics for multivariate data and appear in many formulas related to joint distributions

Measures of dependence

  • The covariance between random variables $X$ and $Y$ is defined as $\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$
    • Measures the joint variability of the variables around their means
    • Is positive when larger values of one variable tend to occur with larger values of the other, and negative when larger values of one tend to occur with smaller values of the other
  • The correlation between $X$ and $Y$ is defined as $\rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}$
    • Normalizes the covariance to be between -1 and 1, allowing comparison across different scales
    • Measures the linear relationship: $\rho = \pm 1$ implies a perfect linear relationship, while $\rho = 0$ implies no linear relationship (but a nonlinear relationship may exist)
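
These formulas can be applied directly to a joint PMF; the sketch below reuses the hypothetical table from earlier and computes $\text{Cov}(X,Y)$ and $\rho(X,Y)$ exactly.

```python
import numpy as np

# Hypothetical joint PMF of X in {0, 1} and Y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.20, 0.15]])
X, Y = np.meshgrid([0.0, 1.0], [0.0, 1.0, 2.0], indexing="ij")

E_X = (X * joint).sum()              # 0.60
E_Y = (Y * joint).sum()              # 0.90
E_XY = (X * Y * joint).sum()         # 0.50

cov = E_XY - E_X * E_Y               # Cov(X, Y) = E[XY] - E[X]E[Y] = -0.04
var_X = ((X - E_X) ** 2 * joint).sum()
var_Y = ((Y - E_Y) ** 2 * joint).sum()
rho = cov / np.sqrt(var_X * var_Y)   # about -0.106

print(cov, rho)
```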

Covariance matrix

  • The covariance matrix $\Sigma$ of a random vector $\mathbf{X} = (X_1,\ldots,X_n)$ is an $n \times n$ matrix whose $(i,j)$ entry is $\text{Cov}(X_i,X_j)$
  • Summarizes all pairwise covariances between the components of the random vector
  • Is symmetric and positive semi-definite, with diagonal entries equal to the variances of each component
  • Appears in multivariate versions of Chebyshev's inequality, the weak law of large numbers, and the central limit theorem
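
With sample data, NumPy estimates the covariance matrix directly; the sketch below (using assumed synthetic data) also verifies symmetry and positive semi-definiteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: 10,000 observations of a 3-component random vector built
# by mixing independent standard normals through a lower-triangular matrix.
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
samples = rng.standard_normal((10_000, 3)) @ L.T    # rows are observations

Sigma = np.cov(samples, rowvar=False)               # columns treated as variables

print(np.allclose(Sigma, Sigma.T))                  # symmetric
print(np.all(np.linalg.eigvalsh(Sigma) >= -1e-10))  # positive semi-definite
print(np.diag(Sigma))                               # diagonal entries are the sample variances
```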

Correlation coefficient

  • The correlation coefficient matrix $\mathbf{R}$ has $(i,j)$ entry equal to the correlation $\rho(X_i,X_j)$
  • Is the covariance matrix of the standardized variables $(X_i - \mu_i)/\sigma_i$, where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of $X_i$
  • Has diagonal entries of 1 and off-diagonal entries between -1 and 1
  • Is often easier to interpret than the covariance matrix due to the normalized scale
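
A quick numerical check of the relationship between the two matrices: standardize each component and take the covariance matrix of the result, which should reproduce np.corrcoef (synthetic data assumed, as above).

```python
import numpy as np

rng = np.random.default_rng(1)
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
samples = rng.standard_normal((10_000, 3)) @ L.T

# Correlation matrix directly...
R = np.corrcoef(samples, rowvar=False)

# ...and as the covariance matrix of the standardized columns.
z = (samples - samples.mean(axis=0)) / samples.std(axis=0, ddof=1)
R_check = np.cov(z, rowvar=False)

print(np.allclose(R, R_check))   # True: the two constructions agree
print(np.diag(R))                # ones on the diagonal
```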

Transformations of random vectors

  • Transformations of random vectors are used to create new random variables or vectors from existing ones
  • Often used to simplify calculations, standardize variables, or obtain distributions with desirable properties
  • The distribution of the transformed variables can be found using the joint distribution of the original variables

Linear transformations

  • A linear transformation of a random vector $\mathbf{X} = (X_1,\ldots,X_n)$ is a new vector $\mathbf{Y} = (Y_1,\ldots,Y_m)$ defined by $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ for an $m \times n$ matrix $\mathbf{A}$ and $m \times 1$ vector $\mathbf{b}$
  • The mean vector and covariance matrix of $\mathbf{Y}$ are given by $E[\mathbf{Y}] = \mathbf{A}E[\mathbf{X}] + \mathbf{b}$ and $\text{Cov}(\mathbf{Y}) = \mathbf{A}\text{Cov}(\mathbf{X})\mathbf{A}^T$
  • Many important results in statistics and signal processing involve linear transformations of random vectors (principal component analysis, filtering, etc.)
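
A Monte Carlo sketch (with an assumed $\mathbf{A}$, $\mathbf{b}$, and input distribution) checking both formulas:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed input: X is 2-dimensional Gaussian with independent components.
mean_X = np.array([1.0, 2.0])
cov_X = np.diag([1.0, 4.0])
X = rng.multivariate_normal(mean_X, cov_X, size=200_000)   # rows are samples of X

# Assumed linear transformation Y = A X + b mapping R^2 into R^3.
A = np.array([[2.0, 1.0],
              [0.0, 3.0],
              [1.0, -1.0]])
b = np.array([0.5, -1.0, 2.0])
Y = X @ A.T + b

# Compare sample moments with the theoretical E[Y] = A E[X] + b and Cov(Y) = A Cov(X) A^T.
print(np.abs(Y.mean(axis=0) - (A @ mean_X + b)).max())           # close to 0
print(np.abs(np.cov(Y, rowvar=False) - A @ cov_X @ A.T).max())   # close to 0
```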

Jacobian matrix

  • For a general (nonlinear) transformation $\mathbf{Y} = g(\mathbf{X})$, the joint PDF of $\mathbf{Y}$ is related to that of $\mathbf{X}$ by $f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(g^{-1}(\mathbf{y})) |\det(J_{g^{-1}}(\mathbf{y}))|$
  • $J_{g^{-1}}(\mathbf{y})$ is the Jacobian matrix of the inverse transformation $\mathbf{X} = g^{-1}(\mathbf{Y})$, with $(i,j)$ entry equal to $\frac{\partial x_i}{\partial y_j}$
  • The Jacobian matrix accounts for how the transformation stretches or compresses regions of the sample space, affecting the probability density
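
As a small worked example, take $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. The inverse transformation is $X_1 = (Y_1 + Y_2)/2$, $X_2 = (Y_1 - Y_2)/2$, so

$$J_{g^{-1}}(\mathbf{y}) = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix}, \qquad |\det J_{g^{-1}}(\mathbf{y})| = \tfrac{1}{2}, \qquad f_{\mathbf{Y}}(y_1, y_2) = \tfrac{1}{2}\, f_{\mathbf{X}}\!\left(\tfrac{y_1 + y_2}{2}, \tfrac{y_1 - y_2}{2}\right)$$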

Distribution of transformed variables

  • The joint CDF of $\mathbf{Y} = g(\mathbf{X})$ is given by $F_{\mathbf{Y}}(\mathbf{y}) = P(g(\mathbf{X}) \leq \mathbf{y}) = \int_{g(\mathbf{x}) \leq \mathbf{y}} f_{\mathbf{X}}(\mathbf{x}) d\mathbf{x}$
    • The region of integration is the set of $\mathbf{x}$ values that map into the rectangle $(-\infty,y_1] \times \cdots \times (-\infty,y_m]$ under $g$
  • For invertible linear transformations $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ (with $\mathbf{A}$ square and nonsingular), the Jacobian formula applies with $J_{g^{-1}}(\mathbf{y}) = \mathbf{A}^{-1}$, giving $f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(\mathbf{A}^{-1}(\mathbf{y} - \mathbf{b}))\, |\det \mathbf{A}|^{-1}$
  • In the discrete case, the PMF of $\mathbf{Y}$ is given by $p_{\mathbf{Y}}(\mathbf{y}) = \sum_{\mathbf{x}: g(\mathbf{x}) = \mathbf{y}} p_{\mathbf{X}}(\mathbf{x})$

Sums of random variables

  • Sums of random variables arise in many applications, such as repeated measurements, cumulative effects, or aggregations
  • The distribution of a sum depends on the joint distribution of the individual variables being added together
  • Convolutions provide a general way to find the distribution of sums in both the discrete and continuous cases

Convolution for discrete variables

  • For independent discrete random variables $X$ and $Y$ with PMFs $p_X$ and $p_Y$, the PMF of their sum $Z = X + Y$ is given by the convolution sum: $p_Z(z) = \sum_k p_X(k)p_Y(z-k)$
    • The convolution evaluates the probability of all ways to achieve a sum of $z$ by adding values of $X$ and $Y$
  • The convolution sum extends to more than two variables: $p_{X_1 + \cdots + X_n}(z) = \sum_{k_1 + \cdots + k_n = z} p_{X_1}(k_1) \cdots p_{X_n}(k_n)$
  • Convolution sums can be efficiently computed using generating functions or Fourier transforms
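
A minimal sketch with np.convolve: the PMF of the total of two independent fair dice is the convolution of their (identical) PMFs.

```python
import numpy as np

# PMF of one fair six-sided die on the values 1..6.
die = np.full(6, 1 / 6)

# Convolving the PMF with itself gives the PMF of the sum of two independent dice;
# the 11 entries correspond to totals 2 through 12.
two_dice = np.convolve(die, die)

for total, prob in zip(range(2, 13), two_dice):
    print(total, round(prob, 4))   # peaks at 7 with probability 6/36
```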

Convolution integral for continuous variables

  • For independent continuous random variables $X$ and $Y$ with PDFs $f_X$ and $f_Y$, the PDF of their sum $Z = X + Y$ is given by the convolution integral: $f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\, dx$