2.3 Joint probability distributions

🔀Stochastic Processes Unit 2 Review
Written by the Fiveable Content Team • Last updated September 2025

Joint probability distributions are a fundamental concept in stochastic processes, describing how multiple random variables interact. They allow us to model complex systems with multiple uncertain components, providing a framework for analyzing their behavior and making predictions.

These distributions come in discrete and continuous forms, each with unique properties and calculation methods. Understanding marginal and conditional distributions derived from joint distributions is crucial for extracting specific information and updating probabilities based on observed data.

Joint probability distribution definition

  • Joint probability distributions describe the probabilistic relationship between two or more random variables, capturing how likely different combinations of values are to occur simultaneously
  • Allow modeling and analyzing systems or experiments involving multiple uncertain quantities, which is foundational in stochastic processes and many real-world applications

Discrete vs continuous

  • Discrete joint distributions are used when the random variables can only take on a countable number of distinct values (integers, specific categories)
  • Continuous joint distributions apply when the variables have an uncountably infinite range of possible values (real numbers on an interval or the whole real line)
  • The type of joint distribution affects how probabilities are calculated and represented mathematically (sums for discrete, integrals for continuous)

Marginal vs conditional distributions

  • Marginal distributions consider only one variable at a time, ignoring information about the others
    • Obtained by summing (discrete) or integrating (continuous) the joint distribution over the other variables
    • Represents the individual behavior of each component variable
  • Conditional distributions fix the values of some variables and look at the probabilities for the remaining ones
    • Calculated by dividing the joint probability by the marginal of the fixed variables (like Bayes' rule)
    • Shows how the distribution of certain variables changes based on knowledge of others
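
The sketch below illustrates these operations on a small, hypothetical joint PMF stored as a NumPy array (the numbers are made up for illustration): marginals come from summing over an axis, and a conditional comes from dividing a slice of the joint PMF by a marginal.

```python
import numpy as np

# Hypothetical joint PMF of X in {0, 1} and Y in {0, 1, 2}; entry [i, j] is P(X = i, Y = j).
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.20, 0.15]])
assert np.isclose(joint.sum(), 1.0)   # a valid PMF sums to 1

# Marginal PMFs: sum the joint PMF over the other variable.
p_X = joint.sum(axis=1)               # P(X = i) -> [0.40, 0.60]
p_Y = joint.sum(axis=0)               # P(Y = j) -> [0.35, 0.40, 0.25]

# Conditional PMF of Y given X = 1: divide the joint row by the marginal P(X = 1).
p_Y_given_X1 = joint[1, :] / p_X[1]   # -> [0.4167, 0.3333, 0.25], sums to 1
print(p_X, p_Y, p_Y_given_X1)
```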

Joint probability mass functions

  • A joint probability mass function (PMF) gives the probability of each possible combination of values for discrete random variables
  • The PMF is a function $p(x_1, x_2, \ldots, x_n)$ that maps from the possible values of the variables to probabilities between 0 and 1
  • The probabilities for all possible outcomes must sum to 1, a key property of valid PMFs

Discrete random variables

  • PMFs are defined over a countable sample space, the set of all possible combinations of values the discrete random variables can take
  • Common discrete distributions used in multivariate settings include multinomial, Poisson, geometric, and more
  • Many concepts from univariate discrete distributions extend intuitively to the multivariate case (expected values, variance, generating functions)

Multivariate distributions

  • A multivariate distribution is a joint distribution over more than one variable, discrete or continuous
  • Multivariate PMFs can be represented by tables or matrices enumerating the probability of each possible combination of values
  • Sums and other operations on the PMF can be used to derive useful quantities and distributions (marginals, conditionals, moments)

Calculating probabilities

  • Probabilities of events are calculated by summing the PMF values for all outcomes contained in the event
  • For an event $A$ defined by conditions on the variables: $P(A) = \sum_{(x_1,\ldots,x_n) \in A} p(x_1,\ldots,x_n)$
  • The inclusion-exclusion principle and other counting techniques are often helpful in determining which outcomes satisfy the conditions defining an event of interest
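
Continuing the hypothetical PMF from the sketch above, event probabilities are just sums of the entries whose outcomes satisfy the defining condition:

```python
import numpy as np

# Hypothetical joint PMF of X in {0, 1} and Y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.20, 0.15]])
X, Y = np.meshgrid([0, 1], [0, 1, 2], indexing="ij")

# Event A = {X + Y <= 2}: mask the outcome grid and sum the matching probabilities.
mask = (X + Y) <= 2
p_A = joint[mask].sum()
print(p_A)   # every outcome except (X=1, Y=2), so 1 - 0.15 = 0.85
```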

Joint probability density functions

  • A joint probability density function (PDF) is used to specify a continuous multivariate distribution
  • Gives the relative likelihood of different combinations of values, but not directly interpretable as probabilities
  • Probabilities are found by integrating the PDF over a region of interest, not just evaluating it at a point

Continuous random variables

  • Joint PDFs apply to continuous random variables that can take any value in a specified range
  • Common continuous multivariate distributions include multivariate normal, exponential, beta, gamma, and more
  • Densities allow working with continuous quantities (measurements, times, etc.) without discretization

Multivariate density functions

  • A multivariate PDF is a function $f(x_1,\ldots,x_n)$ that gives the joint density of continuous random variables $X_1,\ldots,X_n$
  • Must be non-negative everywhere, and integrate to 1 over the entire domain
  • Can be used to find marginal and conditional PDFs through integration and division similar to the discrete case

Probability calculations with integrals

  • For an event $A$ defined by conditions on the continuous random variables, the probability is given by an integral: $P(A) = \int_{A} f(x_1,\ldots,x_n) dx_1\cdots dx_n$
  • Multiple integrals are often required, taken over the region of the sample space corresponding to event $A$
  • Computational tools and clever manipulations are often needed to evaluate the integrals for complex regions
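
As a sketch of such a calculation, assume the joint density $f(x, y) = x + y$ on the unit square (it integrates to 1 there); scipy.integrate.dblquad evaluates probabilities of regions numerically:

```python
from scipy import integrate

# Assumed joint PDF f(x, y) = x + y on [0, 1] x [0, 1]; dblquad integrates over the inner
# variable (y) first, so the integrand takes its arguments in the order (y, x).
f = lambda y, x: x + y

total, _ = integrate.dblquad(f, 0, 1, lambda x: 0.0, lambda x: 1.0)   # normalization check
p, _ = integrate.dblquad(f, 0, 1, lambda x: 0.0, lambda x: 1.0 - x)   # P(X + Y <= 1)
print(total)   # ~1.0
print(p)       # exact value is 1/3
```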

Joint cumulative distribution functions

  • The joint cumulative distribution function (CDF) of random variables $X_1,\ldots,X_n$ is defined as $F(x_1,\ldots,x_n) = P(X_1 \leq x_1,\ldots, X_n \leq x_n)$
  • Gives the probability that each variable is less than or equal to a specified value simultaneously
  • Applies to both discrete and continuous distributions, unifying the PMF and PDF perspectives

CDF definition for joint distributions

  • For discrete variables, the joint CDF can be expressed as a sum: $F(x_1,\ldots,x_n) = \sum_{y_1 \leq x_1} \cdots \sum_{y_n \leq x_n} p(y_1,\ldots,y_n)$
  • In the continuous case, the CDF is an integral: $F(x_1,\ldots,x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(y_1,\ldots,y_n) dy_n \cdots dy_1$
  • The CDF is the fundamental way to specify any multivariate distribution, from which other representations can be derived
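
For a discrete joint PMF stored as an array, the nested sums in the CDF formula are just cumulative sums along each axis; a minimal sketch with the hypothetical PMF used earlier:

```python
import numpy as np

# Hypothetical joint PMF of X in {0, 1} and Y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.20, 0.15]])

# Joint CDF on the grid: F[i, j] = P(X <= i, Y <= j).
F = joint.cumsum(axis=0).cumsum(axis=1)
print(F)
# F[0, 1] = 0.10 + 0.20 = 0.30 and F[-1, -1] = 1.0, as expected
```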

Properties of joint CDFs

  • Joint CDFs are monotonically increasing in each argument: if $x_i \leq y_i$ for all $i$, then $F(x_1,\ldots,x_n) \leq F(y_1,\ldots,y_n)$
  • Marginal CDFs can be found by taking limits as the other arguments go to infinity: $\lim_{x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n \to \infty} F(x_1,\ldots,x_n) = F_i(x_i)$
  • The joint CDF converges to 1 as all arguments go to infinity, and to 0 if any argument goes to $-\infty$

Relationship to probability

  • The joint CDF evaluated at particular values gives the probability of the random variables falling in the rectangular region bounded above by those values
  • For two variables: $P(a_1 < X_1 \leq b_1, a_2 < X_2 \leq b_2) = F(b_1,b_2) - F(a_1,b_2) - F(b_1,a_2) + F(a_1,a_2)$; in general, the probability of the box $(a_1,b_1] \times \cdots \times (a_n,b_n]$ is an alternating sum of the CDF over its $2^n$ corners, with sign $(-1)^k$ when $k$ of the arguments are lower endpoints $a_i$
  • Intuitively, the probability is found by including and excluding the relevant corners of the rectangular region
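
A sketch of the two-variable identity, assuming a bivariate normal distribution (chosen only because its joint CDF is available as scipy.stats.multivariate_normal.cdf):

```python
from scipy.stats import multivariate_normal

# Assumed joint distribution: standard bivariate normal with correlation 0.5.
rv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]])
F = lambda x1, x2: rv.cdf([x1, x2])

a1, b1 = -1.0, 1.0
a2, b2 = -0.5, 2.0

# Inclusion-exclusion over the four corners of the rectangle (a1, b1] x (a2, b2].
p_rect = F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)
print(p_rect)
```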

Independent vs dependent variables

  • Independence and dependence describe the relationship between random variables in a joint distribution
  • Determine whether knowing the value of one variable provides any information about the likely values of the others
  • Have significant implications for inference, sampling, and many applications of joint distributions

Definition of independence

  • Random variables $X_1,\ldots,X_n$ are independent if their joint PMF or PDF factors as a product of marginals: $p(x_1,\ldots,x_n) = p_1(x_1) \cdots p_n(x_n)$ or $f(x_1,\ldots,x_n) = f_1(x_1) \cdots f_n(x_n)$
  • Intuitively, the variables are independent if knowing the values of some of them provides no information about the probabilities of the others
  • Independent variables can be treated separately, simplifying analysis and allowing results from univariate distributions to be applied more easily

Factoring joint distributions

  • For independent variables, the joint PMF, PDF, or CDF can be written as a product of the marginal distributions for each variable
  • This factorization greatly simplifies working with the joint distribution, as the individual variables can be considered in isolation
  • Many results for sums and transformations of independent random variables rely on this product structure
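
A minimal numerical check of the factorization: a joint PMF corresponds to independent variables exactly when it equals the outer product of its marginals.

```python
import numpy as np

def is_independent(joint, tol=1e-12):
    """True if the joint PMF factors as the outer product of its marginals."""
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return np.allclose(joint, np.outer(p_x, p_y), atol=tol)

indep = np.outer([0.4, 0.6], [0.5, 0.5])   # built as a product, so independent
dep = np.array([[0.5, 0.0],                # all mass on the diagonal: strongly dependent
                [0.0, 0.5]])

print(is_independent(indep))   # True
print(is_independent(dep))     # False
```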

Conditional distributions for dependence

  • If random variables are not independent, their conditional distributions provide a way to describe the dependence between them
  • The conditional PMF or PDF of $X_1,\ldots,X_k$ given $X_{k+1},\ldots,X_n$ is defined as $p(x_1,\ldots,x_k | x_{k+1},\ldots,x_n) = \frac{p(x_1,\ldots,x_n)}{p(x_{k+1},\ldots,x_n)}$ or $f(x_1,\ldots,x_k | x_{k+1},\ldots,x_n) = \frac{f(x_1,\ldots,x_n)}{f(x_{k+1},\ldots,x_n)}$
  • Conditional distributions allow updating probabilities based on observed values, a key idea in Bayesian inference and many applications

Covariance and correlation

  • Covariance and correlation are two measures of the linear dependence between random variables
  • Provide a way to quantify the strength and direction of any linear relationship
  • Are important summary statistics for multivariate data and appear in many formulas related to joint distributions

Measures of dependence

  • The covariance between random variables $X$ and $Y$ is defined as $\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$
    • Measures the joint variability of the variables around their means
    • Is positive when larger values of one variable tend to occur with larger values of the other, and negative when larger values of one tend to occur with smaller values of the other
  • The correlation between $X$ and $Y$ is defined as $\rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}$
    • Normalizes the covariance to be between -1 and 1, allowing comparison across different scales
    • Measures the linear relationship: $\rho = \pm 1$ implies a perfect linear relationship, while $\rho = 0$ implies no linear relationship (but a nonlinear relationship may exist)
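
These formulas can be applied directly to a joint PMF; the sketch below reuses the hypothetical table from earlier and computes $\text{Cov}(X,Y)$ and $\rho(X,Y)$ exactly.

```python
import numpy as np

# Hypothetical joint PMF of X in {0, 1} and Y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.20, 0.15]])
X, Y = np.meshgrid([0.0, 1.0], [0.0, 1.0, 2.0], indexing="ij")

E_X = (X * joint).sum()              # 0.60
E_Y = (Y * joint).sum()              # 0.90
E_XY = (X * Y * joint).sum()         # 0.50

cov = E_XY - E_X * E_Y               # Cov(X, Y) = E[XY] - E[X]E[Y] = -0.04
var_X = ((X - E_X) ** 2 * joint).sum()
var_Y = ((Y - E_Y) ** 2 * joint).sum()
rho = cov / np.sqrt(var_X * var_Y)   # about -0.106

print(cov, rho)
```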

Covariance matrix

  • The covariance matrix $\Sigma$ of a random vector $\mathbf{X} = (X_1,\ldots,X_n)$ is an $n \times n$ matrix whose $(i,j)$ entry is $\text{Cov}(X_i,X_j)$
  • Summarizes all pairwise covariances between the components of the random vector
  • Is symmetric and positive semi-definite, with diagonal entries equal to the variances of each component
  • Appears in multivariate versions of Chebyshev's inequality, the weak law of large numbers, and the central limit theorem
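
With sample data, NumPy estimates the covariance matrix directly; the sketch below (using assumed synthetic data) also verifies symmetry and positive semi-definiteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: 10,000 observations of a 3-component random vector built
# by mixing independent standard normals through a lower-triangular matrix.
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
samples = rng.standard_normal((10_000, 3)) @ L.T    # rows are observations

Sigma = np.cov(samples, rowvar=False)               # columns treated as variables

print(np.allclose(Sigma, Sigma.T))                  # symmetric
print(np.all(np.linalg.eigvalsh(Sigma) >= -1e-10))  # positive semi-definite
print(np.diag(Sigma))                               # diagonal entries are the sample variances
```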

Correlation coefficient

  • The correlation coefficient matrix $\mathbf{R}$ has $(i,j)$ entry equal to the correlation $\rho(X_i,X_j)$
  • Is the covariance matrix of the standardized variables $(X_i - \mu_i)/\sigma_i$, where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of $X_i$
  • Has diagonal entries of 1 and off-diagonal entries between -1 and 1
  • Is often easier to interpret than the covariance matrix due to the normalized scale
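
A quick numerical check of the relationship between the two matrices: standardize each component and take the covariance matrix of the result, which should reproduce np.corrcoef (synthetic data assumed, as above).

```python
import numpy as np

rng = np.random.default_rng(1)
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
samples = rng.standard_normal((10_000, 3)) @ L.T

# Correlation matrix directly...
R = np.corrcoef(samples, rowvar=False)

# ...and as the covariance matrix of the standardized columns.
z = (samples - samples.mean(axis=0)) / samples.std(axis=0, ddof=1)
R_check = np.cov(z, rowvar=False)

print(np.allclose(R, R_check))   # True: the two constructions agree
print(np.diag(R))                # ones on the diagonal
```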

Transformations of random vectors

  • Transformations of random vectors are used to create new random variables or vectors from existing ones
  • Often used to simplify calculations, standardize variables, or obtain distributions with desirable properties
  • The distribution of the transformed variables can be found using the joint distribution of the original variables

Linear transformations

  • A linear transformation of a random vector $\mathbf{X} = (X_1,\ldots,X_n)$ is a new vector $\mathbf{Y} = (Y_1,\ldots,Y_m)$ defined by $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ for an $m \times n$ matrix $\mathbf{A}$ and $m \times 1$ vector $\mathbf{b}$
  • The mean vector and covariance matrix of $\mathbf{Y}$ are given by $E[\mathbf{Y}] = \mathbf{A}E[\mathbf{X}] + \mathbf{b}$ and $\text{Cov}(\mathbf{Y}) = \mathbf{A}\text{Cov}(\mathbf{X})\mathbf{A}^T$
  • Many important results in statistics and signal processing involve linear transformations of random vectors (principal component analysis, filtering, etc.)
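
A Monte Carlo sketch (with an assumed $\mathbf{A}$, $\mathbf{b}$, and input distribution) checking both formulas:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed input: X is 2-dimensional Gaussian with independent components.
mean_X = np.array([1.0, 2.0])
cov_X = np.diag([1.0, 4.0])
X = rng.multivariate_normal(mean_X, cov_X, size=200_000)   # rows are samples of X

# Assumed linear transformation Y = A X + b mapping R^2 into R^3.
A = np.array([[2.0, 1.0],
              [0.0, 3.0],
              [1.0, -1.0]])
b = np.array([0.5, -1.0, 2.0])
Y = X @ A.T + b

# Compare sample moments with the theoretical E[Y] = A E[X] + b and Cov(Y) = A Cov(X) A^T.
print(np.abs(Y.mean(axis=0) - (A @ mean_X + b)).max())           # close to 0
print(np.abs(np.cov(Y, rowvar=False) - A @ cov_X @ A.T).max())   # close to 0
```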

Jacobian matrix

  • For a general (nonlinear) transformation $\mathbf{Y} = g(\mathbf{X})$, the joint PDF of $\mathbf{Y}$ is related to that of $\mathbf{X}$ by $f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(g^{-1}(\mathbf{y})) |\det(J_{g^{-1}}(\mathbf{y}))|$
  • $J_{g^{-1}}(\mathbf{y})$ is the Jacobian matrix of the inverse transformation $\mathbf{X} = g^{-1}(\mathbf{Y})$, with $(i,j)$ entry equal to $\frac{\partial x_i}{\partial y_j}$
  • The Jacobian matrix accounts for how the transformation stretches or compresses regions of the sample space, affecting the probability density
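
As a small worked example, take $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. The inverse transformation is $X_1 = (Y_1 + Y_2)/2$, $X_2 = (Y_1 - Y_2)/2$, so

$$J_{g^{-1}}(\mathbf{y}) = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix}, \qquad |\det J_{g^{-1}}(\mathbf{y})| = \tfrac{1}{2}, \qquad f_{\mathbf{Y}}(y_1, y_2) = \tfrac{1}{2}\, f_{\mathbf{X}}\!\left(\tfrac{y_1 + y_2}{2}, \tfrac{y_1 - y_2}{2}\right)$$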

Distribution of transformed variables

  • The joint CDF of $\mathbf{Y} = g(\mathbf{X})$ is given by $F_{\mathbf{Y}}(\mathbf{y}) = P(g(\mathbf{X}) \leq \mathbf{y}) = \int_{g(\mathbf{x}) \leq \mathbf{y}} f_{\mathbf{X}}(\mathbf{x}) d\mathbf{x}$
    • The region of integration is the set of $\mathbf{x}$ values that map into the rectangle $(-\infty,y_1] \times \cdots \times (-\infty,y_m]$ under $g$
  • For invertible linear transformations $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ (with $\mathbf{A}$ square and nonsingular), the Jacobian formula applies with $J_{g^{-1}}(\mathbf{y}) = \mathbf{A}^{-1}$, giving $f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(\mathbf{A}^{-1}(\mathbf{y} - \mathbf{b}))\, |\det \mathbf{A}|^{-1}$
  • In the discrete case, the PMF of $\mathbf{Y}$ is given by $p_{\mathbf{Y}}(\mathbf{y}) = \sum_{\mathbf{x}: g(\mathbf{x}) = \mathbf{y}} p_{\mathbf{X}}(\mathbf{x})$

Sums of random variables

  • Sums of random variables arise in many applications, such as repeated measurements, cumulative effects, or aggregations
  • The distribution of a sum depends on the joint distribution of the individual variables being added together
  • Convolutions provide a general way to find the distribution of sums in both the discrete and continuous cases

Convolution for discrete variables

  • For independent discrete random variables $X$ and $Y$ with PMFs $p_X$ and $p_Y$, the PMF of their sum $Z = X + Y$ is given by the convolution sum: $p_Z(z) = \sum_k p_X(k)p_Y(z-k)$
    • The convolution evaluates the probability of all ways to achieve a sum of $z$ by adding values of $X$ and $Y$
  • The convolution sum extends to more than two variables: $p_{X_1 + \cdots + X_n}(z) = \sum_{k_1 + \cdots + k_n = z} p_{X_1}(k_1) \cdots p_{X_n}(k_n)$
  • Convolution sums can be efficiently computed using generating functions or Fourier transforms
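
A minimal sketch with np.convolve: the PMF of the total of two independent fair dice is the convolution of their (identical) PMFs.

```python
import numpy as np

# PMF of one fair six-sided die on the values 1..6.
die = np.full(6, 1 / 6)

# Convolving the PMF with itself gives the PMF of the sum of two independent dice;
# the 11 entries correspond to totals 2 through 12.
two_dice = np.convolve(die, die)

for total, prob in zip(range(2, 13), two_dice):
    print(total, round(prob, 4))   # peaks at 7 with probability 6/36
```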

Convolution integral for continuous variables

  • For independent continuous random variables $X$ and $Y$ with PDFs $f_X$ and $f_Y$, the PDF of their sum $Z = X + Y$ is given by the convolution integral: $f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\, dx$