📊Probability and Statistics Unit 7 Review

7.2 Measures of dispersion

Written by the Fiveable Content Team • Last updated September 2025

Measures of dispersion quantify how spread out the data points in a dataset are. They describe how much values vary around the center of a distribution and help flag unusual observations such as outliers.

Variance, standard deviation, range, and interquartile range are key dispersion measures. Each offers unique perspectives on data spread, with some being more robust to outliers than others. Understanding these measures is essential for effective data analysis and interpretation.

Variance and standard deviation

Variance and standard deviation quantify the spread or dispersion of a dataset around its mean
These measures are essential for understanding the variability and distribution of data in probability and statistics

Population vs sample variance

Population variance $\sigma^2$ represents the average squared deviation from the mean for an entire population
Sample variance $s^2$ estimates the population variance using a subset of data (sample)
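- Uses $n-1$ in the denominator (Bessel's correction) because deviations are measured from the sample mean $\bar{x}$ rather than the true mean $\mu$, which would otherwise bias the estimate downward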
Formulas:
- Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
- Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

Calculating variance

Calculate the mean of the dataset
Subtract the mean from each data point and square the result
Sum the squared differences and divide by the number of data points (or $n-1$ for sample variance)
Example:
- Dataset: 4, 7, 9, 12, 18
- Mean: $\bar{x} = \frac{4+7+9+12+18}{5} = 10$
- Squared differences: $(4-10)^2 = 36$, $(7-10)^2 = 9$, $(9-10)^2 = 1$, $(12-10)^2 = 4$, $(18-10)^2 = 64$
- Sample variance: $s^2 = \frac{36+9+1+4+64}{4} = 28.5$
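The steps above can be sketched in Python. This is a minimal illustration using the same dataset; the variable names are chosen for this example, and the standard library's `statistics.variance` and `statistics.pvariance` are used only as a cross-check.

```python
import statistics

data = [4, 7, 9, 12, 18]

# Step 1: mean of the dataset
mean = sum(data) / len(data)                      # 10.0

# Step 2: squared deviation of each point from the mean
squared_diffs = [(x - mean) ** 2 for x in data]   # [36.0, 9.0, 1.0, 4.0, 64.0]

# Step 3: sum, then divide by N (population) or n - 1 (sample)
sum_sq = sum(squared_diffs)                       # 114.0
population_variance = sum_sq / len(data)
sample_variance = sum_sq / (len(data) - 1)

print(population_variance)  # 22.8
print(sample_variance)      # 28.5

# Cross-check against the standard library
print(statistics.pvariance(data), statistics.variance(data))
```

Treating the five values as a whole population gives 22.8, while treating them as a sample gives the larger value 28.5, because the sum is divided by 4 instead of 5.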

Standard deviation from variance

Standard deviation is the square root of the variance
Measures the typical distance of data points from the mean (strictly, the root-mean-square deviation rather than a simple average of distances)
Has the same units as the original data, making it more interpretable than variance
Formulas:
- Population standard deviation: $\sigma = \sqrt{\sigma^2}$
- Sample standard deviation: $s = \sqrt{s^2}$
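As a quick check of these formulas, the sketch below takes the square root of each variance for the dataset 4, 7, 9, 12, 18 and compares the results with `statistics.stdev` and `statistics.pstdev` from the Python standard library. Variable names are illustrative.

```python
import math
import statistics

data = [4, 7, 9, 12, 18]

# Square root of the sample variance (s^2 = 28.5)
sample_sd = math.sqrt(statistics.variance(data))
# Square root of the population variance (sigma^2 = 22.8)
population_sd = math.sqrt(statistics.pvariance(data))

print(round(sample_sd, 2))      # 5.34
print(round(population_sd, 2))  # 4.77

# The library also provides both quantities directly
print(math.isclose(sample_sd, statistics.stdev(data)))       # True
print(math.isclose(population_sd, statistics.pstdev(data)))  # True
```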

Interpreting standard deviation

A low standard deviation indicates data points are clustered closely around the mean (less dispersion)
A high standard deviation suggests data points are spread out over a wider range (more dispersion)
Approximately 68% of data falls within one standard deviation of the mean in a normal distribution (empirical rule)
Comparing standard deviations allows for assessing the relative variability of different datasets
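The 68% figure from the empirical rule can be reproduced from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is made up for this example. It also prints the shares for two and three standard deviations, which are the usual companions of the 68% figure.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

These percentages apply only to normally distributed data; other distributions can place more or less mass within the same number of standard deviations.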

Range and interquartile range

Range and interquartile range (IQR) are measures of dispersion that do not rely on the mean
IQR is less sensitive to outliers than variance and standard deviation; the range is not, because it depends entirely on the two most extreme values

Calculating range

Range is the difference between the maximum and minimum values in a dataset
Provides a simple measure of the total spread of the data
Formula: $\text{Range} = \max(x) - \min(x)$
Example:
- Dataset: 4, 7, 9, 12, 18
- Range: $18 - 4 = 14$
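In Python the range is a one-line computation. The sketch below also appends a made-up extreme value (100, not part of the original dataset) to show how strongly a single point can move the result.

```python
data = [4, 7, 9, 12, 18]

data_range = max(data) - min(data)
print(data_range)  # 14

# Hypothetical extreme value added purely for illustration
with_extreme = data + [100]
range_with_extreme = max(with_extreme) - min(with_extreme)
print(range_with_extreme)  # 96
```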

Percentiles and quartiles

Percentiles divide a dataset into 100 equal parts
Quartiles divide a dataset into four equal parts (Q1, Q2 or median, Q3)
Calculating quartiles:
- First, arrange the data in ascending order
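- Q1 is the median of the lower half of the data (the values between the minimum and the median)
- Q3 is the median of the upper half of the data (the values between the median and the maximum)
- Conventions differ on whether the median itself belongs to each half when $n$ is odd; the examples here include it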

Interquartile range (IQR)

IQR is the range of the middle 50% of the data
Calculated as the difference between the third quartile (Q3) and the first quartile (Q1)
Formula: $IQR = Q3 - Q1$
Example:
- Dataset: 4, 7, 9, 12, 18
- Q1: 7, Q3: 12
- IQR: $12 - 7 = 5$
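Quartile values depend on the convention used, so it helps to see two of them side by side. The sketch below uses `statistics.quantiles` from the Python standard library (Python 3.8 or later). Its `"inclusive"` method reproduces Q1 = 7 and Q3 = 12 for this dataset, while the default `"exclusive"` method gives different quartiles. Which convention a course or software package uses is something to check rather than assume.

```python
import statistics

data = [4, 7, 9, 12, 18]

# "inclusive" treats the data as containing the min and max of the population
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 7.0 9.0 12.0 5.0

# The default "exclusive" method interpolates differently
e1, e2, e3 = statistics.quantiles(data, n=4, method="exclusive")
iqr_exclusive = e3 - e1
print(e1, e2, e3, iqr_exclusive)  # 5.5 9.0 15.0 9.5
```

With only five values the two conventions disagree noticeably; the gap usually shrinks as the dataset grows.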

Outliers and IQR

IQR can be used to identify potential outliers in a dataset
Outliers are data points that fall below $Q1 - 1.5 \times IQR$ or above $Q3 + 1.5 \times IQR$
Example:
- Dataset: 4, 7, 9, 12, 18
- IQR: 5
- Lower fence: $7 - 1.5 \times 5 = -0.5$
- Upper fence: $12 + 1.5 \times 5 = 19.5$
- No outliers in this dataset
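The fence calculation above can be wrapped in a small helper. This is a sketch under stated assumptions: the function name `iqr_fences` and the parameter `k` are invented for this example, and quartiles come from `statistics.quantiles` with the `"inclusive"` method so that Q1 = 7 and Q3 = 12 match the worked example.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [4, 7, 9, 12, 18]
lower, upper = iqr_fences(data)

# Points outside the fences are flagged as potential outliers
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)  # -0.5 19.5 []
```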

Coefficient of variation

Coefficient of variation (CV) is a standardized measure of dispersion
Useful for comparing the relative variability of datasets with different units or means

Relative variability

CV expresses the standard deviation as a percentage of the mean
Allows for comparing the variability of datasets with different scales
A higher CV indicates greater relative variability

Calculating coefficient of variation

Formula: $\text{CV} = \frac{s}{\bar{x}} \times 100\%$
- $s$ is the sample standard deviation
- $\bar{x}$ is the sample mean
Example:
- Dataset: 4, 7, 9, 12, 18
- Sample mean: $\bar{x} = 10$
- Sample standard deviation: $s \approx 5.34$
- CV: $\frac{5.34}{10} \times 100\% \approx 53.4\%$
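A minimal Python version of this calculation is shown below. The function name `coefficient_of_variation` is illustrative, and the check that rejects a non-positive mean is a design choice made for this sketch, not part of the formula itself.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    mean = statistics.mean(values)
    if mean <= 0:
        # Assumption for this sketch: refuse means that make CV hard to interpret
        raise ValueError("CV is only meaningful for a positive mean")
    return statistics.stdev(values) / mean * 100

data = [4, 7, 9, 12, 18]
cv = coefficient_of_variation(data)
print(round(cv, 1))  # 53.4
```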

Comparing distributions with CV

When comparing datasets, a higher CV indicates greater relative variability
CV is dimensionless, allowing for comparison of variability across different types of data
Limitations:
- Not suitable for datasets with means close to zero or negative means
- Sensitive to small changes in the mean when the mean is close to zero
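To see why a dimensionless measure helps, the sketch below expresses the same five measurements in two units. The interpretation of the values as lengths in metres and centimetres is hypothetical, chosen only to illustrate a change of scale: the standard deviation changes by a factor of 100, while the CV stays the same.

```python
import statistics

def cv_percent(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

lengths_m = [4, 7, 9, 12, 18]               # hypothetical lengths in metres
lengths_cm = [x * 100 for x in lengths_m]   # the same lengths in centimetres

sd_m = statistics.stdev(lengths_m)
sd_cm = statistics.stdev(lengths_cm)
print(round(sd_m, 2), round(sd_cm, 2))  # 5.34 533.85

cv_m = cv_percent(lengths_m)
cv_cm = cv_percent(lengths_cm)
print(round(cv_m, 1), round(cv_cm, 1))  # 53.4 53.4
```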

Mean absolute deviation

Mean absolute deviation (MAD) is another measure of dispersion
Based on the absolute differences between each data point and the mean

Calculating MAD

Calculate the mean of the dataset
Compute the absolute difference between each data point and the mean
Sum the absolute differences and divide by the number of data points
Formula: $MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
Example:
- Dataset: 4, 7, 9, 12, 18
- Mean: $\bar{x} = 10$
- Absolute differences: $|4-10| = 6$, $|7-10| = 3$, $|9-10| = 1$, $|12-10| = 2$, $|18-10| = 8$
- MAD: $\frac{6+3+1+2+8}{5} = 4$
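The same three steps translate directly into Python. Variable names here are chosen for this example.

```python
data = [4, 7, 9, 12, 18]

# Step 1: mean of the dataset
mean = sum(data) / len(data)                      # 10.0

# Step 2: absolute deviation of each point from the mean
abs_diffs = [abs(x - mean) for x in data]         # [6.0, 3.0, 1.0, 2.0, 8.0]

# Step 3: average of the absolute deviations
mean_abs_dev = sum(abs_diffs) / len(data)
print(mean_abs_dev)  # 4.0
```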

MAD vs standard deviation

MAD is less sensitive to outliers compared to standard deviation
Standard deviation squares the differences, giving more weight to larger deviations
MAD may be preferred when a dataset contains moderate outliers, although it still depends on the mean and on every data point, so extreme values can shift it as well
However, standard deviation is more mathematically tractable and widely used in statistical analysis
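The effect of squaring can be seen by adding one extreme value and measuring how much each statistic grows. In the sketch below, the value 60 is a hypothetical outlier added for illustration, and the population standard deviation (`statistics.pstdev`) is used so that both measures divide by the same $n$.

```python
import statistics

def mean_abs_dev(values):
    """Mean absolute deviation around the mean."""
    m = statistics.mean(values)
    return sum(abs(x - m) for x in values) / len(values)

data = [4, 7, 9, 12, 18]
with_outlier = data + [60]  # 60 is a made-up extreme value

# Growth factor of each measure once the outlier is included
mad_growth = mean_abs_dev(with_outlier) / mean_abs_dev(data)
sd_growth = statistics.pstdev(with_outlier) / statistics.pstdev(data)

print(round(mad_growth, 1))  # 3.5
print(round(sd_growth, 1))   # 4.0
```

Both measures increase, but the standard deviation increases by a larger factor because the outlier's deviation is squared before averaging.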

Chebyshev's inequality

Chebyshev's inequality provides a bound on the proportion of data within a certain number of standard deviations from the mean
Applicable to any dataset, regardless of its distribution

Proportion of data within k standard deviations

Chebyshev's inequality states that at least $1 - \frac{1}{k^2}$ of the data falls within $k$ standard deviations of the mean
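Formula: $P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$, equivalently $P(|X - \mu| < k\sigma) \geq 1 - \frac{1}{k^2}$; the bound holds for any $k > 0$ but is only informative for $k > 1$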
Example:
- For $k = 2$, at least $1 - \frac{1}{2^2} = 0.75$ or 75% of the data falls within 2 standard deviations of the mean
- For $k = 3$, at least $1 - \frac{1}{3^2} \approx 0.89$ or 89% of the data falls within 3 standard deviations of the mean
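The bound can be compared with what actually happens in a dataset. The sketch below computes the Chebyshev lower bound and the observed share of points within $k$ population standard deviations for the dataset 4, 7, 9, 12, 18. The function names are illustrative, and the population standard deviation is used so that the inequality applies exactly to the data as given.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum share of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def observed_share_within(values, k):
    """Share of values strictly within k population standard deviations of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mu) < k * sigma]
    return len(inside) / len(values)

data = [4, 7, 9, 12, 18]
for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 2), observed_share_within(data, k))
# 2 0.75 1.0
# 3 0.89 1.0
```

The observed shares are well above the bound, which is what makes Chebyshev's inequality conservative: it guarantees a minimum, not a typical value.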

Applications of Chebyshev's inequality

Provides a conservative bound on the proportion of data within a certain range
Useful when the underlying distribution is unknown or non-normal
Helps in identifying potential outliers or anomalies in a dataset
Used in various fields, such as finance (value at risk) and quality control (process monitoring)

Dispersion in non-normal distributions

Measures of dispersion can behave differently in non-normal distributions
Skewness and kurtosis are important factors to consider when analyzing dispersion in these cases

Skewness and dispersion

Skewness measures the asymmetry of a distribution
Positive skew: tail on the right side is longer or fatter (right-skewed)
- Typically Mean > Median > Mode
Negative skew: tail on the left side is longer or fatter (left-skewed)
- Typically Mode > Median > Mean
In skewed distributions, measures like standard deviation may not accurately capture the dispersion
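One common way to put a number on skewness is the moment coefficient, the third central moment divided by the 1.5 power of the second central moment. The text above does not specify a formula, so this choice is an assumption of the sketch; statistical packages often apply additional small-sample corrections.

```python
import statistics

def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

data = [4, 7, 9, 12, 18]
skew = moment_skewness(data)
print(round(skew, 2))  # 0.51

# Positive skew goes together with mean > median in this dataset
print(statistics.mean(data) > statistics.median(data))  # True
```

With only five points the estimate is rough, but the sign agrees with the pattern described above: the long right tail (the value 18) pulls the mean above the median.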

Kurtosis and dispersion

Kurtosis measures the heaviness of a distribution's tails; it is often described as peakedness, but tail weight is what drives it
Leptokurtic: heavy tails, often with a sharp peak (positive excess kurtosis)
- More outliers and higher dispersion in the tails compared to a normal distribution
Platykurtic: light tails, often with a flatter peak (negative excess kurtosis)
- Fewer outliers and lower dispersion in the tails compared to a normal distribution
Mesokurtic: tails like a normal distribution (kurtosis = 3, so excess kurtosis = 0)
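The sign conventions above refer to excess kurtosis, which subtracts 3 from the fourth standardized moment so that a normal distribution scores 0. The sketch below computes it with the plain moment formula; that formula is an assumption of this example, and libraries often offer bias-corrected variants.

```python
def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

data = [4, 7, 9, 12, 18]
kurt = excess_kurtosis(data)
print(round(kurt, 2))  # -0.89
```

A negative result points toward lighter tails than a normal distribution, although five observations are far too few for a reliable kurtosis estimate.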

Robust measures of dispersion

In the presence of skewness, heavy tails, or outliers, robust measures of dispersion are preferred
Median absolute deviation (also abbreviated MAD, not to be confused with the mean absolute deviation) is a robust alternative to standard deviation
- Calculates the median of the absolute deviations from the median
- Less sensitive to outliers and skewness
Interquartile range (IQR) is another robust measure
- Focuses on the middle 50% of the data, ignoring the tails
Trimmed or Winsorized standard deviation
- Removes or limits the influence of a certain percentage of the highest and lowest values before calculating the standard deviation
- Reduces the impact of outliers on the dispersion measure
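The median absolute deviation described above can be sketched with `statistics.median`. The function name and the extra value 60 (a hypothetical outlier) are assumptions made for this illustration; the sample standard deviation is printed alongside for comparison.

```python
import statistics

def median_abs_dev(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)

data = [4, 7, 9, 12, 18]
with_outlier = data + [60]  # 60 is a made-up extreme value

mad_clean = median_abs_dev(data)
mad_outlier = median_abs_dev(with_outlier)
print(mad_clean, mad_outlier)  # 3 5.0

# The standard deviation reacts far more strongly to the same outlier
sd_clean = statistics.stdev(data)
sd_outlier = statistics.stdev(with_outlier)
print(round(sd_clean, 2), round(sd_outlier, 2))
```

Here the median absolute deviation moves from 3 to 5, while the standard deviation roughly quadruples, which is the behaviour that makes median-based measures attractive for contaminated data.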
