Measures of variability help us understand how spread out data is. Standard deviation, the most common measure, tells us how far values typically stray from the average. We use different formulas for samples versus entire populations, and can compare datasets using the coefficient of variation.
Other descriptive tools include the range and interquartile range, which also measure spread, and skewness, which describes asymmetry rather than spread. Together they give a fuller picture of how data is distributed, which is crucial for making accurate interpretations and predictions in statistical analysis.
Measures of Variability
Standard deviation calculation and interpretation
- Quantifies spread of data values around the mean by calculating square root of variance
- Variance averages squared deviations from mean, making all values positive and giving more weight to larger deviations
- Low standard deviation indicates data points clustered closely around mean (test scores)
- High standard deviation indicates data points spread out over wider range (income levels)
- In normal distribution, approximately 68% of values fall within one standard deviation of mean, 95% within two, and 99.7% within three
- Outliers can significantly inflate standard deviation, because squaring deviations gives a few extremely high or low values disproportionate weight
- Range provides a simple measure of spread by calculating the difference between the highest and lowest values in a dataset
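The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (`population_variance`, `population_std`, `value_range`) and both datasets are made up for the example, and the functions divide by the full count of values, so they treat the data as an entire population.

```python
import math


def population_variance(values):
    """Average of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def population_std(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(population_variance(values))


def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)


# Hypothetical data: tightly clustered scores, then the same scores
# with one extreme value swapped in to show the effect of an outlier.
scores = [70, 72, 74, 76, 78]
with_outlier = [70, 72, 74, 76, 150]

for label, data in [("clustered", scores), ("with outlier", with_outlier)]:
    print(label, round(population_std(data), 2), value_range(data))
```

Replacing a single value with an extreme one raises both the standard deviation and the range sharply, which is the outlier sensitivity noted in the list above.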
Sample vs population standard deviation
- Population standard deviation $\sigma$ used when data represents entire population
- Formula: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
- $x_i$: individual values
- $\mu$: population mean
- $N$: number of values in population (census data)
- Sample standard deviation $s$ used when data represents subset of population
- Formula: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
- $x_i$: individual values
- $\bar{x}$: sample mean
- $n$: number of values in sample (survey responses)
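- Denominator $n-1$ (degrees of freedom, also called Bessel's correction) compensates for measuring deviations from sample mean $\bar{x}$ instead of population mean $\mu$, which would otherwise underestimate the variance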
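To see how the two denominators differ in practice, the sketch below applies both formulas to the same values using Python's standard `statistics` module, where `pstdev` divides by $N$ and `stdev` divides by $n - 1$. The dataset is invented for illustration.

```python
import statistics

# Hypothetical data: eight measurements with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat the values as the entire population: divide by N.
sigma = statistics.pstdev(data)

# Treat the values as a sample from a larger population: divide by n - 1.
s = statistics.stdev(data)

print("population standard deviation:", sigma)
print("sample standard deviation:    ", round(s, 4))
```

The sample figure is always the larger of the two for the same data, and the gap shrinks as the number of values grows.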
Coefficient of variation for comparisons
- Coefficient of variation (CV) standardizes dispersion as ratio of standard deviation to mean, expressed as percentage
- Formula: $CV = \frac{s}{\bar{x}} \times 100\%$ or $CV = \frac{\sigma}{\mu} \times 100\%$
- Compares variability between datasets with different units or means (stock returns vs bond returns)
- Higher CV indicates greater variability relative to mean
- Useful for comparing risk or relative variability across investments, industries, populations (pharmaceutical companies)
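- Limitations: undefined when mean is zero and unstable when mean is close to zero (small changes in mean produce large changes in CV); also assumes mean and standard deviation are meaningful summaries of the data, which in practice means ratio-scale measurements with a true zero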
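As a rough illustration of comparing variability across different units, the sketch below computes the sample-based CV for two invented datasets. The function name `coefficient_of_variation` and the height and weight figures are assumptions made for the example, and the zero-mean check reflects the limitation listed above.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


# Hypothetical measurements in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 62, 70, 78, 85]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights CV: {cv_heights:.2f}%")
print(f"weights CV: {cv_weights:.2f}%")
```

Because CV is unitless, the two results can be compared directly even though the raw standard deviations are in centimeters and kilograms.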
Additional Measures of Spread
- Interquartile range (IQR) measures variability by calculating the difference between the third and first quartiles
- Dispersion refers to the overall spread or scatter of data points in a distribution
- Skewness indicates the degree of asymmetry in a distribution, affecting how data is spread around the central tendency
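The IQR and skewness can be sketched as follows. Two assumptions are worth flagging: quartiles are taken from `statistics.quantiles` with its default method, and other quartile conventions can give slightly different values; and skewness is computed here as the moment-based coefficient (third central moment divided by the 1.5 power of the second), which is one common definition among several. The function names and datasets are invented for the example.

```python
import statistics


def iqr(values):
    """Third quartile minus first quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical data: one symmetric set, one with a long right tail.
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 2, 2, 3, 3, 3, 20]

for label, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(label, "IQR:", iqr(data), "skewness:", round(skewness(data), 3))
```

A skewness near zero suggests a roughly symmetric distribution, while a positive value points to a longer right tail. The IQR is barely affected by the single large value because it depends only on the middle half of the data.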