🎲Data, Inference, and Decisions Unit 4 Review

4.1 Measures of central tendency and dispersion

Written by the Fiveable Content Team • Last updated September 2025

Measures of central tendency and dispersion are key tools for summarizing data. They help us understand the typical values and spread in a dataset, giving us a quick snapshot of what's going on.

These measures are crucial for making sense of large datasets. By using means, medians, modes, and measures like standard deviation, we can compare different groups and spot trends that might not be obvious at first glance.

Measures of Central Tendency

Mean, Median, and Mode Calculations

The mean is the arithmetic average of a dataset, calculated by summing all values and dividing by the number of observations
- Formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Example: For dataset [2, 4, 6, 8, 10], mean = (2 + 4 + 6 + 8 + 10) / 5 = 6
The median is the middle value of an ordered dataset, separating the lower half of the data points from the upper half
- For an odd number of values, median = middle value
- For an even number of values, median = average of the two middle values
- Example: For dataset [1, 3, 5, 7, 9], median = 5
The mode is the most frequently occurring value in a dataset
- Datasets can have multiple modes (bimodal, multimodal) or no mode
- Example: For dataset [1, 2, 2, 3, 4, 4, 5], modes = 2 and 4 (bimodal)
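The three calculations above can be checked with Python's standard `statistics` module. This is a minimal sketch using the example datasets from this section; the even-length list [1, 3, 5, 7] is an added illustration, and `multimode` assumes Python 3.8 or later.

```python
from statistics import mean, median, multimode

# Datasets from the examples above
mean_value = mean([2, 4, 6, 8, 10])         # (2 + 4 + 6 + 8 + 10) / 5 = 6
median_odd = median([1, 3, 5, 7, 9])        # middle value: 5
modes = multimode([1, 2, 2, 3, 4, 4, 5])    # 2 and 4 each occur twice

# Added illustration (not from the text): an even number of values
median_even = median([1, 3, 5, 7])          # average of 3 and 5

print(mean_value, median_odd, median_even, modes)
```

When every value occurs equally often, `multimode` returns all of them, which corresponds to the "no mode" case.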

Relationships and Applications

In skewed distributions, the relative positions of the mean, median, and mode indicate the direction and degree of skewness
- Right-skewed: typically Mean > Median > Mode
- Left-skewed: typically Mode > Median > Mean
- Example: Income distribution often right-skewed, with mean higher than median due to high earners
The choice between mean, median, and mode depends on the data type and the presence of outliers
- Nominal data: Only mode applicable (hair color)
- Ordinal data: Median and mode applicable (education levels)
- Interval/Ratio data: All measures applicable (temperature, weight)
The mean is sensitive to outliers, while the median is more robust to extreme values
- Example: Dataset [1, 2, 3, 4, 100] has mean of 22 but median of 3
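As a quick check of the outlier example, the sketch below compares the mean and median of [1, 2, 3, 4, 100] with those of [1, 2, 3, 4, 5]. The variable names are illustrative only.

```python
from statistics import mean, median

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]   # same data, last value replaced by an outlier

# The mean moves from 3 to 22, while the median stays at 3
mean_shift = mean(with_outlier) - mean(without_outlier)
median_shift = median(with_outlier) - median(without_outlier)

print(mean(with_outlier), median(with_outlier))
print(mean_shift, median_shift)
```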
Weighted means are used when certain data points should have more influence
- Formula: $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
- Example: Calculating GPA with different credit weights for courses
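The weighted mean formula translates directly into code. The following is a sketch under assumed inputs: the grade points and credit hours are hypothetical, and `weighted_mean` is an illustrative helper rather than a library function.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical transcript: grade points earned and credit hours per course
grade_points = [4.0, 3.0, 2.0]
credits = [3, 4, 1]

gpa = weighted_mean(grade_points, credits)            # (12 + 12 + 2) / 8
unweighted = sum(grade_points) / len(grade_points)    # ignores credit hours

print(gpa, unweighted)
```

Here the credit-weighted GPA is higher than the simple average because the lowest grade comes from the course with the fewest credits.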

Measures of Variability

Range, Variance, and Standard Deviation

The range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values
- Formula: Range = Max - Min
- Example: For dataset [2, 4, 6, 8, 10], range = 10 - 2 = 8
Sample variance measures the average squared deviation from the mean, dividing by $n - 1$ rather than $n$, and provides insight into the spread of the data points
- Formula: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
- Example: For dataset [1, 2, 3, 4, 5], variance = 2.5
The standard deviation is the square root of the variance, expressed in the same units as the original data
- Formula: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
- Example: For dataset [1, 2, 3, 4, 5], standard deviation ≈ 1.58
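The sketch below works through range, sample variance, and sample standard deviation for [1, 2, 3, 4, 5], first from the formulas and then with the standard library, whose `variance` and `stdev` functions also divide by n - 1.

```python
from math import sqrt
from statistics import stdev, variance

data = [1, 2, 3, 4, 5]

data_range = max(data) - min(data)    # 5 - 1 = 4

# Sample variance written out from the formula, dividing by n - 1
n = len(data)
x_bar = sum(data) / n
manual_variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)   # 10 / 4 = 2.5
manual_sd = sqrt(manual_variance)                                 # about 1.58

# Standard-library equivalents (also sample statistics)
library_variance = variance(data)
library_sd = stdev(data)

print(data_range, manual_variance, round(manual_sd, 2))
```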

Advanced Concepts and Applications

The empirical rule (68-95-99.7 rule) relates the standard deviation to the proportion of data points within specific ranges for normally distributed data
- About 68% of data within 1 standard deviation of mean
- About 95% of data within 2 standard deviations of mean
- About 99.7% of data within 3 standard deviations of mean
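The percentages in the empirical rule are rounded values of normal-distribution probabilities. As a sketch, they can be recovered from the standard normal CDF with `statistics.NormalDist` (Python 3.8 or later); `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The result does not depend on the particular mean and standard deviation, so the same proportions apply to any normal distribution.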
Variance and standard deviation are sensitive to outliers, which can inflate these measures of dispersion
- Example: Dataset [1, 2, 3, 4, 100] has much larger standard deviation than [1, 2, 3, 4, 5]
The coefficient of variation (CV) is a standardized measure of dispersion, calculated as the ratio of the standard deviation to the mean
- Formula: $CV = \frac{s}{\bar{x}} \times 100\%$
- Allows comparison between datasets with different units or scales (comparing variability in heights vs weights); meaningful only for ratio-scale data with a nonzero mean
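To illustrate the heights-versus-weights comparison, here is a minimal sketch. The measurements are hypothetical, and `coefficient_of_variation` is an illustrative helper, not a standard-library function.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean."""
    m = mean(data)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(data) / m * 100

# Hypothetical measurements in different units
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 62, 70, 78, 85]

cv_heights = coefficient_of_variation(heights_cm)   # roughly 4.7%
cv_weights = coefficient_of_variation(weights_kg)   # roughly 17.2%

print(f"{cv_heights:.1f}% vs {cv_weights:.1f}%")
```

Because the CV is unitless, the two values can be compared directly: in this made-up sample, weights vary more relative to their mean than heights do.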
For grouped data, computational formulas for variance and standard deviation use the class frequencies and the midpoints of the class intervals
- Used when dealing with large datasets or data presented in frequency tables
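A common form of the grouped-data approximation treats every observation in a class as if it sat at the class midpoint $m_j$ with frequency $f_j$, giving $\bar{x} \approx \frac{\sum_j f_j m_j}{n}$ and $s^2 \approx \frac{\sum_j f_j (m_j - \bar{x})^2}{n - 1}$ with $n = \sum_j f_j$. The sketch below applies it to a hypothetical frequency table.

```python
from math import sqrt

# Hypothetical frequency table: (lower bound, upper bound, frequency)
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]   # 5.0, 15.0, 25.0
freqs = [f for _, _, f in classes]
n = sum(freqs)                                         # 10 observations

# Each class contributes its midpoint, weighted by its frequency
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n   # 160 / 10
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints)
) / (n - 1)                                                       # 490 / 9
grouped_sd = sqrt(grouped_variance)

print(grouped_mean, round(grouped_variance, 2), round(grouped_sd, 2))
```

These values are approximations: the true mean and variance depend on where the observations actually fall within each interval.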

Choosing Appropriate Measures

Data Types and Measure Selection

The choice of central tendency and dispersion measures depends on the data's level of measurement
- Nominal data (categories without order)
  - Central tendency: Only mode applicable
  - Dispersion: Limited to frequency distributions
  - Example: Eye color (mode = brown)
- Ordinal data (categories with order)
  - Central tendency: Median and mode applicable
  - Dispersion: Interquartile range appropriate
  - Example: Education levels (high school, bachelor's, master's, doctorate)
- Interval data (ordered with equal intervals)
  - All measures applicable, except ratio-based ones such as the coefficient of variation, which need a true zero
  - Example: Temperature in Celsius or Fahrenheit
- Ratio data (ordered with equal intervals and true zero)
  - All measures applicable
  - Example: Height, weight, income
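As a sketch of how the level of measurement constrains the calculation, the example below takes the mode of nominal data and the median of ordinal data by ranking the categories. The responses and the `LEVELS` ordering are hypothetical.

```python
from statistics import median_low, mode

# Nominal data: only counting is meaningful, so use the mode
eye_colors = ["brown", "blue", "brown", "green", "brown"]
most_common_color = mode(eye_colors)

# Ordinal data: categories have an order but no arithmetic,
# so rank them and take a median that is an actual category
LEVELS = ["high school", "bachelor's", "master's", "doctorate"]
responses = ["bachelor's", "high school", "doctorate", "bachelor's", "master's"]

ranks = [LEVELS.index(r) for r in responses]
median_level = LEVELS[median_low(ranks)]

print(most_common_color, median_level)
```

`median_low` is used instead of `median` so that an even number of responses still yields an existing category rather than an average of two ranks.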

Distribution Characteristics and Outliers

Skewed distributions may call for the median and interquartile range instead of the mean and standard deviation
- Provides more robust and representative summaries
- Example: Income distributions often use median due to right skew
Presence of outliers should be considered when choosing measures
- Can significantly impact means and standard deviations
- Median and interquartile range more resistant to outliers
- Example: House prices in a neighborhood with one extremely expensive mansion
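The mansion example can be sketched numerically. The prices below are hypothetical (in thousands of dollars), and `iqr` and `summarize` are illustrative helpers. Quartile conventions differ between tools, so this assumes the default method of `statistics.quantiles`.

```python
from statistics import mean, median, quantiles, stdev

def iqr(data):
    """Interquartile range using the default quantiles() method."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

def summarize(data):
    return {
        "mean": mean(data),
        "sd": stdev(data),
        "median": median(data),
        "iqr": iqr(data),
    }

# Hypothetical house prices, in thousands of dollars
typical_homes = [250, 260, 270, 280, 290, 300]
with_mansion = typical_homes + [5000]

before = summarize(typical_homes)
after = summarize(with_mansion)

for name in ("mean", "sd", "median", "iqr"):
    print(f"{name}: {before[name]:.1f} -> {after[name]:.1f}")
```

In this made-up neighborhood the single mansion more than triples the mean and inflates the standard deviation many times over, while the median and interquartile range change only slightly.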
Transformations can be applied to data to make it more amenable to certain statistical measures and analyses
- Logarithmic transformation for right-skewed data
- Square root transformation for count data
- Example: Log-transforming stock prices to analyze percentage changes
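To see why a log transformation suits percentage changes, consider a hypothetical price series that rises by 10% at each step. The absolute changes grow with the price level, but the differences of the log prices are constant.

```python
import math

# Hypothetical prices, each 10% higher than the previous one
prices = [100.0, 110.0, 121.0]

# Absolute changes depend on the price level
simple_changes = [b - a for a, b in zip(prices, prices[1:])]

# Differences of log prices depend only on the ratio b / a
log_prices = [math.log(p) for p in prices]
log_returns = [b - a for a, b in zip(log_prices, log_prices[1:])]

print(simple_changes)
print([round(r, 4) for r in log_returns])
```

Each log difference equals ln(1.1), about 0.0953, so equal percentage moves become equal steps on the log scale.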
