📉Statistical Methods for Data Science Unit 3 Review

3.1 Measures of Central Tendency and Dispersion

Written by the Fiveable Content Team • Last updated September 2025

Measures of central tendency and dispersion are key tools for summarizing data. They help us understand the typical values in a dataset and how spread out the data is. These concepts are crucial for getting a quick snapshot of your data's main features.

By using these measures, you can compare different datasets and spot patterns. They're the foundation for more advanced statistical analyses and data visualization techniques. Understanding these basics is essential for making sense of complex data in real-world situations.

Measures of Central Tendency

Calculating Averages

Mean represents the arithmetic average of a set of values
- Calculated by summing all values and dividing by the number of values
- Sensitive to extreme values or outliers
- Example: The mean of the set {1, 2, 3, 4, 5} is $\frac{1+2+3+4+5}{5} = 3$
Median represents the middle value when a dataset is ordered from lowest to highest
- Robust to outliers as it only considers the position of values
- For an odd number of values, the median is the middle value
- For an even number of values, the median is the average of the two middle values
- Example: The median of the set {1, 2, 3, 4, 5} is 3, and the median of the set {1, 2, 3, 4, 5, 6} is $\frac{3+4}{2} = 3.5$
Mode represents the most frequently occurring value in a dataset
- A dataset can have no mode (if no value appears more than once), one mode (unimodal), or multiple modes (bimodal or multimodal)
- Useful for categorical or discrete data
- Example: The modes of the set {1, 2, 2, 3, 4, 4, 5} are 2 and 4 (bimodal)
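
The three averages above can be reproduced with Python's standard `statistics` module. This is a minimal sketch that reuses the example sets from the bullets; the variable names are illustrative only, and `multimode` requires Python 3.8 or later.

```python
import statistics

# Example sets taken from the bullets above
odd_values = [1, 2, 3, 4, 5]
even_values = [1, 2, 3, 4, 5, 6]
bimodal_values = [1, 2, 2, 3, 4, 4, 5]

mean_value = statistics.mean(odd_values)      # 3
median_odd = statistics.median(odd_values)    # 3 (the middle value)
median_even = statistics.median(even_values)  # 3.5 (average of 3 and 4)

# multimode returns every value tied for the highest count
modes = statistics.multimode(bimodal_values)  # [2, 4]

print(mean_value, median_odd, median_even, modes)
```

Note that `multimode` returns every value when no value repeats, so a "no mode" case has to be checked separately, for instance by comparing the length of the result with the number of distinct values.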

Comparing Measures of Central Tendency

In symmetric, unimodal distributions, the mean, median, and mode are equal
In right-skewed distributions, the mean is typically greater than the median, which is typically greater than the mode
In left-skewed distributions, the order typically reverses: the mode is greater than the median, which is greater than the mean
The mean is pulled toward extreme values, while the median and mode are largely unaffected by them
The choice of measure depends on the data type, distribution, and presence of outliers
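
To see how differently the mean and median react to an extreme value, the sketch below compares a small set with and without an outlier. The data values and the helper name `center_summary` are hypothetical, chosen only for illustration.

```python
import statistics

def center_summary(values):
    """Return the mean and median of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Hypothetical data: the second set replaces 5 with an extreme value
base = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

base_summary = center_summary(base)             # mean 3, median 3
outlier_summary = center_summary(with_outlier)  # mean 22, median 3

print(base_summary)
print(outlier_summary)
```

A single extreme value moves the mean from 3 to 22 while the median stays at 3, which is why the median is often preferred for skewed data or data with outliers.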

Measures of Variability

Range and Interquartile Range

Range is the difference between the maximum and minimum values in a dataset
- Provides a simple measure of the spread of data
- Sensitive to outliers as it only considers the extreme values
- Example: The range of the set {1, 2, 3, 4, 5} is 5 - 1 = 4
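Interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1): IQR = Q3 - Q1
- Quartiles divide the ordered dataset into four parts, each containing about a quarter of the values
- Under the median-of-halves convention used here, Q1 is the median of the lower half of the data and Q3 is the median of the upper half (when the number of values is odd, the overall median is excluded from both halves)
- IQR is a robust measure of spread because it ignores the lowest and highest quarters of the data, where outliers fall
- Example: For the set {1, 2, 3, 4, 5, 6, 7, 8, 9}, the lower half is {1, 2, 3, 4} and the upper half is {6, 7, 8, 9}, so Q1 = 2.5, Q3 = 7.5, and IQR = 7.5 - 2.5 = 5. Other quartile conventions (for example, linear interpolation between data points) can give slightly different values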
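
The range and IQR calculations above can be sketched as follows. The function `quartiles_median_of_halves` is a hypothetical helper that implements the median-of-halves convention described in the bullets (the overall median is left out of both halves when the number of values is odd). It assumes at least two values, and statistical libraries may use other quartile conventions by default.

```python
import statistics

def quartiles_median_of_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the number of values is odd, the overall median is excluded
    from both halves. Assumes at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle element when n is odd
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return statistics.median(lower), statistics.median(upper)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

data_range = max(data) - min(data)         # 9 - 1 = 8
q1, q3 = quartiles_median_of_halves(data)  # 2.5 and 7.5
iqr = q3 - q1                              # 5.0

print(data_range, q1, q3, iqr)
```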

Variance and Standard Deviation

Variance measures the average squared deviation from the mean
- Calculated by summing the squared differences between each value and the mean, then dividing by the number of values n (population variance) or by n - 1 (sample variance)
- Expressed in squared units, making interpretation difficult
- Example: For the set {1, 2, 3, 4, 5}, the population variance is $\frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5} = \frac{10}{5} = 2$; the sample variance is $\frac{10}{4} = 2.5$
Standard deviation is the square root of the variance
- Provides a measure of spread in the same units as the original data
- Interpretation: approximately 68% of data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations (for normally distributed data)
- Example: For the set {1, 2, 3, 4, 5}, the population standard deviation is $\sqrt{2} \approx 1.41$ (the sample standard deviation is $\sqrt{2.5} \approx 1.58$)
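
The variance and standard deviation formulas above can be written out directly. In this sketch the `ddof` parameter name is an assumption borrowed from common library conventions: `ddof=0` divides by n (population) and `ddof=1` divides by n - 1 (sample).

```python
import math

def variance(values, ddof=0):
    """Average squared deviation from the mean.

    ddof=0 divides by n (population variance);
    ddof=1 divides by n - 1 (sample variance).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - ddof)

data = [1, 2, 3, 4, 5]

population_variance = variance(data)       # 10 / 5 = 2.0
sample_variance = variance(data, ddof=1)   # 10 / 4 = 2.5

# Standard deviation is the square root of the variance
population_std = math.sqrt(population_variance)  # about 1.41
sample_std = math.sqrt(sample_variance)          # about 1.58

print(population_variance, sample_variance, population_std, sample_std)
```

The standard library's `statistics.pvariance` and `statistics.variance` give the same population and sample results for this data.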

Measures of Distribution Shape

Skewness

Skewness measures the asymmetry of a distribution
- Positive skewness indicates a longer or fatter tail on the right side of the distribution (right-skewed)
- Negative skewness indicates a longer or fatter tail on the left side of the distribution (left-skewed)
- A symmetric distribution has a skewness of zero, although a skewness of zero does not by itself guarantee symmetry
- Example: Income data often exhibits positive skewness, with a few high earners pulling the mean to the right of the median
Pearson's second coefficient of skewness (also called median skewness) is a simple, commonly used measure of skewness
- Calculated as $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
- As a rule of thumb, values greater than 1 or less than -1 indicate substantial skewness
- Example: For a right-skewed distribution with mean = 10, median = 8, and standard deviation = 4, the Pearson's coefficient of skewness is $\frac{3(10-8)}{4} = 1.5$, indicating substantial positive skewness
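
The coefficient above is a one-line calculation. A minimal sketch, using the summary statistics from the example (the function name is illustrative):

```python
def pearson_median_skewness(mean, median, std_dev):
    """Pearson's median-based skewness: 3 * (mean - median) / std_dev."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / std_dev

# Summary statistics from the example above
skewness = pearson_median_skewness(mean=10, median=8, std_dev=4)

print(skewness)  # 1.5
```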

Kurtosis

Kurtosis measures the tailedness of a distribution and is usually reported relative to a normal distribution as excess kurtosis
- Positive excess kurtosis (leptokurtic) indicates heavier tails than a normal distribution, often accompanied by a sharper peak
- Negative excess kurtosis (platykurtic) indicates lighter tails than a normal distribution, often accompanied by a flatter peak
- An excess kurtosis of zero (mesokurtic) indicates tails similar to those of a normal distribution
- Example: Financial return data often exhibits positive excess kurtosis, with more extreme values than expected under a normal distribution
Excess kurtosis is the most commonly reported form of kurtosis
- Calculated as the fourth standardized moment minus 3 (the kurtosis of a normal distribution is 3, so its excess kurtosis is zero)
- Values greater than 0 indicate heavier tails than a normal distribution, while values less than 0 indicate lighter tails
- Example: A distribution with excess kurtosis of 2 has heavier tails than a normal distribution and is therefore leptokurtic
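
Excess kurtosis can be computed directly from its definition. The sketch below uses population moments (dividing by n); statistical packages often apply sample bias corrections, so their results may differ slightly. The two datasets are hypothetical, chosen only to show one light-tailed and one heavy-tailed case.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("excess kurtosis is undefined for constant data")
    return m4 / m2 ** 2 - 3

# Hypothetical data: evenly spread values have light tails
flat = [1, 2, 3, 4, 5]
# Hypothetical data: mostly zeros with two extreme values has heavy tails
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

flat_kurtosis = excess_kurtosis(flat)           # about -1.3 (platykurtic)
heavy_kurtosis = excess_kurtosis(heavy_tailed)  # 2.0 (leptokurtic)

print(flat_kurtosis, heavy_kurtosis)
```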
