📊Probability and Statistics Unit 4 Review

4.4 Skewness and kurtosis

Written by the Fiveable Content Team • Last updated September 2025

Skewness and kurtosis are key measures that describe the shape of probability distributions. Skewness indicates asymmetry, while kurtosis measures tail heaviness and, more loosely, peakedness. These concepts help characterize data distributions beyond just the mean and variance.

Understanding skewness and kurtosis allows us to identify non-normal distributions, assess data quality, and select appropriate statistical methods. Calculating these measures provides valuable insights into the nature of our data and guides sound statistical analysis and interpretation.

Skewness

Skewness is a measure of the asymmetry of a probability distribution or dataset
It indicates the extent to which the data deviates from a symmetric distribution, such as the normal distribution
Skewness is an important concept in probability and statistics as it helps characterize the shape and properties of a distribution

Measures of skewness

There are several measures of skewness, each with its own formula and interpretation
Common measures include Pearson's coefficient of skewness, Bowley's coefficient of skewness, and Kelly's measure of skewness
These measures quantify the degree and direction of skewness in a distribution

Positive vs negative skewness

Positive skewness occurs when the tail of the distribution extends more to the right of the mean
- In a positively skewed distribution, the mean is typically greater than the median
- Examples of positively skewed distributions include income distribution and reaction time data
Negative skewness occurs when the tail of the distribution extends more to the left of the mean
- In a negatively skewed distribution, the mean is typically less than the median
- Examples of negatively skewed distributions include the distribution of ages at death and the distribution of grades on an easy exam, where most scores cluster near the top and a few low scores form the left tail

Pearson's coefficient of skewness

Pearson's coefficient of skewness is a widely used measure of skewness
It is calculated using the formula: $\text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$
The coefficient ranges from -3 to +3, with values close to 0 indicating a symmetric distribution, positive values indicating right skewness, and negative values indicating left skewness
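The formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `pearson_median_skewness` and the sample values are hypothetical, and the choice of the sample standard deviation (n - 1 divisor) is an assumption, since the formula does not say which version to use.

```python
import statistics

def pearson_median_skewness(data):
    """Pearson's coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    # Assumption: sample standard deviation (divides by n - 1).
    sd = statistics.stdev(data)
    return 3 * (mean - median) / sd

# Hypothetical sample: one large value pulls the mean above the median.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
print(pearson_median_skewness(sample))
```

Because the mean of this sample (4.7) exceeds its median (3), the coefficient comes out positive, which matches the right-skew interpretation.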

Bowley's coefficient of skewness

Bowley's coefficient of skewness is another measure of skewness that uses quartiles
It is calculated using the formula: $\text{Skewness} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$
- where $Q_1$, $Q_2$, and $Q_3$ are the first, second (median), and third quartiles, respectively
Bowley's coefficient ranges from -1 to +1, with values close to 0 indicating a symmetric distribution, positive values indicating right skewness, and negative values indicating left skewness
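As a rough sketch, the quartile formula can be computed with Python's standard library. The function name `bowley_skewness` and the sample are hypothetical, and the `"inclusive"` quartile method is only one of several conventions, so other software may return slightly different values for the same data.

```python
import statistics

def bowley_skewness(data):
    """Bowley's coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    # Assumption: "inclusive" interpolation for the quartiles.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Hypothetical right-skewed sample.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
print(bowley_skewness(sample))
```

Since only the quartiles enter the calculation, the single extreme value 20 has no direct effect on the result, which is why quartile-based measures are considered robust to outliers.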

Kelly's measure of skewness

Kelly's measure of skewness is based on the percentiles of the distribution
It is calculated using the formula: $\text{Skewness} = \frac{P_{90} + P_{10} - 2P_{50}}{P_{90} - P_{10}}$
- where $P_{10}$, $P_{50}$, and $P_{90}$ are the 10th, 50th (median), and 90th percentiles, respectively
Kelly's measure ranges from -1 to +1, with values close to 0 indicating a symmetric distribution, positive values indicating right skewness, and negative values indicating left skewness
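The percentile version can be sketched the same way. Here `kelly_skewness` is a hypothetical helper, and the deciles are taken from `statistics.quantiles` with the `"inclusive"` method, which is an assumption about the percentile convention.

```python
import statistics

def kelly_skewness(data):
    """Kelly's measure: (P90 + P10 - 2*P50) / (P90 - P10)."""
    # quantiles(..., n=10) returns the nine deciles;
    # indices 0, 4 and 8 hold P10, P50 and P90.
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    p10, p50, p90 = deciles[0], deciles[4], deciles[8]
    return (p90 + p10 - 2 * p50) / (p90 - p10)

# Hypothetical right-skewed sample.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
print(kelly_skewness(sample))
```

Because the 10th and 90th percentiles reach further into the tails than the quartiles do, this measure uses more of the distribution than a quartile-based coefficient.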

Interpretation of skewness

Skewness provides insights into the shape and characteristics of a distribution
A symmetric distribution has a skewness of 0 and is a mirror image of itself around its center, so the mean and median coincide; note that a skewness near 0 does not by itself guarantee symmetry
A positively skewed distribution (skewness > 0) has a longer tail on the right side, with more extreme values pulling the mean higher than the median
A negatively skewed distribution (skewness < 0) has a longer tail on the left side, with more extreme values pulling the mean lower than the median

Effects of skewness on statistical analyses

Skewness can have significant implications for statistical analyses and inference
Many statistical tests and models assume symmetric or normally distributed data
Skewed data can violate these assumptions, leading to biased or misleading results
When dealing with skewed data, it may be necessary to transform the data, use robust methods, or consider alternative statistical techniques that are less sensitive to skewness

Kurtosis

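Kurtosis is a measure of how heavy the tails of a probability distribution are compared to a normal distribution; it is traditionally also described as peakedness or flatness, although the value is driven mainly by the tails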
It quantifies the concentration of data around the mean and the heaviness of the tails
Kurtosis is another important concept in probability and statistics, as it provides additional information about the shape and characteristics of a distribution

Definition of kurtosis

Kurtosis is the fourth standardized moment of a probability distribution
It is a dimensionless quantity that measures the relative concentration of data in the center and tails of the distribution
A higher kurtosis indicates a more peaked distribution with heavier tails, while a lower kurtosis indicates a flatter distribution with lighter tails

Types of kurtosis

There are three main types of kurtosis: mesokurtic, leptokurtic, and platykurtic
These types describe the relative peakedness and tail heaviness of a distribution compared to a normal distribution
Understanding the type of kurtosis can provide insights into the shape and properties of the data

Mesokurtic distributions

A mesokurtic distribution has a kurtosis equal to that of a normal distribution
The normal distribution is the reference point for comparing the kurtosis of other distributions
In a mesokurtic distribution, the data is moderately concentrated around the mean, and the tails are neither too heavy nor too light

Leptokurtic distributions

A leptokurtic distribution has a higher kurtosis than a normal distribution
It is characterized by a more peaked center and heavier tails compared to a normal distribution
In a leptokurtic distribution, the data is highly concentrated around the mean, and there is a higher probability of extreme values in the tails
Examples of leptokurtic distributions include the Laplace distribution and the t-distribution with low degrees of freedom

Platykurtic distributions

A platykurtic distribution has a lower kurtosis than a normal distribution
It is characterized by a flatter center and lighter tails compared to a normal distribution
In a platykurtic distribution, the data is more spread out around the mean, and there is a lower probability of extreme values in the tails
Examples of platykurtic distributions include the uniform distribution and the raised cosine distribution

Pearson's coefficient of kurtosis

Pearson's coefficient of kurtosis is the fourth standardized moment of a distribution, which is compared against the value 3 attained by a normal distribution
For sample data, a widely used bias-corrected estimator is: $\text{Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$
- where $n$ is the sample size, $x_i$ are the individual values, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation
- because its final term subtracts the normal reference value, this estimator measures excess kurtosis and is close to 0 for normally distributed data
Pearson's coefficient of kurtosis itself (the fourth standardized moment, without the subtraction) is 3 for a normal distribution, higher than 3 for leptokurtic distributions, and lower than 3 for platykurtic distributions

Excess kurtosis

Excess kurtosis is another measure of kurtosis that subtracts 3 from Pearson's coefficient of kurtosis
It is calculated as: $\text{Excess Kurtosis} = \text{Pearson's Kurtosis} - 3$
Excess kurtosis is 0 for a normal distribution, positive for leptokurtic distributions, and negative for platykurtic distributions
Excess kurtosis is often used to make the interpretation more intuitive, as it directly compares the kurtosis to that of a normal distribution
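The relationship between the two scales can be illustrated with a short sketch. The helper names and the two example datasets are hypothetical, and the moments use the population (divide-by-N) form rather than a bias-corrected sample estimator.

```python
import statistics

def pearson_kurtosis(data):
    """Fourth standardized moment, using divide-by-N moments."""
    n = len(data)
    mu = statistics.fmean(data)
    m2 = sum((x - mu) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mu) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

def excess_kurtosis(data):
    """Pearson's kurtosis shifted so that a normal distribution scores 0."""
    return pearson_kurtosis(data) - 3

# Hypothetical examples: evenly spread values versus mostly-central
# values with two extreme observations.
flat = list(range(1, 101))
heavy_tailed = [0] * 98 + [-10, 10]
print(excess_kurtosis(flat), excess_kurtosis(heavy_tailed))
```

The evenly spread data gives a negative excess kurtosis (platykurtic), while the data with two extreme observations gives a large positive value (leptokurtic).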

Interpretation of kurtosis

Kurtosis provides information about the peakedness and tail behavior of a distribution
A high kurtosis indicates a more peaked distribution with heavier tails, implying a higher concentration of data around the mean and a higher probability of extreme values
A low kurtosis indicates a flatter distribution with lighter tails, implying a more spread out distribution and a lower probability of extreme values
Kurtosis can help identify distributions that deviate from normality and assess the risk of extreme events

Effects of kurtosis on statistical analyses

Kurtosis can have significant implications for statistical analyses and inference
Many statistical tests and models assume a normal distribution with a kurtosis of 3
Deviations from this assumption can lead to biased or inefficient estimates, incorrect standard errors, and invalid hypothesis tests
When dealing with data that has high or low kurtosis, it may be necessary to use robust methods, consider alternative distributions, or apply appropriate transformations to the data

Relationship between skewness and kurtosis

Skewness and kurtosis are related concepts that jointly describe the shape and characteristics of a probability distribution
While skewness measures the asymmetry of a distribution, kurtosis measures the peakedness and tail behavior
Understanding the relationship between skewness and kurtosis can provide a more comprehensive picture of the data distribution

Joint interpretation of skewness and kurtosis

Skewness and kurtosis can be interpreted together to gain insights into the shape of a distribution
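A normal distribution has zero skewness and a kurtosis of 3, so values close to these are consistent with normality, although they do not prove it because other distributions can share the same two values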
Deviations from these values indicate departures from normality
For example, a distribution with positive skewness and high kurtosis may have a long right tail and a peaked center, while a distribution with negative skewness and low kurtosis may have a long left tail and a flatter center

Implications for data distribution

The combination of skewness and kurtosis can have implications for the overall shape and properties of the data distribution
Skewness affects the symmetry and the relative positions of the mean, median, and mode
Kurtosis affects the concentration of data around the mean and the likelihood of extreme values in the tails
Common non-normal families, such as the log-normal, gamma, and beta distributions, exhibit different combinations of skewness and kurtosis depending on their parameters

Impact on statistical assumptions

Skewness and kurtosis can impact the assumptions underlying many statistical methods
Normality is a common assumption in parametric tests, linear regression, and other statistical techniques
Departures from normality, as indicated by skewness and kurtosis, can violate these assumptions and affect the validity and efficiency of the analyses
It is important to assess skewness and kurtosis when checking the assumptions of statistical methods and to consider alternative approaches if the assumptions are not met
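One way to combine the two measures into a single normality check is the Jarque-Bera statistic, which is not covered in this guide and is shown here only as an illustration. The sketch below implements it by hand under stated assumptions: divide-by-N moments, the large-sample chi-square approximation with 2 degrees of freedom (whose upper-tail probability is exp(-x/2)), and simulated exponential data standing in for a real dataset.

```python
import math
import random

def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # Pearson's scale: 3 for a normal distribution
    statistic = n / 6 * (skewness ** 2 + (kurt - 3) ** 2 / 4)
    # Upper-tail probability of a chi-square variable with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2)
    return statistic, p_value

# Simulated right-skewed data (hypothetical stand-in for a real sample).
random.seed(0)
skewed = [random.expovariate(1.0) for _ in range(2000)]
statistic, p_value = jarque_bera(skewed)
print(statistic, p_value)
```

A small p-value suggests that the skewness and kurtosis are jointly inconsistent with normality. The approximation is only reliable for fairly large samples, so the result should be read alongside plots rather than on its own.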

Applications of skewness and kurtosis

Skewness and kurtosis have various applications in data analysis and statistical modeling
They are used to assess the quality and characteristics of data, identify potential issues, and guide the selection of appropriate statistical methods
Understanding the applications of skewness and kurtosis is crucial for effective data analysis and decision-making

Identifying non-normal distributions

Skewness and kurtosis can be used to identify distributions that deviate from normality
Non-normal distributions can arise due to various factors, such as the presence of outliers, the nature of the variable being measured, or the underlying data generating process
By calculating and interpreting skewness and kurtosis, analysts can determine whether a distribution is symmetric, skewed, or has heavy or light tails
This information can help in selecting appropriate statistical methods and models that are robust to non-normality

Assessing data quality and outliers

Skewness and kurtosis can be used to assess the quality of the data and identify potential issues
Highly skewed distributions or distributions with extreme kurtosis may indicate the presence of outliers or data entry errors
By examining the skewness and kurtosis, analysts can detect anomalies and investigate the reasons behind unusual data points
This assessment can help in data cleaning, outlier detection, and ensuring the integrity of the dataset

Selecting appropriate statistical methods

The values of skewness and kurtosis can guide the selection of appropriate statistical methods
If the data is approximately normally distributed (skewness ≈ 0 and kurtosis ≈ 3), parametric methods such as t-tests, ANOVA, and linear regression can be used
If the data is skewed or has non-normal kurtosis, non-parametric methods, such as the Mann-Whitney U test, Kruskal-Wallis test, or quantile regression, may be more appropriate
In some cases, data transformations (e.g., log transformation) can be applied to reduce skewness and kurtosis and make the data more suitable for parametric methods
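The effect of a log transformation can be seen on simulated data. This sketch assumes log-normally distributed values generated with a fixed seed, which is an idealized case where the logarithm is exactly normal; real data will usually respond less cleanly. The helper `moment_skewness` is a hypothetical name for the population skewness formula.

```python
import math
import random

def moment_skewness(data):
    """Third standardized moment, using divide-by-N moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Simulated right-skewed data (log-normal), then its log transform.
random.seed(42)
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]
logged = [math.log(x) for x in raw]

print("skewness before:", moment_skewness(raw))
print("skewness after: ", moment_skewness(logged))
```

The log transformation requires strictly positive values; data containing zeros or negative numbers would need a shift or a different transformation.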

Communicating data characteristics

Skewness and kurtosis are important descriptive statistics that can be used to communicate the characteristics of a dataset
When presenting data analysis results, reporting the skewness and kurtosis alongside other summary statistics (e.g., mean, median, standard deviation) can provide a more comprehensive picture of the data distribution
Visualizations, such as histograms, density plots, and box plots, can also be used to illustrate the skewness and kurtosis of the data
Clear communication of skewness and kurtosis can help stakeholders understand the nature of the data and make informed decisions based on the analysis

Calculating skewness and kurtosis

Skewness and kurtosis can be calculated using various formulas depending on whether the data represents a sample or a population
It is important to use the appropriate formula based on the nature of the data and the purpose of the analysis
Computational examples and software implementations can help in calculating skewness and kurtosis efficiently

Sample vs population formulas

The formulas for calculating skewness and kurtosis differ depending on whether the data is a sample or a population
For a sample, the formulas include a bias correction term to account for the fact that the sample statistics are estimates of the population parameters
The sample formulas for skewness and kurtosis are:
- Skewness (sample): $\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3$
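- Kurtosis (sample): $\text{Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$ (the final term subtracts the normal reference value, so this estimator measures excess kurtosis and is close to 0 for normal data)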
For a population, the formulas do not include the bias correction term:
- Skewness (population): $\text{Skewness} = \frac{1}{N} \sum_{i=1}^{N} \left(\frac{x_i - \mu}{\sigma}\right)^3$
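- Kurtosis (population): $\text{Kurtosis} = \frac{1}{N} \sum_{i=1}^{N} \left(\frac{x_i - \mu}{\sigma}\right)^4$ (this is Pearson's kurtosis, equal to 3 for a normal distribution; subtract 3 to obtain excess kurtosis)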
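The four formulas above translate directly into code. The sketch below uses only the standard library; the function names and the five-value dataset are hypothetical, and the sample kurtosis function returns excess kurtosis because the sample formula subtracts the normal reference term.

```python
import math

def population_skewness(data):
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(((x - mu) / sigma) ** 3 for x in data) / n

def population_kurtosis(data):
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(((x - mu) / sigma) ** 4 for x in data) / n

def sample_skewness(data):
    n = len(data)  # requires n >= 3
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

def sample_excess_kurtosis(data):
    n = len(data)  # requires n >= 4
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    z4 = sum(((x - mean) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# Hypothetical small dataset.
data = [2, 3, 3, 4, 10]
print(population_skewness(data), sample_skewness(data))
print(population_kurtosis(data), sample_excess_kurtosis(data))
```

For small samples the bias-corrected values can differ noticeably from the population-style values, and the gap shrinks as the sample size grows.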

Computational examples

Calculating skewness and kurtosis by hand can be tedious, especially for large datasets
Computational examples using statistical software or programming languages can streamline the calculations
For example, in Python, the scipy.stats module provides functions for calculating skewness and kurtosis:

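from scipy.stats import skew, kurtosis
data = [2.0, 3.0, 3.0, 4.0, 10.0]  # hypothetical sample
skewness_value = skew(data)
excess_kurtosis = kurtosis(data)  # default is Fisher's definition (normal = 0)
pearson_kurtosis = kurtosis(data, fisher=False)  # Pearson's definition (normal = 3)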

In R, the moments package offers functions for calculating skewness and kurtosis:

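library(moments)
data <- c(2, 3, 3, 4, 10)  # hypothetical sample
skewness_value <- skewness(data)
kurtosis_value <- kurtosis(data)  # Pearson's measure (about 3 for normal data)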

Software implementations

Most statistical software packages and programming languages have built-in functions or libraries for calculating skewness and kurtosis
These implementations handle the computational details and provide efficient and accurate results
Some commonly used software for calculating skewness and kurtosis include:
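- Microsoft Excel: SKEW() and KURT() functions (KURT() reports excess kurtosis)
- SPSS: Descriptives command with the Skewness and Kurtosis options
- SAS: PROC MEANS or PROC UNIVARIATE with the SKEWNESS and KURTOSIS options
- Python: scipy.stats.skew() and scipy.stats.kurtosis() functions (kurtosis() reports excess kurtosis by default; pass fisher=False for Pearson's kurtosis)
- R: skewness() and kurtosis() functions from the moments package (kurtosis() reports Pearson's kurtosis)
Because these conventions differ, check whether a reported kurtosis is on Pearson's scale (normal = 3) or the excess scale (normal = 0) before comparing values across tools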

Interpreting results

After calculating skewness and kurtosis, it is important to interpret the results in the context of the data and the research question
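As a common rule of thumb, skewness values greater than 1 or less than -1 indicate a highly skewed distribution, absolute values between 0.5 and 1 indicate moderate skewness, and absolute values below 0.5 indicate an approximately symmetric distribution
On Pearson's scale, kurtosis values greater than 3 indicate a leptokurtic distribution, while values less than 3 indicate a platykurtic distribution; when the software reports excess kurtosis, the corresponding cutoff is 0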
The interpretation should also consider the sample size, as small samples may produce less reliable estimates of skewness and kurtosis
It is important
