8.1 Introducing Statistics: Are My Results Unexpected?

4 min read, January 7, 2023

Josh Argo

Jed Quiaoit

In this unit, we'll deal with many questions raised by the variation between observed and expected counts in categorical data. In other words, the difference between what we find and what we expect to find may be due to random chance, or it may not.

What does this mean? 🤔

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2F8-naC7Aw5IJG1g.png?alt=media&token=2c601be3-e49a-454d-80ed-0374c1db380d

Image From Pinterest

Random Chance or Incorrect Claim?

Just as with any random outcome, there is always a chance that something unusual happens. The tricky part (AKA statistics) comes in when we decide whether our outcome was just due to random chance or to something wrong with the original claim.

When conducting a statistical test, there is always the possibility that the observed difference is due to chance rather than a true difference or relationship. This is known as random variation. The p-value calculated in a statistical test is the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true (that is, assuming any difference is due to random variation alone).

If the p-value is very low (e.g., below 0.05), a result like ours would rarely happen by chance alone, so we reject the null hypothesis and conclude that there is a statistically significant difference. However, if the p-value is high (e.g., above 0.05), the result is consistent with random variation, so we fail to reject the null hypothesis and cannot conclude that there is a significant difference. A high p-value does not prove there is no difference; it only means we lack convincing evidence of one. 🍀

Example

For instance, if we flip a fair coin 10 times, we would expect to get 5 heads and 5 tails. Are we guaranteed to get exactly those results? Probably not. Flipping a coin is a random process that can produce a variety of outcomes. The single most likely outcome is 5 heads and 5 tails, but other outcomes, such as 4 heads and 6 tails, are almost as likely.
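To see how likely each result is, here is a minimal sketch that lists the binomial probability of every possible head count in 10 flips, assuming the coin is fair (P(heads) = 0.5). The function name `fair_coin_distribution` is just an illustrative helper, not part of any library.

```python
from math import comb

def fair_coin_distribution(n):
    """Return {k: P(exactly k heads)} for n flips of a fair coin."""
    total = 2 ** n  # number of equally likely heads/tails sequences
    return {k: comb(n, k) / total for k in range(n + 1)}

dist = fair_coin_distribution(10)
for k, prob in dist.items():
    print(f"{k:2d} heads: {prob:.3f}")
```

Exactly 5 heads has probability 252/1024 ≈ 0.246, so about three times out of four a fair coin will *not* land on the expected 5/5 split. Getting 4 heads or 6 heads each has probability 210/1024 ≈ 0.205.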

If we flipped a coin 10 times and got 4 heads and 6 tails, would we doubt that the coin was fair? Probably not. That is a typical outcome, and it is pretty close to our expected counts of 5 and 5. If we got 10 heads and 0 tails, that is a much larger discrepancy, and it might cause us to doubt that the coin is really fair.
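One way to put numbers on "would we doubt the coin?" is an exact two-sided p-value: the probability that a fair coin produces a head count at least as far from 5 as the one we observed. This sketch uses an exact binomial calculation, which is a swapped-in method for illustration, not the chi-square procedure this unit builds toward; `two_sided_p_value` is a hypothetical helper name.

```python
from math import comb

def two_sided_p_value(heads, n):
    """P(a fair coin gives a head count at least as far from n/2 as `heads`)."""
    distance = abs(heads - n / 2)
    # Count every outcome k that is at least as extreme as the observed one.
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance)
    return extreme / 2 ** n

for heads in (4, 10):
    print(heads, "heads:", round(two_sided_p_value(heads, 10), 4))
```

For 4 heads the p-value is 772/1024 ≈ 0.754, far above 0.05, so there is no reason to doubt the coin. For 10 heads it is 2/1024 ≈ 0.002, well below 0.05, which is why that result makes us suspicious.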

Sample Size

Just like with our other inference procedures, sample size plays a huge part in how we interpret an outcome. When we flip a coin 10 times and get 4 heads and 6 tails, no big deal. If we flipped a coin 1000 times and got 400 heads and 600 tails (the same 40/60 split), that would be far more surprising for a fair coin. 🪙
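The chi-square goodness-of-fit statistic makes this comparison concrete. The sketch below computes it for the claim "the coin is fair" and gets the p-value from the fact that, with two categories (df = 1), a chi-square variable is the square of a standard normal, so P(χ² > x) = erfc(√(x/2)). The helper name `chi_square_fair_coin` is hypothetical, and the shortcut p-value formula only applies to df = 1.

```python
import math

def chi_square_fair_coin(heads, n):
    """Chi-square goodness-of-fit statistic and p-value for a fair-coin claim."""
    expected = n / 2                      # expected count in each category
    observed = (heads, n - heads)
    stat = sum((obs - expected) ** 2 / expected for obs in observed)
    # With 2 categories, df = 1, and P(chi-square_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

for heads, n in ((4, 10), (400, 1000)):
    stat, p = chi_square_fair_coin(heads, n)
    print(f"{heads}/{n - heads} in {n} flips: chi-square = {stat:.1f}, p-value = {p:.2g}")
```

The split is 40/60 both times, but the statistic jumps from 0.4 (p ≈ 0.53) to 40 (p far below 0.001). Multiplying the sample size by 100 multiplies the statistic by 100 when the proportions stay the same.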

The main reason sample size matters is that the standard deviation of a sample statistic (here, the sample proportion of heads) decreases as the sample size increases. This inverse relationship between sample size and standard deviation holds for all the sample statistics we have discussed this year and is almost sure to show up on the AP exam several times!
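For a sample proportion, the standard deviation formula from earlier units shows the inverse relationship directly. Here p = 0.5 is the fair-coin claim and p̂ = 0.4 is the observed proportion from the 40/60 examples; the z-scores are a quick worked check of how far 0.4 sits from 0.5 in each case, not a full significance test.

```latex
\begin{aligned}
\sigma_{\hat{p}} &= \sqrt{\frac{p(1-p)}{n}} \\
n = 10: \quad \sigma_{\hat{p}} &= \sqrt{\frac{0.5(0.5)}{10}} \approx 0.158, & z &= \frac{0.4 - 0.5}{0.158} \approx -0.63 \\
n = 1000: \quad \sigma_{\hat{p}} &= \sqrt{\frac{0.5(0.5)}{1000}} \approx 0.0158, & z &= \frac{0.4 - 0.5}{0.0158} \approx -6.32
\end{aligned}
```

Because n sits under a square root, 100 times as many flips shrinks the standard deviation by a factor of 10, which turns an ordinary result into an extremely unusual one.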

Side Note: Power in Statistical Tests

Sample size also plays a role in the statistical power of a test, which is the probability of correctly detecting a difference if one actually exists. In general, the larger the sample size, the greater the power of the test.

One reason a larger sample size increases power is that it reduces the standard deviation of the sample statistic (such as a sample proportion), so a real difference is less likely to be hidden by random variation. In a chi-square test, for example, the distribution of the statistic under the null hypothesis depends only on the degrees of freedom, not on the sample size. When a real difference exists, however, the chi-square statistic tends to grow in proportion to the sample size, which makes it more likely to exceed the critical value and detect the difference. 👊
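Power can be calculated exactly for the coin example. This sketch assumes a hypothetical biased coin with a true P(heads) of 0.6, tests the claim "the coin is fair" with a chi-square goodness-of-fit test at α = 0.05, and adds up the binomial probability of every head count that would lead us to reject the claim. The helper names (`binom_pmf`, `chi_square_p_value`, `power`) and the value 0.6 are illustrative assumptions.

```python
import math

def binom_pmf(k, n, p):
    # Binomial probability computed in log space to avoid overflow for large n.
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def chi_square_p_value(heads, n):
    # Goodness-of-fit test against a fair coin (2 categories, df = 1).
    expected = n / 2
    stat = ((heads - expected) ** 2 + ((n - heads) - expected) ** 2) / expected
    return math.erfc(math.sqrt(stat / 2))  # P(chi-square_1 > stat)

def power(n, true_p, alpha=0.05):
    # Probability the test rejects "fair coin" when P(heads) is really true_p.
    return sum(binom_pmf(k, n, true_p) for k in range(n + 1)
               if chi_square_p_value(k, n) < alpha)

for n in (10, 1000):
    print(f"n = {n}: power = {power(n, 0.6):.4f}")
```

With only 10 flips, the test rejects only for 0, 1, 9, or 10 heads, so its power against a 60% coin is about 0.048, barely better than guessing. With 1000 flips, the power is essentially 1.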

Law of Large Numbers

The law of large numbers states that as the sample size increases, the sample mean tends to get closer and closer to the population mean. One way to see why: the standard deviation of the sample mean decreases as the sample size increases, so the sample mean becomes a more precise estimate of the population mean. 🏙️

In the context of flipping a coin, the law of large numbers says that as the number of flips increases, the proportion of heads will converge toward the true probability of getting heads (0.5 for a fair coin). For example, if you flip a coin 10 times and get 6 heads, you can't be very confident either way about whether the coin is fair, because 10 flips is too few to tell. But if we still had a 40/60 split after 1000 flips, we would have good reason to doubt the fairness of the coin. If instead you flip the coin 1000 times and get 500 heads, you would be much more confident that the coin is fair, because the proportion of heads is very close to 0.5.
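A quick simulation shows the law of large numbers in action. This sketch assumes a fair coin, uses Python's `random` module with an arbitrary seed of 0 so the run is repeatable, and records the running proportion of heads at a few checkpoints. The exact printed proportions depend on the seed; the pattern of getting closer to 0.5 is what matters.

```python
import random

def running_proportions(n_flips, checkpoints, seed=0):
    """Flip a simulated fair coin n_flips times; record the proportion of heads at each checkpoint."""
    rng = random.Random(seed)
    heads = 0
    results = {}
    for i in range(1, n_flips + 1):
        if rng.random() < 0.5:  # heads with probability 0.5
            heads += 1
        if i in checkpoints:
            results[i] = heads / i
    return results

props = running_proportions(100_000, {10, 100, 1_000, 10_000, 100_000})
for n, p_hat in sorted(props.items()):
    print(f"{n:>7} flips: proportion of heads = {p_hat:.4f}")
```

Early checkpoints can wander noticeably from 0.5, while by 100,000 flips the proportion is typically within a fraction of a percentage point of it.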

The law of large numbers is a fundamental principle of statistics that underlies many statistical procedures and is important for understanding how to make reliable inferences about a population based on a sample. 😄

Key Terms to Review (10)

Categorical Data

: Categorical data refers to data that can be divided into categories or groups based on qualitative characteristics.

Chi-Square Test

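: A family of statistical tests for categorical data that compare observed counts with the counts expected if the null hypothesis were true. The goodness-of-fit test checks whether a single categorical variable follows a claimed distribution (such as 50% heads and 50% tails); the tests for independence and homogeneity check for an association between two categorical variables.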

Law of Large Numbers

: A principle in probability theory stating that as more observations are collected, the sample mean will converge to the population mean.

P-value

: The probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one observed. The smaller the p-value, the stronger the evidence against the null hypothesis.

Population Mean

: The population mean is the average value of a variable for an entire population. It represents a summary measure for all individuals or units within that population.

Random Variation

: Random variation refers to the natural variability or differences that occur in a set of data due to chance. It is the result of factors that are not controlled or accounted for in an experiment or study.

Sample Size

: The sample size refers to the number of individuals or observations included in a study or experiment.

Significant Difference

: A statistically significant difference is a result that would be unlikely to occur by random chance alone if the null hypothesis were true. Statistical significance does not by itself mean the difference is large or practically important.

Standard Deviation

: The standard deviation measures the typical distance of the values in a set of data from their mean. It tells us how spread out the values are.

Statistical Power

: Statistical power refers to the ability of a statistical test to detect an effect or relationship when it truly exists. It is the probability of correctly rejecting a false null hypothesis.

© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

