What Are the Best Quizlet Decks for AP Statistics?

11 min read · December 22, 2021

Dalia Savy

Harrison Burnside

If there were a holy trinity of AP study sites, Quizlet would most certainly be in it. Its easy-to-use interface, combined with its multi-purpose functionality, helps students of all different learning styles in endless subject areas. However, it can sometimes be challenging to find the best vocab sets.

Fiveable’s AP Stats teachers & students have compiled the best Quizlet study decks for each unit. The AP Stats exam is very concept heavy, so make sure you take the time to learn these terms.

Catch a live review or watch a replay for AP Stats on Fiveable’s AP Stats hub!


Unit 1 Key Terms (15-23%): Exploring One-Variable Data

Unit 1 includes the roots of statistics, and it is very important to get these concepts down and memorized. In this unit, you will distinguish between categorical and quantitative data, describe and compare distributions, and begin to learn about normal distributions.

Best Quizlet Deck: AP Statistics Unit 1 by Rob_Eriksen

Key Terms:

  • Categorical Variable – Records which of several groups an individual belongs to

  • Quantitative Variable – Takes numerical values that can be used in calculations or sorting

  • Describing a Distribution (SOCS) MUST include:

    • Shape – uniform/skew/peaks in context (symmetric, skewed left/right, unimodal/bimodal)

    • Outliers – mention in context

    • Center – mean or median in context

    • Spread – range/Standard Deviation/IQR in context

  • Bar Graphs, Box Plots, Dotplots, Histograms, Stemplots, Two-way tables – All methods of representing data. It is good to know how to read information off of each of these.

  • 5 Number Summary – Minimum, Quartile 1, Median, Quartile 3, Maximum

  • The Empirical Rule – Tells us that if the data are approximately normally distributed, then about 68% of the values fall within 1 standard deviation of the mean, about 95% of the values fall within 2 standard deviations of the mean, and about 99.7% (almost all) of the values fall within 3 standard deviations of the mean.

  • Z-Score – z = (x - x̄) / s
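
To see the five-number summary and the z-score formula from this unit in action, here is a minimal Python sketch. The `scores` list is made up for illustration, and the function names `five_number_summary` and `z_score` are just labels chosen for this example. Quartiles are computed as the medians of the lower and upper halves of the data; calculators and software sometimes use slightly different conventions, so treat the quartile values as one reasonable choice.

```python
import statistics

# Hypothetical quiz scores, invented for this example.
scores = [62, 70, 71, 75, 78, 80, 82, 85, 88, 99]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data, leaving out the overall median when the count is odd.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


def z_score(x, data):
    """z = (x - mean) / s, where s is the sample standard deviation."""
    return (x - statistics.mean(data)) / statistics.stdev(data)


print(five_number_summary(scores))
print(round(z_score(99, scores), 2))
```

A z-score near 0 means the value sits close to the mean; by the Empirical Rule, a value with |z| above 2 would be fairly unusual in a roughly normal distribution.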


Unit 2 Key Terms (5-7%): Exploring Two-Variable Data

Unit 2 is an expansion of Unit 1. It builds on the relationships between two categorical or quantitative variables and how to argue about the association between the two. This unit includes a lot of set interpretations of the different components of an LSRL, which are very important to remember.

Best Quizlet Deck: AP Statistic Chapter 2 - Two Variable Data by Mrs_Gutosky

Key Terms:

  • Scatterplots – A way to organize data. On the x-axis is the explanatory (independent) variable and on the y-axis is the response (dependent) variable.

  • Form – linear or curved

  • Direction – positive or negative correlation

  • Strength – depends on the correlation coefficient; could be weak, moderate, or strong.

  • LSRL – ŷ=a+bx where x denotes (context) and ŷ denotes predicted (context)

  • Correlation Coefficient (r) – The correlation coefficient shows the degree to which there is a linear correlation between the two variables, that is, how close the points are to forming a line. The closer r is to 1 or -1, the stronger the relationship.

  • Slope (b) – There is a predicted increase/decrease of ______ (slope in units of y variable) for every 1 (unit of x variable) increase.

  • Y Intercept – The predicted value of (y in context) is _____ when (x in context) is 0 (units in context).

  • Coefficient of Determination (r^2) – ____% of the variation in (y in context) can be explained by its linear relationship with (x in context).

  • Residuals – The difference between the actual data and the value predicted by a linear regression model, or y-ŷ. The ideal pattern is random scatter above and below the LSRL (where the residuals=0). A positive residual means the model underestimated the true value while a negative residual means the model overestimated the true value.

  • Extrapolation – The use of a regression model to make predictions outside of the domain of the given data. The farther outside this domain you go, the less reliable your predictions will be.
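
The LSRL, correlation coefficient, r^2, and residuals from this list can be computed by hand in a few lines. The sketch below uses a tiny made-up data set (hours studied versus exam score), and `lsrl` is a hypothetical helper name. It uses the standard least-squares formulas b = Sxy / Sxx and a = ȳ - b·x̄, which are not spelled out in the list above.

```python
import math

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]


def lsrl(x, y):
    """Return (a, b, r) for the least squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                   # slope
    a = y_bar - b * x_bar           # y-intercept
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return a, b, r


a, b, r = lsrl(hours, score)

# Residual = actual y minus predicted y-hat.
residuals = [yi - (a + b * xi) for xi, yi in zip(hours, score)]

print(f"y-hat = {a:.1f} + {b:.1f}x, r = {r:.3f}, r^2 = {r * r:.3f}")
print(residuals)
```

With this made-up data, the slope reads as "a predicted increase of b points for every 1 additional hour studied," and the residuals add up to zero (up to rounding), as they always do for an LSRL.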


Unit 3 Key Terms (12-15%): Collecting Data

This unit discusses sampling methods and ways of collecting data that can be used to represent a population. It is filled with vocabulary that is essential in future units.

Best Quizlet Deck: AP Statistics Collecting Data by rachel_thiebout

Key Terms:

  • Experiment – Deliberately imposes a treatment in order to observe the response. A well-designed experiment can provide evidence of causation.

  • Observational Study 🔭 – Observes individuals and measures variables of interest but doesn’t attempt to influence the responses. These studies look for association between variables because in an observational study, no treatment is imposed.

  • Confounding 👀 – Occurs when two variables are associated in such a way that their effects on the response variable cannot be distinguished from each other.

  • Bias – A statistical study is biased if it is likely to consistently underestimate or overestimate the value you are looking for.

  • Simple Random Sample (SRS) – Chooses a sample of size “n” in such a way that every group of n individuals in the population has an equal chance to be selected as the sample.

  • Stratified Random Sample – Selects a sample by choosing an SRS from each stratum and combining the SRSs into one overall sample. This reduces variability in the data and gives more precise results.

  • Cluster Sample – The population is divided into groups, called clusters, and an SRS of clusters is taken. All individuals in the selected clusters are sampled.

  • Systematic Random Sample – Sample members from a population are selected according to a random starting point and a fixed periodic interval.
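
The four sampling methods above are easier to tell apart when you see them side by side. This is a rough sketch using Python's built-in `random` module; the population of 100 student ID numbers, the two strata, and the ten clusters are all invented for the example, and the function names are hypothetical.

```python
import random

rng = random.Random(42)           # fixed seed so the example is repeatable
population = list(range(1, 101))  # hypothetical student ID numbers 1-100


def simple_random_sample(pop, n, rng):
    """Every group of n individuals is equally likely to be chosen."""
    return rng.sample(pop, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS from each stratum and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of clusters and keep everyone in the chosen clusters."""
    chosen = rng.sample(clusters, n_clusters)
    return [person for cluster in chosen for person in cluster]


def systematic_sample(pop, k, rng):
    """Pick a random start among the first k, then take every k-th individual."""
    start = rng.randrange(k)
    return pop[start::k]


srs = simple_random_sample(population, 10, rng)
strata = {"juniors": population[:50], "seniors": population[50:]}
stratified = stratified_sample(strata, 5, rng)
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
clustered = cluster_sample(clusters, 2, rng)
systematic = systematic_sample(population, 10, rng)

print(srs, stratified, clustered, systematic, sep="\n")
```

Notice the difference between the stratified and cluster versions: stratified sampling takes some individuals from every group, while cluster sampling takes everyone from only some groups.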


Unit 4 Key Terms (10-20%): Probability, Random Variables, and Probability Distributions

Every day, we see coincidences that make us question how likely an event is to happen again. This brings in probability: the proportion of times an outcome would occur in a large number of repetitions. Unit 4 is all about probability and is very calculation heavy.

Best Quizlet Deck: AP Statistics CB Unit 4+5: Probability, Random Variables, Probability Distributions, Sampling Distributions by Zosia_Stawiarska

Key Terms:

  • Independent Events – The outcome of one event doesn't influence the outcome of another event.

    • P(A|B) = P(A)

    • P(A and B) = P(A)*P(B)

  • Disjoint/Mutually Exclusive Events – Cannot occur at the same time and have no outcomes in common.

    • P(A and B) = 0

    • P(A or B) = P(A) + P(B)

  • Conditional Probability – Probability of one event under the condition that another event is known to have occurred.

    • P(A|B) = P(A and B) / P(B) – The probability of A given B = Probability of A and B / Probability of B

  • Discrete Random Variable – Takes a fixed set of possible values with gaps between them (often whole-number counts).

  • Mean (Expected) Value – μ = Σ(xᵢ · pᵢ)

  • Standard Deviation – σ = √( Σ (xᵢ − μ)² · pᵢ ). You cannot add standard deviations, only variances (and only for independent random variables).

  • Continuous Random Variable – Can take any value in an interval on the number line (can include decimals).

  • Law of Large Numbers – Says that in many repetitions of the same chance process, the simulated probability gets closer to the true probability as more trials are run.

  • Binomial Setting – Arises when we perform a fixed number of independent trials of the same chance process and count the number of times that a particular outcome, called a success, occurs. The probability of success is p, and the probability of failure (q) is 1 minus the probability of success. Must check that the trials are independent and that p is the same on every trial.

  • Geometric Setting ⭕️ – Arises when we perform independent trials of the same chance process and record the number of trials it takes to get one success. On each trial, the probability p of success must be the same. The number of trials Y that it takes to get one success in a geometric setting is a geometric random variable.
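
Since this unit is calculation heavy, a short Python sketch may help check your work on the mean and standard deviation of a discrete random variable, plus binomial and geometric probabilities. The function names are hypothetical, the fair-die example is made up, and the binomial and geometric formulas used here (C(n, k)·p^k·(1−p)^(n−k) and (1−p)^(k−1)·p) are the standard ones rather than something quoted from the list above.

```python
import math


def mean_and_sd(values, probs):
    """Mean = sum(x * p); SD = sqrt(sum((x - mean)^2 * p))."""
    mu = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, math.sqrt(variance)


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def geometric_pmf(k, p):
    """P(first success happens on trial k)."""
    return (1 - p) ** (k - 1) * p


# Hypothetical example: one roll of a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
mu, sd = mean_and_sd(faces, [1 / 6] * 6)

print(mu, sd)
print(binomial_pmf(2, 5, 0.5))  # exactly 2 heads in 5 fair coin flips
print(geometric_pmf(3, 0.5))    # first head on the 3rd flip
```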


Unit 5 Key Terms (7-12%): Sampling Distributions

This unit is an introduction to significance tests, which are covered in later units. It begins by introducing statistics, sampling distributions, the CLT, and population parameters.

Best Quizlet Deck: AP Statistics Chapter 7 - Sampling Distributions by Janet_Delery

Key Terms:

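  • Sampling Distribution – A distribution where we take ALL possible samples of a given size, compute a statistic for each, and put the results together as a data set.

  • A statistic is used to estimate a parameter.

  • Parameters vs Statistics – Mean (μ, x̄); Standard Deviation (σ, s); Proportion (p, p̂).

  • Large Counts Condition – The number of successes and the number of failures are each at least 10.

  • Central Limit Theorem – States that if n (the sample size) is ≥30, the sampling distribution of the sample mean is approximately normal. The larger n is, the closer to normal the sampling distribution is.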
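
A quick simulation can make the sampling distribution and the Central Limit Theorem feel less abstract. The sketch below is only an approximation of the idea: instead of taking ALL possible samples, it takes 2,000 random samples of size 30 from a made-up, strongly right-skewed population and records each sample mean. Samples are drawn with replacement (via `random.choices`) to keep draws independent, and `simulate_sample_means` is a hypothetical helper name.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical right-skewed population: mostly 1s, a few large values.
population = [1] * 70 + [5] * 20 + [20] * 10


def simulate_sample_means(pop, n, reps, rng):
    """Draw `reps` samples of size n (with replacement); return their means."""
    return [statistics.mean(rng.choices(pop, k=n)) for _ in range(reps)]


means = simulate_sample_means(population, n=30, reps=2000, rng=rng)

print("population mean:", statistics.mean(population))
print("mean of sample means:", round(statistics.mean(means), 2))
print("SD of sample means:", round(statistics.stdev(means), 2))
```

Even though the population is skewed, the sample means should cluster around the population mean, with a spread close to σ/√n. Plotting `means` as a histogram would show a shape much closer to normal than the population itself.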


Unit 6 Key Terms (12-15%): Inference for Categorical Data: Proportions

This unit is the beginning of significance tests, in which you are expected to check conditions, construct and interpret confidence intervals, and calculate a p-value. This unit consists of estimating population parameters involving categorical data.

Best Quizlet Deck: Unit 6: Inferences for Categorical Data: Proportions by Cfritz15

Key Terms:

  • Confidence Interval – An interval of numbers based on our sample proportion that gives us a range where we can expect to find the true population proportion.

  • In repeated sampling, I am __% confident that the true population proportion (context) falls within this interval.

  • Significance Test – Estimating the probability of obtaining our collected sample from the sampling distribution for our sample size when we assume that the given population proportion is correct.

  • Large Counts Condition – The number of successes and the number of failures are each at least 10 (np≥10 and n(1-p)≥10). This condition lets us treat the sampling distribution as approximately normal.

  • Random Condition – Reduces any bias that may be caused from taking a bad sample. When answering inference questions, it is always essential to make note that our sample was random. Without a random sample, our findings cannot be generalized to the population, which limits our scope of inference.

  • 10% Condition – Check that the population in question is at least 10 times as large as our sample so that we can treat the observations as independent.

  • Margin of Error – The "buffer zone" of our confidence interval. The interval is point estimate ± (z*)(standard deviation), and the margin of error is the (z*)(standard deviation) part.

  • Null Hypothesis – The hypothesis based on the claim that was given in the problem (p=___).

  • Alternative Hypothesis – The hypothesis that the claim in our null is not true (p<___, p>___ or p≠___).

  • Type I (𝞪) Error – When we reject our Ho when, in fact, Ho is true.

  • Type II (𝞫) Error – When we fail to reject our Ho, but Ho is actually false.

  • Power – How strong our test is: Power = 1 - P(Type II Error).
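
Here is a small Python sketch of a one-proportion z-interval and a two-sided one-proportion z-test, following the "point estimate ± (z*)(standard deviation)" structure above. The numbers (60 successes out of 100, null value 0.5) are invented, the function names are hypothetical, and the conditions (random, 10%, large counts) still need to be checked by hand before trusting the output.

```python
import math
from statistics import NormalDist


def one_prop_z_interval(successes, n, confidence=0.95):
    """Confidence interval for a proportion: p-hat +/- z* * SE."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def one_prop_z_test(successes, n, p0):
    """Two-sided test of Ho: p = p0. Returns (z, p_value)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical survey: 60 of 100 students prefer flashcards.
low, high = one_prop_z_interval(60, 100)
z, p_value = one_prop_z_test(60, 100, p0=0.5)

print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Note that the interval uses p̂ in the standard error while the test uses the null value p0, since the test assumes Ho is true when computing how surprising the sample is.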


Unit 7 Key Terms (10-18%): Inference for Quantitative Data: Means

This unit is very similar to Unit 6, but instead of dealing with proportions (p), we are dealing with means (μ). Many of the concepts overlap, but it is important to note that we cannot write proportions anywhere we are running an inference test for quantitative data.

Best Quizlet Deck: AP Statistics - Inference for Quantitative Data: Means by AlyssaMorales107

Key Terms:

  • Confidence Interval – An interval of numbers based on our sample mean that gives us a range where we can expect to find the true population mean.

  • In repeated sampling, I am __% confident that the true population mean (context) falls within this interval.

  • In repeated sampling, I am ___% confident that the true difference in population means (context) falls within this interval.

  • Central Limit Theorem – States that if n (the sample size) is ≥30, the sampling distribution of the sample mean is approximately normal. The larger n is, the closer to normal the sampling distribution is.

  • Random Condition – Reduces any bias that may be caused from taking a bad sample.

  • 10% Condition – Check that the population in question is at least 10 times as large as our sample so that we can treat the observations as independent.

  • Null Hypothesis – The hypothesis based on the claim that was given in the problem (μ=___).

  • Alternative Hypothesis – The hypothesis that the claim in our null is not true (μ<___, μ>___ or μ≠___).

  • p<𝞪 – Since p<𝞪, we reject our Ho. We have convincing evidence at the 𝞪 level that (Ha in context).

  • p>𝞪 – Since p>𝞪, we fail to reject our Ho. We do not have convincing evidence at the 𝞪 level that (Ha in context).

  • Degrees of Freedom – n-1
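
For means, the same inference steps apply but with a t distribution and n-1 degrees of freedom. The sketch below computes a one-sample t statistic and a t-interval for a tiny invented data set. Python's standard library has no t distribution, so the critical value `t_star` is passed in by hand (2.776 is the usual table value for 95% confidence with 4 degrees of freedom); the function names are hypothetical.

```python
import math
import statistics

# Hypothetical sample: hours of sleep for 5 students.
data = [4, 6, 5, 7, 8]


def one_sample_t_statistic(sample, mu0):
    """Return (t, df) for testing Ho: mu = mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / standard_error
    return t, n - 1


def t_interval(sample, t_star):
    """x-bar +/- t* * s / sqrt(n); t* must come from a t table."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    margin = t_star * statistics.stdev(sample) / math.sqrt(n)
    return x_bar - margin, x_bar + margin


t, df = one_sample_t_statistic(data, mu0=5)
low, high = t_interval(data, t_star=2.776)  # 95% confidence, df = 4

print(f"t = {t:.3f} with df = {df}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

To finish the test, you would look up the p-value for this t and df in a table or calculator and compare it to 𝞪.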


Unit 8 Key Terms (2-5%): Inference for Categorical Data: Chi-Squared

Chi-Squared significance tests operate differently than proportion and mean significance tests. All χ² distributions are skewed right. The degree of skewness depends on the degrees of freedom.

Best Quizlet Deck: AP Statistics Chapter 11: Inference for Distributions of Categorical Data

Key Terms:

  • Goodness of Fit Test – Must have random sampling. All expected counts must be at least 5.

    • Degrees of Freedom – categories - 1

    • Null Hypothesis – The claimed distribution of ___ is correct.

    • Alternative Hypothesis – The claimed distribution of ___ is not correct.

  • Test for Independence – Checks the association between two variables in a single population.

    • Expected Counts – (row total)(column total) / (table total)

    • Null Hypothesis – There is no association between ___ and ___.

    • Alternative Hypothesis – There is an association between ___ and ___.

  • Test for Homogeneity – Checks the distribution of a single variable in several populations to see if these populations are similar with respect to the variable.

    • Null Hypothesis – There is no difference between ___ and ___.

    • Alternative Hypothesis – There is a difference between ___ and ___.
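
The expected count formula, (row total)(column total) / (table total), is easy to apply to a whole two-way table at once. The sketch below does that for an invented 2×2 table and then computes the chi-square statistic with the standard formula Σ (observed − expected)² / expected, which is assumed here rather than taken from the list above. The function names are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical two-way table: rows = grade level, columns = study method.
observed = [[20, 30],
            [30, 20]]

expected = expected_counts(observed)
chi_sq = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1)(columns - 1)

print(expected)
print(f"chi-square = {chi_sq}, df = {df}")
```

Before using the statistic, check that every value in `expected` meets the expected counts condition; the p-value then comes from a χ² table or calculator with the df shown.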


Unit 9 Key Terms (2-5%): Inference for Quantitative Data: Slopes

There is variability in slopes as well. In this unit, you will learn how to perform significance tests and construct confidence intervals for the slope of a regression line.

Best Quizlet Deck: AP Stats U9: Linear Regression Inference Vocab + Variables by harrisonb_4

Key Terms:

  • LSRL – ŷ=a+bx where x denotes (context) and ŷ denotes predicted (context)

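  • Degrees of Freedom – n-2

  • Confidence Interval – b ± t*(SEb)

  • In repeated sampling, I am __% confident that the true slope falls within this interval.

  • True Slope – 𝞫

  • Null Hypothesis – 𝞫 = 0; the predicted y does not change as x changes (no linear relationship)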
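
The interval b ± t*(SEb) can be worked out from raw data with a few sums. This sketch assumes the usual formulas s = √(SSE / (n−2)) and SEb = s / √Sxx, which are not stated in the list above, and it takes the critical value `t_star` as an input because the standard library has no t distribution (3.182 is the usual table value for 95% confidence with 3 degrees of freedom). The data and the name `slope_inference` are made up for illustration.

```python
import math

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]


def slope_inference(x, y, t_star):
    """Return slope b, its standard error, df, t statistic, and CI for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Sum of squared residuals, then the standard deviation of the residuals.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b = s / math.sqrt(sxx)
    return {
        "b": b,
        "se_b": se_b,
        "df": n - 2,
        "t": b / se_b,  # test statistic for Ho: true slope = 0
        "interval": (b - t_star * se_b, b + t_star * se_b),
    }


result = slope_inference(hours, score, t_star=3.182)  # 95% confidence, df = 3
print(result)
```

If the interval does not contain 0, that is consistent with rejecting the null hypothesis that the true slope is 0 at the matching significance level.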


Closing Thoughts

Hopefully, these decks can help you study for your tests and ultimately, the AP exam. The best feature about Quizlet is the option to play games and use the flashcards wherever you are. When you are studying, you can always duplicate a deck and customize it to your own needs.

As long as you review these flashcards at least once a day for a few days before your test, you should be good to go. Make sure to take advantage of starring flashcards you struggle with! Before a test, it's great to quickly look over the starred ones and then feel more confident about them.

You got this! Good luck studying.🍀


Key Terms to Review (57)

10% Condition

: The 10% condition states that for sampling without replacement to be considered valid, the sample size must be less than 10% of the population size.

Alternative Hypothesis

: The alternative hypothesis is a statement that contradicts or negates the null hypothesis. It suggests that there is a significant relationship or difference between variables.

Bar Graphs

: Bar graphs are a type of graphical representation that use rectangular bars to represent different categories or groups. They are commonly used to compare categorical data by showing the frequency or proportion of each category.

Bias

: Bias refers to a systematic deviation from the true value or an unfair influence that affects the results of statistical analysis. It can occur during data collection, sampling, or analysis and leads to inaccurate or misleading conclusions.

Binomial Setting

: In statistics, a binomial setting refers to an experiment with two possible outcomes (success or failure) for each trial, fixed number of trials, independence between trials, and constant probability of success for each trial.

Box Plots

: Box plots, also known as box-and-whisker plots, are graphical representations of a set of data that display the distribution and key statistical measures such as the median, quartiles, and outliers.

Categorical Variable

: A categorical variable is one that represents characteristics or qualities rather than numerical values. It consists of categories or groups into which data can be classified.

Center

: The center refers to the middle or average value of a data set. It represents the typical or central value around which the data tends to cluster.

Central Limit Theorem

: The Central Limit Theorem states that as the sample size increases, the sampling distribution of the mean approaches a normal distribution regardless of the shape of the population distribution.

Cluster Sample

: A cluster sample is a sampling method where the population is divided into groups or clusters, and a random selection of clusters is chosen to be included in the sample. All individuals within the selected clusters are then included in the sample.

Coefficient of Determination (r^2)

: The coefficient of determination, denoted as r^2, measures the proportion of the variation in the dependent variable that can be explained by the independent variable(s). It ranges from 0 to 1, where 0 indicates no relationship and 1 indicates a perfect fit.

Conditional Probability

: Conditional probability is the likelihood of an event occurring given that another event has already occurred. It measures the probability of one event happening, taking into account the occurrence of a different event.

Confidence Interval

: A confidence interval is a range of values that is likely to contain the true value of a population parameter. It provides an estimate along with a level of confidence about how accurate the estimate is.

Confounding

: Confounding occurs when there is an additional variable that affects both the independent variable and dependent variable, making it difficult to determine their individual effects accurately.

Continuous Random Variable

: A continuous random variable is a variable that can take on any value within a certain range, usually represented by an interval. It can have an infinite number of possible values.

Correlation Coefficient (r)

: The correlation coefficient, denoted as "r", measures the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive linear relationship.

Degrees of Freedom

: Degrees of freedom refers to the number of values in a calculation that are free to vary. In statistics, it represents the number of independent pieces of information available for estimating a parameter.

Direction

: Direction refers to the positive or negative trend of a relationship between two variables. It indicates whether an increase in one variable is associated with an increase or decrease in the other variable.

Discrete Random Variable

: A discrete random variable is a variable that can only take on specific, isolated values. It is characterized by a countable set of possible outcomes.

Disjoint/Mutually Exclusive Events

: Disjoint or mutually exclusive events are events that cannot occur at the same time. If one event happens, the other event cannot happen simultaneously.

Distribution

: In statistics, distribution refers to the way data is spread out or organized. It describes how frequently each value occurs and provides insights into patterns and characteristics of the dataset.

Dotplots

: Dotplots are simple graphical displays that use dots to represent individual data points. They provide a visual representation of the distribution and allow us to see patterns, gaps, clusters, or outliers in the data.

Empirical Rule

: The empirical rule, also known as the 68-95-99.7 rule, is a statistical guideline that states that for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, about 95% falls within two standard deviations, and roughly 99.7% falls within three standard deviations.

Expected Counts

: Expected counts are calculated values that represent what we would expect to observe in each category if there was no association between variables. They are based on assumptions and can be compared to observed counts to assess whether there is a significant relationship.

Experiment

: An experiment is a scientific study in which researchers manipulate one or more variables to observe the effect on another variable, while controlling for other factors.

Geometric Setting

: In statistics, geometric setting refers to an experiment with repeated independent trials until the first success occurs, where each trial has two possible outcomes (success or failure) and a constant probability of success.

Goodness of Fit Test

: A Goodness of Fit Test is a statistical test used to determine how well an observed sample data fits an expected theoretical distribution. It assesses whether any differences between observed and expected frequencies are statistically significant or simply due to random chance.

Histograms

: Histograms are graphical representations of data that use bars to show how many times each value or range of values occurs within a dataset.

Independent Events

: Independent events are events that have no influence on each other. The outcome of one event does not affect the outcome of another event.

Large Counts Condition

: The large counts condition, also known as the "success-failure" condition, is used when applying certain statistical methods to categorical data. It states that for these methods to be valid, both the number of successes and failures must be at least 10.

Law of Large Numbers

: A principle in probability theory stating that as more observations are collected, the sample mean will converge to the population mean.

LSRL (Linear Regression)

: LSRL, or Linear Regression, stands for Least Squares Regression Line. It is a line that best represents the relationship between two variables by minimizing the sum of squared differences between observed data points and predicted values on the line.

Margin of Error

: The margin of error is a measure of the uncertainty or variability in survey results. It represents the range within which the true population parameter is likely to fall.

Mean (Expected) Value

: The mean value, also known as the expected value, is the average outcome of a random variable. It represents the long-term average or center of the distribution.

Null Hypothesis

: The null hypothesis is a statement of no effect or relationship between variables in a statistical analysis. It assumes that any observed differences or associations are due to random chance.

Observational Study

: An observational study is a research method where researchers observe subjects without intervening or manipulating any variables. They collect data by watching subjects' behaviors or characteristics.

Outliers

: Outliers are extreme values that significantly differ from other values in a dataset. They can greatly affect statistical analyses and should be carefully examined.

Power

: Power refers to the probability of correctly rejecting a false null hypothesis. It measures our ability to detect an actual effect or relationship.

Quantitative Variable

: A quantitative variable is a type of variable that represents numerical data and can be measured or counted. It provides information about quantities and amounts rather than qualities.

Random Condition

: The random condition refers to the requirement that subjects or samples in a study must be randomly selected in order to ensure unbiased results.

Sampling Distribution

: A sampling distribution refers to the distribution of a statistic (such as mean, proportion, or difference) calculated from multiple random samples taken from the same population. It provides information about how sample statistics vary from sample to sample.

Shape

: In statistics, shape refers to the overall appearance or form of a distribution. It describes how the data is distributed and can be characterized by its symmetry, skewness, or modality.

Significance Test

: A significance test is a statistical method used to determine whether an observed result is statistically significant or simply due to chance. It involves comparing sample data with what would be expected under the null hypothesis.

Simple random sample (SRS)

: A simple random sample (SRS) is a subset of individuals selected from a larger population in such a way that every individual has an equal chance of being chosen and all possible samples have the same probability of being selected.

Spread

: Spread refers to how much variability or dispersion exists within a data set. It measures how far apart the values are from each other.

Standard Deviation

: The standard deviation measures the average amount of variation or dispersion in a set of data. It tells us how spread out the values are from the mean.

Stemplots

: Stemplots (also known as stem-and-leaf plots) are graphical displays that organize numerical data by separating each value into two parts - a stem and one or more leaves. They allow us to see both individual values and patterns within the data.

Stratified Random Sample

: A stratified random sample involves dividing a population into subgroups (strata) based on certain characteristics, and then selecting individuals from each subgroup using a random sampling method. This ensures representation from each subgroup in proportion to their size within the overall population.

Strength

: Strength refers to the degree of association or relationship between two variables. It measures how closely the data points in a scatterplot cluster around a straight line.

Systematic Random Sample

: A systematic random sample is a sampling method where every nth individual is selected from a list or population after starting at a randomly chosen initial individual. The interval between selections remains constant throughout.

Test for Homogeneity

: A statistical test used to compare the distributions of multiple groups or populations based on one categorical variable. It determines whether the proportions within each group are similar or significantly different.

Test for Independence

: A statistical test used to determine if there is a relationship between two categorical variables. It assesses whether the occurrence of one variable is independent of the occurrence of another variable.

True Slope

: The true slope refers to the actual value representing the change in the response variable for each unit increase in the explanatory variable in a linear regression model. It is the population parameter that we aim to estimate.

Two-way tables

: Two-way tables are used to organize and display categorical data. They show the relationship between two different variables by listing the categories of each variable and counting the number of observations that fall into each combination.

Type I Error

: Type I error refers to rejecting a true null hypothesis. It occurs when we conclude there is a significant difference or relationship between variables when there actually isn't one.

Type II error

: Type II error occurs when we fail to reject a null hypothesis that is actually false. In other words, it's the mistake of missing a real effect or relationship that we should have detected.

Z-Score

: A z-score is a measure of how many standard deviations an individual data point is away from the mean of a distribution. It helps to determine the relative position of a data point within a dataset.

What Are the Best Quizlet Decks for AP Statistics?

11 min readdecember 22, 2021

Dalia Savy

Dalia Savy

Harrison Burnside

Harrison Burnside

Dalia Savy

Dalia Savy

Harrison Burnside

Harrison Burnside

If there was a holy trinity for AP study sites, Quizlet would most certainly be in it. Its easy to use interface, combined with its multi-purpose functionality, helps students of all different learning styles in endless subject areas. However, it can sometimes be challenging to find the best vocab sets.

Fiveable’s AP Stats teachers & students have compiled the best quizlet study decks for each unit. The AP Stats exam is very concept heavy, so make sure you take the time to learn these terms.

Catch a live review or watch a replay for AP Stats on Fiveable’s AP Stats hub!


Unit 1 Key Terms (15-23%): Exploring One-Variable Data

Unit 1 includes the roots of statistics, and it is very important to get these concepts down and memorized. In this unit, you will distinguish between categorical and quantitative data, describe and compare distributions, and begin to learn about normal distributions.

Best Quizlet Deck: AP Statistics Unit 1 by Rob_Eriksen

Key Terms:

  • – Record which of several groups an individual belongs to

  • – Taking numerical values for future calculations of sorting

  • Describing a (SOCS) MUST include:

    • – uniform/skew/peaks in context (symmetric, skewed left/right, unimodal/bimodal)

    • – mention in context

    • – mean or median in context

    • – range//IQR in context

  • , , , , , – All methods of representing Data. It is good to know how to read information off of each of these.

  • 5 Number Summary – Minimum, Quartile 1, Median, Quartile 3, Maximum

  • The – Tells us that if we all behave normally then about 68% of the values fall within 1 of the mean, about 95% of the values fall within 2 standard deviations of the mean, and about 99.7%—almost all—of the values fall within 3 standard deviations of the mean.

  • – z = x - x̄ / s


Unit 2 Key Terms (5-7%): Exploring Two-Variable Data

Unit 2 is an expansion of Unit 1. It builds on the relationships between two categorical or quantitative variables and how to argue about the between the two. This unit includes a lot of set interpretations of the different components of an LSRL which are very important to remember.

Best Quizlet Deck: AP Statistic Chapter 2 - Two Variable Data by Mrs_Gutosky

Key Terms:

  • Scatterplots – A way to organize data. On the x-axis is the explanatory (independent) variable and on the y-axis is the response (dependent) variable.

  • Form – linear or curved

  • – positive or negative correlation

  • – depends on the correlation coefficient; could be weak, moderate, or strong.

  • LSRL – ŷ=a+bx where x denotes (context) and ŷ denotes predicted (context)

  • Correlation Coefficient (r) – The correlation coefficient shows the degree to which there is a linear correlation between the two variables, that is, how close the points are to forming a line. The closer r is to 1 or -1, the stronger the relationship.

  • Slope (b) – There is a predicted increase/decrease of ______ (slope in unit of y variable) for every 1 (unit of x variable).

  • Y Intercept – The predicted value of (y in context) is _____ when (x value in context) is 0 (units in context).

  • Coefficient of Determination (r^2) – ____% of the variation in (y in context) is due to its linear relationship with (x in context).

  • Residuals – The difference between the actual data and the value predicted by a linear regression model, or y-ŷ. The ideal pattern is random scatter above and below the LSRL (where the residuals=0). A positive residual means the model underestimated the true value while a negative residual means the model overestimated the true value.

  • Extrapolation – The use of a regression model to make predictions outside of the domain of the given data. If you go outside this domain and farther outside the domain you go, the less accurate your predictions will be.


Unit 3 Key Terms (12-15%): Collecting Data

This unit discusses sampling methods and ways of collecting data that can be used to represent a population. It is filled with vocabulary that is essential in future units.

Best Quizlet Deck: AP Statistics Collecting Data by rachel_thiebout

Key Terms:

  • – Deliberately imposes treatment in order to observe the response. Causation could be proven.

  • 🔭 – Observe individuals and measure variables of interest but don’t attempt to influence the responses. These studies look for association between variables  because in a study, no treatment is imposed.

  • 👀 – Occurs when two variables are associated in a way that their effects on the response variable cannot be deciphered from each other individually.

  • – Statistical studies are biased if it is likely to underestimate or overestimate the value you are looking for.

  • Simple Random Sample (SRS) – Chooses a sample of size “n” in a way that every group of n individuals in the population has an equal chance to be selected as the sample.

  • Stratified Random Sample – Selects a sample by choosing an SRS from each stratum and combining the SRSs into one overall sample. Stratifying can reduce variability in the data and give more precise results.

  • Cluster Sample – The population is divided into groups, called clusters, and an SRS of clusters is taken. All individuals in the selected clusters are sampled.

  • Systematic Random Sample – Sample members are selected from a population according to a random starting point and a fixed periodic interval.
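
The four sampling methods above can be sketched in a few lines of Python using the standard `random` module. This is only an illustration: the roster of 100 student IDs, the grade-level strata, the homeroom clusters, and all function names are hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    # Every group of n individuals is equally likely to be chosen
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict of stratum name -> members; take an SRS from EACH stratum
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Take an SRS of whole clusters, then include everyone in the chosen clusters
    chosen = rng.sample(clusters, n_clusters)
    return [person for cluster in chosen for person in cluster]

def systematic_sample(population, k, rng):
    # Random starting point, then every k-th individual
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(2021)            # seeded so the example is repeatable
students = list(range(1, 101))       # hypothetical roster of 100 student IDs
grades = {"9th": students[:25], "10th": students[25:50],
          "11th": students[50:75], "12th": students[75:]}
homerooms = [students[i:i + 10] for i in range(0, 100, 10)]

srs = simple_random_sample(students, 10, rng)
strat = stratified_sample(grades, 3, rng)    # 3 students from each grade
clus = cluster_sample(homerooms, 2, rng)     # 2 whole homerooms
syst = systematic_sample(students, 10, rng)  # every 10th student
```

Notice the difference between the last two group-based methods: stratified sampling takes some individuals from every group, while cluster sampling takes every individual from some groups.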


Unit 4 Key Terms (10-20%): Probability, Random Variables, and Probability Distributions

Every day, we see events happen and wonder how likely they are to happen again. This is where probability comes in: the proportion of times an outcome would occur in a very large number of repetitions. Unit 4 is all about probability and is very calculation heavy.

Best Quizlet Deck: AP Statistics CB Unit 4+5: Probability, Random Variables, Probability Distributions, Sampling Distributions by Zosia_Stawiarska

Key Terms:

  • Independent Events – The outcome of one event doesn't influence the outcome of another event.

  • P(A|B) = P(A)

  • P(A and B) = P(A)*P(B)

  • Disjoint/Mutually Exclusive Events – Cannot occur at the same time and have no outcomes in common.

  • P(A and B) = 0

  • P(A or B) = P(A) + P(B)

  • Conditional Probability – Probability of one event under the condition that another event is known to have occurred.

  • P(A|B) = P(A and B) / P(B) – The probability of A given B = Probability of A and B / Probability of B

  • Discrete Random Variable – Takes a fixed set of possible values with gaps between them (often whole-number counts).

  • Mean (Expected) Value – Σ(xi * pi)

  • Standard Deviation – sqrt(Σ((xi - mean of x)^2 * pi)). You cannot add standard deviations, only variances (and only for independent random variables).

  • Continuous Random Variable – Can take any value in an interval on the number line (can include decimals).

  • Law of Large Numbers – Says that in many repetitions of the same chance process, the simulated probability gets closer to the true probability as more trials are run.

  • Binomial Setting – Arises when we perform a fixed number of independent trials of the same chance process and count the number of times that a particular outcome, called a success, occurs. The probability of success is p, and the probability of failure (q) is 1 minus the probability of success. Must check .

  • Geometric Setting ⭕️ – Arises when we perform independent trials of the same chance process and record the number of trials it takes to get one success. On each trial, the probability p of success must be the same. The number of trials Y that it takes to get one success in a geometric setting is a geometric random variable.
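
Since this unit is so calculation heavy, it can help to see the formulas above written out. The sketch below is one possible way to compute the mean and standard deviation of a discrete random variable, plus binomial and geometric probabilities, in plain Python. The function names and the coin/die examples are made up for illustration, and the binomial and geometric formulas are the standard textbook ones rather than anything specific to the decks.

```python
from math import comb, sqrt

def mean_rv(values, probs):
    # Expected value: sum of (xi * pi)
    return sum(x * p for x, p in zip(values, probs))

def sd_rv(values, probs):
    # Standard deviation: sqrt of sum of ((xi - mean)^2 * pi)
    mu = mean_rv(values, probs)
    return sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))

def binomial_pmf(k, n, p):
    # P(exactly k successes in n independent trials)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    # P(first success happens on trial k)
    return (1 - p) ** (k - 1) * p

# X = number of heads in 2 fair coin flips
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]
mu = mean_rv(values, probs)
sigma = sd_rv(values, probs)

# Binomial: probability of 2 heads in 2 flips
p_two_heads = binomial_pmf(2, 2, 0.5)

# Geometric: probability the first 6 on a fair die comes on the 3rd roll
p_first_six_on_third = geometric_pmf(3, 1 / 6)
```

Here `mu` is 1 head and `p_two_heads` is 0.25, which matches the probability table for X directly.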


Unit 5 Key Terms (7-12%): Sampling Distributions

This unit is an introduction to significance tests, which are covered in later units. It begins by introducing statistics, sampling distributions, the CLT, and population parameters.

Best Quizlet Deck: AP Statistics Chapter 7 - Sampling Distributions by Janet_Delery

Key Terms:

  • Sampling Distribution – A distribution where we take ALL possible samples of a given size, compute a statistic from each one, and put those values together as a data set.

  • A statistic is used to estimate a parameter.

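  • Parameters vs Statistics – Mean (𝝁, x̅); Standard Deviation (σ, s); Proportion (p, p̂).

  • Large Counts Condition – The numbers of successes and failures are each at least 10.

  • Central Limit Theorem – States that if n (the sample size) is ≥30, the sampling distribution of the sample mean is approximately normal. The larger n is, the closer to normal the sampling distribution is.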
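
A quick simulation can make the idea of a sampling distribution more concrete. The sketch below is an approximation, not the "ALL possible samples" definition: it draws 2,000 random samples of size 40 from a made-up, right-skewed population and looks at how the sample means behave. The population, the seed, and the names are all hypothetical choices for illustration.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(7)  # seeded so the example is repeatable

# Hypothetical right-skewed population (exponential, mean about 10)
population = [rng.expovariate(1 / 10) for _ in range(20_000)]

def sample_means(population, n, reps, rng):
    # Take `reps` random samples of size n and record each sample mean
    return [fmean(rng.sample(population, n)) for _ in range(reps)]

means_n40 = sample_means(population, 40, 2_000, rng)

pop_mean = fmean(population)
pop_sd = pstdev(population)

center = fmean(means_n40)             # should land close to the population mean
spread = pstdev(means_n40)            # should land close to sigma / sqrt(n)
predicted_spread = pop_sd / 40 ** 0.5
```

Even though the population is strongly skewed, a histogram of `means_n40` would look roughly bell-shaped, which is the Central Limit Theorem at work.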


Unit 6 Key Terms (12-15%): Inference for Categorical Data: Proportions

This unit is the beginning of significance tests, in which you are expected to check conditions, construct and interpret confidence intervals, and calculate a p-value. This unit consists of estimating population parameters involving categorical data.

Best Quizlet Deck: Unit 6: Inferences for Categorical Data: Proportions by Cfritz15

Key Terms:

  • Confidence Interval – An interval of numbers based on our sample proportion that gives us a range where we can expect to find the true population proportion.

  • In repeated sampling, I am __% confident that the true population proportion (context) falls within this interval.

  • Significance Test – Estimating the probability of obtaining our collected sample from the sampling distribution of our size when we assume that the given population proportion is correct.

  • Large Counts Condition – The number of successes and failures is at least 10 (np≥10 and n(1-p)≥10). This condition lets us treat the sampling distribution as approximately normal.

  • Random Condition – Reduces any bias that may be caused by taking a bad sample. When answering inference questions, always note that the sample was random. Without a random sample, our findings cannot be generalized to the population, which limits our scope of inference.

  • 10% Condition – Check that the population in question is at least 10 times as large as our sample so that we can treat the observations as independent.

  • Margin of Error – The "buffer zone" of our confidence interval. The interval is point estimate ± (z*)(std. dev.), and the margin of error is the (z*)(std. dev.) part.

  • Null Hypothesis – The hypothesis based on the claim that was given in the problem (p=___).

  • Alternative Hypothesis – The hypothesis that the claim in our null is not true (p<___, p>___ or p≠___).

  • Type I (𝞪) Error – When we reject our Ho when, in fact, it is true and we should have failed to reject it.

  • Type II (𝞫) Error – When we fail to reject our Ho, but we actually should have rejected our Ho.

  • Power – How strong our test is, because Power = 1-P(Type II Error).
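
To tie the conditions, the confidence interval, and the p-value together, here is a minimal Python sketch of a one-proportion z-interval and a two-sided one-proportion z-test. It uses `NormalDist` from the standard `statistics` module for z* and the p-value. The survey numbers (60 of 100 students) and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    # point estimate +- (z*)(standard error of p-hat)
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

def one_prop_z_test(successes, n, p0):
    # Two-sided test of Ho: p = p0 vs Ha: p != p0
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 60 of 100 randomly sampled students prefer flashcards
successes, n, p0 = 60, 100, 0.5

# Large Counts Condition for the test, using the null value p0
large_counts_ok = n * p0 >= 10 and n * (1 - p0) >= 10

low, high, margin = one_prop_z_interval(successes, n)
z, p_value = one_prop_z_test(successes, n, p0)
```

With these made-up numbers, z = 2 and the p-value is just under 0.05, so at 𝞪 = 0.05 we would reject Ho. The code only does the arithmetic; the Random and 10% Conditions still have to be checked from how the data were collected.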


Unit 7 Key Terms (10-18%): Inference for Quantitative Data: Means

This unit is very similar to Unit 6, but instead of dealing with proportions (p), we are dealing with means (μ). Many of the concepts overlap, but it is important to note that we cannot write proportions anywhere we are running an inference test for quantitative data.

Best Quizlet Deck: AP Statistics - Inference for Quantitative Data: Means by AlyssaMorales107

Key Terms:

  • Confidence Interval – An interval of numbers based on our sample mean that gives us a range where we can expect to find the true population mean.

  • In repeated sampling, I am __% confident that the true population mean (context) falls within this interval.

  • In repeated sampling, I am ___% confident that the true difference in population means (context) falls within this interval.

  • Central Limit Theorem – States that if n (the sample size) is ≥30, the sampling distribution of the sample mean is approximately normal. The larger n is, the closer to normal the sampling distribution is.

  • Random Condition – Reduces any bias that may be caused by taking a bad sample.

  • 10% Condition – Check that the population in question is at least 10 times as large as our sample so that we can treat the observations as independent.

  • Null Hypothesis – The hypothesis based on the claim that was given in the problem (μ=___).

  • Alternative Hypothesis – The hypothesis that the claim in our null is not true (μ<___, μ>___ or μ≠___).

  • p<𝞪 – Since p<𝞪, we reject our Ho. We have convincing evidence at the 𝞪 level that (Ha in context).

  • p>𝞪 – Since p>𝞪, we fail to reject our Ho. We do not have convincing evidence at the 𝞪 level that (Ha in context).

  • Degrees of Freedom – n-1
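
The same steps for means can be sketched as a one-sample t procedure. Python's standard library has no t distribution, so this sketch assumes you look up t* in a table (2.262 is the 95% critical value for 9 degrees of freedom) and pass it in. The quiz scores and the function name are made up.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0, t_star):
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)               # sample standard deviation (divides by n-1)
    se = s / sqrt(n)              # standard error of the sample mean
    t = (x_bar - mu0) / se        # test statistic for Ho: mu = mu0
    df = n - 1                    # degrees of freedom
    interval = (x_bar - t_star * se, x_bar + t_star * se)
    return t, df, interval

# Hypothetical random sample of 10 quiz scores
scores = [72, 75, 78, 80, 81, 83, 85, 88, 90, 98]

# t_star = 2.262 comes from a t table (95% confidence, df = 9)
t, df, interval = one_sample_t(scores, mu0=80, t_star=2.262)
```

For this sample the interval runs from about 77.5 to 88.5, so 80 is a plausible value for the true mean, which agrees with the test statistic of about 1.24 being smaller than t*.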


Unit 8 Key Terms (2-5%): Inference for Categorical Data: Chi-Squared

Chi-Squared significance tests operate differently than proportion and mean significance tests. All χ² distributions are skewed right. The degree of skewness depends on the degrees of freedom.

Best Quizlet Deck: AP Statistics Chapter 11: Inference for Distributions of Categorical Data

Key Terms:

  • Goodness of Fit Test – Must have random sampling. All expected counts must be at least 5.

    • Degrees of Freedom – number of categories - 1

    • Null Hypothesis – The claimed distribution of ___ is correct.

    • Alternative Hypothesis – The claimed distribution of ___ is not correct.

  • Test for Independence – Checks the association between two variables in a single population.

    • Expected Counts – (row total)(column total) / (table total)

    • Null Hypothesis – There is no association between ___ and ___.

    • Alternative Hypothesis – There is an association between ___ and ___.

  • Test for Homogeneity – Checks the distribution of a single variable in several populations to see if these populations are similar with respect to the variable.

    • Null Hypothesis – There is no difference between ___ and ___.

    • Alternative Hypothesis – There is a difference between ___ and ___.
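
The expected-count formula above is easy to check by hand on a small table. The sketch below computes expected counts and the χ² statistic for a made-up two-way table. The function names are hypothetical, the statistic is the standard sum of (observed - expected)^2 / expected, and the degrees of freedom for a two-way table use the standard (rows - 1)(columns - 1) rule, which is not listed in the terms above. Finding the p-value is left to a table or calculator.

```python
def expected_counts(table):
    # (row total)(column total) / (table total) for every cell
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed, expected):
    # Sum of (observed - expected)^2 / expected over all cells
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical two-way table:
# rows = grade level (underclass, upperclass), columns = uses flashcards (yes, no)
observed = [[30, 20],
            [20, 30]]

expected = expected_counts(observed)
chi_sq = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Condition check: every expected count should be at least 5
all_expected_at_least_5 = all(e >= 5 for row in expected for e in row)
```

Every expected count here is 25, so the condition is met, and the statistic works out to 4 with 1 degree of freedom.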


Unit 9 Key Terms (2-5%): Inference for Quantitative Data: Slopes

There is variability in slopes as well. In this unit, you will learn how to perform significance tests and construct confidence intervals about the slope of a regression line.

Best Quizlet Deck: AP Stats U9: Linear Regression Inference Vocab + Variables by harrisonb_4

Key Terms:

  • LSRL – ŷ=a+bx where x denotes (context) and ŷ denotes predicted (context)

  • Degrees of Freedom – n-2

  • Confidence Interval – b ± t*(SEb)

  • In repeated sampling, I am __% confident that the true slope falls within this interval.

  • True Slope – 𝞫

  • Null Hypothesis – 𝞫 = 0; changing x does nothing to y
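
On the exam, b and SEb are usually read from computer output, but it can help to see where they come from. The sketch below computes the sample slope, its standard error, the t statistic for Ho: 𝞫 = 0, and the interval b ± t*(SEb). The data and the function name are made up, the SEb formula is the standard one (s divided by the square root of the sum of squared x deviations), and t* = 3.182 is the table value for 95% confidence with 3 degrees of freedom.

```python
from math import sqrt

def slope_inference(xs, ys, t_star):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # s = standard deviation of the residuals, using df = n - 2
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    s = sqrt(sse / df)
    se_b = s / sqrt(sxx)          # standard error of the slope
    t = b / se_b                  # test statistic for Ho: true slope = 0
    interval = (b - t_star * se_b, b + t_star * se_b)
    return b, se_b, t, df, interval

# Hypothetical data: hours studied (x) and quiz score (y)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

# t_star = 3.182 comes from a t table (95% confidence, df = 3)
b, se_b, t, df, interval = slope_inference(hours, scores, t_star=3.182)
```

With only five made-up points, the interval runs from about -0.3 to 1.5. Because it contains 0, this tiny data set does not give convincing evidence of a linear relationship.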


Closing Thoughts

Hopefully, these decks can help you study for your tests and, ultimately, the AP exam. The best feature of Quizlet is the option to play games and use the flashcards wherever you are. When you are studying, you can always duplicate a deck and customize it to your own needs.

As long as you review these flashcards at least once a day for a few days before your test, you should be good to go. Make sure to take advantage of starring flashcards you struggle with! Before a test, it's great to quickly look over the starred ones and then feel more confident about them.

You got this! Good luck studying.🍀


Key Terms to Review (57)

10% Condition

: The 10% condition states that for sampling without replacement to be considered valid, the sample size must be less than 10% of the population size.

Alternative Hypothesis

: The alternative hypothesis is a statement that contradicts or negates the null hypothesis. It suggests that there is a significant relationship or difference between variables.

Bar Graphs

: Bar graphs are a type of graphical representation that use rectangular bars to represent different categories or groups. They are commonly used to compare categorical data by showing the frequency or proportion of each category.

Bias

: Bias refers to a systematic deviation from the true value or an unfair influence that affects the results of statistical analysis. It can occur during data collection, sampling, or analysis and leads to inaccurate or misleading conclusions.

Binomial Setting

: In statistics, a binomial setting refers to an experiment with two possible outcomes (success or failure) for each trial, fixed number of trials, independence between trials, and constant probability of success for each trial.

Box Plots

: Box plots, also known as box-and-whisker plots, are graphical representations of a set of data that display the distribution and key statistical measures such as the median, quartiles, and outliers.

Categorical Variable

: A categorical variable is one that represents characteristics or qualities rather than numerical values. It consists of categories or groups into which data can be classified.

Center

: The center refers to the middle or average value of a data set. It represents the typical or central value around which the data tends to cluster.

Central Limit Theorem

: The Central Limit Theorem states that as the sample size increases, the sampling distribution of the mean approaches a normal distribution regardless of the shape of the population distribution.

Cluster Sample

: A cluster sample is a sampling method where the population is divided into groups or clusters, and a random selection of clusters is chosen to be included in the sample. All individuals within the selected clusters are then included in the sample.

Coefficient of Determination (r^2)

: The coefficient of determination, denoted as r^2, measures the proportion of the variation in the dependent variable that can be explained by the independent variable(s). It ranges from 0 to 1, where 0 indicates no relationship and 1 indicates a perfect fit.

Conditional Probability

: Conditional probability is the likelihood of an event occurring given that another event has already occurred. It measures the probability of one event happening, taking into account the occurrence of a different event.

Confidence Interval

: A confidence interval is a range of values that is likely to contain the true value of a population parameter. It provides an estimate along with a level of confidence about how accurate the estimate is.

Confounding

: Confounding occurs when there is an additional variable that affects both the independent variable and dependent variable, making it difficult to determine their individual effects accurately.

Continuous Random Variable

: A continuous random variable is a variable that can take on any value within a certain range, usually represented by an interval. It can have an infinite number of possible values.

Correlation Coefficient (r)

: The correlation coefficient, denoted as "r", measures the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive linear relationship.

Degrees of Freedom

: Degrees of freedom refers to the number of values in a calculation that are free to vary. In statistics, it represents the number of independent pieces of information available for estimating a parameter.

Direction

: Direction refers to the positive or negative trend of a relationship between two variables. It indicates whether an increase in one variable is associated with an increase or decrease in the other variable.

Discrete Random Variable

: A discrete random variable is a variable that can only take on specific, isolated values. It is characterized by a countable set of possible outcomes.

Disjoint/Mutually Exclusive Events

: Disjoint or mutually exclusive events are events that cannot occur at the same time. If one event happens, the other event cannot happen simultaneously.

Distribution

: In statistics, distribution refers to the way data is spread out or organized. It describes how frequently each value occurs and provides insights into patterns and characteristics of the dataset.

Dotplots

: Dotplots are simple graphical displays that use dots to represent individual data points. They provide a visual representation of the distribution and allow us to see patterns, gaps, clusters, or outliers in the data.

Empirical Rule

: The empirical rule, also known as the 68-95-99.7 rule, is a statistical guideline that states that for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, about 95% falls within two standard deviations, and roughly 99.7% falls within three standard deviations.

Expected Counts

: Expected counts are calculated values that represent what we would expect to observe in each category if there was no association between variables. They are based on assumptions and can be compared to observed counts to assess whether there is a significant relationship.

Experiment

: An experiment is a scientific study in which researchers manipulate one or more variables to observe the effect on another variable, while controlling for other factors.

Geometric Setting

: In statistics, geometric setting refers to an experiment with repeated independent trials until the first success occurs, where each trial has two possible outcomes (success or failure) and a constant probability of success.

Goodness of Fit Test

: A Goodness of Fit Test is a statistical test used to determine how well an observed sample data fits an expected theoretical distribution. It assesses whether any differences between observed and expected frequencies are statistically significant or simply due to random chance.

Histograms

: Histograms are graphical representations of data that use bars to show how many times each value or range of values occurs within a dataset.

Independent Events

: Independent events are events that have no influence on each other. The outcome of one event does not affect the outcome of another event.

Large Counts Condition

: The large counts condition, also known as the "success-failure" condition, is used when applying certain statistical methods to categorical data. It states that for these methods to be valid, both the number of successes and failures must be at least 10.

Law of Large Numbers

: A principle in probability theory stating that as more observations are collected, the sample mean will converge to the population mean.

LSRL (Linear Regression)

: LSRL stands for Least Squares Regression Line. It is the line that best represents the linear relationship between two variables by minimizing the sum of squared differences between observed data points and predicted values on the line.

Margin of Error

: The margin of error is a measure of the uncertainty or variability in survey results. It represents the range within which the true population parameter is likely to fall.

Mean (Expected) Value

: The mean value, also known as the expected value, is the average outcome of a random variable. It represents the long-term average or center of the distribution.

Null Hypothesis

: The null hypothesis is a statement of no effect or relationship between variables in a statistical analysis. It assumes that any observed differences or associations are due to random chance.

Observational Study

: An observational study is a research method where researchers observe subjects without intervening or manipulating any variables. They collect data by watching subjects' behaviors or characteristics.

Outliers

: Outliers are extreme values that significantly differ from other values in a dataset. They can greatly affect statistical analyses and should be carefully examined.

Power

: Power refers to the probability of correctly rejecting a false null hypothesis. It measures our ability to detect an actual effect or relationship.

Quantitative Variable

: A quantitative variable is a type of variable that represents numerical data and can be measured or counted. It provides information about quantities and amounts rather than qualities.

Random Condition

: The random condition refers to the requirement that subjects or samples in a study must be randomly selected in order to ensure unbiased results.

Sampling Distribution

: A sampling distribution refers to the distribution of a statistic (such as mean, proportion, or difference) calculated from multiple random samples taken from the same population. It provides information about how sample statistics vary from sample to sample.

Shape

: In statistics, shape refers to the overall appearance or form of a distribution. It describes how the data is distributed and can be characterized by its symmetry, skewness, or modality.

Significance Test

: A significance test is a statistical method used to determine whether an observed result is statistically significant or simply due to chance. It involves comparing sample data with what would be expected under the null hypothesis.

Simple random sample (SRS)

: A simple random sample (SRS) is a subset of individuals selected from a larger population in such a way that every individual has an equal chance of being chosen and all possible samples have the same probability of being selected.

Spread

: Spread refers to how much variability or dispersion exists within a data set. It measures how far apart the values are from each other.

Standard Deviation

: The standard deviation measures the average amount of variation or dispersion in a set of data. It tells us how spread out the values are from the mean.

Stemplots

: Stemplots (also known as stem-and-leaf plots) are graphical displays that organize numerical data by separating each value into two parts - a stem and one or more leaves. They allow us to see both individual values and patterns within the data.

Stratified Random Sample

: A stratified random sample involves dividing a population into subgroups (strata) based on certain characteristics, and then selecting individuals from each subgroup using a random sampling method. This ensures representation from each subgroup in proportion to their size within the overall population.

Strength

: Strength refers to the degree of association or relationship between two variables. It measures how closely the data points in a scatterplot cluster around a straight line.

Systematic Random Sample

: A systematic random sample is a sampling method where every nth individual is selected from a list or population after starting at a randomly chosen initial individual. The interval between selections remains constant throughout.

Test for Homogeneity

: A statistical test used to compare the distributions of multiple groups or populations based on one categorical variable. It determines whether the proportions within each group are similar or significantly different.

Test for Independence

: A statistical test used to determine if there is a relationship between two categorical variables. It assesses whether the occurrence of one variable is independent of the occurrence of another variable.

True Slope

: The true slope refers to the actual value representing the change in the response variable for each unit increase in the explanatory variable in a linear regression model. It is the population parameter that we aim to estimate.

Two-way tables

: Two-way tables are used to organize and display categorical data. They show the relationship between two different variables by listing the categories of each variable and counting the number of observations that fall into each combination.

Type I Error

: Type I error refers to rejecting a true null hypothesis. It occurs when we conclude there is a significant difference or relationship between variables when there actually isn't one.

Type II error

: Type II error occurs when we fail to reject a null hypothesis that is actually false. In other words, it's the mistake of keeping the null hypothesis when we should have rejected it.

Z-Score

: A z-score is a measure of how many standard deviations an individual data point is away from the mean of a distribution. It helps to determine the relative position of a data point within a dataset.


© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

