Fiveable

📊 Probability and Statistics Unit 9 Review


9.5 t-tests and z-tests


Written by the Fiveable Content Team • Last updated September 2025

Hypothesis testing is a crucial statistical method for making decisions about populations based on sample data. It involves formulating null and alternative hypotheses, collecting data, and using statistical tests to assess the significance of research findings.

T-tests and z-tests are common tools for comparing means and proportions. These tests help researchers determine if there are significant differences between samples or if sample statistics differ from hypothesized population parameters. Understanding their assumptions and applications is essential for accurate data analysis.

Fundamentals of hypothesis testing

  • Hypothesis testing is a statistical method used to make decisions or draw conclusions about a population based on sample data
  • Involves formulating a null hypothesis ($H_0$) and an alternative hypothesis ($H_a$), collecting data, and using statistical tests to determine whether to reject or fail to reject the null hypothesis
  • Hypothesis tests are widely used in various fields, including psychology, biology, economics, and social sciences, to assess the significance of research findings and make data-driven decisions

Null and alternative hypotheses

  • The null hypothesis ($H_0$) represents the default or status quo position, typically stating that there is no significant difference or relationship between variables
  • The alternative hypothesis ($H_a$) is the claim that the researcher wants to support, usually indicating the presence of a significant difference or relationship
  • The choice of the null and alternative hypotheses depends on the research question and the direction of the expected effect

One-tailed vs two-tailed tests

  • One-tailed tests are used when the alternative hypothesis specifies a direction (greater than or less than) for the difference or relationship
  • Two-tailed tests are used when the alternative hypothesis does not specify a direction, only that there is a difference or relationship
  • The choice between a one-tailed or two-tailed test affects the critical values and the interpretation of the results
    • One-tailed tests allocate the entire significance level (e.g., $\alpha = 0.05$) to one side of the distribution
    • Two-tailed tests split the significance level equally between both sides of the distribution (e.g., $\alpha/2 = 0.025$ on each side)
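As a sketch, the critical values implied by this allocation can be computed from the standard normal quantile function (assuming scipy is available; $\alpha = 0.05$ as in the text):

```python
from scipy.stats import norm

alpha = 0.05

# One-tailed (upper) test: the entire alpha sits in the right tail
z_one_tailed = norm.ppf(1 - alpha)        # about 1.645

# Two-tailed test: alpha/2 in each tail, so the cutoff moves outward
z_two_tailed = norm.ppf(1 - alpha / 2)    # about 1.960
```

Note that the two-tailed cutoff is larger: with the same $\alpha$, a two-tailed test demands a more extreme statistic before rejecting.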

Test statistics

  • Test statistics are calculated values used to make decisions in hypothesis testing by comparing them to critical values or p-values
  • The choice of the test statistic depends on the type of data, the sample size, and the assumptions of the test
  • Common test statistics include the t-statistic for t-tests and the z-statistic for z-tests

t-statistic for t-tests

  • The t-statistic is used when the population standard deviation is unknown and must be estimated from the sample; the distinction from the z-statistic matters most for small samples ($n < 30$)
  • It follows a t-distribution with $n-1$ degrees of freedom
  • The formula for the t-statistic is: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ where $\bar{x}$ is the sample mean, $\mu$ is the hypothesized population mean, $s$ is the sample standard deviation, and $n$ is the sample size
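A minimal sketch of this formula, checked against scipy's built-in one-sample t-test (the sample values below are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical sample and hypothesized population mean mu
sample = np.array([5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.4, 4.7])
mu = 5.0

n = len(sample)
x_bar = sample.mean()
s = sample.std(ddof=1)                     # sample standard deviation
t_manual = (x_bar - mu) / (s / np.sqrt(n))

# scipy computes the same statistic plus a two-sided p-value
t_scipy, p_value = stats.ttest_1samp(sample, mu)
```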

z-statistic for z-tests

  • The z-statistic is used when the sample size is large ($n \geq 30$) and the population standard deviation is known
  • It follows a standard normal distribution (mean = 0, standard deviation = 1)
  • The formula for the z-statistic is: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ where $\bar{x}$ is the sample mean, $\mu$ is the hypothesized population mean, $\sigma$ is the population standard deviation, and $n$ is the sample size
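The same formula as a sketch, with hypothetical values where $\sigma$ is known:

```python
import math
from scipy.stats import norm

x_bar = 102.3   # sample mean (hypothetical)
mu = 100.0      # hypothesized population mean
sigma = 15.0    # known population standard deviation
n = 100

z = (x_bar - mu) / (sigma / math.sqrt(n))

# Two-sided p-value from the standard normal distribution
p_value = 2 * norm.sf(abs(z))
```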

t-tests for means

  • t-tests are used to compare means and determine if there is a significant difference between them
  • There are three main types of t-tests: one-sample t-test, two-sample t-test, and paired t-test
  • t-tests assume that the data are approximately normally distributed and have equal variances (for two-sample t-tests)

One-sample t-test

  • Used to compare a sample mean to a known or hypothesized population mean
  • Tests whether the sample mean is significantly different from the population mean
  • Example: Testing if the average height of a sample of students is significantly different from the national average height

Two-sample t-test

  • Used to compare the means of two independent samples
  • Tests whether the means of the two samples are significantly different from each other
  • Example: Comparing the average test scores of students in two different teaching methods (traditional vs. online)
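Assuming scipy, the comparison above might look like this (the score data are invented for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical test scores for two independent groups
traditional = np.array([72, 75, 78, 80, 69, 74, 77, 73])
online = np.array([78, 82, 80, 85, 79, 81, 84, 77])

# Student's two-sample t-test (assumes equal variances)
t_stat, p_value = stats.ttest_ind(traditional, online)
```

A negative t-statistic here simply means the first group's mean is below the second's.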

Paired t-test

  • Used to compare the means of two related or dependent samples (e.g., before and after measurements on the same individuals)
  • Tests whether the mean difference between the paired observations is significantly different from zero
  • Example: Comparing the blood pressure of patients before and after taking a new medication
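A sketch of the paired test with hypothetical blood-pressure readings; note that it is equivalent to a one-sample t-test on the per-patient differences:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements on the same patients before/after
before = np.array([140, 138, 150, 145, 142, 148, 151, 139])
after = np.array([135, 136, 144, 141, 140, 142, 146, 137])

# Paired t-test on the dependent samples
t_stat, p_value = stats.ttest_rel(before, after)

# Equivalent: one-sample t-test of the differences against 0
t_diff, p_diff = stats.ttest_1samp(before - after, 0.0)
```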

Assumptions of t-tests

  • The data should be approximately normally distributed
    • For larger sample sizes ($n \geq 30$), the t-test is robust to violations of normality due to the Central Limit Theorem
  • The samples should be independent (except for paired t-tests)
  • The variances of the samples should be equal (for two-sample t-tests)
    • If the variances are unequal, alternative tests like Welch's t-test can be used
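In scipy, Welch's t-test is the `equal_var=False` variant of `ttest_ind`; a sketch with simulated unequal-variance samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated samples with clearly unequal variances (and sizes)
a = rng.normal(10, 1, size=30)
b = rng.normal(10, 5, size=40)

# Welch's t-test: variances are not pooled
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

# Student's t-test for comparison (pools the variances)
t_student, p_student = stats.ttest_ind(a, b, equal_var=True)
```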

z-tests for proportions

  • z-tests are used to compare proportions and determine if there is a significant difference between them
  • There are two main types of z-tests for proportions: one-sample z-test and two-sample z-test
  • z-tests assume that the sample size is large enough ($np \geq 10$ and $n(1-p) \geq 10$) and that the samples are independent

One-sample z-test

  • Used to compare a sample proportion to a known or hypothesized population proportion
  • Tests whether the sample proportion is significantly different from the population proportion
  • Example: Testing if the proportion of defective products in a sample is significantly different from the claimed proportion
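A sketch of this test with hypothetical counts; note that the standard error uses the null proportion $p_0$, not the sample proportion:

```python
import math
from scipy.stats import norm

# Hypothetical: 34 defectives in 400 units; claimed rate p0 = 0.05
x, n = 34, 400
p0 = 0.05
p_hat = x / n

# Standard error under H0 uses p0
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se
p_value = 2 * norm.sf(abs(z))
```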

Two-sample z-test

  • Used to compare the proportions of two independent samples
  • Tests whether the proportions of the two samples are significantly different from each other
  • Example: Comparing the proportion of smokers in two different age groups (18-30 vs. 31-50)
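A sketch with invented smoker counts; under $H_0$ the two proportions are equal, so a pooled estimate goes into the standard error:

```python
import math
from scipy.stats import norm

# Hypothetical smoker counts in two age groups
x1, n1 = 60, 200   # ages 18-30
x2, n2 = 45, 220   # ages 31-50

p1, p2 = x1 / n1, x2 / n2

# Pooled proportion under H0: p1 == p2
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))
```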

Assumptions of z-tests

  • The sample size should be large enough ($np \geq 10$ and $n(1-p) \geq 10$)
  • The samples should be independent
  • The sampling distribution of the sample proportion should be approximately normal, which holds when the sample-size condition above is met

Significance level and p-values

  • The significance level ($\alpha$) is the probability of rejecting the null hypothesis when it is true (Type I error)
    • Common significance levels are 0.05 and 0.01
  • The p-value is the probability of obtaining a test statistic as extreme as or more extreme than the observed value, assuming the null hypothesis is true
  • If the p-value is less than the significance level, we reject the null hypothesis; otherwise, we fail to reject the null hypothesis
  • The p-value provides a measure of the strength of evidence against the null hypothesis
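The decision rule reduces to a single comparison; a sketch with a hypothetical observed z-statistic:

```python
from scipy.stats import norm

alpha = 0.05
z = 2.1  # hypothetical observed test statistic

p_value = 2 * norm.sf(abs(z))   # two-sided p-value

if p_value < alpha:
    decision = "reject H0"
else:
    decision = "fail to reject H0"
```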

Type I and Type II errors

  • Type I error (false positive) occurs when the null hypothesis is rejected when it is true
    • The probability of a Type I error is equal to the significance level ($\alpha$)
  • Type II error (false negative) occurs when the null hypothesis is not rejected when it is false
    • The probability of a Type II error is denoted by $\beta$
  • The power of a test is the probability of correctly rejecting the null hypothesis when it is false (1 - $\beta$)
  • There is a trade-off between Type I and Type II errors; for a fixed sample size, decreasing one type of error increases the other
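The claim that the Type I error rate equals $\alpha$ can be checked by simulation: repeatedly test a true null hypothesis and count false rejections (a sketch, using a one-sample t-test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_sims = 2000

# Simulate t-tests where H0 is TRUE: the fraction of (false)
# rejections should be close to alpha
rejections = 0
for _ in range(n_sims):
    sample = rng.normal(0, 1, size=20)       # true mean really is 0
    _, p = stats.ttest_1samp(sample, 0.0)    # test H0: mu = 0
    if p < alpha:
        rejections += 1

type_i_rate = rejections / n_sims
```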

Confidence intervals

  • Confidence intervals provide a range of plausible values for a population parameter (e.g., mean or proportion) based on sample data
  • The confidence level (e.g., 95%) represents the proportion of intervals that would contain the true population parameter if the sampling process were repeated many times
  • Confidence intervals are constructed using the sample statistic (e.g., sample mean or proportion) and the standard error

Interpreting confidence intervals

  • A 95% confidence interval means that if the sampling process were repeated many times, 95% of the resulting intervals would contain the true population parameter
  • The width of the confidence interval indicates the precision of the estimate; narrower intervals suggest more precise estimates
  • Confidence intervals can be used to assess significance: if a 95% interval does not contain the null hypothesis value, the result is significant at the corresponding 5% significance level
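A sketch of a t-based interval for a mean (hypothetical data), illustrating the duality with the two-sided one-sample t-test:

```python
import numpy as np
from scipy import stats

# Hypothetical sample
sample = np.array([9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3])

n = len(sample)
x_bar = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)   # standard error of the mean

# 95% t-based confidence interval for the population mean
lo, hi = stats.t.interval(0.95, df=n - 1, loc=x_bar, scale=se)

# Any hypothesized mean outside (lo, hi) would be rejected at the
# 5% level by a two-sided one-sample t-test on this sample
```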

Confidence intervals vs hypothesis tests

  • Confidence intervals and hypothesis tests are related but serve different purposes
  • Hypothesis tests provide a decision about the significance of a result based on a pre-specified significance level
  • Confidence intervals provide a range of plausible values for the population parameter and indicate the precision of the estimate
  • Confidence intervals can be used to perform hypothesis tests by checking if the null hypothesis value falls within the interval

Power and sample size

  • Power is the probability of correctly rejecting the null hypothesis when it is false (1 - $\beta$)
  • Higher power means a higher chance of detecting a true effect or difference
  • Sample size is a key factor that influences the power of a test; larger sample sizes generally increase power

Factors affecting power

  • Effect size: Larger effects are easier to detect and require smaller sample sizes to achieve the same power
  • Significance level ($\alpha$): Smaller significance levels (e.g., 0.01) require larger sample sizes to maintain the same power compared to larger significance levels (e.g., 0.05)
  • Variability of the data: Higher variability requires larger sample sizes to achieve the same power
  • Type of test: Some tests have higher power than others for the same sample size and effect size; for example, a one-tailed test has higher power than a two-tailed test when the true effect lies in the hypothesized direction
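The sample-size effect on power can be estimated by simulation: generate data where $H_a$ is true and count how often the test rejects (a sketch with a one-sample t-test and a standardized effect of 0.5):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_sims = 1000

def estimated_power(n, effect):
    # Fraction of simulated tests that reject H0: mu = 0
    # when the true mean really is `effect`
    hits = 0
    for _ in range(n_sims):
        sample = rng.normal(effect, 1.0, size=n)
        if stats.ttest_1samp(sample, 0.0).pvalue < alpha:
            hits += 1
    return hits / n_sims

power_small_n = estimated_power(10, 0.5)
power_large_n = estimated_power(50, 0.5)   # larger n -> higher power
```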

Calculating required sample size

  • The required sample size can be calculated based on the desired power, significance level, effect size, and variability of the data
  • There are formulas and software packages available to determine the required sample size for various types of tests
  • Example: Using G*Power software to calculate the sample size needed for a two-sample t-test with a power of 0.80, significance level of 0.05, and a medium effect size (Cohen's d = 0.5)
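A normal-approximation version of that calculation can be sketched directly from the formula $n = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{d}\right)^2$ per group (an exact t-based calculation, as in G*Power, gives a slightly larger answer):

```python
import math
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    # Normal-approximation sample size per group for a
    # two-sided, two-sample test with standardized effect d
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

n = n_per_group(0.5)   # medium effect (Cohen's d = 0.5)
```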

Limitations and alternatives

  • Hypothesis tests and confidence intervals have limitations and assumptions that should be considered when interpreting results
  • Violations of assumptions (e.g., non-normality, unequal variances) can affect the validity of the results
  • Alternative methods can be used when assumptions are violated or when the data are not suitable for parametric tests

Nonparametric tests

  • Nonparametric tests do not assume a specific distribution of the data and are based on ranks or order statistics
  • Examples of nonparametric tests include the Wilcoxon rank-sum test (for two independent samples), the Wilcoxon signed-rank test (for paired samples), and the Kruskal-Wallis test (for three or more independent samples)
  • Nonparametric tests are less powerful than parametric tests when the assumptions are met, but they are more robust to violations of assumptions
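Both rank-based tests named above are available in scipy; a sketch with simulated skewed data where normality is doubtful:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulated skewed (exponential) data
a = rng.exponential(1.0, size=25)
b = rng.exponential(2.0, size=25)

# Wilcoxon rank-sum / Mann-Whitney U test (two independent samples)
u_stat, p_ind = stats.mannwhitneyu(a, b)

# Wilcoxon signed-rank test (paired samples); `after` is shifted
# upward so the paired test should detect a difference
before = rng.exponential(1.0, size=25)
after = before + rng.normal(0.5, 0.2, size=25)
w_stat, p_paired = stats.wilcoxon(before, after)
```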

Bayesian hypothesis testing

  • Bayesian hypothesis testing is an alternative approach that incorporates prior information and updates the probability of the hypotheses based on the observed data
  • Bayesian methods provide posterior probabilities of the hypotheses, which can be easier to interpret than p-values
  • Bayesian hypothesis testing requires specifying prior distributions for the parameters of interest, which can be subjective and may influence the results
  • Bayesian methods are becoming increasingly popular in various fields, including psychology, economics, and machine learning
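As a minimal illustration of the Bayesian updating idea (not a full Bayesian test), a Beta prior on a proportion updates conjugately with binomial data; the counts below are hypothetical:

```python
from scipy.stats import beta

# Hypothetical coin-bias example with a Beta prior on p
a_prior, b_prior = 1, 1          # uniform Beta(1, 1) prior
heads, tails = 63, 37            # observed data

# Conjugate update: posterior is Beta(a + heads, b + tails)
posterior = beta(a_prior + heads, b_prior + tails)

# Posterior probability that p > 0.5 -- directly interpretable,
# unlike a p-value (cf. a one-sided frequentist test)
prob_p_gt_half = posterior.sf(0.5)
```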