Fiveable

📉 Statistical Methods for Data Science Unit 11 Review

11.2 Factor Analysis

Written by the Fiveable Content Team • Last updated September 2025

Factor analysis is a statistical method for uncovering hidden structure in data. It identifies a small number of underlying factors that explain the correlations among observed variables, which reduces complexity and highlights the main patterns in the data.

Factor analysis is widely used for dimensionality reduction: by grouping correlated variables under a few shared factors, it simplifies large datasets and makes complex data structures easier to understand and analyze across many fields.

Latent Factors

Unobserved Variables

  • Latent variables represent unobserved constructs or underlying dimensions that influence observed variables
  • Cannot be directly measured but are inferred from the relationships among observed variables
  • Assumed to be the underlying causes of the correlations among observed variables
  • Explain the common variance shared by a set of observed variables (examples of such constructs: intelligence, personality traits)
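The notes do not write the model out, so as a sketch, here is the standard orthogonal common factor model that these bullets describe (the notation is an assumption, chosen to match the loadings λ and uniquenesses used later). With p observed variables x, m latent factors f, a p × m loading matrix Λ, and unique factors ε:

```latex
\mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\Lambda}\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\operatorname{Cov}(\mathbf{f}) = \mathbf{I}, \quad
\operatorname{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Psi} = \operatorname{diag}(\psi_1, \dots, \psi_p), \quad
\operatorname{Cov}(\mathbf{f}, \boldsymbol{\varepsilon}) = \mathbf{0}
\\[4pt]
\boldsymbol{\Sigma} = \operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi},
\qquad
\sigma_{ij} = \sum_{k=1}^{m} \lambda_{ik}\lambda_{jk} \quad (i \neq j)
```

Because Ψ is diagonal, every covariance between two different observed variables comes only from the common factors, which is the precise sense in which latent variables "cause" the observed correlations.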

Types of Factors

  • Common factors are latent variables that contribute to the variance of multiple observed variables
    • Represent the shared variance among the observed variables
    • Assumed to be the underlying dimensions that influence the observed variables (general intelligence factor)
  • Unique factors are latent variables that contribute to the variance of a single observed variable
    • Represent the variance specific to each observed variable
    • Include measurement error and variable-specific factors (test-taking anxiety)
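A small simulation can make the split between common and unique factors concrete. This sketch assumes NumPy and a hypothetical one-factor model with three standardized variables; the loadings are made up for illustration. Since the unique factors are independent of one another, the off-diagonal sample correlations should land close to the products λᵢλⱼ.

```python
import numpy as np

# Hypothetical one-factor model: x_i = lambda_i * f + e_i for three variables.
rng = np.random.default_rng(0)
n = 10_000
loadings = np.array([0.8, 0.7, 0.6])    # hypothetical loadings lambda_i
unique_var = 1.0 - loadings**2          # psi_i, chosen so each x_i has variance 1

f = rng.standard_normal(n)              # common factor: shared by all three variables
e = rng.standard_normal((n, 3)) * np.sqrt(unique_var)  # unique factors: one per variable
X = f[:, None] * loadings + e

# Model-implied correlations: lambda_i * lambda_j off the diagonal, 1 on it.
implied = np.outer(loadings, loadings)
np.fill_diagonal(implied, 1.0)

sample = np.corrcoef(X, rowvar=False)
print(np.round(sample, 2))
print(np.round(implied, 2))
```

The common factor produces all of the correlation between variables; the unique factors only add variance to their own variable.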

Factor Analysis Methods

Exploratory and Confirmatory Approaches

  • Exploratory factor analysis (EFA) is used to explore the underlying factor structure of a set of observed variables
    • Does not impose a predetermined factor structure
    • Aims to identify the number and nature of the latent factors that best explain the correlations among the observed variables
  • Confirmatory factor analysis (CFA) is used to test a hypothesized factor structure based on prior knowledge or theory
    • Specifies the number of factors and the pattern of factor loadings in advance
    • Assesses the fit of the hypothesized model to the observed data
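A common first step in EFA is deciding how many factors to retain. One rough heuristic is Kaiser's rule: keep as many factors as there are eigenvalues of the correlation matrix greater than 1. The sketch below assumes NumPy and simulates data from a hypothetical two-factor model; in practice, scree plots or parallel analysis are often used alongside or instead of this rule. CFA, by contrast, is usually fit with a dedicated structural equation modeling package and is not sketched here.

```python
import numpy as np

# Simulate 6 variables driven by 2 uncorrelated factors (hypothetical loadings).
rng = np.random.default_rng(1)
n = 5_000
L = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
              [0.0, 0.7], [0.0, 0.7], [0.0, 0.7]])
psi = 1.0 - np.sum(L**2, axis=1)        # unique variances, so each variable has variance 1
F = rng.standard_normal((n, 2))
X = F @ L.T + rng.standard_normal((n, 6)) * np.sqrt(psi)

# Eigenvalues of the sample correlation matrix, largest first.
R = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(R)[::-1]

# Kaiser's rule: retain factors whose eigenvalue exceeds 1.
n_factors = int(np.sum(eigvals > 1.0))
print(np.round(eigvals, 2), n_factors)
```

Here two eigenvalues stand well above 1 (one per block of correlated variables) and the rest fall well below it, matching the two simulated factors.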

Rotation Techniques

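  • Rotation methods simplify the interpretation of factor loadings by driving each loading toward either a large magnitude or near zero, so that each variable loads strongly on as few factors as possible (simple structure); rotation changes the loadings but not how well the model reproduces the observed correlations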
  • Orthogonal rotation assumes that the factors are uncorrelated (varimax rotation)
    • Maintains the independence of the factors
    • Suitable when the factors are expected to be distinct and unrelated
  • Oblique rotation allows the factors to be correlated (promax rotation)
    • Reflects the possibility of interrelated factors
    • Appropriate when the factors are expected to have some degree of correlation
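To see what an orthogonal rotation does, here is a sketch of varimax using the widely used SVD-based iteration, assuming NumPy and raw loadings (no Kaiser row normalization). The function name `varimax` and its parameters are illustrative, not a specific library's API, and the loading matrix is hypothetical. Oblique methods such as promax are not shown.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=500, tol=1e-10):
    """Orthogonal varimax rotation of a p x k loading matrix; returns (rotated loadings, R)."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Matrix proportional to the gradient of the varimax criterion w.r.t. R.
        G = L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag(np.sum(Lr**2, axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt                      # nearest orthogonal matrix to G
        d_old, d = d, np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break                       # criterion has stopped improving
    return L @ R, R

# Hypothetical simple-structure loadings, then "hide" them with a 30-degree rotation.
L_true = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                   [0.0, 0.7], [0.0, 0.7], [0.0, 0.7]])
theta = np.deg2rad(30)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_mixed = L_true @ T                    # every variable now loads on both factors

L_rot, R = varimax(L_mixed)
print(np.round(L_mixed, 3))
print(np.round(L_rot, 3))               # one large loading per row, the other near 0
```

The rotated solution may come back with columns swapped or signs flipped, which is equally valid. Because R is orthogonal, each variable's sum of squared loadings (its communality) is the same before and after rotation.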

Factor Interpretation

Factor Loadings and Communality

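  • Factor loadings describe the relationship between each observed variable and each latent factor
    • For standardized variables and orthogonal (uncorrelated) factors, a loading equals the correlation between the variable and the factor; after an oblique rotation, the pattern loadings act as regression-like weights, and the variable-factor correlations are reported separately in the structure matrix
    • Indicate the strength and direction of the relationship between each variable and each factor
    • Larger absolute loadings suggest a stronger influence of the factor on the variable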
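  • Communality (h²) is the proportion of variance in an observed variable that is explained by the common factors
    • For orthogonal factors, calculated for each variable as the sum of its squared loadings across all factors: h_i² = Σ_j λ_ij²
    • Represents the shared variance accounted for by the factors; for a standardized variable, the remainder 1 − h² is its uniqueness (unique variance)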
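The loading and communality definitions can be checked numerically. This sketch assumes NumPy and a hypothetical 4 × 2 loading matrix for standardized variables with orthogonal factors; it computes communalities and uniquenesses directly, then simulates data to confirm that each loading matches the variable-factor correlation.

```python
import numpy as np

# Hypothetical standardized loadings for 4 variables on 2 orthogonal factors.
L = np.array([[0.8, 0.3],
              [0.7, 0.2],
              [0.2, 0.9],
              [0.1, 0.6]])

communality = np.sum(L**2, axis=1)   # h_i^2 = sum over factors of lambda_ij^2
uniqueness = 1.0 - communality        # psi_i for standardized variables
print(np.round(communality, 2))
print(np.round(uniqueness, 2))

# Simulation check: with orthogonal factors, corr(x_i, f_j) should be close to lambda_ij.
rng = np.random.default_rng(2)
n = 20_000
F = rng.standard_normal((n, 2))
X = F @ L.T + rng.standard_normal((n, 4)) * np.sqrt(uniqueness)
corr_xf = np.corrcoef(np.hstack([X, F]), rowvar=False)[:4, 4:]   # 4 x 2 cross-correlations
print(np.round(corr_xf, 2))
```

For example, the first variable has h² = 0.8² + 0.3² = 0.73, so the two factors explain 73% of its variance and 27% is unique.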

Factor Scores

  • Factor scores are estimates of an individual's standing on the latent factors based on their observed variable scores
    • Computed as weighted combinations of the standardized observed variable scores (for example, with the regression method or Bartlett's method)
    • Used to assess an individual's relative position on the underlying dimensions
    • Can be used in subsequent analyses as variables representing the latent constructs (using factor scores as predictors in regression)
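As one illustration, the regression method weights the standardized variables by W = R⁻¹Λ, where R is the correlation matrix of the observed variables. This sketch assumes NumPy, simulated data, and loadings that are treated as known; in a real analysis Λ would come from the fitted (and possibly rotated) model. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
# Hypothetical loadings, treated as known here for illustration.
L = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
              [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
psi = 1.0 - np.sum(L**2, axis=1)
F_true = rng.standard_normal((n, 2))
X = F_true @ L.T + rng.standard_normal((n, 6)) * np.sqrt(psi)

# Standardize the observed variables and compute their correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)

# Regression method: weights W = R^{-1} Lambda, scores = Z W (one column per factor).
W = np.linalg.solve(R, L)
scores = Z @ W

# Each estimated score tracks its true factor closely, but not perfectly.
corrs = [np.corrcoef(scores[:, j], F_true[:, j])[0, 1] for j in range(2)]
print(np.round(corrs, 3))
```

The scores correlate strongly but imperfectly with the true factors, a reminder that factor scores are estimates; the `scores` array could then be used as predictors in a subsequent regression.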