📊 Business Intelligence Unit 8 Review

8.2 Supervised and Unsupervised Learning Algorithms

Written by the Fiveable Content Team • Last updated September 2025
Machine learning algorithms fall into two main categories: supervised and unsupervised. Supervised learning uses labeled data to predict outcomes, while unsupervised learning uncovers hidden patterns in unlabeled data. These approaches power various applications, from predicting house prices to grouping customers.

Supervised algorithms like linear regression and decision trees tackle specific prediction tasks. Unsupervised techniques such as clustering and dimensionality reduction reveal underlying data structures. Choosing the right algorithm depends on the problem type, data characteristics, and desired outcomes.

Supervised vs unsupervised learning

  • Supervised learning trains models using labeled data with known target variables (house prices) to predict outcomes for new, unseen data
    • Requires input features (square footage, number of bedrooms) and corresponding target variables to learn patterns
    • Enables prediction or classification tasks (predicting house prices, classifying email as spam or not spam)
  • Unsupervised learning discovers hidden patterns or structures in unlabeled data without predefined target variables
    • Identifies inherent groupings (customer segments) or reduces data dimensionality for visualization or feature extraction
    • Includes clustering algorithms (K-means) and dimensionality reduction techniques (Principal Component Analysis)
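
To make the distinction concrete, here is a minimal sketch using scikit-learn on synthetic data (the library choice and the made-up features are illustrative assumptions, not from the original guide): the supervised model is fit on features and labels together, while the unsupervised one sees features only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Supervised: input features AND a labeled target (house prices)
sqft = rng.uniform(500, 3500, size=(100, 1))
prices = 50_000 + 120 * sqft[:, 0] + rng.normal(0, 10_000, size=100)
model = LinearRegression().fit(sqft, prices)      # learns from labels
print(model.predict([[2_000]]))                   # predict an unseen house

# Unsupervised: features only, no target; structure is discovered
young = rng.normal([25, 100], 5, size=(50, 2))    # (age, monthly spend)
older = rng.normal([60, 900], 5, size=(50, 2))
customers = np.vstack([young, older])
segments = KMeans(n_clusters=2, n_init=10).fit_predict(customers)
print(np.bincount(segments))                      # sizes of the two segments
```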

Applications of supervised algorithms

  • Linear regression predicts continuous target variables, assuming a linear relationship between input features and the target
    • Estimates coefficients to minimize the sum of squared errors between predicted and actual values
    • Equation: $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_nx_n$
  • Logistic regression predicts binary target variables (customer churn) by modeling the probability of the target belonging to a particular class
    • Uses the logistic function to map input features to a probability between 0 and 1
    • Equation: $p(y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_nx_n)}}$ (both regression equations are evaluated numerically in the first sketch after this list)
  • Decision trees recursively split data based on input features to create a tree-like model for prediction or classification (see the second sketch after this list)
    • Internal nodes represent decisions based on features (age > 30), while leaf nodes hold class labels or continuous values
    • Prone to overfitting, which can be mitigated by pruning or capping the maximum depth
  • Random forests combine multiple decision trees to improve performance and reduce overfitting
    • Builds each tree on a bootstrapped sample of the training data, considering a random subset of input features at each split
    • Final prediction is the majority vote (classification) or average (regression) of individual tree predictions
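
To ground the two regression equations above, a minimal sketch that evaluates them directly in NumPy (every coefficient value here is hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical coefficients (beta_0 is the intercept)
beta = np.array([50_000, 120, 15_000])        # intercept, sqft, bedrooms
x = np.array([2_000, 3])                      # one house: 2000 sqft, 3 bedrooms

# Linear regression: y = beta_0 + beta_1*x_1 + ... + beta_n*x_n
y_hat = beta[0] + beta[1:] @ x
print(f"predicted price: {y_hat:,.0f}")       # 335,000

# Logistic regression: p(y=1) = 1 / (1 + e^-(beta_0 + beta_1*x_1 + ...))
beta_churn = np.array([-2.0, 0.8, -1.5])      # hypothetical churn model
x_cust = np.array([3.0, 1.0])                 # e.g., complaints, tenure (scaled)
z = beta_churn[0] + beta_churn[1:] @ x_cust
p_churn = 1.0 / (1.0 + np.exp(-z))
print(f"churn probability: {p_churn:.2f}")
```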
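
And a companion sketch for the tree-based models, again using scikit-learn on synthetic data (both are assumptions for illustration). Note the depth cap on the single tree, matching the overfitting mitigation described above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 12).astype(int)   # non-linear decision rule

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Single tree: depth capped to limit overfitting
tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# Random forest: many trees on bootstrapped samples, majority vote
forest = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)

print("tree accuracy:  ", tree.score(X_test, y_test))
print("forest accuracy:", forest.score(X_test, y_test))
```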

Techniques for unsupervised learning

  • Clustering groups similar data points based on intrinsic characteristics
    • K-means partitions data into K clusters based on Euclidean distance between data points and cluster centroids (implemented step by step in the first sketch after this list)
      1. Assigns data points to nearest centroid
      2. Updates centroids based on assigned points
      3. Repeats steps 1-2 until convergence
    • Hierarchical clustering creates a tree-like structure (dendrogram) by iteratively merging or splitting clusters based on similarity
  • Dimensionality reduction reduces the number of input features while preserving essential information
    • Enables visualization of high-dimensional data, noise removal, and computational efficiency
    • Principal Component Analysis (PCA) identifies the directions of maximum variance (principal components) and projects the data onto a lower-dimensional space (see the second sketch after this list)
    • t-SNE (t-Distributed Stochastic Neighbor Embedding) preserves local structure of high-dimensional data in a lower-dimensional representation
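
The three K-means steps above translate almost line for line into NumPy. A minimal from-scratch sketch (random centroid initialization, a fixed iteration cap, and the assumption that no cluster empties out are all simplifications; a library implementation such as `sklearn.cluster.KMeans` handles these cases more carefully):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 1: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: update each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Repeat until convergence: stop once centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Three well-separated blobs in 2-D
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(center, 0.5, size=(50, 2))
               for center in ([0, 0], [5, 5], [0, 5])])
labels, centroids = kmeans(X, k=3)
print(centroids.round(2))
```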
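
And a short PCA sketch with scikit-learn (the 10-dimensional synthetic data, generated from two latent factors, is an assumption for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# 200 samples in 10 dimensions, but most variance lies in 2 latent factors
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)    # project onto directions of maximum variance

print(X_2d.shape)                              # (200, 2), ready to plot
print(pca.explained_variance_ratio_.round(3))  # variance captured per component
```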

Algorithm selection criteria

  • Problem type dictates the choice of algorithm
    • Regression for predicting continuous target variables
      • Linear regression for linear relationships (housing prices based on square footage)
      • Decision trees or random forests for non-linear relationships (stock prices based on economic indicators)
    • Classification for predicting categorical target variables
      • Logistic regression for binary classification (customer churn)
      • Decision trees or random forests for multi-class classification (image recognition)
  • Data characteristics influence algorithm selection
    • Linear regression suits linear relationships; decision trees or random forests capture non-linear relationships (compared directly in the sketch at the end of this section)
    • Dimensionality reduction (PCA) for high-dimensional data (gene expression data)
    • Decision trees and random forests are less sensitive to outliers than linear and logistic regression
    • Decision trees yield interpretable if-then rules, while linear and logistic regression coefficients indicate the direction and strength of each feature's effect
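
To see these criteria in action, a final sketch fits a linear model and a random forest to the same deliberately non-linear relationship (synthetic data, an illustrative assumption). The forest should score far higher here, while on a truly linear relationship the simpler model would be competitive and easier to interpret:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(0, 0.1, size=400)  # non-linear target

for name, model in [("linear regression", LinearRegression()),
                    ("random forest", RandomForestRegressor(n_estimators=100))]:
    r2 = cross_val_score(model, X, y, cv=5).mean()  # R^2, higher is better
    print(f"{name}: mean R^2 = {r2:.2f}")
```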