Computer Vision and Image Processing
Unit 6 Review

6.2 Unsupervised learning

Written by the Fiveable Content Team • Last updated September 2025

Unsupervised learning in computer vision analyzes unlabeled data to find patterns and structures without human guidance. It's crucial for tasks like image segmentation, object detection, and visual representation learning, enabling automatic feature extraction from large image datasets.

Key techniques include clustering algorithms, dimensionality reduction, and generative models. These methods group similar data points, reduce data complexity, and create new, realistic images. Unsupervised learning serves as a foundation for more advanced computer vision tasks and continues to evolve with recent advancements.

Fundamentals of unsupervised learning

  • Unsupervised learning algorithms analyze and cluster unlabeled data without human intervention
  • In computer vision, unsupervised learning enables automatic feature extraction and pattern recognition from large image datasets
  • Plays a crucial role in image segmentation, object detection, and visual representation learning

Definition and key concepts

  • Learning patterns and structures in data without labeled examples or explicit supervision
  • Focuses on finding hidden patterns, grouping similar data points, and reducing data dimensionality
  • Key concepts include clustering, dimensionality reduction, and density estimation
  • Relies on statistical properties and intrinsic relationships within the data

Comparison to supervised learning

  • Unsupervised learning works with unlabeled data, while supervised learning requires labeled training examples
  • Supervised learning aims to predict specific outputs, while unsupervised learning discovers inherent structures
  • Unsupervised methods often serve as preprocessing steps for supervised tasks in computer vision
  • Evaluating unsupervised models is more challenging due to the lack of ground truth labels

Applications in computer vision

  • Image segmentation groups pixels into meaningful regions or objects
  • Feature learning extracts relevant visual representations from raw image data
  • Anomaly detection identifies unusual patterns or objects in images
  • Generative modeling creates new, realistic images based on learned distributions

Clustering algorithms

  • Clustering algorithms group similar data points together based on their features or attributes
  • In computer vision, clustering helps segment images, group similar objects, and organize large image datasets
  • Enables unsupervised object recognition and scene understanding in visual data

K-means clustering

  • Partitions data into K predefined clusters based on feature similarity
  • Iterative algorithm minimizes within-cluster variance
  • Steps include:
    1. Initialize K cluster centroids randomly
    2. Assign each data point to the nearest centroid
    3. Recalculate centroids based on assigned points
    4. Repeat steps 2-3 until convergence
  • Widely used for image segmentation and color quantization (see the sketch after this list)
  • Sensitive to initial centroid placement and outliers
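
A minimal sketch of k-means color quantization with scikit-learn, assuming an RGB image already loaded as a NumPy array; the helper name `quantize_colors` and the choice k=8 are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(image: np.ndarray, k: int = 8) -> np.ndarray:
    """Reduce an (H, W, 3) RGB image to k representative colors."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)  # one row per pixel

    # n_init=10 re-runs k-means from several random initializations,
    # which mitigates the sensitivity to initial centroid placement.
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

    # Replace every pixel with the centroid of its assigned cluster.
    quantized = km.cluster_centers_[km.labels_]
    return quantized.reshape(h, w, c).astype(image.dtype)
```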

Hierarchical clustering

  • Builds a tree-like structure of nested clusters (dendrogram)
  • Two main approaches: agglomerative (bottom-up) and divisive (top-down)
  • Agglomerative clustering steps:
    1. Start with each data point as a separate cluster
    2. Merge closest clusters based on a distance metric
    3. Repeat until all points belong to a single cluster
  • Provides multi-scale representation of data structure
  • Used for hierarchical image segmentation and object part decomposition; a minimal example follows
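
A minimal sketch with SciPy, where `features` stands in for any (n_samples, n_features) array of region descriptors:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(0)
features = rng.random((50, 16))       # stand-in for real region descriptors

# Agglomerative (bottom-up) merging under Ward's minimum-variance criterion;
# Z encodes the full dendrogram of nested merges.
Z = linkage(features, method="ward")

# Cut the tree at a chosen granularity, here 4 clusters.
labels = fcluster(Z, t=4, criterion="maxclust")

# dendrogram(Z) plots the full merge tree (requires matplotlib).
```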

DBSCAN algorithm

  • Density-Based Spatial Clustering of Applications with Noise
  • Groups together points with many nearby neighbors, marking points in low-density regions as outliers
  • Key parameters: epsilon (neighborhood radius) and minPoints (minimum points to form a cluster)
  • Advantages include:
    • Discovers clusters of arbitrary shape
    • Robust to noise and outliers
    • Automatically determines the number of clusters
  • Applied in image segmentation and object detection in complex scenes (sketched below)
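
A minimal scikit-learn sketch on synthetic 2-D points (e.g. spatial coordinates of detected features); the eps and min_samples values are illustrative, not tuned:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(100, 2)),  # dense blob
    rng.normal(loc=3.0, scale=0.3, size=(100, 2)),  # second blob
    rng.uniform(low=-2, high=5, size=(20, 2)),      # scattered noise
])

db = DBSCAN(eps=0.4, min_samples=5).fit(points)
labels = db.labels_                  # -1 marks low-density outliers
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_clusters} clusters, {np.sum(labels == -1)} noise points")
```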

Dimensionality reduction techniques

  • Dimensionality reduction transforms high-dimensional data into lower-dimensional representations
  • In computer vision, reduces the complexity of image data while preserving important visual information
  • Enables efficient processing, visualization, and analysis of large-scale image datasets

Principal Component Analysis (PCA)

  • Linear dimensionality reduction technique that identifies principal components of data
  • Steps to perform PCA:
    1. Standardize the data
    2. Compute the covariance matrix
    3. Calculate eigenvectors and eigenvalues
    4. Sort eigenvectors by decreasing eigenvalues
    5. Project data onto selected principal components
  • Captures maximum variance in the data with orthogonal components
  • Used for feature extraction, image compression, and face recognition; the sketch below implements the steps above
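
A minimal NumPy sketch of the five steps, assuming X is an (n_samples, n_features) matrix such as flattened face images:

```python
import numpy as np

def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    # 1. Standardize the data (here: mean-center; also divide by the
    #    per-feature std for full standardization)
    Xc = X - X.mean(axis=0)
    # 2. Compute the covariance matrix
    cov = np.cov(Xc, rowvar=False)
    # 3. Calculate eigenvectors and eigenvalues (eigh suits symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenvectors by decreasing eigenvalue
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:n_components]]
    # 5. Project data onto the selected principal components
    return Xc @ components
```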

t-SNE

  • t-Distributed Stochastic Neighbor Embedding
  • Non-linear dimensionality reduction technique for visualizing high-dimensional data
  • Preserves local structure and reveals clusters in the data
  • Algorithm steps:
    1. Compute pairwise similarities in high-dimensional space
    2. Initialize low-dimensional embedding
    3. Minimize Kullback-Leibler divergence between the high- and low-dimensional similarity distributions
  • Effective for visualizing image datasets and understanding feature spaces (see the sketch below)
  • Computationally intensive for large datasets
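
A minimal scikit-learn sketch; `features` is a placeholder for per-image descriptors, and perplexity=30 is an illustrative setting:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.random((500, 128))    # stand-in for per-image feature vectors

# perplexity roughly sets the effective neighborhood size preserved in the
# embedding; it must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(features)  # (500, 2): one 2-D point per image
```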

Autoencoders for dimensionality reduction

  • Neural network architecture that learns to compress and reconstruct data
  • Consists of an encoder that maps input to a lower-dimensional latent space and a decoder that reconstructs the input
  • Training process minimizes reconstruction error
  • Variants include denoising autoencoders and variational autoencoders
  • Applied in image compression, feature learning, and anomaly detection; a minimal sketch follows
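
A minimal PyTorch sketch for 28x28 grayscale images flattened to 784 values; the layer sizes and 32-D bottleneck are illustrative choices:

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(           # maps input to latent space
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(           # reconstructs the input
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                     # stand-in batch of flattened images
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), x)  # training minimizes reconstruction error
loss.backward()
opt.step()
```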

Feature extraction methods

  • Feature extraction identifies distinctive characteristics or patterns in image data
  • Crucial for various computer vision tasks (object recognition, image retrieval, scene understanding)
  • Enables efficient representation and processing of visual information

SIFT and SURF

  • Scale-Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF)
  • Local feature descriptors robust to scale, rotation, and illumination changes
  • SIFT algorithm steps:
    1. Scale-space extrema detection
    2. Keypoint localization
    3. Orientation assignment
    4. Keypoint descriptor generation
  • SURF uses integral images and box filters for faster computation
  • Applied in image matching, object recognition, and 3D reconstruction (see the sketch below)
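
A minimal OpenCV sketch (SIFT has shipped in the main opencv-python package since its patent expired); the image path is a placeholder:

```python
import cv2

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors: (n_keypoints, 128) array, one 128-D vector per keypoint
```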

HOG descriptors

  • Histogram of Oriented Gradients
  • Captures local shape information by computing gradient orientations in image patches
  • HOG computation process:
    1. Divide image into cells
    2. Compute gradient magnitudes and orientations for each pixel
    3. Create histograms of gradient orientations for each cell
    4. Normalize histograms across blocks of cells
  • Widely used in pedestrian detection and object recognition (see the sketch after this list)
  • Robust to illumination changes and small geometric transformations
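
A minimal scikit-image sketch; the parameter values mirror the classic pedestrian-detection setup but are assumptions here, and the image path is a placeholder:

```python
from skimage import color, io
from skimage.feature import hog

img = color.rgb2gray(io.imread("example.jpg"))  # placeholder path
features = hog(
    img,
    orientations=9,            # histogram bins for gradient directions
    pixels_per_cell=(8, 8),    # step 1: divide the image into cells
    cells_per_block=(2, 2),    # step 4: normalize over blocks of cells
    block_norm="L2-Hys",
)
```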

Convolutional features

  • Features extracted from convolutional neural networks (CNNs) trained on large image datasets
  • Hierarchical representation of visual information:
    • Lower layers capture low-level features (edges, textures)
    • Higher layers represent more abstract concepts (objects, scenes)
  • Transfer learning uses pre-trained CNN features for various vision tasks (sketched below)
  • Feature visualization techniques reveal patterns learned by CNN layers
  • Enables state-of-the-art performance in many computer vision applications
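
A minimal torchvision sketch of transfer learning as a fixed feature extractor: a pretrained ResNet-18 with its classification head removed yields a 512-D descriptor per image; the random input stands in for a preprocessed batch:

```python
import torch
from torchvision import models

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()     # drop the classification head
resnet.eval()                       # inference mode

x = torch.rand(1, 3, 224, 224)      # stand-in for a preprocessed image batch
with torch.no_grad():
    features = resnet(x)            # shape (1, 512)
```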

Generative models

  • Generative models learn to generate new data samples that resemble the training data distribution
  • In computer vision, they create realistic images, perform image-to-image translation, and learn visual representations
  • Enable various applications (image synthesis, data augmentation, style transfer)

Gaussian Mixture Models (GMMs)

  • Probabilistic model representing data as a mixture of Gaussian distributions
  • Parameters include means, covariances, and mixing coefficients for each Gaussian component
  • Training uses Expectation-Maximization (EM) algorithm:
    1. Initialize parameters
    2. E-step: Compute responsibilities (posterior probabilities)
    3. M-step: Update parameters based on responsibilities
    4. Repeat until convergence
  • Applied in image segmentation, background modeling, and color clustering (see the sketch below)
  • Flexible model for representing complex data distributions
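
A minimal scikit-learn sketch (EM runs internally when fitting); the pixel array is a stand-in for real (H*W, 3) RGB values, and n_components=3 is illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
pixels = rng.random((10_000, 3))    # stand-in for flattened RGB pixels

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(pixels)    # hard cluster index per pixel
probs = gmm.predict_proba(pixels)   # soft responsibilities from the E-step
```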

Variational Autoencoders (VAEs)

  • Probabilistic generative model combining autoencoders with variational inference
  • Architecture consists of an encoder (inference network) and a decoder (generative network)
  • Training objectives:
    1. Reconstruction loss: measures how well the model reconstructs input data
    2. KL divergence: encourages the latent space to follow a prior distribution (usually Gaussian)
  • Enables generation of new samples by sampling from the latent space
  • Used for image generation, interpolation, and representation learning; the sketch below shows the two training terms
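
A minimal PyTorch sketch of the two VAE training terms and the reparameterization step; the encoder and decoder networks are omitted, and `mu`/`logvar` are assumed to be the encoder's outputs:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # 1. Reconstruction loss (binary cross-entropy for pixels in [0, 1])
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # 2. KL divergence between N(mu, sigma^2) and the N(0, I) prior,
    #    in closed form for diagonal Gaussians
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. the encoder
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)
```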

Generative Adversarial Networks (GANs)

  • Framework for training generative models through adversarial process
  • Consists of two neural networks:
    1. Generator: creates fake samples to fool the discriminator
    2. Discriminator: distinguishes between real and fake samples
  • Training process:
    • Generator minimizes the probability of discriminator correctly classifying fake samples
    • Discriminator maximizes its ability to distinguish real from fake samples
  • Variants include DCGANs, StyleGAN, and CycleGAN
  • Applied in high-resolution image synthesis, image-to-image translation, and data augmentation (training-step sketch below)
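
A minimal sketch of one training step with the standard non-saturating losses; `G` and `D` are placeholder generator/discriminator modules, with D assumed to output a sigmoid probability of shape (batch, 1):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_g, opt_d, latent_dim=100):
    b = real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator: push D(real) toward 1 and D(G(z)) toward 0.
    fake = G(torch.randn(b, latent_dim)).detach()  # no generator grads here
    d_loss = (F.binary_cross_entropy(D(real), ones) +
              F.binary_cross_entropy(D(fake), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: make D label its samples as real (non-saturating loss).
    fake = G(torch.randn(b, latent_dim))
    g_loss = F.binary_cross_entropy(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```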

Anomaly detection

  • Anomaly detection identifies unusual patterns or samples that deviate from the norm
  • In computer vision, detects abnormal objects, events, or behaviors in images and videos
  • Critical for quality control, security surveillance, and medical image analysis

One-class SVM

  • Support Vector Machine variant for anomaly detection
  • Learns a decision boundary that encloses the majority of normal data points
  • Training process:
    1. Map data to high-dimensional feature space
    2. Find the smallest hypersphere that contains most of the data
    3. Use kernel trick to handle non-linear decision boundaries
  • Detects anomalies as points falling outside the learned boundary
  • Effective for detecting novelties in image data and identifying defects in visual inspection (see the sketch below)
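
A minimal scikit-learn sketch on synthetic feature vectors; `nu` upper-bounds the fraction of training points treated as outliers, and the data stands in for descriptors of defect-free images:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_feats = rng.normal(size=(200, 64))   # "normal" training descriptors
test_feats = np.vstack([
    rng.normal(size=(5, 64)),               # more normal samples
    rng.normal(loc=6.0, size=(5, 64)),      # planted anomalies
])

oc = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_feats)
pred = oc.predict(test_feats)               # +1 = normal, -1 = anomaly
```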

Isolation Forest

  • Ensemble method based on the principle that anomalies are easier to isolate
  • Algorithm steps:
    1. Randomly select a feature and split value
    2. Recursively partition the data until each point is isolated
    3. Compute anomaly score based on average path length to isolate a point
  • Advantages include:
    • Handles high-dimensional data efficiently
    • Robust to irrelevant features
    • Scalable to large datasets
  • Applied in detecting unusual objects or regions in images and video frames; a minimal example follows
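
A minimal scikit-learn sketch; the feature matrix is synthetic, and `contamination` encodes the expected anomaly fraction:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
feats = np.vstack([
    rng.normal(size=(300, 32)),             # normal samples
    rng.normal(loc=5.0, size=(3, 32)),      # planted outliers
])

iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
pred = iso.fit_predict(feats)       # -1 = anomaly (short average path length)
scores = iso.score_samples(feats)   # lower score = more anomalous
```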

Autoencoders for anomaly detection

  • Uses reconstruction error of autoencoders to identify anomalies
  • Training process:
    1. Train autoencoder on normal data samples
    2. Compute reconstruction error for new samples
    3. Classify samples with high reconstruction error as anomalies
  • Variants include denoising autoencoders and variational autoencoders
  • Effective for detecting anomalies in complex image data (medical imaging, industrial inspection)
  • Can learn hierarchical features relevant for anomaly detection

Self-organizing maps

  • Self-organizing maps (SOMs) create low-dimensional representations of high-dimensional data
  • Preserve topological relationships between data points in the input space
  • In computer vision, SOMs enable visualization and clustering of complex image features

Kohonen networks

  • Type of artificial neural network that implements self-organizing maps
  • Architecture consists of a grid of neurons, each associated with a weight vector
  • Training process:
    1. Initialize neuron weights randomly
    2. Present input vector to the network
    3. Find the best matching unit (BMU) based on similarity
    4. Update BMU and its neighbors' weights to move closer to the input
    5. Repeat steps 2-4 for all inputs and multiple epochs
  • Resulting map represents a non-linear projection of the input space
  • Neurons close in the grid respond to similar input patterns (training sketch below)
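
A minimal NumPy sketch of steps 1-5 above, organizing RGB colors on a 10x10 grid; the learning rate, neighborhood width, and decay schedule are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 3))                 # e.g. RGB colors to organize
grid_h, grid_w, dim = 10, 10, data.shape[1]
weights = rng.random((grid_h, grid_w, dim))  # step 1: random weights
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)  # neuron grid positions

lr, sigma = 0.5, 3.0
for epoch in range(20):                      # step 5: repeat over epochs
    for x in data:                           # step 2: present an input vector
        # Step 3: best matching unit = neuron with the closest weight vector.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Step 4: pull the BMU and its grid neighbors toward the input,
        # with influence decaying over distance in the grid.
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=-1)
        influence = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
        weights += lr * influence[..., None] * (x - weights)
    lr *= 0.9; sigma *= 0.9                  # shrink step size and neighborhood
```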

Applications in image processing

  • Color quantization reduces the number of colors in an image while preserving visual quality
  • Texture analysis groups similar textures and identifies dominant patterns in images
  • Image compression represents images using a reduced set of SOM neurons
  • Feature extraction creates low-dimensional representations of image patches or regions
  • Image segmentation clusters pixels or regions based on their similarity in the SOM space

Evaluation metrics

  • Evaluation metrics assess the quality and performance of unsupervised learning algorithms
  • In computer vision, these metrics help compare different clustering or dimensionality reduction techniques
  • Enable objective assessment of algorithm performance without ground truth labels

Silhouette score

  • Measures how similar an object is to its own cluster compared to other clusters
  • Silhouette coefficient for a single sample: $s = \frac{b - a}{\max(a, b)}$ where $a$ is the mean intra-cluster distance and $b$ is the mean nearest-cluster distance
  • Range: -1 to 1, higher values indicate better-defined clusters
  • Advantages:
    • Applicable to any distance metric
    • Provides a visual representation of cluster quality
  • Used to evaluate image segmentation and object clustering results

Davies-Bouldin index

  • Ratio of within-cluster distances to between-cluster distances
  • Lower values indicate better clustering
  • Formula: $DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \neq i} \left( \frac{\sigma_i + \sigma_j}{d(c_i, c_j)} \right)$ where $n$ is the number of clusters, $\sigma_i$ is the average distance of points in cluster $i$ to its centroid, and $d(c_i, c_j)$ is the distance between centroids of clusters $i$ and $j$
  • Evaluates both compactness within clusters and separation between clusters
  • Applied in assessing the quality of image segmentation algorithms

Calinski-Harabasz index

  • Ratio of between-cluster dispersion to within-cluster dispersion
  • Higher values indicate better-defined clusters
  • Formula: $CH = \frac{\mathrm{tr}(B_k)}{\mathrm{tr}(W_k)} \times \frac{n - k}{k - 1}$ where $\mathrm{tr}(B_k)$ is the trace of the between-cluster scatter matrix, $\mathrm{tr}(W_k)$ is the trace of the within-cluster scatter matrix, $n$ is the number of data points, and $k$ is the number of clusters
  • Advantages:
    • Works well for convex clusters
    • Computationally efficient
  • Used to evaluate clustering results in image analysis and feature space partitioning; the sketch below computes all three metrics in this section
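
A minimal scikit-learn sketch computing all three metrics for one clustering; the synthetic blobs and k=3 are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=4.0 * i, size=(100, 2)) for i in range(3)])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("silhouette:       ", silhouette_score(X, labels))         # higher is better
print("Davies-Bouldin:   ", davies_bouldin_score(X, labels))     # lower is better
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))  # higher is better
```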

Challenges and limitations

  • Unsupervised learning in computer vision faces various challenges and limitations
  • Understanding these issues helps in developing more robust and effective algorithms
  • Addressing these challenges drives research in advanced unsupervised learning techniques

Curse of dimensionality

  • Refers to various phenomena that arise when analyzing data in high-dimensional spaces
  • Effects on unsupervised learning:
    • Increased sparsity of data points
    • Diminished effectiveness of distance metrics
    • Increased computational complexity
  • Manifests in image analysis due to high-dimensional nature of pixel data
  • Dimensionality reduction techniques (PCA, t-SNE) help mitigate this issue
  • Feature selection methods identify relevant dimensions for specific tasks

Interpretation of results

  • Unsupervised learning often produces results that require expert interpretation
  • Challenges in computer vision applications:
    • Assigning semantic meaning to discovered clusters or features
    • Validating the relevance of extracted patterns without ground truth
    • Explaining the decision-making process of complex models (GANs)
  • Visualization techniques help in understanding high-dimensional data representations
  • Domain expertise crucial for meaningful interpretation of unsupervised learning outcomes

Scalability issues

  • Many unsupervised learning algorithms face challenges when applied to large-scale image datasets
  • Scalability problems include:
    • Increased computational complexity with growing data size
    • Memory limitations for storing large feature matrices
    • Difficulty in processing high-resolution images or video streams
  • Approaches to address scalability:
    • Distributed computing and parallel processing
    • Online or incremental learning algorithms
    • Approximate methods for nearest neighbor search and clustering
  • Trade-offs between accuracy and computational efficiency often necessary

Recent advancements

  • Recent advancements in unsupervised learning have significantly impacted computer vision
  • These techniques push the boundaries of what can be achieved without labeled data
  • Enable more efficient and effective visual representation learning

Self-supervised learning

  • Leverages inherent structure in unlabeled data to create supervised learning tasks
  • Pretext tasks in computer vision:
    • Image rotation prediction
    • Jigsaw puzzle solving
    • Colorization of grayscale images
  • Contrastive predictive coding learns representations by predicting future encodings
  • Benefits include:
    • Learning rich visual representations without manual labeling
    • Improved performance on downstream tasks with limited labeled data
    • Applicability to large-scale image and video datasets

Contrastive learning

  • Learns representations by comparing similar and dissimilar samples
  • Key idea: minimize distance between positive pairs and maximize distance between negative pairs
  • Popular methods:
    • SimCLR: uses data augmentation to create positive pairs
    • MoCo: maintains a dynamic dictionary for negative samples
    • BYOL: learns from positive pairs without explicit negative samples
  • Applications in image retrieval, object detection, and semantic segmentation
  • Achieves state-of-the-art results in self-supervised visual representation learning (loss sketch below)
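
A minimal PyTorch sketch of an NT-Xent-style contrastive loss (the SimCLR objective, simplified): two augmented views of each image form the positive pair, and every other sample in the batch acts as a negative:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    """z1, z2: (batch, dim) projections of two augmented views."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2n unit vectors
    sim = z @ z.t() / temperature                       # scaled cosine similarities

    # Exclude each sample's similarity to itself from the softmax.
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))

    # View i's positive is view i + n (and vice versa); all other entries in
    # the row act as negatives via the cross-entropy denominator.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```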

Few-shot learning approaches

  • Aims to learn from a small number of labeled examples
  • Unsupervised techniques contribute to few-shot learning:
    • Meta-learning algorithms that learn to learn from limited data
    • Prototypical networks that use unsupervised clustering in feature space
    • Data augmentation using generative models to expand limited datasets
  • Applications in object recognition with rare classes or limited training data
  • Combines benefits of supervised and unsupervised learning paradigms