Computer Vision and Image Processing
Unit 6 Review

6.2 Unsupervised learning

Written by the Fiveable Content Team • Last updated September 2025

Unsupervised learning in computer vision analyzes unlabeled data to find patterns and structures without human guidance. It's crucial for tasks like image segmentation, object detection, and visual representation learning, enabling automatic feature extraction from large image datasets.

Key techniques include clustering algorithms, dimensionality reduction, and generative models. These methods group similar data points, reduce data complexity, and create new, realistic images. Unsupervised learning serves as a foundation for more advanced computer vision tasks and continues to evolve with recent advancements.

Fundamentals of unsupervised learning

  • Unsupervised learning algorithms analyze and cluster unlabeled data without human intervention
  • In computer vision, unsupervised learning enables automatic feature extraction and pattern recognition from large image datasets
  • Plays a crucial role in image segmentation, object detection, and visual representation learning

Definition and key concepts

  • Learning patterns and structures in data without labeled examples or explicit supervision
  • Focuses on finding hidden patterns, grouping similar data points, and reducing data dimensionality
  • Key concepts include clustering, dimensionality reduction, and density estimation
  • Relies on statistical properties and intrinsic relationships within the data

Comparison to supervised learning

  • Unsupervised learning works with unlabeled data, while supervised learning requires labeled training examples
  • Supervised learning aims to predict specific outputs, while unsupervised learning discovers inherent structures
  • Unsupervised methods often serve as preprocessing steps for supervised tasks in computer vision
  • Evaluating unsupervised models is more challenging due to the lack of ground truth labels

Applications in computer vision

  • Image segmentation groups pixels into meaningful regions or objects
  • Feature learning extracts relevant visual representations from raw image data
  • Anomaly detection identifies unusual patterns or objects in images
  • Generative modeling creates new, realistic images based on learned distributions

Clustering algorithms

  • Clustering algorithms group similar data points together based on their features or attributes
  • In computer vision, clustering helps segment images, group similar objects, and organize large image datasets
  • Enables unsupervised object recognition and scene understanding in visual data

K-means clustering

  • Partitions data into K predefined clusters based on feature similarity
  • Iterative algorithm minimizes within-cluster variance
  • Steps include:
    1. Initialize K cluster centroids randomly
    2. Assign each data point to the nearest centroid
    3. Recalculate centroids based on assigned points
    4. Repeat steps 2-3 until convergence
  • Widely used for image segmentation and color quantization (see the sketch after this list)
  • Sensitive to initial centroid placement and outliers
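
A minimal sketch of k-means color quantization with scikit-learn, assuming an RGB image already loaded as a NumPy array; the helper name `quantize_colors` and the choice k=8 are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(image: np.ndarray, k: int = 8) -> np.ndarray:
    """Reduce an (H, W, 3) RGB image to k representative colors."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)  # one row per pixel

    # n_init=10 re-runs k-means from several random initializations,
    # which mitigates the sensitivity to initial centroid placement.
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

    # Replace every pixel with the centroid of its assigned cluster.
    quantized = km.cluster_centers_[km.labels_]
    return quantized.reshape(h, w, c).astype(image.dtype)
```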

Hierarchical clustering

  • Builds a tree-like structure of nested clusters (dendrogram)
  • Two main approaches: agglomerative (bottom-up) and divisive (top-down)
  • Agglomerative clustering steps:
    1. Start with each data point as a separate cluster
    2. Merge closest clusters based on a distance metric
    3. Repeat until all points belong to a single cluster
  • Provides multi-scale representation of data structure
  • Used for hierarchical image segmentation and object part decomposition; a minimal example follows
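
A minimal sketch with SciPy, where `features` stands in for any (n_samples, n_features) array of region descriptors:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(0)
features = rng.random((50, 16))       # stand-in for real region descriptors

# Agglomerative (bottom-up) merging under Ward's minimum-variance criterion;
# Z encodes the full dendrogram of nested merges.
Z = linkage(features, method="ward")

# Cut the tree at a chosen granularity, here 4 clusters.
labels = fcluster(Z, t=4, criterion="maxclust")

# dendrogram(Z) plots the full merge tree (requires matplotlib).
```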

DBSCAN algorithm

  • Density-Based Spatial Clustering of Applications with Noise
  • Groups together points with many nearby neighbors, marking points in low-density regions as outliers
  • Key parameters: epsilon (neighborhood radius) and minPoints (minimum points to form a cluster)
  • Advantages include:
    • Discovers clusters of arbitrary shape
    • Robust to noise and outliers
    • Automatically determines the number of clusters
  • Applied in image segmentation and object detection in complex scenes (sketched below)
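
A minimal scikit-learn sketch on synthetic 2-D points (e.g. spatial coordinates of detected features); the eps and min_samples values are illustrative, not tuned:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(100, 2)),  # dense blob
    rng.normal(loc=3.0, scale=0.3, size=(100, 2)),  # second blob
    rng.uniform(low=-2, high=5, size=(20, 2)),      # scattered noise
])

db = DBSCAN(eps=0.4, min_samples=5).fit(points)
labels = db.labels_                  # -1 marks low-density outliers
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_clusters} clusters, {np.sum(labels == -1)} noise points")
```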

Dimensionality reduction techniques

  • Dimensionality reduction transforms high-dimensional data into lower-dimensional representations
  • In computer vision, reduces the complexity of image data while preserving important visual information
  • Enables efficient processing, visualization, and analysis of large-scale image datasets

Principal Component Analysis (PCA)

  • Linear dimensionality reduction technique that identifies principal components of data
  • Steps to perform PCA:
    1. Standardize the data
    2. Compute the covariance matrix
    3. Calculate eigenvectors and eigenvalues
    4. Sort eigenvectors by decreasing eigenvalues
    5. Project data onto selected principal components
  • Captures maximum variance in the data with orthogonal components
  • Used for feature extraction, image compression, and face recognition; the sketch below implements the steps above
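
A minimal NumPy sketch of the five steps, assuming X is an (n_samples, n_features) matrix such as flattened face images:

```python
import numpy as np

def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    # 1. Standardize the data (here: mean-center; also divide by the
    #    per-feature std for full standardization)
    Xc = X - X.mean(axis=0)
    # 2. Compute the covariance matrix
    cov = np.cov(Xc, rowvar=False)
    # 3. Calculate eigenvectors and eigenvalues (eigh suits symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenvectors by decreasing eigenvalue
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:n_components]]
    # 5. Project data onto the selected principal components
    return Xc @ components
```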

t-SNE

  • t-Distributed Stochastic Neighbor Embedding
  • Non-linear dimensionality reduction technique for visualizing high-dimensional data
  • Preserves local structure and reveals clusters in the data
  • Algorithm steps:
    1. Compute pairwise similarities in high-dimensional space
    2. Initialize low-dimensional embedding
    3. Minimize Kullback-Leibler divergence between the high- and low-dimensional similarity distributions
  • Effective for visualizing image datasets and understanding feature spaces (see the sketch below)
  • Computationally intensive for large datasets
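
A minimal scikit-learn sketch; `features` is a placeholder for per-image descriptors, and perplexity=30 is an illustrative setting:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.random((500, 128))    # stand-in for per-image feature vectors

# perplexity roughly sets the effective neighborhood size preserved in the
# embedding; it must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = tsne.fit_transform(features)  # (500, 2): one 2-D point per image
```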

Autoencoders for dimensionality reduction

  • Neural network architecture that learns to compress and reconstruct data
  • Consists of an encoder that maps input to a lower-dimensional latent space and a decoder that reconstructs the input
  • Training process minimizes reconstruction error
  • Variants include denoising autoencoders and variational autoencoders
  • Applied in image compression, feature learning, and anomaly detection; a minimal sketch follows
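
A minimal PyTorch sketch for 28x28 grayscale images flattened to 784 values; the layer sizes and 32-D bottleneck are illustrative choices:

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(           # maps input to latent space
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(           # reconstructs the input
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                     # stand-in batch of flattened images
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), x)  # training minimizes reconstruction error
loss.backward()
opt.step()
```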

Feature extraction methods

  • Feature extraction identifies distinctive characteristics or patterns in image data
  • Crucial for various computer vision tasks (object recognition, image retrieval, scene understanding)
  • Enables efficient representation and processing of visual information

SIFT and SURF

  • Scale-Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF)
  • Local feature descriptors robust to scale, rotation, and illumination changes
  • SIFT algorithm steps:
    1. Scale-space extrema detection
    2. Keypoint localization
    3. Orientation assignment
    4. Keypoint descriptor generation
  • SURF uses integral images and box filters for faster computation
  • Applied in image matching, object recognition, and 3D reconstruction (see the sketch below)
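
A minimal OpenCV sketch (SIFT has shipped in the main opencv-python package since its patent expired); the image path is a placeholder:

```python
import cv2

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors: (n_keypoints, 128) array, one 128-D vector per keypoint
```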

HOG descriptors

  • Histogram of Oriented Gradients
  • Captures local shape information by computing gradient orientations in image patches
  • HOG computation process:
    1. Divide image into cells
    2. Compute gradient magnitudes and orientations for each pixel
    3. Create histograms of gradient orientations for each cell
    4. Normalize histograms across blocks of cells
  • Widely used in pedestrian detection and object recognition (see the sketch after this list)
  • Robust to illumination changes and small geometric transformations
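
A minimal scikit-image sketch; the parameter values mirror the classic pedestrian-detection setup but are assumptions here, and the image path is a placeholder:

```python
from skimage import color, io
from skimage.feature import hog

img = color.rgb2gray(io.imread("example.jpg"))  # placeholder path
features = hog(
    img,
    orientations=9,            # histogram bins for gradient directions
    pixels_per_cell=(8, 8),    # step 1: divide the image into cells
    cells_per_block=(2, 2),    # step 4: normalize over blocks of cells
    block_norm="L2-Hys",
)
```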

Convolutional features

  • Features extracted from convolutional neural networks (CNNs) trained on large image datasets
  • Hierarchical representation of visual information:
    • Lower layers capture low-level features (edges, textures)
    • Higher layers represent more abstract concepts (objects, scenes)
  • Transfer learning uses pre-trained CNN features for various vision tasks (sketched below)
  • Feature visualization techniques reveal patterns learned by CNN layers
  • Enables state-of-the-art performance in many computer vision applications
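
A minimal torchvision sketch of transfer learning as a fixed feature extractor: a pretrained ResNet-18 with its classification head removed yields a 512-D descriptor per image; the random input stands in for a preprocessed batch:

```python
import torch
from torchvision import models

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()     # drop the classification head
resnet.eval()                       # inference mode

x = torch.rand(1, 3, 224, 224)      # stand-in for a preprocessed image batch
with torch.no_grad():
    features = resnet(x)            # shape (1, 512)
```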

Generative models

  • Generative models learn to generate new data samples that resemble the training data distribution
  • In computer vision, they create realistic images, perform image-to-image translation, and learn visual representations
  • Enable various applications (image synthesis, data augmentation, style transfer)

Gaussian Mixture Models (GMMs)

  • Probabilistic model representing data as a mixture of Gaussian distributions
  • Parameters include means, covariances, and mixing coefficients for each Gaussian component
  • Training uses Expectation-Maximization (EM) algorithm:
    1. Initialize parameters
    2. E-step: Compute responsibilities (posterior probabilities)
    3. M-step: Update parameters based on responsibilities
    4. Repeat until convergence
  • Applied in image segmentation, background modeling, and color clustering (see the sketch below)
  • Flexible model for representing complex data distributions
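
A minimal scikit-learn sketch (EM runs internally when fitting); the pixel array is a stand-in for real (H*W, 3) RGB values, and n_components=3 is illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
pixels = rng.random((10_000, 3))    # stand-in for flattened RGB pixels

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(pixels)    # hard cluster index per pixel
probs = gmm.predict_proba(pixels)   # soft responsibilities from the E-step
```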

Variational Autoencoders (VAEs)

  • Probabilistic generative model combining autoencoders with variational inference
  • Architecture consists of an encoder (inference network) and a decoder (generative network)
  • Training objectives:
    1. Reconstruction loss: measures how well the model reconstructs input data
    2. KL divergence: encourages the latent space to follow a prior distribution (usually Gaussian)
  • Enables generation of new samples by sampling from the latent space
  • Used for image generation, interpolation, and representation learning; the sketch below shows the two training terms
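
A minimal PyTorch sketch of the two VAE training terms and the reparameterization step; the encoder and decoder networks are omitted, and `mu`/`logvar` are assumed to be the encoder's outputs:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # 1. Reconstruction loss (binary cross-entropy for pixels in [0, 1])
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # 2. KL divergence between N(mu, sigma^2) and the N(0, I) prior,
    #    in closed form for diagonal Gaussians
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. the encoder
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)
```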

Generative Adversarial Networks (GANs)

  • Framework for training generative models through adversarial process
  • Consists of two neural networks:
    1. Generator: creates fake samples to fool the discriminator
    2. Discriminator: distinguishes between real and fake samples
  • Training process:
    • Generator minimizes the probability of discriminator correctly classifying fake samples
    • Discriminator maximizes its ability to distinguish real from fake samples
  • Variants include DCGANs, StyleGAN, and CycleGAN
  • Applied in high-resolution image synthesis, image-to-image translation, and data augmentation (training-step sketch below)
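
A minimal sketch of one training step with the standard non-saturating losses; `G` and `D` are placeholder generator/discriminator modules, with D assumed to output a sigmoid probability of shape (batch, 1):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_g, opt_d, latent_dim=100):
    b = real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator: push D(real) toward 1 and D(G(z)) toward 0.
    fake = G(torch.randn(b, latent_dim)).detach()  # no generator grads here
    d_loss = (F.binary_cross_entropy(D(real), ones) +
              F.binary_cross_entropy(D(fake), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: make D label its samples as real (non-saturating loss).
    fake = G(torch.randn(b, latent_dim))
    g_loss = F.binary_cross_entropy(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```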

Anomaly detection

  • Anomaly detection identifies unusual patterns or samples that deviate from the norm
  • In computer vision, detects abnormal objects, events, or behaviors in images and videos
  • Critical for quality control, security surveillance, and medical image analysis

One-class SVM

  • Support Vector Machine variant for anomaly detection
  • Learns a decision boundary that encloses the majority of normal data points
  • Training process:
    1. Map data to high-dimensional feature space
    2. Find the smallest hypersphere that contains most of the data
    3. Use kernel trick to handle non-linear decision boundaries
  • Detects anomalies as points falling outside the learned boundary
  • Effective for detecting novelties in image data and identifying defects in visual inspection (see the sketch below)
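
A minimal scikit-learn sketch on synthetic feature vectors; `nu` upper-bounds the fraction of training points treated as outliers, and the data stands in for descriptors of defect-free images:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_feats = rng.normal(size=(200, 64))   # "normal" training descriptors
test_feats = np.vstack([
    rng.normal(size=(5, 64)),               # more normal samples
    rng.normal(loc=6.0, size=(5, 64)),      # planted anomalies
])

oc = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_feats)
pred = oc.predict(test_feats)               # +1 = normal, -1 = anomaly
```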

Isolation Forest

  • Ensemble method based on the principle that anomalies are easier to isolate
  • Algorithm steps:
    1. Randomly select a feature and split value
    2. Recursively partition the data until each point is isolated
    3. Compute anomaly score based on average path length to isolate a point
  • Advantages include:
    • Handles high-dimensional data efficiently
    • Robust to irrelevant features
    • Scalable to large datasets
  • Applied in detecting unusual objects or regions in images and video frames; a minimal example follows
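
A minimal scikit-learn sketch; the feature matrix is synthetic, and `contamination` encodes the expected anomaly fraction:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
feats = np.vstack([
    rng.normal(size=(300, 32)),             # normal samples
    rng.normal(loc=5.0, size=(3, 32)),      # planted outliers
])

iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
pred = iso.fit_predict(feats)       # -1 = anomaly (short average path length)
scores = iso.score_samples(feats)   # lower score = more anomalous
```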

Autoencoders for anomaly detection

  • Uses reconstruction error of autoencoders to identify anomalies
  • Training process:
    1. Train autoencoder on normal data samples
    2. Compute reconstruction error for new samples
    3. Classify samples with high reconstruction error as anomalies
  • Variants include denoising autoencoders and variational autoencoders
  • Effective for detecting anomalies in complex image data (medical imaging, industrial inspection)
  • Can learn hierarchical features relevant for anomaly detection

Self-organizing maps

  • Self-organizing maps (SOMs) create low-dimensional representations of high-dimensional data
  • Preserve topological relationships between data points in the input space
  • In computer vision, SOMs enable visualization and clustering of complex image features

Kohonen networks

  • Type of artificial neural network that implements self-organizing maps
  • Architecture consists of a grid of neurons, each associated with a weight vector
  • Training process:
    1. Initialize neuron weights randomly
    2. Present input vector to the network
    3. Find the best matching unit (BMU) based on similarity
    4. Update BMU and its neighbors' weights to move closer to the input
    5. Repeat steps 2-4 for all inputs and multiple epochs
  • Resulting map represents a non-linear projection of the input space
  • Neurons close in the grid respond to similar input patterns (training sketch below)
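
A minimal NumPy sketch of steps 1-5 above, organizing RGB colors on a 10x10 grid; the learning rate, neighborhood width, and decay schedule are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 3))                 # e.g. RGB colors to organize
grid_h, grid_w, dim = 10, 10, data.shape[1]
weights = rng.random((grid_h, grid_w, dim))  # step 1: random weights
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)  # neuron grid positions

lr, sigma = 0.5, 3.0
for epoch in range(20):                      # step 5: repeat over epochs
    for x in data:                           # step 2: present an input vector
        # Step 3: best matching unit = neuron with the closest weight vector.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Step 4: pull the BMU and its grid neighbors toward the input,
        # with influence decaying over distance in the grid.
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=-1)
        influence = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
        weights += lr * influence[..., None] * (x - weights)
    lr *= 0.9; sigma *= 0.9                  # shrink step size and neighborhood
```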

Applications in image processing

  • Color quantization reduces the number of colors in an image while preserving visual quality
  • Texture analysis groups similar textures and identifies dominant patterns in images
  • Image compression represents images using a reduced set of SOM neurons
  • Feature extraction creates low-dimensional representations of image patches or regions
  • Image segmentation clusters pixels or regions based on their similarity in the SOM space

Evaluation metrics

  • Evaluation metrics assess the quality and performance of unsupervised learning algorithms
  • In computer vision, these metrics help compare different clustering or dimensionality reduction techniques
  • Enable objective assessment of algorithm performance without ground truth labels

Silhouette score

  • Measures how similar an object is to its own cluster compared to other clusters
  • Silhouette coefficient for a single sample: $s = \frac{b - a}{\max(a, b)}$ where $a$ is the mean intra-cluster distance and $b$ is the mean nearest-cluster distance
  • Range: -1 to 1, higher values indicate better-defined clusters
  • Advantages:
    • Applicable to any distance metric
    • Provides a visual representation of cluster quality
  • Used to evaluate image segmentation and object clustering results

Davies-Bouldin index

  • Ratio of within-cluster distances to between-cluster distances
  • Lower values indicate better clustering
  • Formula: $DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \neq i} \left( \frac{\sigma_i + \sigma_j}{d(c_i, c_j)} \right)$ where $n$ is the number of clusters, $\sigma_i$ is the average distance of points in cluster $i$ to its centroid, and $d(c_i, c_j)$ is the distance between centroids of clusters $i$ and $j$
  • Evaluates both compactness within clusters and separation between clusters
  • Applied in assessing the quality of image segmentation algorithms

Calinski-Harabasz index

  • Ratio of between-cluster dispersion to within-cluster dispersion
  • Higher values indicate better-defined clusters
  • Formula: $CH = \frac{\mathrm{tr}(B_k)}{\mathrm{tr}(W_k)} \times \frac{n - k}{k - 1}$ where $\mathrm{tr}(B_k)$ is the trace of the between-cluster scatter matrix, $\mathrm{tr}(W_k)$ is the trace of the within-cluster scatter matrix, $n$ is the number of data points, and $k$ is the number of clusters
  • Advantages:
    • Works well for convex clusters
    • Computationally efficient
  • Used to evaluate clustering results in image analysis and feature space partitioning; the sketch below computes all three metrics in this section
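
A minimal scikit-learn sketch computing all three metrics for one clustering; the synthetic blobs and k=3 are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=4.0 * i, size=(100, 2)) for i in range(3)])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("silhouette:       ", silhouette_score(X, labels))         # higher is better
print("Davies-Bouldin:   ", davies_bouldin_score(X, labels))     # lower is better
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))  # higher is better
```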

Challenges and limitations

  • Unsupervised learning in computer vision faces various challenges and limitations
  • Understanding these issues helps in developing more robust and effective algorithms
  • Addressing these challenges drives research in advanced unsupervised learning techniques

Curse of dimensionality

  • Refers to various phenomena that arise when analyzing data in high-dimensional spaces
  • Effects on unsupervised learning:
    • Increased sparsity of data points
    • Diminished effectiveness of distance metrics
    • Increased computational complexity
  • Manifests in image analysis due to high-dimensional nature of pixel data
  • Dimensionality reduction techniques (PCA, t-SNE) help mitigate this issue
  • Feature selection methods identify relevant dimensions for specific tasks

Interpretation of results

  • Unsupervised learning often produces results that require expert interpretation
  • Challenges in computer vision applications:
    • Assigning semantic meaning to discovered clusters or features
    • Validating the relevance of extracted patterns without ground truth
    • Explaining the decision-making process of complex models (GANs)
  • Visualization techniques help in understanding high-dimensional data representations
  • Domain expertise crucial for meaningful interpretation of unsupervised learning outcomes

Scalability issues

  • Many unsupervised learning algorithms face challenges when applied to large-scale image datasets
  • Scalability problems include:
    • Increased computational complexity with growing data size
    • Memory limitations for storing large feature matrices
    • Difficulty in processing high-resolution images or video streams
  • Approaches to address scalability:
    • Distributed computing and parallel processing
    • Online or incremental learning algorithms
    • Approximate methods for nearest neighbor search and clustering
  • Trade-offs between accuracy and computational efficiency often necessary

Recent advancements

  • Recent advancements in unsupervised learning have significantly impacted computer vision
  • These techniques push the boundaries of what can be achieved without labeled data
  • Enable more efficient and effective visual representation learning

Self-supervised learning

  • Leverages inherent structure in unlabeled data to create supervised learning tasks
  • Pretext tasks in computer vision:
    • Image rotation prediction
    • Jigsaw puzzle solving
    • Colorization of grayscale images
  • Contrastive predictive coding learns representations by predicting future encodings
  • Benefits include:
    • Learning rich visual representations without manual labeling
    • Improved performance on downstream tasks with limited labeled data
    • Applicability to large-scale image and video datasets

Contrastive learning

  • Learns representations by comparing similar and dissimilar samples
  • Key idea: minimize distance between positive pairs and maximize distance between negative pairs
  • Popular methods:
    • SimCLR: uses data augmentation to create positive pairs
    • MoCo: maintains a dynamic dictionary for negative samples
    • BYOL: learns from positive pairs without explicit negative samples
  • Applications in image retrieval, object detection, and semantic segmentation
  • Achieves state-of-the-art results in self-supervised visual representation learning (loss sketch below)
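
A minimal PyTorch sketch of an NT-Xent-style contrastive loss (the SimCLR objective, simplified): two augmented views of each image form the positive pair, and every other sample in the batch acts as a negative:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    """z1, z2: (batch, dim) projections of two augmented views."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2n unit vectors
    sim = z @ z.t() / temperature                       # scaled cosine similarities

    # Exclude each sample's similarity to itself from the softmax.
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))

    # View i's positive is view i + n (and vice versa); all other entries in
    # the row act as negatives via the cross-entropy denominator.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```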

Few-shot learning approaches

  • Aims to learn from a small number of labeled examples
  • Unsupervised techniques contribute to few-shot learning:
    • Meta-learning algorithms that learn to learn from limited data
    • Prototypical networks that use unsupervised clustering in feature space
    • Data augmentation using generative models to expand limited datasets
  • Applications in object recognition with rare classes or limited training data
  • Combines benefits of supervised and unsupervised learning paradigms