Directory-based cache coherence protocols are crucial for maintaining data consistency in large-scale multiprocessor systems. They use a logically centralized directory (which may be physically distributed) to track the state and location of cached data across all processors, enabling efficient coordination of coherence actions and state transitions.
These protocols offer better scalability compared to snooping-based approaches, making them ideal for systems with many processors and distributed memory. While they may incur higher latency for cache misses, optimizations like directory caching and silent eviction can improve performance and reduce unnecessary coherence traffic.
Directory-Based Cache Coherence
Principles and Operations
- Directory-based cache coherence protocols maintain a centralized directory that tracks the state and location of cached data across all processors in a multiprocessor system
- The directory acts as a global point of control, managing coherence by storing information about which caches hold copies of each memory block and their respective states (shared, exclusive, modified)
- When a processor requests access to a memory block, it sends a message to the directory, which consults its records to determine the appropriate actions required to maintain coherence
- If the block is not present in any cache, the directory fetches it from main memory and sends it to the requesting processor
- If the block is present in other caches, the directory coordinates the necessary invalidation or update messages to ensure coherence before granting access to the requesting processor
- Directory-based protocols typically employ a set of coherence states (such as MESI or MOESI) to track the status of each cached block and enforce the coherence invariants
- The directory maintains a presence vector or a sharing list to keep track of which processors have copies of each memory block, enabling efficient invalidation or update operations
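The directory entry described above can be sketched as a small state machine. The following is an illustrative model, not a real implementation: the class name, the three-state (I/S/M) directory view, and the 4-processor configuration are all assumptions chosen for brevity.

```python
# Minimal sketch of a directory entry with a presence (sharer) bit vector.
# The I/S/M states and 4-processor setup are illustrative assumptions.

class DirectoryEntry:
    """Tracks the coherence state and sharers of one memory block."""

    def __init__(self, num_procs):
        self.state = "I"                      # directory's view: I, S, or M
        self.sharers = [False] * num_procs    # presence vector: one bit per processor

    def handle_read(self, proc):
        """A processor requests read access to this block."""
        if self.state == "M":
            # The current owner must write back; the block becomes shared.
            # (The old owner keeps a read-only copy, so its bit stays set.)
            self.state = "S"
        elif self.state == "I":
            self.state = "S"                  # block fetched from main memory
        self.sharers[proc] = True

    def handle_write(self, proc):
        """A processor requests write (exclusive) access."""
        # Invalidate every other sharer before granting ownership.
        invalidations = [p for p, present in enumerate(self.sharers)
                         if present and p != proc]
        self.sharers = [False] * len(self.sharers)
        self.sharers[proc] = True
        self.state = "M"
        return invalidations                  # targets of invalidation messages

entry = DirectoryEntry(num_procs=4)
entry.handle_read(0)
entry.handle_read(2)
invs = entry.handle_write(1)                  # directory must invalidate caches 0 and 2
```

Note how the presence vector makes invalidation targeted: only the caches whose bits are set receive messages, rather than every cache in the system.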
Coherence States and Invariants
- Common coherence states include:
  - Modified (M): The block is exclusively owned by a single cache and has been modified
  - Exclusive (E): The block is exclusively owned by a single cache but has not been modified
  - Shared (S): The block is shared among multiple caches and is read-only
  - Invalid (I): The block is not present in the cache or is outdated
- Coherence invariants ensure that:
  - At most one cache can have a block in the Modified state
  - If a block is in the Shared state, no cache can have it in the Modified or Exclusive state
  - A block in the Exclusive state cannot coexist with copies in other caches
- Directory-based protocols enforce these invariants by coordinating coherence actions and state transitions based on the information stored in the directory
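The invariants above amount to a single-writer / multiple-reader rule, which can be expressed as a small checker. This is a sketch for intuition only; the function name and list-of-states encoding are assumptions.

```python
# Sketch of a checker for the MESI single-writer / multiple-reader invariants,
# applied to the per-cache states of one block (encoding is illustrative).

def coherent(states):
    """Return True iff the per-cache states satisfy the coherence invariants."""
    owners = sum(1 for s in states if s in ("M", "E"))
    sharers = sum(1 for s in states if s == "S")
    # At most one cache may hold the block in M or E, and an M/E copy
    # cannot coexist with Shared copies in other caches.
    return owners <= 1 and not (owners == 1 and sharers > 0)

ok = coherent(["S", "S", "I", "S"])       # many readers: allowed
bad = coherent(["M", "S", "I", "I"])      # a writer alongside a reader: violation
```

A directory protocol maintains these invariants by construction: a write request triggers invalidations until only the requester's copy remains.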
Directory vs Snooping Protocols
Scalability
- Directory-based protocols offer better scalability compared to snooping-based protocols, making them more suitable for large-scale multiprocessor systems with many processors and distributed memory
- In snooping-based protocols, all processors monitor a shared bus for coherence transactions, which can lead to increased traffic and limited scalability as the number of processors grows
- Directory-based protocols avoid the need for a shared bus and centralized snooping, reducing the communication overhead and enabling more efficient use of interconnect bandwidth
- Directory-based protocols provide more flexibility in terms of interconnect topology, allowing for scalable designs such as mesh or torus networks, whereas snooping-based protocols are often limited to bus-based or hierarchical bus-based topologies
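The scalability difference can be made concrete with a back-of-the-envelope message count. The formulas below are a simplified model (an assumption, ignoring acknowledgements and network details): a snooping write miss is observed by every other cache, while a directory protocol pays one directory lookup plus point-to-point invalidations to the actual sharers only.

```python
# Simplified message-count model (assumed formulas): snooping broadcasts to
# all other caches; a directory sends a lookup plus targeted invalidations.

def snoop_msgs(num_procs):
    return num_procs - 1          # broadcast reaches every other cache

def dir_msgs(num_sharers):
    return 1 + num_sharers        # directory lookup + per-sharer invalidations

# With 64 processors but only 2 actual sharers, the directory protocol
# sends far fewer coherence messages per write miss.
broadcast = snoop_msgs(64)
targeted = dir_msgs(2)
```

Since most blocks are shared by few caches in practice, directory traffic scales with the sharing degree rather than with the machine size.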
Performance Characteristics
- Directory-based protocols typically incur higher latency for cache misses compared to snooping-based protocols due to the additional directory lookups and coherence message exchanges
- However, the impact of this latency can be mitigated through optimizations such as directory caching, hierarchical directories, and coherence message aggregation
- The storage overhead of the directory itself is a consideration in directory-based protocols, as it grows with the number of memory blocks and processors in the system
  - Efficient directory storage techniques and compression mechanisms can help reduce this overhead
- Directory-based protocols can leverage techniques such as silent eviction and write-back caching to reduce unnecessary coherence traffic and improve performance
- The performance of directory-based protocols depends on factors such as cache miss rates, sharing patterns, and communication latencies, which can vary based on the specific system architecture and workload characteristics
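The latency cost of directory indirection can be sketched with a toy model. All numbers and the formula below are assumptions for illustration: a snooping miss resolves in roughly one request/response exchange, while a directory miss to a remotely owned block adds an extra network hop (requester to directory to owner) plus the directory lookup itself.

```python
# Toy three-hop latency model (all parameters assumed, in nanoseconds):
# a directory miss pays an extra hop and a lookup compared with a
# snooping miss resolved in a single bus transaction.

def miss_latency(hops, per_hop_ns, lookup_ns):
    return hops * per_hop_ns + lookup_ns

snoop_ns = miss_latency(hops=2, per_hop_ns=20, lookup_ns=0)       # request + response
directory_ns = miss_latency(hops=3, per_hop_ns=20, lookup_ns=10)  # adds directory hop + lookup
```

Optimizations like directory caching attack the `lookup_ns` term, while protocol shortcuts (e.g., forwarding data directly from the owner) reduce the hop count.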
Efficient Directory Protocols for Multiprocessors
Design Considerations
- Designing an efficient directory-based cache coherence protocol involves making trade-offs between performance, scalability, and hardware complexity
- The choice of coherence states (such as MESI or MOESI) and the associated state transitions should be carefully considered to minimize coherence traffic and optimize common access patterns
- The directory organization, such as a full-map directory or a sparse directory, should be selected based on the system size, memory block granularity, and storage constraints
  - Full-map directories maintain a presence bit for each processor, providing fast lookup but requiring more storage overhead
  - Sparse directories use compressed representations, such as coarse vectors or limited pointers, to reduce storage overhead at the cost of potential indirection or imprecise tracking
- Efficient directory access mechanisms, such as directory caching or hierarchical directories, can be employed to reduce the latency and bandwidth requirements of directory lookups
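The full-map versus sparse trade-off can be quantified with the standard back-of-the-envelope overhead estimate (the parameter values below, such as 256 processors and 64-byte blocks, are assumptions). Overhead here means directory bits per memory block relative to the block size in bits.

```python
# Directory storage overhead sketch. The formulas are the usual rough
# estimates; processor count, pointer count, and block size are assumed.

def full_map_bits(num_procs, state_bits=2):
    # One presence bit per processor, plus a few state bits.
    return num_procs + state_bits

def limited_ptr_bits(num_ptrs, num_procs, state_bits=2):
    # A few pointers, each wide enough to name any processor.
    ptr_width = max(1, (num_procs - 1).bit_length())
    return num_ptrs * ptr_width + state_bits

block_bits = 64 * 8                              # 64-byte memory block
full = full_map_bits(256) / block_bits           # roughly 50% overhead at 256 procs
sparse = limited_ptr_bits(4, 256) / block_bits   # only a few percent
```

The calculation shows why full-map directories stop scaling: their per-block cost grows linearly with the processor count, while a limited-pointer entry grows only logarithmically.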
Protocol Optimizations
- Optimizations such as silent eviction, write-back caching, and coherence message aggregation can be incorporated to reduce unnecessary coherence traffic and improve performance
  - Silent eviction allows a cache to evict a clean block without notifying the directory, eliminating replacement-notification messages at the cost of the directory's sharing information becoming conservative
  - Write-back caching defers the propagation of modified data to memory until necessary, reducing write traffic
  - Coherence message aggregation combines multiple coherence messages into a single message, reducing communication overhead
- Implementing a directory-based cache coherence protocol requires careful consideration of race conditions, deadlock avoidance, and protocol correctness to ensure the coherence invariants are maintained under all scenarios
- Techniques such as directory entry prefetching, speculative coherence actions, and adaptive coherence policies can be explored to optimize performance based on runtime behavior and access patterns
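The traffic savings from two of the optimizations above can be illustrated with a toy message-count model (the formulas are assumptions, not from any specific protocol): silent eviction skips the replacement hint for clean blocks, and aggregation collapses N per-sharer invalidations into one multicast.

```python
# Toy message-count model for silent eviction and invalidation aggregation
# (formulas assumed for illustration).

def eviction_msgs(is_dirty, silent):
    if is_dirty:
        return 1                     # a dirty block must always be written back
    return 0 if silent else 1        # clean block: hint only if eviction is not silent

def invalidation_msgs(num_sharers, aggregated):
    # Aggregation sends one multicast instead of per-sharer unicasts.
    return 1 if aggregated and num_sharers > 0 else num_sharers

clean_silent = eviction_msgs(is_dirty=False, silent=True)   # no message at all
fanout = invalidation_msgs(8, aggregated=False)             # 8 unicast invalidations
multicast = invalidation_msgs(8, aggregated=True)           # 1 multicast message
```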
Directory Organization Trade-offs
Storage Overhead
- Directory organizations present trade-offs between storage overhead, lookup latency, and protocol complexity
  - Full-map directories provide fast lookup and precise tracking but incur high storage overhead, especially for systems with a large number of processors
  - Sparse directories, such as coarse-grained or limited-pointer directories, reduce storage overhead but may introduce indirection or imprecise tracking, potentially leading to increased coherence traffic
- Directory entry size and memory block granularity should be carefully chosen to balance storage overhead and false sharing
  - Larger memory block sizes reduce the number of directory entries but may increase false sharing, while smaller block sizes provide finer-grained coherence control but require more directory storage
- Techniques such as directory entry compression, dynamic directory allocation, and selective directory tracking can be applied to optimize directory storage utilization and reduce the memory footprint
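The imprecision cost of coarse-grained tracking can be shown concretely. The sketch below is illustrative (the 16-processor system and group size of 4 are assumptions): each presence bit covers a group of processors, so an invalidation must go to every processor in each marked group, including non-sharers.

```python
# Sketch of a coarse presence vector: each bit covers group_size processors,
# trading tracking precision for storage (parameters assumed).

def coarse_invalidation_targets(sharers, num_procs, group_size):
    """Processors that must receive an invalidation under coarse tracking."""
    marked_groups = {p // group_size for p in sharers}
    return sorted(p for p in range(num_procs)
                  if p // group_size in marked_groups)

# Only 2 real sharers, but coarse tracking (groups of 4) forces the
# directory to invalidate all 8 caches in the two marked groups.
targets = coarse_invalidation_targets({1, 9}, num_procs=16, group_size=4)
```

Storage drops from one bit per processor to one bit per group, and the extra invalidations are the "increased coherence traffic" cost of imprecise tracking noted above.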
Access Optimizations
- Directory caching can be employed to store frequently accessed directory entries in a fast cache, reducing the latency of directory lookups and minimizing accesses to the main directory storage
- Hierarchical directory organizations, such as two-level directories or distributed directories, can be used to scale the directory structure for larger systems and reduce the storage and access overhead at each level
- Access patterns and workload characteristics should be analyzed to identify opportunities for directory access optimizations
  - Directory entry prefetching can be used to speculatively fetch directory entries based on predicted access patterns, hiding directory lookup latency
  - Adaptive coherence policies can dynamically adjust the coherence actions based on runtime behavior, such as selectively updating or invalidating shared copies based on the observed sharing patterns
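A directory cache of the kind described above can be sketched as a small LRU cache in front of the full directory storage. This is an illustrative model (the class name, capacity, and the stand-in backing-store function are assumptions), not a hardware design.

```python
# Minimal directory-cache sketch with LRU replacement: hot directory entries
# are served from a small fast structure instead of the full in-memory
# directory (all names and parameters are illustrative).

from collections import OrderedDict

class DirectoryCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()             # addr -> directory entry
        self.hits = self.misses = 0

    def lookup(self, addr, fetch):
        """Return the entry for addr, fetching from backing storage on a miss."""
        if addr in self.entries:
            self.entries.move_to_end(addr)       # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False) # evict the LRU entry
            self.entries[addr] = fetch(addr)
        return self.entries[addr]

# Stand-in for the slow backing directory (hypothetical):
backing = lambda addr: {"state": "I", "sharers": set()}

dc = DirectoryCache(capacity=2)
for addr in ["A", "B", "A", "C", "A"]:
    dc.lookup(addr, backing)
```

Exactly as with data caches, locality in the reference stream ("A" recurring here) turns most directory lookups into fast hits, so only cold or capacity misses pay the full directory-storage latency.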