Fiveable

๐ŸงฉIntro to Algorithms Unit 6 Review

QR code for Intro to Algorithms practice questions

6.3 Open addressing and chaining

๐ŸงฉIntro to Algorithms
Unit 6 Review

6.3 Open addressing and chaining

Written by the Fiveable Content Team โ€ข Last updated September 2025
Written by the Fiveable Content Team โ€ข Last updated September 2025
๐ŸงฉIntro to Algorithms
Unit & Topic Study Guides

Hash tables are a powerful data structure for efficient key-value storage and retrieval. Open addressing and chaining are two main collision resolution techniques, each with unique advantages. Understanding their implementation and performance characteristics is crucial for optimizing hash table design.

This section explores open addressing techniques like linear probing and double hashing, as well as chaining with linked lists. We'll compare their space and time complexities, discussing factors that influence performance and guide the choice between these approaches in different scenarios.

Open Addressing vs Chaining

Collision Resolution Techniques

  • Open addressing resolves collisions by probing for the next available slot within the hash table array itself
  • Chaining handles collisions by storing multiple key-value pairs that hash to the same index in a separate data structure (linked list or array)
  • Open addressing requires less memory overhead led to more efficient space utilization
  • Chaining allows for a higher load factor before performance degradation occurs enabled better handling of high collision rates
  • Choice between open addressing and chaining depends on factors such as expected load factor, memory constraints, and nature of data being stored

Performance Characteristics

  • Open addressing more susceptible to clustering resulted in decreased performance with increasing load
  • Chaining maintains better performance at higher load factors compared to open addressing
  • Open addressing typically keeps load factor below 0.7 to maintain good performance and avoid excessive probing
  • Chaining allows for load factor greater than 1 enabled storage of multiple key-value pairs at each slot
  • Open addressing performs better with smaller, fixed-size data sets (integers or pointers)
  • Chaining handles variable-sized elements and dynamic data sets more efficiently

Open Addressing Techniques

Linear Probing

  • Resolves collisions by sequentially searching for the next available slot
  • Probe sequence determined by formula h(k,i)=(hโ€ฒ(k)+i)modโ€‰โ€‰mh(k, i) = (h'(k) + i) \mod m
    • hโ€ฒ(k)h'(k) initial hash function
    • ii probe number
    • mm table size
  • Simple to implement and has good cache performance
  • Suffers from primary clustering led to longer probe sequences
  • Example: For key 42 and initial hash hโ€ฒ(42)=7h'(42) = 7, probe sequence: 7, 8, 9, 10, ...

Double Hashing

  • Uses two hash functions to compute the probe sequence reduced clustering compared to linear probing
  • Probe sequence given by h(k,i)=(h1(k)+ih2(k))modโ€‰โ€‰mh(k, i) = (h1(k) + i h2(k)) \mod m
    • h1(k)h1(k) and h2(k)h2(k) two different hash functions
  • Provides more uniform probing distribution
  • Requires careful selection of hash functions to ensure full table coverage
  • Example: For key 42, h1(42)=7h1(42) = 7, h2(42)=3h2(42) = 3, probe sequence: 7, 10, 13, 16, ...

Additional Techniques and Considerations

  • Quadratic probing uses a quadratic function to determine probe sequence offered compromise between linear probing and double hashing
  • Deletion in open addressing requires special handling often implemented using "tombstone" marker
  • Load factor in open addressing should typically be kept below 0.7 maintained good performance
  • Example of quadratic probing sequence: h(k,i)=(hโ€ฒ(k)+c1โˆ—i+c2โˆ—i2)modโ€‰โ€‰mh(k, i) = (h'(k) + c1*i + c2*i^2) \mod m, where c1c1 and c2c2 are constants

Chaining with Linked Lists

Implementation Details

  • Each slot in hash table array points to separate data structure (commonly linked list or dynamic array)
  • Linked list implementation involves nodes containing key-value pair and pointer to next node
  • Array-based chaining uses dynamic array at each slot resized as needed to accommodate more elements
  • Insertion hashes key, accesses corresponding slot, adds new key-value pair to linked list or array
  • Deletion straightforward removes target key-value pair from linked list or array at hashed index
  • Searching hashes key and searches through linked list or array at corresponding slot

Performance Considerations

  • Chaining allows for load factor greater than 1 enabled storage of multiple key-value pairs per slot
  • Performance degrades as number of elements per chain increases led to longer search times
  • Balanced distribution of elements across chains crucial for maintaining good performance
  • Rehashing entire table when load factor exceeds threshold improved overall performance
  • Example: Hash table with 10 slots, 25 elements stored load factor of 2.5

Complexity Analysis of Hash Table Methods

Space Complexity

  • Open addressing space complexity O(n) where n number of slots in hash table array
  • Chaining space complexity O(n + m) where n number of slots and m total number of key-value pairs stored
  • Open addressing more space-efficient for smaller load factors
  • Chaining requires additional memory for linked lists or arrays at each slot
  • Example: Open addressing table with 1000 slots uses exactly 1000 memory units, while chaining might use more depending on number of stored elements

Time Complexity

  • Average-case time complexity for successful searches in open addressing O(1/(1-ฮฑ)) where ฮฑ load factor
  • Chaining average-case time complexity for operations O(1 + ฮฑ) where ฮฑ load factor (average number of elements per slot)
  • Worst-case time complexity for open addressing operations O(n) when table nearly full or poorly distributed
  • Chaining worst-case time complexity O(n) if all elements hash to same slot forming single long chain
  • Performance of both techniques degrades as load factor increases
  • Example: Open addressing with load factor 0.5 expected to perform search in ~2 probes, while chaining with same load factor requires ~1.5 comparisons on average