What Is a Hash Table? From Array to O(1) Lookup


What is a hash table? An array indexed by a computed value. Hash functions, collisions, load factor, and O(1) lookup explained from first principles.

10 minutes
Beginner
What you will learn

How a hash function converts keys into array indices

Why hash tables achieve `O(1)` average lookup time

How separate chaining and open addressing resolve collisions

When load factor triggers rehashing and why it matters

You've probably used hash tables in every project you've built. But what is a hash table, really? Not the API, not the syntax, but the actual mechanics underneath. If someone asked you to build one from an empty array, tracing every insertion and collision by hand, could you?

If you can't, it's not because the concept is hard. Every language you've ever used abstracts the internals away. That abstraction helps you ship code, but it won't help you answer interview questions about why hash tables work the way they do.

TL;DR
A hash table is an array where each key is converted to an index by a hash function. That computed index gives you O(1) average lookups. Collisions are the tradeoff, and there are two clean strategies to handle them.

What a hash table actually is

A hash table is an array indexed by a computed value. Instead of scanning elements to find a match, a hash function converts any key into an array position directly. That computed index is what gives hash tables O(1) average lookup.

Everything else follows from one design decision: use a function to convert keys into array indices. The collision handling, the load factor management, the rehashing, all of it traces back to that single idea.

In a regular array, checking whether a value exists means scanning every element, O(n). In a hash table, you compute the index from the key, jump to that position, and check. One calculation, one lookup. That difference between linear search and direct access is exactly what makes hash tables the default choice for frequency counting, duplicate detection, and any problem where lookup speed is the bottleneck.

The gap between "I can use a hash map" and "I can explain how one works" shows up in interviews constantly. Interviewers don't care whether you know the API. They want to see that you understand the mechanics well enough to reason about performance.

ℹ️ Info
Every language names hash tables differently. Python uses dictionaries, Java has HashMap, C++ provides unordered_map. The syntax changes, but the underlying array with computed index design is identical across all of them.

How a hash table converts keys to positions

The hash function is the engine of the entire data structure. It takes any key and produces an integer. That integer, reduced to fit the array size, becomes the index where the value gets stored.

Consider an array of size 7 storing string keys. A basic hash function sums the character codes of each character, then takes the result modulo the array size.

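A minimal version of that hash function might look like this (the name `simple_hash` is illustrative, not a standard library function):

```python
def simple_hash(key: str, size: int) -> int:
    """Sum the character codes of the key, then reduce modulo the array size."""
    return sum(ord(ch) for ch in key) % size

# With an array of size 7:
print(simple_hash("cat", 7))  # 312 % 7 = 4
print(simple_hash("dog", 7))  # 314 % 7 = 6
```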

Trace the insertions. For the key "cat", that's 99 + 97 + 116 = 312. Then 312 % 7 = 4, so "cat" goes at index 4. For "dog", it's 100 + 111 + 103 = 314. Then 314 % 7 = 6, so "dog" goes at index 6. Two different keys, two different positions. The hash function deterministically maps each key to a slot, with no ambiguity and no linear search involved.

That's the mechanism behind O(1). You skip the search entirely.

O(1) lookup isn't magic. It's a function call that tells you exactly where to look.

A good hash function is deterministic: the same key always produces the same index. It's also uniform, spreading keys evenly across the array so they don't cluster at a few positions. And it needs to be fast, because the computation itself shouldn't become the bottleneck.

Production implementations like Java's hashCode() or Python's hash() use bitwise operations, prime number multiplication, and careful distribution logic. They're far more sophisticated than summing character codes, but the principle doesn't change. You put a key in and you get an index out.

What happens when two keys collide in a hash table

If you have thousands of possible keys and an array of size 7, multiple keys will hash to the same index. When two keys produce the same position, that's a collision.

Try it yourself. "cat" hashes to index 4, as you traced above. Now compute "act" with the same function: 97 + 99 + 116 = 312. Same sum, same index, but the keys are completely different. This isn't a contrived edge case either. Any hash function working against a finite array will produce collisions once enough keys are inserted.

There are two main approaches to handling collisions.

Separate chaining

Each slot in the array holds a linked list rather than a single value. When two keys hash to the same index, both entries get added to that slot's list. To find a key later, you hash it, jump to the index, and walk the list until you find the match.

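A sketch of separate chaining, using plain Python lists as the per-slot chains (class and method names are illustrative):

```python
class ChainedHashTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [[] for _ in range(size)]  # each slot holds a list of (key, value) pairs

    def _index(self, key):
        return sum(ord(ch) for ch in key) % self.size

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite the value
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def get(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return None

table = ChainedHashTable()
table.put("cat", 1)
table.put("act", 2)  # collides with "cat" at index 4; both entries share that chain
```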

The upside is simplicity. The downside: if many keys cluster at the same index, the chain grows, and your O(1) lookup degrades toward O(n) for that slot.

Open addressing

Instead of chaining, open addressing stores everything directly in the array. When a collision occurs, you probe for the next available slot using a defined sequence.

Linear probing is the simplest variant. If index 4 is taken, try 5, then 6, wrapping around to index 0 when you run off the end, moving forward until you find an empty slot. Quadratic probing spaces out the jumps by trying offsets of 1, 4, 9, and so on, which reduces clustering. Double hashing uses a second hash function to determine the step size, so different keys that collide at the same index probe different sequences entirely.
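A minimal linear-probing insert might look like this (a sketch: it omits resizing and raises if the table fills completely):

```python
def simple_hash(key, size):
    return sum(ord(ch) for ch in key) % size

def probe_insert(slots, key, value, hash_fn):
    """Insert with linear probing: step forward (wrapping around) until a free slot."""
    size = len(slots)
    index = hash_fn(key, size)
    for step in range(size):
        probe = (index + step) % size
        if slots[probe] is None or slots[probe][0] == key:
            slots[probe] = (key, value)
            return probe
    raise RuntimeError("table is full")

slots = [None] * 7
probe_insert(slots, "cat", 1, simple_hash)  # lands at index 4
probe_insert(slots, "act", 2, simple_hash)  # index 4 is taken, probes forward to 5
```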

💡 Tip
Understanding both strategies matters for interviews. Codeintuition's Hash Table course builds all four collision resolution methods from scratch, each with a full design problem where you implement the complete hash table and trace the mechanics at every step.

Most articles skip this, but these aren't just textbook categories. Java's HashMap uses separate chaining internally, switching to a red-black tree when a chain exceeds 8 nodes. Python's dict uses open addressing with a custom probing sequence. Knowing which strategy your language uses changes how you reason about performance in edge cases, and interviewers in system design rounds pick up on that kind of depth.

How deletion works in a hash table

Insertion and lookup get most of the attention, but deletion has a subtle trap that catches people off guard in interviews.

With separate chaining, deletion is straightforward. You hash the key, find the correct chain, and remove the node from the linked list. The rest of the chain stays intact, and future lookups aren't affected.

Open addressing is where it gets tricky. You can't just empty a slot when you delete a key. Consider what happens if you insert three keys that all hash to index 4. With linear probing, they land at indices 4, 5, and 6. Now delete the key at index 5. If you simply mark that slot as empty, a future lookup for the key at index 6 will stop probing at the empty slot at 5 and incorrectly report that the key doesn't exist.

The fix is a tombstone marker. Instead of clearing the slot entirely, you mark it as "deleted." During lookups, the probe sequence treats tombstones as occupied and keeps going. During insertions, tombstones are treated as available slots that can be overwritten.

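One way to sketch tombstone deletion is with a sentinel object (names are illustrative). The lookup treats a tombstone as occupied and keeps probing; only a truly empty slot stops the search:

```python
TOMBSTONE = object()  # sentinel marking a deleted slot

def simple_hash(key, size):
    return sum(ord(ch) for ch in key) % size

def probe_lookup(slots, key, hash_fn):
    size = len(slots)
    index = hash_fn(key, size)
    for step in range(size):
        probe = (index + step) % size
        entry = slots[probe]
        if entry is None:            # truly empty: the key cannot be further along
            return None
        if entry is not TOMBSTONE and entry[0] == key:
            return entry[1]
        # tombstone or a different key: keep probing
    return None

def probe_delete(slots, key, hash_fn):
    size = len(slots)
    index = hash_fn(key, size)
    for step in range(size):
        probe = (index + step) % size
        entry = slots[probe]
        if entry is None:
            return False
        if entry is not TOMBSTONE and entry[0] == key:
            slots[probe] = TOMBSTONE  # mark deleted; don't empty the slot
            return True
    return False

# "cat" hashes to index 4; "act" collides and probes to 5:
slots = [None] * 7
slots[4] = ("cat", 1)
slots[5] = ("act", 2)
probe_delete(slots, "cat", simple_hash)  # slot 4 becomes a tombstone
# the lookup for "act" probes past the tombstone instead of stopping early:
print(probe_lookup(slots, "act", simple_hash))  # 2
```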

Tombstones solve the correctness problem, but they create a performance one. A table full of tombstones still forces long probe sequences even though those slots don't hold real data. That's why heavy deletion workloads sometimes trigger a full rehash to clean out accumulated tombstones and restore probe efficiency.

This distinction between chaining and open addressing deletion is exactly the kind of implementation detail that separates textbook knowledge from real understanding. When an interviewer asks "what's the problem with deleting from an open addressing hash table?", the word they're looking for is tombstone.

Why O(1) isn't guaranteed

Hash table operations are O(1) on average, not always. The worst case is O(n), and it happens when too many keys cluster at the same index.

The variable that controls this is the load factor: the ratio of stored elements to array size. If you have 7 slots and 6 elements, your load factor is 0.86. At that density, collisions become nearly inevitable. Every lookup involves walking a chain or probing through occupied slots.

When the load factor crosses a threshold (typically 0.7 to 0.75), the hash table creates a new, larger array, usually double the size. Every existing key gets rehashed against the new size and reinserted. This is rehashing.

Rehashing itself is O(n) for a single resize. But it happens rarely enough that the amortized time per operation stays O(1). You accept occasional expensive resizes in exchange for consistently fast average operations. That's the fundamental tradeoff baked into every hash table.
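The resize logic can be sketched on top of the chaining design, with the 0.75 threshold and doubling policy described above (class and method names are illustrative):

```python
class ResizingTable:
    LOAD_FACTOR_LIMIT = 0.75

    def __init__(self, size=7):
        self.size = size
        self.count = 0
        self.slots = [[] for _ in range(size)]

    def _index(self, key, size=None):
        return sum(ord(ch) for ch in key) % (size or self.size)

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.count += 1
        if self.count / self.size > self.LOAD_FACTOR_LIMIT:
            self._rehash()

    def get(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return None

    def _rehash(self):
        """O(n) resize: double the array and rehash every key against the new size."""
        new_size = self.size * 2
        new_slots = [[] for _ in range(new_size)]
        for chain in self.slots:
            for key, value in chain:
                new_slots[self._index(key, new_size)].append((key, value))
        self.size, self.slots = new_size, new_slots

table = ResizingTable(size=4)
for key in ["cat", "dog", "act", "tab"]:
    table.put(key, True)
# the 4th insert pushes the load factor past 0.75, so the array doubles to 8
```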

For a deeper look at how to analyze these complexity tradeoffs in interviews, the Big O explanation covers amortized analysis and why average case guarantees matter when reasoning under time pressure.

Real problems where hash tables are the key insight

Understanding the mechanics is one thing. Recognizing when a problem is a hash table problem is another skill entirely. Here are four classic patterns where hash tables turn a brute force solution into something efficient.

Two sum and its variants

The classic. Given an array of numbers and a target sum, find two elements that add up to it. The brute force approach checks every pair in O(n^2). With a hash table, you store each number as you iterate. For every new number, you check whether target - current already exists in the table. One pass, O(n) time.

This same pattern extends to three sum, four sum, and subarray sum problems. The hash table turns a "search for a complement" operation from linear into constant time.
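A one-pass sketch of the hash-table approach to two sum:

```python
def two_sum(nums, target):
    """Return indices of two numbers summing to target, or None. O(n) time, O(n) space."""
    seen = {}                        # value -> index where we saw it
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:       # O(1) membership check replaces a second loop
            return seen[complement], i
        seen[num] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```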

Frequency counting

Count the occurrences of each element in an array, then use those counts to answer a question. Problems like "find the first non-repeating character," "top K frequent elements," or "check if two strings are anagrams" all reduce to building a frequency map and querying it. Without a hash table, you'd need to sort first or use nested loops. With one, the counting pass is O(n) and each query is O(1).
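"First non-repeating character," for instance, reduces to one counting pass plus one query pass. A sketch with a plain dict (`collections.Counter` would do the counting pass for you):

```python
def first_non_repeating(s):
    counts = {}
    for ch in s:                    # O(n) counting pass
        counts[ch] = counts.get(ch, 0) + 1
    for i, ch in enumerate(s):      # O(n) query pass, each lookup O(1)
        if counts[ch] == 1:
            return i
    return -1

print(first_non_repeating("leetcode"))  # 0 ('l' appears once)
print(first_non_repeating("aabb"))      # -1
```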

Grouping by computed key

Anagram grouping is the classic example. You need to group strings so that all anagrams end up together. The trick is computing a canonical form for each string (sorted characters or a character frequency tuple) and using that as the hash table key. Every string that shares the same canonical form maps to the same bucket. The grouping happens in a single pass through the input.
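That canonical-form trick, sketched with sorted characters as the key:

```python
def group_anagrams(words):
    groups = {}
    for word in words:
        key = "".join(sorted(word))      # canonical form: all anagrams share it
        groups.setdefault(key, []).append(word)
    return list(groups.values())

print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```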

Caching previous results

Memoization is really just a hash table that stores function results. If you've solved a subproblem before, look up the answer instead of recomputing it. This is the bridge between hash tables and dynamic programming. The dp dictionary in a top down recursive solution is a hash table, and its O(1) lookup is what makes memoization efficient.
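Fibonacci is the standard illustration: the memo dict is a hash table mapping each subproblem to its answer.

```python
def fib(n, memo=None):
    """Top-down Fibonacci; the memo turns O(2^n) recursion into O(n)."""
    if memo is None:
        memo = {}
    if n in memo:                # O(1) lookup replaces recomputing the subtree
        return memo[n]
    if n < 2:
        return n
    memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]

print(fib(40))  # 102334155, instantly; the naive version takes billions of calls
```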

In every one of these patterns, the core insight is the same: you're trading memory for speed by storing values in a structure that gives you constant time access. Once you recognize that tradeoff in a problem statement, you know a hash table belongs in your solution.

When to use a hash table (and when not to)

Hash tables solve a specific class of problem: any situation where you need fast lookup, insertion, or deletion by key. If you can express your data as key value pairs and your bottleneck is search time, a hash table is almost always the right call.

Use a hash table when you need to:

  • Count frequencies: Character counts, word counts, or element occurrences
  • Detect duplicates: Check membership in O(1)
  • Build a lookup index: Where the key is known, like caching, memoization, or symbol tables
  • Group elements: By a computed property, such as anagram grouping or pattern matching

Avoid hash tables when:

  • Sorted order needed: Hash tables don't maintain ordering. Use a BST or sorted array instead.
  • Range queries needed: "Find all keys between A and B." Hash tables can't do this efficiently.
  • Memory constrained: The load factor overhead and collision handling mean hash tables consume more memory than plain arrays for the same number of elements.

Hash tables are the foundation for five distinct patterns on Codeintuition, including counting, pattern generation, fixed sliding window, variable sliding window, and prefix sum. Each pattern builds on the O(1) lookup guarantee you now understand from the inside out. The learning path sequences the Hash Table course after Arrays and Linked Lists, so the array mechanics you need are already internalized before you start. Premium is $79.99/year, though the prerequisite Array and Linked List courses are permanently free.

For the broader picture of how hash tables fit into a complete DSA preparation approach, including which patterns to prioritize and how deep to go, see the guide on how to master DSA from first principles. The understanding lesson for the Hash Table course walks through the motivation behind the data structure before introducing a single line of code.

Six months ago, you'd have called dict[key] and moved on. Now, when an interviewer asks "what happens if two keys collide?", you don't guess. You trace the hash function, identify the resolution strategy, and explain the load factor threshold that triggers a resize. That's the difference between using a data structure and understanding it well enough to reason about its behavior under pressure.

Build hash tables from scratch, then use them to solve real problems

The Hash Table course builds all four collision strategies with full design problems where you implement the complete data structure. Start with the free prerequisite courses.

Frequently asked questions

Is a hash table the same as a hash map?
In most contexts, they refer to the same underlying mechanism. Java distinguishes between Hashtable (synchronized, thread safe, legacy) and HashMap (unsynchronized, faster, preferred for most use cases), but general computer science usage treats the two terms as interchangeable.

Why do hash tables often use a prime number for the array size?
Prime sized arrays distribute keys more evenly because prime numbers share fewer common factors with typical hash values. When the array size shares a factor with the hash output, keys cluster at predictable positions. A prime size breaks those clustering patterns, reducing average chain length for separate chaining and probe count for open addressing.

Can a hash table store duplicate keys?
Standard implementations don't allow duplicate keys, and inserting a key that already exists overwrites the previous value. If you need multiple values per key, store a list or set as the value.

What is the time complexity of hash table operations?
Average case is O(1) for insert, lookup, and delete. Worst case is O(n), which occurs when all keys hash to the same index and the lookup degrades to a linked list scan. Maintaining a load factor below 0.75 through rehashing keeps the average case dominant in practice. The rehashing operation itself takes O(n) time, but it happens infrequently enough that the amortized time per operation remains constant.

When should I use a hash table instead of an array?
Use a hash table when you need to look up values by a noninteger key, count frequencies, detect duplicates, or check membership in O(1). Arrays are better when you need ordered access, contiguous memory, or indices that are naturally sequential integers.