What Is a Hash Table? From Array to O(1) Lookup


What is a hash table? An array indexed by a computed value. Hash functions, collisions, load factor, and O(1) lookup explained from first principles.

10 minutes
Beginner
What you will learn

How a hash function converts keys into array indices

Why hash tables achieve O(1) average lookup time

How separate chaining and open addressing resolve collisions

When load factor triggers rehashing and why it matters

You've probably used hash tables in every project you've built. But what is a hash table, really? Not the API, not the syntax, but the actual mechanics underneath. If someone asked you to build one from an empty array, tracing every insertion and collision by hand, could you?

If you can't, it's not because the concept is hard. Every language you've ever used abstracts the internals away. That abstraction helps you ship code, but it won't help you answer interview questions about why hash tables work the way they do.

TL;DR
A hash table is an array where each key is converted to an index by a hash function. That computed index gives you O(1) average lookups. Collisions are the tradeoff, and there are two clean strategies to handle them.

What a hash table actually is

A hash table is an array indexed by a computed value. Instead of scanning elements to find a match, a hash function converts any key into an array position directly. That computed index is what gives hash tables O(1) average lookup.

Everything else follows from one design decision: use a function to convert keys into array indices. The collision handling, the load factor management, the rehashing, all of it traces back to that single idea.

In a regular array, checking whether a value exists means scanning every element, O(n). In a hash table, you compute the index from the key, jump to that position, and check. One calculation, one lookup. That difference between linear search and direct access is exactly what makes hash tables the default choice for frequency counting, duplicate detection, and any problem where lookup speed is the bottleneck.
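The difference is visible in plain Python: membership in a list scans every element, while membership in a set (Python's hash-table-backed container) computes the index and jumps straight to it.

```python
words = ["apple", "banana", "cherry"]
word_set = set(words)          # set membership is backed by a hash table

print("cherry" in words)       # True, but found by scanning the list: O(n)
print("cherry" in word_set)    # True, found by hashing the key: O(1) average
print("grape" in word_set)     # False, also answered in O(1) average
```

Both checks return the same answer; the difference is how many elements had to be touched to produce it.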

The gap between "I can use a hash map" and "I can explain how one works" shows up in interviews constantly. Interviewers don't care whether you know the API. They want to see that you understand the mechanics well enough to reason about performance.

ℹ️ Info
Every language names hash tables differently. Python uses dictionaries, Java has HashMap, C++ provides unordered_map. The syntax changes, but the underlying array with computed index design is identical across all of them.

How a hash table converts keys to positions

The hash function is the engine of the entire data structure. It takes any key and produces an integer. That integer, reduced to fit the array size, becomes the index where the value gets stored.

Consider an array of size 7 storing string keys. A basic hash function sums the key's character codes, then takes the result modulo the array size.

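A minimal version of that hash function might look like this (the function name and table layout are illustrative, not from any standard library):

```python
ARRAY_SIZE = 7

def simple_hash(key: str) -> int:
    """Sum the character codes, then reduce modulo the array size."""
    return sum(ord(ch) for ch in key) % ARRAY_SIZE

table = [None] * ARRAY_SIZE

def insert(key: str, value) -> None:
    table[simple_hash(key)] = (key, value)  # no collision handling yet

insert("cat", "meow")      # 99 + 97 + 116 = 312, and 312 % 7 = 4
insert("dog", "woof")      # 100 + 111 + 103 = 314, and 314 % 7 = 6
print(simple_hash("cat"))  # 4
print(simple_hash("dog"))  # 6
```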

Trace the insertions. For the key "cat", that's 99 + 97 + 116 = 312. Then 312 % 7 = 4, so "cat" goes at index 4. For "dog", it's 100 + 111 + 103 = 314. Then 314 % 7 = 6, so "dog" goes at index 6. Two different keys, two different positions. The hash function deterministically maps each key to a slot, with no ambiguity and no linear search involved.

That's the mechanism behind O(1). You skip the search entirely.

“O(1) lookup isn't magic. It's a function call that tells you exactly where to look.”

A good hash function is deterministic: the same key always produces the same index. It's also uniform, spreading keys evenly across the array so they don't cluster at a few positions. And it needs to be fast, because the computation itself shouldn't become the bottleneck.

Production implementations like Java's hashCode() or Python's hash() use bitwise operations, prime number multiplication, and careful distribution logic. They're far more sophisticated than summing character codes, but the principle doesn't change. You put a key in and you get an index out.
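You can see the deterministic part directly in Python. Within a single process, `hash()` always returns the same value for the same key (for strings the value changes between runs unless `PYTHONHASHSEED` is fixed, a deliberate security feature), and reducing it modulo the array size yields a valid index:

```python
key = "cat"
print(hash(key) == hash(key))  # True: same key, same hash, every call
print(0 <= hash(key) % 7 < 7)  # True: the reduced value is a valid index
```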

What happens when two keys collide in a hash table

If you have thousands of possible keys and an array of size 7, multiple keys will hash to the same index. When two keys produce the same position, that's a collision.

Try it yourself. "cat" hashes to index 4, as you traced above. Now compute "act" with the same function: 97 + 99 + 116 = 312. Same sum, same index, but the keys are completely different. This isn't a contrived edge case either. Any hash function working against a finite array will produce collisions once enough keys are inserted.
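The collision is easy to reproduce with the same character-code hash, redefined here so the snippet stands alone:

```python
def simple_hash(key: str, size: int = 7) -> int:
    return sum(ord(ch) for ch in key) % size

print(simple_hash("cat"))  # 4
print(simple_hash("act"))  # 4: same characters, same sum, same slot
print("cat" == "act")      # False: distinct keys, colliding index
```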

There are two main approaches to handling collisions.

Separate chaining

Each slot in the array holds a linked list rather than a single value. When two keys hash to the same index, both entries get added to that slot's list. To find a key later, you hash it, jump to the index, and walk the list until you find the match.

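A sketch of separate chaining using plain Python lists as the chains (class and method names are illustrative):

```python
class ChainedHashTable:
    def __init__(self, size: int = 7):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one chain per slot

    def _index(self, key: str) -> int:
        return sum(ord(ch) for ch in key) % self.size

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: append to the chain

    def get(self, key: str):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.put("cat", 1)
table.put("act", 2)      # collides with "cat" at index 4, chains instead
print(table.get("cat"))  # 1
print(table.get("act"))  # 2
```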

The upside is simplicity. The downside: if many keys cluster at the same index, the chain grows, and your O(1) lookup degrades toward O(n) for that slot.

Open addressing

Instead of chaining, open addressing stores everything directly in the array. When a collision occurs, you probe for the next available slot using a defined sequence.

Linear probing is the simplest variant. If index 4 is taken, try 5, then 6, then wrap around to 0, moving forward until you find an empty slot. Quadratic probing spaces out the jumps by trying offsets of 1, 4, 9, and so on, which reduces clustering. Double hashing uses a second hash function to determine the step size, so different keys that collide at the same index probe different sequences entirely.
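Linear probing can be sketched like this (illustrative names; a production version would also handle deletions with tombstone markers):

```python
class LinearProbingTable:
    def __init__(self, size: int = 7):
        self.size = size
        self.slots = [None] * size  # each slot: a (key, value) pair or None

    def _index(self, key: str) -> int:
        return sum(ord(ch) for ch in key) % self.size

    def put(self, key: str, value) -> None:
        i = self._index(key)
        for _ in range(self.size):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % self.size  # probe the next slot, wrapping around
        raise RuntimeError("table full")

    def get(self, key: str):
        i = self._index(key)
        for _ in range(self.size):
            if self.slots[i] is None:  # hit an empty slot: key was never stored
                raise KeyError(key)
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.size
        raise KeyError(key)

table = LinearProbingTable()
table.put("cat", 1)      # lands at index 4
table.put("act", 2)      # index 4 is taken, probes forward to index 5
print(table.get("act"))  # 2
```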

💡 Tip
Understanding both strategies matters for interviews. Codeintuition's Hash Table course builds all four collision resolution methods from scratch, each with a full design problem where you implement the complete hash table and trace the mechanics at every step.

Most articles skip this, but these aren't just textbook categories. Java's HashMap uses separate chaining internally, switching to a red-black tree when a chain exceeds 8 nodes. Python's dict uses open addressing with a custom probing sequence. Knowing which strategy your language uses changes how you reason about performance in edge cases, and interviewers in system design rounds pick up on that kind of depth.

Why O(1) isn't guaranteed

Hash table operations are O(1) on average, not always. The worst case is O(n), and it happens when too many keys cluster at the same index.

The variable that controls this is the load factor: the ratio of stored elements to array size. If you have 7 slots and 6 elements, your load factor is 0.86. At that density, collisions become nearly inevitable. Every lookup involves walking a chain or probing through occupied slots.

When the load factor crosses a threshold (typically 0.7 to 0.75), the hash table creates a new, larger array, usually double the size. Every existing key gets rehashed against the new size and reinserted. This is rehashing.

Rehashing itself is O(n) for a single resize. But it happens rarely enough that the amortized time per operation stays O(1). You accept occasional expensive resizes in exchange for consistently fast average operations. That's the fundamental tradeoff baked into every hash table.
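The resize logic itself is only a few lines. Here is a hedged sketch bolted onto a chained table, using the typical 0.75 threshold and doubling factor mentioned above (all names illustrative):

```python
class ResizingHashTable:
    LOAD_FACTOR_LIMIT = 0.75

    def __init__(self, size: int = 7):
        self.size = size
        self.count = 0
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: str, size: int) -> int:
        return sum(ord(ch) for ch in key) % size

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._index(key, self.size)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / self.size > self.LOAD_FACTOR_LIMIT:
            self._rehash()

    def _rehash(self) -> None:
        """O(n): every key is re-hashed against the new, doubled size."""
        new_size = self.size * 2
        new_buckets = [[] for _ in range(new_size)]
        for bucket in self.buckets:
            for key, value in bucket:
                new_buckets[self._index(key, new_size)].append((key, value))
        self.size, self.buckets = new_size, new_buckets

table = ResizingHashTable()
for word in ["cat", "dog", "act", "bird", "fish", "ant"]:
    table.put(word, True)
print(table.size)  # 14: the sixth insert pushed 6/7 past 0.75 and doubled 7
```

Note that the modulo in `_index` takes the size as a parameter: after a resize, the same key can land at a completely different index, which is exactly why every key must be reinserted rather than copied.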

For a deeper look at how to analyze these complexity tradeoffs in interviews, the Big O explanation covers amortized analysis and why average case guarantees matter when reasoning under time pressure.

When to use a hash table (and when not to)

Hash tables solve a specific class of problem: any situation where you need fast lookup, insertion, or deletion by key. If you can express your data as key-value pairs and your bottleneck is search time, a hash table is almost always the right call.

Use a hash table when you need to:

  • Count frequencies, such as character counts, word counts, or element occurrences
  • Detect duplicates or check membership in O(1)
  • Build a lookup index where the key is known, like caching, memoization, or symbol tables
  • Group elements by a computed property, such as anagram grouping or pattern matching
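The first use case, frequency counting, is a three-line idiom with Python's dict:

```python
def char_counts(s: str) -> dict:
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1  # O(1) average lookup and update
    return counts

print(char_counts("banana"))  # {'b': 1, 'a': 3, 'n': 2}
```

The same shape, keyed on words or elements instead of characters, covers word counts and element occurrences.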

Avoid hash tables when:

  • You need elements in sorted order. Hash tables don't maintain ordering. Use a BST or sorted array instead.
  • You need range queries like "find all keys between A and B." Hash tables can't do this efficiently.
  • Memory is tightly constrained. The load factor overhead and collision handling mean hash tables consume more memory than plain arrays for the same number of elements.

Hash tables are the foundation for five distinct patterns on Codeintuition, including counting, pattern generation, fixed sliding window, variable sliding window, and prefix sum. Each pattern builds on the O(1) lookup guarantee you now understand from the inside out. The learning path sequences the Hash Table course after Arrays and Linked Lists, so the array mechanics you need are already internalized before you start. Premium is $79.99/year, though the prerequisite Array and Linked List courses are permanently free.

For the broader picture of how hash tables fit into a complete DSA preparation approach, including which patterns to prioritize and how deep to go, see the guide on how to master DSA from first principles. The understanding lesson for the Hash Table course walks through the motivation behind the data structure before introducing a single line of code.

Six months ago, you'd have called dict[key] and moved on. Now, when an interviewer asks "what happens if two keys collide?", you don't guess. You trace the hash function, identify the resolution strategy, and explain the load factor threshold that triggers a resize. That's the difference between using a data structure and understanding it well enough to reason about its behavior under pressure.

Build hash tables from scratch, then use them to solve real problems

The Hash Table course builds all four collision strategies with full design problems where you implement the complete data structure. Start with the permanently free prerequisite courses.

Frequently asked questions

Is a hash table the same thing as a hash map?

In most contexts, they refer to the same underlying mechanism. Java distinguishes between Hashtable (synchronized, thread safe, legacy) and HashMap (unsynchronized, faster, preferred for most use cases), but general computer science usage treats the two terms as interchangeable.

Why do implementations often use prime-sized arrays?

Prime-sized arrays distribute keys more evenly because prime numbers share fewer common factors with typical hash values. When the array size shares a factor with the hash output, keys cluster at predictable positions. A prime size breaks those clustering patterns, reducing average chain length for separate chaining and probe count for open addressing.

Can a hash table store duplicate keys?

Standard implementations don't allow duplicate keys, and inserting a key that already exists overwrites the previous value. If you need multiple values per key, store a list or set as the value.

What is the time complexity of hash table operations?

Average case is O(1) for insert, lookup, and delete. Worst case is O(n), which occurs when all keys hash to the same index and the lookup degrades to a linked list scan. Maintaining a load factor below 0.75 through rehashing keeps the average case dominant in practice. The rehashing operation itself takes O(n) time, but it happens infrequently enough that the amortized time per operation remains constant.

When should I use a hash table instead of an array?

Use a hash table when you need to look up values by a noninteger key, count frequencies, detect duplicates, or check membership in O(1). Arrays are better when you need ordered access, contiguous memory, or indices that are naturally sequential integers.