Big O Notation Explained

Big O notation explained through derivation, not memorization. Learn to calculate time and space complexity from unfamiliar code.

10 minutes
Easy
Beginner

What you will learn

What Big O notation measures and why growth rate matters

The five complexity classes every engineer encounters in interviews

How to derive Big O from unfamiliar code by counting operations

Why amortized analysis explains hash map O(1) lookups

How space complexity works including hidden call stack costs

Common Big O mistakes and how to avoid them in interviews

What does O(n log n) actually mean for your code? If you've searched "big o notation explained," you've probably seen the same list of definitions repeated across a dozen articles. You can recite that binary search is O(log n) and nested loops are O(n²). But when someone hands you a function you've never seen and asks "what's the time complexity?", the recitation stops helping.

TL;DR
Big O notation describes how an algorithm's performance scales with input size. The useful skill is deriving complexity from unfamiliar code by counting how operations grow as input grows, not memorising complexities for known algorithms.

What It Actually Measures

Big O notation describes how an algorithm's running time or space usage grows as input size increases. It measures the upper bound on growth rate, not the exact operation count. That's what makes it the standard language for comparing algorithm efficiency across hardware and implementations. The word that matters is growth.

Big O doesn't tell you that binary search takes exactly 20 comparisons on an array of 1,000,000 elements. It tells you that binary search's work grows logarithmically with input size, so doubling the array adds roughly one more comparison. That growth relationship is what makes the notation useful for predicting how your algorithm will behave when the input scales from a test case to production data.

The "upper bound" part means Big O captures the worst case. An algorithm that's O(n²) might occasionally finish faster on certain inputs, but it won't do worse than quadratic growth. Worst-case behaviour is what interviewers care about, because best-case performance doesn't protect you from the input that breaks your solution.

ℹ️ Info
Big O drops constants and lower-order terms. O(3n + 5) simplifies to O(n) because as n grows large, the constant 3 and the additive 5 become irrelevant compared to n itself. This is why Big O is about growth rate, not exact speed.

Big O Notation Explained Through Five Growth Classes

Every algorithm you'll encounter in interviews falls into one of these growth classes. The table below shows what each complexity feels like at scale.

| Complexity | Name | 10 inputs | 1,000 inputs | 1,000,000 inputs | Example |
|------------|------|-----------|--------------|------------------|---------|
| O(1) | Constant | 1 op | 1 op | 1 op | Hash table lookup |
| O(log n) | Logarithmic | 3 ops | 10 ops | 20 ops | Binary search |
| O(n) | Linear | 10 ops | 1,000 ops | 1,000,000 ops | Single pass through array |
| O(n log n) | Linearithmic | 33 ops | 10,000 ops | 20,000,000 ops | Merge sort |
| O(n²) | Quadratic | 100 ops | 1,000,000 ops | 1,000,000,000,000 ops | Nested loops (brute force) |

The jump from O(n) to O(n²) is where most interview problems live. A brute-force solution that checks every pair in an array is O(n²). The optimised version using a hash map or two pointers drops it to O(n). At one million elements, that's the difference between one million operations and one trillion.
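A hedged sketch of that optimisation, using a pair-sum check as the stand-in problem (the problem choice and function names are illustrative):

```python
def has_pair_with_sum_bruteforce(nums, target):
    # Check every pair: two nested loops, O(n^2) time.
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return True
    return False

def has_pair_with_sum_hashed(nums, target):
    # Single pass with a hash set: O(n) time, O(n) extra space.
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False
```

Both return the same answers; the hashed version trades O(n) memory for the drop from quadratic to linear time.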

O(n log n) deserves a side note here. It's the natural speed limit for comparison-based sorting: no algorithm that sorts by comparing elements can beat it, because distinguishing all n! possible orderings requires at least log₂(n!) comparisons, which grows as n log n. Merge sort, heapsort, and well-implemented quicksort all land here. Worth knowing: O(n log n) isn't an arbitrary number. It's a mathematical lower bound, not an accident of implementation.

“Big O tells you what happens when the input doubles, then doubles again. That's the question worth asking.”
Growth rate thinking

How to Derive Big O Notation from Code You've Never Seen

Most Big O guides hand you a table of algorithms and their complexities. Useful for reference, but it doesn't help when your interviewer writes a function on the whiteboard and asks you to analyse it. The actual skill is derivation: how many times does each operation execute as a function of n?

Single loops

A loop that iterates through every element once is O(n), because the work grows linearly with input size.

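A minimal sketch of such a loop (the function is illustrative):

```python
def find_max(nums):
    # One comparison per element: the work grows linearly with
    # input size, so this is O(n).
    best = nums[0]
    for x in nums:
        if x > best:
            best = x
    return best
```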

Nested loops

When one loop sits inside another, multiply their iteration counts. Two nested loops over the same array give O(n × n) = O(n²).

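A sketch of the pattern, using a duplicate check as the example problem:

```python
def has_duplicate(nums):
    # Outer loop runs n times; inner loop runs up to n times per
    # outer iteration: O(n * n) = O(n^2).
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return True
    return False
```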

The inner loop doesn't always run the full n times. On the first outer iteration, it runs n-1 times. On the second, n-2. The sum is n(n-1)/2, which simplifies to O(n²) because Big O drops the constant 1/2 and the lower-order term.

Halving patterns

Binary search is where derivation gets more interesting:

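A standard iterative binary search, written out as a sketch (assumes a sorted array):

```python
def binary_search(arr, target):
    # Each iteration halves the search space, so the loop runs
    # at most log2(n) times: O(log n).
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```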

Each iteration cuts the search space in half. You start with n elements, and one comparison leaves n/2, two comparisons leave n/4, and three leave n/8. You stop when the search space hits 1.

So: how many times can you halve n before you reach 1? The answer is log₂(n). An array of 1,000,000 elements needs at most 20 comparisons, and an array of 1,000,000,000 needs at most 30. Binary search feels almost instant on large inputs, and that's exactly why O(log n) matters so much in practice.

💡 Tip
The derivation pattern for O(log n) always involves the search space shrinking by a constant fraction each step. When you see a loop where the index doubles (i *= 2) or the range halves, you're looking at logarithmic complexity.
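The doubling-index case in the tip can be sketched directly; the helper name is illustrative:

```python
def count_doublings(n):
    # The index doubles each step, so the body runs roughly
    # log2(n) times before i reaches n: O(log n).
    steps = 0
    i = 1
    while i < n:
        i *= 2
        steps += 1
    return steps
```

For n = 1,000,000 this returns 20, matching the binary search comparison count in the table above.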

Recursive complexity

Recursion introduces a subtlety: the total work is the number of recursive calls multiplied by the work per call. A recursive function that makes two calls at each level and processes n/2 elements per call creates a tree of calls.

Consider the naive recursive Fibonacci implementation. It makes two recursive calls per invocation, each reducing the problem by 1. The call tree branches at every level, producing O(2ⁿ) calls. That's exponential, and it's why naive Fibonacci is unusably slow for large inputs. Adding memoisation collapses the tree to O(n) by ensuring each subproblem is computed once. That's the core mechanism behind dynamic programming.
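A sketch of both versions, using functools.lru_cache for the memoised one:

```python
from functools import lru_cache

def fib_naive(n):
    # Two recursive calls per invocation: the call tree has
    # O(2^n) nodes, so this is exponential time.
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Each subproblem is computed once and cached: O(n) total calls.
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)
```

fib_naive(40) already takes noticeable seconds; fib_memo(40) is instant, because the cache turns an exponential tree into a linear chain of distinct subproblems.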

The Derivation Skill Nobody Teaches

A lookup table (binary search is O(log n), merge sort is O(n log n), hash map insert is amortised O(1)) only goes so far. The skill that transfers is counting the dominant operation as a function of input size, and the best way to build it is on code you haven't memorised the answer for. Take a hash map resize operation. Most engineers know that hash map lookups are amortised O(1). Fewer can explain why the amortisation works, or what happens during the resize itself.

When a hash map's load factor exceeds a threshold (typically 0.75), it allocates a new array of double the size and rehashes every existing key into the new array.

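A simplified sketch of that rehash loop. The bucket layout (a list of lists of key-value pairs) is an assumption for illustration, not a real hash map implementation:

```python
def resize(buckets, new_size):
    # Allocate the larger table, then rehash every stored entry.
    # Outer loop: once per old bucket. Inner loop: once per entry
    # overall, i.e. n iterations total across all buckets.
    new_buckets = [[] for _ in range(new_size)]
    for bucket in buckets:
        for key, value in bucket:
            index = hash(key) % new_size  # O(1) per rehash
            new_buckets[index].append((key, value))
    return new_buckets
```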

Derivation: the outer loop runs once per old bucket. The inner loop runs once per key-value pair across all buckets. The total inner iterations equal n (the number of stored entries), regardless of how they're distributed across buckets. Each rehash is O(1). So a single resize is O(n).

But a resize only triggers after n insertions (roughly). So the O(n) resize cost is spread across n insertions, giving each insertion an amortized cost of O(n)/n = O(1). That's where "O(1) amortized" comes from. The amortization works because you account for the resize properly, not by ignoring it.

Now try deriving the complexity of BFS on a graph with V vertices and E edges. The queue processes each vertex at most once (O(V) dequeues). For each vertex, you examine its adjacency list. Across all vertices, the total adjacency list entries examined equals the total number of edges (O(E)). Total work: O(V + E). The derivation follows directly from counting what happens per vertex and summing across all vertices.
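The same counting argument as runnable code. Representing the graph as a dict of adjacency lists is an assumption about input format:

```python
from collections import deque

def bfs(graph, start):
    # Each vertex is enqueued and dequeued at most once: O(V).
    # Each adjacency list is scanned exactly once, so the inner
    # loop runs O(E) times in total. Overall: O(V + E).
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for neighbor in graph[v]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order
```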

Same process every time. Identify the dominant operation, count how many times it executes as a function of input parameters, and sum across all levels of the algorithm. It works on any code, whether you've seen it before or not.

Space Complexity: The Dimension Most Engineers Forget

Time complexity gets most of the attention, but interviewers ask about space too. Space complexity measures how much additional memory your algorithm needs beyond the input itself.

The common mistake is confusing auxiliary space with input space. When someone asks "what's the space complexity?", they're almost always asking about auxiliary space: the extra memory your algorithm allocates. The input array doesn't count.

A function that sorts an array in place uses O(1) auxiliary space, while a function that creates a copy of the array uses O(n) auxiliary space for the same sorting result.
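The in-place versus copy distinction can be shown with a simpler operation, reversal, where the auxiliary space is unambiguous (an in-place sort's internal buffers depend on the algorithm, so reversal keeps the sketch honest):

```python
def reverse_in_place(nums):
    # Swaps within the input list: O(1) auxiliary space.
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        nums[lo], nums[hi] = nums[hi], nums[lo]
        lo += 1
        hi -= 1
    return nums

def reverse_copy(nums):
    # Builds a new list of n elements: O(n) auxiliary space.
    return nums[::-1]
```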

Recursive call stack space catches even experienced engineers off guard. Every recursive call adds a frame to the call stack. A function that recurses n levels deep uses O(n) stack space, even if it doesn't explicitly allocate any data structures.

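A sketch of the contrast. On large inputs the recursive version also hits Python's default recursion limit (around 1,000 frames), which is a practical symptom of the O(n) stack growth:

```python
def sum_recursive(nums, i=0):
    # One stack frame per element: n frames deep -> O(n) space,
    # even though no data structure is explicitly allocated.
    if i == len(nums):
        return 0
    return nums[i] + sum_recursive(nums, i + 1)

def sum_iterative(nums):
    # Same result with a single accumulator: O(1) auxiliary space.
    total = 0
    for x in nums:
        total += x
    return total
```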

The iterative version of the same function uses O(1) space. That's a concrete reason to understand when recursion uses extra memory and when converting to iteration is worth it. Codeintuition's Recursion course starts with program memory and stack frames before introducing recursive patterns, specifically because most explanations skip straight to the code and leave the space dimension unexplained.

| Common space complexity mistakes | Correct space analysis |
|----------------------------------|------------------------|
| Forgetting to count the call stack in recursive solutions | Counting every recursive level as one stack frame |
| Counting input array size as auxiliary space | Measuring only the extra memory your algorithm allocates |
| Ignoring that hash maps and sets grow with input | Tracking dynamic data structures that grow with n |
| Missing that string concatenation in loops creates O(n²) space | Recognising when immutable operations create hidden copies |
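The string concatenation point is worth a concrete sketch. Note that CPython sometimes optimises += on strings in place, so treat the quadratic copying cost as the general immutable-string case rather than a guarantee:

```python
def join_with_concat(words):
    # Each += can copy the entire string built so far, so total
    # copying across the loop can reach O(n^2) for n characters.
    out = ""
    for w in words:
        out += w
    return out

def join_with_builder(words):
    # Collect pieces and join once: O(n) total work and space.
    return "".join(words)
```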

Where Big O Shows Up in Real Interviews

Interviewers don't ask you to recite that merge sort is O(n log n). They hand you a solution and ask you to analyse it, or they ask you to improve a brute-force O(n²) approach to O(n) or O(n log n). Derivation, not recall.

And that's where most self-taught preparation falls apart. You can grind through hundreds of problems and memorise the complexity of each one, but if you haven't practised the reasoning process of counting operations as a function of input size, novel code still feels opaque.

In learning science, this is called elaborative interrogation: asking yourself why each operation contributes the work it does, rather than accepting the final answer.

Codeintuition's Searching course, for instance, doesn't just state that binary search is O(log n). It walks through the halving invariant frame by frame, then extends the same reasoning to predicate search patterns where you're binary searching on an answer space, not an array. The derivation transfers because the reasoning transfers.

Codeintuition's learning path embeds complexity derivation into every algorithm explanation across 16 courses, not as a separate topic but as part of understanding why each approach works. The free Arrays and Singly Linked List courses walk through the complexity of two pointers (O(n) from single-pass convergence), sliding window (O(n) from amortized pointer movement), and binary search (O(log n) from halving), each derived from the mechanism rather than stated as a label. Start there and practice deriving complexity before memorizing it.

Before Big O, you read a solution and think "that looks fast." After, you look at the same solution and know it's O(n log n) because you can trace the divide step and the merge step independently. You can explain why adding a hash map drops a nested loop from quadratic to linear. And you can evaluate a new algorithm you've never seen and tell the interviewer exactly how it scales, because the derivation process works on any code.

At that point, Big O stops being a label you apply after the fact and becomes a tool you think with.

Do you want to master data structures?

Try our data structures learning path, made of highly visual and interactive courses. Get hands-on experience by solving real problems in a structured manner. All the resources you'll ever need, in one place, for free.

Frequently Asked Questions

What's the difference between Big O, Big Omega, and Big Theta?
Big O describes the upper bound on growth rate (worst case). Big Omega describes the lower bound (best case). Big Theta describes a tight bound, meaning the algorithm always grows at exactly that rate. In interviews, Big O is the standard because worst-case analysis is what matters for evaluating algorithm reliability. You'll rarely be asked about Theta or Omega directly, but understanding the distinction shows depth if an interviewer probes further.

Does a lower Big O always mean faster code?
Not always. Big O ignores constants, so an O(n) algorithm with a large constant factor can be slower than an O(n log n) algorithm for small inputs. At large inputs, the growth rate dominates and Big O becomes a reliable predictor.

How do I derive the complexity of a recursive function?
Count the number of recursive calls and the work done per call. Draw the recursion tree if it helps, because each level represents one recursive step and the total nodes represent total function calls. For divide-and-conquer patterns, the Master Theorem gives a shortcut, but counting the tree manually is more reliable for interview settings. The key insight is that the depth of the tree determines the number of levels, and the branching factor determines how many calls exist at each level, so total calls are typically O(branching_factor^depth) before accounting for work per call.

Is O(n log n) always better than O(n²) in practice?
For large inputs, yes. At very small input sizes (under roughly 50 elements), the constant factors in O(n²) algorithms like insertion sort can actually make them faster in practice. That's why many standard library sort implementations switch to insertion sort for small subarrays inside a merge sort or quicksort.

What does "amortised" mean in complexity analysis?
Amortised analysis averages the time of an operation over a sequence of operations. A dynamic array's append is O(1) amortised: most appends are constant-time, and the occasional resize takes O(n), but it happens infrequently enough that the average stays O(1). Interviewers ask about amortised complexity for dynamic arrays, hash table insertions, and union-find operations, so it's worth understanding even if it doesn't appear in every interview.