Learn about the most versatile range data structure
What you will learn
What is segment tree and where is it used?
How does a segment tree work?
Calculating sum of ranges efficiently
Dynamically updating values over a range efficiently
Every data structure we have today was created to solve a particular set of problems efficiently. The segment tree is no exception. We will start this blog on segment trees by first understanding one of the problems from a set of core problems that a segment tree tries to solve.
The simplest way to answer this query is to loop in [L .. R] and add all the elements in a variable to get your answer.
Now, this method may be simple but it is not very efficient.
The problem gets difficult when we have to answer a lot of queries like these. Now suppose we have M such queries which we have to answer, the worst-case time complexity of our program is O(M*N).
This may seem good but if N and M are large then the program becomes quite slow. Let's say the array has 1 million entries (N = 10^6) and we have to make 1 million queries (M = 10^6), so overall in the worst case we will have to do 10^12 iterations. These many iterations take a long time to complete for our computers.
A Faster Solution
Well, there is a faster way to solve this problem. If you think a little, all we want is the sum of a range. With this in mind, we can just create another array S such that S[i] is the sum of all elements in the range [0 .. i] of the array A. Such an array is called a prefix sum array. Now whenever someone asks you the sum of [L .. R] of array A, you can return the result using the prefix sum array S as S[R] - S[L-1].
The worst-case time complexity of this solution is O(1) for every query as we just read values from a pre-calculated array of sums.
What if there is also an update query that asks you to update the value at any index in the array A. To do this you will have to recalculate the entire prefix sum array S all over again which in the worst case will have a time complexity of O(N). This will be very slow if we have a lot of data and a lot of queries that also update the values in the array.
A segment tree is designed to solve exactly this kind of problem. Let us see what a segment tree is and how it can solve this problem so efficiently.
A Segment tree is a binary tree built on an array where each node stores the information of a segment or range of that array. A segment tree breaks down a range into two parts at every step. The left child is responsible for the left half of the parent's range and the right child is responsible for the right half of the parent's range.
A simple segment tree
Let us now understand how we will find sum of ranges using this magical data structure.
- sum(0-3) = value at root node 1
- sum(1-3) = value at node 5 + value at node 3
- sum(0-2) = value at node 2 + value at node 6
How does a segment tree work?
Let us try to understand in more detail how this magic works. We will talk in generic terms to give you a generalised idea of the concept in action.
Suppose you have a segment tree build on the range [A .. B] of an array. You get a query to find the sum [L .. R] of in this array where [L .. R] lies within [A .. B]. The result can be calculated using a very simple idea.
So now we know that there can be multiple ways to make up the query range [L .. R] by adding up smaller ranges where each range is completely represented in the segment tree. However not all the solutions are equally efficient. If we use only the leaf node ranges [L..L], [L+1 .. L+1] .....[R .. R], to make up the query range [L .. R] there is no point in using the segment tree at all as it is equivalent to summing up individual elements in an array using a loop. The non-efficient solution we saw earlier.
How do we break down the query range minimally ?
This is where things get interesting. In order to get the minimum number of ranges, you always start with the largest available range in the segment tree and see if it overlaps completely with your query range or not. If it does, great you have got your answer. If it doesn't, you fall back to the next available largest ranges represented by the segment tree and repeat the process. This way you are checking all the ranges that can have an overlap with your query range in decreasing order of their sizes and so you are guaranteed to always have the minimum number of smaller ranges.
This can be seen as a recursive top to bottom traversal of the segment tree. Imagine you have a segment tree build on the range [A .. B] of an array. You get a query to find the sum [L .. R] of in this array where [L .. R] lies within [A .. B]. This is how the algorithm will look like -
- Check if the range represented by the current node which is [A .. B] lies completely within the given query range [L .. R].
- If it does, the range represented by the current node is one of the smaller ranges in the final solution and we use the stored sum in it as the result and get back the result to whoever queried us.
- If it doesn't, got to step 2.
- Check if the range represented by the current node [A .. B] lies completely outside the query range [L .. R].
- If it does, go back to the caller with a 0 as there is no point in looking ahead as there will never be overlap
- If it doesn't, got to step 3.
- If we are at this point, it means that only a part of the range represented by the current node [A .. B] has an overlap with the query range [L .. R].
- We break down the range represented by this node which is [A .. B] into the next available largest ranges in the segment tree which are [A .. (A+B)/2](left) and [(A+B+1)/2 .. B](right).
- We go back to step 1 two times now with the values of A and B replaced by these new boundaries [A .. (A+B)/2] and [(A+B+1)/2 .. B].
- We add up the results returned from these two smaller subproblems and return the result back to our caller.
Finding overlaps in the segment tree starting from the top
What about update?
If you look closely, the leaf nodes in the segment tree represent the array A. To update a value at any index in the array, we need to update the corresponding leaf node. Once the leaf node is updated, we would then also need to recursively update its parents with the new sum all the way up to the root node of the segment tree.
This can be done very easily by a recursive top to bottom traversal of the tree from the root node to the respective leaf node. When we reach the leaf node which should be the stopping condition of our recursion, we update its value. Now during the unwinding of recursion, we will travel back from the leaf node all the way to the root node. This is where we update the values in these individual nodes.
1 of 8
2 of 8
3 of 8
4 of 8
5 of 8
6 of 8
7 of 8
8 of 8
Updating a value at index  to 5
- Check if the current node is a leaf node. If it is a leaf node, update its value and return to the caller
- If it is not:
- decide if the index to update lies in the left or right half of the current range [A .. B]. If it lies to the left half goto step 1 with the left child as the current node and [A .. mid] as the current range otherwise go to step 1 with the right child as the current node and [mid+1 .. B] as the current range.
- After the call to the respective child (Step 2) has been completed, all the nodes below the current node have been updated with the new value. Update the current node with the new value by updating its value to the sum of its left and right child and then return to the caller.
Now that we are clear with the basic working of a segment tree and how it can solve our problem in a radically different way using ranges, let us quickly look at why is it such an efficient method. We will not go into a full-proof mathematical analysis to come up with the runtime complexity but instead, use an intuitive and approach to understanding.
As we can see from the algorithm for query, to query a range from [L .. R] we only need to travel top to bottom in the segment tree once. In most cases, we can get our result even before reaching the leaf node. The best case is when we query the entire range [0 .. N] since we get the result just at the root node and don't have to travel any further. We hit the worst case when the range has a length one, like [L .. L]. In this case, we have to travel all the way from the top to the bottom of the tree and so we do iterations of the order of the height of the segment tree. Given segment tree is a binary tree with N leaf nodes, its height is of the order logN and so the worst-case time complexity for query is O(log(N))
Just like querying, to update a single value in the segment tree we need to do a top to bottom traversal of the segment tree all the way from the root node to the corresponding leaf node. The number of iterations, in this case, is also of the order of the height of the segment tree which is logN and so the worst-case time complexity for an update is also O(log(N))
We just looked at one of many problems that a segment tree can solve. In our example above we used sum as our query. We can extend the segment tree to solve similar problems.
A few examples of such functions are -
- ... and the list goes on
Do you want to master data structures?
Try our data structures learning path made of highly visual and interactive courses. Get hands on experience by solving real problems in a structured manner. All resources you would ever need in one place for FREE