SC2001 Project 02

Dijkstra's Algorithm

Team 5

Setup

To facilitate subsequent implementation, we initialize the following global variables.

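A sketch of what such globals might look like; the names here (INF, adj_matrix, dist, init_graph) are our assumptions for illustration, not necessarily the identifiers used in the actual code.

```cpp
#include <climits>
#include <vector>

const long long INF = LLONG_MAX / 4;            // large sentinel, safe to add weights to
int V = 0;                                      // number of nodes
std::vector<std::vector<long long>> adj_matrix; // adj_matrix[u][v] = weight, or INF
std::vector<long long> dist;                    // tentative distances from the source

// (Re)initialize the globals for a graph on n nodes with no edges yet.
void init_graph(int n) {
    V = n;
    adj_matrix.assign(n, std::vector<long long>(n, INF));
    dist.assign(n, INF);
}
```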

To generate a random graph, we present the following lambda function.

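A possible shape for such a generator, expressed as a lambda (signature and names are our assumptions; for simplicity this sketch allows parallel edges, whereas the actual generator may deduplicate them).

```cpp
#include <random>
#include <tuple>
#include <vector>

// Generate E random directed edges over V nodes with weights in [1, max_w].
auto gen_graph = [](int V, int E, int max_w, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> node(0, V - 1);
    std::uniform_int_distribution<int> weight(1, max_w);
    std::vector<std::tuple<int, int, int>> edges;  // (u, v, w)
    edges.reserve(E);
    while ((int)edges.size() < E) {
        int u = node(rng), v = node(rng);
        if (u == v) continue;  // skip self-loops
        edges.emplace_back(u, v, weight(rng));
    }
    return edges;
};
```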
To debug/print the graph, we use a few other auxiliary lambda functions.

Part (a)

Using an array as a priority queue is rather ambiguous phrasing... For now, we'll interpret it as using the distance array directly as the priority queue, i.e., we'll iterate through the array in $\mathcal{O}(V)$ to determine the next unvisited node with the smallest distance.


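The API might look roughly like this. The function names get_min and is_empty follow the text below; the bodies are our reconstruction.

```cpp
#include <vector>

const long long INF = 1e18;

// Treat the distance array itself as the priority queue: get_min scans every
// unvisited node in O(V) for the smallest tentative distance.
int get_min(const std::vector<long long>& dist, const std::vector<bool>& visited) {
    int best = -1;
    for (int u = 0; u < (int)dist.size(); u++)
        if (!visited[u] && dist[u] < INF && (best == -1 || dist[u] < dist[best]))
            best = u;
    return best;  // -1 means no reachable unvisited node remains
}

bool is_empty(const std::vector<long long>& dist, const std::vector<bool>& visited) {
    return get_min(dist, visited) == -1;
}
```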

Having established the necessary API for using an array as a priority queue, we present the following implementation.

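A self-contained sketch of this $\mathcal{O}(V^2)$ implementation over an adjacency matrix (variable names are ours), combining the get_min scan and the relaxation loop:

```cpp
#include <vector>

// Array-as-priority-queue Dijkstra: w[u][v] holds the edge weight, or INF
// if there is no edge. Returns the distance array from src.
std::vector<long long> dijkstra_array(const std::vector<std::vector<long long>>& w,
                                      int src) {
    const long long INF = 1e18;
    int V = w.size();
    std::vector<long long> dist(V, INF);
    std::vector<bool> visited(V, false);
    dist[src] = 0;
    for (int iter = 0; iter < V; iter++) {
        int u = -1;
        for (int x = 0; x < V; x++)  // O(V) scan = get_min
            if (!visited[x] && dist[x] < INF && (u == -1 || dist[x] < dist[u]))
                u = x;
        if (u == -1) break;          // disconnected: queue effectively empty
        visited[u] = true;
        for (int v = 0; v < V; v++)  // relax all neighbours of u
            if (w[u][v] < INF && dist[u] + w[u][v] < dist[v])
                dist[v] = dist[u] + w[u][v];
    }
    return dist;
}
```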
Note that each node is visited just once (outer loop). Retrieval of the next node (get_min), checking whether the priority queue is empty (is_empty), and iterating through every node in the inner loop each cost $\mathcal{O}(V)$, producing a total time complexity of $\mathcal{O}(V \cdot (3V)) = \mathcal{O}(V^2)$.

Next, we'll count the number of operations taken on complete graphs of varying V.

What if we set E to 500?

When E is small relative to V, there is a high chance of producing a disconnected graph, i.e., Dijkstra's would terminate before inspecting every node, resulting in far fewer than $V^2$ operations.

Now, consider varying E from 0 to 10,000 while keeping V constant at 500.

Observe that when E is sufficiently large to produce a connected graph with high probability, it no longer affects the time complexity.

Part (b)

Observe that the classic Dijkstra described in the lecture cannot be implemented efficiently using a minimizing heap, since it requires the deletion (or key update) of non-minimum items within the heap...

Locating such an item in a binary heap would require $\mathcal{O}(V)$ time.
Therefore, some kind of self-balancing tree is usually employed instead.
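One common realization of this idea in C++ uses std::set (a red-black tree), which supports erasing arbitrary elements in $\mathcal{O}(\log_2 V)$, so the classic decrease-key form of Dijkstra can be expressed directly. This is a sketch with our own names, not the report's code.

```cpp
#include <set>
#include <utility>
#include <vector>

// Classic Dijkstra with a self-balancing tree: stale entries are erased
// before the improved (distance, node) pair is re-inserted.
std::vector<long long> dijkstra_set(
        const std::vector<std::vector<std::pair<int, long long>>>& adj, int src) {
    const long long INF = 1e18;
    int V = adj.size();
    std::vector<long long> dist(V, INF);
    std::set<std::pair<long long, int>> s;  // ordered by (distance, node)
    dist[src] = 0;
    s.insert({0, src});
    while (!s.empty()) {
        int u = s.begin()->second;  // minimum element
        s.erase(s.begin());
        for (auto [v, w] : adj[u])
            if (dist[u] + w < dist[v]) {
                s.erase({dist[v], v});  // delete the non-minimum stale entry
                dist[v] = dist[u] + w;
                s.insert({dist[v], v});
            }
    }
    return dist;
}
```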
Fortunately, Dijkstra's algorithm also has a modified form that leverages the lazy deletion technique. This can be done with a minimizing heap, as required in the problem.

By default, C++'s std::priority_queue is a maximizing heap; to get a minimizing heap, we can specify the following template parameters.

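Concretely, passing std::greater as the comparator turns std::priority_queue into a min-heap; with (distance, node) pairs, the smallest distance surfaces first.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// (distance, node) states ordered so that the smallest distance is on top.
using State = std::pair<long long, int>;
std::priority_queue<State, std::vector<State>, std::greater<State>> min_heap;
```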
If you happen to forget the syntax, a common trick is to negate all edge weights/distances before pushing them into the (maximizing) heap, and negate them again upon popping...

First, let's examine a common mistake in the implementation.

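Our reconstruction of this mistaken version (names are ours): every node is pushed up front, and popped entries are never checked for staleness, so a stale entry re-relaxes all V columns of the adjacency matrix. The answers are still correct; only the running time suffers.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

std::vector<long long> dijkstra_mistake(const std::vector<std::vector<long long>>& w,
                                        int src) {
    const long long INF = 1e18;
    int V = w.size();
    std::vector<long long> dist(V, INF);
    using State = std::pair<long long, int>;
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    dist[src] = 0;
    for (int u = 0; u < V; u++) pq.push({dist[u], u});  // mistake 1: pre-insert everything
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        // mistake 2: no `if (d > dist[u]) continue;` staleness check here
        for (int v = 0; v < V; v++)
            if (w[u][v] < INF && dist[u] + w[u][v] < dist[v]) {
                dist[v] = dist[u] + w[u][v];
                pq.push({dist[v], v});
            }
    }
    return dist;
}
```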

Unfortunately, it passed only 15/23 test cases on CSES's Shortest Routes I, while receiving the TLE verdict on the remaining 8 test cases.

Notice that in this edge case, where every edge relaxation succeeds and triggers an insertion, the heap can contain up to $\mathcal{O}(V^2)$ elements, each of which takes $\mathcal{O}(V)$ operations to process, leading to an awful $\mathcal{O}(V^3)$ time complexity.

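A reconstruction of the corrected lazy-deletion version over an adjacency list (names are ours): only the source is pushed initially, and stale heap entries are skipped on arrival.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

std::vector<long long> dijkstra_lazy(
        const std::vector<std::vector<std::pair<int, long long>>>& adj, int src) {
    const long long INF = 1e18;
    int V = adj.size();
    std::vector<long long> dist(V, INF);
    using State = std::pair<long long, int>;
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;  // stale entry: lazy deletion
        for (auto [v, w] : adj[u])
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}
```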
Observe that we don't need to insert all nodes at the start, and we also need to perform a check to guarantee that the popped entry is not stale, i.e., that it still matches the node's current minimum distance.

This implementation receives the AC verdict on CSES's Shortest Routes I.


Observe that the initial assignment to the distance array costs $\mathcal{O}(V)$ time, and for each possible "improvement" in distance, our heap size is incremented by one. There can be $\mathcal{O}(E)$ such improvements, and each heap operation can take up to $\mathcal{O}(\log_2 E)$ time. Therefore, the total time complexity is $\mathcal{O}(V + E \cdot \log_2 E)$.


Since $E \leq V^2$, another valid representation is $\mathcal{O}(V + E \cdot \log_2 V^2) = \mathcal{O}(V + 2E \cdot \log_2 V) = \mathcal{O}(V + E \cdot \log_2 V)$.

By varying V and keeping E at 2,500, we obtained the following results.

In practice, it is rarely the case that every edge improves the distance (worst case assumption).

Next, we'll fix V at 1,000 and vary E.

Finally, let's consider the edge case mentioned earlier where every edge leads to an insertion.

Part (c)

In terms of memory, one of the major drawbacks of adjacency matrices is the $\mathcal{O}(V^2)$ memory they require, compared to the $\mathcal{O}(V + E)$ memory for adjacency lists.

This is very inefficient for sparse graphs, and on many problems where $V \leq 10^5$, you wouldn't even be able to allocate the up-to-$10^{10}$-entry matrix without getting a segmentation fault.

In terms of time complexity, $\mathcal{O}(V + E \cdot \log_2 E)$ is better than $\mathcal{O}(V^2)$ under most circumstances.


However, when $E \approx V^2$ (e.g., a complete graph), the former becomes $\mathcal{O}(V^2 \cdot \log_2 V)$, which is inferior to $\mathcal{O}(V^2)$. Therefore, in such cases, we'd prefer part (a)'s implementation.

First, let's vary V while keeping E at 500. Notice that part (b)'s implementation is generally preferable except for small V where part (a)'s implementation performs slightly better.

On complete graphs, it is evident that the logarithmic factor results in part (b)'s implementation being slightly slower.

Finally, we'd like to offer another interpretation of "using an array as a priority queue". Instead of iterating over the distance array in $\mathcal{O}(V)$, let's construct a segment tree instead.

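A minimal iterative min-segment-tree sketch (our own names, not the report's code): each leaf i holds (dist[i], i), every internal node holds the minimum of its children, so the global minimum sits at the root. A point update costs $\mathcal{O}(\log_2 V)$.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct SegTree {
    int n;
    std::vector<std::pair<long long, int>> t;  // (distance, node)
    SegTree(int n, long long inf) : n(n), t(2 * n, {inf, -1}) {
        for (int i = 0; i < n; i++) t[n + i].second = i;  // label the leaves
    }
    void update(int pos, long long d) {  // set node pos's distance to d
        t[pos += n].first = d;
        for (pos /= 2; pos >= 1; pos /= 2)
            t[pos] = std::min(t[2 * pos], t[2 * pos + 1]);
    }
    std::pair<long long, int> get_min() const { return t[1]; }  // root
};
```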
Both point updates and range queries cost $\mathcal{O}(\log_2 V)$, and it could be used as a priority queue in place of a binary heap.
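For completeness, a sketch of Dijkstra with such a tree as the priority queue (adjacency list, our own names): the root is the unvisited node with the smallest tentative distance, and "popping" it is a point update back to infinity. Since the tree always holds exactly V entries, every operation costs $\mathcal{O}(\log_2 V)$ rather than $\mathcal{O}(\log_2 E)$.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

const long long INF = 1e18;

// Min-segment-tree indexed by node; the global minimum sits at t[1].
struct SegTree {
    int n;
    std::vector<std::pair<long long, int>> t;
    SegTree(int n) : n(n), t(2 * n, {INF, -1}) {
        for (int i = 0; i < n; i++) t[n + i].second = i;
    }
    void update(int pos, long long d) {
        t[pos += n].first = d;
        for (pos /= 2; pos >= 1; pos /= 2)
            t[pos] = std::min(t[2 * pos], t[2 * pos + 1]);
    }
};

std::vector<long long> dijkstra_segtree(
        const std::vector<std::vector<std::pair<int, long long>>>& adj, int src) {
    int V = adj.size();
    std::vector<long long> dist(V, INF);
    SegTree st(V);
    dist[src] = 0;
    st.update(src, 0);
    while (true) {
        auto [d, u] = st.t[1];  // current global minimum
        if (d >= INF) break;    // nothing reachable remains
        st.update(u, INF);      // "pop" node u
        for (auto [v, w] : adj[u])
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                st.update(v, dist[v]);  // decrease-key in O(log V)
            }
    }
    return dist;
}
```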
Note that part (a)'s algorithm is still $\mathcal{O}(V^2)$ because for every node, we still have to iterate over $\mathcal{O}(V)$ other nodes in the inner loop.

Just as we did earlier, let's vary V while keeping E at 500. Notice that the segment tree offers a tiny speedup as more often than not, $\mathcal{O}(\log_2 (V))$ is better than $\mathcal{O}(\log_2 (E))$.

On complete graphs, the difference in logarithmic factors is even more obvious.

Thank You

https://github.com/anAcc22/SC2001_Group_Project

References

  • Halim, S., Halim, F., & Effendy, S. (2020). Competitive Programming 4: The Lower Bound of Programming Contests in the 2020s (Book 1, Chapters 1-4). Lulu.com.
  • Segment Tree. Algorithms for Competitive Programming. (2023, December 20). https://cp-algorithms.com/data_structures/segment_tree.html