Real World Applications
[!NOTE] Graphs are not just for interviews. They run the world.
1. The Internet: PageRank (Google)
How does Google know which website is most important? The web is essentially a massive directed graph.
- Nodes: Web pages.
- Edges: Hyperlinks from one page to another.
- The Core Idea: A page is considered important if other important pages link to it. It’s a recursive popularity contest where votes are weighted by the importance of the voter.
Mathematical Intuition
Imagine a random surfer clicking links indefinitely. PageRank represents the probability that the surfer lands on a particular page. If page A has a PageRank of 0.1 and links to 5 pages, it passes 0.1 / 5 = 0.02 PageRank to each outgoing link. To prevent getting stuck in loops, a Damping Factor (typically 0.85) is introduced, representing a 15% chance the surfer gets bored and randomly jumps to any other page in the network.
2. Social Networks (Facebook & LinkedIn)
Social networks are the quintessential graph structures, operating at a scale of billions of nodes.
- The Graph: A massive undirected graph for bidirectional relationships (Facebook Friends) or a directed graph for unidirectional relationships (Twitter/LinkedIn Followers).
- “People You May Know” (Triangle Closing): This feature relies on calculating the intersection of neighbor sets. If Node A and Node B are connected, and Node B is connected to Node C, the system calculates the probability that A should connect to C. This is often solved using Graph Neural Networks or simpler randomized walk algorithms like Personalized PageRank.
- Degrees of Separation: To find how closely related two people are, systems use a Bidirectional BFS. Running a standard BFS from a source node to a target node in a massive graph would explode in complexity (
O(B<sup>D</sup>)where B is the branching factor and D is the distance). By running BFS simultaneously from both the source and target, the search space is drastically reduced toO(B<sup>D/2</sup>).
3. Uber & Lyft: Real-Time Routing
Routing millions of cars is not a simple shortest-path problem. Let’s analyze this using the PEDALS Framework.
PEDALS: Uber Routing Case Study
- Process Requirements: Find the fastest route between Point A and Point B for millions of concurrent requests. The graph weights (travel times) change constantly due to traffic and accidents.
- Estimate: With a road network of 50 million intersections (nodes) and 100 million road segments (edges), calculating standard Dijkstra’s algorithm for every single request would require significant latency (seconds per request), which is unacceptable.
- Data Model: A Directed Weighted Graph. Nodes are intersections, edges are road segments, and weights are dynamic travel times.
- Architecture: The routing engine cannot run standard Dijkstra. Instead, it relies on precomputation algorithms like Contraction Hierarchies.
- Localized Details:
- Contraction Hierarchies: The graph is pre-processed by “contracting” less important nodes (like residential dead-ends) and adding “shortcut” edges between more important nodes (like highway ramps). This transforms the graph into a hierarchy.
- During a live request, a Bidirectional Dijkstra is run on this contracted graph. Because the algorithm only explores nodes of equal or higher importance, the search space is minuscule. A search that would normally explore 2,000,000 nodes might only explore 500 nodes.
- Scale: To handle real-time traffic updates, the system uses “Customizable Route Planning” (CRP) or dynamic edge weight updates, selectively recalculating only the affected shortcuts without rebuilding the entire hierarchy.
4. Interactive: PageRank Simulator
Simulate the flow of “Importance” (Probability).
- Start with equal importance.
- In each step, a node passes its importance evenly to its neighbors.
- Watch how nodes with many incoming links (or links from important nodes) grow larger.
5. Deep Dive Strategy Lab: Use Cases
Intuition Through Analogy
Think of this chapter like city route planning. The goal is not to memorize a fixed trick, but to repeatedly answer:
- What is the bottleneck?
- Which constraint dominates (time, memory, latency, correctness)?
- Which representation makes the bottleneck easier to eliminate?
Practice
Ready to apply these concepts? Head over to the problem vault to test your graph skills.