Building a Calculator: Expression Evaluation
Imagine you are tasked with building the core formula engine for a spreadsheet application like Excel, or writing the parser for a new programming language. A user inputs an expression: 3 + 4 * 2 / ( 1 - 5 ) ^ 2.
How do you write code that evaluates this to the correct mathematical result?
Computers do not intuitively understand “Order of Operations” (PEMDAS). They read left-to-right. If a computer naively evaluated 3 + 4 * 2 sequentially, it would get (3 + 4) * 2 = 14, instead of the correct answer, 11. To evaluate complex math correctly, we need a structured way to parse and process expressions, respecting operator precedence and parentheses.
Anatomy of an Expression Evaluator
When a computer processes a mathematical string, it moves through three distinct phases. Think of this as the “assembly line” of calculation:
- Lexical Analysis (Tokenizer): The raw string
"34 + 2"is split into meaningful chunks (tokens):["34", "+", "2"]. It distinguishes multi-digit numbers from operators. - Parsing (Shunting-Yard): Converts the machine-hostile infix tokens into a strict, sequential postfix format:
["34", "2", "+"]. - Evaluation (RPN Engine): A simple stack-based machine processes the postfix tokens to compute the final integer result:
36.
We will focus on the Parsing and Evaluation phases in this chapter.
1. Notation Types: Speaking the Machine’s Language
Before we can calculate, we must translate human-readable math into a format a machine can process linearly.
- Infix Notation:
3 + 4- Human readable: The operator is in between the operands.
- Machine hostile: Requires understanding precedence (multiply before add) and grouping (parentheses), making it difficult to process in a single left-to-right pass.
- Postfix Notation (Reverse Polish Notation or RPN):
3 4 +- Machine friendly: The operator is placed after the operands.
- No parentheses needed: The order of the expression inherently dictates the order of operations.
- Example:
3 + 4 * 2becomes3 4 2 * +in Postfix.
2. The Shunting-Yard Algorithm
Created by computing pioneer Edsger Dijkstra, the Shunting-Yard algorithm converts Infix → Postfix notation.
The Analogy: A Train Switching Yard
Think of the algorithm like a freight train switching yard.
- The Main Track is our Output buffer (the final Postfix string).
- The Side Track is our Operator Stack (a LIFO data structure).
- Passenger Cars (Numbers): They have priority to get to their destination. They bypass the side track entirely and go straight to the Main Track.
- Locomotives (Operators): They must be sequenced based on their horsepower (Precedence). When a locomotive arrives, it is routed to the Side Track. If a heavier, higher-priority locomotive is already on the side track, the weaker one must wait, forcing the higher-priority one out to the Main Track first to keep traffic flowing logically.
The Rules
We read the infix expression token by token from left to right:
- Number: Send it directly to the Output.
- Operator (
+,-,*,/): Push it to the Stack. Crucial Rule: Before pushing, pop any operators from the Stack that have greater than or equal precedence, and append them to the Output. - Left Parenthesis
(: Push it to the Stack. It acts as a strict boundary. - Right Parenthesis
): Pop operators from the Stack and append to Output until a matching Left Parenthesis(is encountered. Pop and discard the(.
Interactive: Shunting-Yard Execution
Let’s convert the expression 3 * ( 4 + 2 ) to Postfix (3 4 2 + *). This example demonstrates the powerful parenthesis boundary rule.
Expression
Operator Stack
Output (Postfix)
3. Evaluating Postfix (RPN)
Once we have successfully converted our expression to Postfix (3 4 2 * +), evaluation becomes trivial using a single Stack.
The Evaluation Rules
We read the postfix expression strictly left-to-right:
- If it’s a Number: Push it onto the stack.
- If it’s an Operator:
- Pop the top two numbers from the stack.
- Important: The first number popped is the right operand, and the second number popped is the left operand. This ordering matters immensely for subtraction and division!
- Apply the operator to these two numbers.
- Push the result back onto the stack.
- Result: When the expression ends, exactly one number will remain on the stack. That is your final answer.
Step-by-Step Example
Let’s evaluate 3 4 2 * +:
| Token Scanned | Action Taken | Stack State (Bottom → Top) |
|---|---|---|
3 |
Push 3 |
[3] |
4 |
Push 4 |
[3, 4] |
2 |
Push 2 |
[3, 4, 2] |
* |
Pop 2 (right), Pop 4 (left). Calculate 4 * 2 = 8. Push 8. |
[3, 8] |
+ |
Pop 8 (right), Pop 3 (left). Calculate 3 + 8 = 11. Push 11. |
[11] |
Final Answer: 11.
⚔️ Industry War Story: The Excel Calculation Engine
When Microsoft was building the core calculation engine for Excel, they faced a massive challenge: parsing and evaluating complex formulas containing hundreds of nested parentheses and operators across thousands of interconnected cells. They couldn’t rely on simple recursive evaluation because deeply nested formulas would cause stack overflow errors and block the main thread.
Instead, they utilized heavily optimized variations of the Shunting-Yard algorithm to parse formulas into Abstract Syntax Trees (ASTs) and postfix representations once during the “compile” phase. When a user updates a single cell’s value, Excel doesn’t re-parse the massive formula text; it simply re-runs the lightning-fast, linear Postfix Evaluation (RPN) over the pre-compiled tokens using the new values. This decoupling of Parsing and Evaluation is what allows millions of cells to update in milliseconds.
4. Code Implementations
Let’s look at how to implement the Postfix Evaluation algorithm. Notice how elegantly it handles the operations.
Java
public int evalRPN(String[] tokens) {
Deque<Integer> stack = new ArrayDeque<>();
for (String token : tokens) {
// If the token is an operator, pop two operands and calculate
if (token.equals("+")) {
stack.push(stack.pop() + stack.pop());
} else if (token.equals("*")) {
stack.push(stack.pop() * stack.pop());
} else if (token.equals("-")) {
int right = stack.pop(); // First pop is the right operand
int left = stack.pop(); // Second pop is the left operand
stack.push(left - right);
} else if (token.equals("/")) {
int right = stack.pop();
int left = stack.pop();
stack.push(left / right);
} else {
// If it's a number, push it to the stack
stack.push(Integer.parseInt(token));
}
}
// The final result is the only remaining element
return stack.pop();
}
Go
func evalRPN(tokens []string) int {
stack := []int{}
for _, token := range tokens {
if token == "+" || token == "-" || token == "*" || token == "/" {
// Pop the top two elements
right := stack[len(stack)-1]
left := stack[len(stack)-2]
stack = stack[:len(stack)-2] // Shrink stack
// Perform calculation and push result
var result int
switch token {
case "+": result = left + right
case "-": result = left - right
case "*": result = left * right
case "/": result = left / right
}
stack = append(stack, result)
} else {
// If it's a number, convert to int and push
val, _ := strconv.Atoi(token)
stack = append(stack, val)
}
}
return stack[0]
}
Complexity Analysis
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Shunting-Yard (Infix to Postfix) | O(N) - We process each token exactly once. | O(N) - The operator stack stores at most N operators. |
| RPN Evaluation | O(N) - We scan the postfix expression sequentially. | O(N) - The stack stores at most N operands. |