# Disk Scheduling: The Elevator Problem
> [!NOTE]
> The Mechanical Limit: In a Hard Disk Drive (HDD), the read/write head must physically move to the correct track (Seek) and wait for the target sector to spin under it (Rotation). Minimizing this physical movement is the #1 performance goal for HDDs.
## 1. Anatomy of Latency
When an application asks for data, the time taken is:
Latency = Seek Time + Rotational Latency + Transfer Time
- Seek Time: moving the arm to the target track. The slowest part: roughly 3-10ms.
- Rotational Latency: waiting for the target sector to spin under the head. Average: ~4.17ms at 7200 RPM (half a revolution).
- Transfer Time: actually reading the bits. Fast: a fraction of a millisecond for small requests.
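A quick back-of-the-envelope calculation makes the imbalance obvious. The numbers below are illustrative values for a typical 7200 RPM consumer drive, not measurements:

```go
package main

import "fmt"

func main() {
	// Illustrative numbers for a 7200 RPM consumer HDD.
	seekMs := 8.0 // average seek time (varies ~3-10ms by drive)

	// Average rotational latency = half a revolution.
	rpm := 7200.0
	fullRevMs := 60_000.0 / rpm   // ~8.33ms per full revolution
	rotationalMs := fullRevMs / 2 // ~4.17ms on average

	// Transfer time for a 4KB block at ~150 MB/s sustained.
	transferMs := 4.0 / 150_000.0 * 1000 // ~0.03ms

	total := seekMs + rotationalMs + transferMs
	fmt.Printf("Seek: %.2fms  Rotation: %.2fms  Transfer: %.3fms\n",
		seekMs, rotationalMs, transferMs)
	fmt.Printf("Total: %.2fms -> at best ~%.0f random IOPS\n",
		total, 1000/total)
}
```

The punchline: the mechanical parts dominate so completely that a drive capping out at under a hundred random IOPS is normal, which is exactly why the scheduler's job is to avoid unnecessary seeks.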
## 2. Scheduling Algorithms
The OS acts like an elevator dispatcher. If requests come in for floors 10, 50, and 5, it shouldn't blindly visit them in arrival order (10 → 50 → 5); picking up floor 5 on the way is far cheaper.
### A. FCFS (First-Come, First-Served)
- Process requests strictly in arrival order.
- Problem: "wild swings" (1 → 100 → 2 → 99) produce terrible seek times.
### B. SSTF (Shortest Seek Time First)
- Always service the closest pending request next.
- Problem: starvation. If requests keep arriving for tracks 10-20, a request for track 100 may never be served.
### C. SCAN (The Elevator)
- Move continuously from 0 → 100, servicing requests along the way, then reverse from 100 → 0.
- Problem: middle tracks get serviced up to twice as often as the edges (the head passes them on both the up and the down sweep).
### D. C-SCAN (Circular SCAN)
- Move 0 → 100, servicing requests, then jump straight back to 0 without servicing anything on the return trip.
- Pro: more uniform wait times across all tracks.
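To make the trade-offs concrete, here is a small simulation of my own (a sketch, not kernel code) that totals head movement for each policy on the same request queue. Note one simplification: `totalMovement` counts C-SCAN's return jump as movement, which textbook analyses often exclude.

```go
package main

import (
	"fmt"
	"sort"
)

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}

// totalMovement sums |from - to| over a visit order, starting at `start`.
func totalMovement(start int, order []int) int {
	total, pos := 0, start
	for _, t := range order {
		total += abs(t - pos)
		pos = t
	}
	return total
}

// sstfOrder greedily picks the closest pending track each time.
func sstfOrder(start int, reqs []int) []int {
	pending := append([]int(nil), reqs...)
	pos, order := start, []int{}
	for len(pending) > 0 {
		best := 0
		for i, t := range pending {
			if abs(t-pos) < abs(pending[best]-pos) {
				best = i
			}
		}
		pos = pending[best]
		order = append(order, pos)
		pending = append(pending[:best], pending[best+1:]...)
	}
	return order
}

// scanOrder sweeps upward from the head position, then reverses.
func scanOrder(start int, reqs []int) []int {
	var up, down []int
	for _, t := range reqs {
		if t >= start {
			up = append(up, t)
		} else {
			down = append(down, t)
		}
	}
	sort.Ints(up)
	sort.Sort(sort.Reverse(sort.IntSlice(down)))
	return append(up, down...)
}

// cscanOrder sweeps upward, jumps to the lowest track, and sweeps up again.
func cscanOrder(start int, reqs []int) []int {
	var up, below []int
	for _, t := range reqs {
		if t >= start {
			up = append(up, t)
		} else {
			below = append(below, t)
		}
	}
	sort.Ints(up)
	sort.Ints(below)
	return append(up, below...)
}

func main() {
	start := 50
	reqs := []int{10, 95, 40, 80, 20}

	fmt.Println("FCFS:  ", totalMovement(start, reqs))
	fmt.Println("SSTF:  ", totalMovement(start, sstfOrder(start, reqs)))
	fmt.Println("SCAN:  ", totalMovement(start, scanOrder(start, reqs)))
	fmt.Println("C-SCAN:", totalMovement(start, cscanOrder(start, reqs)))
}
```

With this queue and the head at track 50, FCFS moves 280 tracks, SSTF 125, SCAN 130, and C-SCAN 160 (including the return jump). The arrival-order policy pays roughly double what any position-aware policy does.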
## 3. Interactive: The Elevator Visualizer
Be the scheduler. Add requests and see how the head moves.
## 4. The Modern Era: SSDs and NVMe
Wait… SSDs don't have moving heads, so why do we still have I/O schedulers?
Actually, for SSDs we often disable complex scheduling entirely, using the `noop` or `none` scheduler.
Why?
- Seek is free: reading address 0 or address 1,000,000 takes the same time.
- Internal parallelism: SSDs contain multiple NAND chips. We want to issue requests in parallel, not serialize them into a SCAN line.
### Multi-Queue Block Layer (blk-mq)
Modern Linux uses blk-mq. Instead of one global request queue (which requires a global lock), there is one software queue per CPU core:
- Requests are pushed to the submitting core's software queue.
- They are then batched and dispatched to the NVMe hardware queues.
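The shape of the design can be sketched with channels standing in for the software and hardware queues. This is a toy model of the idea, not the kernel's implementation; the queue sizes and per-core request counts are arbitrary:

```go
package main

import (
	"fmt"
	"sync"
)

type Request struct {
	CPU   int
	Block int
}

// drain collects everything queued on a closed channel into one batch.
func drain(q chan Request) []Request {
	var batch []Request
	for r := range q {
		batch = append(batch, r)
	}
	return batch
}

func main() {
	const numCPUs = 4

	// One software queue per "CPU core": submissions never contend on a
	// shared lock, unlike the old single-queue block layer.
	softQueues := make([]chan Request, numCPUs)
	for i := range softQueues {
		softQueues[i] = make(chan Request, 8)
	}

	// One hardware dispatch queue here for simplicity; real NVMe devices
	// expose many (often one per core as well).
	hardQueue := make(chan []Request, numCPUs)

	var wg sync.WaitGroup
	for cpu := 0; cpu < numCPUs; cpu++ {
		wg.Add(1)
		go func(cpu int) {
			defer wg.Done()
			// Each core submits to its own queue...
			for b := 0; b < 3; b++ {
				softQueues[cpu] <- Request{CPU: cpu, Block: cpu*100 + b}
			}
			close(softQueues[cpu])
			// ...then hands the whole batch to hardware in one step.
			hardQueue <- drain(softQueues[cpu])
		}(cpu)
	}
	wg.Wait()
	close(hardQueue)

	for batch := range hardQueue {
		fmt.Printf("dispatched batch from CPU %d: %d requests\n",
			batch[0].CPU, len(batch))
	}
}
```

The key property the real design and this toy share: the hot path (a core submitting a request) touches only per-core state, and cross-core coordination happens once per batch, not once per request.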
### io_uring: Asynchronous I/O Done Right
A standard `read()` blocks the calling thread, and Linux's older native `aio` interface was widely considered broken.
io_uring (introduced in Linux 5.1) instead uses two ring buffers in memory shared between the application and the kernel:
- Submission Queue (SQ): the app pushes requests.
- Completion Queue (CQ): the kernel pushes results.

In polling mode, this allows I/O with zero syscall overhead.
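The mechanism can be sketched as two in-memory rings, one per direction. This is a single-threaded simulation of the idea (indices into a shared buffer, no syscalls on the data path), not the real liburing API:

```go
package main

import "fmt"

// Ring is a fixed-size FIFO, mimicking the shared-memory SQ/CQ layout:
// the producer only advances tail, the consumer only advances head.
type Ring struct {
	entries []int
	head    uint32 // consumer position
	tail    uint32 // producer position
}

func NewRing(size int) *Ring { return &Ring{entries: make([]int, size)} }

func (r *Ring) Push(v int) bool {
	if r.tail-r.head == uint32(len(r.entries)) {
		return false // ring full
	}
	r.entries[r.tail%uint32(len(r.entries))] = v
	r.tail++
	return true
}

func (r *Ring) Pop() (int, bool) {
	if r.head == r.tail {
		return 0, false // ring empty
	}
	v := r.entries[r.head%uint32(len(r.entries))]
	r.head++
	return v, true
}

func main() {
	sq := NewRing(8) // submission queue: app -> kernel
	cq := NewRing(8) // completion queue: kernel -> app

	// The app submits three "read block N" requests by writing to the SQ.
	for _, block := range []int{10, 12, 100} {
		sq.Push(block)
	}

	// The "kernel" drains the SQ and posts completions to the CQ.
	for {
		block, ok := sq.Pop()
		if !ok {
			break
		}
		cq.Push(block) // pretend the read finished
	}

	// The app reaps completions without any blocking call.
	for {
		block, ok := cq.Pop()
		if !ok {
			break
		}
		fmt.Println("completed read of block", block)
	}
}
```

Because both sides only ever read the other's index and write their own, the real thing needs no lock on either ring; with a kernel-side polling thread (SQPOLL), even the wakeup syscall disappears.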
## 5. Code Example: Request Merging Simulation (Go)
The OS tries to merge adjacent requests to reduce per-request overhead. Reading sectors 1-2 and then 3-4? Merge them into a single read of 1-4.
```go
package main

import (
	"fmt"
	"sort"
)

type IORequest struct {
	StartBlock int
	Count      int
}

func main() {
	// Incoming requests: blocks 10-11, block 100, blocks 12-13
	queue := []IORequest{
		{10, 2},  // Blocks 10, 11
		{100, 1}, // Block 100
		{12, 2},  // Blocks 12, 13
	}
	fmt.Println("Before Merge:", queue)
	merged := mergeRequests(queue)
	fmt.Println("After Merge: ", merged)
}

func mergeRequests(reqs []IORequest) []IORequest {
	if len(reqs) == 0 {
		return nil
	}
	// 1. Sort by StartBlock so adjacent ranges become neighbors.
	sort.Slice(reqs, func(i, j int) bool {
		return reqs[i].StartBlock < reqs[j].StartBlock
	})
	var result []IORequest
	current := reqs[0]
	for i := 1; i < len(reqs); i++ {
		next := reqs[i]
		// 2. If next starts exactly where current ends, merge them.
		if current.StartBlock+current.Count == next.StartBlock {
			current.Count += next.Count
		} else {
			result = append(result, current)
			current = next
		}
	}
	result = append(result, current)
	return result
}
```