Greedy Algorithms

Make the locally optimal choice at each stage with the hope of finding a global optimum. Simple idea — surprisingly powerful results.

Core Idea
A greedy algorithm builds a solution piece by piece, always choosing the next piece that offers the most obvious and immediate benefit. Unlike dynamic programming, it never backtracks.

The chess player thinking many moves ahead is not greedy. The Scrabble player who picks the best word right now is. Greedy algorithms exploit the Scrabble intuition — and when the problem has the right structure, the locally optimal choices cascade into a globally optimal solution.

Key properties that make greedy work: Greedy choice property (a global optimum can be reached by making locally optimal choices) and Optimal substructure (an optimal solution contains optimal solutions to sub-problems).

Greedy algorithms often produce sub-optimal results — but the algorithms in this unit are all optimal. The running time usually depends on the choice of underlying data structure.

Binary Heaps

Min Binary Heap
A min binary heap is a complete binary tree stored as an array where every node's key is ≤ the keys of its children. The minimum element is always at the root.

For a heap of n elements, the tree has depth ⌊log₂ n⌋. Three fundamental operations:

Insert-Key — O(log n)

Place the new element at the bottom (first available position). If it is smaller than its parent, swap them. Repeat until the heap property is restored ("bubble up"). At most ⌊log₂ n⌋ swaps.

Delete-Min — O(log n)

Return the root (minimum). Move the last element to the root, then "sift down" — swap with the smaller child until heap property holds. At most ⌊log₂ n⌋ swaps.

Build-Heap — O(n log n)

Insert n elements one by one. Each insertion is O(log n), giving O(n log n) total. (A smarter bottom-up build is O(n).)

Operation      Time         How
Insert         O(log n)     Bubble up from insertion point
Delete-Min     O(log n)     Sift down from root
Find-Min       O(1)         Always at index 0
Decrease-Key   O(log n)     Update + bubble up
Build-Heap     O(n log n)   n inserts
Array Representation
For node at index i (1-indexed): parent at ⌊i/2⌋, left child at 2i, right child at 2i+1.
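As a concrete sketch, here is a minimal Python min-heap over a plain list. It is 0-indexed, so the formulas above shift by one: parent of i is (i−1)//2, children are 2i+1 and 2i+2. The class and method names are illustrative, not from any library:

```python
class MinHeap:
    """Array-backed min binary heap. Minimum is always at index 0."""

    def __init__(self):
        self.a = []

    def insert(self, key):                  # O(log n): bubble up
        self.a.append(key)
        i = len(self.a) - 1
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:
            self.a[i], self.a[(i - 1) // 2] = self.a[(i - 1) // 2], self.a[i]
            i = (i - 1) // 2

    def delete_min(self):                   # O(log n): sift down
        root = self.a[0]
        last = self.a.pop()
        if self.a:
            self.a[0] = last
            i, n = 0, len(self.a)
            while True:
                smallest = i
                for c in (2 * i + 1, 2 * i + 2):
                    if c < n and self.a[c] < self.a[smallest]:
                        smallest = c
                if smallest == i:
                    break
                self.a[i], self.a[smallest] = self.a[smallest], self.a[i]
                i = smallest
        return root
```

Repeatedly calling delete_min drains the heap in sorted order — which is exactly heapsort.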

Minimum Spanning Trees

Definition
For undirected graph G = (V, E) with weight function w : E → ℝ≥0, a spanning tree is a connected acyclic subgraph with all vertices. A Minimum Spanning Tree (MST) minimizes the total edge weight.

Key insight: the optimal set of edges cannot contain a cycle, because removing a cycle edge would reduce cost without breaking connectivity. So the solution is a tree.

Tree Properties

Property 1: Removing a cycle edge cannot disconnect a graph.

Property 2: A tree on n nodes has exactly n−1 edges.

Property 3: Any connected undirected graph with |E| = |V|−1 is a tree.

Property 4: There is a unique path between any pair of nodes in a tree.

Kruskal's Algorithm

Start with an empty graph and repeatedly add the lightest edge that doesn't create a cycle.

procedure kruskal(G, w):
    for all u ∈ V:
        makeset(u)                  // each vertex is its own component
    X = {}
    sort edges E by weight (ascending)
    for each edge {u, v} ∈ E, in increasing order of weight:
        if find(u) ≠ find(v):       // different components → no cycle
            add {u, v} to X
            union(u, v)
    return X
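A runnable Python version of the same procedure, with a small inline union-find (using path halving for brevity; the full structure appears later in these notes). Vertices are integers 0..n−1 and the function name is illustrative:

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v). Returns (total_weight, chosen_edges)."""
    parent = list(range(n))

    def find(x):                     # follow parent pointers, halving the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):    # lightest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                 # different components: no cycle is formed
            parent[ru] = rv
            chosen.append((u, v, w))
            total += w
    return total, chosen
```

On the textbook graph of Exercise 5.1 (vertices A–F mapped to 0–5), this returns an MST of 5 edges and total cost 16.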
Correctness — Cut Property
If edges X are part of some MST, and e is the minimum-weight edge crossing some cut where X has no crossing edges, then X ∪ {e} is part of some MST.

When Kruskal picks edge e connecting components T₁ and T₂, it is the lightest edge between T₁ and V − T₁ — satisfying the cut property.

Time Complexity

Sorting edges: O(E log E) = O(E log V), since E ≤ V² implies log E ≤ log V² = 2 log V.

Union-Find operations: O(E log V) with union by rank.

With path compression: approaches O(E α(V)) ≈ nearly linear.

Overall: O(E log V)

Prim's Algorithm

Grow a single tree from a source vertex. At each step, add the cheapest edge that connects the current tree to a vertex not yet in the tree.

procedure prim(G, w):
    for all u ∈ V:
        cost(u) = ∞; prev(u) = nil
    pick any initial node u₀; cost(u₀) = 0
    H = makequeue(V)                // priority queue keyed on cost
    while H is not empty:
        v = deletemin(H)
        for each {v, z} ∈ E with z ∈ H:   // only update vertices not yet in the tree
            if cost(z) > w(v, z):
                cost(z) = w(v, z)
                prev(z) = v
                decreasekey(H, z)

The MST is encoded in the prev[] array when the algorithm finishes.
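A Python sketch of Prim's. The standard-library heap (`heapq`) has no decrease-key, so this version pushes a new entry whenever a cheaper connecting edge is found and discards stale entries on extraction — a common substitute that preserves the O(m log n) bound. The adjacency-dict format and function name are assumptions of this sketch:

```python
import heapq

def prim_total(adj):
    """adj: {u: [(v, w), ...]} for an undirected graph. Returns MST weight."""
    start = next(iter(adj))
    in_tree = {start}
    total = 0
    pq = [(w, v) for v, w in adj[start]]    # candidate edges into the frontier
    heapq.heapify(pq)
    while pq and len(in_tree) < len(adj):
        w, v = heapq.heappop(pq)
        if v in in_tree:
            continue                        # stale: v already reached by a cheaper edge
        in_tree.add(v)
        total += w
        for z, wz in adj[v]:
            if z not in in_tree:
                heapq.heappush(pq, (wz, z))
    return total
```

On the Exercise 5.1 graph this agrees with Kruskal's: total MST weight 16.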

Key Difference from Dijkstra

Prim stores the weight of the cheapest edge to each unvisited vertex. Dijkstra stores the total distance from source. Same structure, different key values.

Time Complexity — same as Dijkstra

Operation      Count       Cost
make-queue     1           O(n log n)
delete-min     n           O(n log n)
decrease-key   m (edges)   O(m log n)

Total: O((n + m) log n) where n = |V|, m = |E|.

The Cut Property

Cut Property (Formal)
Suppose edges X ⊆ E are part of some MST of G = (V, E). Pick any subset S ⊂ V for which X has no edges crossing between S and V − S. Let e be the lightest edge across this partition. Then X ∪ {e} is part of some MST.

Proof sketch: If e is already in the MST, done. Otherwise, add e to the MST — it creates a cycle. This cycle must contain some other edge e' crossing the same cut. Remove e': we get a new spanning tree T' = T ∪ {e} − {e'}. Since e is the lightest crossing edge, w(e) ≤ w(e'), so weight(T') ≤ weight(T). Since T was an MST, T' is too. ∎

This property justifies any algorithm conforming to: pick a cut with no current tree edges across it, then add the minimum-weight crossing edge.

Disjoint Sets (Union-Find)

Kruskal's algorithm needs to efficiently check whether two vertices are in the same component. The union-find data structure does this.

Operations

makeset(x): Create a singleton set {x}. Sets parent and rank to 0.

find(x): Return the representative (root) of x's set. Follow parent pointers to root.

union(x, y): Merge the sets containing x and y.

Union by Rank

Make the root of the shorter tree point to the root of the taller tree. This ensures tree height stays at most O(log n).

function find(x):
    while x ≠ π(x):
        x = π(x)
    return x

procedure union(x, y):
    rx = find(x); ry = find(y)
    if rx = ry: return
    if rank(rx) > rank(ry):
        π(ry) = rx
    else:
        π(rx) = ry
        if rank(rx) = rank(ry):
            rank(ry) = rank(ry) + 1

Path Compression

During each find, make all nodes on the path point directly to the root. This flattens the tree over time.

function find(x):
    if x ≠ π(x):
        π(x) = find(π(x))       // recurse, then point directly to the root
    return π(x)
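Both optimizations fit in one minimal Python class mirroring the pseudocode (π becomes a parent list; on equal-rank ties the first argument's root points to the second's):

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])   # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] > self.rank[ry]:                # union by rank
            self.parent[ry] = rx
        else:
            self.parent[rx] = ry
            if self.rank[rx] == self.rank[ry]:
                self.rank[ry] += 1
```

With this tie-breaking, the operation sequence of Exercise 5.11 below ends with every find returning root 8.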
Complexity with Both Optimizations

With union by rank and path compression, any sequence of m union/find operations on n elements runs in O(m log* n) time.

log* n (iterated logarithm) is ≤ 5 for any practical input — essentially constant.

Dijkstra's Algorithm (SSSP)

Single Source Shortest Path Problem
Given a directed graph G = (V, E) with non-negative edge lengths l : E → ℝ≥0 and source s, find the shortest path from s to every vertex v.
procedure dijkstra(G, l, s):
    for all v ∈ V:
        d(v) = ∞
    d(s) = 0; S = {}
    H = makequeue(V)                // priority queue keyed on d(v)
    while H is not empty:
        u = deletemin(H)            // unvisited vertex with smallest d
        S = S ∪ {u}
        for each edge (u, v) ∈ E:
            if d(v) > d(u) + l(u, v):
                d(v) = d(u) + l(u, v)
                prev(v) = u
                decreasekey(H, v)
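A runnable Python sketch using `heapq`; like the Prim sketch earlier, it replaces decrease-key with duplicate heap entries that are skipped when stale. The adjacency-dict format is an assumption of this sketch:

```python
import heapq

def dijkstra(adj, s):
    """adj: {u: [(v, w), ...]} with non-negative w. Returns {v: dist from s}."""
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale entry: u was settled with smaller d
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd              # relax edge (u, v)
                heapq.heappush(pq, (nd, v))
    return dist
```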
Correctness (by Induction)
When vertex v is added to S, d(v) equals the true shortest-path distance from s to v.

Induction step: when v is extracted, any path P from s to v must cross some edge (x, y) with x ∈ S, y ∉ S. By induction d(x) is correct, and y has already been relaxed along (x, y). Since all edge lengths are non-negative: l(P) ≥ d(x) + l(x, y) ≥ d(y) ≥ d(v), the last step because v had the smallest key in the queue. So d(v) is the true distance. ∎
⚠ Critical Constraint
Dijkstra's algorithm requires non-negative edge weights. Negative edges break the greedy assumption that once a vertex is settled, its distance is final.

Time Complexity (with binary heap)

Operation      Count   Cost each    Total
make-queue     1       O(n log n)   O(n log n)
delete-min     n       O(log n)     O(n log n)
decrease-key   m       O(log n)     O(m log n)

Total: O((m + n) log n)

Naïve implementation (no heap): O(V²). With Fibonacci heap: O(m + n log n).

Huffman Encoding

Problem
Given symbol frequencies f₁, f₂, …, fₙ, find a prefix-free binary encoding that minimizes the total encoding cost:
cost = Σᵢ fᵢ · (depth of symbol i in tree)

Prefix-free: No codeword is a prefix of another. Represented by a full binary tree where symbols are at leaves.

Key insight: The two symbols with the smallest frequencies must be at the deepest level of the optimal tree, as siblings. This gives the greedy strategy.

procedure Huffman(f[1..n]):
    H = min-priority-queue ordered on f
    for i = 1 to n:
        insert(H, i)
    for k = n+1 to 2n−1:
        i = deletemin(H); j = deletemin(H)
        create node k with children i, j
        f[k] = f[i] + f[j]
        insert(H, k)
    return tree rooted at k

Running time: O(n log n) using a binary heap.
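A compact Python sketch that computes just the codeword lengths (which determine the cost). Heap entries carry a counter so that ties in weight never fall through to comparing the dict payloads; the function name is illustrative:

```python
import heapq

def huffman_depths(freqs):
    """freqs: {symbol: weight}. Returns {symbol: codeword length}."""
    pq = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(pq)
    counter = len(pq)
    while len(pq) > 1:
        w1, _, d1 = heapq.heappop(pq)    # two lightest subtrees
        w2, _, d2 = heapq.heappop(pq)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}  # one level deeper
        heapq.heappush(pq, (w1 + w2, counter, merged))
        counter += 1
    return pq[0][2]
```

On the ACGT frequencies this yields depths T=1, A=2, C=3, G=3, matching the table below (cost 189 per 100 symbols, i.e. 1.89 bits/symbol).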

Alternative Cost Formulation

The cost of the tree equals the sum of the frequencies of all internal nodes, root included (equivalently, the sum of the frequencies of every node except the root). Each time we merge two subtrees, we pay the sum of their frequencies, and that sum becomes the frequency of a new internal node.

Connection to Entropy

For a probability distribution {p₁, …, pₙ}, the entropy is:

H = Σᵢ pᵢ log₂(1/pᵢ)

This is the theoretical minimum average bits per symbol. Huffman encoding achieves a per-symbol cost close to entropy — it is near-optimal. Higher entropy = more randomness = less compressible. A fair coin has entropy 1 bit; a biased coin has less.
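These claims are easy to check numerically; a small Python snippet (entropy in bits, function name illustrative):

```python
from math import log2

def entropy(ps):
    """Entropy in bits of a discrete distribution (zero-probability terms add 0)."""
    return sum(p * log2(1 / p) for p in ps if p > 0)
```

A fair coin gives 1 bit; a 90/10 coin gives about 0.47 bits. The ACGT distribution below has entropy ≈ 1.83 bits, just under the 1.89 bits/symbol that Huffman achieves for it.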

Example: ACGT Encoding

Symbol   Frequency   Huffman Code
T        40%         0
A        31%         10
C        20%         110
G        9%          111

Build: merge G(9)+C(20)=GC(29), then GC(29)+A(31)=GCA(60), then GCA(60)+T(40)=root(100).

Horn Formulas

Definition
A Horn formula consists of Boolean variables with two types of clauses:
  • Implications: (x₁ ∧ x₂ ∧ … xₖ) ⇒ y — if all left-side literals are true, then y must be true.
  • Pure negative clauses: (ū ∨ v̄ ∨ ȳ) — at least one of these must be false.

Goal: Find a satisfying assignment (true/false for each variable) that satisfies all clauses.

Algorithm (Greedy / "Stingy"):
    set all variables to false
    while there is an unsatisfied implication:
        set its right-hand-side variable to true
    if all pure negative clauses are satisfied:
        return the assignment
    else:
        return "not satisfiable"
Correctness
The algorithm is correct because it maintains the invariant: any variable set to true must be true in any satisfying assignment. If the final assignment fails a negative clause, no satisfying assignment exists.

Horn formulas are the foundation of Prolog (logic programming). The algorithm runs in linear time in the length of the formula using a directed graph of implications.
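A direct Python sketch of the stingy algorithm. For simplicity it rescans all implications until nothing changes (quadratic in the formula length); the linear-time version uses the implication graph mentioned above. The data format is an assumption of this sketch:

```python
def horn_sat(implications, negatives):
    """implications: list of (set_of_antecedents, consequent); an empty antecedent
    set means the consequent is forced true.
    negatives: list of sets of variables, at least one of which must be false.
    Returns the set of true variables, or None if unsatisfiable."""
    true = set()
    changed = True
    while changed:
        changed = False
        for body, head in implications:
            if body <= true and head not in true:
                true.add(head)           # stingy: set a variable only when forced
                changed = True
    if all(not clause <= true for clause in negatives):
        return true                      # every negative clause has a false variable
    return None                          # not satisfiable
```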

Set Cover

Problem
Given a universe B and sets S₁, S₂, …, Sₘ ⊆ B, find the minimum number of sets whose union equals B.
Greedy Algorithm:
    repeat until all elements of B are covered:
        pick the set Sᵢ with the largest number of uncovered elements
⚠ Not Optimal!
The greedy algorithm is not guaranteed to find the minimum cover. In the textbook's school-placement example, the optimal solution uses 3 sets while greedy picks 4.
Approximation Guarantee
If the optimal cover uses k sets and |B| = n, the greedy algorithm uses at most k ln n sets.

Proof: After t iterations, at most n(1 − 1/k)ᵗ ≤ ne^{−t/k} elements remain uncovered. At t = k ln n, fewer than 1 element remains → all covered. ∎

The approximation factor of ln n is tight: under standard complexity assumptions, no polynomial-time algorithm can achieve a smaller approximation factor for Set Cover. The simple greedy rule is therefore essentially the best possible.
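A Python sketch of the greedy rule, run on a small made-up instance where it is suboptimal: the optimal cover is the 2 sets {1,2,5} and {3,4,6}, but greedy grabs the big set first and needs 3:

```python
def greedy_set_cover(universe, sets):
    """Repeatedly pick the set covering the most still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda s: len(s & uncovered))
        if not best & uncovered:
            return None                  # remaining elements cannot be covered
        chosen.append(best)
        uncovered -= best
    return chosen
```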


Practice Questions

From Dasgupta, Papadimitriou & Vazirani, Chapter 5 — answers verified.

Exercise 5.13

Huffman Encoding — ACGT Frequencies

A long string consists of the four characters A, C, G, T with frequencies 31%, 20%, 9%, and 40% respectively. Find the Huffman encoding of these four characters.

Step 1: List the symbols sorted by frequency (ascending).
G: 9%, C: 20%, A: 31%, T: 40% — sorted from least to most frequent.
Step 2: Merge the two lowest-frequency symbols. What is the new combined node?
Merge G (9%) and C (20%) → node GC with frequency 29%.
Remaining queue: GC(29%), A(31%), T(40%).
Step 3: Merge the next two lowest. What happens now?
Merge GC (29%) and A (31%) → node GCA with frequency 60%.
Remaining queue: T(40%), GCA(60%).
Step 4: Merge the final two nodes to form the root.
Merge T (40%) and GCA (60%) → root with frequency 100%.
Step 5: Assign codes by tracing paths (0 = left, 1 = right). What are the final codewords?
T: 0 (depth 1), A: 10 (depth 2), C: 110 (depth 3), G: 111 (depth 3).

Expected bits per symbol = 0.40×1 + 0.31×2 + 0.20×3 + 0.09×3 = 0.40 + 0.62 + 0.60 + 0.27 = 1.89 bits.
Fixed-length encoding would need 2 bits — so Huffman saves ~5.5%.
Exercise 5.1 (Adapted)

MST Cost — Kruskal's on a Sample Graph

Consider the graph from the textbook (Fig 5.1): vertices A–F with edges B–C(1), B–D(2), C–D(3), C–F(4), D–F(4), E–F(4), A–D(5), A–B(5), C–E(5), A–C(6). Run Kruskal's algorithm.

Step 1: Sort all edges by weight. List them in order.
B–C(1), B–D(2), C–D(3), C–F(4), D–F(4), E–F(4), A–D(5), A–B(5), C–E(5), A–C(6).
Step 2: Process B–C(1). Is a cycle formed? Add or skip?
B and C are in different components. Add B–C(1). Components: {B,C}, {A}, {D}, {E}, {F}.
Step 3: Process B–D(2). Add or skip?
B and D are in different components. Add B–D(2). Components: {B,C,D}, {A}, {E}, {F}.
Step 4: Process C–D(3). Add or skip?
C and D are both in {B,C,D} — same component. Skip (would form cycle).
Step 5: Process edges C–F(4), D–F(4), E–F(4). Which are added?
Add C–F(4): {B,C,D,F}, {A}, {E}.
Skip D–F(4): D and F now in same component.
Add E–F(4): {B,C,D,E,F}, {A}.
Step 6: We have 4 edges, need 5 (n−1 = 6−1). What edge connects A?
Add A–D(5): connects {A} to {B,C,D,E,F}. Now all vertices are connected.

MST edges: B–C(1), B–D(2), C–F(4), E–F(4), A–D(5). Total cost = 16.
Exercise 5.5

MST & Shortest Paths — Edge Weight Increase

You have computed an MST and shortest paths from s. Each edge weight increases by 1 (w'ₑ = wₑ + 1). (a) Does the MST change? (b) Do the shortest paths change?

Part (a): Think about what the MST criterion depends on. If every edge weight increases by 1, does the relative ordering of spanning trees change?
The MST does NOT change. Since every spanning tree has exactly n−1 edges, adding 1 to every edge weight increases every spanning tree's cost by the same amount (n−1), so the relative order of spanning trees is preserved and the minimum spanning tree remains the same tree.
Part (b): Think about paths of different lengths. If two paths have 2 vs 3 edges, does increasing all weights by 1 affect them equally?
Shortest paths CAN change. A path with k edges increases in cost by k. Paths with different numbers of edges are affected differently. Example: if the shortest path s→t had 3 edges and an alternative had 2 edges with slightly higher original weight, the alternative may become shorter after the +1 increase. So yes, shortest paths can change.
Exercise 5.9 (Selected)

True/False MST Properties

Determine if each statement is True or False. Prove or give a counterexample.

(c) Let e be any edge of minimum weight in G. Must e be part of some MST?
TRUE. The minimum-weight edge e forms a cut {u} vs V−{u}. By the cut property, the lightest edge across this cut is in some MST. Since e is the global minimum, it is the lightest across this cut, so it must appear in some MST.
(d) If the lightest edge is unique, must it be in every MST?
TRUE. Suppose for contradiction that some MST T does not contain e. By the cycle property, adding e to T creates a cycle. This cycle contains another edge e' with w(e') ≥ w(e). But since e is the unique minimum-weight edge, w(e) < w(e'). So T − {e'} ∪ {e} is a spanning tree with lower cost — contradiction with T being an MST. Thus every MST must contain e.
(g) Is the shortest-path tree from Dijkstra's necessarily an MST?
FALSE. Counterexample: three vertices A, B, C with edges A–B(2), A–C(2), B–C(1). Dijkstra from A uses A–B and A–C (each at distance 2), giving a shortest-path tree of weight 4. The MST is B–C(1) together with either A–B or A–C, for weight 3. The shortest-path tree is not an MST, so the two need not coincide.
(i) Does Prim's algorithm work correctly with negative edge weights?
TRUE. Prim's correctness relies only on the cut property, not on non-negativity of weights. The cut property holds for any edge weights. Dijkstra's fails with negatives, but Prim's does not — the key difference is that Prim stores edge weights, not cumulative path distances.
Exercise 5.6

Unique MST with Distinct Edge Weights

Prove: if all edge weights in G are distinct, then G has a unique MST.

Step 1: Assume two MSTs T₁ and T₂ exist. How would you derive a contradiction?
Since T₁ ≠ T₂, the symmetric difference T₁ △ T₂ is non-empty. Let e be its minimum-weight edge; WLOG e ∈ T₁ \ T₂. Consider what happens when you add e to T₂.
Step 2: Adding e to T₂ creates a cycle C. What can you say about the edges of C?
Cycle C consists of e together with edges of T₂. Since T₁ is acyclic, C cannot lie entirely inside T₁, so C contains at least one edge e' ∈ T₂ \ T₁.
Step 3: Compare w(e) and w(e'). Use the choice of e and the distinctness of the weights.
Both e and e' lie in T₁ △ T₂, and e was chosen as its minimum-weight edge; since all weights are distinct, w(e) < w(e'). Then T₂ ∪ {e} − {e'} is a spanning tree of strictly smaller weight than T₂ — contradicting that T₂ is an MST. ∎


Exercise 5.7

Maximum Spanning Tree

Show how to find the maximum spanning tree of a graph — the spanning tree with the largest total weight.

✓ Answer

Method 1 — Negate weights: Multiply every edge weight by −1, then run any MST algorithm (Kruskal's or Prim's). The resulting MST on the negated graph is the maximum spanning tree on the original graph.

Method 2 — Reverse Kruskal's: Sort edges in decreasing order of weight. Add an edge if it doesn't form a cycle (using Union-Find exactly as in standard Kruskal's). Stop after n−1 edges.

Correctness follows from a symmetric "heavy cut property": the heaviest edge across any cut must be in some maximum spanning tree.

Exercise 5.14

Huffman Encoding — Specific Frequencies

Symbols a, b, c, d, e occur with frequencies 1/2, 1/4, 1/8, 1/16, 1/16. (a) What is the Huffman encoding? (b) If applied to 1,000,000 characters, what is the length of the encoded file in bits?

✓ Answer

(a) Build the tree:

Start: d(1/16), e(1/16), c(1/8), b(1/4), a(1/2).

Merge d+e → de(1/8). Queue: de(1/8), c(1/8), b(1/4), a(1/2).

Merge de+c → dec(1/4). Queue: dec(1/4), b(1/4), a(1/2).

Merge dec+b → decb(1/2). Queue: decb(1/2), a(1/2).

Merge decb+a → root(1).

Codes: a=0 (depth 1), b=10 (depth 2), c=110 (depth 3), d=1110 (depth 4), e=1111 (depth 4).

(b) Total bits = m × Σ pᵢ × depthᵢ
= 1,000,000 × (1/2×1 + 1/4×2 + 1/8×3 + 1/16×4 + 1/16×4)
= 1,000,000 × (0.5 + 0.5 + 0.375 + 0.25 + 0.25)
= 1,000,000 × 1.875 = 1,875,000 bits.

Exercise 5.11

Union-Find Sequence

Starting from singletons {1},...,{8}, apply union-by-rank with path compression. Perform: union(1,2), union(3,4), union(5,6), union(7,8), union(1,4), union(6,7), union(4,5), find(1). Show the final state. Tie-breaking: lower-numbered root points to higher-numbered root.

✓ Answer

After each operation (parent[i] shown, lower→higher on ties):

union(1,2): rank equal → 1 points to 2. π(1)=2, rank(2)=1.

union(3,4): 3 → 4. rank(4)=1.

union(5,6): 5 → 6. rank(6)=1.

union(7,8): 7 → 8. rank(8)=1.

union(1,4): find(1)=2, find(4)=4. Both rank 1 → lower points to higher: 2 → 4. rank(4)=2.

union(6,7): find(6)=6, find(7)=8. Both rank 1 → 6 → 8. rank(8)=2.

union(4,5): find(4) = 4 (rank 2); find(5) follows 5→6→8, returning 8 (rank 2), and path compression sets π(5)=8. Equal ranks → 4 → 8. rank(8)=3.

find(1): 1→2→4→8. Path compression: π(1)=8, π(2)=8, π(4)=8. Returns 8.

Final parent array: π = [_, 8, 8, 4, 8, 8, 8, 8, 8] (1-indexed). Every element points directly to root 8 except 3, whose find still passes through 4.

Conceptual

Why Greedy Fails for 0/1 Knapsack but Works for MST?

Greedy algorithms work optimally for MST (Kruskal/Prim) and Huffman encoding, but fail for the 0/1 Knapsack problem. Explain the key structural difference.

✓ Answer

MST and Huffman have the "greedy choice property" + "optimal substructure":

For MST: The cut property guarantees that the locally cheapest safe edge is always in some global optimum. Once you pick it, the remaining problem is structurally identical — a smaller MST problem. Each greedy choice is globally safe.

For Huffman: The two rarest symbols must be deepest in any optimal tree (provable by exchange argument). Merging them and solving the reduced problem is equivalent — the structure telescopes.

0/1 Knapsack lacks this: Taking the highest value-to-weight item greedily can "fill" the knapsack in a way that prevents fitting a globally better combination. There's no cut property or exchange argument — choices interact with each other in complex ways that only dynamic programming can resolve optimally.

The formal criterion: greedy works when the problem has a matroid structure — a hereditary independence system with the exchange property. MST over a graphic matroid is the classic case.

Exercise 5.31

Optimal Customer Ordering

A server has n customers. Customer i requires tᵢ minutes of service. Minimize total waiting time T = Σᵢ (time spent waiting by customer i). Give an efficient algorithm and prove optimality.

✓ Answer

Algorithm: Sort customers in increasing order of service time tᵢ (Shortest Job First). Serve in that order.

Time complexity: O(n log n) for sorting.

Proof of optimality (exchange argument): Consider any optimal ordering. If two adjacent customers i and j are out of order (tᵢ > tⱼ but i comes before j), then swapping them reduces total waiting time. If i is served before j, customer j waits an extra tᵢ more than necessary. Swapping saves tᵢ − tⱼ > 0 in total waiting. Therefore the optimal ordering has no such inversion — it is sorted by increasing service time. ∎

Formula: If the order is t₍₁₎ ≤ t₍₂₎ ≤ … ≤ t₍ₙ₎, then total wait = (n−1)t₍₁₎ + (n−2)t₍₂₎ + … + 0·t₍ₙ₎ = Σᵢ (n−i)t₍ᵢ₎.
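The shortest-job-first rule and the closing formula are easy to sanity-check in Python (function names are illustrative):

```python
def total_wait(times):
    """Total waiting time when customers are served in the given order."""
    total, elapsed = 0, 0
    for t in times:
        total += elapsed      # this customer waits for everyone served before them
        elapsed += t
    return total

def best_order(times):
    return sorted(times)      # shortest job first
```

For service times [3, 1, 2], serving in increasing order gives total wait 0 + 1 + 3 = 4, matching Σᵢ (n−i)t₍ᵢ₎ = 2·1 + 1·2 + 0·3; serving in the original order costs 7.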