Greedy Algorithms
Make the locally optimal choice at each stage with the hope of finding a global optimum. Simple idea — surprisingly powerful results.
The chess player thinking many moves ahead is not greedy. The Scrabble player who picks the best word right now is. Greedy algorithms exploit the Scrabble intuition — and when the problem has the right structure, the locally optimal choices cascade into a globally optimal solution.
Key properties that make greedy work: Greedy choice property (a global optimum can be reached by making locally optimal choices) and Optimal substructure (an optimal solution contains optimal solutions to sub-problems).
Greedy algorithms often produce sub-optimal results — but the algorithms in this unit are all optimal. The running time usually depends on the choice of underlying data structure.
Binary Heaps
For a heap of n elements, the tree has depth ⌊log₂ n⌋. Three fundamental operations:
Insert-Key — O(log n)
Place the new element at the bottom (first available position). If it is smaller than its parent, swap them. Repeat until the heap property is restored ("bubble up"). At most ⌊log₂ n⌋ swaps.
Delete-Min — O(log n)
Return the root (minimum). Move the last element to the root, then "sift down" — swap with the smaller child until heap property holds. At most ⌊log₂ n⌋ swaps.
Build-Heap — O(n log n)
Insert n elements one by one. Each insertion is O(log n), giving O(n log n) total. (A smarter bottom-up build is O(n).)
| Operation | Time | How |
|---|---|---|
| Insert | O(log n) | Bubble up from insertion point |
| Delete-Min | O(log n) | Sift down from root |
| Find-Min | O(1) | Always at index 0 |
| Decrease-Key | O(log n) | Update + bubble up |
| Build-Heap | O(n log n) | n inserts |
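The two core operations above can be sketched as a minimal array-based min-heap. This is an illustrative sketch, not a production container: the parent of index i lives at (i − 1) // 2 and its children at 2i + 1 and 2i + 2.

```python
class MinHeap:
    def __init__(self):
        self.a = []

    def insert(self, x):
        # Place at the bottom, then bubble up while smaller than the parent.
        self.a.append(x)
        i = len(self.a) - 1
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:
            self.a[i], self.a[(i - 1) // 2] = self.a[(i - 1) // 2], self.a[i]
            i = (i - 1) // 2

    def delete_min(self):
        # Return the root; move the last element to the root and sift down.
        a = self.a
        root = a[0]
        a[0] = a[-1]
        a.pop()
        i, n = 0, len(a)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < n and a[left] < a[smallest]:
                smallest = left
            if right < n and a[right] < a[smallest]:
                smallest = right
            if smallest == i:
                break
            a[i], a[smallest] = a[smallest], a[i]  # swap with the smaller child
            i = smallest
        return root
```

Repeatedly calling `delete_min` after n inserts yields the elements in sorted order, which is exactly heapsort in O(n log n).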
Minimum Spanning Trees
Key insight: the optimal set of edges cannot contain a cycle, because removing a cycle edge would reduce cost without breaking connectivity. So the solution is a tree.
Property 1: Removing a cycle edge cannot disconnect a graph.
Property 2: A tree on n nodes has exactly n−1 edges.
Property 3: Any connected undirected graph with |E| = |V|−1 is a tree.
Property 4: There is a unique path between any pair of nodes in a tree.
Kruskal's Algorithm
Start with an empty graph and repeatedly add the lightest edge that doesn't create a cycle.
When Kruskal picks edge e connecting components T₁ and T₂, it is the lightest edge between T₁ and V − T₁ — satisfying the cut property.
Time Complexity
Sorting edges: O(E log E) = O(E log V), since E ≤ V² implies log E ≤ 2 log V.
Union-Find operations: O(E log V) with union by rank.
With path compression: approaches O(E α(V)) ≈ nearly linear.
Overall: O(E log V)
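A minimal Python sketch of Kruskal's algorithm. To keep it short it uses a bare-bones union-find without the rank and path-compression optimizations covered in the Disjoint Sets section; vertex labels 0..n−1 and the (weight, u, v) edge format are assumptions of this sketch.

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) on vertices 0..n-1. Returns MST edges."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the component's root (no compression here).
        while parent[x] != x:
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):           # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                        # skip edges that would form a cycle
            parent[ru] = rv                 # naive union: link one root to the other
            mst.append((u, v, w))
    return mst
```

On a connected graph the loop adds exactly n − 1 edges (Property 2), so one could also break out early once `len(mst) == n - 1`.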
Prim's Algorithm
Grow a single tree from a source vertex. At each step, add the cheapest edge that connects the current tree to a vertex not yet in the tree.
The MST is encoded in the prev[] array when the algorithm finishes.
Key Difference from Dijkstra
Prim stores the weight of the cheapest edge to each unvisited vertex. Dijkstra stores the total distance from source. Same structure, different key values.
Time Complexity — same as Dijkstra
| Operation | Count | Total cost |
|---|---|---|
| make-queue | 1 | O(n log n) |
| delete-min | n | O(n log n) |
| decrease-key | m (edges) | O(m log n) |
Total: O((n + m) log n) where n = |V|, m = |E|.
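A sketch of Prim's algorithm with Python's `heapq`. Since `heapq` has no decrease-key operation, this version pushes duplicate entries and lazily skips stale ones, which gives the same O((n + m) log n) bound; the adjacency-list format is an assumption of the sketch.

```python
import heapq

def prim(adj, s=0):
    """adj: {u: [(v, w), ...]}, undirected, vertices 0..n-1.
    Returns (MST weight, prev array encoding the tree)."""
    n = len(adj)
    in_tree = [False] * n
    prev = [None] * n
    cost = [float('inf')] * n          # key: cheapest edge into the tree
    cost[s] = 0
    pq = [(0, s)]
    total = 0
    while pq:
        c, u = heapq.heappop(pq)
        if in_tree[u]:
            continue                   # stale entry (lazy decrease-key)
        in_tree[u] = True
        total += c
        for v, w in adj[u]:
            if not in_tree[v] and w < cost[v]:
                cost[v] = w            # edge weight, not path length (cf. Dijkstra)
                prev[v] = u
                heapq.heappush(pq, (w, v))
    return total, prev
```

The single line `cost[v] = w` is the only place this differs from Dijkstra, where the key would be `dist[u] + w`.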
The Cut Property
Proof sketch: If e is already in the MST, done. Otherwise, add e to the MST — it creates a cycle. This cycle must contain some other edge e' crossing the same cut. Remove e': we get a new spanning tree T' = T ∪ {e} − {e'}. Since e is the lightest crossing edge, w(e) ≤ w(e'), so weight(T') ≤ weight(T). Since T was an MST, T' is too. ∎
This property justifies any algorithm conforming to: pick a cut with no current tree edges across it, then add the minimum-weight crossing edge.
Disjoint Sets (Union-Find)
Kruskal's algorithm needs to efficiently check whether two vertices are in the same component. The union-find data structure does this.
makeset(x): Create a singleton set {x}. Sets x's parent pointer to x itself and its rank to 0.
find(x): Return the representative (root) of x's set. Follow parent pointers to root.
union(x, y): Merge the sets containing x and y.
Union by Rank
Make the root of the shorter tree point to the root of the taller tree. This ensures tree height stays at most O(log n).
Path Compression
During each find, make all nodes on the path point directly to the root. This flattens the tree over time.
With union by rank and path compression, any sequence of m union/find operations on n elements runs in O(m log* n) time.
log* n (iterated logarithm) is ≤ 5 for any practical input — essentially constant.
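Both optimizations fit in a few lines. A sketch in Python, with path compression done recursively inside `find`:

```python
class DisjointSets:
    def __init__(self, n):
        self.parent = list(range(n))   # makeset: each element is its own root
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            # Path compression: point x directly at the root on the way back.
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:   # union by rank: shorter under taller
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:  # equal heights: the merged tree grows
            self.rank[rx] += 1
```

Note that with path compression, rank becomes an upper bound on a tree's height rather than its exact height, which is why ranks are not updated during `find`.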
Dijkstra's Algorithm (SSSP)
Correctness invariant: when a vertex is removed from the queue, its tentative distance equals the true shortest-path distance from s. Induction step: let S be the set of finalized vertices, and let v be the next vertex to be finalized, reached from u ∈ S along path Pᵥ with l(Pᵥ) = d(u) + l(u, v). Any path P from s to v must cross some edge (x, y) from S to V−S. Since all weights are non-negative and v minimizes d(x) + l(x, y) over crossing edges: l(P) ≥ d(x) + l(x, y) ≥ d(u) + l(u, v) = l(Pᵥ). ∎
Time Complexity (with binary heap)
| Operation | Count | Cost each | Total |
|---|---|---|---|
| make-queue | 1 | O(n log n) | O(n log n) |
| delete-min | n | O(log n) | O(n log n) |
| decrease-key | m | O(log n) | O(m log n) |
Total: O((m + n) log n)
Naïve implementation (no heap): O(V²). With Fibonacci heap: O(m + n log n).
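A binary-heap sketch of Dijkstra in Python, using the same lazy-deletion trick as Prim above since `heapq` lacks decrease-key; the dict-based adjacency format is an assumption of the sketch.

```python
import heapq

def dijkstra(adj, s):
    """adj: {u: [(v, w), ...]} with non-negative weights w.
    Returns shortest distances from s to every vertex."""
    dist = {u: float('inf') for u in adj}
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                    # stale entry, already finalized
        for v, w in adj[u]:
            if d + w < dist[v]:         # relax edge (u, v)
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

Recording `prev[v] = u` inside the relaxation step would recover the shortest-path tree, exactly as in Prim.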
Huffman Encoding
Prefix-free: No codeword is a prefix of another. Represented by a full binary tree where symbols are at leaves.
Key insight: There is an optimal tree in which the two symbols with the smallest frequencies sit at the deepest level as siblings. This gives the greedy strategy: repeatedly merge the two least-frequent subtrees.
Running time: O(n log n) using a binary heap.
Alternative Cost Formulation
The cost of the tree equals the sum of the frequencies of all nodes, leaves and internal, except the root. Each time we merge two subtrees we create one internal node and pay the sum of their frequencies, so the total paid over all merges is exactly this sum.
Connection to Entropy
For a probability distribution {p₁, …, pₙ}, the entropy is H = Σᵢ pᵢ log₂(1/pᵢ).
This is the theoretical minimum average bits per symbol. Huffman encoding achieves a per-symbol cost close to entropy — it is near-optimal. Higher entropy = more randomness = less compressible. A fair coin has entropy 1 bit; a biased coin has less.
Example: ACGT Encoding
| Symbol | Frequency | Huffman Code |
|---|---|---|
| T | 40% | 0 |
| A | 31% | 10 |
| C | 20% | 110 |
| G | 9% | 111 |
Build: merge G(9)+C(20)=GC(29), then GC(29)+A(31)=GCA(60), then GCA(60)+T(40)=root(100).
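The build process can be sketched with a heap of partial trees. Exact bit patterns depend on tie-breaking and child order, so this sketch is only guaranteed to reproduce the code *lengths* in the table above (1, 2, 3, 3), which is what determines the cost.

```python
import heapq

def huffman(freqs):
    """freqs: {symbol: frequency}. Returns {symbol: codeword}."""
    # Heap entries: (frequency, unique tiebreak, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, s) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # two smallest frequencies...
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))  # ...become siblings
        counter += 1
    codes = {}
    def walk(tree, code):
        if isinstance(tree, tuple):
            walk(tree[0], code + "0")
            walk(tree[1], code + "1")
        else:
            codes[tree] = code or "0"        # single-symbol edge case
    walk(heap[0][2], "")
    return codes
```

For the ACGT frequencies the average code length is 0.4·1 + 0.31·2 + 0.2·3 + 0.09·3 = 1.89 bits per symbol, close to the entropy of this distribution (≈ 1.83 bits).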
Horn Formulas
- Implications: (x₁ ∧ x₂ ∧ … ∧ xₖ) ⇒ y — if all variables on the left-hand side are true, then y must be true.
- Pure negative clauses: (ū ∨ v̄ ∨ ȳ) — at least one of these must be false.
Goal: Find a satisfying assignment (true/false for each variable) that satisfies all clauses.
Horn formulas are the foundation of Prolog (logic programming). The algorithm runs in linear time in the length of the formula using a directed graph of implications.
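The greedy algorithm: start with everything false, set a variable true only when some implication forces it, then check the pure negative clauses. A sketch in Python; the clause representation is an assumption, and this version rescans all implications on each pass (quadratic), whereas the linear-time version indexes implications by the variables on their left-hand sides.

```python
def horn_sat(implications, negative_clauses, variables):
    """implications: list of (lhs_set, y); an empty lhs_set means y is a fact.
    negative_clauses: list of sets of variables (all appear negated).
    Returns a satisfying assignment as a dict, or None if unsatisfiable."""
    true_vars = set()
    changed = True
    while changed:                       # set variables true only when forced
        changed = False
        for lhs, y in implications:
            if lhs <= true_vars and y not in true_vars:
                true_vars.add(y)
                changed = True
    for clause in negative_clauses:      # each needs at least one false literal
        if clause <= true_vars:
            return None                  # every literal forced true: unsatisfiable
    return {v: (v in true_vars) for v in variables}
```

Greedy intuition: the loop computes the *minimal* set of variables that must be true; any other satisfying assignment is a superset, so if the minimal one violates a negative clause, every assignment does.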
Set Cover
Claim: if the optimal cover uses k sets, the greedy rule (always pick the set covering the largest number of uncovered elements) uses at most k ln n sets. Proof: at every step, some set covers at least a 1/k fraction of the remaining elements (the optimal k sets cover them all), so the greedy pick does too. After t iterations, at most n(1 − 1/k)ᵗ ≤ ne^{−t/k} elements remain uncovered. At t = k ln n, fewer than one element remains, so everything is covered. ∎
The approximation factor of ln n is tight: unless P = NP, no polynomial-time algorithm achieves a (1 − ε) ln n approximation for Set Cover. So the simple greedy algorithm is essentially the best efficient algorithm possible for this problem.
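The greedy rule itself is a few lines. A sketch in Python; representing each candidate set as a Python `set` and returning chosen indices are assumptions of this sketch.

```python
def greedy_set_cover(universe, sets):
    """sets: list of sets of elements. Returns indices of the chosen sets,
    or None if the union of all sets does not cover the universe."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Greedy choice: the set covering the most still-uncovered elements.
        i = max(range(len(sets)), key=lambda j: len(sets[j] & uncovered))
        if not sets[i] & uncovered:
            return None                  # nothing covers what's left
        chosen.append(i)
        uncovered -= sets[i]
    return chosen
```

By the analysis above, if some k sets cover the universe, this loop terminates after at most k ln n iterations.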