Inclusion-Exclusion Principle

When we want to count the total number of items in the union of two or more sets, we can’t simply add their sizes together. Why? Because items that belong to multiple sets would be counted more than once.

The Inclusion-Exclusion Principle gives us a systematic way to count the size of a union by:

  1. Including the sizes of individual sets.
  2. Excluding the sizes of pairwise intersections.
  3. Including the sizes of triple intersections (and so on).

1. Two Sets

For two sets A and B:

** A ∪ B = A + B - A ∩ B **
  • ** A **: Size of set A
  • ** B **: Size of set B
  • ** A ∩ B **: Size of the intersection (items in both A AND B)

Think of a Venn Diagram. If you add circle A and circle B, the overlapping region is added twice. You must subtract it once to correct the count.

2. Three Sets

For three sets A, B, and C, it gets slightly more complex.

** A ∪ B ∪ C = A + B + C - ( A ∩ B + A ∩ C + B ∩ C ) + A ∩ B ∩ C **

The Logic:

  1. Add individual sets: Regions with 1 set counted once, 2 sets counted twice, 3 sets counted thrice.
  2. Subtract pairwise intersections: Regions with 2 sets now correct (2-1=1). Regions with 3 sets removed 3 times (3-3=0).
  3. Add triple intersection: We subtracted the triple intersection 3 times (once for each pair), so its count is currently 0. We must add it back once.

3. The General Formula

For n finite sets A1, …, An, the size of the union is:

** ∪ Ai = Σ Ai - Σ Ai ∩ Aj + Σ Ai ∩ Aj ∩ Ak - … + (-1)n-1 A1 ∩ … ∩ An **

In plain English:

  1. Sum the sizes of all single sets.
  2. Subtract the sizes of all pairwise intersections.
  3. Add the sizes of all triple intersections.
  4. Subtract the sizes of all quadruple intersections.
  5. Continue until you reach the intersection of all n sets.

Venn Diagram Solver (2 Sets)

A
B
Union |A ∪ B| 80

4. Hardware Reality: Sets as Bitmasks

In systems programming and competitive programming, when the universe of elements is small (e.g., < 64 items), we often represent sets using integers as bitmasks.

  • Set A: 00101 (Elements at index 0 and 2 are present)
  • Set B: 00110 (Elements at index 1 and 2 are present)

This allows set operations to happen in O(1) CPU cycles:

  • Intersection: A & B (00100 → Only element 2 matches)
  • Union: A | B (00111 → Elements 0, 1, 2)
  • Difference: A & ~B

[!TIP] This is how permissions (rwx), database indexes (bitmap index), and Bloom filters work under the hood. It is the most efficient way to perform inclusion-exclusion logic.

5. Implementation Examples

Java

In Java, the Set interface (e.g., HashSet) provides methods for these operations.

import java.util.HashSet;
import java.util.Set;
import java.util.Arrays;

public class InclusionExclusion {
    public static void main(String[] args) {
        Set<String> setA = new HashSet<>(Arrays.asList("user1", "user2", "user3", "user4"));
        Set<String> setB = new HashSet<>(Arrays.asList("user3", "user4", "user5"));

        // Intersection (A AND B)
        Set<String> intersection = new HashSet<>(setA);
        intersection.retainAll(setB); // Modifies intersection to keep only elements in B
        System.out.println("Intersection size: " + intersection.size()); // 2

        // Union (A OR B)
        Set<String> union = new HashSet<>(setA);
        union.addAll(setB); // Adds all elements from B
        System.out.println("Union size: " + union.size()); // 5

        // Verify Principle
        int calcUnion = setA.size() + setB.size() - intersection.size();
        System.out.println("Calculated Union: " + calcUnion); // 4 + 3 - 2 = 5
    }
}

Go

In Go, sets are typically implemented as maps with empty struct values map[T]struct{}.

package main

import "fmt"

func main() {
	setA := map[string]struct{}{
		"user1": {}, "user2": {}, "user3": {}, "user4": {},
	}
	setB := map[string]struct{}{
		"user3": {}, "user4": {}, "user5": {},
	}

	// Calculate Intersection
	intersectionSize := 0
	for k := range setA {
		if _, exists := setB[k]; exists {
			intersectionSize++
		}
	}

	// Calculate Union directly (counting unique keys)
	unionMap := make(map[string]struct{})
	for k := range setA { unionMap[k] = struct{}{} }
	for k := range setB { unionMap[k] = struct{}{} }
	unionSize := len(unionMap)

	fmt.Printf("Set A: %d, Set B: %d\n", len(setA), len(setB))
	fmt.Printf("Intersection: %d\n", intersectionSize)
	fmt.Printf("Actual Union: %d\n", unionSize)

	// Verify Principle
	calcUnion := len(setA) + len(setB) - intersectionSize
	fmt.Printf("Calculated Union: %d\n", calcUnion) // 4 + 3 - 2 = 5
}

Python

Python’s set data structure handles this natively.

# Define sets of user IDs
group_a = {"user1", "user2", "user3", "user4"} # 4 users
group_b = {"user3", "user4", "user5"}          # 3 users

# Intersection (Users in BOTH groups)
intersection = group_a.intersection(group_b)
print(f"Intersection: {intersection}")
# Output: {'user3', 'user4'} (Size 2)

# Union (Unique users across groups)
union = group_a.union(group_b)
print(f"Union: {union}")
# Output: {'user1', 'user2', 'user3', 'user4', 'user5'} (Size 5)

# Verify Principle: |A| + |B| - |A &cap; B|
# 4 + 3 - 2 = 5
assert len(union) == len(group_a) + len(group_b) - len(intersection)

6. Data Science Application: Monthly Active Users (MAU)

A common metric in tech companies is Monthly Active Users (MAU).

Suppose you have:

  • Web Users: 1,000,000
  • iOS Users: 800,000
  • Android Users: 700,000

If you just add them up (2.5M), you are overestimating your reach because many users use multiple platforms.

To get the True Total Reach, you need to subtract the overlaps:

  • Users on Web & iOS
  • Users on Web & Android
  • Users on iOS & Android

And add back the super-users (Web & iOS & Android).

Formula: Total = Web + iOS + Android - (Web&iOS + Web&And + iOS&And) + (AllThree)

This logic is fundamental to SQL queries using COUNT(DISTINCT user_id) which effectively applies this principle under the hood.

[!IMPORTANT] Failing to account for overlap is a common mistake in marketing attribution and growth accounting. Always check for intersections!