Covariance Matrices

When dealing with multiple variables, variance becomes a matrix. The Covariance Matrix Σ (Sigma) captures both the spread of individual variables and their relationships with each other.

It is the fundamental building block of Multivariate Calculus, Machine Learning (PCA, Gaussian Mixtures), and Portfolio Theory.

Analogy: Think of Variance as the “spread” of a single stock’s price. A Covariance Matrix is the “relationship map” of an entire stock portfolio. It tells you not just how volatile each stock is, but how they move together (e.g., if Tech goes down, does Energy go up?).

PEDALS Case Study: Real Estate Market

  • Process Requirements: We want to model house prices based on SquareFootage and DistanceToCenter.
  • Estimate: Both features vary wildly. They likely have a negative covariance (as distance increases, footage might increase slightly, but price decreases).
  • Data Model: We represent these features as a 2D random vector X = [SquareFootage, DistanceToCenter]T.
  • Architecture: The Covariance Matrix Σ stores Var(SquareFootage), Var(DistanceToCenter), and Cov(SquareFootage, DistanceToCenter).
  • Localized Details: A negative off-diagonal element tells us that houses further out tend to be cheaper, adjusting our multidimensional Gaussian contours to be tilted.
  • Scale: In high dimensions (e.g., thousands of features), covariance matrices become too large for memory (O(N2)). We use low-rank approximations or PCA.

1. Definition

For a random vector X = [X1, X2, …, Xn]T, the covariance matrix Σ is an n × n matrix where the element at (i, j) is the covariance between Xi and Xj.

Σij = Cov(Xi, Xj) = E[(Xi - μi)(Xj - μj)]

For two variables X and Y:

Σ =
[
Var(X)Cov(X, Y)
Cov(Y, X)Var(Y)
]
=
[
σX2ρσXσY
ρσYσXσY2
]

Key Properties

  1. Symmetric: Σij = Σji (since Cov(X, Y) = Cov(Y, X)).
  2. Positive Semi-Definite (PSD): vT Σ v ≥ 0 for any vector v. This ensures variance is never negative along any projection.
  3. Eigen-Decomposition: The eigenvectors of Σ point in the directions of greatest variance (Principal Components). The eigenvalues represent the magnitude of variance in those directions. Imagine stretching a circular rubber sheet: the eigenvectors are the directions in which you stretch, and the eigenvalues are how far you pull in each direction.

2. Interactive: Covariance Playground

Visualize how changing variances and correlation affects the shape of a Bivariate Gaussian distribution.

  • Sliders: Adjust σX, σY, and Correlation ρ.
  • Blue Ellipse: Represents the 95% confidence region.
  • Red Arrows: The eigenvectors (Principal Axes).
Matrix Σ = [
1.00 0.00
0.00 1.00
]

3. Hardware Reality: Memory Layout Matters

Mathematically, a matrix is just a grid of numbers. To a CPU, it is a linear strip of memory.

  • Row-Major (C/Java/Python): [[1, 2], [3, 4]] is stored as 1, 2, 3, 4.
  • Column-Major (Fortran/MATLAB): Stored as 1, 3, 2, 4.

When computing the covariance matrix from raw data X (N samples x D dimensions), we essentially compute XT X. This involves dot products of columns.

If your data is stored Row-Major (N rows of D features), accessing a “column” (feature) requires skipping over D-1 elements for every read. This causes Cache Thrashing (fetching a whole cache line but only using one number). Think of reading a book column-by-column (skipping lines) instead of left-to-right: it’s incredibly slow and inefficient for your brain (the CPU).

Optimization: High-performance libraries (BLAS/LAPACK) often transpose the matrix first or use Block Matrix Multiplication to keep data in the CPU cache (L1/L2), speeding up covariance calculation by 10x-100x.


4. Coding Example

We can calculate the covariance matrix from scratch. The formula for sample covariance is: Cov(X, Y) = Σ (xi - x̄)(yi - ȳ) / (N - 1)

Java
Go
Python

Java

import java.util.Arrays;

public class CovarianceMatrix {
  public static void main(String[] args) {
    // Data: 3 samples, 2 features (X, Y)
    double[][] data = {
      {1.0, 2.0},
      {2.0, 3.0},
      {3.0, 5.0}
    };

    double[][] cov = calculateCovariance(data);
    System.out.println(Arrays.deepToString(cov));
  }

  public static double[][] calculateCovariance(double[][] data) {
    int n = data.length;
    int d = data[0].length;

    // 1. Calculate Means
    double[] means = new double[d];
    for (double[] row : data) {
      for (int j = 0; j < d; j++) {
        means[j] += row[j];
      }
    }
    for (int j = 0; j < d; j++) means[j] /= n;

    // 2. Calculate Covariance Matrix (d x d)
    double[][] cov = new double[d][d];
    for (int j = 0; j < d; j++) {
      for (int k = j; k < d; k++) { // Symmetric, so start k from j
        double sum = 0;
        for (int i = 0; i < n; i++) {
          sum += (data[i][j] - means[j]) * (data[i][k] - means[k]);
        }
        double val = sum / (n - 1); // Sample covariance (unbiased)
        cov[j][k] = val;
        cov[k][j] = val;
      }
    }
    return cov;
  }
}

Go

package main

import (
  "fmt"
)

func main() {
  // Data: 3 samples, 2 features (X, Y)
  data := [][]float64{
    {1.0, 2.0},
    {2.0, 3.0},
    {3.0, 5.0},
  }

  cov := calculateCovariance(data)
  fmt.Println(cov)
}

func calculateCovariance(data [][]float64) [][]float64 {
  n := len(data)
  d := len(data[0])

  // 1. Calculate Means
  means := make([]float64, d)
  for _, row := range data {
    for j, val := range row {
      means[j] += val
    }
  }
  for j := range means {
    means[j] /= float64(n)
  }

  // 2. Calculate Covariance Matrix (d x d)
  cov := make([][]float64, d)
  for i := range cov {
    cov[i] = make([]float64, d)
  }

  for j := 0; j < d; j++ {
    for k := j; k < d; k++ {
      sum := 0.0
      for i := 0; i < n; i++ {
        sum += (data[i][j] - means[j]) * (data[i][k] - means[k])
      }
      val := sum / float64(n-1)
      cov[j][k] = val
      cov[k][j] = val
    }
  }
  return cov
}

Python

import numpy as np

# 1. Generate correlated data
# X ~ N(0, 1)
x = np.random.normal(0, 1, 1000)
# Y = 0.8*X + Noise
y = 0.8 * x + np.random.normal(0, 0.6, 1000)

# Stack into a (2, 1000) matrix
data = np.vstack([x, y])

# 2. Compute Covariance Matrix
# bias=True normalizes by N instead of N-1
sigma = np.cov(data, bias=True)

print("Covariance Matrix:")
print(sigma)

5. Summary

  • Covariance Matrix: Summary of spread and linear relation between multiple variables.
  • PSD Property: Ensures volumes (determinants) are non-negative and variance is real.
  • Hardware: Matrix layout impacts performance significantly due to cache lines.