Covariance Matrices

When dealing with multiple variables, variance becomes a matrix. The Covariance Matrix Σ (Sigma) captures both the spread of individual variables and their relationships with each other.

It is a fundamental building block of Multivariate Statistics, Machine Learning (PCA, Gaussian Mixtures), and Portfolio Theory.

1. Definition

For a random vector X = [X1, X2, …, Xn]ᵀ, the covariance matrix Σ is an n × n matrix where the element at (i, j) is the covariance between Xi and Xj.

Σij = Cov(Xi, Xj) = E[(Xi - μi)(Xj - μj)]

For two variables X and Y:

Σ =
[ Var(X)     Cov(X, Y) ]
[ Cov(Y, X)  Var(Y)    ]
=
[ σX²    ρσXσY ]
[ ρσXσY  σY²   ]
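The 2 × 2 form above is easy to sanity-check numerically. A minimal sketch, using NumPy and arbitrary illustrative values for σX, σY, and ρ (not taken from the text):

```python
import numpy as np

# Arbitrary illustrative parameters
sigma_x, sigma_y, rho = 2.0, 1.0, 0.5

# Closed-form 2x2 covariance matrix from the formula above
Sigma = np.array([
    [sigma_x**2,              rho * sigma_x * sigma_y],
    [rho * sigma_x * sigma_y, sigma_y**2],
])
print(Sigma)  # Sigma = [[4, 1], [1, 1]]

# Sanity check: sampling from N(0, Sigma) and re-estimating recovers Sigma
samples = np.random.default_rng(0).multivariate_normal([0, 0], Sigma, size=100_000)
print(np.cov(samples, rowvar=False))  # close to Sigma
```

With 100,000 samples the empirical estimate should match the closed form to within a few percent.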

Key Properties

  1. Symmetric: Σij = Σji (since Cov(X, Y) = Cov(Y, X)).
  2. Positive Semi-Definite (PSD): vT Σ v ≥ 0 for any vector v. This ensures variance is never negative along any projection.
  3. Eigen-Decomposition: The eigenvectors of Σ point in the directions of greatest variance (Principal Components). The eigenvalues represent the magnitude of variance in those directions.
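All three properties can be verified numerically. A small sketch using NumPy (the correlated data here is generated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
# Two correlated features, generated for illustration
x = rng.normal(size=1_000)
y = 0.8 * x + rng.normal(scale=0.6, size=1_000)
Sigma = np.cov(np.vstack([x, y]))

# 1. Symmetric: Sigma equals its transpose
print(np.allclose(Sigma, Sigma.T))  # True

# 2. PSD: v^T Sigma v >= 0, spot-checked on 100 random vectors v
v = rng.normal(size=(100, 2))
quad_forms = np.einsum('ij,jk,ik->i', v, Sigma, v)
print(np.all(quad_forms >= 0))  # True

# 3. Eigen-decomposition: eigenvalues are the variances along the
#    principal axes; the columns of eigvecs are the principal directions
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)  # both non-negative; the larger one belongs to PC1
```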

2. Interactive: Covariance Playground

Visualize how changing variances and correlation affects the shape of a Bivariate Gaussian distribution.

  • Sliders: Adjust σX, σY, and Correlation ρ.
  • Blue Ellipse: Represents the 95% confidence region.
  • Red Arrows: The eigenvectors (Principal Axes).
  • Matrix Display: Shows the current Σ; at the default settings (σX = σY = 1, ρ = 0) it is the identity matrix [1.00 0.00; 0.00 1.00].

3. Hardware Reality: Memory Layout Matters

Mathematically, a matrix is just a grid of numbers. To a CPU, it is a linear strip of memory.

  • Row-Major (C/Java/NumPy default): [[1, 2], [3, 4]] is stored as 1, 2, 3, 4.
  • Column-Major (Fortran/MATLAB): Stored as 1, 3, 2, 4.

When computing the covariance matrix from raw data X (N samples × D dimensions), we essentially compute XᵀX on the mean-centered data. This involves dot products of columns.

If your data is stored Row-Major (N rows of D features), accessing a “column” (feature) requires skipping over D-1 elements for every read. This causes Cache Thrashing (fetching a whole cache line but only using one number).

Optimization: High-performance libraries (BLAS/LAPACK) often transpose the matrix first or use Block Matrix Multiplication to keep data in the CPU cache (L1/L2), speeding up covariance calculation by 10x-100x.
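The cost of strided access is easy to observe even from Python. A rough sketch using NumPy (timings are machine-dependent, so treat the numbers as illustrative only):

```python
import time
import numpy as np

# Row-major (C-ordered) matrix: each row is contiguous in memory
a = np.random.rand(4000, 4000)
assert a.flags['C_CONTIGUOUS']

t0 = time.perf_counter()
for i in range(a.shape[0]):
    a[i, :].sum()  # contiguous reads: every fetched cache line is fully used
row_time = time.perf_counter() - t0

t0 = time.perf_counter()
for j in range(a.shape[1]):
    a[:, j].sum()  # strided reads: one useful number per fetched cache line
col_time = time.perf_counter() - t0

print(f"row-wise: {row_time:.3f}s  column-wise: {col_time:.3f}s")
# column-wise is typically noticeably slower on a C-ordered array
```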


4. Coding Example

We can calculate the covariance matrix from scratch. The formula for sample covariance is: Cov(X, Y) = Σ (xi − x̄)(yi − ȳ) / (N − 1)

Java Example

import java.util.Arrays;

public class CovarianceMatrix {
    public static void main(String[] args) {
        // Data: 3 samples, 2 features (X, Y)
        double[][] data = {
            {1.0, 2.0},
            {2.0, 3.0},
            {3.0, 5.0}
        };

        double[][] cov = calculateCovariance(data);
        System.out.println(Arrays.deepToString(cov));
    }

    public static double[][] calculateCovariance(double[][] data) {
        int n = data.length;
        int d = data[0].length;

        // 1. Calculate Means
        double[] means = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < d; j++) means[j] /= n;

        // 2. Calculate Covariance Matrix (d x d)
        double[][] cov = new double[d][d];
        for (int j = 0; j < d; j++) {
            for (int k = j; k < d; k++) { // Symmetric, so start k from j
                double sum = 0;
                for (int i = 0; i < n; i++) {
                    sum += (data[i][j] - means[j]) * (data[i][k] - means[k]);
                }
                double val = sum / (n - 1); // Sample covariance (unbiased)
                cov[j][k] = val;
                cov[k][j] = val;
            }
        }
        return cov;
    }
}

Go Example

package main

import (
	"fmt"
)

func main() {
	// Data: 3 samples, 2 features (X, Y)
	data := [][]float64{
		{1.0, 2.0},
		{2.0, 3.0},
		{3.0, 5.0},
	}

	cov := calculateCovariance(data)
	fmt.Println(cov)
}

func calculateCovariance(data [][]float64) [][]float64 {
	n := len(data)
	d := len(data[0])

	// 1. Calculate Means
	means := make([]float64, d)
	for _, row := range data {
		for j, val := range row {
			means[j] += val
		}
	}
	for j := range means {
		means[j] /= float64(n)
	}

	// 2. Calculate Covariance Matrix (d x d)
	cov := make([][]float64, d)
	for i := range cov {
		cov[i] = make([]float64, d)
	}

	for j := 0; j < d; j++ {
		for k := j; k < d; k++ {
			sum := 0.0
			for i := 0; i < n; i++ {
				sum += (data[i][j] - means[j]) * (data[i][k] - means[k])
			}
			val := sum / float64(n-1)
			cov[j][k] = val
			cov[k][j] = val
		}
	}
	return cov
}

Python Example

import numpy as np

# 1. Generate correlated data
# X ~ N(0, 1)
x = np.random.normal(0, 1, 1000)
# Y = 0.8*X + Noise
y = 0.8 * x + np.random.normal(0, 0.6, 1000)

# Stack into a (2, 1000) matrix
data = np.vstack([x, y])

# 2. Compute Covariance Matrix
# bias=True normalizes by N instead of N-1
sigma = np.cov(data, bias=True)

print("Covariance Matrix:")
print(sigma)
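To tie the from-scratch Java/Go implementations back to the library call, np.cov on the same 3-sample dataset should reproduce the hand-rolled result (note rowvar=False, since here each row is a sample rather than a feature):

```python
import numpy as np

# Same 3-sample, 2-feature dataset as the Java/Go examples above
data = np.array([
    [1.0, 2.0],
    [2.0, 3.0],
    [3.0, 5.0],
])

# rowvar=False: rows are samples, columns are features
cov = np.cov(data, rowvar=False)
print(cov)
# Unbiased (divide by N-1): Var(X) = 1.0, Cov(X, Y) = 1.5, Var(Y) = 7/3
```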

5. Summary

  • Covariance Matrix: Summarizes the spread of each variable and the linear relationships between variables.
  • PSD Property: Ensures variance along any projection is non-negative, and that determinants (generalized volumes) are non-negative.
  • Hardware: Matrix layout impacts performance significantly due to cache lines.

Next: Review & Cheat Sheet