Find hidden directions in a support-incident matrix with SVD, then use rank, PCA, truncation, and condition numbers without losing sight of what the numbers mean.
Access-ticket features can already be arranged into tensors and projected through a matrix. A harder question comes next: if two feature columns mostly repeat the same pattern, do you need to keep both?
That question appears everywhere in machine learning. An embedding table can contain redundant directions. A learned update can be almost low-rank. A data matrix can have a weak direction that makes a numerical solve unreliable. Singular value decomposition (SVD) gives you a precise way to find those directions, measure their strengths, and decide what to keep.
Use a tiny matrix with the same named-axis discipline as the previous lesson. Each row is one ticket. The two columns record whether the ticket contains a latency signal or a rollback signal:

$$
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}
$$
The third ticket contains both signals. That makes the combined direction [latency + rollback] stronger than the contrast direction [latency - rollback].
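To compare directions fairly, give each one length 1:

$$
\text{common} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
\text{contrast} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}
$$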
Multiply A by either direction. The result tells you how strongly each ticket activates that pattern; its length tells you the pattern's total strength in the matrix.
```python
import numpy as np

A = np.array([
    [1.0, 0.0],  # latency
    [0.0, 1.0],  # rollback
    [1.0, 1.0],  # latency and rollback
])
common = np.array([1.0, 1.0]) / np.sqrt(2)
contrast = np.array([1.0, -1.0]) / np.sqrt(2)

common_scores = A @ common
contrast_scores = A @ contrast

print("A_shape", A.shape)
print("common_scores", common_scores.round(4).tolist())
print("contrast_scores", contrast_scores.round(4).tolist())
print("strengths", round(np.linalg.norm(common_scores), 4), round(np.linalg.norm(contrast_scores), 4))
```

```
A_shape (3, 2)
common_scores [0.7071, 0.7071, 1.4142]
contrast_scores [0.7071, -0.7071, 0.0]
strengths 1.7321 1.0
```

The common direction has strength $\sqrt{3} \approx 1.7321$; the contrast direction has strength $1$. The third row reinforces the common direction and cancels in the contrast direction. Nothing is compressed yet. You're only measuring which feature combinations the rows support most strongly.
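For a data matrix A, multiply it by its transpose on the feature side:

$$
G = A^\top A
$$

G has shape [2, 2] because it compares feature directions with feature directions. A direction v that obeys $Gv = \lambda v$ is an eigenvector of G: G scales it without turning it. Its eigenvalue is squared strength, because

$$
\|Av\|^2 = v^\top A^\top A v = v^\top G v = \lambda\, v^\top v = \lambda
$$

when v has length 1.

For this matrix:

$$
G = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad
G\,\text{common} = 3\,\text{common}, \qquad
G\,\text{contrast} = 1\,\text{contrast}
$$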
```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
G = A.T @ A
common = np.array([1.0, 1.0]) / np.sqrt(2)
contrast = np.array([1.0, -1.0]) / np.sqrt(2)

print("G", G.tolist())
print("G_common", (G @ common).round(4).tolist())
print("three_common", (3 * common).round(4).tolist())
print("G_contrast_equals_contrast", np.allclose(G @ contrast, contrast))
```

```
G [[2.0, 1.0], [1.0, 2.0]]
G_common [2.1213, 2.1213]
three_common [2.1213, 2.1213]
G_contrast_equals_contrast True
```

Every real matrix has a singular value decomposition:[1]

$$
A = U \Sigma V^\top
$$
For the incident matrix A with shape [3, 2], use the compact (or reduced) form. It keeps only the two singular directions that can fit in a matrix with two feature columns:
| Factor | Shape | Meaning here |
|---|---|---|
| $V$ | [2, 2] | directions in feature space, such as common signal versus contrast |
| $\Sigma$ | [2, 2] | non-negative strengths, ordered largest first |
| $U$ | [3, 2] | how each ticket participates in each direction |
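The difference between the compact and full forms can be checked from the shapes NumPy returns. This is a minimal sketch; the variable names are illustrative, and the only library behavior relied on is the `full_matrices` flag of `np.linalg.svd`.

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Compact form: U keeps only as many columns as there are singular values.
U_compact, S_compact, Vt_compact = np.linalg.svd(A, full_matrices=False)
# Full form: U is square, so it gains a third column that no singular value multiplies.
U_full, S_full, Vt_full = np.linalg.svd(A, full_matrices=True)

print("compact", U_compact.shape, S_compact.shape, Vt_compact.shape)
print("full", U_full.shape, S_full.shape, Vt_full.shape)
print("same_singular_values", np.allclose(S_compact, S_full))
```

Both calls return the same two singular values. The extra column of the full `U` spans the part of ticket space that A never reaches, which is why the compact form loses nothing for reconstruction.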
Our right singular vectors are the normalized common and contrast directions. Our singular values are:

$$
\sigma_1 = \sqrt{3}, \qquad \sigma_2 = 1
$$

The corresponding left singular vectors come from $u_i = A v_i / \sigma_i$. Put the factors back together and they reconstruct every original entry.
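The hand construction can be sketched in a few lines: start from the two normalized feature directions, measure the length of each `A @ v_i` to get the singular value, and divide to get each left singular vector, $u_i = A v_i / \sigma_i$. The names `V`, `sigma`, and `U` below are illustrative, and the division assumes both singular values are non-zero, which holds for this matrix.

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Right singular vectors as columns: the normalized common and contrast directions.
V = np.column_stack([
    np.array([1.0, 1.0]) / np.sqrt(2),
    np.array([1.0, -1.0]) / np.sqrt(2),
])

# Each singular value is the length of A @ v_i.
sigma = np.linalg.norm(A @ V, axis=0)

# Left singular vectors: u_i = A v_i / sigma_i (requires sigma_i != 0).
U = (A @ V) / sigma

reconstructed = U @ np.diag(sigma) @ V.T

print("sigma", sigma.round(4).tolist())
print("U_columns_orthonormal", np.allclose(U.T @ U, np.eye(2)))
print("reconstructs_A", np.allclose(reconstructed, A))
```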
Use the direct SVD routine for computation. It returns U, the one-dimensional singular-value array S, and Vt, which is $V^\top$.
```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
U, S, Vt = np.linalg.svd(A, full_matrices=False)
reconstructed = U @ np.diag(S) @ Vt

print("shapes", U.shape, S.shape, Vt.shape)
print("singular_values", S.round(4).tolist())
print("reconstruction_error", float(np.max(np.abs(A - reconstructed))))
print("reconstructs_A", np.allclose(A, reconstructed))
```

```
shapes (3, 2) (2,) (2, 2)
singular_values [1.7321, 1.0]
reconstruction_error 4.440892098500626e-16
reconstructs_A True
```

The sign of a singular vector may differ between correct implementations. If both matching directions in U and V flip sign, their product stays the same. Compare singular values and reconstruction, not raw signs.
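The sign claim is easy to check directly. The sketch below flips the first singular direction on both sides and then on only one side; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Flip the first left singular vector and the matching right singular vector together.
U_flipped = U.copy()
Vt_flipped = Vt.copy()
U_flipped[:, 0] *= -1
Vt_flipped[0, :] *= -1
paired = U_flipped @ np.diag(S) @ Vt_flipped

# Flip only the left side: the first rank-one term changes sign, so reconstruction fails.
unpaired = U_flipped @ np.diag(S) @ Vt

print("paired_flip_reconstructs_A", np.allclose(paired, A))
print("unpaired_flip_reconstructs_A", np.allclose(unpaired, A))
```

A test that compares `U` or `Vt` entry by entry against stored values can therefore fail on a correct result. Comparing `S` and the reconstruction avoids that.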
The rank of a matrix is the number of non-zero singular values. It counts how many independent directions the matrix really needs.
Suppose every rollback feature exactly repeats the latency feature:

$$
B = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}
$$
The second column adds no new information. B has two columns, but only one independent direction, so its rank is 1.
```python
import numpy as np

B = np.array([
    [1.0, 1.0],
    [0.0, 0.0],
    [1.0, 1.0],
])
singular_values = np.linalg.svd(B, compute_uv=False)

print("shape", B.shape)
print("singular_values", singular_values.round(6).tolist())
print("rank", int(np.linalg.matrix_rank(B)))
```

```
shape (3, 2)
singular_values [2.0, 0.0]
rank 1
```

A feature table with rank below its column count deserves inspection. Sometimes it contains a deliberate encoding. Sometimes two features duplicate each other and waste compute. Rank exposes the question; domain meaning answers it.
Exact rank is a mathematical definition. Floating-point data rarely produces a perfect zero, so np.linalg.matrix_rank treats sufficiently small singular values as zero using a tolerance. Measurement noise or downstream requirements may justify a different cutoff.
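A small sketch shows how the cutoff changes the answer. The perturbation size `1e-10` and the looser cutoff `1e-8` are assumptions chosen for illustration, not recommended values; `tol` is the absolute-threshold argument of `np.linalg.matrix_rank`.

```python
import numpy as np

# The duplicated-column matrix with a tiny disturbance in the second column.
B_noisy = np.array([
    [1.0, 1.0 + 1e-10],
    [0.0, 0.0],
    [1.0, 1.0 - 1e-10],
])

singular_values = np.linalg.svd(B_noisy, compute_uv=False)

# Default tolerance is tied to floating-point precision, so 1e-10 counts as non-zero.
rank_default = int(np.linalg.matrix_rank(B_noisy))
# A looser, application-chosen cutoff treats the tiny direction as noise.
rank_loose = int(np.linalg.matrix_rank(B_noisy, tol=1e-8))

print("singular_values", singular_values.tolist())
print("rank_default", rank_default)
print("rank_loose", rank_loose)
```

The second singular value is about `1e-10`: far above rounding error, far below anything a ticket feature would mean. Which rank is "right" depends on whether a disturbance of that size is signal in your data.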
Return to the full-rank incident matrix A. Keeping both singular directions reconstructs it exactly. Keeping only the strongest direction creates a rank-one approximation:

$$
A_1 = \sigma_1 u_1 v_1^\top
$$
Among all rank-one matrices, this approximation has the smallest Frobenius reconstruction error: the square root of the sum of squared entry errors.[2] Equivalently, it minimizes that squared sum. The guarantee is about reconstructing A. It doesn't promise that a downstream classifier or retriever keeps its quality.
```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
U, S, Vt = np.linalg.svd(A, full_matrices=False)
rank_one = U[:, :1] @ np.diag(S[:1]) @ Vt[:1, :]

error = np.linalg.norm(A - rank_one, ord="fro")

print("rank_one")
print(rank_one.round(3))
print("frobenius_error", round(float(error), 4))
print("discarded_singular_value", round(float(S[1]), 4))
```

```
rank_one
[[0.5 0.5]
 [0.5 0.5]
 [1.  1. ]]
frobenius_error 1.0
discarded_singular_value 1.0
```

Here the discarded direction has strength 1, so the rank-one approximation loses meaningful contrast between a latency-only ticket and a rollback-only ticket. A small matrix makes that cost visible before you try the same idea on embeddings.
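The optimality claim can be spot-checked numerically. For any unit feature direction `v`, the best rank-one matrix built on that direction is `(A @ v) v^T`, and its squared Frobenius error is the total squared energy of A minus the squared length of `A @ v`. The sketch below tries 1,000 random directions; the seed and the sample size are arbitrary assumptions, and a random search illustrates the guarantee rather than proving it.

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = np.linalg.svd(A, compute_uv=False)

# Error of the SVD truncation: square root of the discarded squared singular values.
svd_error = float(np.sqrt(np.sum(S[1:] ** 2)))

# Random unit directions in feature space (one per row).
rng = np.random.default_rng(0)
candidates = rng.normal(size=(1000, 2))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

# For each direction v: error^2 = ||A||_F^2 - ||A v||^2.
total_energy = np.sum(A ** 2)
captured = np.sum((A @ candidates.T) ** 2, axis=0)
errors = np.sqrt(np.maximum(total_energy - captured, 0.0))

print("svd_error", round(svd_error, 4))
print("best_random_error", round(float(errors.min()), 4))
print("no_candidate_beats_svd", bool(errors.min() >= svd_error - 1e-9))
```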
So far the matrix contained signal counts and we kept its origin at zero. Principal component analysis (PCA) answers a different question: which directions describe variation around the average row?
That means centering first. Given a matrix X of tickets by numeric features, subtract the column means $\mu$ from every row:

$$
X_c = X - \mathbf{1}\,\mu^\top
$$
Then SVD on the centered matrix gives principal-component directions in V. If you skip centering, a large average offset can dominate the first direction even when it doesn't describe variation between tickets.
```python
import numpy as np

X = np.array([
    [1.0, 1.0],  # low delay evidence, low escalation evidence
    [2.0, 2.1],
    [3.0, 2.8],
    [4.0, 4.2],  # high on both signals
])
centered = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(centered, full_matrices=False)
pc1 = Vt[0]
if pc1[0] < 0:
    pc1 = -pc1

retained = S[0] ** 2 / np.sum(S ** 2)

print("column_means", X.mean(axis=0).round(3).tolist())
print("centered_means", centered.mean(axis=0).round(8).tolist())
print("pc1", pc1.round(4).tolist())
print("pc1_variance_fraction", round(float(retained), 4))
```

```
column_means [2.5, 2.525]
centered_means [0.0, -0.0]
pc1 [0.6937, 0.7203]
pc1_variance_fraction 0.9961
```

This leading direction points roughly along [delay evidence + escalation evidence], because those two signals rise together in the sample. PCA uses variance, not meaning. A direction that explains a lot of variation still needs human interpretation and task evaluation.
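The link between singular values and variance can be made explicit. For centered data with n rows, each squared singular value divided by n − 1 equals an eigenvalue of the sample covariance matrix. The sketch below checks that on the same four tickets; it relies on `np.cov` using the n − 1 divisor by default.

```python
import numpy as np

X = np.array([
    [1.0, 1.0],
    [2.0, 2.1],
    [3.0, 2.8],
    [4.0, 4.2],
])
centered = X - X.mean(axis=0)
S = np.linalg.svd(centered, compute_uv=False)

n_rows = X.shape[0]
variance_from_svd = S ** 2 / (n_rows - 1)

# Sample covariance of the feature columns; eigvalsh returns ascending order.
covariance = np.cov(X, rowvar=False)
variance_from_cov = np.linalg.eigvalsh(covariance)[::-1]

print("variance_from_svd", variance_from_svd.round(4).tolist())
print("variance_from_cov", variance_from_cov.round(4).tolist())
print("match", np.allclose(variance_from_svd, variance_from_cov))
```

The n − 1 factor cancels in the variance fraction, which is why the example above could compute it from `S ** 2` alone.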
SVD also reveals a failure mode. If a matrix stretches one direction strongly and almost erases another, an operation that tries to reverse that transformation must amplify the weak direction.
The 2-norm condition number is:

$$
\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}
$$
when the smallest singular value is non-zero. A condition number near 1 means all directions have comparable scales. A large condition number warns that solving, inverting, whitening (rescaling projected components toward unit variance), or other operations that divide by small singular values can magnify perturbations.[1]
```python
import numpy as np

A = np.diag([1.0, 0.0001])
x_true = np.array([1.0, 1.0])
y = A @ x_true
y_with_noise = y + np.array([0.0, 0.0001])

x_clean = np.linalg.solve(A, y)
x_noisy = np.linalg.solve(A, y_with_noise)

print("condition_number", float(np.linalg.cond(A)))
print("clean_solution", x_clean.tolist())
print("noisy_solution", x_noisy.tolist())
print("small_output_noise_doubled_second_coordinate", x_noisy[1] == 2.0)
```

```
condition_number 10000.0
clean_solution [1.0, 1.0]
noisy_solution [1.0, 2.0]
small_output_noise_doubled_second_coordinate True
```

Notice the scope of the claim. Merely storing vectors in a matrix with a weak direction doesn't automatically make cosine or dot-product retrieval unstable. The amplification appears when a pipeline performs an inverse-like operation, such as a solve or whitening transform, along that weak direction.
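The condition number also bounds how bad the amplification can get for a square, invertible matrix: the relative change in the solution is at most the condition number times the relative change in the right-hand side. The sketch below measures both sides of that inequality for the same diagonal example; the variable names are illustrative.

```python
import numpy as np

A = np.diag([1.0, 0.0001])
x_true = np.array([1.0, 1.0])
y = A @ x_true
noise = np.array([0.0, 0.0001])

x_noisy = np.linalg.solve(A, y + noise)

# Relative size of the disturbance going in, and of the error coming out.
relative_input_change = np.linalg.norm(noise) / np.linalg.norm(y)
relative_output_change = np.linalg.norm(x_noisy - x_true) / np.linalg.norm(x_true)
amplification = relative_output_change / relative_input_change

condition_number = float(np.linalg.cond(A))

print("relative_input_change", float(relative_input_change))
print("relative_output_change", float(relative_output_change))
print("amplification", float(amplification))
print("within_condition_bound", bool(amplification <= condition_number * (1 + 1e-9)))
```

The noise here points entirely along the weak direction, so the measured amplification lands in the thousands, close to the bound. Noise along the strong direction would barely move the solution.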
A.T @ A made our hand calculation easy, but it isn't the safe general implementation of SVD. Its condition number is squared:

$$
\kappa_2(A^\top A) = \kappa_2(A)^2
$$
for a full-column-rank matrix. A matrix with condition number 10,000 turns into a Gram matrix with condition number 100,000,000. That can discard useful precision before the decomposition even begins. Use direct SVD implementations in numerical code; keep A.T @ A as a teaching device for the relationship between eigenvectors and singular vectors.
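The precision loss can be made concrete with a sketch. The matrix below is an assumption chosen for illustration, not part of the incident data: its second singular value is `1e-9`, so the squared value `1e-18` is smaller than float64 can resolve next to 1. Forming the Gram matrix rounds that information away before any eigenvalue routine runs.

```python
import numpy as np

small = 1e-9  # illustrative weak-direction scale
A = np.array([
    [1.0, 1.0],
    [small, 0.0],
    [0.0, small],
])

# Direct SVD works on A itself and recovers the weak singular value.
direct = np.linalg.svd(A, compute_uv=False)

# Gram route: 1 + small**2 rounds to exactly 1.0, so G becomes a matrix of ones.
G = A.T @ A
gram_eigenvalues = np.linalg.eigvalsh(G)[::-1]
via_gram = np.sqrt(np.clip(gram_eigenvalues, 0.0, None))

print("direct_singular_values", direct.tolist())
print("gram_diagonal_rounded_to_one", bool(G[0, 0] == 1.0))
print("via_gram_singular_values", via_gram.tolist())
```

Whatever the Gram route reports for the second value comes from rounding error, because `G` no longer contains the weak direction at all.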
```python
import numpy as np

A = np.diag([1.0, 0.0001])
G = A.T @ A

print("cond_A", float(np.linalg.cond(A)))
print("cond_AtA", float(np.linalg.cond(G)))
print("ratio", float(np.linalg.cond(G) / np.linalg.cond(A)))
```

```
cond_A 10000.0
cond_AtA 100000000.0
ratio 10000.0
```

For centered data, the squared singular values are proportional to how much variation each principal direction explains. For an uncentered count or embedding matrix, they still measure Frobenius energy captured by each direction. In either case, the cumulative ratio gives a transparent reconstruction-based cutoff, where r is the number of singular values and k is the number kept:

$$
\text{retained}(k) = \frac{\sum_{i=1}^{k} \sigma_i^2}{\sum_{i=1}^{r} \sigma_i^2}
$$
```python
import numpy as np

singular_values = np.array([12.4, 8.7, 3.1, 0.9, 0.3])
fractions = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
target = 0.95
k = int(np.searchsorted(fractions, target) + 1)

print("retained_by_component", fractions.round(4).tolist())
print("smallest_k_for_95_percent", k)
print("retained_at_k", round(float(fractions[k - 1]), 4))
```

```
retained_by_component [0.6408, 0.9562, 0.9962, 0.9996, 1.0]
smallest_k_for_95_percent 2
retained_at_k 0.9562
```

This rule answers a reconstruction question. For a retrieval system, follow it with ranking metrics on held-out queries rather than assuming that matrix energy equals relevance.
Represent four incident tickets with three term features: latency, rollback, and trace. The query is mostly about a latency trace with a smaller rollback signal. Project both tickets and query onto the top two right-singular directions, then compare rankings before and after compression with cosine similarity. Cosine similarity divides each dot product by both vector lengths, so the score reflects direction rather than raw magnitude.
```python
import numpy as np

tickets = np.array([
    [3.0, 0.0, 1.0],  # latency trace
    [0.0, 3.0, 1.0],  # rollback trace
    [2.0, 1.0, 0.5],  # latency with rollback mention
    [1.0, 2.0, 1.5],  # rollback with latency mention
])
query = np.array([2.5, 0.5, 1.0])

def cosine_scores(matrix: np.ndarray, vector: np.ndarray) -> np.ndarray:
    row_norms = np.linalg.norm(matrix, axis=1)
    vector_norm = np.linalg.norm(vector)
    if np.any(row_norms == 0) or vector_norm == 0:
        raise ValueError("cosine similarity requires non-zero vectors")
    return (matrix @ vector) / (row_norms * vector_norm)

_, singular_values, Vt = np.linalg.svd(tickets, full_matrices=False)
basis = Vt[:2].T
retained_energy = np.sum(singular_values[:2] ** 2) / np.sum(singular_values ** 2)

original_scores = cosine_scores(tickets, query)
compressed_scores = cosine_scores(tickets @ basis, query @ basis)
original_order = np.argsort(original_scores)[::-1]
compressed_order = np.argsort(compressed_scores)[::-1]

print("singular_values", singular_values.round(4).tolist())
print("retained_energy", round(float(retained_energy), 4))
print("original_order", original_order.tolist())
print("compressed_order", compressed_order.tolist())
print("top_ticket_preserved", bool(original_order[0] == compressed_order[0]))
print("max_score_change", round(float(np.max(np.abs(original_scores - compressed_scores))), 4))

try:
    cosine_scores(tickets, np.zeros(3))
except ValueError as error:
    print("zero_vector_guard", str(error))
```

```
singular_values [4.7011, 3.1677, 0.6044]
retained_energy 0.9888
original_order [0, 2, 3, 1]
compressed_order [0, 2, 3, 1]
top_ticket_preserved True
max_score_change 0.0221
zero_vector_guard cosine similarity requires non-zero vectors
```

For this tiny fixture, compression drops a real tail direction and changes scores while preserving the ranking. The explicit norm guard matters too: cosine similarity is undefined for a zero-length query or ticket vector. A production pipeline should reject, quarantine, or replace that vector instead of returning NaN scores.
This is a useful sanity check, not a credible evaluation. On a real retrieval set, measure Recall@k, the fraction of relevant tickets recovered in the top k returned candidates, and inspect failures before accepting reduced dimensions.
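Recall@k itself takes only a few lines. The sketch below is a minimal version under stated assumptions: `recall_at_k` is a hypothetical helper, not a library function, and the relevance labels `{0, 3}` are invented for illustration. Only the ranked order `[0, 2, 3, 1]` comes from the fixture above.

```python
def recall_at_k(ranked_ids: list[int], relevant_ids: set[int], k: int) -> float:
    """Fraction of relevant items that appear in the top k ranked results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant_ids:
        raise ValueError("recall is undefined without relevant items")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


# Ranked order from the compression fixture; relevance labels are hypothetical.
ranked = [0, 2, 3, 1]
relevant = {0, 3}

for k in (1, 2, 3):
    print(f"recall_at_{k}", recall_at_k(ranked, relevant, k))
```

On a real evaluation set you would average this over many held-out queries, once for the original vectors and once for the compressed ones, and compare the two averages.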
| Later task | Matrix idea used here | What must still be validated |
|---|---|---|
| Latent semantic analysis for text retrieval | Truncated SVD maps term-document structure into fewer directions.[3] | Ranking quality on relevant queries |
| Embedding dimensionality reduction | Centered PCA or a task-specific projection drops feature directions | Recall, latency, and storage tradeoffs |
| Low-rank adaptation (LoRA) | A LoRA update is represented as two thin matrices whose product has limited rank.[4] | Fine-tuned model quality |
| Training diagnostics | Weak or differently scaled directions help explain why optimization can be difficult | Loss curves and held-out performance |
LoRA doesn't compute the SVD of a full trained update during fine-tuning. It learns thin factors directly. SVD gives you the language to understand what "low rank" means; later training lessons explain how those factors are optimized.
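What "two thin matrices whose product has limited rank" means can be sketched with random factors. Everything here is an illustrative assumption: the sizes, the names `B` and `A_factor`, and the random values stand in for learned factors and say nothing about how LoRA initializes or trains them.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2  # hypothetical layer sizes and factor rank

# Two thin factors: one tall, one wide.
B = rng.normal(size=(d_out, r))
A_factor = rng.normal(size=(r, d_in))

# Their product has the full update shape but at most r independent directions.
update = B @ A_factor
singular_values = np.linalg.svd(update, compute_uv=False)

print("update_shape", update.shape)
print("rank", int(np.linalg.matrix_rank(update)))
print("singular_values", singular_values.round(4).tolist())
print("full_parameters", d_out * d_in)
print("factor_parameters", r * (d_out + d_in))
```

Only the first `r` singular values are meaningfully non-zero, and the two factors together store fewer numbers than the full update would.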
```python
import torch

update = torch.tensor([
    [2.0, 1.0],
    [1.0, 2.0],
])
U, S, Vh = torch.linalg.svd(update, full_matrices=False)
rank_one = U[:, :1] @ torch.diag(S[:1]) @ Vh[:1, :]

print("singular_values", [round(float(value), 4) for value in S])
print("rank_one_shape", tuple(rank_one.shape))
print("rank_one_error", round(float(torch.linalg.norm(update - rank_one)), 4))
```

```
singular_values [3.0, 1.0]
rank_one_shape (2, 2)
rank_one_error 1.0
```

Add [100, 100] to every row of the PCA example. Compare SVD before centering and after centering. Which one describes differences between rows?

Answers to check:
- […] 16 and 4, singular values are 4 and 2, and […].
- […] 1.0, so the ranking stops being meaningful. The exact code above returns order [3, 2, 1, 0], doesn't preserve the first result, and reports a maximum score change of 0.7113.
- […] 1.
- […] A.T @ A, squaring the condition number. Fix: Use a direct SVD routine for computation.
- […] NaN after projection. Cause: A query or ticket vector has zero length, so cosine similarity is undefined. Fix: Guard norms before division and define a reject, quarantine, or replacement policy.
- […] U as embedding feature axes. Cause: Row and feature spaces were confused. Fix: Use V for feature directions and U for row participation.
[1] Trefethen, L. N., & Bau, D. (1997). Numerical Linear Algebra. SIAM.

[2] Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank.

[3] Deerwester, S., et al. (1990). Indexing by Latent Semantic Analysis. JASIS.

[4] Hu, E. J., et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models. ICLR.