Matrices as Tensor Network Diagrams

In the previous post, I described a simple way to think about matrices, namely as bipartite graphs. Today I'd like to share a different way to picture matrices—one which is used not only in mathematics, but also in physics, chemistry, and machine learning. Here's the basic idea. An $m\times n$ matrix $M$ with real entries represents a linear map from $\mathbb{R}^n\to\mathbb{R}^m$. Such a mapping can be pictured as a node with two edges. One edge represents the input space, the other edge represents the output space.

That's it!

We can accomplish much with this simple idea. But first, a few words about the picture: To specify an $m\times n$ matrix $M$, one must specify all $mn$ entries $M_{ij}$. The index $i$ ranges from 1 to $m$—the dimension of the output space—and the index $j$ ranges from 1 to $n$—the dimension of the input space. Said differently, $i$ indexes the number of rows of $M$ and $j$ indexes the number of its columns. These indices can be included in the picture, if we like:

This idea generalizes very easily. A matrix is a two-dimensional array of numbers, while an $n$-dimensional array of numbers is called a tensor of order $n$ or an $n$-tensor. Like a matrix, an $n$-tensor can be  represented by a node with one edge for each dimension.

A number, for example, can be thought of as a zero-dimensional array, i.e. a point. It is thus a 0-tensor, which can be drawn as a node with zero edges. Likewise, a vector can be thought of as a one-dimensional array of numbers and hence a 1-tensor. It's represented by a node with one edge. A matrix is a two-dimensional array and hence 2-tensor. It's represented by a node with two edges. A 3-tensor is a three-dimensional array and hence a node with three edges, and so on.

In this graphical notation, familiar things have nice pictures. For example...

Matrix multiplication is tensor contraction.

Multiplying two matrices corresponds to "glueing" their pictures together. This is called tensor contraction.

In the picture above, the edges with matching indices $j$ were the ones that were contracted. This is consistent with the fact that two matrices can be multiplied only when their input/output dimensions match up. You'll also notice that the resulting picture has two free indices, $i$ and $k$, which indeed defines a matrix.

By the way, a key feature of having these pictures is that we don't have to carry around indices. So let's not!

A quick check: A matrix was described as a single node with one edge for each vector space, yet the picture above has two nodes. We still want this to represent a single matrix. And I claim it does! There's a nice way to see this: simply smoosh the blue and green nodes together.

This reminds me of rain trickling down a window: when two raindrops come into contact, they fuse into bigger droplet. That's matrix-matrix multiplication. A similar picture holds for matrix-vector multiplication: a matrix $M$ multiplied by a vector $\mathbf{v}$ results in another vector $M\mathbf{v}$, which is a node with one free edge.

More generally, the product of two or more tensors is represented by a cluster of nodes and edges where the contractions occur along edges with matching indices.

Node shapes can represent different properties.

So far I've drawn all the nodes as circles. But this was just a choice. There is no official rule for which shape to use. That means we can be creative! For example, we might want to reserve a circle or other symmetric shape, like a square, for symmetric matrices only.

Then the transpose of a matrix can be represented by reflecting its picture:

So the symmetry of a symmetric matrix is preserved in the diagram!

I also like the idea of drawing isometric embeddings as triangles:

An isometric embedding $U$ is a linear map from a space $V$ into a space $W$ of larger dimension that preserves the lengths of vectors. Such a map satisfies $U^\top U=\text{id}_V$ but $UU^\top \neq \text{id}_W$. In words, you can always embed the small space $V$ into a larger one, then project back onto $V$ without distorting the vectors in $V$. (Not unlike a retraction map in topology.) But you certainly can't squish all of $W$ onto little $V$, then expect to undo the damage after including $V$ back into $W$. This large-versus-small feature is hinted at by the triangles. (The base of a triangle is larger than its tip!) And in general, as shown below, identity linear operators are drawn as straight lines:

Matrix factorizations have nice pictures, too.

With all this talk of matrix multiplication, i.e. matrix composition, let's not forget about matrix factorization, i.e. matrix decomposition! Every matrix, for example, has a singular value decomposition. This has a nice picture associated to it:

Here, $U$ and $V$ are unitary matrices, and hence isometries, and hence triangles. The matrix $D$ is a diagonal matrix, which I like to represent by a diamond. In short, matrix factorization is the decomposition of a single node into multiple nodes; matrix multiplication is the fusion of multiple nodes into a single node.

The drawing above illustrates another feature of these diagrams: the spatial position of the nodes doesn't really matter. I could've drawn the yellow, blue, green, and pink nodes in horizontal line, or in a vertical line, or in a zig-zag, or however I wish. The only important thing is that the diagram has two free edges. The product of matrices is another matrix!

Messy proofs reduce to picture proofs.

There's more we could say about this graphical notation, but I'll wrap up with another noteworthy feature: proofs can become very simple! Take the trace, for example. The trace of a matrix has a simple picture. It's defined to be the sum along a common index:

This string-with-a-bead has no free edges. It's a loop. This is consistent with the fact that the trace is a number. It's a 0-tensor so it has no free indices. Now here's a proof that the trace is invariant under cyclic permutations:

Just slide beads along a necklace. So clean!

What's in a name?

The diagrams discussed in today's post, which have their origins in Penrose's graphical notation, are called tensor network diagrams and/or, perhaps with some minor differences, string diagrams. The "and/or" depends on who you are. That is, the idea to visually represent a map-of-vector-spaces as a node-with-edges is used in the physics/machine learning communities, where they are called tensor network diagrams, and also in the category theory community, where they are called string diagrams. I think this is just a case of different fields using nearly identical notation for different purposes.

Category theorists use string diagrams to prove things. (The diagrams appear in my little booklet on applied category theory, for instance.) What's more, string diagrams are used to represent most any kind of mapping—not just mappings between vector spaces. Said more formally, string diagrams might arise in a discussion of any monoidal category. For a gentle introduction to these categorical ideas, check out Seven Sketches by Fong and Spivak as well as Picturing Quantum Processes by Coecke and Kissinger.

Some physicists and machine learning researchers, on the other hand, use tensor networks to compute things. A typical situation may go something like this. You have a quantum system. You wish to find the main eigenvector of a special linear operator, called a Hamiltonian. This eigenvector lives in an absurdly large Hilbert space, so you want a technique to find this vector in a compressed kind of way. Enter: tensor networks.

And by "absurdly large," I do mean ABSURDLY large. If you have an Avogadro's number's worth of quantum particles, each of which can occupy just two states, then you need a vector space of dimension $2^{10^{23}}$. Now imagine having a linear operator on this space. That's a matrix with $2^{10^{23}}\times 2^{10^{23}}$ entries. That is exponentially more than the number of atoms in the observable universe. There are only $10^{80}$ of those! Good luck storing this on a computer. In summary, tensor networks help us deal with a large number of parameters in a principled, tractable way.

Tensor networks also have much overlap with graphical models, automata, and more. One vein of current research is identifying and making good use of those overlaps. So there's a lot to explore here. A few places to start are Miles Stoudenmire's iTensor library, Roman Orus's A Practical Introduction to Tensor Networks, Jacob Biamonte's and Ville Bergholm's Tensor Networks in a Nutshell, and Google's TensorNetwork library.

I’ve been working on a project (with some excellent researchers who first introduced me to the tensor network world) that uses these diagrams in a more computational/physics-y setting. For that reason, I tend to think of them as tensor network diagrams, rather than as string diagrams.

The project, which features a special kind of tensor network, has some really neat mathematics that I'm excited to share. It also uses the matrices-as-graphs idea discussed on the blog previously. I plan to blog about it later in the year.

Stay tuned!

New! [Added October 20, 2019] You can see some of today's ideas in action in "Modeling Sequences with Quantum States" (arXiv:1910.07425). There, we make frequent use of tensor network diagrams, especially in the description of a certain training algorithm. You can read more about this work with Miles Stoudenmire and John Terilla here on the blog.

Related Posts

A Group and Its Center, Intuitively

Algebra

Language Modeling with Reduced Densities

Category Theory

What is an Operad? Part 1

Algebra
Leave a comment!