# Understanding Entanglement With SVD

Quantum entanglement is, as you know, a phrase that's jam-packed with meaning in physics. But what you might not know is that the linear algebra behind it is quite simple. If you're familiar with singular value decomposition (SVD), then you're 99% there. My goal for this post is to close that 1% gap. In particular, I'd like to explain something called the Schmidt rank in the hopes of helping the math of entanglement feel a little less... tangly. And to do so, I'll ask that you momentarily forget about the previous sentences. Temporarily ignore the title of this article. Forget we're having a discussion about entanglement. Forget I mentioned that word. And let's start over. Let's just chat math.

## Singular Value Decomposition

SVD is arguably one of the most important, well-known tools in linear algebra. You are likely already very familiar with it, but here's a lightning-fast recap. Every matrix $M$ can be factored as $M=UDV^\dagger$ as shown below, called the singular value decomposition of $M$. The entries of the diagonal matrix $D$ are nonnegative numbers called singular values, and the number of them is equal to the rank of $M$, say $k$. What's more, $U$ and $V$ have exactly $k$ columns, called the left and right singular vectors, respectively.
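For readers who like to see this numerically, here is a minimal sketch of the recap using NumPy. The matrix, the random seed, and the threshold `tol` are illustrative assumptions, not anything prescribed by the discussion above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 5x4 complex matrix of rank 2 as a product of thin random factors.
A = rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))
B = rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4))
M = A @ B

# NumPy returns V^dagger directly (called Vh here), not V.
U, d, Vh = np.linalg.svd(M, full_matrices=False)

# Count the singular values that are nonzero up to an illustrative tolerance.
tol = 1e-10
k = int(np.sum(d > tol))

# Keep only the k singular values and the matching k singular vectors.
U_k, d_k, Vh_k = U[:, :k], d[:k], Vh[:k, :]
M_rebuilt = U_k @ np.diag(d_k) @ Vh_k
```

Here `k` should agree with `np.linalg.matrix_rank(M)`, and `M_rebuilt` should match `M` up to floating-point error.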

There are different ways to think about this, depending on which applications you have in mind. I like to think of singular vectors as encoding meaningful "concepts" inherent to $M$, and of singular values as indicating how important those concepts are. For instance, this perspective arises naturally in a study of the learning dynamics of deep neural networks. As another example, you can imagine a matrix whose rows are indexed by people, and whose columns are indexed by movies. The $ij$th entry could be a 0 or 1, indicating whether or not person $i$ has watched movie $j$. In an applied setting—a recommender system, for instance—one may wish to compute a truncated SVD of this matrix. Here, only the top largest singular values are kept. The rest are viewed as containing little information and are set to zero. In this way, the diagonal matrix $D$ operates on a low-dimensional "feature space," which provides a nice way to compress and glean information about the data.
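To make the truncated SVD concrete, here is a small sketch on a made-up viewing matrix. The data and the helper name `truncated_svd` are hypothetical; a real recommender system would involve much more than this.

```python
import numpy as np

# Hypothetical data: rows are people, columns are movies, 1 means "watched".
watched = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

def truncated_svd(M, r):
    """Keep the r largest singular values of M and set the rest to zero."""
    U, d, Vh = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] @ np.diag(d[:r]) @ Vh[:r, :]

# Rank-2 approximation: a two-dimensional "feature space".
approx = truncated_svd(watched, 2)

# Frobenius-norm error of the approximation.
error = np.linalg.norm(watched - approx)
```

The error equals the square root of the sum of the squares of the discarded singular values, which is one way to see why small singular values are treated as carrying little information.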

Either way, I like to think of $D$ as providing a bridge between two worlds: information about the columns of $U$ (e.g. people) and information about the columns of $V$ (e.g. movies). Below is a very non-mathematical cartoon of this. You can imagine the thickness of the blue bridge relates to the number of singular values. Lots of singular values? The bridge is wide and lots of information passes through. Only a few singular values? The bridge is narrow and not much information gets through.

An actual mathematical picture is found in a tensor network diagram representation of SVD. There, $D$ really is a bridge! As a visual cue, we might draw the edges adjacent to the blue node as very thick if the number of singular values is large, and draw them to be thin otherwise. This again represents the idea of information "flowing" between systems described by $U$ and $V$.

Alternatively still, if you enjoy thinking of matrices as bipartite graphs, then you might have in mind the graphs below. If we have lots of blue nodes—i.e. lots of singular values—then there are lots of pathways between the pink and green nodes (i.e. people and movies). But if we have only a few blue nodes—i.e. a few singular values—then there are fewer pathways between pink and green.

Either way we wish to visualize it, the role of the singular values—that is, the role of the diagonal matrix $D$—is key. Intuitively, they indicate the amount of "interaction" between the information stored by $U$ and $V$, and they mediate how those interactions contribute to the information represented by the original matrix $M$.

And this is precisely the idea behind the mathematics of entanglement.

In the context of physics, one simply applies SVD to a particular matrix and then looks at the number of nonzero singular values of that matrix. This is the main idea behind something called the Schmidt rank of a quantum state (explained below), which is an integer that indicates how much entanglement is present.

Entanglement is measured by the number of nonzero singular values of a particular matrix.

So what makes a physicist's application of SVD different from that of, say, someone building a movie recommender system? Well, in physics, your matrix $M$ presumably encodes information about a physical system and takes spatial considerations into account (e.g. particles in a lattice). Its entries may also be complex numbers, and their squared absolute values should sum to one: $\sum_{ij}|M_{ij}|^2=1$. In this case, as I'll explain below, $M$ represents a quantum state. But lingo aside, the template is much the same: singular values convey important information about how two things (whether users and movies, or two quantum subsystems) are related.

I could stop here, but I'd like to dig a little deeper. In the next section, let me restate this punchline using slightly more specialized language.

## Singular Values vs. Schmidt Rank

To start, let's back up a bit. In a discussion of physics, what exactly is the matrix to which we apply SVD? In the opening example, we applied SVD to a user-by-movie matrix. But what's going on now?

Rather than starting with a matrix, we instead start with a unit vector. To that end, suppose $\psi$ is any unit vector in a tensor product of vector spaces $\mathbb{C}^n\otimes\mathbb{C}^m$. Here, it's important that our discussion takes place in a tensor product. After all, entanglement is defined between two things. So if someone were to ask you, "How much entanglement is there?" one proper response would be, "Entanglement between what?" And in quantum mechanics, the tensor product is the mathematical operation used to combine two systems. Now, if the phrase "tensor product" is unfamiliar to you, I recommend the article "The Tensor Product, Demystified." I think you'll be pleasantly surprised at how easy the concept is!

Alright, now that we have the vector $\psi$, it's easy to get a linear map $\mathbb{C}^m\to\mathbb{C}^n$ from it. Simply reshape the entries of $\psi$ into an $n\times m$ matrix $M$. (Said more formally, look at $\psi$ under the isomorphism $A\otimes B^*\cong\text{hom}(B,A)$ for finite-dimensional vector spaces $A$ and $B$.)
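The reshaping step is a one-liner in NumPy. The sketch below assumes the usual row-major ordering, where the basis vector $e_i\otimes f_j$ sits at position $i\cdot m+j$ of $\psi$ (counting from zero), which is the convention `np.kron` uses. The dimensions and random seed are arbitrary choices for illustration.

```python
import numpy as np

n, m = 2, 3
rng = np.random.default_rng(1)

# A random unit vector in C^n (x) C^m, stored as a flat array of length n*m.
psi = rng.normal(size=n * m) + 1j * rng.normal(size=n * m)
psi = psi / np.linalg.norm(psi)

# Reshape into an n x m matrix: M[i, j] = psi[i*m + j].
M = psi.reshape(n, m)

# Sanity check of the convention: e_1 (x) f_2 becomes the matrix unit E_{1,2}.
e1 = np.eye(n)[1]
f2 = np.eye(m)[2]
basis_matrix = np.kron(e1, f2).reshape(n, m)
```

Nothing is computed here. The same $nm$ numbers are simply arranged in a grid, so the Frobenius norm of `M` is still one.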

In the language of physics, $\psi$ is called a quantum state, and $M$ is simply the matrix associated with it. More generally, the terms "unit vector" and "quantum state" are synonymous. That's because the squared absolute values of the entries of any unit vector define a probability distribution, and in the context of physics, that probability distribution tells you about the state of the system you're studying. (This is the Born rule.)
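As a quick illustration of the Born rule, the sketch below takes a small complex unit vector (chosen by hand for this example) and computes the probability distribution it defines. Note that it is the squared absolute values of the entries that sum to one, which matters once complex numbers are involved.

```python
import numpy as np

# A hand-picked unit vector with complex entries.
psi = np.array([1, 1j, -1, 0], dtype=complex) / np.sqrt(3)

# Born rule: probabilities are squared absolute values of the entries.
probs = np.abs(psi) ** 2          # [1/3, 1/3, 1/3, 0]

# Squaring without the absolute value does not give a distribution.
naive = np.sum(psi ** 2)          # (1 - 1 + 1) / 3 = 1/3, not 1
```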

But I digress. Let's get back to SVD.

Suppose the singular value decomposition of our matrix $M$ is given by $UDV^\dagger$. Here I'm using the dagger to denote the conjugate transpose of $V$ since we're allowing $M$ to have complex entries. Now I'd like to use this decomposition to rewrite $M$ in a slightly messier way. Let $\mathbf{u}_i$ and $\mathbf{v}_i$ denote the $i$th columns of $U$ and $V$, respectively, and let $d_i$ denote the $i$th singular value of $M$. Then we can expand the matrix $M$ as the following sum, where $k$ is the rank of $M$.

$$M=\sum_{i=1}^k d_i\,\mathbf{u}_i\mathbf{v}_i^\dagger$$
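The expansion of $M$ as a sum of $k$ terms, one per singular value, can be checked numerically. This is only a sketch on a random matrix; the seed and dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))

U, d, Vh = np.linalg.svd(M, full_matrices=False)
V = Vh.conj().T                      # columns of V are the right singular vectors
k = np.linalg.matrix_rank(M)

# Sum of d_i * u_i v_i^dagger. np.outer does not conjugate, so we do it by hand.
M_sum = sum(d[i] * np.outer(U[:, i], V[:, i].conj()) for i in range(k))
```

Each term in the sum is a rank-one matrix, and `M_sum` should agree with `M` up to rounding.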

We're almost at the punchline, but let me first introduce a definition and then make one final cosmetic change.

For any two vectors $\mathbf{u}$ and $\mathbf{v}$, the matrix $\mathbf{uv}^\dagger$ is called their outer product. This simple operation is also denoted with a tensor product symbol $\mathbf{u}\otimes \mathbf{v}$ or in physicists' bra-ket notation by $|u\rangle\langle v|$. So for example, if $\mathbf{u}=\begin{bmatrix}1&2&3\end{bmatrix}^\top$ and $\mathbf{v}=\begin{bmatrix}4&5\end{bmatrix}^\top$, then their outer product is the following little matrix.

$$\mathbf{uv}^\dagger=\begin{bmatrix}1\\2\\3\end{bmatrix}\begin{bmatrix}4&5\end{bmatrix}=\begin{bmatrix}4&5\\8&10\\12&15\end{bmatrix}$$
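The same example in NumPy, as a small sketch. It also shows how the outer product relates to the flat tensor product computed by `np.kron`: for real vectors like these, one is just a reshaping of the other.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5])

# u v^dagger: np.outer does not conjugate, so conjugate v explicitly.
# (For this real-valued v the conjugation changes nothing.)
outer = np.outer(u, v.conj())
# outer is [[4, 5], [8, 10], [12, 15]]

# The tensor product as a flat vector of length 3 * 2.
flat = np.kron(u, v)              # [4, 5, 8, 10, 12, 15]
```

Reshaping `flat` to a $3\times 2$ grid recovers `outer`, which has rank one.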

Why introduce this? Let's think back to that expansion of $M$ above. Under the correspondence $\mathbf{uv}^\dagger \leftrightarrow \mathbf{u}\otimes \mathbf{v}$, we can write $\psi$ explicitly using the columns of $U$ and $V$, weighted by the singular values of $M$ like this:

$$\psi=\sum_{i=1}^k d_i\,\mathbf{u}_i\otimes\mathbf{v}_i$$

At this point, you might think we haven't done much (and we haven't, really), and yet familiar things are now given new names. In the context of physics, the above decomposition of $\psi$ is called its Schmidt decomposition. The integer $k$, which is the rank of the original matrix $M$, is called its Schmidt rank. And the singular values $d_1,d_2,\ldots, d_k$ are called its Schmidt coefficients.
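Putting the pieces together, here is a hedged sketch of a Schmidt decomposition routine. The function name `schmidt_decomposition` and the tolerance `tol` are my own choices for illustration. One coordinate-level detail is also an assumption about conventions: with the row-major reshaping used by NumPy, the second tensor factor is the $i$th row of $V^\dagger$ (the entrywise conjugate of $\mathbf{v}_i$), which is how I read the correspondence $\mathbf{uv}^\dagger\leftrightarrow\mathbf{u}\otimes\mathbf{v}$ in coordinates.

```python
import numpy as np

def schmidt_decomposition(psi, n, m, tol=1e-12):
    """Return (coefficients, left vectors, right vectors) for psi in C^n (x) C^m.

    The left vectors are the columns of the second output; the right vectors
    are the rows of the third output (rows of V^dagger).
    """
    M = np.asarray(psi, dtype=complex).reshape(n, m)
    U, d, Vh = np.linalg.svd(M, full_matrices=False)
    k = int(np.sum(d > tol))          # Schmidt rank
    return d[:k], U[:, :k], Vh[:k, :]

# Example on a random unit vector in C^2 (x) C^3.
rng = np.random.default_rng(3)
psi = rng.normal(size=6) + 1j * rng.normal(size=6)
psi /= np.linalg.norm(psi)

coeffs, us, ws = schmidt_decomposition(psi, 2, 3)

# Rebuild psi as a weighted sum of simple tensors.
rebuilt = sum(c * np.kron(us[:, i], ws[i, :]) for i, c in enumerate(coeffs))
```

Since $\psi$ is a unit vector, the squares of the Schmidt coefficients in `coeffs` sum to one, and `rebuilt` should match `psi` up to rounding.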

Although the terminology is new, the ingredients aren't. And that's the punchline. Explicitly:

The quantum state $\psi$ is said to be entangled if its Schmidt rank (i.e. number of nonzero singular values) is strictly greater than 1, and is not entangled otherwise.
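This criterion is easy to test on two standard examples in $\mathbb{C}^2\otimes\mathbb{C}^2$: a product state and a Bell state. The helper names `schmidt_rank` and `is_entangled` and the tolerance are hypothetical choices for this sketch.

```python
import numpy as np

def schmidt_rank(psi, n, m, tol=1e-12):
    """Number of singular values of the n x m reshaping of psi above tol."""
    d = np.linalg.svd(np.asarray(psi).reshape(n, m), compute_uv=False)
    return int(np.sum(d > tol))

def is_entangled(psi, n, m):
    """A state is entangled exactly when its Schmidt rank exceeds one."""
    return schmidt_rank(psi, n, m) > 1

# Product state: (1, 0) tensored with (1, 1)/sqrt(2).
product = np.kron([1, 0], [1, 1]) / np.sqrt(2)

# Bell state: (e_0 (x) e_0 + e_1 (x) e_1) / sqrt(2).
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
```

The product state reshapes to a rank-one matrix, so it is not entangled. The Bell state reshapes to a multiple of the identity, which has two equal singular values, so it is entangled.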

So, do you see the connection with our discussion above? As we stressed earlier, singular values can be thought of as providing a "bridge" between two subsystems. They are a measure of how much interaction exists between them. In the context of physics, this interaction is understood as entanglement.

The upshot is that a large number of singular values—i.e. a high Schmidt rank or a "wide bridge"—corresponds to lots of communication between two subsystems. A small number of singular values—i.e. a low Schmidt rank or a "narrow bridge"—corresponds to little communication. At the lowest extreme, one singular value corresponds to zero entanglement, and we might as well omit the thin bridge in the image below.

Indeed, notice that if the Schmidt rank of $\psi$ is equal to one—that is, if $M$ is a rank one matrix $M=\mathbf{uv}^\dagger$—then we can write $\psi=\mathbf{u\otimes v}$. In the mathematics literature, vectors of this form (i.e. a tensor product of vectors) are sometimes called simple tensors. For this reason, some mathematicians associate entanglement with "linear combinations of simple tensors." By now, I hope the reason is clear.

It all boils down to SVD.

## Back to applications...

Today's post was partly inspired by Daniela Witten's enthusiastic Twitter thread on the many wonders and uses of SVD. I wanted to jump in with today's article to tell you another use of SVD—one which hopefully helps make a complicated idea seem a little simpler. Of course, I've omitted a lot from the discussion, but I hope this was a useful starting point for further reading.

As a closing note, I opened this article with a nod towards data science. Indeed, one doesn't need to make any quantum-y assumptions to talk about SVD, and yet SVD is a key mathematical tool in the study of quantum systems. And interestingly enough, the two conversations are not orthogonal. For instance, here's a recent paper by colleagues at X: Entanglement and Tensor Networks for Supervised Image Classification. There, they measure the amount of entanglement (Schmidt rank) between the top and bottom halves of images in the MNIST handwritten dataset. In other words, they explore the entanglement properties of a standard machine learning dataset. My hope is that today's discussion will help make such papers a little more accessible.

Just remember: whenever you see the word entanglement, think of SVD!

The Schmidt rank of a unit vector $\psi\in\mathbb{C}^n\otimes\mathbb{C}^m$ is the same as the ranks of both operators $MM^\dagger$ and $M^\dagger M$, where $M$ is the matrix associated to $\psi$ described above. These are called reduced density operators, and they define quantum states on $\mathbb{C}^n$ and $\mathbb{C}^m$, respectively. Sound familiar? We've discussed this scenario in great detail when $\psi$ is defined by a classical joint probability distribution. In this case, singular vectors (equivalently, the eigenvectors of $MM^\dagger$ and $M^\dagger M$) have an intuitive, easy-to-understand interpretation. For more, see our First Look at Quantum Probability or my PhD thesis!
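The claim about ranks can be checked directly. The sketch below builds $MM^\dagger$ and $M^\dagger M$ for a random state (seed and dimensions are arbitrary) and compares their eigenvalues with the squared singular values of $M$. The variable names `rho_A` and `rho_B` and the rank tolerance are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 4

psi = rng.normal(size=n * m) + 1j * rng.normal(size=n * m)
psi /= np.linalg.norm(psi)
M = psi.reshape(n, m)

rho_A = M @ M.conj().T            # n x n operator M M^dagger
rho_B = M.conj().T @ M            # m x m operator M^dagger M

d = np.linalg.svd(M, compute_uv=False)

# eigvalsh returns ascending eigenvalues of a Hermitian matrix; reverse them.
eig_A = np.linalg.eigvalsh(rho_A)[::-1]
eig_B = np.linalg.eigvalsh(rho_B)[::-1]

# Ranks, using an explicit tolerance to ignore rounding noise.
rank_M = np.linalg.matrix_rank(M, tol=1e-10)
rank_A = np.linalg.matrix_rank(rho_A, tol=1e-10)
rank_B = np.linalg.matrix_rank(rho_B, tol=1e-10)
```

All three ranks should agree, both operators have trace one, and their nonzero eigenvalues are the squares of the Schmidt coefficients.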