Linear Algebra for Machine Learning

The TensorFlow channel on YouTube recently uploaded a video I made on some elementary ideas from linear algebra and how they're used in machine learning (ML). It's a very nontechnical introduction — more of a bird's-eye view of some basic concepts and standard applications — with the simple goal of whetting the viewer's appetite to learn more.

I've decided to share it here, too, in case it may be of interest to anyone!

I imagine the content here might be helpful for undergraduate students who are in their first exposure to linear algebra and/or to ML, or for anyone else who's new to the topic and wants to get an idea for what it is and some ways it's used.

The video covers three basic concepts — vectors and matrix factorizations and eigenvectors/eigenvalues — and explains a few ways these concepts arise in ML — namely, as data representations, to find vector embeddings, and for dimensionality reduction techniques, respectively.

Enjoy!

Read More →

Understanding Entanglement With SVD

Quantum entanglement is, as you know, a phrase that's jam-packed with meaning in physics. But what you might not know is that the linear algebra behind it is quite simple. If you're familiar with singular value decomposition (SVD), then you're 99% there. My goal for this post is to close that 1% gap. In particular, I'd like to explain something called the Schmidt rank in the hopes of helping the math of entanglement feel a little less... tangly. And to do so, I'll ask that you momentarily forget about the previous sentences. Temporarily ignore the title of this article. Forget we're having a discussion about entanglement. Forget I mentioned that word. And let's start over. Let's just chat math.

Let's talk about SVD.

Singular Value Decomposition

SVD is arguably one of the most important, well-known tools in linear algebra. You are likely already very familiar with it, but here's a lightening-fast recap. Every matrix $M$ can be factored as $M=UDV^\dagger$ as shown below, called the singular value decomposition of $M$. The entries of the diagonal matrix $D$ are nonnegative numbers called singular values, and the number of them is equal to the rank of $M$, say $k$. What's more, $U$ and $V$ have exactly $k$ columns, called the left and right singular vectors, respectively.

There are different ways to think about this, depending on which applications you have in mind. I like to think of singular vectors as encoding meaningful "concepts" inherent to $M$, and of singular values as indicating how important those concepts are.

Read More →

Matrices as Tensor Network Diagrams

In the previous post, I described a simple way to think about matrices, namely as bipartite graphs. Today I'd like to share a different way to picture matrices—one which is used not only in mathematics, but also in physics and machine learning. Here's the basic idea. An $m\times n$ matrix $M$ with real entries represents a linear map from $\mathbb{R}^n\to\mathbb{R}^m$. Such a mapping can be pictured as a node with two edges. One edge represents the input space, the other edge represents the output space.

That's it!

We can accomplish much with this simple idea. But first, a few words about the picture: To specify an $m\times n$ matrix $M$, one must specify all $mn$ entries $M_{ij}$. The index $i$ ranges from 1 to $m$—the dimension of the output space—and the index $j$ ranges from 1 to $n$—the dimension of the input space. Said differently, $i$ indexes the number of rows of $M$ and $j$ indexes the number of its columns. These indices can be included in the picture, if we like:

This idea generalizes very easily. A matrix is a two-dimensional array of numbers, while an $n$-dimensional array of numbers is called a tensor of order $n$ or an $n$-tensor. Like a matrix, an $n$-tensor can be  represented by a node with one edge for each dimension.

A number, for example, can be thought of as a zero-dimensional array, i.e. a point. It is thus a 0-tensor, which can be drawn as a node with zero edges. Likewise, a vector can be thought of as a one-dimensional array of numbers and hence a 1-tensor. It's represented by a node with one edge. A matrix is a two-dimensional array and hence 2-tensor. It's represented by a node with two edges. A 3-tensor is a three-dimensional array and hence a node with three edges, and so on.

Read More →

The Tensor Product, Demystified

Previously on the blog, we've discussed a recurring theme throughout mathematics: making new things from old things. Mathematicians do this all the time:

  • When you have two integers, you can find their greatest common divisor or least common multiple.
  • When you have some sets, you can form their Cartesian product or their union.
  • When you have two groups, you can construct their direct sum or their free product.
  • When you have a topological space, you can look for a subspace or a quotient space.
  • When you have some vector spaces, you can ask for their direct sum or their intersection.
  • The list goes on!

Today, I'd like to focus on a particular way to build a new vector space from old vector spaces: the tensor product. This construction often come across as scary and mysterious, but I hope to shine a little light and dispel a little fear. In particular, we won't talk about axioms, universal properties, or commuting diagrams. Instead, we'll take an elementary, concrete look:

Given two vectors $\mathbf{v}$ and $\mathbf{w}$, we can build a new vector, called the tensor product $\mathbf{v}\otimes \mathbf{w}$. But what is that vector, really? Likewise, given two vector spaces $V$ and $W$, we can build a new vector space, also called their tensor product $V\otimes W$. But what is that vector space, really?

Read More →

What is an Operad? Part 2

Last week we introduced the definition of an operad: it's a sequence of sets or vector spaces or topological spaces or most anything you like (whose elements we think of as abstract operations), together with composition maps and a way to permute the inputs using symmetric groups. We also defined an algebra over an operad, which a way to realize each abstract operation as an actual operation. Now it's time for some examples!

Read More →

What is an Operad? Part 1

If you browse through the research of your local algebraist, homotopy theorist, algebraic topologist or―well, anyone whose research involves an operation of some type, you might come across the word "operad." But what are operads? And what are they good for? Loosely speaking, operads―which come in a wide variety of types―keep track of various "flavors" of operations. 

Read More →

A Quotient of the General Linear Group, Intuitively

Over the past few weeks, we've been chatting about quotient groups  in hopes of answering the question, "What's a quotient group, really?" In short, we noted that the quotient of a group $G$ by a normal subgroup $N$ is a means of organizing the group elements according to how they fail---or don't fail---to satisfy the property required to belong to $N$. The key point was that there's only one way to belong to $N$, but generally there may be several ways to fail to belong. 

Read More →

A Group and Its Center, Intuitively

Last week we took an intuitive peek into the First Isomorphism Theorem as one example in our ongoing discussion on quotient groups. Today we'll explore another quotient that you've likely come across, namely that of a group by its center.

Read More →

The First Isomorphism Theorem, Intuitively

Welcome back to our little discussion on quotient groups! (If you're just now tuning in, be sure to check out "What's a Quotient Group, Really?" Part 1 and Part 2!) We're wrapping up this mini series by looking at a few examples. I'd like to take my time emphasizing intuition, so I've decided to give each example its own post. Today we'll take an intuitive look at the quotient given in the First Isomorphism Theorem.

Read More →

What's a Quotient Group, Really? Part 2

Today we're resuming our informal chat on quotient groups.  Previously we said that belonging to a (normal, say) subgroup $N$ of a group $G$ just means you satisfy some property. For example, $5\mathbb{Z}\subset\mathbb{Z}$ means "You belong to $5\mathbb{Z}$ if and only if you're divisible by 5". And the process of "taking the quotient" is the simple observation that every element in $G$ either

#1) belongs to N            or            #2) doesn't belong to N

and noting that...

Read More →

What's a Quotient Group, Really? Part 1

I realize that most of my posts for the past, er, few months have been about some pretty hefty duty topics. Today, I'd like to dial it back a bit and chat about some basic group theory! So let me ask you a question: When you hear the words "quotient group," what do you think of? In case you'd like a little refresher, here's the definition...

Read More →

Transitive Group Actions: "Where There's a Will, There's a Way!"

In this post, we visually explore the definition of a transitive group action and see how it relates to the phrase, "Where there's a will, there's a way!"

Read More →

What is Galois Theory Anyway?

Perhaps you've heard of Évariste Galois? (Pronounced "GAL-wah.") You know, the French mathematician who died tragically in 1832 in a duel at the tender age of 20? (Supposedly over a girl! C'est romantique, n'est-ce pas?)  Well, today we're taking a bird's-eye view of his most well-known contribution to mathematics: the appropriately named Galois theory. The goal of this post is twofold...

Read More →

Rational Canonical Form: Example #2 (with Galois Theory)

Last week we saw an example of how to use the rational canonical form (RCF) to classify matrices of a given order in $GL_2(\mathbb{Q})$. Today we have a similar example (taken from CUNY's spring 2015 qualifying exam) where now our matrices have entires in the finite field $F_13$. The fact that our field is $F_13$ instead of $\mathbb{Q}$ actually makes little difference in how to approach the solution, but I think this problem is particularly nice because part of it calls on some Galois Theory.

Read More →

Rational Canonical Form: Example #1

Last time we discussed the rational canonical form (RCF) of a linear transformation, and we mentioned that any two similar linear transformations have the same RCF. It's this fact which allows us to classify distinct linear transformations on a given $F$-vector space $V$ for some field $F$. Today, to illustrate this, we'll work through a concrete example:

Find representatives for the distinct conjugacy classes of matrices of finite order in the multiplicative group of 2x2 matrices with rational entries.

Read More →

Rational Canonical Form: A Summary

This post is intended to be a hopefully-not-too-intimidating summary of the rational canonical form (RCF) of a linear transformation. Of course, anything which involves the word "canonical" is probably intimidating no matter what. But even so, I've attempted to write a distilled version of the material found in (the first half of) section 12.2 from Dummit and Foote's Abstract Algebra.

In sum, the RCF is important because it allows us to classify linear transformations on a vector space up to conjugation. Below we'll set up some background, then define the rational canonical form, and close by discussing why the RCF looks the way it does. Next week we'll go through an explicit example to see exactly how the RCF can be used to classify linear transformations.

Read More →

Finitely Generated Modules Over a PID

We know what it means to have a module $M$ over a (commutative, say) ring $R$. We also know that if our ring $R$ is actually a field, our module becomes a vector space. But what happens if $R$ is "merely" a PID? Answer: A lot. Today we'll look at a proposition, which, thanks to the language of exact sequences, is quite simple and from which the Fundamental Theorem of Finitely Generated Modules over a PID follows almost immediately. The information below is loosely based on section 12.1 of Dummit and Foote' Abstract Algebra.

Read More →

A Little Fact From Group Actions

Today we've got a little post on a little fact relating to group actions. I wanted to write about this not so much to emphasize its importance (it's certainly not a major result) but simply to uncover the intuition behind it.

Read More →

Two Tricks Using Eisenstein's Criterion

Today we're talking about Eisenstein's (not Einstein's!) Criterion - a well-known test to determine whether or not a polynomial is irreducible. In particular, we'll consider two examples where a not-so-obvious trick is needed in order to apply the criterion.

Read More →

Noetherian Rings = Generalization of PIDs

When I was first introduced to Noetherian rings, I didn't understand why my professor made such a big hoopla over these things. What makes Noetherian rings so special? Today's post is just a little intuition to stash in The Back Pocket, for anyone hearing the word "Noetherian" for the first time. 

Read More →

4 Ways to Show a Group is Not Simple

You know the Sylow game. You're given a group of a certain order and are asked to show it's not simple. But where do you start? Here are four options that may be helpful when trying to produce a nontrivial normal subgroup.

Read More →

Algebraic Elements Are Like Limit Points!

When you hear the word closure, what do you think of? I think of wholeness - you know, tying loose ends, wrapping things up, filling in the missing parts. This same idea is behind the mathematician's notion of closure, as in the phrase "taking the closure" of a set. Intuitively this just means adding in any missing pieces so that the result is complete, whole.

Read More →

Constructing the Tensor Product of Modules

Today we talk tensor products. Specifically this post covers the construction of the tensor product between two modules over a ring. But before jumping in, I think now's a good time to ask, "What are tensor products good for?" Here's a simple example where such a question might arise...

Read More →

The Integral Domain Hierarchy, Part 2

In any area of math, it's always good idea to keep a few counterexamples in your back pocket. This post continues part 1 with examples/non-examples from some of the different subsets of integral domains. 

Read More →

The Integral Domain Hierarchy, Part 1

Here is a list of some of the subsets of integral domains, along with the reasoning (a.k.a proofs) of why the bullseye below looks the way it does. Part 2 of this post will include back-pocket examples/non-examples of each.   

Read More →

Ways to Show a Group is Abelian

After some exposure to group theory, you quickly learn that when trying to prove a group $G$ is abelian, checking if $xy=yx$ for arbitrary $x,y$ in $G$ is not always the most efficient - or helpful! - tactic. Here is a (not comprehensive) running tab of other ways you may be able to prove your group is abelian:

Read More →