# June 2021

# Language, Statistics, & Category Theory, Part 1

In the previous post I mentioned a new preprint that John Terilla, Yiannis Vlassopoulos, and I recently posted on the arXiv. In it, we ask a question motivated by the recent successes of the world's best large language models:

What's a nice mathematical framework in which to explain the passage from *probability distributions* on text to *syntactic* and *semantic information* in language?

To understand the motivation behind this question, and to recall what a "large language model" is, I encourage you to read the opening article from last time. In the next few blog posts, I'll give a tour of the mathematical ideas the paper presents toward answering the question above. I like the narrative we give there, so I'll follow it closely here on the blog. You might think of these posts as an informal walk through the formal ideas found in the paper.

Now, where shall we begin? What math are we talking about?

Let's start with a simple fact about language.

## Language is algebraic.

By "algebraic," I mean the basic sense in which things combine to form a new thing. We learn about algebra at a young age: given two numbers $x$ and $y$ we can multiply them to get a new number $xy$. We can do something similar in language. Numbers combine to give new numbers, and words and phrases in a language combine to give new expressions. Take the words *red* and *firetruck*, for example. They can be "multiplied" together to get a new phrase: *red firetruck*.

Here, the "multiplication" is just *concatenation*: sticking things side by side. This is a simple algebraic structure, and it's inherent to language. I'm concatenating words together as I type this sentence. That's algebra! Another word for this kind of structure is *compositionality*, where things compose together to form something larger.
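To make the "multiplication" concrete, here is a minimal Python sketch, assuming a phrase is modeled as a tuple of words; the names `Phrase`, `multiply`, and `EMPTY` are hypothetical. Like multiplication of numbers, concatenation is associative and has a unit (the empty phrase), but unlike multiplication of numbers it is not commutative.

```python
from typing import Tuple

# Assumption: a phrase is modeled as a tuple of words.
Phrase = Tuple[str, ...]

# The empty phrase: concatenating with it changes nothing.
EMPTY: Phrase = ()


def multiply(x: Phrase, y: Phrase) -> Phrase:
    """'Multiply' two phrases by concatenation: place them side by side."""
    return x + y


red: Phrase = ("red",)
firetruck: Phrase = ("firetruck",)
big: Phrase = ("big",)

red_firetruck = multiply(red, firetruck)
print(" ".join(red_firetruck))  # red firetruck

# Associativity: the grouping of the multiplications does not matter.
assert multiply(multiply(big, red), firetruck) == multiply(big, multiply(red, firetruck))

# The empty phrase acts like the number 1.
assert multiply(EMPTY, red) == red == multiply(red, EMPTY)

# Order matters: "firetruck red" is a different expression.
assert multiply(firetruck, red) != red_firetruck
```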

So language is algebraic, or compositional.

# A Nod to Non-Traditional Applied Math

What is applied mathematics? The phrase might bring to mind historical applications of analysis to physical problems, or something similar. I think that's often what folks mean when they say "applied mathematics." And yet there's a much broader sense in which mathematics is applied, especially nowadays. I like what mathematician Tom Leinster once had to say about this (emphasis mine):

"I hope mathematicians and other scientists hurry up and realize that there’s a *glittering array* of applications of mathematics in which non-traditional areas of mathematics are applied to non-traditional problems. It does no one any favours to keep using the term 'applied mathematics' in its current overly narrow sense."

I'm all in favor of rebranding the term "applied mathematics" to encompass this wider notion. I certainly enjoy applying non-traditional areas of mathematics to non-traditional problems — it's such a vibrant place to be! It's especially fun to take ideas that mathematicians already know lots about, then repurpose those ideas for potential applications in other domains. In fact, I plan to spend some time sharing one such example with you here on the blog.

But before sharing the math, which I'll do in the next couple of blog posts, I want to first motivate the story by telling you about an idea from the field of artificial intelligence (AI).

# Linear Algebra for Machine Learning

The TensorFlow channel on YouTube recently uploaded a video I made on some elementary ideas from linear algebra and how they're used in machine learning (ML). It's a very nontechnical introduction — more of a bird's-eye view of some basic concepts and standard applications — with the simple goal of whetting the viewer's appetite to learn more.

I've decided to share it here, too, in case it may be of interest to anyone!

I imagine the content might be helpful for undergraduate students getting their first exposure to linear algebra and/or ML, or for anyone else who's new to the topic and wants an idea of what it is and some of the ways it's used.

The video covers three basic concepts (**vectors**, **matrix factorizations**, and **eigenvectors/eigenvalues**) and explains a few ways these concepts arise in ML: namely, as **data representations**, to find **vector embeddings**, and for **dimensionality reduction** techniques, respectively.

Enjoy!

# Warming Up to Enriched Category Theory, Part 2

Let's jump right into where we left off in part 1 of our warm-up to enriched category theory. Recall from last time that the set of **truth values** $\{0, 1\}$, the **unit interval** $[0,1]$, and the **nonnegative extended reals** $[0,\infty]$ are not just *sets* but actually *preorders*, and hence categories. We also hinted that a "category enriched over" one of these preorders (whatever *that* means; we hadn't defined it yet!) looks something like a collection of objects $X,Y,\ldots$ where there is at most one arrow between any pair $X$ and $Y$, and where that arrow can further be "decorated with," or simply replaced by, a number from one of those three exemplary preorders.
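The claim that these preorders are categories can be checked mechanically. Below is a minimal Python sketch, assuming each set carries its usual order $\le$ and replacing the infinite sets $[0,1]$ and $[0,\infty]$ with small finite samples; the helpers `hom`, `compose`, and `is_category` are hypothetical names. An arrow $x \to y$ exists exactly when $x \le y$, so identity arrows come from reflexivity and composites come from transitivity. Associativity and the unit laws then hold automatically, since there is at most one arrow between any pair.

```python
import math
from itertools import product


def hom(x, y):
    """Arrows x -> y in a preorder viewed as a category:
    exactly one arrow (recorded as the pair (x, y)) if x <= y, else none."""
    return {(x, y)} if x <= y else set()


def compose(g, f):
    """Compose f: x -> y with g: y -> z to get an arrow x -> z."""
    x, y = f
    y2, z = g
    if y != y2:
        raise ValueError("arrows are not composable")
    return (x, z)


def is_category(elements):
    """Check that identity arrows exist (reflexivity) and that every
    composable pair has its composite among the arrows (transitivity)."""
    for x in elements:
        if (x, x) not in hom(x, x):
            return False
    for x, y, z in product(elements, repeat=3):
        for f in hom(x, y):
            for g in hom(y, z):
                if compose(g, f) not in hom(x, z):
                    return False
    return True


truth_values = [0, 1]
unit_interval_sample = [0.0, 0.25, 0.5, 1.0]       # finite sample of [0, 1]
extended_reals_sample = [0.0, 1.0, 2.5, math.inf]  # finite sample of [0, inf]

for name, elems in [("truth values", truth_values),
                    ("unit interval", unit_interval_sample),
                    ("extended reals", extended_reals_sample)]:
    print(name, is_category(elems))  # prints True for each
```

A finite sample can only spot-check the axioms, of course; for the full intervals, reflexivity and transitivity of $\le$ are what guarantee them.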

With that background in mind, my goal in today's article is to say exactly what a **category enriched over a preorder** is. The formal definition, and the intuition behind it, will then pave the way for the notion of a category enriched over an *arbitrary* (and sufficiently nice) category, not just a preorder.

En route to this goal, it will help to make a couple of opening remarks.

## Two things to think about.

First, take a closer look at the picture on the right. I've written "$\text{hom}(X,Y)$" in quotation marks because the notation $\text{hom}(-,-)$ is often used for a *set* of morphisms in ordinary category theory. But the point of this discussion is that we're not just interested in sets! So we should use better notation: let's refer to the number associated to a pair of objects $X$ and $Y$ as $\mathcal{C}(X,Y)$, where the letter "$\mathcal{C}$" reminds us there's an (enriched) $\mathcal{C}$ategory being investigated.

Second, for the theory to work out nicely, it turns out that preorders need a little *more* added to them.
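To see the notation $\mathcal{C}(X,Y)$ in action, here is a minimal Python sketch of the idea that the (at most one) arrow between two objects can be replaced by a number, here a truth value in $\{0,1\}$. The example is hypothetical and not from the paper: the objects are the divisors of 6, and $\mathcal{C}(X,Y)=1$ exactly when $X$ divides $Y$. The sanity checks mirror identities and composition; they illustrate the intuition only, not the formal definition of enrichment.

```python
def C(x: int, y: int) -> int:
    """Hypothetical example: the truth value standing in for the arrow x -> y.
    It is 1 if x divides y (there is an arrow) and 0 otherwise (no arrow)."""
    return 1 if y % x == 0 else 0


objects = [1, 2, 3, 6]

# Print the table of values C(x, y), one row per x.
for x in objects:
    print(x, [C(x, y) for y in objects])
# 1 [1, 1, 1, 1]
# 2 [0, 1, 0, 1]
# 3 [0, 0, 1, 1]
# 6 [0, 0, 0, 1]

# Every object has an arrow to itself: C(x, x) = 1.
assert all(C(x, x) == 1 for x in objects)

# Arrows x -> y and y -> z give an arrow x -> z:
# C(x, y) * C(y, z) <= C(x, z) for all x, y, z.
assert all(C(x, y) * C(y, z) <= C(x, z)
           for x in objects for y in objects for z in objects)
```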

# Warming Up to Enriched Category Theory, Part 1

*It's no secret that I like category theory.* It's a common theme on this blog, and it provides a nice lens through which to view old ideas in new ways, and to view *new* ideas in new ways! Speaking of new ideas, my coauthors and I are planning to upload a new paper on the arXiv soon. I've really enjoyed the work and can't wait to share it with you. But first, you'll have to know a little something about **enriched category theory**. (And before *that*, you'll have to know something about ordinary category theory... here's an intro!) So that's what I'd like to introduce today.

A warm-up, if you will.

## What is enriched category theory?

As the name suggests, it's like a "richer" version of category theory, and it all starts with a simple observation. *(Get your category theory hats on, people. We're jumping right in!)*

In a category, you have some objects and some arrows between them, thought of as relationships between those objects. Now in the formal definition of a category, we usually ask for a *set's worth* of morphisms between any two objects, say $X$ and $Y$. You'll typically hear something like, "The hom set $\text{hom}(X,Y)$ bla bla...."

Now here's the thing. Quite often in mathematics, the set $\text{hom}(X,Y)$ may not just be a set. It could, for instance, be a set equipped with extra structure. You already know lots of examples. Let's think about linear algebra for a moment.
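As a sketch of one standard instance: the linear maps from one vector space to another can themselves be added and scaled pointwise, so that hom set is itself a vector space. The Python below is a minimal illustration, assuming vectors in $\mathbb{R}^2$ are plain lists of floats; `add_maps`, `scale_map`, `swap`, and `double` are hypothetical names.

```python
from typing import Callable, List

Vector = List[float]
LinearMap = Callable[[Vector], Vector]


def add_maps(f: LinearMap, g: LinearMap) -> LinearMap:
    """Pointwise sum of two maps: (f + g)(v) = f(v) + g(v)."""
    return lambda v: [a + b for a, b in zip(f(v), g(v))]


def scale_map(c: float, f: LinearMap) -> LinearMap:
    """Pointwise scalar multiple: (c f)(v) = c * f(v)."""
    return lambda v: [c * a for a in f(v)]


def swap(v: Vector) -> Vector:
    """A linear map R^2 -> R^2 that swaps the coordinates."""
    return [v[1], v[0]]


def double(v: Vector) -> Vector:
    """A linear map R^2 -> R^2 that doubles every coordinate."""
    return [2 * a for a in v]


# h = swap + 3 * double is again a map R^2 -> R^2 ...
h = add_maps(swap, scale_map(3.0, double))
print(h([1.0, 2.0]))  # [8.0, 13.0]

# ... and it is still linear: h(u + v) = h(u) + h(v) and h(c u) = c h(u).
u, v, c = [1.0, 2.0], [3.0, -1.0], 4.0
assert h([a + b for a, b in zip(u, v)]) == [a + b for a, b in zip(h(u), h(v))]
assert h([c * a for a in u]) == [c * a for a in h(u)]
```

The two assertions only spot-check linearity on sample vectors; in general it follows because sums and scalar multiples of linear maps are linear.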