Math3ma
Entropy + Algebra + Topology = ?

Today I'd like to share a bit of math involving ideas from information theory, algebra, and topology. It's all in a new paper I've recently uploaded to the arXiv. The paper is short — just 11 pages! Even so, I thought it'd be nice to stroll through some of the surrounding mathematics here.
To introduce those ideas, let's start by thinking about the function $d\colon[0,1]\to\mathbb{R}$ defined by $d(x)=-x\log x$ when $x>0$ and $d(x)=0$ when $x=0$. With pencil and paper in hand, it's easy to check that this function satisfies an equation that looks a lot like the product rule from calculus:
$$d(xy) = x\,d(y) + d(x)\,y \qquad \text{for all } x,y\in[0,1].$$
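If you'd rather let a computer do the pencil-and-paper work, here's a quick numerical spot-check. (This is just an illustrative Python sketch of mine; it isn't from the paper.)

```python
import math

def d(x):
    """The function d(x) = -x log x, with the convention d(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0

# Spot-check the Leibniz-style identity d(xy) = x*d(y) + d(x)*y
# at a few points of the unit interval.
for x, y in [(0.2, 0.7), (0.5, 0.5), (0.9, 0.1)]:
    lhs = d(x * y)
    rhs = x * d(y) + d(x) * y
    print(f"{lhs:.10f}  vs  {rhs:.10f}")
```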
Functions that satisfy an equation reminiscent of the "Leibniz rule," like this one, are called derivations, which evokes the familiar idea of a derivative. The nonzero term $-x\log x$ above may also look familiar to some of you. It's an expression that appears in the Shannon entropy of a probability distribution. A probability distribution on a finite set $\{1,\ldots,n\}$ for $n\geq 1$ is a sequence $p=(p_1,\ldots,p_n)$ of nonnegative real numbers satisfying $\sum_{i=1}^np_i=1$, and the Shannon entropy of $p$ is defined to be
$$H(p) = -\sum_{i=1}^n p_i\log p_i = \sum_{i=1}^n d(p_i).$$
Now it turns out that the function $d$ is nonlinear, which means we can't pull it out in front of the summation. In other words, $H(p)\neq d(\sum_ip_i)$. Indeed, since $\sum_ip_i=1$ and $d(1)=0$, the right-hand side is always zero, while $H(p)$ is typically positive. Even so, curiosity might cause us to wonder about settings in which Shannon entropy is itself a derivation. One such setting is described in the paper above, which shows a correspondence between Shannon entropy and derivations of (wait for it...) topological simplices!
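To see the failure of linearity concretely, here's another small Python sketch (again my own, purely illustrative):

```python
import math

def d(x):
    """The function d(x) = -x log x, with the convention d(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0

p = [0.5, 0.25, 0.25]             # a probability distribution on {1, 2, 3}

# Shannon entropy as a sum of d's: H(p) = sum_i d(p_i).
H = sum(d(p_i) for p_i in p)
print(H)                          # ~1.0397 (using the natural log)

# Pulling d outside the sum gives something else entirely:
# sum(p) = 1 and d(1) = 0, which is not H(p).
print(d(sum(p)))                  # 0.0
```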
Language, Statistics, & Category Theory, Part 2
Part 1 of this mini-series opened with the observation that language is an algebraic structure. But we also mentioned that thinking merely algebraically doesn't get us very far. The algebraic perspective, for instance, is not sufficient to describe the passage from probability distributions on corpora of text to syntactic and semantic information in language that we see in today's large language models. This motivated the category-theoretic framework presented in the new paper I shared last time. But even before we bring statistics into the picture, there are some immediate advantages to using tools from category theory rather than algebra. One example comes from elementary considerations of logic, and that's where we'll pick up today.
Let's start with a brief recap.

Language, Statistics, & Category Theory, Part 1
In the previous post I mentioned a new preprint that John Terilla, Yiannis Vlassopoulos, and I recently posted on the arXiv. In it, we ask a question motivated by the recent successes of the world's best large language models:
What's a nice mathematical framework in which to explain the passage from probability distributions on text to syntactic and semantic information in language?
To understand the motivation behind this question, and to recall what a "large language model" is, I encourage you to read the opening article from last time. In the next few blog posts, I'll give a tour of mathematical ideas presented in the paper towards answering the question above. I like the narrative we give, so I'll follow it closely here on the blog. You might think of the next few posts as an informal tour through the formal ideas found in the paper.
Now, where shall we begin? What math are we talking about?
Let's start with a simple fact about language.
Language is algebraic.
By "algebraic," I mean the basic sense in which things combine to form a new thing. We learn about algebra at a young age: given two numbers $x$ and $y$ we can multiply them to get a new number $xy$. We can do something similar in language. Numbers combine to give new numbers, and words and phrases in a language combine to give new expressions. Take the words red and firetruck, for example. They can be "multiplied" together to get a new phrase: red firetruck.

Here, the "multiplication" is just concatenation — sticking things side by side. This is a simple algebraic structure, and it's inherent to language. I'm concatenating words together as I type this sentence. That's algebra! Another word for this kind of structure is compositionality, where things compose together to form something larger.
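If it helps to see this in code, here's a tiny Python illustration of concatenation as a binary operation (my example, not anything from the paper):

```python
# Concatenation as "multiplication": strings combine to form new strings.
red = "red"
firetruck = "firetruck"

phrase = red + " " + firetruck    # "multiplying" two words
print(phrase)                     # red firetruck

# Like multiplication of numbers, the operation is associative...
assert ("a" + "b") + "c" == "a" + ("b" + "c")

# ...and the empty string acts as an identity element, so strings
# under concatenation form a monoid.
assert "red" + "" == "red" == "" + "red"
```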
So language is algebraic or compositional.
A Nod to Non-Traditional Applied Math
What is applied mathematics? The phrase might bring to mind historical applications of analysis to physical problems, or something similar. I think that's often what folks mean when they say "applied mathematics." And yet there's a much broader sense in which mathematics is applied, especially nowadays. I like what mathematician Tom Leinster once had to say about this (emphasis mine):
"I hope mathematicians and other scientists hurry up and realize that there’s a glittering array of applications of mathematics in which non-traditional areas of mathematics are applied to non-traditional problems. It does no one any favours to keep using the term 'applied mathematics' in its current overly narrow sense."
I'm all in favor of rebranding the term "applied mathematics" to encompass this wider notion. I certainly enjoy applying non-traditional areas of mathematics to non-traditional problems — it's such a vibrant place to be! It's especially fun to take ideas that mathematicians already know lots about, then repurpose those ideas for potential applications in other domains. In fact, I plan to spend some time sharing one such example with you here on the blog.
But before sharing the math — which I'll do in the next couple of blog posts — I want to first motivate the story by telling you about an idea from the field of artificial intelligence (AI).
Linear Algebra for Machine Learning
The TensorFlow channel on YouTube recently uploaded a video I made on some elementary ideas from linear algebra and how they're used in machine learning (ML). It's a very nontechnical introduction — more of a bird's-eye view of some basic concepts and standard applications — with the simple goal of whetting the viewer's appetite to learn more.
I've decided to share it here, too, in case it may be of interest to anyone!
I imagine the content here might be helpful for undergraduate students getting their first exposure to linear algebra and/or to ML, or for anyone else who's new to the topic and wants to get an idea of what it is and some ways it's used.
The video covers three basic concepts — vectors, matrix factorizations, and eigenvectors/eigenvalues — and explains a few ways these concepts arise in ML — namely, as data representations, to find vector embeddings, and for dimensionality reduction techniques, respectively.
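As a taste of that third pairing, here's a minimal sketch of dimensionality reduction via eigenvectors, in the style of PCA. (The code is my own toy illustration in Python/NumPy; the details aren't taken from the video.)

```python
import numpy as np

# A toy dataset: 100 samples, each represented as a 5-dimensional vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)                   # center the data

# Eigendecomposition of the sample covariance matrix.
cov = X.T @ X / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order

# Project onto the two eigenvectors with largest eigenvalues: 5D -> 2D.
top2 = eigvecs[:, -2:]
X_reduced = X @ top2
print(X_reduced.shape)                   # (100, 2)
```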
Enjoy!