# Rational Canonical Form: A Summary

## The Basic Idea

This post is intended to be a hopefully-not-too-intimidating summary of the rational canonical form (RCF) of a linear transformation. Of course, anything which involves the word "canonical" is probably intimidating *no matter what*. But even so, I've attempted to write a distilled version of the material found in (the first half of) section 12.2 from Dummit and Foote's *Abstract Algebra*.

In sum, the RCF is important because it allows us to classify linear transformations on a vector space *up to conjugation.* Below we'll set up some background, then define the rational canonical form, and close by discussing *why* the RCF looks the way it does. Next week we'll go through an explicit example to see exactly how the RCF can be used to classify linear transformations.

## From English to Math

### The Background

Let $F$ be a field and let $V$ be a finite dimensional $F$-vector space. Given a linear transformation $T:V\to V$, we can simultaneously view $V$ as an $F[x]$-module by defining the action $$x\cdot v := Tv$$ for any vector $v\in V$. Then for any polynomial $p(x)\in F[x]$, we know what the action $p(x)\cdot v$ "looks like," namely if $p(x)=a_nx^n+\cdots+a_1x+a_0$ with $a_i\in F$, then $$ \begin{align} p(x)\cdot v &= (a_nT^n+\cdots+a_1T+a_0I)v\\ &= a_nT^nv + \cdots + a_1Tv+a_0v \end{align} $$ where $I:V\to V$ is the identity. Notice that the last line is indeed another vector in $V$ and so this action makes sense. (And one can easily check that the module axioms are satisfied.)
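To make the action concrete, here is a minimal sketch in Python. It assumes $T$ is given as a square matrix (a list of rows) and $p(x)$ as a coefficient list `[a_0, a_1, ..., a_n]`; the names `mat_vec` and `act` are my own, chosen just for illustration.

```python
def mat_vec(T, v):
    """Multiply a square matrix T (list of rows) by a vector v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def act(coeffs, T, v):
    """Return p(x) . v = p(T) v, where coeffs = [a_0, a_1, ..., a_n].

    Horner's rule: p(T)v = a_0 v + T(a_1 v + T(a_2 v + ...)).
    """
    result = [0] * len(v)
    for a in reversed(coeffs):
        result = mat_vec(T, result)                        # apply T once more
        result = [r + a * vi for r, vi in zip(result, v)]  # add a_i * v
    return result

# Example: T is rotation by 90 degrees on Q^2 and p(x) = x + 2.
T = [[0, -1],
     [1,  0]]
v = [3, 5]
print(act([2, 1], T, v))  # (x + 2) . v = Tv + 2v = [1, 13]
```

The output is again a vector in $V$, which is all the action needs in order to make sense.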

One key observation is that $V$ is a *torsion** $F[x]$-module. In particular, if $m_T(x)$ denotes the minimal polynomial of $T$, then for any $v\in V$ we have $m_T(x)\cdot v=0$ (since $m_T(x)\cdot v=m_T(T)v$, and $m_T(T)=0$ is the zero linear transformation by definition). What's more, $V$ is finitely generated as an $F[x]$-module (any finite $F$-basis of $V$ already generates it) by assumption. So since $F[x]$ is a PID (as $F$ is a field), we may conclude
$$V\cong F[x]/(a_1(x))\oplus\cdots\oplus F[x]/(a_d(x))$$
where $a_1(x)\mid\cdots\mid a_d(x)$ are the **invariant factors** of $V$ and where $a_d(x)=m_T(x)$. (This is the Fundamental Theorem of Finitely Generated Modules over a PID, the subject of last week's post.) These invariant factors play a key role in defining the rational canonical form.
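Here is a small sanity check of the torsion claim, sketched in Python. The example is my own: I take $T=\operatorname{diag}(2,2,3)$ on $\mathbb{Q}^3$, whose invariant factors are $a_1(x)=x-2$ and $a_2(x)=(x-2)(x-3)=x^2-5x+6=m_T(x)$. The helper names `mat_mul` and `poly_at_matrix` are made up for this sketch.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at_matrix(coeffs, T):
    """Evaluate p(T) for coeffs = [a_0, ..., a_n] via Horner's rule."""
    n = len(T)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = mat_mul(result, T)   # result <- result * T
        for i in range(n):
            result[i][i] += a         # result <- result + a * I
    return result

T = [[2, 0, 0],
     [0, 2, 0],
     [0, 0, 3]]
zero = [[0] * 3 for _ in range(3)]

# m_T(x) = x^2 - 5x + 6 kills every vector, so V is torsion ...
print(poly_at_matrix([6, -5, 1], T) == zero)   # True
# ... but neither linear factor does so on its own.
print(poly_at_matrix([-2, 1], T) == zero)      # False
print(poly_at_matrix([-3, 1], T) == zero)      # False
```

Note that the degrees of the invariant factors, $1+2$, add up to $\dim V=3$, as they must.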

### The Rational Canonical Form

Let $a(x)=x^k+b_{k-1}x^{k-1}+\cdots+b_1x+b_0$ be any monic polynomial in $F[x]$. Construct a $k\times k$ matrix by placing $1$'s along the subdiagonal and the negatives of the coefficients of $a(x)$ (all *except* the leading one) down the last column. We call this the **companion matrix** of $a(x)$:
$$ \mathscr{C}_{a(x)}:= \begin{pmatrix}0 & 0 & \cdots &\cdots& \cdots & -b_0\\
1 & 0 & \cdots &\cdots& \cdots &-b_1\\
0 & 1 & \cdots &\cdots& \cdots &-b_2\\
0 & 0 &\ddots & & &\vdots\\
\vdots & \vdots & & \ddots & &\vdots\\
0 & 0 &\ldots & \ldots & 1 & -b_{k-1}
\end{pmatrix}.$$

(For example, if $a(x)=x^2+x+1\in \mathbb{Q}[x]$, then $\mathscr{C}_{a(x)}=\bigl( \begin{smallmatrix} 0 & -1\\ 1 & -1 \end{smallmatrix} \bigr).$ )
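The recipe for the companion matrix is short enough to write down as code. Below is a sketch in Python, assuming the monic polynomial is passed as its list of lower-order coefficients `[b_0, b_1, ..., b_{k-1}]` (the function name `companion` is just my choice).

```python
def companion(b):
    """Companion matrix of x^k + b[k-1] x^(k-1) + ... + b[1] x + b[0]."""
    k = len(b)
    C = [[0] * k for _ in range(k)]
    for i in range(1, k):
        C[i][i - 1] = 1        # 1's along the subdiagonal
    for i in range(k):
        C[i][k - 1] = -b[i]    # negated coefficients down the last column
    return C

# a(x) = x^2 + x + 1, i.e. b_0 = 1 and b_1 = 1
print(companion([1, 1]))     # [[0, -1], [1, -1]]
# a(x) = x^3 + 4x^2 + 3x + 2
print(companion([2, 3, 4]))  # [[0, 0, -2], [1, 0, -3], [0, 1, -4]]
```

For a degree-one polynomial $x+b_0$ the subdiagonal is empty and the companion matrix is just the $1\times 1$ matrix $(-b_0)$.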

Letting $n=\text{dim}(V)$, we define the **rational canonical form** of $T:V\to V$ to be the $n\times n$ (block-diagonal) matrix
$$ \begin{pmatrix} \mathscr{C}_{a_1(x)} & & & &\\
& \mathscr{C}_{a_2(x)} & & &\\
& & \ddots &&\\
&&& & \mathscr{C}_{a_d(x)}
\end{pmatrix}$$
where the polynomials $a_1(x)\mid a_2(x)\mid\cdots \mid a_d(x)$ are the invariant factors of $V$ from the decomposition above.
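Assembling the block-diagonal matrix is mechanical once the invariant factors are known. The Python sketch below assumes they are handed to us already (it does *not* compute them from $T$), each as a list of lower-order coefficients of a monic polynomial. The names `companion`, `direct_sum` and `rational_canonical_form` are hypothetical helpers for this illustration.

```python
def companion(b):
    """Companion matrix of x^k + b[k-1] x^(k-1) + ... + b[0]."""
    k = len(b)
    C = [[0] * k for _ in range(k)]
    for i in range(1, k):
        C[i][i - 1] = 1
    for i in range(k):
        C[i][k - 1] = -b[i]
    return C

def direct_sum(blocks):
    """Block-diagonal matrix with the given square blocks on the diagonal."""
    n = sum(len(B) for B in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for B in blocks:
        for i, row in enumerate(B):
            for j, entry in enumerate(row):
                M[offset + i][offset + j] = entry
        offset += len(B)
    return M

def rational_canonical_form(invariant_factors):
    """RCF built from invariant factors a_1 | a_2 | ... | a_d (assumed given)."""
    return direct_sum([companion(b) for b in invariant_factors])

# Invariant factors a_1 = x - 2 and a_2 = x^2 - 5x + 6 = (x - 2)(x - 3)
print(rational_canonical_form([[-2], [6, -5]]))
# [[2, 0, 0], [0, 0, -6], [0, 1, 5]]
```

The size of the result is the sum of the degrees of the $a_i(x)$, which is why the matrix is $n\times n$ with $n=\dim(V)$.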

### But *Why*?

Recall that our vector space $V$ can be written as the direct sum
$$V\cong F[x]/(a_1(x))\oplus\cdots\oplus F[x]/(a_d(x)).$$
For the moment, let's focus our attention on just one of the factors $$F[x]/(a(x))$$ (I've taken off the index for simplicity). Just like $V$, this quotient space wears two hats: it is both an $F[x]$-module *and* an $F$-vector space! As an $F$-vector space, it has the basis $\mathscr{B}=\{\bar{1},\bar{x},\ldots,\overline{x}^{k-1}\}$ where $k=\deg a(x)$ and $\bar{x}:=x+(a(x))$ indicates a coset (this is the example from Dummit and Foote, section 11.1, following Proposition 1). And just like $V$, this quotient space gets "acted on" by the linear transformation $T$ via $$Tv=x\cdot v$$ where *here* $v$ is a vector (i.e. a coset!) in $F[x]/(a(x))$.

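A natural question to ask is, "What is the matrix representation of $T:F[x]/(a(x))\to F[x]/(a(x)) $ with respect to the basis $\mathscr{B}$?" From undergrad linear algebra we know the $i$th column of this matrix is simply the coefficients (with respect to $\mathscr{B}$) of the image of the $i$th basis vector under $T$. Explicitly, writing $a(x)=x^k+b_{k-1}x^{k-1}+\cdots+b_1x+b_0$:
$$ \begin{align} T(\bar{1}) &= \bar{x}\\ T(\bar{x}) &= \overline{x}^2\\ &\;\;\vdots\\ T(\overline{x}^{k-2}) &= \overline{x}^{k-1}\\ T(\overline{x}^{k-1}) &= \overline{x}^k = -b_0\bar{1}-b_1\bar{x}-\cdots-b_{k-1}\overline{x}^{k-1}. \end{align} $$

The last line follows because $$ \begin{align} \overline{x}^k=x^k+(a(x))&= a(x)-(b_{k-1}x^{k-1}+\cdots+b_0)+(a(x))\\ &= -b_{k-1}x^{k-1}-\cdots-b_0+(a(x))\\ &=-b_{k-1}\overline{x}^{k-1}-\cdots-b_0\overline{1}. \end{align} $$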

Notice this matrix representation for $T$ is precisely the companion matrix of $a(x)$! But keep in mind, we obtained it by restricting the action of $T$ to just *one* of the factors $F[x]/(a(x))$ in the direct sum above. To obtain the matrix representation of $T$ as it acts on the entire space $V$, we need to take the direct sum** of all the companion matrices. This corresponds to writing down the matrix of $T:V\to V$ with respect to the basis $\mathscr{B}=\mathscr{B}_1\cup \cdots\cup \mathscr{B}_d$ where $\mathscr{B}_i$ is the basis for the summand $F[x]/(a_i(x))$. And this is exactly how we defined the rational canonical form of $T$ above.
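If you'd like to see the single-factor computation happen, here is a Python sketch that builds the matrix of "multiply by $x$" on $F[x]/(a(x))$ column by column. It assumes a coset is stored as its coordinate list with respect to $\bar{1},\bar{x},\ldots,\overline{x}^{k-1}$, and that $a(x)$ is given by its lower-order coefficients `[b_0, ..., b_{k-1}]`; the function names are my own.

```python
def times_x_mod(coords, b):
    """Multiply a coset by x and reduce modulo a(x) = x^k + b[k-1] x^(k-1) + ... + b[0]."""
    top = coords[-1]               # coefficient of x^(k-1); it becomes the x^k term
    shifted = [0] + coords[:-1]    # multiplying by x shifts every coefficient up
    # replace x^k by -(b[k-1] x^(k-1) + ... + b[0])
    return [s - top * bi for s, bi in zip(shifted, b)]

def matrix_of_x(b):
    """Matrix of v -> x . v on F[x]/(a(x)) in the basis 1, x, ..., x^(k-1)."""
    k = len(b)
    columns = []
    for i in range(k):
        e_i = [1 if j == i else 0 for j in range(k)]   # i-th basis vector
        columns.append(times_x_mod(e_i, b))            # its image is the i-th column
    # transpose the list of columns into a list of rows
    return [[columns[j][i] for j in range(k)] for i in range(k)]

print(matrix_of_x([1, 1]))     # [[0, -1], [1, -1]], for a(x) = x^2 + x + 1
print(matrix_of_x([2, 3, 4]))  # [[0, 0, -2], [1, 0, -3], [0, 1, -4]]
```

In both cases the output is the companion matrix of $a(x)$: the subdiagonal $1$'s come from the shift, and the last column comes from reducing $\overline{x}^k$.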

As a final remark, **any two similar matrices (or equivalently, similar linear transformations) share the same rational canonical form**, and conversely two linear transformations with the same rational canonical form are similar. (See Dummit and Foote, section 12.2 Theorem 15.) This means that the RCF acts like a "name tag": the possible rational canonical forms give exactly one representative for each conjugacy class of linear transformations on a vector space of a given dimension.

Next week we'll illustrate this fact with an explicit example.

*Footnotes:*

* Recall: to say an $R$-module $M$ is *torsion* means for every $m\in M$ there exists a nonzero $r\in R$ such that $rm=0$.

** By definition, the *direct sum* of matrices $A_1,\ldots, A_d$ is the block diagonal matrix with the $A_i$'s down the diagonal and zeros elsewhere.