Stone Weierstrass Theorem
Let $X$ be a topological space and let $C(X,\mathbb{R})$ denote the set of all continuous functions $f:X\to\mathbb{R}$.

A family $\mathscr{F}$ of functions is an algebra if for every $f,g\in\mathscr{F}$ and any $c\in \mathbb{R}$, we have $f+g,fg, cf\in\mathscr{F}$.
- ex: $C([a,b],\mathbb{R})$ is an algebra for any closed interval $[a,b]\subset\mathbb{R}$.
An algebra $\mathscr{A}$ of functions $f:X\to \mathbb{R}$ separates points if for every pair of distinct points $x,y\in X$ there is a function $f\in \mathscr{A}$ such that $f(x)\neq f(y)$.
- ex: Let $\mathscr{P}\subset C([a,b],\mathbb{R})$ denote the set of polynomials on $[a,b]$. Then $\mathscr{P}$ is a subalgebra which separates points since it contains the polynomial $f(x)=x$.
An algebra $\mathscr{A}$ of functions $f:X\to \mathbb{R}$ vanishes nowhere if for every $x\in X$ there is a function $f\in\mathscr{A}$ such that $f(x)\neq 0$.
- ex: Let $\mathscr{P}\subset C([a,b], \mathbb{R})$ denote the set of polynomials on $[a,b]$. Then $\mathscr{P}$ is a subalgebra that vanishes nowhere since it contains the constant polynomial $f(x)=1$.
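To make the last two definitions concrete, here is a small Python sketch. The helper names `separates_points` and `vanishes_nowhere` are hypothetical, and since the code can only test finitely many sample points, it illustrates the definitions rather than proving anything. For an algebra generated by finitely many functions it is enough to test the generators: if every generator agrees at $x$ and $y$ (or vanishes at $x$), then so does every sum, product and scalar multiple built from them.

```python
import itertools
from typing import Callable, Sequence

RealFunction = Callable[[float], float]


def separates_points(funcs: Sequence[RealFunction], points: Sequence[float], tol: float = 1e-12) -> bool:
    """Check, on a finite sample, that each pair of distinct points is told apart by some f in funcs."""
    for x, y in itertools.combinations(points, 2):
        if x != y and not any(abs(f(x) - f(y)) > tol for f in funcs):
            return False
    return True


def vanishes_nowhere(funcs: Sequence[RealFunction], points: Sequence[float], tol: float = 1e-12) -> bool:
    """Check, on a finite sample, that at every point some f in funcs is nonzero."""
    return all(any(abs(f(x)) > tol for f in funcs) for x in points)


def one(x: float) -> float:
    return 1.0  # the constant polynomial 1


def identity(x: float) -> float:
    return x  # the polynomial f(x) = x


# Sample points of [a, b] = [-1, 1] (a hypothetical choice); 0.0 is one of them.
grid = [-1 + k / 10 for k in range(21)]

print(separates_points([one], grid), vanishes_nowhere([one], grid))            # False True
print(separates_points([identity], grid), vanishes_nowhere([identity], grid))  # True False
print(separates_points([one, identity], grid), vanishes_nowhere([one, identity], grid))  # True True
```

The middle line shows why $f(x)=x$ alone is not a witness for vanishing nowhere when $0\in[a,b]$: every polynomial with zero constant term vanishes at $0$, so the constant $1$ is needed.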
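These definitions come together in today's theorem:

Theorem (Stone Weierstrass). Let $X$ be a compact Hausdorff space and let $\mathscr{A}$ be a subalgebra of $C(X,\mathbb{R})$ which separates points and vanishes nowhere. Then $\mathscr{A}$ is dense in $C(X,\mathbb{R})$ with respect to the supremum norm.
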
In English, this just says that any continuous, real-valued function on a compact set can be approximated (with respect to the supremum norm) by some function in your algebra $\mathscr{A}$. In other words, choose any continuous function $f:X\to \mathbb{R}$ where $X$ is compact. Then for every $\epsilon >0$, we can find a function $g\in \mathscr{A}$ such that $$\|f-g\|=\sup_{x\in X}\{|f(x)-g(x)|\}<\epsilon.$$ Here's an example: take $X=[a,b]$ to be a closed interval in $\mathbb{R}$ and let $\mathscr{A}$ be the set of all polynomials on $[a,b]$. Then for any continuous $f:[a,b]\to\mathbb{R}$ and any $\epsilon>0$, we can find a polynomial $p:[a,b]\to\mathbb{R}$ such that $\|f-p\|< \epsilon$. This is the familiar Weierstrass Approximation Theorem! Any continuous function on a closed interval can be approximated, as closely as you want, by a polynomial. The Stone Weierstrass Theorem says that this result is still true if we replace $[a,b]$ by any compact set $X$, and if we replace the set of polynomials by any subalgebra of $C(X,\mathbb{R})$ which separates points and vanishes nowhere.
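The post doesn't prove the Weierstrass Approximation Theorem, but one classical proof, due to Bernstein and swapped in here purely as an illustration, is explicit: for continuous $f$ on $[0,1]$, the Bernstein polynomials $B_nf(x)=\sum_{k=0}^n f(k/n)\binom{n}{k}x^k(1-x)^{n-k}$ converge to $f$ uniformly. The Python sketch below estimates $\|f-B_nf\|$ on a finite grid (so the numbers are approximations of the true supremum) for the tent function $f(x)=|x-1/2|$, which is continuous but has a kink at $x=1/2$. The function names are hypothetical.

```python
import math
from typing import Callable

RealFunction = Callable[[float], float]


def bernstein(f: RealFunction, n: int) -> RealFunction:
    """Return the degree-n Bernstein polynomial B_n f, viewed as a function on [0, 1]."""
    samples = [f(k / n) for k in range(n + 1)]  # f evaluated at the nodes k/n
    weights = [math.comb(n, k) for k in range(n + 1)]  # binomial coefficients C(n, k)

    def B(x: float) -> float:
        return sum(w * s * x**k * (1 - x) ** (n - k)
                   for k, (w, s) in enumerate(zip(weights, samples)))

    return B


def sup_error(f: RealFunction, g: RealFunction, num_points: int = 1001) -> float:
    """Grid approximation of ||f - g|| = sup over [0, 1] of |f(x) - g(x)|."""
    xs = [i / (num_points - 1) for i in range(num_points)]
    return max(abs(f(x) - g(x)) for x in xs)


def tent(x: float) -> float:
    return abs(x - 0.5)  # continuous on [0, 1], not differentiable at x = 1/2


errors = {n: sup_error(tent, bernstein(tent, n)) for n in (5, 20, 80, 320)}
for n, err in errors.items():
    print(f"n = {n:3d}: estimated sup error {err:.4f}")
```

The error shrinks slowly as $n$ grows, but it does shrink, which is all the theorem promises: for each $\epsilon>0$ some polynomial gets within $\epsilon$.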
Now you might ask, "Why do we need $X$ to be compact?" and "Why must $\mathscr{A}$ separate points and vanish nowhere?" Today we'll see why these hypotheses are necessary. Next time we'll work through an exercise from Rudin's Principles of Mathematical Analysis (a.k.a. "Baby Rudin") to see the theorem in action.
Why do we need compactness?

The best way to answer this question is to look at a counterexample. So let's consider $X=\mathbb{R}$ and let $\mathscr{A}$ be the subalgebra of all polynomials on $\mathbb{R}$. Then $\mathscr{A}$ separates points and vanishes nowhere, but $X$ is not compact. In this case, the theorem fails since the function $f(x)=e^x$ cannot be approximated by any polynomial! This is because any polynomial $p$ of degree $n$ is dominated by its leading term $c_nx^n$, and $e^x$ tends to $\infty$ much faster than $x^n$ does (even if $n$ is very large). As a result, $|e^x-p(x)|\to\infty$ as $x\to\infty$, so the distance $\|e^x-p\|=\sup_{x\in\mathbb{R}}|e^x-p(x)|$ is infinite for every polynomial $p$ and certainly cannot be made arbitrarily small.
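A quick numerical illustration, as a Python sketch: take one particular polynomial, the degree-10 Taylor polynomial of $e^x$ at $0$ (the degree is an arbitrary choice), and estimate $\sup_{|x|\leq R}|e^x-p(x)|$ on a grid for growing $R$. A finite computation can't replace the limit argument above, but it shows the gap exploding as the interval grows.

```python
import math


def taylor_exp(n: int):
    """Degree-n Taylor polynomial of e^x centered at 0."""
    coeffs = [1 / math.factorial(k) for k in range(n + 1)]
    return lambda x: sum(c * x**k for k, c in enumerate(coeffs))


def sup_gap(p, R: float, num_points: int = 2001) -> float:
    """Grid approximation of sup over [-R, R] of |e^x - p(x)|."""
    xs = [-R + 2 * R * i / (num_points - 1) for i in range(num_points)]
    return max(abs(math.exp(x) - p(x)) for x in xs)


p = taylor_exp(10)
gaps = {R: sup_gap(p, R) for R in (1, 5, 10, 20, 40)}
for R, gap in gaps.items():
    print(f"R = {R:2d}: sup |e^x - p(x)| on [-R, R] is about {gap:.3g}")
```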
But if we restrict ourselves to a closed and bounded interval, the growth of $e^x$ at infinity no longer matters, and we can approximate $e^x$ by a polynomial $p(x)$ with as much accuracy as we want. And this is exactly what we've all done in undergraduate calculus! You remember those problems. The $n$th degree Taylor polynomial of $e^x$ centered at 0 is $$e^x\approx \sum_{k=0}^n\frac{x^k}{k!},$$ and a typical homework question might've been something like, "For what values of $x$ is this approximation accurate to within 0.00001?" The answer would be $|x|\leq\delta$ for some constant $\delta$ which you could find using Taylor's Inequality. This interval $[-\delta,\delta]$ is precisely the compact set we need to restrict to in order to obtain a good approximation.
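Here is that homework problem as a short Python sketch. On $[-\delta,\delta]$ we can take $M=e^\delta$ as the bound on $|f^{(n+1)}(x)|=e^x$, so Taylor's Inequality gives $|e^x-T_n(x)|\leq e^\delta\delta^{n+1}/(n+1)!$, where $T_n$ denotes the Taylor polynomial above. The sketch bisects for the largest $\delta$ making this bound at most $0.00001$, then checks the actual error on a grid of $[-\delta,\delta]$. The degrees $n=5$ and $n=10$ and all function names are hypothetical choices; since the bound is not sharp, the resulting $\delta$ is a safe value rather than the largest possible one.

```python
import math


def taylor_exp(n: int):
    """Degree-n Taylor polynomial T_n of e^x centered at 0."""
    coeffs = [1 / math.factorial(k) for k in range(n + 1)]
    return lambda x: sum(c * x**k for k, c in enumerate(coeffs))


def taylor_bound(n: int, delta: float) -> float:
    """Taylor's Inequality bound on |e^x - T_n(x)| for |x| <= delta, using M = e^delta."""
    return math.exp(delta) * delta ** (n + 1) / math.factorial(n + 1)


def largest_safe_delta(n: int, tol: float, hi: float = 20.0, iters: int = 100) -> float:
    """Bisection for the largest delta with taylor_bound(n, delta) <= tol (the bound increases in delta)."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if taylor_bound(n, mid) <= tol:
            lo = mid
        else:
            hi = mid
    return lo


tol = 1e-5
results = {}
for n in (5, 10):
    delta = largest_safe_delta(n, tol)
    T = taylor_exp(n)
    xs = [-delta + 2 * delta * i / 2000 for i in range(2001)]  # grid on [-delta, delta]
    results[n] = (delta, max(abs(math.exp(x) - T(x)) for x in xs))

for n, (delta, err) in results.items():
    print(f"n = {n:2d}: delta = {delta:.4f}, observed max error on [-delta, delta] = {err:.2e}")
```

A higher degree buys a wider interval, but for each fixed $n$ the interval stays bounded, in line with the compactness discussion above.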
Why must the algebra separate points?

Again we'll consider a counterexample. This time let $X=[a,b]\subset\mathbb{R}$ and take $\mathscr{A}$ to be the collection of all polynomials $p:[a,b]\to\mathbb{R}$ such that $p(a)=p(b)$. It's easy to check that this forms an algebra (it even vanishes nowhere, since it contains the constant polynomial $1$), but it does not separate points: no $p\in\mathscr{A}$ can tell $a$ and $b$ apart. To see where the Stone Weierstrass Theorem fails, simply choose any continuous function $f:[a,b]\to\mathbb{R}$ such that $f(a)\neq f(b)$. Then we cannot approximate $f$ by polynomials in $\mathscr{A}$ because there is an $\epsilon>0$ such that $\|f-p\|\geq \epsilon$ for every $p\in\mathscr{A}$. In fact, $\epsilon=|f(b)-f(a)|/2$ does the job.

This isn't too hard to show. Fix $p\in\mathscr{A}$, let $M=\max\{|f(a)-p(a)|,|f(b)-p(b)|\}$, and observe from the picture above that $\|f-p\|\geq M$, since the supremum over $[a,b]$ is at least as large as the values at the endpoints. We want to show* $$\|f-p\|\geq M\geq \frac{|f(b)-f(a)|}{2}.$$ To see this, let $m$ denote the common value $p(a)=p(b)$, assume WLOG $f(a)\leq f(b)$, and suppose first that $f(a)\leq m\leq f(b)$. If $m$ is the midpoint $\frac{f(a)+f(b)}{2}$, then $M=\frac{f(b)-f(a)}{2}$ and the claim holds with equality.
Otherwise, if, say, $m$ lies in between $\frac{f(a)+f(b)}{2}$ and $f(b)$ (see the inset on the left), then $M=m-f(a)$, which is greater than $\frac{f(b)-f(a)}{2}$ as claimed; the case where $m$ lies between $f(a)$ and the midpoint is symmetric. And if $m< f(a)$ or $m>f(b)$, then $M> f(b)-f(a)$ is even larger and again the claim holds.
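The case analysis can also be compressed into one line with the triangle inequality, using only the fact that $p(a)=p(b)$: $$|f(b)-f(a)|\leq |f(b)-p(b)|+|p(b)-p(a)|+|p(a)-f(a)|=|f(b)-p(b)|+|f(a)-p(a)|\leq 2M\leq 2\|f-p\|.$$ Dividing by 2 gives $\|f-p\|\geq |f(b)-f(a)|/2$ for every $p\in\mathscr{A}$, with no cases to check.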

Alternatively, we could choose $\mathscr{A}$ to be the set of constant functions on $[a,b]$ (this definitely does not separate points). Then, for example, the function $f(x)=e^x$ can't be approximated by any constant $c$ since $\|e^x-c\|$ is bounded below by $\frac{|e^b-e^a|}{2}$ (using the same argument as above).
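As a numerical check, here is a Python sketch with the interval $[a,b]=[0,1]$ chosen arbitrarily: sweep over candidate constants $c$ and estimate $\|e^x-c\|$ on a grid. Because $e^x$ is increasing, the supremum equals $\max\{|e^a-c|,|e^b-c|\}$, so the best constant should be the midpoint $c=\frac{e^a+e^b}{2}$, where the lower bound $\frac{e^b-e^a}{2}$ is attained.

```python
import math

a, b = 0.0, 1.0  # hypothetical interval; any a < b behaves the same way
ys = [math.exp(a + (b - a) * i / 1000) for i in range(1001)]  # e^x sampled on [a, b]


def sup_dist_to_constant(c: float) -> float:
    """Grid approximation of ||e^x - c|| = sup over [a, b] of |e^x - c|."""
    return max(abs(y - c) for y in ys)


# Candidate constants between e^a and e^b; constants outside that range can only do worse.
lo_c, hi_c = math.exp(a), math.exp(b)
candidates = [lo_c + (hi_c - lo_c) * j / 2000 for j in range(2001)]
best_c = min(candidates, key=sup_dist_to_constant)
best_err = sup_dist_to_constant(best_c)

print(f"best constant about {best_c:.4f}, midpoint (e^a + e^b)/2 = {(lo_c + hi_c) / 2:.4f}")
print(f"smallest sup error about {best_err:.4f}, bound (e^b - e^a)/2 = {(hi_c - lo_c) / 2:.4f}")
```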
Why must the algebra vanish nowhere?

Suppose $X=[0,1]$ and let's take $\mathscr{A}$ to be the set of all continuous functions $p:[0,1]\to\mathbb{R}$ such that $p(0)=0$ (one easily checks that this is an algebra, and it separates points since it contains $p(x)=x$). Then any continuous function $f$ with $f(0)\neq 0$ can't be approximated by any $p\in\mathscr{A}$! The supremum of $|f(x)-p(x)|$ over $x\in[0,1]$ is bounded below by $|f(0)-p(0)|=|f(0)|$. For instance, take $f(x)=x+3$. Then $\|f-p\|\geq 3$ for every $p\in\mathscr{A}$.

So there you go! Each of the conditions in the Stone Weierstrass Theorem is indeed necessary. Next week we'll use the theorem to solve this exercise from Baby Rudin:
- (Rudin, PMA #7.20) If $f$ is continuous on $[0,1]$ and if $\int_0^1f(x)x^n\;dx=0$ for all $n=0,1,2,\ldots,$ prove that $f(x)=0$ on $[0,1]$.

Footnote:
* We don't want to let $M=\max\{|f(a)-p(a)|,|f(b)-p(b)|\}$ be our $\epsilon$ since we need $\epsilon$ to be independent of the polynomial $p$. (In each counterexample we are showing that the conclusion of the Stone Weierstrass Theorem fails, i.e. that there exists a function $f\in C(X,\mathbb{R})$ and there exists $\epsilon>0$ such that $\|f-p\|\geq \epsilon$ for all $p\in\mathscr{A}$. The order of the quantifiers means that $\epsilon$ may depend on $f$, but not on $p$.)
