Reproducing Kernel Hilbert Spaces


In this blog post we will examine what Reproducing Kernel Hilbert Spaces (RKHS) are and the important properties that make them useful in statistical machine learning and functional analysis. For concreteness, we will take the general metric space $(\mathcal{X}, d)$ to be the standard Euclidean metric space $(\mathbb{R}^n, d)$.

What is an RKHS?

A Reproducing Kernel Hilbert Space $\mathcal{H}$ is a Hilbert space of real-valued functions on $\mathcal{X}$ such that $\forall x \in \mathcal{X}$, the evaluation functional

\[L_x: \mathcal{H} \to \mathbb{R}, \quad L_x(f) = f(x)\]

is bounded. That is,

\[\forall x \in \mathcal{X}, \; \exists M \in \mathbb{R} \text{ s.t. } |L_x(f)| = |f(x)| \leq M ||f||_{\mathcal{H}} \text{ for all } f \in \mathcal{H}\]

This is a slightly more abstract way of thinking about the RKHS, but there is also an alternative viewpoint that constructs a Hilbert space directly from a given kernel such that this space has the desired properties.

$\textbf{Definition 1: A Hilbert Space}$, simply defined, is a complete inner product space. Common examples include the spaces $\mathbb{R}^n$, $\mathbb{C}^n$, and $L^2[0,1]$, the space consisting of functions

\[f: [0,1] \to \mathbb{R}\]

that are Lebesgue measurable and that satisfy the property that

\[||f||^2_{L^2[0,1]} = \int_{0}^{1} f^2(x)dx < \infty\]

It is not difficult to show that $L^2[0,1]$ is indeed a Hilbert space (assuming knowledge of Lebesgue integration), although it fails to be an RKHS. Let’s first see that it is indeed a valid Hilbert space.

$\textit{Proof}:$ We need to show that the space $L^2[0,1]$ satisfies the following conditions:

  • $L^2[0,1]$ is an inner-product space.
  • Every Cauchy sequence in $L^2[0,1]$ converges to a limit in $L^2[0,1]$ (completeness).

For $f, g \in L^2[0,1]$, we define

\[\langle f, g \rangle_{L^2[0,1]} = \int_{0}^{1} f(x)g(x)dx\]

Clearly, $\langle .,. \rangle_{L^2[0,1]}$ is symmetric. We also notice that

\[\langle f, f \rangle_{L^2[0,1]} = \int_{0}^{1} f^2(x)dx = ||f||^2_{L^2[0,1]} \geq 0\]

with equality if and only if $f = 0$ almost everywhere (the zero element of $L^2[0,1]$).

Lastly, we also have that $\forall \alpha, \beta \in \mathbb{R}$ and $f, g, h \in L^2[0,1]$,

\[\begin{align*} \langle \alpha f + \beta g, h \rangle_{L^2[0,1]} &= \int_{0}^{1} \left[ \alpha f(x) + \beta g(x) \right] h(x) dx\\ &= \alpha \int_{0}^{1} f(x) h(x) dx + \beta \int_{0}^{1} g(x)h(x) dx \\ &= \alpha \langle f, h \rangle_{L^2[0,1]} + \beta \langle g, h \rangle_{L^2[0,1]} \end{align*}\]

This shows that $L^2[0,1]$ is indeed an inner-product space. For completeness, we must show that every Cauchy sequence $\left( f_n\right)_{n=1}^{\infty}$ in $L^2[0,1]$ converges to some $f \in L^2[0,1]$ in the function space norm, i.e. that

\[\lim_{n \to \infty} ||f_n - f||^2_{L^2[0,1]} = \lim_{n\to\infty} \int_{0}^{1} (f_n - f)^2 \, dx = 0\]

Instead of proving this directly, we indirectly show that this space is isometrically isomorphic to $l^2(\mathbb{N})$, the space of all square-summable sequences, which is known to be complete.

Let $(\phi_j)_{j=1}^{\infty}$ be an orthonormal basis for $L^2[0,1]$. That is, the basis satisfies the property that

\[||\phi_j||_{L^2[0,1]} = 1 \;\; \forall j \in \mathbb{N} \quad \text{and} \quad \langle \phi_i, \phi_j \rangle_{L^2[0,1]} = 0 \text{ for all } i \neq j\]

Every $f \in L^2[0,1]$ can be represented as $f = \sum_{j=1}^{\infty} a_j\phi_j$ for some coefficients $(a_j)_{j=1}^{\infty}$ with $a_j = \langle f, \phi_j\rangle_{L^2[0,1]}$. By Parseval’s theorem (Wainwright, 2019),

\[|| f||^2_{L^2[0,1]} = \sum_{j=1}^{\infty}a_j^2\]

which shows that $f \in L^2[0,1] \iff (a_j)_{j=1}^{\infty} \in l^2(\mathbb{N})$, and that the map $f \mapsto (a_j)_{j=1}^{\infty}$ preserves norms. It is due to this isometric isomorphism that we can conclude that $L^2[0,1]$ is also complete.
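To make the isomorphism concrete, here is a small numerical sketch (the cosine orthonormal basis of $L^2[0,1]$ and the test function $f(x) = x$ are my own illustrative choices, not from the text): it computes the coefficients $a_j = \langle f, \phi_j \rangle$ and checks that $\sum_j a_j^2$ approaches $||f||^2_{L^2[0,1]}$, as Parseval's theorem predicts.

```python
import numpy as np
from scipy.integrate import quad

# Orthonormal cosine basis of L^2[0,1]: phi_1(x) = 1, phi_j(x) = sqrt(2) cos((j-1) pi x)
def phi(j, x):
    return 1.0 if j == 1 else np.sqrt(2) * np.cos((j - 1) * np.pi * x)

f = lambda x: x                                   # test function f(x) = x

# Squared L^2 norm computed directly: int_0^1 x^2 dx = 1/3
norm_sq, _ = quad(lambda x: f(x) ** 2, 0, 1)

# Fourier coefficients a_j = <f, phi_j> and the partial sum of their squares
a = [quad(lambda x: f(x) * phi(j, x), 0, 1, limit=200)[0] for j in range(1, 31)]
parseval_sum = sum(c ** 2 for c in a)

print(norm_sq, parseval_sum)                      # both close to 1/3
```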

How large is the class of RKHS’s?

Wainwright, 2019 demonstrates two viewpoints through which we can think about RKHS’s. One, and slightly more intuitive way, is to start with any positive semi-definite (PSD) kernel $k: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ and construct a Hilbert space $\mathcal{H}$ of functions that satisfies the property that for all $x \in \mathbb{R}^n$ and all $f \in \mathcal{H}$,

\[f(x) = \langle f, k(.,x) \rangle_{\mathcal{H}}\]

This is known as the kernel reproducing property, alluding to the fact that the kernel $k$ completely determines the space and that function evaluations reduce to inner product computations. Instead of constructing an RKHS from a kernel, we will characterize this class of functions in terms of evaluation functionals. Perhaps this is not intuitive, but in my opinion, it provides a richer mathematical understanding.

Above, we defined an RKHS as any Hilbert space in which the evaluation functionals (linear operators on the Hilbert space that perform the transformation $f \to f(x)$ for any $x \in \mathbb{R}^n$) are bounded. In fact, as Wainwright shows, given any such Hilbert space, we can exhibit a PSD kernel with the kernel reproducing property. Stated formally,

$\textbf{Theorem 1:}$ Let $\mathcal{H}$ be a Hilbert space in which all evaluation functionals are bounded. Then, there exists a unique PSD kernel $k$ that satisfies the kernel reproducing property.

Proof of this theorem can be found in Wainwright, 2019. The basic idea here is to use the Riesz representation theorem, which guarantees that for any bounded linear functional $L$ on $\mathcal{H}$, there is a unique $g \in \mathcal{H}$ such that $L(f) = \langle g, f\rangle_{\mathcal{H}}$ for all $f \in \mathcal{H}$.

Since each evaluation functional $L_x$ is bounded, we can pick a unique representer $\phi_x$ from $\mathcal{H}$. This allows us to construct a kernel $k: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ by

\[k(x,z) = \langle \phi_x, \phi_z \rangle_{\mathcal{H}}\]

With this construction, it is straightforward to show that $k$ satisfies the reproducing property. Uniqueness of $k$ follows from the uniqueness of the representer functions $\phi_x$ for all $x \in \mathbb{R}^n$.
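As a quick illustration of why pairwise inner products of representers always give a PSD kernel, here is a sketch; the finite-dimensional "representer" map below is a toy stand-in of my own (in a genuine RKHS the $\phi_x$ come from the Riesz representation theorem), but any Gram matrix built this way is symmetric and positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "representer" map phi_x living in the finite-dimensional Hilbert space R^5
# (purely illustrative, not the Riesz representer of any particular RKHS).
def phi(x):
    return np.array([1.0, x, x**2, np.sin(x), np.cos(x)])

def k(x, z):
    return phi(x) @ phi(z)            # k(x, z) = <phi_x, phi_z>

xs = rng.uniform(-3, 3, size=20)
K = np.array([[k(x, z) for z in xs] for x in xs])

# The Gram matrix is symmetric PSD: its eigenvalues are >= 0 up to round-off.
print(np.min(np.linalg.eigvalsh(K)) >= -1e-8)
```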

A simple example here is the linear PSD kernel on $\mathbb{R}^d$ given by $k(x,y)= \langle x, y \rangle$. The associated RKHS consists of functions of the form

\[y \mapsto \sum_{i=1}^{n} \alpha_i \langle y, x_i \rangle = \left\langle y, \sum_{i=1}^{n} \alpha_i x_i\right\rangle\]

for points $x_1, \dots, x_n \in \mathbb{R}^d$ and coefficients $\alpha_1, \dots, \alpha_n \in \mathbb{R}$.

Note that these functions are linear so that the underlying RKHS corresponds to the class of all linear maps.
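Here is a minimal numerical check of this identity (my own sketch; the random data is purely illustrative): a kernel expansion under the linear kernel collapses to a single linear map $y \mapsto \langle y, w \rangle$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 6
X = rng.normal(size=(n, d))              # data points x_1, ..., x_n in R^d
alpha = rng.normal(size=n)               # coefficients alpha_1, ..., alpha_n
y = rng.normal(size=d)                   # evaluation point

# f(y) as a kernel expansion: sum_i alpha_i <y, x_i>
f_kernel = sum(a * (y @ x) for a, x in zip(alpha, X))

# The same f written as the linear map y -> <y, w> with w = sum_i alpha_i x_i
w = X.T @ alpha
f_linear = y @ w

print(np.isclose(f_kernel, f_linear))    # True: the RKHS is just the linear functions
```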

$\textbf{NOTE}:$ One can verify that the space of square-integrable functions $L^2[0,1]$ that we saw earlier fails to be an RKHS: its evaluation functionals are not bounded.
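To see why, consider the standard counterexample $f_n = \sqrt{n}\, \mathbf{1}_{[0, 1/n]}$. Then

\[||f_n||^2_{L^2[0,1]} = \int_{0}^{1/n} n \, dx = 1 \quad \text{while} \quad f_n(0) = \sqrt{n} \to \infty\]

so no finite constant $M$ can satisfy $|f_n(0)| \leq M ||f_n||_{L^2[0,1]}$ for all $n$, and the evaluation functional at $x = 0$ is unbounded. (More fundamentally, elements of $L^2[0,1]$ are equivalence classes of functions that agree almost everywhere, so pointwise evaluation is not even well defined.)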

Why care about RKHS’s?

RKHS’s were all the rage when the classic Support Vector Machine (SVM) classifier was still popular. Although it’s easy to dismiss this kind of theoretical understanding as outdated in the broader ML community, I find it helpful to understand why stuff works, and there is hardly a better way to do that than to dig into the math, however mainstream it might or might not be.

The kernel reproducing property allows us to view the kernel as inducing a feature map $x \mapsto k(.,x)$ from $\mathbb{R}^n$ into the Hilbert space $\mathcal{H}$ itself. Note that inner products in the embedding space reduce to kernel function evaluations, since we have that

\[\langle k(.,x), k(.,y) \rangle_{\mathcal{H}} = k(x,y)\]

This might not strike you as important, but it turns out to be very powerful. Oftentimes, the embedding space is excruciatingly high-dimensional (or even infinite-dimensional), so that computing inner products directly is prohibitive. The kernel function evaluation allows us to compute these inner products very efficiently without ever having to move into the high-dimensional space. This is what has come to be known as the kernel trick.
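As a tiny concrete example (my own sketch; the degree-2 polynomial kernel and its explicit feature map are standard, but the code is illustrative rather than any particular library's API): for $k(x,y) = \langle x, y \rangle^2$ on $\mathbb{R}^2$, the explicit embedding lives in $\mathbb{R}^3$, yet the same inner product is obtained with a single kernel evaluation.

```python
import numpy as np

# Homogeneous degree-2 polynomial kernel on R^2: k(x, y) = <x, y>^2.
# Its explicit feature map sends x to (x1^2, sqrt(2) x1 x2, x2^2) in R^3;
# for degree-p kernels on R^n the embedding dimension grows like n^p,
# which is exactly what the kernel trick lets us avoid visiting.
def feature_map(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    return (x @ y) ** 2

x = np.array([1.5, -0.3])
y = np.array([0.7, 2.0])

explicit = feature_map(x) @ feature_map(y)   # inner product in the embedding space
trick = k(x, y)                              # one kernel evaluation, no embedding

print(np.isclose(explicit, trick))           # True
```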

References

[1] Wainwright MJ. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge: Cambridge University Press; 2019. doi:10.1017/9781108627771