Notes on Sauer's Lemma

Introduction Every binary classifier is a function mapping its input, which is an element in an enumerable dataset, to 0 or 1. Equivalently, we could regard the classifier as a function f:N{0,1}. We have a set of hypotheses H from which a function is chosen to maximize the classification accuracy. It is perfect if H contains all possible functions f:N{0,1}, which indicates a universal approximator. However, when the expressiveness of our model is limited by computational cost or the size of the magnitude of parameters, it remains a problem to quantitatively measure the ability to approximate the ground truth. For example, if H only consists of functions which produce 1 only on one data point, namely, ...

July 26, 2024 · 6 min