Combinatorics

Introduction Every binary classifier is a function mapping its input, which is an element in an enumerable dataset, to 0 or 1. Equivalently, we could regard the classifier as a function $f : N \to {0, 1}$ . We have a set of hypotheses $H$ from which a function is chosen to maximize the classification accuracy. It is perfect if $H$ contains all possible functions $f : N \to {0, 1}$ , which indicates a universal approximator. However, when the expressiveness of our model is limited by computational cost or the size of the magnitude of parameters, it remains a problem to quantitatively measure the ability to approximate the ground truth. For example, if $H$ only consists of functions which produce 1 only on one data point, namely, ...