Please someone tell me why I’m wrong.
It seems possible to fit any data set arbitrarily well with just one parameter.
Model: y = sin(bx), where the data's y-values are first rescaled so they all fall strictly between 0 and 1. Hitting every point gets harder as the number of data points grows, but that just means you need bigger values of b. The “model” will look like an almost completely filled band: a sine curve oscillating so fast it looks like a series of vertical lines. Yet for a suitable b it passes arbitrarily close to every single point (because in a large enough search space for b, I can do that), as long as no two points share the same x; where two points do share an x, the best it can do is split the difference at their midpoint. Plausibly exact 100% interpolation is impossible in many cases, but a sufficiently close approximation probably is achievable.
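To make the idea concrete, here's a toy sketch of the brute-force version (my own illustration, not from any paper): scan a large range of b and keep whichever value lands closest to all the points simultaneously. The data, the scan range, and the step size are all arbitrary choices here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, 3)   # three distinct inputs
y = rng.uniform(0.1, 0.9, 3)   # three targets rescaled into (0, 1)

best_b, best_err = 0.0, np.inf
# Scan b in chunks so the intermediate arrays stay small.
for lo in range(0, 20000, 1000):
    bs = np.arange(lo, lo + 1000, 0.002)
    # Worst-case miss across all three points for each candidate b.
    errs = np.abs(np.sin(np.outer(bs, x)) - y).max(axis=1)
    i = errs.argmin()
    if errs[i] < best_err:
        best_err, best_b = errs[i], bs[i]

print(f"b = {best_b:.3f}, worst-case miss = {best_err:.4f}")
```

With only three points the scan already finds a b whose worst miss is tiny; matching more points just means widening the scan range (it grows very fast with the point count), which is exactly the "bigger values of b" claim above.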
If this understanding of overfitting with sine waves is correct, doesn’t it suggest a flaw in how we penalize complexity in model-fitting, namely by counting parameters?