Please someone tell me why I’m wrong.
It’s possible to match a data set optimally with one parameter.
Model: y = sin(bx), where the data’s y values are scaled so they all fall strictly between 0 and 1. The difficulty of hitting every point rises with the number of data points, but that just means you need bigger values of b. The “model” will look like an almost fully filled space, with a sine curve oscillating so fast it looks like a series of vertical lines. Yet it hits every single point when that’s possible (because in a large enough option space, I can do that), and the exact midpoint when it isn’t, e.g. when two points share the same x. Plausibly, 100% perfection is impossible in many cases, but a sufficiently close approximation probably is achievable.
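For concreteness, here is a toy version of the construction in code. Everything in it is illustrative (five made-up data points, sin rescaled to the (0, 1) scale of the targets, and a simple brute-force scan over b), not a careful implementation:

```python
# Toy sketch: look for a single frequency b such that (1 + sin(b*x)) / 2
# passes close to every target y in (0, 1). All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 6, dtype=float)          # five toy inputs: 1..5
y = rng.uniform(0.05, 0.95, size=x.size)  # five toy targets in (0, 1)

def model(b, x):
    # Rescale sin from (-1, 1) to (0, 1) so it lives on the targets' scale.
    return (1.0 + np.sin(b * x)) / 2.0

# Scan a large range of b; the worst-case miss keeps shrinking as the scan
# gets longer and denser, which is the whole point of the construction.
bs = np.linspace(0.0, 1e5, 1_000_001)
miss = np.max(np.abs(model(bs[:, None], x[None, :]) - y), axis=1)
best = np.argmin(miss)
print(f"best b = {bs[best]:.2f}, worst-case miss = {miss[best]:.4f}")
```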
If this understanding of overfitting sin waves is correct, doesn’t that suggest a flaw in how we penalize complexity in model-fitting?
You’re right that a sine wave can hit every point with a single parameter, provided b is large enough.
The second part of this post confuses me, though. This is exactly why we try not to use ‘number of parameters to tune’ as a measure of a model’s complexity. It’s a very simple measure, and sometimes a good one, but usually it isn’t very good, especially because the same model can have different numbers of parameters depending on how it’s written.
If I were fitting a sine wave to a dataset, I would probably add a penalty term to the error function proportional to b, or maybe b^2.
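Something along these lines, say, where the toy data, the weight on the penalty, and the grid search are all illustrative choices rather than a recommended recipe:

```python
# Sketch of a penalized error function: squared error plus lam * b**2.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy inputs
y = np.array([0.2, 0.7, 0.4, 0.9, 0.1])   # toy targets in (0, 1)
lam = 1e-3                                 # weight on the complexity penalty

bs = np.linspace(0.0, 1000.0, 200_001)     # candidate frequencies
residuals = y - (1.0 + np.sin(bs[:, None] * x)) / 2.0
loss = np.sum(residuals ** 2, axis=1) + lam * bs ** 2

best = np.argmin(loss)
print(f"penalized fit picks b = {bs[best]:.2f} (loss = {loss[best]:.4f})")
# With lam = 0 the same scan would tend to chase ever-larger b; the b**2 term
# is what stops the 'vertical lines' solution from winning.
```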
OK, so you say that “we try not to use ‘number of parameters to tune’ as a measure of complexity in a model.”
I was taught AIC and BIC in my econ undergrad, which was fairly theory-heavy; no other ways of penalizing overfitting ever came up.
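For reference, both criteria boil down to a parameter-count penalty on the log-likelihood. A minimal sketch, assuming Gaussian errors so the log-likelihood collapses to the residual sum of squares (rss, n, and k here are placeholders you supply):

```python
# AIC = 2k - 2 ln L,  BIC = k ln(n) - 2 ln L.
# With Gaussian errors and the variance profiled out,
# -2 ln L = n * ln(rss / n) up to an additive constant.
import numpy as np

def aic_bic(rss, n, k):
    neg2_log_lik = n * np.log(rss / n)
    aic = 2 * k + neg2_log_lik
    bic = k * np.log(n) + neg2_log_lik
    return aic, bic

# Placeholder numbers, purely to show the call:
print(aic_bic(rss=2.5, n=50, k=3))
```

Note that k, the number of parameters, is the only place complexity enters, which is exactly the measure the sine example breaks.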
@identicaltomyself agrees that this is terrible, but continues to use AIC because what else are you going to do?
Please, teach me the forbidden tuning methods! I’m looking for something that adequately penalizes overly complex models because, as this example makes clear, complexity is not just about the number of parameters.