Please someone tell me why I’m wrong.
It seems possible to fit any data set arbitrarily well with just one parameter.
Model: y = sin(bx), where the data's y-values are first rescaled so they all fall strictly between 0 and 1. Hitting every point gets harder as the number of data points grows, but that just means you need bigger values of b. The “model” will look like an almost completely filled band: a sine curve oscillating so fast it looks like a series of vertical lines. Yet for a suitable b it passes arbitrarily close to every single point (because in a large enough search space for b, I can do that), as long as no two points share the same x; where two points do share an x, the best it can do is split the difference at their midpoint. Plausibly exact 100% interpolation is impossible in many cases, but a sufficiently close approximation probably is achievable.
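To make the idea concrete, here's a toy sketch of the brute-force version (my own illustration, not from any paper): scan a large range of b and keep whichever value lands closest to all the points simultaneously. The data, the scan range, and the step size are all arbitrary choices here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, 3)   # three distinct inputs
y = rng.uniform(0.1, 0.9, 3)   # three targets rescaled into (0, 1)

best_b, best_err = 0.0, np.inf
# Scan b in chunks so the intermediate arrays stay small.
for lo in range(0, 20000, 1000):
    bs = np.arange(lo, lo + 1000, 0.002)
    # Worst-case miss across all three points for each candidate b.
    errs = np.abs(np.sin(np.outer(bs, x)) - y).max(axis=1)
    i = errs.argmin()
    if errs[i] < best_err:
        best_err, best_b = errs[i], bs[i]

print(f"b = {best_b:.3f}, worst-case miss = {best_err:.4f}")
```

With only three points the scan already finds a b whose worst miss is tiny; matching more points just means widening the scan range (it grows very fast with the point count), which is exactly the "bigger values of b" claim above.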
If this understanding of overfitting with sine waves is correct, doesn’t it suggest a flaw in how we penalize complexity in model-fitting, namely by counting parameters?