all 72 comments

[–]farsass 223 points  (32 children)

[–]pin9999 22 points  (20 children)

ELI5: why doesn't deep learning suffer from the curse of dimensionality?

[–]rumblestiltsken 76 points  (0 children)

Because deep learning was unpopular at the time, so none of the other machine learning algorithms wanted it to come along on the expedition when they opened the tomb of dimensionality.

[–]cryptocerous 15 points  (9 children)

IMO the curse of dimensionality was only ever valid as a relative relationship, not a hard cutoff (when all possible models are available for use). It only establishes a relative relation between sample size, dimensionality, and model performance. It is not valid the way most laymen interpret it, as a hard cutoff: a constant that the ratio between sample size and dimensionality cannot fall below without model performance failing. I.e.,

(a) valid: (sample_size / dimensionality) => greater is generally better

(b) invalid: if (sample_size / dimensionality) < constant => failure

When considered from an information theoretic perspective, it has always been clear that there's no lower bound on how small the (sample_size / dimensionality) ratio can be! Even a single sample providing just a tiny fraction of a single bit can be enough to provide sufficient information for good predictions!

Why's that? There's a third trump card - priors.

Googling just now, I don't find any decent general definition of "prior" as used in modern machine learning papers. So I'll define it as: any assumptions about the problem that the model imposes, whether intentionally or unintentionally. Priors can be realized in a model in an unlimited number of forms, from the shape of the deep learning circuit (e.g. thin and deep), to Bayesian priors, to other structure like attention mechanisms. What's important to remember is that everything imposes some kind of prior, regardless of how general-purpose the model appears from your selection of experiments.

Side note: Humans' incredible inadequacy at memorizing even a small number of digits could be interpreted as a strong prior that forces us to give attention to just small parts of mathematical type problems at a time.

[–]say_wot_again 0 points  (1 child)

Realization of priors in a model can take on an unlimited number of forms, from shape of the deep learning circuit e.g. thin and deep,

Wait, I'm confused. Are you basically saying that how you structure your deep network (e.g. what activation functions you use, convolutional and pooling layers in a CNN, etc.) reflect your priors on the underlying functional form?

[–]rumblestiltsken 2 points  (0 children)

Intentional or unintentional. It makes sense, I think.

Like, a convolutional layer (or any other sort of filter) is a great example: it reflects underlying assumptions about how the data is structured and what sort of processing will be effective.
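To put rough numbers on this (a back-of-the-envelope sketch of my own, not from the thread, assuming a 32x32 grayscale input): the locality and weight-sharing assumptions baked into a convolution shrink the parameter count by orders of magnitude compared to a fully connected layer over the same image.

```python
# One layer mapping a 32x32 grayscale image to a 32x32 feature map.
H = W = 32

# Fully connected: every output unit gets its own weight per input pixel.
dense_params = (H * W) * (H * W)   # 1,048,576 weights

# Convolution with a single 3x3 filter: 9 shared weights, reused at every
# spatial position -- that reuse IS the prior (locality + translation
# invariance).
conv_params = 3 * 3

print(dense_params, conv_params)   # 1048576 9
```

The ~100,000x reduction is exactly the kind of prior that lets a model get away with far fewer samples per dimension, provided the assumption actually fits the data.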

[–]Hahahahahaga 0 points  (5 children)

Even a single sample providing just a tiny fraction of a single bit can be enough to provide sufficient information for good predictions!

But a fraction of a bit requires more than a bit to represent (._. )

[–]cryptocerous 0 points  (4 children)

Not so, see the ways it's routinely done in data compression.

[–]Hahahahahaga 0 points  (3 children)

You can't compress a bit (._. )

Partial bits don't exist.

[–]cryptocerous 0 points  (1 child)

Partial bits do exist mathematically, and there are ways to realize them in real world systems. Each input sample to a model can take up fewer than 1 bit, e.g. arithmetic encoding.

For the very first input bit, you may have to get creative in how exactly you represent that fraction of a bit on your real-world system, but it's not too difficult to do. Then, for successive bits you can potentially continue to pack multiple samples per bit.

Or we can choose to totally ignore digital systems and just look at it mathematically. In that case, it's trivially simple and clear.

For something of a conceptual inverse, see FEC (forward error correction), where each input bit potentially represents only a partial bit with respect to the output.
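Here's a minimal sketch of the arithmetic-coding idea (a toy example of my own, using exact Fraction arithmetic): each symbol narrows an interval by its probability under the model, and naming a point inside the final interval takes roughly -log2(interval width) bits, which can be far fewer bits than symbols.

```python
from fractions import Fraction
import math

def arith_interval(bits, p1):
    """Shrink [lo, hi) once per symbol; the final width equals the
    model probability of the whole sequence."""
    lo, hi = Fraction(0), Fraction(1)
    for b in bits:
        mid = lo + (hi - lo) * (1 - p1)   # left sub-interval encodes 0
        lo, hi = (mid, hi) if b else (lo, mid)
    return lo, hi

p1 = Fraction(99, 100)                 # model: P(bit = 1) = 0.99
lo, hi = arith_interval([1] * 100, p1)
# Bits needed to pin down a point inside the interval (+1 for safety):
bits_needed = math.ceil(-math.log2(float(hi - lo))) + 1
print(bits_needed)                     # 3 -- 100 samples in ~0.03 bits each
```

The strong model (a prior, again) is what makes the per-sample cost fractional; with p1 = 1/2 the same code needs a full bit per sample.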

[–]Hahahahahaga 0 points  (0 children)

(._. ) Sorry, am computer science man. Know what you mean. One weighted sample is represent many like moth is futility of life. :(

[–]quieromas -1 points  (0 children)

Priors still need to be informed by previous analysis. They don't really come for free. Also, you need to be careful about giving your priors too much weight or they can heavily bias your results.

Sorry, I didn't really understand how you're defining the curse of dimensionality either. As far as I know, the curse of dimensionality only refers to things like: the distance between 0 and 1 is 1, but the distance between (0,0) and (1,1) is sqrt(1^2 + 1^2), which is greater than 1, and the distance between (0,0,0) and (1,1,1) is sqrt(1^2 + 1^2 + 1^2), which is greater still. The only real solutions are to somehow remove dimensions, shorten distances, or use huge amounts of data.
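For what it's worth, the distance-growth effect is easy to see numerically (a stdlib-only sketch of my own): as dimension grows, random points end up all roughly the same distance apart, so the contrast between "near" and "far" neighbors washes out.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

random.seed(0)
stats = {}
for d in (2, 10, 1000):
    pts = [[random.random() for _ in range(d)] for _ in range(200)]
    ds = [dist(pts[i], pts[i + 100]) for i in range(100)]
    mean = sum(ds) / len(ds)
    std = math.sqrt(sum((x - mean) ** 2 for x in ds) / len(ds))
    stats[d] = (mean, std / mean)
    # Mean distance grows like sqrt(d) while the relative spread shrinks,
    # so nearest-neighbor contrast vanishes in high dimensions.
    print(d, round(mean, 1), round(std / mean, 3))
```

At d=1000 the relative spread of pairwise distances is a few percent, versus nearly half the mean at d=2, which is the concentration the "remove dimensions or get more data" advice is fighting.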

[–]carbohydratecrab 6 points  (0 children)

something something manifold hypothesis

if there's too much dimensionality you're not on the right manifold and obviously need more layers

[–]BadGoyWithAGun 9 points  (0 children)

because something something convolution and dropout and gpus

[–]brockl33 1 point  (0 children)

Stacking layers allows models to create progressively abstract features. For example, pixels are combined into strokes, strokes into facial features, facial features into facial expressions, etc.

This abstract space is relatively small compared to raw input space. For example, a small change in an abstract facial expression feature may correspond to a dramatic change in nearly all of the pixels.

EDIT: wording

[–]Ahhhhrg 1 point  (0 children)

Here's a serious answer: apparently many deep networks have lots and lots of local minima that are almost all nearly as good as the global minimum, so it doesn't really matter which local minimum you end up in.

Here's LeCun's answer during an AMA, and here's a paper with the details.

[–]richizy 0 points  (0 children)

In images, adjacent pixels are more related than distant ones, e.g. (x=5, y=5) is more related to (x=5, y=6) than to (x=10, y=100). CNNs deal with this local structure pretty well.

[–]energybased 0 points  (0 children)

It does. The challenge of "deep learning" is mitigating the explosion of computation that normally accompanies models with many layers and many parameters.

[–]UshankaDalek 13 points  (1 child)

My favorite part is the x and y axis labels.

[–]grrrgrrr 7 points  (0 children)

have you tried dropout

[–]jcannell 1 point  (0 children)

Near-linear increase in layers with increasing layers!

[–]Xirious 0 points  (3 children)

Made /r/machinegoofingoff if you'd like to submit it there!

[–]t3hcoolness 6 points  (1 child)

I feel like that is so niche that it won't get anything

[–]say_wot_again 1 point  (0 children)

Yeah. Same problem I had with /r/badML.

[–]SeveQStorm -1 points  (0 children)

Absolutely! :D :D :D I think it's already hilarious that I understand that joke! :D

[–]jurniss 69 points  (7 children)

fuck no, i don't code on a white background

[–]Heidric 19 points  (5 children)

I don't get how people can use a white background; my eyes were so grateful when I switched to a dark theme.

[–]jaredtobin 5 points  (0 children)

I use the light Solarized theme when I'm outside. Way easier to read when you've got sunlight bouncing around.

(I use the dark Solarized theme when I'm inside, natch)

[–]abstractcontrol 1 point  (3 children)

[–]log_2 37 points  (1 child)

No it doesn't. Some 1980 study with a shitty monitor about fuzziness of bright text on a dark background due to pupil dilation doesn't compare to today's programmers' choice of low contrast dark themes with thick monospaced fonts. The study only showed legibility of text, and did not cover strain of extended use.

[–]Xirious 11 points  (0 children)

Retarded test subjects don't count.

[–]kmike84 0 points  (0 children)

This is an IPython notebook.

[–]GiskardReventlov 8 points  (3 children)

[–]Xirious 6 points  (2 children)

That's a disappointment.

[–]SeveQStorm 4 points  (1 child)

It's in your hands.

[–]Xirious 0 points  (0 children)

It's been created. Told OP they should submit it if they'd like. Would also love more funny AI stuff.

[–]laxatives 8 points  (0 children)

I think this is the first time I've ever seen all 6 frames of this meme used correctly.

[–]PLLOOOOOP 20 points  (17 children)

Keep the memes away, please. See sidebar:

News, Research Papers, Videos, Lectures, Softwares and Discussions on:

  • Machine Learning
  • Data Mining
  • Information Retrieval
  • Predictive Statistics
  • Learning Theory
  • Search Engines
  • Pattern Recognition
  • Analytics

I don't think this is what was intended by "discussion".

[–]codespawner 90 points  (6 children)

You make a good point, but I also find this post very funny. :/

[–]HINDBRAIN 23 points  (4 children)

Yeah but allowing this might eventually lead to a front page full of epic memes about SVM Samantha and Kernel Kevin.

[–]codespawner 10 points  (1 child)

I am personally of the opinion that an occasional break from the serious is enjoyable. Also there's no /r/MachineLearningHumor that I'm aware of.

[–]respeckKnuckles 0 points  (1 child)

tell me more about these characters

[–]zyra_main 0 points  (0 children)

slippery slope my boy, slippery slope

[–]EdwardRaff 4 points  (1 child)

I like the occasional humor, so long as it doesn't get out of hand like when the "one weird trick" stuff was being posted over and over.

[–]PLLOOOOOP -1 points  (0 children)

I like occasional humor as well. Frequent humor, even. But humor is not mutually exclusive with any of the list items from the sidebar: it's fun to see a post that is humorous but also meaningful and informative. Memes or jokes with no other content are off topic.

[–]tangerinemike 16 points  (5 children)

You don't deserve to be downvoted, this sub has always had a no-memes policy and the mods have been good at maintaining it. Personally I hope it stays that way and, if I'm honest, gets stricter.

The "Hi I'm new to ML but artificial brains are really cool and why isn't my 55 layer recurrent, convolutional extra deep network working on my 20 data points?" posts are getting tedious.

[–]BeatLeJuce[M] 24 points  (4 children)

Unless the other mods delete the threads too quickly for me to even notice, enforcing a 'no-meme policy' (I'm not sure we officially have one?) is no big job: people just don't post any -- and whenever they do, they typically get downvoted/reported very quickly (e.g. this thread has been reported 3 times so far). Personally I agree with what /u/EdwardRaff said: I don't mind occasional jokes. Which is why I don't intend to remove this post, since judging by the upvote count most people enjoy it. But should the memes/jokes get out of hand, we will definitely enforce a no-jokes policy.

As for newbie posts, we're trying harder to move those to /r/MLQuestions nowadays (should you spot any posts we missed, feel free to point people there).

[–]tangerinemike 7 points  (1 child)

Thanks for the response, I think you guys do a great job.

[–]PLLOOOOOP 0 points  (0 children)

I don't mind occasional jokes. Which is why I don't intend to remove this post... But should the memes/jokes get out of hand, we will definitely enforce a no-jokes policy.

I support your choice so long as the latter statement is enforced, because it's important to me that this post does not set a precedent.

[–]zyra_main 0 points  (0 children)

It is such a small community, though; it would be regrettable to push the newbies to a sub-subreddit.

[–]ajtioch -2 points  (0 children)

Aww, don't downvote the pedant.

[–]boxstabber 6 points  (8 children)

Sorry to be pedantic, but the code snippet caused my leg to twitch. It's generally a really bad practice to use

from X import *

in Python, as anything imported can shadow names in your existing namespace (e.g. if X contains a function 'str', good job, now you no longer have access to the regular 'str'. Even worse, because you likely don't know that X contained a 'str' and there's a new 'str' in its place, the substitution won't necessarily raise an Exception; things will just behave differently).

Instead, either import the package X:

import X
X.y
X.z

or import the functions, classes, etc. that you actually need:

from X import y, z
y
z

which has the added benefit of improving readability, as you make explicit at the top of your module which parts of the package you will be interfacing with.
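A concrete stdlib illustration of the hazard (my own example, not from the thread): `from math import *` silently replaces the builtin `pow` with `math.pow`, changing both the return type and the accepted signature.

```python
from math import *   # pulls in math.pow, shadowing the builtin pow

print(pow(2, 3))     # 8.0 -- math.pow returns a float; the builtin returns int 8

# The builtin's three-argument modular form is gone too:
try:
    pow(2, 3, 5)
except TypeError:
    print("builtin pow was shadowed")
```

Nothing errors at import time; the code just quietly behaves differently, which is exactly the failure mode described above.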

[–]SeveQStorm 7 points  (6 children)

I'd propose some kind of prefixing syntax like

from Xbabblediboo import * prefix xb_
xb_whatever('12345')

in the meantime one can use

import Xbabblediboo as xb
xb.whatever('12345')

which is visually almost the same. However, I don't know which I like more... I guess it'd be the latter, so I revoke my proposal.

[–]uusu 3 points  (2 children)

Why not just import X

[–]SeveQStorm 12 points  (0 children)

As long as it's just

import X

it's all fine. But in case of a (real world example)

import tensorflow

every call to a member of the TensorFlow module would require typing

tensorflow.foo()
tensorflow.bar()

That's why it's better and a huge time saver to use

import tensorflow as tf
tf.foo()
tf.bar()

[–]L43 1 point  (1 child)

That is a great name for a package.

[–]SeveQStorm 1 point  (0 children)

Yeah, I'm just wondering what it does...

[–]abomb999 0 points  (2 children)

Why did you use a graphic of rolling around in cash for the "what other programmers think of me" category? Are machine learning programmers among the best paid?

[–]Noncomment 0 points  (1 child)

There was high demand for deep learning researchers a while ago. I don't have stats on salaries or anything, but I know Google and other big companies were snapping up a lot of the big names.

[–]j_lyf 1 point  (0 children)

Is that like the academic version of winning the lottery? Now, why hasn't it happened for compressive sensing :P