306

BERT's success in some benchmarks tests may be simply due to the exploitation of spurious statistical cues in the dataset. Without them it is no better then random.

33 comments
96% Upvoted
level 1

Title: Probing Neural Network Comprehension of Natural Language Arguments

Authors: Timothy Niven, Hung-Yu Kao

Abstract: We are surprised to find that BERT's peak performance of 77% on the Argument Reasoning Comprehension Task reaches just three points below the average untrained human baseline. However, we show that this result is entirely accounted for by exploitation of spurious statistical cues in the dataset. We analyze the nature of these cues and demonstrate that a range of models all exploit them. This analysis informs the construction of an adversarial dataset on which all models achieve random accuracy. Our adversarial dataset provides a more robust assessment of argument comprehension and should be adopted as the standard in future work.

PDF Link | Landing Page | Read as web page on arXiv Vanity

level 1

This isn't too surprising at all. The same thing happened with the first round of VQA models (and the problem still probably persists, despite people's efforts to balance that dataset). Given how bad people are at simply randomly choosing a number, I don't know why we expect them to generate datasets without statistical imbalances.

level 1
39 points · 12 hours ago · edited 11 hours ago

Love that paper. It's a very simple and effective way of showing that these kinds of models don't properly "understand" and only exploit (bad) statistical cues. That said, I think it was already clear to most people (maybe besides Elon Musk ;) ) that this is what BERT-like models are doing. Still, I have now seen three personal projects where BERT improved a lot over word-embedding-based approaches with extremely few labels (hundreds). This paper also shows the importance of a good metric.

level 2
Original Poster23 points · 11 hours ago

Oh, no doubt... I do believe BERT has value; it's some of these benchmarks I doubt. And when you look at what BERT "accomplishes" on these datasets, it seems like we have practically solved NLP, which creates false hype around these new technologies. That's what worries me.

level 2

Do you have any links for such projects? And for dealing with small label sets in general? I'm currently looking into trying BERT for a project.

level 1
Original Poster80 points · 13 hours ago

I feel like this should have made more waves than it did... We keep hearing about all of these new advances in NLP, with a new, better model every few months achieving unrealistic results. But when someone actually probes the dataset, it looks like these models haven't really learned anything of any meaning. This should really make us take a step back from optimizing models and take a hard look at the datasets themselves and whether they really mean anything.

All this time these results didn't really make sense to me, as the tasks require such high-level thinking, as well as a lot of world knowledge.

level 2

It seems to me that the point you're making in this post is overgeneralizing the paper. Even in the title of this post you say "some" benchmarks (the paper only talks about BERT's performance on ARCT), but in the post you're trying to say that the newer, better NLP models in general haven't learned anything of meaning. To make that point you'd have to show some statistical anomaly in all of the benchmarks where BERT improved on the then state-of-the-art systems. Just by the eye test, though, BERT does seem more effective on NLU tasks.

I agree with your overall point that, if anything, it's clear the benchmarks we use to judge these models correlate imperfectly with human judgment, but this is an already widely known and studied problem. It is, however, quite difficult to come up with better metrics that correlate more closely with human ratings.

level 2

I don't think this is a rational conclusion to draw from the paper. If you have some axe to grind with how deep NLP is done, then, sure, start a thread, but your rhetoric certainly isn't supported by the paper.

level 2

This is every ML/RL model... they don't have brains; it's just self-organizing statistics.

level 1

See also the HANS paper, which deserves more attention too. https://arxiv.org/abs/1902.01007

level 2
Original Poster5 points · 10 hours ago

Wow! Almost exactly the same conclusion, just on another dataset! Looks like a new, and very welcome, trend...

level 1

I tried both sizes of OpenAI's GPT-2 on Colab, and man, do they spit out some BS on summarization tasks. Even the best non-ML approach doesn't spew out information that isn't in the input passage.

level 2

Are you looking for an extractive summarizer?

level 1

Not to trivialize the paper (I really like their approach and conclusion) or recent advances in ML and NLP, but I think this simply confirms what many researchers and practitioners have suspected for a while:

That, inadvertently, some reported advances are, to a certain degree, the product of overfitting to standardized datasets.

level 2

But there's a huge difference between suspecting something and demonstrating it, no?

level 1

I feel like a lot of the commenters may have misinterpreted the paper. It only says that these models (BERT etc.) exploit statistical cues (the presence of "not" and others) for a specific task (ARCT) on a specific dataset. With adversarial samples introduced, BERT's performance drops to about 50%, compared to roughly 80% for untrained humans, which makes sense if we compare BERT vs. humans on other tasks that require deep understanding of text.


In no way did the paper say anything about BERT's ability to learn other tasks - and that makes sense: a learning algorithm never guarantees that the solution it finds is the one you intended in the solution space.
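
To make the "statistical cues" point concrete, here is a rough sketch (not the authors' code) of the kind of check the paper runs: how often a single cue such as "not" points at the correct warrant. The file path and column names (warrant0, warrant1, label) are placeholders for the ARCT data format, not verified names.

```python
# Hedged sketch, not the paper's code: estimate how predictive the single
# token "not" is on an ARCT-style file. Path and column names are assumptions.
import pandas as pd

df = pd.read_csv("arct_train.tsv", sep="\t")  # hypothetical file name

def contains_not(text: str) -> bool:
    return "not" in text.lower().split()

applicable = 0   # rows where exactly one warrant contains the cue ("coverage")
correct = 0      # rows where guessing that warrant is right ("productivity")
for _, row in df.iterrows():
    w0, w1 = contains_not(row["warrant0"]), contains_not(row["warrant1"])
    if w0 != w1:
        applicable += 1
        guess = 0 if w0 else 1                 # pick the warrant containing "not"
        correct += int(guess == row["label"])

print(f"coverage: {applicable / len(df):.2f}")
print(f"productivity: {correct / max(applicable, 1):.2f}")
```

If a trivial rule like this scores well above chance on the rows it applies to, a model can reach high accuracy without doing any argument comprehension at all.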

level 1

Text is the representation of broader concepts in a more heuristic, symbolic way.

It makes sense that a system can't derive an understanding more substantial than basic statistical correlation from purely textual input.

I would expect VQA-type systems to eventually prevail over other NLP-type systems.

level 1

next paper: Human success in some benchmarks tests may be simply due to the exploitation of spurious statistical cues in the dataset.

level 1
1 point · just now

I think the main point of this paper is not to claim that many of BERT's successes are due to the exploitation of spurious cues. The purpose of the paper seems to be to demonstrate the flaw in a particular NLP task, using the strength of BERT. It was clear to everyone from the beginning that BERT and similar models have no chance of achieving such high accuracy on a task that requires deeper logical reasoning. The original BERT paper does not claim success on the ARCT task; the 77% result comes from the authors of this paper. So the main message is: "if BERT can achieve such a high result, then there must be something wrong with the task design."

level 1

Can't wait to read this on the plane

level 1

Than*

level 2
Original Poster3 points · 4 hours ago

I know! As soon as I posted I noticed it, but I couldn't find where I could edit the title...

More posts from the MachineLearning community
350

Intel's ultra-efficient AI chips can power prosthetics and self-driving cars. They can crunch deep learning tasks 1,000 times faster than CPUs.

https://www.engadget.com/2019/07/15/intel-neuromorphic-pohoiki-beach-loihi-chips/

Even though the whole 5G thing didn't work out, Intel is still working hard on its Loihi "neuromorphic" deep-learning chips, modeled after the human brain. It unveiled a new system, code-named Pohoiki Beach, made up of 64 Loihi chips and 8 million so-called neurons. It's capable of crunching AI algorithms up to 1,000 times faster and 10,000 times more efficiently than regular CPUs, for use with autonomous driving, electronic robot skin, prosthetic limbs and more.

The Loihi chips are installed on a "Nahuku" board that contains from 8 to 32 Loihi chips. The Pohoiki Beach system contains multiple Nahuku boards that can be interfaced with Intel's Arria 10 FPGA developer's kit, as shown above.

Pohoiki Beach will be very good at neural-like tasks including sparse coding, path planning and simultaneous localization and mapping (SLAM). In layman's terms, those are all algorithms used for things like autonomous driving, indoor mapping for robots and efficient sensing systems. For instance, Intel said that the boards are being used to make certain types of prosthetic legs more adaptable, powering object tracking via new, efficient event cameras, giving tactile input to an iCub robot's electronic skin, and even automating a foosball table.

The Pohoiki system apparently performed just as well as GPU/CPU-based systems, while consuming a lot less power -- something that will be critical for self-contained autonomous vehicles, for instance. "We benchmarked the Loihi-run network and found it to be equally accurate while consuming 100 times less energy than a widely used CPU-run SLAM method for mobile robots," Rutgers professor Konstantinos Michmizos told Intel.

Intel said that the system can easily scale up to handle more complex problems and later this year, it plans to release a Pohoiki Beach system that's over ten times larger, with up to 100 million neurons. Whether it can succeed in the red-hot, crowded AI hardware space remains to be seen, however.

350
125 comments
302

I taught a one-day course on backpropagation & neural networks from scratch today - here are my materials:

https://github.com/ADGEfficiency/teaching-monolith/blob/master/backprop/intro-to-backprop.ipynb


Hopefully it is of some use to someone :)
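
For anyone who wants the gist before opening the notebook, here is a minimal from-scratch sketch of the topic (not taken from the linked materials): one hidden layer, sigmoid activations, squared-error loss, and manual backpropagation in plain NumPy.

```python
# Minimal backprop-from-scratch sketch: two-layer network on toy data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                  # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0  # toy targets

W1, b1 = rng.normal(size=(3, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(2000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    loss = np.mean((out - y) ** 2)

    # backward pass: chain rule, layer by layer
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")
```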

302
16 comments
298

Huggingface has released a new version of their open-source library of pretrained transformer models for NLP: PyTorch-Transformers 1.0 (formerly known as pytorch-pretrained-bert).


The library now comprises six architectures:

  • Google's BERT,

  • OpenAI's GPT & GPT-2,

  • Google/CMU's Transformer-XL & XLNet and

  • Facebook's XLM,

and a total of 27 pretrained model weights for these architectures.


The library focuses on:

  • being superfast to learn & use (almost no abstractions),

  • providing SOTA example scripts as starting points (text classification with GLUE, question answering with SQuAD and text generation using GPT, GPT-2, Transformer-XL, XLNet).


It also provides:

  • a unified API for models and tokenizers,

  • access to the hidden-states and attention weights,

  • compatibility with TorchScript...


Install: pip install pytorch-transformers

Quickstart: https://github.com/huggingface/pytorch-transformers#quick-tour

Release notes: https://github.com/huggingface/pytorch-transformers/releases/tag/v1.0.0

Documentation (work in progress): https://huggingface.co/pytorch-transformers/
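
For a sense of what the unified API looks like, here is a minimal usage sketch along the lines of the quick tour (extracting BERT's hidden states); treat it as a sketch and check the linked quickstart and docs for the authoritative version.

```python
# Minimal usage sketch based on the quick tour; see the linked docs for
# the authoritative API. Requires: pip install pytorch-transformers torch
import torch
from pytorch_transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

text = "[CLS] Transformers are taking over NLP leaderboards . [SEP]"
input_ids = torch.tensor([tokenizer.encode(text)])

with torch.no_grad():
    outputs = model(input_ids)       # tuple; first element is the last hidden states
    last_hidden_states = outputs[0]  # shape: (batch, sequence_length, hidden_size)

print(last_hidden_states.shape)
```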

298
22 comments
280
280
53 comments
262


Link: https://github.com/benedekrozemberczki/awesome-graph-classification


The repository covers techniques such as deep learning, graph kernels, statistical fingerprints and factorization. I update it monthly with new papers when something comes out with code.

262
11 comments
234

Hello, I wanted to share something our team has been working on for a while. I work at an early-stage radiology imaging company where we have the blessing and curse of too much medical imaging data. Something we found internally useful to build was a DICOM Decoder Op for TensorFlow. We are making this available open-source here: https://github.com/gradienthealth/gradient_decode_dicom.

DICOM is an extremely broad standard, so we try to cover the 90% case of image formats (PNG, TIFF, BMP, JPEG, JPEG2000) by relying on the past work folks have done for DCMTK. DCMTK is also largely considered an industry standard when it comes to parsing DICOMs. We also support multi-frame and multi-frame color images. Try the images found here: https://barre.dev/medical/samples/. If an unsupported format is found, an empty Tensor is returned, which can be filtered out. Reading the files directly off of bucket storage has allowed us to prevent duplication of .dcm data (a single CT can be 300MB). You can play with the op in this Colab notebook: https://colab.research.google.com/drive/1MdjXN3XkYs_mSyVtdRK7zaCbzkjGub_B

We firmly believe that having open-source resources in healthcare is what will enable its use in practice, not AI trade secrets. We plan on opening more of our work in the future. DM me if there is interest in contributing to upcoming toolkits (the next one we are thinking of creating is an operation to decrypt+decompress gzip files). Also, lmk if there is interest in working with our dataset (~300M DICOMs + notes). The goal of these project collaborations is that they are ultimately open-sourced.

Anyway, give the operation a try. If there are problems with loading a file of interest, please make an issue on GitHub. Right now only Linux based systems are supported, and a Dockerfile example will be coming soon.

234
34 comments
222

I came across this interesting article about whether larger models + more data = progress in ML research.

How the Transformers broke NLP leaderboards

Excerpt:

The focus of this post is yet another problem with the leaderboards that is relatively recent. Its cause is simple: fundamentally, a model may be better than its competitors by building better representations from the available data - or it may simply use more data, and/or throw a deeper network at it. When we have a paper presenting a new model that also uses more data/compute than its competitors, credit attribution becomes hard.

The most popular NLP leaderboards are currently dominated by Transformer-based models. BERT received the best paper award at NAACL 2019 after months of holding SOTA on many leaderboards. Now the hot topic is XLNet that is said to overtake BERT on GLUE and some other benchmarks. Other Transformers include GPT-2, ERNIE, and the list is growing.

The problem we’re starting to face is that these models are HUGE. While the source code is available, in reality it is beyond the means of an average lab to reproduce these results, or to produce anything comparable. For instance, XLNet is trained on 32B tokens, and the price of using 500 TPUs for 2 days is over $250,000. Even fine-tuning this model is getting expensive.

Wait, this was supposed to happen!

On the one hand, this trend looks predictable, even inevitable: people with more resources will use more resources to get better performance. One could even argue that a huge model proves its scalability and fulfils the inherent promise of deep learning, i.e. being able to learn more complex patterns from more information. Nobody knows how much data we actually need to solve a given NLP task, but more should be better, and limiting data seems counter-productive.

On that view - well, from now on top-tier NLP research is going to be something possible only for industry. Academics will have to somehow up their game, either by getting more grants or by collaborating with high-performance computing centers. They are also welcome to switch to analysis, building something on top of the industry-provided huge models, or making datasets.

However, in terms of overall progress in NLP that might not be the best thing to do. The chief problem with the huge models is simply this:

“More data & compute = SOTA” is NOT research news.

If leaderboards are to highlight the actual progress, we need to incentivize new architectures rather than teams outspending each other. Obviously, huge pretrained models are valuable, but unless the authors show that their system consistently behaves differently from its competition with comparable data & compute, it is not clear whether they are presenting a model or a resource.

Furthermore, much of this research is not reproducible: nobody is going to spend $250,000 just to repeat XLNet training. Given the fact that its ablation study showed only 1-2% gain over BERT in 3 datasets out of 4, we don’t actually know for sure that its masking strategy is more successful than BERT’s.

At the same time, the development of leaner models is dis-incentivized, as their task is fundamentally harder and the leaderboard-oriented community only rewards the SOTA. That, in turn, prices academic teams out of the competition, which will not result in students becoming better engineers when they graduate.

Entire article:

https://hackingsemantics.xyz/2019/leaderboards/

222
47 comments
214
214
48 comments
164

100+ Machine Learning Trading Strategies

https://github.com/firmai/machine-learning-asset-management


- Deep Learning

- Reinforcement Learning

- Evolutionary Strategies

- Stacked Models


If you are interested in industry machine learning for python, feel free to sign up to my newsletter: https://mailchi.mp/ec4942d52cc5/firmai

164
7 comments
160

Hi all! We are Noam Brown and Professor Tuomas Sandholm. We recently developed the poker AI Pluribus, which has proven capable of defeating elite human professionals in six-player no-limit Texas hold'em poker, the most widely-played poker format in the world. Poker was a long-standing challenge problem for AI due to the importance of hidden information, and Pluribus is the first AI breakthrough on a major benchmark game that has more than two players or two teams. Pluribus was trained using the equivalent of less than $150 worth of compute and runs in real time on 2 CPUs. You can read our blog post on this result here.

We are happy to answer your questions about Pluribus, the experiment, AI, imperfect-information games, Carnegie Mellon, Facebook AI Research, or any other questions you might have! A few of the pros Pluribus played against may also jump in if anyone has questions about what it's like playing against the bot, participating in the experiment, or playing professional poker.

We are opening this thread to questions now and will be here starting at 10AM ET on Friday, July 19th to answer them.

EDIT: Thanks for the questions everyone! We're going to call it quits now. If you have any additional questions though, feel free to post them and we might get to them in the future.

160
138 comments
139

I enjoy learning about machine learning.

And I enjoy making videos.

So I made a video reviewing The Hundred-Page Machine Learning Book by Andriy Burkov.

I read it from the perspective of a machine learning engineer and still learned a bunch.

If you haven't checked out the book, it's a great concise read. There's nothing like a complex topic explained simply.

If you do watch the video, any advice on ways to improve or future reviews/topics would be greatly appreciated.

https://youtu.be/btLxTTkSZuY

139
14 comments
136

Dear r/MachineLearning, I am a STEM graduate who became interested, after my Master's, in doing research in Machine Learning. I was promised a PhD, the opportunity to do cutting-edge research, real-world applications and "close" cooperation with industrial partners. But after spending a few months reading and discussing with supervisors, a lot of the work I am expected to do is centered around metaheuristic search and evolutionary computation. And although I find it fascinating, there is some application to machine learning / DNNs, and companies like Uber and Cognizant are adopting it, I feel like it is too much of a niche and mainstream interest doesn't seem to be catching up with it - if there is any to begin with.

I thought it might be helpful to ask you guys, to get a neutral outside-of-the-box opinion.

Particularly as, over the last month, I have been living and working in a scientific bubble, and my prior background is not AI/ML or Computer Science to begin with. So anyone could claim just about anything to me without me being able to evaluate their claims or get any decent outside criticism.

Edit: Wow, I am overwhelmed by the response!

136
56 comments
131

Hi there, we're a London-based research team working on clinical applications of machine learning. Recently, we've been dealing a lot with clinical datasets that exceed 1M+ observations and 20K+ features. We found that traditional dimensionality reduction and feature extraction methods don't deal well with this data without subsampling and are actually quite poor at preserving both global and local structures of the data. To address these issues, we've been looking into Siamese Networks for non-linear dimensionality reduction and metric learning applications. We are making our work available through an open-source project: https://github.com/beringresearch/ivis


So far, we've applied ivis to single-cell datasets, images, and free text - we're really keen to see what other applications could be enabled! We've also run a large number of benchmarks looking at both the accuracy of embeddings and processing speed - https://bering-ivis.readthedocs.io/en/latest/timings_benchmarks.html - and can see that ivis begins to stand out on datasets with 250K+ observations. We're really excited to make this project open source - there's so much more to Siamese Networks than one-shot learning!


EDIT: wow - thank you so much for so many wonderful questions, comments, and criticisms! We had a lot of fun addressing them - we're now off to do some barbecuing before the evening is out, but we'll be back tomorrow to answer any further questions!
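
For a sense of the API shape, here is a minimal usage sketch as I understand it from the docs (scikit-learn-style fit/transform); the parameter names are assumptions, so check the linked documentation before relying on them.

```python
# Hedged usage sketch; see https://github.com/beringresearch/ivis and the
# docs for the authoritative API. Parameter names here are assumptions.
import numpy as np
from ivis import Ivis

X = np.random.rand(10000, 50)         # stand-in for a large clinical matrix

model = Ivis(embedding_dims=2, k=15)  # k = number of nearest neighbours per point
embeddings = model.fit_transform(X)   # Siamese network learns the 2-D embedding

print(embeddings.shape)               # expected: (10000, 2)
```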

131
52 comments
124

I recently generated some new watch designs using StyleGAN, and I thought some of you may find it interesting. All 50,000 images I used to train were sourced from the /r/watches subreddit. These are the results I was able to achieve after 48 hours of training on a GTX 980 Ti. Considering the hardware Nvidia recommends, I'm pretty happy with it!

https://evigio.com/post/generating-new-watch-designs-with-stylegan

124
13 comments
95

Hello,


I have been working on implementing the model from the paper [Few-Shot Adversarial Learning of Realistic Neural Talking Head Models](https://arxiv.org/abs/1905.08233v1) (Zakharov et al.) for my own projects and research. It uses a very interesting GAN setup and piqued my interest.


The paper has been out for a couple of months now and some implementations already exist, although the results they show are not quite at the level of what is seen in the paper. For my implementation I incorporated further recommendations from the paper's authors on various details that were unclear, from the paper alone, to me and to the other existing implementations (added more depth to the network, adjusted the AdaIN parameters, ...).


Due to a lack of compute resources at my disposal and the model being very heavy, I only trained it for 5 epochs on a test dataset (15 times fewer epochs than in the paper, and with a dataset 34 times smaller), but the results look promising so far given the relatively small amount of training that went into it.


Here is an example of the fake faces it generated from facial landmarks and embedding vectors:


More examples, along with the original faces from which the landmarks were extracted, can be seen on my GitHub repo.


https://github.com/vincent-thevenin/Realistic-Neural-Talking-Head-Models


I also made a working demo that uses the webcam and an embedding vector to create live fake faces from your own. There is a link to a video of it on my repo, and the code for the demo will be uploaded soon.


If anyone is interested in using the model, training it further, or improving the project, feel free to take a look and contribute :)


I ended up implementing the paper from scratch for learning purposes, but I would like to thank u/MrCaracara for doing the first implementation I know of.
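
Since the AdaIN parameters are mentioned above: for anyone unfamiliar with the layer, here is a generic PyTorch sketch of adaptive instance normalization, not the repo's actual implementation; in this model the style mean/std would be predicted from the embedding vector.

```python
# Generic AdaIN sketch (not the repo's actual layer): normalize the content
# feature map per channel, then rescale/shift with style statistics.
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, content, style_mean, style_std):
        # content: (N, C, H, W); style_mean, style_std: (N, C, 1, 1)
        n, c = content.shape[:2]
        flat = content.view(n, c, -1)
        mean = flat.mean(dim=2).view(n, c, 1, 1)
        std = (flat.var(dim=2) + self.eps).sqrt().view(n, c, 1, 1)
        normalized = (content - mean) / std
        return style_std * normalized + style_mean

# usage: out = AdaIN()(features, predicted_mean, predicted_std)
```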

95
5 comments