Deep learning godfathers Bengio, Hinton, and LeCun say the field can fix its flaws

Yoshua Bengio, Geoffrey Hinton, and Yann LeCun took the stage in Manhattan at an AI conference to present a united front about how deep learning can move past obstacles like adversarial examples and maybe even gain common sense.
Written by Tiernan Ray, Senior Contributing Writer

From left, Geoffrey Hinton, Yann LeCun, Yoshua Bengio.

Tiernan Ray for ZDNet

Artificial intelligence has to go in new directions if it's to realize the machine equivalent of common sense, and three of its most prominent proponents are in violent agreement about exactly how to do that.

Yoshua Bengio of Canada's MILA institute, Geoffrey Hinton of the University of Toronto, and Yann LeCun of Facebook, who have called themselves co-conspirators in the revival of the once-moribund field of "deep learning," took the stage Sunday night at the Hilton hotel in midtown Manhattan for the 34th annual conference of the Association for the Advancement of Artificial Intelligence.

The three, who were dubbed the "godfathers" of deep learning by the conference, were being honored for having received last year's Turing Award for lifetime achievements in computing. 

Each of the three scientists got a half-hour to talk, and each one acknowledged numerous shortcomings in deep learning, things such as "adversarial examples," where an object recognition system can be tricked into misidentifying an object just by adding noise to a picture.

"There's been a lot of talk of the negatives about deep learning," LeCun noted.

Each of the three men was confident that the tools of deep learning will fix deep learning and lead to more advanced capabilities.

The big idea shared by all three is that the solution is a form of machine learning called "self-supervised," where something in data is deliberately "masked," and the computer has to guess its identity.

For Hinton, it's something called "capsule networks," which are like convolutional neural networks widely used in AI, but with parts of the input data deliberately hidden. LeCun, for his part, said he borrowed from Hinton to create a new direction in self-supervised learning. 

"Self-supervised is training a model to fill in the blanks," LeCun said.

"This is what is going to allow our AI systems to go to the next level," said LeCun. "Some kind of common sense will emerge." 

And Bengio talked about how machines could generalize better if trained to spot subtle changes in the data caused by the intervention of an agent, a form of cause and effect inference. 

In each case, masking information and then guessing it is made practical by a 2017 breakthrough from Google scientists called the "Transformer." The Transformer has become the basis for surprisingly capable language models, such as OpenAI's "GPT" software, and it exploits the notion of "attention," the mechanism that lets a computer guess what's missing in masked data. (You can watch a replay of the talks and other sessions on the conference website.)
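For readers who want to see the mechanism, here is a toy sketch of the scaled dot-product "attention" at the heart of the Transformer. The shapes and scaling follow the standard published formulation, but the example is illustrative rather than any production implementation.

```python
# Toy scaled dot-product attention: each position computes a weighted average
# over all positions, with weights from query/key similarity, so a masked
# position can draw on its surrounding context.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query/key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ V                                # values blended by attention

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8                      # six tokens, one of them imagined "masked"
X = rng.normal(size=(seq_len, d_model))      # stand-in for token embeddings
out = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
print(out.shape)  # (6, 8): each position now mixes in information from the others
```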

The prominent panel appearance by the deep learning cohort was a triumphant turnaround for a sub-discipline of AI that had once been left for dead, even by the conference itself. It was a bit paradoxical, too, because all three talks seemed to borrow terms that are usually identified as belonging to the opposing strain in AI, the "symbolic" AI theorists, who were the ones who dismissed Bengio and Hinton and LeCun years ago. 

"And yet, some of you speak a little disparagingly of the symbolic AI world," said the moderator, MIT professor Leslie Kaelbling, noting the borrowing of terms. "Can we all be friends or can we not?" she asked, to much laughter from the audience. 

Hinton, who was standing at the panel table rather than taking a seat, dryly quipped, "Well, we've got a long history, like," eliciting more laughter. 

"The last time I submitted a paper to AAAI, it got the worst review I've ever got, and it was mean!" said Hinton.

"It said, Hinton's been working on this idea for seven years and nobody's interested, it's time to move on," Hinton recalled, eliciting grins from LeCun and Bengio, who also labored in obscurity for decades until deep learning's breakthrough year in 2012. "It takes a while to forget that," Hinton said, though perhaps it was better to forget the past and move forward, he conceded. 

Bengio and LeCun look on as Hinton describes mistreatment in the bad days before deep learning broke through. "The last time I submitted a paper to AAAI, it got the worst review I've ever got, and it was mean!"

Tiernan Ray for ZDNet

Kaelbling's question struck home because there were allusions in the three scientists' talks to how their work is frequently under attack from skeptics.

LeCun noted he is "pretty active on social media and there seems to be some confusion" as to what deep learning is, an allusion to back-and-forth debates he has had on Twitter with deep learning critic Gary Marcus, among others, that have at times gotten combative. LeCun began his talk by offering a slide defining what deep learning is, echoing a debate in December between Bengio and Marcus.

Mostly, however, the evening was marked by the camaraderie of the three scholars. When asked by the audience what, if anything, they disagreed on, Bengio quipped, "Leslie already tried that on us and it didn't work." Hinton said, "I can tell you one disagreement between us: Yoshua's email address ends in 'Quebec,' and I think there should be a country code after that, and he doesn't."

There was also a chance for friendly teasing. Hinton began his talk by saying it was aimed at LeCun, who made convolutional neural networks a practical technology thirty years ago. Hinton said he wanted to show why CNNs are "rubbish," and should be replaced by his capsule networks. Hinton mocked himself, noting that he's been putting out a new version of capsule networks every year for the past three years. "Forget everything you knew about the previous versions, they were all wrong but this one's right," he said, to much laughter. 

Some problems in the discipline, as a discipline, will be harder to solve. When Kaelbling asked whether any of them have concerns about the goals or agenda of big companies that use AI, Hinton grinned and pointed at LeCun, who runs Facebook's AI research department, but LeCun grinned and pointed at Hinton, who is a fellow in Google's AI program. "Uh, I think they ought to be doing things about fake news, but..." said Hinton, his voice trailing off, to which LeCun replied, "In fact, we are." The exchange got some of the biggest applause and laughter out of the room. 

They also had thoughts about the structure of the field and how it needs to change. Bengio noted the pressure on young scholars to publish is far greater today than when he was a PhD student, and that something needs to change structurally in that regard to enable authors to focus on more meaningful long-term problems. LeCun, who also has a professorship at NYU, agreed times have changed, noting that as professors, "we would not admit ourselves in our own PhD programs."

With the benefit of years of struggling in obscurity, and with his gentle English drawl, Hinton managed to inject a note of levity into the problem of short-sighted research. 

"I have a model of this process, of people working on an idea for a short length of time, and making a little bit of progress, and then publishing a paper," he said. 

"It's like someone taking one of those books of hard sudoku puzzles, and going through the book, and filling in a few of the easy ones in each sudoku, and that really messes it up for everybody else!"


    AI scientist: 'We need to think outside the large language model box'

    AI21 Labs co-founder Yoav Shoham says something extra is needed to build better and smarter generative artificial intelligence.
    Written by Tiernan Ray, Senior Contributing Writer

Generative artificial intelligence (Gen AI) developers continuously push the boundaries of what's possible; Google's Gemini 1.5, for example, can take in a million tokens of information at a time.

    Still, even this level of development is not enough to make real progress in AI, say competitors who go toe-to-toe with Google. "We need to think outside the LLM box," AI21 Labs co-founder and co-CEO Yoav Shoham said in an interview with ZDNET.

    Also: 3 ways Meta's Llama 3.1 is an advance for Gen AI

    AI21 Labs, a privately backed startup, competes with Google in LLMs, the large language models that are the bedrock of Gen AI. Shoham, who was once a principal scientist at Google, is also an emeritus professor at Stanford University.

    "They're amazing at the output they put out, but they don't really understand what they're doing," he said of LLMs. "I think that even the most diehard neural net guys don't think that you can only build a larger language model, and they'll solve everything."

    AI21 Labs researchers highlight basic errors of OpenAI's GPT-3 as an example of how models stumble on basic questions. The answer, the startup argues, is augmenting LLMs with something else, such as modules that can operate consistently.

    AI21 Labs

    Shoham's startup has pioneered novel Gen AI approaches that go beyond the traditional "transformer," the core element of most LLMs. For example, AI21 Labs in April debuted a model called Jamba, an intriguing combination of transformers with a second neural network called a state space model (SSM).

    The mixture has allowed Jamba to top other AI models in important metrics. Shoham asked ZDNET to indulge him in an extensive explanation of one important metric: context length.

    The context length is the amount of input -- in tokens, usually words -- that a program can handle. Meta's Llama 3.1 supports up to 128,000 tokens in its context window. AI21 Labs's Jamba, which is also open-source software, has double that figure -- a 256,000-token context window. 
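As a rough illustration of what that metric means, the sketch below -- with a crude whitespace "tokenizer" standing in for a real subword tokenizer -- shows input being measured in tokens and trimmed to fit a model's window.

```python
# Rough sketch: input is measured in tokens, and anything past the model's
# context window has to be cut or summarized before the model sees it.
def fit_to_context(text: str, context_window: int) -> list[str]:
    tokens = text.split()              # crude stand-in for real subword tokenization
    if len(tokens) <= context_window:
        return tokens
    return tokens[-context_window:]    # naive strategy: keep only the most recent tokens

document = "word " * 300_000
print(len(fit_to_context(document, 128_000)))  # 128000 -- a Llama 3.1-sized window
print(len(fit_to_context(document, 256_000)))  # 256000 -- a Jamba-sized window
```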


    Shoham. "Even the most diehard neural net guys don't think that you can only build a larger language model, and they'll solve everything."

    Roei Shor Photography

In head-to-head tests using a benchmark constructed by Nvidia, Shoham said the Jamba model was the only model other than Gemini that could maintain that 256K context window "in practice." A context length can be advertised at one figure yet fall apart in use, with a model's scores dropping as the context grows.
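Shoham did not walk through the benchmark's internals, but needle-in-a-haystack style tests of effective context length generally work along the following lines; the sketch is illustrative, not the Nvidia benchmark itself.

```python
# Hedged sketch of measuring "effective" context length: bury a fact at a
# random depth in ever-longer filler text and check whether the model can
# still retrieve it as the context grows.
import random

def make_haystack(needle: str, context_tokens: int, rng: random.Random) -> str:
    filler = ["lorem"] * context_tokens
    filler[rng.randrange(context_tokens)] = needle   # bury the fact at a random depth
    return " ".join(filler)

def effective_context(model_answer, needle="the-code-is-7421",
                      lengths=(4_000, 32_000, 128_000, 256_000)):
    rng = random.Random(0)
    scores = {}
    for n in lengths:
        prompt = make_haystack(needle, n, rng) + "\nWhat is the code?"
        scores[n] = float(needle in model_answer(prompt))  # 1.0 if retrieved, else 0.0
    return scores  # a model "keeps" its window if scores stay high as n grows

# model_answer would wrap a real LLM API call; a trivial stand-in for illustration:
print(effective_context(lambda prompt: "the-code-is-7421"))
```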


    "We are the only ones with truth in advertising," as far as context length, Shoham said. "All the other models degrade with increased context length."

    Google's Gemini can't be tested beyond 128K, Shoham said, given the limits imposed on the Gemini application programming interface by Google. "They actually have a good effective context window, at least, at 128K," he said.

    Jamba is more economical than Gemini for the same 128K window, Shoham said. "They're about 10 times more expensive than we are," in terms of the cost to serve up predictions from Gemini versus Jamba, the practice of inference, he said.

    All of that, Shoham emphasized, is a product of the "architectural" choice of doing something different, joining a transformer to an SSM. "You can show exactly how many [API] calls are made" to the model, he told ZDNET. "It's not just the cost, and the latency, it's inherent in the architecture."

    Shoham has described the findings in a blog post.

    None of that progress matters, however, unless Jamba can do something superior. The benefits of having a large context window become apparent, Shoham said, as the world moves to things such as retrieval-augmented generation (RAG), an increasingly popular approach of hooking up an LLM to an external information source, such as a database.  

    Also: Make room for RAG: How Gen AI's balance of power is shifting

    A large context window lets the LLM retrieve and sort through more information from the RAG source to find the answer.

    "At the end of the day, retrieve as much as you can [from the database], but not too much," is the right approach to RAG, Shoham said. "Now, you can retrieve more than you could before, if you've got a long context window, and now the language model has more information to work with."

    Asked if there is a practical example of this effort, Shoham told ZDNET: "It's too early to show a running system. I can tell you that we have several customers who have been frustrated with the RAG solutions, who are working with us now. And I am quite sure we'll be able to publicly show results, but it hasn't been out long enough."


Jamba, which has seen 180,000 downloads since it was put on HuggingFace, is available on AWS's Bedrock inference service and on Microsoft Azure, and "people are doing interesting stuff with it," Shoham said.

That said, even an improved RAG is not ultimately the salvation for the various shortcomings of Gen AI, from hallucinations to the risk of successive generations of the technology descending into gibberish.

    "I think we're going to see people demanding more, demanding systems not be ridiculous, and have something that looks like real understanding, having close to perfect answers," Shoham said, "and that won't be pure LLMs."

    Also: Beware of AI 'model collapse': How training on synthetic data pollutes the next generation

    In a paper posted last month on the arXiv pre-print server, with collaborator Kevin Leyton-Brown, titled "Understanding Understanding: A Pragmatic Framework Motivated by Large Language Models," Shoham demonstrated how, across numerous operations, such as mathematics and manipulation of table data, LLMs produced "convincing-sounding explanations that aren't worth the metaphorical paper they're written on."

    "We showed how naively hooking [an LLM] up to a table, that table function will give success 70% or 80% of the time," Shoham told ZDNET. "That is often very pleasing because you get something for nothing, but if it's mission-critical work, you can't do that."

    Such failings, Shoham said, mean that "the whole approach to creating intelligence will say that LLMs have a role to play, but they're part of a bigger AI system that brings to the table things you can't do with LLMs."

    Among the things required to go beyond LLMs are the various tools that have emerged in the past couple of years, Shoham said. Elements such as function calls let an LLM hand off a task to another kind of software specifically built for a particular task.

    "If you want to do addition, language models do addition, but they do it terribly," Shoham said. "Hewlett-Packard gave us a calculator in 1970, why reinvent that wheel? That's an example of a tool."

    Using LLMs with tools is broadly grouped by Shoham and others under the rubric "compound AI systems". With the help of data management company Databricks, Shoham recently organized a workshop on prospects for building such systems.

An example of using such tools is presenting LLMs with the "semantic structure" of table-based data, Shoham said. "Now, you get to close to a hundred percent accuracy" from the LLM, he said, "and this you wouldn't get if you just used a language model without additional stuff."
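Shoham didn't spell out AI21's format, but presenting a table's "semantic structure" to a model generally means something like the following hedged sketch, where the schema travels with the rows in the prompt; the column names and wording are illustrative.

```python
# Sketch: pass a table's schema (column names and types) alongside its rows,
# so the model can ground its answer in the structure rather than raw text.
def table_prompt(schema: dict[str, str], rows: list[dict], question: str) -> str:
    header = ", ".join(f"{col} ({typ})" for col, typ in schema.items())
    body = "\n".join(" | ".join(str(row[col]) for col in schema) for row in rows)
    return (f"Table columns: {header}\n"
            f"Rows:\n{body}\n"
            f"Question: {question}\nAnswer using only the table.")

schema = {"region": "text", "quarter": "text", "revenue_musd": "number"}
rows = [{"region": "EMEA", "quarter": "Q1", "revenue_musd": 41.2},
        {"region": "APAC", "quarter": "Q1", "revenue_musd": 37.8}]
print(table_prompt(schema, rows, "Which region had higher Q1 revenue?"))
```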

    Beyond tools, Shoham advocates for scientific exploration of other directions outside the pure deep-learning approach that has dominated AI for over a decade. "You won't get robust reasoning just by back-prop and hoping for the best," Shoham said, referring to back-propagation, the learning rule by which most of today's AI is trained.

    Also: Anthropic brings Tool Use for Claude out of beta, promising sophisticated assistants

    Shoham was careful to avoid discussing the next product initiatives, but he hinted that what may be needed is represented -- at least philosophically -- in a system he and colleagues introduced in 2022 called an MRKL (Modular Reasoning, Knowledge, and Language) System.

    The paper describes the MRKL system as being both "Neural, including the general-purpose huge language model as well as other smaller, specialized LMs," and also, "Symbolic, for example, a math calculator, a currency converter or an API call to a database."

    That breadth is a neuro-symbolic approach to AI. In that way, Shoham is in accord with some prominent thinkers who have concerns about the dominance of Gen AI. Frequent AI critic Gary Marcus, for example, has said that AI will never reach human-level intelligence without a symbol-manipulation capability. 

    MRKL has been implemented as a program called Jurassic-X, which the startup has tested with its partners.

    Also: OpenAI is training GPT-4's successor. Here are 3 big upgrades to expect from GPT-5

    An MRKL system should be able to use the LLM to parse problems that involve tricky phrasing, such as, "99 bottles of beer on the wall. One fell down. How many bottles of beer are on the wall?" The actual arithmetic is handled by a second neural net with access to arithmetic logic, using the arguments extracted from the text by the first model. 

    A "router" between the two has the difficult task of choosing which things to extract from the text parsed by the LLM and choosing which "module" to pass the results to in order to perform the logic.

    That work means that "there is no free lunch, but that lunch is in many cases affordable," Shoham's team wrote.

    From a product and business standpoint, "we'd like to, on a continued basis, provide additional functionalities for people to build stuff," Shoham said.

    Also: AI21 and Databricks show open source can radically slim down AI

    The important point is that a system like MRKL does not need to do everything to be practical, he said. "If you're trying to build the universal LLM that understands math problems and how to generate pictures of donkeys on the moon, and how to write poems, and do all of that, that can be expensive," he observed. "But 80% of the data in the enterprise is text -- you have tables, you have graphs, but donkeys on the moon aren't that important in the enterprise."

    Given Shoham's skepticism about LLMs on their own, is there a danger that today's Gen AI could prompt what's referred to as an AI winter (a sudden collapse in activity, as interest and funding dry up entirely)?

    "It's a valid question, and I don't really know the answer," he said. "I think it's different this time around in that, back in the 1980s," during the last AI winter, "not enough value had been created by AI to make up for the unfounded hype. There's clearly now some unfounded hype, but my sense is that enough value has been created to see us through it."


    Google's DeepMind AI takes home silver medal in complex math competition

    The achievement is noteworthy because AI systems don't usually fare well with complex math challenges.
    Written by Lance Whitney, Contributor

    Today's artificial intelligence (AI) systems possess many skills but typically fall short when it comes to tackling complex math problems. That's why Google is excited that two of its DeepMind AI systems were able to solve several challenging problems posed in a prestigious math competition.

In a new post published Thursday, Google touted the AI smarts and achievements of its DeepMind AlphaProof and AlphaGeometry 2 AI models. Entering the 2024 International Mathematical Olympiad (IMO), the two systems solved four out of six problems. That effort earned Google's AI the equivalent of a silver medal, the first time AI has reached that level in this contest, which is typically geared toward young mathematicians.

    Also: OpenAI launches SearchGPT - here's what it can do and how to access it

    Each year, IMO invites elite pre-college mathematicians to wrestle with six extremely difficult problems in algebra, combinatorics (counting, selecting, and arranging a large number of objects), geometry, and number theory. Branching out beyond humans, the competition has also become a way to test and measure machine learning and AI systems in advanced mathematical reasoning.

    With the problems translated into a formal language understood by Google's AI, AlphaProof solved two algebra problems and one problem in number theory, not only finding the answer but also proving that the answer was correct. Google cited the number theory challenge as the hardest one in the competition, solved by only five of the human contestants. AlphaGeometry 2 figured out the geometry problem. But neither model was able to crack the two combinatorics problems.


AlphaProof is an AI-based system that can train itself to prove mathematical statements using the formal language Lean. It combines a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself how to play and win at chess, shogi, and Go.
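For readers unfamiliar with Lean, here is a trivial example -- not one of the IMO problems -- of what a formally stated and mechanically checked claim looks like in that language.

```lean
-- A trivial illustration of a statement and proof in Lean 4: the claim is
-- written as a theorem, and the proof is checked mechanically by the system.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```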

    Also: Google's new math app solves nearly any problem with AI: Here's how to use it

    AlphaGeometry 2 is an improved version of AlphaGeometry. Based on Google's Gemini AI, this model can handle highly challenging geometry problems, including those that cover movements of objects and equations of angles, ratios, and distances.

    Beyond testing the math skills of AlphaProof and AlphaGeometry 2, Google took advantage of IMO to try out a natural language reasoning system built on Gemini with advanced problem-solving capabilities. Unlike the other two models, this one doesn't require problems to be translated into a formal language.

    Though the achievement of these models may sound abstract, Google sees it as another step toward the future of AI.

    "We're excited for a future in which mathematicians work with AI tools to explore hypotheses, try bold new approaches to solving long-standing problems, and quickly complete time-consuming elements of proofs -- and where AI systems like Gemini become more capable at math and broader reasoning," the company said in its post.
