Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

microsoft/mup • 7 Mar 2022

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters.

The Quest for a Common Model of the Intelligent Decision Maker

no code yet • 26 Feb 2022

It is time to recognize and build on the convergence of multiple diverse disciplines on a substantive common model of the intelligent agent.

Decision Making

Kubric: A scalable dataset generator

google-research/kubric • 7 Mar 2022

Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details.

Optical Flow Estimation

DeepNet: Scaling Transformers to 1,000 Layers

microsoft/unilm • 1 Mar 2022

In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.

Translation

Efficient Language Modeling with Sparse all-MLP

no code yet • 14 Mar 2022

All-MLP architectures have attracted increasing interest as an alternative to attention-based models.

Language Modelling, Zero-Shot Learning

The Mathematics of Artificial Intelligence

no code yet • 16 Mar 2022

We currently witness the spectacular success of artificial intelligence in both science and public life.

Block-Recurrent Transformers

no code yet • 11 Mar 2022

The recurrent cell is merely a transformer layer: it uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens.

Language Modelling

Generative Adversarial Networks

no code yet • 1 Mar 2022

Generative Adversarial Networks (GANs) are very popular frameworks for generating high-quality data, and are widely used in both academia and industry across many domains.

Data Augmentation, Image Generation

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

no code yet • 10 Mar 2022

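In this paper, we revisit the model-selection step of the conventional recipe (train multiple models, then keep only the one that performs best on held-out validation data) in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin.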

Ranked #1 on Image Classification on ImageNet (using extra training data)

Domain Generalization, Image Classification, +1
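
The weight averaging named in the title can be illustrated with a minimal sketch. It assumes each fine-tuned model is represented as a dictionary of NumPy arrays with identical keys and shapes; the helper name `uniform_soup` and the toy parameters are hypothetical and are not taken from the paper or any released code.

```python
import numpy as np

def uniform_soup(state_dicts):
    """Average the parameters of same-architecture models, key by key."""
    if not state_dicts:
        raise ValueError("need at least one model")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("all models must share the same parameter names")
    # Element-wise mean over models for every parameter tensor.
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Toy "fine-tuned models" (hypothetical values, for illustration only).
model_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
model_b = {"w": np.array([3.0, 4.0]), "b": np.array([1.0])}
soup = uniform_soup([model_a, model_b])
```

Because the result is a single set of weights, inference cost matches that of one model, which is the property the title highlights.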

On Embeddings for Numerical Features in Tabular Deep Learning

Yura52/tabular-dl-num-embeddings • 10 Mar 2022

We start by describing two conceptually different approaches to building embedding modules: the first one is based on a piecewise linear encoding of scalar values, and the second one utilizes periodic activations.
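
The two approaches can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the bin edges, and the fixed frequencies are hypothetical (in practice the frequencies might be learned), and values outside the bin range are simply clipped here.

```python
import numpy as np

def piecewise_linear_encode(x, edges):
    """Encode scalar x against sorted bin edges; one value in [0, 1] per bin."""
    edges = np.asarray(edges, dtype=float)
    lo, hi = edges[:-1], edges[1:]
    # Bins fully below x give 1, bins above x give 0,
    # and the bin containing x gives its fractional position.
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

def periodic_encode(x, freqs):
    """Encode scalar x as sines and cosines at the given frequencies."""
    freqs = np.asarray(freqs, dtype=float)
    angle = 2.0 * np.pi * freqs * x
    return np.concatenate([np.sin(angle), np.cos(angle)])

ple = piecewise_linear_encode(2.5, [0.0, 1.0, 2.0, 3.0, 4.0])
per = periodic_encode(0.25, [1.0, 2.0])
```

Either vector would then be passed to the downstream network in place of the raw scalar.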
