Second, we demonstrate how to insert undetectable backdoors in models trained using the Random Fourier Features (RFF) learning paradigm or in Random ReLU networks.
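As context for the learning paradigm named above, the following is a minimal sketch of Random Fourier Features in the style of Rahimi and Recht: a frozen random Gaussian projection with a cosine nonlinearity, on top of which only a linear predictor is trained. It illustrates the RFF setting only, not the backdoor construction; the data, dimensions, and bandwidth are placeholder assumptions.

```python
# Minimal sketch of the Random Fourier Features (RFF) learning paradigm:
# frozen random projections + cosine features, then a linear readout.
# Illustrative only; sizes and data are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, W, b):
    """Map inputs to D random Fourier features approximating an RBF kernel."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Toy data: n points in d dimensions with a random linear teacher.
n, d, D = 200, 10, 512
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))

# Draw the random (frozen) feature map once, then fit only the linear layer.
gamma = 1.0 / d                                   # assumed RBF bandwidth
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Phi = rff_features(X, W, b)

# Ridge-regression readout (closed form) on the random features.
lam = 1e-3
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)
print("train accuracy:", np.mean(np.sign(Phi @ w) == y))
```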
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
Ranked #1 on Multi-task Language Understanding on MMLU
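To make the compute-budget question above concrete, the sketch below uses the commonly cited approximation that training cost is roughly C ≈ 6·N·D FLOPs for N parameters and D tokens, together with the rule of thumb that parameters and tokens should be scaled in roughly equal proportion (often summarized as on the order of 20 tokens per parameter). These constants are back-of-the-envelope approximations, not the paper's fitted scaling laws.

```python
# Back-of-the-envelope compute-optimal sizing, assuming C ~= 6 * N * D
# training FLOPs and a ~20 tokens-per-parameter rule of thumb.
# The constants are illustrative approximations, not fitted laws.
def compute_optimal(C_flops, tokens_per_param=20.0):
    """Return (parameters N, tokens D) that spend C_flops with D = r * N."""
    # C = 6 * N * D = 6 * r * N^2  =>  N = sqrt(C / (6 * r))
    N = (C_flops / (6.0 * tokens_per_param)) ** 0.5
    D = tokens_per_param * N
    return N, D

for budget in (1e21, 1e23, 1e25):
    N, D = compute_optimal(budget)
    print(f"C={budget:.0e} FLOPs -> ~{N/1e9:.1f}B params, ~{D/1e9:.0f}B tokens")
```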
In this work, we show that this model diversity is symbiotic, and can be leveraged to build AI systems with structured Socratic dialogue -- in which new multimodal tasks are formulated as a guided language-based exchange between different pre-existing foundation models, without additional finetuning.
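One way to picture such a structured Socratic dialogue is a short, scripted exchange in which one frozen model describes an input in language and another reasons over that description. The sketch below is a hypothetical zero-shot pipeline under that reading; `caption_image` and `complete_text` are placeholders for whatever pre-trained vision-language and language models are plugged in, not APIs from the paper.

```python
# Hypothetical sketch of a Socratic-dialogue pipeline: frozen, pre-existing
# models exchange information purely through text, with no extra finetuning.
# `caption_image` and `complete_text` are placeholders, not real APIs.

def caption_image(image) -> str:
    """Placeholder: a frozen vision-language model returning a description."""
    raise NotImplementedError("plug in a pre-trained vision-language model")

def complete_text(prompt: str) -> str:
    """Placeholder: a frozen language model answering a text prompt."""
    raise NotImplementedError("plug in a pre-trained language model")

def socratic_vqa(image, question: str) -> str:
    # Step 1: the vision-language model "speaks" about the image in text.
    description = caption_image(image)
    # Step 2: the language model reasons over that language-based summary,
    # so the new multimodal task needs no additional training.
    prompt = (
        f"Image description: {description}\n"
        f"Question: {question}\n"
        f"Answer:"
    )
    return complete_text(prompt)
```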
We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30x larger state-of-the-art language model on CommonsenseQA.
Ranked #9 on Common Sense Reasoning on CommonsenseQA
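For context, STaR's gain over direct answer prediction comes from a bootstrapping loop: the model generates a rationale for each question, the rationale is kept only if it leads to the correct answer, and the model is fine-tuned on the kept examples before the loop repeats. The sketch below is only a schematic of that loop; the helper functions and data format are placeholders, not the paper's code.

```python
# Schematic of a STaR-style bootstrapping loop; the helpers are placeholders
# for a language model's sampling and fine-tuning routines.

def generate_rationale_and_answer(model, question):
    """Placeholder: few-shot prompt the model for a rationale and an answer."""
    raise NotImplementedError

def finetune(base_model, examples):
    """Placeholder: fine-tune on (question, rationale, answer) triples."""
    raise NotImplementedError

def star_bootstrap(base_model, dataset, num_iterations=3):
    model = base_model
    for _ in range(num_iterations):
        kept = []
        for question, gold_answer in dataset:
            rationale, answer = generate_rationale_and_answer(model, question)
            # Keep a self-generated rationale only if it led to the right answer.
            if answer == gold_answer:
                kept.append((question, rationale, gold_answer))
        # Restart fine-tuning from the base model on the accumulated examples.
        model = finetune(base_model, kept)
    return model
```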
Recent text-to-image generation methods provide a simple yet exciting conversion capability between text and image domains.
Generating temporally coherent, high-fidelity video is an important milestone in generative modeling research.
We present a framework that gives deep insight into the link between the physical and technical-tactical aspects of soccer, and that uses a top-down approach to associate physical performance with value generation.
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm.
The optimal amount of data augmentation (DA) or weight decay found by cross-validation leads to disastrous model performance on some classes: e.g., on ImageNet with a ResNet-50, the "barn spider" classification test accuracy falls from
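The class-dependent effect described above only shows up when test accuracy is broken down per class rather than reported in aggregate. Below is a minimal sketch of that breakdown; the labels and predictions are toy placeholders standing in for a trained model's outputs on a real test set.

```python
# Minimal sketch of a per-class accuracy breakdown, the kind of measurement
# that exposes class-dependent effects of data augmentation or weight decay.
# `y_true` and `y_pred` are toy placeholders for real test labels/predictions.
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Return an array where entry c is the test accuracy on class c."""
    accs = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = (y_true == c)
        if mask.any():
            accs[c] = np.mean(y_pred[mask] == c)
    return accs

# Toy example with 3 classes; in practice these come from a trained model.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])
print(per_class_accuracy(y_true, y_pred, num_classes=3))  # [0.5, 1.0, 0.667]
```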
This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical backbone for pre-training.
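A key ingredient of that design is a simple feature pyramid built from the plain ViT's single-scale (stride-16) output, using a few upsampling and downsampling heads instead of a hierarchical backbone. Below is a hedged PyTorch-style sketch of such a pyramid; the channel widths and exact head composition are assumptions for illustration, not the paper's reference implementation.

```python
# Hedged sketch of a "simple feature pyramid" built from a plain ViT's
# single-scale feature map (stride 16): upsample/downsample heads produce
# strides {4, 8, 16, 32} without a hierarchical backbone.
# Channel widths and head details are assumptions.
import torch
import torch.nn as nn

class SimpleFeaturePyramid(nn.Module):
    def __init__(self, in_channels=768, out_channels=256):
        super().__init__()
        self.to_stride4 = nn.Sequential(              # 16 -> 4: upsample x4
            nn.ConvTranspose2d(in_channels, in_channels // 2, 2, stride=2),
            nn.GELU(),
            nn.ConvTranspose2d(in_channels // 2, out_channels, 2, stride=2),
        )
        self.to_stride8 = nn.ConvTranspose2d(         # 16 -> 8: upsample x2
            in_channels, out_channels, 2, stride=2)
        self.to_stride16 = nn.Conv2d(in_channels, out_channels, 1)  # same scale
        self.to_stride32 = nn.Sequential(             # 16 -> 32: downsample x2
            nn.MaxPool2d(2), nn.Conv2d(in_channels, out_channels, 1))

    def forward(self, x):
        # x: last ViT feature map at stride 16, shape (B, C, H/16, W/16)
        return [self.to_stride4(x), self.to_stride8(x),
                self.to_stride16(x), self.to_stride32(x)]

# Example: a 768-dim ViT-B feature map for a 224x224 image (14x14 tokens).
feats = SimpleFeaturePyramid()(torch.randn(1, 768, 14, 14))
print([f.shape for f in feats])
```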