Sea AI Lab

Pinned

understand-r1-zero Public

Understanding R1-Zero-Like Training: A Critical Perspective

Python 1.2k 57
zero-bubble-pipeline-parallelism Public

Forked from NVIDIA/Megatron-LM

Zero Bubble Pipeline Parallelism

Python 452 34
lorahub Public

[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

Python 668 42
oat Public

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python 641 62
stde Public

Official implementation of Stochastic Taylor Derivative Estimator (STDE), NeurIPS 2024

Python 128 10
feedback-conditional-policy Public

Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"

Python 61 2

envpool Public
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.

C++ 1,304 Apache-2.0 128 41 4 Updated Mar 29, 2026
jrystal Public
A JAX-based Differentiable Density Functional Theory Framework for Materials

Python 45 Apache-2.0 1 5 3 Updated Mar 26, 2026
TeamHOI Public
[CVPR 2026] TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size

Python 28 MIT 0 0 0 Updated Mar 12, 2026
odc Public
On-demand communication

Python 33 2 1 3 Updated Mar 3, 2026
Stable-RL Public
Rethinking the Trust Region in LLM Reinforcement Learning

Python 52 Apache-2.0 5 0 5 Updated Mar 2, 2026
oat Public
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python 641 Apache-2.0 62 6 1 Updated Jan 29, 2026
LifelongSafetyAlignment Public

Python 11 0 0 0 Updated Jan 13, 2026
feedback-conditional-policy Public
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"

Python 61 2 0 0 Updated Jan 5, 2026
InfNeRF Public
InfNeRF: Towards Infinite Scale NeRF Rendering with O(log n) Space Complexity

Python 12 Apache-2.0 1 1 0 Updated Jan 3, 2026
SkyLadder Public Forked from jzhang38/TinyLlama
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling

Python 42 Apache-2.0 610 1 0 Updated Dec 29, 2025