Pinned

  1. understand-r1-zero Public

    Understanding R1-Zero-Like Training: A Critical Perspective

    Python · 1.2k stars · 57 forks

  2. zero-bubble-pipeline-parallelism Public

    Forked from NVIDIA/Megatron-LM

    Zero Bubble Pipeline Parallelism

    Python · 452 stars · 34 forks

  3. lorahub Public

    [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

    Python · 668 stars · 42 forks

  4. oat Public

    🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

    Python · 641 stars · 62 forks

  5. stde Public

    Official implementation of the Stochastic Taylor Derivative Estimator (STDE), NeurIPS 2024

    Python · 128 stars · 10 forks

  6. feedback-conditional-policy Public

    Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"

    Python · 61 stars · 2 forks

Repositories

Showing 10 of 101 repositories
  • envpool Public

    C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.

    C++ · 1,304 stars · Apache-2.0 · 128 forks · 41 issues · 4 pull requests · Updated Mar 29, 2026
  • jrystal Public

    A JAX-based Differentiable Density Functional Theory Framework for Materials

    Python · 45 stars · Apache-2.0 · 1 fork · 5 issues · 3 pull requests · Updated Mar 26, 2026
  • TeamHOI Public

    [CVPR 2026] TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size

    Python · 28 stars · MIT · 0 forks · 0 issues · 0 pull requests · Updated Mar 12, 2026
  • odc Public

    On-demand communication

    Python · 33 stars · 2 forks · 1 issue · 3 pull requests · Updated Mar 3, 2026
  • Stable-RL Public

    Rethinking the Trust Region in LLM Reinforcement Learning

    Python · 52 stars · Apache-2.0 · 5 forks · 0 issues · 5 pull requests · Updated Mar 2, 2026
  • oat Public

    🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

    Python · 641 stars · Apache-2.0 · 62 forks · 6 issues · 1 pull request · Updated Jan 29, 2026
  • Python 11 0 0 0 Updated Jan 13, 2026
  • feedback-conditional-policy Public

    Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"

    Python · 61 stars · 2 forks · 0 issues · 0 pull requests · Updated Jan 5, 2026
  • InfNeRF Public

    InfNeRF: Towards Infinite Scale NeRF Rendering with O(log n) Space Complexity

    Python · 12 stars · Apache-2.0 · 1 fork · 1 issue · 0 pull requests · Updated Jan 3, 2026
  • SkyLadder Public

    Forked from jzhang38/TinyLlama

    The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling

    Python · 42 stars · Apache-2.0 · 610 forks · 1 issue · 0 pull requests · Updated Dec 29, 2025