IOPO Experiments

classroom
Activity Feed

AI & ML interests

None defined yet.

alvarobartt 
posted an update 2 days ago
view post
Post
123
Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker
alvarobartt 
posted an update 6 days ago
view post
Post
3203
Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL; DR MoEs can be misleading to reason about from active parameters alone, since each token only activates a subset of experts, while the serving setup still needs to account for the full resident memory footprint.

🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ Active params isn't the same as memory footprint, especially for sparse architectures
📦 Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving

Check the repository at https://github.com/alvarobartt/hf-mem
qgallouedec 
posted an update 14 days ago
view post
Post
10025
Shipped hf-sandbox! 🥡

🧪 Running an eval that executes model-generated C on a few thousand prompts? You probably don't want any of that on your laptop.
Just shipped hf-sandbox, a Modal-style sandbox API on top of Hugging Face Jobs. Spin up an isolated, ephemeral container, run untrusted code, get the result back. No Docker on your laptop, no infra to manage.

Just pip install hf-sandbox.

Early days (v0.1); feedback and issues very welcome:
👉 https://github.com/huggingface/hf-sandbox
  • 1 reply
·
qgallouedec 
posted an update 15 days ago
view post
Post
314
**TRL v1.4 is out 🚀** Chunked NLL loss for SFT and a first-class **OpenReward** integration.

**Chunked NLL loss for SFT — drops peak VRAM by up to 14×**

Standard SFT materializes a full [batch × seq × vocab] logits tensor before computing cross-entropy, which dominates peak memory at long context lengths. The new loss_type="chunked_nll" path drops ignored-label tokens before the lm_head matmul and computes cross-entropy in checkpointed chunks of 256.

Peak GPU memory, AdamW fp32:
- Qwen3-14B, 8×H100 FSDP2, 16k seq: 58.9 GB → 38.9 GB
- Qwen3-4B, 1×H100 80GB, 16k seq: OOM → 63.8 GB
- Qwen3-32B, 8×H100 FSDP2, 8k seq: OOM → 71.2 GB

End-to-end it's consistently as fast or faster than nll, and unlocks sequence lengths that don't fit at all under the standard path.

SFTConfig(loss_type="chunked_nll")


Works with PEFT and VLMs out of the box.

**Open Reward Standard environment adapter**

The new trl.experimental.openreward adapter plugs any environment speaking the [Open Reward Standard](https://openrewardstandard.io) protocol into any TRL trainer that takes an environment_factory. One string — a catalog name or a URL — wires the dataset, factory, and reward_func slots; tools are bound dynamically from JSON Schema, no per-env wrapper code:

from trl import GRPOTrainer
from trl.experimental.openreward import OpenRewardSpec

spec = OpenRewardSpec("Eigent/SETA", num_tasks=64)

trainer = GRPOTrainer(
    ...,
    train_dataset=spec.train_dataset,
    environment_factory=spec.environment_factory,
    reward_funcs=spec.reward_funcs,
)


v1.4 also brings MFU helpers for dense + MoE models, GRPO support for Liger 0.8.0 (delta clipping + VESPO + KL bias correction), Tülu 3's length-normalized DPO loss, four more training chat templates (Cohere, Cohere2, Gemma 3, Qwen3-2507), and a 5+ GB CUDA memory leak fix in activation offloading.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.4.0
qgallouedec 
posted an update 28 days ago
view post
Post
8029

TRL v1.3 ships day-one training support for Qwen 3.6 🚀

The new Qwen 3.6 family (Qwen/Qwen3.6-27B, Qwen/Qwen3.6-35B-A3B) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: new training template with {% generation %} markers, tool-call response schema routing, tiny test models for the VLM matrix.

SFT with assistant-only loss works out of the box:

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    args=SFTConfig(assistant_only_loss=True),
    train_dataset=dataset,
)
trainer.train()


So does GRPO tool-calling — just hand tools=[...] to GRPOTrainer.

v1.3 also brings a new experimental TPO trainer (Triple Preference Optimization), speculative decoding in trl vllm-serve (Qwen3 MTP / Eagle3 drafts), 12 more KTO ↔ DPO alignment PRs (KTO promotion to stable is now in reach), three more {% generation %} chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.3.0
qgallouedec 
posted an update about 1 month ago
view post
Post
2003
TRL v1.2 introduces the SSDTrainer 🚀

Simple Self-Distillation (SSD) from Apple's paper "Embarrassingly Simple Self-Distillation Improves Code Generation" is now available as an experimental trainer in TRL.

The recipe is as minimal as the name suggests: sample completions from the model itself at a training-time temperature, then fine-tune on those raw, unverified samples with plain cross-entropy. No reward model. No verifier. No teacher model. No reinforcement learning. Just prompts and the model.

from trl.experimental.ssd import SSDConfig, SSDTrainer

trainer = SSDTrainer(
    model="Qwen/Qwen3-4B-Instruct",
    args=SSDConfig(temperature=0.6, top_k=20, top_p=0.95),
    train_dataset=dataset,
)
trainer.train()


v1.2 also ships expanded tool-calling support (LLaMA 3.1 / 3.2, DeepSeek-V3), another round of KTO ↔ DPO alignment getting us closer to promoting KTO to stable, a big GRPO simplification for overlong tool results, deprecation of use_transformers_paged, and key fixes for VLM response parsing.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.2.0
qgallouedec 
posted an update about 2 months ago
view post
Post
2440
TRL v1.0 is out!

Hugging Face's TRL library is downloaded 3 million times a month. Over 130k models trained with it are public on the Hub, and major projects like @unsloth and @axolotl-ai-co build directly on top of it. v1.0 is the moment we acknowledged that responsibility explicitly, with a real stability contract.

The field hasn't settled. Building stable software in a domain that keeps invalidating its own assumptions is the actual problem we're solving. The answer is a design that can absorb the next shift without breaking what people rely on.

What's in v1.0:
Deep Hugging Face integration, low infrastructure burden
What's next: asynchronous GRPO, better scaling support, and making training legible enough that agents can inspect and steer it.

pip install --upgrade trl


Read more: hf.co/blog/trl-v1
alvarobartt 
posted an update 3 months ago
view post
Post
3736
Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr
qgallouedec 
posted an update 3 months ago
view post
Post
3069
@CohereLabs just released 🌿 Tiny Aya: a fully open-source 3B parameter model that speaks 70+ languages 🌍! But there’s a catch:

Tiny Aya is just a language model. It doesn’t support tool calling, the key capability that turns frontier models into powerful *agents*.
So the real question is:

How hard is it to turn Tiny Aya into an agent?

Turns out… it’s simple, thanks to Hugging Face TRL.
We’re sharing a hands-on example showing how to train Tiny Aya to turn it into a tool-calling agent using TRL, unlocking what could become the first *massively multilingual open agent*.

Small model. Global reach. Agent capabilities.

👉 https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb
  • 1 reply
·
alvarobartt 
posted an update 4 months ago
view post
Post
3257
💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
  • 1 reply
·
lewtun 
posted an update about 1 year ago
view post
Post
4916
Introducing OlympicCoder: a series of open reasoning models that can solve olympiad-level programming problems 🧑‍💻

- 7B open-r1/OlympicCoder-7B
- 32B open-r1/OlympicCoder-32B

We find that OlympicCoder models outperform Claude 3.7 Sonnet, as well as others over 100x larger 💪

Together with the models, we are releasing:

📊CodeForces-CoTs: new dataset of code problems from the most popular competitive coding platform, with R1 traces in C++ and Python open-r1/codeforces-cots

🏆 IOI'2024: a new benchmark of VERY hard programming problems where even frontier models struggle to match human performance open-r1/ioi

For links to the models and datasets, check out our latest progress report from Open R1: https://huggingface.co/blog/open-r1/update-3
  • 1 reply
·