Recent Activity

sergiopaniego updated a dataset 39 minutes ago
agents-course/final-certificates
burtenshaw updated a dataset 40 minutes ago
agents-course/certificates

sergiopaniego
posted an update 6 days ago
FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly from the browser, with no coding knowledge required, using TRL behind the scenes.

blog: https://developers.googleblog.com/a-guide-to-fine-tuning-functiongemma/

try it out: google/functiongemma-tuning-lab

This example builds on a more advanced one for learning fine-tuning with SFT using TRL: https://ai.google.dev/gemma/docs/functiongemma/finetuning-with-functiongemma
sergiopaniego
posted an update 9 days ago
TRL v0.27.0 is out!! 🥳

It includes GDPO, the latest variant of GRPO for multi-reward RL ✨
GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence. Developed by @sliuau @SimonX et al.
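
To make "decoupled reward normalization" concrete, here is a minimal sketch of the idea as described above. It is an illustration only, not the TRL implementation and not the paper's exact formulation: the function names, the equal default weights, and the toy rewards are all assumptions.

```python
from statistics import mean, pstdev


def group_normalize(values, eps=1e-6):
    """Standardize a list of scores within one group of completions."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def summed_advantages(rewards):
    """Baseline for comparison: sum all rewards per completion, then normalize once."""
    totals = [sum(row) for row in rewards]
    return group_normalize(totals)


def decoupled_advantages(rewards, weights=None):
    """Normalize each reward separately within the group, then combine.

    `rewards` holds one row per completion and one column per reward function.
    `weights` is a hypothetical per-reward weighting (defaults to 1.0 each).
    """
    num_rewards = len(rewards[0])
    weights = weights or [1.0] * num_rewards
    # Transpose to get one column per reward function, then normalize each column.
    columns = [group_normalize(list(col)) for col in zip(*rewards)]
    return [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(rewards))
    ]


if __name__ == "__main__":
    # Toy group of 4 completions: reward 0 is on a 0..1 scale, reward 1 on a 0..100 scale.
    rewards = [[1, 0], [0, 100], [1, 100], [0, 0]]
    print("summed:   ", summed_advantages(rewards))
    print("decoupled:", decoupled_advantages(rewards))
```

In the toy group, the summed baseline is dominated by the reward with the larger scale, while the decoupled version lets each reward contribute on a comparable scale.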

Explore the paper: GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization (2601.05242)

Explore the full set of changes here:
https://github.com/huggingface/trl/releases/tag/v0.27.0
sergiopaniego
posted an update 12 days ago
New REPL environment in OpenEnv available! ✨
Used in the Recursive Language Models (RLM) paper by Alex Zhang.

Ready for inference & post-training using trajectories. Handles long contexts:

> Run Python code in a sandbox
> Make recursive calls to LMs
> Explore data programmatically
> Return final result

Docs: https://meta-pytorch.org/OpenEnv/environments/repl/
Inference script: https://github.com/meta-pytorch/OpenEnv/blob/main/examples/repl_oolong_simple.py
sergiopaniego
posted an update 13 days ago
Recursive Language Models (RLM) is a new interface for LLMs with cool ideas by Alex Zhang!

⚠️ LLMs struggle with long prompts → attention overload & lost info
🔄 RLMs inspect, split & call themselves on chunks, then aggregate results
✅ Handles millions of tokens, reduces noise, improves reasoning
💡 System prompt guides recursion
🎯 RLM trajectories can be used for RL training or distillation (OpenEnv+TRL!!)
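
The split, recurse, and aggregate pattern from the list above can be sketched roughly as follows. This is a simplified illustration, not the RLM or OpenEnv code: in the actual approach the model decides how to inspect and split its input, whereas here the splitting policy is hard-coded, and `lm`, `max_chars`, and `fan_out` are hypothetical names.

```python
from typing import Callable

# Hypothetical LM interface: takes a prompt string, returns a completion string.
LM = Callable[[str], str]


def recursive_answer(query: str, context: str, lm: LM,
                     max_chars: int = 2000, fan_out: int = 4) -> str:
    """Answer `query` over `context`, recursing on chunks when the context is too long."""
    if len(context) <= max_chars:
        # Base case: the context fits in a single call.
        return lm(f"Context:\n{context}\n\nQuestion: {query}")

    # Split into roughly `fan_out` equal chunks (ceiling division for the chunk size).
    chunk_size = -(-len(context) // fan_out)
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]

    # Recursive calls: each chunk is handled by the same procedure.
    partials = [recursive_answer(query, chunk, lm, max_chars, fan_out) for chunk in chunks]

    # Aggregation: one more call that only sees the partial answers.
    joined = "\n".join(partials)
    return lm(f"Partial answers:\n{joined}\n\nQuestion: {query}")


if __name__ == "__main__":
    # Stub LM for demonstration only; swap in a real model client.
    def stub_lm(prompt: str) -> str:
        return f"[answer based on {len(prompt)} prompt characters]"

    print(recursive_answer("What is discussed?", "lorem ipsum " * 1000, stub_lm))
```

Each call only ever sees a bounded amount of text, which is the property that lets this style of interface scale to very long inputs.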

We're adding it to OpenEnv (with Kashif Rasul): https://github.com/meta-pytorch/OpenEnv/pull/282

More resources:

> Paper: Recursive Language Models (2512.24601)
> Paper blog: https://alexzhang13.github.io/blog/2025/rlm/
> RLM repo: https://github.com/alexzhang13/rlm
pcuenq
posted an update 21 days ago
👉 What happened in AI in 2025? 👈

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1: Learning to Reason
DeepSeek not only releases a top-notch reasoning model, but also shows how to train one and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2: Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3: "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models get gold in Math olympiads and hard benchmarks. OpenAI releases strong open-source models and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4: Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯

Credits
🙏 NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫡 @reach-vb for the original idea, design and recipe

🙌 @ariG23498 and yours truly for compiling and verifying the 2025 edition

🥳 Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! 🥂
sergiopaniego
posted an update 23 days ago
The list of hands-on notebooks (some beginner-friendly!) to get started with fine-tuning using TRL keeps growing!!

• SFT
• GRPO
• Tool calling & agents
• RL environments with OpenEnv
• LLMs and VLMs
✨ Many run on FREE Colab, making it super easy to get started fast!

https://github.com/huggingface/trl/tree/main/examples/notebooks
sergiopaniego
posted an update about 1 month ago
The Christmas holidays are here! 🎄
Thinking about learning something new in AI?

@huggingface offers 12 FREE courses covering all the relevant topics, for every level of experience. A great challenge for the holidays (and worth saving for later 🙄)

Let's explore them!

🧠 LLM Course: large language models with HF tools
https://huggingface.co/learn/llm-course

🤖 Agents Course: build and deploy AI agents
https://huggingface.co/learn/agents-course

🎨 Diffusion Course: diffusion models with 🤗 Diffusers
https://huggingface.co/learn/diffusion-course

🔊 Audio Course: transformers for audio tasks
https://huggingface.co/learn/audio-course

🎮 Deep RL Course: deep reinforcement learning
https://huggingface.co/learn/deep-rl-course

👁️ Community Computer Vision Course: modern computer vision with HF
https://huggingface.co/learn/computer-vision-course

🦾 Robotics Course (LeRobot): learning-based robotics
https://huggingface.co/learn/robotics-course

🧩 MCP Course: Model Context Protocol explained
https://huggingface.co/learn/mcp-course

🧪 A Smol Course: post-training AI models
https://huggingface.co/learn/a-smol-course

🕹️ ML for Games: AI in game development
https://huggingface.co/learn/ml-for-games-course

🧊 ML for 3D: machine learning for 3D data
https://huggingface.co/learn/ml-for-3d-course

📘 Open-Source AI Cookbook: practical AI notebooks
https://huggingface.co/learn/cookbook

All of them can be found here: https://huggingface.co/learn
sergiopaniego
posted an update about 1 month ago
Google DeepMind releases FunctionGemma, a 240M model specialized in 🔧 tool calling, built for fine-tuning

TRL has day-0 support. To celebrate, we're sharing 2 new resources:

> Colab guide to fine-tune it for browser control with BrowserGym OpenEnv
> Standalone training script

> Colab notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_functiongemma_browsergym_openenv.ipynb
> Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/browsergym_llm.py (command to run it inside the script)
> More notebooks in TRL: https://huggingface.co/docs/trl/example_overview#notebooks
sergiopaniego
posted an update about 1 month ago
🎄 last talk of the year about open AI and HF today at Universidad Rey Juan Carlos for undergrad students

always a pleasure to be back at my alma mater

🎅 slides: https://github.com/sergiopaniego/talks
sergiopaniego
posted an update about 2 months ago
TRL now includes agent training support for GRPO‼️

Train 🕵️ agents with 🔧 tools, enabling interaction with external functions and APIs.

And of course, a new notebook and scripts to get you up to speed

📘 notebook tutorial: https://github.com/huggingface/trl/blob/main/examples/notebooks/grpo_agent.ipynb

📂 script examples: https://github.com/huggingface/trl/blob/main/examples/scripts/grpo_agent.py

📦 TRL v0.26.0 release: https://github.com/huggingface/trl/releases/tag/v0.26.0