AI & ML interests

None defined yet.

Recent Activity

perfecXion 
posted an update 8 days ago
# IntentGuard: Open-Source Vertical Intent Classifiers for LLM Guardrails

Three models published to the Hub:

- [perfecXion/intentguard-finance](https://huggingface.co/perfecXion/intentguard-finance)
- [perfecXion/intentguard-healthcare](https://huggingface.co/perfecXion/intentguard-healthcare)
- [perfecXion/intentguard-legal](https://huggingface.co/perfecXion/intentguard-legal)

DeBERTa-v3-xsmall fine-tuned for three-way classification: **allow**, **deny**, or **abstain**. ONNX + INT8 quantized, under 80MB, p99 <30ms on CPU. Margin-based thresholds (not argmax) — uncertain queries route to clarification instead of forcing a guess.
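The margin-based routing can be sketched in a few lines of Python; the label order and the `MARGIN` value here are illustrative assumptions, not the shipped models' configuration:

```python
LABELS = ["allow", "deny", "abstain"]  # three-way head; label order assumed
MARGIN = 0.2                           # illustrative threshold, not the shipped value

def route(probs):
    """Route on the margin between the top-2 class probabilities.

    If the winner doesn't beat the runner-up by at least MARGIN, the
    query goes to clarification instead of a forced allow/deny guess.
    """
    ranked = sorted(zip(probs, LABELS), reverse=True)
    (p1, top), (p2, _) = ranked[0], ranked[1]
    return top if p1 - p2 >= MARGIN else "clarify"

print(route([0.90, 0.06, 0.04]))  # confident -> allow
print(route([0.45, 0.40, 0.15]))  # ambiguous -> clarify
```

With plain argmax the ambiguous second query would be forced into allow; the margin check is what routes it to clarification instead.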

**Eval results (adversarial test sets, ~470-480 examples per vertical):**

| Vertical | Accuracy | Legit-Block Rate | Off-Topic-Pass Rate |
|----------|----------|------------------|---------------------|
| Finance | 99.6% | 0.00% | 0.00% |
| Healthcare | 98.9% | 0.00% | 0.98% |
| Legal | 97.9% | 0.00% | 0.50% |

```shell
docker run -p 8080:8080 ghcr.io/perfecxion/intentguard:finance-latest

curl -X POST http://localhost:8080/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What are current mortgage rates?"}]}'
```
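For programmatic use, the same endpoint can be called from Python's standard library. The payload mirrors the curl example; the response schema is not shown in this post, so treat this as a sketch and check the model cards for the actual fields:

```python
import json
from urllib import request

ENDPOINT = "http://localhost:8080/v1/classify"  # from the docker run above

def build_payload(user_text):
    # Chat-style body matching the curl example
    return {"messages": [{"role": "user", "content": user_text}]}

def classify(user_text):
    req = request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Requires the container from the docker command to be running locally
    with request.urlopen(req) as resp:
        return json.load(resp)

print(json.dumps(build_payload("What are current mortgage rates?")))
```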


Apache 2.0. Full pipeline + Docker configs on [GitHub](https://github.com/perfecxion-ai/intentguard).

Feedback welcome on domain coverage, adversarial robustness, and multilingual demand.

kostakoff 
posted an update 11 days ago
Mining GPU Nvidia CMP 170HX - let's run some models!

To satisfy my curiosity, I investigated different GPUs and found this: a mining version of the A100 — the CMP 170HX.

It is a very interesting GPU. Based on public documentation, it has hardware similar to the datacenter A100. If you open it up and look at the board, you will see that it's very similar to an A100 board; it even has NVLink connectors.

Online, I found almost no information about how to run it, whether it works with LLMs, or if it's supported by default Nvidia drivers and CUDA. So, I decided to test it myself.
I installed it in my lab (see previous post https://huggingface.co/posts/kostakoff/584269728210158) and found that the default nvidia-driver-570 works with it out of the box. After that, I checked if CUDA was available, and it worked too.

The next step was to try running some models:
- Stable Diffusion XL with BNB4 quantization: It took around two minutes to generate an image, but it works!
- Compiled llama.cpp for CUDA (https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#compilation): I ran Mistral 7B Q4_K_M, and this actually worked even better, generating 33 tokens per second and reading 400 tokens per second.

There are some limitations related to power utilization:
- When running PyTorch, it doesn't utilize more than 80 watts.
- When running llama.cpp, utilization is a bit better but still limited to 113 watts.

I found this GitHub thread about the Nvidia CMP https://github.com/dartraiden/NVIDIA-patcher/issues/73, and it looks like this mining GPU has an internal rate limiter based on FMA compute calls. I haven't found a solution to bypass it yet.

frumu 
posted an update 17 days ago
Prototype/proposal repo: https://github.com/frumu-ai/trace-share

Goal: an opt-in Rust CLI that ingests local coding-agent logs (Codex/Claude/VS Code agents), scrubs secrets/PII locally (gitleaks + deterministic redaction), and exports structured “episodes” intended for OSS model training (SFT + tool-use traces).

Status: local-only right now (uploads nothing). Missing pieces are:

- a home to publish versioned dataset snapshots (JSONL/Parquet + manifests/checksums), and
- an optional vector index for search/dedupe/curation.

Hugging Face is the most natural distribution channel because the end product is a Hub Dataset (versioned downloadable snapshots). I’m looking for direct Hugging Face support (recommended dataset layout + publishing workflow, and ideally storage/bandwidth support as releases scale).

Scale ref: ~700MB raw Codex logs → ~36MB sanitized export.
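A minimal sketch of the deterministic-redaction idea (the regex and placeholder format here are illustrative; trace-share's actual detection is gitleaks-based): hashing each detected secret means the same secret always maps to the same placeholder, so tool-use traces stay internally consistent after scrubbing.

```python
import hashlib
import re

# Illustrative pattern only; real detection would come from gitleaks rules.
SECRET_RE = re.compile(r"(sk-[A-Za-z0-9]{8,}|ghp_[A-Za-z0-9]{8,})")

def redact(text):
    """Replace each secret with a stable placeholder derived from its hash.

    Deterministic: the same secret yields the same placeholder everywhere,
    preserving cross-references inside a log without leaking the value.
    """
    def repl(match):
        digest = hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()[:8]
        return f"<SECRET:{digest}>"
    return SECRET_RE.sub(repl, text)
```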
frumu 
posted an update 19 days ago
kostakoff 
posted an update 22 days ago
I found it very funny that the Hugging Face profile has a specific section where we can share our hardware.

It really brings back memories of the good old days when we used to flex our custom PC specs on enthusiast forums 20 years ago! That inspired me to fill out my own profile and share it here.

And this is my first set of GPUs that I am using to learn MLOps:
- RTX 3090 – the best one; unfortunately it doesn't support the latest FP8 and FP4, but it’s still very powerful.
- Tesla V100 – performance close to the RTX 3090's, just much older.
- Tesla P100 – old, and doesn't have tensor cores, but it can still handle small models.
- Radeon MI50 – old, similar to the P100, but uses ROCm instead of CUDA, which is actually a pretty good experience to set up.
- GTX 1080 Ti – mostly useless, with no usable FP16 performance.
- GTX 1660 – first generation of the Turing architecture, but mostly useless.

kostakoff 
posted an update 29 days ago
My home lab for AI models - llmlaba v1

After I began learning MLOps, I realized that I needed some kind of home lab: there are a lot of GPUs that I need to learn how to set up and test.
So I spent some time researching which platform I could buy or build.
My requirements were:
- Limited budget
- Power supply of 1 kW or higher
- A few PCIe slots, to be able to install more than one GPU
- Zero maintenance cost: I don't want to spend a lot of time or money maintaining the lab hardware, except for the GPUs

I chose the Intel Mac Pro 7.1:
- Acceptable prices on eBay
- Excellent cooling
- 1.4 kW power supply
- 7 PCIe slots
- Zero maintenance: I don't need to do anything with the Mac Pro hardware; it just works
- Classic UEFI boot loader

It requires a bit of OS preparation:
1. Install Ubuntu 24.04 (it works with the generic PC ISO image)
2. Set up the T2 drivers:
```shell
sudo apt install -y dkms linux-headers-$(uname -r) applesmc-t2 apple-bce lm-sensors
```
3. Install t2fanrd to manage the fans manually (/etc/t2fand.conf): https://wiki.t2linux.org/guides/fan/
4. Fix the PCIe BAR: add pci=realloc to GRUB_CMDLINE_LINUX_DEFAULT so the Linux kernel properly initializes server GPUs without the Graphics Output Protocol
5. Install the NVIDIA GPU driver:
```shell
sudo apt install nvidia-driver-570
```


And it works!
I was able to run a server-grade Nvidia Tesla P100 (it required a DIY air duct) and consumer Nvidia Titan X, Titan V, and GTX 1080 cards on the old Mac Pro 7.1 - even three in parallel.

frumu 
posted an update about 1 month ago
I’m looking for Mac/Windows/Linux testers and contributors for Tandem, an open-source, local-first AI desktop workspace.

- Runs on your machine (works great with local LLMs like Ollama / LM Studio)
- Built with Tauri + a sidecar runtime, so it's a single install
- Focused on making agent workflows usable for non-developers (approvals + undo)

If you’re willing to test installs (especially macOS) or poke at bugs, I’d really appreciate it. Repo: https://github.com/frumu-ai/tandem
melikegks 
in blog-explorers/README about 1 month ago
AIPreplabs 
posted an update about 1 month ago
We’ve all had that moment where we watch a tutorial, nod along, but then realize we can’t actually do it ourselves, because watching is just passive. At AIPrep, we are fixing this "watch and forget" cycle by building a foundational Generative Explanatory Model (GEM). GEM doesn't just give you a video or a wall of text; it builds an interactive lesson that asks you questions, catches your mistakes in real time, and adapts to your pace.

We have just finished preparing our specialized datasets for this interactive logic, and you can already check them out on our profile to see how we are structuring this step-by-step reasoning. Training for the foundational model starts very soon, so stay in touch, because something revolutionary is coming to the world of AI education. You can see our progress at aiprep.in.
victor 
in blog-explorers/README about 1 month ago
Csplk 
posted an update about 1 month ago
Was tinkering with a Daggr node generator script earlier today (Csplk/DaggrGenerator) and started on a GUI for it, for folks who are not comfortable writing code and would rather have a GUI to motivate working on some Daggr stuff.
*Will have more time later to keep working on it, so don’t hesitate to comment with bugs or issues found if you try it out.*

Csplk/DaggrGenerator

Thanks @merve @ysharma @abidlabs and team daggr for making daggr :)

BenTouss 
in blog-explorers/README about 2 months ago
imnotkitty 
in blog-explorers/README about 2 months ago