Part 4/4 — Part 4 ← Part 3 ← Part 2 ← Part 1
May 2026 — co-authored with Gemma 4
We moved from Exam V2 to V3 after discovering that the V2 harness was producing invalid scores, which masked model instability and quantization failures.
Setup:
Hardware: Framework 13 (Ryzen AI 9 HX 370, 64GB DDR5).
Inference: llama-swap via Vulkan.
KV cache: q8_0 (fixed across all runs).
Environment: a Go-based scraper resilience task (buffering, eviction, background flush).
February 2026 – co-authored with Claude Opus 4.6.
Part 2/4
Yes, the title is clickbaity :>. Veritasium has a great video about why clickbait is unreasonably effective and I’ve been dying to try it on a technical post. The irony is that the actual content is the opposite of clickbait – every claim backed by a shell command, every number derived from first principles.
April 2026
Part 3/4
In episode 1, three models tied at 13/15. The test was too easy: it couldn’t separate a good model from a mediocre one having a lucky run. Since then, Qwen3.5, Gemma 4, and Qwen3-Coder have been released. They’d all tie too. We needed a harder exam and better methodology. We also suspected (correctly) that single-seed results were noise and that our grep-based scoring was garbage, so we planned for multi-seed runs and real test execution from the start.
Part 1/4 (Feb 2025)
I have a Framework 13 with a Ryzen AI 9 HX 370 and a bunch of GGUF models accumulating in ~/.cache/llama.cpp/. I wanted to know if any of them can actually write Go that compiles and runs. Not vibes, not leaderboard numbers – go build says yes or no. The goal was to get a sense of where local models stand in practical capability, given their limited size and the RAM and compute available.
At Dune, we value our customers’ feedback and are committed to continuously improving our services. This is the story of how a simple, prioritized feature request for the Dune API (pagination for larger query results) evolved into a comprehensive improvement involving the adoption of DuckDB at Dune.
We’ve learned a lot during this journey and are excited to share our experiences and the new functionalities we’ve been building.
Motivation & Context
The journey began with user feedback and a repeated feature request: “Dune API doesn’t support pagination, and the maximum size of query results is limited (~1GB).”
Miguel Filipe
Tech Lead @ Dune. Distributed systems, local LLMs, self-hosting. Lisbon.