Posts on Miguel Filipe

https://blog.mfilipe.eu/post/
Recent content in Posts on Miguel Filipe
© Miguel Filipe
Thu, 07 May 2026 22:59:29 +0100

Testing LLMs is hard, doubly hard when the testplan and code are vibecoded
https://blog.mfilipe.eu/post/benchmarking_llms-v3-rebuild/
Thu, 07 May 2026 22:59:29 +0100
Part 4/4. May 2026, co-authored with Gemma 4.
The transition from Exam V2 to V3 was necessitated by the discovery that the V2 harness was providing invalid scoring data, masking model instability and quantization failures.
Setup
Hardware: Framework 13 (Ryzen AI 370HX, 64GB DDR5).
Inference: llama-swap via Vulkan.
KV Cache: q8_0 (fixed across all runs).
Environment: Go-based scraper resilience task (buffering, eviction, background flush).

WHY Are Local LLMs So Slow On My Framework 13 AMD Strix Point
https://blog.mfilipe.eu/post/local-llm-performance-framework13/
Fri, 10 Apr 2026 21:01:28 +0000
Part 2/4. February 2026 – co-authored with Claude Opus 4.6.
Yes, the title is clickbaity :>. Veritasium has a great video about why clickbait is unreasonably effective and I’ve been dying to try it on a technical post. The irony is that the actual content is the opposite of clickbait – every claim backed by a shell command, every number derived from first principles.

Gemma 4 vs Qwen3.5: benchmarking quantized local LLMs on Go coding
https://blog.mfilipe.eu/post/local-llm-coding-harder-test/
Fri, 10 Apr 2026 17:04:51 +0100
Part 3/4. April 2026.
In episode 1 three models tied at 13/15. The test was too easy: it couldn’t separate a good model from a mediocre one having a lucky run. Since then Qwen3.5, Gemma 4, and Qwen3-Coder dropped. They’d all tie too.
We needed a harder exam and better methodology. We also suspected (correctly) that single-seed results were noise and that our grep-based scoring was garbage, so we planned for multi-seed and real test execution from the start.

I benchmarked 8 local LLMs writing Go on my Framework 13 AMD Strix Point
https://blog.mfilipe.eu/post/benchmarking-local-llms-go-coding/
Fri, 10 Apr 2026 17:04:51 +0100
Part 1/4 (Feb 2025).
I have a Framework 13 with a Ryzen AI 370HX and a bunch of GGUF models accumulating in ~/.cache/llama.cpp/. I wanted to know if any of them can actually write Go that compiles and runs. Not vibes, not leaderboard numbers – go build says yes or no. The goal was to get a sense of where local models stand in practical capability, given their limited size and the available RAM and compute.

How we've improved Dune API using DuckDB
https://blog.mfilipe.eu/post/blogpost-improving-dune-api/
Thu, 01 Aug 2024 12:22:38 -0700
At Dune, we value our customers’ feedback and are committed to continuously improving our services. This is the story of how a simple, prioritized feature request for DuneAPI (supporting query result pagination for larger results) evolved into a comprehensive improvement involving the adoption of DuckDB at Dune. We’ve learned a lot during this journey and are excited to share our experiences and the new functionalities we’ve been building.
Motivation & Context
The journey began with user feedback and a repeated feature request: “Dune API doesn’t support pagination, and the maximum size of query results is limited (~1GB).