<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Posts on Miguel Filipe</title>
    <link>https://blog.mfilipe.eu/post/</link>
    <description>Recent content in Posts on Miguel Filipe</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <copyright>© Miguel Filipe</copyright>
    <lastBuildDate>Thu, 07 May 2026 22:59:29 +0100</lastBuildDate>
    <atom:link href="https://blog.mfilipe.eu/post/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Testing LLMs is hard, doubly hard when the test plan and code are vibe-coded</title>
      <link>https://blog.mfilipe.eu/post/benchmarking_llms-v3-rebuild/</link>
      <pubDate>Thu, 07 May 2026 22:59:29 +0100</pubDate>
      <guid>https://blog.mfilipe.eu/post/benchmarking_llms-v3-rebuild/</guid>
      <description>Part 4/4&#xA;May 2026 &amp;ndash; co-authored with Gemma 4&#xA;We moved from Exam V2 to V3 after discovering that the V2 harness was producing invalid scoring data, which masked model instability and quantization failures.&#xA;Setup&#xA;Hardware: Framework 13 (Ryzen AI 370HX, 64GB DDR5). Inference: llama-swap via Vulkan. KV Cache: q8_0 (fixed across all runs). Environment: Go-based scraper resilience task (buffering, eviction, background flush).</description>
    </item>
    <item>
      <title>WHY Are Local LLMs So Slow On My Framework 13 AMD Strix Point</title>
      <link>https://blog.mfilipe.eu/post/local-llm-performance-framework13/</link>
      <pubDate>Fri, 10 Apr 2026 21:01:28 +0000</pubDate>
      <guid>https://blog.mfilipe.eu/post/local-llm-performance-framework13/</guid>
      <description>February 2026 &amp;ndash; co-authored with Claude Opus 4.6.&#xA;Part 2/4&#xA;Yes, the title is clickbaity :&amp;gt;. Veritasium has a great video about why clickbait is unreasonably effective, and I&amp;rsquo;ve been dying to try it on a technical post. The irony is that the actual content is the opposite of clickbait &amp;ndash; every claim is backed by a shell command, and every number is derived from first principles.</description>
    </item>
    <item>
      <title>Gemma 4 vs Qwen3.5: benchmarking quantized local LLMs on Go coding</title>
      <link>https://blog.mfilipe.eu/post/local-llm-coding-harder-test/</link>
      <pubDate>Fri, 10 Apr 2026 17:04:51 +0100</pubDate>
      <guid>https://blog.mfilipe.eu/post/local-llm-coding-harder-test/</guid>
      <description>April 2026&#xA;Part 3/4&#xA;In episode 1, three models tied at 13/15. The test was too easy: it couldn&amp;rsquo;t separate a good model from a mediocre one having a lucky run. Since then, Qwen3.5, Gemma 4, and Qwen3-Coder dropped, and they&amp;rsquo;d all tie too. We needed a harder exam and a better methodology. We also suspected (correctly) that single-seed results were noise and that our grep-based scoring was garbage, so we planned for multi-seed runs and real test execution from the start.</description>
    </item>
    <item>
      <title>I benchmarked 8 local LLMs writing Go on my Framework 13 AMD Strix Point</title>
      <link>https://blog.mfilipe.eu/post/benchmarking-local-llms-go-coding/</link>
      <pubDate>Fri, 10 Apr 2026 17:04:51 +0100</pubDate>
      <guid>https://blog.mfilipe.eu/post/benchmarking-local-llms-go-coding/</guid>
      <description>Part 1/4 (Feb 2025)&#xA;I have a Framework 13 with a Ryzen AI 370HX and a bunch of GGUF models accumulating in ~/.cache/llama.cpp/. I wanted to know if any of them could actually write Go that compiles and runs. Not vibes, not leaderboard numbers &amp;ndash; go build says yes or no. The goal was to get a sense of where local models stand in terms of practical capability, given their limits on model size and available RAM/compute.</description>
    </item>
    <item>
      <title>How we&#39;ve improved Dune API using DuckDB</title>
      <link>https://blog.mfilipe.eu/post/blogpost-improving-dune-api/</link>
      <pubDate>Thu, 01 Aug 2024 12:22:38 -0700</pubDate>
      <guid>https://blog.mfilipe.eu/post/blogpost-improving-dune-api/</guid>
      <description>At Dune, we value our customers’ feedback and are committed to continuously improving our services. This is the story of how a simple, prioritized feature request for Dune API (supporting query result pagination for larger results) evolved into a comprehensive improvement involving the adoption of DuckDB at Dune.&#xA;We’ve learned a lot during this journey and are excited to share our experiences and the new functionality we’ve been building.&#xA;Motivation &amp;amp; Context&#xA;The journey began with user feedback and a repeated feature request: “Dune API doesn’t support pagination, and the maximum size of query results is limited (~1GB).</description>
    </item>
  </channel>
</rss>
