From caa928655c1a1f4f02f17eb3c10fa4f511539a3b Mon Sep 17 00:00:00 2001
From: "Rodrigo Rodriguez (Pragmatismo)"
Date: Sun, 25 Jan 2026 15:57:30 -0300
Subject: [PATCH] docs: restructure hardware guide comparisons, prioritize MoE/DeepSeek/GLM, update cloud benchmarks

---
 src/13-hardware-devices/desktop-hardware.md | 66 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 49 insertions(+), 17 deletions(-)

diff --git a/src/13-hardware-devices/desktop-hardware.md b/src/13-hardware-devices/desktop-hardware.md
index 75101417..a07622b6 100644
--- a/src/13-hardware-devices/desktop-hardware.md
+++ b/src/13-hardware-devices/desktop-hardware.md
@@ -2,7 +2,7 @@

A detailed guide focusing on the Brazilian scenario, crossing high-performance AI models with hardware available in the local market (Mercado Livre, OLX, etc.).

-> **Important Note:** Proprietary models like **GPT-5**, **Claude 3.5 Sonnet**, and **Gemini 1.5 Pro** are Closed Source and **do not run locally**. They require a cloud API. This guide assumes open-source equivalents that attempt to reach this performance level, such as **Llama-3.1-405B**, **DeepSeek-V3**, or **Qwen-2.5**, running on local hardware.
+> **Important Note:** Proprietary models like **GPT-5.2**, **Claude 3.5 Opus**, and **Gemini 3 Pro** represent the cutting edge of Cloud AI. For **Local AI**, we focus on efficiently running models that approximate this power using **MoE (Mixture of Experts)** technology, specifically **DeepSeek**, **GLM 4**, and **OSS120B-GPT**. Dense models like Llama 3.1 405B/70B are kept for reference but are less efficient on consumer hardware.

## AI Model Scaling for Local Hardware

@@ -10,13 +10,29 @@ Mapping mentioned top-tier models to their local "runnable" equivalents.

| Citation Model | Real Status | Local Equivalent (GPU) | Size (Params) |
| :--- | :--- | :--- | :--- |
-| **Claude 3.5 Sonnet** | API Only | Llama-3.1-70B / Mistral-Large | ~70B |
-| **Claude Opus** | API Only | Llama-3.1-405B (Ref.) | ~405B (Hard for Consumers) |
-| **GPT-4o** | API Only | DeepSeek-V2-Lite / Qwen-2.5-72B | ~16B to 72B |
-| **OSS120B-GPT** | Offline (OSS) | Mistral-Large-124B / Yi-1.5-34B | ~120B (Single RTX via MoE) |
-| **GLM 4** | Offline (OSS) | GLM-4-9B-Chat | ~9B (Fast) |
-| **DeepSeek** | Offline (OSS) | DeepSeek-V2.5 / DeepSeek-Coder-V2 | ~16B to 236B (MoE) |
-| **Gemini Pro** | API Only | Gemma-2-27B | ~27B |
+| **Claude 3.5 Opus** | API Only | **OSS120B-GPT (MoE)** / Mistral-Large | ~120B (Single RTX via MoE) |
+| **GPT-5.2** | API Only | **DeepSeek-V3** (MoE) | ~671B total / ~37B active (Single RTX + high RAM) |
+| **Gemini 3 Pro** | API Only | **GLM 4** (9B) | ~9B (Blazing Fast) |
+| **Llama 3.1 405B** | Legacy Dense | Not recommended locally | ~405B (Too Heavy) |
+| **GPT-4o** | API Only | DeepSeek-V2-Lite | ~16B (Efficient) |
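+
+The practical upshot of MoE: memory footprint tracks *total* parameters, while per-token compute tracks *active* parameters. A minimal sketch of that gap (parameter counts are approximate public figures; the OSS120B-GPT active count is an assumption):
+
+```python
+# Approximate (total, active) parameter counts in billions.
+# MoE models route each token through only a few experts, so they
+# compute like small models while occupying the memory of large ones.
+MODELS = {
+    "GLM-4-9B (dense)": (9, 9),
+    "DeepSeek-V3 (MoE)": (671, 37),
+    "OSS120B-GPT (MoE)": (120, 5),  # active count is an assumption
+    "Llama-3.1-405B (dense)": (405, 405),
+}
+
+for name, (total, active) in MODELS.items():
+    print(f"{name:25s} total={total:>4}B  active={active:>4}B  "
+          f"({active / total:.0%} of weights touched per token)")
+```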

## Compatibility Matrix (GPU x Model x Quantization)

@@ -28,15 +44,31 @@ Defining how well each GPU runs the listed models, focusing on "Best Performance
* **Q8_0:** Near perfection (FP16 equivalent), but very heavy.
* **Offload CPU:** Model fits in system RAM, not VRAM (slow).

-| GPU | VRAM | **GLM 4 (9B)**<br>*(Speed Target)* | **DeepSeek-V2-Lite (16B)** | **Llama-3-70B / Qwen-72B** | **Llama 3.1 (405B)** | Performance Notes |
-| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
-| **RTX 3050** | 8 GB | **Q8_0** (Perfect) | **Q4_K_M** (Acceptable) | Impossible (Slow CPU Offload) | Impossible | Runs GLM 4 at max speed. DeepSeek starts to struggle. |
-| **RTX 3060** | 12 GB | **Q8_0** (Instant) | **Q6_K** (High Qual.) | Impossible (30-50% offload) | Impossible | **Best value GPU.** Runs 16B models (DeepSeek Lite) very well. |
-| **RTX 4060 Ti** | 16 GB | **Q8_0** (Overkill) | **Q8_0** (Max Qual.) | **Q2_K** (Tight fit) | Impossible | 16GB allows entering the 27B/34B class with good performance. |
-| **RTX 3070 / 3080** | 10 GB | **Q8_0** | **Q4_K_M** (Tight) | Impossible | Impossible | The Nvidia "trap". Low VRAM prevents large models despite high speed. |
-| **RTX 3080 Ti** | 12 GB | **Q8_0** | **Q6_K** | Impossible | Impossible | Fast, but limited to medium models (up to 20B). |
-| **RTX 3090 / 4090** | 24 GB | **Q8_0** (Dual) | **Q8_0** (Dual) | **Q4_K_M** (Excellent) | **Q2_K** (Experimental) | **Consumer Limit**. Runs Llama-70B class with great speed. |
-| **2x RTX 3090** | 48 GB | N/A | N/A | **Q6_K / Q8_0** (Perfect) | **Q4_K_M** (Runs!) | Required to emulate GPT-4 class (70B+) at high quality locally. |
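+A rough rule of thumb for reading the matrix below: quantized weights take roughly `params × bits-per-weight / 8` GB, plus headroom for the KV cache and driver buffers. A minimal sketch (the bits-per-weight values are approximate GGUF averages, an assumption, not exact file sizes):
+
+```python
+# Approximate effective bits per weight for common GGUF quant levels (assumption).
+BITS = {"Q2_K": 2.6, "Q3_K_M": 3.9, "Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}
+
+def fits(params_b: float, quant: str, vram_gb: float, headroom_gb: float = 1.5) -> bool:
+    """True if the quantized weights plus KV-cache/driver headroom fit in VRAM."""
+    weights_gb = params_b * BITS[quant] / 8
+    return weights_gb + headroom_gb <= vram_gb
+
+print(fits(9, "Q5_K_M", 8))    # True: GLM 4 (9B) fits an 8 GB RTX 3050
+print(fits(9, "Q8_0", 12))     # True: full Q8_0 quality wants a 12 GB RTX 3060
+print(fits(37, "Q4_K_M", 24))  # True: DeepSeek-V3's ~37B active params on a 3090
+```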
+
+*(For the MoE columns, the quant level assumes partial offload: the most-used layers stay in VRAM while the bulk of the expert weights sit in system RAM.)*
+
+| GPU | VRAM | **GLM 4 (9B)**<br>*(Daily Driver)* | **DeepSeek-V3 (MoE)**<br>*(Coding/Reasoning)* | **OSS120B-GPT (MoE)**<br>*(Heavy Duty)* | Performance Notes |
+| :--- | :--- | :--- | :--- | :--- | :--- |
+| **RTX 3050** | 8 GB | **Q5_K_M** (Excellent) | **Q2_K** (Slow/Tight) | Impossible | Great for GLM 4. Struggles with large MoEs. |
+| **RTX 3060** | 12 GB | **Q8_0** (Instant) | **Q4_K_M** (Good) | **Q2_K** (Slow w/ RAM) | **Best Value.** Runs DeepSeek nicely. |
+| **RTX 4060 Ti** | 16 GB | **Q8_0** (Overkill) | **Q6_K** (Great) | **Q3_K_M** (Doable) | Good middle ground for MoE exploration. |
+| **RTX 3090** | 24 GB | **Q8_0** (Dual) | **Q6_K** (Perfect) | **Q4_K_M** (Usable) | **King of Local AI.** Runs 120B MoE with offloading. |
+| **2x RTX 3090** | 48 GB | N/A | **Q8_0** (Native) | **Q6_K** (Fast) | The only way to run 120B+ comfortably fast. |

## Brazilian Market Pricing & Minimum Specs
*Approximate prices on Mercado Livre (ML) and OLX (Brazil) as of late 2024.*