Blog

Deep dives into AI engineering, LLM benchmarks, agent architectures, and the evolving landscape of AI-assisted software development.

Featured 🏷️ Local LLM 🏷️ Qwen3.6 🏷️ Unsloth

Run Qwen3.6 Locally with Unsloth GGUF

Qwen3.6 adds open-weight 35B-A3B and 27B models focused on coding and agent work. This guide shows how to run Unsloth GGUF builds with llama.cpp, choose a quant, try MTP GGUFs, and expose a local OpenAI-compatible endpoint.

May 13, 2026 · 22 min read · by LeetLLM Team

All Posts

🏷️ DeepSeek 🏷️ Open Models 🏷️ AI Infrastructure

DeepSeek V4 and the US AI Lab Squeeze

DeepSeek V4 pairs open weights, 1M context, and low hosted pricing with strong agentic coding results. The bigger story is what that does to closed API economics and US lab positioning.

April 27, 2026 · 20 min

🏷️ OpenClaw 🏷️ AI Coding Plans 🏷️ Cost Optimization

Best AI Plan for OpenClaw in 2026: 5 Providers Compared

OpenClaw plan selection is mostly a routing and quota problem. This guide compares current Fireworks Fire Pass, MiniMax Token Plan, Z.AI, Alibaba Cloud Coding Plan, and OpenAI routes using official docs and the actual OpenClaw provider paths.

April 4, 2026 · 22 min

🏷️ Local LLM 🏷️ Ollama 🏷️ Gemma 4

Run Gemma 4 Locally with Ollama

Gemma 4 documents Apache 2.0 open weights, laptop-scale E2B and E4B Ollama tags, and a 26B mixture-of-experts (MoE) path with 3.8B active parameters per token. This guide shows how to pick the right tag, enable thinking mode, and tune long-context sessions.

April 2, 2026 · 17 min

🏷️ Inference 🏷️ vLLM 🏷️ SGLang

vLLM vs SGLang vs TensorRT-LLM vs Ollama: Choosing an Inference Engine in 2026

Raw throughput is only half the inference-engine decision. This guide teaches PagedAttention with worked memory math, analyzes an H100 benchmark snapshot, then explains how workload shape, prefix reuse, and deployment friction matter as much as tok/s.

April 1, 2026 · 15 min

🏷️ AI Engineering 🏊 Deep Dive 🏷️ Architecture

50 Essential LLM Engineering Concepts for 2026

Fifty LLM engineering concepts, organized by topic and system layer. Each explanation goes beyond definitions to cover trade-offs, failure modes, and production intuition.

March 21, 2026 · 51 min

📏 Context Windows 📜 Long Context 📊 Benchmarks

The Million-Token Era: What 1M Context Windows Change

Frontier APIs now expose context windows in the 1M-token range. This guide explains what fits, what breaks, how to evaluate effective context length, and when the economics justify using it.

March 14, 2026 · 22 min

🏷️ Local LLM 🏷️ Ollama 🏷️ Qwen3.5

Run Qwen3.5 Locally with Ollama

Qwen3.5 in Ollama spans primary aliases from 0.8B to 122B plus explicit quantized variants. This guide shows how to choose the right local tag, keep context size realistic, and expose it through Ollama's OpenAI-compatible API.

March 2, 2026 · 17 min

🤖 Agents 🏊 Deep Dive 🏷️ Tutorial

How to Build an AI Agent from Scratch

Build a working AI agent from the raw loop: define tools, let the model choose one, execute it in Python, append the observation, and add the guardrails that keep agents reliable.

February 19, 2026 · 25 min

🔬 Research 🏊 Deep Dive 🏢 Industry

RAG vs Fine-Tuning vs Prompting

Every LLM project starts with the same architecture question: use RAG, fine-tune the model, or improve the prompts? This guide gives a practical decision framework, explains the trade-offs, and shows where each approach tends to win.

February 19, 2026 · 30 min

📊 Benchmarks 📐 Evaluation 🧪 SWE-bench

Understanding SWE-bench

SWE-bench is a widely used benchmark for measuring AI coding agents. This guide breaks down methodology, variants, scoring mechanics, and what leaderboard results mean for production engineering.

February 17, 2026 · 23 min

🏷️ Career 🏷️ Portfolio 🏷️ Projects

AI Engineer Portfolio Projects That Get Interviews

Five portfolio projects that prove real AI engineering skill: shipped demos, eval reports, traces, cost notes, tests, and design docs.

May 9, 2026 · 14 min

🏷️ Career 🏷️ AI Engineering 🏷️ Roadmap

How to Become an AI Engineer from Zero in 2026

A practical path from beginner to hire-ready AI engineer: programming basics, LLM APIs, RAG, evals, agents, deployment, and portfolio proof.

May 9, 2026 · 13 min

🏷️ Career 🏷️ Compensation

AI Engineer Salary Guide 2026

AI engineering pay is not one market. This guide uses public 2026 job postings, Levels.fyi's verified self-reported compensation data, and H-1B salary records to benchmark offers by level, company tier, location policy, and technical scope.

March 16, 2026 · 19 min

🏢 Industry 🏊 Deep Dive

What Does an AI Engineer Actually Do?

AI engineering sits between foundation models and product engineering. We break down the day-to-day work, core skills, and career paths behind shipping LLM systems in 2026, from hybrid RAG pipelines and evals to distributed serving internals and lightweight fine-tuning.

February 19, 2026 · 23 min

🏷️ Career 🏷️ Interview Prep 🏷️ 2026

How to Prepare for ML & LLM Engineering Interviews in 2026

A practical guide to ML and LLM engineering interview prep in 2026, covering classical ML filters, LLM systems design, evaluation, and a concrete study roadmap.

February 16, 2026 · 24 min