How LeetLLM turns research into curated lessons with research packets, article bundles, validation gates, generated diagrams, component-based illustrations, and a production web stack.
LeetLLM became useful when we treated curriculum like software: source material goes in, article bundles move through review, code and diagrams compile, and validators block sloppy pages before readers see them.
If you're building a learning site with LLMs, start there. Give the model strong context, ask for artifacts you can inspect, and make weak drafts cheap to catch.
Learning content has two common failure modes. One is broad writing that sounds confident but teaches little. The other is technically dense material that forgets learners need examples, diagrams, runnable code, and a path from basics to production.
That shaped the content system. LeetLLM has to support short lessons, long deep dives, blog posts, generated diagrams, citations, code examples, and visual explanations across AI engineering, LLM systems, RAG, agents, evaluation, inference, safety, and deployment.
| Need | Bad version | LeetLLM version |
|---|---|---|
| Research | Unverified summary from memory | Source queue with papers, docs, repos, and dated claims |
| Structure | Loose draft | Article bundle with frontmatter, article text, figures, code, refs, and questions |
| Visuals | Static screenshots or hand-placed labels | TSX illustrations built from layout primitives |
| Code | Decorative snippets | Examples that can be tested or reviewed |
| Review | Read it once | Validators, style checks, reference checks, build checks, and live review |
The pipeline doesn't remove editorial taste. It gives taste a place to run: reject weak examples, fix vague claims, and check the page the way a reader will see it.
The pipeline has this shape: collect evidence, plan the article, draft into a structured bundle, generate assets, validate the page, then revise until the lesson works.
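As a rough illustration of that shape, the sketch below models the stages as an ordered list and repeats the later stages until validation passes. The stage names come from the sentence above. The `run_pipeline` function, the `validate` callback, the choice to redo only drafting onward, and the `max_rounds` cap are all hypothetical, not LeetLLM's actual code.

```python
from typing import Callable

# Stage names follow the pipeline description; everything else is illustrative.
STAGES = ["collect_evidence", "plan_article", "draft_bundle", "generate_assets", "validate_page"]


def run_pipeline(validate: Callable[[int], list[str]], max_rounds: int = 3) -> list[str]:
    """Return a log of the stages run, revising until validate() reports no problems.

    `validate` is a hypothetical callback: it takes the round number and
    returns a list of problems (an empty list means the page passed).
    """
    log: list[str] = []
    for round_no in range(1, max_rounds + 1):
        # Assumption: the first round runs every stage, later rounds redo drafting onward.
        stages = STAGES if round_no == 1 else STAGES[2:]
        log.extend(stages)
        problems = validate(round_no)
        if not problems:
            log.append("publish")
            return log
        log.append(f"revise({len(problems)} problems)")
    # Still failing after max_rounds: hand the page to a human editor.
    log.append("escalate_to_editor")
    return log


# Example: the first pass fails with two problems, the second pass is clean.
log = run_pipeline(lambda round_no: ["vague claim", "missing figure"] if round_no == 1 else [])
print(" -> ".join(log))
```

The cap on rounds is a design choice worth keeping in any version: a page that keeps failing validation probably needs a person, not another model pass.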
That structure changes model behavior. The model sees a target audience, an outline, source notes, repository conventions, and explicit rules for diagrams, examples, and citations.
This is context engineering applied to education: choose what the model sees, control output shape, compress facts into a useful packet, and keep evidence close to the writing task.[1] [2]
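One way to picture a context packet is as a small function that assembles only the evidence a drafting step needs and keeps source IDs attached to each note. This is a minimal sketch under assumptions: `build_context_packet`, its parameters, and the placeholder source IDs are invented for illustration and don't describe LeetLLM's real prompt format.

```python
def build_context_packet(
    audience: str,
    outline: list[str],
    notes: dict[str, str],
    rules: list[str],
    needed_ids: list[str],
) -> str:
    """Assemble prompt context from only the notes needed, keeping source IDs attached."""
    lines = [f"Audience: {audience}", "Outline:"]
    lines += [f"- {item}" for item in outline]
    lines.append("Evidence:")
    for source_id in needed_ids:
        # A KeyError here means the plan cites a source the packet doesn't contain.
        lines.append(f"- [{source_id}] {notes[source_id]}")
    lines.append("Rules:")
    lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines)


# Placeholder notes; the IDs and text are hypothetical.
notes = {
    "src-a": "Note that supports the first outline item.",
    "src-b": "Note that supports the second outline item.",
    "src-c": "Note that is unrelated to this article.",
}
packet = build_context_packet(
    audience="engineers building AI learning products",
    outline=["research packet", "validation gates"],
    notes=notes,
    rules=["Cite a source ID for every factual claim."],
    needed_ids=["src-a", "src-b"],
)
print(packet)
```

The unrelated note never reaches the model, and every included note carries its ID, so a later check can trace claims back to sources.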
We don't start by asking for a draft. We start by building a claim inventory: required facts, supporting sources, freshness, and the learner misconception each claim should correct.
Each article lives as a small bundle: content, metadata, custom illustrations, generated assets, and references. That makes articles easy to move through the same checks as code.
Before writing, we want a compact spec:

```json
{
  "slug": "how-we-built-leetllm",
  "audience": "engineers building AI learning products",
  "promise": "show the full content pipeline, not a vague AI-writing story",
  "must_include": [
    "research packet",
    "article bundle",
    "validation gates",
    "illustration framework",
    "production web stack"
  ],
  "evidence": [
    "repo conventions",
    "content guidelines",
    "external context engineering references"
  ]
}
```

That spec prevents the polished-overview failure. The article has a job, a reader, evidence, and artifacts that can be checked.
Research packets aren't dumps. A useful packet separates facts, claims, examples, and risks:
| Research field | Purpose |
|---|---|
| Source claim | Keeps factual statements traceable |
| Freshness date | Flags claims that can drift |
| Learner misconception | Turns facts into teaching moments |
| Concrete example | Forces explanation against a real scenario |
| Failure case | Shows what breaks when the idea is misused |
| Link target | Connects the article to the course path |
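A packet entry with these fields could be modeled as a small record with a freshness check. The sketch below is an assumption-heavy illustration: the `PacketEntry` class, the `source_id` field, the `is_stale` helper, and the 180-day threshold are hypothetical and are not taken from the LeetLLM codebase.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PacketEntry:
    """One research packet row; field names mirror the table above."""

    source_claim: str
    source_id: str  # hypothetical identifier linking the claim to its source
    freshness_date: date  # when the claim was last checked
    learner_misconception: str
    concrete_example: str
    failure_case: str
    link_target: str


def is_stale(entry: PacketEntry, today: date, max_age_days: int = 180) -> bool:
    """Flag claims older than max_age_days (180 is an arbitrary illustrative threshold)."""
    return (today - entry.freshness_date).days > max_age_days


entry = PacketEntry(
    source_claim="Retrieval helps when answers depend on private or fast-changing facts",
    source_id="example-source-1",
    freshness_date=date(2025, 1, 15),
    learner_misconception="Retrieval always improves accuracy",
    concrete_example="Support bot answering from a changing policy document",
    failure_case="Stale or irrelevant passages crowd out the right answer",
    link_target="design-production-rag-pipeline",
)

print(is_stale(entry, today=date(2025, 3, 1)))   # claim is 45 days old
print(is_stale(entry, today=date(2025, 12, 1)))  # claim is 320 days old
```

Requiring every field at construction time means an entry without a failure case or misconception can't be created silently.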
For example, a RAG lesson shouldn't say "retrieval improves accuracy" and stop. It should show when retrieval helps, when it adds stale or irrelevant context, how to evaluate groundedness, and why query routing matters. The lesson can then link naturally into production RAG pipelines, RAG evaluation, and LLM-as-judge evaluation.
The same rule applies to blog posts. A strong post about agents links into agent architecture, tool calling, context engineering, and SWE-bench instead of pretending every concept begins on that page.
The review loop checks whether a page teaches the idea, not merely whether the Markdown parses. We care about factual grounding, structure, examples, code, visuals, references, and build output.
This tiny sketch captures one rule every draft should satisfy before human review:

```python
# Fields every draft must carry before it goes to human review.
required = {
    "source_ids",
    "failure_case",
    "code_snippet",
    "visual_refs",
    "next_lesson_links",
}

# Example article bundle metadata.
article = {
    "source_ids": ["anthropic2025effectivecontext", "contexteng_survey2025"],
    "failure_case": "draft without concrete examples",
    "code_snippet": "content-quality-gate.py",
    "visual_refs": ["content_pipeline", "illustration_system"],
    "next_lesson_links": ["design-production-rag-pipeline", "llm-as-judge-automated-evaluation"],
}

# Any required field absent from the article blocks review.
missing = sorted(required - article.keys())
print("ready for review" if not missing else f"missing: {', '.join(missing)}")
```

Real checks are broader: metadata schemas, broken reference IDs, code examples, Mermaid diagrams, illustration references, cover images, text density, layout warnings, TypeScript, lint, tests, and browser output. The point is making weak article states visible before a reader sees them: unreferenced figures, stale citations, broken code blocks, missing cover images, or pages that look fine in Markdown but fail in browser review.
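To make one of those broader checks concrete, here is a hedged sketch of a reference and figure cross-check. It assumes a made-up inline syntax (`[@id]` for citations, `{{fig:id}}` for figures) purely for illustration; the real LeetLLM article syntax and the `check_references` function name are not documented here.

```python
import re


def check_references(text: str, reference_ids: set[str], figure_ids: set[str]) -> dict[str, list[str]]:
    """Compare IDs used in the article text against the IDs the bundle declares.

    Assumes a hypothetical syntax: [@id] for citations and {{fig:id}} for figures.
    """
    cited = set(re.findall(r"\[@([\w-]+)\]", text))
    shown = set(re.findall(r"\{\{fig:([\w-]+)\}\}", text))
    return {
        "broken_citations": sorted(cited - reference_ids),
        "unused_references": sorted(reference_ids - cited),
        "missing_figures": sorted(shown - figure_ids),
        "unreferenced_figures": sorted(figure_ids - shown),
    }


text = "Context is curated [@anthropic2025effectivecontext]. See {{fig:content_pipeline}} and [@missing_ref]."
report = check_references(
    text,
    reference_ids={"anthropic2025effectivecontext", "contexteng_survey2025"},
    figure_ids={"content_pipeline", "illustration_system"},
)
for problem, ids in report.items():
    print(f"{problem}: {ids}")
```

Reporting all four categories at once keeps the check cheap to act on: one run shows both what the text points to that doesn't exist and what the bundle ships that nothing uses.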
Language models are good at filling gaps with plausible connective tissue. That's dangerous in education. Every draft pass needs a later pass that deletes generic examples, checks claims, and asks, "Would this help a learner solve the next problem?"
The visual system changed the most. Early images were easy to break because labels, arrows, and charts used too many manual positions. One label length change could overlap a box. One chart tweak could push text out of frame.
The fix was to make illustrations more like UI components. Source illustrations are TSX files that use theme-aware primitives: scenes, rows, panels, cards, badges, arrows, metrics, and detail lists. The build turns those components into dark and light PNGs.
A source illustration looks closer to a small interface than a hand-drawn canvas:

```tsx
const illustration = defineIllustration((c) =>
  Scene({
    c,
    title: 'Content Pipeline',
    contentWidth: 1030,
    children: [
      Row({
        width: '100%',
        gap: 14,
        children: stages.map((stage) =>
          StepCard({
            c,
            title: stage.title,
            subtitle: stage.subtitle,
            tone: stage.tone,
            width: 190,
          })
        ),
      }),
    ],
  })
)
```

That style scales better. If every article needs custom visuals, the author shouldn't spend time nudging absolute coordinates. They should compose reliable pieces, review the rendered output, and fix the teaching idea.
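The benefit of layout primitives over manual positions can be shown with a few lines of arithmetic. The sketch below is not how the TSX primitives are implemented; `layout_row` is a hypothetical helper, and the centering rule and the count of five cards are assumptions. It only reuses the numbers from the snippet above (190px cards, a 14px gap, a 1030px content width).

```python
def layout_row(widths: list[int], gap: int, content_width: int) -> list[int]:
    """Return the left x-coordinate of each card, with the row centered in content_width.

    Raises ValueError instead of letting cards overflow the frame.
    """
    total = sum(widths) + gap * max(len(widths) - 1, 0)
    if total > content_width:
        raise ValueError(f"row needs {total}px but only {content_width}px is available")
    x = (content_width - total) // 2  # center the row inside the frame
    positions = []
    for width in widths:
        positions.append(x)
        x += width + gap
    return positions


# Five 190px cards with a 14px gap inside a 1030px scene.
print(layout_row([190] * 5, gap=14, content_width=1030))

# A sixth card would not fit, so the layout fails loudly instead of overlapping.
try:
    layout_row([190] * 6, gap=14, content_width=1030)
except ValueError as err:
    print(err)
```

Because positions are derived from widths and gaps, changing one card's size moves its neighbors automatically, and an overflow becomes a build error rather than a clipped label.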
The editing loop is blunt:
That loop is why a lesson on inference doesn't use a random e-commerce analogy when it should talk about TTFT, decode throughput, KV cache, and GPU memory. It's why an evaluation post talks about judges, calibration, pairwise ranking, and false positives instead of abstract "quality scores."
| Weak draft | Stronger rewrite |
|---|---|
| "Use RAG to improve answers." | "Use retrieval when the answer depends on private or fast-changing facts, then measure groundedness and citation precision." |
| "Agents call tools." | "The model proposes a typed tool call, code executes it, and the observation returns to the next model turn." |
| "Models need more context." | "The context packet should include only the evidence needed for the next decision, with source IDs preserved." |
| "Add diagrams for clarity." | "Add a diagram that changes how the reader reasons about a failure mode." |
That's why LeetLLM links concepts across the curriculum. If a blog post introduces context packets, it should point to prompt engineering, retrieval, evaluation, and production deployment instead of making readers rebuild the map from scratch.
If you're building an AI-assisted learning platform, don't optimize for article count first. Optimize for a system that makes weak articles uncomfortable to publish.
The site runs on a deliberately boring stack. The interesting part is the content system; the infrastructure should make pages fast to edit, cheap to review, and predictable to run.
| Layer | What we use | Job |
|---|---|---|
| App shell | Next.js App Router, React, TypeScript | Render the curriculum, blog, auth-aware UI, and content pages from one codebase |
| Styling | Tailwind CSS, Shadcn/Radix primitives, Lucide icons | Keep the reading surface consistent without inventing one-off UI for every lesson |
| Content source | Markdown bundles in Git with JSON frontmatter | Make each article reviewable as source, with nearby illustrations and generated assets |
| Visual build | TSX illustration primitives and Mermaid diagram rendering | Turn editable source into stable dark and light PNG assets |
| Auth and data | Supabase auth and Postgres | Store user-owned state such as profiles, reading progress, bookmarks, and comments |
| CI | pnpm, ESLint, Vitest, TypeScript, Next build, content validators | Catch broken routes, bad references, stale assets, and app regressions during review |
| Hosting | Dockerized Next.js on Google Cloud Run behind Cloudflare | Serve the app with managed containers, autoscaling, and edge caching |
That split matters because each layer has a clear owner: Git for source, Next.js for rendering, Supabase for user state, Cloud Run for serving, Cloudflare for caching public pages, and validators for the content contract. When those responsibilities stay separate, the team can improve lessons without turning every article edit into an infrastructure project.
LLMs help in this workflow, but they aren't the product. The product is the learning path, the examples, the review loop, and the trust that each page earns.
That's how LeetLLM is built: not one prompt, but a content engineering system.