๐Ÿ“EasyNLP Fundamentals

Prompt Engineering Fundamentals

Learn how to steer language models using System messages, Zero-Shot vs Few-Shot prompting, and Chain of Thought reasoning.

20 min read · Google, Meta, OpenAI +1 · 4 key concepts

We now know that modern LLMs are autoregressive models trained on massive text corpora and fine-tuned with human feedback. But how do you use them effectively as a developer?

Prompt Engineering is the art and science of communicating with LLMs to get reliable, high-quality, and structured outputs.

💡 Key insight: An LLM is a brilliant but amnesiac intern. It has absorbed broad patterns from training data, but it knows nothing about your current project unless you put that context in the prompt.


The chat API structure

When you use ChatGPT in the browser, you just type a message. But under the hood (and when you use the API), the conversation is structured into distinct roles.

A typical API request looks like this:

json
[
  {"role": "system", "content": "You are a senior Python engineer. Always reply with working code and no explanations."},
  {"role": "user", "content": "Write a function to reverse a string."},
  {"role": "assistant", "content": "def reverse_string(s):\n    return s[::-1]"},
  {"role": "user", "content": "Now do it without slicing."}
]
  1. System Message: The persistent "rules of the game". It sets the persona, formatting rules, and constraints. It heavily anchors the model's behavior.
  2. User Message: The specific request or question from the human.
  3. Assistant Message: The model's previous replies. You pass these back in so the model has memory of the conversation.
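
As a concrete sketch, here is how you might send that message list using the OpenAI Python SDK (the model name is illustrative; any chat-completions-style endpoint follows the same shape):

python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {"role": "system", "content": "You are a senior Python engineer. Always reply with working code and no explanations."},
    {"role": "user", "content": "Write a function to reverse a string."},
    {"role": "assistant", "content": "def reverse_string(s):\n    return s[::-1]"},
    {"role": "user", "content": "Now do it without slicing."},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=messages,
)
print(response.choices[0].message.content)

The API is stateless: the full history is resent on every call, so the model's "memory" is exactly the assistant messages you pass back in.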
[Figure: Zero-shot vs few-shot vs chain-of-thought prompting. Compare how each pattern changes what evidence sits inside the context window.]

Prompting map

| Pattern | Add to context | Best beginner use | Failure to watch |
| --- | --- | --- | --- |
| Zero-shot | Task instruction only | Common tasks with obvious format | Model guesses hidden requirements. |
| Few-shot | Two or more examples | Exact output shape or style | Bad examples teach the wrong pattern. |
| Structured output | Schema and delimiters | JSON, extraction, tool inputs | User data leaks into instructions. |
| Reasoning prompt | Visible intermediate work when appropriate | Multi-step math or planning | Extra reasoning can be unnecessary for reasoning-specific models. |

Reliable prompts are built in layers. Put stable rules first, add trusted context and examples, isolate untrusted user data, then check the output shape.
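
A minimal sketch of that layering, assuming the same message-list API as above (the build_messages helper and the <data>/<user_input> delimiters are illustrative conventions, not a standard):

python
def build_messages(
    rules: str,
    context: str,
    examples: list[tuple[str, str]],
    user_input: str,
) -> list[dict]:
    """Assemble a layered prompt: stable rules first, then trusted
    context and examples, with untrusted user data isolated at the end."""
    messages = [{"role": "system", "content": rules}]
    if context:
        messages.append(
            {"role": "system", "content": f"Reference material:\n<data>\n{context}\n</data>"}
        )
    for question, answer in examples:
        # Few-shot examples presented as prior conversation turns.
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    # Untrusted user data lives in its own delimited block, never mixed into the rules.
    messages.append(
        {"role": "user", "content": f"<user_input>\n{user_input}\n</user_input>"}
    )
    return messages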


Zero-shot vs few-shot prompting

A zero-shot prompt asks the model to do something without examples. It relies entirely on its pre-training.

User: Translate "hello" to French.

A few-shot prompt provides examples in the prompt to teach the model the exact format or logic you want. This uses the model's in-context learning ability, made famous by GPT-3.[1]

User:
English: cat -> French: chat
English: dog -> French: chien
English: bird -> French:

Few-shot prompting is especially effective at forcing the model to output a specific JSON structure or to adopt a non-standard style that instructions alone struggle to convey.
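
A small sketch of how you might script this pattern (the few_shot_prompt helper and the arrow formatting are made up for illustration; any consistent example format works):

python
def few_shot_prompt(pairs: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt from (input, output) example pairs.

    The examples teach the model the mapping; the final line leaves
    the output slot empty for the model to complete."""
    lines = [f"English: {src} -> French: {dst}" for src, dst in pairs]
    lines.append(f"English: {query} -> French:")
    return "\n".join(lines)

print(few_shot_prompt([("cat", "chat"), ("dog", "chien")], "bird"))
# English: cat -> French: chat
# English: dog -> French: chien
# English: bird -> French: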

Chain-of-thought prompting

One important prompt engineering discovery is chain-of-thought prompting.[2]

If you ask an LLM a complex math question and demand the answer immediately, it will often fail. Why? Because of autoregressive generation: the model must commit to the final number in its very next tokens, conditioned only on the prompt.

Standard prompt

Q: If John has 5 apples, gives 2 to Mary, buys 3 more, and cuts them all in half, how many apple halves does he have?
A:

For standard models, one fix is to make intermediate reasoning visible: add a phrase like "Let's think step by step", or provide a few-shot example that includes reasoning.

Chain-of-thought prompt

Q: If John has 5 apples, gives 2 to Mary, buys 3 more, and cuts them all in half, how many apple halves does he have?
A: Let's think step by step. First, John starts with 5 apples. He gives 2 away, leaving him with 3. He buys 3 more, giving him 6. He cuts 6 apples in half, yielding 12 halves. The answer is 12.

By generating the intermediate steps, the model adds them to its context window. It uses its output as a scratchpad, making the final answer easier to predict.
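
In code, the pattern is plain string manipulation around the model call; a hedged sketch (the extraction regex assumes the model ends with the "The answer is N" convention shown above, which is not guaranteed):

python
import re

def add_cot(question: str) -> str:
    """Turn a bare question into a chain-of-thought prompt."""
    return f"Q: {question}\nA: Let's think step by step."

def extract_answer(completion: str) -> str | None:
    """Pull the final number out of the scratchpad, assuming the
    completion ends with a sentence like 'The answer is 12.'"""
    match = re.search(r"[Tt]he answer is\s+([-\d.,]+)", completion)
    return match.group(1).strip(".,") if match else None

completion = (
    "First, John starts with 5 apples. He gives 2 away, leaving him with 3. "
    "He buys 3 more, giving him 6. He cuts 6 apples in half, yielding 12 halves. "
    "The answer is 12."
)
print(extract_answer(completion))  # 12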

Some reasoning models do extra hidden reasoning internally before outputting the final answer, so explicit chain-of-thought prompts aren't always the right interface.[3]


Best practices for developers

  1. Be specific and use delimiters: Use markdown, XML tags (<data>...</data>), or triple quotes (""") to clearly separate your instructions from the user's data. This mitigates prompt injection and reduces ambiguity for the model (see the sketch after this list).
  2. Specify the output format: If you want JSON, ask for JSON and provide a schema.
  3. Give the model an "out": Tell it: "If you don't know the answer based on the provided text, say 'I don't know'." This reduces hallucinations.
  4. Repeat critical instructions near the end: Models can miss information buried in the middle of long contexts.[4] If you have a 10-page document, put the core instruction close to the final user request.
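
Here is a combined sketch of practices 1-3 for a JSON-extraction task (the schema and field names are invented for illustration):

python
import json

SYSTEM = """You are an information-extraction assistant.
Return ONLY a JSON object matching this schema:
{"name": string, "email": string or null}
If a field is not present in the text, use null instead of guessing."""

def extraction_messages(document: str) -> list[dict]:
    # Tags keep the untrusted document text separate from the instructions.
    return [
        {"role": "system", "content": SYSTEM},
        {
            "role": "user",
            "content": f"Extract the contact details from the text between the tags.\n<data>\n{document}\n</data>",
        },
    ]

def parse_or_retry(raw: str) -> dict | None:
    """Validate the model's output shape before trusting it downstream."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return None  # caller can re-prompt or fall back
    if not isinstance(result, dict) or not {"name", "email"} <= result.keys():
        return None
    return result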

Next, continue to The LLM Lifecycle, where prompting fits into training, alignment, and deployment.

Evaluation Rubric
  1. Explains the difference between System, User, and Assistant message roles
  2. Differentiates between Zero-shot and Few-shot prompting
  3. Demonstrates how Chain of Thought (CoT) prompting forces the model to reason before answering
Common Pitfalls
  • Treating the LLM like a human that can read your mind (if it's not in the prompt, the model doesn't know it)
  • Putting complex instructions at the top of a very long prompt (models suffer from 'lost in the middle' syndrome; put crucial instructions at the very end)
Key Concepts Tested
  • The Chat completions API structure (System, User, Assistant messages)
  • Zero-shot vs One-shot vs Few-shot prompting
  • Chain of Thought (CoT) prompting to improve reasoning
  • Clear instructions, delimiters, and specifying output formats
References

[1] Brown, T., et al. (2020). Language Models are Few-Shot Learners. NeurIPS 2020.
[2] Wei, J., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022.
[3] OpenAI (2026). Reasoning models.
[4] Liu, N.F., et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. TACL 2023.
