A high-level map of how an LLM goes from a pile of idle GPUs to a production API: Pre-training, Fine-Tuning, Alignment, and Deployment.
You've reached the end of the Preparation & Prerequisites section. You now understand what neural networks are, how they learn via backpropagation, how they output probabilities using softmax, how Transformers generate text, and how prompts guide their behavior.
Before you dive into the technical engineering chapters of LeetLLM, it's helpful to have a map of the territory. How is an LLM built?
Creating a frontier-class model follows a multi-stage lifecycle.
💡 Key insight: You don't train a chatbot from scratch. You train a massive text predictor, then shape it into an assistant.
| Stage | Input | Output | Engineering question |
|---|---|---|---|
| Pre-training | Web-scale text and code | Base model | Is data quality high enough for the compute spend? |
| SFT | Prompt-response examples | Instruct model | Does the model follow the product format? |
| Preference tuning | Ranked answers | Chat model | Does behavior match human preference and safety goals? |
| Inference | Frozen weights and requests | Served API | Can latency, cost, and reliability meet the product bar? |
The lifecycle is a feedback loop, not a one-way checklist. Serving the model reveals failures, and those failures feed the next training or tuning run.
Pre-training comes first, and it's where most of the compute budget goes.
You start with a massive neural network whose parameters are random. You train it on a huge mixture of text and code: web pages, books, papers, documentation, and repositories.
The objective is simple: next-token prediction. The model reads a chunk of text, guesses the next token, calculates cross-entropy loss, and uses backpropagation to update its weights.[1][2]
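The loss computation for a single prediction step can be sketched in a few lines of NumPy. This is a toy illustration with made-up logit values, not a real model:

```python
import numpy as np

def cross_entropy_next_token(logits, target_id):
    """Loss for one next-token prediction step.

    logits: the model's raw scores over the vocabulary (toy values here).
    target_id: index of the token that actually came next in the text.
    """
    # Softmax turns raw scores into a probability distribution.
    shifted = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # Cross-entropy: negative log-probability assigned to the correct token.
    return -np.log(probs[target_id])

# Toy vocabulary of 4 tokens; suppose the true next token has id 2.
logits = np.array([1.0, 0.5, 3.0, -1.0])
loss = cross_entropy_next_token(logits, target_id=2)
```

The loss is low when the model puts high probability on the correct next token and high otherwise; backpropagation nudges the weights to reduce it, one token at a time, across trillions of tokens.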
A base model contains a lot of knowledge, but it's awkward for consumers. If you prompt a base model with "How do I bake a cake?", it might continue with "How do I bake a pie? How do I bake cookies?" because it thinks it's continuing a list of questions. It hasn't been trained to act like an assistant.
To turn the base model into an assistant, we teach it the Q&A format through Supervised Fine-Tuning (SFT).
Researchers hire human experts to write thousands of high-quality prompt-and-response pairs.
We continue training the model, but on this small, high-quality dataset instead of raw web text.
The model now knows how to follow instructions and act like an assistant.
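A common detail of SFT is loss masking: the prompt and response are packed into one training sequence, but loss is computed only on the response tokens. Here's a minimal sketch of that idea; the chat template markers and the character-level "tokenizer" are invented for illustration:

```python
def build_sft_example(prompt, response, tokenize):
    """Return (token_ids, loss_mask) for one supervised pair."""
    prompt_part = f"<|user|>{prompt}<|assistant|>"  # assumed chat template
    prompt_ids = tokenize(prompt_part)
    response_ids = tokenize(response + "<|end|>")
    ids = prompt_ids + response_ids
    # 0 = ignore in the loss, 1 = train on this token.
    # The model still *sees* the prompt as context; it just isn't
    # penalized for failing to predict the prompt's own tokens.
    mask = [0] * len(prompt_ids) + [1] * len(response_ids)
    return ids, mask

# Stand-in tokenizer: one "token" per character, purely for illustration.
toy_tokenize = lambda s: list(s)

ids, mask = build_sft_example("Hi?", "Hello!", toy_tokenize)
```

Training on such examples is the same next-token-prediction objective as pre-training, just restricted to the assistant's side of the conversation.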
Even after SFT, the model might produce toxic content, assist with harmful requests, or be overly verbose. We need to align it with human values (helpful, honest, harmless).
This is often done using RLHF (Reinforcement Learning from Human Feedback) or newer techniques like DPO (Direct Preference Optimization). InstructGPT is the classic RLHF example.[3]
Humans are given two different answers generated by the model for the same prompt, and they vote on which one is better. The model is then trained to maximize the probability of generating the "winning" type of answer.
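To make this concrete, here is a sketch of the DPO loss for a single preference pair. The numbers are toy values; in practice the log-probabilities come from scoring whole answers under the policy being trained and under a frozen reference model:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are sequence-level log-probabilities of the preferred
    ("chosen") and dispreferred ("rejected") answers under the policy
    and under a frozen reference model. The loss is low when the policy
    favors the chosen answer more strongly than the reference does.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Toy numbers: the policy already prefers the chosen answer slightly.
loss = dpo_loss(logp_chosen=-5.0, logp_rejected=-7.0,
                ref_logp_chosen=-6.0, ref_logp_rejected=-6.5)
```

Unlike RLHF, DPO needs no separate reward model or reinforcement-learning loop: the ranked pairs are turned directly into a classification-style loss on the policy's log-probabilities.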
Once training is done, the weights are frozen and the model is deployed to servers so users can access it. This phase is called Inference.
Unlike training, inference doesn't update the weights. But running a large model for millions of users is an extreme engineering challenge.
Inference engineers spend their time optimizing latency, throughput, cost per token, and reliability.
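At its core, serving is an autoregressive decoding loop: feed the context in, sample one token, append it, repeat. The toy "model" below is a stand-in function, not a real network, and a real server adds batching, KV caching, and streaming on top, but the loop's shape is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_model(token_ids, weights):
    """Stand-in for a frozen LLM: returns logits over a tiny vocabulary."""
    # Any deterministic function of the context works for illustration.
    h = sum(token_ids) % weights.shape[0]
    return weights[h]

def generate(prompt_ids, weights, steps=5, temperature=1.0):
    """Autoregressive decoding: no gradients, the weights never change."""
    ids = list(prompt_ids)
    for _ in range(steps):
        logits = toy_model(ids, weights) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        next_id = int(rng.choice(len(probs), p=probs))  # sample next token
        ids.append(next_id)
    return ids

vocab_size = 8
frozen_weights = rng.normal(size=(vocab_size, vocab_size))
out = generate([1, 2, 3], frozen_weights, steps=4)
```

Note what's absent: no loss, no backpropagation, no weight updates. Every engineering trick in inference is about running this loop faster and cheaper for many users at once.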
Now that you know the lifecycle, here is how the rest of the LeetLLM roadmap maps to the work:
Next, start Core LLM Foundations with The Bitter Lesson. That chapter explains why scalable learning and search beat hand-coded AI shortcuts.