Master the architecture of a real-time content moderation system using LLMs and specialized classifiers.
A/B Testing taught you how to decide whether a model change should ship. This capstone asks a harder question: how do you give each user-generated message, listing, photo, and appeal the policy decision path its product surface requires?
You are the Staff AI Engineer at LogiFlow, the global e-commerce and logistics platform that moves 18 million orders per day across 180 countries. Your sellers post 2.4 million new listings every 24 hours. Buyers and sellers chat live inside the app about shipping windows, returns, and product authenticity. Chat messages need a synchronous send decision; new listings can instead remain pending while slower image or appeal workflows complete. Each surface needs an explicit moderation contract because false negatives harm users while false positives interrupt legitimate commerce.
A real-time content moderation system makes policy decisions fast enough for the surfaces that require synchronous enforcement. This capstone uses a design target of 10K requests per second (RPS) and a sub-200 ms p95 chat decision budget to reason about classifiers, fingerprint lookups, policy-aware judges, restricted review paths, appeals, and versioned policy releases. Those numbers are scenario requirements, not universal product guarantees.
The fulfillment-center analogy is useful. Barcode scanners and weight checks catch predefined problems quickly. Vision systems and rules engines flag suspicious packages for secondary inspection. Ambiguous or high-impact cases reach authorized specialists. Content moderation follows the same pattern: route routine high-confidence traffic through validated fast paths, and reserve contextual model or reviewer work for cases that need it.
Designing this system means balancing the throughput of lightweight classifiers with the context available to slower policy-aware models and reviewers. A small BERT-style classifier (BERT: Bidirectional Encoder Representations from Transformers) can score common categories cheaply once it has been evaluated on local traffic. Novel exceptions and cross-turn meaning often need richer context. The engineering challenge is to route each content type through an enforcement path whose latency, error costs, and appeal requirements are measured.
A production-grade moderation system for this marketplace scenario has operational constraints that must be defined before model selection. It should minimize friction for legitimate users while handling likely violations under a declared policy and review process.
For this design exercise, target <200 ms p95 (95th percentile) for chat send decisions and <1 s for a listing to receive either a publish decision or an explicit pending-review state. Allocate that budget across gateway, deterministic checks, classifiers, and any synchronous escalation; measure user impact rather than asserting a universal latency threshold.
Handle the stated 10K+ RPS target with varying load spikes. E-commerce platforms experience traffic surges during flash sales and holiday shipping windows. Capacity planning needs both steady-state and burst assumptions, plus queue limits for asynchronous review.
Support evolving content policies without requiring a full model retrain for every change. A new counterfeit wording pattern may require a versioned rule or policy-pack update first, followed by regression evaluation, approval, monitored release, and later classifier retuning or retraining.
Moderate text, images, video, and audio simultaneously. Users interact across multiple formats, and malicious actors often hide violations in cross-media contexts (for example, placing hateful text over a benign image). On e-commerce platforms, this includes counterfeit product photos, misleading unboxing videos, and abusive audio in customer support calls.
Support transparent appeal workflows and appropriately restricted review loops. Automation errors are inevitable, so affected users need an explainable remediation path; highly sensitive safety reports require a separate authorized process rather than a generic review queue.
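The latency and throughput requirements above can be checked with simple arithmetic before any model is chosen. The sketch below assumes hypothetical per-stage allocations for the 200 ms chat budget. The stage names and millisecond values are illustrative, not measurements. It uses Little's law (in-flight requests = arrival rate × time in system) to estimate concurrency at the stated 10K RPS target and under an assumed 3× flash-sale burst.

```python
# Hypothetical p95 allocations (ms) for the <200 ms chat send decision.
CHAT_BUDGET_MS = 200
STAGE_ALLOCATIONS_MS = {
    "gateway_and_auth": 20,
    "deterministic_checks": 10,
    "tier1_classifier": 30,
    "decision_logging": 10,
}

def escalation_headroom_ms(budget_ms: int, stages: dict[str, int]) -> int:
    """Milliseconds left for any synchronous Tier 2 escalation.

    Adding per-stage p95 values is a planning approximation only;
    the end-to-end p95 must still be measured.
    """
    used = sum(stages.values())
    if used > budget_ms:
        raise ValueError(f"stages use {used} ms, over the {budget_ms} ms budget")
    return budget_ms - used

def in_flight_requests(rps: float, latency_s: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return rps * latency_s

headroom = escalation_headroom_ms(CHAT_BUDGET_MS, STAGE_ALLOCATIONS_MS)
# Using the p95 as time-in-system typically overstates average concurrency,
# which errs toward over-provisioning.
steady = in_flight_requests(10_000, 0.2)     # stated target
burst = in_flight_requests(3 * 10_000, 0.2)  # assumed 3x flash-sale burst
print(f"escalation headroom: {headroom} ms")
print(f"in flight at 10K RPS: {steady:.0f}, during 3x burst: {burst:.0f}")
```

With these assumed numbers, 130 ms remain for any synchronous escalation, and roughly 2,000 requests are in flight at steady state. Both figures should be replaced with measured values before capacity is provisioned.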
A useful production pattern is a cascade or tiered architecture. Content flows through a funnel of differently scoped controls: deterministic signatures and rules, learned classifiers, policy-aware review models, and authorized human workflows. Later stages can use more context, but they are not automatically more correct; each action path needs its own evaluation.
By placing validated, lightweight controls at the top of the funnel, the architecture can preserve low latency for routine traffic. Ambiguous, context-dependent, or high-impact content can be held or escalated to slower paths without making every request pay that cost.
Avoid making every request pay for a contextual judge. Let validated Tier 1 paths handle routine decisions, while uncertainty and high-impact actions follow the policy-defined escalation path.
Three pieces of seller-generated content arrive in the same second on LogiFlow:
Seller listing description: "Bulk packing boxes, pack of 20, ships tomorrow."
Deterministic checks find no prohibited signature and a validated low-risk listing classifier stays below its review threshold. The system records an ALLOW under the current listing policy scope. It does not treat a seller identity or familiar wording as proof that every future listing is safe.
Buyer-seller chat turn: "I will come to your warehouse tonight and destroy your inventory." A violence classifier scores 0.97. If that score exceeds a validated chat-threat block threshold, the message is held before delivery and routed under the threat-response policy; the decision and policy version are logged.
Seller photo + caption: A picture of a handbag with the text overlay "100% authentic Louis Vuitton (limited drop), $89." Image and text signals exceed a review threshold but not an auto-block threshold. The listing remains pending while Tier 2 receives the signals, listing context, and policy clause, then recommends human review. If an appeal later overturns enforcement, that labelled outcome becomes a candidate hard negative for evaluation and retraining.
Without a cascade, one of two things happens. Either every upload pays for contextual inference, or enforcement depends on controls too crude for context-dependent cases. A tiered design lets LogiFlow benchmark a low-latency routine path while holding uncertain or high-impact listings for deeper review.
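The routing shape of the cascade can be traced through a minimal sketch using the three LogiFlow items, plus one extra hypothetical ambiguous chat turn. Everything here is an assumption for illustration. The single Tier 1 score stands in for a multi-label classifier. The thresholds and the 150 ms judge p95 are hypothetical. `fake_tier1` and `fake_judge` are stubs, not real models. The deterministic check runs first and Tier 1 settles high-confidence cases. The contextual judge runs only for the uncertain band, and only when it fits the surface's remaining latency budget.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Action = Literal["ALLOW", "BLOCK", "HOLD_FOR_REVIEW"]

@dataclass(frozen=True)
class Content:
    surface: Literal["chat", "listing"]
    text: str
    matches_signature: bool = False  # outcome of a deterministic check

# Hypothetical values; real ones come from evaluation and latency measurement.
BLOCK_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.60
JUDGE_P95_MS = 150

judge_calls: list[str] = []

def cascade(
    item: Content,
    tier1: Callable[[str], float],
    judge: Callable[[str], Action],
    remaining_budget_ms: int,
) -> Action:
    if item.matches_signature:      # cheapest control runs first
        return "BLOCK"
    score = tier1(item.text)        # fast classifier score
    if score >= BLOCK_THRESHOLD:
        return "BLOCK"              # for chat: held before delivery
    if score < REVIEW_THRESHOLD:
        return "ALLOW"
    # Uncertain band: pay for the judge only if it fits the surface budget.
    if JUDGE_P95_MS <= remaining_budget_ms:
        judge_calls.append(item.text)
        return judge(item.text)
    return "HOLD_FOR_REVIEW"        # asynchronous review instead

# Stub scores; the last text is a hypothetical ambiguous chat turn.
FAKE_SCORES = {
    "Bulk packing boxes, pack of 20, ships tomorrow.": 0.03,
    "I will come to your warehouse tonight and destroy your inventory.": 0.97,
    "100% authentic Louis Vuitton (limited drop), $89.": 0.72,
    "See you at pickup. Watch your back.": 0.70,
}

def fake_tier1(text: str) -> float:
    return FAKE_SCORES[text]

def fake_judge(text: str) -> Action:
    return "HOLD_FOR_REVIEW"  # stub: the judge recommends human review

items = [
    (Content("listing", "Bulk packing boxes, pack of 20, ships tomorrow."), 800),
    (Content("chat", "I will come to your warehouse tonight and destroy your inventory."), 120),
    (Content("listing", "100% authentic Louis Vuitton (limited drop), $89."), 800),
    (Content("chat", "See you at pickup. Watch your back."), 120),
]
for item, budget_ms in items:
    print(item.surface, "=>", cascade(item, fake_tier1, fake_judge, budget_ms))
print("judge calls:", len(judge_calls))
```

Only the handbag listing reaches the judge. The ambiguous chat turn is held for asynchronous review because the assumed judge latency does not fit its remaining budget.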
To balance speed and contextual understanding, the system divides the workload across three distinct operational tiers. Each tier serves a specific purpose in the funnel, filtering out clear-cut cases and escalating only what it can't confidently resolve.
The first line of defense is a high-performance, specialized model. Typical choices are a distilled BERT such as DistilBERT, a small DeBERTa (Decoding-enhanced BERT with disentangled attention) variant, or even a fastText classifier. DistilBERT and compact DeBERTa models are smaller, faster transformers fine-tuned on labelled data to detect specific categories of violations. fastText is lighter still: a linear classifier over word and character n-gram embeddings. Each model emits one score per policy category (for example, [prob_hate, prob_violence, prob_spam, ...]). A production Tier 1 classifier can be deployed through an ONNX (Open Neural Network Exchange) or TensorRT runtime. The controller logic around that model is still simple: inspect every category score, choose the strongest permitted block candidate first, and otherwise route the strongest review candidate onward.
Threshold tuning is itself a product decision. Set block thresholds conservatively (high) to avoid over-blocking. Review thresholds can be looser (lower) because Tier 2 or human reviewers are in the loop. Tune both against measured false-positive and false-negative rates.
```python
from dataclasses import dataclass
from typing import Literal

CATEGORIES = [
    "hate_speech", "violence", "sexual_content",
    "self_harm", "spam", "misinformation"
]

@dataclass
class ModerationResult:
    action: Literal["ALLOW", "BLOCK", "REVIEW"]
    category: str | None = None
    score: float = 0.0

class ThresholdController:
    def __init__(self):
        self.thresholds = {
            "hate_speech": {"block": 0.98, "review": 0.65},
            "violence": {"block": 0.95, "review": 0.60},
            "sexual_content": {"block": 0.97, "review": 0.70},
            "self_harm": {"block": 0.92, "review": 0.55},
            "spam": {"block": 0.99, "review": 0.80},
            "misinformation": {"block": 0.97, "review": 0.75},
        }

    def decide(self, scores: dict[str, float]) -> ModerationResult:
        best_block: tuple[str, float] | None = None
        best_review: tuple[str, float] | None = None

        for category in CATEGORIES:
            score = scores.get(category, 0.0)
            thresholds = self.thresholds[category]

            if score >= thresholds["block"]:
                if best_block is None or score > best_block[1]:
                    best_block = (category, score)
            elif score >= thresholds["review"]:
                if best_review is None or score > best_review[1]:
                    best_review = (category, score)

        if best_block is not None:
            return ModerationResult(action="BLOCK", category=best_block[0], score=best_block[1])

        if best_review is not None:
            return ModerationResult(action="REVIEW", category=best_review[0], score=best_review[1])

        return ModerationResult(action="ALLOW")

controller = ThresholdController()

examples = {
    "spammy listing": {"spam": 0.85, "violence": 0.02},
    "warehouse threat": {"violence": 0.96, "hate_speech": 0.08},
    "return policy": {"spam": 0.04, "violence": 0.03, "misinformation": 0.05},
}

for label, scores in examples.items():
    print(label, "=>", controller.decide(scores))
```

```text
spammy listing => ModerationResult(action='REVIEW', category='spam', score=0.85)
warehouse threat => ModerationResult(action='BLOCK', category='violence', score=0.96)
return policy => ModerationResult(action='ALLOW', category=None, score=0.0)
```

This detail matters in production because moderation is a multi-label problem. A post can look borderline in one category and clearly violating in another. The controller needs to inspect all category scores before emitting a final action.
Treating buyer-seller dispute language the same as general social text creates false positives. A message like "This product is fake garbage, you scammer" can be a legitimate refund dispute from a wronged buyer. Include conversation context, order status, previous messages, and user history before auto-blocking.
For content that is uncertain (for example, threats versus sports slang, or policy criticism versus targeted harassment), send a richer context package to a policy-aware judge model if the synchronous latency budget permits it; otherwise hold the action for asynchronous review. Candidate safeguard layers include Meta's Llama Guard line[1], Google's ShieldGemma[2], hosted multimodal moderation APIs such as OpenAI's omni-moderation model[3], or an internal judge wrapped with strict output schemas.
The design split is operational: a deployed safeguard model scores the taxonomy and version it was tested against, while a policy-injected judge can consume a newer approved policy pack without retraining its weights. That does not make a prompt edit enforcement-ready by itself. New policy packs still need golden-case and adversarial evaluation, schema validation, approval, monitoring, and rollback.
Instead of baking every rule into model weights, we inject the current policy definitions directly into prompt or retrieval context. That makes the system policy-aware and lets many rule changes ship as configuration updates rather than full retrains. The code block below illustrates a system prompt containing the exact policy rules. It serves as the instruction template for the Tier 2 judge. The template takes the user's content as an input variable and instructs the model to output a structured JSON decision with a short policy rationale.
```text
System: You are a content moderation expert.
Your task is to classify user content based on the following policy:

<POLICY_DEFINITION>
Hate Speech: Dehumanizing speech, calls for violence, or inferiority claims based on protected characteristics (race, religion, etc.).
Exceptions:
- Counterspeech (raising awareness)
- Self-referential use (reclaimed terms)
- Fictional content (unless glorifying)
</POLICY_DEFINITION>

Analyze the content below against the policy and return only JSON:
{ "action": "ALLOW" | "BLOCK" | "ESCALATE", "category": "...", "confidence": 0.0, "rationale": "one-sentence policy explanation" }

Content: {user_content}
```

The judge response is untrusted model output until the controller validates its schema and attaches the policy version that produced it:
```python
from dataclasses import dataclass
from typing import Literal

Action = Literal["ALLOW", "BLOCK", "ESCALATE"]
ALLOWED_ACTIONS = {"ALLOW", "BLOCK", "ESCALATE"}
# Must cover every category the active policy pack can emit
# (hate_speech matches the policy in the prompt above).
ALLOWED_CATEGORIES = {"none", "hate_speech", "threat", "harassment", "counterfeit"}

@dataclass(frozen=True)
class Decision:
    action: Action
    category: str
    confidence: float
    rationale: str
    policy_version: str

def validate_judge_payload(payload: dict[str, object], policy_version: str) -> Decision:
    action = payload.get("action")
    category = payload.get("category")
    confidence = payload.get("confidence")
    rationale = payload.get("rationale")
    # isinstance checks first: JSON lists/dicts are unhashable in a set lookup.
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError("unknown action")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError("unknown category")
    # bool is a subclass of int, so reject true/false explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
        raise ValueError("invalid confidence")
    if not isinstance(rationale, str) or not rationale.strip():
        raise ValueError("missing rationale")
    return Decision(action, category, float(confidence), rationale, policy_version)

payloads = [
    {"action": "ESCALATE", "category": "counterfeit", "confidence": 0.72,
     "rationale": "Image and caption require authenticity review."},
    {"action": "DELETE_FOREVER", "category": "counterfeit", "confidence": 0.99,
     "rationale": "Unsupported enforcement action."},
]

for payload in payloads:
    try:
        decision = validate_judge_payload(payload, "listing-policy-v44")
        print("accepted:", decision.action, decision.policy_version)
    except ValueError as error:
        print("rejected:", error)
```

```text
accepted: ESCALATE listing-policy-v44
rejected: unknown action
```

When the judge returns ESCALATE, fails output validation, or cannot make a permitted enforcement decision, ordinary ambiguous content can enter human review. This path is expensive and slow, but it supports remediation and produces labelled evaluation data.
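The fail-closed routing just described can be sketched without any model call. The sketch makes three assumptions. The raw judge output arrives as a string. `permitted_actions` is a hypothetical per-surface setting listing which automatic actions the policy allows. The review queue is a plain list. Anything other than a permitted ALLOW or BLOCK goes to human review rather than being guessed, and that includes malformed JSON. In a full controller, the schema validator shown earlier would run before this routing step.

```python
import json

def route_judge_output(
    raw: str,
    item_id: str,
    permitted_actions: set[str],
    review_queue: list[str],
) -> str:
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        payload = None  # malformed output is not a decision
    action = payload.get("action") if isinstance(payload, dict) else None
    if action in {"ALLOW", "BLOCK"} and action in permitted_actions:
        return action
    # ESCALATE, unknown or unpermitted action, or unparseable output.
    review_queue.append(item_id)
    return "PENDING_REVIEW"

queue: list[str] = []
listing_actions = {"ALLOW", "BLOCK"}
print(route_judge_output('{"action": "ALLOW"}', "L-1", listing_actions, queue))
print(route_judge_output('{"action": "ESCALATE"}', "L-2", listing_actions, queue))
print(route_judge_output("Sure! Here is my answer:", "L-3", listing_actions, queue))
print(route_judge_output('{"action": "BLOCK"}', "L-4", {"ALLOW"}, queue))
print("review queue:", queue)
```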
Review routes and service-level objectives depend on the action and harm category. Some sensitive categories require restricted workflows and applicable reporting procedures rather than being displayed in a general moderation queue.
| Route | Example Cases | Handling Principle |
|---|---|---|
| Restricted safety workflow | Apparent CSAM, credible imminent harm | Hold access, preserve required records, and send only to authorized specialists or required reporting paths. |
| Urgent enforcement review | Severe threats or high-impact abuse | Prioritize under a product-defined SLA and log the policy basis. |
| Standard review / appeal | Counterfeit ambiguity, ordinary disputes | Queue with decision context and an appeal route. |
Reviewer safety tooling can include protected access, blurring by default, controlled media playback, workload rotation, and support resources. Review outcomes can become evaluation or training labels only through governed data handling and quality checks.
```python
from dataclasses import dataclass
from typing import Literal

Route = Literal["RESTRICTED", "URGENT", "STANDARD"]

@dataclass(frozen=True)
class ReviewCase:
    category: str
    confidence: float
    apparent_illegal_material: bool = False

def route_case(case: ReviewCase) -> Route:
    if case.apparent_illegal_material:
        return "RESTRICTED"
    if case.category in {"credible_threat", "severe_harassment"} and case.confidence >= 0.8:
        return "URGENT"
    return "STANDARD"

cases = [
    ReviewCase("counterfeit", 0.74),
    ReviewCase("credible_threat", 0.92),
    ReviewCase("apparent_cs_material", 0.88, apparent_illegal_material=True),
]

for case in cases:
    print(case.category, "=>", route_case(case))
```

```text
counterfeit => STANDARD
credible_threat => URGENT
apparent_cs_material => RESTRICTED
```

This marketplace includes text, images, video, and audio, so its policy system needs paths for every supported content type. The architecture extends the tiered approach by passing each medium through specialized signals before combining policy-relevant context.
A major engineering challenge in multimodality is handling cross-modal context. A benign text overlay like "Look at what I found today" can become a policy violation when paired with a graphic image. On e-commerce platforms, a listing photo of a generic handbag paired with the text "100% authentic Louis Vuitton" can signal counterfeiting. That signal is visible only when both modalities are judged together. To handle this, the outputs from individual classifiers need to feed into a late-fusion layer or a multimodal LLM that can evaluate the combined context of the post.
| Content Type | Tier 1 (Fast) | Tier 2 (Deep) |
|---|---|---|
| Text | DistilBERT / DeBERTa | Llama Guard / ShieldGemma / policy-injected LLM judge |
| Images | ResNet / EfficientNet NSFW (Not Safe For Work) classifier | Multimodal safeguard (Llama Guard 4, omni-moderation) or VLM (vision-language model) judge |
| Video | Keyframe sampling + Image Classifier | Multimodal judge / vision-language model on sampled frames |
| Audio | Audio event classification | Whisper transcription → Text Pipeline |
Here is how a keyframe sampler works in practice. Given a video, it extracts frames at a fixed interval (e.g., every 1 second) plus additional frames whenever a significant scene change is detected. The combined set of keyframes is then passed to the image classifier pipeline. Scenes with rapid cuts or flashing content need extra coverage.
```python
from dataclasses import dataclass

@dataclass
class Keyframe:
    frame_index: int
    timestamp_sec: float
    is_scene_cut: bool

def extract_keyframes(
    frame_luminance: list[float],
    fps: float = 4.0,
    sample_interval_sec: float = 1.0,
    scene_cut_threshold: float = 30.0,
) -> list[Keyframe]:
    """
    Illustrates regular sampling plus scene-cut detection.
    Production systems compute the luminance series from decoded video frames.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")

    keyframes: list[Keyframe] = []
    interval_frames = max(1, round(sample_interval_sec * fps))
    last_luminance: float | None = None

    for frame_idx, luminance in enumerate(frame_luminance):
        is_scene_cut = False
        if last_luminance is not None:
            diff = abs(luminance - last_luminance)
            is_scene_cut = diff > scene_cut_threshold

        if is_scene_cut:
            keyframes.append(Keyframe(frame_idx, frame_idx / fps, True))
        elif frame_idx % interval_frames == 0:
            keyframes.append(Keyframe(frame_idx, frame_idx / fps, False))

        last_luminance = luminance

    return keyframes

sampled_luminance = [10, 11, 12, 13, 15, 16, 82, 84, 85, 86, 18, 19]

for keyframe in extract_keyframes(sampled_luminance):
    print(keyframe)
```

```text
Keyframe(frame_index=0, timestamp_sec=0.0, is_scene_cut=False)
Keyframe(frame_index=4, timestamp_sec=1.0, is_scene_cut=False)
Keyframe(frame_index=6, timestamp_sec=1.5, is_scene_cut=True)
Keyframe(frame_index=8, timestamp_sec=2.0, is_scene_cut=False)
Keyframe(frame_index=10, timestamp_sec=2.5, is_scene_cut=True)
```

Sampling supplies evidence; it does not decide policy alone. A fusion controller can hold a listing when two moderate signals jointly cross a review boundary:
```python
from dataclasses import dataclass
from typing import Literal

Action = Literal["ALLOW", "REVIEW", "BLOCK"]

@dataclass(frozen=True)
class Signals:
    caption_counterfeit: float
    image_counterfeit: float
    known_prohibited_signature: bool = False

def fuse_for_listing(signals: Signals) -> Action:
    if signals.known_prohibited_signature:
        return "BLOCK"
    combined = 0.45 * signals.caption_counterfeit + 0.55 * signals.image_counterfeit
    return "REVIEW" if combined >= 0.65 else "ALLOW"

listings = {
    "ordinary packaging": Signals(0.08, 0.12),
    "suspicious brand claim": Signals(0.71, 0.68),
    "approved prohibited signature": Signals(0.10, 0.10, True),
}

for label, signals in listings.items():
    print(label, "=>", fuse_for_listing(signals))
```

```text
ordinary packaging => ALLOW
suspicious brand claim => REVIEW
approved prohibited signature => BLOCK
```

For live streams, use a sliding window of the last N frames and continuously emit keyframes to the moderation pipeline. For on-demand video uploads, batch processing is cheaper because you can wait for the full file before starting inference.
Once the individual pipelines process their respective modalities, a calibrated fusion layer can combine their signals or provide context for a multimodal judge. Thresholds must be fitted on representative labelled content because raw scores from different models are not automatically comparable. If the combined signal falls within the uncertain range, hold the listing and escalate the relevant context to review.
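Histogram binning is one simple way to fit that calibration. Platt scaling and isotonic regression are common alternatives. The sketch below assumes small hypothetical labelled sets for a caption model and an image model. Each raw score is mapped to the observed violation rate in its score bin, so a raw 0.90 from one model and a raw 0.63 from another become comparable estimates before fusion. Real bins need far more samples than this. Empty bins here fall back to the bin midpoint, which is an arbitrary choice.

```python
from typing import Callable

def fit_histogram_calibrator(
    scores: list[float], labels: list[int], n_bins: int = 5
) -> Callable[[float], float]:
    counts = [0] * n_bins
    positives = [0] * n_bins

    def bin_of(score: float) -> int:
        return min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the top bin

    for score, label in zip(scores, labels):
        counts[bin_of(score)] += 1
        positives[bin_of(score)] += label
    # Observed violation rate per bin; empty bins fall back to the bin midpoint.
    rates = [
        positives[i] / counts[i] if counts[i] else (i + 0.5) / n_bins
        for i in range(n_bins)
    ]
    return lambda score: rates[bin_of(score)]

# Hypothetical labelled samples (1 = reviewer-confirmed counterfeit).
image_cal = fit_histogram_calibrator([0.85, 0.90, 0.88, 0.92, 0.30, 0.35], [0, 0, 1, 1, 0, 0])
caption_cal = fit_histogram_calibrator([0.61, 0.63, 0.66, 0.10, 0.15], [1, 1, 1, 0, 0])

print("image raw 0.90 ->", image_cal(0.90))
print("caption raw 0.63 ->", caption_cal(0.63))
```

In this toy data the lower raw caption score is the stronger evidence after calibration. Fusing raw scores directly would have hidden this.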
Treating video as a black box hides risk. Moderating every frame is usually too expensive, so use keyframe sampling plus audio transcription. For flashing or rapidly cut content, raise the sampling rate or add scene-change triggers.
A major challenge in moderation is that policy and abuse patterns change. A new counterfeit phrasing may need an urgent response. A versioned Tier 2 policy pack can often be evaluated and released faster than a retrained classifier artifact, but it must not bypass approval, regression cases, monitoring, or rollback. Tier 1 can receive an approved deterministic signature quickly when an exact known pattern exists; learned generalization still needs examples, threshold tuning, and deployment evaluation.
A Tier 2 prompt or retrieval update is a candidate release, not an enforcement shortcut. Run policy regression cases, check output schema and thresholds, approve the new version, and monitor its rollout.
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GoldenCase:
    text: str
    expected: str

def decide(text: str, blocked_phrases: tuple[str, ...]) -> str:
    lowered = text.lower()
    return "BLOCK" if any(phrase in lowered for phrase in blocked_phrases) else "ALLOW"

def evaluate_candidate(version: str, phrases: tuple[str, ...], cases: list[GoldenCase]) -> bool:
    failures = [
        case.text for case in cases
        if decide(case.text, phrases) != case.expected
    ]
    print(version, "failures:", failures or "none")
    return not failures

golden_cases = [
    GoldenCase("counterfeit replica handbag", "BLOCK"),
    GoldenCase("replica display model for classroom", "ALLOW"),
    GoldenCase("packing boxes, ships tomorrow", "ALLOW"),
]

candidates = {
    "policy-v43": ("counterfeit replica",),
    "policy-v44-draft": ("replica",),
}

for version, phrases in candidates.items():
    print("release:", version, evaluate_candidate(version, phrases, golden_cases))
```

```text
policy-v43 failures: none
release: policy-v43 True
policy-v44-draft failures: ['replica display model for classroom']
release: policy-v44-draft False
```

Well-designed systems also continuously generate "red team" data to test classifiers against adversarial attacks. They also evaluate against curated benchmarks such as ToxiGen[4] to check resilience against implicit hate speech.
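Curated datasets cover implicit meaning that string tricks cannot. A cheap complementary check is to perturb known violating seeds and measure how many variants the current detector still catches. The sketch below assumes a toy blocklist detector with a small digit-for-letter normalizer. It is hypothetical, not a real classifier. The misses it reports are the kind of red-team findings that should become new regression or training cases.

```python
LEET = {"i": "1", "o": "0", "e": "3", "a": "@"}
UNLEET = str.maketrans({digit: letter for letter, digit in LEET.items()})

def perturbations(seed: str) -> list[str]:
    return [
        seed,
        seed.upper(),                              # case change
        "".join(LEET.get(ch, ch) for ch in seed),  # digit-for-letter swaps
        " ".join(seed),                            # character spacing
    ]

def toy_detector(text: str, blocked_terms: tuple[str, ...] = ("idiot",)) -> bool:
    normalized = text.lower().translate(UNLEET)
    return any(term in normalized for term in blocked_terms)

def red_team(seed: str) -> tuple[float, list[str]]:
    variants = perturbations(seed)
    misses = [v for v in variants if not toy_detector(v)]
    return 1 - len(misses) / len(variants), misses

catch_rate, misses = red_team("you idiot")
print(f"catch rate: {catch_rate:.2f}")
print("missed variants:", misses)
```

The normalizer handles case and digit swaps but misses character spacing. That gap would become a new golden case before the next release.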
Building a system that can accurately classify content is only half the challenge. The other half is making sure the system stays responsive when traffic spikes unpredictably during major global events. Achieving 10,000+ requests per second (RPS) requires heavy optimization at the infrastructure layer. We can use a combination of caching, batching, and intelligent routing to keep latency low and compute costs manageable.
Duplicate and near-duplicate content is common (reposts, viral memes, copypasta, and mass-uploaded counterfeit product listings). A fingerprint layer can avoid repeated model work, but only when its reuse rule does not silently broaden enforcement.
An exact decision cache can reuse an approved result for identical content under the same policy version, enforcement scope, and content type. A SimHash-style locality-sensitive hashing (LSH) index can find near-duplicates, but similarity is weaker evidence: route the match to review or an additional validated detector instead of copying a block decision automatically. The runnable version below keeps stores in memory so you can see that distinction without external services.
Cache namespaces must include policy version and enforcement scope. Otherwise yesterday's ALLOW, or another region's BLOCK, can be applied under a different rule.
Here is why LSH still matters. If a reviewed abusive message is reposted with a small addition, an exact SHA256 match misses it while SimHash can retrieve the related prior case. A tuned Hamming-distance threshold trades recall for false-positive risk; the example uses a deliberately permissive threshold to expose the near-duplicate routing behavior, not to prescribe an enforcement threshold.
```python
import hashlib
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class Scope:
    policy_version: str
    enforcement_scope: str
    content_type: str

@dataclass
class CachedModerationResult:
    action: Literal["ALLOW", "BLOCK", "REVIEW"]
    category: str | None = None

@dataclass(frozen=True)
class CacheHit:
    match: Literal["EXACT_DECISION", "SIMILAR_REVIEW_CANDIDATE"]
    action: Literal["ALLOW", "BLOCK", "REVIEW"]
    category: str | None

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def normalize(text: str) -> str:
    # Toy normalization: lowercase and undo two common digit-for-letter swaps.
    return (
        text.lower()
        .replace("1", "i")
        .replace("0", "o")
        .strip()
    )

def exact_key(content: str, scope: Scope) -> str:
    # Policy version, enforcement scope, and content type are part of the key.
    scoped_text = (
        f"{scope.policy_version}:{scope.enforcement_scope}:"
        f"{scope.content_type}:{normalize(content)}"
    )
    return sha256(scoped_text)

def simhash64(text: str) -> int:
    weights = [0] * 64
    for token in normalize(text).split():
        digest = int.from_bytes(hashlib.blake2b(token.encode(), digest_size=8).digest(), "big")
        for bit in range(64):
            weights[bit] += 1 if digest & (1 << bit) else -1
    return sum((1 << bit) for bit, weight in enumerate(weights) if weight >= 0)

def hamming_distance(a: int, b: int) -> int:
    return (a ^ b).bit_count()

class ModerationIndex:
    def __init__(self, scope: Scope, max_distance: int = 16):
        self.scope = scope
        self.max_distance = max_distance
        self.exact_cache: dict[str, CachedModerationResult] = {}
        self.fuzzy_cache: dict[int, CachedModerationResult] = {}

    def check(self, content: str, scope: Scope) -> CacheHit | None:
        if scope != self.scope:
            return None  # never reuse decisions across policy scopes
        if cached := self.exact_cache.get(exact_key(content, scope)):
            return CacheHit("EXACT_DECISION", cached.action, cached.category)
        fingerprint = simhash64(content)
        # Linear scan keeps the example small; a production LSH index would
        # bucket fingerprints (for example, by bit bands) instead.
        for known_fingerprint, result in self.fuzzy_cache.items():
            if hamming_distance(fingerprint, known_fingerprint) <= self.max_distance:
                # Similarity is weaker evidence: route to review, never copy a BLOCK.
                return CacheHit("SIMILAR_REVIEW_CANDIDATE", "REVIEW", result.category)
        return None

    def store(self, content: str, result: CachedModerationResult) -> None:
        self.exact_cache[exact_key(content, self.scope)] = result
        self.fuzzy_cache[simhash64(content)] = result

eu_scope = Scope("policy-v43", "eu-chat", "text")
us_scope = Scope("policy-v43", "us-chat", "text")
cache = ModerationIndex(scope=eu_scope)
blocked = CachedModerationResult("BLOCK", "harassment")

cache.store("You are such an idiot", blocked)

checks = [
    ("exact same scope", "You are such an idiot", eu_scope),
    ("similar same scope", "You are such an idiot scammer", eu_scope),
    ("exact different scope", "You are such an idiot", us_scope),
]
for label, text, scope in checks:
    print(label, "=>", cache.check(text, scope))
```

```text
exact same scope => CacheHit(match='EXACT_DECISION', action='BLOCK', category='harassment')
similar same scope => CacheHit(match='SIMILAR_REVIEW_CANDIDATE', action='REVIEW', category='harassment')
exact different scope => None
```

On repost-heavy surfaces, measure how often exact hits safely avoid inference and how often similarity retrieval improves review throughput without raising false blocks.
Here is how fingerprinting strategies differ in enforcement strength:
| Strategy | What it Catches | Safe Default Use |
|---|---|---|
| SHA256 exact hash | Identical bytes or normalized text | Reuse a permitted decision only in the same policy scope. |
| SimHash + LSH | Minor text edits and obfuscation | Retrieve related cases; route uncertain matches to review. |
| Perceptual hashing (pHash) | Image/video transformations | Match approved signatures or provide review evidence. |
| Embedding cosine similarity | Semantically related content | Candidate retrieval with tuned thresholds and audit metrics. |
Latency and hit rate depend on index design and scale; benchmark them on the actual workload rather than attaching universal millisecond values.
Exact-cache savings can be large on repost-heavy traffic. Similarity hits are useful too, but should not turn approximate matching into unreviewed enforcement.
Micro-batching amortizes GPU launch overhead and memory transfers across requests. Its benefit and latency cost must be measured on the deployed model and traffic shape.
GPUs are throughput-optimized devices. Processing requests one by one can underutilize available compute. Micro-batching accumulates incoming requests into a bounded batch before running inference, so the GPU processes items together and shares overhead; the configured size and timer are workload choices, not universal constants.
This is distinct from training-time gradient accumulation. At inference time, each request remains a separate moderation decision. The table below is illustrative load-test data for reasoning about the tradeoff, not a benchmark promise:
| Component | Single Request | Batch of 32 | Throughput Gain |
|---|---|---|---|
| Tier 1 (BERT) | 3ms | 8ms | 12× |
| Tier 2 (LLM) | 150ms | 400ms | 12× |
| Embedding | 5ms | 10ms | 16× |
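The throughput-gain column follows from the other two. A batch of B requests finishing in T_batch serves B / T_batch requests per unit time, versus 1 / T_single unbatched, so the gain is B × T_single / T_batch. The sketch below recomputes the illustrative rows and also shows the cost side: each request now waits for the whole batch to run, plus any time spent accumulating it. The 5 ms accumulation wait is an assumed scheduler setting, and the latency figure ignores queueing behind earlier batches.

```python
def throughput_gain(single_ms: float, batch_ms: float, batch_size: int) -> float:
    # (batch_size / batch_ms) / (1 / single_ms)
    return batch_size * single_ms / batch_ms

def batched_latency_ms(batch_ms: float, max_wait_ms: float) -> float:
    # A request that opens a batch may wait the full timer, then the batch runs.
    # Ignores queueing behind earlier batches.
    return max_wait_ms + batch_ms

rows = {"Tier 1 (BERT)": (3, 8), "Tier 2 (LLM)": (150, 400), "Embedding": (5, 10)}
for name, (single_ms, batch_ms) in rows.items():
    gain = throughput_gain(single_ms, batch_ms, batch_size=32)
    latency = batched_latency_ms(batch_ms, max_wait_ms=5)
    print(f"{name}: {gain:.0f}x throughput, up to {latency} ms per request vs {single_ms} ms")
```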
The scheduler needs a timer as well as a maximum batch size. The following toy schedule emits a full batch promptly during a burst and expires a partially filled batch before its wait budget is exceeded:
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    request_id: str
    arrival_ms: int

def schedule_batches(
    requests: list[Request],
    max_batch_size: int,
    max_wait_ms: int,
) -> list[tuple[int, list[str]]]:
    emitted: list[tuple[int, list[str]]] = []
    pending: list[Request] = []
    opened_ms: int | None = None
    for request in requests:
        # Flush a partial batch whose timer expired before this arrival.
        if pending and opened_ms is not None and request.arrival_ms > opened_ms + max_wait_ms:
            emitted.append((opened_ms + max_wait_ms, [item.request_id for item in pending]))
            pending = []
            opened_ms = None
        if not pending:
            opened_ms = request.arrival_ms  # first request starts the timer
        pending.append(request)
        if len(pending) == max_batch_size:
            emitted.append((request.arrival_ms, [item.request_id for item in pending]))
            pending = []
            opened_ms = None
    if pending and opened_ms is not None:
        emitted.append((opened_ms + max_wait_ms, [item.request_id for item in pending]))
    return emitted

requests = [Request("a", 0), Request("b", 1), Request("c", 2), Request("d", 20), Request("e", 27)]
for sent_at, ids in schedule_batches(requests, max_batch_size=3, max_wait_ms=5):
    print(f"send at {sent_at} ms:", ids)
```

```text
send at 2 ms: ['a', 'b', 'c']
send at 25 ms: ['d']
send at 32 ms: ['e']
```

Two common accumulation strategies exist, each with a different latency-throughput trade-off. Size-triggered batching waits until a fixed number of requests has arrived. This keeps batches full, but early requests can wait indefinitely when traffic is light. Time-triggered batching flushes whatever has accumulated when a fixed interval elapses. This bounds waiting time, but it can send small, inefficient batches.
Production systems commonly use a hybrid: start a timer when the first request arrives, and fire the batch when either the timer expires or the batch fills, whichever comes first. This bounds the scheduler's added wait time; end-to-end p95 or p99 still depends on queues, inference time, downstream review, and overload behavior.
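The schedule simulation works on a pre-sorted list; a live gateway has to apply the same hybrid rule to requests as they arrive. The sketch below shows one way to do that with Python's `asyncio`. `MicroBatcher`, `run_batch`, and `fake_classifier` are hypothetical names, and a real deployment would normally rely on its inference server's batcher instead.

```python
import asyncio
import contextlib
from typing import Awaitable, Callable


class MicroBatcher:
    """Hypothetical size-or-timer batcher: a batch fires when it reaches
    max_batch_size or when its first item has waited max_wait_s."""

    def __init__(
        self,
        run_batch: Callable[[list[str]], Awaitable[list[str]]],
        max_batch_size: int,
        max_wait_s: float,
    ) -> None:
        self._run_batch = run_batch
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, text: str) -> str:
        """Enqueue one request and wait for its own result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((text, future))
        return await future

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # the first item starts the timer
            deadline = loop.time() + self._max_wait_s
            while len(batch) < self._max_batch_size:  # size trigger
                remaining = deadline - loop.time()
                if remaining <= 0:  # timer trigger
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                results = await self._run_batch([text for text, _ in batch])
            except Exception as exc:
                # Fail every caller in the batch rather than leaving them hanging.
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                if not future.done():  # the caller may already have given up
                    future.set_result(result)


async def demo() -> tuple[list[str], list[int]]:
    batch_sizes: list[int] = []

    async def fake_classifier(texts: list[str]) -> list[str]:
        # Stand-in for a Tier 1 model call: records the batch size, labels all ALLOW.
        batch_sizes.append(len(texts))
        await asyncio.sleep(0.001)
        return ["ALLOW"] * len(texts)

    batcher = MicroBatcher(fake_classifier, max_batch_size=3, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(f"msg-{i}") for i in range(5)))
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return list(results), batch_sizes


results, batch_sizes = asyncio.run(demo())
print(results, batch_sizes)
```

With five near-simultaneous submissions and `max_batch_size=3`, the first batch fires on size and the remaining two requests fire when the 50 ms timer expires.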
The actual batching and GPU scheduling are usually handled by specialized inference servers. NVIDIA Triton Inference Server provides a configurable dynamic batcher and also supports custom batching logic, while systems built around PagedAttention and continuous batching (such as vLLM) keep the GPU busy during decoding under mixed loads[5]. Both approaches help avoid the GPU underutilization that occurs when requests are processed one by one.
Batching alone is not enough. Set per-request deadlines at the gateway layer, then define a risk-approved timeout action such as hold for review or suppress an individual message. Do not silently bypass a required moderation decision on timeout.
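A minimal sketch of a gateway deadline with an explicit timeout action. It assumes a hypothetical per-surface table of risk-approved actions (`TIMEOUT_ACTIONS`) and a stand-in `classify` coroutine. The point is that a timeout produces a recorded decision, never a silent ALLOW.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical risk-approved timeout actions per surface (illustrative values).
TIMEOUT_ACTIONS = {
    "chat": "SUPPRESS_MESSAGE",
    "listing": "HOLD_FOR_REVIEW",
}


@dataclass(frozen=True)
class Decision:
    action: str
    timed_out: bool


async def classify(text: str, latency_s: float) -> str:
    """Stand-in for the moderation call; latency_s simulates model latency."""
    await asyncio.sleep(latency_s)
    return "ALLOW"


async def moderate_with_deadline(surface: str, text: str, latency_s: float, deadline_s: float) -> Decision:
    try:
        action = await asyncio.wait_for(classify(text, latency_s), timeout=deadline_s)
        return Decision(action, timed_out=False)
    except asyncio.TimeoutError:
        # Apply the surface's approved fallback and record that it was a timeout.
        # A surface without an approved action raises KeyError instead of allowing.
        return Decision(TIMEOUT_ACTIONS[surface], timed_out=True)


async def main() -> list[Decision]:
    return [
        await moderate_with_deadline("chat", "when does my order ship?", latency_s=0.001, deadline_s=0.05),
        await moderate_with_deadline("chat", "slow message", latency_s=0.2, deadline_s=0.05),
        await moderate_with_deadline("listing", "slow listing", latency_s=0.2, deadline_s=0.05),
    ]


decisions = asyncio.run(main())
for decision in decisions:
    print(decision)
```

Keeping `timed_out` on the decision lets auditing distinguish a model verdict from a fallback, which matters when measuring how often the deadline fires.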
Network distance contributes to latency, so synchronous chat moderation may benefit from classifiers deployed near request traffic. Slower review paths can use regional hubs when latency, data-residency, model availability, and cost requirements permit it.
Beyond latency, geographic distribution raises a policy-routing problem. Applicable duties can depend on where content is offered, the user/account market, the action being taken, and legal policy configuration. A reviewed policy-resolution service should produce the enforcement scope used by inference and auditing.
Its output bundles the enforcement_scope, policy overlay, and appeal/reporting route; Tier 2 receives that approved overlay alongside the base policy.
One deployment option is Tier 1 in high-volume regions and Tier 2 in fewer regional hubs, with synchronous escalation only where its measured latency fits the surface budget. Placement is an engineering and compliance decision; language quality, residency rules, reviewer availability, and measured traffic may lead to different regional layouts.
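Whether synchronous escalation fits a placement can be screened with simple budget arithmetic before load testing. The sketch below sums per-stage p95 estimates against the 200 ms chat budget from this scenario. The stage names and millisecond values are hypothetical placeholders (the Tier 2 figure reuses the illustrative single-request latency from the batching table), and summing p95s is only a rough planning heuristic because percentiles do not add exactly.

```python
CHAT_P95_BUDGET_MS = 200  # scenario design target for chat decisions

# Hypothetical per-stage p95 estimates in ms for two Tier 2 placements.
PLACEMENTS = {
    "tier2-in-region": {"gateway": 10, "tier1": 8, "network_rtt": 5, "batch_wait": 10, "tier2": 150},
    "tier2-remote-hub": {"gateway": 10, "tier1": 8, "network_rtt": 90, "batch_wait": 10, "tier2": 150},
}


def fits_budget(stages: dict[str, int], budget_ms: int) -> tuple[bool, int]:
    """Return (fits, total_ms) for a synchronous escalation path.

    Summing per-stage p95s is a rough, usually pessimistic estimate of
    end-to-end p95; confirm any decision with measured end-to-end latency.
    """
    total_ms = sum(stages.values())
    return total_ms <= budget_ms, total_ms


for name, stages in PLACEMENTS.items():
    fits, total_ms = fits_budget(stages, CHAT_P95_BUDGET_MS)
    verdict = "synchronous escalation fits" if fits else "use an async path or approved fallback"
    print(f"{name}: {total_ms} ms -> {verdict}")
```

With these placeholder numbers the in-region hub fits (183 ms) and the remote hub does not (268 ms): the network round trip alone decides the placement.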
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PolicyScope:
    enforcement_scope: str
    policy_version: str
    appeal_route: str

# Reviewed, versioned scopes keyed by account market (toy data).
SCOPES = {
    "EU": PolicyScope("eu-marketplace-listing", "eu-v17", "eu-appeals"),
    "US": PolicyScope("us-marketplace-listing", "us-v11", "us-appeals"),
}

def resolve_scope(account_market: str, offered_markets: set[str], ip_hint: str) -> PolicyScope | None:
    # Toy rule: hold for scope review when the account's market is not among the
    # markets where the content is offered; None also covers unconfigured markets.
    if account_market not in offered_markets:
        return None
    # IP is logged as a fraud/routing hint, not used alone to change enforcement.
    _ = ip_hint
    return SCOPES.get(account_market)

for account_market, offered, ip_hint in [
    ("EU", {"EU", "US"}, "US"),
    ("CA", {"US"}, "US"),
]:
    scope = resolve_scope(account_market, offered, ip_hint)
    print(account_market, "=>", scope.enforcement_scope if scope else "HOLD_FOR_SCOPE_REVIEW")
```

Output:

```
EU => eu-marketplace-listing
CA => HOLD_FOR_SCOPE_REVIEW
```

No moderation system is perfect. The cost of false positives and false negatives varies by category and action. A counterfeit-listing auto-block may require very high precision and a fast appeal path; apparent child sexual abuse material (CSAM) or credible imminent harm requires a restricted safety workflow that meets applicable reporting and escalation obligations. Thresholds and remediation therefore belong to policy, not to a single global precision-versus-recall rule.
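When a user appeals a moderation decision, route the appeal from the stored decision record: restricted safety decisions stay in a restricted process, appeals that arrive after the policy version has changed log that change before review, and all other appeals receive independent review under the original policy version.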
Automation can prioritize appeals and surface obvious mismatches, but the proportion resolved automatically is a measured product outcome, not an architectural assumption.
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecisionRecord:
    action: str
    category: str
    policy_version: str
    restricted: bool

def route_appeal(record: DecisionRecord, requested_policy_version: str) -> str:
    # Restricted safety decisions never enter the general appeal queue.
    if record.restricted:
        return "RESTRICTED_SAFETY_PROCESS"
    # The policy changed since the decision: record that before review.
    if requested_policy_version != record.policy_version:
        return "LOG_POLICY_CHANGE_BEFORE_REVIEW"
    # Otherwise, an independent review re-checks the decision under the same policy.
    return "INDEPENDENT_REVIEW_SAME_POLICY"

records = [
    DecisionRecord("BLOCK", "counterfeit", "listing-v44", False),
    DecisionRecord("BLOCK", "counterfeit", "listing-v44", False),
    DecisionRecord("HOLD", "apparent_cs_material", "safety-v9", True),
]
versions = ["listing-v44", "listing-v45", "safety-v9"]

for record, version in zip(records, versions):
    print(route_appeal(record, version))
```

Output:

```
INDEPENDENT_REVIEW_SAME_POLICY
LOG_POLICY_CHANGE_BEFORE_REVIEW
RESTRICTED_SAFETY_PROCESS
```

Appeal outcomes are also a valuable source of training data: overturned decisions expose false positives, and upheld decisions confirm existing labels.
Training moderation models on harmful content requires strict safety controls to prevent inadvertent exposure. In practice, teams combine carefully controlled real examples from review queues with synthetic and adversarial examples that broaden coverage without forcing every engineer to handle raw toxic data. For the most sensitive categories, systems should prefer hashes, signatures, and restricted-access review tooling over broad dataset access. Human annotators who do review raw content should work under rotation policies with access to psychological support, which limits cumulative exposure. Verified decisions from the human review pipeline become ground-truth labels for offline retraining cycles, so the automated system can improve while raw harmful data stays tightly controlled.
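A minimal sketch of turning verified review and appeal outcomes into retraining labels, assuming hypothetical `ReviewedCase` records and a `to_training_labels` helper. Restricted cases are withheld from the exported dataset entirely, and exported rows carry a content reference rather than raw content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedCase:
    content_id: str
    category: str
    model_action: str     # what the automated system did, e.g. "BLOCK"
    reviewer_action: str  # the verified human decision after review or appeal
    restricted: bool


def to_training_labels(cases: list[ReviewedCase]) -> tuple[list[dict[str, object]], int]:
    """Export verified labels for offline retraining; restricted cases stay out."""
    labels: list[dict[str, object]] = []
    withheld = 0
    for case in cases:
        if case.restricted:
            withheld += 1  # handled only in restricted tooling, never in the general dataset
            continue
        labels.append({
            "content_id": case.content_id,  # a reference resolved in controlled storage
            "category": case.category,
            "label": case.reviewer_action,  # the human decision is the ground truth
            "model_error": case.model_action != case.reviewer_action,  # e.g. an overturned BLOCK
        })
    return labels, withheld


cases = [
    ReviewedCase("L-1", "counterfeit", "BLOCK", "ALLOW", False),  # appeal overturned: false positive
    ReviewedCase("L-2", "counterfeit", "BLOCK", "BLOCK", False),  # appeal upheld
    ReviewedCase("M-3", "apparent_cs_material", "HOLD", "HOLD", True),
]
labels, withheld = to_training_labels(cases)
for row in labels:
    print(row)
print("withheld restricted cases:", withheld)
```

Flagging `model_error` separately makes overturned decisions easy to oversample when retuning the fast path.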
Content policies are rarely global. What counts as ordinary political discourse in one country might be illegal hate speech in another. A well-architected moderation system therefore applies different rulesets based on the resolved enforcement scope (where content is offered, the account market, and the action taken) rather than user location alone, while maintaining a baseline of universal safety rules.
| Region | Regulation | Implication |
|---|---|---|
| EU | DSA (Digital Services Act)[6] | Covered moderation decisions can require statements of reasons and related transparency processes; encode the required route in policy. |
| US | 18 U.S.C. § 2258A (NCMEC CyberTipline reporting)[8] | Apparent CSAM follows restricted handling and applicable electronic-service-provider reporting procedures. |
| India | IT Rules 2021 (+ amendments)[7] | Encode applicable grievance and takedown handling after legal review. |
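One way to keep a universal safety baseline while allowing regional rulesets is to let overlays add categories or tighten thresholds but never loosen baseline rules. The sketch below assumes a hypothetical rule format in which each category maps to a review threshold (lower is stricter); the category names and values are illustrative and do not describe any regulation's actual requirements.

```python
# Hypothetical rule format: category -> review threshold (lower = stricter).
BASELINE = {"violent_threat": 0.40, "apparent_cs_material": 0.20, "counterfeit": 0.60}


def apply_overlay(baseline: dict[str, float], overlay: dict[str, float]) -> dict[str, float]:
    """Merge a regional overlay that may add categories or tighten thresholds.

    Any attempt to loosen a baseline threshold is rejected, so the universal
    safety floor survives every regional configuration.
    """
    merged = dict(baseline)  # never mutate the shared baseline
    for category, threshold in overlay.items():
        if category in baseline and threshold > baseline[category]:
            raise ValueError(f"overlay loosens baseline rule for {category!r}")
        merged[category] = threshold
    return merged


regional_overlay = {"illegal_hate_speech": 0.50, "counterfeit": 0.55}  # hypothetical
print(apply_overlay(BASELINE, regional_overlay))

try:
    apply_overlay(BASELINE, {"violent_threat": 0.90})
except ValueError as err:
    print("rejected:", err)
```

Rejecting a loosening overlay at release time, rather than at request time, keeps the failure in the policy-review path instead of in live traffic.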
In a system design interview or production review, you should be able to explain why moderation uses separate ALLOW, BLOCK, and REVIEW thresholds instead of treating moderation as one binary cutoff.

Common failure modes:

| Symptom | Cause | Fix |
|---|---|---|
| Moderation costs explode and p95 latency misses its target. | Every request reaches the LLM judge. | Expand evaluated deterministic or classifier paths where policy permits, then reserve Tier 2 for cases that need context. |
| Legitimate seller listings are blocked after a policy update. | A new Tier 2 policy pack skipped regression gates, or Tier 1 still applies an incompatible artifact. | Release a tested version with rollback, then retrain or retune the fast path with fresh appeals and misses. |
| Old or merely similar content receives an automatic enforcement decision. | Cache keys ignore scope, or near-duplicate retrieval copies an action. | Namespace exact-match decisions by policy version and scope, and route approximate matches to review. |
| GPU throughput improves but chat feels slower. | Batching waits too long or lets one slow request hold the batch. | Use size-or-timer batching plus gateway deadlines and explicit fallback behavior. |
| Refund disputes and counterfeit complaints are over-moderated as harassment. | Buyer-seller commerce language is treated like generic social chat. | Include order, listing, and conversation context before auto-blocking. |

You have now designed a real-time content moderation architecture using tiered classifiers, policy-aware judges, scoped fingerprint caches, bounded micro-batching, policy resolution, and appeal/restricted-review paths.
The next capstone takes the same latency discipline and context hierarchy, then applies serving techniques such as KV-cache reuse and speculative decoding to real-time code completion inside an IDE.
[1] Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. Inan, H., et al. · 2023 · arXiv preprint
[2] ShieldGemma: Generative AI Content Moderation. Google DeepMind · 2024 · Google DeepMind Report
[3] Upgrading the Moderation API with our new multimodal moderation model. OpenAI · 2024
[4] ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech. Hartvigsen, T., et al. · 2022 · ACL 2022
[5] Efficient Memory Management for Large Language Model Serving with PagedAttention. Kwon, W., et al. · 2023 · SOSP 2023
[6] How the Digital Services Act enhances transparency online. European Commission · 2026
[7] Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021, updated 06.04.2023. Ministry of Electronics and Information Technology, Government of India · 2023
[8] CyberTipline. National Center for Missing & Exploited Children · 2026