Feature Engineering for Production ML

Turn training-job events into stable prediction inputs while preventing leakage and training-serving mismatch.

Versioned datasets with clean splits still aren't model-ready by themselves. A prediction model can't consume a raw event log directly. To predict whether a training job will miss its SLA, it needs a fixed row of measurements available at the moment the promise is made.

Those measurements are features. A feature such as hours_since_last_heartbeat compresses many scheduler events into one input value. Inventing columns is easy compared with making sure each value means the same thing during training and while serving live requests.

Figure: Job J-204 event timeline, cut at the 09:00 prediction timestamp. Heartbeats at 01:00 and 08:30 are visible, and the latest (08:30) gives a heartbeat age of 0.5 hours. At `09:00`, the contract emits the exact six-field row `[a100-pool, 42.0, 0.5, 18.0, 0, 0]`. The noon heartbeat, finished timestamp, and post-SLA escalation remain beyond the cutoff and can't enter the model input.

Start With a Prediction Timestamp

Suppose the product asks at 2026-05-01 09:00: will job J-204 miss its SLA? Job heartbeats after that timestamp aren't available to the prediction service and can't appear in its training row.

Candidate field	Known at prediction time?	Use as feature?
runner pool	yes	yes, categorical
queued minutes	yes	yes, numeric
hours since most recent heartbeat	yes	yes, numeric
cluster queue backlog	yes	yes, numeric
finished timestamp	no	no, it defines the eventual label
post-SLA escalation	no	no, it leaks the outcome

The label may be computed later as missed_sla = 1. A feature must be computed from history ending at the prediction timestamp. If a training row contains the post-SLA escalation, offline accuracy will reward a model for reading the answer.
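
To make the split between label and feature concrete, here is a minimal sketch of label construction. The `sla_deadline` value and the `label_missed_sla` helper are assumptions for illustration; the lesson doesn't state the actual deadline for J-204.

```python
from datetime import datetime

# Hypothetical SLA deadline for job J-204; the lesson does not state one.
sla_deadline = datetime.fromisoformat("2026-05-02T09:00:00")


def label_missed_sla(finished_at, sla_deadline):
    """Return 1 when the job finished after its SLA deadline, else 0."""
    return int(datetime.fromisoformat(finished_at) > sla_deadline)


# The label reads finished_at, which only exists after the prediction time.
label = label_missed_sla("2026-05-03T14:00:00", sla_deadline)
print("missed_sla:", label)
```

The label function may read `finished_at` because it runs after the outcome is known. Nothing it reads should be reachable from the code that builds features.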

Sculley et al. describe production ML systems as networks of data and configuration dependencies where hidden feedback and undeclared consumers create technical debt [1]. Feature definitions are one of those dependencies: when their time boundary is unclear, the model's impressive score doesn't survive deployment.

Diagram: job events (through prediction time), feature contract (types + missing policy), training snapshot (label joins later), and online request (same computation).

Define one row before training

Build a small contract for job-SLA prediction:

Feature	Type	Missing rule	Why it can help
`queued_minutes`	numeric	reject if absent	longer queue waits expose more scheduling risk
`hours_since_last_heartbeat`	numeric	reject if absent; cap at 180 hours	stale heartbeats signal job risk
`queue_backlog`	numeric	use measured values only; set `queue_backlog_missing` when absent	queue pressure affects scheduling and preemption
`runner_pool`	categorical	reject if absent; map unseen to `other`	runner pools have different networks
`priority_job`	boolean	default `false` only when source guarantees it	priority changes SLA and retry policy

A missing value is a product decision. Filling missing queue_backlog with zero says "unknown congestion means no congestion," which is rarely defensible. Store an additional queue_backlog_missing indicator or stop scoring until the feed recovers.

Categorical values need a policy too. If a new runner pool appears after training, the online encoder can't invent a new model column. An other bucket provides stable behavior while a new model candidate is evaluated. Missing runner pool data is different: it may signal a broken source feed, so don't silently fold it into other.
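
One way to keep these rules from drifting is to store the contract as data that both the training and serving paths import. The sketch below is one possible shape, not a prescribed format: `FeatureSpec`, `CONTRACT`, and the `CONTRACT_VERSION` string are hypothetical names, and the policy notes are short paraphrases of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    kind: str          # "numeric", "categorical", or "boolean"
    missing_rule: str  # short policy note for reviewers


# Hypothetical version label; bump it whenever any rule changes.
CONTRACT_VERSION = "job-sla-features-v1"

CONTRACT = (
    FeatureSpec("queued_minutes", "numeric", "reject if absent"),
    FeatureSpec("hours_since_last_heartbeat", "numeric", "cap at 180 hours"),
    FeatureSpec("queue_backlog", "numeric", "keep missing distinct from zero"),
    FeatureSpec("runner_pool", "categorical", "reject if absent; map unseen to other"),
    FeatureSpec("priority_job", "boolean", "default false only when source guarantees it"),
)

names = [spec.name for spec in CONTRACT]
if len(names) != len(set(names)):
    raise ValueError("feature names must be unique")
print(CONTRACT_VERSION, names)
```

Freezing the dataclass is a small guard: a serving path can't quietly mutate a rule that training relied on.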

Prove the Time Boundary

A feature job should make the prediction timestamp explicit, then discard later events before aggregation. This first lab keeps only heartbeats whose event timestamps are at or before the prediction timestamp. It deliberately assumes heartbeats arrive immediately. The next lesson adds a separate ingestion timestamp so a replay can't use a heartbeat that occurred before the cutoff but arrived after it.

keep-visible-events.py

from datetime import datetime

prediction_time = datetime.fromisoformat("2026-05-01T09:00:00")
heartbeats = [
    datetime.fromisoformat("2026-05-01T01:00:00"),
    datetime.fromisoformat("2026-05-01T08:30:00"),
    datetime.fromisoformat("2026-05-01T12:00:00"),
]

visible_heartbeats = [heartbeat for heartbeat in heartbeats if heartbeat <= prediction_time]
print("visible heartbeats:", len(visible_heartbeats))
print("latest visible:", max(visible_heartbeats).isoformat())

Output

visible heartbeats: 2
latest visible: 2026-05-01T08:30:00
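
As a preview of why the immediate-arrival assumption matters, the sketch below pairs each heartbeat with an arrival time. The `ingested_at` values are invented for illustration; the lab above tracks event time only.

```python
from datetime import datetime

prediction_time = datetime.fromisoformat("2026-05-01T09:00:00")


def ts(text):
    """Shorthand for parsing an ISO timestamp."""
    return datetime.fromisoformat(text)


# Each record is (event_time, ingested_at). The ingested_at values are hypothetical.
heartbeats = [
    (ts("2026-05-01T01:00:00"), ts("2026-05-01T01:00:05")),
    (ts("2026-05-01T08:30:00"), ts("2026-05-01T09:20:00")),  # happened early, arrived late
    (ts("2026-05-01T12:00:00"), ts("2026-05-01T12:00:03")),
]

# Keep a heartbeat only if it both happened and arrived by the cutoff.
visible = [
    event_time
    for event_time, ingested_at in heartbeats
    if event_time <= prediction_time and ingested_at <= prediction_time
]
print("visible heartbeats:", len(visible))
print("latest visible:", max(visible).isoformat())
```

With arrival times taken into account, the 08:30 heartbeat drops out, because the live service could not have seen it at 09:00. A replay that filters on event time alone would include it and compute a fresher heartbeat age than serving ever had.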

Filtering source history is necessary but not sufficient. A transformation can still copy a future-only field into its output by mistake. Check the actual feature keys before training or serving.

reject-leaked-output.py

allowed_feature_keys = {
    "runner_pool",
    "queued_minutes",
    "hours_since_last_heartbeat",
    "queue_backlog",
    "queue_backlog_missing",
    "priority_job",
}
candidate_features = {
    "runner_pool": "a100-pool",
    "queued_minutes": 42.0,
    "finished_at": "2026-05-03T14:00:00",
}

blocked_fields = sorted(candidate_features.keys() - allowed_feature_keys)
print("blocked fields:", blocked_fields)
print("release allowed:", not blocked_fields)

Output

blocked fields: ['finished_at']
release allowed: False

An allowlist gate is stronger than a comment or a static list of future fields: it fails when an unexpected key enters the feature row. It can't prove that an allowed field contains only information known at prediction time. Source-history filtering and offline/online parity checks still have to enforce that semantic contract.
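
The gate above reports only unexpected keys. A stricter variant, sketched here under the assumption that every allowlisted key must be present in a releasable row, also reports keys that are absent. `check_feature_keys` is a hypothetical helper name.

```python
allowed_feature_keys = {
    "runner_pool",
    "queued_minutes",
    "hours_since_last_heartbeat",
    "queue_backlog",
    "queue_backlog_missing",
    "priority_job",
}


def check_feature_keys(candidate):
    """Return (unexpected, absent) key lists for a candidate feature row."""
    unexpected = sorted(candidate.keys() - allowed_feature_keys)
    absent = sorted(allowed_feature_keys - candidate.keys())
    return unexpected, absent


candidate = {
    "runner_pool": "a100-pool",
    "queued_minutes": 42.0,
    "finished_at": "2026-05-03T14:00:00",
}
unexpected, absent = check_feature_keys(candidate)
print("unexpected:", unexpected)
print("absent:", absent)
print("release allowed:", not unexpected and not absent)
```

Checking both directions turns the allowlist into an exact schema check, so a dropped feature fails as loudly as a leaked one.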

Encode ambiguity deliberately

For this model, keep a missing queue backlog distinct from a measured queue backlog of zero. The imputed value keeps the vector numeric; the indicator preserves the information that measurement failed. Another product might abstain instead.

encode-missing-queue-backlog.py

def encode_queue_backlog(value):
    return {
        "queue_backlog": 0.0 if value is None else float(value),
        "queue_backlog_missing": int(value is None),
    }

print("measured zero:", encode_queue_backlog(0))
print("missing:", encode_queue_backlog(None))

Output

measured zero: {'queue_backlog': 0.0, 'queue_backlog_missing': 0}
missing: {'queue_backlog': 0.0, 'queue_backlog_missing': 1}

Reserve a fitted other category too. Well-formed new runner pools then have a stable representation until retraining evaluates them explicitly. Reject an absent runner_pool instead of hiding a feed failure inside the fallback bucket.

encode-unseen-runner-pool.py

known_runner_pools = {"a100-pool", "h100-pool"}

def encode_runner_pool(value):
    if not isinstance(value, str) or not value.strip():
        raise ValueError("runner_pool is missing")
    normalized = value.strip().casefold()
    return normalized if normalized in known_runner_pools else "other"

for runner_pool in ["a100-pool", "l4-pool", None]:
    try:
        print(f"{runner_pool!r} -> {encode_runner_pool(runner_pool)}")
    except ValueError as error:
        print(f"{runner_pool!r} -> blocked: {error}")

Output

'a100-pool' -> a100-pool
'l4-pool' -> other
None -> blocked: runner_pool is missing

Caps are contract choices, not universal constants. This example limits the influence of very old heartbeats. The fitted cap must travel with the model.

cap-heartbeat-age.py

max_heartbeat_age_hours = 180.0

def cap_heartbeat_age(hours):
    return min(float(hours), max_heartbeat_age_hours)

for hours in [8, 180, 360]:
    print(f"{hours} -> {cap_heartbeat_age(hours)}")

Output

8 -> 8.0
180 -> 180.0
360 -> 180.0

A negative heartbeat age indicates broken event timing or a missing cutoff filter. Reject it instead of passing a surprising number into the model.

reject-future-heartbeat.py

from datetime import datetime

def hours_since_last_heartbeat(last_heartbeat_at, prediction_time):
    heartbeat = datetime.fromisoformat(last_heartbeat_at)
    if heartbeat > prediction_time:
        raise ValueError("last_heartbeat_at is after prediction_time")
    return (prediction_time - heartbeat).total_seconds() / 3600

prediction_time = datetime.fromisoformat("2026-05-01T09:00:00")
try:
    hours_since_last_heartbeat("2026-05-01T12:00:00", prediction_time)
except ValueError as error:
    print("blocked:", error)

Output

blocked: last_heartbeat_at is after prediction_time
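
The filter, the age calculation, and the cap can be combined into one function that starts from the raw heartbeat list. This is a sketch rather than the lesson's implementation: raising when no heartbeat is visible is an assumed policy, chosen so a silent default never reaches the model.

```python
from datetime import datetime

MAX_HEARTBEAT_AGE_HOURS = 180.0  # same cap as the example contract


def heartbeat_age_feature(heartbeats, prediction_time):
    """Filter to visible heartbeats, then return the capped age in hours."""
    visible = [h for h in heartbeats if h <= prediction_time]
    if not visible:
        # Assumed policy: with no visible heartbeat, max() would fail anyway,
        # so raise a descriptive error instead.
        raise ValueError("no heartbeat visible at prediction_time")
    age_hours = (prediction_time - max(visible)).total_seconds() / 3600
    return min(age_hours, MAX_HEARTBEAT_AGE_HOURS)


prediction_time = datetime.fromisoformat("2026-05-01T09:00:00")
heartbeats = [
    datetime.fromisoformat("2026-05-01T01:00:00"),
    datetime.fromisoformat("2026-05-01T08:30:00"),
    datetime.fromisoformat("2026-05-01T12:00:00"),
]
print(heartbeat_age_feature(heartbeats, prediction_time))  # 0.5
```

Because the filter runs inside the function, the noon heartbeat can never produce a negative age here. The explicit rejection shown earlier still matters for callers that pass a precomputed `last_heartbeat_at`.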

Build one trustworthy row

Now assemble those policies. The source record may contain future-only fields such as finished_at and post_sla_escalation because labels need them later. The returned model row may contain only allowlisted feature keys.

build-feature-row.py

from datetime import datetime
from math import isfinite

allowed_feature_keys = {
    "runner_pool",
    "queued_minutes",
    "hours_since_last_heartbeat",
    "queue_backlog",
    "queue_backlog_missing",
    "priority_job",
}
known_runner_pools = {"a100-pool", "h100-pool"}
future_only_keys = {"finished_at", "post_sla_escalation"}

def make_features(row, prediction_time):
    required_keys = {"queued_minutes", "last_heartbeat_at", "priority_job"}
    missing_keys = sorted(required_keys - row.keys())
    if missing_keys:
        raise ValueError(f"missing required fields: {missing_keys}")

    queued_minutes = float(row["queued_minutes"])
    if not isfinite(queued_minutes) or queued_minutes <= 0:
        raise ValueError("queued_minutes must be finite and positive")

    runner_pool = row.get("runner_pool")
    if not isinstance(runner_pool, str) or not runner_pool.strip():
        raise ValueError("runner_pool is missing")
    runner_pool = runner_pool.strip().casefold()

    if not isinstance(row["priority_job"], bool):
        raise ValueError("priority_job must be boolean")

    last_heartbeat = datetime.fromisoformat(row["last_heartbeat_at"])
    if last_heartbeat > prediction_time:
        raise ValueError("last_heartbeat_at is after prediction_time")
    heartbeat_age = min((prediction_time - last_heartbeat).total_seconds() / 3600, 180.0)

    queue_backlog = row.get("queue_backlog")
    if queue_backlog is not None:
        queue_backlog = float(queue_backlog)
        if not isfinite(queue_backlog) or queue_backlog < 0:
            raise ValueError("queue_backlog must be finite and nonnegative")
    features = {
        "runner_pool": runner_pool if runner_pool in known_runner_pools else "other",
        "queued_minutes": queued_minutes,
        "hours_since_last_heartbeat": heartbeat_age,
        "queue_backlog": 0.0 if queue_backlog is None else queue_backlog,
        "queue_backlog_missing": int(queue_backlog is None),
        "priority_job": int(row["priority_job"]),
    }

    blocked_fields = sorted(features.keys() - allowed_feature_keys)
    if blocked_fields:
        raise ValueError(f"unexpected feature fields: {blocked_fields}")
    return features

prediction_time = datetime.fromisoformat("2026-05-01T09:00:00")
job = {
    "job_id": "J-204",
    "runner_pool": "a100-pool",
    "queued_minutes": 42,
    "queue_backlog": 18,
    "priority_job": False,
    "last_heartbeat_at": "2026-05-01T08:30:00",
    "finished_at": "2026-05-03T14:00:00",
    "post_sla_escalation": True,
}

features = make_features(job, prediction_time)
print(features)
print("future fields leaked:", sorted(features.keys() & future_only_keys))

Output

{'runner_pool': 'a100-pool', 'queued_minutes': 42.0, 'hours_since_last_heartbeat': 0.5, 'queue_backlog': 18.0, 'queue_backlog_missing': 0, 'priority_job': 0}
future fields leaked: []

The contract now exercises its promises: required fields, finite positive queued minutes, missing-runner-pool rejection, boolean priority flag, future-heartbeat rejection, heartbeat-age cap, finite nonnegative queue backlog, missing-queue backlog indicator, unseen-runner-pool bucket, and output allowlist.

A model still needs a fixed vector order and fitted categorical mapping. Version both beside the model artifact.

vectorize-features.py

runner_pool_code = {"a100-pool": 0, "h100-pool": 1, "other": 2}
feature_order = (
    "runner_pool_code",
    "queued_minutes",
    "hours_since_last_heartbeat",
    "queue_backlog",
    "queue_backlog_missing",
    "priority_job",
)
features = {
    "runner_pool": "other",
    "queued_minutes": 42.0,
    "hours_since_last_heartbeat": 8.0,
    "queue_backlog": 0.0,
    "queue_backlog_missing": 1,
    "priority_job": 0,
}

vector = [
    runner_pool_code[features["runner_pool"]],
    features["queued_minutes"],
    features["hours_since_last_heartbeat"],
    features["queue_backlog"],
    features["queue_backlog_missing"],
    features["priority_job"],
]
print("order:", feature_order)
print("vector:", vector)

Output

order: ('runner_pool_code', 'queued_minutes', 'hours_since_last_heartbeat', 'queue_backlog', 'queue_backlog_missing', 'priority_job')
vector: [2, 42.0, 8.0, 0.0, 1, 0]
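
One way to version the order and mapping beside the model is a small manifest with a content fingerprint. The manifest layout and the `vectorize` helper below are assumptions for illustration; the values are copied from the examples in this lesson.

```python
import hashlib
import json

# Hypothetical manifest layout; values come from the lesson's examples.
manifest = {
    "feature_order": [
        "runner_pool_code",
        "queued_minutes",
        "hours_since_last_heartbeat",
        "queue_backlog",
        "queue_backlog_missing",
        "priority_job",
    ],
    "runner_pool_code": {"a100-pool": 0, "h100-pool": 1, "other": 2},
    "max_heartbeat_age_hours": 180.0,
}

# Canonical JSON (sorted keys, fixed separators) gives a stable fingerprint.
payload = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
fingerprint = hashlib.sha256(payload.encode("utf-8")).hexdigest()


def vectorize(features, manifest):
    """Build the vector by walking feature_order instead of a hand-written list."""
    code = manifest["runner_pool_code"][features["runner_pool"]]
    row = dict(features, runner_pool_code=code)
    return [row[name] for name in manifest["feature_order"]]


features = {
    "runner_pool": "other",
    "queued_minutes": 42.0,
    "hours_since_last_heartbeat": 8.0,
    "queue_backlog": 0.0,
    "queue_backlog_missing": 1,
    "priority_job": 0,
}
print("fingerprint:", fingerprint[:12])
print("vector:", vectorize(features, manifest))
```

Deriving the vector from `feature_order` means the declared order and the actual order can't disagree. Storing the fingerprint with the model artifact lets serving refuse to score when its manifest differs.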

Test parity before release

An offline notebook might compute queue backlog by scanning a completed daily table. The service might read an hourly cache. Even when both columns are named queue_backlog, differences in freshness or aggregation can change predictions. This failure is training-serving skew.

Feast documents point-in-time joins that reproduce feature state at each historical entity timestamp, scanning backward only within the configured TTL [2]. It also documents online stores for low-latency serving, where only the latest feature values for each entity key are stored [3]. The tool isn't the lesson: the contract is. A model release must identify the feature definition and snapshot that produced its score.
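
The idea behind a point-in-time lookup with a TTL can be sketched in plain Python. This is not Feast's API: `point_in_time_value`, the backlog history, and the TTL values are all hypothetical, and the code only illustrates "latest value at or before the entity timestamp, but not older than the TTL."

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def point_in_time_value(history, entity_time, ttl):
    """Return the latest value observed at or before entity_time within ttl.

    history is a list of (observed_at, value) pairs sorted by observed_at.
    Returns None when nothing qualifies.
    """
    times = [observed_at for observed_at, _ in history]
    index = bisect_right(times, entity_time) - 1
    if index < 0:
        return None  # every observation is in the future
    observed_at, value = history[index]
    if entity_time - observed_at > ttl:
        return None  # the newest visible observation is too old
    return value


# Hypothetical queue-backlog observations.
backlog_history = [
    (datetime.fromisoformat("2026-05-01T07:00:00"), 12.0),
    (datetime.fromisoformat("2026-05-01T08:45:00"), 18.0),
    (datetime.fromisoformat("2026-05-01T10:00:00"), 31.0),
]
prediction_time = datetime.fromisoformat("2026-05-01T09:00:00")

print(point_in_time_value(backlog_history, prediction_time, timedelta(hours=1)))     # 18.0
print(point_in_time_value(backlog_history, prediction_time, timedelta(minutes=10)))  # None
```

The 10:00 observation is never returned for a 09:00 prediction, and a short TTL turns a stale value into an explicit missing value instead of a misleading number.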

Sample offline and online paths on the same entities before promoting a model. Matching dictionaries provide a small, readable parity receipt.

check-parity.py

def mismatches(offline, online):
    keys = offline.keys() | online.keys()
    return {key: (offline.get(key), online.get(key)) for key in keys if offline.get(key) != online.get(key)}

offline = {"runner_pool": "a100-pool", "queue_backlog": 18.0, "priority_job": 0}
online = {"runner_pool": "a100-pool", "queue_backlog": 18.0, "priority_job": 0}
differences = mismatches(offline, online)
print("mismatches:", differences)
print("release allowed:", not differences)

Output

mismatches: {}
release allowed: True

The same check catches a stale cache or divergent transformation before users see different scores.

detect-parity-failure.py

def mismatches(offline, online):
    keys = offline.keys() | online.keys()
    return {key: (offline.get(key), online.get(key)) for key in keys if offline.get(key) != online.get(key)}

offline = {"runner_pool": "a100-pool", "queue_backlog": 18.0, "priority_job": 0}
online = {"runner_pool": "a100-pool", "queue_backlog": 0.0, "priority_job": 0}
differences = mismatches(offline, online)
print("mismatches:", differences)
print("release allowed:", not differences)

Output

mismatches: {'queue_backlog': (18.0, 0.0)}
release allowed: False
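
Exact equality can flag float rounding that comes from a different operation order rather than a different definition. A tolerance-aware variant is sketched below; the function name and the tolerances are assumptions to tune per feature, not recommended values.

```python
from math import isclose


def mismatches_with_tolerance(offline, online, rel_tol=1e-9, abs_tol=1e-9):
    """Like mismatches(), but floats within tolerance count as equal."""
    differences = {}
    for key in sorted(offline.keys() | online.keys()):
        left, right = offline.get(key), online.get(key)
        both_float = isinstance(left, float) and isinstance(right, float)
        if both_float and isclose(left, right, rel_tol=rel_tol, abs_tol=abs_tol):
            continue
        if left != right:
            differences[key] = (left, right)
    return differences


# Rounding noise only: 0.1 + 0.2 is not exactly 0.3 in binary floating point.
offline = {"hours_since_last_heartbeat": 0.1 + 0.2, "queue_backlog": 18.0}
online = {"hours_since_last_heartbeat": 0.3, "queue_backlog": 18.0}
print(mismatches_with_tolerance(offline, online))  # {}

# Different heartbeat-age caps (168 is a hypothetical second cap) still fail.
capped_offline = {"hours_since_last_heartbeat": 180.0}
capped_online = {"hours_since_last_heartbeat": 168.0}
print(mismatches_with_tolerance(capped_offline, capped_online))
```

A tolerance should stay far smaller than any difference that could change a prediction; otherwise the check hides the skew it exists to catch.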

Monitor the contract before monitoring accuracy:

Production check	Failure it catches	Action
null rate by feature	upstream feed disappeared	fail closed or fallback
unseen-category rate	runner-pool catalog changed	collect labels and retrain
freshness lag	online values are stale	pause promotions
offline/online parity sample	transformations disagree	repair feature path
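
The first two checks in the table can be computed directly from encoded feature rows. The sketch below assumes rows shaped like the output of build-feature-row.py; `contract_rates` and the sample rows are hypothetical.

```python
def contract_rates(rows):
    """Return the queue_backlog null rate and the share of runner_pool == 'other'."""
    total = len(rows)
    if total == 0:
        raise ValueError("no rows to monitor")
    missing = sum(row["queue_backlog_missing"] for row in rows)
    unseen = sum(row["runner_pool"] == "other" for row in rows)
    return {
        "queue_backlog_null_rate": missing / total,
        "unseen_runner_pool_rate": unseen / total,
    }


# Hypothetical sample of already-encoded feature rows.
rows = [
    {"runner_pool": "a100-pool", "queue_backlog_missing": 0},
    {"runner_pool": "other", "queue_backlog_missing": 0},
    {"runner_pool": "h100-pool", "queue_backlog_missing": 1},
    {"runner_pool": "other", "queue_backlog_missing": 0},
]
print(contract_rates(rows))
```

These rates are only observable because the encoder kept the missing indicator and the `other` bucket explicit. Alert thresholds are a product decision and are not shown here.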

Freshness failures need an explicit product policy. This policy tries normal scoring, fallback, then abstention. Thresholds depend on product tolerance and must be versioned with the serving path.

apply-freshness-policy.py

def scoring_policy(freshness_lag_minutes):
    if freshness_lag_minutes <= 15:
        return "normal scoring"
    if freshness_lag_minutes <= 45:
        return "fallback model"
    return "abstain and alert"

for lag in [8, 30, 90]:
    print(f"{lag} minutes -> {scoring_policy(lag)}")

Output

8 minutes -> normal scoring
30 minutes -> fallback model
90 minutes -> abstain and alert
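
The policy takes a lag in minutes, which has to be derived from timestamps somewhere. A minimal sketch follows, assuming the online store exposes a last-refresh timestamp; `feature_updated_at` and `lag_minutes` are hypothetical names, and the thresholds repeat the example above.

```python
from datetime import datetime


def lag_minutes(feature_updated_at, request_time):
    """Minutes between the last feature refresh and the scoring request."""
    lag = (request_time - feature_updated_at).total_seconds() / 60
    if lag < 0:
        # A refresh stamped after the request points to clock or pipeline trouble.
        raise ValueError("feature_updated_at is after request_time")
    return lag


def scoring_policy(freshness_lag_minutes):
    """Same thresholds as the example policy: 15 and 45 minutes."""
    if freshness_lag_minutes <= 15:
        return "normal scoring"
    if freshness_lag_minutes <= 45:
        return "fallback model"
    return "abstain and alert"


request_time = datetime.fromisoformat("2026-05-01T09:00:00")
feature_updated_at = datetime.fromisoformat("2026-05-01T08:30:00")  # hypothetical refresh time
lag = lag_minutes(feature_updated_at, request_time)
print(f"{lag} minutes -> {scoring_policy(lag)}")  # 30.0 minutes -> fallback model
```

Rejecting a negative lag mirrors the future-heartbeat check: a timestamp from the future signals a broken clock or pipeline, not unusually fresh data.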

Practice: break the contract

Run build-feature-row.py, then make one change at a time:

Add "finished_at": row["finished_at"] to features. Confirm the output allowlist blocks release.
Change last_heartbeat_at to 2026-05-01T12:00:00. Confirm future-event rejection.
Change runner_pool to l4-pool. Confirm the returned category is other. Then remove runner_pool and confirm the contract blocks the row instead of hiding missing data inside that bucket.
Remove queue_backlog. Confirm queue_backlog becomes 0.0 while queue_backlog_missing becomes 1.
Give offline and online paths different heartbeat-age caps. Confirm a parity sample exposes the disagreement.

Explain the row without looking back

Why is finished_at allowed when constructing labels but forbidden when constructing features for an earlier prediction?

Why isn't replacing every missing queue backlog value with zero a neutral choice?

What must travel with a production model besides its fitted weights?

Evaluation rubric

Evidence	What a strong answer shows
prediction contract	identifies prediction time, allowed fields, labels, and missing-value meaning
leakage control	proves future-only events and post-cutoff heartbeats can't enter feature construction
parity plan	versions transformations and monitors online/offline disagreement

Common pitfalls

Symptom	Cause	Fix
Offline score is excellent, live accuracy collapses	future event entered features	enforce prediction timestamps and leakage gates
New runner_pool causes errors or silent zeros	categorical mapping wasn't versioned	reserve `other` and monitor its rate
Predictions shift after a data job rewrite	feature meaning changed without model release	version transformation and test parity
Feed outage looks like healthy operations	missing queue backlog was encoded as measured zero	preserve missing indicator or abstain

Next Step

Continue to Batch and Streaming Feature Pipelines

You can now define one trustworthy prediction row. Next you'll construct those rows from event history without joining future information or serving stale features.

References

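[1] Sculley et al. Hidden Technical Debt in Machine Learning Systems. 2015. https://research.google/pubs/hidden-technical-debt-in-machine-learning-systems/

[2] Feast Contributors. Point-in-time Joins. Official documentation, 2026. https://docs.feast.dev/getting-started/concepts/point-in-time-joins

[3] Feast Contributors. Online Store. Official documentation, 2026. https://docs.feast.dev/getting-started/components/online-store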
