Keep this page open while drilling questions. MLA‑C01 rewards “production ML realism”: data quality gates, repeatability, safe deployments, drift monitoring, cost controls, and least-privilege security.
## Quick facts (MLA-C01)

| Item | Value |
| --- | --- |
| Questions | 65 (multiple-choice + multiple-response) |
| Time | 130 minutes |
| Passing score | 720 (scaled 100–1000) |
| Cost | 150 USD |
| Domains | D1 28% • D2 26% • D3 22% • D4 24% |
## Fast strategy (what the exam expects)

- If the question says **best-fit managed ML**, the answer is often SageMaker (Feature Store, Pipelines, Model Registry, managed endpoints).
- If the scenario is "data is messy," think data quality checks, profiling, transformations, and feature consistency (train/serve).
- If the scenario is "accuracy dropped in prod," think drift, monitoring baselines, A/B or shadow deployments, and retraining triggers.
- If the scenario is "cost is spiking," think right-sizing, endpoint type selection, auto scaling, Spot / Savings Plans, and budgets/tags.
- If there's "security/compliance," include least-privilege IAM, encryption, VPC isolation, and audit logging.
- Read the last sentence first to capture constraints: latency, cost, ops effort, compliance, auditability.

## Domain weights (how to allocate your time)

| Domain | Weight | Prep focus |
| --- | --- | --- |
| Domain 1: Data Preparation for ML | 28% | Ingest/ETL, feature engineering, data quality and bias basics |
| Domain 2: ML Model Development | 26% | Model choice, training/tuning, evaluation, Clarify/Debugger/Registry |
| Domain 3: Deployment + Orchestration | 22% | Endpoint types, scaling, IaC, CI/CD for ML pipelines |
| Domain 4: Monitoring + Security | 24% | Drift/Model Monitor, infra monitoring + costs, security controls |
## Final 20-minute recall (exam day)

### Cue → best answer (pattern map)

| If the question says… | Usually best answer |
| --- | --- |
| Data is messy/inconsistent before training | Data Wrangler/DataBrew + quality checks |
| Train/serve feature mismatch | SageMaker Feature Store |
| Need systematic hyperparameter search | SageMaker Automatic Model Tuning |
| Need fairness/explainability evidence | SageMaker Clarify |
| Training instability / convergence issues | SageMaker Debugger |
| Accuracy degraded in production | SageMaker Model Monitor + drift triggers + retraining |
| Govern model promotion and rollback | SageMaker Model Registry + approval workflow |
| Constant low-latency traffic | Real-time endpoint |
| Spiky traffic with low idle tolerance | Serverless endpoint |
| Long-running or non-interactive inference | Async endpoint or batch transform |
## Must-memorize MLA defaults

| Topic | Fast recall |
| --- | --- |
| First failure domain | Data quality and leakage before model changes |
| Metric selection | Match metric to business cost (precision vs recall trade-off) |
| Drift controls | Baselines, alerts, and versioned retraining pipeline |
| Cost controls | Right-size, auto scale, pick correct endpoint mode, use Spot where safe |
| Security baseline | Least-privilege IAM, KMS/TLS, VPC isolation, CloudTrail |
## Last-minute traps

- Chasing model complexity before fixing data quality.
- Choosing real-time endpoints for workloads that are actually batch/async.
- Treating accuracy as the only metric while ignoring latency/cost/compliance.
- Deploying without monitoring baselines and a rollback path.

## 0) SageMaker service map (high yield)

| Capability | What it's for | MLA-C01 "why it matters" |
| --- | --- | --- |
| SageMaker Data Wrangler | Data prep + feature engineering | Fast, repeatable transforms; reduces time-to-first-model |
| SageMaker Feature Store | Central feature storage | Avoid train/serve skew; feature reuse and governance |
| SageMaker Training | Managed training jobs | Repeatable, scalable training on AWS compute |
| SageMaker AMT | Hyperparameter tuning | Systematic search for better model configs |
| SageMaker Clarify | Bias + explainability | Responsible ML evidence + model understanding |
| SageMaker Model Debugger | Training diagnostics | Debug convergence and training instability |
| SageMaker Model Registry | Versioning + approvals | Auditability, rollback, safe promotion to prod |
| SageMaker Endpoints | Managed model serving | Real-time/serverless/async inference patterns |
| SageMaker Model Monitor | Monitoring workflows | Detect drift and quality issues in production |
| SageMaker Pipelines | ML workflow orchestration | Build-test-train-evaluate-register-deploy automation |
## 1) End-to-end ML on AWS (mental model)
```mermaid
flowchart LR
    S["Sources"] --> I["Ingest"]
    I --> T["Transform + Quality Checks"]
    T --> F["Feature Engineering + Feature Store"]
    F --> TR["Train + Tune"]
    TR --> E["Evaluate + Bias/Explainability"]
    E --> R["Register + Approve"]
    R --> D["Deploy Endpoint or Batch"]
    D --> M["Monitor Drift/Quality/Cost/Security"]
    M -->|Triggers| RT["Retrain"]
    RT --> TR
```
High-yield framing: MLA‑C01 is about the pipeline, not just the model.
## 2) Domain 1 — Data preparation (28%)

| You need… | Typical best-fit | Why |
| --- | --- | --- |
| Visual data prep + fast iteration | SageMaker Data Wrangler | Interactive + repeatable workflows |
| No/low-code transforms and profiling | AWS Glue DataBrew | Good for business-friendly prep |
| Scalable ETL jobs | AWS Glue / Spark | Production batch ETL at scale |
| Big Spark workloads (custom) | Amazon EMR | More control over Spark |
| Simple streaming transforms | AWS Lambda | Event-driven, lightweight |
| Streaming analytics | Managed Apache Flink | Stateful streaming at scale |
| Format | Why it shows up | Typical trade-off |
| --- | --- | --- |
| Parquet / ORC | Columnar analytics + efficient reads | Best for large tabular datasets |
| CSV / JSON | Interop + simplicity | Bigger + slower at scale |
| Avro | Schema evolution + streaming | Good for pipelines |
| RecordIO | ML-specific record formats | Useful with some training stacks |
Rule: choose formats based on access patterns (scan vs selective reads), schema evolution , and scale .
### Data ingestion and storage (high yield)

- Amazon S3: default data lake for ML (durable, cheap, scalable).
- Amazon EFS / FSx: file-based access patterns; useful when training expects POSIX-like file semantics.
- Streaming ingestion: use Kinesis or managed streaming where low-latency data arrival matters.

Common best answers:
- Use AWS Glue / Spark on EMR for big ETL jobs.
- Use SageMaker Data Wrangler for fast interactive prep and repeatable transformations.
- Use SageMaker Feature Store to keep training/inference features consistent.

### Feature Store: why it matters

- Avoid train/serve skew: the feature used in training is the same feature served at inference.
- Support feature reuse across teams and models.
- Enable governance: feature definitions and versions.

### Data integrity + bias basics (often tested)

| Problem | What to do | Tooling you might name |
| --- | --- | --- |
| Missing/invalid data | Add data quality checks + fail fast | Glue DataBrew / Glue Data Quality |
| Class imbalance | Resampling or synthetic data | (Conceptual) + Clarify for analysis |
| Bias sources | Identify selection/measurement bias | SageMaker Clarify (bias analysis) |
| Sensitive data | Classify + mask/anonymize + encrypt | KMS + access controls |
| Compliance constraints | Data residency + least privilege + audit logs | IAM + CloudTrail + region choices |
High-yield rule: don’t “fix” model issues before you verify data quality and leakage .
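The "quality checks + fail fast" pattern can be sketched in plain Python (field names and rules here are hypothetical; in production you would reach for Glue Data Quality or similar):

```python
# Minimal fail-fast data quality gate (illustrative; field names are hypothetical).
def quality_gate(records, required=("user_id", "amount"), amount_range=(0, 10_000)):
    """Return a list of (row_index, reason) violations; empty list means pass."""
    violations = []
    for i, row in enumerate(records):
        # Missing/invalid data: required fields must be present and non-null.
        for field in required:
            if row.get(field) is None:
                violations.append((i, f"missing {field}"))
        # Range check: out-of-range values often signal upstream breakage.
        amount = row.get("amount")
        if amount is not None and not (amount_range[0] <= amount <= amount_range[1]):
            violations.append((i, "amount out of range"))
    return violations

batch = [
    {"user_id": "u1", "amount": 25.0},
    {"user_id": None, "amount": 99.0},   # missing user_id -> flagged
    {"user_id": "u3", "amount": -5.0},   # negative amount -> flagged
]
problems = quality_gate(batch)
# Fail fast: stop the pipeline before training ever sees bad data.
if problems:
    print(f"Gate failed with {len(problems)} violation(s)")
```

The point is the gate's position, not its sophistication: it runs before feature engineering, and any violation halts the run.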
## 3) Domain 2 — Model development (26%)

### Choosing an approach

| If you need… | Typical best-fit |
| --- | --- |
| A standard AI capability with minimal ML ops | AWS AI services (Translate/Transcribe/Rekognition, etc.) |
| A custom model with managed training + deployment | Amazon SageMaker |
| A foundation model / generative capability | Amazon Bedrock (when applicable) |
Rule: don’t overbuild. If an AWS managed AI service solves it, it usually wins on time-to-value and ops .
### Training and tuning (high yield)

- Training loop terms: epoch, step, batch size.
- Speedups: early stopping, distributed training.
- Generalization controls: regularization (L1/L2, dropout, weight decay) + better data/features.
- Hyperparameter tuning: random search vs Bayesian optimization; in SageMaker, use Automatic Model Tuning (AMT).

### Metrics picker (what to choose)

| Task | Common metrics | What the exam tries to trick you on |
| --- | --- | --- |
| Classification | Accuracy, precision, recall, F1, ROC-AUC | Class imbalance makes accuracy misleading |
| Regression | MAE/RMSE | Outliers and error cost (what matters more?) |
| Model selection | Metric + cost/latency | "Best" isn't only accuracy |
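The class-imbalance trap is easy to demonstrate with made-up counts: a model that always predicts the majority class scores high accuracy yet catches zero positives.

```python
# Confusion-matrix counts for an imbalanced set: 990 negatives, 10 positives,
# scored by a degenerate model that predicts "negative" for everything.
tp, fp, fn, tn = 0, 0, 10, 990

accuracy = (tp + tn) / (tp + tn + fp + fn)        # 0.99 -- looks great
recall = tp / (tp + fn) if (tp + fn) else 0.0     # 0.0  -- misses every positive
precision = tp / (tp + fp) if (tp + fp) else 0.0  # undefined -> treated as 0.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

This is exactly the trick the metrics table warns about: match the metric (often recall or F1 here) to the business cost of a miss, not to headline accuracy.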
### Overfitting vs underfitting (signals)

| Symptom | Likely issue | Typical fix |
| --- | --- | --- |
| Train ↑, validation ↓ | Overfitting | Regularization, simpler model, more data, better features |
| Both low | Underfitting | More expressive model, better features, tune hyperparameters |
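The signals table can be folded into a toy triage helper; the thresholds below are illustrative study-aid numbers, not exam facts:

```python
def diagnose(train_score, val_score, gap_tol=0.10, floor=0.70):
    """Classify a train/validation score pair (scores in [0, 1]).

    gap_tol and floor are arbitrary example thresholds; real cut-offs
    depend on the task and metric.
    """
    if train_score < floor and val_score < floor:
        return "underfitting"   # both low -> model too weak / poor features
    if train_score - val_score > gap_tol:
        return "overfitting"    # train high, validation lagging behind
    return "ok"

assert diagnose(0.99, 0.75) == "overfitting"   # big train/val gap
assert diagnose(0.60, 0.58) == "underfitting"  # both scores low
assert diagnose(0.88, 0.86) == "ok"            # close and high
```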
### Clarify vs Debugger vs Model Monitor (common confusion)

| Tool | What it helps with | When to name it |
| --- | --- | --- |
| SageMaker Clarify | Bias + explainability | Fairness questions, "why did it predict X?" |
| SageMaker Model Debugger | Training diagnostics + convergence | Training instability, loss not decreasing, debugging training |
| SageMaker Model Monitor | Production monitoring workflows | Drift, data quality degradation, monitoring baselines |
### Model Registry (repeatability + governance)

- Track: model artifacts, metrics, lineage, approvals.
- Enables safe promotion/rollback and audit-ready workflows.

## 4) Domain 3 — Deployment and orchestration (22%)

### Endpoint types (must-know picker)

| Endpoint type | Best for | Typical constraint |
| --- | --- | --- |
| Real-time | Steady, low-latency inference | Cost for always-on capacity |
| Serverless | Spiky traffic, scale-to-zero | Cold starts + limits |
| Asynchronous | Long inference times, bursty workloads | Event-style patterns + polling/callback |
| Batch inference | Scheduled/offline scoring | Not interactive |
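The endpoint picker can be drilled as a toy decision function. This is a study aid encoding common rules of thumb, not an AWS API, and the trait names are made up:

```python
def pick_endpoint(latency_sensitive, steady_traffic, long_running, interactive=True):
    """Map workload traits to a SageMaker inference mode (rule of thumb only)."""
    if not interactive:
        return "batch transform"         # offline/scheduled scoring
    if long_running:
        return "asynchronous endpoint"   # long inference times, bursty arrivals
    if latency_sensitive and steady_traffic:
        return "real-time endpoint"      # always-on capacity, lowest latency
    return "serverless endpoint"         # spiky traffic, can tolerate cold starts

assert pick_endpoint(True, True, False) == "real-time endpoint"
assert pick_endpoint(True, False, False) == "serverless endpoint"
assert pick_endpoint(False, False, True) == "asynchronous endpoint"
assert pick_endpoint(False, False, False, interactive=False) == "batch transform"
```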
### Scaling metrics (what to pick)

| Metric | Good when… | Watch out |
| --- | --- | --- |
| Invocations per instance | Request volume drives load | Spiky traffic can cause oscillation |
| Latency | You have a latency SLO | Noisy metrics require smoothing |
| CPU/GPU utilization | Compute-bound models | Not always correlated with request rate |
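As a sketch, a target-tracking policy on the invocations-per-instance metric looks roughly like the dict below, which mirrors what Application Auto Scaling's `put_scaling_policy` accepts. The policy name, target value, and cooldowns are arbitrary examples:

```python
# Illustrative target-tracking configuration for a SageMaker endpoint variant.
# In practice this dict is passed to the Application Auto Scaling API
# (put_scaling_policy) after registering the variant as a scalable target.
scaling_policy = {
    "PolicyName": "invocations-target-tracking",  # hypothetical name
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,  # desired invocations per instance (example value)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,  # slow scale-in to avoid oscillating on spiky traffic
        "ScaleOutCooldown": 60,  # faster scale-out to protect latency
    },
}
```

Note the asymmetric cooldowns: scaling out quickly protects the latency SLO, while scaling in slowly dampens the oscillation the table warns about.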
### Multi-model / multi-container (why they exist)

- Multi-model: multiple models behind one endpoint to reduce cost.
- Multi-container: pre/post-processing plus model serving, or multiple frameworks.

### IaC + containers (exam patterns)

- IaC: CloudFormation or CDK for reproducible environments.
- Containers: build/publish to ECR; deploy via SageMaker, ECS, or EKS.

### CI/CD for ML (what's different)

You version and validate more than code:

- Code + data + features + model artifacts + evaluation reports
- Promotion gates: accuracy thresholds, bias checks, smoke tests, canary/shadow validation
- Typical services: CodePipeline/CodeBuild/CodeDeploy, SageMaker Pipelines, EventBridge triggers
```mermaid
flowchart LR
    G["Git push"] --> CP["CodePipeline"]
    CP --> CB["CodeBuild: tests + build"]
    CB --> P["SageMaker Pipeline: process/train/eval"]
    P --> Gate{"Meets<br/>thresholds?"}
    Gate -->|yes| MR["Model Registry approve"]
    Gate -->|no| Stop["Stop + report"]
    MR --> Dep["Deploy (canary/shadow)"]
    Dep --> Mon["Monitor + rollback triggers"]
```
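The "Meets thresholds?" gate in that flow can be sketched as a simple check over an evaluation report; the metric names and threshold values here are hypothetical:

```python
def passes_gate(report, thresholds):
    """Return (passed, failures): every threshold must be met to promote."""
    failures = [
        name for name, minimum in thresholds.items()
        if report.get(name, 0.0) < minimum  # a missing metric counts as a failure
    ]
    return (not failures), failures

# Example evaluation report produced by the pipeline's evaluate step.
report = {"f1": 0.91, "recall": 0.84, "bias_check": 1.0}
thresholds = {"f1": 0.90, "recall": 0.80, "bias_check": 1.0}

passed, failures = passes_gate(report, thresholds)
# passed -> approve the version in the Model Registry; otherwise stop + report.
```

Keeping the gate as explicit, versioned code (rather than a human eyeballing a dashboard) is what makes the promotion auditable and repeatable.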
## 5) Domain 4 — Monitoring, cost, and security (24%)

### Monitoring and drift (high yield)

- Data drift: the input distribution changed.
- Concept drift: the relationship between inputs and labels changed.
- Use baselines + ongoing checks; monitor latency/errors too.

Common services/patterns:
- SageMaker Model Monitor for monitoring workflows.
- A/B testing or shadow deployments for safe comparison.

### Monitoring checklist (what to instrument)

- Inference quality: when ground truth becomes available later, compare predicted vs actual.
- Data quality: nulls, ranges, schema changes, category explosion.
- Distribution shift: feature histograms/summary stats vs baseline.
- Ops signals: p50/p95 latency, error rate, throttles, timeouts.
- Safety/security: anomalous traffic spikes, abuse patterns, permission failures.

### Infra + cost optimization (high yield)

| Theme | What to do |
| --- | --- |
| Observability | CloudWatch metrics/logs/alarms; Logs Insights; X-Ray for traces |
| Rightsizing | Pick instance family/size based on performance; use Inference Recommender + Compute Optimizer |
| Spend control | Tags + Cost Explorer + Budgets + Trusted Advisor |
| Purchasing options | Spot / Reserved / Savings Plans where the workload fits |
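The "feature histograms vs baseline" check is often implemented as a Population Stability Index. A minimal sketch follows; the bucket counts are made up, and the cut-offs in the docstring are a common industry heuristic, not a Model Monitor setting:

```python
import math

def psi(baseline_counts, current_counts, eps=1e-6):
    """Population Stability Index between two bucketed feature distributions.

    Common heuristic: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
    """
    b_total = sum(baseline_counts)
    c_total = sum(current_counts)
    total = 0.0
    for b, c in zip(baseline_counts, current_counts):
        p_b = max(b / b_total, eps)  # guard against empty buckets
        p_c = max(c / c_total, eps)
        total += (p_c - p_b) * math.log(p_c / p_b)
    return total

stable = psi([50, 30, 20], [500, 300, 200])  # same shape, more volume -> ~0
shifted = psi([50, 30, 20], [20, 30, 50])    # mass moved across buckets -> large PSI
```

Computed against a stored training-time baseline on a schedule, a value crossing the alert threshold is exactly the kind of signal that should trigger the retraining path in the pipeline.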
### Cost levers (common "best answer" patterns)

- Choose the right inference mode first: batch (cheapest) → async → serverless → real-time (most always-on).
- Right-size and auto scale; don't leave endpoints overprovisioned.
- Use Spot for fault-tolerant training/batch where interruptions are acceptable.
- Use Budgets + tags early (before the bill surprises you).

### Security defaults (high yield)

- Least-privilege IAM for training jobs, pipelines, and endpoints.
- Encrypt at rest + in transit (KMS + TLS).
- VPC isolation (subnets + security groups) for ML resources when required.
- Audit trails (CloudTrail) + controlled access to logs and artifacts.

### Common IAM/security "gotchas"

- Training role can read S3 but can't decrypt the KMS key (KMS key policy vs IAM policy mismatch).
- Endpoint role has broad S3 access ("*") instead of a tight prefix.
- Secrets leak into logs/artifacts (build logs, notebooks, environment variables).
- No audit trail for model registry approvals or endpoint updates.

## Next steps

- Use Resources to stay anchored to the official exam guide and SageMaker docs.
- Use the FAQ to confirm expected depth and where the exam is more engineering than data science.
- Turn weak deployment, monitoring, and security rows into timed scenario drills.