Keep this page open while drilling questions. MLA‑C01 rewards “production ML realism”: data quality gates, repeatability, safe deployments, drift monitoring, cost controls, and least-privilege security.
## Quick facts (MLA-C01)

| Item | Value |
| --- | --- |
| Questions | 65 (multiple-choice + multiple-response) |
| Time | 130 minutes |
| Passing score | 720 (scaled 100–1000) |
| Cost | 150 USD |
| Domains | D1 28% • D2 26% • D3 22% • D4 24% |
## Fast strategy (what the exam expects)

- If the question says **best-fit managed ML**, the answer is often SageMaker (Feature Store, Pipelines, Model Registry, managed endpoints).
- If the scenario is "data is messy," think data quality checks, profiling, transformations, and feature consistency (train/serve).
- If the scenario is "accuracy dropped in prod," think drift, monitoring baselines, A/B or shadow deployments, and retraining triggers.
- If the scenario is "cost is spiking," think right-sizing, endpoint type selection, auto scaling, Spot / Savings Plans, and budgets/tags.
- If there's "security/compliance," include least-privilege IAM, encryption, VPC isolation, and audit logging.
- Read the last sentence first to capture constraints: latency, cost, ops effort, compliance, auditability.

## Domain weights (how to allocate your time)

| Domain | Weight | Prep focus |
| --- | --- | --- |
| Domain 1: Data Preparation for ML | 28% | Ingest/ETL, feature engineering, data quality and bias basics |
| Domain 2: ML Model Development | 26% | Model choice, training/tuning, evaluation, Clarify/Debugger/Registry |
| Domain 3: Deployment + Orchestration | 22% | Endpoint types, scaling, IaC, CI/CD for ML pipelines |
| Domain 4: Monitoring + Security | 24% | Drift/Model Monitor, infra monitoring + costs, security controls |
## Final 20-minute recall (exam day)

### Cue → best answer (pattern map)

| If the question says… | Usually best answer |
| --- | --- |
| Data is messy/inconsistent before training | Data Wrangler/DataBrew + quality checks |
| Train/serve feature mismatch | SageMaker Feature Store |
| Need systematic hyperparameter search | SageMaker Automatic Model Tuning |
| Need fairness/explainability evidence | SageMaker Clarify |
| Training instability / convergence issues | SageMaker Debugger |
| Accuracy degraded in production | SageMaker Model Monitor + drift triggers + retraining |
| Govern model promotion and rollback | SageMaker Model Registry + approval workflow |
| Constant low-latency traffic | Real-time endpoint |
| Spiky traffic with low idle tolerance | Serverless endpoint |
| Long-running or non-interactive inference | Async endpoint or batch transform |
## Must-memorize MLA defaults

| Topic | Fast recall |
| --- | --- |
| First failure domain | Data quality and leakage before model changes |
| Metric selection | Match metric to business cost (precision vs recall trade-off) |
| Drift controls | Baselines, alerts, and versioned retraining pipeline |
| Cost controls | Right-size, auto scale, pick correct endpoint mode, use Spot where safe |
| Security baseline | Least-privilege IAM, KMS/TLS, VPC isolation, CloudTrail |
## Last-minute traps

- Chasing model complexity before fixing data quality.
- Choosing real-time endpoints for workloads that are actually batch/async.
- Treating accuracy as the only metric while ignoring latency/cost/compliance.
- Deploying without monitoring baselines and a rollback path.

## 0) SageMaker service map (high yield)

| Capability | What it's for | MLA-C01 "why it matters" |
| --- | --- | --- |
| SageMaker Data Wrangler | Data prep + feature engineering | Fast, repeatable transforms; reduces time-to-first-model |
| SageMaker Feature Store | Central feature storage | Avoid train/serve skew; feature reuse and governance |
| SageMaker Training | Managed training jobs | Repeatable, scalable training on AWS compute |
| SageMaker AMT | Hyperparameter tuning | Systematic search for better model configs |
| SageMaker Clarify | Bias + explainability | Responsible ML evidence + model understanding |
| SageMaker Model Debugger | Training diagnostics | Debug convergence and training instability |
| SageMaker Model Registry | Versioning + approvals | Auditability, rollback, safe promotion to prod |
| SageMaker Endpoints | Managed model serving | Real-time/serverless/async inference patterns |
| SageMaker Model Monitor | Monitoring workflows | Detect drift and quality issues in production |
| SageMaker Pipelines | ML workflow orchestration | Build-test-train-evaluate-register-deploy automation |
## 1) End-to-end ML on AWS (mental model)
```mermaid
flowchart LR
    S["Sources"] --> I["Ingest"]
    I --> T["Transform + Quality Checks"]
    T --> F["Feature Engineering + Feature Store"]
    F --> TR["Train + Tune"]
    TR --> E["Evaluate + Bias/Explainability"]
    E --> R["Register + Approve"]
    R --> D["Deploy Endpoint or Batch"]
    D --> M["Monitor Drift/Quality/Cost/Security"]
    M -->|Triggers| RT["Retrain"]
    RT --> TR
```
High-yield framing: MLA‑C01 is about the pipeline, not just the model.
## 2) Domain 1 — Data preparation (28%)

| You need… | Typical best-fit | Why |
| --- | --- | --- |
| Visual data prep + fast iteration | SageMaker Data Wrangler | Interactive + repeatable workflows |
| No/low-code transforms and profiling | AWS Glue DataBrew | Good for business-friendly prep |
| Scalable ETL jobs | AWS Glue / Spark | Production batch ETL at scale |
| Big Spark workloads (custom) | Amazon EMR | More control over Spark |
| Simple streaming transforms | AWS Lambda | Event-driven, lightweight |
| Streaming analytics | Managed Apache Flink | Stateful streaming at scale |
| Format | Why it shows up | Typical trade-off |
| --- | --- | --- |
| Parquet / ORC | Columnar analytics + efficient reads | Best for large tabular datasets |
| CSV / JSON | Interop + simplicity | Bigger + slower at scale |
| Avro | Schema evolution + streaming | Good for pipelines |
| RecordIO | ML-specific record formats | Useful with some training stacks |
Rule: choose formats based on access patterns (scan vs selective reads), schema evolution , and scale .
### Data ingestion and storage (high yield)

- Amazon S3: default data lake for ML (durable, cheap, scalable).
- Amazon EFS / FSx: file-based access patterns; useful when training expects POSIX-like file semantics.
- Streaming ingestion: use Kinesis or managed streaming where low-latency data arrival matters.

Common best answers:
- Use AWS Glue / Spark on EMR for big ETL jobs.
- Use SageMaker Data Wrangler for fast interactive prep and repeatable transformations.
- Use SageMaker Feature Store to keep training/inference features consistent.

### Feature Store: why it matters

- Avoid train/serve skew: the feature used in training is the same feature served at inference.
- Support feature reuse across teams and models.
- Enable governance: feature definitions and versions.

### Data integrity + bias basics (often tested)

| Problem | What to do | Tooling you might name |
| --- | --- | --- |
| Missing/invalid data | Add data quality checks + fail fast | Glue DataBrew / Glue Data Quality |
| Class imbalance | Resampling or synthetic data | (Conceptual) + Clarify for analysis |
| Bias sources | Identify selection/measurement bias | SageMaker Clarify (bias analysis) |
| Sensitive data | Classify + mask/anonymize + encrypt | KMS + access controls |
| Compliance constraints | Data residency + least privilege + audit logs | IAM + CloudTrail + region choices |
High-yield rule: don’t “fix” model issues before you verify data quality and leakage .
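The "quality checks + fail fast" pattern can be sketched in plain Python (field names and rules here are hypothetical; in production you would reach for Glue Data Quality or similar):

```python
# Minimal fail-fast data quality gate (illustrative; field names are hypothetical).
def quality_gate(records, required=("user_id", "amount"), amount_range=(0, 10_000)):
    """Return a list of (row_index, reason) violations; empty list means pass."""
    violations = []
    for i, row in enumerate(records):
        # Missing/invalid data: required fields must be present and non-null.
        for field in required:
            if row.get(field) is None:
                violations.append((i, f"missing {field}"))
        # Range check: out-of-range values often signal upstream breakage.
        amount = row.get("amount")
        if amount is not None and not (amount_range[0] <= amount <= amount_range[1]):
            violations.append((i, "amount out of range"))
    return violations

batch = [
    {"user_id": "u1", "amount": 25.0},
    {"user_id": None, "amount": 99.0},   # missing user_id -> flagged
    {"user_id": "u3", "amount": -5.0},   # negative amount -> flagged
]
problems = quality_gate(batch)
# Fail fast: stop the pipeline before training ever sees bad data.
if problems:
    print(f"Gate failed with {len(problems)} violation(s)")
```

The point is the gate's position, not its sophistication: it runs before feature engineering, and any violation halts the run.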
## 3) Domain 2 — Model development (26%)

### Choosing an approach

| If you need… | Typical best-fit |
| --- | --- |
| A standard AI capability with minimal ML ops | AWS AI services (Translate/Transcribe/Rekognition, etc.) |
| A custom model with managed training + deployment | Amazon SageMaker |
| A foundation model / generative capability | Amazon Bedrock (when applicable) |
Rule: don’t overbuild. If an AWS managed AI service solves it, it usually wins on time-to-value and ops .
### Training and tuning (high yield)

- Training loop terms: epoch, step, batch size.
- Speedups: early stopping, distributed training.
- Generalization controls: regularization (L1/L2, dropout, weight decay) + better data/features.
- Hyperparameter tuning: random search vs Bayesian optimization; in SageMaker, use Automatic Model Tuning (AMT).

### Metrics picker (what to choose)

| Task | Common metrics | What the exam tries to trick you on |
| --- | --- | --- |
| Classification | Accuracy, precision, recall, F1, ROC-AUC | Class imbalance makes accuracy misleading |
| Regression | MAE/RMSE | Outliers and error cost (what matters more?) |
| Model selection | Metric + cost/latency | "Best" isn't only accuracy |
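The class-imbalance trap is easy to demonstrate with made-up counts: a model that always predicts the majority class scores high accuracy yet catches zero positives.

```python
# Confusion-matrix counts for an imbalanced set: 990 negatives, 10 positives,
# scored by a degenerate model that predicts "negative" for everything.
tp, fp, fn, tn = 0, 0, 10, 990

accuracy = (tp + tn) / (tp + tn + fp + fn)        # 0.99 -- looks great
recall = tp / (tp + fn) if (tp + fn) else 0.0     # 0.0  -- misses every positive
precision = tp / (tp + fp) if (tp + fp) else 0.0  # undefined -> treated as 0.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

This is exactly the trick the metrics table warns about: match the metric (often recall or F1 here) to the business cost of a miss, not to headline accuracy.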
### Overfitting vs underfitting (signals)

| Symptom | Likely issue | Typical fix |
| --- | --- | --- |
| Train ↑, validation ↓ | Overfitting | Regularization, simpler model, more data, better features |
| Both low | Underfitting | More expressive model, better features, tune hyperparameters |
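The signals table can be folded into a toy triage helper; the thresholds below are illustrative study-aid numbers, not exam facts:

```python
def diagnose(train_score, val_score, gap_tol=0.10, floor=0.70):
    """Classify a train/validation score pair (scores in [0, 1]).

    gap_tol and floor are arbitrary example thresholds; real cut-offs
    depend on the task and metric.
    """
    if train_score < floor and val_score < floor:
        return "underfitting"   # both low -> model too weak / poor features
    if train_score - val_score > gap_tol:
        return "overfitting"    # train high, validation lagging behind
    return "ok"

assert diagnose(0.99, 0.75) == "overfitting"   # big train/val gap
assert diagnose(0.60, 0.58) == "underfitting"  # both scores low
assert diagnose(0.88, 0.86) == "ok"            # close and high
```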
### Clarify vs Debugger vs Model Monitor (common confusion)

| Tool | What it helps with | When to name it |
| --- | --- | --- |
| SageMaker Clarify | Bias + explainability | Fairness questions, "why did it predict X?" |
| SageMaker Model Debugger | Training diagnostics + convergence | Training instability, loss not decreasing, debugging training |
| SageMaker Model Monitor | Production monitoring workflows | Drift, data quality degradation, monitoring baselines |
### Model Registry (repeatability + governance)

- Track: model artifacts, metrics, lineage, approvals.
- Enables safe promotion/rollback and audit-ready workflows.

## 4) Domain 3 — Deployment and orchestration (22%)

### Endpoint types (must-know picker)

| Endpoint type | Best for | Typical constraint |
| --- | --- | --- |
| Real-time | Steady, low-latency inference | Cost for always-on capacity |
| Serverless | Spiky traffic, scale-to-zero | Cold starts + limits |
| Asynchronous | Long inference times, bursty workloads | Event-style patterns + polling/callback |
| Batch inference | Scheduled/offline scoring | Not interactive |
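The endpoint picker can be drilled as a toy decision function. This is a study aid encoding common rules of thumb, not an AWS API, and the trait names are made up:

```python
def pick_endpoint(latency_sensitive, steady_traffic, long_running, interactive=True):
    """Map workload traits to a SageMaker inference mode (rule of thumb only)."""
    if not interactive:
        return "batch transform"         # offline/scheduled scoring
    if long_running:
        return "asynchronous endpoint"   # long inference times, bursty arrivals
    if latency_sensitive and steady_traffic:
        return "real-time endpoint"      # always-on capacity, lowest latency
    return "serverless endpoint"         # spiky traffic, can tolerate cold starts

assert pick_endpoint(True, True, False) == "real-time endpoint"
assert pick_endpoint(True, False, False) == "serverless endpoint"
assert pick_endpoint(False, False, True) == "asynchronous endpoint"
assert pick_endpoint(False, False, False, interactive=False) == "batch transform"
```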
### Scaling metrics (what to pick)

| Metric | Good when… | Watch out |
| --- | --- | --- |
| Invocations per instance | Request volume drives load | Spiky traffic can cause oscillation |
| Latency | You have a latency SLO | Noisy metrics require smoothing |
| CPU/GPU utilization | Compute-bound models | Not always correlated with request rate |
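As a sketch, a target-tracking policy on the invocations-per-instance metric looks roughly like the dict below, which mirrors what Application Auto Scaling's `put_scaling_policy` accepts. The policy name, target value, and cooldowns are arbitrary examples:

```python
# Illustrative target-tracking configuration for a SageMaker endpoint variant.
# In practice this dict is passed to the Application Auto Scaling API
# (put_scaling_policy) after registering the variant as a scalable target.
scaling_policy = {
    "PolicyName": "invocations-target-tracking",  # hypothetical name
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,  # desired invocations per instance (example value)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,  # slow scale-in to avoid oscillating on spiky traffic
        "ScaleOutCooldown": 60,  # faster scale-out to protect latency
    },
}
```

Note the asymmetric cooldowns: scaling out quickly protects the latency SLO, while scaling in slowly dampens the oscillation the table warns about.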
### Multi-model / multi-container (why they exist)

- Multi-model: multiple models behind one endpoint to reduce cost.
- Multi-container: pre/post-processing plus model serving, or multiple frameworks.

### IaC + containers (exam patterns)

- IaC: CloudFormation or CDK for reproducible environments.
- Containers: build/publish to ECR; deploy via SageMaker, ECS, or EKS.

### CI/CD for ML (what's different)

You version and validate more than code:

- Code + data + features + model artifacts + evaluation reports
- Promotion gates: accuracy thresholds, bias checks, smoke tests, canary/shadow validation
- Typical services: CodePipeline/CodeBuild/CodeDeploy, SageMaker Pipelines, EventBridge triggers
```mermaid
flowchart LR
    G["Git push"] --> CP["CodePipeline"]
    CP --> CB["CodeBuild: tests + build"]
    CB --> P["SageMaker Pipeline: process/train/eval"]
    P --> Gate{"Meets<br/>thresholds?"}
    Gate -->|yes| MR["Model Registry approve"]
    Gate -->|no| Stop["Stop + report"]
    MR --> Dep["Deploy (canary/shadow)"]
    Dep --> Mon["Monitor + rollback triggers"]
```
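The "Meets thresholds?" gate in that flow can be sketched as a simple check over an evaluation report; the metric names and threshold values here are hypothetical:

```python
def passes_gate(report, thresholds):
    """Return (passed, failures): every threshold must be met to promote."""
    failures = [
        name for name, minimum in thresholds.items()
        if report.get(name, 0.0) < minimum  # a missing metric counts as a failure
    ]
    return (not failures), failures

# Example evaluation report produced by the pipeline's evaluate step.
report = {"f1": 0.91, "recall": 0.84, "bias_check": 1.0}
thresholds = {"f1": 0.90, "recall": 0.80, "bias_check": 1.0}

passed, failures = passes_gate(report, thresholds)
# passed -> approve the version in the Model Registry; otherwise stop + report.
```

Keeping the gate as explicit, versioned code (rather than a human eyeballing a dashboard) is what makes the promotion auditable and repeatable.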
## 5) Domain 4 — Monitoring, cost, and security (24%)

### Monitoring and drift (high yield)

- Data drift: the input distribution changed.
- Concept drift: the relationship between inputs and labels changed.
- Use baselines + ongoing checks; monitor latency/errors too.

Common services/patterns:
- SageMaker Model Monitor for monitoring workflows.
- A/B testing or shadow deployments for safe comparison.

### Monitoring checklist (what to instrument)

- Inference quality: when ground truth becomes available later, compare predicted vs actual.
- Data quality: nulls, ranges, schema changes, category explosion.
- Distribution shift: feature histograms/summary stats vs baseline.
- Ops signals: p50/p95 latency, error rate, throttles, timeouts.
- Safety/security: anomalous traffic spikes, abuse patterns, permission failures.

### Infra + cost optimization (high yield)

| Theme | What to do |
| --- | --- |
| Observability | CloudWatch metrics/logs/alarms; Logs Insights; X-Ray for traces |
| Rightsizing | Pick instance family/size based on performance; use Inference Recommender + Compute Optimizer |
| Spend control | Tags + Cost Explorer + Budgets + Trusted Advisor |
| Purchasing options | Spot / Reserved / Savings Plans where the workload fits |
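The "feature histograms vs baseline" check is often implemented as a Population Stability Index. A minimal sketch follows; the bucket counts are made up, and the cut-offs in the docstring are a common industry heuristic, not a Model Monitor setting:

```python
import math

def psi(baseline_counts, current_counts, eps=1e-6):
    """Population Stability Index between two bucketed feature distributions.

    Common heuristic: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
    """
    b_total = sum(baseline_counts)
    c_total = sum(current_counts)
    total = 0.0
    for b, c in zip(baseline_counts, current_counts):
        p_b = max(b / b_total, eps)  # guard against empty buckets
        p_c = max(c / c_total, eps)
        total += (p_c - p_b) * math.log(p_c / p_b)
    return total

stable = psi([50, 30, 20], [500, 300, 200])  # same shape, more volume -> ~0
shifted = psi([50, 30, 20], [20, 30, 50])    # mass moved across buckets -> large PSI
```

Computed against a stored training-time baseline on a schedule, a value crossing the alert threshold is exactly the kind of signal that should trigger the retraining path in the pipeline.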
### Cost levers (common "best answer" patterns)

- Choose the right inference mode first: batch (cheapest) → async → serverless → real-time (most always-on).
- Right-size and auto scale; don't leave endpoints overprovisioned.
- Use Spot for fault-tolerant training/batch where interruptions are acceptable.
- Use Budgets + tags early (before the bill surprises you).

### Security defaults (high yield)

- Least-privilege IAM for training jobs, pipelines, and endpoints.
- Encrypt at rest + in transit (KMS + TLS).
- VPC isolation (subnets + security groups) for ML resources when required.
- Audit trails (CloudTrail) + controlled access to logs and artifacts.

### Common IAM/security "gotchas"

- Training role can read S3 but can't decrypt the KMS key (KMS key policy vs IAM policy mismatch).
- Endpoint role has broad S3 access ("*") instead of a tight prefix.
- Secrets leak into logs/artifacts (build logs, notebooks, environment variables).
- No audit trail for model registry approvals or endpoint updates.

## Next steps

- Use Resources to stay anchored to the official exam guide and SageMaker docs.
- Use the FAQ to confirm expected depth and where the exam is more engineering than data science.
- Turn weak deployment, monitoring, and security rows into timed scenario drills.