Tech Abstractions
MLOps·ML System Design·Easy

Design an ML Feature Engineering Platform

Asked at Airbnb, Uber, Netflix

Your ML organization has 30+ data scientists who spend 60% of their time on feature engineering — writing ad-hoc SQL queries and Python scripts to compute features from raw data, then copy-pasting feature computation logic between training and serving code. Design a feature engineering platform that makes feature creation, sharing, and serving systematic and reusable.

Scale Requirements

  • Support 500+ features across all teams, growing 30% quarterly
  • Feature computation runs on 10TB/day of raw data
  • Online feature serving: 50,000 QPS with p99 latency under 5ms
  • Batch feature serving: generate training datasets up to 1TB in under 30 minutes
  • Features must be discoverable across teams to reduce duplication

Design Requirements

  1. Design the feature computation pipeline — how do data scientists define and schedule features?
  2. Design the feature registry for discovery, documentation, and governance.
  3. Design the dual serving layer — online (low-latency) and offline (high-throughput).
  4. Explain how you ensure feature consistency between training and serving.
  5. Describe how you would drive platform adoption across 30+ data scientists.

Your Answer

Unlock AI-powered scoring, all questions, and progress tracking.

Study the related chapter →