ML System Design·Hard

Design a Feature Store for a Machine Learning Platform

Asked at Uber, Airbnb, DoorDash

Your organization has 50+ ML models in production, each consuming dozens to hundreds of features. Different teams compute features independently using ad-hoc pipelines, leading to duplicated computation, inconsistent feature definitions between training and serving, and difficulty debugging model predictions. You have been asked to design a centralized feature store.

Scale Requirements

  • 1,000+ features registered across all teams
  • 100,000 QPS peak online serving with p99 latency under 10ms
  • 10 TB/day of new feature data ingested from streaming and batch sources
  • Features span multiple domains: user features, content features, real-time context, and embeddings
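A quick back-of-envelope pass over these numbers can help frame the design. The sketch below converts the stated ingest volume into an average rate and estimates peak feature reads per second. The value of `FEATURES_PER_REQUEST` is an assumption chosen for illustration (the prompt only says models consume dozens to hundreds of features), and the terabyte is taken as decimal (10^12 bytes).

```python
# Back-of-envelope sizing from the stated scale requirements.
# FEATURES_PER_REQUEST is an assumed value, not a figure from the prompt.

SECONDS_PER_DAY = 24 * 60 * 60

INGEST_BYTES_PER_DAY = 10 * 10**12  # 10 TB/day, decimal terabytes (assumption)
PEAK_QPS = 100_000                  # stated peak online serving load
FEATURES_PER_REQUEST = 100          # assumption: "dozens to hundreds" per model


def average_ingest_mb_per_s(bytes_per_day: int) -> float:
    """Average ingest rate in MB/s if data arrived evenly over the day."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6


def peak_feature_reads_per_s(qps: int, features_per_request: int) -> int:
    """Individual feature values read per second at peak load."""
    return qps * features_per_request


if __name__ == "__main__":
    rate = average_ingest_mb_per_s(INGEST_BYTES_PER_DAY)
    reads = peak_feature_reads_per_s(PEAK_QPS, FEATURES_PER_REQUEST)
    print(f"average ingest: {rate:.2f} MB/s")
    print(f"peak feature reads: {reads:,} per second")
```

Real ingest is rarely uniform, so the average rate is a lower bound on what the streaming and batch paths need to absorb. The feature-read estimate suggests why per-request lookups are usually grouped by entity key rather than issued one feature at a time.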

Design Requirements

  1. Design the overall architecture including both online serving and offline training paths.
  2. Explain how you guarantee consistency between features used at training time and serving time.
  3. Describe the feature registration, discovery, and governance model.
  4. Explain how you generate point-in-time correct training data.
  5. Discuss monitoring: how do you detect feature drift and serving anomalies?
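For requirement 4, point-in-time correctness is commonly implemented as an "as-of" join: each training label is paired with the latest feature value whose timestamp is at or before the label's timestamp, so no future information leaks into training. The following is a minimal in-memory sketch of that idea using only the standard library. The function names, the tuple layout, and the integer timestamps are all hypothetical choices for illustration; a production system would run the same logic in a distributed batch engine.

```python
from bisect import bisect_right
from collections import defaultdict


def build_history(feature_rows):
    """Group (entity_id, event_time, value) rows by entity, sorted by time."""
    history = defaultdict(list)
    for entity_id, event_time, value in feature_rows:
        history[entity_id].append((event_time, value))
    for rows in history.values():
        rows.sort(key=lambda row: row[0])
    return history


def point_in_time_value(history, entity_id, label_time):
    """Return the latest value with event_time <= label_time, or None."""
    rows = history.get(entity_id, [])
    times = [event_time for event_time, _ in rows]
    # bisect_right counts how many rows have event_time <= label_time.
    idx = bisect_right(times, label_time)
    return rows[idx - 1][1] if idx else None


def build_training_rows(feature_rows, labels):
    """Attach a point-in-time feature value to each (entity_id, label_time)."""
    history = build_history(feature_rows)
    return [
        (entity_id, label_time,
         point_in_time_value(history, entity_id, label_time))
        for entity_id, label_time in labels
    ]


if __name__ == "__main__":
    features = [("u1", 100, 3), ("u1", 200, 5), ("u1", 300, 9), ("u2", 150, 1)]
    labels = [("u1", 250), ("u1", 50), ("u2", 150)]
    for row in build_training_rows(features, labels):
        print(row)
```

In this sketch a feature value stamped exactly at the label time is treated as available. Whether that boundary should be inclusive, and whether values older than some maximum age should be discarded, are design choices worth stating explicitly in an answer.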
