Tech Abstractions
MLOps·Applied Reasoning·Medium

Score a Proxy Label on Five Dimensions

Asked at Meta, TikTok, Pinterest

A product team at a social platform wants to use click-through rate (CTR) as the proxy label for a content recommendation system. The true outcome they care about is "content that users find genuinely valuable." They have abundant CTR signal — tens of millions of events per day.

Score CTR as a proxy on five dimensions:

  1. Alignment: how close is it to the real outcome?
  2. Gaming resistance: does optimizing it encourage harmful behavior?
  3. Coverage: do you get enough signal across important segments?
  4. Latency: how quickly does the label arrive?
  5. Causal usefulness: if CTR improves, does the business KPI move?

Then identify the single biggest weakness and propose a more aligned alternative.
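The alignment dimension can be made concrete if the team has a small human-audited sample where each item carries a binary "valuable" judgment. One option is to measure how often CTR ranks a valuable item above a non-valuable one (an AUC). This is a minimal sketch under that assumption. The function name `proxy_auc` and the sample numbers are hypothetical and for illustration only.

```python
from typing import Sequence


def proxy_auc(proxy_scores: Sequence[float], gold_labels: Sequence[int]) -> float:
    """Probability that a randomly chosen gold-positive item scores higher on the
    proxy than a randomly chosen gold-negative item (ties count as half).

    0.5 means the proxy carries no ranking information about the gold label;
    1.0 means every valuable item outranks every non-valuable one.
    """
    if len(proxy_scores) != len(gold_labels):
        raise ValueError("proxy_scores and gold_labels must have the same length")
    positives = [s for s, y in zip(proxy_scores, gold_labels) if y == 1]
    negatives = [s for s, y in zip(proxy_scores, gold_labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one valuable and one non-valuable item")
    wins = 0.0
    # Compare every (valuable, non-valuable) pair on the proxy score.
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical audit sample: per-item CTR and a human "valuable" judgment (1 = valuable).
ctr = [0.12, 0.30, 0.05, 0.22, 0.18, 0.09]
valuable = [1, 0, 0, 1, 0, 1]
print(round(proxy_auc(ctr, valuable), 3))  # prints 0.444
```

In this toy sample, the highest-CTR item (0.30) was judged not valuable. That pulls the AUC below 0.5, which mimics the clickbait pattern an alignment check is meant to catch. The nested loop is O(n·m). That is fine for an audit sample of a few thousand items, but larger samples would call for a sort-based rank computation.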

Follow-up ladder

  1. Rung 1: The team proposes adding "dwell time" (seconds spent on the content) to CTR to create a combined proxy. Does this fix the alignment problem? What new risks does it introduce?
  2. Rung 2: After 6 months of optimizing for the combined CTR + dwell proxy, the team notices that creator diversity has dropped — the top 50 creators now account for 80% of impressions, up from 60%. Is this a proxy failure? What would you investigate?
  3. Rung 3: The team wants to add an explicit quality signal — users can rate content as "valuable" or "not valuable." Adoption is low: only 2% of users who complete a piece of content rate it. Is this worth using? How do you handle the coverage problem?
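For the Rung 2 investigation, a reasonable first step is to measure concentration per time window rather than rely on a single headline number. The sketch below assumes impressions can be aggregated per creator for a window. The function names and counts are hypothetical. It computes the top-k share used in Rung 2. As a complement that covers the whole distribution, it also computes the Herfindahl-Hirschman index (HHI).

```python
from typing import Mapping


def top_k_share(impressions_by_creator: Mapping[str, int], k: int) -> float:
    """Fraction of all impressions that went to the k most-shown creators."""
    total = sum(impressions_by_creator.values())
    if total == 0:
        raise ValueError("no impressions in this window")
    top_k = sorted(impressions_by_creator.values(), reverse=True)[:k]
    return sum(top_k) / total


def herfindahl_index(impressions_by_creator: Mapping[str, int]) -> float:
    """Sum of squared impression shares: 1/N for N equally shown creators,
    1.0 if a single creator receives every impression."""
    total = sum(impressions_by_creator.values())
    if total == 0:
        raise ValueError("no impressions in this window")
    return sum((count / total) ** 2 for count in impressions_by_creator.values())


# Hypothetical per-creator impression counts for one window.
window = {"creator_a": 500, "creator_b": 300, "creator_c": 100, "creator_d": 60, "creator_e": 40}
print(top_k_share(window, k=2))  # prints 0.8
print(round(herfindahl_index(window), 4))
```

Compare these values for windows before and after the proxy change, and separately for each user segment. This shows whether the concentration is broad or confined to a few segments. That is one input to deciding whether the proxy, rather than a shift in user interest, is driving it.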

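For Rung 3, sparse explicit labels raise a specific risk: items with only two or three ratings can dominate. One standard way to limit this is to shrink each item's observed "valuable" rate toward a prior (Beta-Binomial smoothing). This is a minimal sketch. It assumes ratings are aggregated per item and that the team chooses the prior mean and strength, for example by estimating them from items with many ratings. `ItemRatings`, `smoothed_valuable_rate`, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ItemRatings:
    valuable: int      # explicit "valuable" ratings
    not_valuable: int  # explicit "not valuable" ratings


def smoothed_valuable_rate(ratings: ItemRatings, prior_mean: float, prior_strength: float) -> float:
    """Posterior mean of an item's "valuable" rate under a Beta prior.

    The prior is Beta(alpha, beta) with alpha = prior_mean * prior_strength and
    beta = (1 - prior_mean) * prior_strength, so prior_strength behaves like a
    count of pseudo-ratings. Sparsely rated items stay near prior_mean; heavily
    rated items move toward their observed rate.
    """
    if not 0.0 < prior_mean < 1.0:
        raise ValueError("prior_mean must be strictly between 0 and 1")
    if prior_strength <= 0:
        raise ValueError("prior_strength must be positive")
    alpha = prior_mean * prior_strength
    beta = (1.0 - prior_mean) * prior_strength
    n = ratings.valuable + ratings.not_valuable
    return (ratings.valuable + alpha) / (n + alpha + beta)


# Hypothetical items: one with 2 ratings (raw rate 1.0), one with 200 (raw rate 0.75).
sparse = ItemRatings(valuable=2, not_valuable=0)
dense = ItemRatings(valuable=150, not_valuable=50)
print(smoothed_valuable_rate(sparse, prior_mean=0.6, prior_strength=20))
print(smoothed_valuable_rate(dense, prior_mean=0.6, prior_strength=20))
```

With these settings, the two-rating item lands near 0.64 and the 200-rating item near 0.74. The heavily rated item therefore now ranks higher. Smoothing addresses sparsity only and does not correct selection bias. The 2% who rate may differ systematically from users who do not, so the rater population still needs to be compared with the overall population, segment by segment, before the signal is trusted.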