Reframe a Multiclass Classifier That Cannot Scale — Practice

A team at an e-commerce company built a multiclass classifier to predict "which product a user will buy next." The catalog has 50,000 SKUs and adds 500 new SKUs per week. They have spent 2 months training the model. The model output is a probability distribution over all 50,000 product classes.

The model now performs worse on new items: products added in the past 4 weeks are recommended significantly less often than older items. The team is also realizing that, at the current rate of catalog growth, the class count will make the model unmanageable within a year.
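To make the growth concern concrete, the arithmetic can be sketched as below. The SKU counts come from the scenario. The assumption that no SKUs are retired, the hidden size of 256, and the helper names are hypothetical, chosen only for illustration.

```python
# Figures taken from the scenario
INITIAL_SKUS = 50_000
NEW_SKUS_PER_WEEK = 500

# Hypothetical width of the layer feeding the softmax (not from the scenario)
HIDDEN_DIM = 256


def class_count(weeks: int) -> int:
    """Output classes after `weeks`, assuming no SKUs are ever retired."""
    return INITIAL_SKUS + NEW_SKUS_PER_WEEK * weeks


def output_layer_params(num_classes: int, hidden_dim: int) -> int:
    """Weights plus biases of a dense softmax head over `num_classes`."""
    return num_classes * hidden_dim + num_classes


after_year = class_count(52)
growth = after_year / INITIAL_SKUS - 1

print(f"classes after 52 weeks: {after_year}")        # 76000
print(f"relative growth in classes: {growth:.0%}")    # 52%
print(f"softmax head params today: {output_layer_params(INITIAL_SKUS, HIDDEN_DIM)}")
print(f"softmax head params in a year: {output_layer_params(after_year, HIDDEN_DIM)}")
```

Under these assumptions the output layer grows by roughly half in one year, and every new SKU is a class that has no trained weights until the model is retrained.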

What is wrong with the framing? Walk through: your preferred reframing, one reason it scales better, one tradeoff it introduces, and what the team should do with the 2 months of existing work.

Follow-up ladder

Rung 1: You reframe as a scoring model. The product team asks: "How do we recommend new products that have no interaction data?" How does your scoring approach handle this, and what does it require from the product catalog?
Rung 2: The scoring model is deployed. You notice that new products are now getting recommended, but the click-through rate on new product recommendations is 40% lower than on established products. What might explain this, and is it a model problem or a framing problem?
Rung 3: The catalog team says they cannot guarantee high-quality feature data for new SKUs — titles, descriptions, and category tags are often incomplete for the first 2 weeks after a product is added. How does this affect your approach?
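As a reference point for the rungs above, here is a minimal sketch of what "reframe as a scoring model" could look like, assuming items are represented by content-feature vectors rather than class indices. Everything here is hypothetical: the dimensions, the random "learned" projection, and the function names stand in for whatever the team would actually train. It is one possible shape of the idea, not a recommended design.

```python
import numpy as np

rng = np.random.default_rng(0)

FEATURE_DIM = 8  # hypothetical size of an item's content-feature vector
EMBED_DIM = 4    # hypothetical size of the shared user/item embedding

# Stand-in for learned parameters; random values for illustration only.
item_projection = rng.normal(size=(FEATURE_DIM, EMBED_DIM))


def score(user_embedding: np.ndarray, item_features: np.ndarray) -> np.ndarray:
    """Score each item (one row of features per item) for one user via a dot product."""
    item_embeddings = item_features @ item_projection
    return item_embeddings @ user_embedding


def top_k(user_embedding: np.ndarray, item_features: np.ndarray, k: int) -> np.ndarray:
    """Row indices of the k highest-scoring items."""
    return np.argsort(-score(user_embedding, item_features))[:k]


user = rng.normal(size=EMBED_DIM)
catalog = rng.normal(size=(5, FEATURE_DIM))

# Adding a SKU appends a feature row; model parameters keep the same shape.
new_sku = rng.normal(size=(1, FEATURE_DIM))
catalog_plus = np.vstack([catalog, new_sku])

scores_before = score(user, catalog)
scores_after = score(user, catalog_plus)

# A SKU whose features are all missing (encoded here as zeros) gets a score of 0
# for every user, which carries no information about the product.
empty_sku_score = score(user, np.zeros((1, FEATURE_DIM)))[0]
```

In this sketch the parameter count does not depend on catalog size, which is the scaling argument, but the score for a new SKU is only as good as its feature row, which is the dependency Rung 1 and Rung 3 probe.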