I'm a product manager at a platform company in Shanghai and I've been working in China tech for over a decade.
I'll comment on the focus on optimizing KPIs for your upcoming performance review. We have two main flows, pre-purchase and post-purchase. Pre-purchase, we look at optimizing conversion rates. Post-purchase, we look at decreasing customer service contacts, either by providing self-service functionality or by fixing the source of the contacts.
We have a domestic product and one for international markets. For international, the conversion rate is between 3% and 9% depending on market, market maturity, and business line.
What our product managers do is run experiments hoping to increase the 3 to a higher number. Every week we have thousands of A/B tests running, and some of them end up as "statistically significant" with a 1% relative increase over baseline, i.e., from 3% to 3.03%.
The whole idea -- and trust me, there are no other ideas here -- is to randomly perform experiments until the 3 turns into a 9.
It never will, for many reasons.
First, these experiments disregard effect size. A 1% relative increase has no practical significance. It is noise.
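To see why those "wins" keep showing up anyway, here is a back-of-the-envelope sketch (my own illustrative numbers and the textbook two-proportion formula, not our actual experiment setup): detecting a 3.00%-to-3.03% lift only takes a few million users per arm, which a large platform gets every week.

```python
# Rough sample-size estimate for detecting a 1% relative lift
# (3.00% -> 3.03% conversion) with a two-proportion z-test.
# Illustrative numbers only, not a real production setup.
p1, p2 = 0.0300, 0.0303        # baseline and "improved" conversion rates
alpha_z, power_z = 1.96, 0.84  # z-values for alpha=0.05 (two-sided), 80% power

variance = p1 * (1 - p1) + p2 * (1 - p2)
n_per_arm = (alpha_z + power_z) ** 2 * variance / (p1 - p2) ** 2

print(f"users needed per arm: {n_per_arm:,.0f}")   # roughly 5 million
# With tens of millions of weekly sessions, effects this small are
# routinely "statistically significant", which says nothing about
# whether they matter.
```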
Science has a reproducibility crisis, and it has also had a crisis of overfixation on NHST (null hypothesis significance testing) as a method of testing and "proving" hypotheses. As we know, few scientific results are reproducible by other researchers, especially in the social sciences; the replication rate for many findings is low. The same goes for these A/B tests: they are never tested for reproducibility, and most of them are not reproducible.
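If you want to see what that looks like, here is a toy simulation I would sketch it with (assuming, purely for illustration, that the true effect of every experiment is zero): run a couple of thousand A/B tests, keep the p < 0.05 "winners", and rerun them. Almost none replicate.

```python
# Toy simulation: thousands of A/B tests where the true effect is zero.
# "Winners" are false positives; rerunning them shows how few replicate.
import numpy as np

rng = np.random.default_rng(0)
n_tests, n_users, p = 2000, 200_000, 0.03  # illustrative numbers

def is_significant():
    # Two-proportion z-test at alpha = 0.05 (two-sided), true lift = 0.
    a = rng.binomial(n_users, p) / n_users
    b = rng.binomial(n_users, p) / n_users
    pooled = (a + b) / 2
    se = np.sqrt(2 * pooled * (1 - pooled) / n_users)
    return abs(a - b) / se > 1.96

winners = sum(is_significant() for _ in range(n_tests))
replicated = sum(is_significant() for _ in range(winners))

print(f"'significant' results: {winners} of {n_tests} (about 5%, by construction)")
print(f"replicated on rerun:   {replicated} of {winners} (again about 5%)")
```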
So the idea that results stack is clearly false in theory. It is also empirically false, because otherwise conversion rates wouldn't still be around 5% after a decade of running thousands of experiments per week; mathematically, they would be approaching 100% by now. They aren't. Product management isn't math. It isn't a hard science. It is a social science and has the same issues and challenges.
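The stacking arithmetic is easy to check. A rough sketch with assumed numbers (one genuine +1% relative win per week, far fewer than the number of "significant" results actually shipped):

```python
# If even one 1% relative lift per week were real and stacked
# multiplicatively, a 3% baseline would saturate long before a decade.
baseline = 0.03
weekly_lift = 1.01          # one genuine +1% relative win per week (assumed)
weeks = 52 * 10             # a decade

compounded = baseline * weekly_lift ** weeks
print(f"naive compounded rate: {compounded:.2f}")          # about 5.3, i.e. 530%
print(f"capped at 100%:        {min(compounded, 1.0):.0%}")
# Real conversion is still in the 3-9% band, so the wins are not stacking.
```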
You may have noticed that the same product converts at 9% in some markets and 3% in other markets. Nobody can figure out why. That's because it has nothing to do with the user experience, which is equally poor in every market (essentially like taking the Meituan app, translating it to English, launching it in Europe, and wondering why it doesn't work). It has to do with brand awareness and attitude. Our users aren't robots who click on buttons at pre-defined rates whenever presented with them - they actually think for themselves! A terrible problem for an internet product manager (互联网产品经理). I wrote about this here: https://dilemmaworks.substack.com/p/brand-awareness-is-the-most-overlooked
We do user research and every problem with the user experience has been documented, across several markets.
They still don't get fixed, though, because we're focused on "quick iterations" for 1% relative improvements. And hey, we A/B tested this experience before, so it must be fine. The research is wrong. Listen to the data. Never understanding that we're at a local maximum, not a global one.
We make an active decision to not listen to users, to not listen to local market reps. Our product development is a random walk, where HQ comes up with and tests random ideas and selects whichever are "statistically significant". And that's the product.