Microsoft finds a way to have AIs powerfully self-improve in math reasoning

If they can do this for math, why can't they do it for general reasoning?

https://youtu.be/Bhoy_arJvaE?si=OLomRfCVUguhx3rx