Seems mostly to posit that AI will improve quickly once it’s unleashed on an environment that lets it judge its own reinforcement learning feedback, versus the mostly text-based human responses it has learned from up to this point.
My opinion is that there are probably a few companies capable of building this, but the jury is still out on whether they will succeed meaningfully, and I doubt it will keep using LLMs. Half the time I find Cursor agents work themselves into error loops, but I realize this is a consumer product that is likely losing money, versus a research project sponsored by companies with massive capital.
This is the paper the article is written about: https://storage.googleapis.com/deepmind-media/Era-of-Experie...