Reinforcement Learning Progress
Today, OpenAI released a new result. We used PPO (Proximal Policy Optimization), a general reinforcement learning algorithm invented by OpenAI, to train a team of 5 agents to play Dota and beat semi-pros....
Original source
Read the original post on Sam Altman's blog
- Published: 2018-06-25
- Updated: 2026-04-25
- Words: 212
- Reading time: 1 min
https://blog.samaltman.com/reinforcement-learning-progress