
Reinforcement Learning Progress

Today, OpenAI released a new result. We used PPO (Proximal Policy Optimization), a general reinforcement learning algorithm invented by OpenAI, to train a team of 5 agents to play Dota and beat semi-pros....

Original source

Read the original post on Sam Altman's blog
Published: 2018-06-25
Updated: 2026-04-25
Words: 212
Reading time: 1 min

https://blog.samaltman.com/reinforcement-learning-progress
