
Roman Grossi • Founder

Indie hacking, startups, resilient systems - and staying sane while building a small company


AI Agents That Can Literally Use Your Computer


Telegram image

A bit about AI

The day before yesterday OpenAI introduced a new toolkit for building AI agents, and today I finally got around to reading the release notes and the description of what is actually there.

And there is a lot:

First, a convenient SDK for building your own AI agents.

Second, a set of tools that removes almost every limit on a developer's imagination, namely:

1. Calling predefined functions

2. Web search

3. File search

4. Using the computer (!)

The fourth point is where it gets serious, if only because the 4o model's 'vision' can now be used to perform actions in the OS or in the browser on the user's behalf.

How it works: the model looks at whatever is on the screen → based on the context, it picks the required action (click, text input, scroll, wait, key press, etc.), which the application then carries out → a new screenshot is sent back to the model, either for the next step or to finish the loop. The diagram in the screenshot shows exactly how this works.

It is absolutely mind-blowing. Just imagine: AI can now do ABSOLUTELY EVERYTHING on your computer that you can do.
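To make the loop more concrete, here is a minimal Python sketch of the "look at the screen → act → send a new screenshot" cycle. It is not OpenAI's SDK and does not call any real API: `Action`, `FakeScreen` and `decide_next_action` are hypothetical stand-ins, assuming a toy screen with a single search box and a hard-coded policy in place of the vision model, so the sketch runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One step the model asks the application to perform (hypothetical shape)."""
    kind: str  # "click", "type", "scroll", "wait", "keypress", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a real OS or browser: one search box on screen."""
    search_box: str = ""
    focused: bool = False
    log: list[str] = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would send image bytes; here the "screenshot" is the visible state.
        return {"focused": self.focused, "search_box": self.search_box}

    def perform(self, action: Action) -> None:
        # The application, not the model, executes the chosen action.
        self.log.append(action.kind)
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.search_box += action.text
        # scroll / wait / keypress do nothing in this toy environment


def decide_next_action(screenshot: dict, goal: str) -> Action:
    """Hypothetical policy standing in for the vision model: inspect the screen, pick a step."""
    if not screenshot["focused"]:
        return Action("click", x=100, y=40)
    if screenshot["search_box"] != goal:
        return Action("type", text=goal)
    return Action("done")


def run_agent(goal: str, max_steps: int = 10) -> FakeScreen:
    screen = FakeScreen()
    shot = screen.screenshot()
    for _ in range(max_steps):                  # cap the loop so it always terminates
        action = decide_next_action(shot, goal)  # model looks at the current screenshot
        if action.kind == "done":                # model decides the task is finished
            break
        screen.perform(action)                   # application carries out the action
        shot = screen.screenshot()               # fresh screenshot goes back to the model
    return screen


screen = run_agent("cheap flights to Lisbon")
print(screen.log)         # ['click', 'type']
print(screen.search_box)  # cheap flights to Lisbon
```

The `max_steps` cap is a design choice worth keeping in any real agent loop: if the model never decides it is "done", the loop still ends instead of clicking around forever.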

We live in interesting times.
