AI Agents That Can Literally Use Your Computer

A bit about AI
The day before yesterday OpenAI introduced a new toolkit for building AI agents, and today I finally got around to reading the release notes to see what is actually in it.
And there is a lot:
First, a convenient SDK for building your own AI agents.
Second, a set of built-in tools that leaves a developer's imagination with almost no limits, namely:
1. Calling predefined functions
2. Web search
3. File search
4. Using the computer (!)
The fourth point is where it gets serious, if only because the 4o model's 'vision' can now be used to perform actions in the OS or in the browser on the user's behalf. How it works: 4o looks at a screenshot of whatever is on the computer screen → based on the context it picks the required action (click, text input, scroll, wait, key press, etc.) → the action is carried out on the machine → a fresh screenshot goes back to the model, which either picks the next step or ends the loop (the diagram in the screenshot shows exactly how this works).
It is absolutely mind-blowing. Just imagine: AI can now, at least in principle, do on your computer ABSOLUTELY EVERYTHING you can do.
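To make the loop above a bit more concrete, here is a minimal sketch of it in Python. Nothing in it is the real OpenAI API: `FakeDesktop`, `scripted_model`, `Action` and `run_agent` are names I made up for illustration, the "screenshot" is just a string, and the "model" is a hard-coded function. It only shows the shape of the cycle: screenshot → action → execute → new screenshot, until the model says it is done.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    # Hypothetical action format: "click", "type", "scroll", "wait",
    # "keypress", or "done" to end the loop.
    kind: str
    payload: dict = field(default_factory=dict)


class FakeDesktop:
    """Stand-in for a real OS or browser: one text field and one submit button."""

    def __init__(self) -> None:
        self.text = ""
        self.submitted = False

    def screenshot(self) -> str:
        # A real agent would capture pixels; a string is enough for this sketch.
        return f"text={self.text!r} submitted={self.submitted}"

    def execute(self, action: Action) -> None:
        if action.kind == "type":
            self.text += action.payload["text"]
        elif action.kind == "click" and action.payload.get("target") == "submit":
            self.submitted = True
        # Other action kinds are ignored in this toy environment.


def scripted_model(goal: str, screenshot: str) -> Action:
    """Hard-coded stand-in for the model: decides the next step from the screen."""
    if "submitted=True" in screenshot:
        return Action("done")
    if f"text={goal!r}" not in screenshot:
        return Action("type", {"text": goal})
    return Action("click", {"target": "submit"})


def run_agent(
    goal: str,
    desktop: FakeDesktop,
    model: Callable[[str, str], Action],
    max_steps: int = 10,
) -> List[str]:
    """Run the look → act → look-again loop and return the action kinds taken."""
    history: List[str] = []
    shot = desktop.screenshot()
    for _ in range(max_steps):  # hard cap so a confused model cannot loop forever
        action = model(goal, shot)
        history.append(action.kind)
        if action.kind == "done":
            break
        desktop.execute(action)
        shot = desktop.screenshot()  # fresh screenshot for the next decision
    return history


if __name__ == "__main__":
    desktop = FakeDesktop()
    print(run_agent("hello", desktop, scripted_model))
    print(desktop.screenshot())
```

The `max_steps` cap is my own addition: when something acts on your machine on your behalf, a hard limit on the number of steps seems like a sensible default.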
We live in interesting times.