
AI Agents Explained: How They Work and How to Build One in 2026

Sanjay Tarani · 12 May 2026 · 11 min read

Sanjay Tarani · Product Designer, Sydney


The AI landscape just shifted, and most people are still playing in the old version. Chat models had their moment. The next stage belongs to agents.

If you have been using ChatGPT or Claude to draft emails, summarise documents, or research ideas, you have only touched a small part of what AI can do. Agents change the contract entirely. Instead of you typing a question and waiting for a reply, you hand the agent a goal and it plans, acts, and delivers the result.

I recently came across a clear breakdown by Remy Gasill on Greg Isenberg's podcast, and a lot of it lined up with how I have been quietly building my own workflow over the past few months. This guide takes those concepts and lays them out for product designers, founders, and anyone who has been hearing the word "agents" thrown around without a real explanation of what is happening underneath.

By the end of this post, you will understand the agent loop, the components that make up an agent, what MCPs and skills actually are, and how to start building your own AI operating system that runs the manual parts of your work.

TL;DR: What You Actually Need to Know About AI Agents

Concept	What it is
Chat vs Agent	Chat is question to answer. Agent is goal to result.
The agent loop	Observe, think, act. Repeats until the goal is done.
Agent harness	The platform that runs the loop (Claude Code, Codex, Manus, Cowork).
Context file	A markdown file (`agents.md` or `claude.md`) that loads your business context into every session.
Memory file	A markdown file the agent updates over time so it remembers your preferences.
MCP	The standard that lets an agent talk to your tools (Gmail, Notion, Stripe).
Skill	A reusable SOP that captures a process so you only explain it once.
AI operating system	A folder structure of context, memory, MCPs, and skills that runs like a team.

If you only take one thing from this guide, start with a single context file and a single skill for a task you repeat every week. Everything else compounds from there.

Chat Models vs AI Agents

The simplest way to keep these straight in your head is this:

Chat models answer a question. You ask, they reply, you do the work.
AI agents complete a goal. You give them a task, they plan it, they execute it, and they hand you the result.

Chat is ping-pong. Agents are autonomous. When you tell an agent "build me a portfolio site", it does not just describe how to build one. It researches you, drafts a plan, writes the code, hosts a preview, and screenshots its own output to check the job is done.

That shift from "I ask, it answers" to "I delegate, it delivers" is the entire reason agents matter. People using agents well are working five to ten times faster than people still living in chat.

The Agent Loop: Observe, Think, Act

Every agent runs the same three-step loop under the hood:

Observe. Load the prompt, context files, prior steps, and any tool results into memory.
Think. Decide what the next step should be.
Act. Execute that step, then feed the result back into the observe stage.

The loop repeats until the agent decides the task is complete based on the parameters you set. Ask for "ten sources compiled into a slide deck" and it will keep looping until it has those ten sources and that deck.

This is also why agents feel different from chat. You are not babysitting one reply. You are watching a process unfold.
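To make the loop concrete, here is a minimal sketch in Python. It is not how any particular harness is implemented: `think` is a hard-coded stand-in for the model, `find_source` is a hypothetical tool, and the `max_steps` cap is an assumption added so the example always stops.

```python
from typing import Callable, Optional


def find_source(n: int) -> str:
    """Hypothetical tool: a real harness would search the web or read a file."""
    return f"source-{n}"


TOOLS: dict[str, Callable[[int], str]] = {"find_source": find_source}


def think(observations: list[str], target: int) -> Optional[str]:
    """Stand-in for the model: name the next tool, or None once the goal is met."""
    return "find_source" if len(observations) < target else None


def run_agent(target: int, max_steps: int = 50) -> list[str]:
    observations: list[str] = []  # everything the agent has gathered so far
    for _ in range(max_steps):  # safety cap so a confused agent cannot loop forever
        action = think(observations, target)  # Think: decide the next step
        if action is None:  # the goal is met, so stop
            break
        result = TOOLS[action](len(observations) + 1)  # Act: run the chosen tool
        observations.append(result)  # Observe: feed the result into the next pass
    return observations


sources = run_agent(target=10)
print(len(sources), sources[0], sources[-1])
```

The "ten sources" goal from above becomes the stopping condition inside `think`. In a real agent that decision is made by the model, not by a length check.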

The Four Components of an Agent

Strip away the marketing and an AI agent is four things stacked together:

The model (LLM). The brain. Claude Opus, GPT-5, Gemini, whichever you choose.
The loop. The thing that keeps the agent going until the goal is done.
The tools. Connections to the apps you actually use, like Gmail, Notion, or Stripe.
The context. The information about you, your business, and your preferences.

The platform that wires all of this together is called an agent harness. Claude Code, Codex, Manus, Cowork, and Google Antigravity are all just different harnesses running the same fundamental loop. Learn the underlying concepts and you can move between them in an afternoon.

Step One: Give Your Agent Context With an `agents.md` File

When you start a fresh chat in ChatGPT and ask "what do I do?", the model already knows a scary amount about you. That is automatic memory in the cloud. With agents, context is something you set up and control yourself.

The standard way to do this is a markdown file in your project folder.

In Claude Code, it is called claude.md.
In Codex or OpenClaw, it is called agents.md.
In Gemini CLI, it is gemini.md.

Same idea, same content, different filename (the tools' own docs write these in capitals: CLAUDE.md, AGENTS.md, GEMINI.md). Every new session, the agent reads this file before answering. Inside it you put things like:

Who you are and what your business does
Who your customers are
What tools you use and what you use them for
Your tone, brand voice, and working preferences

Once that file is in place, prompts can be ridiculously simple. "Write a cold email" goes from useless to immediately on-brand because the agent already knows who you sell to and how you talk.

This is the shift from prompt engineering to context engineering. The quality of your output is now mostly a function of how well you have briefed the agent, not how cleverly you have phrased the prompt.
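As a rough illustration of what "the agent reads this file before answering" means, the sketch below prepends a context file to a short prompt. The function names and the sample business details are made up for the example, and real harnesses do this loading for you.

```python
from pathlib import Path
import tempfile

# Filenames named in this guide. Some harnesses expect capitals, so check yours.
CONTEXT_FILENAMES = ("agents.md", "claude.md", "gemini.md")


def load_context(project_dir: Path) -> str:
    """Return the text of the first context file found, or an empty string."""
    for name in CONTEXT_FILENAMES:
        path = project_dir / name
        if path.is_file():
            return path.read_text(encoding="utf-8")
    return ""


def build_prompt(project_dir: Path, user_prompt: str) -> str:
    """Prepend the context file so a short prompt still carries the full brief."""
    context = load_context(project_dir)
    return f"{context}\n\n{user_prompt}" if context else user_prompt


# Usage with a throwaway folder and made-up business details.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "agents.md").write_text(
        "We sell design audits to SaaS founders. Tone: warm and direct.",
        encoding="utf-8",
    )
    print(build_prompt(project, "Write a cold email"))
```

The three-word prompt reaches the model with the whole brief attached, which is why the output can be on-brand without a long prompt.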

Step Two: Add a `memory.md` File So It Remembers You

Here is the catch with context files. They are static. If you tell the agent "never sign emails with cheers, use warm regards", it will say "got it" and then forget the next day.

The fix is a second file: memory.md. You instruct the main context file to read memory.md at the start of every session and to update it whenever you correct it or it learns something new. A simple snippet at the top of agents.md is enough:

Read all files in /context.
Read memory.md. This is what you have learned over time.
When I correct you or you learn something new, update the relevant section in memory.md.
Keep memory.md current. When something changes, update it in place.

Now corrections actually stick. Tell it once to keep emails casual, and casual becomes the default in every session afterwards. Over weeks and months, this compounds. Errors drop. Output gets sharper. Your agent starts to feel like an employee who actually pays attention.

Some harnesses now bake this in automatically. Knowing how it works under the hood still matters, because you can move the same files between platforms.
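The phrase "update it in place" is the important part: a correction should overwrite the old preference, not pile up next to it. Here is a small sketch of that behaviour, assuming a hypothetical memory.md layout of `## Section` headings with `- key: value` lines. In practice the agent edits the file itself, so treat this as an illustration of the idea and not as something you need to run.

```python
def remember(memory: str, section: str, key: str, value: str) -> str:
    """Return memory.md text with `key` set under `## section`, updated in place."""
    lines = memory.splitlines()
    heading = f"## {section}"
    entry = f"- {key}: {value}"

    if heading not in lines:  # unknown section: append it at the end
        return "\n".join(lines + ["", heading, entry]).strip() + "\n"

    start = lines.index(heading) + 1
    end = start
    while end < len(lines) and not lines[end].startswith("## "):
        end += 1  # `end` stops at the next section heading (or end of file)

    for i in range(start, end):
        if lines[i].startswith(f"- {key}:"):  # existing entry: overwrite it
            lines[i] = entry
            break
    else:
        insert_at = end
        while insert_at > start and not lines[insert_at - 1].strip():
            insert_at -= 1  # skip trailing blank lines inside the section
        lines.insert(insert_at, entry)  # new entry: add it to the section
    return "\n".join(lines) + "\n"


memory = "## Email\n- sign-off: cheers\n\n## Tone\n- default: formal\n"
print(remember(memory, "Email", "sign-off", "warm regards"))
```

Because the old "cheers" line is replaced, the file never ends up holding two conflicting instructions.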

Step Three: Connect Your Tools With MCP

By default, most agent harnesses can search the web and read files. To do anything useful with your actual stack, like sending an email from your Gmail or creating a payment link in Stripe, you need to connect tools.

The way you do this is called MCP, which stands for Model Context Protocol. Anthropic created it and released it as an open standard. The simplest way to think about MCP is as a universal translator. Your agent speaks one language, each tool speaks its own, and MCP sits in the middle so they can talk without custom code for every integration.

In most modern harnesses, connecting an MCP server takes a few clicks. You log in to Gmail, Notion, Calendar, or Stripe, and from then on the agent can use them inside the loop.

Once that wiring is done you can ask one agent to do something like this:

Review my Granola notes from this morning's call with Sarah, draft a proposal email, create a Stripe payment link, and set up the project in Notion.

The agent reads the meeting notes from Granola, drafts the email in Gmail, generates the Stripe link, and creates the Notion project. You never open any of those apps. They become layers underneath your one workspace.
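Under the hood, the "universal translator" idea comes down to every tool call travelling in the same message shape. MCP messages are JSON-RPC 2.0, and `tools/list` and `tools/call` are method names from the MCP specification. The tool names and arguments below are hypothetical, since each server publishes its own.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched to them


def list_tools_request() -> dict:
    """Ask a server which tools it offers."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}


def call_tool_request(name: str, arguments: dict) -> dict:
    """Ask a server to run one tool with the given arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical tool names and arguments: real servers list theirs via tools/list.
gmail = call_tool_request(
    "create_draft", {"to": "sarah@example.com", "subject": "Proposal"}
)
stripe = call_tool_request(
    "create_payment_link", {"amount": 4500, "currency": "aud"}
)

print(json.dumps(gmail, indent=2))
print(json.dumps(stripe, indent=2))
```

The Gmail and Stripe requests differ only in the tool name and arguments, which is what lets one agent drive many tools without a custom integration for each.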

Step Four: Build Skills to Stop Explaining Yourself

The last concept is the most powerful one. Skills are SOPs for AI.

The first time you ask an agent to write a proposal, you might spend twenty minutes going back and forth on tone, structure, where the price sits, what colour the headers should be. Eventually you land on a proposal you love. Without skills, that learning evaporates the moment the session closes.

A skill captures that process in a markdown file. The next time you say "write a proposal", the agent reads the skill and produces something in the right format on the first try.

There are two ways to create a skill that work well:

Build it up front. Drop a course transcript, an SOP doc, or a written process into the agent and ask it to use its built-in skill creator to turn that into a reusable skill file.
Capture it from real work. Walk through a process manually with the agent once. When you are happy with the output, tell it to package what you just did into a skill.

Stack three to five skills a week and within a few months you have a library that automates entire chunks of your role.
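Here is a small sketch of how a request could be matched to a skill file. The folder layout and the filename-matching rule are assumptions made for the example. Real harnesses typically let the model pick a skill from its description and do not match on filenames.

```python
from pathlib import Path
from typing import Optional
import re
import tempfile


def find_skill(skills_dir: Path, request: str) -> Optional[str]:
    """Return the first skill whose filename words all appear in the request."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    for path in sorted(skills_dir.glob("*.md")):
        # e.g. write-proposal.md matches when both "write" and "proposal" appear
        if set(path.stem.split("-")) <= words:
            return path.read_text(encoding="utf-8")
    return None


# Usage with a throwaway folder and a made-up proposal skill.
with tempfile.TemporaryDirectory() as tmp:
    skills = Path(tmp)
    (skills / "write-proposal.md").write_text(
        "1. Summary\n2. Scope\n3. Price goes last\n", encoding="utf-8"
    )
    print(find_skill(skills, "Write a proposal for Sarah."))
    print(find_skill(skills, "Book a flight"))
```

The point is that the twenty minutes of back-and-forth lives in the file, so the request itself can stay short.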

What This Looks Like for a Designer or Founder

The example in the podcast was an executive assistant, which is useful, but here is what this looks like inside a design and product workflow.

My setup is a folder per client or company, with sub-folders for each "role":

A design director folder with a context file that knows my visual standards and brand voice, plus skills for design QA, component audits, and screenshot reviews.
A growth folder with skills for landing page copywriting, ad creative briefs, and weekly performance summaries pulled from Google Analytics.
A client ops folder with a proposal-writing skill, a Stripe payment link skill, and a Notion project setup skill that chain together.

When a new lead comes in from a discovery call, I run one prompt. The agent pulls the meeting notes from Granola, drafts a proposal in my voice, generates a Stripe link, sets up the Notion project, and writes the follow-up email. What used to take me ninety minutes takes about three.
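"Chaining" here just means each step can read what the earlier steps produced. A minimal sketch, with stub functions standing in for the real skills and a placeholder URL where the Stripe link would go. None of these names come from an actual harness.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Job:
    notes: str  # raw meeting notes, e.g. pulled from Granola
    outputs: dict[str, str] = field(default_factory=dict)  # results by step name


def draft_proposal(job: Job) -> str:
    return f"Proposal based on: {job.notes}"


def create_payment_link(job: Job) -> str:
    return "https://example.com/pay/placeholder"  # placeholder, not a Stripe URL


def write_follow_up(job: Job) -> str:
    # Later steps read what earlier steps produced.
    return f"{job.outputs['proposal']}\nPay here: {job.outputs['payment_link']}"


STEPS: list[tuple[str, Callable[[Job], str]]] = [
    ("proposal", draft_proposal),
    ("payment_link", create_payment_link),
    ("follow_up", write_follow_up),
]


def run_chain(job: Job) -> Job:
    for name, step in STEPS:
        job.outputs[name] = step(job)  # store each result for the next step
    return job


job = run_chain(Job(notes="Sarah wants a landing page redesign"))
print(job.outputs["follow_up"])
```

Order matters: the follow-up email can only be written once the proposal and the payment link exist, which is why the steps run as a list.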

This is what people mean when they talk about an AI operating system. It is not a single app. It is a folder of markdown files, MCP connections, and skills that compounds every week you use it.

Where to Start This Week

If this is your first time building an agent, do not try to set up the whole thing at once. The compounding only works if you actually start.

Here is the smallest useful version:

Pick one role you wish you had hired. Executive assistant, marketer, design QA, content writer.
Create a folder for it.
Spend twenty minutes writing an agents.md or claude.md that explains who you are, what the role is, and how you want it to behave.
Connect two MCPs you use every day. Gmail and Calendar are a strong start.
Use it for a week. Every time you correct it, let it update memory.md. Every time you finish a manual process, ask it to turn that into a skill.
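If you would rather script the folder setup than click around, the steps above can be sketched like this. The `skills` sub-folder name and the `scaffold_role` helper are assumptions made for the example, and the starter text reuses the memory instructions from Step Two.

```python
from pathlib import Path
import tempfile

STARTER_FILES = {
    "agents.md": (
        "Read memory.md. This is what you have learned over time.\n"
        "When I correct you or you learn something new, "
        "update the relevant section in memory.md.\n"
    ),
    "memory.md": "",
}


def scaffold_role(root: Path, role: str) -> Path:
    """Create a role folder with a context file, a memory file, and a skills folder."""
    role_dir = root / role
    (role_dir / "skills").mkdir(parents=True, exist_ok=True)
    for name, starter in STARTER_FILES.items():
        path = role_dir / name
        if not path.exists():  # never overwrite a file you have already edited
            path.write_text(starter, encoding="utf-8")
    return role_dir


# Usage in a throwaway folder. Point `root` at your own workspace instead.
with tempfile.TemporaryDirectory() as tmp:
    role_dir = scaffold_role(Path(tmp), "executive-assistant")
    print(sorted(p.name for p in role_dir.iterdir()))
```

Running it twice is safe, because existing files are left alone. You still need to spend the twenty minutes filling in agents.md with who you are and how the role should behave.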

That is it. After a week you will have something that already saves you hours. After a month, it will start to feel like a real team member.

Final Thoughts

AI agents are not magic and they are not going to replace you. What they will do is replace the dull, repetitive twenty percent of your work that has been eating your week. Email triage. Meeting follow-ups. Proposals. Research. Reporting.

The people who get ahead in 2026 will not be the ones with the cleverest prompts. They will be the ones who quietly built an operating system of context, memory, tools, and skills, then let it compound.

If you want help thinking through how AI agents fit into your product or business, or you want to build out a custom AI workflow for your team, feel free to reach out.

Frequently Asked Questions About AI Agents

What is an AI agent in simple terms?

An AI agent is an AI system that takes a goal, plans the steps, uses tools, and delivers a result on its own. A chat model answers a question. An agent finishes a task.

What is the difference between a chat model and an AI agent?

Chat models are ping-pong. You ask, they answer. AI agents are autonomous. They run a loop of observing, thinking, and acting until the goal is complete, often calling tools and writing files along the way.

What is an agent harness?

An agent harness is the platform that runs the agent loop and connects the model to tools and context. Claude Code, Codex, Manus, Cowork, and Google Antigravity are all examples. They differ in features and interface, but the fundamentals are the same.

What is an `agents.md` file?

An agents.md file is a markdown file that lives in your project folder and gets loaded into the agent at the start of every session. It contains your role, business context, tone, and preferences. Claude Code calls it claude.md, Gemini calls it gemini.md, but the purpose is identical.

What is MCP?

MCP, or Model Context Protocol, is the standard Anthropic created so AI agents can connect to tools like Gmail, Notion, and Stripe without custom integrations for each one. Most modern agent harnesses now support MCP out of the box.

What is an AI skill?

A skill is a reusable instruction file that captures a process so the agent runs it the same way every time. Think of it as a standard operating procedure for AI. You build one for proposals, one for weekly reports, one for ad analysis, and never explain those processes again.

Which AI agent platform should beginners start with?

For most beginners, start with Claude Code or Cowork. Both have a clean interface, work off local markdown files, and make it easy to see what the agent is thinking. OpenCode is more powerful but harder to learn first.

How long does it take to build a useful AI agent?

A first useful agent takes about an hour. One agents.md file, two MCP connections, and a single skill is enough to start saving time. The real value compounds over weeks as you add more skills and let the memory file build up.

About the Author

Sanjay Tarani is the Head of Design at DoxAI, helping entrepreneurs and business owners build scalable, user-focused digital products. Sanjay has led design system initiatives behind 50+ successful projects and has been recognised with the Website Wizard award. Sanjay brings experience from high-growth startup environments, including learning within the Startmate ecosystem, and shares practical insights on design, product strategy, and building profitable apps. Connect with Sanjay on LinkedIn.
