The Future of Agentic Coding with Claude Code

Posted on: September 2, 2025

Sep 2, 2025, Video: The future of agentic coding with Claude Code (YouTube)

This English version closely follows the complete Chinese notes to preserve all information from the talk transcript and references.

• Personal Background and Opening

Alex shares an early programming memory: writing BASIC on a TI-83 Plus calculator in math class to store exam answers.
Alex: leads Claude Relations at Anthropic.
Guest Boris: an Anthropic engineer and the creator of Claude Code.
Theme: coding has changed dramatically over the past 12 months, especially with AI.

• One Year Ago: The State of Coding

Typical dev flow relied on IDE autocomplete and a simple chat assistant, with lots of copy-paste.
AI was a functional tool, not deeply integrated into the inner loop.
About a year ago the “agent” pattern emerged and began to enter the workflow.
Rather than hand-editing text, developers increasingly rely on AI agents to write and modify code.

• Early Claude Code Attempts

The initial release used Sonnet 3.5 and was limited; Boris used it for about 10% of his own coding.
Even with early models and a primitive harness, internal trials showed value.
A year of rapid model progress followed: Sonnet 3.7, Claude 4.0, then Opus 4.1, each with notably improved capabilities.
The harness (Claude Code itself) kept improving: context management, tool use, permissions, etc., enabling real development utility.

• Co-evolution of Models and Product

Everyone at Anthropic, including researchers, uses Claude Code daily.
Pain points from real usage directly inform model and product improvements.
Example: early models drifted during longer edit sessions; later versions stay on track much longer.
Improvements are driven by actual engineering work with Claude Code, not abstract benchmarks.

• Evaluation and Feedback Loop

Boris’s evaluation method: use the new model to do his real work for the day and judge the outcome.
Daily work spans new feature code, bug fixes, reading Slack, replying to GitHub issues—good coverage for testing capabilities.
Benchmarks like SWE-bench and T-Bench (Terminal-Bench) exist, but “vibes” (hands-on feel) are the most decisive signal for product quality.
Key practice: a single internal Slack channel for feedback, with rapid response and fixes to sustain a positive loop.
This fast iteration maintains a steady stream of feedback that drives Claude Code’s evolution.

• Claude Code Today and Extensibility

Design goal: stay simple and hackable.
Earliest extension: a repository CLAUDE.md file to inject persistent context.
Then increasingly rich primitives:
- More capable settings and permissioning.
- Hooks to extend more phases of operation.
- MCP (Model Context Protocol) as an extension point.
- Slash commands and Subagents, user-customizable.
These make Claude Code useful beyond coding—a general agent SDK.

• Outlook (6–24 Months)

Work splits into two modes:
- Some hands-on coding remains, though increasingly Claude edits the text for you.
- More tasks where Claude proposes and performs the changes, and you accept or adjust them.
Longer term, Claude moves from task execution toward goal completion (e.g., building an app end-to-end).
Engineers shift from “text editors” to “goal setters and reviewers.”

• Advice on Learning and Careers

From TI-83 days to the modern stack: barriers used to be high; agents lower them.
Agents re-focus effort on ideas and products, not incidental complexity.
Code is no longer scarce; rewrite freely.
Still master fundamentals: languages, compilers, runtimes, web systems, and system design.
Cultivate creativity—turn ideas into prototypes quickly, even startup ideas.

• Claude Code Tips and Best Practices

Tip for beginners: don’t start by having Claude write code—first use it to understand the codebase, ask questions, explore history; get comfortable with it as a research partner.
Treat work by complexity:
- Easy: let Claude generate the change in one go (e.g., mention @claude on a GitHub issue to generate a PR).
- Medium: use Plan Mode to align on steps, then Auto-Accept to run.
- Hard: you drive; Claude assists with research, prototypes, and tests; humans write most of the final code.
Adjust usage style to task difficulty; avoid one-size-fits-all.
Closing: expect more autonomy, stronger tooling, and lower barriers. Claude Code’s mission is to be a true intelligent partner, not just a code generator.

Time-coded Highlights and Details

• [00:00–00:24] Opening and personal anecdote (TI-83 + BASIC)

Alex recalls programming a TI-83 Plus in BASIC to store exam answers, discovering the joy of hackability. Supplemental reading: TI-83/84 TI-BASIC quickstart and manual 12 (Wikibooks).

• [00:25–00:44] Guests and theme

Host Alex (Claude Relations at Anthropic); guest Boris (creator of Claude Code, Anthropic engineer). Theme: Claude Code and the future of software engineering; the past year has been exceptionally fast-moving.

• [00:45–01:01] Framing the retrospective

Alex asks Boris to summarize how coding has changed over the past year and where we are now.

• [01:02–01:24] Typical workflow a year ago

IDE autocomplete + chat app, heavy copy/paste; AI lived outside the inner loop.

• [01:25–01:47] Agents move into the inner loop

The standout shift: coding now increasingly uses agents rather than manual, character-level editing; from “press Tab” to “the model writes.”

• [01:48–02:20] From hand editing to model-driven edits

Transition to more “hands-off” work: specify goals to the agent; it performs large-scale edits and even scaffolds apps.

• [02:21–02:47] Why last year couldn’t do this

Two reasons: limited model capability, and an immature scaffolding/harness (the orchestration layer above the model).

• [02:48–03:08] Very early Claude Code

Initial release still used Sonnet 3.5 (not the upgraded model); “usable but limited.” Boris used it on ~10% of his own code.

• [03:09–03:25] Early internal adoption

The day after it was released to core teams, engineers were already using it; even at that early stage, it delivered value.

• [03:26–03:40] Not great yet, still helpful

Both model and harness were rough but useful.

• [03:41–04:02] A year of progress in models and harness

Models: from Sonnet 3.7 and Claude 4.0 to Opus 4.1, each improving agentic coding; the harness (Claude Code) also advanced greatly 6 (Anthropic, Anthropic).
Key point: you can’t just “use the model”—you need a harness to direct it.

• [04:03–04:29] Horse and saddle analogy

Model as horse; engineers need a saddle/harness to guide effectively.

• [04:30–04:55] What the harness includes

The harness = Claude Code: system prompt, context management, tools, pluggable MCP servers, settings, permissions 1 (Anthropic).
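
To make that list concrete, here is a minimal Python sketch of the kind of pieces a harness wraps around a model. Everything in it (the `Harness` class, `build_context`, `is_allowed`, `run_tool`, and the glob-style allow rules) is hypothetical and invented for illustration; it is not how Claude Code is actually implemented.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Harness:
    """Toy harness (hypothetical): the pieces placed around a model."""
    system_prompt: str
    project_notes: str = ""                           # e.g. contents of a CLAUDE.md file
    tools: dict = field(default_factory=dict)         # tool name -> Python callable
    allow_rules: list = field(default_factory=list)   # glob patterns such as "git status*"

    def build_context(self, user_request: str) -> str:
        """Assemble the text the model would see for one request."""
        tool_list = ", ".join(sorted(self.tools)) or "(none)"
        return "\n\n".join([
            self.system_prompt,
            f"Project notes:\n{self.project_notes}",
            f"Available tools: {tool_list}",
            f"Request: {user_request}",
        ])

    def is_allowed(self, command: str) -> bool:
        """Permission check: a command may run only if some allow rule matches it."""
        return any(fnmatch(command, rule) for rule in self.allow_rules)

    def run_tool(self, name: str, command: str) -> str:
        """Run a tool call requested by the model, gated by the permission check."""
        if name not in self.tools:
            return f"error: unknown tool {name!r}"
        if not self.is_allowed(command):
            return f"denied: {command!r} needs user approval"
        return self.tools[name](command)


harness = Harness(
    system_prompt="You are a coding agent.",
    project_notes="Run tests with `make test`.",
    tools={"shell": lambda cmd: f"(pretend output of {cmd})"},  # stand-in, runs nothing
    allow_rules=["git status*", "make test"],
)
print(harness.build_context("Fix the failing test"))
print(harness.run_tool("shell", "make test"))
print(harness.run_tool("shell", "rm -rf build"))
```

The point of the sketch is the division of labor: the model only ever sees what `build_context` assembles, and only acts through tools that pass the permission gate.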

• [04:56–05:19] Making the model “see” the full context

Harness feeds context and tools to the model; this dramatically affects performance. Over the past year, the team refined how to “build around the model.”

• [05:20–05:36] Why coevolution happened

Co-evolution did not come from training choices alone; it emerged naturally because everyone at Anthropic (including researchers) uses Claude Code daily.

• [05:37–05:54] Finding limits in daily use

Example: failures in string replacement indicate true model gaps and provide lessons for improvement.
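
One plausible shape for a string-replacement edit tool, sketched in Python: apply an edit only when the snippet to replace occurs exactly once, and fail loudly otherwise. The function name `replace_once` and its error messages are assumptions made for illustration; the talk does not describe the internals of the real tool.

```python
def replace_once(source: str, old: str, new: str) -> str:
    """Replace `old` with `new` only if `old` occurs exactly once in `source`.

    Raises ValueError when the snippet is missing or ambiguous, so a bad edit
    surfaces as an error instead of silently changing the wrong place.
    """
    count = source.count(old)
    if count == 0:
        raise ValueError("snippet not found in file")
    if count > 1:
        raise ValueError(f"snippet is ambiguous: {count} matches, more context needed")
    return source.replace(old, new, 1)


code = "def add(a, b):\n    return a - b\n"
fixed = replace_once(code, "return a - b", "return a + b")
print(fixed)

# A snippet that does not match the file exactly is rejected.
try:
    replace_once(code, "return a * b", "return a + b")
except ValueError as err:
    print("edit failed:", err)
```

Failures of this kind are useful signals: they show where the model's picture of the file diverged from the file itself.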

• [05:55–06:12] Longer autonomous “run time”

Letting the model “run itself”: from short, drift-prone runs on 3.5 to much longer stable runs on newer models—achieved through repeated “correct → teach” loops in human-in-the-loop usage.

• [06:13–06:29] How to evaluate new models/features

Best evaluation: “I use it to do my real work today.”

• [06:30–06:52] Real work covers many capabilities

Write features, fix bugs, read Slack, reply to GitHub issues; more and more is possible. Via MCP, pull context, read messages, and use sources like Sentry logs to help debugging 2 (Anthropic, Sentry_docs).

• [06:53–07:10] Productized evals are hard

Attempts at product evals exist, but the most effective signal still comes from real usage.

• [07:11–07:34] Benchmarks exist, but “vibes” matter more

SWE-bench, T-Bench, etc., can’t capture engineering complexity; hands-on feel is a sharper signal 4 (SWE-bench, GitHub).

• [07:35–08:09] Frequent question: how to test prompts?

Claude Code relies on a tight in-use feedback loop, which is more immediate than fixed eval suites.

• [08:10–08:32] “We mostly go by vibes now”

Model performance on SWE-bench is already high; the community is seeking harder/newer evals (e.g., T-Bench), but fully covering engineering reality remains difficult 4 (SWE-bench, Terminal-Bench).

• [08:33–09:07] Why internal dogfooding works well

Product philosophy: listen closely to users and lower the friction of giving feedback.

• [09:08–09:26] Single Slack feedback channel

All feedback goes to one channel, reducing the sense of a “black hole.”

• [09:27–09:43] Quick fixes → sustained feedback firehose

Boris batches fixes and replies item-by-item, sustaining positive feedback; the channel remains a “bursting firehose.”

• [09:44–10:22] Stay humble and user-oriented

In a new AI domain, nobody “truly knows”; continuous listening is essential.

• [10:23–10:50] Current design: simple and hackable

Goal: minimal and extensible. The earliest extension point is CLAUDE.md as persistent context 1 (Anthropic).

• [11:09–11:24] CLAUDE.md location and version control

Can live at repo root or subdirs; typically checked into the repo and evolves with it 1 (Anthropic).

• [11:24–12:00] Many more extension points

Introduced a richer settings/permissions system, hooks, MCP, slash commands, subagents 1 7 (Anthropic).

• [12:01–12:23] Slash command example

A custom “commit” command encodes how to write good commit messages and can pre-approve running git commit to avoid repeated confirmations 7 (Anthropic).
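
A sketch of what such a command file could look like, written out by a small Python script. The location `.claude/commands/commit.md` and the `allowed-tools` and `description` frontmatter keys follow the custom slash command docs listed in the references as understood at the time of writing; treat them as assumptions and check the current docs. The prompt wording and the helper `write_commit_command` are invented for illustration.

```python
from pathlib import Path
import tempfile

# Hypothetical command definition: frontmatter pre-approves a few git commands,
# and the body encodes the team's commit message conventions.
COMMIT_COMMAND = """\
---
allowed-tools: Bash(git add:*), Bash(git status:*), Bash(git commit:*)
description: Create a git commit with a well-formed message
---

Review the staged changes and create a single git commit.

- Subject line: imperative mood, 50 characters or fewer.
- Body: explain why the change was made, not only what changed.
"""


def write_commit_command(repo_root: Path) -> Path:
    """Write the command file into repo_root/.claude/commands/ and return its path."""
    target = repo_root / ".claude" / "commands" / "commit.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(COMMIT_COMMAND, encoding="utf-8")
    return target


# Demonstrate against a temporary directory rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    path = write_commit_command(Path(tmp))
    written = path.read_text(encoding="utf-8")
    print(path.relative_to(tmp))
    print(written)
```

Because the file lives in the repository, the conventions it encodes can be reviewed and versioned like any other code.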

• [12:24–12:52] Agents vs. slash commands

Think of agents as slash commands with branching context windows; two sides of the same coin. The SDK also applies to non-coding agents 8 (Anthropic, Anthropic).

• [12:53–13:08] Underlying model keeps improving

More autonomy, better instruction-following and memory—all of which strengthen these extensions 11 (Anthropic, Anthropic).

• [13:09–13:31] Daily flow in 6–12 months

Still some hand coding, but more often Claude manipulates text while you plan and review.

• [13:32–13:51] From “less hand writing” to model proactivity

The model proactively proposes and completes changes; you curate.

• [13:52–14:35] 12–24 months: goals over tasks

Agents focus less on small tasks, more on monthly/strategic objectives.

• [14:36–14:56] Moving up abstraction levels

From “edit a file” → “submit a PR” → “make progress toward building the app.”

• [14:57–15:21] Back to the TI-83 spark

Emphasizes the satisfaction of quick experiments and immediate feedback.

• [15:22–16:04] High past barriers vs. lower agentic barriers

Traditional web stacks (React, Next.js, multiple build/deploy steps) are complex; agents make “have an idea → build it” much faster.

• [16:05–16:28] Code is rewritable; code is less “precious”

Hand-coding can still be enjoyable (e.g., writing C++ on weekends for fun), but outcomes matter most.

• [16:29–17:33] Study advice

Keep fundamentals (languages, compilers, runtimes, web architecture, system design) and be more creative: you can build product/startup ideas quickly now.

• [17:34–17:58] Best practice Q&A

Alex asks for Claude Code tips from its creator.

• [17:59–18:18] Tip 1: start by asking

Use it to explore the codebase first (e.g., how to add a logger? why is a function designed this way? scan Git history for rationale).

• [18:19–18:39] Research assistant first → then codegen

Build the mental model of “agent as researcher” before letting it write code.

• [18:40–19:08] Tip 2: three task classes

Easy: single-prompt tasks.
Medium: Plan Mode first, then Auto-Accept 10 (ClaudeLog).
Hard: human-led, Claude assists (research, prototypes, unit tests).
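
The three classes can be written down as a small lookup, sketched here in Python. The names `Difficulty`, `WORKFLOWS`, and `choose_workflow` are hypothetical; the workflow descriptions paraphrase the talk, and deciding which class a task belongs to remains a human judgment.

```python
from enum import Enum


class Difficulty(Enum):
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"


# Suggested way of working per task class, paraphrasing the three tiers above.
WORKFLOWS = {
    Difficulty.EASY: "Single prompt: let Claude produce the change in one go.",
    Difficulty.MEDIUM: "Plan Mode first; switch to Auto-Accept once the plan looks right.",
    Difficulty.HARD: "You drive; Claude helps with research, prototypes, and unit tests.",
}


def choose_workflow(difficulty: Difficulty) -> str:
    """Look up the suggested workflow for a task of the given difficulty."""
    return WORKFLOWS[difficulty]


for level in Difficulty:
    print(f"{level.value:>6}: {choose_workflow(level)}")
```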

• [19:09–19:23] Easy in practice

Mention @claude on GitHub issues/PRs to generate PRs—no terminal required 3 (Anthropic).

• [19:24–19:40] Medium in practice

In the terminal, switch to Plan mode (Shift+Tab), then Auto-Accept after plan alignment 10 (ClaudeLog).

• [19:41–19:58] Hard in practice

Human drives; Claude does research, prototyping, tests; main implementation remains human-written.

• [19:59–20:15] Wrap-up

Thanks exchanged; conversation ends.

References and Further Reading (grouped by topic)

[1] https://docs.anthropic.com/en/docs/claude-code/overview “Claude Code overview (official docs)”
[2] https://docs.anthropic.com/en/docs/mcp “Model Context Protocol (Anthropic docs)”; https://modelcontextprotocol.io/ “MCP official site”
[3] https://docs.anthropic.com/en/docs/claude-code/github-actions “Claude Code GitHub Actions (@claude to generate PRs/fixes)”
[4] https://www.swebench.com/ “SWE-bench (real OSS issue-fixing benchmark)”; https://github.com/SWE-bench/SWE-bench “SWE-bench GitHub”
[5] https://github.com/laude-institute/terminal-bench “Terminal-Bench (terminal agent eval)”; https://www.tbench.ai/news/announcement “T-Bench announcement”
[6] https://www.anthropic.com/news/claude-opus-4-1 “Claude Opus 4.1 release and capabilities”
[7] https://docs.anthropic.com/en/docs/claude-code/slash-commands “Claude Code custom slash commands”
[8] https://docs.anthropic.com/en/docs/claude-code/sub-agents “Claude Code Subagents”
[9] https://www.anthropic.com/engineering/claude-code-best-practices “Claude Code best practices (engineering blog)”
[10] https://www.claudelog.com/mechanics/auto-accept-permissions/ “Plan / Auto-Accept mode and one-click permissions (keyboard toggle)”
[11] https://docs.anthropic.com/en/release-notes/api “Anthropic release notes and latest models”
[12] https://en.wikibooks.org/wiki/How_to_Program_a_TI-83_Plus/Intro “TI-83 Plus TI-BASIC intro (Wikibooks)”
[13] https://docs.sentry.io/product/explore/logs/ “Sentry Logs (structured logs for debugging/observability)”

The above closely follows the provided transcript. External links annotate terms/mechanisms/benchmarks/integrations for cross-checking and further reading.

Tags: Claude-Code