The Future of Agentic Coding with Claude Code
Sep 2, 2025, Video: The future of agentic coding with Claude Code (YouTube)
This English version closely follows the complete Chinese notes to preserve all information from the talk transcript and references.
• Personal Background and Opening
- Alex shares an early programming memory: writing BASIC on a TI-83 Plus calculator in math class to store exam answers.
- Host Alex: leads Claude Relations at Anthropic.
- Guest Boris: an Anthropic engineer and the creator of Claude Code.
- Theme: coding has changed dramatically over the past 12 months, especially with AI.
• One Year Ago: The State of Coding
- Typical dev flow relied on IDE autocomplete and a simple chat assistant, with lots of copy-paste.
- AI was a functional tool, not deeply integrated into the inner loop.
- About a year ago the “agent” pattern emerged and began to enter the workflow.
- Compared to hand-editing text, developers are increasingly relying on AI agents to write and modify code.
• Early Claude Code Attempts
- The initial release used Sonnet 3.5 and was limited; Boris used it for about 10% of his own coding.
- Even with early models and a primitive harness, internal trials showed value.
- Rapid year of model progress: Sonnet 3.7, Claude 4.0, then Opus 4.1 with notably improved capabilities.
- The harness (Claude Code itself) kept improving: context management, tool use, permissions, etc., enabling real development utility.
• Co-evolution of Models and Product
- Everyone at Anthropic, including researchers, uses Claude Code daily.
- Pain points from real usage directly inform model and product improvements.
- Example: early models drifted during longer edit sessions; later versions stay on track much longer.
- Improvements are driven by actual engineering work with Claude Code, not abstract benchmarks.
• Evaluation and Feedback Loop
- Boris’s evaluation method: use the new model to do his real work for the day and judge the outcome.
- Daily work spans new feature code, bug fixes, reading Slack, replying to GitHub issues—good coverage for testing capabilities.
- Benchmarks like SWE-bench and T-Bench exist, but “vibes” (hands-on feel) are most decisive for product quality.
- Key practice: a single internal Slack channel for feedback, with rapid response and fixes to sustain a positive loop.
- This fast iteration maintains a steady stream of feedback that drives Claude Code’s evolution.
• Claude Code Today and Extensibility
- Design goal: stay simple and hackable.
- Earliest extension: a repository CLAUDE.md file to inject persistent context.
- Then increasingly rich primitives:
- More capable settings and permissioning.
- Hooks to extend more phases of operation.
- MCP (Model Context Protocol) as an extension point.
- Slash commands and Subagents, user-customizable.
- These make Claude Code useful beyond coding—a general agent SDK.
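The settings, permissions, and hooks primitives listed above can be pictured as a small JSON settings file. The sketch below builds one in Python. The key names (`permissions.allow`, `permissions.deny`, `hooks.PostToolUse`, `matcher`) reflect my understanding of the Claude Code settings and hooks docs and should be checked against them; the specific rules and the `npm run lint` command are hypothetical examples, not something from the talk.

```python
import json

# Assumed schema: verify key names against the official Claude Code settings docs.
settings = {
    "permissions": {
        # Tool calls that may run without asking (hypothetical rules).
        "allow": ["Bash(git diff:*)", "Bash(npm run test:*)"],
        # Tool calls that are always refused (hypothetical rule).
        "deny": ["Read(./.env)"],
    },
    "hooks": {
        # Run a shell command after matching tools finish (hypothetical command).
        "PostToolUse": [
            {
                "matcher": "Edit|Write",
                "hooks": [{"type": "command", "command": "npm run lint"}],
            }
        ]
    },
}


def render_settings(data: dict) -> str:
    """Serialize the settings dict as pretty-printed JSON text."""
    return json.dumps(data, indent=2) + "\n"


print(render_settings(settings))
```

Saving the rendered text as a project-level settings file (commonly `.claude/settings.json`, again an assumption to verify) would let the whole team share the same permission rules.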
• Outlook (6–24 Months)
- Work splits into two modes:
- Some “hands-on coding,” increasingly having Claude modify text for you.
- More tasks where Claude proposes and performs changes, and you accept or adjust.
- Longer term, Claude moves from task execution toward goal completion (e.g., building an app end-to-end).
- Engineers shift from “text editors” to “goal setters and reviewers.”
• Advice on Learning and Careers
- From TI-83 days to the modern stack: barriers used to be high; agents lower them.
- Agents re-focus effort on ideas and products, not incidental complexity.
- Code is no longer scarce; rewrite freely.
- Still master fundamentals: languages, compilers, runtimes, web systems, and system design.
- Cultivate creativity—turn ideas into prototypes quickly, even startup ideas.
• Claude Code Tips and Best Practices
Tip for beginners: don’t start by having Claude write code—first use it to understand the codebase, ask questions, explore history; get comfortable with it as a research partner.
Treat work by complexity:
- Easy: let Claude generate the change in one go (e.g., mention @claude on a GitHub issue to generate a PR).
- Medium: use Plan Mode to align on steps, then Auto-Accept to run.
- Hard: you drive; Claude assists with research, prototypes, and tests; humans write most of the final code.
Adjust usage style to task difficulty; avoid one-size-fits-all.
Closing: expect more autonomy, stronger tooling, and lower barriers. Claude Code’s mission is to be a true intelligent partner—not just write code.
Time-coded Highlights and Details
• [00:00–00:24] Opening and personal anecdote (TI-83 + BASIC)
- Alex recalls programming a TI-83 Plus in BASIC to store exam answers, discovering the joy of hackability. Supplemental reading: TI-83/84 TI-BASIC quickstart and manual 12 (Wikibooks).
• [00:25–00:44] Guests and theme
- Host Alex (Claude Relations at Anthropic); guest Boris (creator of Claude Code, Anthropic engineer). Theme: Claude Code and the future of software engineering; the past year has been exceptionally fast-moving.
• [00:45–01:01] Framing the retrospective
- Alex asks Boris to summarize how coding has changed over the past year and where we are now.
• [01:02–01:24] Typical workflow a year ago
- IDE autocomplete + chat app, heavy copy/paste; AI lived outside the inner loop.
• [01:25–01:47] Agents move into the inner loop
- The standout shift: coding now increasingly uses agents rather than manual, character-level editing; from “press Tab” to “the model writes.”
• [01:48–02:20] From hand editing to model-driven edits
- Transition to more “hands-off” work: specify goals to the agent; it performs large-scale edits and even scaffolds apps.
• [02:21–02:47] Why last year couldn’t do this
- Two reasons: model capability limits; and immature scaffolding/harness (the orchestration layer above the model).
• [02:48–03:08] Very early Claude Code
- Initial release still used Sonnet 3.5 (not the upgraded model); “usable but limited.” Boris used it on ~10% of his own code.
• [03:09–03:25] Early internal adoption
- Day after release to core teams, engineers were already using it; even early, it delivered value.
• [03:26–03:40] Not great yet, still helpful
- Both model and harness were rough but useful.
• [03:41–04:02] A year of progress in models and harness
- Models: from 3.7, 4.0 to Opus 4.1, with agentic coding improvements; the harness (Claude Code) also advanced greatly 6 (Anthropic, Anthropic).
- Key point: you can’t just “use the model”—you need a harness to direct it.
• [04:03–04:29] Horse and saddle analogy
- Model as horse; engineers need a saddle/harness to guide effectively.
• [04:30–04:55] What the harness includes
- The harness = Claude Code: system prompt, context management, tools, pluggable MCP servers, settings, permissions 1 (Anthropic).
• [04:56–05:19] Making the model “see” the full context
- Harness feeds context and tools to the model; this dramatically affects performance. Over the past year, the team refined how to “build around the model.”
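To make the "harness around the model" idea concrete, here is a toy Python sketch of the pieces named above: a system prompt, a growing context, a set of tools, and a permission check. Every name in it (`Harness`, `ToolCall`, `scripted_model`, the fake `read_file` tool) is hypothetical, and the "model" is a scripted stand-in. It illustrates the general pattern only and is not how Claude Code is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class ToolCall:
    """A request from the model to run one tool with one argument."""
    name: str
    argument: str


@dataclass
class Harness:
    system_prompt: str
    tools: dict[str, Callable[[str], str]]
    allowed_tools: set[str]  # stands in for permission settings
    context: list[str] = field(default_factory=list)

    def run(self, model: Callable[[list[str]], Union[ToolCall, str]],
            task: str, max_steps: int = 5) -> str:
        """Feed context to the model, execute permitted tool calls, repeat."""
        self.context = [self.system_prompt, f"task: {task}"]
        for _ in range(max_steps):
            reply = model(self.context)
            if isinstance(reply, str):  # plain text means the model is finished
                return reply
            if reply.name not in self.allowed_tools:
                result = f"denied: {reply.name}"
            elif reply.name not in self.tools:
                result = f"unknown tool: {reply.name}"
            else:
                result = self.tools[reply.name](reply.argument)
            # Tool results go back into the context for the next step.
            self.context.append(f"{reply.name}({reply.argument}) -> {result}")
        return "stopped: step limit reached"


def scripted_model(context: list[str]) -> Union[ToolCall, str]:
    """Stand-in for a real model: request one file read, then answer."""
    if not any(line.startswith("read_file(") for line in context):
        return ToolCall("read_file", "README.md")
    return "done: summarized README.md"


harness = Harness(
    system_prompt="You are a coding agent.",
    tools={"read_file": lambda path: f"<contents of {path}>"},  # fake tool
    allowed_tools={"read_file"},
)
print(harness.run(scripted_model, "summarize the README"))
```

The point of the sketch is that the model never touches the file system directly: what it sees and what it may do are both decided by the harness, which is why harness quality affects results so strongly.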
• [05:20–05:36] Why coevolution happened
- Not just from training presets; it emerged naturally because everyone at Anthropic (including researchers) uses Claude Code daily.
• [05:37–05:54] Finding limits in daily use
- Example: failures in string replacement indicate true model gaps and provide lessons for improvement.
• [05:55–06:12] Longer autonomous “run time”
- Letting the model “run itself”: from short, drift-prone runs on 3.5 to much longer stable runs on newer models—achieved through repeated “correct → teach” loops in human-in-the-loop usage.
• [06:13–06:29] How to evaluate new models/features
- Best evaluation: “I use it to do my real work today.”
• [06:30–06:52] Real work covers many capabilities
- Write features, fix bugs, read Slack, reply to GitHub issues; more and more is possible. Via MCP, pull context, read messages, and use sources like Sentry logs to help debugging 2 (Anthropic, Sentry_docs).
• [06:53–07:10] Productized evals are hard
- Attempts at product evals exist, but the most effective signal still comes from real usage.
• [07:11–07:34] Benchmarks exist, but “vibes” matter more
- SWE-bench, T-Bench, etc., can’t capture engineering complexity; hands-on feel is a sharper signal 4 (SWE-bench, GitHub).
• [07:35–08:09] Frequent question: how to test prompts?
- Claude Code relies on a tight in-use feedback loop, which is more immediate than fixed eval suites.
• [08:10–08:32] “We mostly go by vibes now”
- Model performance on SWE-bench is already high; the community is seeking harder/newer evals (e.g., T-Bench), but fully covering engineering reality remains difficult 4 (SWE-bench, Terminal-Bench).
• [08:33–09:07] Why internal dogfooding works well
- Product philosophy: extreme user listening and lowering feedback friction.
• [09:08–09:26] Single Slack feedback channel
- All feedback goes to one channel, reducing the sense of a “black hole.”
• [09:27–09:43] Quick fixes → sustained feedback firehose
- Boris batches fixes and replies item-by-item, sustaining positive feedback; the channel remains a “bursting firehose.”
• [09:44–10:22] Stay humble and user-oriented
- In a new AI domain, nobody “truly knows”; continuous listening is essential.
• [10:23–10:50] Current design: simple and hackable
- Goal: minimal and extensible. The earliest extension point is CLAUDE.md as persistent context 1 (Anthropic).
• [11:09–11:24] CLAUDE.md location and version control
- Can live at repo root or subdirs; typically checked into the repo and evolves with it 1 (Anthropic).
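As an illustration of CLAUDE.md files living at the repo root and in subdirectories, the Python sketch below walks from a repository root down to a working directory and collects every CLAUDE.md it finds, outermost first. The function name `collect_context_files` is hypothetical, and the walk is only an assumed picture of how layered context could be gathered. Claude Code's actual loading rules are described in its official docs.

```python
from pathlib import Path


def collect_context_files(repo_root: Path, working_dir: Path,
                          name: str = "CLAUDE.md") -> list[Path]:
    """Return existing `name` files from repo_root down to working_dir.

    Files are ordered outermost first, so more specific notes come last.
    Raises ValueError if working_dir is not inside repo_root.
    """
    repo_root = repo_root.resolve()
    working_dir = working_dir.resolve()
    relative = working_dir.relative_to(repo_root)

    # Build the chain of directories: root, root/a, root/a/b, ...
    directories = [repo_root]
    for part in relative.parts:
        directories.append(directories[-1] / part)

    return [d / name for d in directories if (d / name).is_file()]
```

For example, with a CLAUDE.md at the root and another in `pkg/api`, calling `collect_context_files(root, root / "pkg" / "api")` returns the root file followed by the `pkg/api` file. Because the files are checked in, they are versioned and reviewed like any other part of the repository.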
• [11:24–12:00] Many more extension points
- Introduced a richer settings/permissions system, hooks, MCP, slash commands, subagents 17 (Anthropic).
• [12:01–12:23] Slash command example
- A custom “commit” command encodes how to write good commit messages and can pre-approve running git commit to avoid repeated confirmations 7 (Anthropic).
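A custom commit command of this kind is, as far as I understand the slash-commands docs listed in the references, a Markdown file under `.claude/commands/` with optional frontmatter. The Python sketch below writes such a file. The frontmatter keys (`description`, `allowed-tools`) and the `Bash(git commit:*)` pattern follow my reading of those docs and should be verified there. The commit-message guidance in the body is an invented example, not the command described in the talk, and `install_commit_command` is a hypothetical helper.

```python
from pathlib import Path

# Assumed file format: check against the official slash-commands documentation.
COMMIT_COMMAND = """\
---
description: Write a commit message and commit staged changes
allowed-tools: Bash(git diff:*), Bash(git commit:*)
---
Review the staged diff and create a single commit.

- Subject line: imperative mood, 50 characters or fewer.
- Body: explain why the change was made, not just what changed.
"""


def install_commit_command(repo_root: Path) -> Path:
    """Write the command file to <repo_root>/.claude/commands/commit.md."""
    target = repo_root / ".claude" / "commands" / "commit.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(COMMIT_COMMAND, encoding="utf-8")
    return target
```

Once the file exists, the command would be invoked as `/commit` in a session. Listing `git commit` under `allowed-tools` is what avoids the repeated confirmation prompts mentioned above.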
• [12:24–12:52] Agents vs. slash commands
- Think of agents as slash commands with branching context windows; two sides of the same coin. The SDK also applies to non-coding agents 8 (Anthropic, Anthropic).
• [12:53–13:08] Underlying model keeps improving
- More autonomy, better instruction-following and memory—all of which strengthen these extensions 11 (Anthropic, Anthropic).
• [13:09–13:31] Daily flow in 6–12 months
- Still some hand coding, but more often Claude manipulates text while you plan and review.
• [13:32–13:51] From “less hand writing” to model proactivity
- The model proactively proposes and completes changes; you curate.
• [13:52–14:35] 12–24 months: goals over tasks
- Agents focus less on small tasks, more on monthly/strategic objectives.
• [14:36–14:56] Moving up abstraction levels
- From “edit a file” → “submit a PR” → “make progress toward building the app.”
• [14:57–15:21] Back to the TI-83 spark
- Emphasizes the satisfaction of quick experiments and immediate feedback.
• [15:22–16:04] High past barriers vs. lower agentic barriers
- Traditional web stacks (React, Next.js, multiple build/deploy steps) are complex; agents make “have an idea → build it” much faster.
• [16:05–16:28] Code is rewritable; code is less “precious”
- Hand-coding can still be fun (e.g., weekend C++ for fun), but outcomes matter most.
• [16:29–17:33] Study advice
- Keep fundamentals (languages, compilers, runtimes, web architecture, system design) and be more creative: you can build product/startup ideas quickly now.
• [17:34–17:58] Best practice Q&A
- Alex asks for Claude Code tips from its creator.
• [17:59–18:18] Tip 1: start by asking
- Use it to explore the codebase first (e.g., how to add a logger? why is a function designed this way? scan Git history for rationale).
• [18:19–18:39] Research assistant first → then codegen
- Build the mental model of “agent as researcher” before letting it write code.
• [18:40–19:08] Tip 2: three task classes
- Easy: single-prompt tasks.
- Medium: Plan Mode first, then Auto-Accept 10 (ClaudeLog).
- Hard: human-led, Claude assists (research, prototypes, unit tests).
• [19:09–19:23] Easy in practice
- Single prompt, e.g., mention @claude on a GitHub issue to have it generate a PR.
• [19:24–19:40] Medium in practice
- In the terminal, switch to Plan mode (Shift+Tab), then Auto-Accept after plan alignment 10 (ClaudeLog).
• [19:41–19:58] Hard in practice
- Human drives; Claude does research, prototyping, tests; main implementation remains human-written.
• [19:59–20:15] Wrap-up
- Thanks exchanged; conversation ends.
References and Further Reading (grouped by topic)
- https://docs.anthropic.com/en/docs/claude-code/overview “Claude Code overview (official docs)”
- https://docs.anthropic.com/en/docs/mcp “Model Context Protocol (Anthropic docs)”; https://modelcontextprotocol.io/ “MCP official site”
- https://docs.anthropic.com/en/docs/claude-code/github-actions “Claude Code GitHub Actions (@claude to generate PRs/fixes)”
- https://www.swebench.com/ “SWE-bench (real OSS issue-fixing benchmark)”; https://github.com/SWE-bench/SWE-bench “SWE-bench GitHub”
- https://github.com/laude-institute/terminal-bench “Terminal-Bench (terminal agent eval)”; https://www.tbench.ai/news/announcement “T-Bench announcement”
- https://www.anthropic.com/news/claude-opus-4-1 “Claude Opus 4.1 release and capabilities”
- https://docs.anthropic.com/en/docs/claude-code/slash-commands “Claude Code custom slash commands”
- https://docs.anthropic.com/en/docs/claude-code/sub-agents “Claude Code Subagents”
- https://www.anthropic.com/engineering/claude-code-best-practices “Claude Code best practices (engineering blog)”
- https://www.claudelog.com/mechanics/auto-accept-permissions/ “Plan / Auto-Accept mode and one-click permissions (keyboard toggle)”
- https://docs.anthropic.com/en/release-notes/api “Anthropic release notes and latest models”
- https://en.wikibooks.org/wiki/How_to_Program_a_TI-83_Plus/Intro “TI-83 Plus TI-BASIC intro (Wikibooks)”
- https://docs.sentry.io/product/explore/logs/ “Sentry Logs (structured logs for debugging/observability)”
The above closely follows the provided transcript. External links annotate terms/mechanisms/benchmarks/integrations for cross-checking and further reading.