The Future of Agentic Coding with Claude Code

Sep 2, 2025, Video: The future of agentic coding with Claude Code (YouTube)


This English version closely follows the complete Chinese notes to preserve all information from the talk transcript and references.

• Personal Background and Opening

  • Alex shares an early programming memory: writing BASIC on a TI-83 Plus calculator in math class to store exam answers.
  • Alex: leads Claude Relations at Anthropic.
  • Guest Boris: an Anthropic engineer and the creator of Claude Code.
  • Theme: coding has changed dramatically over the past 12 months, especially with AI.

• One Year Ago: The State of Coding

  • Typical dev flow relied on IDE autocomplete and a simple chat assistant, with lots of copy-paste.
  • AI was a functional tool, not deeply integrated into the inner loop.
  • About a year ago the “agent” pattern emerged and began to enter the workflow.
  • Rather than hand-editing text, developers increasingly rely on AI agents to write and modify code.

• Early Claude Code Attempts

  • The initial release used Sonnet 3.5 and was limited; Boris used it for about 10% of his own coding.
  • Even with early models and a primitive harness, internal trials showed value.
  • A year of rapid model progress followed: Sonnet 3.7, Claude 4.0, then Opus 4.1, with capabilities improving markedly along the way.
  • The harness (Claude Code itself) kept improving: context management, tool use, permissions, etc., enabling real development utility.

• Co-evolution of Models and Product

  • Everyone at Anthropic, including researchers, uses Claude Code daily.
  • Pain points from real usage directly inform model and product improvements.
  • Example: early models drifted during longer edit sessions; later versions stay on track much longer.
  • Improvements are driven by actual engineering work with Claude Code, not abstract benchmarks.

• Evaluation and Feedback Loop

  • Boris’s evaluation method: use the new model to do his real work for the day and judge the outcome.
  • Daily work spans new feature code, bug fixes, reading Slack, replying to GitHub issues—good coverage for testing capabilities.
  • Benchmarks like SWE-bench and T-Bench (Terminal-Bench) exist, but “vibes” (hands-on feel) are the most decisive signal for product quality.
  • Key practice: a single internal Slack channel for feedback, with rapid response and fixes to sustain a positive loop.
  • This fast iteration maintains a steady stream of feedback that drives Claude Code’s evolution.

• Claude Code Today and Extensibility

  • Design goal: stay simple and hackable.
  • Earliest extension: a repository CLAUDE.md file to inject persistent context.
  • Then increasingly rich primitives:
    • More capable settings and permissioning.
    • Hooks to extend more phases of operation.
    • MCP (Model Context Protocol) as an extension point.
    • User-customizable slash commands and subagents.
  • Together these make Claude Code useful beyond coding: its SDK can serve as a foundation for general-purpose agents.
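Hooks are the most code-like of these primitives. As a hedged sketch, assuming the documented hook contract (Claude Code runs a command on an event such as PreToolUse, passes a JSON payload with `tool_name` and `tool_input` on stdin, and treats exit code 2 as "block this tool call", feeding stderr back to Claude), a policy hook for shell commands could look like this. The deny-list patterns and the names `check_bash_command` and `main` are illustrative assumptions, not part of Claude Code.

```python
import json
import re
import sys

# Hypothetical deny-list; adjust to your own policy.
BLOCKED_PATTERNS = [
    r"\brm\s+-rf\s+/(\s|$)",               # deleting from the filesystem root
    r"\bgit\s+push\b.*\s--force(\s|$)",    # plain force-push (not --force-with-lease)
]


def check_bash_command(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for a PreToolUse payload.

    Exit code 0 lets the tool call proceed; 2 asks Claude Code to block it
    (assumed hook contract, see lead-in).
    """
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return 2, f"Blocked by policy hook: matches {pattern!r}"
    return 0, ""


def main() -> int:
    # A hook receives its event payload as JSON on stdin.
    payload = json.load(sys.stdin)
    code, message = check_bash_command(payload)
    if message:
        # On exit code 2, stderr is shown to Claude as the reason for the block.
        print(message, file=sys.stderr)
    return code
```

Registered as a PreToolUse command hook with a `Bash` matcher in a settings file, the script would end with `sys.exit(main())`; that line is left out here so the functions can be imported and tested on their own.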

• Outlook (6–24 Months)

  • Work splits into two modes:
    • Some hands-on coding remains, but Claude increasingly edits the text for you.
    • More and more, Claude proposes and makes changes, and you accept or adjust them.
  • Longer term, Claude moves from task execution toward goal completion (e.g., building an app end-to-end).
  • Engineers shift from “text editors” to “goal setters and reviewers.”

• Advice on Learning and Careers

  • From TI-83 days to the modern stack: barriers used to be high; agents lower them.
  • Agents re-focus effort on ideas and products, not incidental complexity.
  • Code is no longer precious: it is cheap to produce, so rewrite freely.
  • Still master fundamentals: languages, compilers, runtimes, web systems, and system design.
  • Cultivate creativity: turn ideas, including startup ideas, into prototypes quickly.

• Claude Code Tips and Best Practices

  • Tip for beginners: don’t start by having Claude write code. First use it to understand the codebase by asking questions and exploring the Git history, and get comfortable with it as a research partner.

  • Treat work by complexity:

    • Easy: let Claude generate the change in one go (e.g., mention @claude on a GitHub issue to generate a PR).
    • Medium: use Plan Mode to align on steps, then Auto-Accept to run.
    • Hard: you drive; Claude assists with research, prototypes, and tests; humans write most of the final code.
  • Adjust usage style to task difficulty; avoid one-size-fits-all.

  • Closing: expect more autonomy, stronger tooling, and lower barriers. Claude Code’s mission is to be a true intelligent partner, not just a code writer.


Time-coded Highlights and Details

• [00:00–00:24] Opening and personal anecdote (TI-83 + BASIC)

  • Alex recalls programming a TI-83 Plus in BASIC to store exam answers, discovering the joy of hackability. Supplemental reading: TI-83/84 TI-BASIC quickstart and manual 12 (Wikibooks).

• [00:25–00:44] Guests and theme

  • Host Alex (Claude Relations at Anthropic); guest Boris (creator of Claude Code, Anthropic engineer). Theme: Claude Code and the future of software engineering; the past year has been exceptionally fast-moving.

• [00:45–01:01] Framing the retrospective

  • Alex asks Boris to summarize how coding has changed over the past year and where we are now.

• [01:02–01:24] Typical workflow a year ago

  • IDE autocomplete + chat app, heavy copy/paste; AI lived outside the inner loop.

• [01:25–01:47] Agents move into the inner loop

  • The standout shift: coding now increasingly uses agents rather than manual, character-level editing; from “press Tab” to “the model writes.”

• [01:48–02:20] From hand editing to model-driven edits

  • Transition to more “hands-off” work: specify goals to the agent; it performs large-scale edits and even scaffolds apps.

• [02:21–02:47] Why last year couldn’t do this

  • Two reasons: model capability limits; and immature scaffolding/harness (the orchestration layer above the model).

• [02:48–03:08] Very early Claude Code

  • Initial release still used Sonnet 3.5 (not the upgraded model); “usable but limited.” Boris used it on ~10% of his own code.

• [03:09–03:25] Early internal adoption

  • The day after it was released to core internal teams, engineers were already using it; even this early version delivered value.

• [03:26–03:40] Not great yet, still helpful

  • Both model and harness were rough but useful.

• [03:41–04:02] A year of progress in models and harness

  • Models: from Sonnet 3.7 and Claude 4.0 to Opus 4.1, with improvements in agentic coding; the harness (Claude Code) also advanced greatly 6 (Anthropic, Anthropic).
  • Key point: you can’t just “use the model”; you need a harness to direct it.

• [04:03–04:29] Horse and saddle analogy

  • Model as horse; engineers need a saddle/harness to guide effectively.

• [04:30–04:55] What the harness includes

  • The harness = Claude Code: system prompt, context management, tools, pluggable MCP servers, settings, permissions 1 (Anthropic).
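To make the horse-and-saddle idea concrete, here is a hedged, minimal sketch of what a harness loop does around a model: keep the running context, let the model request tools, check each request against permissions, run it, and feed the result back. The scripted `fake_model`, the tool names, and the message format are hypothetical stand-ins; Claude Code’s real harness (system prompt, context management, MCP servers, settings) is far richer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class FinalAnswer:
    text: str


# Hypothetical tools the harness exposes to the model.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_tests": lambda target: f"all tests in {target} passed",
}

# Permissions: tools the user has pre-approved; anything else is denied.
ALLOWED = {"read_file", "run_tests"}


def run_agent(model, task: str, max_steps: int = 10) -> tuple[str, list[str]]:
    """Drive the model until it returns a final answer or hits the step limit."""
    context = [f"user: {task}"]  # context management: the running transcript
    for _ in range(max_steps):
        action = model(context)
        if isinstance(action, FinalAnswer):
            return action.text, context
        context.append(f"assistant: call {action.name}({action.argument})")
        if action.name not in ALLOWED or action.name not in TOOLS:
            context.append(f"tool: permission denied for {action.name}")
            continue
        context.append(f"tool: {TOOLS[action.name](action.argument)}")
    return "stopped: step limit reached", context


def fake_model(context: list[str]):
    """Scripted stand-in for a real model: read a file, run tests, answer."""
    tool_turns = sum(1 for line in context if line.startswith("tool:"))
    script = [ToolCall("read_file", "app.py"), ToolCall("run_tests", "tests/")]
    if tool_turns < len(script):
        return script[tool_turns]
    return FinalAnswer("Read app.py and the tests pass.")
```

Swapping `fake_model` for a real model client changes nothing in the loop; the point is that tools, permissions, and context live in the harness, not in the model.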

• [04:56–05:19] Making the model “see” the full context

  • Harness feeds context and tools to the model; this dramatically affects performance. Over the past year, the team refined how to “build around the model.”

• [05:20–05:36] Why coevolution happened

  • It was not solely the product of deliberate training choices; it emerged naturally because everyone at Anthropic (including researchers) uses Claude Code daily.

• [05:37–05:54] Finding limits in daily use

  • Example: when the model fails at string-replacement edits, that exposes a genuine model gap and yields lessons for improvement.

• [05:55–06:12] Longer autonomous “run time”

  • Letting the model “run itself”: from short, drift-prone runs on 3.5 to much longer stable runs on newer models—achieved through repeated “correct → teach” loops in human-in-the-loop usage.

• [06:13–06:29] How to evaluate new models/features

  • Best evaluation: “I use it to do my real work today.”

• [06:30–06:52] Real work covers many capabilities

  • Writing features, fixing bugs, reading Slack, replying to GitHub issues: more and more is possible. Via MCP, Claude can pull in context, read messages, and use sources such as Sentry logs to help with debugging 2 (Anthropic, Sentry_docs).
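Under the hood, MCP is a JSON-RPC 2.0 protocol: the client (here, Claude Code) discovers a server’s tools with `tools/list` and invokes one with `tools/call`, passing the tool `name` and its `arguments`. The sketch below only builds such a request; the tool name `get_issue_events` and its arguments are hypothetical placeholders, not the actual tool set of Sentry’s MCP server.

```python
import json
from itertools import count

_ids = count(1)  # request ids must not be reused within a session


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical tool exposed by an error-monitoring MCP server.
request = tools_call("get_issue_events", {"issue_id": "PROJ-123", "limit": 5})
print(request)
```

The server answers with a JSON-RPC response whose `result` carries the tool output, which the harness then adds to the model’s context alongside the code it is debugging.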

• [06:53–07:10] Productized evals are hard

  • Attempts at product evals exist, but the most effective signal still comes from real usage.

• [07:11–07:34] Benchmarks exist, but “vibes” matter more

  • SWE-bench, T-Bench, and similar benchmarks can’t fully capture engineering complexity; hands-on feel is a sharper signal 4 (SWE-bench, GitHub).

• [07:35–08:09] Frequent question: how to test prompts?

  • Claude Code relies on a tight in-use feedback loop, which is more immediate than fixed eval suites.

• [08:10–08:32] “We mostly go by vibes now”

  • Model performance on SWE-bench is already high; the community is seeking harder/newer evals (e.g., T-Bench), but fully covering engineering reality remains difficult 4 (SWE-bench, Terminal-Bench).

• [08:33–09:07] Why internal dogfooding works well

  • Product philosophy: extreme user listening and lowering feedback friction.

• [09:08–09:26] Single Slack feedback channel

  • All feedback goes to one channel, reducing the sense of a “black hole.”

• [09:27–09:43] Quick fixes → sustained feedback firehose

  • Boris batches fixes and replies item-by-item, sustaining positive feedback; the channel remains a “bursting firehose.”

• [09:44–10:22] Stay humble and user-oriented

  • In a new AI domain, nobody “truly knows”; continuous listening is essential.

• [10:23–10:50] Current design: simple and hackable

  • Goal: minimal and extensible. The earliest extension point is CLAUDE.md as persistent context 1 (Anthropic).

• [11:09–11:24] CLAUDE.md location and version control

  • Can live at repo root or subdirs; typically checked into the repo and evolves with it 1 (Anthropic).
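A hedged sketch of how layered CLAUDE.md files could be combined: walk from the repository root down to the directory being worked in and concatenate every CLAUDE.md found, so repo-wide conventions come first and more specific subdirectory notes come last. This is a simplified model for illustration, not Claude Code’s actual loading logic; `collect_claude_md` is a hypothetical helper.

```python
from pathlib import Path


def collect_claude_md(repo_root: Path, working_dir: Path) -> str:
    """Concatenate CLAUDE.md files from repo_root down to working_dir."""
    repo_root = repo_root.resolve()
    working_dir = working_dir.resolve()
    # Directories from the working dir up to (and including) the repo root.
    chain = [working_dir, *working_dir.parents]
    if repo_root not in chain:
        raise ValueError("working_dir must be inside repo_root")
    chain = chain[: chain.index(repo_root) + 1]
    sections = []
    for directory in reversed(chain):  # root first, most specific last
        memory_file = directory / "CLAUDE.md"
        if memory_file.is_file():
            rel = memory_file.relative_to(repo_root)
            sections.append(f"# {rel}\n{memory_file.read_text().strip()}")
    return "\n\n".join(sections)
```

Because the files are plain text checked into the repo, they are reviewed and versioned like any other code, which is what lets them evolve with the project.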

• [11:24–12:00] Many more extension points

  • Introduced a richer settings/permissions system, hooks, MCP, slash commands, subagents 17 (Anthropic).

• [12:01–12:23] Slash command example

  • A custom “commit” command encodes how to write good commit messages and can pre-approve running git commit to avoid repeated confirmations 7 (Anthropic).
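The pre-approval relies on permission rules. As we understand the documented syntax, Claude Code writes them as tool patterns such as `Bash(git commit:*)`, where a trailing `:*` permits any command beginning with that prefix, and a custom command’s frontmatter can list such patterns under `allowed-tools`. The matcher below is a simplified, hypothetical model of that rule syntax (`is_preapproved` is an illustrative name, not Claude Code’s implementation); it shows why a commit command stops prompting for `git commit` but still prompts for `git push`.

```python
import re

# Rule shapes assumed here: Tool(prefix:*) for prefix matches,
# Tool(exact command) for exact matches, or a bare Tool name.
RULE = re.compile(r"^(?P<tool>\w+)(?:\((?P<spec>.*)\))?$")


def is_preapproved(tool: str, command: str, rules: list[str]) -> bool:
    """Return True if any rule allows running `command` with `tool`."""
    for rule in rules:
        match = RULE.match(rule)
        if not match or match["tool"] != tool:
            continue
        spec = match["spec"]
        if spec is None:  # bare tool name: every use is allowed
            return True
        if spec.endswith(":*"):
            prefix = spec[:-2]
            # Assumption: require a word boundary, so "git commit"
            # does not also approve "git commitx".
            if command == prefix or command.startswith(prefix + " "):
                return True
        elif command == spec:
            return True
    return False


commit_rules = ["Bash(git add:*)", "Bash(git status:*)", "Bash(git commit:*)"]
print(is_preapproved("Bash", "git commit -m 'Fix typo'", commit_rules))
print(is_preapproved("Bash", "git push origin main", commit_rules))
```

Scoping the allowance to one command keeps the convenience narrow: the commit workflow runs without confirmations, while anything outside it still asks.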

• [12:24–12:52] Agents vs. slash commands

  • Think of agents as slash commands with branching context windows; two sides of the same coin. The SDK also applies to non-coding agents 8 (Anthropic, Anthropic).
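One hedged way to picture “slash commands with branching context windows”: a subagent starts from its own fresh context holding only its instructions and task, does its (possibly long) work there, and only a short result flows back into the main conversation. Everything below (`Subagent`, `run_subagent`, the message strings) is a hypothetical illustration of that context isolation, not the Claude Code SDK’s API.

```python
from dataclasses import dataclass, field


@dataclass
class Subagent:
    name: str
    instructions: str  # plays the role of a slash command's prompt
    context: list[str] = field(default_factory=list)


def run_subagent(agent: Subagent, task: str, main_context: list[str]) -> str:
    """Run `task` in the subagent's own context; return only a summary."""
    # Branch: the subagent sees its instructions and the task, not the
    # main conversation, so exploratory noise stays out of main_context.
    agent.context = [f"system: {agent.instructions}", f"user: {task}"]
    for step in range(3):  # stand-in for many tool calls and file reads
        agent.context.append(f"assistant: exploration step {step + 1}")
    summary = f"{agent.name}: finished '{task}' in {len(agent.context) - 2} steps"
    main_context.append(f"tool: {summary}")  # merge back only the result
    return summary
```

The design choice this illustrates is the trade-off between the two primitives: a slash command injects its prompt into the current context, while a subagent spends its own context window and returns a compact result.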

• [12:53–13:08] Underlying model keeps improving

  • More autonomy, better instruction-following and memory—all of which strengthen these extensions 11 (Anthropic, Anthropic).

• [13:09–13:31] Daily flow in 6–12 months

  • Still some hand coding, but more often Claude manipulates text while you plan and review.

• [13:32–13:51] From “less hand writing” to model proactivity

  • The model proactively proposes and completes changes; you curate.

• [13:52–14:35] 12–24 months: goals over tasks

  • Agents focus less on small individual tasks and more on longer-horizon, strategic objectives (e.g., month-scale goals).

• [14:36–14:56] Moving up abstraction levels

  • From “edit a file” → “submit a PR” → “make progress toward building the app.”

• [14:57–15:21] Back to the TI-83 spark

  • Emphasizes the satisfaction of quick experiments and immediate feedback.

• [15:22–16:04] High past barriers vs. lower agentic barriers

  • Traditional web stacks (React, Next.js, multiple build/deploy steps) are complex; agents make “have an idea → build it” much faster.

• [16:05–16:28] Code is rewritable; code is less “precious”

  • Hand-coding can still be enjoyable (e.g., writing C++ on weekends for fun), but outcomes matter most.

• [16:29–17:33] Study advice

  • Keep fundamentals (languages, compilers, runtimes, web architecture, system design) and be more creative: you can build product/startup ideas quickly now.

• [17:34–17:58] Best practice Q&A

  • Alex asks for Claude Code tips from its creator.

• [17:59–18:18] Tip 1: start by asking

  • Use it to explore the codebase first (e.g., how to add a logger? why is a function designed this way? scan Git history for rationale).

• [18:19–18:39] Research assistant first → then codegen

  • Build the mental model of “agent as researcher” before letting it write code.

• [18:40–19:08] Tip 2: three task classes

  • Easy: single-prompt tasks.
  • Medium: Plan Mode first, then Auto-Accept 10 (ClaudeLog).
  • Hard: human-led, Claude assists (research, prototypes, unit tests).

• [19:09–19:23] Easy in practice

  • Mention @claude on GitHub issues/PRs to have it generate PRs, with no terminal required 3 (Anthropic).

• [19:24–19:40] Medium in practice

  • In the terminal, switch to Plan Mode (Shift+Tab cycles through modes), then turn on Auto-Accept once you agree on the plan 10 (ClaudeLog).
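The medium-task workflow can be pictured as a small state machine. This sketch assumes the Shift+Tab cycle runs default → auto-accept edits → plan → default, that plan mode researches and proposes without editing, and that auto-accept applies edits without asking; the `Mode` enum and the `shift_tab` and `edit_policy` helpers are hypothetical names for illustration.

```python
from enum import Enum


class Mode(Enum):
    DEFAULT = "default"          # ask before each edit
    AUTO_ACCEPT = "auto-accept"  # apply edits without prompting
    PLAN = "plan"                # research and propose; no edits


# Assumed order of the Shift+Tab cycle.
CYCLE = [Mode.DEFAULT, Mode.AUTO_ACCEPT, Mode.PLAN]


def shift_tab(mode: Mode) -> Mode:
    """Advance to the next mode in the assumed Shift+Tab cycle."""
    return CYCLE[(CYCLE.index(mode) + 1) % len(CYCLE)]


def edit_policy(mode: Mode) -> str:
    """What happens when the agent wants to modify a file in this mode."""
    return {Mode.DEFAULT: "ask", Mode.AUTO_ACCEPT: "apply", Mode.PLAN: "refuse"}[mode]


# Medium task: align on a plan first, then let the agent run.
mode = shift_tab(shift_tab(Mode.DEFAULT))  # default -> auto-accept -> plan
print(mode.value, edit_policy(mode))       # research and agree on the plan
mode = shift_tab(shift_tab(mode))          # plan -> default -> auto-accept
print(mode.value, edit_policy(mode))       # execute the agreed plan
```

Separating "decide what to do" from "do it" is the point: review happens once, at the plan, instead of at every individual edit.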

• [19:41–19:58] Hard in practice

  • Human drives; Claude does research, prototyping, tests; main implementation remains human-written.

• [19:59–20:15] Wrap-up

  • Thanks exchanged; conversation ends.

References and Further Reading (grouped by topic)

The above closely follows the provided transcript. External links annotate terms/mechanisms/benchmarks/integrations for cross-checking and further reading.