Code w/ Claude 2025: A Compilation of Notes
Prompting for Agents | Code w/ Claude
Speakers: Hannah (Applied AI) and Jeremy (Applied AI, Product Engineer).
What is an Agent (Hannah)
- Anthropic’s definition: “models using tools in a loop”.
- How it works: Given a task → the agent autonomously chooses to call tools → updates its decisions based on tool feedback → continues iterating until completion.
- Three elements: Environment (available tools and external systems), Tools (executable capabilities), and System Prompt (tells the agent its goal/role).
- Design principle: The simpler, the better; let the model work on its own with a clear objective and tools.
When (Not) to Use an Agent: A Checklist (Hannah)
- Task Complexity: If a human can provide “repeatable steps,” it’s more like a workflow/script; you don’t need an agent. Agents are better suited for when the path is unclear and requires exploration.
- Value Density: High-leverage/high-value tasks (e.g., those that can generate revenue or significantly improve user experience) are more worthy of an agent; low-value tasks can use simple workflows.
- Feasibility: Can you provide the agent with the necessary tools and data to complete the task? If critical tools/permissions are missing, you should narrow the scope or use another method.
- Cost of Error / Correctability: Errors that are hard to detect or costly → add a human in the loop; errors that are easy to spot and roll back → you can delegate more authority.
Typical Use Case Examples (Hannah)
- Coding: From design docs to PRs, the path and iterations are uncertain, high-value, and very suitable for an agent.
- Search/Research: Errors can be corrected with citations and reviews; mistakes are recoverable.
- Computer Use: Multiple attempts can be rolled back, suitable for trial-and-error clicks and process exploration.
- Data Analysis: The goal is clear, but the data shape/quality is unknown, and the steps need exploration, making it suitable for an agent.
Core Concepts for Prompting Agents (Jeremy)
Think Like an Agent: Build a mental model of the agent in its environment of tools and return values; use “human-understandable” tool descriptions/patterns to simulate the agent’s situation.
Provide Pragmatic Heuristics:
- Examples: Be cautious with irreversible operations; stop once you find the answer; set budgets/quotas (simple queries ≤ 5 tool calls; complex ones can go up to 10–15).
- Write down “common sense” to prevent the model from being overly proactive/exploratory.
Tool Selection Strategy is Key:
- Sonnet 4 / Opus 4 can handle a large number of tools simultaneously (~100+), but you must clarify “which types of tools to prioritize when” (e.g., checking Slack first within a company).
- Avoid just giving a “short description + a bunch of tool names”; clarify usage scenarios and priorities.
Guide the Thinking Process (not just turning on “extended thinking”):
- In the first thought block, require it to: assess problem complexity, plan the number of tool calls, list information sources, and define success criteria.
- Use “interleaved thinking”: After getting a result, reflect on its quality and decide whether to validate/supplement/add a disclaimer.
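A minimal sketch of this thinking guidance, assuming the Anthropic Python SDK and Claude 4's extended/interleaved thinking; the model id, beta flag, budget values, and the `web_search` tool are illustrative placeholders, not details from the talk.

```python
# Sketch: guide the first thinking block and enable interleaved thinking between tool calls.
# Model id, beta header, and budgets are illustrative -- check current docs before relying on them.
import anthropic

client = anthropic.Anthropic()

SYSTEM = """You are a research agent.
In your FIRST thinking block: assess the problem's complexity, plan how many tool calls
you expect (simple: <=5, complex: 10-15), list likely information sources, and state your
success criteria.
After EACH tool result: reflect on its quality and decide whether to validate, supplement,
or attach a disclaimer. Stop as soon as the answer is good enough."""

response = client.messages.create(
    model="claude-sonnet-4-20250514",            # illustrative model id
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    extra_headers={"anthropic-beta": "interleaved-thinking-2025-05-14"},  # interleaved thinking beta
    system=SYSTEM,
    tools=[{
        "name": "web_search",                    # placeholder tool the caller must implement
        "description": "Search the web and return the top results as text.",
        "input_schema": {"type": "object",
                         "properties": {"query": {"type": "string"}},
                         "required": ["query"]},
    }],
    messages=[{"role": "user", "content": "How many bananas fit in a Rivian R1S?"}],
)
```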
Unpredictability and Side Effects of Agents:
- A prompt like “keep searching until you find the optimal source” → could lead to an endless loop/context window overflow; you need to specify stopping conditions and a “good enough” standard.
Context Window Management:
- Claude 4’s context is ~200k tokens; long tasks will hit the limit.
- Compaction: At around 190k, automatic compression and summarization is triggered, migrating key context to a new session, which can usually provide near-infinite continuation (with occasional detail loss).
- External File Memory: Write key state to a file, acting as “external memory” to extend the context.
- Sub-agents: Let sub-agents handle re-retrieval/re-processing and pass a compressed result back to the main agent, saving context in the main session.
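A rough sketch of the sub-agent pattern above, assuming the Anthropic Python SDK; `run_subagent()`, the prompts, and the model id are hypothetical illustrations rather than an official API.

```python
# Sketch: delegate a retrieval-heavy step to a fresh context so the main session only
# receives a compressed summary instead of every raw tool result.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # illustrative model id


def run_subagent(task: str) -> str:
    """Run one task in its own context window and return only a compressed summary."""
    result = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system="Do the task, then reply with a <=300-word summary of only the key facts.",
        messages=[{"role": "user", "content": task}],
    )
    return result.content[0].text  # only this compressed text re-enters the main context


# In the main agent loop, heavy exploration is handed off; the main context grows by the
# summary only, which is the context-saving effect described above.
summary = run_subagent("Collect the Rivian R1S cargo volume specs from public sources.")
```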
Let Claude be Claude:
- First, try a minimalist prompt + a few tools, and only add instructions and examples after seeing real flaws; don’t hard-code the process too early.
Tool Design (Jeremy)
- Characteristics of a Good Tool: Simple and accurate name; complete description (if a human can read and use it, it’s good); tested and working.
- Avoid homogenized/identically named tools (e.g., 6 similar searchers); merge similar capabilities into a single tool.
- Let the agent form a complete tool-think-act loop, from database query → reflection → invoice generation → sending an email, etc.
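A sketch of one well-described tool in the Messages API `tools` format, tying into the invoice example above; the name, description, and fields are invented for illustration.

```python
# Sketch: a "good tool" -- simple, accurate name; a description a human could act on;
# a schema that constrains the inputs. All specifics here are made up.
generate_invoice_tool = {
    "name": "generate_invoice",
    "description": (
        "Create a PDF invoice for a customer after the order has been verified in the "
        "database. Use this only once per order, after query_orders has confirmed the "
        "order total. Returns the invoice ID and a download URL."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "ID returned by query_orders"},
            "currency": {"type": "string", "enum": ["USD", "EUR", "SEK"]},
        },
        "required": ["order_id"],
    },
}
```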
Live Demo (Jeremy)
Demonstrated a research agent prompt in the Console (about a thousand tokens):
- Required pre-planning, parallel tool calls (parallel multi-path search), alternating thinking and tools, quality standards/source requirements, and a tool budget.
Example question: “How many bananas can fit in a Rivian R1S?”
- The model needs to look up specs/banana dimensions online → parallel retrieval → unit conversion/volume estimation → arrive at an order of magnitude (common range 30k–50k, one demo got ~48k).
- The process showed alternating thought blocks and tool calls, and reflection/re-checking/calculation on the results.
Evaluation (Evals) Methodology (Jeremy)
Start Small: When the effect is significant, even a small sample can provide a strong signal; don’t start with a “fully automated large-scale eval of hundreds of samples,” run a few real use cases first.
Stay Close to Real Tasks: The evaluation content should be consistent with production tasks (use real engineering tasks for coding, not competition problems).
LLM-as-a-Judge + Clear Scoring Rubric:
- Example: For a search task, provide criteria like source quality/answer range (e.g., the number of bananas for an R1S should be within a reasonable range).
- More robust than “string matching.”
Tool Use Accuracy: Programmatically check if the required tools were called/the right number of times (e.g., “booking a flight” must call the “search flights” tool).
Final State Evaluation (the talk mentioned open-source benchmarks such as τ-bench): Verify if the correct final state was reached (DB row correctly updated, file modified, PR meets standards, etc.).
Human Evaluation is Irreplaceable: Read the dialogue/thought blocks/call traces to understand the system’s true behavior and the root cause of problems.
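A minimal sketch of the automated parts of this eval loop (LLM-as-a-judge with a rubric, plus a programmatic tool-use check), assuming the Anthropic Python SDK; the rubric, acceptable range, and helper names are illustrative.

```python
# Sketch: grade an agent answer against a rubric and check tool usage programmatically.
import json
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"  # illustrative model id

RUBRIC = """Score the answer 0-5 with this rubric and reply only with JSON
{"score": int, "reason": str}:
- Cites at least one concrete source for the R1S cargo volume.
- Gives a banana count between 20,000 and 60,000 (order-of-magnitude check).
- States its assumptions (banana size, packing efficiency)."""


def judge(question: str, answer: str) -> dict:
    """LLM-as-a-judge: more robust than string matching, cheap to run on a small sample."""
    msg = client.messages.create(
        model=MODEL,
        max_tokens=512,
        system=RUBRIC,
        messages=[{"role": "user", "content": f"Question: {question}\n\nAnswer: {answer}"}],
    )
    return json.loads(msg.content[0].text)  # assumes the judge obeyed the JSON-only instruction


def check_tool_use(transcript: list[dict]) -> bool:
    """Programmatic check: the required tool was called, and within the budget."""
    calls = [b for b in transcript if b.get("type") == "tool_use" and b.get("name") == "web_search"]
    return 1 <= len(calls) <= 15
```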
Q&A Highlights
Process for Building an Agent Prompt: Start small and simple → test run → collect failure/edge cases → gradually add instructions and examples to improve the pass rate.
The Role of Few-shot/CoT in Agents:
- Frontier models already “know how to think,” so you don’t need to force “chain-of-thought” commands.
- A few non-rigid examples can help, but avoid over-specifying the process so you don't constrain the model; it's better to specify the thinking/planning approach and points of attention (e.g., list the plan/budget/success criteria in the thought block).
Parallelism and Collaboration: You can run multiple instances in parallel; if necessary, use a shared Markdown/ticket file to pass state between agents.
Reusable Prompt Points Summary (Extracted from the talk)
- Role/Goal: A single sentence clarifying the task and definition of success.
- Tool Strategy: List tools and their priority/typical trigger scenarios; set a call budget and stopping conditions.
- Thinking Guidance: The first thought block should plan first (complexity, number of calls, sources, verification methods, success criteria).
- Risk Control: Irreversible operations need confirmation; stop once the answer is found; enable compression/external memory when the context limit is approached.
- Output Contract: Require a parsable structure (JSON/tagged block), and if necessary, attach evidence/citations or a disclaimer.
- Evaluation Loop: Small sample of real tasks + LLM-as-a-judge (with a rubric) + tool use and final state validation; gradually expand coverage.
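One way to assemble the points above into a reusable agent system prompt; the wording and placeholders are illustrative, not a template from the talk.

```python
# Sketch: a reusable system-prompt skeleton covering role, tool strategy, thinking
# guidance, risk control, and an output contract. All placeholder names are invented.
AGENT_SYSTEM_PROMPT = """# Role & goal
You are a {role}. Success means: {success_definition}.

# Tool strategy
Prefer tools in this order: {tool_priority}. Typical triggers: {trigger_notes}.
Budget: at most {max_tool_calls} tool calls; stop as soon as the answer is good enough.

# Thinking
In your first thinking block, plan: complexity, expected number of calls, sources to use,
how you will verify, and your success criteria.

# Risk control
Ask before irreversible actions. If context is running low, summarize key state to a file.

# Output contract
Reply only with:
<answer>...</answer>
<sources>...</sources>
If you are not confident, say so explicitly instead of guessing."""
```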
Prompting 101 | Code w/ Claude
Video Information and Gist
- Speakers: Hannah (Applied AI Team) and Christian (Applied AI Team).
- Objective: To demonstrate how to iteratively build high-quality prompts and best practices using a real, adapted business scenario.
Scenario (Car Insurance Claim Example)
- Role: A claims scenario for a car insurance company in Sweden.
- Input: ① A car accident report form (in Swedish, with 17 checkboxes, left and right columns for Vehicle A / Vehicle B respectively); ② A hand-drawn sketch of the accident.
- Task: Extract objective facts from the form + sketch and determine liability (who is primarily at fault).
First Direct Attempt (V0) and Lessons Learned
- Threw both documents at the model with a very crude prompt.
- Result: The model mistook it for a “skiing accident” and misunderstood place names (because the task and context were not set).
- Conclusion: Prompt engineering is an iterative + empirical process; you need to gradually complete the scenario, boundaries, and constraints.
Recommended Overall Prompt Structure (for single-shot / API scenarios)
- Task / Role and Goal: Tell the model “who you are, what to do, and what the success criteria are.”
- Content / Dynamic Input: In this case, the form image + sketch (could also come from an external system).
- Instructions / Detailed Steps: What to do first, what to do next, how to reason, when to stop.
- Examples: If necessary, provide positive/negative examples/tricky cases and the expected output.
- Reminders: Prevent hallucinations, only output when confident, cite evidence, etc.
- Note: The actual demo distributes these modules between the system prompt and user prompt, distinguished by stability and reusability.
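A skeleton of this structure in code, assuming the Anthropic Python SDK; the system text, file names, and model id are placeholders for the claims scenario, not the talk's actual prompt.

```python
# Sketch: stable context in the system prompt, variable per-claim inputs (two images +
# instructions) in the user message.
import base64
import anthropic

client = anthropic.Anthropic()

SYSTEM = """You review Swedish car accident report forms for an insurance company.
The form has 17 numbered rows; the left column is Vehicle A, the right is Vehicle B.
Drivers sometimes circle or scribble instead of marking an 'x'.
Only draw a conclusion about fault when you are confident; otherwise say you are unsure."""


def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode()


response = client.messages.create(
    model="claude-sonnet-4-20250514",   # illustrative model id
    max_tokens=2048,
    temperature=0,                      # reproducibility, as in the demo settings
    system=SYSTEM,                      # stable, reusable context
    messages=[{                         # variable, per-claim input
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": b64("form.png")}},
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": b64("sketch.png")}},
            {"type": "text", "text": "Read the form row by row first, then the sketch. "
                                     "Who is at fault? Cite the checked items as evidence."},
        ],
    }],
)
print(response.content[0].text)
```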
Version Iteration and Key Improvements
V1 → V2: Adding Task and “Tone/Confidence”
- Clarified: This is a car insurance claim; the image is hand-drawn; don’t guess if you can’t see clearly; only draw conclusions when very confident.
- Settings: temperature = 0; a generous max tokens (to ensure reproducibility and avoid truncation).
V3: Moving “Stable Context” to the System Prompt
- Fixed information (the form’s structure: title, meaning of left/right columns, what each of the 17 rows represents; common non-standard filling methods like circling/smudging/not using “×”) is cached/reused long-term.
- Value: The model doesn’t have to guess the form structure every time, reducing irrelevant narration and getting to fact extraction faster.
Structured Tags and Organization
- Use XML-style tags (or Markdown sections) to clearly delimit blocks, e.g., `<analysis_form>…</analysis_form>`, `<sketch_analysis>…</sketch_analysis>`, `<final_verdict>…</final_verdict>`, which facilitates citation and alignment.
Order of Operations (Crucial)
- First, read the form item by item to confirm which boxes are checked and record objective facts → Then, read the sketch to understand the process in conjunction with the facts.
- Purpose: To avoid the misleading interpretations and hallucinations that can result from “looking at the sketch first.”
Evidence and Confidence Constraints
- The output judgment must cite evidence (e.g., “Vehicle B turned right—because item X was explicitly checked”).
- If unsure, refuse to answer/state uncertainty, do not speculate.
V4: Output Contract and Implementability
- Define a final output template (e.g., wrap only the "final verdict" in `<final_verdict>…</final_verdict>`), keeping other analysis available for debugging but trimmable in production.
- Can be combined with pre-filling techniques: require the output to start with a specific XML snippet / JSON structure for machine-parsable results and downstream database writes.
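A small sketch of the pre-filling technique mentioned above, plus a parsing helper for the output contract; the message contents and helper name are illustrative.

```python
# Sketch: start the assistant turn with the opening tag so the model's reply continues
# from inside <final_verdict>; then parse the tagged block downstream.
import re

messages = [
    {"role": "user", "content": "…form and sketch as above…"},       # placeholder input
    {"role": "assistant", "content": "<final_verdict>"},             # prefill: model continues from here
]
# When prefilling, remember the returned text starts AFTER the tag, so re-prepend it
# before parsing, or parse a non-prefilled full response with the helper below.


def extract_verdict(text: str) -> str:
    """Pull only the verdict out of a tagged response for database writes."""
    m = re.search(r"<final_verdict>(.*?)</final_verdict>", text, re.DOTALL)
    return m.group(1).strip() if m else text
```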
Examples (Few-shot) and “Tricky Case Library”
- Accumulate edge/gray cases in the system prompt; if necessary, embed image base64 in examples with textual descriptions to help the model learn to deconstruct/compare.
Conversation/History (Optional)
- For user-oriented long-conversation products: you can summarize relevant conversation history in the system prompt to enrich the context (this demo was for backend processing and didn’t use it).
Observed Behavioral Changes in the Demo
- After adding the system context, the model no longer explained the meaning of the form itself, but focused on the checked facts and their correspondence with the sketch.
- By following the “form first, then sketch” order, the model could make a more confident judgment: the example clearly concluded that Vehicle B was at fault.
- If the instructions required “checking each checkbox item by item,” the model would explicitly show its process (useful for development and debugging; can be simplified for production).
Key Best Practices (Generalized Summary)
- Put stable knowledge in the System prompt, and variable data in the User prompt; reduce the overhead of “re-explaining every time.”
- Order and structure (XML/block tags) can significantly reduce hallucinations and improve maintainability and reproducibility.
- Provide stopping conditions and confidence rules to avoid “making up answers.”
- Require citation of evidence to form an auditable chain of reasoning.
- Pre-filling/output format contracts make the result directly machine-readable (JSON/XML), facilitating integration with databases/workflows.
Extended Capabilities and Tools
- Prompt Caching: Enable caching for long-term, unchanging system context (like the form structure) to save costs and improve performance.
- Extended Thinking: Can be used as a handle to analyze “how the model is thinking”; from the thought process, you can then solidify effective steps into the prompt (which saves tokens and makes the result more stable).
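A minimal sketch of the prompt caching pattern mentioned above, assuming the Anthropic Python SDK's `cache_control` on system content blocks; the text and model id are placeholders.

```python
# Sketch: mark the long, stable system context (e.g., the form description) as cacheable
# so repeated calls reuse it. The prefix must be long enough to meet the minimum
# cacheable length for the chosen model.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",   # illustrative model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "…the long, stable form description (17 rows, column meanings, etc.)…",
            "cache_control": {"type": "ephemeral"},   # cache this stable prefix across calls
        }
    ],
    messages=[{"role": "user", "content": "Analyze the attached claim."}],
)
```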
MCP 201 — The Power of the Protocol | Code w/ Claude
Speaker: David Soria Parra (Anthropic, one of the co-creators of MCP).
Objective: To systematically explain the capability boundaries of the MCP protocol, not just limited to the familiar tools (tool use), but also including other primitives, and to show how to build richer interactions and the roadmap (Web, authorization, scalability, etc.).
Opening and Gist
- Currently, most people’s use of MCP is focused on tool use; this session will showcase all available primitives of the protocol and more advanced interaction patterns.
- Structure: First, talk about the primitives an MCP server can expose to a client → then, lesser-known capabilities → how to build richer interactions → a look at MCP on the Web and near/mid-term plans.
MCP Primitive 1: Prompts — “User-Driven”
Definition: A reusable interaction template (a piece of text/prompt) exposed by the MCP server, which the user can directly inject into the context to guide the model.
Value:
- The server author provides best-use examples, letting users know “how to use this server most effectively.”
- Dynamic: The underlying prompt can execute code (in the MCP server), allowing for richer parameterized logic.
Demo: In an editor, call a Prompt to pull GitHub PR comments into the context, making it easier for the model to then modify the code based on those comments.
Difference from tools: Whether the user decides to add it to the context (Prompts) vs. the model deciding when to call it (Tools).
Prompt Completion: Prompts can define parameters and auto-completion (e.g., popping up a list of PRs to choose from), which can be implemented with just a few lines of code in TS (a function that generates the Prompt + a completion function).
MCP Primitive 2: Resources — “Application-Driven”
Definition: Raw data/content (files, schemas, documents, knowledge, etc.) exposed by the server to the client.
Usage:
- The client can directly add the resource content to the context.
- It’s also possible to build a vector index/do RAG on the resources, with the application selecting the most relevant content to inject (currently still an area for more exploration).
Demo: Expose a database schema (a Postgres database in the example) as a Resource; the client treats it like a "file" to add to the context, and asks Claude to draw a visual relationship diagram.
MCP Primitive 3: Tools — “Model-Driven”
- Definition: Executable actions exposed by the server; the model decides when to call them during reasoning.
- Experience: The moment the model successfully calls a tool for the first time and produces an external effect (like querying a DB/modifying data) is a “magical” moment.
The “Interaction Model” and Division of Labor of the Three Primitives
- Prompts: User-driven (slash commands, Add commands, etc.), the user decides what to put into the context.
- Resources: Application/client-driven (retrieval, indexing, selective injection).
- Tools: Model-driven (autonomous calls during reasoning).
- The same piece of data can be exposed in all three ways, depending on when and by whom you want it introduced into the interaction → this helps create a more nuanced product experience, rather than just “waiting for the model to call a tool.”
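A rough sketch of "the same data, three primitives," using the `FastMCP` helper from the official MCP Python SDK (the talk's demos were in TypeScript); the server name, data, and functions are invented for a chat-app-style server.

```python
# Sketch: expose the same thread data as a Prompt (user-driven), a Resource
# (application-driven), and a Tool (model-driven).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("chat-notes")

THREADS = {"42": "Alice: ship it?\nBob: yes, after the tests pass."}


@mcp.prompt()                            # user-driven: injected when the user picks it
def summarize_thread(thread_id: str) -> str:
    return f"Summarize the discussion in thread {thread_id} in three bullet points."


@mcp.resource("thread://{thread_id}")    # application-driven: client decides what to index/inject
def get_thread(thread_id: str) -> str:
    return THREADS.get(thread_id, "")


@mcp.tool()                              # model-driven: called autonomously during reasoning
def search_threads(query: str) -> list[str]:
    return [tid for tid, text in THREADS.items() if query.lower() in text.lower()]


if __name__ == "__main__":
    mcp.run()                            # stdio transport by default
```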
Beyond the Basics: Richer Interaction Patterns
Example Problem: “Make an MCP server that can summarize discussions from an Issue Tracker.”
- The server needs to be able to pull discussion data (easy).
- It also needs to be able to summarize it (which requires a model)—this raises the question of attribution for the model call.
Solution A (Not Ideal): The server comes with its own SDK to call a model.
- Problem: The model chosen by the client is unknown; it also requires an extra API Key, which is awkward for security/cost/privacy.
Solution B: The Sampling Primitive
Meaning: The server requests a “completion” from the client (letting the client compute it with its already configured model).
Benefits:
- Security/privacy/cost are all controlled by the client (using their existing subscription/key).
- Sampling requests can be bubbled up through a chain of multiple servers, while still maintaining unified control by the client.
- Supports building more complex MCP agent chaining patterns.
Status: Very exciting but has the least client support; official first-party products will support it within the year.
The Roots Primitive
- Function: The server queries the client for the workspace root/open project (e.g., the currently open project in VS Code).
- Scenario: Making a Git server that needs to know in which directories to execute commands.
- Status: Supported in VS Code; the name is “not great” (author’s self-deprecation).
Primitives Summary: 5 primitives, two-sided ownership
- Server-side: Prompts / Resources / Tools
- Client-side: Sampling / Roots
End-to-End Example: Making an MCP Server for a Chat App (Slack/Discord)
- Prompts: Provide templates like “summarize this discussion” or “what’s new since yesterday” (with parameter completion: recent threads, user lists…).
- Resources: List channels, expose recent threads for the client to index/retrieve.
- Tools: Search, read channels/threads.
- Sampling: Initiate a summary for a thread (executed by the client’s completion). → Combine primitives to construct an experience far beyond “simple tool use”.
Bringing MCP to the Web (From Local to Public)
Background: ~10,000 MCP servers have been built by the community in 6–7 months, mostly for local experiences.
Direction: MCP Server = a website (client connects directly to a web endpoint), no longer a local executable/Docker.
Two Prerequisites: Authorization and Scaling.
- The latest revision of the spec has incorporated feedback from the community and industry partners to improve these two areas.
Authorization: OAuth 2.1
Purpose: To securely provide a user’s private context/account resources to an LLM application; to bind server capabilities to a specific user identity.
Specification: OAuth 2.1 (essentially a collection of security best practices for 2.0; if you’ve done 2.0, you’re basically doing what 2.1 requires).
Two Common Deployment Models:
Public/Web Server: e.g., a payment provider mcp.payment.com
- The client goes through OAuth on connection; the user logs into a party they already trust (the payment provider), with no need to run an unfamiliar local image.
- The provider can update the server at any time, without the user needing to pull a new image.
- Live Demo: Using Claude AI Integrations to connect to a remote endpoint and go through OAuth, getting tools bound to the user’s data.
Corporate Intranet:
- Deployed on the company’s internal network; use an IdP like Azure AD / Okta for SSO.
- After an employee logs in in the morning, any MCP server they use automatically has their identity.
- Facilitates a division of labor and scale between a platform team (platform/auth/ops) and integration teams (business tools).
Scaling: Streamable HTTP
Goal: To make MCP horizontally scalable like a regular API.
Form:
- Simple scenarios: Return JSON directly (like REST) → connect and disconnect, easy to scale (e.g., stateless functions/Lambda).
- Complex scenarios: Open a streaming channel (for Sampling/notifications/multi-part results), allowing multiple events/messages to be sent to the client before the final return.
Authorization + Scalability = The foundation for taking MCP from local to the Web.
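A tiny sketch of serving an MCP server over streamable HTTP instead of stdio, again using the MCP Python SDK's `FastMCP`; the transport name reflects recent SDK versions and may differ in yours, so treat this as illustrative.

```python
# Sketch: expose the server as a web endpoint that clients connect to directly
# (no local executable), which is what makes load-balanced horizontal scaling possible.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("payments-demo")


@mcp.tool()
def get_balance(account_id: str) -> str:
    return f"Balance for {account_id}: 123.45 EUR"   # placeholder data


if __name__ == "__main__":
    # Transport name as in recent Python SDK releases; verify against your SDK version.
    mcp.run(transport="streamable-http")
```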
What’s Next (Roadmap / Near-Mid Term)
Asynchronous tasks: Support for long-running (minutes→hours) agents/tasks.
Elicitation: The server proactively asks the user for input (this capability is already in the protocol and will land “today or Monday”).
Official Registry: A central place to discover/publish MCP servers; will allow agents to dynamically download/install/use them.
Multimodality: Better streaming results and more modality support (details TBD).
Ecosystem Updates:
- Ruby SDK (donated by Shopify, will be merged in a few weeks).
- Go SDK (being worked on by the Google Go team, an official implementation).
Conclusion
- This session aimed to get everyone to use the full set of primitives to build “user-driven + application-driven + model-driven” three-way interactions, and to look forward to a Web-based MCP.
- No Q&A due to time constraints; available for discussion afterward.
Claude Code best practices | Code w/ Claude
Speaker and Background
- Speaker’s intro: Joined Anthropic’s Applied AI/Product team, summarizing methods for using Claude Code from extensive frontline practice.
- Personal experience: Initially used Claude Code to build a note-taking app over a weekend and was amazed by its efficiency; later joined the team, focusing on system prompts, tool descriptions, evaluation schemes, etc.
Mental Model for Claude Code (like a “colleague who is all-in on the terminal”)
- Analogy: Claude Code is like a colleague who is an expert in Bash/Vim/various CLIs, always ready to help you debug and modify code in the terminal.
Architecture and How It Works (“Doing simple and effective things”)
- Pure agent: Give the model a set of powerful tools and a few instructions, and let it call the tools in a loop until the “task is done.”
- Tool Layer: Create/edit files, execute terminal commands, call external capabilities connected via MCP.
- How It Understands a Codebase: No full indexing/Embedding/RAG; instead, it performs Agentic Search (using `grep`/`find`/`glob` to progressively explore → read results → continue searching), learning a project just like a new colleague.
- Thin UI and Permissions Layer: Real-time visualization of "thinking/calls," with permission prompts for "write/execute"-type dangerous operations, ensuring a human is in the loop to intervene.
- Security and Multi-Cloud Availability: Can connect directly to Anthropic or use the model through providers like AWS/GCP.
Typical Use Cases and Value
Codebase Exploration / Onboarding: Locating where features are implemented, combing through Git history, summarizing recent change stories.
“Thinking Partner”: Let Claude first search and evaluate implementation options, without making changes yet, and present 2–3 options before deciding.
Writing Code:
- 0→1: Quickly build an application/prototype from an empty directory.
- Working in an existing codebase (team’s focus): Extremely convenient for adding tests, thus achieving higher unit test coverage; automatically writes Commit/PR descriptions when done.
Deployment and Automation: Integrate into CI/CD, GitHub, and other pipelines via Headless/SDK.
Support and Scale: Faster localization of production errors; makes large-scale migrations/refactorings (e.g., old Java→new version, PHP→React/Angular) more feasible.
Terminal Scenarios: Proficient with CLIs like Git/Docker/BigQuery; can even “hand over” complex rebases to Claude.
Best Practice 1: `CLAUDE.md` (Core Mechanism)
- Function: Automatically injected into the context on startup, carrying "key instructions for working on this repository."
- Placement: Project root directory (for team sharing); you can also put a “global CLAUDE.md” in the user’s home directory.
- Suggested Content: How to run tests, project structure overview/module descriptions, code style/commit conventions, internal tool instructions, etc.; can be accumulated over time.
Best Practice 2: Permissions Management
Read operations are allowed by default; actions with “side effects” like writing files/running Bash require confirmation.
Speed-up tricks:
- Auto-accept (e.g., Shift+Tab to let it work continuously);
- Always allow common commands (like `npm run test`) in the settings;
- Permissions can also be configured at the session/project level.
Best Practice 3: Choosing an Integration Method (CLI vs. MCP)
- For services with an official/mature CLI tool (like GitHub's `gh`), prioritize installing the CLI for Claude to call, as it's usually more stable and well-documented;
- Self-developed or custom capabilities can be exposed via an MCP server; relevant instructions can be written in `CLAUDE.md`.
Best Practice 4: Context Management
The model supports a very long context (200k-level tokens), but long sessions can still “fill up.”
Two key commands:
- `/clear`: Clears the history and starts over from `CLAUDE.md`;
- `/compact`: Has Claude first summarize the current session (like a handover to the "next developer"), then continues with the summary to maintain continuity.
Best Practice 5: Efficient Workflows
- Plan then execute: Have Claude first search + provide a plan, you review the plan, and then it starts work.
- To-do list: Large tasks are automatically broken down into a to-do list; you can interrupt at any time with Esc to modify the list and correct its course.
- “Smart vibe coding”: Small, fast steps + TDD; frequently run tests/type checks/linting; commit often for easy rollbacks.
- Multimodal debugging/implementation: Paste screenshots/diagrams for Claude to reference when implementing UI/fixing bugs.
Best Practice 6: Advanced Techniques
- Parallel instances: Run 2–4 Claude instances in parallel to make progress; use a terminal multiplexer/multiple tabs to collaborate.
- The magic of Esc: Interrupt and interject at any time during work; double-press Esc to “rewind the conversation/reset tool extensions.”
- Tool Extension & Headless: Bring in an MCP server when Bash/built-in tools aren’t enough; use it programmatically in Headless/CI scenarios.
Latest Features and Advice on “Keeping Up”
- `/model` & `/config`: View/switch the current model version at any time.
- "Think hard"/Extended thinking: Starting with Claude 4, it can engage in longer "thinking" between tool calls, which can be enabled for complex debugging.
- IDE Integration: VS Code / JetBrains plugins enhance the experience with “current file awareness,” etc.
- Follow updates: Recommended to follow the Claude Code GitHub project/docs and changelog, and get into a weekly review rhythm.
Q&A Highlights (Live Q&A)
- Multiple `CLAUDE.md` files: Multiple files in the same directory are not supported; a relevant `CLAUDE.md` in a subdirectory will be read when found during a search; by default, only the one in the working directory is loaded; you can reference other files (`@…`) in `CLAUDE.md` to combine common instructions.
- "Disobedient" comments issue: Older models (like 3.7) were prone to rewriting "unnecessary comments"; this has been significantly improved by Claude 4's instruction following, so upgrading and organizing `CLAUDE.md` is recommended.
- Multi-agent parallelism/context sharing: The current focus is on a "single, capable coding agent"; if multi-agent collaboration is needed, use a shared Markdown/ticket file to pass state between sessions. More native support may be considered in the future.
Further Reading (Text version of best practices)
- Anthropic’s official engineering blog post “Claude Code: Best practices for agentic coding” (a systematic summary of practices for CLAUDE.md, permissions, plan→execute, CI/Headless, etc.).
Building headless automation with Claude Code | Code w/ Claude
Speaker: Sedara (Engineer, Claude Code Team)
What is the Claude Code SDK (Positioning and Value)
Programmatically call Claude Code’s agent capabilities in headless mode
A new foundational building block/primitive that enables many previously impossible automation applications
Typical Uses:
- Use it like a Unix tool: Can be integrated anywhere you can run Bash/a terminal, participating in pipes (pipe in / pipe out) and composition
- For CI automation: Ask Claude to review code or even write custom linters
- Build your own chat/assistant application (with Claude Code as the brain)
- Let Claude write code in a remote/isolated environment
Language Bindings: Python / TypeScript SDKs (bindings) will be available soon
SDK Basic Usage (Commands and Examples)
- Direct call: `claude -p "Write me a function to calculate the Fibonacci series" --allowedTools "Write"` (semantic example)
- Authorize writes: Pre-authorize the "write file" tool via parameters (the example mentioned allowing the Write tool) to save results to the filesystem
- Pipe example: `cat app.log | claude -p "Summarize the most common error logs"` to have Claude summarize log failures
- Parse system command output: Hand over the results of `ifconfig`, etc., to Claude for explanation
- Structured output: Use `--output-format json` or `--output-format stream-json` to return parsable JSON, making it easy for downstream programs to consume
GitHub Action Live Demo (Built on the SDK)
Used an open-source Quiz App repository for the demo (showcased functionality by running `npm start` locally).
Handled two Issues:
- Power-ups: Add “50/50 elimination” and “free skip”, which should be toggleable on a settings page, with skips not deducting points
- Per-question timer: Add a separate countdown timer for each question
Trigger method: @claude in an Issue comment (e.g., “please implement this feature”)
Action execution process was visible:
- A bot comment appeared under the Issue saying work has started, with a link to the Action run
- The logs printed the SDK’s JSON output; a To-Do list was automatically generated and executed
Also demonstrated appending a commit to an existing PR: Changed a PR for “background color to blue” to green (added a new commit, modifying all relevant definitions)
Result verification:
- The Action automatically created a branch/PR to implement the Power-ups; switched to that branch locally and started the app
- A Power-ups section (with checkboxes) appeared on the settings page; 50/50 and Skip buttons appeared on the quiz screen
- Clicked 50/50 and Skip live, and the features worked as expected (acknowledged there was room for improvement, e.g., marking which questions had used a Power-up)
SDK Advanced Features (Finer-grained capabilities)
Permissions and Security (conservative by default):
- No edit/destructive permissions by default; pre-authorize needed capabilities via allowed tools
- Common authorization combinations: `bash` (e.g., `npm run build`, `npm test`), `write` (write file); if connected to MCP, you can whitelist specific MCP tools
Structured Output Modes:
- `json`: Returns the complete JSON at once
- `stream-json`: Streams events/fragments as they are produced
- Downstream can parse/orchestrate based on this to build application logic and UI
Custom System Prompt: e.g., `--system-prompt "talk like a pirate"` (example emphasized playfulness)
Session Continuation and User Interaction:
- In structured output mode, a session_id is returned; saving this allows continuing in the same context, supporting the integration of subsequent user feedback into the same session
Real-time Permission Prompting (permission-prompt tool):
- If you don’t want to pre-enumerate authorized tools, you can delegate the permission decision to an MCP server, which can prompt the user at runtime to ask if they want to allow a certain tool call (new feature, feedback welcome)
GitHub Action Capabilities Recap
- Read code, create a PR from an Issue, append commits to an existing PR/branch
- Answer questions/explain code, review code
- Runs on existing GitHub Runners, zero self-hosted infrastructure
Layered Implementation of the Action (Open Source)
- Bottom layer: SDK
- Middle layer: Base Action (communicates with Claude Code and sends back results)
- Top layer: PR Action (responsible for interactions on the PR like commenting/rendering To-Dos/attaching links)
- Both the Base and PR Actions are open source, so you can reference the source code to extend your own custom workflows
How to Install the GitHub Action (Live tutorial)
- In the target repository directory, open the Claude Code terminal and run:
/install github action
- Follow the prompts to complete the configuration and automatically commit a YAML PR; merge it and configure the API Key to use it
- For Bedrock / Vertex users: the process is slightly different and more manual, refer to the corresponding documentation
Best Practices/Tips (Suggestions throughout the talk)
- Pre-authorize necessary commands in CI to ensure Claude can build/test/validate before continuing to write code
- Prioritize using structured output to interface with your scripts and UI
- Use the To-Do list and append commits to achieve small, fast, and revertible steps
- The experience can still be improved beyond the demo (e.g., marking used Power-ups, more visualizations, etc.)
Resources and Feedback
- The talk listed: Claude open-source repository links (Base Action / PR Action) and the public Claude Code repository (welcome to file Issues/feedback)
- Encouraged everyone to combine the SDK and Action with their own use cases and continue to extend them
Wrap-up
- Goal achieved: Showcased the SDK as a “headless” primitive + a GitHub Action practice, demonstrating how to embed Claude Code into pipelines and CI and get “write/test/modify/review” automation with minimal operational cost
- Thanks and well wishes (closing remarks)