Skip to content

Writing Claude.md – Making Claude 4.x Behave Like an Engineering System

The traditional LLM usage pattern is Prompt → Output. Claude 4.x operates closer to Workflow → Tooling → State → Verification → Iteration.

Claude.md is not a prompt—it's a Work Contract. It defines how the model should plan, execute, collaborate, maintain state, and verify outcomes.

Below are practical observations and the official Claude 4.x best practices, with examples, pitfalls, and prompt patterns. The notes come from sustained use of Claude Code (Anthropic's coding CLI built on Sonnet 4.5 and Opus 4.5) on the OpenClaw project across 2025–2026.


1. Claude.md: From Prompt to Workflow

Claude 4.x shows significant improvements in:

  • Long-horizon reasoning
  • Multi-step planning
  • State tracking
  • Context awareness
  • Subagent orchestration
  • Verification via tests
  • Tool-assisted execution

This means:

Without a work contract, the model uses its own strategies With a contract, the model follows your defined process

Claude.md is that contract.


2. Claude.md Structure

Recommended skeleton:

PURPOSE
WORKFLOW
STATE MANAGEMENT
TESTING / VERIFICATION
TOOL POLICY
SUBAGENT CONTRACTS (optional)
COMPLETION CRITERIA
RISK / FAILURE MODES

Not every section is required, but this structure produces consistent behavior.


3. Workflow Design and Multi-Window Strategy

Claude doesn't work like traditional ChatGPT-style "generate everything at once." It breaks tasks into phases (windows):

Typical pattern:

  1. Build architecture or plan (Plan)
  2. Tests as correctness oracle (Tests)
  3. Minimal implementation
  4. Verification
  5. Refactor + Docs
  6. Commit state

Key characteristics:

  • Tests are immutable
  • Tests serve as correctness oracle
  • State externalized (preferably in filesystem / git)
  • Reset window beats stuffing summaries
  • External memory > Token memory

4. State Management

Official best practices provide clear direction:

  • Structured state (JSON / YAML)
  • Notes (Markdown)
  • Git for timeline & diff
  • Immutable tests
  • Externalize progress

Claude uses these states to recover context without relying on token memory, avoiding hallucination. For runtime checks beyond unit tests, complement this with verification through tracing in production.


5. Subagents and Tooling

Claude can autonomously determine:

  • What needs delegating
  • What needs parallel calls
  • What suits Explorer vs Planner vs Executor

With a subagent contract:

Explorer: read files, summarize structure
Planner: create task breakdown
Coder: apply changes
Tester: verify via tests

Collaboration is far more stable than monolithic prompting.


6. Claude.md Example (Engineering Practice)

A simplified example showing how to influence behavior:

WORKFLOW
1. Build architectural plan before coding.
2. Write tests as correctness oracle. Tests are immutable.
3. Implement minimal version.
4. Verify using tests.
5. Refactor for readability + maintainability.
6. Produce documentation on design choices.

STATE
- Use JSON for machine-readable state.
- Use Markdown for progress notes.
- Use git for diff + timeline.

COMPLETION
- All tests pass.
- Code readable.
- No regressions.
- States committed.

Even this short section will influence Claude's behavior.


7. Common Failure Modes and Mitigation Strategies

Practical observations of where Claude struggles, with prompt patterns to address them.


(1) Over-optimizing to Pass Tests

Symptoms:

  • Claude hacks tests
  • Uses helper scripts or bypass tools
  • Hard-codes edge cases
  • Over-optimizes correctness at the cost of maintainability

Suggested prompt:

Do not optimize for passing tests. Optimize for correctness, maintainability, and readability.
Use standard tools. Avoid custom helper scripts unless explicitly requested.

(2) Over-engineering

Opus is particularly prone to:

  • Excessive abstraction
  • Too many interfaces
  • Introducing 3 helper modules for a trivial feature

Prompt:

Prefer simplicity. Avoid unnecessary abstractions.
Optimize for clarity over flexibility.

(3) Answering Without Reading Code

Claude sometimes skips the reading step but answers confidently.

Prompt:

Before answering, read the relevant code files using the explorer.
Summarize what you read before proposing changes.

(4) Hallucinated APIs / Modules

In coding, hallucination commonly occurs when:

  • Not accessing the filesystem
  • Insufficient context
  • "Try-to-be-helpful" speculation

Prompt:

If information is missing or ambiguous, ask before assuming.
Do not invent functions or modules.

(5) Testing Shortcuts

Claude takes shortcuts to pass tests rather than following proper engineering workflows.

Example prompt:

Use standard tools and test frameworks only.
Do not bypass verification using helper scripts or custom stubs.
Do not hard-code to satisfy tests.

8. The Improvement Case: From Chatbot to System

Claude's greatest capability is not:

"Generating a text response"

But rather:

"Maintaining tasks → Tracking state → Verifying → Correcting → Completing → Closing"

Claude.md helps surface this capability.


9. Summary

Claude.md is essentially:

  • Workflow spec
  • Execution contract
  • Verification guide
  • State policy
  • Tooling interface
  • Completion definition

Claude 4.x is among the first LLMs with genuine:

  • Multi-window
  • State-aware
  • Test-driven
  • Agentic coding

capabilities—which is why it needs an "instruction manual." Claude.md is that manual.