Writing Claude.md – Making Claude 4.x Behave Like an Engineering System

The traditional LLM usage pattern is Prompt → Output. Claude 4.x operates closer to Workflow → Tooling → State → Verification → Iteration.

Claude.md is not a prompt—it's a Work Contract. It defines how the model should plan, execute, collaborate, maintain state, and verify outcomes.

Below are practical observations and the official Claude 4.x best practices, with examples, pitfalls, and prompt patterns. The notes come from sustained use of Claude Code (Anthropic's coding CLI built on Sonnet 4.5 and Opus 4.5) on the OpenClaw project across 2025–2026.

1. Claude.md: From Prompt to Workflow

Claude 4.x shows significant improvements in:

Long-horizon reasoning
Multi-step planning
State tracking
Context awareness
Subagent orchestration
Verification via tests
Tool-assisted execution

This means:

Without a work contract, the model uses its own strategies With a contract, the model follows your defined process

Claude.md is that contract.

2. Claude.md Structure

Recommended skeleton:

PURPOSE
WORKFLOW
STATE MANAGEMENT
TESTING / VERIFICATION
TOOL POLICY
SUBAGENT CONTRACTS (optional)
COMPLETION CRITERIA
RISK / FAILURE MODES

Not every section is required, but this structure produces consistent behavior.

3. Workflow Design and Multi-Window Strategy

Claude doesn't work like traditional ChatGPT-style "generate everything at once." It breaks tasks into phases (windows):

Typical pattern:

Build architecture or plan (Plan)
Tests as correctness oracle (Tests)
Minimal implementation
Verification
Refactor + Docs
Commit state

Key characteristics:

Tests are immutable
Tests serve as correctness oracle
State externalized (preferably in filesystem / git)
Reset window beats stuffing summaries
External memory > Token memory

4. State Management

Official best practices provide clear direction:

Structured state (JSON / YAML)
Notes (Markdown)
Git for timeline & diff
Immutable tests
Externalize progress

Claude uses these states to recover context without relying on token memory, avoiding hallucination. For runtime checks beyond unit tests, complement this with verification through tracing in production.

5. Subagents and Tooling

Claude can autonomously determine:

What needs delegating
What needs parallel calls
What suits Explorer vs Planner vs Executor

With a subagent contract:

Explorer: read files, summarize structure
Planner: create task breakdown
Coder: apply changes
Tester: verify via tests

Collaboration is far more stable than monolithic prompting.

6. Claude.md Example (Engineering Practice)

A simplified example showing how to influence behavior:

WORKFLOW
1. Build architectural plan before coding.
2. Write tests as correctness oracle. Tests are immutable.
3. Implement minimal version.
4. Verify using tests.
5. Refactor for readability + maintainability.
6. Produce documentation on design choices.

STATE
- Use JSON for machine-readable state.
- Use Markdown for progress notes.
- Use git for diff + timeline.

COMPLETION
- All tests pass.
- Code readable.
- No regressions.
- States committed.

Even this short section will influence Claude's behavior.

7. Common Failure Modes and Mitigation Strategies

Practical observations of where Claude struggles, with prompt patterns to address them.

(1) Over-optimizing to Pass Tests

Symptoms:

Claude hacks tests
Uses helper scripts or bypass tools
Hard-codes edge cases
Over-optimizes correctness at the cost of maintainability

Suggested prompt:

Do not optimize for passing tests. Optimize for correctness, maintainability, and readability.
Use standard tools. Avoid custom helper scripts unless explicitly requested.

(2) Over-engineering

Opus is particularly prone to:

Excessive abstraction
Too many interfaces
Introducing 3 helper modules for a trivial feature

Prompt:

Prefer simplicity. Avoid unnecessary abstractions.
Optimize for clarity over flexibility.

(3) Answering Without Reading Code

Claude sometimes skips the reading step but answers confidently.

Prompt:

Before answering, read the relevant code files using the explorer.
Summarize what you read before proposing changes.

(4) Hallucinated APIs / Modules

In coding, hallucination commonly occurs when:

Not accessing the filesystem
Insufficient context
"Try-to-be-helpful" speculation

Prompt:

If information is missing or ambiguous, ask before assuming.
Do not invent functions or modules.

(5) Testing Shortcuts

Claude takes shortcuts to pass tests rather than following proper engineering workflows.

Example prompt:

Use standard tools and test frameworks only.
Do not bypass verification using helper scripts or custom stubs.
Do not hard-code to satisfy tests.

8. The Improvement Case: From Chatbot to System

Claude's greatest capability is not:

"Generating a text response"

But rather:

"Maintaining tasks → Tracking state → Verifying → Correcting → Completing → Closing"

Claude.md helps surface this capability.

9. Summary

Claude.md is essentially:

Workflow spec
Execution contract
Verification guide
State policy
Tooling interface
Completion definition

Claude 4.x is among the first LLMs with genuine:

Multi-window
State-aware
Test-driven
Agentic coding

capabilities—which is why it needs an "instruction manual." Claude.md is that manual.

Multi-Context Workflows and State Management – Managing state across sessions
Agent Skills Best Practice – Designing skills that agents can reliably select and execute
Introduction to Claude Code – Getting started with Claude Code CLI

Writing Claude.md – Making Claude 4.x Behave Like an Engineering System

1. Claude.md: From Prompt to Workflow

2. Claude.md Structure

3. Workflow Design and Multi-Window Strategy

4. State Management

5. Subagents and Tooling

6. Claude.md Example (Engineering Practice)

7. Common Failure Modes and Mitigation Strategies

(1) Over-optimizing to Pass Tests

(2) Over-engineering

(3) Answering Without Reading Code

(4) Hallucinated APIs / Modules

(5) Testing Shortcuts

8. The Improvement Case: From Chatbot to System

9. Summary

Related Posts