Claude Code + the Karpathy Guidelines: A Saner AI Coding Workflow

Claude Code logo

After a year of pairing with LLMs on infrastructure code, I’ve landed on a setup that mostly stops the AI from over-engineering everything. Two pieces make it work: Anthropic’s Claude Code CLI and a tiny set of behavioral rules from Andrej Karpathy that I now keep in my global config. Here’s how I install both, and the philosophy behind why this combination beats Copilot-style autocomplete for anything bigger than a one-liner.


🛠️ Claude Code, the Karpathy Guidelines, and the Superpowers Plugin

Claude Code is Anthropic’s official terminal CLI for their Claude models. Unlike a browser chat, it reads and edits files in your project, runs shell commands, watches test output, and stays grounded in the actual codebase. The plugin system layers skills on top — bite-sized behavioral rules that activate when they match the task at hand. The right plugin set turns Claude from a chatty autocomplete into something closer to a junior engineer who has read the handbook.

📋 Prerequisites

  • Claude.ai subscription or API key — Pro/Team works, or pay-as-you-go via the API
  • Node 18+ — Claude Code installs via npm
  • A terminal and a project — Claude Code works in any directory; cd in and run it

🧭 The Karpathy Guidelines — Expanded

Karpathy posted a short set of rules for collaborating with LLMs on code. They’re four sentences each, and they shut down almost every common failure mode (over-engineering, scope creep, premature abstractions, confidently-wrong refactors). Here they are verbatim:

1. Think before coding. State assumptions explicitly. If multiple interpretations
   exist, present them — don't pick silently. If something is unclear, stop and ask.

2. Simplicity first. Minimum code that solves the problem. No features,
   abstractions, error handling, or "flexibility" beyond what was asked. If a
   senior engineer would call it overcomplicated, simplify.

3. Surgical changes. Touch only what the task requires. Don't "improve" adjacent
   code, refactor what isn't broken, or reformat things. Match existing style.
   Remove orphans your changes created; leave pre-existing dead code alone
   unless asked.

4. Goal-driven execution. Convert tasks into verifiable success criteria before
   starting ("write a test that reproduces the bug, then make it pass"). For
   multi-step work, state a brief plan with a verification check per step.

Each rule on its own:

1. Think before coding

LLMs default to doing. Karpathy’s first rule flips that: write out your assumptions and any ambiguous interpretations before the first edit. In Claude Code this shows up as the model saying, “I see two ways to read this — A would do X, B would do Y. Which?” instead of silently picking one and shipping it. That single behavior catches the vast majority of “that’s not what I asked for” rework.

2. Simplicity first

Without a rule, the model adds error handling for impossible cases, defensive null checks, configuration knobs nobody will use, and abstractions for a second caller that doesn’t exist yet. “Minimum code that solves the problem” is the antidote. A bug fix doesn’t need surrounding cleanup; a one-shot script doesn’t need a CLI flag library.

3. Surgical changes

This one stops the “helpful” refactors. You ask for a fix to function A; the model touches functions B and C “while I’m here.” Now the diff is 200 lines and code review takes an hour. Surgical means: change only what the task requires, leave the existing style alone, and clean up only the orphans your changes created. Pre-existing tech debt is not your problem in this PR.

4. Goal-driven execution

Convert the task into something verifiable before writing code. For a bug: write a failing test that reproduces it, then make it pass. For a feature: state the success criteria and how each step proves it. This makes the model’s plan and its progress legible — and it makes “done” mean something concrete instead of “I think I did it.”

To make these stick, drop them into ~/.claude/CLAUDE.md. That file is read on every Claude Code session, project or global:

# ~/.claude/CLAUDE.md — applies to every project

## Always apply Karpathy coding guidelines

Follow these four rules on every task:

1. **Think before coding.** State assumptions explicitly...
2. **Simplicity first.** Minimum code that solves the problem...
3. **Surgical changes.** Touch only what the task requires...
4. **Goal-driven execution.** Convert tasks into verifiable success criteria...

📦 Installing skills via the plugin marketplace

Claude Code has a built-in plugin system. Plugins ship as marketplaces — each one is a git repo with a manifest. Anthropic publishes an official marketplace; community ones exist too. Adding the official one and browsing is three commands:

# 1. Open Claude Code in your project directory
claude

# 2. Add Anthropic's official skills marketplace
/plugin marketplace add anthropics/skills

# 3. Browse and install
/plugin

The /plugin command opens an interactive picker. Two plugins are worth installing on every machine you code from:

# From the /plugin menu — or directly:
/plugin install superpowers@anthropics-skills
/plugin install andrej-karpathy-skills@anthropics-skills

🦸 Superpowers — the workflow plugin

Superpowers is the heaviest plugin in the marketplace and the one I get the most value from. It bundles a dozen skills that enforce a workflow rather than answer one-off questions. The ones that fire most often for me:

  • using-superpowers — the entrypoint. Tells Claude how to find and apply the rest of the skills.
  • brainstorming — required before any creative or feature work. Forces a “what are we actually building, what are the constraints, what are the edge cases” pass before the first line of code. Single biggest reduction in rework I’ve seen.
  • writing-plans — turns a spec into a numbered, verifiable plan with success criteria per step. The plan becomes the contract.
  • executing-plans — runs an existing plan in a separate session with review checkpoints. Pairs with writing-plans for multi-day work.
  • test-driven-development — write a failing test, make it pass, refactor. No code goes in without a test that justifies it.
  • systematic-debugging — symptom → reproduction → root cause → minimal fix → test. Stops the “add a try/except and hope” anti-pattern.
  • using-git-worktrees — spins up an isolated workspace per feature so the current branch isn’t held hostage by half-finished work.
  • verification-before-completion — Claude must run the verification command (tests, build, type-check) before claiming “fixed” or “done.” Evidence before assertions.
  • requesting-code-review / receiving-code-review — pair-review behaviors that ask for and respond to critique without performative agreement.
  • dispatching-parallel-agents — when 2+ tasks are independent, fan them out into subagents in parallel instead of serializing.

🧭 karpathy-guidelines — the four-rule plugin

The andrej-karpathy-skills:karpathy-guidelines plugin is the same four rules, but loadable as a skill instead of hand-pasted into your CLAUDE.md. The advantage is that the skill mechanism makes the rules activate on each task rather than living as background context, which improves compliance noticeably. I run both — the rules in ~/.claude/CLAUDE.md for the global heartbeat, and the skill for the per-task reinforcement.


💡 Why this combination works

Out of the box, an LLM optimizes for helpful-looking output: lots of code, lots of options, lots of “in case you also wanted…” branches. That’s the failure mode that produces tech debt at AI scale. The Karpathy guidelines push in the opposite direction (minimum surface area, explicit assumptions, verifiable outcomes), and Superpowers operationalizes those values into a workflow you can audit at every step: brainstorm → plan → execute → verify → review.

For a small MSP — where I’m the engineer, ops team, and code reviewer — that auditability is the difference between AI being a time saver and AI being a debt generator. Every change has a paper trail: what we decided to build, what the plan was, what got tested, what got reviewed. That’s the bar I want for production infrastructure code, and it’s harder to hit without the rails.


⏰ Next steps

  1. Install Claude Code: npm install -g @anthropic-ai/claude-code, then claude in any project.
  2. Add the marketplace and install the two plugins above.
  3. Paste the four Karpathy rules into ~/.claude/CLAUDE.md.
  4. Try a small task, watch Claude state assumptions first, and tweak the rules to match your style.

You’re done! Three commands, four rules, and a noticeably saner pair-programming experience.


If you’ve found a plugin worth pinning — or a rule that’s saved you from a refactor disaster — drop me a line at [email protected]. Always interested in what other people have found that works.

Thanks!

Christopher Engelhardt

Rainier IT

Rainier IT

Leave a Reply

Your email address will not be published. Required fields are marked *