AI Coding Best Practices That Show Up in Session Data

There are a thousand "10 AI coding tips" posts. Most of them are plausible, unfalsifiable, and impossible to act on, because there's no way to check whether anyone follows them.

This guide is different in one specific way: every practice here is observable. Each one leaves a footprint in a real Claude Code or agent session transcript — a plan-mode entry, a subagent dispatch, a compact event, a commit. So these aren't vibes, and more importantly, you can verify whether your team actually does them. Most teams believe they do all eight. When you read the session data, the gap between believed and actual practice is usually the most useful thing you learn.

We build tooling that surfaces these practices from real team sessions — that's our bias, stated upfront. The practices stand on their own either way.

1. Plan before you edit

For any non-trivial change, get a written plan before the agent touches a file — plan mode, a plan document, or a structured "here's the approach, confirm before proceeding" exchange.

Why it works: an agent that starts editing immediately commits to its first interpretation of your request. A plan forces the interpretation into the open while it's still cheap to correct — you review intent before any code exists to be attached to.

In a session: a plan-mode entry, or an early assistant turn that lays out files to touch, sequence, and risks — followed by a human reply that adjusts it ("skip the migration, that table is going away") before edits begin.

When it's absent: edits start on turn one, then a long tail of corrections — the human steering a moving vehicle instead of approving a route. The session "works" but takes far more turns, and the rework is invisible afterward because only the final diff survives.

How the common advice gets it wrong: "always plan first" gets applied uniformly, so people plan trivial one-line fixes and, in a hurry, skip planning the gnarly refactor — exactly backwards. The skill is calibrating plan depth to blast radius, and that calibration is visible in session data, while "do you plan?" on a survey just gets "yes."

2. Delegate research to subagents

When the agent needs to understand a codebase area, search broadly, or read a pile of files, dispatch a subagent to do it and return a summary — instead of reading everything into the main conversation.

Why it works: every file dumped into the main thread is dead weight the model carries for the rest of the session, diluting attention on the actual task. A subagent absorbs the exploration in its own context and hands back only the conclusion.

In a session: explicit subagent dispatches — "search the codebase for how auth tokens are validated" — with compact summaries coming back, while the main thread stays focused on the change itself.

When it's absent: the main context fills with raw file reads and grep output. By the time the agent starts editing, the instructions from turn two are buried under forty files of exploration, and quality visibly drops in the back half of the session.

How the common advice gets it wrong: "use subagents" is usually pitched as parallelism — do more at once. The bigger win for most developers is context protection, and that's a habit, not a feature flag. Transcripts show who has it: their main threads stay short and decisive even on unfamiliar code.

3. Context hygiene: clear and compact deliberately

Treat the session's context window as a resource you manage. Compact when a long session has accumulated cruft; start fresh when the task changes; don't let one session sprawl across four unrelated pieces of work.

Why it works: a session that has rotted — full of abandoned approaches, stale file states, half-finished tangents — actively misleads the model. Old context isn't neutral; the agent will pattern-match against an approach you discarded an hour ago. Clearing is how you stop paying for decisions you've already reversed.

In a session: deliberate clear/compact events at task boundaries. The tell is placement — hygiene done at a natural seam, not as a panic response when things get weird.

When it's absent: marathon sessions where the agent contradicts earlier decisions, re-suggests rejected designs, or edits files based on contents that changed twenty turns ago. The human blames "the model being dumb"; the actual problem is a context window that's mostly sediment.

How the common advice gets it wrong: the usual advice is "start a new chat when it gets confused" — reactive, after quality has degraded. Strong sessions show the proactive version: hygiene at task boundaries, before confusion, paired with practice 4 so nothing important is lost in the clear.

4. Externalize memory

Keep project knowledge — conventions, architecture quirks, "never do X here" rules — in memory files the agent loads automatically (CLAUDE.md, project memory), instead of re-explaining the project at the top of every session.

Why it works: anything you re-paste each session is knowledge with a single point of failure: your discipline on a given morning. A memory file makes it durable, versionable, and consistent across every session and teammate — and each correction written down is a correction never made again.

In a session: memory files present and loaded; sessions that open with the actual task rather than three paragraphs of project orientation; the occasional turn that updates memory — "add to CLAUDE.md: integration tests need the local stub running."

When it's absent: the same explanations appear, slightly degraded, at the top of session after session — and the agent repeats the same project-specific mistakes, because nothing it learned yesterday survived the night.

How the common advice gets it wrong: "write a CLAUDE.md" gets treated as one-time setup. The observable difference between teams is whether memory gets maintained — whether corrections flow back into the file. A stale memory file is worse than none; the agent trusts it.

Every team believes it does these eight things. The session data usually disagrees — and the gap is the most useful thing it shows.

5. Track multi-step work with explicit task lists

For work with more than a couple of steps, have the agent maintain a visible task list — created up front, checked off as steps complete.

Why it works: the mechanism is mundane and real: a written list is state that survives context pressure. Deep into a long session, "what's left" is no longer a matter of the model's recall — it's an artifact both of you can read.

In a session: task-list creation near the start of multi-step work, with status updates as the session proceeds. The strong version pairs it with planning: plan produces list, list drives execution.

When it's absent: long sessions that quietly drop step four of six. Nobody notices until review — or production — because neither the human nor the agent had a checklist to notice against.

How the common advice gets it wrong: task tracking gets dismissed as ceremony, the agent equivalent of a standup ritual. In transcripts it's the opposite of ceremony — it's the cheapest known defense against the most common long-session failure, which is silent omission.

6. Steer and inspect

Review the agent's output incrementally — read the diff after each meaningful chunk, redirect early — rather than letting it run long and accepting the result wholesale.

Why it works: agent errors compound. A wrong assumption in the first file propagates into every file built on it. Catching it at file one costs a sentence; catching it at file nine costs the session. Incremental review converts one large, unreviewable decision into many small, reviewable ones.

In a session: a rhythm of agent-work-then-human-checkpoint — short corrective turns like "use the existing retry helper instead" between edits. The human's messages are frequent, short, and specific.

When it's absent: one prompt, twenty minutes of uninterrupted agent output, "accept all." Sometimes that's fine. Often the next session opens with "actually, rewrite most of that" — the review happened, just too late to be cheap.

How the common advice gets it wrong: the discourse offers two poles — "never trust the agent" (exhaustive review at the end) and "let it cook" (full autonomy). Both lose to steering: end-loaded review is where attention goes to die, and autonomy works right up until the first wrong assumption.

7. Build reusable skills and saved commands

When a workflow recurs — release prep, a review checklist, scaffolding a new module the house way — capture it as a skill or saved command instead of re-typing the prompt from memory.

Why it works: a saved command is a prompt that's been debugged — every edge case it ever mishandled is now handled in the text. Re-typing from memory re-rolls the dice each time, and means each teammate runs a private, slightly worse version of the same workflow.

In a session: saved-command and skill invocations — and, more telling, their creation: a session where someone finishes a workflow and then packages it. That's the moment individual practice becomes team infrastructure.

When it's absent: the same multi-paragraph prompt appears across many sessions in slightly different wordings, with quality that varies the way anything typed from memory varies.

How the common advice gets it wrong: "automate repetitive prompts" frames this as a time-saver. The time saved is minor. The real effects are consistency and transfer — a saved command is a best practice new teammates get on day one, instead of after six months of osmosis.

8. Commit early and often

Have the agent's work committed in small, frequent checkpoints as the session progresses — not one bulk commit when everything is "done."

Why it works: frequent commits give every chunk of agent output a durable, diffable checkpoint. Work becomes reviewable in pieces (see practice 6) and recoverable when an approach goes sideways — revert to the last good commit instead of un-picking a tangle.

In a session: commits interleaved with the work, each scoped to a coherent step, often right after a human checkpoint. Commit messages that describe the step, not "wip."

When it's absent: hour-long sessions ending in a single thousand-line commit. When something in there is wrong — something usually is — there's no checkpoint to fall back to, and the reviewer gets a take-it-or-leave-it monolith.

How the common advice gets it wrong: "commit often" is the oldest advice in software, so everyone claims it. Agent sessions quietly break the habit: the human's rhythm — commit when I finish a thought — doesn't transfer to supervising an agent that finishes ten thoughts an hour. Session data makes the slippage visible; self-report never does.

How to tell if your team actually does these

Here's the uncomfortable part. You can't check any of this from a usage dashboard. Seat counts, active users, tokens consumed — none of it distinguishes a team running all eight practices from a team pasting stack traces into a chat box. The practices live inside sessions, and standard tooling doesn't look there. That's the observability gap.

The fix is to look at the sessions themselves. That's what we build: Accrete parses your team's real Claude Code sessions and classifies these practice signals at ingest, into a per-person, per-practice adoption matrix. How it works covers the pipeline; discover your team's practices covers what you do with the result. One caveat we'll keep repeating: a signal means a practice was used, not used well. The data tells you where to look; judgment makes the call.

If you're a senior engineer rather than a team lead, the same lens works solo — Accrete for engineers is the bottom-up version, and this guide is part of a broader set in our guides.

We're pre-launch and working with a small set of early teams. If you want to see which of these eight practices your team actually does — with evidence, not a survey — join the early access list, or ask about becoming a design partner when you sign up.

AI Coding Best Practices That Actually Show Up in Session Data