Opus 4.7

Opus 4.7 now powers Amp's smart mode.

In our internal evals, Opus 4.7 scored ~72%, up from Opus 4.6's ~65% - the first model since GPT 5.4 to clear 70%.

It takes some getting used to

Compared to Opus 4.7, Opus 4.6 was forgiving.

You could give it a vague task and it would often infer the missing pieces, make a plan, and start working. Sometimes that was useful. But it also could lead to the model confidently solving a nearby problem instead of the one you actually had. Or rushing to the first, but not the best, solution.

Opus 4.7 is less like that.

It follows prompts more closely. It fills in fewer gaps. It researches more. It is less likely to silently generalize from "fix this case" to "fix every related case." If the task is underspecified, you are more likely to get a narrow answer, a pause, or a request for the missing constraint.

At first, that can feel worse. But then you realize that a good prompt can make it go further.

Opus 4.7 is better at harder coding work, especially tasks that span multiple files, tools, and verification steps. It is better at keeping the shape of a change in its head and carrying it through the codebase. It's better at refactoring too. Its explanations are more thorough.

Fewer Built-in Tools

We removed grep and glob from smart.

Opus 4.7 is good enough at using the shell directly. When it needs to search, it can run rg or use the codebase search agent.

Its ASCII diagrams are also equal to or better than what Opus 4.6 achieved with rendered diagrams.

Token Usage

Our internal assessment matches Anthropic's (see last section and graph): "token usage across all effort levels is improved." Opus 4.7 might use more tokens in some cases, but those tokens are smarter and lead to better results. And better results lead to less tokens wasted.

Tunable Thinking Effort

You can now toggle the thinking effort for smart directly from the CLI with Opt+D (Alt+D), cycling through high, xhigh, and max.

How to Use It

The main change is simple: tell it what success looks like.

A few patterns have worked well for us:

  • Give it success criteria, not steps. Tell it what done means, not every move to make. Example: "Clean up the billing settings. Done means no public API changes, no database changes, pnpm test billing passes, and pnpm typecheck passes."
  • Give it a way to check itself. A model with a test, CLI, Storybook, preview URL, or screenshot diff is much better than a model guessing from code. Example: "Fix the import flow. Reproduce it with pnpm cli import ./fixtures/bad.csv. It is fixed when that command succeeds and pnpm test import passes."
  • Brainstorm, pick, implement. Use one pass to explore options, then implement the chosen approach. Example: "Compare two ways to remove this duplicate state. Recommend one. Do not edit files yet." Then: "Implement option B. Keep the API unchanged and verify with pnpm test settings."

Update Amp to the latest version by running amp update and you're ready to go: smart mode is now powered by Opus 4.7.