Opus 4.7 now powers Amp's smart mode.
In our internal evals, Opus 4.7 scored ~72%, up from Opus 4.6's ~65% - the first model since GPT 5.4 to clear 70%.
It takes some getting used to
Compared to Opus 4.7, Opus 4.6 was forgiving.
You could give it a vague task and it would often infer the missing pieces, make a plan, and start working. Sometimes that was useful. But it also could lead to the model confidently solving a nearby problem instead of the one you actually had. Or rushing to the first, but not the best, solution.
Opus 4.7 is less like that.
It follows prompts more closely. It fills in fewer gaps. It researches more. It is less likely to silently generalize from "fix this case" to "fix every related case." If the task is underspecified, you are more likely to get a narrow answer, a pause, or a request for the missing constraint.
At first, that can feel worse. But then you realize that a good prompt can make it go further.
Opus 4.7 is better at harder coding work, especially tasks that span multiple files, tools, and verification steps. It is better at keeping the shape of a change in its head and carrying it through the codebase. It's better at refactoring too. Its explanations are more thorough.
Fewer Built-in Tools
We removed grep and glob from smart.
Opus 4.7 is good enough at using the shell directly. When it needs to search, it can run rg or use the codebase search agent.
Its ASCII diagrams are also equal to or better than what Opus 4.6 achieved with rendered diagrams.
Token Usage
Our internal assessment matches Anthropic's (see last section and graph): "token usage across all effort levels is improved." Opus 4.7 might use more tokens in some cases, but those tokens are smarter and lead to better results. And better results lead to less tokens wasted.
Tunable Thinking Effort
You can now toggle the thinking effort for smart directly from the CLI with Opt+D (Alt+D), cycling through high, xhigh, and max.
How to Use It
The main change is simple: tell it what success looks like.
A few patterns have worked well for us:
- Give it success criteria, not steps. Tell it what done means, not every move to make. Example: "Clean up the billing settings. Done means no public API changes, no database changes,
pnpm test billingpasses, andpnpm typecheckpasses." - Give it a way to check itself. A model with a test, CLI, Storybook, preview URL, or screenshot diff is much better than a model guessing from code. Example: "Fix the import flow. Reproduce it with
pnpm cli import ./fixtures/bad.csv. It is fixed when that command succeeds andpnpm test importpasses." - Brainstorm, pick, implement. Use one pass to explore options, then implement the chosen approach. Example: "Compare two ways to remove this duplicate state. Recommend one. Do not edit files yet." Then: "Implement option B. Keep the API unchanged and verify with
pnpm test settings."
Update Amp to the latest version by running amp update and you're ready to go: smart mode is now powered by Opus 4.7.