July 30, 2025

Amp Tab 30% Faster

Amp Tab, our in-editor completion engine, now responds 30% faster, with improvements of up to 50% during peak usage.

We worked with Baseten to optimize our custom deployment. The new infrastructure roughly doubles performance by switching to TensorRT-LLM as the inference engine and adding KV caching with speculative decoding.
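
For readers unfamiliar with the terms: in speculative decoding, a small draft model proposes several tokens that the larger target model then verifies in a single forward pass, reusing its KV cache so only accepted tokens cost extra work. The Python sketch below is purely illustrative; the stand-in functions (`draft_next`, `target_next`) are made up and this is not Amp's or Baseten's implementation:

```python
from typing import Callable, List

Token = int


def speculative_step(
    draft_next: Callable[[List[Token]], Token],   # cheap draft model (stand-in)
    target_next: Callable[[List[Token]], Token],  # expensive target model (stand-in)
    context: List[Token],
    k: int = 4,
) -> List[Token]:
    """One speculative-decoding step: draft k tokens, keep the verified prefix.

    In a real engine the target model scores all k draft tokens in one batched
    forward pass and its KV cache is extended only for accepted tokens; here
    the models are plain functions, so the caching is implied rather than shown.
    """
    # Draft phase: the small model proposes k tokens autoregressively.
    draft: List[Token] = []
    ctx = list(context)
    for _ in range(k):
        token = draft_next(ctx)
        draft.append(token)
        ctx.append(token)

    # Verify phase: accept the longest prefix the target model agrees with,
    # then emit one token from the target itself.
    accepted: List[Token] = []
    ctx = list(context)
    for token in draft:
        expected = target_next(ctx)
        if expected != token:
            accepted.append(expected)   # target's token replaces the first miss
            break
        accepted.append(token)
        ctx.append(token)
    else:
        accepted.append(target_next(ctx))  # every draft accepted: one bonus token

    return accepted
```

Every draft token the target model accepts saves a full target-model decoding step, which is where the latency win comes from.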

The new infrastructure also includes a modified version of lookahead decoding that uses an improved n-gram candidate selection algorithm and variable-length speculations, reducing both the number of draft tokens and the compute per iteration compared to standard implementations.
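
Lookahead-style decoding replaces the separate draft model with n-grams harvested from text the model has already produced; allowing candidates of variable length means short matches don't waste draft tokens. The sketch below is our own simplified illustration of that idea (the class name and parameters are hypothetical), not the algorithm Amp actually ships:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Token = int


class NGramCandidatePool:
    """Toy n-gram candidate pool for draft selection (illustrative only).

    Maps an n-token key to the longest continuation observed earlier in the
    stream, so the decoder can propose a variable-length draft without
    running a separate draft model.
    """

    def __init__(self, n: int = 3, max_draft: int = 8):
        self.n = n                  # key length
        self.max_draft = max_draft  # cap on speculation length
        self.table: Dict[Tuple[Token, ...], List[Token]] = defaultdict(list)

    def observe(self, tokens: List[Token]) -> None:
        """Index every n-gram key and the tokens that followed it."""
        for i in range(len(tokens) - self.n):
            key = tuple(tokens[i : i + self.n])
            continuation = tokens[i + self.n : i + self.n + self.max_draft]
            if len(continuation) > len(self.table[key]):
                self.table[key] = continuation  # keep the longest continuation

    def propose(self, context: List[Token]) -> List[Token]:
        """Return a variable-length draft for the current context (may be empty)."""
        if len(context) < self.n:
            return []
        return self.table.get(tuple(context[-self.n :]), [])


if __name__ == "__main__":
    pool = NGramCandidatePool(n=2)
    pool.observe([1, 2, 3, 4, 2, 3, 5])
    print(pool.propose([9, 2, 3]))  # [4, 2, 3, 5]: longest continuation after (2, 3)
```

The proposed draft would then be verified by the target model in the same accept-the-matching-prefix fashion as in the speculative decoding sketch above.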

[Chart: Amp Tab latency improvements over time]