
Conversation

@tylergannon (Contributor) commented Nov 26, 2025

Background

I continue to get "the prompt is too long" errors on a variety of Claude models. It's frequent enough to be noticeable as an issue for my team.

Findings

  1. It happens on requests where I was at ~25% context fill prior to sending
  2. It still happens despite modifying my opencode.json to have 10% lower context limits across all models.

Most recently, after changing Opus 4.1 to have 180k instead of 200k context, I still got "226170 > 200000" which is a full 25% higher than the 180k limit that was configured via opencode.

Hypothesis

Since most token counts come from actual values in the LLM response, I judge that we are underestimating tokens in the cases where we have to estimate. I think the 1:4 ratio is both very aggressive and not sensitive enough to differences in content type. For English-language text, my earlier estimate of 1:3.5 roughly agrees with 1:4, but the two diverge for numeric characters and delimiters.
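To illustrate the divergence: under the flat 1:4 heuristic, prose and digit/delimiter-heavy JSON of equal length get the same estimate, even though BPE tokenizers typically emit more tokens for the latter. (The strings below are illustrative; real counts depend on the model's tokenizer.)

```typescript
// Flat 4-characters-per-token heuristic (the original behavior).
const flat = (text: string): number => Math.ceil(text.length / 4);

const prose = "the quick brown fox jumps over the lazy dogs"; // 44 chars
const json = '{"id":1234567890,"ts":173260000,"n":[1,2,3]}'; // 44 chars

// Both estimate to 11 tokens, but a real tokenizer generally spends far
// more tokens on the digit- and delimiter-heavy JSON string.
console.log(flat(prose), flat(json)); // 11 11
```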

Changes

I've replaced the original estimate of 4 characters per token with a weighted estimate that assigns a partial token amount per character, depending on whether the character is a digit, letter, punctuation, or whitespace. These amounts sum to a weighted estimate, which can then be multiplied by a factor provided in the environment variable OPENCODE_TOKEN_FACTOR to make the estimate more or less conservative.
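A minimal sketch of what such a weighted estimate could look like — the weights here are illustrative assumptions, not the values actually used in this PR:

```typescript
// Per-character-class token weights (illustrative; not the PR's values).
const WEIGHTS = {
  digit: 0.5, // digits often tokenize at ~1 token per 1-3 characters
  letter: 0.25, // roughly the classic 1:4 ratio for English prose
  punct: 0.5, // delimiters frequently become their own token
  space: 0.25,
};

function estimateTokens(text: string): number {
  let total = 0;
  for (const ch of text) {
    if (/[0-9]/.test(ch)) total += WEIGHTS.digit;
    else if (/[a-zA-Z]/.test(ch)) total += WEIGHTS.letter;
    else if (/\s/.test(ch)) total += WEIGHTS.space;
    else total += WEIGHTS.punct;
  }
  // OPENCODE_TOKEN_FACTOR scales the estimate; values > 1 are more
  // conservative (estimate higher), values < 1 less so.
  const factor = Number(process.env["OPENCODE_TOKEN_FACTOR"] ?? "1");
  return Math.ceil(total * (Number.isFinite(factor) && factor > 0 ? factor : 1));
}
```

For example, `estimateTokens("abcd")` yields 1 (four letters at 0.25 each), while `estimateTokens("1234")` yields 2 — twice the flat heuristic's estimate for the same character count.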

Other approaches considered

  • I declined to add tiktoken, though I think it would be wise, because it would add dependencies for a non-functional upgrade.
  • Make the weights and overall factor available in the config file. I decided this would be heavy-handed unless and until users have enough data on their workloads to tune those values effectively.
  • I'd also be interested in making token estimation a pluggable behavior, so that users could provide a function, e.g. (provider string, model string, text string) => number, to choose how these estimations are done. If one of y'all would give pointers on an acceptable way to do this, I'd be interested in doing it.
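A pluggable estimator along the lines floated above could look roughly like this — every name here is hypothetical; nothing like it exists in opencode today:

```typescript
// Hypothetical shape for a user-supplied token estimator.
type TokenEstimator = (provider: string, model: string, text: string) => number;

// Fallback: the flat 4-characters-per-token heuristic.
const defaultEstimator: TokenEstimator = (_provider, _model, text) =>
  Math.ceil(text.length / 4);

// A registry keyed by provider/model would let users override estimation
// only where they have tuned values for their workload.
const estimators = new Map<string, TokenEstimator>();

function estimateFor(provider: string, model: string, text: string): number {
  const est = estimators.get(`${provider}/${model}`) ?? defaultEstimator;
  return est(provider, model, text);
}
```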

@tylergannon force-pushed the feat/character-aware-token-estimation branch 2 times, most recently from 7337307 to 20c02ed on November 26, 2025 at 21:31
@rekram1-node (Collaborator) commented:
@tylergannon this is pretty good, but I just wanted to clarify whether it will actually solve your problem. As you may know, the token estimation is only used in tool call pruning specifically, so it won't fix cases where grep, webfetch, or another tool dumps so much context that it overflows the prompt.

