FIX: "the prompt is too long" -- character aware token estimation #4799
Background
I continue to get "the prompt is too long" errors on a variety of Claude models. It happens frequently enough to be a noticeable issue for my team.
Findings
Most recently, after lowering the configured context for Opus 4.1 from 200k to 180k in opencode, I still got "226170 > 200000" — an actual count a full 25% higher than the 180k limit that was configured.
Hypothesis
Since most token counts are based on actual usage reported in the LLM response, I conclude that we are underestimating when we have to estimate ourselves. The 1:4 characters-to-tokens ratio is both very aggressive and insensitive to differences in the type of content. For English-language prose, my earlier estimates of 1:3.5 are roughly in agreement with 1:4, but the two diverge sharply for numeric characters and delimiters, which tokenize far less efficiently.
Changes
I've replaced the original estimate of 4 characters per token with a weighted estimate that counts a different partial token amount per character, depending on whether the character is a digit, letter, punctuation, or whitespace. These combine into a weighted total, which can then be multiplied by a factor provided in the environment variable OPENCODE_TOKEN_FACTOR to make the estimate more or less conservative.
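The idea can be sketched as follows. Note this is an illustrative reconstruction, not the PR's exact code: the function name, per-class weights, and default factor here are assumptions, not the values in the diff.

```typescript
// Hypothetical per-character-class token weights (illustrative values only).
// Digits and punctuation often tokenize into very short tokens, so they
// contribute a larger fraction of a token per character than letters do.
const TOKEN_WEIGHTS = {
  digit: 0.5,       // e.g. ~2 digit characters per token
  letter: 0.25,     // the classic ~4 letters per token heuristic
  punctuation: 0.5, // delimiters frequently stand alone as tokens
  whitespace: 0.25,
}

function estimateTokens(text: string): number {
  let estimate = 0
  for (const ch of text) {
    if (/\d/.test(ch)) estimate += TOKEN_WEIGHTS.digit
    else if (/\s/.test(ch)) estimate += TOKEN_WEIGHTS.whitespace
    else if (/[a-zA-Z]/.test(ch)) estimate += TOKEN_WEIGHTS.letter
    else estimate += TOKEN_WEIGHTS.punctuation
  }
  // OPENCODE_TOKEN_FACTOR scales the weighted estimate up or down
  // to make it more or less conservative; assumed default is 1.
  const factor = Number(process.env["OPENCODE_TOKEN_FACTOR"] ?? "1")
  return Math.ceil(estimate * factor)
}
```

With these illustrative weights, an all-digit string is estimated at twice the token count of an all-letter string of the same length, which matches the observed direction of the underestimate.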
Other approaches considered
I also considered making the estimator pluggable via a function of type `(provider string, model string, text string) => number`, so that callers can choose how these estimations are done. If one of y'all would give pointers on an acceptable way to do this, I'd be interested in doing that.
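For concreteness, a pluggable estimator could look something like the sketch below. This is not in the PR; the registry shape, key format, and fallback are my assumptions, built around the signature mentioned above.

```typescript
// Hypothetical estimator type matching the signature discussed above.
type TokenEstimator = (provider: string, model: string, text: string) => number

// Hypothetical registry keyed by "provider/model"; unknown models fall
// back to the classic ~4-characters-per-token heuristic.
const estimators: Record<string, TokenEstimator> = {}

const defaultEstimator: TokenEstimator = (_provider, _model, text) =>
  Math.ceil(text.length / 4)

function estimate(provider: string, model: string, text: string): number {
  const fn = estimators[`${provider}/${model}`] ?? defaultEstimator
  return fn(provider, model, text)
}
```

This would let per-provider tokenizer quirks be handled without touching the shared estimation path.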