
Conversation

@tylergannon (Contributor) commented Nov 26, 2025

Background

I continue to get "the prompt is too long" errors on a variety of Claude models. It's frequent enough to be noticeable as an issue for my team.

Findings

  1. It happens on requests where I was at ~25% context fill prior to sending
  2. It still happens despite modifying my opencode.json to have 10% lower context limits across all models.

Most recently, after changing Opus 4.1 to have 180k instead of 200k context, I still got "226170 > 200000" which is a full 25% higher than the 180k limit that was configured via opencode.

Hypothesis

Since most token counts come from actual values in the LLM response, I judge that we are underestimating tokens in the cases where we have to estimate. I think the 1:4 ratio is both very aggressive and not sensitive enough to differences in content type. For English-language text, my earlier estimate of 1:3.5 roughly agrees with 1:4, but the two diverge for numeric characters and delimiters.
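To illustrate the divergence: under the flat 1:4 heuristic, prose and digit/delimiter-heavy JSON of equal length get the same estimate, even though BPE tokenizers typically emit more tokens for the latter. (The strings below are illustrative; real counts depend on the model's tokenizer.)

```typescript
// Flat 4-characters-per-token heuristic (the original behavior).
const flat = (text: string): number => Math.ceil(text.length / 4);

const prose = "the quick brown fox jumps over the lazy dogs"; // 44 chars
const json = '{"id":1234567890,"ts":173260000,"n":[1,2,3]}'; // 44 chars

// Both estimate to 11 tokens, but a real tokenizer generally spends far
// more tokens on the digit- and delimiter-heavy JSON string.
console.log(flat(prose), flat(json)); // 11 11
```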

Changes

I've replaced the original estimate of 4 characters per token with a weighted estimate that assigns a partial token amount per character, depending on whether the character is a digit, letter, punctuation, or whitespace. These amounts sum to a weighted estimate, which can then be multiplied by a factor provided in the environment variable OPENCODE_TOKEN_FACTOR to make the estimate more or less conservative.
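A minimal sketch of what such a weighted estimate could look like — the weights here are illustrative assumptions, not the values actually used in this PR:

```typescript
// Per-character-class token weights (illustrative; not the PR's values).
const WEIGHTS = {
  digit: 0.5, // digits often tokenize at ~1 token per 1-3 characters
  letter: 0.25, // roughly the classic 1:4 ratio for English prose
  punct: 0.5, // delimiters frequently become their own token
  space: 0.25,
};

function estimateTokens(text: string): number {
  let total = 0;
  for (const ch of text) {
    if (/[0-9]/.test(ch)) total += WEIGHTS.digit;
    else if (/[a-zA-Z]/.test(ch)) total += WEIGHTS.letter;
    else if (/\s/.test(ch)) total += WEIGHTS.space;
    else total += WEIGHTS.punct;
  }
  // OPENCODE_TOKEN_FACTOR scales the estimate; values > 1 are more
  // conservative (estimate higher), values < 1 less so.
  const factor = Number(process.env["OPENCODE_TOKEN_FACTOR"] ?? "1");
  return Math.ceil(total * (Number.isFinite(factor) && factor > 0 ? factor : 1));
}
```

For example, `estimateTokens("abcd")` yields 1 (four letters at 0.25 each), while `estimateTokens("1234")` yields 2 — twice the flat heuristic's estimate for the same character count.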

Other approaches considered

  • I declined to add tiktoken, though I think it would be wise, because it would add dependencies for a non-functional upgrade.
  • Make the weights and overall factor available in the config file. I decided this would be heavy-handed unless and until users have enough data on their workloads to tune those values effectively.
  • I'd also be interested in making token estimation a pluggable behavior, so that users could provide a function, e.g. (provider string, model string, text string) => number, to choose how these estimations are done. If one of y'all would give pointers on an acceptable way to do this, I'd be interested in doing it.
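A pluggable estimator along the lines floated above could look roughly like this — every name here is hypothetical; nothing like it exists in opencode today:

```typescript
// Hypothetical shape for a user-supplied token estimator.
type TokenEstimator = (provider: string, model: string, text: string) => number;

// Fallback: the flat 4-characters-per-token heuristic.
const defaultEstimator: TokenEstimator = (_provider, _model, text) =>
  Math.ceil(text.length / 4);

// A registry keyed by provider/model would let users override estimation
// only where they have tuned values for their workload.
const estimators = new Map<string, TokenEstimator>();

function estimateFor(provider: string, model: string, text: string): number {
  const est = estimators.get(`${provider}/${model}`) ?? defaultEstimator;
  return est(provider, model, text);
}
```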

@tylergannon force-pushed the feat/character-aware-token-estimation branch 2 times, most recently from 7337307 to 20c02ed on November 26, 2025 at 21:31
@rekram1-node (Collaborator) commented:
@tylergannon this is pretty good, but I just wanted to clarify whether it will actually solve your problem. As you may know, the token estimation is only used in tool call pruning specifically, so it won't fix cases where grep, webfetch, or another tool dumps so much context that it overflows the prompt.

