feat: customizable per-model rate limiting #812
base: main
Conversation
Walkthrough
Adds a ModelRateLimiter (with YAML persistence and semaphore-based concurrency), exposes GET/POST /api/rate_limits, integrates rate limiting into the LiteLLM adapters, embeddings, and rerankers, adds a frontend schema and a Settings UI for rate limits, and includes unit and API tests.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant UI as Frontend (Settings)
    participant BE as Backend API
    participant RL as ModelRateLimiter
    participant AD as Adapter
    UI->>BE: GET /api/rate_limits
    BE->>RL: load_rate_limits()
    RL-->>BE: RateLimits
    BE-->>UI: 200 RateLimits
    UI->>BE: POST /api/rate_limits (new limits)
    BE->>RL: shared().update_rate_limits(...)
    RL-->>BE: persisted
    BE-->>UI: 200 RateLimits
    Note over BE,AD: Runtime model request flow
    BE->>AD: handle request -> call adapter
    AD->>RL: async with limit(provider, model)
    RL->>RL: acquire semaphore (or no-op if unlimited)
    AD->>AD: perform external async model call
    RL-->>AD: release semaphore
    AD-->>BE: result
    BE-->>UI: response
```
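The runtime flow above is essentially a semaphore guard around each external call. A minimal sketch of the idea, assuming a per-(provider, model) limit table; names here are illustrative rather than the PR's exact API:

```python
import asyncio
from contextlib import asynccontextmanager


class RateLimiterSketch:
    """Illustrative only: one asyncio.Semaphore per (provider, model) key."""

    def __init__(self, limits: dict[tuple[str, str], int]):
        self._limits = limits
        self._semaphores: dict[tuple[str, str], asyncio.Semaphore] = {}

    @asynccontextmanager
    async def limit(self, provider: str, model: str):
        key = (provider, model)
        max_concurrent = self._limits.get(key)
        if max_concurrent is None:
            # No limit configured: no-op, matching the diagram's unlimited branch.
            yield
            return
        sem = self._semaphores.setdefault(key, asyncio.Semaphore(max_concurrent))
        async with sem:  # acquired before the model call, released after
            yield


async def example(limiter: RateLimiterSketch) -> None:
    async with limiter.limit("openai", "gpt_5"):
        ...  # perform the external async model call here
```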
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Areas to focus on:
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
📊 Coverage Report
Overall Coverage: 92%
Diff: origin/main...HEAD
Summary
Actionable comments posted: 4
🧹 Nitpick comments (3)
libs/core/kiln_ai/utils/model_rate_limiter.py (1)
38-49: Consider logging or narrowing the exception in _load_rate_limits

Catching bare Exception and returning {} can silently mask YAML parse errors or permission issues. Consider at least logging the exception (or narrowing to OSError | yaml.YAMLError) so unexpected failures are discoverable while still treating them as "no limits configured".
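A sketch of what the narrower handling could look like; it assumes the loader reads a self.rate_limits_path attribute, so exact names may differ from the PR:

```python
import logging

import yaml

logger = logging.getLogger(__name__)


def _load_rate_limits(self) -> dict:
    # Method-body sketch: narrow the except clause and log before falling back.
    try:
        with open(self.rate_limits_path) as f:
            return yaml.safe_load(f) or {}
    except (OSError, yaml.YAMLError):
        # Still treated as "no limits configured", but the failure is now visible.
        logger.exception("Failed to load rate limits from %s", self.rate_limits_path)
        return {}
```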
libs/core/kiln_ai/adapters/rerankers/litellm_reranker_adapter.py (1)

18-32: Defensively normalize provider name for rate limiting

Here you pass self.reranker_config.model_provider_name directly into the rate limiter. Given other code wraps this field in ModelProviderName(...) when looking up providers, there's a risk it can be either a raw string or an enum, while the rate-limits YAML uses plain provider-name strings. To avoid mismatches and mirror the embedding adapter, consider:

```diff
-        async with self.rate_limiter.limit(
-            self.reranker_config.model_provider_name,
-            self.reranker_config.model_name,
-        ):
-            response = await litellm.arerank(**rerank_kwargs)
+        provider_name = getattr(
+            self.reranker_config.model_provider_name,
+            "value",
+            self.reranker_config.model_provider_name,
+        )
+
+        async with self.rate_limiter.limit(
+            provider_name,
+            self.reranker_config.model_name,
+        ):
+            response = await litellm.arerank(**rerank_kwargs)
```

This keeps the API tolerant of both enum and string values while aligning with the string keys persisted in rate_limits.yaml.

Also applies to: 58-63
app/web_ui/src/lib/api_schema.d.ts (1)
8557-8637: Rate-limits operations are consistent; consider stronger typing via backend schema
read_rate_limits_api_rate_limits_get returns a generic Record<string, unknown>, while update_rate_limits_api_rate_limits_post accepts a nested Record<string, number>. That's functionally fine and matches a "settings blob" style, but it loses structure on the read side, so TS callers won't get much type safety.

If you later want stricter typing (e.g., explicit provider/model keys), it will need to be modeled in the FastAPI/Pydantic schema (e.g., in rate_limits_api.py) and then regenerated into this file rather than edited here. For now, the signatures are coherent and safe to use as-is.
Based on learnings
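For illustration, the stricter backend shape could be a small Pydantic model along these lines (a sketch only; later revisions of this PR do add a RateLimits model with these two fields, from which the TS types would be regenerated):

```python
from pydantic import BaseModel, Field


class RateLimits(BaseModel):
    # Provider-wide max concurrent requests, e.g. {"openai": 10}
    provider_limits: dict[str, int] = Field(default_factory=dict)
    # Per-model overrides, e.g. {"openai": {"gpt_5": 2}}
    model_limits: dict[str, dict[str, int]] = Field(default_factory=dict)
```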
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- app/desktop/desktop_server.py (2 hunks)
- app/desktop/studio_server/rate_limits_api.py (1 hunks)
- app/desktop/studio_server/test_rate_limits_api.py (1 hunks)
- app/web_ui/src/lib/api_schema.d.ts (4 hunks)
- app/web_ui/src/routes/(app)/settings/+page.svelte (1 hunks)
- app/web_ui/src/routes/(app)/settings/rate_limits/+page.svelte (1 hunks)
- libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py (3 hunks)
- libs/core/kiln_ai/adapters/extractors/litellm_extractor.py (5 hunks)
- libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (3 hunks)
- libs/core/kiln_ai/adapters/rerankers/litellm_reranker_adapter.py (2 hunks)
- libs/core/kiln_ai/utils/model_rate_limiter.py (1 hunks)
- libs/core/kiln_ai/utils/test_model_rate_limiter.py (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 546
File: app/web_ui/src/routes/(app)/docs/library/[project_id]/upload_file_dialog.svelte:107-120
Timestamp: 2025-09-10T08:32:18.688Z
Learning: leonardmq prefers to work within the constraints of their SDK codegen for API calls, even when typing is awkward (like casting FormData to match expected types), rather than using alternative approaches like native fetch that would cause compiler errors with their generated types.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 413
File: libs/core/kiln_ai/datamodel/chunk.py:138-160
Timestamp: 2025-08-22T11:17:56.862Z
Learning: leonardmq prefers to avoid redundant validation checks when upstream systems already guarantee preconditions are met. He trusts the attachment system to ensure paths are properly formatted and prefers letting the attachment method (resolve_path) handle any edge cases rather than adding defensive precondition checks.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 390
File: libs/core/kiln_ai/adapters/provider_tools.py:494-499
Timestamp: 2025-07-23T08:58:45.769Z
Learning: leonardmq prefers to keep tightly coupled implementation details (like API versions) hardcoded when they are not user-facing and could break other modules if changed. The shared Config class is reserved for user-facing customizable values, not internal implementation details.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 341
File: libs/server/kiln_server/document_api.py:44-51
Timestamp: 2025-06-18T08:22:58.510Z
Learning: leonardmq prefers to defer fixing blocking I/O in async handlers when: the operation is very fast (milliseconds), user-triggered rather than automated, has no concurrency concerns, and would require additional testing to fix properly. He acknowledges such issues as valid but makes pragmatic decisions about timing the fixes.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/datamodel/rag.py:33-35
Timestamp: 2025-09-04T06:45:44.212Z
Learning: leonardmq requires vector_store_config_id to be a mandatory field in RagConfig (similar to extractor_config_id, chunker_config_id, embedding_config_id) for consistency. He prefers fixing dependent code that breaks due to missing required fields rather than making fields optional to accommodate incomplete data.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 654
File: app/web_ui/src/routes/(app)/docs/rag_configs/[project_id]/create_rag_config/edit_rag_config_form.svelte:141-152
Timestamp: 2025-09-25T06:38:14.854Z
Learning: leonardmq prefers simple onMount initialization patterns over reactive statements when possible, and is cautious about maintaining internal state for idempotency in Svelte components. He values simplicity and safety in component lifecycle management.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 402
File: libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py:0-0
Timestamp: 2025-07-14T03:43:07.283Z
Learning: leonardmq prefers to keep defensive validation checks even when they're technically redundant, viewing them as useful "quick sanity checks" that provide additional safety nets. He values defensive programming over strict DRY (Don't Repeat Yourself) principles when the redundant code serves as a safeguard.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 546
File: app/web_ui/src/lib/api_schema.d.ts:0-0
Timestamp: 2025-09-10T08:27:47.227Z
Learning: leonardmq correctly identifies auto-generated files and prefers not to manually edit them. When suggesting changes to generated TypeScript API schema files, the fix should be made in the source OpenAPI specification in the backend, not in the generated TypeScript file.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/adapters/vector_store/lancedb_adapter.py:136-150
Timestamp: 2025-09-04T06:34:12.318Z
Learning: leonardmq prefers brute force re-insertion of all chunks when partial chunks exist in the LanceDB vector store, rather than selectively inserting only missing chunks. His reasoning is that documents typically have only dozens of chunks (making overwriting cheap), and partial insertion likely indicates data corruption or conflicts that should be resolved by re-inserting all chunks to ensure consistency.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/adapters/vector_store/lancedb_adapter.py:98-103
Timestamp: 2025-09-04T07:08:04.248Z
Learning: leonardmq prefers to let TableNotFoundError bubble up in delete_nodes_by_document_id operations rather than catching it, as this error indicates operations are being done in the wrong order on the caller side and should be surfaced as a programming error rather than handled gracefully.
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
Applied to files:
- libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
- libs/core/kiln_ai/adapters/rerankers/litellm_reranker_adapter.py
- libs/core/kiln_ai/adapters/extractors/litellm_extractor.py
📚 Learning: 2025-10-29T06:59:55.635Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 736
File: app/web_ui/src/lib/api_schema.d.ts:8143-8180
Timestamp: 2025-10-29T06:59:55.635Z
Learning: app/web_ui/src/lib/api_schema.d.ts is generated by openapi-typescript; do not propose manual edits. Schema changes should be made in the FastAPI backend (e.g., app/desktop/studio_server/data_gen_api.py or libs/server/kiln_server/*), then re-generate the TS types.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-09-04T06:27:58.852Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: app/web_ui/src/lib/api_schema.d.ts:3315-3320
Timestamp: 2025-09-04T06:27:58.852Z
Learning: The file app/web_ui/src/lib/api_schema.d.ts is auto-generated by openapi-typescript from the backend API schema and should never be manually edited. Changes to API types should be made in the backend, which will then be reflected in the regenerated TypeScript definitions.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-09-10T08:30:56.973Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 546
File: app/web_ui/src/lib/api_schema.d.ts:4173-4218
Timestamp: 2025-09-10T08:30:56.973Z
Learning: leonardmq correctly identified that app/web_ui/src/lib/api_schema.d.ts is auto-generated and should not be manually edited. The proper fix for type inconsistencies between TypeScript API schema and backend models is to update the backend Python models and regenerate the TypeScript definitions.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-10-29T11:19:31.059Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 759
File: app/web_ui/src/lib/api_schema.d.ts:2485-2490
Timestamp: 2025-10-29T11:19:31.059Z
Learning: In libs/server/kiln_server/document_api.py and the generated app/web_ui/src/lib/api_schema.d.ts, CreateVectorStoreConfigRequest.properties is now a union of three distinct public schemas: LanceDBConfigFTSPropertiesPublic, LanceDBConfigVectorPropertiesPublic, and LanceDBConfigHybridPropertiesPublic. These are intentionally split to allow independent evolution; do not suggest merging them.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
🧬 Code graph analysis (9)
app/desktop/desktop_server.py (2)
  app/desktop/studio_server/test_rate_limits_api.py (1)
    app (20-23)
  app/desktop/studio_server/rate_limits_api.py (1)
    connect_rate_limits (46-109)

libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py (3)
  libs/core/kiln_ai/utils/model_rate_limiter.py (3)
    ModelRateLimiter (11-132), get_global_rate_limiter (139-152), limit (76-94)
  libs/core/kiln_ai/datamodel/embedding.py (1)
    EmbeddingConfig (18-40)
  libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (1)
    litellm_model_id (474-487)

libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (1)
  libs/core/kiln_ai/utils/model_rate_limiter.py (3)
    ModelRateLimiter (11-132), get_global_rate_limiter (139-152), limit (76-94)

libs/core/kiln_ai/adapters/rerankers/litellm_reranker_adapter.py (3)
  libs/core/kiln_ai/utils/model_rate_limiter.py (3)
    ModelRateLimiter (11-132), get_global_rate_limiter (139-152), limit (76-94)
  libs/core/kiln_ai/adapters/rerankers/base_reranker.py (1)
    BaseReranker (28-43)
  libs/core/kiln_ai/adapters/provider_tools.py (1)
    LiteLlmCoreConfig (420-423)

libs/core/kiln_ai/utils/test_model_rate_limiter.py (2)
  libs/core/kiln_ai/utils/config.py (2)
    Config (31-288), settings_dir (232-236)
  libs/core/kiln_ai/utils/model_rate_limiter.py (8)
    ModelRateLimiter (11-132), reset_global_rate_limiter (155-162), get_limit (96-102), set_limit (104-127), reload (129-132), limit (76-94), _get_semaphore (55-73), _get_semaphore_key (51-53)

app/desktop/studio_server/test_rate_limits_api.py (3)
  libs/core/kiln_ai/utils/config.py (1)
    settings_dir (232-236)
  app/desktop/studio_server/rate_limits_api.py (1)
    connect_rate_limits (46-109)
  libs/core/kiln_ai/utils/test_model_rate_limiter.py (1)
    temp_home (24-26)

libs/core/kiln_ai/utils/model_rate_limiter.py (1)
  libs/core/kiln_ai/utils/config.py (2)
    Config (31-288), settings_dir (232-236)

libs/core/kiln_ai/adapters/extractors/litellm_extractor.py (1)
  libs/core/kiln_ai/utils/model_rate_limiter.py (3)
    ModelRateLimiter (11-132), get_global_rate_limiter (139-152), limit (76-94)

app/desktop/studio_server/rate_limits_api.py (1)
  libs/core/kiln_ai/utils/config.py (2)
    Config (31-288), settings_dir (232-236)
🪛 GitHub Actions: Build and Test
libs/core/kiln_ai/utils/test_model_rate_limiter.py
[error] 59-59: AssertionError in test_init_with_no_file: expected None but got 5 from limiter.get_limit('openai', 'gpt_5'). This suggests ModelRateLimiter.get_limit() returns a default limit when no file exists; needs fix. (pytest run failed: exit code 1)
🪛 GitHub Actions: Debug Detector
libs/core/kiln_ai/adapters/extractors/litellm_extractor.py
[error] 210-219: Developer content found. Step 'Checking for TODO or FIXME' failed the pipeline and exited with code 1.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Generate Coverage Report
- GitHub Check: Build Desktop Apps (macos-15-intel)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (21)
app/web_ui/src/routes/(app)/settings/+page.svelte (1)
62-69: Settings tile wiring looks consistent

The new "Rate Limits" settings item correctly points to /settings/rate_limits and the description matches the new page's purpose. No issues here.

app/desktop/desktop_server.py (1)

24-27: Rate-limits API wiring is in the right place

Hooking connect_rate_limits(app) alongside the other connect_* calls (before webhost) integrates the new endpoints cleanly without affecting existing routing. Looks good.

Also applies to: 58-67

libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py (1)

91-103: The review comment is incorrect and should be disregarded.

The current code already works correctly. ModelProviderName inherits from both str and Enum, which means enum members behave identically to strings in dict operations. When rate_limiter.limit(model_provider_name, ...) is called with a ModelProviderName enum, the _get_semaphore() method's dict lookup self._rate_limits.get(provider, {}) succeeds because:

- Enum members have the same hash as their string value: hash(ModelProviderName.openai) == hash("openai")
- Enum members compare equal to strings: ModelProviderName.openai == "openai" returns True
- Python's dict uses hash and equality for key matching, so the enum member finds the string key from the YAML config

The .value normalization is unnecessary and adds confusion. The code is already robust due to Python's design of str-inheriting enums, specifically to enable this use case.

Likely an incorrect or invalid review comment.
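A quick demonstration of the str-Enum behavior described, using a stand-in enum:

```python
from enum import Enum


class ModelProviderName(str, Enum):  # stand-in for the real enum
    openai = "openai"


limits = {"openai": 5}  # plain string keys, as loaded from YAML

assert hash(ModelProviderName.openai) == hash("openai")
assert ModelProviderName.openai == "openai"
assert limits.get(ModelProviderName.openai) == 5  # enum member finds the string key
```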
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (4)

43-43: LGTM! The import of ModelRateLimiter and get_global_rate_limiter is appropriate for integrating rate limiting into the adapter.

70-76: LGTM! The constructor signature change is consistent with the rate limiting pattern used in other adapters. The optional rate_limiter parameter allows for dependency injection while maintaining backward compatibility.

83-85: LGTM! The rate limiter initialization correctly defaults to the global singleton, ensuring all adapters share the same rate limiter and enforce limits consistently across the application.

294-297: LGTM! The rate limiter context manager correctly wraps the litellm.acompletion call, ensuring concurrent requests are throttled according to the configured per-model limits.
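The pattern being approved here, sketched; the class and method bodies are approximations, but ModelRateLimiter, get_global_rate_limiter, and litellm.acompletion are the names this PR uses:

```python
import litellm

from kiln_ai.utils.model_rate_limiter import ModelRateLimiter, get_global_rate_limiter


class AdapterSketch:
    """Approximation of the adapter wiring, not the PR's exact class."""

    def __init__(self, config, rate_limiter: ModelRateLimiter | None = None):
        self.config = config
        # Default to the global singleton so all adapters share the same semaphores.
        self.rate_limiter = rate_limiter or get_global_rate_limiter()

    async def acompletion_checking_response(self, **kwargs):
        async with self.rate_limiter.limit(
            self.config.model_provider_name, self.config.model_name
        ):
            return await litellm.acompletion(**kwargs)
```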
libs/core/kiln_ai/adapters/extractors/litellm_extractor.py (4)

27-27: LGTM! The import of ModelRateLimiter and get_global_rate_limiter is appropriate for integrating rate limiting into the extractor.

101-101: LGTM! The constructor signature change is consistent with the rate limiting pattern used in other adapters.

121-123: LGTM! The rate limiter initialization correctly defaults to the global singleton, ensuring consistent rate limiting across the application.

382-386: LGTM! The rate limiter context manager correctly wraps the litellm.acompletion call for non-PDF extraction paths.
app/desktop/studio_server/test_rate_limits_api.py (4)

1-29: LGTM! The test fixtures are well-structured. The temp_home fixture correctly patches the home directory for isolated testing, and the app/client fixtures follow standard FastAPI testing patterns.
31-92: LGTM! The test coverage for rate limits CRUD operations is comprehensive, covering:
- Reading empty rate limits
- Reading existing rate limits from file
- Creating new rate limits
- Overwriting existing rate limits
- Clearing rate limits with an empty payload
The tests correctly validate both API responses and filesystem persistence.
94-191: LGTM! The model listing tests are thorough, validating:
- Response structure (normal_models, embedding_models, reranker_models)
- Model field structure (model_name, friendly_name, provider_name, model_id)
- Presence of known models
- Grouping by provider
This provides good confidence that the /api/models/all endpoint returns the expected data shape.

193-225: LGTM! The roundtrip and nested structure tests validate end-to-end persistence and retrieval, ensuring that complex rate limit configurations are correctly stored and retrieved.
app/desktop/studio_server/rate_limits_api.py (3)
13-24: LGTM! The Pydantic models ModelInfo and AllModelsResponse provide clear structure for the API responses and enable automatic validation.

46-62: LGTM! The rate limits endpoints correctly handle loading and saving rate limits with appropriate error handling via HTTPException.

64-109: LGTM! The /api/models/all endpoint correctly aggregates models from all three built-in model lists, flattening provider information into separate ModelInfo entries. The error handling is appropriate.

app/web_ui/src/lib/api_schema.d.ts (3)
1312-1346: New /api/rate_limits and /api/models/all paths are wired correctly

The path entries correctly reference the corresponding operations and follow the same pattern as existing endpoints (GET + POST for settings-like resources, GET-only for listings). No changes needed here.
Based on learnings
2125-2133: AllModelsResponse schema cleanly structures model lists

The split into normal_models, embedding_models, and reranker_models, all reusing ModelInfo, is consistent and should make the UI consumption straightforward. Looks good.
Based on learnings
4356-4366: ModelInfo shape looks appropriate; confirm model_id nullability

The ModelInfo fields (model_name, friendly_name, provider_name, nullable model_id) match how a UI would present models. Allowing model_id to be null seems intended for providers without a stable ID, but it's worth confirming this aligns with the backend datamodel and any assumptions in the new settings UI.
Based on learnings
Something wrong in tests - we should probably remove the limiter in tests or mock / reset it on every test somehow
Late to ask: but how much work to make this "Customize per model settings"? Add in JSON mode, etc.

You mean putting it alongside the run config settings? In this PR, we track the total number of in-flight requests across the entire app, because for cases like RAG, one user action such as adding a doc can trigger multiple independent job runners from many different configs concurrently, all of which contribute to the global number of requests. A user could be running an eval + synth data + RAG at the same time, and other such combos.

If we had the rate limit on a per-run-config basis, it would probably be hard to share the semaphore across everything that is going on. We would probably have competing long-running tasks running concurrently with distinct semaphores. Unless maybe we wrap that under the current semaphore, like: (1) acquire the provider-wide/model-wide semaphore -> (2) then acquire the semaphore from the run config, so it stays under both (see the sketch after this comment) - but that seems error prone / dodgy. We can move away from this later though if we figure out something better.

The current PR is an easy / not too risky way of giving users a way out if they get stuck - lowering limits for people hitting rate limits, and cranking them up to the max for people with higher account tiers. Maybe a hybrid could be like:
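To make the (1) -> (2) nesting above concrete, a purely illustrative sketch; the run-config semaphore is hypothetical, and the acquisition order would have to stay consistent everywhere to avoid deadlocks:

```python
import asyncio


async def call_under_both_limits(
    limiter, run_config_semaphore: asyncio.Semaphore,
    provider: str, model: str, do_call,
):
    # (1) provider-wide/model-wide semaphore first...
    async with limiter.limit(provider, model):
        # (2) ...then the per-run-config semaphore, so the request stays under both.
        async with run_config_semaphore:
            return await do_call()
```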
…eat-per-model-app-wide-rate-limiting
@coderabbitai review

✅ Actions performed
Review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (4)
app/desktop/studio_server/rate_limits_api.py (1)
9-36: Consider logging exceptions before returning HTTP 500

Both handlers wrap ModelRateLimiter calls in a broad except Exception and immediately raise HTTPException(500, ...), but the defined logger isn't used. For diagnosability, consider logging the exception (e.g. logger.exception("Failed to read/update rate limits")) before raising, so filesystem or parsing issues show up in logs while keeping the API surface unchanged.
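A sketch of the suggested change; the handler shape is approximate, assuming the PR's RateLimits model and shared() accessor:

```python
import logging

from fastapi import FastAPI, HTTPException

from kiln_ai.utils.model_rate_limiter import ModelRateLimiter, RateLimits

logger = logging.getLogger(__name__)
app = FastAPI()


@app.post("/api/rate_limits")
async def update_rate_limits(rate_limits: RateLimits) -> RateLimits:
    try:
        ModelRateLimiter.shared().update_rate_limits(rate_limits)
        return rate_limits
    except Exception:
        # Log before mapping to a generic 500 so filesystem/parse issues show up.
        logger.exception("Failed to update rate limits")
        raise HTTPException(status_code=500, detail="Failed to update rate limits")
```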
app/web_ui/src/routes/(app)/settings/rate_limits/+page.svelte (1)

324-345: Tighten POST body typing for /api/rate_limits

The request body is typed as components["schemas"]["RateLimits"] in the API schema, which is:

```ts
{
  provider_limits?: Record<string, number>
  model_limits?: Record<string, Record<string, number>>
}
```

But the code casts to Record<string, Record<string, number>>, which is structurally incorrect. This silently bypasses type checking and risks accidental mismatches with the actual API contract.

Use the generated RateLimits type instead:

```ts
import type { components } from "$lib/api_schema"
// ...
const { error: api_error } = await client.POST("/api/rate_limits", {
  body: rate_limits as components["schemas"]["RateLimits"],
})
```

This aligns TypeScript with the backend contract and prevents shape drift.
libs/core/kiln_ai/utils/model_rate_limiter.py (2)
54-75: Clarify default_provider_limit semantics and consider a small testing hook for the singleton

Two small points here:

- __init__'s docstring only mentions rate_limits, but not default_provider_limit. It would help future readers to explicitly document that this is the fallback per-provider cap used when there is no explicit limit, and that it feeds _default_provider_limits.
- shared() creates a process-global singleton that holds on to semaphores and reads/writes a real file in settings_dir. This is fine for production, but it makes tests and short-lived scripts a bit awkward (state leakage between tests, reliance on the real user settings directory). Consider exposing a minimal, clearly-named helper for tests, e.g.:

```python
@classmethod
def _reset_shared_for_tests(cls) -> None:
    cls._shared_instance = None
```

or similar, so tests don't have to monkeypatch private attributes.

Also applies to: 76-90
91-103: Normalize provider keys to str and handle unknown providers gracefully in defaults

Two related subtleties around provider keys and defaults:

- _initialize_default_provider_limits seeds the dict with ModelProviderName.ollama as a key, but elsewhere the type is annotated as Dict[str, int] and lookups are done with provider: str. If ModelProviderName isn't strictly interchangeable with str at the dict level, the special-case for Ollama (limit 1) might never be hit. Normalizing to str is cheap and removes the ambiguity:

```diff
-        limits: Dict[str, int] = defaultdict(lambda: default_provider_limit)
-        limits[ModelProviderName.ollama] = 1
+        limits: Dict[str, int] = defaultdict(lambda: default_provider_limit)
+        limits[str(ModelProviderName.ollama)] = 1
```

- In get_model_default_max_concurrent_requests, ModelProviderName(provider) will raise ValueError if provider isn't a valid enum value (e.g. any future/3rd-party provider or a typo). That turns a "no explicit limit configured" case into a hard failure instead of falling back to safe defaults. You can keep the strong typing for known providers but guard the unknown case:

```diff
-        model_provider = built_in_models_from_provider(
-            ModelProviderName(provider), model
-        )
+        try:
+            provider_name = ModelProviderName(provider)
+        except ValueError:
+            # Unknown provider; fall back to the generic default.
+            return self._default_provider_limits[provider]
+
+        model_provider = built_in_models_from_provider(provider_name, model)
```

The rest of the function can stay as-is, still using self._default_provider_limits[provider] for the minimum.

Also applies to: 310-334
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- app/desktop/studio_server/rate_limits_api.py (1 hunks)
- app/desktop/studio_server/test_rate_limits_api.py (1 hunks)
- app/desktop/studio_server/test_tool_api.py (1 hunks)
- app/web_ui/src/lib/api_schema.d.ts (3 hunks)
- app/web_ui/src/routes/(app)/settings/rate_limits/+page.svelte (1 hunks)
- libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py (3 hunks)
- libs/core/kiln_ai/adapters/extractors/litellm_extractor.py (5 hunks)
- libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (3 hunks)
- libs/core/kiln_ai/adapters/rerankers/litellm_reranker_adapter.py (2 hunks)
- libs/core/kiln_ai/utils/model_rate_limiter.py (1 hunks)
- libs/core/kiln_ai/utils/test_model_rate_limiter.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- libs/core/kiln_ai/utils/test_model_rate_limiter.py
- app/desktop/studio_server/test_rate_limits_api.py
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 546
File: app/web_ui/src/routes/(app)/docs/library/[project_id]/upload_file_dialog.svelte:107-120
Timestamp: 2025-09-10T08:32:18.688Z
Learning: leonardmq prefers to work within the constraints of their SDK codegen for API calls, even when typing is awkward (like casting FormData to match expected types), rather than using alternative approaches like native fetch that would cause compiler errors with their generated types.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 413
File: libs/core/kiln_ai/datamodel/chunk.py:138-160
Timestamp: 2025-08-22T11:17:56.862Z
Learning: leonardmq prefers to avoid redundant validation checks when upstream systems already guarantee preconditions are met. He trusts the attachment system to ensure paths are properly formatted and prefers letting the attachment method (resolve_path) handle any edge cases rather than adding defensive precondition checks.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 390
File: libs/core/kiln_ai/adapters/provider_tools.py:494-499
Timestamp: 2025-07-23T08:58:45.769Z
Learning: leonardmq prefers to keep tightly coupled implementation details (like API versions) hardcoded when they are not user-facing and could break other modules if changed. The shared Config class is reserved for user-facing customizable values, not internal implementation details.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 341
File: libs/server/kiln_server/document_api.py:44-51
Timestamp: 2025-06-18T08:22:58.510Z
Learning: leonardmq prefers to defer fixing blocking I/O in async handlers when: the operation is very fast (milliseconds), user-triggered rather than automated, has no concurrency concerns, and would require additional testing to fix properly. He acknowledges such issues as valid but makes pragmatic decisions about timing the fixes.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/datamodel/rag.py:33-35
Timestamp: 2025-09-04T06:45:44.212Z
Learning: leonardmq requires vector_store_config_id to be a mandatory field in RagConfig (similar to extractor_config_id, chunker_config_id, embedding_config_id) for consistency. He prefers fixing dependent code that breaks due to missing required fields rather than making fields optional to accommodate incomplete data.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 654
File: app/web_ui/src/routes/(app)/docs/rag_configs/[project_id]/create_rag_config/edit_rag_config_form.svelte:141-152
Timestamp: 2025-09-25T06:38:14.854Z
Learning: leonardmq prefers simple onMount initialization patterns over reactive statements when possible, and is cautious about maintaining internal state for idempotency in Svelte components. He values simplicity and safety in component lifecycle management.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 402
File: libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py:0-0
Timestamp: 2025-07-14T03:43:07.283Z
Learning: leonardmq prefers to keep defensive validation checks even when they're technically redundant, viewing them as useful "quick sanity checks" that provide additional safety nets. He values defensive programming over strict DRY (Don't Repeat Yourself) principles when the redundant code serves as a safeguard.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 546
File: app/web_ui/src/lib/api_schema.d.ts:0-0
Timestamp: 2025-09-10T08:27:47.227Z
Learning: leonardmq correctly identifies auto-generated files and prefers not to manually edit them. When suggesting changes to generated TypeScript API schema files, the fix should be made in the source OpenAPI specification in the backend, not in the generated TypeScript file.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/adapters/vector_store/lancedb_adapter.py:136-150
Timestamp: 2025-09-04T06:34:12.318Z
Learning: leonardmq prefers brute force re-insertion of all chunks when partial chunks exist in the LanceDB vector store, rather than selectively inserting only missing chunks. His reasoning is that documents typically have only dozens of chunks (making overwriting cheap), and partial insertion likely indicates data corruption or conflicts that should be resolved by re-inserting all chunks to ensure consistency.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/adapters/vector_store/lancedb_adapter.py:98-103
Timestamp: 2025-09-04T07:08:04.248Z
Learning: leonardmq prefers to let TableNotFoundError bubble up in delete_nodes_by_document_id operations rather than catching it, as this error indicates operations are being done in the wrong order on the caller side and should be surfaced as a programming error rather than handled gracefully.
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
Applied to files:
- libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
- libs/core/kiln_ai/adapters/extractors/litellm_extractor.py
- libs/core/kiln_ai/adapters/rerankers/litellm_reranker_adapter.py
- libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py
- libs/core/kiln_ai/utils/model_rate_limiter.py
📚 Learning: 2025-10-29T06:59:55.635Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 736
File: app/web_ui/src/lib/api_schema.d.ts:8143-8180
Timestamp: 2025-10-29T06:59:55.635Z
Learning: app/web_ui/src/lib/api_schema.d.ts is generated by openapi-typescript; do not propose manual edits. Schema changes should be made in the FastAPI backend (e.g., app/desktop/studio_server/data_gen_api.py or libs/server/kiln_server/*), then re-generate the TS types.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-09-04T06:27:58.852Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: app/web_ui/src/lib/api_schema.d.ts:3315-3320
Timestamp: 2025-09-04T06:27:58.852Z
Learning: The file app/web_ui/src/lib/api_schema.d.ts is auto-generated by openapi-typescript from the backend API schema and should never be manually edited. Changes to API types should be made in the backend, which will then be reflected in the regenerated TypeScript definitions.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-09-10T08:30:56.973Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 546
File: app/web_ui/src/lib/api_schema.d.ts:4173-4218
Timestamp: 2025-09-10T08:30:56.973Z
Learning: leonardmq correctly identified that app/web_ui/src/lib/api_schema.d.ts is auto-generated and should not be manually edited. The proper fix for type inconsistencies between TypeScript API schema and backend models is to update the backend Python models and regenerate the TypeScript definitions.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-10-29T11:19:31.059Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 759
File: app/web_ui/src/lib/api_schema.d.ts:2485-2490
Timestamp: 2025-10-29T11:19:31.059Z
Learning: In libs/server/kiln_server/document_api.py and the generated app/web_ui/src/lib/api_schema.d.ts, CreateVectorStoreConfigRequest.properties is now a union of three distinct public schemas: LanceDBConfigFTSPropertiesPublic, LanceDBConfigVectorPropertiesPublic, and LanceDBConfigHybridPropertiesPublic. These are intentionally split to allow independent evolution; do not suggest merging them.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
🔇 Additional comments (11)
app/desktop/studio_server/test_tool_api.py (1)
3566-3668: Archived RAG and kiln-task coverage looks solid

The new test cleanly exercises the archived-vs-active filtering for both RAG configs and kiln task tools and matches the endpoint's intended behavior; no changes needed.
libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py (1)
91-104: Rate limiter integration in embedding adapter looks correct

Constructor wiring and the async with self.rate_limiter.limit(...) wrapper around litellm.aembedding are consistent with other adapters and correctly key limits by provider/model; no changes needed.

Also applies to: 142-171
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (1)
69-85: Model rate limiting around acompletion is wired correctly

The adapter now cleanly accepts an optional rate_limiter and defaults to the shared instance; the acompletion_checking_response wrapper correctly scopes the LiteLLM call under limit(provider, model). This matches the other adapter integrations.

Also applies to: 291-297
libs/core/kiln_ai/adapters/rerankers/litellm_reranker_adapter.py (1)
21-32: Reranker rate limiter integration is consistent and safe

The optional rate_limiter parameter and the async with self.rate_limiter.limit(...) around litellm.arerank mirror the other adapters and correctly scope concurrency per provider/model; nothing further to change here.

Also applies to: 58-63
libs/core/kiln_ai/adapters/extractors/litellm_extractor.py (1)
94-123: Extractor now uses shared rate limiter correctly

The extractor's constructor, per-page _extract_single_pdf_page, and non-PDF _extract path all consistently wrap litellm.acompletion in the model rate limiter, layered on top of the existing max_parallel_requests_for_model batching. The previous debug prints are gone and behavior looks good.

Also applies to: 188-212, 350-378
app/web_ui/src/lib/api_schema.d.ts (4)
1312-1341: New /api/rate_limits path is consistent with existing API patterns

GET/POST shape, parameters typed as never, and operation IDs match the established naming convention (e.g., /api/settings). No issues from the type layer; assuming these came from the backend OpenAPI, this looks good.

As per prior learnings, any future adjustments should be made in the FastAPI schema and re-generated here, not edited manually. [Based on learnings]
4809-4833: RateLimits schema matches the described provider/model semantics

The provider_limits and nested model_limits dictionaries cleanly represent provider-wide vs per-model concurrency limits, and both being optional aligns with allowing partial configs. No inconsistencies with the PR's described behavior from the type perspective.

Reminder: if the Python RateLimits model changes (fields, defaults, docstrings), regenerate this file from OpenAPI rather than editing it directly. [Based on learnings]
8557-8576: read_rate_limits_api_rate_limits_get operation correctly returns RateLimits

No params, no request body, and a 200 response typed as RateLimits is exactly what the UI needs for loading the config. This is consistent with the /api/rate_limits path entry.
8577-8609: update_rate_limits_api_rate_limits_post wiring looks correct

Request body and 200 response both use the RateLimits schema, and 422 reuses the standard HTTPValidationError, matching patterns used elsewhere (e.g., settings endpoints). Types are coherent with the new path and schema.

libs/core/kiln_ai/utils/model_rate_limiter.py (2)
18-33: RateLimits schema matches the intended config shape

The Pydantic model cleanly captures provider-wide and per-model limits with sensible defaults and descriptions. No issues from a correctness or design standpoint.
116-171: YAML persistence and reload behavior look robust

The load/save/update/reload path is defensive in a good way: invalid or unreadable YAML results in empty RateLimits with clear logging, writes are guarded by a lock, and update_rate_limits/reload both clear semaphores so new calls respect updated config. This is a solid foundation for the API/UI wired on top.

Also applies to: 300-309
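As a rough illustration of the round trip being described, assuming the PR's RateLimits Pydantic model and Pydantic v2's model_dump (the real save/load code may differ):

```python
import yaml

from kiln_ai.utils.model_rate_limiter import RateLimits

limits = RateLimits(
    provider_limits={"openai": 10},
    model_limits={"openai": {"gpt_5": 2}},
)

# Persist as plain YAML...
text = yaml.safe_dump(limits.model_dump())

# ...and load it back; an empty or invalid file would fall back to RateLimits().
loaded = RateLimits(**(yaml.safe_load(text) or {}))
assert loaded.model_limits["openai"]["gpt_5"] == 2
```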
Actionable comments posted: 2
♻️ Duplicate comments (4)
libs/core/kiln_ai/utils/test_model_rate_limiter.py (1)
1-19: Incomplete fix: temp_home fixture still doesn't follow the reviewed recommendation

The past review identified that Config.settings_dir() uses Path.home() and provided a specific fix (import Path from pathlib and use patch.object(Path, "home", return_value=tmp_path)). The current implementation uses string-based patching patch("pathlib.Path.home", ...), which may not reliably patch Path.home() where it's used in the Config module.

Apply the recommended diff from the previous review:

```diff
 import asyncio
 import os
+from pathlib import Path
 from unittest.mock import patch

 import pytest
 import yaml

 from kiln_ai.datamodel.datamodel_enums import ModelProviderName
 from kiln_ai.utils.config import Config
 from kiln_ai.utils.model_rate_limiter import ModelRateLimiter, RateLimits


 @pytest.fixture
 def temp_home(tmp_path):
-    with (
-        patch.object(os.path, "expanduser", return_value=str(tmp_path)),
-        patch("pathlib.Path.home", return_value=tmp_path),
-    ):
+    with patch.object(Path, "home", return_value=tmp_path):
         yield tmp_path
```

This ensures Config.settings_dir() resolves into the isolated temporary directory, preventing test cross-contamination and pipeline failures.

libs/core/kiln_ai/utils/model_rate_limiter.py (3)
198-208: Docstring mismatch already flagged in past review.

The docstrings for _get_model_limit (line 203) and _get_provider_limit (line 214) claim these methods return None for "unlimited" concurrency, but the actual behavior is that None means "no explicit override" and the system falls back to defaults (still finite). This was already identified in a previous review comment.

Also applies to: 210-216
155-184: Potential race condition when creating semaphores in _get_semaphore.

The pattern if key not in self._semaphores followed by self._semaphores[key] = asyncio.Semaphore(...) (lines 167-168, 174-175, 180-183) is not protected by a lock. If this method is called concurrently from multiple async tasks or threads for the same provider/model combination, multiple semaphores could be created and the last one overwrites previous ones. Tasks that already acquired a reference to an earlier semaphore would be using a different semaphore instance, breaking the global rate limit enforcement.

While asyncio tasks in a single event loop cooperatively yield, the PR objectives mention "concurrent job runners," suggesting potential multi-threaded usage. Even in single-threaded async, the lack of synchronization is a latent bug that could surface if execution context changes.

Consider one of these approaches:

- Use an asyncio.Lock to protect semaphore creation (requires the method to be async)
- Use the existing threading.Lock if all callers are guaranteed to be in the same thread
- Accept the tradeoff and document that semaphore creation is eventually consistent

If you choose option 2, apply this diff:

```diff
     def _get_semaphore(self, provider: str, model: str) -> asyncio.Semaphore:
         """
         Get or create a semaphore for the given provider/model.

         Checks model-specific limit first, then falls back to provider-wide
         limit. If using provider-wide limit, all models from that provider
         share the same semaphore. If no limit is configured, uses a default
         value or the model's max_parallel_requests defined in the model list.
         """
         model_limit = self._rate_limits.model_limits.get(provider, {}).get(model)
         if model_limit is not None and model_limit > 0:
             key = self._get_semaphore_key(provider, model)
-            if key not in self._semaphores:
-                self._semaphores[key] = asyncio.Semaphore(model_limit)
+            with self._lock:
+                if key not in self._semaphores:
+                    self._semaphores[key] = asyncio.Semaphore(model_limit)
             return self._semaphores[key]

         provider_limit = self._rate_limits.provider_limits.get(provider)
         if provider_limit is not None and provider_limit > 0:
             key = self._get_semaphore_key(provider, None)
-            if key not in self._semaphores:
-                self._semaphores[key] = asyncio.Semaphore(provider_limit)
+            with self._lock:
+                if key not in self._semaphores:
+                    self._semaphores[key] = asyncio.Semaphore(provider_limit)
             return self._semaphores[key]

         # if no limit is configured, set a default of 10
         key = self._get_semaphore_key(provider, model)
-        if key not in self._semaphores:
-            self._semaphores[key] = asyncio.Semaphore(
-                self._get_model_default_max_concurrent_requests(provider, model)
-            )
+        with self._lock:
+            if key not in self._semaphores:
+                self._semaphores[key] = asyncio.Semaphore(
+                    self._get_model_default_max_concurrent_requests(provider, model)
+                )
         return self._semaphores[key]
```
73-86: The singleton pattern in shared() is not thread-safe.

If multiple threads call shared() simultaneously before _shared_instance is set, they could all check cls._shared_instance is None, see True, and each create a new instance. The last assignment wins, but multiple instances will have been created, breaking the singleton guarantee and potentially causing inconsistent rate limiting across the application.

Apply this diff to make shared() thread-safe:

```diff
+    _shared_lock = threading.Lock()
+
     @classmethod
     def shared(cls) -> "ModelRateLimiter":
         """
         Get the shared singleton instance of the rate limiter.

         This ensures all adapters share the same rate limiter and semaphores,
         properly enforcing global rate limits across the application.

         Returns:
             ModelRateLimiter: The shared rate limiter instance
         """
-        if cls._shared_instance is None:
-            cls._shared_instance = cls()
+        if cls._shared_instance is None:
+            with cls._shared_lock:
+                if cls._shared_instance is None:
+                    cls._shared_instance = cls()
         return cls._shared_instance
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- app/desktop/studio_server/test_rate_limits_api.py (1 hunks)
- app/web_ui/src/lib/api_schema.d.ts (3 hunks)
- libs/core/kiln_ai/utils/model_rate_limiter.py (1 hunks)
- libs/core/kiln_ai/utils/test_model_rate_limiter.py (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 546
File: app/web_ui/src/routes/(app)/docs/library/[project_id]/upload_file_dialog.svelte:107-120
Timestamp: 2025-09-10T08:32:18.688Z
Learning: leonardmq prefers to work within the constraints of their SDK codegen for API calls, even when typing is awkward (like casting FormData to match expected types), rather than using alternative approaches like native fetch that would cause compiler errors with their generated types.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 413
File: libs/core/kiln_ai/datamodel/chunk.py:138-160
Timestamp: 2025-08-22T11:17:56.862Z
Learning: leonardmq prefers to avoid redundant validation checks when upstream systems already guarantee preconditions are met. He trusts the attachment system to ensure paths are properly formatted and prefers letting the attachment method (resolve_path) handle any edge cases rather than adding defensive precondition checks.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 390
File: libs/core/kiln_ai/adapters/provider_tools.py:494-499
Timestamp: 2025-07-23T08:58:45.769Z
Learning: leonardmq prefers to keep tightly coupled implementation details (like API versions) hardcoded when they are not user-facing and could break other modules if changed. The shared Config class is reserved for user-facing customizable values, not internal implementation details.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 341
File: libs/server/kiln_server/document_api.py:44-51
Timestamp: 2025-06-18T08:22:58.510Z
Learning: leonardmq prefers to defer fixing blocking I/O in async handlers when: the operation is very fast (milliseconds), user-triggered rather than automated, has no concurrency concerns, and would require additional testing to fix properly. He acknowledges such issues as valid but makes pragmatic decisions about timing the fixes.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/datamodel/rag.py:33-35
Timestamp: 2025-09-04T06:45:44.212Z
Learning: leonardmq requires vector_store_config_id to be a mandatory field in RagConfig (similar to extractor_config_id, chunker_config_id, embedding_config_id) for consistency. He prefers fixing dependent code that breaks due to missing required fields rather than making fields optional to accommodate incomplete data.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 654
File: app/web_ui/src/routes/(app)/docs/rag_configs/[project_id]/create_rag_config/edit_rag_config_form.svelte:141-152
Timestamp: 2025-09-25T06:38:14.854Z
Learning: leonardmq prefers simple onMount initialization patterns over reactive statements when possible, and is cautious about maintaining internal state for idempotency in Svelte components. He values simplicity and safety in component lifecycle management.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 402
File: libs/core/kiln_ai/adapters/embedding/litellm_embedding_adapter.py:0-0
Timestamp: 2025-07-14T03:43:07.283Z
Learning: leonardmq prefers to keep defensive validation checks even when they're technically redundant, viewing them as useful "quick sanity checks" that provide additional safety nets. He values defensive programming over strict DRY (Don't Repeat Yourself) principles when the redundant code serves as a safeguard.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 546
File: app/web_ui/src/lib/api_schema.d.ts:0-0
Timestamp: 2025-09-10T08:27:47.227Z
Learning: leonardmq correctly identifies auto-generated files and prefers not to manually edit them. When suggesting changes to generated TypeScript API schema files, the fix should be made in the source OpenAPI specification in the backend, not in the generated TypeScript file.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/adapters/vector_store/lancedb_adapter.py:136-150
Timestamp: 2025-09-04T06:34:12.318Z
Learning: leonardmq prefers brute force re-insertion of all chunks when partial chunks exist in the LanceDB vector store, rather than selectively inserting only missing chunks. His reasoning is that documents typically have only dozens of chunks (making overwriting cheap), and partial insertion likely indicates data corruption or conflicts that should be resolved by re-inserting all chunks to ensure consistency.
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: libs/core/kiln_ai/adapters/vector_store/lancedb_adapter.py:98-103
Timestamp: 2025-09-04T07:08:04.248Z
Learning: leonardmq prefers to let TableNotFoundError bubble up in delete_nodes_by_document_id operations rather than catching it, as this error indicates operations are being done in the wrong order on the caller side and should be surfaced as a programming error rather than handled gracefully.
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.
Applied to files:
- libs/core/kiln_ai/utils/test_model_rate_limiter.py
- libs/core/kiln_ai/utils/model_rate_limiter.py
📚 Learning: 2025-10-29T06:59:55.635Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 736
File: app/web_ui/src/lib/api_schema.d.ts:8143-8180
Timestamp: 2025-10-29T06:59:55.635Z
Learning: app/web_ui/src/lib/api_schema.d.ts is generated by openapi-typescript; do not propose manual edits. Schema changes should be made in the FastAPI backend (e.g., app/desktop/studio_server/data_gen_api.py or libs/server/kiln_server/*), then re-generate the TS types.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-09-04T06:27:58.852Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: app/web_ui/src/lib/api_schema.d.ts:3315-3320
Timestamp: 2025-09-04T06:27:58.852Z
Learning: The file app/web_ui/src/lib/api_schema.d.ts is auto-generated by openapi-typescript from the backend API schema and should never be manually edited. Changes to API types should be made in the backend, which will then be reflected in the regenerated TypeScript definitions.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-10-29T11:19:31.059Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 759
File: app/web_ui/src/lib/api_schema.d.ts:2485-2490
Timestamp: 2025-10-29T11:19:31.059Z
Learning: In libs/server/kiln_server/document_api.py and the generated app/web_ui/src/lib/api_schema.d.ts, CreateVectorStoreConfigRequest.properties is now a union of three distinct public schemas: LanceDBConfigFTSPropertiesPublic, LanceDBConfigVectorPropertiesPublic, and LanceDBConfigHybridPropertiesPublic. These are intentionally split to allow independent evolution; do not suggest merging them.
Applied to files:
app/web_ui/src/lib/api_schema.d.ts
🧬 Code graph analysis (3)
libs/core/kiln_ai/utils/test_model_rate_limiter.py (2)
  libs/core/kiln_ai/utils/config.py (2)
    Config (31-288), settings_dir (232-236)
  libs/core/kiln_ai/utils/model_rate_limiter.py (11)
    ModelRateLimiter (36-276), RateLimits (18-33), rate_limits_path (99-104), _get_model_limit (198-208), _set_model_limit (218-239), reload (241-249), limit (187-196), _get_semaphore (155-184), _get_semaphore_key (143-153), _get_provider_limit (210-216), _get_model_default_max_concurrent_requests (251-276)

libs/core/kiln_ai/utils/model_rate_limiter.py (3)
  libs/core/kiln_ai/utils/config.py (1)
    settings_dir (232-236)
  app/desktop/studio_server/rate_limits_api.py (1)
    update_rate_limits (19-36)
  libs/core/kiln_ai/adapters/ml_model_list.py (1)
    built_in_models_from_provider (5320-5328)

app/desktop/studio_server/test_rate_limits_api.py (3)
  libs/core/kiln_ai/utils/config.py (2)
    Config (31-288), settings_dir (232-236)
  libs/core/kiln_ai/utils/model_rate_limiter.py (6)
    ModelRateLimiter (36-276), rate_limits_path (99-104), shared (74-86), _get_provider_limit (210-216), _get_model_limit (198-208), update_rate_limits (132-141)
  app/desktop/studio_server/rate_limits_api.py (2)
    connect_rate_limits (9-36), update_rate_limits (19-36)
🪛 GitHub Actions: Check API Bindings
app/web_ui/src/lib/api_schema.d.ts
[error] 4821-4821: OpenAPI schema is not current. Run generate_schema.sh to update. Differences indicate provider-related description change in RateLimits: 'Max concurrent requests per provider ... unless overridden by a model-specific limit'
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Generate Coverage Report
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (macos-15-intel)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build, Typecheck, and Test Python (3.13)
- GitHub Check: Build, Typecheck, and Test Python (3.12)
- GitHub Check: Build, Typecheck, and Test Python (3.11)
- GitHub Check: Build, Typecheck, and Test Python (3.10)
🔇 Additional comments (3)
app/web_ui/src/lib/api_schema.d.ts (2)
1312-1341: New /api/rate_limits path wiring matches existing REST patterns

The new path entry correctly follows the existing conventions: no params, GET/POST mapped to read_rate_limits_api_rate_limits_get / update_rate_limits_api_rate_limits_post, and response slots reserved as expected. No issues from a typings/layout perspective.
8562-8614: Rate limits operations typings are consistent with schema and conventions

read_rate_limits_api_rate_limits_get correctly returns RateLimits with no parameters or body; update_rate_limits_api_rate_limits_post accepts and returns RateLimits and includes a 422 HTTPValidationError branch, matching surrounding API patterns. Once the OpenAPI/TS regeneration issue is fixed, these look ready.

Based on learnings
app/desktop/studio_server/test_rate_limits_api.py (1)
32-314: LGTM! Comprehensive test coverage.

The test suite thoroughly covers:
- Empty and existing rate limits
- Updates, overwrites, and clears
- Roundtrip persistence
- Nested structures and mixed provider/model limits
- Validation and error handling
- Integration with the shared limiter
Well-structured and comprehensive.
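For flavor, a condensed sketch of the roundtrip style these tests use; the real suite isolates the settings directory with a temp_home fixture, which is elided here, and the import path is assumed from the PR's file layout:

```python
from fastapi import FastAPI
from fastapi.testclient import TestClient

from app.desktop.studio_server.rate_limits_api import connect_rate_limits

app = FastAPI()
connect_rate_limits(app)
client = TestClient(app)


def test_rate_limits_roundtrip():
    payload = {
        "provider_limits": {"openai": 10},
        "model_limits": {"openai": {"gpt_5": 2}},
    }
    # POST the new limits, then read them back through the API.
    assert client.post("/api/rate_limits", json=payload).status_code == 200
    body = client.get("/api/rate_limits").json()
    assert body["provider_limits"]["openai"] == 10
    assert body["model_limits"]["openai"]["gpt_5"] == 2
```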
What does this PR do?
Issue:
Changes:
Checklists
Summary by CodeRabbit
New Features
Behavior Changes
Tests