Skip to content

Conversation

@robobun
Copy link
Collaborator

@robobun robobun commented Nov 24, 2025

Summary

  • Fix crash in FormData.from() when called with very large ArrayBuffer input
  • Add length check in C++ toString function against both Bun's synthetic limit and WebKit's String::MaxLength
  • For UTF-8 tagged strings, use simdutf to calculate actual UTF-16 length only when byte length exceeds the limit

Root Cause

When FormData.from() was called with a very large ArrayBuffer (e.g., new Uint32Array(913148244) = ~3.6GB), the code would crash with:

ASSERTION FAILED: data.size() <= MaxLength
vendor/WebKit/Source/WTF/wtf/text/StringImpl.h(886)

The toString() function in helpers.h was only checking against Bun__stringSyntheticAllocationLimit (which defaults to ~4GB), but not against WebKit's String::MaxLength (INT32_MAX, ~2GB). When the input exceeded String::MaxLength, createWithoutCopying() would fail with an assertion.

Changes

  1. helpers.h: Added || str.len > WTF::String::MaxLength checks to all three code paths in toString():

    • UTF-8 tagged pointer path (with simdutf length calculation only when needed)
    • External pointer path
    • Non-copying creation path
  2. url.zig: Reverted the incorrect Zig-side check (UTF-8 byte length != UTF-16 character length)

Test plan

  • Added test that verifies FormData.from with oversized input doesn't crash
  • Verified original crash case now returns empty FormData instead of crashing:
    const v3 = new Uint32Array(913148244);
    FormData.from(v3); // No longer crashes

🤖 Generated with Claude Code

@linear
Copy link

linear bot commented Nov 24, 2025

@robobun
Copy link
Collaborator Author

robobun commented Nov 24, 2025

Updated 2:31 PM PT - Nov 26th, 2025

@Jarred-Sumner, your commit f204cda has 2 failures in Build #32467 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 25006

That installs a local version of the PR into your bun-25006 executable, so you can run:

bun-25006 --bun

// This will fail if the string is too long. Let's make it explicit instead of an ASSERT.
// Check both the Bun synthetic limit and WebKit's String::MaxLength.
if (str.len > Bun__stringSyntheticAllocationLimit || str.len > WTF::String::MaxLength) [[unlikely]] {
return {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use SIMDUTF's length-checking functions to verify the length is too big when inside this branch before we return {}?

src/url.zig Outdated
var str = jsc.ZigString.fromUTF8(strings.withoutUTF8BOM(input));
const data = strings.withoutUTF8BOM(input);
// Check against both Bun's synthetic limit and WebKit's String::MaxLength (INT32_MAX)
if (data.len > bun.String.max_length() or data.len > std.math.maxInt(i32)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct because UTF8 byte length != UTF-16/latin1 byte length.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 24, 2025

Walkthrough

Unified max-length guards and UTF‑8→UTF‑16 prechecks added across string conversion and append paths to prevent oversized strings; FormData creation now propagates string-too-long errors from URL-query parsing; URL glue returns/throws JSErrors appropriately; a test asserts FormData.from throws on very large input.

Changes

Cohort / File(s) Summary
String conversion & length guards
src/bun.js/bindings/helpers.h
Added SIMDUTF.h; compute maxLength = min(Bun__stringSyntheticAllocationLimit, WTF::String::MaxLength) and apply across UTF‑8, external, 8/16‑bit and ZigString paths; use simdutf to precompute UTF‑16 length for UTF‑8 inputs and early-return/abort when exceeding maxLength; replace selective assertions with explicit early returns and safe frees; enforce max checks in appendToBuilder.
FormData creation flow (bindings)
src/bun.js/bindings/bindings.cpp
Materialized toString(*arg1) into a temporary str; if str is empty while input length > 0, throw STRING_TOO_LONG via a throw scope; otherwise pass WTFMove(str) to DOMFormData::create.
FormData API & error propagation (Zig)
src/url.zig
After createFromURLQuery, check result for zero and return a JSError when appropriate; refine catch handling to return JSError/JSTerminated directly or call globalThis.throwError(...) for other errors.
Test: oversized input
test/js/web/html/FormData.test.ts
Add test "FormData.from throws on very large input instead of crashing": set a synthetic allocation limit, construct an oversized Uint8Array, assert FormData.from throws with "Cannot create a string longer than" in the message, and restore the limit in finally.

Possibly related PRs

  • Implement Bun.YAML.stringify #22183 — Modifies appendToBuilder/StringBuilder bindings and helpers; touches related string-builder/append paths that intersect with this PR's appendToBuilder and conversion guards.

Suggested reviewers

  • taylordotfish
  • nektro

Pre-merge checks

✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main fix: preventing crashes in FormData.from() when handling oversized input by adding proper error handling instead of assertion failures.
Description check ✅ Passed The PR description comprehensively covers what was changed, root cause analysis, specific modifications to helpers.h and url.zig, and includes test verification. Both template sections are adequately addressed.
Linked Issues check ✅ Passed The changes directly address ENG-21721 by adding WTF::String::MaxLength checks to prevent assertion failures in WebKit string creation when handling very large buffers, and include regression testing for the crash scenario.
Out of Scope Changes check ✅ Passed All modifications are scoped to fixing the crash: helpers.h length validations, url.zig error handling, and a focused test case for FormData with oversized input. No unrelated changes detected.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 265f1b2 and f204cda.

📒 Files selected for processing (1)
  • src/bun.js/bindings/bindings.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.{cpp,zig}

📄 CodeRabbit inference engine (.cursor/rules/building-bun.mdc)

src/**/*.{cpp,zig}: Use bun bd or bun run build:debug to build debug versions for C++ and Zig source files; creates debug build at ./build/debug/bun-debug
Run tests using bun bd test <test-file> with the debug build; never use bun test directly as it will not include your changes
Execute files using bun bd <file> <...args>; never use bun <file> directly as it will not include your changes
Enable debug logs for specific scopes using BUN_DEBUG_$(SCOPE)=1 environment variable
Code generation happens automatically as part of the build process; no manual code generation commands are required

Files:

  • src/bun.js/bindings/bindings.cpp
src/bun.js/bindings/**/*.cpp

📄 CodeRabbit inference engine (CLAUDE.md)

src/bun.js/bindings/**/*.cpp: C++ code for JavaScriptCore bindings and Web APIs should be placed in src/bun.js/bindings/*.cpp
When implementing JavaScript classes in C++, create three classes if there's a public constructor: class Foo : public JSC::JSDestructibleObject, class FooPrototype : public JSC::JSNonFinalObject, and class FooConstructor : public JSC::InternalFunction
When implementing JavaScript classes in C++, define properties using HashTableValue arrays and add iso subspaces for classes with C++ fields, caching structures in ZigGlobalObject

Files:

  • src/bun.js/bindings/bindings.cpp
src/**/*.{ts,zig,cpp}

📄 CodeRabbit inference engine (CLAUDE.md)

Always use absolute paths in file operations

Files:

  • src/bun.js/bindings/bindings.cpp
🧠 Learnings (17)
📓 Common learnings
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:37:30.249Z
Learning: Applies to test/**/*.test.{ts,js,jsx,tsx,mjs,cjs} : Use `Buffer.alloc(count, fill).toString()` instead of `'A'.repeat(count)` to create repetitive strings in tests, as ''.repeat is very slow in debug JavaScriptCore builds
📚 Learning: 2025-10-01T21:59:54.571Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-24T18:36:33.049Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T18:36:33.049Z
Learning: Applies to src/bun.js/bindings/**/*.cpp : C++ code for JavaScriptCore bindings and Web APIs should be placed in `src/bun.js/bindings/*.cpp`

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-24T18:37:47.887Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: src/bun.js/bindings/v8/AGENTS.md:0-0
Timestamp: 2025-11-24T18:37:47.887Z
Learning: Applies to src/bun.js/bindings/v8/**/<UNKNOWN> : <UNKNOWN>

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-10-01T21:49:27.862Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/BunIDLConvert.h:29-42
Timestamp: 2025-10-01T21:49:27.862Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), IDLStrictNull intentionally treats both undefined and null as null (using isUndefinedOrNull()), matching WebKit's IDLNull & IDLNullable behavior. This is the correct implementation and should not be changed to only accept null.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-24T18:36:59.675Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: src/bun.js/bindings/v8/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:36:59.675Z
Learning: Applies to src/bun.js/bindings/v8/src/symbols.txt : Add symbol names without leading underscore to src/symbols.txt for each new V8 API method

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-24T18:36:59.675Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: src/bun.js/bindings/v8/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:36:59.675Z
Learning: Applies to src/bun.js/bindings/v8/src/symbols.dyn : Add symbol names with leading underscore and semicolons in braces to src/symbols.dyn for each new V8 API method

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-24T18:36:59.675Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: src/bun.js/bindings/v8/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:36:59.675Z
Learning: Applies to src/bun.js/bindings/v8/V8*.h : Add BUN_EXPORT visibility attribute to all public V8 API functions to ensure proper symbol export across platforms

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-24T18:36:33.049Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T18:36:33.049Z
Learning: Applies to src/bun.js/bindings/**/*.cpp : When implementing JavaScript classes in C++, define properties using HashTableValue arrays and add iso subspaces for classes with C++ fields, caching structures in ZigGlobalObject

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-10-17T01:20:22.188Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/bindings/BunIDLConvert.h:72-97
Timestamp: 2025-10-17T01:20:22.188Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), converters inheriting from `Bun::DefaultTryConverter` automatically get `conversionHasSideEffects = true` via the inheritance chain (`DefaultTryConverter` → `WebCore::DefaultConverter`). No explicit declaration of `conversionHasSideEffects` is needed when using `DefaultTryConverter` as the base.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-24T18:36:59.675Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: src/bun.js/bindings/v8/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:36:59.675Z
Learning: Applies to src/bun.js/bindings/v8/V8*.cpp : Use JSC::WriteBarrier for heap-allocated references in V8 objects and implement visitChildren() for custom heap objects to support garbage collection

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-09-12T18:16:50.754Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22606
File: src/glob/GlobWalker.zig:449-452
Timestamp: 2025-09-12T18:16:50.754Z
Learning: For Bun codebase: prefer using `std.fs.path.sep` over manual platform separator detection, and use `bun.strings.lastIndexOfChar` instead of `std.mem.lastIndexOfScalar` for string operations.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-09-07T08:20:47.215Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22258
File: src/cli/test_command.zig:1258-1281
Timestamp: 2025-09-07T08:20:47.215Z
Learning: For Bun's test line filtering feature, the parseFileLineArg function should only handle the specific cases of "file:line" and "file:line:col" formats. It should not try to be overly tolerant of other patterns, as components like ":col" or other non-numeric segments could legitimately be part of filenames. The current conservative approach that checks for numeric segments in expected positions is appropriate.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-10-18T20:59:45.579Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:458-475
Timestamp: 2025-10-18T20:59:45.579Z
Learning: In src/bun.js/telemetry.zig, the RequestId (u64) to JavaScript number (f64) conversion in jsRequestId() is intentionally allowed to lose precision beyond 2^53-1. This is acceptable because: (1) at 1M requests/sec it takes ~285 years to overflow, (2) the counter resets per-process, and (3) these are observability IDs, not critical distributed IDs. Precision loss is an acceptable trade-off for this use case.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-10-18T01:49:31.037Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/SocketConfig.bindv2.ts:58-58
Timestamp: 2025-10-18T01:49:31.037Z
Learning: In Bun's bindgenv2 TypeScript bindings (e.g., src/bun.js/api/bun/socket/SocketConfig.bindv2.ts), the pattern `b.String.loose.nullable.loose` is intentional and not a duplicate. The first `.loose` applies to the String type (loose string conversion), while the second `.loose` applies to the nullable (loose nullable, treating all falsy values as null rather than just null/undefined).

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-24T18:35:25.866Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: .cursor/rules/javascriptcore-class.mdc:0-0
Timestamp: 2025-11-24T18:35:25.866Z
Learning: Applies to *.cpp : To create JavaScript objects from Zig, implement C++ functions following the `Bun__ClassName__toJS(Zig::GlobalObject*, NativeType*)` convention that construct and return the JavaScript object as an encoded JSValue

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-24T18:35:25.866Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: .cursor/rules/javascriptcore-class.mdc:0-0
Timestamp: 2025-11-24T18:35:25.866Z
Learning: Applies to *.cpp : When constructing JavaScript objects, retrieve the structure from the global object using `zigGlobalObject->m_JSX509CertificateClassStructure.get(zigGlobalObject)` or similar pattern

Applied to files:

  • src/bun.js/bindings/bindings.cpp
🧬 Code graph analysis (1)
src/bun.js/bindings/bindings.cpp (1)
src/bun.js/bindings/ErrorCode.cpp (4)
  • STRING_TOO_LONG (1083-1089)
  • STRING_TOO_LONG (1083-1083)
  • create (169-174)
  • create (169-169)
🔇 Additional comments (1)
src/bun.js/bindings/bindings.cpp (1)

5680-5688: FormData URL-query overflow handling looks correct and matches helpers.h contract

Capturing str = toString(*arg1) and treating str.isEmpty() && arg1->len > 0 as a sentinel for “string too long” cleanly maps oversized inputs into ERR_STRING_TOO_LONG instead of hitting WebKit assertions, while preserving behavior for genuinely empty input. The ThrowScope usage in the error branch and the DOMFormData::create(..., WTFMove(str)) success path are consistent with other Bun error helpers and creation sites.

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (2)
  • UTF-8: Entity not found: Issue - Could not find referenced Issue.
  • UTF-16: Entity not found: Issue - Could not find referenced Issue.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f31db64 and 3ac3082.

📒 Files selected for processing (3)
  • src/bun.js/bindings/helpers.h (1 hunks)
  • src/url.zig (1 hunks)
  • test/js/web/html/FormData.test.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (17)
📓 Common learnings
Learnt from: franciscop
Repo: oven-sh/bun PR: 24514
File: src/bun.js/api/crypto/PasswordObject.zig:86-101
Timestamp: 2025-11-10T00:57:09.173Z
Learning: In Bun's Zig codebase (PasswordObject.zig), when validating the parallelism parameter for Argon2, the upper limit is set to 65535 (2^16 - 1) rather than using `std.math.maxInt(u24)` because the latter triggers Zig's truncation limit checks. The value 65535 is a practical upper bound that avoids compiler issues while being sufficient for thread parallelism use cases.
📚 Learning: 2025-10-01T21:59:54.571Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.

Applied to files:

  • src/url.zig
  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T20:50:47.750Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:366-373
Timestamp: 2025-10-18T20:50:47.750Z
Learning: In Bun's Zig codebase (src/bun.js/bindings/JSValue.zig), the JSValue enum uses `.null` (not `.js_null`) for JavaScript's null value. Only `js_undefined` has the `js_` prefix to avoid collision with Zig's built-in `undefined` keyword. The correct enum fields are: `js_undefined`, `null`, `true`, `false`, and `zero`.

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-18T20:59:45.579Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:458-475
Timestamp: 2025-10-18T20:59:45.579Z
Learning: In src/bun.js/telemetry.zig, the RequestId (u64) to JavaScript number (f64) conversion in jsRequestId() is intentionally allowed to lose precision beyond 2^53-1. This is acceptable because: (1) at 1M requests/sec it takes ~285 years to overflow, (2) the counter resets per-process, and (3) these are observability IDs, not critical distributed IDs. Precision loss is an acceptable trade-off for this use case.

Applied to files:

  • src/url.zig
📚 Learning: 2025-09-05T19:49:26.188Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 21728
File: src/valkey/js_valkey_functions.zig:852-867
Timestamp: 2025-09-05T19:49:26.188Z
Learning: In Bun's Zig codebase, `.js_undefined` is a valid way to represent JavaScript's undefined value when working with JSPromise.resolve() and similar JavaScript interop functions. This is the correct pattern to use rather than `jsc.JSValue.jsUndefined()`.

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-11T15:19:30.301Z
Learnt from: mastermakrela
Repo: oven-sh/bun PR: 19167
File: src/bun.js/api.zig:31-31
Timestamp: 2025-10-11T15:19:30.301Z
Learning: The learnings about `pub const js = JSC.Codegen.JS<ClassName>` and re-exporting toJS/fromJS/fromJSDirect apply to class-based Zig bindings with constructors and prototype methods (e.g., those with `new` constructors). They do NOT apply to simple namespace-style API objects like TOMLObject, YAMLObject, and CSVObject which expose static functions via a `create()` method that returns a plain JS object.

Applied to files:

  • src/url.zig
📚 Learning: 2025-09-05T19:49:26.188Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 21728
File: src/valkey/js_valkey_functions.zig:852-867
Timestamp: 2025-09-05T19:49:26.188Z
Learning: In Bun’s Zig code, `.js_undefined` is a valid and preferred JSValue literal for “undefined” (e.g., resolving JSPromise). Do not refactor usages to `jsc.JSValue.jsUndefined()`, especially in src/valkey/js_valkey_functions.zig unsubscribe().

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-01T21:49:27.862Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/BunIDLConvert.h:29-42
Timestamp: 2025-10-01T21:49:27.862Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), IDLStrictNull intentionally treats both undefined and null as null (using isUndefinedOrNull()), matching WebKit's IDLNull & IDLNullable behavior. This is the correct implementation and should not be changed to only accept null.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-12T18:16:50.754Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22606
File: src/glob/GlobWalker.zig:449-452
Timestamp: 2025-09-12T18:16:50.754Z
Learning: For Bun codebase: prefer using `std.fs.path.sep` over manual platform separator detection, and use `bun.strings.lastIndexOfChar` instead of `std.mem.lastIndexOfScalar` for string operations.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-02T17:41:07.869Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 22227
File: src/memory.zig:60-76
Timestamp: 2025-09-02T17:41:07.869Z
Learning: In bun's memory utilities, when handling const pointers in deinit operations, prefer compile-time errors over silent skipping to avoid hard-to-find memory leaks. Users expect explicit failures rather than silent omissions in memory management.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-08T13:56:00.875Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: src/bun.js/api/BunObject.zig:2514-2521
Timestamp: 2025-10-08T13:56:00.875Z
Learning: For Bun codebase: prefer using `bun.path` utilities (e.g., `bun.path.joinAbsStringBuf`, `bun.path.join`) over `std.fs.path` functions for path operations.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-17T01:20:22.188Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/bindings/BunIDLConvert.h:72-97
Timestamp: 2025-10-17T01:20:22.188Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), converters inheriting from `Bun::DefaultTryConverter` automatically get `conversionHasSideEffects = true` via the inheritance chain (`DefaultTryConverter` → `WebCore::DefaultConverter`). No explicit declaration of `conversionHasSideEffects` is needed when using `DefaultTryConverter` as the base.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T01:49:31.037Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/SocketConfig.bindv2.ts:58-58
Timestamp: 2025-10-18T01:49:31.037Z
Learning: In Bun's bindgenv2 TypeScript bindings (e.g., src/bun.js/api/bun/socket/SocketConfig.bindv2.ts), the pattern `b.String.loose.nullable.loose` is intentional and not a duplicate. The first `.loose` applies to the String type (loose string conversion), while the second `.loose` applies to the nullable (loose nullable, treating all falsy values as null rather than just null/undefined).

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-20T05:35:57.318Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 22534
File: src/bun.js/bindings/headers.h:729-731
Timestamp: 2025-09-20T05:35:57.318Z
Learning: symbols.txt in the Bun codebase is specifically for V8 API mangled symbols (without leading underscore), not for general Bun host functions declared with BUN_DECLARE_HOST_FUNCTION. Host functions are handled through different build mechanisms.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:48:38.278Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/Bindgen/IDLTypes.h:1-3
Timestamp: 2025-10-01T21:48:38.278Z
Learning: In the Bun codebase, for `BunIDL*` and `Bindgen*` headers (e.g., BunIDLTypes.h, Bindgen/IDLTypes.h), it's acceptable to rely on transitive includes for standard library headers like <type_traits> and <utility> rather than including them explicitly.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:37.256Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:37.256Z
Learning: In the Bun codebase, FFI (Foreign Function Interface) code is compiled separately using TinyCC (tcc), which barely supports C99. Headers like src/bun.js/api/FFI.h and src/bun.js/api/ffi-stdbool.h are only used for FFI compilation with tcc, not with the main compiler. Therefore, C standard changes to the main Bun target do not affect FFI code compilation.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-08T13:48:02.430Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: test/js/bun/tarball/extract.test.ts:107-111
Timestamp: 2025-10-08T13:48:02.430Z
Learning: In Bun's test runner, use `expect(async () => { await ... }).toThrow()` to assert async rejections. Unlike Jest/Vitest, Bun does not require `await expect(...).rejects.toThrow()` - the async function wrapper with `.toThrow()` is the correct pattern for async error assertions in Bun tests.

Applied to files:

  • test/js/web/html/FormData.test.ts
🧬 Code graph analysis (2)
src/bun.js/bindings/helpers.h (1)
src/bun.js/bindings/headers-handwritten.h (2)
  • WTF (15-17)
  • WTF (32-35)
test/js/web/html/FormData.test.ts (1)
src/js/internal-for-testing.ts (1)
  • setSyntheticAllocationLimitForTesting (110-114)
🔇 Additional comments (3)
src/bun.js/bindings/helpers.h (3)

82-86: LGTM! UTF-8 tagged pointer path correctly guards against oversized strings.

The added length check prevents calling WTF::String::fromUTF8ReplacingInvalidSequences() with lengths exceeding either the synthetic allocation limit or WebKit's String::MaxLength (INT32_MAX), replacing the previous assertion with an explicit early return.

The implementation correctly:

  • Validates against both limits before proceeding
  • Returns an empty string as a safe fallback (appropriate for C++ helper context)
  • Includes a clear comment explaining the check

90-95: LGTM! External pointer path correctly guards and prevents memory leaks.

The length check against the synthetic allocation limit correctly validates external strings before attempting creation. Importantly, line 93 properly frees the external string resource using free_global_string() before returning, preventing a memory leak when rejecting oversized inputs.

Note: This path only checks Bun__stringSyntheticAllocationLimit (not WTF::String::MaxLength), which is appropriate since ExternalStringImpl may have different internal limits than the UTF-8 conversion path.


102-105: LGTM! Non-copying path correctly guards against oversized strings.

The length check against the synthetic allocation limit properly validates strings before calling WTF::StringImpl::createWithoutCopying(), returning an empty string as a safe fallback when the limit is exceeded.

This completes the comprehensive coverage of all three string creation paths in the toString() function:

  1. UTF-8 tagged pointers (lines 82-87)
  2. External pointers (lines 90-95)
  3. Non-copying creation (lines 102-110)

All paths now have explicit length validation instead of relying on assertions, which transforms crashes into controlled error conditions.

src/url.zig Outdated
Comment on lines 986 to 991
const data = strings.withoutUTF8BOM(input);
// Check against both Bun's synthetic limit and WebKit's String::MaxLength (INT32_MAX)
if (data.len > bun.String.max_length() or data.len > std.math.maxInt(i32)) {
return error.@"Cannot create a string longer than 2^32-1 characters";
}
var str = jsc.ZigString.fromUTF8(data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Correct the error message to accurately reflect the actual string length limit.

The error message states "Cannot create a string longer than 2^32-1 characters", but the check on line 988 validates against std.math.maxInt(i32), which equals 2^31-1 (2,147,483,647), not 2^32-1 (4,294,967,295).

Apply this diff to fix the error message:

-                return error.@"Cannot create a string longer than 2^32-1 characters";
+                return error.@"Cannot create a string longer than 2^31-1 characters";

Alternatively, for a less technical user-facing message:

-                return error.@"Cannot create a string longer than 2^32-1 characters";
+                return error.@"String length exceeds maximum allowed size";

Otherwise, the logic is sound:

  • Efficiently removes BOM before checking length (line 986)
  • Validates against both Bun's synthetic limit and WebKit's String::MaxLength
  • Returns a proper error that gets caught and converted to a JavaScript exception
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const data = strings.withoutUTF8BOM(input);
// Check against both Bun's synthetic limit and WebKit's String::MaxLength (INT32_MAX)
if (data.len > bun.String.max_length() or data.len > std.math.maxInt(i32)) {
return error.@"Cannot create a string longer than 2^32-1 characters";
}
var str = jsc.ZigString.fromUTF8(data);
const data = strings.withoutUTF8BOM(input);
// Check against both Bun's synthetic limit and WebKit's String::MaxLength (INT32_MAX)
if (data.len > bun.String.max_length() or data.len > std.math.maxInt(i32)) {
return error.@"Cannot create a string longer than 2^31-1 characters";
}
var str = jsc.ZigString.fromUTF8(data);
🤖 Prompt for AI Agents
In src/url.zig around lines 986 to 991, the error message incorrectly claims a
2^32-1 limit while the code checks against std.math.maxInt(i32) (2^31-1); update
the returned error string to match the actual bound (e.g., "Cannot create a
string longer than 2^31-1 characters") or replace with a less-technical
user-facing message (e.g., "Cannot create a string that long") so the message
accurately reflects the enforced limit.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/bun.js/bindings/helpers.h (1)

82-112: Four UTF‑8 conversion paths lack the comprehensive maxLength guard — consider aligning them with the improved logic

The improved UTF‑8 branch in toString(ZigString str) at line 93 is correct:

  • It uses maxLength = std::min(Bun__stringSyntheticAllocationLimit, (size_t)WTF::String::MaxLength) to clamp the effective limit.
  • For str.len > maxLength, it refines the decision using simdutf::utf16_length_from_utf8() before returning {}.
  • Explicit guards on both Bun__stringSyntheticAllocationLimit and WTF::String::MaxLength prevent assertions.

However, verification confirms that four other UTF‑8 conversion paths remain unprotected:

  1. Line 135 (toString(ZigString str, StringPointer ptr)) – UTF‑8 branch calls fromUTF8ReplacingInvalidSequences() without explicit maxLength guard. The guard below only checks str.len but the span uses ptr.len.
  2. Line 155 (toStringCopy(ZigString str, StringPointer ptr)) – Same issue: UTF‑8 branch unguarded; guard checks str.len not ptr.len.
  3. Line 175 (toStringCopy(ZigString str)) – UTF‑8 branch calls fromUTF8ReplacingInvalidSequences() with no length guard at all.
  4. Line 202 (appendToBuilder(ZigString str, WTF::StringBuilder& builder)) – UTF‑8 branch unguarded; only null/zero check precedes the conversion.

For consistency and to avoid similar assertion failures in other code paths, align these four functions with the guard logic from line 93.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3ac3082 and 71824e0.

📒 Files selected for processing (2)
  • src/bun.js/bindings/helpers.h (3 hunks)
  • test/js/web/html/FormData.test.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (13)
📚 Learning: 2025-11-10T00:57:09.173Z
Learnt from: franciscop
Repo: oven-sh/bun PR: 24514
File: src/bun.js/api/crypto/PasswordObject.zig:86-101
Timestamp: 2025-11-10T00:57:09.173Z
Learning: In Bun's Zig codebase (PasswordObject.zig), when validating the parallelism parameter for Argon2, the upper limit is set to 65535 (2^16 - 1) rather than using `std.math.maxInt(u24)` because the latter triggers Zig's truncation limit checks. The value 65535 is a practical upper bound that avoids compiler issues while being sufficient for thread parallelism use cases.

Applied to files:

  • test/js/web/html/FormData.test.ts
📚 Learning: 2025-10-18T20:59:45.579Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:458-475
Timestamp: 2025-10-18T20:59:45.579Z
Learning: In src/bun.js/telemetry.zig, the RequestId (u64) to JavaScript number (f64) conversion in jsRequestId() is intentionally allowed to lose precision beyond 2^53-1. This is acceptable because: (1) at 1M requests/sec it takes ~285 years to overflow, (2) the counter resets per-process, and (3) these are observability IDs, not critical distributed IDs. Precision loss is an acceptable trade-off for this use case.

Applied to files:

  • test/js/web/html/FormData.test.ts
📚 Learning: 2025-10-08T13:48:02.430Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: test/js/bun/tarball/extract.test.ts:107-111
Timestamp: 2025-10-08T13:48:02.430Z
Learning: In Bun's test runner, use `expect(async () => { await ... }).toThrow()` to assert async rejections. Unlike Jest/Vitest, Bun does not require `await expect(...).rejects.toThrow()` - the async function wrapper with `.toThrow()` is the correct pattern for async error assertions in Bun tests.

Applied to files:

  • test/js/web/html/FormData.test.ts
📚 Learning: 2025-10-01T21:59:54.571Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:49:27.862Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/BunIDLConvert.h:29-42
Timestamp: 2025-10-01T21:49:27.862Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), IDLStrictNull intentionally treats both undefined and null as null (using isUndefinedOrNull()), matching WebKit's IDLNull & IDLNullable behavior. This is the correct implementation and should not be changed to only accept null.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-20T05:35:57.318Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 22534
File: src/bun.js/bindings/headers.h:729-731
Timestamp: 2025-09-20T05:35:57.318Z
Learning: symbols.txt in the Bun codebase is specifically for V8 API mangled symbols (without leading underscore), not for general Bun host functions declared with BUN_DECLARE_HOST_FUNCTION. Host functions are handled through different build mechanisms.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-12T18:16:50.754Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22606
File: src/glob/GlobWalker.zig:449-452
Timestamp: 2025-09-12T18:16:50.754Z
Learning: For Bun codebase: prefer using `std.fs.path.sep` over manual platform separator detection, and use `bun.strings.lastIndexOfChar` instead of `std.mem.lastIndexOfScalar` for string operations.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:48:38.278Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/Bindgen/IDLTypes.h:1-3
Timestamp: 2025-10-01T21:48:38.278Z
Learning: In the Bun codebase, for `BunIDL*` and `Bindgen*` headers (e.g., BunIDLTypes.h, Bindgen/IDLTypes.h), it's acceptable to rely on transitive includes for standard library headers like <type_traits> and <utility> rather than including them explicitly.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:37.256Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:37.256Z
Learning: In the Bun codebase, FFI (Foreign Function Interface) code is compiled separately using TinyCC (tcc), which barely supports C99. Headers like src/bun.js/api/FFI.h and src/bun.js/api/ffi-stdbool.h are only used for FFI compilation with tcc, not with the main compiler. Therefore, C standard changes to the main Bun target do not affect FFI code compilation.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-17T01:20:22.188Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/bindings/BunIDLConvert.h:72-97
Timestamp: 2025-10-17T01:20:22.188Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), converters inheriting from `Bun::DefaultTryConverter` automatically get `conversionHasSideEffects = true` via the inheritance chain (`DefaultTryConverter` → `WebCore::DefaultConverter`). No explicit declaration of `conversionHasSideEffects` is needed when using `DefaultTryConverter` as the base.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:38.580Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:38.580Z
Learning: In the Bun codebase, FFI is compiled with tcc (TinyCC), which barely supports C99. The headers `src/bun.js/api/FFI.h` and `src/bun.js/api/ffi-stdbool.h` are only used for FFI compilation with tcc, not for the main Bun target. Therefore, C23 compatibility concerns (such as bool/true/false keyword conflicts) do not apply to these FFI headers.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-08T13:56:00.875Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: src/bun.js/api/BunObject.zig:2514-2521
Timestamp: 2025-10-08T13:56:00.875Z
Learning: For Bun codebase: prefer using `bun.path` utilities (e.g., `bun.path.joinAbsStringBuf`, `bun.path.join`) over `std.fs.path` functions for path operations.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T01:49:31.037Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/SocketConfig.bindv2.ts:58-58
Timestamp: 2025-10-18T01:49:31.037Z
Learning: In Bun's bindgenv2 TypeScript bindings (e.g., src/bun.js/api/bun/socket/SocketConfig.bindv2.ts), the pattern `b.String.loose.nullable.loose` is intentional and not a duplicate. The first `.loose` applies to the String type (loose string conversion), while the second `.loose` applies to the nullable (loose nullable, treating all falsy values as null rather than just null/undefined).

Applied to files:

  • src/bun.js/bindings/helpers.h
🧬 Code graph analysis (1)
test/js/web/html/FormData.test.ts (1)
src/js/internal-for-testing.ts (1)
  • setSyntheticAllocationLimitForTesting (110-114)

// Create a buffer larger than the limit
const largeBuffer = new Uint8Array(2 * 1024 * 1024); // 2MB
// @ts-expect-error - FormData.from is a Bun extension
const result = FormData.from(largeBuffer);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should throw an exception. not return an empty FormData.

auto scope = DECLARE_THROW_SCOPE(globalObject->vm());
return Bun::ERR::STRING_TOO_LONG(scope, globalObject);
}
auto formData = DOMFormData::create(globalObject->scriptExecutionContext(), str);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto formData = DOMFormData::create(globalObject->scriptExecutionContext(), str);
auto formData = DOMFormData::create(globalObject->scriptExecutionContext(), WTFMove(str));

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 71824e0 and 7d4d3e2.

📒 Files selected for processing (4)
  • src/bun.js/bindings/bindings.cpp (1 hunks)
  • src/bun.js/bindings/helpers.h (7 hunks)
  • src/url.zig (2 hunks)
  • test/js/web/html/FormData.test.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (20)
📓 Common learnings
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.
📚 Learning: 2025-11-10T00:57:09.173Z
Learnt from: franciscop
Repo: oven-sh/bun PR: 24514
File: src/bun.js/api/crypto/PasswordObject.zig:86-101
Timestamp: 2025-11-10T00:57:09.173Z
Learning: In Bun's Zig codebase (PasswordObject.zig), when validating the parallelism parameter for Argon2, the upper limit is set to 65535 (2^16 - 1) rather than using `std.math.maxInt(u24)` because the latter triggers Zig's truncation limit checks. The value 65535 is a practical upper bound that avoids compiler issues while being sufficient for thread parallelism use cases.

Applied to files:

  • test/js/web/html/FormData.test.ts
  • src/url.zig
📚 Learning: 2025-10-18T20:59:45.579Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:458-475
Timestamp: 2025-10-18T20:59:45.579Z
Learning: In src/bun.js/telemetry.zig, the RequestId (u64) to JavaScript number (f64) conversion in jsRequestId() is intentionally allowed to lose precision beyond 2^53-1. This is acceptable because: (1) at 1M requests/sec it takes ~285 years to overflow, (2) the counter resets per-process, and (3) these are observability IDs, not critical distributed IDs. Precision loss is an acceptable trade-off for this use case.

Applied to files:

  • test/js/web/html/FormData.test.ts
  • src/url.zig
📚 Learning: 2025-10-08T13:48:02.430Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: test/js/bun/tarball/extract.test.ts:107-111
Timestamp: 2025-10-08T13:48:02.430Z
Learning: In Bun's test runner, use `expect(async () => { await ... }).toThrow()` to assert async rejections. Unlike Jest/Vitest, Bun does not require `await expect(...).rejects.toThrow()` - the async function wrapper with `.toThrow()` is the correct pattern for async error assertions in Bun tests.

Applied to files:

  • test/js/web/html/FormData.test.ts
📚 Learning: 2025-10-01T21:59:54.571Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
  • src/url.zig
  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:49:27.862Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/BunIDLConvert.h:29-42
Timestamp: 2025-10-01T21:49:27.862Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), IDLStrictNull intentionally treats both undefined and null as null (using isUndefinedOrNull()), matching WebKit's IDLNull & IDLNullable behavior. This is the correct implementation and should not be changed to only accept null.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-17T01:20:22.188Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/bindings/BunIDLConvert.h:72-97
Timestamp: 2025-10-17T01:20:22.188Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), converters inheriting from `Bun::DefaultTryConverter` automatically get `conversionHasSideEffects = true` via the inheritance chain (`DefaultTryConverter` → `WebCore::DefaultConverter`). No explicit declaration of `conversionHasSideEffects` is needed when using `DefaultTryConverter` as the base.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-17T20:50:58.644Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/Handlers.zig:154-159
Timestamp: 2025-10-17T20:50:58.644Z
Learning: In Bun socket configuration error messages (src/bun.js/api/bun/socket/Handlers.zig), use the user-facing JavaScript names "data" and "drain" instead of internal field names "onData" and "onWritable", as these are the names users see in the API according to SocketConfig.bindv2.ts.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-03T20:40:59.655Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 24273
File: src/bun.js/bindings/JSValue.zig:545-586
Timestamp: 2025-11-03T20:40:59.655Z
Learning: In Bun's Zig codebase, JSErrors (returned as `bun.JSError!T`) must always be properly handled. Using `catch continue` or `catch { break; }` to silently suppress JSErrors is a bug. Errors should either be explicitly handled or propagated with `try`. This applies to snapshot serializer error handling where Jest's behavior is to throw when serializers throw.

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-24T10:43:09.398Z
Learnt from: fmguerreiro
Repo: oven-sh/bun PR: 23774
File: src/install/PackageManager/updatePackageJSONAndInstall.zig:548-548
Timestamp: 2025-10-24T10:43:09.398Z
Learning: In Bun's Zig codebase, the `as(usize, intCast(...))` cast pattern triggers a Zig compiler bug that causes compilation to hang indefinitely when used in complex control flow contexts (loops + short-circuit operators + optional unwrapping). Avoid this pattern and use simpler alternatives like just `intCast(...)` if type casting is necessary.

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-26T04:50:17.892Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 24086
File: src/bun.js/api/server.zig:2534-2535
Timestamp: 2025-10-26T04:50:17.892Z
Learning: In src/bun.js/api/server.zig, u32 is acceptable for route indices and websocket_context_index fields, as the practical limit of 4 billion contexts will never be reached.

Applied to files:

  • src/url.zig
📚 Learning: 2025-09-06T03:37:41.154Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 22229
File: src/bundler/LinkerGraph.zig:0-0
Timestamp: 2025-09-06T03:37:41.154Z
Learning: In Bun's codebase, when checking import record source indices in src/bundler/LinkerGraph.zig, prefer using `if (import_index >= self.import_records.len)` bounds checking over `isValid()` checks, as the bounds check is more robust and `isValid()` is a strict subset of this condition.

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-21T20:57:01.892Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23889
File: src/string.zig:751-752
Timestamp: 2025-10-21T20:57:01.892Z
Learning: In Zig code handling bun.String: calling deref() on an empty bun.String (String.empty or after being set to .empty) is well-defined behavior and safely does nothing (no-op). While such calls may be redundant and could be removed for code clarity, their presence is not a correctness issue.

Applied to files:

  • src/url.zig
📚 Learning: 2025-09-20T05:35:57.318Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 22534
File: src/bun.js/bindings/headers.h:729-731
Timestamp: 2025-09-20T05:35:57.318Z
Learning: symbols.txt in the Bun codebase is specifically for V8 API mangled symbols (without leading underscore), not for general Bun host functions declared with BUN_DECLARE_HOST_FUNCTION. Host functions are handled through different build mechanisms.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-12T18:16:50.754Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22606
File: src/glob/GlobWalker.zig:449-452
Timestamp: 2025-09-12T18:16:50.754Z
Learning: For Bun codebase: prefer using `std.fs.path.sep` over manual platform separator detection, and use `bun.strings.lastIndexOfChar` instead of `std.mem.lastIndexOfScalar` for string operations.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:48:38.278Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/Bindgen/IDLTypes.h:1-3
Timestamp: 2025-10-01T21:48:38.278Z
Learning: In the Bun codebase, for `BunIDL*` and `Bindgen*` headers (e.g., BunIDLTypes.h, Bindgen/IDLTypes.h), it's acceptable to rely on transitive includes for standard library headers like <type_traits> and <utility> rather than including them explicitly.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:37.256Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:37.256Z
Learning: In the Bun codebase, FFI (Foreign Function Interface) code is compiled separately using TinyCC (tcc), which barely supports C99. Headers like src/bun.js/api/FFI.h and src/bun.js/api/ffi-stdbool.h are only used for FFI compilation with tcc, not with the main compiler. Therefore, C standard changes to the main Bun target do not affect FFI code compilation.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-08T13:56:00.875Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: src/bun.js/api/BunObject.zig:2514-2521
Timestamp: 2025-10-08T13:56:00.875Z
Learning: For Bun codebase: prefer using `bun.path` utilities (e.g., `bun.path.joinAbsStringBuf`, `bun.path.join`) over `std.fs.path` functions for path operations.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T01:49:31.037Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/SocketConfig.bindv2.ts:58-58
Timestamp: 2025-10-18T01:49:31.037Z
Learning: In Bun's bindgenv2 TypeScript bindings (e.g., src/bun.js/api/bun/socket/SocketConfig.bindv2.ts), the pattern `b.String.loose.nullable.loose` is intentional and not a duplicate. The first `.loose` applies to the String type (loose string conversion), while the second `.loose` applies to the nullable (loose nullable, treating all falsy values as null rather than just null/undefined).

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:38.580Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:38.580Z
Learning: In the Bun codebase, FFI is compiled with tcc (TinyCC), which barely supports C99. The headers `src/bun.js/api/FFI.h` and `src/bun.js/api/ffi-stdbool.h` are only used for FFI compilation with tcc, not for the main Bun target. Therefore, C23 compatibility concerns (such as bool/true/false keyword conflicts) do not apply to these FFI headers.

Applied to files:

  • src/bun.js/bindings/helpers.h
🧬 Code graph analysis (3)
test/js/web/html/FormData.test.ts (1)
src/js/internal-for-testing.ts (1)
  • setSyntheticAllocationLimitForTesting (110-114)
src/bun.js/bindings/bindings.cpp (1)
src/bun.js/bindings/ErrorCode.cpp (4)
  • STRING_TOO_LONG (1083-1089)
  • STRING_TOO_LONG (1083-1083)
  • create (169-174)
  • create (169-169)
src/bun.js/bindings/helpers.h (2)
src/bun.js/bindings/headers-handwritten.h (2)
  • WTF (15-17)
  • WTF (32-35)
src/bun.js/bindings/v8/V8String.cpp (2)
  • span (36-36)
  • span (70-70)
🔇 Additional comments (8)
src/bun.js/bindings/helpers.h (4)

96-106: LGTM: External pointer path correctly guards against both limits.

The extended check against both Bun__stringSyntheticAllocationLimit and WTF::String::MaxLength is correct, and the cleanup via free_global_string before returning empty prevents resource leaks.


108-117: LGTM: Non-copying path correctly validates string length.

The guard against both synthetic allocation limit and WebKit's MaxLength before calling createWithoutCopying prevents the assertion crash described in the PR objectives.


134-144: LGTM: StringPointer UTF-8 path mirrors main toString() pattern.

The implementation correctly uses ptr.len for the substring length check and computes UTF-16 length from the appropriate offset, maintaining consistency with the main toString() function.


225-237: LGTM: appendToBuilder guards against oversized UTF-8 input.

The implementation correctly applies the same length validation pattern. Note that oversized strings are silently skipped (early return) rather than throwing an error, which is reasonable for a builder pattern but differs from the other functions that return empty strings.

src/url.zig (2)

985-993: LGTM: Correct JSError propagation for URLEncoded FormData.

The implementation correctly checks for .zero (indicating a pending exception) and propagates error.JSError without throwing a new exception. This aligns with Bun's error handling patterns and ensures that string-length errors from createFromURLQuery are properly surfaced.


1049-1054: LGTM: Correct JSError propagation in multipart error handling.

The error handling correctly distinguishes between:

  • error.JSError: exception already thrown, propagate without re-throwing
  • Other errors: wrap in new error with context

This follows Bun's error handling patterns (as noted in learnings) and prevents double-throwing exceptions.

test/js/web/html/FormData.test.ts (1)

280-295: Test correctly exercises the intended code path and will properly verify exception behavior.

The verification confirms that an empty Uint8Array(2MB) triggers the same string-length validation as populated data:

  1. Buffer size drives the check, not content: When FormData.from() processes the 2MB buffer, helpers.h::toString() checks the byte length (2MB) against the synthetic limit (1MB).

  2. Exception flow confirmed:

    • toString() detects 2MB > 1MB limit
    • Returns empty string to signal overflow
    • bindings.cpp line 5678 detects: str.isEmpty() && arg1->len > 0
    • Throws STRING_TOO_LONG exception with matching message
  3. Test validation is sound: The 2MB buffer—regardless of being all zeros—correctly exercises the same allocation limit path that would catch the original crash scenario.

src/bun.js/bindings/bindings.cpp (1)

5676-5681: The review comment is based on an incorrect assumption about toString().

The toString() function (defined in src/bun.js/bindings/helpers.h lines 76–91) does not throw exceptions. When input exceeds the maximum length, it simply returns an empty WTF::String(). This is the documented behavior of the function—it handles error conditions internally by returning an empty string rather than throwing.

The current code at lines 5676–5681 is correct:

  • Line 5676 calls toString(*arg1), which safely returns either a valid string or empty
  • Lines 5678–5680 detect the "too long" condition by checking if the result is empty while the input was non-empty
  • No pending exception exists after toString() returns, so declaring ThrowScope afterward is safe
  • The pattern of declaring scope after toString() is consistent with similar code elsewhere in the codebase (e.g., line 2155–2156)

No changes are needed.

Likely an incorrect or invalid review comment.

@Jarred-Sumner Jarred-Sumner changed the title fix(FormData): throw error instead of crashing on very large input fix(FormData): throw error instead of assertion failure on very large input Nov 24, 2025
}

return FormData.toJS(globalThis, input, encoding) catch |err| return globalThis.throwError(err, "while parsing FormData");
return FormData.toJS(globalThis, input, encoding) catch |err| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this simply be a try?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 914cd22 and 837fddd.

📒 Files selected for processing (4)
  • src/bun.js/bindings/bindings.cpp (1 hunks)
  • src/bun.js/bindings/helpers.h (7 hunks)
  • src/url.zig (4 hunks)
  • test/js/web/html/FormData.test.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (24)
📓 Common learnings
Learnt from: cirospaciari
Repo: oven-sh/bun PR: 22946
File: test/js/sql/sql.test.ts:195-202
Timestamp: 2025-09-25T22:07:13.851Z
Learning: PR oven-sh/bun#22946: JSON/JSONB result parsing updates (e.g., returning parsed arrays instead of legacy strings) are out of scope for this PR; tests keep current expectations with a TODO. Handle parsing fixes in a separate PR.
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.
📚 Learning: 2025-10-01T21:59:54.571Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.

Applied to files:

  • src/url.zig
  • src/bun.js/bindings/bindings.cpp
  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T20:50:47.750Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:366-373
Timestamp: 2025-10-18T20:50:47.750Z
Learning: In Bun's Zig codebase (src/bun.js/bindings/JSValue.zig), the JSValue enum uses `.null` (not `.js_null`) for JavaScript's null value. Only `js_undefined` has the `js_` prefix to avoid collision with Zig's built-in `undefined` keyword. The correct enum fields are: `js_undefined`, `null`, `true`, `false`, and `zero`.

Applied to files:

  • src/url.zig
📚 Learning: 2025-11-03T20:40:59.655Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 24273
File: src/bun.js/bindings/JSValue.zig:545-586
Timestamp: 2025-11-03T20:40:59.655Z
Learning: In Bun's Zig codebase, JSErrors (returned as `bun.JSError!T`) must always be properly handled. Using `catch continue` or `catch { break; }` to silently suppress JSErrors is a bug. Errors should either be explicitly handled or propagated with `try`. This applies to snapshot serializer error handling where Jest's behavior is to throw when serializers throw.

Applied to files:

  • src/url.zig
📚 Learning: 2025-09-05T19:49:26.188Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 21728
File: src/valkey/js_valkey_functions.zig:852-867
Timestamp: 2025-09-05T19:49:26.188Z
Learning: In Bun’s Zig code, `.js_undefined` is a valid and preferred JSValue literal for “undefined” (e.g., resolving JSPromise). Do not refactor usages to `jsc.JSValue.jsUndefined()`, especially in src/valkey/js_valkey_functions.zig unsubscribe().

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-17T20:50:58.644Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/Handlers.zig:154-159
Timestamp: 2025-10-17T20:50:58.644Z
Learning: In Bun socket configuration error messages (src/bun.js/api/bun/socket/Handlers.zig), use the user-facing JavaScript names "data" and "drain" instead of internal field names "onData" and "onWritable", as these are the names users see in the API according to SocketConfig.bindv2.ts.

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-18T20:59:45.579Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:458-475
Timestamp: 2025-10-18T20:59:45.579Z
Learning: In src/bun.js/telemetry.zig, the RequestId (u64) to JavaScript number (f64) conversion in jsRequestId() is intentionally allowed to lose precision beyond 2^53-1. This is acceptable because: (1) at 1M requests/sec it takes ~285 years to overflow, (2) the counter resets per-process, and (3) these are observability IDs, not critical distributed IDs. Precision loss is an acceptable trade-off for this use case.

Applied to files:

  • src/url.zig
  • test/js/web/html/FormData.test.ts
📚 Learning: 2025-11-10T00:57:09.173Z
Learnt from: franciscop
Repo: oven-sh/bun PR: 24514
File: src/bun.js/api/crypto/PasswordObject.zig:86-101
Timestamp: 2025-11-10T00:57:09.173Z
Learning: In Bun's Zig codebase (PasswordObject.zig), when validating the parallelism parameter for Argon2, the upper limit is set to 65535 (2^16 - 1) rather than using `std.math.maxInt(u24)` because the latter triggers Zig's truncation limit checks. The value 65535 is a practical upper bound that avoids compiler issues while being sufficient for thread parallelism use cases.

Applied to files:

  • src/url.zig
  • test/js/web/html/FormData.test.ts
📚 Learning: 2025-10-24T10:43:09.398Z
Learnt from: fmguerreiro
Repo: oven-sh/bun PR: 23774
File: src/install/PackageManager/updatePackageJSONAndInstall.zig:548-548
Timestamp: 2025-10-24T10:43:09.398Z
Learning: In Bun's Zig codebase, the `as(usize, intCast(...))` cast pattern triggers a Zig compiler bug that causes compilation to hang indefinitely when used in complex control flow contexts (loops + short-circuit operators + optional unwrapping). Avoid this pattern and use simpler alternatives like just `intCast(...)` if type casting is necessary.

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-26T04:50:17.892Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 24086
File: src/bun.js/api/server.zig:2534-2535
Timestamp: 2025-10-26T04:50:17.892Z
Learning: In src/bun.js/api/server.zig, u32 is acceptable for route indices and websocket_context_index fields, as the practical limit of 4 billion contexts will never be reached.

Applied to files:

  • src/url.zig
📚 Learning: 2025-09-06T03:37:41.154Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 22229
File: src/bundler/LinkerGraph.zig:0-0
Timestamp: 2025-09-06T03:37:41.154Z
Learning: In Bun's codebase, when checking import record source indices in src/bundler/LinkerGraph.zig, prefer using `if (import_index >= self.import_records.len)` bounds checking over `isValid()` checks, as the bounds check is more robust and `isValid()` is a strict subset of this condition.

Applied to files:

  • src/url.zig
  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-21T20:57:01.892Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23889
File: src/string.zig:751-752
Timestamp: 2025-10-21T20:57:01.892Z
Learning: In Zig code handling bun.String: calling deref() on an empty bun.String (String.empty or after being set to .empty) is well-defined behavior and safely does nothing (no-op). While such calls may be redundant and could be removed for code clarity, their presence is not a correctness issue.

Applied to files:

  • src/url.zig
📚 Learning: 2025-09-05T19:49:26.188Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 21728
File: src/valkey/js_valkey_functions.zig:852-867
Timestamp: 2025-09-05T19:49:26.188Z
Learning: In Bun's Zig codebase, `.js_undefined` is a valid way to represent JavaScript's undefined value when working with JSPromise.resolve() and similar JavaScript interop functions. This is the correct pattern to use rather than `jsc.JSValue.jsUndefined()`.

Applied to files:

  • src/url.zig
📚 Learning: 2025-09-19T19:55:22.427Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 22568
File: src/valkey/js_valkey_functions.zig:853-854
Timestamp: 2025-09-19T19:55:22.427Z
Learning: In Bun's JSPromise API, `resolvedPromiseValue(globalThis: *JSGlobalObject, value: JSValue) JSValue` returns a JSValue directly (not a JSPromise that needs `.toJS()` called on it). This is different from `resolvedPromise()` which returns a *JSPromise that would need `.toJS()`.

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-11T15:19:30.301Z
Learnt from: mastermakrela
Repo: oven-sh/bun PR: 19167
File: src/bun.js/api.zig:31-31
Timestamp: 2025-10-11T15:19:30.301Z
Learning: The learnings about `pub const js = JSC.Codegen.JS<ClassName>` and re-exporting toJS/fromJS/fromJSDirect apply to class-based Zig bindings with constructors and prototype methods (e.g., those with `new` constructors). They do NOT apply to simple namespace-style API objects like TOMLObject, YAMLObject, and CSVObject which expose static functions via a `create()` method that returns a plain JS object.

Applied to files:

  • src/url.zig
📚 Learning: 2025-10-01T21:49:27.862Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/BunIDLConvert.h:29-42
Timestamp: 2025-10-01T21:49:27.862Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), IDLStrictNull intentionally treats both undefined and null as null (using isUndefinedOrNull()), matching WebKit's IDLNull & IDLNullable behavior. This is the correct implementation and should not be changed to only accept null.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-17T01:20:22.188Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/bindings/BunIDLConvert.h:72-97
Timestamp: 2025-10-17T01:20:22.188Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), converters inheriting from `Bun::DefaultTryConverter` automatically get `conversionHasSideEffects = true` via the inheritance chain (`DefaultTryConverter` → `WebCore::DefaultConverter`). No explicit declaration of `conversionHasSideEffects` is needed when using `DefaultTryConverter` as the base.

Applied to files:

  • src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-10-08T13:48:02.430Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: test/js/bun/tarball/extract.test.ts:107-111
Timestamp: 2025-10-08T13:48:02.430Z
Learning: In Bun's test runner, use `expect(async () => { await ... }).toThrow()` to assert async rejections. Unlike Jest/Vitest, Bun does not require `await expect(...).rejects.toThrow()` - the async function wrapper with `.toThrow()` is the correct pattern for async error assertions in Bun tests.

Applied to files:

  • test/js/web/html/FormData.test.ts
📚 Learning: 2025-09-12T18:16:50.754Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22606
File: src/glob/GlobWalker.zig:449-452
Timestamp: 2025-09-12T18:16:50.754Z
Learning: For Bun codebase: prefer using `std.fs.path.sep` over manual platform separator detection, and use `bun.strings.lastIndexOfChar` instead of `std.mem.lastIndexOfScalar` for string operations.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-08T13:56:00.875Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: src/bun.js/api/BunObject.zig:2514-2521
Timestamp: 2025-10-08T13:56:00.875Z
Learning: For Bun codebase: prefer using `bun.path` utilities (e.g., `bun.path.joinAbsStringBuf`, `bun.path.join`) over `std.fs.path` functions for path operations.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:48:38.278Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/Bindgen/IDLTypes.h:1-3
Timestamp: 2025-10-01T21:48:38.278Z
Learning: In the Bun codebase, for `BunIDL*` and `Bindgen*` headers (e.g., BunIDLTypes.h, Bindgen/IDLTypes.h), it's acceptable to rely on transitive includes for standard library headers like <type_traits> and <utility> rather than including them explicitly.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-20T05:35:57.318Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 22534
File: src/bun.js/bindings/headers.h:729-731
Timestamp: 2025-09-20T05:35:57.318Z
Learning: symbols.txt in the Bun codebase is specifically for V8 API mangled symbols (without leading underscore), not for general Bun host functions declared with BUN_DECLARE_HOST_FUNCTION. Host functions are handled through different build mechanisms.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:37.256Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:37.256Z
Learning: In the Bun codebase, FFI (Foreign Function Interface) code is compiled separately using TinyCC (tcc), which barely supports C99. Headers like src/bun.js/api/FFI.h and src/bun.js/api/ffi-stdbool.h are only used for FFI compilation with tcc, not with the main compiler. Therefore, C standard changes to the main Bun target do not affect FFI code compilation.

Applied to files:

  • src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T01:49:31.037Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/SocketConfig.bindv2.ts:58-58
Timestamp: 2025-10-18T01:49:31.037Z
Learning: In Bun's bindgenv2 TypeScript bindings (e.g., src/bun.js/api/bun/socket/SocketConfig.bindv2.ts), the pattern `b.String.loose.nullable.loose` is intentional and not a duplicate. The first `.loose` applies to the String type (loose string conversion), while the second `.loose` applies to the nullable (loose nullable, treating all falsy values as null rather than just null/undefined).

Applied to files:

  • src/bun.js/bindings/helpers.h
🧬 Code graph analysis (2)
src/bun.js/bindings/bindings.cpp (2)
src/bun.js/bindings/BunString.cpp (10)
  • toString (178-181)
  • toString (178-178)
  • toString (208-211)
  • toString (208-208)
  • toString (230-236)
  • toString (230-230)
  • toString (237-243)
  • toString (237-237)
  • toString (244-250)
  • toString (244-244)
src/bun.js/bindings/ErrorCode.cpp (4)
  • STRING_TOO_LONG (1083-1089)
  • STRING_TOO_LONG (1083-1083)
  • create (169-174)
  • create (169-169)
test/js/web/html/FormData.test.ts (1)
src/js/internal-for-testing.ts (1)
  • setSyntheticAllocationLimitForTesting (110-114)
🔇 Additional comments (6)
src/bun.js/bindings/helpers.h (2)

5-5: UTF-8 → UTF-16 precheck and max-length policy are consistent and safe

Including wtf/SIMDUTF.h and gating all UTF‑8 ZigStringWTF::String conversions behind:

  • maxLength = min(Bun__stringSyntheticAllocationLimit, static_cast<size_t>(WTF::String::MaxLength)), and
  • a secondary utf16_length_from_utf8 check only when the byte length already exceeds maxLength,

ensures we never hand WebKit an input that would produce a string longer than its internal limit, while avoiding extra work on common, small strings. Returning a default-constructed WTF::String in the overflow case matches the documented invariant that toString() only yields an empty result for non-empty input when the string is too long, which the new FormData URL-query path now relies on.

No changes requested here.

Also applies to: 83-119


131-185: Extended guards for substring and builder paths correctly mirror the core toString policy

The added maxLength/simdutf checks in:

  • toString(ZigString, StringPointer),
  • toStringCopy(ZigString, StringPointer),
  • toStringCopy(ZigString), and
  • appendToBuilder(ZigString, StringBuilder&),

bring the substring and append cases in line with the main toString(ZigString) behavior:

  • UTF‑8‑tagged substrings use utf16_length_from_utf8 only when ptr.len exceeds min(synthetic, MaxLength), and bail out with {} (or a no-op append) when the computed UTF‑16 length would overflow.
  • Non‑UTF‑8 paths explicitly check both Bun__stringSyntheticAllocationLimit and WTF::String::MaxLength before creating or copying backing storage.

This closes the remaining oversize-string assertion vectors along these helper call sites without changing semantics for normal inputs; callers will now see either a truncated “empty” string/no-op when the data is too long, or a normal string otherwise.

Looks good as implemented.

Also applies to: 187-220, 222-246

src/url.zig (3)

983-996: LGTM! Error propagation properly implemented.

The changes correctly update the error handling pattern:

  • Return type changed to bun.JSError!jsc.JSValue for proper error propagation
  • URLEncoded path checks for .zero to detect exceptions from createFromURLQuery (e.g., string too long)
  • Multipart path uses try to propagate errors from toJSFromMultipartData

This aligns with the PR objective to throw errors instead of crashing or returning empty FormData.


998-1050: Signature and error propagation updated correctly.

The function signature now returns bun.JSError!jsc.JSValue and uses try at line 1049 to propagate errors from FormData.toJS. This maintains consistency with the new error handling pattern.


1057-1144: Error throwing approach properly implemented.

The function now:

  • Returns bun.JSError!jsc.JSValue (line 1061)
  • Uses globalThis.throwError() to throw JS exceptions at lines 1066 and 1139 instead of returning raw error values
  • Provides descriptive error context ("while parsing FormData")

This is the correct pattern for Bun's JSC error handling and addresses the requirement to throw exceptions rather than crash or return empty results.

test/js/web/html/FormData.test.ts (1)

280-295: Test properly validates the fix for oversized input.

The test correctly:

  • Uses setSyntheticAllocationLimitForTesting to create a testable boundary condition (1MB limit with 2MB input)
  • Verifies FormData.from throws an exception rather than crashing (addressing the PR objective)
  • Uses try/finally to ensure cleanup even if the test fails
  • Employs partial string matching ("Cannot create a string longer than") for flexibility

The test structure is solid and validates the regression without requiring gigabytes of memory in CI.

When `FormData.from()` is called with a very large ArrayBuffer (exceeding
WebKit's String::MaxLength of INT32_MAX), it would crash with an assertion
failure in WebKit's StringImpl. This fixes the issue by:

1. Adding length checks in the C++ `toString` and related functions (helpers.h)
   to check against both Bun's synthetic limit and WebKit's String::MaxLength.
   For UTF-8 tagged strings, we use simdutf to calculate the actual UTF-16
   length only when the byte length exceeds the limit.

2. Throwing an ERR_STRING_TOO_LONG exception from createFromURLQuery when the
   string is too long, instead of silently returning an empty FormData.

Changes:
- Add length checks to all UTF-8 code paths in helpers.h:
  - toString(ZigString)
  - toString(ZigString, StringPointer)
  - toStringCopy(ZigString)
  - toStringCopy(ZigString, StringPointer)
  - appendToBuilder(ZigString, StringBuilder)
- Add WTF::String::MaxLength check to non-UTF-8 paths
- Throw ERR_STRING_TOO_LONG in createFromURLQuery when string is too long
- Properly propagate JSError from toJS to fromMultipartData

Now `FormData.from(new Uint32Array(913148244))` throws:
  error: Cannot create a string longer than 2147483647 characters
         code: "ERR_STRING_TOO_LONG"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@Jarred-Sumner Jarred-Sumner merged commit 43c46b1 into main Nov 26, 2025
14 of 27 checks passed
@Jarred-Sumner Jarred-Sumner deleted the claude/ENG-21721 branch November 26, 2025 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants