-
Notifications
You must be signed in to change notification settings - Fork 3.5k
fix(FormData): throw error instead of assertion failure on very large input #25006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Updated 2:31 PM PT - Nov 26th, 2025
❌ @Jarred-Sumner, your commit f204cda has 2 failures in
🧪 To try this PR locally: bunx bun-pr 25006That installs a local version of the PR into your bun-25006 --bun |
src/bun.js/bindings/helpers.h
Outdated
| // This will fail if the string is too long. Let's make it explicit instead of an ASSERT. | ||
| // Check both the Bun synthetic limit and WebKit's String::MaxLength. | ||
| if (str.len > Bun__stringSyntheticAllocationLimit || str.len > WTF::String::MaxLength) [[unlikely]] { | ||
| return {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use SIMDUTF's length-checking functions to verify the length is too big when inside this branch before we return {}?
src/url.zig
Outdated
| var str = jsc.ZigString.fromUTF8(strings.withoutUTF8BOM(input)); | ||
| const data = strings.withoutUTF8BOM(input); | ||
| // Check against both Bun's synthetic limit and WebKit's String::MaxLength (INT32_MAX) | ||
| if (data.len > bun.String.max_length() or data.len > std.math.maxInt(i32)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct because UTF8 byte length != UTF-16/latin1 byte length.
WalkthroughUnified max-length guards and UTF‑8→UTF‑16 prechecks added across string conversion and append paths to prevent oversized strings; FormData creation now propagates string-too-long errors from URL-query parsing; URL glue returns/throws JSErrors appropriately; a test asserts FormData.from throws on very large input. Changes
Possibly related PRs
Suggested reviewers
Pre-merge checks✅ Passed checks (4 passed)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (3)src/**/*.{cpp,zig}📄 CodeRabbit inference engine (.cursor/rules/building-bun.mdc)
Files:
src/bun.js/bindings/**/*.cpp📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/**/*.{ts,zig,cpp}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (17)📓 Common learnings📚 Learning: 2025-10-01T21:59:54.571ZApplied to files:
📚 Learning: 2025-11-24T18:36:33.049ZApplied to files:
📚 Learning: 2025-11-24T18:37:47.887ZApplied to files:
📚 Learning: 2025-10-01T21:49:27.862ZApplied to files:
📚 Learning: 2025-11-24T18:36:59.675ZApplied to files:
📚 Learning: 2025-11-24T18:36:59.675ZApplied to files:
📚 Learning: 2025-11-24T18:36:59.675ZApplied to files:
📚 Learning: 2025-11-24T18:36:33.049ZApplied to files:
📚 Learning: 2025-10-17T01:20:22.188ZApplied to files:
📚 Learning: 2025-11-24T18:36:59.675ZApplied to files:
📚 Learning: 2025-09-12T18:16:50.754ZApplied to files:
📚 Learning: 2025-09-07T08:20:47.215ZApplied to files:
📚 Learning: 2025-10-18T20:59:45.579ZApplied to files:
📚 Learning: 2025-10-18T01:49:31.037ZApplied to files:
📚 Learning: 2025-11-24T18:35:25.866ZApplied to files:
📚 Learning: 2025-11-24T18:35:25.866ZApplied to files:
🧬 Code graph analysis (1)src/bun.js/bindings/bindings.cpp (1)
🔇 Additional comments (1)
Warning Review ran into problems🔥 ProblemsErrors were encountered while retrieving linked issues. Errors (2)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
src/bun.js/bindings/helpers.h(1 hunks)src/url.zig(1 hunks)test/js/web/html/FormData.test.ts(1 hunks)
🧰 Additional context used
🧠 Learnings (17)
📓 Common learnings
Learnt from: franciscop
Repo: oven-sh/bun PR: 24514
File: src/bun.js/api/crypto/PasswordObject.zig:86-101
Timestamp: 2025-11-10T00:57:09.173Z
Learning: In Bun's Zig codebase (PasswordObject.zig), when validating the parallelism parameter for Argon2, the upper limit is set to 65535 (2^16 - 1) rather than using `std.math.maxInt(u24)` because the latter triggers Zig's truncation limit checks. The value 65535 is a practical upper bound that avoids compiler issues while being sufficient for thread parallelism use cases.
📚 Learning: 2025-10-01T21:59:54.571Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.
Applied to files:
src/url.zigsrc/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T20:50:47.750Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:366-373
Timestamp: 2025-10-18T20:50:47.750Z
Learning: In Bun's Zig codebase (src/bun.js/bindings/JSValue.zig), the JSValue enum uses `.null` (not `.js_null`) for JavaScript's null value. Only `js_undefined` has the `js_` prefix to avoid collision with Zig's built-in `undefined` keyword. The correct enum fields are: `js_undefined`, `null`, `true`, `false`, and `zero`.
Applied to files:
src/url.zig
📚 Learning: 2025-10-18T20:59:45.579Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:458-475
Timestamp: 2025-10-18T20:59:45.579Z
Learning: In src/bun.js/telemetry.zig, the RequestId (u64) to JavaScript number (f64) conversion in jsRequestId() is intentionally allowed to lose precision beyond 2^53-1. This is acceptable because: (1) at 1M requests/sec it takes ~285 years to overflow, (2) the counter resets per-process, and (3) these are observability IDs, not critical distributed IDs. Precision loss is an acceptable trade-off for this use case.
Applied to files:
src/url.zig
📚 Learning: 2025-09-05T19:49:26.188Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 21728
File: src/valkey/js_valkey_functions.zig:852-867
Timestamp: 2025-09-05T19:49:26.188Z
Learning: In Bun's Zig codebase, `.js_undefined` is a valid way to represent JavaScript's undefined value when working with JSPromise.resolve() and similar JavaScript interop functions. This is the correct pattern to use rather than `jsc.JSValue.jsUndefined()`.
Applied to files:
src/url.zig
📚 Learning: 2025-10-11T15:19:30.301Z
Learnt from: mastermakrela
Repo: oven-sh/bun PR: 19167
File: src/bun.js/api.zig:31-31
Timestamp: 2025-10-11T15:19:30.301Z
Learning: The learnings about `pub const js = JSC.Codegen.JS<ClassName>` and re-exporting toJS/fromJS/fromJSDirect apply to class-based Zig bindings with constructors and prototype methods (e.g., those with `new` constructors). They do NOT apply to simple namespace-style API objects like TOMLObject, YAMLObject, and CSVObject which expose static functions via a `create()` method that returns a plain JS object.
Applied to files:
src/url.zig
📚 Learning: 2025-09-05T19:49:26.188Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 21728
File: src/valkey/js_valkey_functions.zig:852-867
Timestamp: 2025-09-05T19:49:26.188Z
Learning: In Bun’s Zig code, `.js_undefined` is a valid and preferred JSValue literal for “undefined” (e.g., resolving JSPromise). Do not refactor usages to `jsc.JSValue.jsUndefined()`, especially in src/valkey/js_valkey_functions.zig unsubscribe().
Applied to files:
src/url.zig
📚 Learning: 2025-10-01T21:49:27.862Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/BunIDLConvert.h:29-42
Timestamp: 2025-10-01T21:49:27.862Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), IDLStrictNull intentionally treats both undefined and null as null (using isUndefinedOrNull()), matching WebKit's IDLNull & IDLNullable behavior. This is the correct implementation and should not be changed to only accept null.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-12T18:16:50.754Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22606
File: src/glob/GlobWalker.zig:449-452
Timestamp: 2025-09-12T18:16:50.754Z
Learning: For Bun codebase: prefer using `std.fs.path.sep` over manual platform separator detection, and use `bun.strings.lastIndexOfChar` instead of `std.mem.lastIndexOfScalar` for string operations.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-02T17:41:07.869Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 22227
File: src/memory.zig:60-76
Timestamp: 2025-09-02T17:41:07.869Z
Learning: In bun's memory utilities, when handling const pointers in deinit operations, prefer compile-time errors over silent skipping to avoid hard-to-find memory leaks. Users expect explicit failures rather than silent omissions in memory management.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-08T13:56:00.875Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: src/bun.js/api/BunObject.zig:2514-2521
Timestamp: 2025-10-08T13:56:00.875Z
Learning: For Bun codebase: prefer using `bun.path` utilities (e.g., `bun.path.joinAbsStringBuf`, `bun.path.join`) over `std.fs.path` functions for path operations.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-17T01:20:22.188Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/bindings/BunIDLConvert.h:72-97
Timestamp: 2025-10-17T01:20:22.188Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), converters inheriting from `Bun::DefaultTryConverter` automatically get `conversionHasSideEffects = true` via the inheritance chain (`DefaultTryConverter` → `WebCore::DefaultConverter`). No explicit declaration of `conversionHasSideEffects` is needed when using `DefaultTryConverter` as the base.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T01:49:31.037Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/SocketConfig.bindv2.ts:58-58
Timestamp: 2025-10-18T01:49:31.037Z
Learning: In Bun's bindgenv2 TypeScript bindings (e.g., src/bun.js/api/bun/socket/SocketConfig.bindv2.ts), the pattern `b.String.loose.nullable.loose` is intentional and not a duplicate. The first `.loose` applies to the String type (loose string conversion), while the second `.loose` applies to the nullable (loose nullable, treating all falsy values as null rather than just null/undefined).
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-20T05:35:57.318Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 22534
File: src/bun.js/bindings/headers.h:729-731
Timestamp: 2025-09-20T05:35:57.318Z
Learning: symbols.txt in the Bun codebase is specifically for V8 API mangled symbols (without leading underscore), not for general Bun host functions declared with BUN_DECLARE_HOST_FUNCTION. Host functions are handled through different build mechanisms.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:48:38.278Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/Bindgen/IDLTypes.h:1-3
Timestamp: 2025-10-01T21:48:38.278Z
Learning: In the Bun codebase, for `BunIDL*` and `Bindgen*` headers (e.g., BunIDLTypes.h, Bindgen/IDLTypes.h), it's acceptable to rely on transitive includes for standard library headers like <type_traits> and <utility> rather than including them explicitly.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:37.256Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:37.256Z
Learning: In the Bun codebase, FFI (Foreign Function Interface) code is compiled separately using TinyCC (tcc), which barely supports C99. Headers like src/bun.js/api/FFI.h and src/bun.js/api/ffi-stdbool.h are only used for FFI compilation with tcc, not with the main compiler. Therefore, C standard changes to the main Bun target do not affect FFI code compilation.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-08T13:48:02.430Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: test/js/bun/tarball/extract.test.ts:107-111
Timestamp: 2025-10-08T13:48:02.430Z
Learning: In Bun's test runner, use `expect(async () => { await ... }).toThrow()` to assert async rejections. Unlike Jest/Vitest, Bun does not require `await expect(...).rejects.toThrow()` - the async function wrapper with `.toThrow()` is the correct pattern for async error assertions in Bun tests.
Applied to files:
test/js/web/html/FormData.test.ts
🧬 Code graph analysis (2)
src/bun.js/bindings/helpers.h (1)
src/bun.js/bindings/headers-handwritten.h (2)
WTF(15-17)WTF(32-35)
test/js/web/html/FormData.test.ts (1)
src/js/internal-for-testing.ts (1)
setSyntheticAllocationLimitForTesting(110-114)
🔇 Additional comments (3)
src/bun.js/bindings/helpers.h (3)
82-86: LGTM! UTF-8 tagged pointer path correctly guards against oversized strings.The added length check prevents calling
WTF::String::fromUTF8ReplacingInvalidSequences()with lengths exceeding either the synthetic allocation limit or WebKit's String::MaxLength (INT32_MAX), replacing the previous assertion with an explicit early return.The implementation correctly:
- Validates against both limits before proceeding
- Returns an empty string as a safe fallback (appropriate for C++ helper context)
- Includes a clear comment explaining the check
90-95: LGTM! External pointer path correctly guards and prevents memory leaks.The length check against the synthetic allocation limit correctly validates external strings before attempting creation. Importantly, line 93 properly frees the external string resource using
free_global_string()before returning, preventing a memory leak when rejecting oversized inputs.Note: This path only checks
Bun__stringSyntheticAllocationLimit(notWTF::String::MaxLength), which is appropriate sinceExternalStringImplmay have different internal limits than the UTF-8 conversion path.
102-105: LGTM! Non-copying path correctly guards against oversized strings.The length check against the synthetic allocation limit properly validates strings before calling
WTF::StringImpl::createWithoutCopying(), returning an empty string as a safe fallback when the limit is exceeded.This completes the comprehensive coverage of all three string creation paths in the
toString()function:
- UTF-8 tagged pointers (lines 82-87)
- External pointers (lines 90-95)
- Non-copying creation (lines 102-110)
All paths now have explicit length validation instead of relying on assertions, which transforms crashes into controlled error conditions.
src/url.zig
Outdated
| const data = strings.withoutUTF8BOM(input); | ||
| // Check against both Bun's synthetic limit and WebKit's String::MaxLength (INT32_MAX) | ||
| if (data.len > bun.String.max_length() or data.len > std.math.maxInt(i32)) { | ||
| return error.@"Cannot create a string longer than 2^32-1 characters"; | ||
| } | ||
| var str = jsc.ZigString.fromUTF8(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct the error message to accurately reflect the actual string length limit.
The error message states "Cannot create a string longer than 2^32-1 characters", but the check on line 988 validates against std.math.maxInt(i32), which equals 2^31-1 (2,147,483,647), not 2^32-1 (4,294,967,295).
Apply this diff to fix the error message:
- return error.@"Cannot create a string longer than 2^32-1 characters";
+ return error.@"Cannot create a string longer than 2^31-1 characters";Alternatively, for a less technical user-facing message:
- return error.@"Cannot create a string longer than 2^32-1 characters";
+ return error.@"String length exceeds maximum allowed size";Otherwise, the logic is sound:
- Efficiently removes BOM before checking length (line 986)
- Validates against both Bun's synthetic limit and WebKit's String::MaxLength
- Returns a proper error that gets caught and converted to a JavaScript exception
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const data = strings.withoutUTF8BOM(input); | |
| // Check against both Bun's synthetic limit and WebKit's String::MaxLength (INT32_MAX) | |
| if (data.len > bun.String.max_length() or data.len > std.math.maxInt(i32)) { | |
| return error.@"Cannot create a string longer than 2^32-1 characters"; | |
| } | |
| var str = jsc.ZigString.fromUTF8(data); | |
| const data = strings.withoutUTF8BOM(input); | |
| // Check against both Bun's synthetic limit and WebKit's String::MaxLength (INT32_MAX) | |
| if (data.len > bun.String.max_length() or data.len > std.math.maxInt(i32)) { | |
| return error.@"Cannot create a string longer than 2^31-1 characters"; | |
| } | |
| var str = jsc.ZigString.fromUTF8(data); |
🤖 Prompt for AI Agents
In src/url.zig around lines 986 to 991, the error message incorrectly claims a
2^32-1 limit while the code checks against std.math.maxInt(i32) (2^31-1); update
the returned error string to match the actual bound (e.g., "Cannot create a
string longer than 2^31-1 characters") or replace with a less-technical
user-facing message (e.g., "Cannot create a string that long") so the message
accurately reflects the enforced limit.
3ac3082 to
71824e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/bun.js/bindings/helpers.h (1)
82-112: Four UTF‑8 conversion paths lack the comprehensive maxLength guard — consider aligning them with the improved logicThe improved UTF‑8 branch in
toString(ZigString str)at line 93 is correct:
- It uses
maxLength = std::min(Bun__stringSyntheticAllocationLimit, (size_t)WTF::String::MaxLength)to clamp the effective limit.- For
str.len > maxLength, it refines the decision usingsimdutf::utf16_length_from_utf8()before returning{}.- Explicit guards on both
Bun__stringSyntheticAllocationLimitandWTF::String::MaxLengthprevent assertions.However, verification confirms that four other UTF‑8 conversion paths remain unprotected:
- Line 135 (
toString(ZigString str, StringPointer ptr)) – UTF‑8 branch callsfromUTF8ReplacingInvalidSequences()without explicit maxLength guard. The guard below only checksstr.lenbut the span usesptr.len.- Line 155 (
toStringCopy(ZigString str, StringPointer ptr)) – Same issue: UTF‑8 branch unguarded; guard checksstr.lennotptr.len.- Line 175 (
toStringCopy(ZigString str)) – UTF‑8 branch callsfromUTF8ReplacingInvalidSequences()with no length guard at all.- Line 202 (
appendToBuilder(ZigString str, WTF::StringBuilder& builder)) – UTF‑8 branch unguarded; only null/zero check precedes the conversion.For consistency and to avoid similar assertion failures in other code paths, align these four functions with the guard logic from line 93.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/bun.js/bindings/helpers.h(3 hunks)test/js/web/html/FormData.test.ts(1 hunks)
🧰 Additional context used
🧠 Learnings (13)
📚 Learning: 2025-11-10T00:57:09.173Z
Learnt from: franciscop
Repo: oven-sh/bun PR: 24514
File: src/bun.js/api/crypto/PasswordObject.zig:86-101
Timestamp: 2025-11-10T00:57:09.173Z
Learning: In Bun's Zig codebase (PasswordObject.zig), when validating the parallelism parameter for Argon2, the upper limit is set to 65535 (2^16 - 1) rather than using `std.math.maxInt(u24)` because the latter triggers Zig's truncation limit checks. The value 65535 is a practical upper bound that avoids compiler issues while being sufficient for thread parallelism use cases.
Applied to files:
test/js/web/html/FormData.test.ts
📚 Learning: 2025-10-18T20:59:45.579Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:458-475
Timestamp: 2025-10-18T20:59:45.579Z
Learning: In src/bun.js/telemetry.zig, the RequestId (u64) to JavaScript number (f64) conversion in jsRequestId() is intentionally allowed to lose precision beyond 2^53-1. This is acceptable because: (1) at 1M requests/sec it takes ~285 years to overflow, (2) the counter resets per-process, and (3) these are observability IDs, not critical distributed IDs. Precision loss is an acceptable trade-off for this use case.
Applied to files:
test/js/web/html/FormData.test.ts
📚 Learning: 2025-10-08T13:48:02.430Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: test/js/bun/tarball/extract.test.ts:107-111
Timestamp: 2025-10-08T13:48:02.430Z
Learning: In Bun's test runner, use `expect(async () => { await ... }).toThrow()` to assert async rejections. Unlike Jest/Vitest, Bun does not require `await expect(...).rejects.toThrow()` - the async function wrapper with `.toThrow()` is the correct pattern for async error assertions in Bun tests.
Applied to files:
test/js/web/html/FormData.test.ts
📚 Learning: 2025-10-01T21:59:54.571Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:49:27.862Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/BunIDLConvert.h:29-42
Timestamp: 2025-10-01T21:49:27.862Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), IDLStrictNull intentionally treats both undefined and null as null (using isUndefinedOrNull()), matching WebKit's IDLNull & IDLNullable behavior. This is the correct implementation and should not be changed to only accept null.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-20T05:35:57.318Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 22534
File: src/bun.js/bindings/headers.h:729-731
Timestamp: 2025-09-20T05:35:57.318Z
Learning: symbols.txt in the Bun codebase is specifically for V8 API mangled symbols (without leading underscore), not for general Bun host functions declared with BUN_DECLARE_HOST_FUNCTION. Host functions are handled through different build mechanisms.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-12T18:16:50.754Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22606
File: src/glob/GlobWalker.zig:449-452
Timestamp: 2025-09-12T18:16:50.754Z
Learning: For Bun codebase: prefer using `std.fs.path.sep` over manual platform separator detection, and use `bun.strings.lastIndexOfChar` instead of `std.mem.lastIndexOfScalar` for string operations.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:48:38.278Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/Bindgen/IDLTypes.h:1-3
Timestamp: 2025-10-01T21:48:38.278Z
Learning: In the Bun codebase, for `BunIDL*` and `Bindgen*` headers (e.g., BunIDLTypes.h, Bindgen/IDLTypes.h), it's acceptable to rely on transitive includes for standard library headers like <type_traits> and <utility> rather than including them explicitly.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:37.256Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:37.256Z
Learning: In the Bun codebase, FFI (Foreign Function Interface) code is compiled separately using TinyCC (tcc), which barely supports C99. Headers like src/bun.js/api/FFI.h and src/bun.js/api/ffi-stdbool.h are only used for FFI compilation with tcc, not with the main compiler. Therefore, C standard changes to the main Bun target do not affect FFI code compilation.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-17T01:20:22.188Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/bindings/BunIDLConvert.h:72-97
Timestamp: 2025-10-17T01:20:22.188Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), converters inheriting from `Bun::DefaultTryConverter` automatically get `conversionHasSideEffects = true` via the inheritance chain (`DefaultTryConverter` → `WebCore::DefaultConverter`). No explicit declaration of `conversionHasSideEffects` is needed when using `DefaultTryConverter` as the base.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:38.580Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:38.580Z
Learning: In the Bun codebase, FFI is compiled with tcc (TinyCC), which barely supports C99. The headers `src/bun.js/api/FFI.h` and `src/bun.js/api/ffi-stdbool.h` are only used for FFI compilation with tcc, not for the main Bun target. Therefore, C23 compatibility concerns (such as bool/true/false keyword conflicts) do not apply to these FFI headers.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-08T13:56:00.875Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: src/bun.js/api/BunObject.zig:2514-2521
Timestamp: 2025-10-08T13:56:00.875Z
Learning: For Bun codebase: prefer using `bun.path` utilities (e.g., `bun.path.joinAbsStringBuf`, `bun.path.join`) over `std.fs.path` functions for path operations.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T01:49:31.037Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/SocketConfig.bindv2.ts:58-58
Timestamp: 2025-10-18T01:49:31.037Z
Learning: In Bun's bindgenv2 TypeScript bindings (e.g., src/bun.js/api/bun/socket/SocketConfig.bindv2.ts), the pattern `b.String.loose.nullable.loose` is intentional and not a duplicate. The first `.loose` applies to the String type (loose string conversion), while the second `.loose` applies to the nullable (loose nullable, treating all falsy values as null rather than just null/undefined).
Applied to files:
src/bun.js/bindings/helpers.h
🧬 Code graph analysis (1)
test/js/web/html/FormData.test.ts (1)
src/js/internal-for-testing.ts (1)
setSyntheticAllocationLimitForTesting(110-114)
71824e0 to
833910c
Compare
test/js/web/html/FormData.test.ts
Outdated
| // Create a buffer larger than the limit | ||
| const largeBuffer = new Uint8Array(2 * 1024 * 1024); // 2MB | ||
| // @ts-expect-error - FormData.from is a Bun extension | ||
| const result = FormData.from(largeBuffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should throw an exception. not return an empty FormData.
833910c to
7d4d3e2
Compare
src/bun.js/bindings/bindings.cpp
Outdated
| auto scope = DECLARE_THROW_SCOPE(globalObject->vm()); | ||
| return Bun::ERR::STRING_TOO_LONG(scope, globalObject); | ||
| } | ||
| auto formData = DOMFormData::create(globalObject->scriptExecutionContext(), str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| auto formData = DOMFormData::create(globalObject->scriptExecutionContext(), str); | |
| auto formData = DOMFormData::create(globalObject->scriptExecutionContext(), WTFMove(str)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
src/bun.js/bindings/bindings.cpp(1 hunks)src/bun.js/bindings/helpers.h(7 hunks)src/url.zig(2 hunks)test/js/web/html/FormData.test.ts(1 hunks)
🧰 Additional context used
🧠 Learnings (20)
📓 Common learnings
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.
📚 Learning: 2025-11-10T00:57:09.173Z
Learnt from: franciscop
Repo: oven-sh/bun PR: 24514
File: src/bun.js/api/crypto/PasswordObject.zig:86-101
Timestamp: 2025-11-10T00:57:09.173Z
Learning: In Bun's Zig codebase (PasswordObject.zig), when validating the parallelism parameter for Argon2, the upper limit is set to 65535 (2^16 - 1) rather than using `std.math.maxInt(u24)` because the latter triggers Zig's truncation limit checks. The value 65535 is a practical upper bound that avoids compiler issues while being sufficient for thread parallelism use cases.
Applied to files:
test/js/web/html/FormData.test.tssrc/url.zig
📚 Learning: 2025-10-18T20:59:45.579Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:458-475
Timestamp: 2025-10-18T20:59:45.579Z
Learning: In src/bun.js/telemetry.zig, the RequestId (u64) to JavaScript number (f64) conversion in jsRequestId() is intentionally allowed to lose precision beyond 2^53-1. This is acceptable because: (1) at 1M requests/sec it takes ~285 years to overflow, (2) the counter resets per-process, and (3) these are observability IDs, not critical distributed IDs. Precision loss is an acceptable trade-off for this use case.
Applied to files:
test/js/web/html/FormData.test.tssrc/url.zig
📚 Learning: 2025-10-08T13:48:02.430Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: test/js/bun/tarball/extract.test.ts:107-111
Timestamp: 2025-10-08T13:48:02.430Z
Learning: In Bun's test runner, use `expect(async () => { await ... }).toThrow()` to assert async rejections. Unlike Jest/Vitest, Bun does not require `await expect(...).rejects.toThrow()` - the async function wrapper with `.toThrow()` is the correct pattern for async error assertions in Bun tests.
Applied to files:
test/js/web/html/FormData.test.ts
📚 Learning: 2025-10-01T21:59:54.571Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.
Applied to files:
src/bun.js/bindings/bindings.cppsrc/url.zigsrc/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:49:27.862Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/BunIDLConvert.h:29-42
Timestamp: 2025-10-01T21:49:27.862Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), IDLStrictNull intentionally treats both undefined and null as null (using isUndefinedOrNull()), matching WebKit's IDLNull & IDLNullable behavior. This is the correct implementation and should not be changed to only accept null.
Applied to files:
src/bun.js/bindings/bindings.cppsrc/bun.js/bindings/helpers.h
📚 Learning: 2025-10-17T01:20:22.188Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/bindings/BunIDLConvert.h:72-97
Timestamp: 2025-10-17T01:20:22.188Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), converters inheriting from `Bun::DefaultTryConverter` automatically get `conversionHasSideEffects = true` via the inheritance chain (`DefaultTryConverter` → `WebCore::DefaultConverter`). No explicit declaration of `conversionHasSideEffects` is needed when using `DefaultTryConverter` as the base.
Applied to files:
src/bun.js/bindings/bindings.cppsrc/bun.js/bindings/helpers.h
📚 Learning: 2025-10-17T20:50:58.644Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/Handlers.zig:154-159
Timestamp: 2025-10-17T20:50:58.644Z
Learning: In Bun socket configuration error messages (src/bun.js/api/bun/socket/Handlers.zig), use the user-facing JavaScript names "data" and "drain" instead of internal field names "onData" and "onWritable", as these are the names users see in the API according to SocketConfig.bindv2.ts.
Applied to files:
src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-11-03T20:40:59.655Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 24273
File: src/bun.js/bindings/JSValue.zig:545-586
Timestamp: 2025-11-03T20:40:59.655Z
Learning: In Bun's Zig codebase, JSErrors (returned as `bun.JSError!T`) must always be properly handled. Using `catch continue` or `catch { break; }` to silently suppress JSErrors is a bug. Errors should either be explicitly handled or propagated with `try`. This applies to snapshot serializer error handling where Jest's behavior is to throw when serializers throw.
Applied to files:
src/url.zig
📚 Learning: 2025-10-24T10:43:09.398Z
Learnt from: fmguerreiro
Repo: oven-sh/bun PR: 23774
File: src/install/PackageManager/updatePackageJSONAndInstall.zig:548-548
Timestamp: 2025-10-24T10:43:09.398Z
Learning: In Bun's Zig codebase, the `as(usize, intCast(...))` cast pattern triggers a Zig compiler bug that causes compilation to hang indefinitely when used in complex control flow contexts (loops + short-circuit operators + optional unwrapping). Avoid this pattern and use simpler alternatives like just `intCast(...)` if type casting is necessary.
Applied to files:
src/url.zig
📚 Learning: 2025-10-26T04:50:17.892Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 24086
File: src/bun.js/api/server.zig:2534-2535
Timestamp: 2025-10-26T04:50:17.892Z
Learning: In src/bun.js/api/server.zig, u32 is acceptable for route indices and websocket_context_index fields, as the practical limit of 4 billion contexts will never be reached.
Applied to files:
src/url.zig
📚 Learning: 2025-09-06T03:37:41.154Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 22229
File: src/bundler/LinkerGraph.zig:0-0
Timestamp: 2025-09-06T03:37:41.154Z
Learning: In Bun's codebase, when checking import record source indices in src/bundler/LinkerGraph.zig, prefer using `if (import_index >= self.import_records.len)` bounds checking over `isValid()` checks, as the bounds check is more robust and `isValid()` is a strict subset of this condition.
Applied to files:
src/url.zig
📚 Learning: 2025-10-21T20:57:01.892Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23889
File: src/string.zig:751-752
Timestamp: 2025-10-21T20:57:01.892Z
Learning: In Zig code handling bun.String: calling deref() on an empty bun.String (String.empty or after being set to .empty) is well-defined behavior and safely does nothing (no-op). While such calls may be redundant and could be removed for code clarity, their presence is not a correctness issue.
Applied to files:
src/url.zig
📚 Learning: 2025-09-20T05:35:57.318Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 22534
File: src/bun.js/bindings/headers.h:729-731
Timestamp: 2025-09-20T05:35:57.318Z
Learning: symbols.txt in the Bun codebase is specifically for V8 API mangled symbols (without leading underscore), not for general Bun host functions declared with BUN_DECLARE_HOST_FUNCTION. Host functions are handled through different build mechanisms.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-12T18:16:50.754Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22606
File: src/glob/GlobWalker.zig:449-452
Timestamp: 2025-09-12T18:16:50.754Z
Learning: For Bun codebase: prefer using `std.fs.path.sep` over manual platform separator detection, and use `bun.strings.lastIndexOfChar` instead of `std.mem.lastIndexOfScalar` for string operations.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:48:38.278Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/Bindgen/IDLTypes.h:1-3
Timestamp: 2025-10-01T21:48:38.278Z
Learning: In the Bun codebase, for `BunIDL*` and `Bindgen*` headers (e.g., BunIDLTypes.h, Bindgen/IDLTypes.h), it's acceptable to rely on transitive includes for standard library headers like <type_traits> and <utility> rather than including them explicitly.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:37.256Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:37.256Z
Learning: In the Bun codebase, FFI (Foreign Function Interface) code is compiled separately using TinyCC (tcc), which barely supports C99. Headers like src/bun.js/api/FFI.h and src/bun.js/api/ffi-stdbool.h are only used for FFI compilation with tcc, not with the main compiler. Therefore, C standard changes to the main Bun target do not affect FFI code compilation.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-08T13:56:00.875Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: src/bun.js/api/BunObject.zig:2514-2521
Timestamp: 2025-10-08T13:56:00.875Z
Learning: For Bun codebase: prefer using `bun.path` utilities (e.g., `bun.path.joinAbsStringBuf`, `bun.path.join`) over `std.fs.path` functions for path operations.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T01:49:31.037Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/SocketConfig.bindv2.ts:58-58
Timestamp: 2025-10-18T01:49:31.037Z
Learning: In Bun's bindgenv2 TypeScript bindings (e.g., src/bun.js/api/bun/socket/SocketConfig.bindv2.ts), the pattern `b.String.loose.nullable.loose` is intentional and not a duplicate. The first `.loose` applies to the String type (loose string conversion), while the second `.loose` applies to the nullable (loose nullable, treating all falsy values as null rather than just null/undefined).
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:38.580Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:38.580Z
Learning: In the Bun codebase, FFI is compiled with tcc (TinyCC), which barely supports C99. The headers `src/bun.js/api/FFI.h` and `src/bun.js/api/ffi-stdbool.h` are only used for FFI compilation with tcc, not for the main Bun target. Therefore, C23 compatibility concerns (such as bool/true/false keyword conflicts) do not apply to these FFI headers.
Applied to files:
src/bun.js/bindings/helpers.h
🧬 Code graph analysis (3)
test/js/web/html/FormData.test.ts (1)
src/js/internal-for-testing.ts (1)
setSyntheticAllocationLimitForTesting(110-114)
src/bun.js/bindings/bindings.cpp (1)
src/bun.js/bindings/ErrorCode.cpp (4)
STRING_TOO_LONG(1083-1089)STRING_TOO_LONG(1083-1083)create(169-174)create(169-169)
src/bun.js/bindings/helpers.h (2)
src/bun.js/bindings/headers-handwritten.h (2)
WTF(15-17)WTF(32-35)src/bun.js/bindings/v8/V8String.cpp (2)
span(36-36)span(70-70)
🔇 Additional comments (8)
src/bun.js/bindings/helpers.h (4)
96-106: LGTM: External pointer path correctly guards against both limits.The extended check against both
Bun__stringSyntheticAllocationLimitandWTF::String::MaxLengthis correct, and the cleanup viafree_global_stringbefore returning empty prevents resource leaks.
108-117: LGTM: Non-copying path correctly validates string length.The guard against both synthetic allocation limit and WebKit's MaxLength before calling
createWithoutCopyingprevents the assertion crash described in the PR objectives.
134-144: LGTM: StringPointer UTF-8 path mirrors main toString() pattern.The implementation correctly uses
ptr.lenfor the substring length check and computes UTF-16 length from the appropriate offset, maintaining consistency with the maintoString()function.
225-237: LGTM: appendToBuilder guards against oversized UTF-8 input.The implementation correctly applies the same length validation pattern. Note that oversized strings are silently skipped (early return) rather than throwing an error, which is reasonable for a builder pattern but differs from the other functions that return empty strings.
src/url.zig (2)
985-993: LGTM: Correct JSError propagation for URLEncoded FormData.The implementation correctly checks for
.zero(indicating a pending exception) and propagateserror.JSErrorwithout throwing a new exception. This aligns with Bun's error handling patterns and ensures that string-length errors fromcreateFromURLQueryare properly surfaced.
1049-1054: LGTM: Correct JSError propagation in multipart error handling.The error handling correctly distinguishes between:
error.JSError: exception already thrown, propagate without re-throwing- Other errors: wrap in new error with context
This follows Bun's error handling patterns (as noted in learnings) and prevents double-throwing exceptions.
test/js/web/html/FormData.test.ts (1)
280-295: Test correctly exercises the intended code path and will properly verify exception behavior.The verification confirms that an empty
Uint8Array(2MB)triggers the same string-length validation as populated data:
Buffer size drives the check, not content: When
FormData.from()processes the 2MB buffer,helpers.h::toString()checks the byte length (2MB) against the synthetic limit (1MB).Exception flow confirmed:
toString()detects 2MB > 1MB limit- Returns empty string to signal overflow
bindings.cppline 5678 detects:str.isEmpty() && arg1->len > 0- Throws
STRING_TOO_LONGexception with matching messageTest validation is sound: The 2MB buffer—regardless of being all zeros—correctly exercises the same allocation limit path that would catch the original crash scenario.
src/bun.js/bindings/bindings.cpp (1)
5676-5681: The review comment is based on an incorrect assumption about toString().The
toString()function (defined insrc/bun.js/bindings/helpers.hlines 76–91) does not throw exceptions. When input exceeds the maximum length, it simply returns an emptyWTF::String(). This is the documented behavior of the function—it handles error conditions internally by returning an empty string rather than throwing.The current code at lines 5676–5681 is correct:
- Line 5676 calls
toString(*arg1), which safely returns either a valid string or empty- Lines 5678–5680 detect the "too long" condition by checking if the result is empty while the input was non-empty
- No pending exception exists after
toString()returns, so declaringThrowScopeafterward is safe- The pattern of declaring scope after
toString()is consistent with similar code elsewhere in the codebase (e.g., line 2155–2156)No changes are needed.
Likely an incorrect or invalid review comment.
7d4d3e2 to
914cd22
Compare
| } | ||
|
|
||
| return FormData.toJS(globalThis, input, encoding) catch |err| return globalThis.throwError(err, "while parsing FormData"); | ||
| return FormData.toJS(globalThis, input, encoding) catch |err| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this simply be a try?
914cd22 to
837fddd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
src/bun.js/bindings/bindings.cpp(1 hunks)src/bun.js/bindings/helpers.h(7 hunks)src/url.zig(4 hunks)test/js/web/html/FormData.test.ts(1 hunks)
🧰 Additional context used
🧠 Learnings (24)
📓 Common learnings
Learnt from: cirospaciari
Repo: oven-sh/bun PR: 22946
File: test/js/sql/sql.test.ts:195-202
Timestamp: 2025-09-25T22:07:13.851Z
Learning: PR oven-sh/bun#22946: JSON/JSONB result parsing updates (e.g., returning parsed arrays instead of legacy strings) are out of scope for this PR; tests keep current expectations with a TODO. Handle parsing fixes in a separate PR.
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.
📚 Learning: 2025-10-01T21:59:54.571Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h:47-74
Timestamp: 2025-10-01T21:59:54.571Z
Learning: In the new bindings generator (bindgenv2) for `src/bun.js/bindings/webcore/JSDOMConvertEnumeration.h`, the context-aware enumeration conversion overloads intentionally use stricter validation (requiring `value.isString()` without ToString coercion), diverging from Web IDL semantics. This is a design decision documented in comments.
Applied to files:
src/url.zigsrc/bun.js/bindings/bindings.cppsrc/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T20:50:47.750Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:366-373
Timestamp: 2025-10-18T20:50:47.750Z
Learning: In Bun's Zig codebase (src/bun.js/bindings/JSValue.zig), the JSValue enum uses `.null` (not `.js_null`) for JavaScript's null value. Only `js_undefined` has the `js_` prefix to avoid collision with Zig's built-in `undefined` keyword. The correct enum fields are: `js_undefined`, `null`, `true`, `false`, and `zero`.
Applied to files:
src/url.zig
📚 Learning: 2025-11-03T20:40:59.655Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 24273
File: src/bun.js/bindings/JSValue.zig:545-586
Timestamp: 2025-11-03T20:40:59.655Z
Learning: In Bun's Zig codebase, JSErrors (returned as `bun.JSError!T`) must always be properly handled. Using `catch continue` or `catch { break; }` to silently suppress JSErrors is a bug. Errors should either be explicitly handled or propagated with `try`. This applies to snapshot serializer error handling where Jest's behavior is to throw when serializers throw.
Applied to files:
src/url.zig
📚 Learning: 2025-09-05T19:49:26.188Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 21728
File: src/valkey/js_valkey_functions.zig:852-867
Timestamp: 2025-09-05T19:49:26.188Z
Learning: In Bun’s Zig code, `.js_undefined` is a valid and preferred JSValue literal for “undefined” (e.g., resolving JSPromise). Do not refactor usages to `jsc.JSValue.jsUndefined()`, especially in src/valkey/js_valkey_functions.zig unsubscribe().
Applied to files:
src/url.zig
📚 Learning: 2025-10-17T20:50:58.644Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/Handlers.zig:154-159
Timestamp: 2025-10-17T20:50:58.644Z
Learning: In Bun socket configuration error messages (src/bun.js/api/bun/socket/Handlers.zig), use the user-facing JavaScript names "data" and "drain" instead of internal field names "onData" and "onWritable", as these are the names users see in the API according to SocketConfig.bindv2.ts.
Applied to files:
src/url.zig
📚 Learning: 2025-10-18T20:59:45.579Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: src/bun.js/telemetry.zig:458-475
Timestamp: 2025-10-18T20:59:45.579Z
Learning: In src/bun.js/telemetry.zig, the RequestId (u64) to JavaScript number (f64) conversion in jsRequestId() is intentionally allowed to lose precision beyond 2^53-1. This is acceptable because: (1) at 1M requests/sec it takes ~285 years to overflow, (2) the counter resets per-process, and (3) these are observability IDs, not critical distributed IDs. Precision loss is an acceptable trade-off for this use case.
Applied to files:
src/url.zigtest/js/web/html/FormData.test.ts
📚 Learning: 2025-11-10T00:57:09.173Z
Learnt from: franciscop
Repo: oven-sh/bun PR: 24514
File: src/bun.js/api/crypto/PasswordObject.zig:86-101
Timestamp: 2025-11-10T00:57:09.173Z
Learning: In Bun's Zig codebase (PasswordObject.zig), when validating the parallelism parameter for Argon2, the upper limit is set to 65535 (2^16 - 1) rather than using `std.math.maxInt(u24)` because the latter triggers Zig's truncation limit checks. The value 65535 is a practical upper bound that avoids compiler issues while being sufficient for thread parallelism use cases.
Applied to files:
src/url.zigtest/js/web/html/FormData.test.ts
📚 Learning: 2025-10-24T10:43:09.398Z
Learnt from: fmguerreiro
Repo: oven-sh/bun PR: 23774
File: src/install/PackageManager/updatePackageJSONAndInstall.zig:548-548
Timestamp: 2025-10-24T10:43:09.398Z
Learning: In Bun's Zig codebase, the `as(usize, intCast(...))` cast pattern triggers a Zig compiler bug that causes compilation to hang indefinitely when used in complex control flow contexts (loops + short-circuit operators + optional unwrapping). Avoid this pattern and use simpler alternatives like just `intCast(...)` if type casting is necessary.
Applied to files:
src/url.zig
📚 Learning: 2025-10-26T04:50:17.892Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 24086
File: src/bun.js/api/server.zig:2534-2535
Timestamp: 2025-10-26T04:50:17.892Z
Learning: In src/bun.js/api/server.zig, u32 is acceptable for route indices and websocket_context_index fields, as the practical limit of 4 billion contexts will never be reached.
Applied to files:
src/url.zig
📚 Learning: 2025-09-06T03:37:41.154Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 22229
File: src/bundler/LinkerGraph.zig:0-0
Timestamp: 2025-09-06T03:37:41.154Z
Learning: In Bun's codebase, when checking import record source indices in src/bundler/LinkerGraph.zig, prefer using `if (import_index >= self.import_records.len)` bounds checking over `isValid()` checks, as the bounds check is more robust and `isValid()` is a strict subset of this condition.
Applied to files:
src/url.zigsrc/bun.js/bindings/helpers.h
📚 Learning: 2025-10-21T20:57:01.892Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23889
File: src/string.zig:751-752
Timestamp: 2025-10-21T20:57:01.892Z
Learning: In Zig code handling bun.String: calling deref() on an empty bun.String (String.empty or after being set to .empty) is well-defined behavior and safely does nothing (no-op). While such calls may be redundant and could be removed for code clarity, their presence is not a correctness issue.
Applied to files:
src/url.zig
📚 Learning: 2025-09-05T19:49:26.188Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 21728
File: src/valkey/js_valkey_functions.zig:852-867
Timestamp: 2025-09-05T19:49:26.188Z
Learning: In Bun's Zig codebase, `.js_undefined` is a valid way to represent JavaScript's undefined value when working with JSPromise.resolve() and similar JavaScript interop functions. This is the correct pattern to use rather than `jsc.JSValue.jsUndefined()`.
Applied to files:
src/url.zig
📚 Learning: 2025-09-19T19:55:22.427Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 22568
File: src/valkey/js_valkey_functions.zig:853-854
Timestamp: 2025-09-19T19:55:22.427Z
Learning: In Bun's JSPromise API, `resolvedPromiseValue(globalThis: *JSGlobalObject, value: JSValue) JSValue` returns a JSValue directly (not a JSPromise that needs `.toJS()` called on it). This is different from `resolvedPromise()` which returns a *JSPromise that would need `.toJS()`.
Applied to files:
src/url.zig
📚 Learning: 2025-10-11T15:19:30.301Z
Learnt from: mastermakrela
Repo: oven-sh/bun PR: 19167
File: src/bun.js/api.zig:31-31
Timestamp: 2025-10-11T15:19:30.301Z
Learning: The learnings about `pub const js = JSC.Codegen.JS<ClassName>` and re-exporting toJS/fromJS/fromJSDirect apply to class-based Zig bindings with constructors and prototype methods (e.g., those with `new` constructors). They do NOT apply to simple namespace-style API objects like TOMLObject, YAMLObject, and CSVObject which expose static functions via a `create()` method that returns a plain JS object.
Applied to files:
src/url.zig
📚 Learning: 2025-10-01T21:49:27.862Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/BunIDLConvert.h:29-42
Timestamp: 2025-10-01T21:49:27.862Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), IDLStrictNull intentionally treats both undefined and null as null (using isUndefinedOrNull()), matching WebKit's IDLNull & IDLNullable behavior. This is the correct implementation and should not be changed to only accept null.
Applied to files:
src/bun.js/bindings/bindings.cppsrc/bun.js/bindings/helpers.h
📚 Learning: 2025-10-17T01:20:22.188Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/bindings/BunIDLConvert.h:72-97
Timestamp: 2025-10-17T01:20:22.188Z
Learning: In Bun's IDL bindings (src/bun.js/bindings/BunIDLConvert.h), converters inheriting from `Bun::DefaultTryConverter` automatically get `conversionHasSideEffects = true` via the inheritance chain (`DefaultTryConverter` → `WebCore::DefaultConverter`). No explicit declaration of `conversionHasSideEffects` is needed when using `DefaultTryConverter` as the base.
Applied to files:
src/bun.js/bindings/bindings.cpp
📚 Learning: 2025-10-08T13:48:02.430Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: test/js/bun/tarball/extract.test.ts:107-111
Timestamp: 2025-10-08T13:48:02.430Z
Learning: In Bun's test runner, use `expect(async () => { await ... }).toThrow()` to assert async rejections. Unlike Jest/Vitest, Bun does not require `await expect(...).rejects.toThrow()` - the async function wrapper with `.toThrow()` is the correct pattern for async error assertions in Bun tests.
Applied to files:
test/js/web/html/FormData.test.ts
📚 Learning: 2025-09-12T18:16:50.754Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22606
File: src/glob/GlobWalker.zig:449-452
Timestamp: 2025-09-12T18:16:50.754Z
Learning: For Bun codebase: prefer using `std.fs.path.sep` over manual platform separator detection, and use `bun.strings.lastIndexOfChar` instead of `std.mem.lastIndexOfScalar` for string operations.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-08T13:56:00.875Z
Learnt from: Jarred-Sumner
Repo: oven-sh/bun PR: 23373
File: src/bun.js/api/BunObject.zig:2514-2521
Timestamp: 2025-10-08T13:56:00.875Z
Learning: For Bun codebase: prefer using `bun.path` utilities (e.g., `bun.path.joinAbsStringBuf`, `bun.path.join`) over `std.fs.path` functions for path operations.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-01T21:48:38.278Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23169
File: src/bun.js/bindings/Bindgen/IDLTypes.h:1-3
Timestamp: 2025-10-01T21:48:38.278Z
Learning: In the Bun codebase, for `BunIDL*` and `Bindgen*` headers (e.g., BunIDLTypes.h, Bindgen/IDLTypes.h), it's acceptable to rely on transitive includes for standard library headers like <type_traits> and <utility> rather than including them explicitly.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-09-20T05:35:57.318Z
Learnt from: pfgithub
Repo: oven-sh/bun PR: 22534
File: src/bun.js/bindings/headers.h:729-731
Timestamp: 2025-09-20T05:35:57.318Z
Learning: symbols.txt in the Bun codebase is specifically for V8 API mangled symbols (without leading underscore), not for general Bun host functions declared with BUN_DECLARE_HOST_FUNCTION. Host functions are handled through different build mechanisms.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-15T20:19:37.256Z
Learnt from: markovejnovic
Repo: oven-sh/bun PR: 23680
File: cmake/targets/BuildBun.cmake:822-822
Timestamp: 2025-10-15T20:19:37.256Z
Learning: In the Bun codebase, FFI (Foreign Function Interface) code is compiled separately using TinyCC (tcc), which barely supports C99. Headers like src/bun.js/api/FFI.h and src/bun.js/api/ffi-stdbool.h are only used for FFI compilation with tcc, not with the main compiler. Therefore, C standard changes to the main Bun target do not affect FFI code compilation.
Applied to files:
src/bun.js/bindings/helpers.h
📚 Learning: 2025-10-18T01:49:31.037Z
Learnt from: taylordotfish
Repo: oven-sh/bun PR: 23755
File: src/bun.js/api/bun/socket/SocketConfig.bindv2.ts:58-58
Timestamp: 2025-10-18T01:49:31.037Z
Learning: In Bun's bindgenv2 TypeScript bindings (e.g., src/bun.js/api/bun/socket/SocketConfig.bindv2.ts), the pattern `b.String.loose.nullable.loose` is intentional and not a duplicate. The first `.loose` applies to the String type (loose string conversion), while the second `.loose` applies to the nullable (loose nullable, treating all falsy values as null rather than just null/undefined).
Applied to files:
src/bun.js/bindings/helpers.h
🧬 Code graph analysis (2)
src/bun.js/bindings/bindings.cpp (2)
src/bun.js/bindings/BunString.cpp (10)
toString(178-181)toString(178-178)toString(208-211)toString(208-208)toString(230-236)toString(230-230)toString(237-243)toString(237-237)toString(244-250)toString(244-244)src/bun.js/bindings/ErrorCode.cpp (4)
STRING_TOO_LONG(1083-1089)STRING_TOO_LONG(1083-1083)create(169-174)create(169-169)
test/js/web/html/FormData.test.ts (1)
src/js/internal-for-testing.ts (1)
setSyntheticAllocationLimitForTesting(110-114)
🔇 Additional comments (6)
src/bun.js/bindings/helpers.h (2)
5-5: UTF-8 → UTF-16 precheck and max-length policy are consistent and safeIncluding
wtf/SIMDUTF.hand gating all UTF‑8ZigString→WTF::Stringconversions behind:
maxLength = min(Bun__stringSyntheticAllocationLimit, static_cast<size_t>(WTF::String::MaxLength)), and- a secondary
utf16_length_from_utf8check only when the byte length already exceedsmaxLength,ensures we never hand WebKit an input that would produce a string longer than its internal limit, while avoiding extra work on common, small strings. Returning a default-constructed
WTF::Stringin the overflow case matches the documented invariant thattoString()only yields an empty result for non-empty input when the string is too long, which the new FormData URL-query path now relies on.No changes requested here.
Also applies to: 83-119
131-185: Extended guards for substring and builder paths correctly mirror the core toString policyThe added
maxLength/simdutf checks in:
toString(ZigString, StringPointer),toStringCopy(ZigString, StringPointer),toStringCopy(ZigString), andappendToBuilder(ZigString, StringBuilder&),bring the substring and append cases in line with the main
toString(ZigString)behavior:
- UTF‑8‑tagged substrings use
utf16_length_from_utf8only whenptr.lenexceedsmin(synthetic, MaxLength), and bail out with{}(or a no-op append) when the computed UTF‑16 length would overflow.- Non‑UTF‑8 paths explicitly check both
Bun__stringSyntheticAllocationLimitandWTF::String::MaxLengthbefore creating or copying backing storage.This closes the remaining oversize-string assertion vectors along these helper call sites without changing semantics for normal inputs; callers will now see either a truncated “empty” string/no-op when the data is too long, or a normal string otherwise.
Looks good as implemented.
Also applies to: 187-220, 222-246
src/url.zig (3)
983-996: LGTM! Error propagation properly implemented.The changes correctly update the error handling pattern:
- Return type changed to
bun.JSError!jsc.JSValuefor proper error propagation- URLEncoded path checks for
.zeroto detect exceptions fromcreateFromURLQuery(e.g., string too long)- Multipart path uses
tryto propagate errors fromtoJSFromMultipartDataThis aligns with the PR objective to throw errors instead of crashing or returning empty FormData.
998-1050: Signature and error propagation updated correctly.The function signature now returns
bun.JSError!jsc.JSValueand usestryat line 1049 to propagate errors fromFormData.toJS. This maintains consistency with the new error handling pattern.
1057-1144: Error throwing approach properly implemented.The function now:
- Returns
bun.JSError!jsc.JSValue(line 1061)- Uses
globalThis.throwError()to throw JS exceptions at lines 1066 and 1139 instead of returning raw error values- Provides descriptive error context ("while parsing FormData")
This is the correct pattern for Bun's JSC error handling and addresses the requirement to throw exceptions rather than crash or return empty results.
test/js/web/html/FormData.test.ts (1)
280-295: Test properly validates the fix for oversized input.The test correctly:
- Uses
setSyntheticAllocationLimitForTestingto create a testable boundary condition (1MB limit with 2MB input)- Verifies
FormData.fromthrows an exception rather than crashing (addressing the PR objective)- Uses
try/finallyto ensure cleanup even if the test fails- Employs partial string matching (
"Cannot create a string longer than") for flexibilityThe test structure is solid and validates the regression without requiring gigabytes of memory in CI.
837fddd to
bce40fc
Compare
When `FormData.from()` is called with a very large ArrayBuffer (exceeding
WebKit's String::MaxLength of INT32_MAX), it would crash with an assertion
failure in WebKit's StringImpl. This fixes the issue by:
1. Adding length checks in the C++ `toString` and related functions (helpers.h)
to check against both Bun's synthetic limit and WebKit's String::MaxLength.
For UTF-8 tagged strings, we use simdutf to calculate the actual UTF-16
length only when the byte length exceeds the limit.
2. Throwing an ERR_STRING_TOO_LONG exception from createFromURLQuery when the
string is too long, instead of silently returning an empty FormData.
Changes:
- Add length checks to all UTF-8 code paths in helpers.h:
- toString(ZigString)
- toString(ZigString, StringPointer)
- toStringCopy(ZigString)
- toStringCopy(ZigString, StringPointer)
- appendToBuilder(ZigString, StringBuilder)
- Add WTF::String::MaxLength check to non-UTF-8 paths
- Throw ERR_STRING_TOO_LONG in createFromURLQuery when string is too long
- Properly propagate JSError from toJS to fromMultipartData
Now `FormData.from(new Uint32Array(913148244))` throws:
error: Cannot create a string longer than 2147483647 characters
code: "ERR_STRING_TOO_LONG"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
bce40fc to
265f1b2
Compare
Summary
FormData.from()when called with very large ArrayBuffer inputtoStringfunction against both Bun's synthetic limit and WebKit'sString::MaxLengthRoot Cause
When
FormData.from()was called with a very large ArrayBuffer (e.g.,new Uint32Array(913148244)= ~3.6GB), the code would crash with:The
toString()function inhelpers.hwas only checking againstBun__stringSyntheticAllocationLimit(which defaults to ~4GB), but not against WebKit'sString::MaxLength(INT32_MAX, ~2GB). When the input exceededString::MaxLength,createWithoutCopying()would fail with an assertion.Changes
helpers.h: Added
|| str.len > WTF::String::MaxLengthchecks to all three code paths intoString():url.zig: Reverted the incorrect Zig-side check (UTF-8 byte length != UTF-16 character length)
Test plan
🤖 Generated with Claude Code