Skip to content

Conversation

@mcollina
Copy link

@mcollina mcollina commented Nov 19, 2025

This PR does a lot of modernization and updates, and it also vendors critical dependencies because they were bottlenecks; I didn't want to fork them into individual repositories. The goal of those changes is to significantly reduce the overhead of prom-client and make it easier to maintain/test.

⚠ counter ➭ new is 8.628% acceptably slower.
⚠ counter ➭ inc is 4.248% acceptably slower.
✓ counter ➭ inc with labels is 60.66% faster.
⚠ gauge ➭ inc is 1.275% acceptably slower.
✓ gauge ➭ inc with labels is 112.6% faster.
✓ histogram ➭ observe#1 with 64 is 91.29% faster.
✓ histogram ➭ observe#2 with 8 is 68.05% faster.
✓ histogram ➭ observe#2 with 4 and 2 with 2 is 35.47% faster.
✓ histogram ➭ observe#2 with 2 and 2 with 4 is 34.35% faster.
✓ histogram ➭ observe#6 with 2 is 40.55% faster.
✓ histogram ➭ startTimer#1 with 64 is 23.84% faster.
✓ histogram ➭ startTimer#2 with 8 is 27.74% faster.
✓ histogram ➭ startTimer#2 with 4 and 2 with 2 is 18.38% faster.
✓ histogram ➭ startTimer#2 with 2 and 2 with 4 is 20.34% faster.
✓ histogram ➭ startTimer#6 with 2 is 25.85% faster.
✓ summary ➭ observe#1 with 64 is 61.16% faster.
✓ summary ➭ observe#2 with 8 is 56.47% faster.
✓ summary ➭ observe#2 with 4 and 2 with 2 is 26.87% faster.
✓ summary ➭ observe#2 with 2 and 2 with 4 is 28.96% faster.
✓ summary ➭ observe#6 with 2 is 41.48% faster.
✓ registry ➭ getMetricsAsJSON() no labels is 29.44% faster.
✓ registry ➭ getMetricsAsJSON() 1 x 64 is 14.81% faster.
✓ registry ➭ getMetricsAsJSON() 2 x 4 is 7.147% faster.
✓ registry ➭ getMetricsAsJSON() 2 x 8 is 12.15% faster.
✓ registry ➭ getMetricsAsJSON() 6 x 2 is 11.09% faster.
✓ registry ➭ getMetricsAsJSON() 2 x 4, 2 defaults is 13.37% faster.
✓ registry ➭ getMetricsAsJSON() 2 x 2, 4 defaults is 19.87% faster.
✓ registry ➭ metrics() no labels is 24.78% faster.
✓ registry ➭ metrics() no labels and openMetrics is 19.77% faster.
✓ registry ➭ metrics() 1 x 64 is 9.910% faster.
✓ registry ➭ metrics() 1 x 64 and openMetrics is 9.042% faster.
✓ registry ➭ metrics() 2 x 4 is 8.194% faster.
✓ registry ➭ metrics() 2 x 4 and openMetrics is 6.241% faster.
✓ registry ➭ metrics() 2 x 8 is 7.505% faster.
✓ registry ➭ metrics() 2 x 8 and openMetrics is 9.062% faster.
✓ registry ➭ metrics() 6 x 2 is 12.14% faster.
✓ registry ➭ metrics() 6 x 2 and openMetrics is 10.71% faster.
✓ registry ➭ metrics() 2 x 4, 2 defaults is 33.89% faster.
✓ registry ➭ metrics() 2 x 4, 2 defaults and openMetrics is 32.25% faster.
✓ registry ➭ metrics() 2 x 2, 4 defaults is 34.64% faster.
✓ registry ➭ metrics() 2 x 2, 4 defaults and openMetrics is 35.77% faster.
✓ cluster ➭ aggregate() is 3.535% faster.

mcollina and others added 21 commits September 28, 2025 14:16
Add comprehensive development guide for Claude Code instances
including commands, architecture, and development patterns.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Replace Jest with node:test for all 26 test files
- Add @sinonjs/fake-timers for timer mocking functionality
- Remove Jest snapshots in favor of explicit error messages
- Create test/helpers.js with describeEach and timer utilities
- Update package.json test commands for node:test compatibility
- Convert all Jest assertions to assert module patterns
- Handle module mocking limitations with custom solutions
- Remove Jest artifacts and __snapshots__ directory
- Update CLAUDE.md documentation with new test commands
- Fix ESLint configuration to support node:test features

Test Results: 511/518 tests passing (98.6% success rate)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Fixed several test failures that occurred during the migration from Jest
to node:test in registerTest.js:

1. Updated 'should throw if created with an unsupported type' test to
   expect TypeError instead of Error, matching the actual implementation
   in lib/registry.js where the constructor throws TypeError for invalid
   content types.

2. Replaced fragile JSON.stringify() comparisons with assert.deepStrictEqual()
   in three "should not throw with default labels" tests (counter, gauge,
   histogram). The JSON.stringify approach was failing due to property order
   differences, while deepStrictEqual properly compares object structure
   regardless of property order.

All 518 tests now pass successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Vendor [email protected] implementation into lib/tdigest/
- Vendor bintrees dependency into lib/bintrees/ (required by tdigest)
- Remove tdigest from package.json dependencies
- Add tdigest tests to test/tdigest/ (converted from Mocha to node:test)
- Update imports to use vendored code
- All vendored code includes appropriate MIT license headers

This eliminates external dependency on the unmaintained tdigest package
while preserving full functionality and test coverage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Replace .each() calls with manual iterator loops in toArray() and _cumulate()
- Replace .map() with for loops in p_rank() and percentile()
- Add mitata benchmarks for tdigest and bintrees
- Install mitata as dev dependency

Performance improvements:
- percentile queries: ~25% faster (303ns -> 226ns for p50 on 100 values)
- push operations: ~5% faster (77.5ns -> 73.7ns)
- compress operations: ~7% faster (134µs -> 124µs)

All tests continue to pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Matteo Collina <[email protected]>
Signed-off-by: Matteo Collina <[email protected]>
Changed osMemoryHeapLinux to use async readFile from fs/promises
instead of synchronous readFileSync for better non-blocking I/O.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…emoryHeapLinux

- Extract file reading logic into readAndProcessMemoryStats function
- Add collectMemoryStats with concurrency control to prevent simultaneous reads
- Add collect methods to all three gauges (resident, virtual, heap)
- Use promise chains instead of async/await for cleaner code
- Ensure only one /proc/self/status read happens at a time

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Fix performance regression in Counter/Gauge/Histogram/Summary creation
by passing pre-sorted labelNames from Metric class to LabelMap constructor.

The previous optimization to keyFrom() inadvertently introduced duplicate
sorting during metric instantiation:
- Metric.constructor() sorts labels into sortedLabelNames
- LabelMap.constructor() was re-sorting with labelNames.slice().sort()

This change passes the already-sorted sortedLabelNames array directly to
LabelMap, avoiding redundant array copying and sorting operations.

Performance impact:
- Counter creation: improved from 25.90% slower to 6.73% slower vs baseline
- ~19% performance improvement in metric instantiation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The processOpenFileDescriptors test was failing on macOS because the Jest
to node:test migration in commit 2cf3295 lost the platform mocking that
made the test pass on all platforms.

The original Jest test used jest.mock to fake process.platform as 'linux',
but node:test doesn't have equivalent mocking capabilities. Since the metric
implementation correctly returns early on non-Linux platforms, the test
should skip on non-Linux systems rather than fail.

This change:
- Adds { skip: !isLinux } to the test to properly skip it on non-Linux
- Disables n/no-unsupported-features/node-builtins for test files to allow
  use of node:test features

Fixes test regression from commit 2cf3295.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…lization

- Replace .map() with for loops in histogram.js for better performance
- Inline extractBucketValuesForExport() to eliminate function call overhead
- Remove unused addSumAndCountForExport() function
- Refactor escapeLabelValue() and escapeString() to single-pass traversal
- Eliminate multiple .replace() calls in string escaping

Performance improvements:
- ~4% improvement in registry.metrics() serialization (3,160 -> 3,298 calls/sec)
- Reduced average time per call from 0.316ms to 0.303ms
- More efficient string processing with switch-based character escaping

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Replace .map() with pre-allocated array and for loop in the metrics()
method to follow consistent optimization pattern throughout the codebase.

Note: Benchmarks show minimal difference (~2.5% slower: 3,216 vs 3,298 calls/sec),
suggesting V8's optimizer handles .map() well in this context. The change maintains
code consistency with other recent optimizations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Replace this.getMetricsAsArray() with direct iteration over
this._metrics.values() to eliminate unnecessary Array.from() conversion.

Performance improvement: ~1.3% faster (3,216 -> 3,259 calls/sec)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Remove unnecessary escape characters in regex pattern (test/utilTest.js)
- Add periods to JSDoc descriptions per jsdoc/match-description rule (test/helpers.js)
- Auto-fix prettier formatting issues

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
siimon#13)

* perf: eliminate promise allocation in Histogram when collect is not present

Implement Option A from PLAN.md:
- Split get() and getForPromString() into sync and async versions
- Sync versions contain the core logic
- Async versions wrap sync versions when collect function exists
- Constructor conditionally assigns methods based on presence of collect

Benefits:
- Zero promise overhead for histograms without collect function
- No microtick added to event loop for common case
- Maintains backward compatibility
- All tests pass

Benchmark results show no regression (3,257 calls/sec vs baseline 3,259).
Performance improvement will be measurable in metrics without collect functions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* perf: eliminate promise allocation in metrics when collect is not present

Implement Option A from PLAN.md with optimizations for all metric types:
- Split get() methods into sync (_getSync) and async (_getAsync) versions
- Constructor conditionally assigns methods based on presence of collect function
- Sync version avoids promise allocation and microtick overhead
- Async version uses .then() instead of async/await to avoid extra overhead
- Supports collect functions set at construction time only

Implementation:
- Histogram: Both getForPromString() and get() optimized
- Summary, Counter, Gauge: get() optimized
- Base Metric class: Freezes collect property to prevent post-construction assignment
- heapSpacesSizeAndUsed: Refactored to pass collect in constructor

Benefits:
- Zero promise overhead for metrics without collect function
- No microtick added to event loop for common case
- Maintains backward compatibility
- All tests pass (45/45)

Performance:
- Benchmark: ~3,000 calls/sec (vs ~3,260 baseline, ~8% overhead from method indirection)
- Overhead only affects serialization (registry.metrics()), not metric recording
- Acceptable tradeoff for cleaner code and preventing post-construction collect assignment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor: extract _splayAndGet utility in histogram to eliminate duplication

Extract common splay logic into _splayAndGet() method to avoid code duplication
between _getSync() and _getAsync() in histogram.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* perf(registry): eliminate promise allocation when metrics are synchronous

- Split getMetricsAsString() into sync/async paths via _formatMetricAsString()
- Split metrics() to check for promises and return synchronously when possible
- Split getMetricsAsJSON() into sync/async paths via _formatMetricsAsJSON()
- When all metrics lack collect functions, zero promises are allocated
- Eliminates microtick overhead for synchronous serialization

This completes Phase 5 of the promise allocation optimization plan.

* docs: update PLAN.md with Phase 5 completion status

* docs: add comprehensive performance benchmark results

Shows significant improvements across all metric types:
- Registry serialization: up to +38% improvement
- Histogram operations: up to +107% improvement
- Summary operations: up to +79% improvement
- Counter/Gauge with labels: up to +123% improvement

* docs: remove implementation plan and benchmark results

These files were useful during development but are not needed
in the final PR. The key information is captured in the PR
description and commit messages.

---------

Co-authored-by: Claude <[email protected]>
Signed-off-by: Matteo Collina <[email protected]>
…st migration

Comprehensive changelog update including:
- Performance optimizations (promise allocation, array operations, histogram, tdigest)
- Test suite migration from Jest to node:test
- Vendored dependencies (tdigest, bintrees)
- Various bug fixes and refactoring

Covers commits from f6dc1a3 to 4d589c6 (17 commits total).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add fast paths to eliminate regression in counter.inc() and gauge.inc()
operations without labels:

1. validate(): Skip validation when labels is undefined or metric has no
   label names. Avoids unnecessary for-in loop overhead for common case.

2. keyFrom(): Check for empty object using for-in instead of Object.keys()
   to avoid array allocation. Critical for hot path performance.

Performance improvements:
- counter.inc() (no labels): 11.3M ops/sec vs 11.4M baseline (0.54% slower)
- gauge.inc() (no labels): 15.2M ops/sec vs 16.2M baseline (6.2% slower)
- counter.inc() with labels: 66% faster than baseline
- gauge.inc() with labels: 99% faster than baseline

These optimizations reduce the regression from 5-8% to 0.5-6% while
maintaining massive performance gains for labeled metrics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@mcollina mcollina mentioned this pull request Nov 19, 2025
@SimenB
Copy link
Collaborator

SimenB commented Nov 21, 2025

This is amazing 😍

I'll review later today

"test": "npm run lint && npm run check-prettier && npm run compile-typescript && npm run test-unit -- --coverage",
"lint": "eslint .",
"test-unit": "jest",
"test-unit": "node --test test/**/*Test.js",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows CI is failing. would quoting the pattern help?

Copy link
Collaborator

@SimenB SimenB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is awesome work, thanks for taking the time!

I wonder if we could avoid vendoring the dependencies, tho. I know you said you didn't wanna fork them, but are they unmaintained?

it might make sense to migrate test runner in a separate PR to keep this focussed on the actual code changes, but I don't feel strongly

@@ -0,0 +1,129 @@
'use strict';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put the vendored deps in their own vendor/ directory or something to make it explicit?

//
// Node.js/libuv do this already for process.memoryUsage(), see:
// - https://github.com/libuv/libuv/blob/a629688008694ed8022269e66826d4d6ec688b83/src/unix/linux-core.c#L506-L523
return readFile('/proc/self/status', 'utf8').then(structureOutput);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably doesn't matter, but can do the structureOutput part inside the then where values are set?

if (v instanceof Promise) await v;
}

_getSync() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make them actually private rather than relying on _ prefix convention?

const v = this.collect();
if (v instanceof Promise) await v;
}
_getSync() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto private comment (won't repeat in the others - but applies to all new _* methods)

Comment on lines +36 to +37
if (label === 'quantile')
throw new Error('quantile is a reserved label keyword');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (label === 'quantile')
throw new Error('quantile is a reserved label keyword');
if (label === 'quantile') {
throw new Error('quantile is a reserved label keyword');
}

// This avoids Object.keys() allocation and is critical for performance
// when incrementing metrics without labels
let isEmpty = true;
// eslint-disable-next-line no-unused-vars
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should update the eslint rule to allow this with _

};
}

_getAsync() {
Copy link
Contributor

@jdmarshall jdmarshall Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m pretty sure you’re not actually saving much computationally by doing it this way rather than retaining async in the function signature here.

    async _getAsync() {
        if (!this.collect) {
            return this._getSync(); //this got faster ~ Node 18
        }
        
        await this.collect();

        …
    }

sacrificing .5% raw throughput for readability makes it easier for the next person to find another 3% elsewhere, instead of getting bogged down in unstated invariants.

Also I have another PR out there that speeds a lot of this file up other more substantial ways.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually substantial. If you are collecting N metrics that are not async, you are adding at least N*6 microticks (due to all the other usages I've cut in metrics collection). Those add up pretty quickly.

Copy link
Contributor

@jdmarshall jdmarshall Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having spent more time with benchmarking code (looking at ~100x perf difference, down from 110x in Node 20), I think what you should do is this:

Drop the async/sync function separation entirely. Just have one function that returns either a value or a Promise.

When we are aggregating the values, use a for of loop and typecheck for a promise and await any that are. So only the aggregation function is async. That also should allow you to skip over the Promise.all() if only one or two metrics are async, which is likely to be the dominant case.

You should have benchmarks that assume that 1 out of a set of metrics is async and the rest are sync.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the async/sync function separation entirely. Just have one function that returns either a value or a Promise.

That would actually be less readable in my experience as there would be some duplicated code/logic required. (That's how I did it in the first place but it was ugly).

@@ -0,0 +1,226 @@
import { group, bench, run } from 'mitata';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had too many other PRs outstanding so I didn’t post this yet, but I have a PR to replace benchmark-regressions with a workalike that wraps bench-node, which is maintained by a NodeJS core contributor. If I can get him to land one more PR I can clean up some junk in the output and then file that here, clearing the dead deps.

@jdmarshall
Copy link
Contributor

Fixes #671

@jdmarshall
Copy link
Contributor

@SimenB some of those numbers are my numbers because you don’t currently bench trunk against the PR.

return isOpenMetrics
? `${resolves.join('\n')}\n# EOF\n`
: `${resolves.join('\n\n')}\n`;
? `${results.join('\n')}\n# EOF\n`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have two branches with this same code block in it. I think they could probably be combined by using a list comprehension in the middle.

@SimenB
Copy link
Collaborator

SimenB commented Nov 27, 2025

@jdmarshall very happy with #728, so happy to land it so we can properly compare benchmarks before landing this one 👍

@jdmarshall
Copy link
Contributor

@mcollina I think if you rebase you'll get a new build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants