Conversation

@elle-j elle-j commented Nov 26, 2025

Description

Adds a workflow executing the benchmarks in resolc and revive-yul.

  • The workflow runs if the existing Test workflow (see the trigger sketch after this list):
    • Has completed successfully
    • Was triggered by a pull_request event
  • The benchmarks will run on the main branch and the PR branch
  • The result of the comparison is posted as a PR comment
    • The comment is updated if it exists, otherwise it is created
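
A minimal sketch of the trigger and gating condition (an illustration; the actual workflow file may differ in details):

on:
  workflow_run:
    workflows: [Test]
    types: [completed]

jobs:
  benchmark:
    # Run only when the Test workflow succeeded and was triggered by a pull request.
    if: github.event.workflow_run.conclusion == 'success' && github.event.workflow_run.event == 'pull_request'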

Notes

File must exist on the default branch

  • This workflow is triggered on a workflow_run event only if the workflow file exists on the default branch. Therefore, the workflow will not run for this PR. (See the section further below on how it has been tested regardless.)

The default branch workflow will be used

  • When the workflow does run, it uses the version on the default branch.
  • Con:
    • When all the checks of a PR have completed successfully, this Benchmark workflow will start without any indication in the PR that it is running.
    • Thus, it will not be part of the "Checks" for the triggering PR.
    • We may want to integrate some indication that a benchmark report will be added as a PR comment, so that merging can be held off when applicable.

Testing the Workflow

I've tested a sufficiently similar workflow in a personal repo, where it could be pushed to main and actually be triggered. Will verify again once this PR is merged.

  • Runs when the triggering workflow completes successfully
  • Does not run when the triggering workflow fails
  • workflow_run is triggered by a PR event
  • workflow_run is not triggered by a push event
  • A PR comment is created if it does not exist
  • A PR comment is updated if it already exists (see the sketch after this list)
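
A sketch of the create-or-update step, assuming the third-party peter-evans actions (an assumption; the actual implementation may differ, e.g. using actions/github-script):

      - name: Find Benchmark Comment
        uses: peter-evans/find-comment@v3
        id: find-comment
        with:
          # Note: workflow_run.pull_requests can be empty for PRs from forks.
          issue-number: ${{ github.event.workflow_run.pull_requests[0].number }}
          comment-author: 'github-actions[bot]'
          body-includes: Benchmarks
      - name: Create or Update Benchmark Comment
        uses: peter-evans/create-or-update-comment@v4
        with:
          comment-id: ${{ steps.find-comment.outputs.comment-id }}  # empty id creates a new comment
          issue-number: ${{ github.event.workflow_run.pull_requests[0].number }}
          body-path: benchmark-report.md  # illustrative file name
          edit-mode: replace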

Example PR Comment

Expand to see an example PR comment

Benchmarks

Compile E2E

Results
group                                            main_resolc                            pr_resolc
-----                                            -----------                            ---------
Dependency/resolc                                1.00    144.0±6.94ms        ? ?/sec    1.00    144.6±5.97ms        ? ?/sec
Dependency/solc                                  1.04    65.9±10.73ms        ? ?/sec    1.00    63.5±10.10ms        ? ?/sec
Empty/resolc                                     1.03     68.8±4.40ms        ? ?/sec    1.00     66.6±4.61ms        ? ?/sec
Empty/solc                                       1.13     11.2±2.01ms        ? ?/sec    1.00     10.0±2.28ms        ? ?/sec
LargeDivRem/resolc                               1.00    119.3±5.62ms        ? ?/sec    1.02   121.3±17.03ms        ? ?/sec
LargeDivRem/solc                                 1.15     24.7±3.15ms        ? ?/sec    1.00     21.5±1.03ms        ? ?/sec
Memset (`--yul`)/resolc                          1.02     62.7±3.92ms        ? ?/sec    1.00     61.7±3.76ms        ? ?/sec
Memset (`--yul`)/solc                            1.00      9.0±0.84ms        ? ?/sec    1.12     10.1±2.23ms        ? ?/sec
Multiple Contracts (`--standard-json`)/resolc    1.01  1559.2±68.95ms        ? ?/sec    1.00  1539.3±64.92ms        ? ?/sec
Multiple Contracts (`--standard-json`)/solc      1.01   652.0±19.38ms        ? ?/sec    1.00   648.4±31.83ms        ? ?/sec
Return (`--yul`)/resolc                          1.00    60.1±17.02ms        ? ?/sec    1.02    61.2±14.50ms        ? ?/sec
Return (`--yul`)/solc                            1.09      8.8±2.37ms        ? ?/sec    1.00      8.0±3.08ms        ? ?/sec

Parse Yul

Results
group             main_yul_parse                         pr_yul_parse
-----             --------------                         ------------
Baseline/parse    1.07      8.9±3.64µs        ? ?/sec    1.00      8.4±0.09µs        ? ?/sec
ERC20/parse       1.00   162.5±34.25µs        ? ?/sec    1.07  173.3±123.25µs        ? ?/sec
SHA1/parse        1.01    80.8±33.04µs        ? ?/sec    1.00    80.3±52.82µs        ? ?/sec
Storage/parse     1.00     16.9±8.15µs        ? ?/sec    1.01     17.2±7.54µs        ? ?/sec
Transfer/parse    1.00     18.5±0.23µs        ? ?/sec    1.23    22.7±18.50µs        ? ?/sec

Lower Yul

Results
group             main_yul_lower                         pr_yul_lower
-----             --------------                         ------------
Baseline/lower    1.00    122.1±5.95µs        ? ?/sec    1.09   133.6±35.12µs        ? ?/sec
ERC20/lower       1.00   641.7±13.24µs        ? ?/sec    1.15  735.6±665.92µs        ? ?/sec
SHA1/lower        1.00   361.4±78.47µs        ? ?/sec    1.09  392.3±114.48µs        ? ?/sec
Storage/lower     1.00   167.2±50.34µs        ? ?/sec    1.01   168.2±32.45µs        ? ?/sec
Transfer/lower    1.00    167.5±8.70µs        ? ?/sec    1.03   173.1±16.32µs        ? ?/sec

Resolved Issues

Closes #405

@elle-j elle-j changed the title [WIP] ci: Add GHA workflow executing benchmarks ci: Add GHA workflow executing benchmarks Nov 27, 2025
@elle-j elle-j marked this pull request as ready for review November 27, 2025 17:36
@elle-j elle-j requested review from kvpanch and xermicus November 27, 2025 17:36
          cargo bench --package resolc --bench compile -- --save-baseline main_resolc
          cargo bench --package revive-yul --bench parse -- --save-baseline main_yul_parse
          cargo bench --package revive-yul --bench lower -- --save-baseline main_yul_lower
        timeout-minutes: 20
Contributor Author

We could also remove the timeout-minutes and add it only if we notice unusually long runs. (This took about 6-7 minutes on an M4 Pro.)

@kvpanch kvpanch left a comment

Is it known whether the CI machine is suitable for this kind of perf testing? For example, can the CI machine run several tasks in parallel? If so, the perf results might not be stable, and it might be necessary to run such perf testing on a dedicated machine.

Comment on lines +46 to +47
git fetch origin main
git checkout -f main
Contributor

IMO, it makes sense to parameterize this by allowing a branch or tag to be specified to test against. That would make it easy to measure improvements/regressions over a given release.
Obviously, that can be done later, not in this PR.
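
A minimal sketch of that parameterization (BASELINE_REF is an illustrative name):

      - name: Checkout Baseline
        env:
          BASELINE_REF: main  # could become a workflow_dispatch input or repository variable
        run: |
          # Checking out FETCH_HEAD works uniformly for branches and tags.
          git fetch origin "$BASELINE_REF"
          git checkout -f FETCH_HEAD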

@elle-j elle-j Dec 2, 2025

That could be a neat and valuable feature, thanks. Can add it in a follow-up 👍

Regarding your other comment on concurrency: jobs and workflows (not steps within jobs) can run concurrently by default. Thanks for the reminder; it prompted me to add a concurrency group so that only one instance of this workflow runs at a time for a given PR branch (and in-progress runs are cancelled if the workflow is triggered again).

I'd still have to look into whether a different CI machine is better suited for perf testing, though.

@xermicus xermicus left a comment

lgtm overall


jobs:
  benchmark:
    runs-on: ubuntu-24.04
Member

I'd like to use the same host we benchmark the Polkadot SDK with.

Contributor Author

It seems different runners are used for benches in the Polkadot SDK, but two common ones are e.g. parity-large and parity-benchmark (both enabled in our revive repo). The differences can be seen here.
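
If we switch, it would be a one-line change, e.g.:

    runs-on: parity-large  # or parity-benchmark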

Comment on lines +21 to +24
      - name: Checkout PR Branch
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.workflow_run.head_sha }}
Member

Why are we checking out the PR branch here, only to check out the main branch afterwards and then switch back to the PR branch again?

Contributor Author

Essentially because of the get-solc and get-llvm actions used first. I figured we'd want to use those action files from the PR branch in case they were modified as part of the PR. An alternative would be to run those steps twice (once on main and again on the PR branch after switching); see the sketch below.
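
A sketch of the resulting step ordering (the local action paths are assumptions):

      - name: Checkout PR Branch
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.workflow_run.head_sha }}
      # Local composite actions are taken from the PR branch in case the PR modified them.
      - uses: ./.github/actions/get-solc  # path is an assumption
      - uses: ./.github/actions/get-llvm  # path is an assumption
      - name: Benchmark main
        run: |
          git fetch origin main
          git checkout -f main
          cargo bench --package resolc --bench compile -- --save-baseline main_resolc
      - name: Benchmark PR Branch
        run: |
          git checkout -f ${{ github.event.workflow_run.head_sha }}
          cargo bench --package resolc --bench compile -- --save-baseline pr_resolc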

    types: [completed]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.workflow_run.head_branch }}
Member

I'd appreciate a comment here explaining why this is needed and what it does.

Contributor Author

Can add a comment noting that it cancels in-progress runs (on the same branch as the PR) if the workflow is re-triggered while a run is ongoing (e.g. by another push to the PR followed by a successful Test workflow run).
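
For example (cancel-in-progress reflects the behavior described above):

concurrency:
  # Allow only one benchmark run at a time per PR branch; if the workflow is
  # re-triggered while a run is in progress (e.g. another push to the PR with a
  # successful Test run), cancel the in-progress run instead of queueing another.
  group: ${{ github.workflow }}-${{ github.event.workflow_run.head_branch }}
  cancel-in-progress: true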

## Parse Yul

<details>
<summary>Results</summary>
Member

Hiding the results is definitely what we want here so it doesn't get too noisy. Can we also have a summary or something on top that quickly tells PR authors whether the benchmarks regressed, improved, or showed no significant change?
