Skip to content

Conversation

@glevco
Copy link
Contributor

@glevco glevco commented Jul 4, 2025

Motivation

Current exception handling on nano contract execution has a few problems. It wraps any exception thrown during user code execution (that is, blueprint code) in an NCFail, and later any NCFail marks a tx execution as failed. This means bugs in our code would become wrapped exceptions and fail tx executions instead of crashing the full node, which would be the expected behavior.

This PR addresses this by recognizing that user-code and internal code should handle errors differently. If both use normal exception handling, it's pretty easy to accidentally include an internal bug in the eternal state of the blockchain.

It introduces a new Result type, inspired by Rust, for explicit error handling. All code from the consensus point where we call the runner down to actually executing user code with exec() is now handled with this type instead of raising exceptions. Since user code doesn't know about this type, exceptions are used to bridge internal code to user code and vice-versa, via raising/catching only on the boundaries: exceptions are raised on syscalls, and caught in exec().

Review Notes

  • Begin by reviewing the new hathor/utils/result.py file, to understand the new type.
  • Review the changes in metered_exec.py which are the innermost code handling exceptions when user code is executed with exec(). There are also comments explaining the new behavior in it.
  • Review runner.py.
  • Review all the rest which is basically adapting code from using exceptions to using Result.

Acceptance Criteria

  • Implement Result type for explicit error handling.
  • Refactor error handling on contract execution to use the Result type.
    • Code that should be updated (mostly Runner and other nano-related code) is changed to explicitly handle Results instead of exceptions.
    • Code that previously used exceptions and now receives results calls the unwrap_or_raise() method to bridge from a result to an exception. This includes for examples calls from APIs or test code.
    • Code that receives results that should never fail call unwrap() or expect().

Rationale and Alternatives

Using a custom Result type for error handling is something that we've considered for a while, and as it's a natural solution for the problem this PR addresses, I implemented it. I considered two libs:

  1. https://github.com/dry-python/returns - this is a pretty popular Python lib that implements not only a Result type but also other utility types. I tried it on branch refactor/nano/error-handling. It's a good lib and we may even reconsider using it in the future for its other features, but I didn't like the ergonomics of its Result type (it's not a sum type).
  2. https://github.com/rustedpy/result - an almost direct translation of Rust's Result type to Python. It's also a good lib and I liked its ergonomics, but it's unmaintained.

Considering this, and wanting to prevent new project dependencies, I decided to not use any of them and implement our own type. I did copy and adapt the implementation of the second one though, which is licensed under MIT. It's mostly the same, except for these differences:

  1. Removed deprecated properties.
  2. Rename unwrap_or_raise to unwrap_or_raise_another, because I implemented unwrap_or_raise with another behavior (analogous to the unmerged feat: add unwrap_or_raise_itself() if a value of Err is a BaseException rustedpy/result#199).
  3. Remove do-notation support because it's too esoteric for Python and we don't need it.
  4. Add support for capturing tracebacks on Results when the error type is an exception.
  5. Implement unwrap_or_propagate which emulates Rust's question mark operator, for better ergonomics (adapted from the unmerged feat: Emulate Rust's question mark operator rustedpy/result#197).

Risks

Incorrectly wrapped exceptions

Before this PR, ALL exceptions raised during contract execution would become part of the blockchain state forever through the tx execution state. This problem has been mitigated, because now only explicitly handled exceptions (with the Result type) become part of the tx failure. Any other exceptions, for example from bugs or asserts, will now crash the full node as expected.

This means we can still incorrectly make a bug part of the blockchain state if we handle an exception as a Result when it should be an assertion instead, for example. The difference is that now it's explicit. The reviewers should pay special attention to identify these cases — whether there's any instance where we handle a Result error but should use unwrap() instead.

Serialization

The serialization system is too extensive and raises a lot of exceptions internally. Instead of refactoring the whole module to use Results too, I just handled them on the boundary since nano execution only calls it in a single place.

For that, I identified it can raise 3 exception types: SerializationError, ValueError, TypeError. This means that any other exceptions thrown in the serialization module can crash the full node (which is way better than becoming part of the blockchain, as it would before). If there are any other know exceptions that can be raised, we should include them. But it's hard to do better than grepping for raises.

It also means that any ValueError or TypeError raised by Python itself will also become a reason for tx execution failure, even if it's caused by a bug in internal code. Ideally we should refactor the module to use specific exceptions, and even better, to use Results too.

Checklist

  • If you are requesting a merge into master, confirm this code is production-ready and can be included in future releases as soon as it gets merged

@glevco glevco self-assigned this Jul 4, 2025
@glevco glevco requested review from jansegre and msbrogli as code owners July 4, 2025 03:17
@glevco glevco moved this from Todo to In Progress (WIP) in Hathor Network Jul 4, 2025
@glevco glevco force-pushed the refactor/nano/error-handling-no-lib branch 2 times, most recently from 8965d3e to b18e20d Compare July 4, 2025 03:27
@github-actions
Copy link

github-actions bot commented Jul 4, 2025

🐰 Bencher Report

Branchrefactor/nano/error-handling-no-lib
Testbedubuntu-22.04
Click to view all benchmark results
BenchmarkLatencyBenchmark Result
minutes (m)
(Result Δ%)
Lower Boundary
minutes (m)
(Limit %)
Upper Boundary
minutes (m)
(Limit %)
sync-v2 (up to 20000 blocks)📈 view plot
🚷 view threshold
1.69 m
(+3.03%)Baseline: 1.64 m
1.47 m
(87.35%)
1.80 m
(93.66%)
🐰 View full continuous benchmarking report in Bencher

@glevco glevco force-pushed the refactor/nano/error-handling-no-lib branch 3 times, most recently from 8864a4b to fc7fa9c Compare July 4, 2025 16:30
@codecov
Copy link

codecov bot commented Jul 4, 2025

Codecov Report

Attention: Patch coverage is 80.78125% with 123 lines in your changes missing coverage. Please review.

Project coverage is 85.40%. Comparing base (e417e0e) to head (69b525f).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
hathor/utils/result.py 63.92% 77 Missing and 2 partials ⚠️
hathor/nanocontracts/runner/runner.py 89.23% 14 Missing ⚠️
hathor/transaction/headers/nano_header.py 84.61% 5 Missing and 1 partial ⚠️
hathor/nanocontracts/runner/types.py 73.33% 3 Missing and 1 partial ⚠️
hathor/transaction/storage/transaction_storage.py 84.61% 3 Missing and 1 partial ⚠️
hathor/consensus/block_consensus.py 87.50% 2 Missing and 1 partial ⚠️
hathor/consensus/consensus.py 70.00% 2 Missing and 1 partial ⚠️
hathor/nanocontracts/resources/blueprint.py 72.72% 2 Missing and 1 partial ⚠️
hathor/nanocontracts/balance_rules.py 94.44% 2 Missing ⚠️
hathor/nanocontracts/context.py 83.33% 2 Missing ⚠️
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1311      +/-   ##
==========================================
- Coverage   85.63%   85.40%   -0.23%     
==========================================
  Files         426      427       +1     
  Lines       31947    32265     +318     
  Branches     4962     5015      +53     
==========================================
+ Hits        27358    27557     +199     
- Misses       3590     3696     +106     
- Partials      999     1012      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@glevco glevco force-pushed the refactor/nano/error-handling-no-lib branch from fc7fa9c to 7c8257a Compare July 5, 2025 00:16
Comment on lines -61 to -62
class NCSerializationTypeError(NCSerializationError):
pass
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was unused.

Comment on lines +570 to 575
# TODO: Review the `from_callable` call. It contains some `raise TypeError`
# that may be caused by user input and crash the full node.
parser = Method.from_callable(method)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

Comment on lines +195 to +197
# TODO: Check this expect - can I create a situation where it fails?
self.storage.create_token(TokenUid(td.token_id), td.token_name, td.token_symbol) \
.expect('this err is already handled in the create_token syscall, so it\'s safe to unwrap here')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

Comment on lines +1231 to +1235
raise OCBInvalidBlueprintVertexType(blueprint_id.hex()) # TODO
tx_meta = blueprint_tx.get_metadata()
if tx_meta.voided_by or not tx_meta.first_block:
raise OCBBlueprintNotConfirmed(blueprint_id.hex())
raise OCBBlueprintNotConfirmed(blueprint_id.hex()) # TODO
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it as simple as adding Err(...)? I'm wondering why it was left as a to do.

Comment on lines +257 to +260
except OutOfFuelError as e: # TODO
self.log.error('loading blueprint module failed, fuel limit exceeded')
raise OCBOutOfFuelDuringLoading from e
except OutOfMemoryError as e:
except OutOfMemoryError as e: # TODO
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

@glevco glevco force-pushed the refactor/nano/error-handling-no-lib branch 2 times, most recently from babb76b to 22adadd Compare July 7, 2025 16:10
@glevco glevco moved this from In Progress (WIP) to In Progress (Done) in Hathor Network Jul 7, 2025
def _create_changes_tracker(self, contract_id: ContractId) -> Result[NCChangesTracker, NanoContractDoesNotExist]:
"""Return the latest change tracker for a contract."""
nc_storage = self.get_current_changes_tracker_or_storage(contract_id)
nc_storage = self.get_current_changes_tracker_or_storage(contract_id).unwrap_or_propagate()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an unwrap() instead?

# This ensures that, even if the blueprint method attempts to exploit or alter the context, it cannot
# impact the original context. Since the runner relies on the context for other critical checks, any
# unauthorized modification would pose a serious security risk.
copied_context = ctx.copy().unwrap_or_propagate()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an unwrap() instead?

storage = self._storages.get(contract_id)
if storage is None:
storage = self.block_storage.get_contract_storage(contract_id)
storage = self.block_storage.get_contract_storage(contract_id).unwrap_or_propagate()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an unwrap() instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still confusing to me on which one should be used in each case.


__all__ = [
'__version__',
'HATHOR_DIR',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

A type that represents all possible errors that can happen during a nano contract method execution,
which can be either an NCFail or un unhandled exception caused by blueprint code (NCRuntimeFailure).
"""
NCFailure: TypeAlias = NCFail | NCRuntimeFailure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird name since there's a NCFail exception too.


# Any other exception is considered an unhandled exception,
# and is wrapped in an Err via an NCRuntimeFailure.
failure = NCRuntimeFailure()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename it to NCUnhandledException?

try:
env = metered_executor.exec(self.code.text)
except OutOfFuelError as e:
except OutOfFuelError as e: # TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?!

self.nc_id = ContractId(VertexId(b'1' * 32))
self.initialize_contract()
self.nc_storage = self.runner.get_storage(self.nc_id)
self.nc_storage = self.runner.get_storage(self.nc_id).unwrap_or_raise()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should expose this unwrap_or_raise() to blueprint unit tests.

Comment on lines +260 to +261
from hathor import HATHOR_DIR
from tests import TESTS_DIR # skip-import-tests-custom-check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

TBE = TypeVar('TBE', bound=BaseException)


class Ok(Generic[T]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the docstrings of the methods in this class are kind of useless. They say what they do (e.g., "Return the value.") but it doesn't explain when this method should be used. A better docstring at the class level may solve this issue.

Comment on lines +1231 to +1235
raise OCBInvalidBlueprintVertexType(blueprint_id.hex()) # TODO
tx_meta = blueprint_tx.get_metadata()
if tx_meta.voided_by or not tx_meta.first_block:
raise OCBBlueprintNotConfirmed(blueprint_id.hex())
raise OCBBlueprintNotConfirmed(blueprint_id.hex()) # TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it as simple as adding Err(...)? I'm wondering why it was left as a to do.

Comment on lines -182 to +193
self.create_contract_with_nc_args(contract_id, blueprint_id, context, nc_args)
else:
self.call_public_method_with_nc_args(contract_id, nano_header.nc_method, context, nc_args)
return self.create_contract_with_nc_args(contract_id, blueprint_id, context, nc_args)

return self.call_public_method_with_nc_args(contract_id, nano_header.nc_method, context, nc_args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep the else block here since it will always run one or the other.

storage = self._storages.get(contract_id)
if storage is None:
storage = self.block_storage.get_contract_storage(contract_id)
storage = self.block_storage.get_contract_storage(contract_id).unwrap_or_propagate()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still confusing to me on which one should be used in each case.

@github-project-automation github-project-automation bot moved this from In Progress (Done) to In Review (WIP) in Hathor Network Jul 8, 2025
@glevco glevco force-pushed the refactor/nano/error-handling-no-lib branch from 22adadd to 69b525f Compare July 11, 2025 13:59
@glevco glevco moved this from In Review (WIP) to In Progress (WIP) in Hathor Network Jul 11, 2025
@glevco glevco force-pushed the refactor/nano/error-handling-no-lib branch from 0c50b3a to 80197fb Compare July 16, 2025 15:18
@glevco
Copy link
Contributor Author

glevco commented Jul 16, 2025

Replaced by #1321

@glevco glevco closed this Jul 16, 2025
@github-project-automation github-project-automation bot moved this from In Progress (WIP) to Waiting to be deployed in Hathor Network Jul 16, 2025
@glevco glevco deleted the refactor/nano/error-handling-no-lib branch July 16, 2025 21:27
@glevco glevco moved this from Waiting to be deployed to Done in Hathor Network Jul 16, 2025
@glevco glevco mentioned this pull request Jul 17, 2025
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants