refactor(nano): improve error handling #1311

glevco · 2025-07-04T03:17:43Z

Motivation

Current exception handling on nano contract execution has a few problems. It wraps any exception thrown during user code execution (that is, blueprint code) in an NCFail, and any NCFail later marks the tx execution as failed. This means a bug in our own code gets wrapped like any other exception and fails the tx execution, instead of crashing the full node, which is the expected behavior.

This PR addresses this by recognizing that user code and internal code should handle errors differently. If both use normal exception handling, it's pretty easy to accidentally include an internal bug in the eternal state of the blockchain.

It introduces a new Result type, inspired by Rust, for explicit error handling. All code from the point in consensus where we call the runner, down to the actual execution of user code with exec(), now uses this type instead of raising exceptions. Since user code doesn't know about this type, exceptions bridge internal code to user code and vice versa, and they are raised or caught only at the boundaries: exceptions are raised in syscalls and caught in exec().
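A minimal sketch of the two boundaries described above, assuming simplified stand-ins: Ok, Err and Result are reduced to plain dataclasses, and NCFail and NCRuntimeFailure are placeholder classes named after the PR's exceptions, not the real hathor classes. syscall_withdraw and exec_user_code are hypothetical names. The syscall raises because blueprint code only understands exceptions, while exec_user_code catches whatever escapes exec() and returns an Err, so code above it only ever sees Results.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Generic, TypeVar, Union

T = TypeVar('T')
E = TypeVar('E')


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


# A Result is exactly one of Ok or Err (a sum type).
Result = Union[Ok[T], Err[E]]


class NCFail(Exception):
    """Stand-in for the failure exception blueprint code can raise and catch."""


class NCRuntimeFailure(Exception):
    """Stand-in wrapping any other unhandled exception raised by blueprint code."""


def syscall_withdraw(balance: int, amount: int) -> int:
    # Internal -> user boundary: syscalls *raise*, because blueprint
    # code only knows about exceptions.
    if amount > balance:
        raise NCFail('insufficient balance')
    return balance - amount


def exec_user_code(source: str, env: dict[str, Any]) -> Result[dict[str, Any], NCFail | NCRuntimeFailure]:
    # User -> internal boundary: everything escaping exec() is caught here
    # and turned into an Err, so code above this point only sees Results.
    try:
        exec(source, env)
    except NCFail as e:
        return Err(e)
    except Exception as e:
        failure = NCRuntimeFailure(repr(e))
        failure.__cause__ = e  # keep the original error for debugging
        return Err(failure)
    return Ok(env)
```

For example, running "x = syscall_withdraw(10, 3)" returns an Ok whose environment has x == 7, while "1 / 0" returns an Err holding an NCRuntimeFailure whose __cause__ is the original ZeroDivisionError.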

Review Notes

Begin by reviewing the new hathor/utils/result.py file to understand the new type.
Review the changes in metered_exec.py, which contains the innermost code that handles exceptions when user code is executed with exec(). It also includes comments explaining the new behavior.
Review runner.py.
Review the rest, which mostly adapts code from using exceptions to using Result.

Acceptance Criteria

Implement Result type for explicit error handling.
Refactor error handling on contract execution to use the Result type.
- Code that should be updated (mostly Runner and other nano-related code) is changed to explicitly handle Results instead of exceptions.
- Code that previously used exceptions and now receives Results calls the unwrap_or_raise() method to bridge from a Result to an exception. This includes, for example, calls from APIs or test code.
- Code that receives Results that should never fail calls unwrap() or expect().
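A sketch of the three ways of consuming a Result listed above, assuming simplified Ok and Err classes and a hypothetical get_storage() lookup; the UnwrapError name is also an assumption. unwrap() and expect() treat an Err as a bug, while unwrap_or_raise() re-raises the error itself so exception-based callers such as APIs and tests keep working unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, NoReturn, TypeVar, Union

T = TypeVar('T')
E = TypeVar('E', bound=BaseException)


class UnwrapError(Exception):
    """Raised when unwrap()/expect() is called on an Err (assumed name)."""


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T

    def unwrap(self) -> T:
        return self.value

    def expect(self, message: str) -> T:
        return self.value

    def unwrap_or_raise(self) -> T:
        return self.value


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E

    def unwrap(self) -> NoReturn:
        # Reaching this means an invariant was violated: a bug, not a tx failure.
        raise UnwrapError(f'called unwrap() on Err: {self.error!r}') from self.error

    def expect(self, message: str) -> NoReturn:
        raise UnwrapError(f'{message}: {self.error!r}') from self.error

    def unwrap_or_raise(self) -> NoReturn:
        # Bridge back to exception-based code (APIs, tests): raise the error itself.
        raise self.error


Result = Union[Ok[T], Err[E]]


class ContractNotFound(Exception):
    pass


def get_storage(contracts: dict[str, dict], contract_id: str) -> Result[dict, ContractNotFound]:
    # Hypothetical lookup that reports a missing contract as an Err.
    if contract_id not in contracts:
        return Err(ContractNotFound(contract_id))
    return Ok(contracts[contract_id])
```

In this sketch, get_storage(contracts, 'abc').unwrap() returns the storage dict, get_storage(contracts, 'zzz').unwrap_or_raise() raises ContractNotFound, and .expect('contract must exist') raises UnwrapError chained to it.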

Rationale and Alternatives

Using a custom Result type for error handling is something that we've considered for a while, and as it's a natural solution for the problem this PR addresses, I implemented it. I considered two libs:

https://github.com/dry-python/returns - this is a pretty popular Python lib that implements not only a Result type but also other utility types. I tried it on branch refactor/nano/error-handling. It's a good lib and we may even reconsider using it in the future for its other features, but I didn't like the ergonomics of its Result type (it's not a sum type).
https://github.com/rustedpy/result - an almost direct translation of Rust's Result type to Python. It's also a good lib and I liked its ergonomics, but it's unmaintained.

Considering this, and wanting to avoid new project dependencies, I decided not to use either of them and to implement our own type. I did, however, copy and adapt the implementation of the second one, which is licensed under MIT. It's mostly the same, except for these differences:

Remove deprecated properties.
Rename unwrap_or_raise to unwrap_or_raise_another, because I implemented unwrap_or_raise with a different behavior (analogous to the unmerged rustedpy/result#199, "feat: add unwrap_or_raise_itself() if a value of Err is a BaseException").
Remove do-notation support, because it's too esoteric for Python and we don't need it.
Add support for capturing tracebacks on Results when the error type is an exception.
Implement unwrap_or_propagate, which emulates Rust's question mark operator for better ergonomics (adapted from the unmerged rustedpy/result#197, "feat: Emulate Rust's question mark operator").
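One possible way to emulate Rust's question mark operator in Python, sketched under assumptions: this is not necessarily how the PR or rustedpy/result#197 implements unwrap_or_propagate. Here Err.unwrap_or_propagate() raises a private control-flow exception, and a hypothetical @propagate_errors decorator catches it and returns the Err as the function's result.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass
from typing import Any, Callable, Generic, NoReturn, TypeVar, Union

T = TypeVar('T')
E = TypeVar('E')
F = TypeVar('F', bound=Callable[..., Any])


class _Propagate(Exception):
    """Private control-flow exception carrying an Err up to the decorator."""

    def __init__(self, err: Err[Any]) -> None:
        super().__init__()
        self.err = err


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T

    def unwrap_or_propagate(self) -> T:
        return self.value


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E

    def unwrap_or_propagate(self) -> NoReturn:
        # Like Rust's `?` on an Err: leave the function early with this Err.
        raise _Propagate(self)


Result = Union[Ok[T], Err[E]]


def propagate_errors(func: F) -> F:
    """Return an Err escaping via unwrap_or_propagate() as the function's result."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except _Propagate as p:
            return p.err

    return wrapper  # type: ignore[return-value]


def parse_int(s: str) -> Result[int, ValueError]:
    try:
        return Ok(int(s))
    except ValueError as e:
        return Err(e)


@propagate_errors
def add(a: str, b: str) -> Result[int, ValueError]:
    # Roughly Rust's `Ok(parse(a)? + parse(b)?)`.
    return Ok(parse_int(a).unwrap_or_propagate() + parse_int(b).unwrap_or_propagate())
```

Unlike Rust's ?, this relies on an exception unwinding the stack, so the Err is returned by the nearest decorated caller, and a broad except Exception between the two would swallow it.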

Risks

Incorrectly wrapped exceptions

Before this PR, ALL exceptions raised during contract execution would become part of the blockchain state forever through the tx execution state. This problem has been mitigated, because now only explicitly handled exceptions (with the Result type) become part of the tx failure. Any other exceptions, for example from bugs or asserts, will now crash the full node as expected.
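A sketch of the consensus-side behavior described above, with hypothetical names (execute_nano_tx, apply_to_block) and a simplified Result: an expected failure is returned as an Err and recorded in the execution log, while a violated internal invariant raises and is deliberately not caught. The invariant uses an explicit raise AssertionError rather than assert so it still fires under python -O.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar('T')
E = TypeVar('E')


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


Result = Union[Ok[T], Err[E]]


class NCFail(Exception):
    """Stand-in for an explicitly handled, consensus-visible failure."""


def execute_nano_tx(amount: int, balance: int) -> Result[int, NCFail]:
    if balance < 0:
        # Internal invariant: only a bug in our own code can get here.
        raise AssertionError('internal bug: negative balance')
    if amount > balance:
        # Expected failure caused by the tx itself: returned, not raised.
        return Err(NCFail('insufficient balance'))
    return Ok(balance - amount)


def apply_to_block(execution_log: list[str], amount: int, balance: int) -> None:
    result = execute_nano_tx(amount, balance)
    if isinstance(result, Err):
        # Only explicit Err values become part of the tx execution state.
        execution_log.append(f'failed: {result.error}')
    else:
        execution_log.append(f'success: new balance {result.value}')
    # No try/except here: an AssertionError from execute_nano_tx propagates
    # and crashes the node instead of being recorded as a tx failure.
```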

This means we can still incorrectly make a bug part of the blockchain state if we handle an exception as a Result when it should be an assertion instead, for example. The difference is that now it's explicit. The reviewers should pay special attention to identify these cases — whether there's any instance where we handle a Result error but should use unwrap() instead.

Serialization

The serialization system is too extensive and raises a lot of exceptions internally. Instead of refactoring the whole module to use Results too, I just handled them at the boundary, since nano execution only calls it in a single place.

For that, I identified three exception types it can raise: SerializationError, ValueError, and TypeError. This means that any other exception thrown in the serialization module will crash the full node (which is way better than becoming part of the blockchain, as it would before). If there are any other known exceptions that can be raised, we should include them, but it's hard to do better than grepping for raises.

It also means that any ValueError or TypeError raised by Python itself will also become a reason for tx execution failure, even if it's caused by a bug in internal code. Ideally we should refactor the module to use specific exceptions, and even better, to use Results too.
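A sketch of the serialization boundary, assuming a toy exception-based deserializer in place of the real module. SerializationError is a stand-in, NCSerializationError is a simplified stand-in named after the existing class in hathor/nanocontracts/exception.py, and deserialize_u32 and deserialize_args are hypothetical names. Only the three known exception types become an Err; this also captures a TypeError caused by a bug, which is exactly the trade-off described above.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar('T')
E = TypeVar('E')


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


Result = Union[Ok[T], Err[E]]


class SerializationError(Exception):
    """Stand-in for the serialization module's own exception type."""


class NCSerializationError(Exception):
    """Simplified stand-in for the nano-level serialization failure."""


# The exception types the exception-based serialization code is known to raise.
KNOWN_SERIALIZATION_ERRORS = (SerializationError, ValueError, TypeError)


def deserialize_u32(data: bytes) -> int:
    # Toy exception-based code standing in for the real serialization module.
    if len(data) != 4:
        raise SerializationError(f'expected 4 bytes, got {len(data)}')
    return struct.unpack('>I', data)[0]


def deserialize_args(data: bytes) -> Result[int, NCSerializationError]:
    # Single boundary: only the known exception types become an Err.
    # Anything else (e.g. a KeyError from a bug) propagates and crashes the node.
    try:
        return Ok(deserialize_u32(data))
    except KNOWN_SERIALIZATION_ERRORS as e:
        err = NCSerializationError(str(e))
        err.__cause__ = e
        return Err(err)
```

Passing a str instead of bytes makes struct.unpack raise TypeError, which this boundary records as a tx failure even though it is really a caller bug.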

Checklist

If you are requesting a merge into master, confirm this code is production-ready and can be included in future releases as soon as it gets merged

github-actions · 2025-07-04T03:35:53Z

Bencher Report

Branch	refactor/nano/error-handling-no-lib
Testbed	ubuntu-22.04

Benchmark	Measure	Result, minutes (Δ% vs. baseline)	Lower boundary (limit %)	Upper boundary (limit %)
sync-v2 (up to 20000 blocks)	Latency	1.69 m (+3.03%; baseline 1.64 m)	1.47 m (87.35%)	1.80 m (93.66%)

codecov · 2025-07-04T16:58:54Z

Codecov Report

Attention: Patch coverage is 80.78125% with 123 lines in your changes missing coverage. Please review.

Project coverage is 85.40%. Comparing base (e417e0e) to head (69b525f).
Report is 1 commit behind head on master.

Files with missing lines	Patch %	Lines
hathor/utils/result.py	63.92%	77 Missing and 2 partials ⚠️
hathor/nanocontracts/runner/runner.py	89.23%	14 Missing ⚠️
hathor/transaction/headers/nano_header.py	84.61%	5 Missing and 1 partial ⚠️
hathor/nanocontracts/runner/types.py	73.33%	3 Missing and 1 partial ⚠️
hathor/transaction/storage/transaction_storage.py	84.61%	3 Missing and 1 partial ⚠️
hathor/consensus/block_consensus.py	87.50%	2 Missing and 1 partial ⚠️
hathor/consensus/consensus.py	70.00%	2 Missing and 1 partial ⚠️
hathor/nanocontracts/resources/blueprint.py	72.72%	2 Missing and 1 partial ⚠️
hathor/nanocontracts/balance_rules.py	94.44%	2 Missing ⚠️
hathor/nanocontracts/context.py	83.33%	2 Missing ⚠️
... and 2 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1311      +/-   ##
==========================================
- Coverage   85.63%   85.40%   -0.23%     
==========================================
  Files         426      427       +1     
  Lines       31947    32265     +318     
  Branches     4962     5015      +53     
==========================================
+ Hits        27358    27557     +199     
- Misses       3590     3696     +106     
- Partials      999     1012      +13

glevco · 2025-07-07T14:44:38Z

hathor/nanocontracts/exception.py

-class NCSerializationTypeError(NCSerializationError):
-    pass


This was unused.

glevco · 2025-07-07T15:18:32Z

hathor/nanocontracts/runner/runner.py

+                    # TODO: Review the `from_callable` call. It contains some `raise TypeError`
+                    #  that may be caused by user input and crash the full node.
                    parser = Method.from_callable(method)


glevco · 2025-07-07T15:29:33Z

hathor/nanocontracts/storage/changes_tracker.py

+            # TODO: Check this expect - can I create a situation where it fails?
+            self.storage.create_token(TokenUid(td.token_id), td.token_name, td.token_symbol) \
+                .expect('this err is already handled in the create_token syscall, so it\'s safe to unwrap here')


glevco · 2025-07-07T15:40:18Z

hathor/transaction/storage/transaction_storage.py

+            raise OCBInvalidBlueprintVertexType(blueprint_id.hex())  # TODO
        tx_meta = blueprint_tx.get_metadata()
        if tx_meta.voided_by or not tx_meta.first_block:
-            raise OCBBlueprintNotConfirmed(blueprint_id.hex())
+            raise OCBBlueprintNotConfirmed(blueprint_id.hex())  # TODO


Isn't it as simple as adding Err(...)? I'm wondering why it was left as a to do.

glevco · 2025-07-07T15:43:57Z

hathor/nanocontracts/on_chain_blueprint.py

+        except OutOfFuelError as e:  # TODO
            self.log.error('loading blueprint module failed, fuel limit exceeded')
            raise OCBOutOfFuelDuringLoading from e
-        except OutOfMemoryError as e:
+        except OutOfMemoryError as e:  # TODO


glevco · 2025-07-07T17:35:52Z

hathor/nanocontracts/runner/runner.py

+    def _create_changes_tracker(self, contract_id: ContractId) -> Result[NCChangesTracker, NanoContractDoesNotExist]:
        """Return the latest change tracker for a contract."""
-        nc_storage = self.get_current_changes_tracker_or_storage(contract_id)
+        nc_storage = self.get_current_changes_tracker_or_storage(contract_id).unwrap_or_propagate()


Should this be an unwrap() instead?

glevco · 2025-07-07T17:37:30Z

hathor/nanocontracts/runner/runner.py

+        # This ensures that, even if the blueprint method attempts to exploit or alter the context, it cannot
+        # impact the original context. Since the runner relies on the context for other critical checks, any
+        # unauthorized modification would pose a serious security risk.
+        copied_context = ctx.copy().unwrap_or_propagate()


Should this be an unwrap() instead?

glevco · 2025-07-07T17:38:50Z

hathor/nanocontracts/runner/runner.py

        storage = self._storages.get(contract_id)
        if storage is None:
-            storage = self.block_storage.get_contract_storage(contract_id)
+            storage = self.block_storage.get_contract_storage(contract_id).unwrap_or_propagate()


Should this be an unwrap() instead?

It's still confusing to me on which one should be used in each case.

msbrogli · 2025-07-08T21:07:00Z

hathor/__init__.py

+
+__all__ = [
+    '__version__',
+    'HATHOR_DIR',


Why this change?

msbrogli · 2025-07-08T21:13:47Z

hathor/nanocontracts/exception.py

+A type that represents all possible errors that can happen during a nano contract method execution,
+which can be either an NCFail or an unhandled exception caused by blueprint code (NCRuntimeFailure).
+"""
+NCFailure: TypeAlias = NCFail | NCRuntimeFailure


Weird name, since there's an NCFail exception too.

msbrogli · 2025-07-08T21:14:47Z

hathor/nanocontracts/metered_exec.py

+
+            # Any other exception is considered an unhandled exception,
+            # and is wrapped in an Err via an NCRuntimeFailure.
+            failure = NCRuntimeFailure()


Maybe rename it to NCUnhandledException?

msbrogli · 2025-07-08T21:17:08Z

hathor/nanocontracts/on_chain_blueprint.py

        try:
            env = metered_executor.exec(self.code.text)
-        except OutOfFuelError as e:
+        except OutOfFuelError as e:  # TODO


msbrogli · 2025-07-08T21:24:22Z

tests/nanocontracts/blueprints/test_bet.py

        self.nc_id = ContractId(VertexId(b'1' * 32))
        self.initialize_contract()
-        self.nc_storage = self.runner.get_storage(self.nc_id)
+        self.nc_storage = self.runner.get_storage(self.nc_id).unwrap_or_raise()


I'm not sure we should expose this unwrap_or_raise() to blueprint unit tests.

msbrogli · 2025-07-08T21:39:41Z

hathor/utils/result.py

+        from hathor import HATHOR_DIR
+        from tests import TESTS_DIR  # skip-import-tests-custom-check


Why do we need this?

msbrogli · 2025-07-08T21:41:41Z

hathor/utils/result.py

+TBE = TypeVar('TBE', bound=BaseException)
+
+
+class Ok(Generic[T]):


I guess the docstrings of the methods in this class are kind of useless. They say what each method does (e.g., "Return the value.") but don't explain when it should be used. A better docstring at the class level may solve this issue.

msbrogli · 2025-07-08T21:42:20Z

hathor/transaction/storage/transaction_storage.py

+            raise OCBInvalidBlueprintVertexType(blueprint_id.hex())  # TODO
        tx_meta = blueprint_tx.get_metadata()
        if tx_meta.voided_by or not tx_meta.first_block:
-            raise OCBBlueprintNotConfirmed(blueprint_id.hex())
+            raise OCBBlueprintNotConfirmed(blueprint_id.hex())  # TODO


Isn't it as simple as adding Err(...)? I'm wondering why it was left as a to do.

msbrogli · 2025-07-08T21:44:27Z

hathor/nanocontracts/runner/runner.py

-            self.create_contract_with_nc_args(contract_id, blueprint_id, context, nc_args)
-        else:
-            self.call_public_method_with_nc_args(contract_id, nano_header.nc_method, context, nc_args)
+            return self.create_contract_with_nc_args(contract_id, blueprint_id, context, nc_args)
+
+        return self.call_public_method_with_nc_args(contract_id, nano_header.nc_method, context, nc_args)


I'd rather keep the else block here since it will always run one or the other.

msbrogli · 2025-07-08T21:44:51Z

hathor/nanocontracts/runner/runner.py

        storage = self._storages.get(contract_id)
        if storage is None:
-            storage = self.block_storage.get_contract_storage(contract_id)
+            storage = self.block_storage.get_contract_storage(contract_id).unwrap_or_propagate()


It's still confusing to me on which one should be used in each case.

glevco · 2025-07-16T21:27:37Z

Replaced by #1321

glevco self-assigned this Jul 4, 2025

glevco requested review from jansegre and msbrogli as code owners July 4, 2025 03:17

glevco added this to Hathor Network Jul 4, 2025

github-project-automation bot moved this to Todo in Hathor Network Jul 4, 2025

glevco moved this from Todo to In Progress (WIP) in Hathor Network Jul 4, 2025

glevco force-pushed the refactor/nano/error-handling-no-lib branch 2 times, most recently from 8965d3e to b18e20d July 4, 2025 03:27

glevco force-pushed the refactor/nano/error-handling-no-lib branch 3 times, most recently from 8864a4b to fc7fa9c July 4, 2025 16:30

glevco force-pushed the refactor/nano/error-handling-no-lib branch from fc7fa9c to 7c8257a July 5, 2025 00:16

glevco commented Jul 7, 2025

glevco force-pushed the refactor/nano/error-handling-no-lib branch 2 times, most recently from babb76b to 22adadd July 7, 2025 16:10

glevco moved this from In Progress (WIP) to In Progress (Done) in Hathor Network Jul 7, 2025

glevco commented Jul 7, 2025

msbrogli requested changes Jul 8, 2025

github-project-automation bot moved this from In Progress (Done) to In Review (WIP) in Hathor Network Jul 8, 2025

glevco force-pushed the refactor/nano/error-handling-no-lib branch from 22adadd to 69b525f July 11, 2025 13:59

glevco moved this from In Review (WIP) to In Progress (WIP) in Hathor Network Jul 11, 2025

glevco added 2 commits July 16, 2025 12:18

refactor(nano): improve error handling

42ba3b5

wip

80197fb

glevco force-pushed the refactor/nano/error-handling-no-lib branch from 0c50b3a to 80197fb July 16, 2025 15:18

glevco closed this Jul 16, 2025

github-project-automation bot moved this from In Progress (WIP) to Waiting to be deployed in Hathor Network Jul 16, 2025

glevco deleted the refactor/nano/error-handling-no-lib branch July 16, 2025 21:27

glevco moved this from Waiting to be deployed to Done in Hathor Network Jul 16, 2025

glevco mentioned this pull request Jul 17, 2025

fix(nano): improve error handling #1321

Closed

1 task

3 participants
