-
Notifications
You must be signed in to change notification settings - Fork 32
refactor(nano): improve error handling #1311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8965d3e to
b18e20d
Compare
|
| Branch | refactor/nano/error-handling-no-lib |
| Testbed | ubuntu-22.04 |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result minutes (m) (Result Δ%) | Lower Boundary minutes (m) (Limit %) | Upper Boundary minutes (m) (Limit %) |
|---|---|---|---|---|
| sync-v2 (up to 20000 blocks) | 📈 view plot 🚷 view threshold | 1.69 m(+3.03%)Baseline: 1.64 m | 1.47 m (87.35%) | 1.80 m (93.66%) |
8864a4b to
fc7fa9c
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1311 +/- ##
==========================================
- Coverage 85.63% 85.40% -0.23%
==========================================
Files 426 427 +1
Lines 31947 32265 +318
Branches 4962 5015 +53
==========================================
+ Hits 27358 27557 +199
- Misses 3590 3696 +106
- Partials 999 1012 +13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
fc7fa9c to
7c8257a
Compare
| class NCSerializationTypeError(NCSerializationError): | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was unused.
| # TODO: Review the `from_callable` call. It contains some `raise TypeError` | ||
| # that may be caused by user input and crash the full node. | ||
| parser = Method.from_callable(method) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
| # TODO: Check this expect - can I create a situation where it fails? | ||
| self.storage.create_token(TokenUid(td.token_id), td.token_name, td.token_symbol) \ | ||
| .expect('this err is already handled in the create_token syscall, so it\'s safe to unwrap here') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
| raise OCBInvalidBlueprintVertexType(blueprint_id.hex()) # TODO | ||
| tx_meta = blueprint_tx.get_metadata() | ||
| if tx_meta.voided_by or not tx_meta.first_block: | ||
| raise OCBBlueprintNotConfirmed(blueprint_id.hex()) | ||
| raise OCBBlueprintNotConfirmed(blueprint_id.hex()) # TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it as simple as adding Err(...)? I'm wondering why it was left as a to do.
| except OutOfFuelError as e: # TODO | ||
| self.log.error('loading blueprint module failed, fuel limit exceeded') | ||
| raise OCBOutOfFuelDuringLoading from e | ||
| except OutOfMemoryError as e: | ||
| except OutOfMemoryError as e: # TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
babb76b to
22adadd
Compare
| def _create_changes_tracker(self, contract_id: ContractId) -> Result[NCChangesTracker, NanoContractDoesNotExist]: | ||
| """Return the latest change tracker for a contract.""" | ||
| nc_storage = self.get_current_changes_tracker_or_storage(contract_id) | ||
| nc_storage = self.get_current_changes_tracker_or_storage(contract_id).unwrap_or_propagate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an unwrap() instead?
| # This ensures that, even if the blueprint method attempts to exploit or alter the context, it cannot | ||
| # impact the original context. Since the runner relies on the context for other critical checks, any | ||
| # unauthorized modification would pose a serious security risk. | ||
| copied_context = ctx.copy().unwrap_or_propagate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an unwrap() instead?
| storage = self._storages.get(contract_id) | ||
| if storage is None: | ||
| storage = self.block_storage.get_contract_storage(contract_id) | ||
| storage = self.block_storage.get_contract_storage(contract_id).unwrap_or_propagate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an unwrap() instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still confusing to me on which one should be used in each case.
|
|
||
| __all__ = [ | ||
| '__version__', | ||
| 'HATHOR_DIR', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
hathor/nanocontracts/exception.py
Outdated
| A type that represents all possible errors that can happen during a nano contract method execution, | ||
| which can be either an NCFail or un unhandled exception caused by blueprint code (NCRuntimeFailure). | ||
| """ | ||
| NCFailure: TypeAlias = NCFail | NCRuntimeFailure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird name since there's a NCFail exception too.
hathor/nanocontracts/metered_exec.py
Outdated
|
|
||
| # Any other exception is considered an unhandled exception, | ||
| # and is wrapped in an Err via an NCRuntimeFailure. | ||
| failure = NCRuntimeFailure() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename it to NCUnhandledException?
| try: | ||
| env = metered_executor.exec(self.code.text) | ||
| except OutOfFuelError as e: | ||
| except OutOfFuelError as e: # TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO?!
| self.nc_id = ContractId(VertexId(b'1' * 32)) | ||
| self.initialize_contract() | ||
| self.nc_storage = self.runner.get_storage(self.nc_id) | ||
| self.nc_storage = self.runner.get_storage(self.nc_id).unwrap_or_raise() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should expose this unwrap_or_raise() to blueprint unit tests.
| from hathor import HATHOR_DIR | ||
| from tests import TESTS_DIR # skip-import-tests-custom-check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?
| TBE = TypeVar('TBE', bound=BaseException) | ||
|
|
||
|
|
||
| class Ok(Generic[T]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the docstrings of the methods in this class are kind of useless. They say what they do (e.g., "Return the value.") but it doesn't explain when this method should be used. A better docstring at the class level may solve this issue.
| raise OCBInvalidBlueprintVertexType(blueprint_id.hex()) # TODO | ||
| tx_meta = blueprint_tx.get_metadata() | ||
| if tx_meta.voided_by or not tx_meta.first_block: | ||
| raise OCBBlueprintNotConfirmed(blueprint_id.hex()) | ||
| raise OCBBlueprintNotConfirmed(blueprint_id.hex()) # TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it as simple as adding Err(...)? I'm wondering why it was left as a to do.
| self.create_contract_with_nc_args(contract_id, blueprint_id, context, nc_args) | ||
| else: | ||
| self.call_public_method_with_nc_args(contract_id, nano_header.nc_method, context, nc_args) | ||
| return self.create_contract_with_nc_args(contract_id, blueprint_id, context, nc_args) | ||
|
|
||
| return self.call_public_method_with_nc_args(contract_id, nano_header.nc_method, context, nc_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather keep the else block here since it will always run one or the other.
| storage = self._storages.get(contract_id) | ||
| if storage is None: | ||
| storage = self.block_storage.get_contract_storage(contract_id) | ||
| storage = self.block_storage.get_contract_storage(contract_id).unwrap_or_propagate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still confusing to me on which one should be used in each case.
22adadd to
69b525f
Compare
0c50b3a to
80197fb
Compare
|
Replaced by #1321 |
Motivation
Current exception handling on nano contract execution has a few problems. It wraps any exception thrown during user code execution (that is, blueprint code) in an
NCFail, and later anyNCFailmarks a tx execution as failed. This means bugs in our code would become wrapped exceptions and fail tx executions instead of crashing the full node, which would be the expected behavior.This PR addresses this by recognizing that user-code and internal code should handle errors differently. If both use normal exception handling, it's pretty easy to accidentally include an internal bug in the eternal state of the blockchain.
It introduces a new
Resulttype, inspired by Rust, for explicit error handling. All code from the consensus point where we call the runner down to actually executing user code withexec()is now handled with this type instead of raising exceptions. Since user code doesn't know about this type, exceptions are used to bridge internal code to user code and vice-versa, via raising/catching only on the boundaries: exceptions are raised on syscalls, and caught inexec().Review Notes
hathor/utils/result.pyfile, to understand the new type.metered_exec.pywhich are the innermost code handling exceptions when user code is executed withexec(). There are also comments explaining the new behavior in it.runner.py.Acceptance Criteria
Resulttype for explicit error handling.Resulttype.unwrap_or_raise()method to bridge from a result to an exception. This includes for examples calls from APIs or test code.unwrap()orexpect().Rationale and Alternatives
Using a custom
Resulttype for error handling is something that we've considered for a while, and as it's a natural solution for the problem this PR addresses, I implemented it. I considered two libs:Resulttype but also other utility types. I tried it on branchrefactor/nano/error-handling. It's a good lib and we may even reconsider using it in the future for its other features, but I didn't like the ergonomics of itsResulttype (it's not a sum type).Considering this, and wanting to prevent new project dependencies, I decided to not use any of them and implement our own type. I did copy and adapt the implementation of the second one though, which is licensed under MIT. It's mostly the same, except for these differences:
unwrap_or_raisetounwrap_or_raise_another, because I implementedunwrap_or_raisewith another behavior (analogous to the unmerged feat: addunwrap_or_raise_itself()if a value ofErris aBaseExceptionrustedpy/result#199).unwrap_or_propagatewhich emulates Rust's question mark operator, for better ergonomics (adapted from the unmerged feat: Emulate Rust's question mark operator rustedpy/result#197).Risks
Incorrectly wrapped exceptions
Before this PR, ALL exceptions raised during contract execution would become part of the blockchain state forever through the tx execution state. This problem has been mitigated, because now only explicitly handled exceptions (with the Result type) become part of the tx failure. Any other exceptions, for example from bugs or asserts, will now crash the full node as expected.
This means we can still incorrectly make a bug part of the blockchain state if we handle an exception as a Result when it should be an assertion instead, for example. The difference is that now it's explicit. The reviewers should pay special attention to identify these cases — whether there's any instance where we handle a Result error but should use
unwrap()instead.Serialization
The serialization system is too extensive and raises a lot of exceptions internally. Instead of refactoring the whole module to use Results too, I just handled them on the boundary since nano execution only calls it in a single place.
For that, I identified it can raise 3 exception types:
SerializationError, ValueError, TypeError. This means that any other exceptions thrown in the serialization module can crash the full node (which is way better than becoming part of the blockchain, as it would before). If there are any other know exceptions that can be raised, we should include them. But it's hard to do better than grepping for raises.It also means that any
ValueErrororTypeErrorraised by Python itself will also become a reason for tx execution failure, even if it's caused by a bug in internal code. Ideally we should refactor the module to use specific exceptions, and even better, to use Results too.Checklist
master, confirm this code is production-ready and can be included in future releases as soon as it gets merged