Use custom error types and remove error downcasting #52

Stebalien · 2022-01-20T06:17:25Z

Currently, we have a lot of "anyhow" errors and rely on error downcasting to figure out the right exit code. Unfortunately, this makes it very difficult to figure out what the exit code should be.

Instead, we'd ideally:

Abort immediately once we know the right exit code (power actor tests part 3 #282).
Return situation-specific errors from state types like partitions & deadlines, so we can figure out the correct exit code.

jennijuju · 2022-03-08T16:36:35Z

from zen: make sense to do before the audit, but shouldn't be a blocker

anorth · 2022-08-11T23:50:47Z

#197 , #214 does a heap of the work for this (not yet merged, needs stewardship)

anorth · 2023-07-21T04:08:24Z

I've spent some time investigating how to improve things here, but it turns out to be quite complex. In our current situation we have:

A pattern of [Result].map_err(|e| e.downcast_default(...)) for adapting non-ActorErrors into ActorError. It's quite verbose.
A trait AsActorError implemented for Result<T, E: Display> with context_code() and similar methods to adapt into an Result<T, ActorError> with a specified exit code. These are concise to use but don't attempt any downcasting to discover wrapped exit codes.

Some problems with this state:

The map and downcast pattern isn't uniformly used. There are plenty of instances of mapping to a concrete ActorError without attempting to downcast. The pattern is not obvious enough to apply consistently.
The downcasting implementation relies on global knowledge of all the types that could be wrapping an ActorError. It's incomplete, e.g. it's missing fvm_ipld_kamt::Error and the frc46 token StateError (which can wrap an fvm_ipld_hamt::Error). In Rust, we'd need to import all these error types into the module defining the trait in order to implement it, and could easily miss new ones. The approach doesn't scale to more different error types.
Using AsActorError::context_code() can fail to propagate an ActorError nested inside, e.g. hamt::Error::Dynamic.
Because ActorError implements Display, the context_code() method can be inadvertently used to overwrite an exit code even when it's not hidden inside another error type

Why do we have downcasting at all? There are some places where we can generate an ActorError with exit code but not propagate it directly, usually due to library code. E.g. the HAMT/AMT for_each() method with return an Error::Dynamic containing an anyhow::Error that's actually an ActorError raised from the callback.

In #1322 I tried replacing a bunch of map and downcast patterns with more concise trait methods. It's on hold at present because a few tests fail due to changes in exit codes, as a code generated inside for_each was trashed (e.g. this USR_ILLEGAL_ARGUMENT in remove partitions).
#1338 is another attempt that aimed to avoid inadvertent trashing of exit codes, but it's stuck by the realisation that downcasting needs global knowledge of all the error types that could hide an ActorError, which it doesn't have.

So what shall we do? Some options:

Try to get it "right", with the global knowledge of error types and robust downcasting. We can probably still find more concise patterns than map_err+downcast. Support generating ActorError deep in the code and propagating with ?. Something like leaning into Another attempt at error cleanup that preserves more exit codes #1338
Give up on preserving exit codes generated low down in the code. Support ActorError deep in the code and ? propagation, but usually only ILLEGAL_STATE. Other exit codes can be generated in top-level actor code. Something like Remove legacy error downcasting from miner actor #1322.
Implement specific error types for state objects that enumerate error conditions. Stop returning ActorError from state methods, instead only generate exit code in top-level actor business logic.
Make the abort() syscall available in state methods (perhaps via a new very slim runtime trait passed to them) and call it directly in state code, with no opportunity for actor code to handle the error or adapt the exit code. Implement mock runtime and test VM etc to use panic or similar to support this.

Stebalien · 2023-07-21T14:39:42Z

E.g. the HAMT/AMT for_each() method with return an Error::Dynamic containing an anyhow::Error that's actually an ActorError raised from the callback.

We should implement iteration with external iterators instead to fix this issue. I have a WIP patch locally that I can try to cleanup and ship if there's interest.

That should remove a large part of this issue.

Give up on preserving exit codes generated low down in the code.

In many cases, I believe this is the right thing to do. Most of these errors should just be "illegal state".

Implement specific error types for state objects that enumerate error conditions. Stop returning ActorError from state methods, instead only generate exit code in top-level actor business logic.

IMO, this is the "correct" solution. It's just really annoying which is why most rust users just use crates like anyhow and yolo it.

But yeah, we can do this where it matters. I.e.:

Implement custom error types.
Implement From<CustomErrorType> for ActorError wherever we care about conversions.

Make the abort() syscall available in state methods (perhaps via a new very slim runtime trait passed to them) and call it directly in state code, with no opportunity for actor code to handle the error or adapt the exit code. Implement mock runtime and test VM etc to use panic or similar to support this.

I like the idea of aborting early but... I'd prefer not to do this in all cases:

IMO, we should abort directly if some state invariant is violated. Specifically: state block doesn't exist, state fails to decode, etc.
IMO, we shouldn't abort directly if, e.g., the user passes an illegal argument, something isn't found, etc. (the actor may want to react to this in some way?).

However, I'm concerned about using the runtime here as it may be hard to thread it through to the very bottom (and makes it harder to reason about what the code can do). Instead, I'd consider exposing it as a bare function that:

Panics when compiled to a native target. We can smuggle the exit code through the panic and catch the panic at the top-level.
Calls the exit syscall otherwise.

anorth · 2023-08-01T02:39:42Z

External iterators would be great. And then we could "give up" on doing better than "illegal state" for remaining hard cases.

Your proposal re aborting sounds reasonable, and could clean up some boilerplate. In mechanism, why not just call the runtime's 'abort' anyway and have that panic (and catch it out the other side). Conditional compilation and static cling already causes headaches in this repo, I would rather avoid adding any more that we can.

Stebalien added kind/improvement P3 labels Jan 20, 2022

jennijuju transferred this issue from filecoin-project/ref-fvm Mar 7, 2022

jennijuju added this to the Network v16 milestone Mar 8, 2022

jennijuju modified the milestones: Network v16, Network v17 Mar 8, 2022

anorth moved this to Todo in Network nv17 Aug 11, 2022

anorth added this to Network nv17 Aug 11, 2022

anorth mentioned this issue Aug 11, 2022

Built-in actors abort versus errors #53

Closed

anorth moved this from Todo to Opportunistic in Network nv17 Aug 11, 2022

anorth added enhancement New feature or request and removed kind/improvement labels Sep 1, 2022

jennijuju removed this from Network nv17 Nov 25, 2022

jennijuju added this to Network v18 Nov 25, 2022

jennijuju moved this to 🏗 In progress in Network v18 Nov 25, 2022

jennijuju removed this from Network v18 Nov 25, 2022

anorth removed this from the Network v17 milestone Mar 30, 2023

anorth removed the area/actors label Mar 30, 2023

anorth mentioned this issue Jun 27, 2023

Remove legacy error downcasting from miner actor #1322

Draft

anorth mentioned this issue Apr 25, 2024

Another attempt at error cleanup that preserves more exit codes #1338

Closed

anorth added the cleanup label May 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use custom error types and remove error downcasting #52

Use custom error types and remove error downcasting #52

Stebalien commented Jan 20, 2022

jennijuju commented Mar 8, 2022 •

edited

Loading

anorth commented Aug 11, 2022 •

edited

Loading

anorth commented Jul 21, 2023

Stebalien commented Jul 21, 2023

anorth commented Aug 1, 2023

Use custom error types and remove error downcasting #52

Use custom error types and remove error downcasting #52

Comments

Stebalien commented Jan 20, 2022

jennijuju commented Mar 8, 2022 • edited Loading

anorth commented Aug 11, 2022 • edited Loading

anorth commented Jul 21, 2023

Stebalien commented Jul 21, 2023

anorth commented Aug 1, 2023

jennijuju commented Mar 8, 2022 •

edited

Loading

anorth commented Aug 11, 2022 •

edited

Loading