Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can we add option to strip undefined instead of erroring #57

Open
Gozala opened this issue Apr 23, 2022 · 5 comments
Open

can we add option to strip undefined instead of erroring #57

Gozala opened this issue Apr 23, 2022 · 5 comments
Assignees

Comments

@Gozala
Copy link
Contributor

Gozala commented Apr 23, 2022

I've been working on UCAN based RPC system that encodes request / responses into set of CBOR blocks in a CAR. Unfortunately it proved challenging to not encounter errors from CBOR codec due to some field set to undefined or due to some built-in data type like Error been referenced.

Could we add an option to make CBOR codec behave more or less like JSON.stringify does, specifically

  1. Just leave out undefined values
  2. Don't complain about Error or other built-in instances, instead just include own enumerable properties
@rvagg rvagg moved this to 🏃‍♀️ In Progress in IPLD team's weekly tracker May 10, 2022
@rvagg rvagg self-assigned this May 17, 2022
@rvagg
Copy link
Member

rvagg commented May 23, 2022

The problem with being flexible with undefined is that the codecs are attempting to be as symmetrical as possible, and that goes for silently ignoring fields. What goes in should be what comes out. The more scope for mismatching input:output the more scope for bugs appearing when round-trips get involved; we keep on seeing these problems in practice.

But, that's obviously not as clean when we hit the real world, and we have a couple of instances of sloppy behaviour:

  • The prepare() on js-dag-pb was introduced for this reason - the exact form you get in a round-trip is very specific and not usually exactly what you want to provide up-front, so prepare() is there to take sloppy input and produce the exact form that you'd get during a round-trip.
  • dag-jose will accept both compact and general form inputs but will always round-trip general form, the guidance we provide over there is to use toGeneral() to get the input:output symmetry.

So there's two approaches we could take to alleviate the pain:

  1. Provide a utility to "clean" or prepare() an object, to get the symmetry, that would be a nice ideal solution; but it's going to have perf costs of course.
  2. Provide an opt-in encode somehow - either a separate exported method, or even a full exported BlockCodec, or some kind of configuration option that would let you change the way the encoder works. Then we could push the object walking right down into the encoder and switch according to options. We already have a special option for undefined and could extend that to do something else with it. The own properties thing will be a bit more difficult but I can imagine it also being achieved with a config option.

How important is this? Is this simply a quality of life feature, or is it getting in the way and being a problem for downstream users? I could do some work on the CBOR encoder to expose options to do this but I don't know how to prioritise this and I have other things on my plate vying for attention.

@rvagg rvagg moved this from 🏃‍♀️ In Progress to 🛑 Blocked in IPLD team's weekly tracker May 23, 2022
@rvagg rvagg moved this from 🛑 Blocked to 🗄️ Backlog in IPLD team's weekly tracker May 23, 2022
@Gozala
Copy link
Contributor Author

Gozala commented Jun 6, 2022

How important is this? Is this simply a quality of life feature, or is it getting in the way and being a problem for downstream users?

It is pretty problematic at least in the context of UCAN based RPC system because service implementation:

  1. Can provide endpoints that return arbitrary things that may have undefined somewhere and/or Error instance. It would be really difficult to enforce that handlers never include such values and I fear it is something that would show up in the production as opposed to during development.
    • I would much rather e.g. log warning on server when such values are sent as opposed to crash a service or even respond with error.
    • RPC runtime is similar to GraphQL in a sense that it takes arbitrary batch of invocations and produces batch result. What that implies that e.g. if 100 invocations were passed and only one of them produces value with undefined somewhere encoding fails and general error occurs as opposed to 99 successes and one error. That is really bad, if only one invocation failed and rest succeeded that would have been not as bad.
  2. Error on encode is particularly bad in RPC system, as request may have successfully mutated some state and failing to encode response could lead to client retrying which in turn may cause some other problems due to changed state.

At the moment I'm working around this by going over the returned values and:

  1. Removing undefineds.
  2. Extracting fields from Error instance into a plain object.

This is pretty unfortunate, because same data structure gets traversed once before handing it off to CBOR encoder which then traverses it again to actually encode.

@Gozala
Copy link
Contributor Author

Gozala commented Jun 6, 2022

In terms of ideal solution it would something along the lines of:

  1. Ideally CBOR.encode would take optional replacer argument equivalent to one you can pass to JSON.stringify
    • It would allow normalization / encode to occur via single traverse
    • Would rid me off the code that needs to do traversal.
  2. As you have also pointed out replacer would affect performance, but if not present current code path could be used.
  3. Would be nice to have possibly another CBOR.loose.encode or possibly other module all together that behaves similar to JSON.stringify, that is has optimized code path (without replacer) and is able to leave out undefineds and avoid throwing on Error instances.

@Gozala
Copy link
Contributor Author

Gozala commented Jun 6, 2022

The problem with being flexible with undefined is that the codecs are attempting to be as symmetrical as possible, and that goes for silently ignoring fields. What goes in should be what comes out. The more scope for mismatching input:output the more scope for bugs appearing when round-trips get involved; we keep on seeing these problems in practice.

One could argue that leaving out undefined meets that criteria CBOR.decode(CBOR.encode({ omit: undefined })).omit === undefined is true even if Object.keys(input) would be different.

While I generally prefer early errors, I'm still finding that JSON.stringify strikes a better balance here. It is hard to force no undefineds if you're interfacing with user space code.

This input:output argument makes no sense to me in regards to Error instances however because it is not enforced on any other class instances. So new class Point { constructor(x, y) { this.x = x; this.y = y }(0, 0) would produce {x, y} but new Error('boom') would fail to encode.

You could probably start rejecting any class instances too, but I'm not sure if that would be an improvement.

@rvagg rvagg moved this from 🗄️ Backlog to 🥞 Todo in IPLD team's weekly tracker Aug 2, 2022
@rvagg rvagg moved this from 🥞 Todo to 🗄️ Backlog in IPLD team's weekly tracker Nov 30, 2022
@RangerMauve
Copy link

+1 to a "cbor.prepare" or similar, or maybe a flag to do the sanitization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🗄️ Backlog
Development

No branches or pull requests

3 participants