chainHead_storage: Support more resilient pagination of descendants? #85
Comments
I think that this section of the post on the polkadot forum that @tomaka wrote partially covers this:
Ooh that is interesting indeed, thanks for referencing it!
kinda related: smol-dot/smoldot#692
This statement is true. However, if we take into account what @jsdw proposed, we can have pagination on the light client as well. @tomaka WDYT?
Yeah, this would require polkadot-fellows/RFCs#9.

The problem in play here is that with polkadot-fellows/RFCs#9 the light client already has to do pagination with the networking requests. The responding node can cut the response at any point so that it fits in the maximum response size. If on top of this the JSON-RPC client also does its own pagination (trying to be smart and all), then you end up doing many more networking requests than if the JSON-RPC client didn't do pagination.

It's all complicated and I have many other things to focus on, so I haven't given this problem a lot of attention.
Instead of letting the RPC client control the pagination, which is really sub-optimal as you explained, we could let the server/packet size decide on this. If the server is chunking the answers anyway, could we not send out these chunks to the JSON-RPC client as we get them? (For sure there are some packets that may only contain nodes, which we would skip.)
But then there's also the possibility for the light client to split the request itself. If the JSON-RPC API doesn't support pagination (as is the case right now), then the light client is free to do that and this can be left as an implementation detail. But if the JSON-RPC API is aware of pagination, then it becomes impossible.

Again, this strikes me as a really complicated problem, and to me it seems that there are many more ramifications than I can think of. A good way to solve this problem overall would be to make the size of a Merkle proof more or less predictable, which could be done by merkleizing the storage values, but that would be a huge change and we still haven't even finished the previous trie migration from a few years ago.
What do you mean by this? Split up every storage value into chunks of the same size and put these chunks into the trie as well? That sounds quite costly.

I don't really see why this would be a problem. Let's assume we make the JSON-RPC API "page aware". This means that the RPC server would return alongside the value a
I mean that every storage value would be a mini trie when it comes to how it's stored.
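To make the "mini trie" idea concrete, here is a rough sketch of what merkleizing a storage value could look like: the value is split into fixed-size chunks and the chunk hashes are folded into a small binary Merkle tree, so that a proof for any part of the value has a predictable size. The chunk size, hash function, and odd-layer handling below are all illustrative assumptions, not what any actual design specifies.

```python
import hashlib

CHUNK_SIZE = 32  # illustrative; a real design would likely pick a larger size

def _hash(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()

def merkle_root(value: bytes) -> bytes:
    """Merkleize one storage value: chunk it, then fold the chunk hashes
    pairwise into a binary tree and return the root hash."""
    chunks = [value[i:i + CHUNK_SIZE] for i in range(0, len(value), CHUNK_SIZE)] or [b""]
    layer = [_hash(c) for c in chunks]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # duplicate the last hash on odd layers
        layer = [_hash(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

root = merkle_root(b"x" * 100)  # a 100-byte value splits into 4 chunks
```

The cost @jsdw mentions shows up here: every value write now touches a tree of hashes rather than producing a single leaf.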
Let's say you want to download all the keys of the empty prefix (i.e. all the keys in the entire trie). If you have for example 4 peers, since you want to avoid networking round trips and since you don't know which peer is slower (or even malicious) and which is faster, the best solution is to split the key space into four ranges and ask each peer for one of them. None of the four requests will return all the keys you asked for in a single response, but then you continue doing requests four by four until you have all of them. When you're done with one of the peers, you can split the remaining requests even more and continue using that peer for another list of keys.

I have no idea how that algorithm would translate to a JSON-RPC API that has pagination.
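For illustration, a sequential toy model of the strategy described above: the key space is split into one sub-range per peer, each "networking response" is truncated at a fixed size, and a peer resumes just past the last key it received until its range is exhausted. The key format, range edges, and response limit are invented for the example, and the re-splitting of leftover work onto idle peers is omitted.

```python
# Simulated view of the full key space held by the remote peers.
ALL_KEYS = sorted(f"key-{i:03d}" for i in range(100))
RESPONSE_LIMIT = 8  # a peer truncates every response at this many keys

def request_keys(peer: int, start: str, end: str) -> list:
    """Simulated networking request to `peer` for keys in [start, end)."""
    return [k for k in ALL_KEYS if start <= k < end][:RESPONSE_LIMIT]

def download_all(num_peers: int = 4) -> list:
    # One sub-range of the key space per peer; each peer keeps issuing
    # requests for its range, resuming after the last key it received.
    edges = ["key-000", "key-025", "key-050", "key-075", "key-:"]
    collected = []
    for peer, (start, end) in enumerate(zip(edges[:-1], edges[1:])):
        while True:
            batch = request_keys(peer, start, end)
            collected.extend(batch)
            if len(batch) < RESPONSE_LIMIT:
                break  # this peer's range is exhausted
            start = batch[-1] + "\x00"  # resume just past the last key
    return sorted(collected)

keys = download_all()  # recovers all 100 keys despite truncated responses
```

The point of friction with a paginated JSON-RPC API is visible in the loop: the resume cursor lives with whoever splits the ranges, and layering a second cursor from the JSON-RPC client on top of it would duplicate that bookkeeping.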
Again, the JSON-RPC client has no way to know which maximum is appropriate. What will most likely happen is that it will in total require more networking requests (typically twice as many) to download everything, compared to not providing a maximum.
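The "twice as many requests" claim can be checked with back-of-the-envelope arithmetic. In this toy model the server truncates each response at some cap the client cannot observe, and the client optionally imposes its own page size on top; the numbers are made up for illustration.

```python
import math
from typing import Optional

def requests_needed(total_keys: int, server_cap: int,
                    client_page: Optional[int]) -> int:
    """Round trips to fetch `total_keys` keys when the server truncates
    each response at `server_cap` items and the client additionally asks
    for at most `client_page` items per request (None = no client limit)."""
    per_request = server_cap if client_page is None else min(server_cap, client_page)
    return math.ceil(total_keys / per_request)

# Server truncates at 1000 items; the client cannot know that, so a
# conservative guess of 500 items per page doubles the round trips.
without_client_paging = requests_needed(10_000, 1000, None)  # 10 requests
with_client_paging = requests_needed(10_000, 1000, 500)      # 20 requests
```

Guessing too high is harmless (the server's cap wins), but guessing too low multiplies the request count, and the client has no signal to guess well.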
In the legacy RPC methods, a user could call `state_getKeysPaged` with some prefix, and provide a starting key (and a number of keys to fetch). They are then able to paginate over the child keys/values by tracking the last key that they have seen and using it to request the next batch of keys (and obtain values for them). This is resilient to network connection interruptions and such; they can always pick up from the last key they are interested in.

Using the new `chainHead_storage` API, a couple of things make iterating over descendant keys/values harder:

- There is no way to fetch only the keys (one can request `descendantsHashes` and then just ignore them and look at the keys, but this is less than ideal).
- There is no way to resume iteration if the `chainHead` subscription I'm relying on stops for any reason (eg network interruption or pinned blocks getting too old).

Proposals

Being able to provide a `startKey` to the `chainHead_storage` items of type `descendantsValues` and `descendantsHashes` would probably go most of the way here (I'm not super sure if it's useful to be able to fetch keys without fetching values in the usual case of paginating over some descendant results).

I also wondered about supporting a new type like `descendantsKeys` in storage items. This would return only the keys, and no other values (so that you can eg download all of the keys and then ask for values later as a user paginates through some interface or whatever). If the `startKey` idea is acceptable then this would also allow a `startKey` to align with the other `descendants*` types, and allow for reliable pagination/resuming of fetching keys as well as values or hashes.

What do you guys reckon?
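A minimal sketch of what the `startKey` proposal would enable on the client side, assuming a hypothetical server that accepts a start key for `descendantsValues` items and truncates each response. The storage contents, `SERVER_LIMIT`, and both function names are invented for the example; this is not the actual `chainHead_storage` wire format.

```python
from typing import List, Optional, Tuple

# Toy in-memory "storage": descendant key/value pairs under some prefix.
STORAGE = {f"0xprefix{i:04x}": f"value-{i}" for i in range(25)}
SERVER_LIMIT = 10  # the server cuts each response, as a real node would

def chain_head_storage_page(prefix: str,
                            start_key: Optional[str] = None) -> List[Tuple[str, str]]:
    """Stub of a `descendantsValues` query with the proposed `startKey`:
    returns at most SERVER_LIMIT descendants of `prefix`, strictly after
    `start_key` when one is given."""
    keys = sorted(k for k in STORAGE if k.startswith(prefix))
    if start_key is not None:
        keys = [k for k in keys if k > start_key]
    return [(k, STORAGE[k]) for k in keys[:SERVER_LIMIT]]

def fetch_all_descendants(prefix: str) -> List[Tuple[str, str]]:
    """Client-side loop: track the last key seen and resume from it. The
    same cursor works after a dropped connection or a fresh subscription."""
    results: List[Tuple[str, str]] = []
    last_key: Optional[str] = None
    while True:
        page = chain_head_storage_page(prefix, last_key)
        if not page:
            return results
        results.extend(page)
        last_key = page[-1][0]  # the resume point survives interruptions

items = fetch_all_descendants("0xprefix")  # all 25 pairs, in key order
```

The key property mirrors `state_getKeysPaged`: the only client state needed to resume is the last key seen, which is exactly what the current `chainHead_storage` API gives no way to hand back.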
/cc @josepot @tomaka