[NFC] Walk module to update names in binary parser #6959
base: main
Conversation
In preparation for using IRBuilder in the binary parser, eagerly create Functions when parsing the function section so that they are already created once we parse the code section. IRBuilder will require the functions to exist when parsing calls so it can figure out what type each call should have, even when there is a call to a function whose body has not been parsed yet. NFC except that some error messages change to include the new empty functions.
The purpose of the datacount section is to pre-declare how many data segments there will be so that engines can allocate space for them and not have to back patch subsequent instructions in the code section that refer to them. Once we use IRBuilder in the binary parser, we will have to have the data segments available by the time we parse instructions that use them, so eagerly construct the data segments when parsing the datacount section.
The binary parser generally does not know the final names of module elements when it parses them, or even when it parses instructions that refer to them, since the name section comes at the end of a binary. The parser previously kept a list of pointers to locations where each module element's name would have to be used, then it patched those locations after parsing the names section to discover the final names. When the binary parser starts using IRBuilder, the parsed expressions will be constructed and managed by IRBuilder rather than by the parser itself. This means that the parser will no longer be able to collect pointers to places where module element names are used; it won't have access to the instructions at all. Since the strategy of collecting locations to patch will no longer work, switch to a strategy of traversing the module to find and update names instead. This is generally less efficient because the locations have to be found before they can be updated, but on the other hand it only happens when preserving debug info and it is parallelizable anyway.
This extra traversal of all the code worries me. Can IRBuilder not collect those pointers as it creates things?
It could in principle, but that would be an expansion of its responsibilities and would make the abstraction fuzzier. I think this separation of concerns is nicer, but I can measure the performance impact.
Ok, I looked into the performance overhead of this change and I think it is acceptable. In the very worst case where …
Can you explain the difference between the two cases? I don't follow that. Isn't this extra pass over the module always done when there are names, so the overhead is fixed?
Right, we currently do the walk whenever --debuginfo is used, and if there are no names to apply, it simply won't make any changes. I had been thinking that the current code path can avoid doing any work in this case, but I don't think that's right, since there are locations where we simply don't set the names before backpatching.
Yeah, maybe we can avoid it somehow, but I think atm we do figure out names late for some things. So IIUC, is 6% the relevant overhead, then? Overall my concern is that adding an extra pass to the entire module can be pretty costly. It's not like adding another ReFinalize at the end of a function, since that function is in the cache already - this is another loading and processing of the entire module from the start, which can be slow on large modules.
I guess 6% is the relevant worst-case overhead we've seen, but I think it's also notable that on real modules we've also seen 0.2%. This shouldn't be any different in terms of caching because all the function bodies have just been parsed and should also be in caches. If we parallelize parsing at some point, the bodies will even be in the caches on the correct threads. |
6% seems high to me. Maybe it's because I've been working on a series of such optimizations recently (moderate speedups, some of which involve avoiding an extra pass on the IR), but I think it's worth trying to avoid this regression. Some ideas:
Difference compared to what? Sorry, I'm not following this. My concern is that a big module simply cannot fit in cache - large wasm modules expand into hundreds of MB of IR - so doing another pass over that IR means cache misses for it all. I am guessing that is that 6%? |
I'll look into skipping ahead 👍 |