Why do we iteratively arrive at barrier_O? #1315
Comments
It's for the whole cluster to sync, not just one block.
This is from PTX. I think you should somehow limit the arrive or the wait to one block? (But maybe it does not affect performance.)
Oh, I understand you now. If we limited the wait to cta0, we could not stall the other CTAs! So your method is correct!
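To make the point above concrete, here is a minimal host-side C++ sketch of the counting involved. It is only a model, not device code and not the real barrier class: `ModelBarrier` and `count_released_ctas` are hypothetical names, and it assumes each CTA's barrier_O expects one arrival per CTA in the cluster (the actual initialization count is not shown in this thread).

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model of a per-CTA arrival-count barrier.
// It is NOT the real cluster barrier; it only counts arrivals.
struct ModelBarrier {
    int pending;  // arrivals still required before the barrier completes

    explicit ModelBarrier(int expected_arrivals) : pending(expected_arrivals) {}

    void arrive() {
        assert(pending > 0 && "more arrivals than the barrier expects");
        --pending;
    }

    bool complete() const { return pending == 0; }
};

// Every CTA in the cluster signals once that it is done.
// If arrive_on_all is true, each CTA loops over all cluster_size barriers
// (the pattern asked about); otherwise it only signals its own barrier.
// Returns how many CTAs would see their own barrier_O complete.
int count_released_ctas(int cluster_size, bool arrive_on_all) {
    // Assumption: one barrier per CTA, each expecting one arrival per CTA.
    std::vector<ModelBarrier> barrier_O(cluster_size, ModelBarrier(cluster_size));

    for (int src = 0; src < cluster_size; ++src) {
        if (arrive_on_all) {
            for (int dst = 0; dst < cluster_size; ++dst) {
                barrier_O[dst].arrive();  // models a remote arrive on CTA dst
            }
        } else {
            barrier_O[src].arrive();      // local arrive only
        }
    }

    int released = 0;
    for (const ModelBarrier& b : barrier_O) {
        released += b.complete() ? 1 : 0;
    }
    return released;
}
```

Under these assumptions, `count_released_ctas(2, true)` returns 2, while `count_released_ctas(2, false)` returns 0: with only a local arrive, no CTA's barrier ever reaches its expected count, so no waiter in the cluster would be released.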
You can use printf to see which threads are at which point in the code.
I think the barrier_O here is for syncing all blocks, but if we iteratively arrive cluster_size times, what does that accomplish?