empty partition in Zoltan_integration.F90 #386
Hi,
The standard BFS-3D mesh has about 530,000 elements to start with. On 128
cores that's around 4000 elements per core. Adaptivity changes that
throughout the run, but I can't remember the rough numbers. The number of
elements is key to how many cores the model can run on and my rule of thumb
for fluidity is around 10,000 elements per core (on a CG discretisation).
Check out the stat file and plot the number of elements in the run at the
time of failure; that should give you an indication of how many cores are
realistic.
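To make the rule of thumb concrete, here is a rough Python sketch of the arithmetic. It assumes the ~530,000 initial elements and ~10,000 elements per core quoted above, and an even split of elements across cores. It ignores adaptivity, which changes the element count during the run, so the stat file remains the real source for the element count at failure.

```python
"""Back-of-the-envelope core-count estimate for the BFS-3D case.

Assumes the figures quoted in the thread (about 530,000 initial elements and
about 10,000 elements per core for a CG discretisation) and an even split of
elements across cores. Adaptivity is ignored.
"""

INITIAL_ELEMENTS = 530_000  # approximate initial BFS-3D mesh size (from the thread)
TARGET_PER_CORE = 10_000    # rule of thumb for CG runs (from the thread)


def elements_per_core(n_elements: int, n_cores: int) -> float:
    """Average number of elements each core owns if the mesh is split evenly."""
    return n_elements / n_cores


def suggested_max_cores(n_elements: int, target_per_core: int = TARGET_PER_CORE) -> int:
    """Largest core count that still keeps about target_per_core elements per core."""
    return max(1, n_elements // target_per_core)


if __name__ == "__main__":
    for cores in (128, 1024, 2048, 10_000):
        avg = elements_per_core(INITIAL_ELEMENTS, cores)
        print(f"{cores:>6} cores -> {avg:8.1f} elements/core")
    print("suggested max cores:", suggested_max_cores(INITIAL_ELEMENTS))
```

At 2048 cores the initial mesh gives roughly 259 elements per core, and at 10,000 cores about 53, far below the rule of thumb.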
Hope that helps,
Jon
On Mon, 15 Jul 2024 at 08:55, Wangbo ***@***.***> wrote:
When I run the backward_facing_step_3d example and set NPROCS to a value
greater than 128 and run the program on two nodes using make run, I
encounter empty partitions, which cause the program to terminate. This
occurs regardless of whether I use the graph partitioning algorithm
parmetis or the hypergraph partitioning algorithm PHG.
In summary, I would like to ask for suggestions on how to improve the
scalability of the program, meaning how to prevent the zoltan_load_balance
function from generating empty partitions when the number of processes is
increased.
--
Dr Jon Hill
Senior Lecturer in Physical Geography
Chair of Board of Examiners
Department of Environment and Geography
University of York
M: +44(0)7748254812
Web: https://jonxhill.wordpress.com/
Web: https://envmodellinggroup.github.io/
The empty partition issue is known in the zoltan code and is a result of
not having enough elements per processor for the zoltan_graph algorithm
<http://www.hector.ac.uk/cse/distributedcse/reports/fluidity-zoltan/fluidity-zoltan.pdf>.
We had plans a long time ago to resolve this by ignoring the empty
partition in the calculation, but those plans never came to fruition. We did
have a workaround, which was to adjust the load imbalance tolerance. You
might want to experiment with that manually to see if you can get around the
empty partitions.
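The following Python sketch illustrates why a load imbalance tolerance by itself need not prevent empty partitions. Zoltan's IMBALANCE_TOL parameter bounds the ratio of the heaviest part's load to the average load, and nothing in that ratio requires every part to be non-empty. The sketch assumes unit weight per element (a simplification; Fluidity may weight elements differently) and computes how many parts could be left empty while still meeting the tolerance. How Fluidity passes its own tolerance option to Zoltan is not shown here.

```python
"""How many parts can be empty while a partition still meets an imbalance tolerance.

Imbalance is taken as (heaviest part load) / (average part load), with every
element assumed to have unit weight (a simplifying assumption).
"""
from fractions import Fraction
from math import ceil, floor


def max_empty_parts(n_elements: int, n_parts: int, tol: Fraction) -> int:
    """Largest number of the n_parts that can hold no elements while no part
    holds more than tol * (n_elements / n_parts) elements."""
    # Most elements one part may hold without exceeding the tolerance.
    # Fraction keeps the arithmetic exact (no float rounding at the boundary).
    cap = floor(tol * Fraction(n_elements, n_parts))
    # Fewest parts that can hold every element when each holds at most `cap`.
    min_non_empty = ceil(Fraction(n_elements, cap)) if cap >= 1 else n_parts + 1
    if min_non_empty > n_parts:
        raise ValueError("no partition satisfies this tolerance")
    return n_parts - min_non_empty
```

For example, with 530,000 elements on 128 parts and a tolerance of 1.1, the constraint alone allows up to 11 empty parts; with 20 elements on 10 parts and a tolerance of 1.5 it allows 3. A tighter tolerance leaves less slack, but whether a given partitioner actually produces empty parts depends on its algorithm, which this arithmetic does not model.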
It's been a while since we implemented this code, so my memory is a bit
rusty!
That might help you?
On Mon, 15 Jul 2024 at 10:01, Wangbo ***@***.***> wrote:
Thank you for your reply.
Do you mean that the occurrence of empty partitions is not related to the
partitioning algorithm? For the BFS-3D example, the input file does not
specify a partitioner, so the default is used, which is
zoltan_graph + PHG + PARTITION. In this case, it can only run on 128
processors. However, when I specify the partitioner in the
backward_facing_step3d.flml file, which is HYPERGRAPH + PHG + REPARTITION,
fluidity can run on 1024 processors (16 nodes in total). However, when the
number of processors increases to 2048, fluidity still aborts. When I switched to
PARMETIS, the performance was not as good as with HYPERGRAPH + PHG. The
partitioning algorithms used by parmetis and PHG are both multilevel
partitioning methods.
My previous thought was that there was a problem in the way fluidity
calls zoltan for load balancing, which results in empty partitions.
So I would like to ask whether anyone familiar with this part of the code can
suggest how to fix this problem. Intuitively, after graph partitioning, each
part should have at least some vertices; even with load imbalance, there
should be no empty partitions.
My current work is to run fluidity on a large number of processors, even
more than 10,000. This work is currently blocked because
zoltan_load_balance generates empty partitions. I would like to ask if
you have any suggestions.
Thank you very much.
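Wangbo's expectation that every part receives at least some vertices can be checked directly on a partitioner's output. The sketch below is hypothetical: it assumes the result is available as a Python list mapping each element to a part id (Fluidity's Fortran data structures in Zoltan_integration.F90 are different) and reports which part ids received nothing, which is the condition the thread describes as terminating the run.

```python
"""Detect empty parts in a partition result (hypothetical representation).

assignment[i] is assumed to be the part id given to element i.
"""
from collections import Counter


def empty_parts(assignment: list[int], n_parts: int) -> list[int]:
    """Return the ids in 0..n_parts-1 that own no elements, in ascending order."""
    counts = Counter(assignment)  # Counter yields 0 for ids that never appear
    return [part for part in range(n_parts) if counts[part] == 0]


def check_partition(assignment: list[int], n_parts: int) -> None:
    """Raise if any part received no elements."""
    missing = empty_parts(assignment, n_parts)
    if missing:
        raise RuntimeError(f"{len(missing)} empty partition(s): {missing}")


if __name__ == "__main__":
    # Toy example: 6 elements spread over 4 parts, leaving part 3 empty.
    toy = [0, 0, 1, 1, 2, 2]
    print(empty_parts(toy, 4))
```

Counting per-part elements is cheap, so a check like this can be run on every repartitioning step to report exactly which ranks would be left without elements.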