Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to prioritize executors running on different agents when allocating jobs? #147

Open
svmhdvn opened this issue Nov 20, 2018 · 0 comments

Comments

@svmhdvn
Copy link
Contributor

svmhdvn commented Nov 20, 2018

I am wondering if there is a way in gleam to better distribute jobs across available agents rather than just executors. Consider the following example:

  1. read a text file containing (key, value) pairs, with many keys being repeated
  2. partition by keys, process the value and transform it to something
  3. write values to disk distributed evenly by keys across agents

Let's say there are 6 possible keys, ["key1", "key2", ... , "key6"], and I am running 3 agents on different machines. Ideally, I would like to have the following distribution after my flow:
agent1: only has values with two different keys
agent2: only has values with two different keys
agent3: only has values with two different keys

For the example, consider the following flow:

const NUM_AGENTS = 3
...
    f := flow.New("example").
        Read(file.Txt("data/*", 4)).
        Map("process the values", ProcessValues).
        PartitionByKey("shard by key", NUM_AGENTS).
        Map("write to local file on agent's filesystem", WriteToFile)
...

Using this flow, I don't get the results I'm looking for. Instead, gleam simply finds available executors, regardless of the agent they're running on, so I might get something that looks like this:

agent1: has values with three different keys
agent2: has values with 1 key
agent3: has values with two different keys

Is there a way to prioritize available agents without running jobs on them? In particular, I would like the partition to shard all the 6 keys into 2, 2, and 2 running on separate agents and only use the executors available on those agents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant