I am wondering if there is a way in gleam to better distribute jobs across available agents rather than just executors. Consider the following example:
1. Read a text file containing (key, value) pairs, where many keys are repeated.
2. Partition by key, then process each value and transform it into something else.
3. Write the values to disk, distributed evenly by key across the agents.
Let's say there are 6 possible keys, ["key1", "key2", ... , "key6"], and I am running 3 agents on different machines. Ideally, I would like to have the following distribution after my flow:
- agent1: only has values with two different keys
- agent2: only has values with two different keys
- agent3: only has values with two different keys
For the example, consider the following flow:
```go
const NUM_AGENTS = 3

...

f := flow.New("example").
	Read(file.Txt("data/*", 4)).
	Map("process the values", ProcessValues).
	PartitionByKey("shard by key", NUM_AGENTS).
	Map("write to local file on agent's filesystem", WriteToFile)

...
```
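For completeness, the mappers are registered the usual way with gio and the flow runs in distributed mode. The sketch below is roughly what the full program looks like; the mapper bodies are simplified placeholders (the comma-separated line format and the `distributed.Option()` call are just how I've sketched it here, not necessarily my exact setup):

```go
package main

import (
	"strings"

	"github.com/chrislusf/gleam/distributed"
	"github.com/chrislusf/gleam/flow"
	"github.com/chrislusf/gleam/gio"
	"github.com/chrislusf/gleam/plugins/file"
)

const NUM_AGENTS = 3

var (
	// Mapper IDs referenced by the flow; gio runs these on the agents.
	ProcessValues = gio.RegisterMapper(processValues)
	WriteToFile   = gio.RegisterMapper(writeToFile)
)

// processValues splits a text line into (key, value) and transforms the value.
func processValues(row []interface{}) error {
	parts := strings.SplitN(gio.ToString(row[0]), ",", 2) // assuming comma-separated lines
	if len(parts) != 2 {
		return nil // skip malformed lines
	}
	key, value := parts[0], parts[1]
	// ... real transformation of value elided ...
	gio.Emit(key, value)
	return nil
}

// writeToFile appends each (key, value) row to a file on the agent's local disk.
func writeToFile(row []interface{}) error {
	// ... actual local write elided ...
	return nil
}

func main() {
	gio.Init() // must run first so the same binary can act as a mapper on the agents

	flow.New("example").
		Read(file.Txt("data/*", 4)).
		Map("process the values", ProcessValues).
		PartitionByKey("shard by key", NUM_AGENTS).
		Map("write to local file on agent's filesystem", WriteToFile).
		Run(distributed.Option())
}
```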
Using this flow, I don't get the results I'm looking for. Instead, gleam just picks whatever executors are available, regardless of which agent they run on, so I might end up with something like this:
- agent1: has values with three different keys
- agent2: has values with one key
- agent3: has values with two different keys
Is there a way to prioritize available agents, rather than just available executors, without manually running jobs on them? In particular, I would like the partition step to shard the 6 keys into groups of 2, 2, and 2 on separate agents, and to use only the executors available on those agents.
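To make the intent concrete, the sketch below (same package and imports as the program above) shows the kind of explicit key-to-shard assignment I have in mind. The `desiredShard` map, the `AssignShard` mapper, and the "assign shard" step name are all mine, purely for illustration, not a gleam API:

```go
// Illustrative only: the explicit key -> shard split I'm after (2, 2, 2).
var desiredShard = map[string]int{
	"key1": 0, "key2": 0, // intended for agent1
	"key3": 1, "key4": 1, // intended for agent2
	"key5": 2, "key6": 2, // intended for agent3
}

// AssignShard re-keys each (key, value) row by its shard index, so the
// following PartitionByKey groups rows by intended shard rather than by raw key.
var AssignShard = gio.RegisterMapper(assignShard)

func assignShard(row []interface{}) error {
	key := gio.ToString(row[0])
	gio.Emit(desiredShard[key], key, row[1])
	return nil
}
```

The flow would then gain an extra `.Map("assign shard", AssignShard).` step just before `PartitionByKey("shard by key", NUM_AGENTS)`, but as far as I can tell the placement of the resulting partitions is still entirely up to the scheduler: there is no way to say "partition 0 goes to agent1, partition 1 to agent2, ...", which is the part I'd like to influence.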