Currently, distributed generation only works when the coordinator owns the early layers (0:N-1) and the worker owns the later layers plus, optionally, the output (N:42 or N:output). This matters little when both nodes are homogeneous, but it makes distributed generation difficult to extend. For example, I ran into problems while working on #304, making local decode compatible with all the existing frontends, so I had to postpone that work and switch to this first as a prerequisite.
This is a proposed extension of distributed mode that allows a "reverse topology", whereby a coordinator running any of the frontends (ds4, ds4-server, ds4-eval, etc.) owns the later layers and the output, while the worker owns the earlier layers.
This does not materially change how the current distributed mode works, but it would let distributed topologies be extended more naturally, because the frontends rely on the ds4_session_sync() -> ds4_session_eval() | ds4_session_sample() core loop, which switches to per-token RPC when the output is owned remotely.
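To make the two topologies concrete, here is a rough sketch of how layer ownership could be classified. It is not taken from the codebase: `node_slice`, `classify_topology()`, and `needs_per_token_rpc()` are hypothetical names, and the layer count is passed in as a parameter rather than assumed. The only rule taken from the description above is that the core loop falls back to per-token RPC when the coordinator does not own the output.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: the contiguous slice of the model that one node owns. */
typedef struct {
    int  first_layer;  /* inclusive */
    int  last_layer;   /* inclusive */
    bool owns_output;  /* true if this node also runs the output head */
} node_slice;

typedef enum { TOPO_INVALID, TOPO_FORWARD, TOPO_REVERSE } topology;

/* True when `lo` then `hi` cover layers 0..n_layers-1 with no gap or overlap. */
bool slices_cover(node_slice lo, node_slice hi, int n_layers) {
    return lo.first_layer == 0 &&
           lo.last_layer >= lo.first_layer &&
           hi.first_layer == lo.last_layer + 1 &&
           hi.last_layer >= hi.first_layer &&
           hi.last_layer == n_layers - 1;
}

topology classify_topology(node_slice coord, node_slice worker, int n_layers) {
    assert(n_layers > 0);
    /* Exactly one node must own the output. */
    if (coord.owns_output == worker.owns_output) return TOPO_INVALID;
    /* Forward (current): coordinator owns the early layers; the output
     * may live on either node. */
    if (slices_cover(coord, worker, n_layers)) return TOPO_FORWARD;
    /* Reverse (proposed): worker owns the early layers; the coordinator
     * owns the later layers and the output. */
    if (slices_cover(worker, coord, n_layers) && coord.owns_output)
        return TOPO_REVERSE;
    return TOPO_INVALID;
}

/* Per the description above: per-token RPC is needed when the output
 * is owned remotely, i.e. not by the coordinator. */
bool needs_per_token_rpc(node_slice coord) {
    return !coord.owns_output;
}
```

Under these assumptions, a reverse topology never needs per-token RPC, since the coordinator always owns the output there.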
PR is coming shortly.