When operating or administering clusters of Linux (Unix) hosts, the need may arise to run one or more commands on a list of hosts, or to transfer files between them and a central location.
Typically this is accomplished via ad-hoc shell constructs as follows:

    for host in $(cat host.list); do
        ssh $host CMD
    done

    for host in $(cat host.list); do
        rsync -plrtHS SRC $host:DST
    done

While this may be adequate for a handful of hosts where everything works fine, for mid-size environments and above (>= 100 hosts) a more robust approach is needed, with the following features:
- support parallel invocation on N hosts at a time
- collect output (stdout/stderr) in a way that keeps track of the originating host
- support timeout limits, per host as well as overall for the entire set, since commands may hang
- maintain an audit trail in a machine readable format
- generate a list with failed hosts to allow for a fix-and-retry approach
This repo provides both command line utilities and Python modules for parallel ssh and rsync with the features above.
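
To make the feature list concrete, here is a minimal Python sketch of running a command for N hosts at a time, with a per-command timeout and results tagged by host. It is an illustration only, not the package's implementation: `run_one`, `run_batch` and the `make_argv` callback are hypothetical names, and the batch-level timeout and audit trail are left out.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(host, argv, timeout=None, term_max_wait=1.0):
    """Run argv for one host and return a result dict tagged with that host."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    timed_out = False
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        timed_out = True
        proc.terminate()  # ask politely first (SIGTERM on POSIX)
        try:
            out, err = proc.communicate(timeout=term_max_wait)
        except subprocess.TimeoutExpired:
            proc.kill()  # force the exit if the grace period elapses
            out, err = proc.communicate()
    return {
        "host": host,
        "rc": proc.returncode,
        "timed_out": timed_out,
        "stdout": out,
        "stderr": err,
    }


def run_batch(hosts, make_argv, n_parallel=1, timeout=None):
    """Run make_argv(host) for every host, at most n_parallel at a time.

    n_parallel=0 means all hosts at once. Returns (results, failed_hosts),
    with results in the same order as hosts.
    """
    workers = n_parallel if n_parallel > 0 else max(len(hosts), 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda h: run_one(h, make_argv(h), timeout), hosts)
        )
    failed = [r["host"] for r in results if r["timed_out"] or r["rc"] != 0]
    return results, failed
```

A call such as `run_batch(hosts, lambda h: ["ssh", h, "uptime"], n_parallel=20, timeout=30)` would return one result per host plus the list of hosts to retry.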
NOTE: Requires Python >= 3.9

- create a virtual environment (optional step, indicated for testing locally):

      python3 -m venv $HOME/p-ssh
      . $HOME/p-ssh/bin/activate

- install the package:

      pip install --upgrade \
          https://github.com/bgp59/p-ssh/releases/download/v1.0.3/p_ssh-1.0.3-py3-none-any.whl

- invoke:

      p-ssh --help
      p-rsync --help

- optionally deactivate the environment (it will not persist after logout anyway):

      deactivate

Parallel SSH Invoker w/ audit trail and output recording.
The effect is that of invoking `ssh SSH_OPTION ...' for a batch of N targets at
a time, from a list of host specification.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-n N, --n-parallel N Number of parallel invocations with 0 standing for all
targets at once. Default: 1.
-l HOST_LIST, --host-list HOST_LIST
Host spec file, in [USER@]HOST format. Lines starting
with `#' will be treated as comments and ignored and
duplicate specs will be removed. Multiple `-l' may be
specified and they will be consolidated
-i INPUT_FILE, --input-file INPUT_FILE
Input file passed to the stdin of each ssh command. If
there are no SSH_OPTIONs, read the first line looking
for a shebang specification and if found, use it as
implied command to exec remotely
-t TIMEOUT, --timeout TIMEOUT
If specified, individual ssh command timeout, in
seconds (float)
-W TERM_MAX_WAIT, --term-max-wait TERM_MAX_WAIT
How long to wait, in seconds, for a command to exit
upon being terminated via SIGTERM (float). Default:
1.0 sec
-B BATCH_TIMEOUT, --batch-timeout BATCH_TIMEOUT
If specified, the timeout for the entire batch, in
seconds (float)
-a [WORKING_DIR], --audit-trail [WORKING_DIR]
Enable audit trail and output collection using the
optional path passed as a parameter. The path may
contain the following placeholders: `{lh}': substitute
with the local hostname in lowercase and stripped of
domain, `{pid}': substitute with the PID of the local
process, `{lu}': substitute with the local user name.
Additionally the path may contain strftime formatting
characters which will be interpolated using the
invocation time. If WORKING_DIR argument is not
provided then env var `P_SSH_WORKING_DIR_ROOT' is
used. If neither WORKING_DIR argument nor
`P_SSH_WORKING_DIR_ROOT' env var are specified then
the default value is used, which is
`/tmp/{lu}/p_ssh/work/p-ssh/%Y-%m-%dT%H:%M:%S%z-{pid}'.
-x, --trace, --no-trace, --x, --no-x
Override the implied display of the result upon
individual command completion. If no audit trail is
specified then the implied action is to display the
result, otherwise it is to do nothing (since the
output is recorded anyway).
The SSH_OPTIONs may contain the following placeholders:
`{s}': substituted with the full [USER@]HOST specification
`{h}': substituted with the HOST part
`{u}': substituted with the USER part.
Additionally `P_SSH_DEFAULT_OPTIONS' env var may be
defined with default ssh options to be prepended to the provided
arguments.
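
The placeholder and default-option rules above can be pictured with the following Python sketch. It is not taken from the package: `expand_ssh_options` is a hypothetical helper, splitting `P_SSH_DEFAULT_OPTIONS` with `shlex` is an assumption about how the variable is tokenized, and the way p-ssh places the target itself on the ssh command line is not covered.

```python
import os
import shlex


def expand_ssh_options(spec, ssh_options, environ=os.environ):
    """Return the ssh option list for one [USER@]HOST spec.

    Default options from P_SSH_DEFAULT_OPTIONS come first, followed by
    the provided options with {s}, {h} and {u} substituted.
    """
    # "alice@web01" -> ("alice", "web01"); "web01" -> ("", "web01")
    user, _, host = spec.rpartition("@")
    substitutions = {"{s}": spec, "{h}": host, "{u}": user}

    # Assumption: the env var holds a shell-like, whitespace separated list.
    options = shlex.split(environ.get("P_SSH_DEFAULT_OPTIONS", ""))

    for option in ssh_options:
        for placeholder, value in substitutions.items():
            option = option.replace(placeholder, value)
        options.append(option)
    return options
```

With `P_SSH_DEFAULT_OPTIONS="-o BatchMode=yes"` and the spec `alice@web01`, the options `-l {u} {h}` would expand to `-o BatchMode=yes -l alice web01`.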
Examples

- Setting working dir root on an NFS-mounted file system. It is highly advisable that the path incorporates the local user and hostname in case the same setting is shared with other hosts with the same mount.

      export P_SSH_WORKING_DIR_ROOT=/share/{lu}/{lh}

- Run a bash script remotely, for instance to collect server inventory data:

  - the script, inventory.sh:

        #! /bin/bash --noprofile
        uname -a
        uptime
        ifconfig -a

  - the invocation:

        # -n 20: at most 20 parallel ssh sessions
        # -a: create audit trail
        # -x: also trace progress to stdout as ssh commands
        #     complete
        p-ssh -l HOST_FILE -n 20 -i inventory.sh -a -x

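
The inventory example relies on the shebang rule of `-i`: with no SSH_OPTIONs, the first line of the input file supplies the remote command. A rough Python sketch of such a check follows; `implied_command` is a hypothetical name and the actual parsing done by p-ssh may differ.

```python
def implied_command(script_path):
    """Return the command named by a leading shebang line, or None.

    For a file starting with "#! /bin/bash --noprofile" the result is
    ["/bin/bash", "--noprofile"].
    """
    with open(script_path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return None
    # Drop the "#!" marker and split the remainder on whitespace.
    words = first_line[2:].decode(errors="replace").split()
    return words or None
```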
usage: p-rsync OPTION ... -- RSYNC_OPTION ...
Parallel Rsync Invoker w/ audit trail and output recording.
The effect is that of invoking `rsync RSYNC_OPTION ...' for a batch of N targets
at a time, from a list of host specification.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-n N, --n-parallel N Number of parallel invocations with 0 standing for all
targets at once. Default: 1.
-l HOST_LIST, --host-list HOST_LIST
Host spec file, in [USER@]HOST format. Lines starting
with `#' will be treated as comments and ignored and
duplicate specs will be removed. Multiple `-l' may be
specified and they will be consolidated
-t TIMEOUT, --timeout TIMEOUT
If specified, individual ssh command timeout, in
seconds (float)
-W TERM_MAX_WAIT, --term-max-wait TERM_MAX_WAIT
How long to wait, in seconds, for a command to exit
upon being terminated via SIGTERM (float). Default:
1.0 sec
-B BATCH_TIMEOUT, --batch-timeout BATCH_TIMEOUT
If specified, the timeout for the entire batch, in
seconds (float)
-a [WORKING_DIR], --audit-trail [WORKING_DIR]
Enable audit trail and output collection using the
optional path passed as a parameter. The path may
contain the following placeholders: `{lh}': substitute
with the local hostname in lowercase and stripped of
domain, `{pid}': substitute with the PID of the local
process, `{lu}': substitute with the local user name.
Additionally the path may contain strftime formatting
characters which will be interpolated using the
invocation time. If WORKING_DIR argument is not
provided then env var `P_SSH_WORKING_DIR_ROOT' is
used. If neither WORKING_DIR argument nor
`P_SSH_WORKING_DIR_ROOT' env var are specified then
the default value is used, which is
`/tmp/{lu}/p_ssh/work/p-rsync/%Y-%m-%dT%H:%M:%S%z-{pid}'.
-x, --trace, --no-trace, --x, --no-x
Override the implied display of the result upon
individual command completion. If no audit trail is
specified then the implied action is to display the
result, otherwise it is to do nothing (since the
output is recorded anyway).
The RSYNC_OPTION may contain the following placeholders:
`{s}': substituted with the full [USER@]HOST specification
`{h}': substituted with the HOST part
`{u}': substituted with the USER part.
The `--' separator between p-rsync options and rsync ones is mandatory.
At least one of the RSYNC_OPTION should contain {s}:PATH
either for source or for destination.
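
The working directory template described under `-a` combines strftime formatting with the `{lh}`, `{lu}` and `{pid}` placeholders. The Python sketch below shows one way such a template could be expanded. It is an illustration under assumptions: `expand_working_dir` is a hypothetical helper, and the order of the two steps (strftime first, placeholders second) is a choice made here, not something the help text specifies.

```python
import getpass
import os
import socket
from datetime import datetime


def expand_working_dir(template, now=None, hostname=None, user=None, pid=None):
    """Expand strftime directives and {lh}/{lu}/{pid} in a path template."""
    if now is None:
        now = datetime.now().astimezone()  # timezone-aware, so %z is filled in
    # strftime runs first so that '%' characters in substituted values
    # are never mistaken for formatting directives.
    path = now.strftime(template)

    if hostname is None:
        hostname = socket.gethostname()
    substitutions = {
        "{lh}": hostname.lower().split(".")[0],  # lowercase, domain stripped
        "{lu}": user if user is not None else getpass.getuser(),
        "{pid}": str(pid if pid is not None else os.getpid()),
    }
    for placeholder, value in substitutions.items():
        path = path.replace(placeholder, value)
    return path
```

Calling `expand_working_dir("/share/{lu}/{lh}")` on host `Web01.example.com` as user `bob` would give `/share/bob/web01`.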
Examples

- From local to remote:

      p-rsync -l HOST_FILE -n 10 -a -- \
          -plrtHS -z --mkpath \
          /path/to/local/src/dir/ {s}:/path/to/remote/dir

  If --mkpath is not supported by the underlying rsync (pre 3.2.3) then the destination path has to be created beforehand:

      p-ssh -l HOST_FILE -n 50 -a -- mkdir -p /path/to/remote/dir

- From remote to local:

      p-rsync -l HOST_FILE -n 10 -a -- \
          -plrtHS -z --mkpath \
          {s}:/path/to/remote/src/dir/ /local/root/{h}/path/to/dst/dir

  If --mkpath is not supported by the underlying rsync (pre 3.2.3) then the destination path has to be created beforehand:

      p-rsync-mkpath -l HOST_FILE /local/root/{h}/path/to/dst/dir

usage: p-rsync-mkpath [-h] [--version] -l HOST_LIST DST [DST ...]
Create destination path as needed, either remotely or locally; the path may
include placeholders (see p-rsync.py -h). This is needed if the underlying
rsync is pre 3.2.3, when --mkpath option was added.
positional arguments:
DST
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-l HOST_LIST, --host-list HOST_LIST
Host spec file, in [USER@]HOST format. Lines starting
with `#' will be treated as comments and ignored and
duplicate specs will be removed. Multiple `-l' may be
specified and they will be consolidated
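
The host spec file rules shared by all the commands (comment lines, duplicate removal, consolidation of multiple `-l` files) can be sketched in Python as below. `load_host_specs` is a hypothetical helper; skipping blank lines and keeping the first-seen order are assumptions, since the help text does not state either.

```python
def load_host_specs(*paths):
    """Read [USER@]HOST specs from one or more files into a single list.

    Lines starting with '#' are ignored and duplicates are dropped.
    """
    specs = {}  # dict keys keep insertion order and give cheap dedup
    for path in paths:
        with open(path) as f:
            for line in f:
                spec = line.strip()
                if not spec or spec.startswith("#"):
                    continue  # blank line or comment
                specs.setdefault(spec, None)
    return list(specs)
```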
usage: p-report [-h] [--version] [-r RETRY_FILE] [--stderr | --no-stderr]
[--stdout | --no-stdout] [-o REPORT_FILE] [-p]
AUDIT_FILE
Generate report based on p-... command audit trail.
positional arguments:
AUDIT_FILE
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-r RETRY_FILE, --retry-file RETRY_FILE
Host spec retry file; the report will be generated
only for failed targets inside. Default: 'host-spec-
retry.list' under the same directory as the audit
trail file.
--stderr, --no-stderr
Include/exclude stderr from the report. Default True.
(default: True)
--stdout, --no-stdout
Include/exclude stdout from the report. Default False.
Note that stdout may be binary (non-text, that is), so
its inclusion should be considered carefully.
(default: False)
-o REPORT_FILE, --out REPORT_FILE
Output for the report, use `-' for stdout. Default
'p-report.txt' under the same directory as the audit
trail file.
-p, --pprint-events Format events with pprint, instead of JSON.
- always specify a timeout, at both task (command) and batch level
- use an audit trail, unless the number of target hosts is small and the commands are not too verbose (i.e. it is feasible to scroll up the terminal window and inspect the outcome)
NOTE! All the commands below should be invoked from the root dir of the project.

- if using VSCode, prime your (private) .vscode/ from .vscode-ref/ (recommended)

- pre-requisites:

      ./tools/py_prerequisites.sh

- test:

      pytest -v -s

- format .py files:

      ./tools/py_format.sh

- use bin/ to test run the commands directly from src/, e.g.:

      ./bin/p-ssh --help

- maintain version via __version__ in src/p_ssh/__init__.py

- build the package (wheel under dist):

      ./tools/build.sh

- apply semver tag (it requires clean state in main branch, pushed to github):

      ./tools/git_tag_with_semver.sh