Search before creating an issue
Bug Description
On some worker nodes (observed on an HPC system), the pilot fails during the `CheckWorkerNode` step when attempting to read `/etc/redhat-release`.
The file exists but is not readable because of site-specific restrictions (permissions or security policies). Opening it raises a `PermissionError`, which makes the pilot exit early, even though the information is purely informational (it is only logged).
Steps to Reproduce
- Run the pilot on a worker node where `/etc/redhat-release` exists but is not readable by the pilot user.
- The `CheckWorkerNode` command attempts to open `/etc/redhat-release`.
- The pilot fails with a `PermissionError` during this step.
```
Uname = Linux nukwa-...
Host Name = nukwa-01...
Host FQDN = nukwa-01...
Traceback (most recent call last):
  File "/home/sec-constraints/wn.py", line 12, in <module>
    with open(fileName, "r") as f:
PermissionError: [Errno 13] Permission denied: '/etc/redhat-release'
```
Expected Behavior
The pilot should not fail if OS release files are unreadable.
OS identification is informational and should be best-effort:
- If the file cannot be read, the pilot should continue running.
- OS details should be logged when available, skipped otherwise.
Actual Behavior
The pilot exits during `CheckWorkerNode` when attempting to read `/etc/redhat-release`, even though the file is optional and not required for execution.
Environment
No response
Relevant Log Output
Additional Context
Pilot/Pilot/pilotCommands.py, lines 141 to 149 in 10330ec:
```python
fileName = "/etc/redhat-release"
if os.path.exists(fileName):
    with open(fileName, "r") as f:
        self.log.info("RedHat Release = %s" % f.read().strip())

fileName = "/etc/lsb-release"
if os.path.isfile(fileName):
    with open(fileName, "r") as f:
        self.log.info("Linux release:\n%s" % f.read().strip())
```
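A minimal sketch of how the quoted snippet above could be hardened without changing what is logged, assuming the only goal is to stop a read error from propagating. The function name `log_release_files` and the `files` parameter are hypothetical; `log` stands in for `self.log` and is only called as `log.info(message)`, as the current code does. Catching `(IOError, OSError)` also covers Python 2, where `open()` raises `IOError`, and `errors="replace"` keeps an undecodable file from raising `UnicodeDecodeError`.

```python
import io
import os

# Hypothetical defaults mirroring the two files read in the current code.
LEGACY_RELEASE_FILES = (
    ("/etc/redhat-release", "RedHat Release = %s"),
    ("/etc/lsb-release", "Linux release:\n%s"),
)


def log_release_files(log, files=LEGACY_RELEASE_FILES):
    """Log the content of legacy release files when readable; never raise."""
    for file_name, template in files:
        # Absent files are normal on many distributions: skip them silently.
        if not os.path.isfile(file_name):
            continue
        try:
            # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
            with io.open(file_name, "r", encoding="utf-8", errors="replace") as f:
                content = f.read().strip()
        except (IOError, OSError) as exc:
            # PermissionError (Errno 13) lands here: note it and move on.
            log.info("Cannot read %s, skipping it: %s" % (file_name, exc))
            continue
        log.info(template % content)
```

The existence check is kept so that a missing file stays silent, while an existing but unreadable file produces a single informational line instead of a traceback.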
`/etc/redhat-release` is apparently a legacy, distribution-specific file and may be restricted or absent on some systems.
From what I understand, a more robust and standardized alternative is `/etc/os-release`: https://www.freedesktop.org/software/systemd/man/latest/os-release.html
It looks like it is supported by virtually all modern (and even many older) Linux distributions.
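Per the os-release page linked above, the file is a newline-separated list of shell-compatible `KEY=value` assignments: values containing spaces or special characters are enclosed in double or single quotes, shell special characters are backslash-escaped, and blank lines and lines starting with `#` are ignored. A simplified parser sketch under those rules follows; the name `parse_os_release` is hypothetical, it only unescapes `\\`, `\"`, `\$` and `` \` `` inside double quotes, and it skips malformed lines instead of raising.

```python
import re

# KEY=value; variable names follow shell rules (letters, digits, underscore).
_ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")
# Inside double quotes, a backslash escapes one of: \ " $ `
_DQ_ESCAPE = re.compile(r'\\([\\"$`])')


def parse_os_release(text):
    """Parse os-release content into a dict, skipping anything malformed."""
    info = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and comments are ignored.
        if not line or line.startswith("#"):
            continue
        match = _ASSIGNMENT.match(line)
        if match is None:
            continue  # best-effort: ignore malformed lines instead of failing
        key, value = match.groups()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            quote, value = value[0], value[1:-1]
            if quote == '"':
                value = _DQ_ESCAPE.sub(r"\1", value)
        info[key] = value
    return info
```

For example, `parse_os_release('NAME="Fedora Linux"\nID=fedora\n')` returns `{'NAME': 'Fedora Linux', 'ID': 'fedora'}`.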
Based on Example 4 (Python >= 3.10) or Example 5 (any Python version; do we still need Python 2 support?) of the os-release page, I think we should:
- Prefer reading `/etc/os-release` (best-effort, with exception handling)
- Then fall back to `/usr/lib/os-release`
- Never fail the pilot if OS release information cannot be read
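A sketch of the proposed lookup order, assuming plain file reads are acceptable in `CheckWorkerNode`. `read_os_release`, `DEFAULT_OS_RELEASE_PATHS` and `os_release_stdlib` are hypothetical names. Every read error (including `PermissionError`) is swallowed, so the caller gets an empty dict instead of an exception; the parse is deliberately minimal (surrounding quotes stripped, no escape handling). On Python >= 3.10, `platform.freedesktop_os_release()` performs the lookup and parsing itself and raises `OSError` when neither file can be read, so it only needs a `try`/`except` around it.

```python
import io
import platform

# Lookup order proposed above: /etc first, then /usr/lib.
DEFAULT_OS_RELEASE_PATHS = ("/etc/os-release", "/usr/lib/os-release")


def read_os_release(paths=DEFAULT_OS_RELEASE_PATHS, log=None):
    """Return os-release data from the first readable file, or {} (never raises)."""
    for path in paths:
        try:
            with io.open(path, "r", encoding="utf-8", errors="replace") as f:
                text = f.read()
        except (IOError, OSError) as exc:
            # Missing, unreadable (Errno 13) or a directory: try the next candidate.
            if log is not None:
                log.info("Cannot read %s: %s" % (path, exc))
            continue
        info = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Minimal unquoting: drop surrounding quotes, ignore escapes.
            info[key.strip()] = value.strip().strip("\"'")
        return info
    return {}


def os_release_stdlib():
    """Python >= 3.10 variant built on the standard library, still best-effort."""
    try:
        return platform.freedesktop_os_release()
    except (AttributeError, OSError):
        # AttributeError: Python < 3.10 has no platform.freedesktop_os_release().
        return {}
```

The pilot could then log something like `read_os_release().get("PRETTY_NAME", "unknown")`. One difference from the os-release page: it says `/usr/lib/os-release` should be used only if `/etc/os-release` is missing, whereas this sketch also falls back when `/etc/os-release` exists but cannot be read. Since the information is only logged, that seems a reasonable trade-off here.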