CLOS 4333: Correct issues detected in no-auth migration testing#59
Closed
prilr wants to merge 13 commits into
Closed
CLOS 4333: Correct issues detected in no-auth migration testing#59prilr wants to merge 13 commits into
prilr wants to merge 13 commits into
Conversation
Systems migrated to the no-auth (SWNG) scheme no longer have CLN as a package source. Several CL-specific actors assumed CLN was always active and either crashed on missing files or produced spurious inhibitors. Add cln_detect.is_cln_configured() — True when the CLN plumbing is present and not explicitly disabled (registration file exists + spacewalk plugin installed + plugin enabled). Gate these actors on it: - switch_cln_channel: skip the cln-switch-channel call on no-auth systems. Also downgrade the failed-switch inhibitor to a MEDIUM report, since a failure on a transitional system where CLN plumbing lingers but is no longer usable should not block the upgrade — CL9 packages come from cl-channel / cloudlinux9-baseos instead. - pin_cln_mirror / unpin_cln_mirror: no-op on no-auth systems; also wrap the up2date update in try/except in pin_cln_mirror in case the file was not shipped on the target. - check_rhn_version_override / reset_rhn_version_override: skip on no-auth systems and fall back cleanly when /etc/sysconfig/rhn/up2date is missing. reset_rhn_version_override: also fix a pre-existing bug where rebinding inside the loop did not update config_data, so the reset was silently a no-op. enable_yum_spacewalk_plugin is not touched here — CLOS-3960 has concurrent work on it.
The first commit on this branch named the helper is_cln_configured() and the actor comments said things like 'CLN is not configured here'. That conflated two separate concerns and was misleading: CLN registration is still in use on no-auth (SWNG) systems for licensing and inventory — what the no-auth migration changes is only the package channel (the spacewalk DNF plugin no longer delivers packages from a CLN channel). Rename: is_cln_configured() -> is_cln_package_channel_active() so the function name reflects what the actors actually need to gate on, and rewrite its docstring + module header to spell out the package-channel vs registration distinction. Update each of the five actor comments accordingly. The detection logic itself is unchanged — registration state plus a non-disabled spacewalk plugin remains the right heuristic for 'CLN is delivering packages'. Tests updated to match the new name and to phrase scenarios in terms of the channel rather than 'CLN configured'.
The previous two commits introduced em-dashes (U+2014) in docstrings and inline comments. The upstream make lint target now greps for any non-ASCII byte (commit 92aee84) because Python 2.7 source files reject non-ASCII without an encoding declaration, and the leapp framework still supports running on 2.7 in places. Replace each em-dash with an ASCII hyphen so the files lint clean against that gate.
The earlier patch passed `aliases=['non-interactive']` to @command_opt, which threads through to leapp framework's `add_option`. In 0.18.0 that function does not accept an `aliases` kwarg, so leapp loading failed with `add_option() got an unexpected keyword argument 'aliases'` on every CLI invocation - including `leapp preupgrade` and `leapp upgrade`. argparse natively supports multiple long names per option; we just need to get them past add_option. Add a local `_command_opt_with_aliases` helper that wraps the framework's `_add_opt` directly, so both --nowarn and --non-interactive land on the same argparse argument (dest=nowarn). No framework change required, no consumer changes (args.nowarn keeps working). Verified on a CL8 source VM with leapp-0.18.0-2.el8: `leapp upgrade --help` now prints both names as a single option and parses --non-interactive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e bdb_ro warnings" This reverts commit be4ece2. The new call site sat inside _prepare_perform, which mounts an overlayfs at /installroot. rpmdb --rebuilddb does an atomic directory rename to swap the old database with the freshly-converted (bdb -> sqlite) one. Across overlay layers that rename returns EXDEV, so rpm prints error: failed to replace old database with new database! error: replace files in /var/lib/rpm with files from /var/lib/rpmrebuilddb.1 to recover and the dnf_transaction_check actor crashes with exit code 1, blocking every CL8 -> CL9 preupgrade. The two existing _rebuild_rpm_db call sites (perform_transaction_install, install_initramdisk_requirements) use _prepare_transaction with binds=['/:/installroot'], so the rename stays on a single filesystem and works. Those are the load-bearing rebuilds for the actual upgrade. Dropping the new preupgrade-time call only restores the harmless bdb_ro warnings; the upgrade transaction itself is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Actor base class in leapp framework 0.18+ runs
self.config = retrieve_config(self.config_schemas)
in __init__, which clobbers any class-level `config` attribute the actor
defined. Our class-level `config = '/etc/dnf/plugins/spacewalk.conf'` was
silently turned into the runtime config dict, so the FirstBoot phase crashed
with `TypeError: stat: path should be string, ..., not dict` on every CL
upgrade (CLOS-3960 introduced the assignment, but the rename to DEFAULT_CONFIG_PATH
in CLOS-3960's library didn't touch the actor's class attribute name).
Rename to CONFIG_PATH so the framework leaves it alone. _enable_plugin()
already silently no-ops when the file is missing, so no_auth systems
(where the spacewalk plugin is removed by rhn-client-tools >= 3.0.1)
keep skipping cleanly.
Verified live on a CL9 machine: patched actor loads,
process() reaches _enable_plugin with the string path, and silently
returns False/None when the plugin config is absent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the target userspace is built from no-auth-aware repos
(rhn-client-tools >= 3.0.1, which Obsoletes dnf-plugin-spacewalk on CL8/9
per CLOS-4056), /etc/dnf/plugins/spacewalk.conf is absent in
/var/lib/leapp/elNuserspace/. The unconditional `open(spacewalk_conf, 'r')`
inside `if os.path.isdir('/etc/sysconfig/rhn')` then raises
IOError: [Errno 2] No such file or directory: '/var/lib/leapp/el8userspace/etc/dnf/plugins/spacewalk.conf'
and target_userspace_creator crashes with exit code 1, blocking preupgrade
on every CL7+CLN -> CL8+no_auth transition where the new RC packages reach
the target.
Surfaced during CLOS-4056 verification (elevate-qa Run cloudlinux#3 build cloudlinux#40 with
BS_BUILDS_TARGET delivery wired up correctly via leapp_upgrade_repositories.repo
late-append). The outer /etc/sysconfig/rhn check is correct (CLN registration
may persist on no-auth for licensing), but the inner file presence is no
longer guaranteed - guard it with an explicit os.path.isfile check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-client-tools
CLOS-4056 gated 5 CLN-assuming actors on is_cln_package_channel_active()
but missed the one that pushes CLN packages into the target chroot.
copy_rhn_client_tools_config (CopyClLicense) emits
TargetUserSpacePreupgradeTasks(
install_rpms=['dnf-plugin-spacewalk', 'rhn-client-tools'],
copy_files=<everything under /etc/sysconfig/rhn>,
)
so target_userspace_creator always tries to install both. Two distinct
concerns are conflated:
* rhn-client-tools is identity/licensing tooling. CLN registration and
the systemid in /etc/sysconfig/rhn/ persist across the no-auth
migration; licensing does not go away, only repo management does.
Keep this in the install set unconditionally.
* dnf-plugin-spacewalk is the DNF plugin that fetches packages from the
CLN-side spacewalk channel. Pure repo plumbing. Under no-auth packages
come from cl-channel via /etc/yum.repos.d/cl.repo and the plugin is
unused; rhn-client-tools >= 3.0.1 also Obsoletes it on CL8/9.
Gate only dnf-plugin-spacewalk on is_cln_package_channel_active(). The
/etc/sysconfig/rhn copy stays unconditional - same reasoning as
rhn-client-tools.
Practical effect on no-auth runs: the plugin no longer reaches
target_userspace_creator's install set, so DNF in the el9 chroot does
not get asked for it (and a custom-target rhn-client-tools >= 3.0.1
with module_el8 bits would no longer drag in dnf-plugin-spacewalk via
its dependency closure).
…= 2`
cln-switch-channel = 2 was provided by rhn-client-tools 2.x. The 3.0
cloudlinux branch (CLOS-3855 / CLOS-4049) removes both the
/usr/sbin/cln-switch-channel binary and the corresponding Provide as part
of the no-auth migration, and Obsoletes the 2.x line. With this hard
install-time pin in place, rhn-client-tools 3.x cannot coexist on a
system that has leapp-upgrade-el8toel9 installed - dnf_transaction_check
refuses the upgrade transaction with
cannot install both rhn-client-tools-1:3.0.3-1.el9.cloudlinux ...
and rhn-client-tools-1:2.12.5-1.el9.cloudlinux ... from cloudlinux9-channel
package leapp-upgrade-el8toel9 ... requires cln-switch-channel = 2,
but none of the providers can be installed
(elevate-qa Run cloudlinux#54 hit this on the CL8+no_auth -> CL9+no_auth path).
The actor that invokes cln-switch-channel (switchclnchannel) is already
gated on is_cln_package_channel_active() per CLOS-4056, so on no-auth
systems it never runs and the missing binary is harmless. On CLN-active
systems rhn-client-tools 2.x is the installed version, supplying the
binary at runtime - the install-time pin was redundant safety, not a
load-bearing dep.
Drop it. The spec comment that replaces the line documents why so a
future reader does not re-add it without the no-auth context.
… is_cln_package_channel_active() to return True The helper previously concluded CLN was the active package channel as soon as /etc/sysconfig/rhn/systemid existed and a /etc/dnf/plugins/ spacewalk.conf-style config file was present and not explicitly disabled. That misclassifies a no-auth system that still carries a stale spacewalk.conf - for example one left behind after rhn-client-tools 3.0+ Obsoletes dnf-plugin-spacewalk but the .rpmsave gets restored, or where an admin manually preserved the config file. The consequence is benign in some places (gated actors no-op anyway, the switchclnchannel actor catches the resulting OSError when the cln-switch-channel binary is missing) but it produces spurious medium-severity reports and makes downstream CLOS-4056 actors take the wrong branch. Add a real installed-packages check via `rpm -q --quiet`. Any of dnf-plugin-spacewalk, python3-dnf-plugin-spacewalk, or yum-rhn-plugin being installed is sufficient evidence the plugin can run. CLN is active only if the package is actually installed AND a non-disabled config is present. If rpm itself can't be invoked (broken PATH, missing rpmdb), the helper falls through to "channel inactive": CLN-assuming actors stand down, which is the safe side of the call. Tests cover the new misfire case (systemid + enabled spacewalk.conf but no plugin installed -> False), each of the three plugin package names being sufficient on its own, and the rpm-not-callable failure mode.
…active() + fix dead inhibitor branch
The check_cl_license actor verifies CL licensing by calling /usr/sbin/rhn_check,
which under rhn-client-tools >= 3.0 does an XML-RPC roundtrip to CLN to refresh
the JWT credential. On a no-auth (SWNG) system the systemid file written by
clnreg_ks (2.x flow) is not in the format rhn_check 3.0+ expects against the
prod CLN server, and the call dies with
rhn-plugin: Error communicating with server. The message was:
com.cloudlinux.clos.domain.exceptions.xmlrpc.ClnXmlRpcException:
(=_=) Invalid System Credentials. Cannot parse request or identify
server. Please, run registration again.
elevate-qa Run cloudlinux#55 hit this immediately after the CLBS bundle began landing
rhn-client-tools 3.0.3 source-side (now that the cln-switch-channel=2 install-
time pin no longer forces a 2.x downgrade).
Under no-auth, licensing is not the CLN XML-RPC roundtrip - it's IP-based
(or other static mechanisms). The rhn_check route is the wrong validation
here. Gate the actor on is_cln_package_channel_active() and skip when it
returns False, matching the same pattern the other CLOS-4056-gated actors
use (switchclnchannel, pinclnmirror, copycllicense, ...).
While here, fix a pre-existing dead-code bug: `run([rhn_check_bin])` raises
CalledProcessError on non-zero exit, so the `if not res or res['exit_code']
!= 0 or res['stderr']: produce inhibitor` branch never executed on CLN-active
failures - the actor crashed instead of producing the intended INHIBITOR
report. Wrap the run() call in try/except so the existing reporting path
takes over when rhn_check fails on a CLN-active system.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Several previously introduced modifications caused failures in end-to-end tests.
This PR addresses them in preparation for the next release.
Intended to be a follow-up to #58 and is based on it.