Skip to content

CLOS 4333: Correct issues detected in no-auth migration testing#59

Closed
prilr wants to merge 13 commits into
cloudlinux:cloudlinuxfrom
prilr:CLOS-4333
Closed

CLOS 4333: Correct issues detected in no-auth migration testing#59
prilr wants to merge 13 commits into
cloudlinux:cloudlinuxfrom
prilr:CLOS-4333

Conversation

@prilr
Copy link
Copy Markdown
Collaborator

@prilr prilr commented Apr 30, 2026

Several previously introduced modifications caused failures in end-to-end tests.

This PR addresses them in preparation for the next release.

Intended to be a follow-up to #58 and is based on it.

prilr and others added 13 commits April 27, 2026 12:06
Systems migrated to the no-auth (SWNG) scheme no longer have CLN as a
package source. Several CL-specific actors assumed CLN was always active
and either crashed on missing files or produced spurious inhibitors.

Add cln_detect.is_cln_configured() — True when the CLN plumbing is
present and not explicitly disabled (registration file exists + spacewalk
plugin installed + plugin enabled). Gate these actors on it:

- switch_cln_channel: skip the cln-switch-channel call on no-auth systems.
  Also downgrade the failed-switch inhibitor to a MEDIUM report, since a
  failure on a transitional system where CLN plumbing lingers but is no
  longer usable should not block the upgrade — CL9 packages come from
  cl-channel / cloudlinux9-baseos instead.
- pin_cln_mirror / unpin_cln_mirror: no-op on no-auth systems; also wrap
  the up2date update in try/except in pin_cln_mirror in case the file was
  not shipped on the target.
- check_rhn_version_override / reset_rhn_version_override: skip on no-auth
  systems and fall back cleanly when /etc/sysconfig/rhn/up2date is
  missing. reset_rhn_version_override: also fix a pre-existing bug where
  rebinding  inside the loop did not update config_data, so the
  reset was silently a no-op.

enable_yum_spacewalk_plugin is not touched here — CLOS-3960 has concurrent
work on it.
The first commit on this branch named the helper is_cln_configured()
and the actor comments said things like 'CLN is not configured here'.
That conflated two separate concerns and was misleading: CLN registration
is still in use on no-auth (SWNG) systems for licensing and inventory —
what the no-auth migration changes is only the package channel (the
spacewalk DNF plugin no longer delivers packages from a CLN channel).

Rename:

  is_cln_configured()  ->  is_cln_package_channel_active()

so the function name reflects what the actors actually need to gate on,
and rewrite its docstring + module header to spell out the package-channel
vs registration distinction.

Update each of the five actor comments accordingly. The detection logic
itself is unchanged — registration state plus a non-disabled spacewalk
plugin remains the right heuristic for 'CLN is delivering packages'.
Tests updated to match the new name and to phrase scenarios in terms of
the channel rather than 'CLN configured'.
The previous two commits introduced em-dashes (U+2014) in docstrings and
inline comments. The upstream make lint target now greps for any
non-ASCII byte (commit 92aee84) because Python 2.7 source files reject
non-ASCII without an encoding declaration, and the leapp framework still
supports running on 2.7 in places. Replace each em-dash with an ASCII
hyphen so the files lint clean against that gate.
The earlier patch passed `aliases=['non-interactive']` to @command_opt,
which threads through to leapp framework's `add_option`. In 0.18.0 that
function does not accept an `aliases` kwarg, so leapp loading failed with
`add_option() got an unexpected keyword argument 'aliases'` on every CLI
invocation - including `leapp preupgrade` and `leapp upgrade`.

argparse natively supports multiple long names per option; we just need to
get them past add_option. Add a local `_command_opt_with_aliases` helper
that wraps the framework's `_add_opt` directly, so both --nowarn and
--non-interactive land on the same argparse argument (dest=nowarn). No
framework change required, no consumer changes (args.nowarn keeps working).

Verified on a CL8 source VM with leapp-0.18.0-2.el8: `leapp upgrade --help`
now prints both names as a single option and parses --non-interactive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e bdb_ro warnings"

This reverts commit be4ece2.

The new call site sat inside _prepare_perform, which mounts an overlayfs at
/installroot. rpmdb --rebuilddb does an atomic directory rename to swap the
old database with the freshly-converted (bdb -> sqlite) one. Across overlay
layers that rename returns EXDEV, so rpm prints

    error: failed to replace old database with new database!
    error: replace files in /var/lib/rpm with files from /var/lib/rpmrebuilddb.1 to recover

and the dnf_transaction_check actor crashes with exit code 1, blocking every
CL8 -> CL9 preupgrade.

The two existing _rebuild_rpm_db call sites (perform_transaction_install,
install_initramdisk_requirements) use _prepare_transaction with binds=['/:/installroot'],
so the rename stays on a single filesystem and works. Those are the load-bearing
rebuilds for the actual upgrade. Dropping the new preupgrade-time call only
restores the harmless bdb_ro warnings; the upgrade transaction itself is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Actor base class in leapp framework 0.18+ runs

    self.config = retrieve_config(self.config_schemas)

in __init__, which clobbers any class-level `config` attribute the actor
defined. Our class-level `config = '/etc/dnf/plugins/spacewalk.conf'` was
silently turned into the runtime config dict, so the FirstBoot phase crashed
with `TypeError: stat: path should be string, ..., not dict` on every CL
upgrade (CLOS-3960 introduced the assignment, but the rename to DEFAULT_CONFIG_PATH
in CLOS-3960's library didn't touch the actor's class attribute name).

Rename to CONFIG_PATH so the framework leaves it alone. _enable_plugin()
already silently no-ops when the file is missing, so no_auth systems
(where the spacewalk plugin is removed by rhn-client-tools >= 3.0.1)
keep skipping cleanly.

Verified live on a CL9 machine: patched actor loads,
process() reaches _enable_plugin with the string path, and silently
returns False/None when the plugin config is absent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the target userspace is built from no-auth-aware repos
(rhn-client-tools >= 3.0.1, which Obsoletes dnf-plugin-spacewalk on CL8/9
per CLOS-4056), /etc/dnf/plugins/spacewalk.conf is absent in
/var/lib/leapp/elNuserspace/. The unconditional `open(spacewalk_conf, 'r')`
inside `if os.path.isdir('/etc/sysconfig/rhn')` then raises

    IOError: [Errno 2] No such file or directory: '/var/lib/leapp/el8userspace/etc/dnf/plugins/spacewalk.conf'

and target_userspace_creator crashes with exit code 1, blocking preupgrade
on every CL7+CLN -> CL8+no_auth transition where the new RC packages reach
the target.

Surfaced during CLOS-4056 verification (elevate-qa Run cloudlinux#3 build cloudlinux#40 with
BS_BUILDS_TARGET delivery wired up correctly via leapp_upgrade_repositories.repo
late-append). The outer /etc/sysconfig/rhn check is correct (CLN registration
may persist on no-auth for licensing), but the inner file presence is no
longer guaranteed - guard it with an explicit os.path.isfile check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-client-tools

CLOS-4056 gated 5 CLN-assuming actors on is_cln_package_channel_active()
but missed the one that pushes CLN packages into the target chroot.
copy_rhn_client_tools_config (CopyClLicense) emits

    TargetUserSpacePreupgradeTasks(
        install_rpms=['dnf-plugin-spacewalk', 'rhn-client-tools'],
        copy_files=<everything under /etc/sysconfig/rhn>,
    )

so target_userspace_creator always tries to install both. Two distinct
concerns are conflated:

* rhn-client-tools is identity/licensing tooling. CLN registration and
  the systemid in /etc/sysconfig/rhn/ persist across the no-auth
  migration; licensing does not go away, only repo management does.
  Keep this in the install set unconditionally.

* dnf-plugin-spacewalk is the DNF plugin that fetches packages from the
  CLN-side spacewalk channel. Pure repo plumbing. Under no-auth packages
  come from cl-channel via /etc/yum.repos.d/cl.repo and the plugin is
  unused; rhn-client-tools >= 3.0.1 also Obsoletes it on CL8/9.

Gate only dnf-plugin-spacewalk on is_cln_package_channel_active(). The
/etc/sysconfig/rhn copy stays unconditional - same reasoning as
rhn-client-tools.

Practical effect on no-auth runs: the plugin no longer reaches
target_userspace_creator's install set, so DNF in the el9 chroot does
not get asked for it (and a custom-target rhn-client-tools >= 3.0.1
with module_el8 bits would no longer drag in dnf-plugin-spacewalk via
its dependency closure).
…= 2`

cln-switch-channel = 2 was provided by rhn-client-tools 2.x. The 3.0
cloudlinux branch (CLOS-3855 / CLOS-4049) removes both the
/usr/sbin/cln-switch-channel binary and the corresponding Provide as part
of the no-auth migration, and Obsoletes the 2.x line. With this hard
install-time pin in place, rhn-client-tools 3.x cannot coexist on a
system that has leapp-upgrade-el8toel9 installed - dnf_transaction_check
refuses the upgrade transaction with

    cannot install both rhn-client-tools-1:3.0.3-1.el9.cloudlinux ...
    and rhn-client-tools-1:2.12.5-1.el9.cloudlinux ... from cloudlinux9-channel
    package leapp-upgrade-el8toel9 ... requires cln-switch-channel = 2,
    but none of the providers can be installed

(elevate-qa Run cloudlinux#54 hit this on the CL8+no_auth -> CL9+no_auth path).

The actor that invokes cln-switch-channel (switchclnchannel) is already
gated on is_cln_package_channel_active() per CLOS-4056, so on no-auth
systems it never runs and the missing binary is harmless. On CLN-active
systems rhn-client-tools 2.x is the installed version, supplying the
binary at runtime - the install-time pin was redundant safety, not a
load-bearing dep.

Drop it. The spec comment that replaces the line documents why so a
future reader does not re-add it without the no-auth context.
… is_cln_package_channel_active() to return True

The helper previously concluded CLN was the active package channel as
soon as /etc/sysconfig/rhn/systemid existed and a /etc/dnf/plugins/
spacewalk.conf-style config file was present and not explicitly disabled.
That misclassifies a no-auth system that still carries a stale
spacewalk.conf - for example one left behind after rhn-client-tools 3.0+
Obsoletes dnf-plugin-spacewalk but the .rpmsave gets restored, or where
an admin manually preserved the config file.

The consequence is benign in some places (gated actors no-op anyway,
the switchclnchannel actor catches the resulting OSError when the
cln-switch-channel binary is missing) but it produces spurious
medium-severity reports and makes downstream CLOS-4056 actors take the
wrong branch.

Add a real installed-packages check via `rpm -q --quiet`. Any of
dnf-plugin-spacewalk, python3-dnf-plugin-spacewalk, or yum-rhn-plugin
being installed is sufficient evidence the plugin can run. CLN is
active only if the package is actually installed AND a non-disabled
config is present.

If rpm itself can't be invoked (broken PATH, missing rpmdb), the helper
falls through to "channel inactive": CLN-assuming actors stand down,
which is the safe side of the call.

Tests cover the new misfire case (systemid + enabled spacewalk.conf but
no plugin installed -> False), each of the three plugin package names
being sufficient on its own, and the rpm-not-callable failure mode.
…active() + fix dead inhibitor branch

The check_cl_license actor verifies CL licensing by calling /usr/sbin/rhn_check,
which under rhn-client-tools >= 3.0 does an XML-RPC roundtrip to CLN to refresh
the JWT credential. On a no-auth (SWNG) system the systemid file written by
clnreg_ks (2.x flow) is not in the format rhn_check 3.0+ expects against the
prod CLN server, and the call dies with

    rhn-plugin: Error communicating with server. The message was:
    com.cloudlinux.clos.domain.exceptions.xmlrpc.ClnXmlRpcException:
    (=_=) Invalid System Credentials. Cannot parse request or identify
    server. Please, run registration again.

elevate-qa Run cloudlinux#55 hit this immediately after the CLBS bundle began landing
rhn-client-tools 3.0.3 source-side (now that the cln-switch-channel=2 install-
time pin no longer forces a 2.x downgrade).

Under no-auth, licensing is not the CLN XML-RPC roundtrip - it's IP-based
(or other static mechanisms). The rhn_check route is the wrong validation
here. Gate the actor on is_cln_package_channel_active() and skip when it
returns False, matching the same pattern the other CLOS-4056-gated actors
use (switchclnchannel, pinclnmirror, copycllicense, ...).

While here, fix a pre-existing dead-code bug: `run([rhn_check_bin])` raises
CalledProcessError on non-zero exit, so the `if not res or res['exit_code']
!= 0 or res['stderr']: produce inhibitor` branch never executed on CLN-active
failures - the actor crashed instead of producing the intended INHIBITOR
report. Wrap the run() call in try/except so the existing reporting path
takes over when rhn_check fails on a CLN-active system.
@prilr prilr closed this May 18, 2026
@prilr prilr deleted the CLOS-4333 branch May 18, 2026 03:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant